Knowledge Graph | graphiti | Forrest’s Blog

Knowledge Graph: The End-to-End Process of Construction and Application

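Building a knowledge graph is like compiling a structured "encyclopedia of the world" for machines. Its core goal is to organize real-world things (entities) and the relationships between them in a form that computers can process.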

Knowledge Representation

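This is the starting point: it determines the "skeleton" and "grammar" of the knowledge graph. Before anything else, we need to decide which "language" to describe knowledge in.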

Core models: The two most widely used representation models are RDF (Resource Description Framework) and the property graph.

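RDF triples: RDF breaks all knowledge down into subject-predicate-object triples, for example (Jay Chou) - [occupation] -> (singer). It is the standard data model of the Semantic Web, rigorous and well suited to linking data across sources.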

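Property graph: A property graph consists of nodes and relationships, both of which can carry their own properties. For example, a "Jay Chou" node can have the property {birthday: "1979-01-18"}, and the "collaborates with" relationship between him and "Vincent Fang" can have the property {start_year: 2000}. This model is more flexible and is widely used in industry.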

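Schema layer: Before representing knowledge, one usually defines a schema or ontology first. It specifies which entity types the graph may contain (such as "Person" or "Company"), which relationship types (such as "graduated from" or "invests in"), and the constraints on them (for example, every "Person" must have a "birthday" property).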
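
To make the two models and the schema constraint concrete, here is a minimal Python sketch. The class names (Node, Relationship), the REQUIRED_PROPS table, and the missing_props helper are illustrative assumptions, not the API of any particular RDF or graph library.

```python
from dataclasses import dataclass, field

# RDF style: every fact is one (subject, predicate, object) triple.
triples = [
    ("Jay Chou", "occupation", "singer"),
    ("Jay Chou", "birthday", "1979-01-18"),
]

# Property-graph style: nodes and relationships carry their own properties.
@dataclass
class Node:
    label: str
    name: str
    props: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    props: dict = field(default_factory=dict)

jay = Node("Person", "Jay Chou", {"birthday": "1979-01-18"})
vincent = Node("Person", "Vincent Fang")
collab = Relationship(jay, "COLLABORATES_WITH", vincent, {"start_year": 2000})

# Toy schema (hypothetical): properties that each node label must have.
REQUIRED_PROPS = {"Person": {"birthday"}}

def missing_props(node: Node) -> set:
    """Return the required properties that the node does not have."""
    return REQUIRED_PROPS.get(node.label, set()) - set(node.props)
```

Note that the triple form needs a separate triple for each property, while the property graph attaches {start_year: 2000} directly to the relationship.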

Knowledge Storage

Choose a suitable "warehouse" to hold the knowledge once it has been represented.

Core tools: Graph databases are the main choice.

Triple store: A database designed specifically for storing RDF triples, such as GraphDB or Virtuoso. The query language is SPARQL.

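Property graph databases: Databases built around the property graph model, such as Neo4j (native graph storage) and JanusGraph (which layers the graph model over a pluggable storage backend). The query language is usually Cypher (Neo4j) or Gremlin (JanusGraph). They handle multi-hop queries (such as "friends of my friends") efficiently, because each hop is a direct traversal of stored relationships rather than another table join.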
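
The "friends of my friends" query is a two-hop traversal. The sketch below imitates it in plain Python over an in-memory adjacency list. The data and the within_hops helper are made up for illustration; a real graph database would run the equivalent traversal inside its own query engine.

```python
from collections import deque

# Toy friendship graph (hypothetical data): person -> direct friends.
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Erin"],
    "Dave": [],
    "Erin": [],
}

def within_hops(graph, start, max_hops):
    """Breadth-first search; returns {node: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Friends of friends: everyone exactly two hops away from Alice.
two_hop = sorted(n for n, d in within_hops(friends, "Alice", 2).items() if d == 2)
```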

Knowledge Extraction

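This is one of the most central and most difficult stages of building a knowledge graph. The goal is to automatically "squeeze" structured knowledge out of large volumes of unstructured text (web pages, news, documents).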

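Named Entity Recognition (NER): Identify the entities with a specific meaning in text. For example, in "Jay Chou was born in New Taipei City, Taiwan", it identifies "Jay Chou (Person)" and "New Taipei City, Taiwan (Location)".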

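Relation Extraction (RE): Identify the relationships between entities. For example, from the sentence above it extracts the relation (Jay Chou) - [birthplace] -> (New Taipei City, Taiwan).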

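Attribute Extraction: Extract the value of an entity's attribute from text. For example, from "The lyrics of 'Blue and White Porcelain' were written by Vincent Fang", it extracts the attribute {lyricist: "Vincent Fang"} for the entity "Blue and White Porcelain".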

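Event Extraction: Extract more complex event structures, including the event trigger, the participants, and the roles they play. For example, from "Pelosi visited Taiwan in 2022", it extracts a "visit" event whose participants are "Pelosi (Person)" and "Taiwan (Location)".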
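
As a rough illustration of NER plus relation extraction, the sketch below uses a single hand-written pattern for "X was born in Y". This is only a toy: production systems generally rely on trained models rather than one regular expression, and the BORN_IN pattern and extract_birthplace function are names invented for this example.

```python
import re

# One hand-written pattern (assumption for illustration only).
BORN_IN = re.compile(
    r"(?P<person>[A-Z][\w ]+?) was born in (?P<place>[A-Z][\w ,]+)"
)

def extract_birthplace(sentence):
    """Return typed entities and a birthplace triple, or None if no match."""
    m = BORN_IN.search(sentence)
    if m is None:
        return None
    person = m.group("person").strip()
    place = m.group("place").strip()
    return {
        "entities": [(person, "Person"), (place, "Location")],
        "relation": (person, "birthplace", place),
    }

result = extract_birthplace("Jay Chou was born in New Taipei City, Taiwan.")
```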

Knowledge Fusion

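Knowledge extracted from different sources may be conflicting, duplicated, or ambiguous. This step cleans and integrates it to raise the quality of the knowledge graph.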

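Entity Linking: Link an entity mentioned in text (such as "Jay") to the single correct entity in the knowledge base (the "Jay Chou" whose ID is person_007). This resolves both the case where one name refers to different entities (ambiguous reference) and the case where different names refer to the same entity (aliases).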

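Knowledge Merging: When several data sources describe the same entity, their information needs to be merged. For example, if source A says Jay Chou's occupation is "singer" and source B says "director", the merged occupation property should be ["singer", "director"].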

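Conflict Detection: If source A says Jay Chou's birthday is January 18 and source B says January 19, a mechanism is needed to decide which is more trustworthy, or to keep both values and annotate each with its source.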
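
The three fusion steps can be sketched as follows. The alias table, the MULTI_VALUED set, and the function names are assumptions made for this example. The sketch takes the "keep both and annotate the source" route for conflicts rather than trying to judge credibility.

```python
# Hypothetical alias table mapping surface forms to one canonical entity ID.
ALIASES = {"Jay": "person_007", "Jay Chou": "person_007", "周杰伦": "person_007"}

# Properties that may legitimately hold several values at once (assumption).
MULTI_VALUED = {"occupation"}

def link_entity(mention):
    """Entity linking: map a mention to its canonical ID, or None if unknown."""
    return ALIASES.get(mention)

def merge_records(records):
    """records: list of (source, property, value) for ONE entity.

    Returns (merged, conflicts). Multi-valued properties are unioned;
    single-valued properties with disagreeing values are reported as
    conflicts, keeping every value together with the sources that gave it.
    """
    by_prop = {}
    for source, prop, value in records:
        by_prop.setdefault(prop, {}).setdefault(value, []).append(source)
    merged, conflicts = {}, {}
    for prop, by_value in by_prop.items():
        if prop in MULTI_VALUED:
            merged[prop] = list(by_value)
        elif len(by_value) == 1:
            merged[prop] = next(iter(by_value))
        else:
            conflicts[prop] = by_value
    return merged, conflicts

merged, conflicts = merge_records([
    ("A", "occupation", "singer"),
    ("B", "occupation", "director"),
    ("A", "birthday", "01-18"),
    ("B", "birthday", "01-19"),
])
```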

Knowledge Reasoning

Let the knowledge graph draw inferences, discovering new, implicit knowledge from what is already known.

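Logical reasoning: Reason from predefined rules. For example, given the rule "if ?x is the son of ?y and ?y is the son of ?z, then ?x is the grandson of ?z", the graph can infer grandson facts that were never stated explicitly.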

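Embedding-based Reasoning: Learn vector representations of the entities and relationships in the knowledge graph, then predict relationships that probably hold but are missing by computing with those vectors. For example, if vector("Beijing") - vector("China") + vector("France") lands very close to vector("Paris"), the model can predict that Paris is the capital of France.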
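
Both styles of reasoning can be shown in a few lines. In the sketch below the facts, the person names, and the 2-D vectors are hand-picked for illustration. Real embeddings are learned from data and have many more dimensions, so this shows only the arithmetic, not a trained model.

```python
import math

# --- Rule-based reasoning: son_of(x, y) and son_of(y, z) => grandson_of(x, z)
facts = {("Tom", "son_of", "Bob"), ("Bob", "son_of", "Sam")}

def infer_grandsons(facts):
    """Apply the grandson rule once and return the newly derived triples."""
    son_of = [(x, y) for (x, rel, y) in facts if rel == "son_of"]
    return {
        (x, "grandson_of", z)
        for (x, y) in son_of
        for (y2, z) in son_of
        if y == y2
    }

# --- Embedding-based reasoning with hand-picked toy vectors (assumption).
vectors = {
    "Beijing": [1.0, 0.0], "China": [1.0, 1.0],
    "Paris": [3.0, 0.0], "France": [3.0, 1.0],
    "Berlin": [5.0, 0.1],
}

def analogy(a, b, c, vectors):
    """Return the entity nearest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (name for name in vectors if name not in {a, b, c})
    return min(candidates, key=lambda name: math.dist(vectors[name], target))

predicted_capital = analogy("Beijing", "China", "France", vectors)
```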

Question Answering

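Let users ask the knowledge graph questions in natural language and get precise answers. This is the most common application outlet for a knowledge graph.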

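Semantic Parsing: One of the mainstream approaches. It translates the user's natural-language question (such as "Who is Jay Chou's wife?") into a query the graph database can execute, such as MATCH (p1:Person {name:"周杰伦"})-[:妻子是]->(p2:Person) RETURN p2.name (here 周杰伦 is Jay Chou and the relationship type 妻子是 means "wife is").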

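Information retrieval and reading comprehension: First retrieve the relevant subgraphs or text passages by keyword search, then use a deep learning model (similar to reading comprehension) to locate and extract the final answer from those passages.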
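
The semantic-parsing route can be sketched with one question template. This is a deliberately small illustration: the regular expression, the RELATION_BY_KEYWORD table, and question_to_cypher are invented for the example, and real parsers cover far more phrasings. The name is passed as a query parameter ($name) instead of being spliced into the query text.

```python
import re

# Hypothetical mapping from a question keyword to a relationship type.
RELATION_BY_KEYWORD = {"wife": "妻子是"}

# One supported question shape: "Who is <name>'s <keyword>?"
QUESTION = re.compile(r"^Who is (?P<name>.+?)'s (?P<keyword>\w+)\?$")

def question_to_cypher(question):
    """Return (cypher_query, parameters), or None if the question is unsupported."""
    m = QUESTION.match(question.strip())
    if m is None or m.group("keyword") not in RELATION_BY_KEYWORD:
        return None
    rel = RELATION_BY_KEYWORD[m.group("keyword")]
    query = (
        f"MATCH (p1:Person {{name: $name}})-[:{rel}]->(p2:Person) "
        "RETURN p2.name"
    )
    return query, {"name": m.group("name")}

parsed = question_to_cypher("Who is 周杰伦's wife?")
```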

Knowledge Analysis

Run data mining and analysis on the finished knowledge graph to discover large-scale regularities and patterns.

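Graph computation and community detection: Analyze the network structure to find tightly connected groups of entities (communities), for example spotting a cluster of companies backed by the same capital group in an investment network.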

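Path finding: Find all possible paths between two entities. This is used in financial risk control (for example, uncovering hidden guarantee chains) and in recommender systems (for example, uncovering latent connections between you and a product).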

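Centrality analysis: Compute the most important nodes in the network, for example identifying key opinion leaders (KOLs) in a social network.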
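
The three analyses can be tried on a toy undirected graph in plain Python. The edge list is made up, and connected components are used here as the crudest possible stand-in for community detection. Dedicated algorithms such as modularity-based methods find finer-grained groups.

```python
# Toy undirected graph (hypothetical data).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def simple_paths(adj, start, goal, path=None):
    """Depth-first enumeration of all paths that never revisit a node."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in sorted(adj[start]):
        if nxt not in path:
            found.extend(simple_paths(adj, nxt, goal, path))
    return found

def connected_components(adj):
    """Group nodes that can reach each other (a coarse proxy for communities)."""
    seen, components = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            cur = stack.pop()
            if cur not in comp:
                comp.add(cur)
                stack.extend(adj[cur] - comp)
        seen |= comp
        components.append(comp)
    return components

adj = build_adjacency(edges)
centrality = degree_centrality(adj)
most_central = max(centrality, key=centrality.get)
```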

graphiti

https://github.com/getzep/graphiti

Core Components & Their Relationships

1. Episodic (Documents/Text Input)

Definition: Raw text documents, articles, conversations, or any textual content

Role: The input source that feeds the entire system

Example: A news article, chat conversation, PDF document

2. Node (Entities)

Definition: Structured entities extracted from episodic content

Types: People, places, concepts, organizations, etc.

Role: The building blocks of knowledge

Example: "Kamala Harris", "California", "Attorney General"

3. Edge (Relationships)

Definition: Connections between different components

Two main types:

MENTIONS: Links episodic content to entities it mentions
RELATES_TO: Links entities to other entities with semantic relationships

4. Community (Clusters)

Definition: Groups of closely related nodes that form semantic clusters

Role: Higher-level organizational structure for knowledge discovery

Example: A community might contain all nodes related to "US Politics"
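
The four components can be pictured as a small data model. The dataclasses below are a hypothetical sketch written for this post to mirror the descriptions above. They are not Graphiti's own classes, and the field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:          # raw input text
    id: str
    text: str

@dataclass
class Entity:           # a node extracted from one or more episodes
    id: str
    name: str

@dataclass
class Edge:             # kind is "MENTIONS" or "RELATES_TO"
    kind: str
    source_id: str
    target_id: str
    fact: str = ""

@dataclass
class Community:        # a named cluster of related entities
    name: str
    member_ids: set = field(default_factory=set)

episode = Episode("ep1", "Kamala Harris served as Attorney General of California.")
harris = Entity("n1", "Kamala Harris")
california = Entity("n2", "California")
attorney_general = Entity("n3", "Attorney General")

edges = [
    # MENTIONS: episode -> every entity it mentions
    Edge("MENTIONS", episode.id, harris.id),
    Edge("MENTIONS", episode.id, california.id),
    Edge("MENTIONS", episode.id, attorney_general.id),
    # RELATES_TO: entity -> entity, with the relationship stated as a fact
    Edge("RELATES_TO", harris.id, california.id, "served as Attorney General of"),
]

us_politics = Community("US Politics", {harris.id, california.id, attorney_general.id})
```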

Visual Relationship Map

Detailed Flow Example

Input: News Article (Episodic)

Step 1: Entity Extraction → Nodes

Step 2: Create MENTIONS Edges

Step 3: Create RELATES_TO Edges

Step 4: Form Communities
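
The four steps can be walked through with a toy pipeline. Everything here is an illustrative assumption rather than Graphiti's implementation: a fixed entity list with substring matching stands in for model-based extraction, candidate relations are passed in by hand, and communities are formed simply by grouping entities that are connected through RELATES_TO edges.

```python
# Toy gazetteer standing in for a real entity extractor (assumption).
KNOWN_ENTITIES = ["Kamala Harris", "California", "Attorney General"]

def form_communities(nodes, relates):
    """Step 4 (toy version): group nodes linked by RELATES_TO edges."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for source, _kind, target, _fact in relates:
        parent[find(source)] = find(target)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

def ingest(episode_id, text, candidate_relations):
    # Step 1: entity extraction -> nodes
    nodes = [name for name in KNOWN_ENTITIES if name in text]
    # Step 2: MENTIONS edges from the episode to each node
    mentions = [(episode_id, "MENTIONS", name) for name in nodes]
    # Step 3: RELATES_TO edges, kept only when both endpoints were extracted
    relates = [
        (s, "RELATES_TO", t, fact)
        for s, t, fact in candidate_relations
        if s in nodes and t in nodes
    ]
    # Step 4: communities
    communities = form_communities(nodes, relates)
    return {"nodes": nodes, "mentions": mentions,
            "relates_to": relates, "communities": communities}

graph = ingest(
    "ep1",
    "Kamala Harris served as Attorney General of California.",
    [
        ("Kamala Harris", "California", "held statewide office in"),
        ("Kamala Harris", "Attorney General", "served as"),
        ("Kamala Harris", "Texas", "not mentioned in this episode"),
    ],
)
```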

The Complete Knowledge Graph Structure

Key Relationships Summary

Component	Primary Function	Connects To	Purpose
Episodic	Input/Source	→ Nodes (via MENTIONS)	Raw knowledge input
Node	Knowledge Units	← Episodes, ↔ Other Nodes	Structured entities
Edge	Relationships	Episodes ↔ Nodes, Nodes ↔ Nodes	Knowledge connections
Community	Clustering	Groups of related Nodes	Knowledge organization

Search & Retrieval Flow

When you search the system:

Document Search: Find episodes mentioning specific entities

Entity Search: Find specific nodes and their properties

Relationship Search: Find edges connecting entities

Community Search: Find clusters of related knowledge

Combined Results: Merge all types for comprehensive answers
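
A minimal sketch of this layered search, assuming a graph stored as plain Python collections. Case-insensitive substring matching stands in for whatever ranking the real system uses, and the search function and graph layout are hypothetical.

```python
# Toy graph in the four layers described above (hypothetical data layout).
graph = {
    "episodes": ["Kamala Harris served as Attorney General of California."],
    "nodes": ["Kamala Harris", "California", "Attorney General"],
    "edges": [("Kamala Harris", "RELATES_TO", "California")],
    "communities": {
        "US Politics": ["Kamala Harris", "California", "Attorney General"],
    },
}

def search(graph, query):
    """Search every layer and return the merged results keyed by layer."""
    q = query.lower()
    return {
        "episodes": [e for e in graph["episodes"] if q in e.lower()],
        "nodes": [n for n in graph["nodes"] if q in n.lower()],
        "edges": [e for e in graph["edges"] if q in " ".join(e).lower()],
        "communities": [
            name for name, members in graph["communities"].items()
            if any(q in m.lower() for m in members)
        ],
    }

hits = search(graph, "california")
```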

Real-World Analogy

Think of it like a smart library system:

Episodic = Books/Documents on shelves

Nodes = Important topics/people mentioned in books

Edges = Cross-references and citations between topics

Communities = Subject categories that group related topics

Search = Intelligent librarian that can find information across all these layers

This multi-layered structure allows Graphiti to provide both document retrieval (like traditional search) and knowledge reasoning (like asking a domain expert) in a unified system.