属性图索引

使用属性图谱索引

属性图谱是带标签（即元数据）节点属性（即实体类别、文本标签等）的知识集合，通过关系连接在一起形成结构化路径。

在 LlamaIndex 中，PropertyGraphIndex 构建了下面两个的属性图谱编排能力：

构建图谱
查询图谱

用法

只需导入属性图谱类，便可找到使用这个类的基本用法：

Python

from llama_index.core import PropertyGraphIndex

# 创建知识图谱索引
index = PropertyGraphIndex.from_documents(


# 使用这个索引
retriever = index.as_retriever(
    include_text=True,  # include source chunk with matching paths
    similarity_top_k=2,  # top k for vector kg node retrieval
)
nodes = retriever.retrieve("Test")

query_engine = index.as_query_engine(
    include_text=True,  # include source chunk with matching paths
    similarity_top_k=2,  # top k for vector kg node retrieval
)
response = query_engine.query("Test")

# save and load
index.storage_context.persist(persist_dir="./storage")

from llama_index.core import StorageContext, load_index_from_storage

index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage")
)

# loading from existing graph store (and optional vector store)
# load from existing graph/vector store
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store, vector_store=vector_store, ...
)

构造属性图谱

LlamaIndex 中的属性图谱的构造是通过对每个chunk执行一系列 kg_extractors 操作，并将实体和关系作为元数据附加到每个 llama-index 节点来实现。您可以在此处使用任意数量的属性图，它们都会得到应用。

如果您已经在摄取管道中使用过转换或元数据提取器，那么这将非常熟悉（并且这些 kg_extractors 与摄取管道兼容）！

使用适当的参数 kwarg 设置提取器：

Python

index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[extractor1, extractor2, ...],
)

# insert additional documents / nodes
index.insert(document)
index.insert_nodes(nodes)

若未提供知识图谱提取器，则默认值为 SimpleLLMPathExtractor 和 ImplicitPathExtractor 这两个知识图谱提取器一起使用。

接下来将对 kg_extractors 进行详细说明。

SimpleLLMPathExtractor (默认使用)

使用 LLM 提取简短语句来提示和解析格式为 (entity1, relation, entity2) 的单跳路径

Python

from llama_index.core.indices.property_graph import SimpleLLMPathExtractor

kg_extractor = SimpleLLMPathExtractor(
    llm=llm,
    max_paths_per_chunk=10,
    num_workers=4,
    show_progress=False,
)

如果您愿意，您还可以自定义提示 prompt 和用于解析路径的函数。

这是一个简单的例子：

Python

prompt = (
    "Some text is provided below. Given the text, extract up to "
    "{max_paths_per_chunk} "
    "knowledge triples in the form of `subject,predicate,object` on each line. Avoid stopwords.\n"
)


def parse_fn(response_str: str) -> List[Tuple[str, str, str]]:
    lines = response_str.split("\n")
    triples = [line.split(",") for line in lines]
    return triples


kg_extractor = SimpleLLMPathExtractor(
    llm=llm,
    extract_prompt=prompt,
    parse_fn=parse_fn,
)

ImplicitPathExtractor （默认使用）

使用每个 llama-index 节点对象上的属性提取路径 node.relationships。

该提取器不需要 LLM 或嵌入模型来运行，因为它仅仅解析 llama-index 节点对象上已经存在的属性。

Python

from llama_index.core.indices.property_graph import ImplicitPathExtractor

kg_extractor = ImplicitPathExtractor()

DynamicLLMPathExtractor

将根据可选的允许实体类型和关系类型列表提取路径（包括实体类型！）。如果没有提供，则 LLM 将根据其认为合适的方式分配类型。如果提供了，它将有助于指导 LLM，但不会强制执行这些类型。

Python

from llama_index.core.indices.property_graph import DynamicLLMPathExtractor

kg_extractor = DynamicLLMPathExtractor(
    llm=llm,
    max_triplets_per_chunk=20,
    num_workers=4,
    allowed_entity_types=["POLITICIAN", "POLITICAL_PARTY"],
    allowed_relation_types=["PRESIDENT_OF", "MEMBER_OF"],
)

SchemaLLMPathExtractor

按照允许的实体、关系以及哪些实体可以连接到哪些关系的严格模式提取路径。

使用 pydantic、LLM 的结构化输出和一些巧妙的验证，我们可以动态指定模式并验证每个路径的提取。

Python

from typing import Literal
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# recommended uppercase, underscore separated
entities = Literal["PERSON", "PLACE", "THING"]
relations = Literal["PART_OF", "HAS", "IS_A"]
schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    strict=True,  # if false, will allow triplets outside of the schema
    num_workers=4,
    max_triplets_per_chunk=10,
)

这个提取器是高度可定制的，并且具有可定制的选项 - 模式的各个方面（如上所示） - extract_prompt 与 strict=False，strict=True 允许模式之外的三元组或不允许 -kg_schema_cls 如果您是 pydantic 专业人士并想创建自己的带有自定义验证的 pydantic 类，则可以传递您自己的自定义。

检索和查询

带标签的属性图谱可以通过多种方式查询以检索节点和路径。在 LlamaIndex 中，我们可以同时组合多种节点检索方法！

Python

# create a retriever
retriever = index.as_retriever(sub_retrievers=[retriever1, retriever2, ...])

# create a query engine
query_engine = index.as_query_engine(
    sub_retrievers=[retriever1, retriever2, ...]
)

如果没有提供子检索器，则默认值为 LLMSynonymRetriever 和 VectorContextRetriever（如果启用了嵌入）。

所有检索器目前包括： * LLMSynonymRetriever 根据 LLM 生成的关键字/同义词进行检索 * VectorContextRetriever 根据嵌入的图谱节点进行检索 * TextToCypherRetriever 要求 LLM 根据属性图谱的模式生成密码 * CypherTemplateRetriever 使用由 LLM 推断出的带有参数的密码模板 * CustomPGRetriever 易于子类化和实现自定义检索逻辑

通常，您可以定义一个或多个这样的子检索器并将它们传递给PGRetriever：

Python

from llama_index.core.indices.property_graph import (
    PGRetriever,
    VectorContextRetriever,
    LLMSynonymRetriever,
)

sub_retrievers = [
    VectorContextRetriever(index.property_graph_store, ...),
    LLMSynonymRetriever(index.property_graph_store, ...),
]

retriever = PGRetriever(sub_retrievers=sub_retrievers)

nodes = retriever.retrieve("<query>")

请继续阅读下文来了解有关所有猎犬的更多详细信息。

`LLMSynonymRetriever` (默认使用)

接受 LLMSynonymRetriever 查询，并尝试生成关键字和同义词来检索节点（以及连接到这些节点的路径）。

明确声明检索器允许您自定义多个选项。以下是默认设置：

Python

from llama_index.core.indices.property_graph import LLMSynonymRetriever

prompt = (
    "Given some initial query, generate synonyms or related keywords up to {max_keywords} in total, "
    "considering possible cases of capitalization, pluralization, common expressions, etc.\n"
    "Provide all synonyms/keywords separated by '^' symbols: 'keyword1^keyword2^...'\n"
    "Note, result should be in one-line, separated by '^' symbols."
    "----\n"
    "QUERY: {query_str}\n"
    "----\n"
    "KEYWORDS: "
)


def parse_fn(self, output: str) -> list[str]:
    matches = output.strip().split("^")

    # capitalize to normalize with ingestion
    return [x.strip().capitalize() for x in matches if x.strip()]


synonym_retriever = LLMSynonymRetriever(
    index.property_graph_store,
    llm=llm,
    # include source chunk text with retrieved paths
    include_text=False,
    synonym_prompt=prompt,
    output_parsing_fn=parse_fn,
    max_keywords=10,
    # the depth of relations to follow after node retrieval
    path_depth=1,
)

retriever = index.as_retriever(sub_retrievers=[synonym_retriever])

`VectorContextRetriever` (如果支持的话则默认使用)

根据向量相似性检索 VectorContextRetriever 节点，然后获取连接到这些节点的路径。

如果您的图存储支持向量，那么您只需要管理该图存储即可。否则，除了图存储之外，您还需要提供一个向量存储（默认情况下，使用内存SimpleVectorStore）。

Python

from llama_index.core.indices.property_graph import VectorContextRetriever

vector_retriever = VectorContextRetriever(
    index.property_graph_store,
    # only needed when the graph store doesn't support vector queries
    # vector_store=index.vector_store,
    embed_model=embed_model,
    # include source chunk text with retrieved paths
    include_text=False,
    # the number of nodes to fetch
    similarity_top_k=2,
    # the depth of relations to follow after node retrieval
    path_depth=1,
    # can provide any other kwargs for the VectorStoreQuery class
    ...,
)

retriever = index.as_retriever(sub_retrievers=[vector_retriever])

`TextToCypherRetriever`

使用 TextToCypherRetriever 图形存储模式、您的查询和文本到密码的提示模板来生成和执行密码查询。

注意：由于它 SimplePropertyGraphStore 实际上不是一个图形数据库，因此它不支持密码查询。

您可以使用检查架构 index.property_graph_store.get_schema_str()。

Python

from llama_index.core.indices.property_graph import TextToCypherRetriever

DEFAULT_RESPONSE_TEMPLATE = (
    "Generated Cypher query:\n{query}\n\n" "Cypher Response:\n{response}"
)
DEFAULT_ALLOWED_FIELDS = ["text", "label", "type"]

DEFAULT_TEXT_TO_CYPHER_TEMPLATE = (
    index.property_graph_store.text_to_cypher_template,
)


cypher_retriever = TextToCypherRetriever(
    index.property_graph_store,
    # customize the LLM, defaults to Settings.llm
    llm=llm,
    # customize the text-to-cypher template.
    # Requires `schema` and `question` template args
    text_to_cypher_template=DEFAULT_TEXT_TO_CYPHER_TEMPLATE,
    # customize how the cypher result is inserted into
    # a text node. Requires `query` and `response` template args
    response_template=DEFAULT_RESPONSE_TEMPLATE,
    # an optional callable that can clean/verify generated cypher
    cypher_validator=None,
    # allowed fields in the resulting
    allowed_output_field=DEFAULT_ALLOWED_FIELDS,
)

注意：执行任意密码有风险。请确保采取必要措施（只读角色、沙盒环境等）以确保在生产环境中安全使用。

CypherTemplateRetriever

这是的更受约束的版本TextToCypherRetriever。我们可以提供一个密码模板，让 LLM 填写空白，而不是让 LLM 自由生成任何密码语句。

为了说明这是如何工作的，这里有一个小例子：

Python

# NOTE: current v1 is needed
from pydantic import BaseModel, Field
from llama_index.core.indices.property_graph import CypherTemplateRetriever

# write a query with template params
cypher_query = """
MATCH (c:Chunk)-[:MENTIONS]->(o)
WHERE o.name IN $names
RETURN c.text, o.name, o.label;
"""


# create a pydantic class to represent the params for our query
# the class fields are directly used as params for running the cypher query
class TemplateParams(BaseModel):
    """Template params for a cypher query."""

    names: list[str] = Field(
        description="A list of entity names or keywords to use for lookup in a knowledge graph."
    )


template_retriever = CypherTemplateRetriever(
    index.property_graph_store, TemplateParams, cypher_query
)

存储

目前，支持的属性图的图形存储包括：

内存中	本机嵌入支持	异步	基于服务器还是基于磁盘？
简单属性图存储	✅	❌	❌
Neo4jPropertyGraphStore	❌	✅	❌
NebulaPropertyGraphStore	❌	❌	❌
TiDBPropertyGraphStore	❌	✅	❌
FalkorDBPropertyGraphStore	❌	✅	❌

保存到磁盘/从磁盘保存

默认属性图存储 SimplePropertyGraphStore 将所有内容存储在内存中并持久保存并从磁盘加载。

以下是使用默认图形存储保存/加载索引的示例：

Python

from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.indices import PropertyGraphIndex

# create
index = PropertyGraphIndex.from_documents(documents)

# save
index.storage_context.persist("./storage")

# load
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

使用集成保存和加载

集成通常会自动保存。有些图形存储会支持向量，有些则可能不支持。您也可以始终将图形存储与外部向量数据库相结合。

此示例显示如何使用 Neo4j 和 Qdrant 保存/加载属性图索引。

注意：如果未传入 qdrant，neo4j 将自行存储和使用嵌入。此示例说明了除此之外的灵活性。

Bash

pip install llama-index-graph-stores-neo4j llama-index-vector-stores-qdrant

Python

from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.indices import PropertyGraphIndex
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient, AsyncQdrantClient

vector_store = QdrantVectorStore(
    "graph_collection",
    client=QdrantClient(...),
    aclient=AsyncQdrantClient(...),
)

graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="<password>",
    url="bolt://localhost:7687",
)

# creates an index
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    # optional, neo4j also supports vectors directly
    vector_store=vector_store,
    embed_kg_nodes=True,
)

# load from existing graph/vector store
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    # optional, neo4j also supports vectors directly
    vector_store=vector_store,
    embed_kg_nodes=True,
)

直接使用 `Property Graph Store`

属性图的基本存储类是 PropertyGraphStore。这些属性图存储使用不同类型的对象构建LabeledNode，并使用对象连接Relation。

我们可以自己创造这些，也可以自己插入！

Python

from llama_index.core.graph_stores import (
    SimplePropertyGraphStore,
    EntityNode,
    Relation,
)
from llama_index.core.schema import TextNode

graph_store = SimplePropertyGraphStore()

entities = [
    EntityNode(name="llama", label="ANIMAL", properties={"key": "val"}),
    EntityNode(name="index", label="THING", properties={"key": "val"}),
]

relations = [
    Relation(
        label="HAS",
        source_id=entities[0].id,
        target_id=entities[1].id,
        properties={},
    )
]

graph_store.upsert_nodes(entities)
graph_store.upsert_relations(relations)

# optionally, we can also insert text chunks
source_chunk = TextNode(id_="source", text="My llama has an index.")

# create relation for each of our entities
source_relations = [
    Relation(
        label="HAS_SOURCE",
        source_id=entities[0].id,
        target_id="source",
    ),
    Relation(
        label="HAS_SOURCE",
        source_id=entities[1].id,
        target_id="source",
    ),
]
graph_store.upsert_llama_nodes([source_chunk])
graph_store.upsert_relations(source_relations)

图形存储上的其他有用方法包括： - graph_store.get(ids=[]) 根据 id 获取节点 - graph_store.get(properties={"key": "val"}) 根据匹配属性获取节点 - graph_store.get_rel_map([entity_node], depth=2) 获取达到一定深度的三元组 - graph_store.get_llama_nodes(['id1']) 获取原始文本节点 - graph_store.delete(ids=['id1']) 根据 id 删除 - graph_store.delete(properties={"key": "val"}) 根据属性删除 - graph_store.structured_query("<cypher query>") 运行密码查询（假设图形存储支持它）

此外，所有这些都存在用于异步支持的版本（即aget，adelete等等）。

高级定制

与 LlamaIndex 中的所有组件一样，您可以对模块进行子类化并自定义其工作方式，以便完全按照您的需要工作，或者尝试新的想法并研究新的模块！

子类提取器

LlamaIndex 中的图形提取器是该类的子TransformComponent类。如果您以前使用过摄取管道，那么您会很熟悉它，因为它是同一个类。

提取器的要求是将图形数据插入到节点的元数据中，然后稍后由索引进行处理。

以下是通过子类化创建自定义提取器的一个小例子：

Python

from llama_index.core.graph_store.types import (
    EntityNode,
    Relation,
    KG_NODES_KEY,
    KG_RELATIONS_KEY,
)
from llama_index.core.schema import BaseNode, TransformComponent


class MyGraphExtractor(TransformComponent):
    # the init is optional
    # def __init__(self, ...):
    #     ...

    def __call__(
        self, llama_nodes: list[BaseNode], **kwargs
    ) -> list[BaseNode]:
        for llama_node in llama_nodes:
            # be sure to not overwrite existing entities/relations

            existing_nodes = llama_node.metadata.pop(KG_NODES_KEY, [])
            existing_relations = llama_node.metadata.pop(KG_RELATIONS_KEY, [])

            existing_nodes.append(
                EntityNode(
                    name="llama", label="ANIMAL", properties={"key": "val"}
                )
            )
            existing_nodes.append(
                EntityNode(
                    name="index", label="THING", properties={"key": "val"}
                )
            )

            existing_relations.append(
                Relation(
                    label="HAS",
                    source_id="llama",
                    target_id="index",
                    properties={},
                )
            )

            # add back to the metadata

            llama_node.metadata[KG_NODES_KEY] = existing_nodes
            llama_node.metadata[KG_RELATIONS_KEY] = existing_relations

        return llama_nodes

    # optional async method
    # async def acall(self, llama_nodes: list[BaseNode], **kwargs) -> list[BaseNode]:
    #    ...

子检索器

检索器比提取器稍微复杂一些，并且有自己的特殊类，以帮助更容易地进行子类化。

检索的返回类型非常灵活。它可以是 - 字符串 - 一个 TextNode - 一个 NodeWithScore 上述之一的列表

以下是创建自定义检索器的子类化的一个小例子：

Python

from llama_index.core.indices.property_graph import (
    CustomPGRetriever,
    CUSTOM_RETRIEVE_TYPE,
)


class MyCustomRetriever(CustomPGRetriever):
    def init(self, my_option_1: bool = False, **kwargs) -> None:
        """Uses any kwargs passed in from class constructor."""
        self.my_option_1 = my_option_1
        # optionally do something with self.graph_store

    def custom_retrieve(self, query_str: str) -> CUSTOM_RETRIEVE_TYPE:
        # some some operation with self.graph_store
        return "result"

    # optional async method
    # async def acustom_retrieve(self, query_str: str) -> str:
    #     ...


custom_retriever = MyCustomRetriever(graph_store, my_option_1=True)

retriever = index.as_retriever(sub_retrievers=[custom_retriever])

对于更复杂的定制和用例，建议检查源代码并直接进行子类化BasePGRetriever。

示例

下面，您可以找到一些示例笔记本，展示了PropertyGraphIndex

基本用法
使用 Neo4j
使用 Nebula
Neo4j 和本地模型的高级用法
使用属性图存储
创建自定义图形检索器
比较 KG 萃取器

属性图索引

使用属性图谱索引

用法

构造属性图谱

SimpleLLMPathExtractor (默认使用)

ImplicitPathExtractor （默认使用）

DynamicLLMPathExtractor

SchemaLLMPathExtractor

检索和查询

LLMSynonymRetriever (默认使用)

VectorContextRetriever (如果支持的话则默认使用)

TextToCypherRetriever

CypherTemplateRetriever

存储

保存到磁盘/从磁盘保存

使用集成保存和加载

直接使用 Property Graph Store

高级定制

子类提取器

子检索器

示例

`LLMSynonymRetriever` (默认使用)

`VectorContextRetriever` (如果支持的话则默认使用)

`TextToCypherRetriever`

直接使用 `Property Graph Store`