jina.types.document.graph module

class jina.types.document.graph.GraphDocument(document=None, copy=False, force_undirected=False, **kwargs)[source]

Bases: jina.types.document.Document

GraphDocument is a data type built on top of Jina's primitive data type Document.

It adds functionality that lets you work with a Document as a directed graph where all its chunks are the nodes in the graph.

It exposes functionality to access and manipulate graph-related information from the DocumentProto, such as the adjacency and the edge features.

Warning

  • It assumes that every chunk of a document is a node of a graph.
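The chunks-as-nodes model described above can be pictured with a small stand-in. This is a minimal sketch, not Jina's implementation: `ToyGraph` is a hypothetical class, nodes are plain string ids instead of Document objects, and storing each edge as a pair of offsets into the chunk list is an assumption made for illustration. Registering missing endpoints inside `add_edge` is likewise a choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyGraph:
    """Hypothetical stand-in: a document whose chunks act as graph nodes."""

    # Node ids in insertion order; plays the role of the document's chunks.
    chunks: List[str] = field(default_factory=list)
    # Directed edges as (source offset, target offset) into `chunks`.
    edges: List[Tuple[int, int]] = field(default_factory=list)

    def add_node(self, node_id: str) -> None:
        # A node is stored once, however often it is added.
        if node_id not in self.chunks:
            self.chunks.append(node_id)

    def add_edge(self, source: str, target: str) -> None:
        # Assumption of this sketch: missing endpoints are registered first.
        self.add_node(source)
        self.add_node(target)
        self.edges.append((self.chunks.index(source), self.chunks.index(target)))


toy = ToyGraph()
toy.add_edge('a', 'b')
toy.add_edge('b', 'c')
```

Here `toy.chunks` holds the three nodes and `toy.edges` holds two offset pairs, which is the same information an adjacency structure over the chunks carries.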

Parameters
  • document (Optional[~DocumentSourceType]) – the document to construct from. If bytes is given, a DocumentProto is deserialized from it; if a dict is given, a DocumentProto is parsed from it; if a str is given, it is treated as a JSON string and a DocumentProto is parsed from it. A DocumentProto can also be given directly, in which case the copy parameter determines whether a view or a copy is built from it.

  • copy (bool) – when document is given as a DocumentProto object, whether to build a deep copy from it or a view (i.e. a weak reference) of it.

  • force_undirected (bool) – indicates whether the proto object represented by this GraphDocument must be updated to set its undirected property to True. Otherwise, the value provided by the document source, or the default value, is maintained. This parameter is called force_undirected rather than undirected so that, if a valid DocumentSourceType is provided with its undirected flag set, the flag is respected and not silently overridden by a misleading default. This is especially needed when a GraphDocument is distributed to Executors.

  • kwargs – further key value arguments

  • field_resolver – a map from field names defined in document (JSON, dict) to the field names defined in Protobuf. This is only used when the given document is a JSON string or a Python dict.

  • kwargs – other parameters to be set _after_ the document is constructed
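The precedence that the force_undirected description implies can be written as a small function. This is a sketch of the documented behaviour only: `resolve_undirected` is a hypothetical helper, and using `None` to mean "the source carries no flag" is an assumption of the sketch.

```python
from typing import Optional


def resolve_undirected(
    source_flag: Optional[bool],
    force_undirected: bool,
    default: bool = False,
) -> bool:
    """Pick the undirected value for a newly constructed graph."""
    if force_undirected:
        # Forcing always wins and sets the flag to True.
        return True
    if source_flag is not None:
        # A flag that arrived with the document source is respected.
        return source_flag
    # Nothing was specified anywhere, so fall back to the default.
    return default
```

Note that the parameter can only force the flag to True; leaving it at False never turns an undirected source into a directed one.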

Note

When document is a JSON string or a Python dictionary object, the constructor only maps the values of known fields defined in Protobuf; all unknown fields are mapped to document.tags. For example,

from jina import Document

d = Document({'id': '123', 'hello': 'world', 'tags': {'good': 'bye'}})

assert d.id == '123'  # known Protobuf field, kept as-is
assert d.tags['hello'] == 'world'  # unknown field, moved into tags
assert d.tags['good'] == 'bye'  # existing tag, preserved

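
The field mapping described in the note can also be mimicked without Jina installed. The following is a minimal sketch: `split_fields` is a hypothetical helper, and `KNOWN_FIELDS` is an illustrative subset chosen for the example, not the real Protobuf schema.

```python
from typing import Any, Dict, Set

# Illustrative subset only; the real schema defines many more fields.
KNOWN_FIELDS: Set[str] = {'id', 'text', 'tags'}


def split_fields(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep known fields at the top level and move unknown ones into tags."""
    tags = dict(raw.get('tags', {}))  # copy so the input is not mutated
    result: Dict[str, Any] = {}
    for key, value in raw.items():
        if key == 'tags':
            continue  # already handled above
        if key in KNOWN_FIELDS:
            result[key] = value
        else:
            tags[key] = value  # unknown field ends up in tags
    result['tags'] = tags
    return result


parsed = split_fields({'id': '123', 'hello': 'world', 'tags': {'good': 'bye'}})
```
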
add_node(node)[source]

Add a node to the graph

Parameters

node (Document) – the node to be added to the graph

remove_node(node)[source]

Remove a node from the graph along with the edges that may contain it

Parameters

node (Document) – the node to be removed from the graph
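
Removing a node also removes every edge that touches it, whether the node is the start or the end of that edge. A minimal sketch of that rule, using plain string ids and id pairs instead of Document objects (the function name and data layout here are illustrative, not Jina's):

```python
from typing import List, Tuple

Edge = Tuple[str, str]


def remove_node_with_edges(
    nodes: List[str], edges: List[Edge], node: str
) -> Tuple[List[str], List[Edge]]:
    """Return new node and edge lists with `node` and its edges dropped."""
    kept_nodes = [n for n in nodes if n != node]
    # An edge is dropped if the removed node is either of its endpoints.
    kept_edges = [(s, t) for s, t in edges if node not in (s, t)]
    return kept_nodes, kept_edges


nodes_after, edges_after = remove_node_with_edges(
    ['a', 'b', 'c'], [('a', 'b'), ('b', 'c'), ('a', 'c')], 'b'
)
```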

add_edge(doc1, doc2, features=None)[source]

Add an edge to the graph connecting doc1 with doc2

Parameters
  • doc1 (Document) – the starting node for this edge

  • doc2 (Document) – the ending node for this edge

  • features (Optional[Dict]) – optional dictionary of features to attach to the newly created edge

remove_edge(doc1, doc2)[source]

Remove the edge connecting doc1 with doc2 from the graph

Parameters
  • doc1 (Document) – the starting node for this edge

  • doc2 (Document) – the ending node for this edge

property edge_features

The dictionary of edge features, indexed by edge_id in the edge list

Return type

StructView

property adjacency

The adjacency list for this graph.

Return type

SparseNdArray

property undirected

The undirected flag of this graph.

Return type

bool

property num_nodes

The number of nodes in the graph

Return type

int

property num_edges

The number of edges in the graph

Return type

int

get_out_degree(doc)[source]

The out-degree of the doc node, i.e. the number of edges that start at it

Parameters

doc (Document) – the document node from which to extract the outdegree.

Return type

int

get_in_degree(doc)[source]

The in-degree of the doc node, i.e. the number of edges that end at it

Parameters

doc (Document) – the document node from which to extract the indegree.

Return type

int
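
Out-degree and in-degree are simple counts over the edge list. A minimal sketch with a hand-written list of (source, target) pairs, which stands in for the graph's adjacency:

```python
from collections import Counter

# Directed edges as (source, target) pairs; node ids are plain strings here.
edges = [('a', 'b'), ('a', 'c'), ('b', 'c')]

# Out-degree counts how often a node appears as a source,
# in-degree how often it appears as a target.
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)
```

A `Counter` returns 0 for nodes it has never seen, so `in_degree['a']` is 0 without special handling.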

property nodes

The nodes list for this graph

Return type

ChunkArray

get_outgoing_nodes(doc)[source]

Get all the nodes reached by the outgoing edges of doc

Parameters

doc (Document) – the document node from which to extract the outgoing nodes.

Return type

Optional[ChunkArray]

get_incoming_nodes(doc)[source]

Get all the nodes that have an edge pointing to doc

Parameters

doc (Document) – the document node from which to extract the incoming nodes.

Return type

Optional[ChunkArray]
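
Outgoing and incoming nodes are the two directions of the same lookup. A minimal sketch, assuming edges are available as (source, target) id pairs; returning `None` for a node without neighbours mirrors the Optional return type above, but is an assumption about behaviour rather than something this page states.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_neighbor_maps(
    edges: List[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Index the edge list by source (outgoing) and by target (incoming)."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for source, target in edges:
        outgoing[source].append(target)
        incoming[target].append(source)
    # Plain dicts so that later lookups do not create empty entries.
    return dict(outgoing), dict(incoming)


def neighbors(index: Dict[str, List[str]], node: str) -> Optional[List[str]]:
    """Return the neighbours of `node`, or None if it has none."""
    return index.get(node)


outgoing, incoming = build_neighbor_maps([('a', 'b'), ('a', 'c'), ('b', 'c')])
```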

static load_from_dgl_graph(dgl_graph)[source]

Construct a GraphDocument from a graph of type DGLGraph

Parameters

dgl_graph (DGLGraph) – the graph from which to construct a GraphDocument.

Warning

  • This method only deals with the graph structure (nodes and connectivity); graph features that are task specific are ignored.

  • This method has no way to know if the origin dgl_graph is an undirected graph, so the undirected property will be False by default. You can set the property manually if needed.

Return type

GraphDocument
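
Because DGL stores every graph as directed, an undirected graph is usually represented by including both directions of each edge. One way to decide whether to set the undirected property yourself is to check the edge list for that symmetry. The sketch below is a heuristic, not part of Jina or DGL: `looks_undirected` is a hypothetical helper, and it assumes the source and target indices have already been converted to plain Python lists (for example from the tensors returned by `dgl_graph.edges()`).

```python
from typing import List


def looks_undirected(sources: List[int], targets: List[int]) -> bool:
    """Return True if every edge (s, t) also appears reversed as (t, s)."""
    pairs = set(zip(sources, targets))
    return all((t, s) in pairs for s, t in pairs)


# Both directions present for the single connection 0 <-> 1.
symmetric = looks_undirected([0, 1], [1, 0])
# A chain 0 -> 1 -> 2 has no reverse edges.
chain = looks_undirected([0, 1], [1, 2])
```

A symmetric edge list does not prove the graph was meant to be undirected, so treat the result as a hint.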

to_dgl_graph()[source]

Construct a dgl.DGLGraph from a GraphDocument instance.

  • This method only deals with the graph structure (nodes and connectivity); graph features that are task specific are ignored.

Return type

DGLGraph