docarray.document package

Subpackages

Submodules

Module contents

class docarray.document.Document(adjacency: Optional[int] = None, blob: Optional[ArrayType] = None, buffer: Optional[bytes] = None, chunks: Optional[Iterable[Document]] = None, embedding: Optional[ArrayType] = None, granularity: Optional[int] = None, id: Optional[str] = None, location: Optional[Sequence[float]] = None, matches: Optional[Iterable[Document]] = None, mime_type: Optional[str] = None, modality: Optional[str] = None, offset: Optional[float] = None, parent_id: Optional[str] = None, tags: Optional[Union[Dict, docarray.simple.struct.StructView]] = None, text: Optional[str] = None, uri: Optional[str] = None, weight: Optional[float] = None, **kwargs)

Bases: docarray.document.mixins.AllMixins, docarray.base.BaseProtoView

Document is one of the primitive data types in Jina.

It offers a Pythonic interface that lets users access and manipulate a jina.docarray_pb2.DocumentProto object without working with Protobuf directly.

To create a Document object, simply:

from jina import Document
d = Document()
d.text = 'abc'

Jina requires each Document to have a string id. You can set a custom one; if none has been set, a random one is assigned.
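The id rule can be illustrated without Jina itself. The following is a minimal sketch, assuming a hypothetical stand-in class SketchDocument and uuid4 as the source of randomness; the real Document may generate its ids differently.

```python
import uuid
from typing import Optional


class SketchDocument:
    """Hypothetical stand-in for Document; illustrates the id rule only."""

    def __init__(self, id: Optional[str] = None) -> None:
        # Keep a custom id if one is given (coerced to str);
        # otherwise assign a random one.
        # Assumption: uuid4 is used here purely for illustration.
        self.id = str(id) if id is not None else uuid.uuid4().hex


custom = SketchDocument(id='my-doc-1')  # custom id is kept as-is
auto = SketchDocument()                 # random string id is assigned
```

Either way, the resulting object always carries a string id, which is the invariant the paragraph above describes.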

To access and modify the content of the document, you can use text, blob, and buffer. Each property is implemented with a proper setter to protect data integrity and improve the user experience. For example, array content and an embedding can be assigned directly:

import numpy as np

# to set as content
d.content = np.random.random([10, 5])

# to set as embedding
d.embedding = np.random.random([10, 5])
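The idea of guarding each content property with a setter can be sketched in plain Python. This is a minimal illustration under stated assumptions: SketchDocument is a hypothetical class, only the text and buffer forms are modeled, and the choice to hold a single active content value at a time is an assumption of this sketch, not something the documentation above states.

```python
from typing import Union


class SketchDocument:
    """Hypothetical stand-in; not the real Document implementation."""

    def __init__(self) -> None:
        # Assumption: only one content form is stored at a time.
        self._content: Union[str, bytes, None] = None

    @property
    def content(self) -> Union[str, bytes, None]:
        return self._content

    @content.setter
    def content(self, value: Union[str, bytes]) -> None:
        # Reject unsupported types at assignment time.
        if not isinstance(value, (str, bytes)):
            raise TypeError(f'unsupported content type: {type(value).__name__}')
        self._content = value

    @property
    def text(self) -> str:
        # Empty string when the active content is not text.
        return self._content if isinstance(self._content, str) else ''

    @text.setter
    def text(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError(f'text must be str, got {type(value).__name__}')
        self._content = value

    @property
    def buffer(self) -> bytes:
        # Empty bytes when the active content is not a buffer.
        return self._content if isinstance(self._content, bytes) else b''

    @buffer.setter
    def buffer(self, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise TypeError(f'buffer must be bytes, got {type(value).__name__}')
        self._content = value


sketch = SketchDocument()
sketch.content = 'abc'   # readable back through .text
```

Validating in the setter means a wrongly typed value fails at the point of assignment rather than later during serialization.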

The MIME type is set or guessed automatically when content or uri is assigned.
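As a rough illustration of what guessing a MIME type from a URI can look like, the sketch below uses Python's standard mimetypes module. Whether Document relies on this module internally is an assumption; guess_mime_type is a hypothetical helper name, not part of the documented API.

```python
import mimetypes
from typing import Optional


def guess_mime_type(uri: str) -> Optional[str]:
    """Hypothetical helper: guess a MIME type from the URI's file extension."""
    # guess_type returns a (type, encoding) pair; only the type is needed here.
    mime_type, _encoding = mimetypes.guess_type(uri)
    return mime_type


png_type = guess_mime_type('photo.png')
txt_type = guess_mime_type('notes.txt')
unknown_type = guess_mime_type('no_extension')  # None: nothing to guess from
```

A guess based on the extension can fail, which is why the helper returns an optional value rather than always returning a string.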

Document also provides multiple ways to build from existing data. You can build a Document from a docarray_pb2.DocumentProto, bytes, str, or Dict. You can also use it as a view (i.e. a weak reference) when building from an existing docarray_pb2.DocumentProto. For example,

a = DocumentProto()
b = Document(a, copy=False)
a.text = 'hello'
assert b.text == 'hello'
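The difference between a view and a copy can be shown with plain Python objects. This is a minimal sketch, assuming hypothetical stand-ins SketchProto and SketchDocument; it mirrors the copy flag from the example above but does not use Protobuf.

```python
from copy import deepcopy


class SketchProto:
    """Hypothetical stand-in for DocumentProto with a single field."""

    def __init__(self, text: str = '') -> None:
        self.text = text


class SketchDocument:
    """Hypothetical stand-in for Document that wraps a SketchProto."""

    def __init__(self, proto: SketchProto, copy: bool) -> None:
        # A view shares the caller's object; a copy is fully independent.
        self._pb = deepcopy(proto) if copy else proto

    @property
    def text(self) -> str:
        return self._pb.text


a = SketchProto()
view = SketchDocument(a, copy=False)     # shares state with a
snapshot = SketchDocument(a, copy=True)  # detached from a
a.text = 'hello'                         # visible through view only
```

A view avoids duplicating data but ties the wrapper's state to the original object; a copy costs more up front and stays unaffected by later changes.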

You can leverage the convert_a_to_b() interface to convert between content forms.

Parameters
  • obj (Union[ForwardRef, ForwardRef, None]) – the document to construct from. If bytes is given, a DocumentProto is deserialized from it; if a dict is given, a DocumentProto is parsed from it; if a str is given, it is treated as a JSON string and a DocumentProto is parsed from it. A DocumentProto can also be given directly, in which case copy determines whether a view or a copy is built from it.

  • copy (bool) – when the document is given as a DocumentProto object, whether to build a deep copy of it or a view (i.e. weak reference) onto it. With copy=False, as in the example above, a view is built.

  • field_resolver (Dict[str, str]) – a map from the field names used in the input JSON or dict to the field names defined in Document.

  • kwargs – other parameters to be set after the document is constructed
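The field_resolver parameter described above amounts to renaming keys before the input is parsed. The following is a minimal sketch of that idea; resolve_fields and the input key names doc_id and body are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict


def resolve_fields(raw: Dict[str, Any], field_resolver: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: rename input keys to Document field names."""
    # Keys that are absent from field_resolver pass through unchanged.
    return {field_resolver.get(key, key): value for key, value in raw.items()}


raw = {'doc_id': '123', 'body': 'hello'}
resolved = resolve_fields(raw, field_resolver={'doc_id': 'id', 'body': 'text'})
```

After resolution, the dict uses the field names id and text, which are among the Document fields listed in the constructor signature.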

Note

When the document is a JSON string or a Python dictionary object, the constructor maps only the values of known fields defined in Protobuf; all unknown fields are mapped to document.tags. For example,

d = Document({'id': '123', 'hello': 'world', 'tags': {'good': 'bye'}})

assert d.id == '123'  # true
assert d.tags['hello'] == 'world'  # true
assert d.tags['good'] == 'bye'  # true
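How unknown keys might be routed into tags can be sketched with a plain function. This is an illustration under stated assumptions: split_fields is a hypothetical helper, and KNOWN_FIELDS is a small illustrative subset of field names, not the full Protobuf schema.

```python
from typing import Any, Dict, Tuple

# Assumption: an illustrative subset of the known Document field names.
KNOWN_FIELDS = {'id', 'text', 'uri', 'mime_type'}


def split_fields(raw: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate known fields from tags."""
    known: Dict[str, Any] = {}
    # Start from the explicit tags, if any, without mutating the input.
    tags: Dict[str, Any] = dict(raw.get('tags', {}))
    for key, value in raw.items():
        if key == 'tags':
            continue
        if key in KNOWN_FIELDS:
            known[key] = value
        else:
            # Unknown keys are folded into tags.
            tags[key] = value
    return known, tags


known, tags = split_fields({'id': '123', 'hello': 'world', 'tags': {'good': 'bye'}})
```

With the same input as the example above, the id stays a regular field while both hello and good end up in tags.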

property weight: float

Return type

float

Returns

the weight of the document

property modality: str

Return type

str

Returns

the modality of the document.

property tags: docarray.simple.struct.StructView

Return the tags field of this Document as a Python dict

Return type

StructView

Returns

a Python dict view of the tags.

property id: str

The document id as a string.

Return type

str

Returns

the id of this Document

property parent_id: str

The document’s parent id as a string.

Return type

str

Returns

the parent id of this Document

property blob: ArrayType

Return blob, one of the content forms of a Document.

Note

Use content to return the content of a Document

This property returns the blob of the Document as a dense or sparse array, depending on the actual proto instance stored. If the stored blob is sparse, it is returned as a COO matrix.

Return type

ArrayType

Returns

the blob content of this Document

property embedding: ArrayType

Return embedding of the content of a Document.

Note

This property returns the embedding of the Document as a dense or sparse array, depending on the actual proto instance stored. If the stored embedding is sparse, it is returned as a COO matrix.

Return type

ArrayType

Returns

the embedding of this Document
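The blob and embedding notes above mention the COO (coordinate) sparse format. The sketch below shows what that layout stores, using plain Python lists instead of a real sparse-matrix library; dense_to_coo and coo_to_dense are hypothetical helper names and are not part of the Document API.

```python
from typing import List, Tuple

Dense = List[List[float]]


def dense_to_coo(dense: Dense) -> Tuple[List[int], List[int], List[float], Tuple[int, int]]:
    """Keep only non-zero entries as parallel (row, col, value) lists."""
    rows: List[int] = []
    cols: List[int] = []
    data: List[float] = []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                data.append(value)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return rows, cols, data, shape


def coo_to_dense(rows: List[int], cols: List[int], data: List[float],
                 shape: Tuple[int, int]) -> Dense:
    """Rebuild the dense array by writing each stored value back."""
    dense = [[0.0] * shape[1] for _ in range(shape[0])]
    for i, j, value in zip(rows, cols, data):
        dense[i][j] = value
    return dense


dense = [[0.0, 2.0, 0.0],
         [0.0, 0.0, 0.0],
         [5.0, 0.0, 0.0]]
rows, cols, data, shape = dense_to_coo(dense)
```

Only two of the nine entries are stored here, which is the reason a sparse form is attractive for arrays that are mostly zeros.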

property matches: MatchArray

Get all matches of the current document.

Return type

MatchArray

Returns

the array of matches attached to this document

property chunks: ChunkArray

Get all chunks of the current document.

Return type

ChunkArray

Returns

the array of chunks of this document

property buffer: bytes

Return buffer, one of the content forms of a Document.

Note

Use content to return the content of a Document

Return type

bytes

Returns

the buffer bytes from this document

property text: str

Return text, one of the content forms of a Document.

Note

Use content to return the content of a Document

Return type

str

Returns

the text from this document content

property uri: str

Return the URI of the document.

Return type

str

Returns

the uri of this Document

property mime_type: str

Get the MIME type of the document.

Return type

str

Returns

the mime_type of this Document

property granularity: int

Return the granularity of the document.

Return type

int

Returns

the granularity of this Document

property adjacency: int

Return the adjacency of the document.

Return type

int

Returns

the adjacency of this Document

property scores

Return the scores of the document.

Returns

the scores attached to this document as a NamedScoreMapping

property evaluations: docarray.simple.map.NamedScoreMap

Return the evaluations of the document.

Return type

NamedScoreMap

Returns

the evaluations attached to this document as a NamedScoreMap

property location: Tuple[float]

Get the location information.

Return type

Tuple[float]

Returns

location info in a tuple.

property offset: float

Get the offset information of this Document.

Return type

float

Returns

the offset