jina.types.arrays.document module

class jina.types.arrays.document.DocumentArray(docs=None)[source]

Bases: jina.types.arrays.traversable.TraversableSequence, collections.abc.MutableSequence, jina.types.arrays.document.DocumentArrayGetAttrMixin, jina.types.arrays.neural_ops.DocumentArrayNeuralOpsMixin, jina.types.arrays.search_ops.DocumentArraySearchOpsMixin, collections.abc.Iterable, jina.types.arrays.abstract.AbstractDocumentArray

DocumentArray is a mutable sequence of Document objects. It provides an efficient view over a list of Documents. You can iterate over it like a generator, and also modify it, count its items, get items by index, or combine two DocumentArrays using the ‘+’ and ‘+=’ operators.

It acts as a view holding a pointer to a RepeatedContainer of DocumentProto, while returning Jina-native Document objects when items are accessed or iterated over.

Parameters

docs (Optional[~DocumentArraySourceType]) – the document array to construct from. A DocumentArrayProto can also be given directly, in which case a view or a copy is built from it, depending on the copy. A List is also accepted.
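Because DocumentArray derives from collections.abc.MutableSequence, much of its list-like behaviour follows the standard Python contract. The sketch below illustrates that contract with a toy stand-in class; `ToyArray` is hypothetical and is not Jina code, and the real DocumentArray overrides several of these methods itself.

```python
from collections.abc import MutableSequence


class ToyArray(MutableSequence):
    """Hypothetical stand-in for a mutable sequence type; not Jina code."""

    def __init__(self, items=None):
        self._items = list(items or [])

    # The five abstract methods that MutableSequence requires.
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

    # '+' is not supplied by MutableSequence, so it is defined explicitly.
    def __add__(self, other):
        return ToyArray(list(self) + list(other))


a = ToyArray(["d1", "d2"])
a.append("d3")          # mixin method built on insert()
a.extend(["d4", "d5"])  # mixin method built on append()
a += ToyArray(["d6"])   # mixin __iadd__ built on extend()
a.reverse()             # mixin method built on item access
b = a + ToyArray(["d0"])  # new object; 'a' is left unchanged
```

The design point is that `+=` mutates the left operand in place, while `+` has to build a new container.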

insert(index, doc)[source]

Insert doc.proto at position index in the DocumentArray.

Parameters
  • index (int) – Position of the insertion.

  • doc (Document) – The Document to insert.

Return type

None

append(doc)[source]

Append doc to the DocumentArray.

Parameters

doc (Document) – The Document to append.

extend(iterable)[source]

Extend the DocumentArray by appending all the items from the iterable.

Parameters

iterable (Iterable[Document]) – the iterable of Documents to extend this array with

Return type

None

clear()[source]

Clear the data of the DocumentArray.

reverse()[source]

Reverse the sequence in place.

sort(key, top_k=None, reverse=False)[source]

Sort the items of the DocumentArray in place.

Parameters
  • key (Callable) – key callable to sort based upon

  • top_k (Optional[int]) – if set, guarantee only that the first top_k elements are correctly sorted, rather than sorting the entire list

  • reverse (bool) – reverse=True will sort the list in descending order. Default is False
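The top_k guarantee described above can be illustrated on a plain Python list. This is a minimal sketch, not Jina's implementation: `partial_sort` is a hypothetical helper, it returns a new list rather than sorting in place, and the use of `heapq` is an assumption chosen for the illustration.

```python
import heapq


def partial_sort(items, key, top_k=None, reverse=False):
    """Hypothetical helper: return a list whose first top_k items are ordered."""
    if top_k is None or top_k >= len(items):
        return sorted(items, key=key, reverse=reverse)
    indexed = list(enumerate(items))
    pick = heapq.nlargest if reverse else heapq.nsmallest
    # Select and order only the leading top_k items.
    head = pick(top_k, indexed, key=lambda pair: key(pair[1]))
    chosen = {i for i, _ in head}
    # The remaining items keep their original relative order, unsorted.
    tail = [x for i, x in indexed if i not in chosen]
    return [x for _, x in head] + tail


scores = [0.9, 0.1, 0.5, 0.7, 0.3]
ascending = partial_sort(scores, key=lambda s: s, top_k=2)
descending = partial_sort(scores, key=lambda s: s, top_k=2, reverse=True)
```

Only the first two positions of each result are guaranteed to be in order; the rest should be treated as unsorted.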

save(file, file_format='json')[source]

Save array elements into a JSON, binary, or CSV file.

Parameters
  • file (Union[str, TextIO, BinaryIO]) – File or filename to which the data is saved.

  • file_format (str) – json, binary, or csv. JSON and CSV files are human-readable, but the binary format gives a much smaller file and faster save/load speed. Note that CSV has very limited compatibility: a complex DocumentArray with nested structure cannot be restored from a CSV file.

Return type

None

classmethod load(file, file_format='json')[source]

Load array elements from a JSON, binary, or CSV file.

Parameters
  • file (Union[str, TextIO, BinaryIO]) – File or filename from which the data is loaded.

  • file_format (str) – json, binary, or csv. JSON and CSV files are human-readable, but the binary format gives a much smaller file and faster save/load speed. CSV has very limited compatibility: a complex DocumentArray with nested structure cannot be restored from a CSV file.

Return type

DocumentArray

Returns

the loaded DocumentArray object

save_binary(file)[source]

Save array elements into a binary file.

Compared to save_json(), it is faster and the file is smaller, but the file is not human-readable.

Parameters

file (Union[str, BinaryIO]) – File or filename to which the data is saved.

Return type

None

save_json(file)[source]

Save array elements into a JSON file.

Compared to save_binary(), the file is human-readable, but it is slower to save/load and larger.

Parameters

file (Union[str, TextIO]) – File or filename to which the data is saved.

Return type

None

save_csv(file, flatten_tags=True)[source]

Save array elements into a CSV file.

Parameters
  • file (Union[str, TextIO]) – File or filename to which the data is saved.

  • flatten_tags (bool) – if set, all fields in Document.tags are flattened into tag__fieldname and stored as separate columns. This is useful when tags contain a lot of information.

Return type

None
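The effect of flatten_tags in save_csv() can be pictured with plain dictionaries and the standard csv module. This is only a sketch of the tag__fieldname naming described above: `flatten` is a hypothetical helper, the dicts stand in for Documents, and the exact set and order of columns that Jina writes is not specified here.

```python
import csv
import io

# Plain dicts standing in for Documents; the field names are illustrative.
docs = [
    {"id": "1", "text": "hello", "tags": {"lang": "en", "source": "web"}},
    {"id": "2", "text": "hallo", "tags": {"lang": "de"}},
]


def flatten(doc):
    """Hypothetical helper: move each tag into its own tag__<name> column."""
    row = {k: v for k, v in doc.items() if k != "tags"}
    for name, value in doc.get("tags", {}).items():
        row[f"tag__{name}"] = value
    return row


rows = [flatten(d) for d in docs]
# Collect the union of columns, preserving first-seen order.
columns = list(dict.fromkeys(col for row in rows for col in row))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

A Document that lacks a given tag simply gets an empty cell in that column, which is one reason nested structures do not survive a CSV round trip.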

classmethod load_json(file)[source]

Load array elements from a JSON file.

Parameters

file (Union[str, TextIO]) – File or filename from which the data is loaded.

Return type

DocumentArray

Returns

a DocumentArray object

classmethod load_binary(file)[source]

Load array elements from a binary file.

Parameters

file (Union[str, BinaryIO]) – File or filename from which the data is loaded.

Return type

DocumentArray

Returns

a DocumentArray object

classmethod load_csv(file)[source]

Load array elements from a CSV file.

Parameters

file (Union[str, BinaryIO]) – File or filename from which the data is loaded.

Return type

DocumentArray

Returns

a DocumentArray object

property embeddings: numpy.ndarray

Return a np.ndarray stacking all the embedding attributes as rows.

Warning

This operation assumes all embeddings have the same shape and dtype. The shape and dtype of the first element in the DocumentArray / DocumentArrayMemmap are assumed to apply to every element.

Warning

This operation currently does not support sparse arrays.

Return type

ndarray

Returns

embeddings stacked per row as np.ndarray.
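The row-wise stacking and the same-shape assumption can be reproduced with NumPy alone. The sketch below uses made-up vectors in place of Document embeddings. It shows how NumPy reacts to a mismatched shape; how Jina itself behaves in that case is not stated here.

```python
import numpy as np

# Three hypothetical 4-dimensional embeddings, one per Document.
vectors = [np.arange(4, dtype=np.float32) + i for i in range(3)]

stacked = np.stack(vectors)  # one row per embedding
print(stacked.shape, stacked.dtype)

# A vector of a different length violates the same-shape assumption.
mismatch_detected = False
try:
    np.stack(vectors + [np.zeros(5, dtype=np.float32)])
except ValueError as err:
    mismatch_detected = True
    print("cannot stack:", err)
```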

property tags: List[jina.types.struct.StructView]

Get the tags attribute of all Documents

Return type

List[StructView]

Returns

List of tags attributes for all Documents

property texts: List[str]

Get the text attribute of all Documents

Return type

List[str]

Returns

List of text attributes for all Documents

property buffers: List[bytes]

Get the buffer attribute of all Documents

Return type

List[bytes]

Returns

List of buffer attributes for all Documents

property blobs: numpy.ndarray

Return a np.ndarray stacking all the blob attributes.

The blob attributes are stacked together along a newly created first dimension (as if you would stack using np.stack(X, axis=0)).

Warning

This operation assumes all blobs have the same shape and dtype. The shape and dtype of the first element in the DocumentArray / DocumentArrayMemmap are assumed to apply to every element.

Warning

This operation currently does not support sparse arrays.

Return type

ndarray

Returns

blobs stacked per row as np.ndarray.
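The "newly created first dimension" mentioned above is the standard behaviour of np.stack with axis=0. A minimal sketch, using made-up arrays in place of Document blobs:

```python
import numpy as np

# Two hypothetical blobs, e.g. 2x3 single-channel images.
blobs = [np.zeros((2, 3)), np.ones((2, 3))]

# The new leading dimension indexes the Documents.
batch = np.stack(blobs, axis=0)
print(batch.shape)

# Indexing along the first axis recovers an individual blob.
second = batch[1]
```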

class jina.types.arrays.document.DocumentArrayGetAttrMixin[source]

Bases: object

A mixin that provides bulk attribute getters.

get_attributes(*fields)[source]

Return all nonempty values of the fields from all docs this array contains

Parameters

fields (str) – Variable length argument with the name of the fields to extract

Return type

Union[List, List[List]]

Returns

Returns a list of the values for these fields. When multiple fields are given, it returns a list of lists.
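The two return shapes can be sketched with plain dictionaries. This is not Jina's implementation: the `get_attributes` function below is a hypothetical stand-in, and both the skipping of empty values and the orientation of the nested result (one inner list per field) are assumptions based on the description above.

```python
def get_attributes(docs, *fields):
    """Hypothetical stand-in operating on plain dicts, not Jina Documents."""
    if len(fields) == 1:
        # Single field: a flat list of the nonempty values.
        return [d[fields[0]] for d in docs if d.get(fields[0])]
    # Assumption: with several fields, return one inner list per field,
    # keeping only docs where every requested field is nonempty.
    rows = [[d.get(f) for f in fields] for d in docs]
    rows = [r for r in rows if all(r)]
    return [list(col) for col in zip(*rows)]


docs = [
    {"id": "a", "text": "hello"},
    {"id": "b", "text": ""},
    {"id": "c", "text": "world"},
]
single = get_attributes(docs, "text")
multi = get_attributes(docs, "id", "text")
```

Under these assumptions, `single` is a flat list and `multi` holds one list of ids and one list of texts, aligned by position.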

get_attributes_with_docs(*fields)[source]

Return all nonempty values of the fields together with their nonempty docs

Parameters

fields (str) – Variable length argument with the name of the fields to extract

Return type

Tuple[Union[List, List[List]], DocumentArray]

Returns

Returns a tuple. The first element is a list of the values for these fields; when multiple fields are given, it is a list of lists. The second element is a DocumentArray of the non-empty docs.

abstract property embeddings: numpy.ndarray

Return a np.ndarray stacking all the embedding attributes as rows.

Return type

ndarray

property blobs: numpy.ndarray

Return a np.ndarray stacking all the blob attributes.

The blob attributes are stacked together along a newly created first dimension (as if you would stack using np.stack(X, axis=0)).

Warning

This operation assumes all blobs have the same shape and dtype. The shape and dtype of the first element in the DocumentArray / DocumentArrayMemmap are assumed to apply to every element.

Warning

This operation currently does not support sparse arrays.

Return type

ndarray

property tags: List[jina.types.struct.StructView]

Get the tags attribute of all Documents

Return type

List[StructView]

property texts: List[str]

Get the text attribute of all Documents

Return type

List[str]

property buffers: List[bytes]

Get the buffer attribute of all Documents

Return type

List[bytes]