jina.types.arrays.document

class jina.types.arrays.document.DocumentArray[source]

Bases: jina.types.arrays.traversable.TraversableSequence, collections.abc.MutableSequence

DocumentArray is a mutable sequence of Document. It gives an efficient view of an array of Documents. One can iterate over it like a generator, but also modify it, count its items, get an item by index, or union two DocumentArrays using the + and += operators.

Parameters

docs_proto (Union['RepeatedContainer', Sequence['Document']]) – A list of Document

DocumentSet is deprecated. A new class name is ChunkArray.
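The sequence behaviour described above can be sketched without Jina installed. The class below is a hypothetical stand-in (MiniDocArray is not part of Jina, and plain dicts replace Document); it only illustrates how deriving from collections.abc.MutableSequence yields append, extend, reverse, iteration and += once five methods are defined, with + added by hand.

```python
from collections.abc import MutableSequence


class MiniDocArray(MutableSequence):
    """Hypothetical stand-in for DocumentArray; items are plain dicts."""

    def __init__(self, docs=None):
        self._docs = list(docs or [])

    # The five methods MutableSequence requires. append, extend, reverse,
    # iteration and += are then supplied by the abstract base class.
    def __getitem__(self, index):
        return self._docs[index]

    def __setitem__(self, index, doc):
        self._docs[index] = doc

    def __delitem__(self, index):
        del self._docs[index]

    def __len__(self):
        return len(self._docs)

    def insert(self, index, doc):
        self._docs.insert(index, doc)

    def __add__(self, other):
        # '+' returns a new array holding the items of both operands.
        return MiniDocArray(list(self) + list(other))


a = MiniDocArray([{"id": "a"}, {"id": "b"}])
b = MiniDocArray([{"id": "c"}])

merged = a + b             # new array, operands untouched
a += b                     # in-place extend via MutableSequence.__iadd__
a.insert(0, {"id": "z"})   # position 0 puts the item at the front

ids = [d["id"] for d in a]
```

In this sketch, += mutates the left operand while + leaves both operands unchanged; whether Jina's own operators make the same choice is not stated in this reference.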

insert(index, doc)[source]

Insert doc.proto at index into the DocumentArray.

Parameters
  • index (int) – Position of the insertion.

  • doc (Document) – The doc to be inserted.

Return type

None

append(doc)[source]

Append doc to the DocumentArray.

Parameters

doc (Document) – The doc to be appended.

Returns

Appended internal list.

Return type

Document

add(doc)[source]

Shortcut to append(); do not override this method.

Parameters

doc (Document) – The document to add to the array.

Returns

Appended internal list.

Return type

Document

extend(iterable)[source]

Extend the DocumentArray by appending all the items from the iterable.

Parameters

iterable (Iterable['Document']) – The iterable of Documents to extend this array with.

Return type

None

clear()[source]

Clear the data of the DocumentArray.

reverse()[source]

Reverse the sequence in place.

build()[source]

Build a doc_id to doc mapping so one can later index a Document using doc_id as string key.
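The idea of an id-to-document mapping can be illustrated in plain Python. This is a minimal sketch under assumptions: build_id_map is a hypothetical helper, documents are modelled as dicts with an "id" key, and nothing here reflects how build() is implemented inside Jina.

```python
def build_id_map(docs):
    """Return a dict that maps each document id to the document itself."""
    return {doc["id"]: doc for doc in docs}


docs = [
    {"id": "doc-1", "text": "hello"},
    {"id": "doc-2", "text": "world"},
]

id_map = build_id_map(docs)

# Lookup by string key instead of by integer position.
found = id_map["doc-2"]

# The map in this sketch is a snapshot: a document added afterwards
# is not reachable by id until the map is built again.
docs.append({"id": "doc-3", "text": "later"})
stale = "doc-3" in id_map
id_map = build_id_map(docs)
fresh = "doc-3" in id_map
```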

sort(*args, **kwargs)[source]

Sort the items of the DocumentArray in place.

Parameters
  • args – variable list of arguments to pass to the underlying sorting function

  • kwargs – keyword arguments to pass to the underlying sorting function
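As an illustration of the pass-through signature, the sketch below forwards its arguments to Python's built-in list.sort. It assumes the underlying sorting function accepts the same key and reverse keywords as list.sort, which this reference does not confirm; sort_docs and the "score" field are hypothetical.

```python
def sort_docs(docs, *args, **kwargs):
    # Forward everything to list.sort, which sorts in place and returns None.
    docs.sort(*args, **kwargs)


docs = [
    {"id": "a", "score": 0.2},
    {"id": "b", "score": 0.9},
    {"id": "c", "score": 0.5},
]

# Highest score first.
sort_docs(docs, key=lambda d: d["score"], reverse=True)
ranked_ids = [d["id"] for d in docs]
```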

property all_embeddings

Return all embeddings from every document in this array as an ndarray.

Returns

The corresponding documents in a DocumentArray, and the documents have no embedding in a DocumentArray.

Return type

A tuple of embedding in np.ndarray

get_all_sparse_embeddings(embedding_cls_type)[source]

Return all embeddings from every document in this array as a sparse array

Parameters

embedding_cls_type (EmbeddingClsType) – Type of sparse matrix backend, e.g. scipy, torch or tf.

Returns

The corresponding documents in a DocumentArray, and the documents have no embedding in a DocumentArray.

Return type

A tuple of embedding and DocumentArray as sparse arrays

property all_contents

Return all contents from every document in this array as an ndarray.

Returns

The corresponding documents in a DocumentArray, and the documents have no contents in a DocumentArray.

Return type

A tuple of contents in np.ndarray

extract_docs(*fields, stack_contents=False)[source]

Return in batches all the values of the fields.

Parameters
  • fields (str) – Variable length argument with the name of the fields to extract

  • stack_contents (Union[bool, List[bool]]) – boolean flag indicating if output arrays should be stacked with np.stack

Returns

Returns an np.ndarray or an array of np.ndarray with the batches for these fields

Return type

Tuple[Union[ndarray, List[ndarray]], DocumentArray]
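The shape of the return value, a pair of extracted values and the documents they came from, can be sketched in plain Python. Everything here is an assumption made for illustration: extract_fields is a hypothetical helper, documents are dicts, documents missing a requested field are skipped, and the values stay as lists where the real method may stack them with np.stack.

```python
def extract_fields(docs, *fields):
    """Collect one list of values per requested field, plus the docs used."""
    columns = [[] for _ in fields]
    kept = []
    for doc in docs:
        # Assumption: a document missing any requested field is skipped.
        if any(doc.get(name) is None for name in fields):
            continue
        for column, name in zip(columns, fields):
            column.append(doc[name])
        kept.append(doc)
    # A single field yields one list; several fields yield a list of lists.
    contents = columns[0] if len(fields) == 1 else columns
    return contents, kept


docs = [
    {"id": "a", "text": "hello", "embedding": [0.1, 0.2]},
    {"id": "b", "text": "world", "embedding": None},
    {"id": "c", "text": "again", "embedding": [0.3, 0.4]},
]

embeddings, with_embedding = extract_fields(docs, "embedding")
(ids, texts), all_docs = extract_fields(docs, "id", "text")
```

Returning the kept documents alongside the values keeps row i of the extracted data aligned with document i, even when some inputs were skipped.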

new()[source]

Create a new empty document appended to the end of the array.

Returns

A new Document appended to the internal list.

Return type

Document