jina.types.sets.document

class jina.types.sets.document.DocumentSet(docs_proto)

Bases: jina.types.sets.traversable.TraversableSequence, collections.abc.MutableSequence

DocumentSet is a mutable sequence of Document objects that provides an efficient view over a list of Documents. It can be iterated like a generator, but it can also be modified, counted and indexed, and two DocumentSets can be combined with the ‘+’ and ‘+=’ operators.

Parameters

docs_proto (Union['RepeatedContainer', Sequence['Document']]) – A protobuf repeated container or a sequence of Document objects.
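Because the class derives from collections.abc.MutableSequence, a subclass only has to supply __getitem__, __setitem__, __delitem__, __len__ and insert; the mixin then provides append, extend, reverse, pop, remove, clear and += on top of them. The sketch below shows that pattern over a plain Python list. ToyDocument, ToyDocumentSet and the dict used as a "proto" are hypothetical stand-ins, not jina's classes (jina's DocumentSet defines several of these methods itself, as the entries below show), and __add__ is written out because MutableSequence does not supply it.

```python
from collections.abc import MutableSequence
from typing import Iterable, List, Optional


class ToyDocument:
    """Hypothetical stand-in for Document; `proto` mimics the protobuf payload."""

    def __init__(self, doc_id: str = "", proto: Optional[dict] = None) -> None:
        # Reuse an existing payload when one is given, so edits write through to it
        self.proto = proto if proto is not None else {"id": doc_id}

    @property
    def id(self) -> str:
        return self.proto["id"]


class ToyDocumentSet(MutableSequence):
    """Hypothetical sketch: a mutable view over a list of proto-like payloads."""

    def __init__(self, docs_proto: Iterable = ()) -> None:
        # Keep only the raw payloads; wrappers are created on access
        self._docs_proto: List[dict] = [
            d.proto if isinstance(d, ToyDocument) else d for d in docs_proto
        ]

    def __len__(self) -> int:
        return len(self._docs_proto)

    def __getitem__(self, index: int) -> ToyDocument:
        # Wrap the stored payload instead of copying it (the "view")
        return ToyDocument(proto=self._docs_proto[index])

    def __setitem__(self, index: int, doc: ToyDocument) -> None:
        self._docs_proto[index] = doc.proto

    def __delitem__(self, index: int) -> None:
        del self._docs_proto[index]

    def insert(self, index: int, doc: ToyDocument) -> None:
        # Store doc.proto at the given position, as insert() is documented to do
        self._docs_proto.insert(index, doc.proto)

    def __add__(self, other: "ToyDocumentSet") -> "ToyDocumentSet":
        # '+' builds a new set; '+=' already comes from MutableSequence via extend()
        return ToyDocumentSet(self._docs_proto + other._docs_proto)


docs = ToyDocumentSet([ToyDocument("a"), ToyDocument("b")])
docs.append(ToyDocument("c"))           # mixin method, implemented via insert()
combined = docs + ToyDocumentSet([ToyDocument("d")])
docs += [ToyDocument("e")]              # MutableSequence.__iadd__ calls extend()
print([d.id for d in combined])         # ['a', 'b', 'c', 'd']
print(len(docs))                        # 4
```

Storing payloads and wrapping them on access is what makes the set a "view": modifying a Document obtained by indexing changes the stored payload rather than a copy.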

insert(index, doc)

Insert doc.proto at position index in the DocumentSet.

Parameters
  • index (int) – Position of the insertion.

  • doc (Document) – The Document to insert.

Return type

None

append(doc)

Append doc to the DocumentSet.

Parameters

doc (Document) – The Document to append.

Return type

Document

Returns

Appended list.

add(doc)

Shortcut for append(); do not override this method.

Parameters

doc (Document) – the document to add to the set

Return type

Document

Returns

Appended list.

extend(iterable)

Extend the DocumentSet by appending all the items from the iterable.

Parameters

iterable (Iterable[Document]) – The iterable of Documents to extend this set with.

Return type

None
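The relationship between append(), add() and extend() can be sketched on a hypothetical list-backed set, using plain strings as stand-in documents. The names are illustrative only, and the return values documented above are deliberately omitted: every method here returns None.

```python
from collections.abc import MutableSequence
from typing import Iterable, List


class ToyDocumentSet(MutableSequence):
    """Hypothetical list-backed set; strings stand in for Documents."""

    def __init__(self) -> None:
        self._docs: List[str] = []

    def __len__(self) -> int:
        return len(self._docs)

    def __getitem__(self, index: int) -> str:
        return self._docs[index]

    def __setitem__(self, index: int, doc: str) -> None:
        self._docs[index] = doc

    def __delitem__(self, index: int) -> None:
        del self._docs[index]

    def insert(self, index: int, doc: str) -> None:
        self._docs.insert(index, doc)

    def append(self, doc: str) -> None:
        self._docs.append(doc)

    def add(self, doc: str) -> None:
        # Pure shortcut: forwards to append(), so any change to append() applies here too
        self.append(doc)

    def extend(self, iterable: Iterable[str]) -> None:
        # Append every item from the iterable, preserving its order
        for doc in iterable:
            self.append(doc)


docs = ToyDocumentSet()
docs.append("a")
docs.add("b")            # same effect as docs.append("b")
docs.extend(["c", "d"])
print(list(docs))        # ['a', 'b', 'c', 'd']
```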

clear()

Remove all Documents from the DocumentSet.

reverse()

Reverse the sequence in place.

build()

Build a mapping from doc_id to Document so that a Document can later be retrieved using its doc_id as a string key.
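A minimal sketch of the idea behind build(), assuming a plain dict as the index: one pass over the Documents fills the dict, after which string keys are resolved through it while integer keys still index the list. ToyDocument and ToyDocumentSet are hypothetical names, not jina classes.

```python
from typing import Dict, List, Union


class ToyDocument:
    """Hypothetical document with only an id and a text field."""

    def __init__(self, doc_id: str, text: str) -> None:
        self.id = doc_id
        self.text = text


class ToyDocumentSet:
    """Hypothetical set showing how a doc_id -> Document index could be built."""

    def __init__(self, docs: List[ToyDocument]) -> None:
        self._docs = docs
        self._docs_map: Dict[str, ToyDocument] = {}

    def build(self) -> None:
        # One pass over the list; later lookups by id are dict lookups
        self._docs_map = {doc.id: doc for doc in self._docs}

    def __getitem__(self, key: Union[int, str]) -> ToyDocument:
        # String keys go through the map built by build(); ints index the list
        if isinstance(key, str):
            return self._docs_map[key]
        return self._docs[key]


docs = ToyDocumentSet([ToyDocument("d1", "hello"), ToyDocument("d2", "world")])
docs.build()
print(docs["d2"].text)  # world
print(docs[0].id)       # d1
```

In this sketch the map is a snapshot: Documents added after build() are not reachable by id until build() is called again.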

sort(*args, **kwargs)

Sort the items of the DocumentSet in place.

Parameters
  • args – Positional arguments passed to the underlying sort function.

  • kwargs – Keyword arguments passed to the underlying sort function.
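Since sort() forwards its arguments unchanged, a plausible reading (an assumption here) is that it accepts the usual key= and reverse= keywords of Python's list.sort. The hypothetical sketch below sorts stand-in Documents by a score field in descending order.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ToyDocument:
    """Hypothetical document carrying an id and a relevance score."""
    id: str
    score: float


class ToyDocumentSet:
    """Hypothetical list-backed set whose sort() forwards to list.sort."""

    def __init__(self, docs: List[ToyDocument]) -> None:
        self._docs = docs

    def sort(self, *args: Any, **kwargs: Any) -> None:
        # Forward everything (e.g. key=..., reverse=...) to the underlying sort
        self._docs.sort(*args, **kwargs)

    def ids(self) -> List[str]:
        return [d.id for d in self._docs]


docs = ToyDocumentSet([ToyDocument("a", 0.2), ToyDocument("b", 0.9), ToyDocument("c", 0.5)])
docs.sort(key=lambda d: d.score, reverse=True)
print(docs.ids())  # ['b', 'c', 'a']
```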

property all_embeddings

Return the embeddings of all Documents in this set, stacked into an ndarray.

Returns

A tuple of the stacked embeddings and a DocumentSet containing the Documents that have no embedding.

Return type

Tuple[np.ndarray, DocumentSet]
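The two-part return value can be sketched with NumPy: embeddings that exist are stacked into one (N, D) matrix, and Documents without an embedding are collected separately. The function and class names are hypothetical, and the sketch returns a plain list where the real property returns a DocumentSet.

```python
from typing import List, Optional, Tuple

import numpy as np


class ToyDocument:
    """Hypothetical document whose embedding may be missing (None)."""

    def __init__(self, doc_id: str, embedding: Optional[np.ndarray] = None) -> None:
        self.id = doc_id
        self.embedding = embedding


def all_embeddings(docs: List[ToyDocument]) -> Tuple[Optional[np.ndarray], List[ToyDocument]]:
    """Stack the available embeddings and collect the documents that lack one."""
    embeddings = [d.embedding for d in docs if d.embedding is not None]
    missing = [d for d in docs if d.embedding is None]
    # np.stack adds a leading batch axis: N vectors of shape (D,) -> (N, D)
    stacked = np.stack(embeddings) if embeddings else None
    return stacked, missing


docs = [
    ToyDocument("a", np.array([1.0, 0.0, 0.0])),
    ToyDocument("b"),
    ToyDocument("c", np.array([0.0, 1.0, 0.0])),
]
matrix, missing = all_embeddings(docs)
print(matrix.shape)             # (2, 3)
print([d.id for d in missing])  # ['b']
```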

get_all_sparse_embeddings(embedding_cls_type)

Return the embeddings of all Documents in this set as a sparse array.

Parameters

embedding_cls_type (EmbeddingClsType) – Type of sparse matrix backend, e.g. scipy, torch or tf.

Returns

A tuple of the stacked sparse embeddings and a DocumentSet containing the Documents that have no embedding.

Return type

A tuple of (sparse embedding array, DocumentSet)
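For the scipy backend only, the stacking step can be sketched with scipy.sparse.vstack, which keeps the result sparse instead of densifying it. This hypothetical helper works on a list of 1×D rows (None marking a missing embedding) rather than on Documents, and does not cover the torch or tf backends.

```python
from typing import List, Optional, Tuple

import numpy as np
from scipy import sparse


def all_sparse_embeddings(
    embeddings: List[Optional[sparse.csr_matrix]],
) -> Tuple[Optional[sparse.csr_matrix], List[int]]:
    """Stack 1xD sparse rows into an NxD CSR matrix; report indices with no embedding."""
    rows = [e for e in embeddings if e is not None]
    missing = [i for i, e in enumerate(embeddings) if e is None]
    # sparse.vstack keeps the result sparse instead of densifying it
    stacked = sparse.vstack(rows, format="csr") if rows else None
    return stacked, missing


rows = [
    sparse.csr_matrix(np.array([[0.0, 2.0, 0.0, 0.0]])),
    None,
    sparse.csr_matrix(np.array([[1.0, 0.0, 0.0, 3.0]])),
]
matrix, missing = all_sparse_embeddings(rows)
print(matrix.shape)  # (2, 4)
print(matrix.nnz)    # 3
print(missing)       # [1]
```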

property all_contents

Return the contents of all Documents in this set, stacked into an ndarray.

Returns

A tuple of the stacked contents and a DocumentSet containing the Documents that have no content.

Return type

Tuple[np.ndarray, DocumentSet]

extract_docs(*fields, stack_contents=False)

Return, in batches, the values of the given fields across all Documents in the set.

Parameters
  • fields (str) – Variable-length argument with the names of the fields to extract.

  • stack_contents (Union[bool, List[bool]]) – Whether to stack each output list with np.stack; a list of booleans sets this per field.

Return type

Tuple[Union[ndarray, List[ndarray]], DocumentSet]

Returns

A tuple whose first element is an np.ndarray, or a list of np.ndarray, with the batched values for these fields, and whose second element is a DocumentSet of the Documents that lack them.
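The following hypothetical sketch illustrates how per-field batching with a per-field stacking flag might work. It assumes, without confirmation from the source, that a Document missing any requested field is set aside rather than padded. Dicts stand in for Documents, and a plain list stands in for the returned DocumentSet.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

import numpy as np


def extract_docs(
    docs: Sequence[Dict[str, Any]],
    *fields: str,
    stack_contents: Union[bool, List[bool]] = False,
) -> Tuple[Union[Any, List[Any]], List[Dict[str, Any]]]:
    """Hypothetical sketch: batch the values of `fields`, skipping docs missing any field."""
    # Normalise stack_contents to one flag per field
    flags = stack_contents if isinstance(stack_contents, list) else [stack_contents] * len(fields)
    batches: List[List[Any]] = [[] for _ in fields]
    bad_docs = []
    for doc in docs:
        if any(doc.get(f) is None for f in fields):
            bad_docs.append(doc)  # set aside instead of contributing to the batches
            continue
        for batch, field in zip(batches, fields):
            batch.append(doc[field])
    # Stack only the non-empty batches whose flag is True; others stay plain lists
    results = [np.stack(b) if flag and b else b for b, flag in zip(batches, flags)]
    # A single field returns its batch directly, several fields return a list
    return (results[0] if len(fields) == 1 else results), bad_docs


docs = [
    {"id": "a", "embedding": np.zeros(4), "text": "hello"},
    {"id": "b", "embedding": None, "text": "skip me"},
    {"id": "c", "embedding": np.ones(4), "text": "world"},
]
(embeddings, texts), bad = extract_docs(docs, "embedding", "text", stack_contents=[True, False])
print(embeddings.shape)        # (2, 4)
print(texts)                   # ['hello', 'world']
print([d["id"] for d in bad])  # ['b']
```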

new()

Create a new empty document appended to the end of the set.

Return type

Document

Returns

a new Document appended to the set
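A minimal sketch of the new() pattern: create an empty Document, append it, and return it so the caller can fill it in afterwards. The classes are hypothetical; in this sketch the returned object is the same one stored in the set, so later edits are visible through the set.

```python
from typing import List


class ToyDocument:
    """Hypothetical empty document with a single text field."""

    def __init__(self) -> None:
        self.text = ""


class ToyDocumentSet:
    """Hypothetical list-backed set illustrating new()."""

    def __init__(self) -> None:
        self._docs: List[ToyDocument] = []

    def new(self) -> ToyDocument:
        # Create an empty document, append it, and hand back the same object
        doc = ToyDocument()
        self._docs.append(doc)
        return doc

    def __len__(self) -> int:
        return len(self._docs)

    def __getitem__(self, index: int) -> ToyDocument:
        return self._docs[index]


docs = ToyDocumentSet()
d = docs.new()
d.text = "filled in after creation"
print(len(docs))      # 1
print(docs[-1].text)  # filled in after creation
```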