jina.types.arrays.abstract module

class jina.types.arrays.abstract.AbstractDocumentArray[source]

Bases: abc.ABC

Abstract class that defines the public interface of DocumentArray classes
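As a minimal sketch of how an abstract interface of this kind behaves, the stand-alone example below defines a toy abstract base class with two of the methods listed on this page, plus a list-backed subclass. `ToyAbstractArray` and `ToyListArray` are hypothetical names used only for illustration and are not part of Jina.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class ToyAbstractArray(ABC):
    """Hypothetical stand-in for an abstract array interface (not Jina code)."""

    @abstractmethod
    def append(self, doc: dict) -> None:
        ...

    @abstractmethod
    def extend(self, iterable: Iterable[dict]) -> None:
        ...


class ToyListArray(ToyAbstractArray):
    """Concrete subclass that stores toy documents in a plain list."""

    def __init__(self) -> None:
        self.docs: List[dict] = []

    def append(self, doc: dict) -> None:
        self.docs.append(doc)

    def extend(self, iterable: Iterable[dict]) -> None:
        for doc in iterable:
            self.append(doc)


# A class with unimplemented abstract methods cannot be instantiated.
try:
    ToyAbstractArray()
    instantiated = True
except TypeError:
    instantiated = False

arr = ToyListArray()
arr.append({'id': 'a'})
arr.extend([{'id': 'b'}, {'id': 'c'}])
ids = [d['id'] for d in arr.docs]
```

A subclass only becomes instantiable once every abstract method has a concrete implementation, which is how an interface like this one is enforced.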

abstract get_attributes(*fields)[source]

Return all nonempty values of the fields from all docs this array contains

Parameters

fields (str) – variable-length argument with the names of the fields to extract

Return type

Union[List, List[List]]
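The return type suggests a flat list for one field and a list of lists for several. The sketch below illustrates one plausible reading using plain dicts as toy documents. `toy_get_attributes` is a hypothetical helper, the "one inner list per field" orientation is an assumption, and empty values are not filtered here.

```python
from typing import Any, Dict, List, Union

# Toy documents as plain dicts; real Documents are richer objects.
docs: List[Dict[str, Any]] = [
    {'id': 'a', 'text': 'hello'},
    {'id': 'b', 'text': 'world'},
]


def toy_get_attributes(
    docs: List[Dict[str, Any]], *fields: str
) -> Union[List, List[List]]:
    """Hypothetical helper: extract `fields` from every toy doc.

    Assumption: one field gives a flat list, several fields give one
    inner list per field. Empty values are not handled in this sketch.
    """
    per_doc = [[doc[f] for f in fields] for doc in docs]
    if len(fields) == 1:
        return [values[0] for values in per_doc]
    # Transpose from one row per doc to one list per field.
    return [list(column) for column in zip(*per_doc)]


single = toy_get_attributes(docs, 'id')
multi = toy_get_attributes(docs, 'id', 'text')
```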

abstract get_attributes_with_docs(*fields)[source]

Return all nonempty values of the fields together with their nonempty docs

Parameters

fields (str) – variable-length argument with the names of the fields to extract

Return type

Tuple[Union[List, List[List]], DocumentArray]
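To illustrate the idea of returning values together with the docs they came from, here is a minimal single-field sketch over plain dicts. `toy_get_attribute_with_docs` is a hypothetical helper, and treating `None`, `''`, and empty containers as "empty" is an assumption of the sketch.

```python
from typing import Any, Dict, List, Tuple

docs: List[Dict[str, Any]] = [
    {'id': 'a', 'text': 'hello'},
    {'id': 'b', 'text': ''},  # empty value, skipped below
    {'id': 'c', 'text': 'world'},
]


def toy_get_attribute_with_docs(
    docs: List[Dict[str, Any]], field: str
) -> Tuple[List[Any], List[Dict[str, Any]]]:
    """Hypothetical helper: keep only docs whose `field` is nonempty."""
    values: List[Any] = []
    kept: List[Dict[str, Any]] = []
    for doc in docs:
        value = doc.get(field)
        if value:  # falsy values count as "empty" (an assumption)
            values.append(value)
            kept.append(doc)
    return values, kept


texts, text_docs = toy_get_attribute_with_docs(docs, 'text')
kept_ids = [d['id'] for d in text_docs]
```

Returning both lists keeps each value aligned with its source document, so results computed from the values can be written back by position.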

abstract traverse(traversal_paths)[source]

Return an iterator over the :class:TraversableSequence leaves reached by applying the traversal_paths. Each :class:TraversableSequence is either the root Documents, a ChunkArray, or a MatchArray.

Parameters

traversal_paths (Iterable[str]) – a list of strings, each representing a traversal path

Return type

Iterable[TraversableSequence]

abstract traverse_flat_per_path(traversal_paths)[source]

Return one flattened :class:TraversableSequence per path in :param:traversal_paths, each containing all Documents reached by that path.

Parameters

traversal_paths (Iterable[str]) – a list of strings, each representing a traversal path

Return type

Iterable[TraversableSequence]

abstract traverse_flat(traversal_paths)[source]

Return a single flattened :class:TraversableSequence containing all Documents reached via the :param:traversal_paths.

Warning

When :param:traversal_paths contains multiple paths, the returned Documents are determined at once and not on the fly. This behavior differs from :method:traverse and :method:traverse_flat_per_path.

Parameters

traversal_paths (Iterable[str]) – a list of strings, each representing a traversal path

Return type

TraversableSequence
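The difference between the three traversal methods can be sketched on a toy tree. The code below is not Jina's implementation: `Node` and the `toy_*` functions are hypothetical, and the path letters `r` (root), `c` (chunks), and `m` (matches) are assumed here for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Node:
    """Hypothetical toy document with nested chunks and matches."""
    id: str
    chunks: List['Node'] = field(default_factory=list)
    matches: List['Node'] = field(default_factory=list)


def _walk(docs: List[Node], path: str) -> Iterator[List[Node]]:
    """Yield one leaf sequence per branch reached by `path`."""
    if not path:
        yield docs
        return
    step, rest = path[0], path[1:]
    if step == 'r':    # assumed: stay at the root level
        yield from _walk(docs, rest)
    elif step == 'c':  # assumed: descend into chunks
        for d in docs:
            yield from _walk(d.chunks, rest)
    elif step == 'm':  # assumed: descend into matches
        for d in docs:
            yield from _walk(d.matches, rest)
    else:
        raise ValueError(f'unknown step {step!r}')


def toy_traverse(docs: List[Node], paths: Iterable[str]) -> Iterator[List[Node]]:
    """Lazily yield leaf sequences, path by path."""
    for path in paths:
        yield from _walk(docs, path)


def toy_traverse_flat_per_path(
    docs: List[Node], paths: Iterable[str]
) -> Iterator[List[Node]]:
    """Yield one flattened list per path."""
    for path in paths:
        yield [d for leaf in _walk(docs, path) for d in leaf]


def toy_traverse_flat(docs: List[Node], paths: Iterable[str]) -> List[Node]:
    """Eagerly collect everything reached by all paths into one list."""
    return [d for leaf in toy_traverse(docs, paths) for d in leaf]


root = [
    Node('d1', chunks=[Node('d1c1'), Node('d1c2')]),
    Node('d2', chunks=[Node('d2c1')], matches=[Node('d2m1')]),
]

leaves = [[n.id for n in leaf] for leaf in toy_traverse(root, ['c'])]
per_path = [[n.id for n in seq]
            for seq in toy_traverse_flat_per_path(root, ['r', 'c'])]
flat = [n.id for n in toy_traverse_flat(root, ['r', 'c'])]
```

In this sketch the first two functions are generators that produce results on demand, while `toy_traverse_flat` builds its whole result before returning, which mirrors the "at once" versus "on the fly" distinction in the warning above.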

abstract match(darray, metric='cosine', limit=inf, normalization=None, use_scipy=False, metric_name=None)[source]

For each Document in self, compute its embedding-based nearest neighbours in another array and store the results in matches.

Note

‘cosine’, ‘euclidean’, ‘sqeuclidean’ are supported natively without extra dependency.

You can use other distance metrics provided by scipy, such as ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.

To use a scipy metric, set use_scipy=True.

  • To make all match values fall in [0, 1], use dA.match(dB, normalization=(0, 1)).

  • To invert the distance into a score and keep all values in the range [0, 1], use dA.match(dB, normalization=(1, 0)). Note how the normalization tuple differs from the previous one.
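The effect of the two normalization tuples can be checked with a small min-max rescaling sketch. `toy_minmax` is a hypothetical helper written for this illustration, not the library's internal function, and mapping constant input to `a` is a choice made only for the sketch.

```python
from typing import List, Tuple


def toy_minmax(distances: List[float], t_range: Tuple[float, float]) -> List[float]:
    """Rescale so that the minimum distance maps to a and the maximum to b."""
    a, b = t_range
    lo, hi = min(distances), max(distances)
    if hi == lo:
        # All distances equal: map everything to a (a choice made for this sketch).
        return [float(a) for _ in distances]
    return [a + (d - lo) * (b - a) / (hi - lo) for d in distances]


distances = [2.0, 4.0, 6.0]
as_distance = toy_minmax(distances, (0, 1))  # smallest distance maps to 0
as_score = toy_minmax(distances, (1, 0))     # smallest distance maps to 1
```

With `(0, 1)` the closest match gets the lowest value, whereas `(1, 0)` flips the order so the closest match gets the highest value and can be read as a score.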

Parameters
  • darray (Union[DocumentArray, DocumentArrayMemmap]) – the other DocumentArray or DocumentArrayMemmap to match against

  • metric (Union[str, Callable[[ForwardRef, ForwardRef], ForwardRef]]) – the distance metric

  • limit (Optional[int]) – the maximum number of matches; when not given, all Documents in darray are considered as matches

  • normalization (Optional[Tuple[int, int]]) – a tuple [a, b] used for min-max normalization: the minimum distance is rescaled to a and the maximum distance to b, so all values fall in the range [a, b].

  • use_scipy (bool) – use Scipy as the computation backend

  • metric_name (Optional[str]) – if provided, the match results are marked with this string.

Return type

None
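As a rough illustration of embedding-based matching with a limit, the sketch below ranks candidates by cosine distance using only the standard library. It is a brute-force toy, not the library's implementation: `toy_match` and `cosine_distance` are hypothetical, embeddings are plain lists keyed by id, and results are returned as a dict rather than stored on the documents.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def toy_match(
    left: Dict[str, List[float]],
    right: Dict[str, List[float]],
    limit: Optional[int] = None,
) -> Dict[str, List[Tuple[str, float]]]:
    """Hypothetical matcher: for each id in `left`, rank all of `right`."""
    results: Dict[str, List[Tuple[str, float]]] = {}
    for lid, lvec in left.items():
        ranked = sorted(
            ((rid, cosine_distance(lvec, rvec)) for rid, rvec in right.items()),
            key=lambda pair: pair[1],
        )
        results[lid] = ranked[:limit]  # limit=None keeps every candidate
    return results


dA = {'q': [1.0, 0.0]}
dB = {'same': [2.0, 0.0], 'diag': [1.0, 1.0], 'orth': [0.0, 3.0]}
matches = toy_match(dA, dB, limit=2)
top_ids = [rid for rid, _ in matches['q']]
```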

abstract visualize(output=None, title=None, colored_tag=None, colormap='rainbow', method='pca', show_axis=False)[source]

Visualize embeddings in a 2D projection, using the PCA algorithm by default. This function requires matplotlib to be installed.

If colored_tag is provided, the plot uses a distinct color for each unique tag value in the documents of the DocumentArray.

Parameters
  • output (Optional[str]) – Optional path to store the visualization. If not given, the plot is shown in the UI.

  • title (Optional[str]) – Optional title of the plot. When not given, the default title is used.

  • colored_tag (Optional[str]) – Optional str that specifies tag used to color the plot

  • colormap (str) – the colormap string supported by matplotlib.

  • method (str) – the visualization method, either pca or tsne. pca is fast but may not represent nonlinear relationships in high-dimensional data well. tsne requires scikit-learn to be installed and is much slower.

  • show_axis (bool) – If set, axis and bounding box of the plot will be printed.

abstract sample(k, seed=None)[source]

Randomly sample k elements from the DocumentArray without replacement.

Parameters
  • k (int) – Number of elements to sample from the document array.

  • seed (Optional[int]) – seed for the random number generator; None by default. If set, the state of the random function is fixed so that the output is reproducible.

Return type

DocumentArray
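Seeded sampling without replacement can be sketched with the standard library as follows. `toy_sample` is a hypothetical helper. It uses a private `random.Random` instance, which is a design choice of this sketch and may differ from how the library seeds its generator.

```python
import random
from typing import List, Optional, Sequence


def toy_sample(docs: Sequence[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Draw k distinct elements; a private Random leaves the global RNG untouched."""
    rng = random.Random(seed)
    return rng.sample(list(docs), k)


docs = ['a', 'b', 'c', 'd', 'e']
first = toy_sample(docs, 3, seed=42)
second = toy_sample(docs, 3, seed=42)  # same seed, same sample
```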

abstract shuffle(seed=None)[source]

Randomly shuffle documents within the DocumentArray.

Parameters

seed (Optional[int]) – seed for the random number generator; None by default. If set, the state of the random function is fixed so that the output is reproducible.

Return type

DocumentArray
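Since the return type indicates that a new array is returned, a seeded shuffle might be sketched as below. `toy_shuffle` is a hypothetical helper, and copying the input before shuffling is an assumption made to match that return type.

```python
import random
from typing import List, Optional, Sequence


def toy_shuffle(docs: Sequence[str], seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy; the input sequence is left unchanged."""
    rng = random.Random(seed)
    shuffled = list(docs)
    rng.shuffle(shuffled)
    return shuffled


docs = ['a', 'b', 'c', 'd', 'e']
once = toy_shuffle(docs, seed=7)
again = toy_shuffle(docs, seed=7)  # same seed, same order
```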

abstract extend(iterable)[source]

Extend the array by appending all the items from the iterable.

Parameters

iterable (Iterable[Document]) – the iterable of Documents to extend this array with

Return type

None

abstract append(doc, **kwargs)[source]

Append :param:`doc` to the array.

Parameters
  • doc (Document) – The Document to append.

  • kwargs – keyword args