jina.types.arrays.neural_ops module

class jina.types.arrays.neural_ops.DocumentArrayNeuralOpsMixin[source]

Bases: object

A mixin that provides match and visualization functionality to DocumentArrays.

match(darray, metric='cosine', limit=inf, normalization=None, use_scipy=False, metric_name=None)[source]

For each Document in self, find its embedding-based nearest neighbours in another DocumentArray, and store the results in matches.

Note

‘cosine’, ‘euclidean’, ‘sqeuclidean’ are supported natively without extra dependency.

You can also use other distance metrics provided by scipy, such as ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.

To use a scipy metric, set use_scipy=True.

  • To rescale all match values into [0, 1], use dA.match(dB, normalization=(0, 1)).

  • To invert the distance into a score and rescale all values into [0, 1], use dA.match(dB, normalization=(1, 0)). Note that the tuple order is reversed relative to the previous case: the smallest distance maps to 1 and the largest to 0.
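The effect of the two normalization tuples can be illustrated with a small pure-Python sketch of min-max rescaling. This is not the library's implementation; the helper name minmax_rescale and the handling of the all-equal case are assumptions made for illustration only.

```python
def minmax_rescale(distances, target=(0, 1)):
    """Rescale so the smallest distance maps to target[0] and the largest to target[1].

    Hypothetical helper for illustration; not part of the jina API.
    """
    a, b = target
    lo, hi = min(distances), max(distances)
    if hi == lo:
        # All distances are equal; map every value to `a` to avoid dividing by zero
        # (an assumption of this sketch, not documented behaviour).
        return [float(a) for _ in distances]
    return [a + (d - lo) * (b - a) / (hi - lo) for d in distances]


distances = [2.0, 4.0, 6.0]
as_unit_range = minmax_rescale(distances, (0, 1))  # smallest distance -> 0
as_scores = minmax_rescale(distances, (1, 0))      # smallest distance -> 1
```

With target (0, 1) the values keep their meaning as distances (lower is closer), while (1, 0) turns them into similarity-like scores (higher is closer).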

Parameters
  • darray (Union[DocumentArray, DocumentArrayMemmap]) – the other DocumentArray or DocumentArrayMemmap to match against

  • metric (Union[str, Callable[[ForwardRef, ForwardRef], ForwardRef]]) – the distance metric

  • limit (Optional[int]) – the maximum number of matches; when not given, all Documents in the other array are considered matches

  • normalization (Optional[Tuple[int, int]]) – a tuple (a, b) to be used with min-max normalization: the minimum distance is rescaled to a and the maximum distance to b, so that all values fall within the range spanned by a and b.

  • use_scipy (bool) – use Scipy as the computation backend

  • metric_name (Optional[str]) – if provided, the match results are marked with this string.

Return type

None
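As a rough illustration of what match computes conceptually, the following pure-Python sketch ranks candidate embeddings by cosine distance for each query embedding and keeps at most limit results. It works on plain lists of floats rather than Documents, and the function names cosine_distance and nearest_neighbours are hypothetical; the actual method stores its results in each Document's matches and returns None.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v). Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_neighbours(queries, candidates, limit=None):
    """For each query, return (distance, candidate_index) pairs, closest first.

    Hypothetical helper for illustration; limit=None keeps every candidate.
    """
    results = []
    for q in queries:
        scored = [(cosine_distance(q, c), idx) for idx, c in enumerate(candidates)]
        scored.sort()  # ascending distance, ties broken by index
        results.append(scored[:limit])
    return results


queries = [[1.0, 0.0]]
candidates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
top2 = nearest_neighbours(queries, candidates, limit=2)
ranked_indices = [idx for _, idx in top2[0]]
```

Here the query is identical to the second candidate, so that candidate ranks first, followed by the diagonal vector.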

visualize(colored_tag=None, output=None, title=None, colormap='rainbow', show_axis=False)[source]

Visualize embeddings in a 2D projection computed with the PCA algorithm. This function requires matplotlib to be installed.

If colored_tag is provided, the plot uses a distinct color for each unique value of that tag among the Documents of the DocumentArray.

Parameters
  • colored_tag (Optional[str]) – Optional str that specifies tag used to color the plot

  • output (Optional[str]) – Optional path to store the visualization. If not given, the plot is shown in the UI.

  • title (Optional[str]) – Optional title of the plot. When not given, the default title is used.

  • colormap (str) – the colormap string supported by matplotlib.

  • show_axis (bool) – If set, the axes and bounding box of the plot are shown.
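
The 2D PCA projection that the plot is based on can be sketched with NumPy as follows. This is an assumption about the general technique (centering followed by an SVD), not a copy of the library's code, and pca_project_2d is a hypothetical name; the resulting two columns are what one would pass to a matplotlib scatter plot.

```python
import numpy as np


def pca_project_2d(embeddings):
    """Project row vectors onto their first two principal components.

    Hypothetical helper for illustration; not part of the jina API.
    """
    x = np.asarray(embeddings, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


embeddings = [
    [1.0, 2.0, 3.0],
    [2.0, 4.1, 6.2],
    [3.0, 6.0, 8.9],
    [4.0, 8.2, 12.1],
]
points = pca_project_2d(embeddings)  # one (x, y) row per embedding
```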