jina.executors.indexers

class jina.executors.indexers.BaseIndexer(index_filename=None, key_length=36, *args, **kwargs)[source]

Bases: jina.executors.BaseExecutor

Base class for storing and searching any kind of data structure.

The key functions here are add() and query(). One can decorate them with jina.helper.batching() and jina.logging.profile.profiling().

One should always inherit from either BaseVectorIndexer or BaseKVIndexer.

See also

jina.drivers.handlers.index

Note

Calling save() on a BaseIndexer creates more than one file. One of them is the serialized BaseIndexer object, whose name often ends with .bin.

Warning

When using BaseIndexer outside a Pod, use it as a context manager:

with BaseIndexer() as b:
    b.add()

This lets the indexer save its data safely. Otherwise, you have to call b.close() manually to close the indexer safely.
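The following is a minimal sketch of why the context manager form is safer. ToyIndexer is a hypothetical stand-in written only for this illustration (it is not part of jina and does not reproduce how BaseIndexer persists data); it assumes that close() is the point at which buffered data is written to disk.

```python
import os
import pickle
import tempfile


class ToyIndexer:
    """Hypothetical stand-in for an indexer; not part of jina."""

    def __init__(self, path):
        self.path = path
        self.buffer = {}      # data held in memory until close()
        self.closed = False

    def add(self, key, value):
        self.buffer[key] = value

    def close(self):
        # Persist buffered data exactly once, then mark the indexer closed.
        if not self.closed:
            with open(self.path, "wb") as fp:
                pickle.dump(self.buffer, fp)
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raised, so the data is always written.
        self.close()
        return False  # do not swallow exceptions


index_path = os.path.join(tempfile.mkdtemp(), "toy.bin")
with ToyIndexer(index_path) as b:
    b.add("doc1", b"hello")

# After the with-block the data is on disk without an explicit close().
with open(index_path, "rb") as fp:
    stored = pickle.load(fp)
```

Without the with-statement, the caller would have to remember to call b.close() on every code path, including error paths.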

Parameters
  • index_filename (Optional[str]) – the name of the file for storing the index; when not given, metas.name is used.

  • args – Additional positional arguments which are just used for the parent initialization

  • kwargs – Additional keyword arguments which are just used for the parent initialization

key_length

The default minimum length of the key; it is expanded once, on the first batch.

add(*args, **kwargs)[source]

Add documents to the index.

Parameters
  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

update(*args, **kwargs)[source]

Update documents on the index.

Parameters
  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

delete(*args, **kwargs)[source]

Delete documents from the index.

Parameters
  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

post_init()[source]

The query handler and the write handler cannot be serialized, so they must be set up in post_init().

query(*args, **kwargs)[source]

Query documents from the index.

Parameters
  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

property index_abspath

Get the file path of the index storage.

Return type

str

Returns

absolute path

query_handler

The decorator to cache property of a class.

null_query_handler

The decorator to cache property of a class.

property is_exist

Check whether the database exists.

Return type

bool

Returns

true if the absolute index path exists, else false

write_handler

The decorator to cache property of a class.

get_query_handler()[source]

Get a readable index handler when index_abspath already exists. Needs to be overridden.

get_add_handler()[source]

Get a writable index handler when index_abspath already exists. Needs to be overridden.

get_create_handler()[source]

Get a writable index handler when index_abspath does not exist. Needs to be overridden.
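The three handler getters above differ only in whether the index file already exists. The sketch below shows one way a subclass might implement them for a plain binary file. ToyFileIndexer and its open_for_write() helper are hypothetical names invented for this illustration, and the dispatch rule (append if the file exists, otherwise create) is an assumption drawn from the descriptions above, not a copy of jina's code.

```python
import os
import tempfile


class ToyFileIndexer:
    """Hypothetical file-backed indexer; not jina's implementation."""

    def __init__(self, index_abspath):
        self.index_abspath = index_abspath

    @property
    def is_exist(self):
        return os.path.exists(self.index_abspath)

    def get_query_handler(self):
        # Read-only handle; only meaningful when the file exists.
        return open(self.index_abspath, "rb")

    def get_add_handler(self):
        # Append to an index that is already on disk.
        return open(self.index_abspath, "ab")

    def get_create_handler(self):
        # Create a fresh index file.
        return open(self.index_abspath, "wb")

    def open_for_write(self):
        # Assumed dispatch: append if the index exists, otherwise create it.
        if self.is_exist:
            return self.get_add_handler()
        return self.get_create_handler()


path = os.path.join(tempfile.mkdtemp(), "index.bin")
indexer = ToyFileIndexer(path)
existed_before = indexer.is_exist

with indexer.open_for_write() as fp:   # file absent: create handler
    fp.write(b"first")
with indexer.open_for_write() as fp:   # file present: add handler
    fp.write(b"second")
with indexer.get_query_handler() as fp:
    content = fp.read()
```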

property size

The number of vectors or documents indexed.

Return type

int

Returns

size

close()[source]

Close all file-handlers and release all resources.

flush()[source]

Flush all buffered data to index_abspath

sample()[source]

Return a sample from this indexer; useful for sanity checks.

class jina.executors.indexers.BaseVectorIndexer(index_filename=None, key_length=36, *args, **kwargs)[source]

Bases: jina.executors.indexers.BaseIndexer

An abstract class for vector indexers. It is equipped with drivers in requests.on.

All vector indexers should inherit from it.

It can be used to tell whether an indexer is a vector indexer, via isinstance(a, BaseVectorIndexer).

embedding_cls_type = 'dense'

query_by_key(keys, *args, **kwargs)[source]

Get the vectors by id; returns a subset of the indexed vectors.

Parameters
  • keys (Iterable[str]) – a list of ids, i.e. doc.id in protobuf

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

ndarray

add(keys, vectors, *args, **kwargs)[source]

Add new chunks and their vector representations.

Parameters
  • keys (Iterable[str]) – a list of ids, i.e. doc.id in protobuf

  • vectors (EncodingType) – vector representations, shape B x D

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

None

query(vectors, top_k, *args, **kwargs)[source]

Find the k nearest neighbours of the query vectors; returns chunk ids and chunk scores.

Parameters
  • vectors (EncodingType) – query vectors as an ndarray, shape B x D

  • top_k (int) – the number of nearest neighbours to return

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

Tuple[ndarray, ndarray]
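To make the input and output shapes of query() concrete, here is a brute-force sketch in plain Python. The function toy_knn is hypothetical and is not how any jina indexer is implemented; it assumes Euclidean distance as the score (smaller means closer) and uses nested lists in place of ndarrays so that it runs without third-party packages. For B query vectors it returns B rows of top_k ids and B rows of top_k scores.

```python
import math


def toy_knn(index_keys, index_vectors, query_vectors, top_k):
    """Brute-force k-NN by Euclidean distance (illustrative only)."""
    all_ids, all_dists = [], []
    for q in query_vectors:
        # Score every indexed vector against this query, closest first.
        scored = sorted(
            (math.dist(q, v), k) for k, v in zip(index_keys, index_vectors)
        )[:top_k]
        all_ids.append([k for _, k in scored])
        all_dists.append([d for d, _ in scored])
    return all_ids, all_dists


keys = ["a", "b", "c"]
vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]   # 3 indexed vectors, D = 2

# One query vector (B = 1), asking for its two nearest neighbours.
ids, dists = toy_knn(keys, vectors, [[0.9, 0.0]], top_k=2)
```

Here the query is closest to "b" and then to "a", so ids holds one row with those two keys in that order.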

update(keys, vectors, *args, **kwargs)[source]

Update vectors on the index.

Parameters
  • keys (Iterable[str]) – a list of ids, i.e. doc.id in protobuf

  • vectors (EncodingType) – vector representations, shape B x D

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

None

delete(keys, *args, **kwargs)[source]

Delete vectors from the index.

Parameters
  • keys (Iterable[str]) – a list of ids, i.e. doc.id in protobuf

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

None

class jina.executors.indexers.BaseKVIndexer(index_filename=None, key_length=36, *args, **kwargs)[source]

Bases: jina.executors.indexers.BaseIndexer

An abstract class for key-value indexers.

All key-value indexers should inherit from it.

It can be used to tell whether an indexer is a key-value indexer, via isinstance(a, BaseKVIndexer).

add(keys, values, *args, **kwargs)[source]

Add the serialized documents to the index via document ids.

Parameters
  • keys (Iterable[str]) – a list of ids, i.e. doc.id in protobuf

  • values (Iterable[bytes]) – serialized documents

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

None

query(key, *args, **kwargs)[source]

Find the serialized document in the index by document id.

Parameters
  • key (str) – document id

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

Optional[bytes]

update(keys, values, *args, **kwargs)[source]

Update the serialized documents on the index via document ids.

Parameters
  • keys (Iterable[str]) – a list of ids, i.e. doc.id in protobuf

  • values (Iterable[bytes]) – serialized documents

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

None

delete(keys, *args, **kwargs)[source]

Delete the serialized documents from the index via document ids.

Parameters
  • keys (Iterable[str]) – a list of ids, i.e. doc.id in protobuf

  • args – Additional positional arguments

  • kwargs – Additional keyword arguments

Return type

None
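The four BaseKVIndexer methods above form a small key-value contract: add and update take parallel iterables of keys and serialized values, query takes a single key and may return nothing, and delete takes keys only. The dictionary-backed sketch below mirrors those signatures. ToyKVIndexer is a hypothetical class for illustration and does not inherit from BaseKVIndexer; in particular, the choice that update() silently skips unknown keys is an assumption, since the reference above does not specify that behavior.

```python
from typing import Dict, Iterable, Optional


class ToyKVIndexer:
    """Hypothetical in-memory key-value indexer; not part of jina."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def add(self, keys: Iterable[str], values: Iterable[bytes]) -> None:
        for key, value in zip(keys, values):
            self._store[key] = value

    def query(self, key: str) -> Optional[bytes]:
        # Returns None when the key is not indexed.
        return self._store.get(key)

    def update(self, keys: Iterable[str], values: Iterable[bytes]) -> None:
        # Assumption: only keys that are already indexed are overwritten.
        for key, value in zip(keys, values):
            if key in self._store:
                self._store[key] = value

    def delete(self, keys: Iterable[str]) -> None:
        for key in keys:
            self._store.pop(key, None)

    @property
    def size(self) -> int:
        return len(self._store)


kv = ToyKVIndexer()
kv.add(["d1", "d2"], [b"one", b"two"])
kv.update(["d1", "d9"], [b"uno", b"nine"])   # "d9" was never added
kv.delete(["d2"])
```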

class jina.executors.indexers.UniqueVectorIndexer(routes=None, resolve_all=True, *args, **kwargs)[source]

Bases: jina.executors.compound.CompoundExecutor

A frequently used pattern for combining a BaseVectorIndexer and a DocCache

class jina.executors.indexers.CompoundIndexer(routes=None, resolve_all=True, *args, **kwargs)[source]

Bases: jina.executors.compound.CompoundExecutor

A frequently used pattern for combining a BaseVectorIndexer and a BaseKVIndexer. It is equipped with predefined requests.on behaviors:

  • At index time
      1. Store the vector via BaseVectorIndexer

      2. Remove all vector information (embedding, buffer, blob, text)

      3. Store the remaining meta information via BaseKVIndexer

  • At query time
      1. Find the k nearest neighbours using the vector via BaseVectorIndexer

      2. Remove all vector information (embedding, buffer, blob, text)

      3. Fill in the meta information of the document via BaseKVIndexer

One can use the ChunkIndexer via

!ChunkIndexer
components:
  - !NumpyIndexer
    with:
      index_filename: vec.gz
    metas:
      name: vecidx  # a customized name
      workspace: ${{TEST_WORKDIR}}
  - !BinaryPbIndexer
    with:
      index_filename: chunk.gz
    metas:
      name: chunkidx  # a customized name
      workspace: ${{TEST_WORKDIR}}
metas:
  name: chunk_compound_indexer
  workspace: ${{TEST_WORKDIR}}

No requests.on logic needs to be defined. When loaded from this YAML, it is automatically equipped with

on:
  SearchRequest:
    - !VectorSearchDriver
      with:
        executor: BaseVectorIndexer
    - !PruneDriver
      with:
        pruned:
          - embedding
          - buffer
          - blob
          - text
    - !KVSearchDriver
      with:
        executor: BaseKVIndexer
  IndexRequest:
    - !VectorIndexDriver
      with:
        executor: BaseVectorIndexer
    - !PruneDriver
      with:
        pruned:
          - embedding
          - buffer
          - blob
          - text
    - !KVIndexDriver
      with:
        executor: BaseKVIndexer
  ControlRequest:
    - !ControlReqDriver {}
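The index-time and query-time driver chains listed above (vector step, prune step, key-value step) can be sketched in plain Python. Everything in this block is hypothetical: documents are modelled as dictionaries, the two indexes as module-level dicts, and index() and search() are illustrative functions rather than jina drivers. Euclidean distance is assumed as the similarity measure.

```python
import math

# Fields removed by the prune step, as listed in the YAML above.
PRUNED_FIELDS = ("embedding", "buffer", "blob", "text")

vector_index = {}   # doc id -> embedding (role of the vector indexer)
kv_index = {}       # doc id -> remaining meta fields (role of the KV indexer)


def index(docs):
    for doc in docs:
        # 1. Store the vector.
        vector_index[doc["id"]] = doc["embedding"]
        # 2. Drop the vector information from the document.
        meta = {k: v for k, v in doc.items() if k not in PRUNED_FIELDS}
        # 3. Store what remains as meta information.
        kv_index[doc["id"]] = meta


def search(query, top_k):
    # 1. Find the nearest neighbours by embedding.
    ranked = sorted(
        vector_index,
        key=lambda doc_id: math.dist(vector_index[doc_id], query["embedding"]),
    )[:top_k]
    # 2-3. Matches carry no vector data; fill in their meta information.
    return [dict(kv_index[doc_id]) for doc_id in ranked]


index([
    {"id": "d1", "embedding": [0.0, 0.0], "text": "hello", "tags": {"lang": "en"}},
    {"id": "d2", "embedding": [3.0, 4.0], "text": "hallo", "tags": {"lang": "de"}},
])
results = search({"embedding": [2.5, 4.0]}, top_k=1)
```

Keeping the heavy fields out of the key-value store means each match returned at query time contains only the lightweight meta information.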