jina.executors.indexers

class jina.executors.indexers.BaseIndexer(index_filename=None, *args, **kwargs)[source]

Bases: jina.executors.BaseExecutor

Base class for storing and searching any kind of data structure.

The key functions here are add() and query(). One can decorate them with jina.decorator.require_train(), jina.helper.batching() and jina.logging.profile.profiling().

One should always inherit from either BaseVectorIndexer or BaseKVIndexer.

See also

jina.drivers.handlers.index

Note

Calling save() on a BaseIndexer creates more than one file. One of them is the serialized version of the BaseIndexer object, whose name often ends with .bin.

Warning

When using a BaseIndexer outside of a Pod, use it with a context manager:

with BaseIndexer() as b:
    b.add()

This ensures the data is saved safely. Otherwise, you have to call b.close() manually to close the indexer safely.
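
To illustrate why the context manager matters, here is a minimal sketch using a hypothetical stand-in class. ToyIndexer is not part of jina, and the assumption that close() is the step that persists buffered data is made only for illustration.

```python
class ToyIndexer:
    """Hypothetical stand-in for an indexer; not part of jina."""

    def __init__(self):
        self.buffer = []   # data added but not yet persisted
        self.saved = []    # stands in for the on-disk index
        self.closed = False

    def add(self, item):
        self.buffer.append(item)

    def close(self):
        # Persist buffered data and release resources.
        self.saved.extend(self.buffer)
        self.buffer.clear()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raises, so the data is always saved.
        self.close()
        return False  # do not swallow exceptions


# Preferred form: close() is called automatically on exit.
with ToyIndexer() as b:
    b.add('doc-1')

# Equivalent manual form: close() must be called explicitly.
c = ToyIndexer()
try:
    c.add('doc-2')
finally:
    c.close()
```

Both forms leave the indexer closed with its data saved. The with form is shorter and cannot forget the close() call.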

Parameters
  • index_filename (Optional[str]) – the name of the file for storing the index; when not given, metas.name is used.

  • args

  • kwargs

index_filename = None

The file name of the stored index; no path is required.

handler_mutex = None

only one handler at a time by default

add(*args, **kwargs)[source]
post_init()[source]

The query handler and write handler cannot be serialized, so they must be created in post_init().

query(*args, **kwargs)[source]
property index_abspath

Get the file path of the index storage

Return type

str

query_handler
null_query_handler
property is_exist

Check whether the database exists.

Return type

bool

write_handler
get_query_handler()[source]

Get a readable index handler when index_abspath already exists. Needs to be overridden.

get_add_handler()[source]

Get a writable index handler when index_abspath already exists. Needs to be overridden.

get_create_handler()[source]

Get a writable index handler when index_abspath does not exist. Needs to be overridden.
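
The three handler getters above can be sketched with a plain text file. ToyFileIndexer and get_write_handler() are hypothetical names, and the selection rule (append when the index exists, otherwise create) is an assumption based on the descriptions above rather than a statement about the jina implementation.

```python
import os
import tempfile


class ToyFileIndexer:
    """Hypothetical sketch; mirrors the handler names documented above."""

    def __init__(self, index_abspath):
        self.index_abspath = index_abspath

    @property
    def is_exist(self):
        return os.path.exists(self.index_abspath)

    def get_query_handler(self):
        # Readable handler: the index file must already exist.
        return open(self.index_abspath, 'r')

    def get_add_handler(self):
        # Writable handler for an existing index: append to it.
        return open(self.index_abspath, 'a')

    def get_create_handler(self):
        # Writable handler for a missing index: create it.
        return open(self.index_abspath, 'w')

    def get_write_handler(self):
        # Assumed selection rule: append if the index exists, else create.
        if self.is_exist:
            return self.get_add_handler()
        return self.get_create_handler()


with tempfile.TemporaryDirectory() as workdir:
    idx = ToyFileIndexer(os.path.join(workdir, 'toy.idx'))
    existed_before = idx.is_exist
    with idx.get_write_handler() as fp:  # first call creates the file
        fp.write('first\n')
    with idx.get_write_handler() as fp:  # second call appends to it
        fp.write('second\n')
    with idx.get_query_handler() as fp:
        lines = fp.read().splitlines()
```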

property size

The number of vectors/chunks indexed

Return type

int

close()[source]

Close all file-handlers and release all resources.

flush()[source]

Flush all buffered data to index_abspath

train(*args, **kwargs)

Train this executor. Needs to be overridden.

Return type

None

class jina.executors.indexers.BaseVectorIndexer(index_filename=None, *args, **kwargs)[source]

Bases: jina.executors.indexers.BaseIndexer

An abstract class for vector indexers. It is equipped with drivers in requests.on.

All vector indexers should inherit from it.

It can be used to tell whether an indexer is a vector indexer, via isinstance(a, BaseVectorIndexer).

Parameters
  • index_filename (Optional[str]) – the name of the file for storing the index; when not given, metas.name is used.

  • args

  • kwargs

query_by_id(ids, *args, **kwargs)[source]

Get vectors by id; returns a subset of the indexed vectors.

Parameters
  • ids (Union[List[int], ndarray]) – a list of ids, i.e. doc.id in protobuf

  • args

  • kwargs

Return type

ndarray

Returns

add(keys, vectors, *args, **kwargs)[source]

Add new chunks and their vector representations

Parameters
  • keys (ndarray) – chunk_id in 1D-ndarray, shape B x 1

  • vectors (ndarray) – vector representations in B x D

query(keys, top_k, *args, **kwargs)[source]

Find the k nearest neighbours of the query vectors; returns chunk ids and chunk scores.

Parameters
  • keys (ndarray) – query vectors in ndarray, shape B x D

  • top_k (int) – the number of nearest neighbours to return

Return type

Tuple[ndarray, ndarray]

Returns

a tuple of two ndarrays. The first holds ids in shape B x K (dtype=int), the second holds scores in shape B x K (dtype=float)
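
The query contract above (B query vectors in, B x K ids and B x K scores out) can be sketched with a brute-force search. This is a minimal illustration under stated assumptions: brute_force_query is a hypothetical helper, plain Python lists stand in for ndarrays to keep the sketch dependency-free, and Euclidean distance is used as the score (smaller means nearer), which the documentation above does not specify.

```python
import math
from typing import List, Sequence, Tuple


def brute_force_query(
    index_keys: Sequence[int],
    index_vectors: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    top_k: int,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return (ids, scores), each with B rows of at most K entries."""
    all_ids, all_scores = [], []
    for q in queries:
        # Euclidean distance from this query to every indexed vector.
        scored = sorted(
            (math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v))), key)
            for key, v in zip(index_keys, index_vectors)
        )
        nearest = scored[:top_k]
        all_ids.append([key for _, key in nearest])
        all_scores.append([dist for dist, _ in nearest])
    return all_ids, all_scores


keys = [10, 11, 12]                               # chunk ids
vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]    # B x D with B=3, D=2
ids, scores = brute_force_query(keys, vectors, [[0.9, 0.0]], top_k=2)
```

Here the single query is closest to the vector stored under id 11, then to the one under id 10, so ids holds one row with those two ids in that order.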

train(*args, **kwargs)

Train this executor. Needs to be overridden.

Return type

None

class jina.executors.indexers.BaseKVIndexer(index_filename=None, *args, **kwargs)[source]

Bases: jina.executors.indexers.BaseIndexer

An abstract class for key-value indexers.

All key-value indexers should inherit from it.

It can be used to tell whether an indexer is a key-value indexer, via isinstance(a, BaseKVIndexer).

Parameters
  • index_filename (Optional[str]) – the name of the file for storing the index; when not given, metas.name is used.

  • args

  • kwargs

add(keys, values, *args, **kwargs)[source]
query(key)[source]

Find the protobuf chunk/doc by id

Parameters

key (Any) – id

Return type

Optional[Any]

Returns

protobuf chunk or protobuf document
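
As a rough illustration of the add()/query() contract of a key-value indexer, here is a minimal in-memory sketch. ToyKVIndexer is hypothetical and not part of jina; plain dicts stand in for protobuf chunks or documents, and returning None for a missing key is an assumption based on the Optional[Any] return type.

```python
from typing import Any, Dict, Iterable, Optional


class ToyKVIndexer:
    """Hypothetical in-memory key-value indexer; not part of jina."""

    def __init__(self):
        self._store: Dict[Any, Any] = {}

    def add(self, keys: Iterable[Any], values: Iterable[Any]) -> None:
        # Pair each key with its value; a repeated key overwrites the old value.
        for key, value in zip(keys, values):
            self._store[key] = value

    def query(self, key: Any) -> Optional[Any]:
        # A missing key yields None, matching the Optional[Any] return type.
        return self._store.get(key)

    @property
    def size(self) -> int:
        return len(self._store)


kv = ToyKVIndexer()
kv.add([1, 2], [{'text': 'hello'}, {'text': 'world'}])
hit = kv.query(2)     # the value stored under id 2
miss = kv.query(99)   # None, since id 99 was never added
```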

train(*args, **kwargs)

Train this executor. Needs to be overridden.

Return type

None

class jina.executors.indexers.UniqueVectorIndexer(routes=None, resolve_all=True, *args, **kwargs)[source]

Bases: jina.executors.compound.CompoundExecutor

A frequently used pattern for combining a BaseVectorIndexer and a DocIDCache

Create a new CompoundExecutor object

Parameters
  • routes (Optional[Dict[str, Dict]]) –

    a map of function routes. The key is the function name, the value is a tuple of two pieces, where the first element is the name of the referred component (metas.name) and the second element is the name of the referred function.

    See also

    add_route()

  • resolve_all (bool) – universally add *_all() to all functions that have the identical name

Example:

We have two dummy executors as follows:

class dummyA(BaseExecutor):
    def say(self):
        return 'a'

    def sayA(self):
        print('A: im A')


class dummyB(BaseExecutor):
    def say(self):
        return 'b'

    def sayB(self):
        print('B: im B')

and we create a CompoundExecutor consisting of these two via

da, db = dummyA(), dummyB()
ce = CompoundExecutor()
ce.components = lambda: [da, db]

Now the new executor ce has two new methods, i.e. ce.sayA() and ce.sayB(). They point to the original dummyA.sayA() and dummyB.sayB() respectively. One can say ce has inherited these two methods.

The interesting part is say(), as this function name is shared between dummyA and dummyB and therefore requires some resolution. When resolve_all=True, a new function say_all() is added to ce. ce.say_all() works as if you call dummyA.say() and dummyB.say() in a row. This makes sense in some cases, such as training and saving. Other cases may require a more sophisticated resolution, which one can achieve with add_route(). For example,

ce.add_route('say', db.name, 'say')
assert ce.say() == 'b'

Such resolution is what we call routes here, and it can be specified in advance with the routes argument of __init__(), or in YAML.

!CompoundExecutor
components: ...
with:
  resolve_all: true
  routes:
    say:
    - dummyB-e3acc910
    - say
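
The name resolution described in this example can be sketched without jina. The code below is a simplified illustration only: ToyCompound, call_all() and the fixed name attributes are hypothetical, and the real CompoundExecutor generates *_all() methods rather than exposing a generic call_all().

```python
class DummyA:
    name = 'dummyA'

    def say(self):
        return 'a'


class DummyB:
    name = 'dummyB'

    def say(self):
        return 'b'


class ToyCompound:
    """Hypothetical sketch of name resolution; not the jina implementation."""

    def __init__(self, components):
        # Components are looked up by name, as routes refer to metas.name.
        self.components = {c.name: c for c in components}

    def call_all(self, func_name):
        # What a generated *_all() method does: call each component in order.
        return [getattr(c, func_name)() for c in self.components.values()]

    def add_route(self, func_name, comp_name, comp_func_name):
        # Bind func_name on the compound to one component's method.
        target = getattr(self.components[comp_name], comp_func_name)
        setattr(self, func_name, target)


ce = ToyCompound([DummyA(), DummyB()])
all_results = ce.call_all('say')       # calls DummyA.say, then DummyB.say
ce.add_route('say', 'dummyB', 'say')   # resolve the shared name to DummyB
routed = ce.say()
```

After the route is added, ce.say() is unambiguous and returns the result of the routed component.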
train(*args, **kwargs)

Train this executor. Needs to be overridden.

Return type

None

class jina.executors.indexers.CompoundIndexer(routes=None, resolve_all=True, *args, **kwargs)[source]

Bases: jina.executors.compound.CompoundExecutor

A frequently used pattern for combining a BaseVectorIndexer and a BaseKVIndexer. It is equipped with predefined requests.on behaviors:

  • At index time
      1. store the vector via BaseVectorIndexer

      2. remove all vector information (embedding, buffer, blob, text)

      3. store the remaining meta information via BaseKVIndexer

  • At query time
      1. find the k-NN using the vector via BaseVectorIndexer

      2. remove all vector information (embedding, buffer, blob, text)

      3. fill in the meta information of the chunk via BaseKVIndexer
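
The index-time and query-time steps listed above can be sketched as two small functions. This is an illustration under stated assumptions, not the driver implementation: plain dicts stand in for the BaseVectorIndexer and the BaseKVIndexer, chunks are plain dicts, the mime_type field is a hypothetical piece of meta information, and squared Euclidean distance is an arbitrary choice of metric.

```python
VECTOR_FIELDS = ('embedding', 'buffer', 'blob', 'text')


def prune(chunk):
    # Drop the vector-related fields, keep the rest as meta information.
    return {k: v for k, v in chunk.items() if k not in VECTOR_FIELDS}


def index(chunks, vec_index, kv_index):
    for chunk in chunks:
        vec_index[chunk['id']] = chunk['embedding']  # step 1: store the vector
        kv_index[chunk['id']] = prune(chunk)         # steps 2-3: prune, store meta


def search(query_vec, top_k, vec_index, kv_index):
    # Step 1: k-NN by squared Euclidean distance over the stored vectors.
    def dist(vec):
        return sum((a - b) ** 2 for a, b in zip(query_vec, vec))

    nearest = sorted(vec_index, key=lambda cid: dist(vec_index[cid]))[:top_k]
    # Steps 2-3: results carry no vector fields; fill in meta from the KV store.
    return [kv_index[cid] for cid in nearest]


vec_index, kv_index = {}, {}
index(
    [
        {'id': 1, 'embedding': [0.0, 0.0], 'text': 'raw a', 'mime_type': 'text/plain'},
        {'id': 2, 'embedding': [1.0, 1.0], 'text': 'raw b', 'mime_type': 'text/plain'},
    ],
    vec_index,
    kv_index,
)
results = search([0.9, 0.9], 1, vec_index, kv_index)
```

The search result contains only the meta information of the nearest chunk; its embedding and text were pruned before storage in the key-value index.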

One can use the ChunkIndexer via

!ChunkIndexer
components:
  - !NumpyIndexer
    with:
      index_filename: vec.gz
    metas:
      name: vecidx  # a customized name
      workspace: $TEST_WORKDIR
  - !BinaryPbIndexer
    with:
      index_filename: chunk.gz
    metas:
      name: chunkidx  # a customized name
      workspace: $TEST_WORKDIR
metas:
  name: chunk_compound_indexer
  workspace: $TEST_WORKDIR

No requests.on logic needs to be defined. When loaded from this YAML, it is automatically equipped with

on:
  SearchRequest:
    - !VectorSearchDriver
      with:
        executor: BaseVectorIndexer
    - !PruneDriver
      with:
        pruned:
          - embedding
          - buffer
          - blob
          - text
    - !KVSearchDriver
      with:
        executor: BaseKVIndexer
  IndexRequest:
    - !VectorIndexDriver
      with:
        executor: BaseVectorIndexer
    - !PruneDriver
      with:
        pruned:
          - embedding
          - buffer
          - blob
          - text
    - !KVIndexDriver
      with:
        executor: BaseKVIndexer
  ControlRequest:
    - !ControlReqDriver {}

Create a new CompoundExecutor object

Parameters
  • routes (Optional[Dict[str, Dict]]) –

    a map of function routes. The key is the function name, the value is a tuple of two pieces, where the first element is the name of the referred component (metas.name) and the second element is the name of the referred function.

    See also

    add_route()

  • resolve_all (bool) – universally add *_all() to all functions that have the identical name

Example:

We have two dummy executors as follows:

class dummyA(BaseExecutor):
    def say(self):
        return 'a'

    def sayA(self):
        print('A: im A')


class dummyB(BaseExecutor):
    def say(self):
        return 'b'

    def sayB(self):
        print('B: im B')

and we create a CompoundExecutor consisting of these two via

da, db = dummyA(), dummyB()
ce = CompoundExecutor()
ce.components = lambda: [da, db]

Now the new executor ce has two new methods, i.e. ce.sayA() and ce.sayB(). They point to the original dummyA.sayA() and dummyB.sayB() respectively. One can say ce has inherited these two methods.

The interesting part is say(), as this function name is shared between dummyA and dummyB and therefore requires some resolution. When resolve_all=True, a new function say_all() is added to ce. ce.say_all() works as if you call dummyA.say() and dummyB.say() in a row. This makes sense in some cases, such as training and saving. Other cases may require a more sophisticated resolution, which one can achieve with add_route(). For example,

ce.add_route('say', db.name, 'say')
assert ce.say() == 'b'

Such resolution is what we call routes here, and it can be specified in advance with the routes argument of __init__(), or in YAML.

!CompoundExecutor
components: ...
with:
  resolve_all: true
  routes:
    say:
    - dummyB-e3acc910
    - say
train(*args, **kwargs)

Train this executor. Needs to be overridden.

Return type

None