jina.executors.indexers
-
class jina.executors.indexers.BaseIndexer(index_filename=None, *args, **kwargs)
Bases: jina.executors.BaseExecutor
Base class for storing and searching any kind of data structure.
The key functions here are add() and query(). One can decorate them with jina.decorator.require_train(), jina.helper.batching() and jina.logging.profile.profiling().
One should always inherit from either BaseVectorIndexer or BaseKVIndexer.
See also
jina.drivers.handlers.index
Note
Calling save() to save a BaseIndexer will create more than one file. One is the serialized version of the BaseIndexer object, whose name often ends with .bin.
Warning
When using BaseIndexer outside of a Pod, use it as a context manager, e.g. with BaseIndexer() as b: b.add(), so that it can safely save the data. Otherwise you have to call b.close() manually to close the indexer safely.
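The warning above can be illustrated with a small, dependency-free sketch. ToyIndexer, its JSON storage and the demo() helper are hypothetical stand-ins invented for this example; they are not Jina's BaseIndexer and only show why the context-manager form guarantees that close() runs.

```python
import json
import os
import tempfile


class ToyIndexer:
    """Hypothetical stand-in for an indexer; not Jina's BaseIndexer."""

    def __init__(self, index_path):
        self.index_path = index_path
        self._data = {}
        self.closed = False

    def add(self, key, value):
        # Entries are only buffered in memory until close() is called.
        self._data[key] = value

    def close(self):
        # Flush the buffered entries to disk exactly once.
        if not self.closed:
            with open(self.index_path, "w") as fp:
                json.dump(self._data, fp)
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raised, so buffered data is not lost.
        self.close()
        return False  # do not swallow exceptions


def demo():
    path = os.path.join(tempfile.mkdtemp(), "toy.json")
    with ToyIndexer(path) as b:
        b.add("doc1", "hello")
    # Leaving the with-block has already called close().
    with open(path) as fp:
        return json.load(fp)
```

Without the with-statement, the caller would have to remember to call close() on every code path, including the ones that raise.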
- Parameters
index_filename (Optional[str]) – the name of the file for storing the index; when not given, metas.name is used.
args –
kwargs –
-
index_filename = None
The file name of the stored index; no path is required.
-
post_init()
The query handler and write handler cannot be serialized, so they must be set up in post_init().
-
property index_abspath
Get the file path of the index storage.
- Return type
str
-
query_handler
-
null_query_handler
-
property is_exist
Check whether the database exists.
- Return type
bool
-
write_handler
-
get_query_handler()
Get a readable index handler when the index_abspath already exists. Needs to be overridden.
-
get_add_handler()
Get a writable index handler when the index_abspath already exists. Needs to be overridden.
-
get_create_handler()
Get a writable index handler when the index_abspath does not exist. Needs to be overridden.
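One way the three handler methods could fit together is sketched below. This is an assumption about the pattern, not Jina's implementation: ToyFileIndexer, its plain-text storage and the lazy write_handler property are invented for illustration, and only the method names mirror the ones documented above.

```python
import os
import tempfile


class ToyFileIndexer:
    """Hypothetical sketch of handler selection; not Jina's code."""

    def __init__(self, index_abspath):
        self.index_abspath = index_abspath
        self._write_handler = None

    @property
    def is_exist(self):
        return os.path.exists(self.index_abspath)

    def get_query_handler(self):
        # Readable handler: only meaningful when the index already exists.
        return open(self.index_abspath, "r")

    def get_add_handler(self):
        # Writable handler that appends to an existing index.
        return open(self.index_abspath, "a")

    def get_create_handler(self):
        # Writable handler that creates a fresh index.
        return open(self.index_abspath, "w")

    @property
    def write_handler(self):
        # Pick the add or create handler lazily, on first use.
        if self._write_handler is None:
            if self.is_exist:
                self._write_handler = self.get_add_handler()
            else:
                self._write_handler = self.get_create_handler()
        return self._write_handler

    def close(self):
        if self._write_handler is not None:
            self._write_handler.close()
            self._write_handler = None


def demo():
    path = os.path.join(tempfile.mkdtemp(), "index.txt")
    first = ToyFileIndexer(path)
    created = not first.is_exist          # no file yet, so "create" is used
    first.write_handler.write("a\n")
    first.close()
    second = ToyFileIndexer(path)         # file exists now, so "add" is used
    second.write_handler.write("b\n")
    second.close()
    with second.get_query_handler() as fp:
        return created, fp.read().split()
```

Keeping the handlers out of the constructor matches the note on post_init(): open file objects cannot be serialized, so they are created only when needed.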
-
property size
The number of vectors/chunks indexed.
- Return type
int
-
class jina.executors.indexers.BaseVectorIndexer(index_filename=None, *args, **kwargs)
Bases: jina.executors.indexers.BaseIndexer
An abstract class for vector indexers. It is equipped with drivers in requests.on.
All vector indexers should inherit from it.
It can be used to tell whether an indexer is a vector indexer, via isinstance(a, BaseVectorIndexer).
- Parameters
index_filename (Optional[str]) – the name of the file for storing the index; when not given, metas.name is used.
args –
kwargs –
-
query_by_id(ids, *args, **kwargs)
Get the vectors by id; returns a subset of the indexed vectors.
- Parameters
ids (Union[List[int], ndarray]) – a list of id, i.e. doc.id in protobuf
args –
kwargs –
- Return type
ndarray
-
add(keys, vectors, *args, **kwargs)
Add new chunks and their vector representations.
- Parameters
keys (ndarray) – chunk_id in 1D-ndarray, shape B x 1
vectors (ndarray) – vector representations in B x D
-
query(keys, top_k, *args, **kwargs)
Find k-NN using query vectors; returns chunk ids and chunk scores.
- Parameters
keys (ndarray) – query vectors in ndarray, shape B x D
top_k (int) – the number of nearest neighbours to return
- Return type
Tuple[ndarray, ndarray]
- Returns
a tuple of two ndarrays. The first is ids in shape B x K (dtype=int), the second is scores in shape B x K (dtype=float).
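The add()/query() contract above (B keys with B x D vectors in, B x K ids and B x K scores out) can be sketched with a brute-force search. ToyVectorIndexer is a hypothetical class, nested lists stand in for ndarray to keep the example dependency-free, and treating the score as a Euclidean distance is an assumption; the reference above does not say which metric a concrete indexer uses.

```python
import math


class ToyVectorIndexer:
    """Hypothetical brute-force indexer; not a Jina class."""

    def __init__(self):
        self.keys = []      # B chunk ids
        self.vectors = []   # B vectors of dimension D

    def add(self, keys, vectors):
        if len(keys) != len(vectors):
            raise ValueError("keys and vectors must have the same length")
        self.keys.extend(keys)
        self.vectors.extend(vectors)

    def query(self, keys, top_k):
        """Return (ids, scores), each a B x K nested list."""
        all_ids, all_scores = [], []
        for q in keys:
            # Euclidean distance to every indexed vector; smaller is closer.
            dists = [(math.dist(q, v), k) for k, v in zip(self.keys, self.vectors)]
            dists.sort()
            nearest = dists[:top_k]
            all_ids.append([k for _, k in nearest])
            all_scores.append([d for d, _ in nearest])
        return all_ids, all_scores


indexer = ToyVectorIndexer()
indexer.add([10, 11, 12], [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
ids, scores = indexer.query([[0.9, 0.0]], top_k=2)
# ids holds one row (B = 1) with the two closest chunk ids (K = 2).
```

A real indexer would replace the linear scan with a vectorized or approximate search, but the input and output shapes stay the same.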
-
class jina.executors.indexers.BaseKVIndexer(index_filename=None, *args, **kwargs)
Bases: jina.executors.indexers.BaseIndexer
An abstract class for key-value indexers.
All key-value indexers should inherit from it.
It can be used to tell whether an indexer is a key-value indexer, via isinstance(a, BaseKVIndexer).
- Parameters
index_filename (Optional[str]) – the name of the file for storing the index; when not given, metas.name is used.
args –
kwargs –
-
class jina.executors.indexers.UniqueVectorIndexer(routes=None, resolve_all=True, *args, **kwargs)
Bases: jina.executors.compound.CompoundExecutor
A frequently used pattern for combining a BaseVectorIndexer and a DocIDCache.
Create a new CompoundExecutor object.
- Parameters
routes (Optional[Dict[str, Dict]]) – a map of function routes. The key is the function name, the value is a tuple of two pieces, where the first element is the name of the referred component (metas.name) and the second element is the name of the referred function.
See also
add_route()
resolve_all (bool) – universally add *_all() to all functions that have the identical name
Example:
We have two dummy executors as follows:

    from jina.executors import BaseExecutor
    from jina.executors.compound import CompoundExecutor


    class dummyA(BaseExecutor):
        def say(self):
            return 'a'

        def sayA(self):
            print('A: im A')


    class dummyB(BaseExecutor):
        def say(self):
            return 'b'

        def sayB(self):
            print('B: im B')

and we create a CompoundExecutor consisting of these two via

    da, db = dummyA(), dummyB()
    ce = CompoundExecutor()
    ce.components = lambda: [da, db]

Now the new executor ce has two new methods, i.e. ce.sayA() and ce.sayB(). They point to the original dummyA.sayA() and dummyB.sayB() respectively. One can say ce has inherited these two methods.
The interesting part is say(), as this function name is shared between dummyA and dummyB. It requires some resolution. When resolve_all=True, a new function say_all() is added to ce. ce.say_all() works as if you call dummyA.say() and dummyB.say() in a row. This makes sense in some cases such as training and saving. In other cases, it may require a more sophisticated resolution, where one can use add_route() to achieve that. For example,

    # route ce.say() to the say() of component db
    ce.add_route('say', db.name, 'say')
    assert ce.say() == 'b'

Such resolution is what we call routes here, and it can be specified in advance with the argument routes in __init__(), or using YAML.

    !CompoundExecutor
    components: ...
    with:
      resolve_all: true
      routes:
        say:
        - dummyB-e3acc910
        - say
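The resolution rules described in the example can be mimicked without Jina. The sketch below is only an illustration under stated assumptions: ToyCompound, DummyA and DummyB are hypothetical stand-ins, the way names are discovered (via dir()) is invented, and returning a list from say_all() is a guess, since the text above does not say what the real *_all() functions return.

```python
class ToyCompound:
    """Hypothetical sketch of name resolution; not Jina's CompoundExecutor."""

    def __init__(self, components, resolve_all=True):
        self.components = {c.name: c for c in components}
        # Map each public method name to the components that define it.
        owners_by_name = {}
        for comp in components:
            for attr in dir(comp):
                if not attr.startswith("_") and callable(getattr(comp, attr)):
                    owners_by_name.setdefault(attr, []).append(comp)
        for attr, owners in owners_by_name.items():
            if len(owners) == 1:
                # Unique name: expose the component's bound method directly.
                setattr(self, attr, getattr(owners[0], attr))
            elif resolve_all:
                # Shared name: add <name>_all(), calling every owner in order.
                setattr(self, attr + "_all", self._make_all(attr, owners))

    @staticmethod
    def _make_all(attr, owners):
        def call_all(*args, **kwargs):
            return [getattr(o, attr)(*args, **kwargs) for o in owners]
        return call_all

    def add_route(self, fn_name, comp_name, comp_fn_name):
        # Bind fn_name on the compound to one component's method.
        setattr(self, fn_name, getattr(self.components[comp_name], comp_fn_name))


class DummyA:
    name = "dummyA"

    def say(self):
        return "a"

    def sayA(self):
        return "A: im A"


class DummyB:
    name = "dummyB"

    def say(self):
        return "b"

    def sayB(self):
        return "B: im B"


da, db = DummyA(), DummyB()
ce = ToyCompound([da, db])
ce.add_route("say", db.name, "say")  # ce.say() now means db.say()
```

Here sayA() and sayB() are exposed unchanged because their names are unique, say_all() covers the shared name, and the explicit route decides which component answers a plain say().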
-
class jina.executors.indexers.CompoundIndexer(routes=None, resolve_all=True, *args, **kwargs)
Bases: jina.executors.compound.CompoundExecutor
A frequently used pattern for combining a BaseVectorIndexer and a BaseKVIndexer. It will be equipped with predefined requests.on behaviors:
- At index time
  - store the vector via BaseVectorIndexer
  - remove all vector information (embedding, buffer, blob, text)
  - store the remaining meta information via BaseKVIndexer
- At query time
  - find the k-NN using the vector via BaseVectorIndexer
  - remove all vector information (embedding, buffer, blob, text)
  - fill in the meta information of the chunk via BaseKVIndexer
One can use the ChunkIndexer via

    !ChunkIndexer
    components:
      - !NumpyIndexer
        with:
          index_filename: vec.gz
        metas:
          name: vecidx  # a customized name
          workspace: ${{TEST_WORKDIR}}
      - !BinaryPbIndexer
        with:
          index_filename: chunk.gz
        metas:
          name: chunkidx  # a customized name
          workspace: ${{TEST_WORKDIR}}
    metas:
      name: chunk_compound_indexer
      workspace: ${{TEST_WORKDIR}}

without defining any requests.on logic. When loaded from this YAML, it will be automatically equipped with

    on:
      SearchRequest:
        - !VectorSearchDriver
          with:
            executor: BaseVectorIndexer
        - !PruneDriver
          with:
            pruned:
              - embedding
              - buffer
              - blob
              - text
        - !KVSearchDriver
          with:
            executor: BaseKVIndexer
      IndexRequest:
        - !VectorIndexDriver
          with:
            executor: BaseVectorIndexer
        - !PruneDriver
          with:
            pruned:
              - embedding
              - buffer
              - blob
              - text
        - !KVIndexDriver
          with:
            executor: BaseKVIndexer
      ControlRequest:
        - !ControlReqDriver {}
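The index-time and query-time steps listed for this class can be traced in a few lines of plain Python. Everything named here is hypothetical: ToyChunkIndexer, the dict-based chunks, the "matches" field and the squared-distance ranking are assumptions made for the sketch, with two dicts playing the roles of the vector indexer and the key-value indexer. Only the pruned field names are taken from the driver configuration above.

```python
PRUNED_FIELDS = ("embedding", "buffer", "blob", "text")


def prune(chunk):
    """Return a copy of the chunk without the heavy vector/content fields."""
    return {k: v for k, v in chunk.items() if k not in PRUNED_FIELDS}


class ToyChunkIndexer:
    """Hypothetical sketch; plain dicts stand in for both sub-indexers."""

    def __init__(self):
        self.vectors = {}   # chunk id -> embedding (vector indexer role)
        self.metas = {}     # chunk id -> pruned chunk (key-value indexer role)

    def index(self, chunks):
        for chunk in chunks:
            # Step 1: store the vector.
            self.vectors[chunk["id"]] = chunk["embedding"]
            # Steps 2 and 3: prune, then store the remaining meta information.
            self.metas[chunk["id"]] = prune(chunk)

    def search(self, query, top_k):
        def sq_dist(vec):
            return sum((a - b) ** 2 for a, b in zip(query["embedding"], vec))

        # Step 1: find the nearest chunk ids by squared Euclidean distance.
        ranked = sorted(self.vectors, key=lambda cid: sq_dist(self.vectors[cid]))
        nearest = ranked[:top_k]
        # Step 2: prune the query; step 3: fill in the stored meta information.
        result = prune(query)
        result["matches"] = [self.metas[cid] for cid in nearest]
        return result


indexer = ToyChunkIndexer()
indexer.index([
    {"id": 1, "embedding": [0.0, 0.0], "text": "cat", "weight": 1.0},
    {"id": 2, "embedding": [3.0, 4.0], "text": "dog", "weight": 0.5},
])
result = indexer.search({"id": 99, "embedding": [2.9, 4.1], "text": "puppy"}, top_k=1)
```

The point of the split is that the vector store never holds metadata and the key-value store never holds embeddings, which is what the PruneDriver step between the two drivers enforces.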
Create a new CompoundExecutor object.
- Parameters
routes (Optional[Dict[str, Dict]]) – a map of function routes. The key is the function name, the value is a tuple of two pieces, where the first element is the name of the referred component (metas.name) and the second element is the name of the referred function.
See also
add_route()
resolve_all (bool) – universally add *_all() to all functions that have the identical name