Jina Python API Reference
- jina.clients
- jina.docker
- jina.drivers
- jina.drivers.querylang
- jina.drivers.rank
- jina.drivers.cache
- jina.drivers.control
- jina.drivers.convert
- jina.drivers.craft
- jina.drivers.debug
- jina.drivers.delete
- jina.drivers.encode
- jina.drivers.evaluate
- jina.drivers.index
- jina.drivers.multimodal
- jina.drivers.predict
- jina.drivers.reduce
- jina.drivers.search
- jina.drivers.segment
- jina.executors
- jina.executors.classifiers
- jina.executors.crafters
- jina.executors.encoders
- jina.executors.evaluators
- jina.executors.indexers
- jina.executors.rankers
- jina.executors.segmenters
- jina.executors.compound
- jina.executors.decorators
- jina.executors.devices
- jina.executors.metas
- jina.executors.requests
- jina.flow
- jina.helloworld
- jina.jaml
- jina.logging
- jina.optimizers
- jina.parsers
- jina.peapods
- jina.proto
- jina.schemas
- jina.types
Top-level module of Jina.
The primary function of this module is to import all of the public Jina interfaces into a single place. The interfaces themselves are located in sub-modules, as described below.
-
class
jina.
AsyncClient
(args)[source]¶ Bases:
jina.clients.base.BaseClient
AsyncClient is the asynchronous version of the Client. They share the same interface, except that in AsyncClient the train(), index(), and search() methods are coroutines (i.e. declared with the async/await syntax); simply calling them does not schedule them for execution. To actually run a coroutine, the user needs to put it in an event loop, e.g. via asyncio.run() or asyncio.create_task().

AsyncClient can be very useful in integration settings, where Jina/Flow/Client is NOT the main logic but rather serves as part of another program. In this case, users often do not want to let Jina control the asyncio event loop. In contrast, Client controls and wraps the event loop internally, making the Client look synchronous from the outside.

For example, say you have a Flow running remotely. You want to use a Client to connect to it and do some indexing and searching, but meanwhile you have some other IO-bound jobs and want to run them concurrently. You can use AsyncClient:

    import asyncio

    from jina.clients.asyncio import AsyncClient

    ac = AsyncClient(...)

    async def jina_client_query():
        await ac.search(...)

    async def heavylifting():
        await other_library.download_big_files(...)

    async def concurrent_main():
        await asyncio.gather(jina_client_query(), heavylifting())

    if __name__ == '__main__':
        # under python
        asyncio.run(concurrent_main())

One can think of Client as a Jina-managed event loop, whereas AsyncClient is a self-managed event loop.
-
train
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘train’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Yield
result
- Return type
None
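The callback contract in the parameter list (on_done when the request is resolved, on_error when it is rejected, on_always in both cases) can be pictured with a small stand-alone sketch. It does not use Jina: dispatch() and the string results are hypothetical stand-ins introduced only for illustration, and the real client may invoke its callbacks differently.

```python
from typing import Any, Callable, Optional

Callback = Optional[Callable[..., None]]


def dispatch(handler: Callable[[], Any],
             on_done: Callback = None,
             on_error: Callback = None,
             on_always: Callback = None) -> None:
    """Hypothetical helper: run `handler` and route its outcome to callbacks."""
    try:
        outcome = handler()
    except Exception as exc:
        # The request is "rejected".
        outcome = exc
        if on_error is not None:
            on_error(exc)
    else:
        # The request is "resolved".
        if on_done is not None:
            on_done(outcome)
    # Called in both cases.
    if on_always is not None:
        on_always(outcome)


def failing() -> None:
    raise ValueError("boom")


events = []
for handler in (lambda: "ok", failing):
    dispatch(handler,
             on_done=lambda r: events.append(("done", r)),
             on_error=lambda e: events.append(("error", str(e))),
             on_always=lambda r: events.append(("always", str(r))))
print(events)
```

Keeping on_always separate from the other two lets cleanup or progress reporting run regardless of the outcome.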
-
search
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘search’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Yield
result
- Return type
None
-
index
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘index’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Yield
result
- Return type
None
-
delete
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘delete’ request to the Flow.
- Parameters
inputs (Union[str, Iterable[str], Callable[…, Iterable[str]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document id
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Yield
result
- Return type
None
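The three accepted forms of inputs for delete() (a single Document id, an Iterable of ids, or a function returning an Iterable of ids) can be reduced to one shape before use. The sketch below is a hypothetical helper, normalize_ids(), written only to illustrate that idea; it is not part of Jina and the client may handle these forms differently.

```python
from typing import Callable, Iterable, List, Union

IdInput = Union[str, Iterable[str], Callable[..., Iterable[str]]]


def normalize_ids(inputs: IdInput) -> List[str]:
    """Hypothetical helper: flatten the three accepted forms into a list of ids."""
    if callable(inputs):
        # A function that returns an Iterable of ids.
        inputs = inputs()
    if isinstance(inputs, str):
        # A single Document id; checked before iterating because a str
        # is itself iterable and would otherwise be split into characters.
        return [inputs]
    # Any other Iterable of ids (list, tuple, generator, ...).
    return list(inputs)


print(normalize_ids("doc-1"))
print(normalize_ids(["doc-1", "doc-2"]))
print(normalize_ids(lambda: (f"doc-{i}" for i in range(3))))
```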
-
update
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘update’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Yield
result
- Return type
None
-
reload
(targets, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Send ‘reload’ request to the Flow.
- Parameters
targets (Union[str, List[str]]) – the regex string or list of regex strings to match the pea/pod names.
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Yield
result
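The class description of AsyncClient notes that calling a coroutine method does not by itself run it. That behaviour comes from Python, not from Jina, so it can be checked with the standard library alone. In the sketch below, fake_search() is a made-up stand-in for a coroutine method and is not a Jina API.

```python
import asyncio
import inspect

calls = []


async def fake_search(query: str) -> str:
    """Stand-in for a coroutine method; NOT a Jina API."""
    calls.append(query)
    await asyncio.sleep(0)
    return f"results for {query}"


# Calling the coroutine function only creates a coroutine object.
coro = fake_search("hello")
created_is_coroutine = inspect.iscoroutine(coro)
ran_before_loop = len(calls) > 0  # the body has not started yet

# The body runs only once an event loop drives the coroutine.
result = asyncio.run(coro)
ran_after_loop = len(calls) > 0

print(created_is_coroutine, ran_before_loop, ran_after_loop, result)
```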
-
-
class
jina.
AsyncFlow
(args=None, env=None, **kwargs)[source]¶ Bases:
jina.flow.mixin.async_crud.AsyncCRUDFlowMixin
,jina.flow.mixin.async_control.AsyncControlFlowMixin
,jina.flow.base.BaseFlow
AsyncFlow is the asynchronous version of the Flow. They share the same interface, except that in AsyncFlow the train(), index(), and search() methods are coroutines (i.e. declared with the async/await syntax); simply calling them does not schedule them for execution. To actually run a coroutine, the user needs to put it in an event loop, e.g. via asyncio.run() or asyncio.create_task().

AsyncFlow can be very useful in integration settings, where Jina/Jina Flow is NOT the main logic but rather serves as part of another program. In this case, users often do not want to let Jina control the asyncio event loop. In contrast, Flow controls and wraps the event loop internally, making the Flow look synchronous from the outside.

In particular, AsyncFlow makes Jina usage in Jupyter Notebook more natural and reliable. For example, the following code uses the event loop already spawned in Jupyter/IPython to run the Jina Flow (instead of creating a new one).

    from jina import AsyncFlow
    import numpy as np

    with AsyncFlow().add() as f:
        await f.index_ndarray(np.random.random([5, 4]), on_done=print)

Notice that the above code will NOT work in the standard Python REPL, as only Jupyter/IPython implements “autoawait”.
See also
Asynchronous in REPL: Autoawait
https://ipython.readthedocs.io/en/stable/interactive/autoawait.html

Another example is when using Jina as an integration. Say you have another IO-bound job heavylifting(); you can use this feature to schedule Jina index() and heavylifting() concurrently. For example,

    import asyncio

    async def run_async_flow_5s():
        # WaitDriver pauses 5s, which makes the total roundtrip ~5s
        with AsyncFlow().add(uses='- !WaitDriver {}') as f:
            await f.index_ndarray(np.random.random([5, 4]), on_done=validate)

    async def heavylifting():
        # total roundtrip takes ~5s
        print('heavylifting other io-bound jobs, e.g. download, upload, file io')
        await asyncio.sleep(5)
        print('heavylifting done after 5s')

    async def concurrent_main():
        # about 5s; but some dispatch cost, can't be just 5s, usually at <7s
        await asyncio.gather(run_async_flow_5s(), heavylifting())

One can think of Flow as a Jina-managed event loop, whereas AsyncFlow is a self-managed event loop.
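The concurrent scheduling described for AsyncFlow can be observed without a running Flow. The following sketch uses only asyncio: fake_index() is a made-up stand-in for an awaitable Flow request (not a Jina API), and the short sleeps stand in for network or file IO. The log shows both jobs starting before either one finishes.

```python
import asyncio

log = []


async def fake_index() -> None:
    """Stand-in for an awaitable Flow request; NOT a Jina API."""
    log.append("index:start")
    await asyncio.sleep(0.05)  # pretend to wait on the Flow
    log.append("index:end")


async def heavylifting() -> None:
    log.append("heavy:start")
    await asyncio.sleep(0.05)  # pretend to download or upload something
    log.append("heavy:end")


async def concurrent_main() -> None:
    # Both coroutines share one event loop and overlap while they wait.
    await asyncio.gather(fake_index(), heavylifting())


asyncio.run(concurrent_main())
print(log)
```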
-
jina.
Classifier
¶
-
class
jina.
Client
(args)[source]¶ Bases:
jina.clients.base.BaseClient
A simple Python client for connecting to the gRPC gateway.
It manages the asyncio event loop internally, so all interfaces are synchronous from the outside.
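One way to picture "manages the event loop internally" is a synchronous facade over a coroutine. The sketch below is an assumption-laden illustration, not Jina's implementation: AsyncGreeter and Greeter are invented classes, and asyncio.run() is just one possible way for a wrapper to own the loop.

```python
import asyncio


class AsyncGreeter:
    """Self-managed style: the caller must drive the coroutine."""

    async def greet(self, name: str) -> str:
        await asyncio.sleep(0)
        return f"hello {name}"


class Greeter:
    """Managed style: owns the event loop, so callers see a plain function."""

    def __init__(self) -> None:
        self._inner = AsyncGreeter()

    def greet(self, name: str) -> str:
        # Start an event loop, run the coroutine to completion, then close it.
        return asyncio.run(self._inner.greet(name))


sync_result = Greeter().greet("jina")
async_result = asyncio.run(AsyncGreeter().greet("jina"))
print(sync_result, async_result)
```

The managed style is simpler to call, but it cannot be used from code that is already running inside an event loop, which is the situation the asynchronous variants address.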
-
train
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘train’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Return type
None
- Returns
None
-
search
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘search’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Return type
None
- Returns
None
-
index
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘index’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Return type
None
- Returns
None
-
update
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘update’ request to the Flow.
- Parameters
inputs (Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], Callable[…, Union[Document, Iterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]], AsyncIterable[Union[~DocumentContentType, ~DocumentSourceType, Document, Tuple[~DocumentContentType, ~DocumentContentType], Tuple[~DocumentSourceType, ~DocumentSourceType]]]]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Return type
None
- Returns
None
-
delete
(inputs, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Issue ‘delete’ request to the Flow.
- Parameters
inputs (Union[str, Iterable[str], Callable[…, Iterable[str]]]) – input data, which can be an Iterable, a function that returns an Iterable, or a single Document id.
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Return type
None
- Returns
None
-
reload
(targets, on_done=None, on_error=None, on_always=None, **kwargs)[source]¶ Send ‘reload’ request to the Flow.
- Parameters
targets (Union[str, List[str]]) – the regex string or list of regex strings to match the pea/pod names.
on_done (Optional[Callable[…, None]]) – the function to be called when the Request object is resolved.
on_error (Optional[Callable[…, None]]) – the function to be called when the Request object is rejected.
on_always (Optional[Callable[…, None]]) – the function to be called when the Request object is either resolved or rejected.
kwargs – additional parameters
- Returns
None
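Since targets is a regex string or a list of regex strings matched against pea/pod names, a rough mental model is "keep every name that any pattern matches". The sketch below illustrates that model with the standard re module. match_targets() and the pod names are invented for illustration, and the choice of re.match (anchored at the start of the name) is an assumption; the reference does not say which matching mode Jina applies.

```python
import re
from typing import List, Union


def match_targets(targets: Union[str, List[str]], names: List[str]) -> List[str]:
    """Hypothetical helper: return the names matched by any target regex."""
    patterns = [targets] if isinstance(targets, str) else list(targets)
    compiled = [re.compile(p) for p in patterns]
    # re.match anchors at the start of each name (an assumption, see above).
    return [n for n in names if any(c.match(n) for c in compiled)]


pods = ["encoder", "encoder2", "indexer", "ranker"]
single = match_targets("enc.*", pods)
several = match_targets(["indexer", "rank.*"], pods)
print(single, several)
```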
-
-
jina.
Crafter
¶ alias of
jina.executors.crafters.BaseCrafter
-
class
jina.
Document
(document=None, field_resolver=None, copy=False, **kwargs)[source]¶ Bases:
jina.types.mixin.ProtoTypeMixin
,jina.types.document.traversable.Traversable
Document is one of the primitive data types in Jina.

It offers a Pythonic interface that allows users to access and manipulate a jina.jina_pb2.DocumentProto object without working with Protobuf itself.

To create a Document object, simply:

    from jina import Document

    d = Document()
    d.text = 'abc'

Jina requires each Document to have a string id. You can set a custom one; if none has been set, a random one will be assigned.

Or you can use Document as a context manager:

    with Document() as d:
        d.text = 'hello'
    assert d.id  # now `id` has value

To access and modify the content of the document, you can use text, blob, and buffer. Each property is implemented with a proper setter to improve integrity and user experience. For example, assigning doc.blob or doc.embedding can be done simply via:

    import numpy as np

    # to set as content
    d.content = np.random.random([10, 5])

    # to set as embedding
    d.embedding = np.random.random([10, 5])

The MIME type is automatically set/guessed when setting content and uri.

Document also provides multiple ways to build from an existing Document. You can build a Document from jina_pb2.DocumentProto, bytes, str, and Dict. You can also use it as a view (i.e. a weak reference when building from an existing jina_pb2.DocumentProto). For example,

    a = DocumentProto()
    b = Document(a, copy=False)
    a.text = 'hello'
    assert b.text == 'hello'

You can leverage the convert_a_to_b() interface to convert between content forms.
- Parameters
document (Optional[~DocumentSourceType]) – the document to construct from. If bytes is given, then deserialize a DocumentProto; if dict is given, then parse a DocumentProto from it; if str is given, then consider it a JSON string and parse a DocumentProto from it; finally, one can also give a DocumentProto directly, in which case, depending on copy, it builds a view or a copy from it.
copy (bool) – when document is given as a DocumentProto object, build a view (i.e. weak reference) from it or a deep copy from it.
field_resolver (Optional[Dict[str, str]]) – a map from field names defined in document (JSON, dict) to the field names defined in Protobuf. This is only used when the given document is a JSON string or a Python dict.
kwargs – other parameters to be set _after_ the document is constructed
Note
When document is a JSON string or a Python dictionary object, the constructor only maps the values of known fields defined in Protobuf; all unknown fields are mapped to document.tags. For example,

    d = Document({'id': '123', 'hello': 'world', 'tags': {'good': 'bye'}})
    assert d.id == '123'  # true
    assert d.tags['hello'] == 'world'  # true
    assert d.tags['good'] == 'bye'  # true
-
property
siblings
¶ The number of siblings of the Document
- Getter
number of siblings
- Setter
number of siblings
- Type
int
- Return type
int
-
property
weight
¶ - Return type
float
- Returns
the weight of the document
-
property
modality
¶ - Return type
str
- Returns
the modality of the document.
-
property
content_hash
¶ Get the content hash of the document.
- Returns
the content_hash from the proto
-
update
(source, exclude_fields=None, include_fields=None)[source]¶ Updates the fields specified in include_fields from the source to the current Document.
- Parameters
exclude_fields (Optional[Tuple[str, …]]) – a tuple of field names to exclude from the current document; when not given, the non-empty fields of the current document are considered as exclude_fields
include_fields (Optional[Tuple[str, …]]) – a tuple of field names to include from the source document
Note
The destination (the current Document) will be modified in place; source will be unchanged.
- Return type
None
-
update_content_hash
(exclude_fields=('id', 'chunks', 'matches', 'content_hash', 'parent_id'), include_fields=None)[source]¶ Update the document hash according to its content.
- Parameters
exclude_fields (Optional[Tuple[str]]) – a tuple of field names that are excluded when computing the content hash
include_fields (Optional[Tuple[str]]) – a tuple of field names that are included when computing the content hash
Note
“exclude_fields” and “include_fields” are mutually exclusive, use one only
- Return type
None
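To make the role of exclude_fields and include_fields concrete, here is a minimal sketch of a content hash over a plain dictionary. It is not Jina's algorithm: content_hash() is a hypothetical function, the JSON serialisation and the 8-byte BLAKE2b digest are arbitrary choices for the example, and letting include_fields take precedence when it is given is an assumption about how "use one only" might be resolved.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple

DEFAULT_EXCLUDE = ("id", "chunks", "matches", "content_hash", "parent_id")


def content_hash(fields: Dict[str, Any],
                 exclude_fields: Optional[Tuple[str, ...]] = DEFAULT_EXCLUDE,
                 include_fields: Optional[Tuple[str, ...]] = None) -> str:
    """Hypothetical helper: hash only the selected fields of a document dict."""
    if include_fields is not None:
        # Assumption: an explicit include list wins over the exclude list.
        selected = {k: v for k, v in fields.items() if k in include_fields}
    else:
        excluded = exclude_fields or ()
        selected = {k: v for k, v in fields.items() if k not in excluded}
    # Sorting keys makes the hash independent of dict insertion order.
    payload = json.dumps(selected, sort_keys=True).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=8).hexdigest()


a = content_hash({"id": "1", "text": "hello"})
b = content_hash({"id": "2", "text": "hello"})
c = content_hash({"id": "1", "text": "world"})
print(a == b, a == c)
```

Excluding id by default means two documents with identical content but different ids hash to the same value, which is what makes such a hash usable for detecting duplicate content.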
-
property
id
¶ The document id as a hex string, for non-binary environments such as HTTP, CLI, and HTML; it is also human-readable. It will be used as the major view.
- Return type
str
- Returns
the id from the proto
-
property
parent_id
¶ The document’s parent id as a hex string, for non-binary environments such as HTTP, CLI, and HTML; it is also human-readable. It will be used as the major view.
- Return type
str
- Returns
the parent id from the proto
-
property
blob
¶ Return blob, one of the content forms of a Document.
Note
Use content to return the content of a Document
- Return type
ndarray
- Returns
the blob content from the proto
-
property
embedding
¶ Return embedding of the content of a Document.
- Return type
ndarray
- Returns
the embedding from the proto
-
property
matches
¶ Get all matches of the current document.
- Return type
- Returns
the set of matches attached to this document
-
property
chunks
¶ Get all chunks of the current document.
- Return type
- Returns
the set of chunks of this document
-
set_attrs
(**kwargs)[source]¶ Bulk update Document fields with the key-value pairs specified in kwargs
See also
get_attrs() for bulk getting attributes
- Parameters
kwargs – the keyword arguments to set the values, where the keys are the fields to set
-
get_attrs
(*args)[source]¶ Bulk fetch Document fields and return a dict of the key-value pairs
See also
update() for bulk set/update attributes
Note
Arguments will be extracted using dunder_get

    d = Document({'id': '123', 'hello': 'world', 'tags': {'id': 'external_id', 'good': 'bye'}})

    assert d.id == '123'  # true
    assert d.tags['hello'] == 'world'  # true
    assert d.tags['good'] == 'bye'  # true
    assert d.tags['id'] == 'external_id'  # true

    res = d.get_attrs(*['id', 'tags__hello', 'tags__good', 'tags__id'])

    assert res['id'] == '123'  # true
    assert res['tags__hello'] == 'world'  # true
    assert res['tags__good'] == 'bye'  # true
    assert res['tags__id'] == 'external_id'  # true

- Parameters
args – the variable length values to extract from the document
- Return type
Dict[str, Any]
- Returns
a dictionary mapping the fields in args to the actual attributes of this document
-
get_attrs_values
(*args)[source]¶ Bulk fetch Document fields and return a list of the values of these fields
Note
Arguments will be extracted using dunder_get

    d = Document({'id': '123', 'hello': 'world', 'tags': {'id': 'external_id', 'good': 'bye'}})

    assert d.id == '123'  # true
    assert d.tags['hello'] == 'world'  # true
    assert d.tags['good'] == 'bye'  # true
    assert d.tags['id'] == 'external_id'  # true

    res = d.get_attrs_values(*['id', 'tags__hello', 'tags__good', 'tags__id'])

    assert res == ['123', 'world', 'bye', 'external_id']

- Parameters
args – the variable length values to extract from the document
- Return type
List[Any]
- Returns
a list with the attributes of this document, ordered as the args
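The double-underscore keys such as tags__hello used by get_attrs() and get_attrs_values() follow a path into nested fields. The sketch below re-creates that lookup on plain dictionaries to show the idea; this dunder_get() is an illustrative re-implementation written for this example, not the function Jina actually calls, and it only handles dict-like nesting.

```python
from typing import Any, Dict, List


def dunder_get(obj: Any, key: str) -> Any:
    """Follow an `a__b__c` path through nested dicts (illustrative only)."""
    current = obj
    for part in key.split("__"):
        current = current[part]
    return current


def get_attrs(doc: Dict[str, Any], *args: str) -> Dict[str, Any]:
    """Return a mapping from each requested key to its resolved value."""
    return {a: dunder_get(doc, a) for a in args}


def get_attrs_values(doc: Dict[str, Any], *args: str) -> List[Any]:
    """Return only the resolved values, in the order of the arguments."""
    return [dunder_get(doc, a) for a in args]


doc = {"id": "123", "tags": {"hello": "world", "good": "bye", "id": "external_id"}}
keys = ["id", "tags__hello", "tags__good", "tags__id"]
as_dict = get_attrs(doc, *keys)
as_list = get_attrs_values(doc, *keys)
print(as_dict)
print(as_list)
```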
-
property
buffer
¶ Return buffer, one of the content forms of a Document.
Note
Use content to return the content of a Document
- Return type
bytes
- Returns
the buffer bytes from this document
-
property
text
¶ Return text, one of the content forms of a Document.
Note
Use content to return the content of a Document
- Returns
the text from this document content
-
property
uri
¶ Return the URI of the document.
- Return type
str
- Returns
the uri from this document proto
-
property
mime_type
¶ Get MIME type of the document
- Return type
str
- Returns
the mime_type from this document proto
-
property
content_type
¶ Return the content type of the document, possible values: text, blob, buffer
- Return type
str
- Returns
the type of content present in this document proto
-
property
content
¶ Return the content of the document. It checks which field among blob, text, and buffer has a value and returns it.
- Return type
~DocumentContentType
- Returns
the value of the content, depending on content_type
-
property
granularity
¶ Return the granularity of the document.
- Returns
the granularity from this document proto
-
property
adjacency
¶ Return the adjacency of the document.
- Returns
the adjacency from this document proto
-
property
score
¶ Return the score of the document.
- Returns
the score attached to this document as NamedScore
-
convert_buffer_to_blob
(**kwargs)[source]¶ Assuming the buffer is a _valid_ buffer of a NumPy ndarray, set blob accordingly.
- Parameters
kwargs – reserved for maximum compatibility when using with ConvertDriver
Note
Only the values, not the shape information, can be recovered from a pure buffer.
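The note about losing shape information holds for any raw byte buffer, and it can be seen without NumPy. The sketch below uses the standard-library array module as a stand-in for an ndarray (an assumption made so the example has no third-party dependency): the bytes carry six values, and the 2x3 layout has to be supplied again from outside.

```python
from array import array

# A 2x3 "matrix" stored row by row; the nested-list layout is our own convention.
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
flat = array("d", [x for row in matrix for x in row])
buffer = flat.tobytes()  # raw bytes: values only, no shape

recovered = array("d")
recovered.frombytes(buffer)  # a flat sequence of 6 doubles

# The shape must be provided separately to rebuild the rows.
n_cols = 3
rows = [list(recovered[i:i + n_cols]) for i in range(0, len(recovered), n_cols)]
print(list(recovered))
print(rows)
```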
-
convert_buffer_image_to_blob
(color_axis=-1, **kwargs)[source]¶ Convert an image buffer to blob
- Parameters
color_axis (int) – the axis id of the color channel; -1 indicates the color channel info is at the last axis
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_blob_to_uri
(width, height, resize_method='BILINEAR', **kwargs)[source]¶ Assuming blob is a _valid_ image, set uri accordingly
- Parameters
width (int) – the width of the blob
height (int) – the height of the blob
resize_method (str) – the resize method name
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_uri_to_blob
(color_axis=-1, uri_prefix=None, **kwargs)[source]¶ Convert uri to blob
- Parameters
color_axis (int) – the axis id of the color channel; -1 indicates the color channel info is at the last axis
uri_prefix (Optional[str]) – the prefix of the uri
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_data_uri_to_blob
(color_axis=-1, **kwargs)[source]¶ Convert data URI to image blob
- Parameters
color_axis (int) – the axis id of the color channel; -1 indicates the color channel info is at the last axis
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_uri_to_buffer
(**kwargs)[source]¶ Convert uri to buffer. Internally, it downloads from the URI and sets buffer.
- Parameters
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_uri_to_data_uri
(charset='utf-8', base64=False, **kwargs)[source]¶ Convert uri to data uri. Internally, it reads the uri into buffer and converts it to a data uri.
- Parameters
charset (str) – charset may be any character set registered with IANA
base64 (bool) – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8 bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters.
kwargs – reserved for maximum compatibility when using with ConvertDriver
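The charset and base64 parameters correspond to the optional parts of a data URI, which has the general shape data:[mediatype][;charset=...][;base64],data. The sketch below builds one with the standard library to show how the two options change the result. make_data_uri() and its mime_type parameter are hypothetical and are not the Document method; the exact header that Jina emits may differ.

```python
import base64 as b64
from urllib.parse import quote


def make_data_uri(data: bytes, mime_type: str = "text/plain",
                  charset: str = "utf-8", base64: bool = False) -> str:
    """Hypothetical helper: build a `data:` URI from raw bytes."""
    header = f"data:{mime_type};charset={charset}"
    if base64:
        # Base64 keeps arbitrary binary data within a 7-bit-safe alphabet.
        return f"{header};base64,{b64.b64encode(data).decode('ascii')}"
    # Otherwise the payload is percent-encoded.
    return f"{header},{quote(data)}"


plain = make_data_uri(b"hello world")
encoded = make_data_uri(b"hello world", base64=True)
print(plain)
print(encoded)
```

Percent-encoding stays readable for mostly-ASCII text, while base64 is usually more compact for binary payloads such as images.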
-
convert_buffer_to_uri
(charset='utf-8', base64=False, **kwargs)[source]¶ Convert buffer to data uri. Internally it first reads into buffer and then converts it to data URI.
- Parameters
charset (str) – charset may be any character set registered with IANA
base64 (bool) – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8 bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters.
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_text_to_uri
(charset='utf-8', base64=False, **kwargs)[source]¶ Convert text to data uri.
- Parameters
charset (str) – charset may be any character set registered with IANA
base64 (bool) – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8 bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters.
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_uri_to_text
(**kwargs)[source]¶ Assuming URI is text, convert it to text
- Parameters
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
convert_content_to_uri
(**kwargs)[source]¶ Convert content in URI with best effort
- Parameters
kwargs – reserved for maximum compatibility when using with ConvertDriver
-
MergeFrom
(doc)[source]¶ Merge the content of the target doc into the current document.
- Parameters
doc (
Document
) – the document to merge from
-
CopyFrom
(doc)[source]¶ Copy the content of the target doc into the current document.
- Parameters
doc (
Document
) – the document to copy from
-
plot
(output=None, inline_display=False)[source]¶ Visualize the Document recursively.
- Parameters
output (Optional[str]) – a filename specifying the name of the image to be created; the suffix svg/jpg determines the file type of the output image
inline_display (bool) – show the image directly inside the Jupyter Notebook
- Return type
None
-
property
non_empty_fields
¶ Return the fields of the current document that are set (i.e. not empty)
- Return type
Tuple[str]
- Returns
the tuple of non-empty fields
-
class
jina.
DocumentSet
(docs_proto)[source]¶ Bases:
jina.types.sets.traversable.TraversableSequence
,collections.abc.MutableSequence
DocumentSet is a mutable sequence of Document. It gives an efficient view of a list of Document. One can iterate over it like a generator but also modify it, count it, get an item, or union two DocumentSets using the '+' and '+=' operators.
- Parameters
docs_proto (Union['RepeatedContainer', Sequence['Document']]) – A list of Document
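The mutable-sequence behaviour described here can be sketched with collections.abc.MutableSequence alone. ItemSet below is a hypothetical stand-in that wraps a plain Python list instead of a Protobuf repeated container; it only shows which methods a subclass has to supply and which ones (append, extend, '+=') come for free.

```python
from collections.abc import MutableSequence


class ItemSet(MutableSequence):
    """Hypothetical stand-in for a DocumentSet-like container."""

    def __init__(self, items=()):
        self._items = list(items)

    # The five abstract methods required by MutableSequence
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

    # '+' is not supplied by the ABC, so the union is defined explicitly
    def __add__(self, other):
        return ItemSet(list(self) + list(other))


s = ItemSet(['a', 'b'])
s.append('c')        # mixin method from MutableSequence
s.insert(0, 'z')
s += ItemSet(['d'])  # __iadd__ mixin, implemented via extend()
t = s + ItemSet(['e'])
print(list(s), list(t))
```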
-
insert
(index, doc)[source]¶ Insert doc.proto at index into the list of DocumentSet.
- Parameters
index (int) – Position of the insertion.
doc (Document) – The doc that needs to be inserted.
- Return type
None
-
append
(doc)[source]¶ Append doc to DocumentSet.
-
extend
(iterable)[source]¶ Extend the DocumentSet by appending all the items from the iterable.
- Parameters
iterable (Iterable[Document]) – the iterable of Documents to extend this set with
- Return type
None
-
clear
()[source]¶ Clear the data of
DocumentSet
-
build
()[source]¶ Build a doc_id to doc mapping so one can later index a Document using doc_id as string key.
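A minimal sketch of the idea behind build(), assuming documents are plain dicts with an 'id' key: a dictionary from id to position is created once, after which string keys resolve in constant time. IndexedDocs is a hypothetical class written for this illustration only.

```python
class IndexedDocs:
    """Hypothetical container that supports lookup by position or by id."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._id_to_pos = {}

    def build(self):
        # Map every doc id to its position so string keys can be resolved later
        self._id_to_pos = {d['id']: i for i, d in enumerate(self._docs)}

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._docs[self._id_to_pos[key]]
        return self._docs[key]


docs = IndexedDocs([{'id': 'a1', 'text': 'hello'}, {'id': 'b2', 'text': 'world'}])
docs.build()
print(docs['b2']['text'], docs[0]['text'])
```

In this sketch the mapping is a snapshot: after inserting or removing items, build() has to be called again before string lookups are reliable.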
-
sort
(*args, **kwargs)[source]¶ Sort the items of the DocumentSet in place.
- Parameters
args – variable set of arguments to pass to the underlying sorting function
kwargs – keyword arguments to pass to the underlying sorting function
-
property
all_embeddings
¶ Return all embeddings from every document in this set as an ndarray
- Returns
The corresponding documents in a DocumentSet, and the documents that have no embedding in a DocumentSet.
- Return type
A tuple of embeddings in np.ndarray
-
property
all_contents
¶ Return all contents from every document in this set as an ndarray
- Returns
The corresponding documents in a DocumentSet, and the documents that have no contents in a DocumentSet.
- Return type
A tuple of contents in np.ndarray
-
extract_docs
(*fields, stack_contents=False)[source]¶ Return in batches all the values of the fields
- Parameters
fields (str) – Variable length argument with the names of the fields to extract
stack_contents (bool) – boolean flag indicating if output lists should be stacked with np.stack
- Return type
Tuple[Union[ndarray, List[ndarray]], DocumentSet]
- Returns
an np.ndarray or a list of np.ndarray with the batches for these fields
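The shape of the return value can be illustrated with plain Python. In this sketch documents are dicts, extract_fields is a hypothetical function, and two assumptions are made that the text above does not confirm: documents missing a requested field are skipped, and the np.stack step is left out so that the example needs no third-party package.

```python
def extract_fields(docs, *fields):
    """Collect the values of `fields` column-wise (hypothetical sketch)."""
    contents = [[] for _ in fields]
    kept = []
    for doc in docs:
        values = [doc.get(f) for f in fields]
        if any(v is None for v in values):
            continue  # assumption: skip docs that lack a requested field
        for bucket, v in zip(contents, values):
            bucket.append(v)
        kept.append(doc)
    # One field gives a flat list; several fields give one list per field
    if len(fields) == 1:
        return contents[0], kept
    return contents, kept


docs = [
    {'id': '1', 'text': 'a', 'embedding': [0.1, 0.2]},
    {'id': '2', 'text': 'b'},
    {'id': '3', 'text': 'c', 'embedding': [0.3, 0.4]},
]
embeddings, with_embedding = extract_fields(docs, 'embedding')
print(embeddings, [d['id'] for d in with_embedding])
```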
-
jina.
Encoder
¶ alias of
jina.executors.encoders.BaseEncoder
-
jina.
Evaluator
¶
-
jina.
Executor
¶ alias of
jina.executors.BaseExecutor
-
class
jina.
Flow
(args=None, env=None, **kwargs)[source]¶ Bases:
jina.flow.mixin.crud.CRUDFlowMixin
,jina.flow.mixin.control.ControlFlowMixin
,jina.flow.base.BaseFlow
The synchronous version of AsyncFlow.
For proper usage see this guide <https://docs.jina.ai/chapters/flow/index.html>
-
jina.
Indexer
¶ alias of
jina.executors.indexers.BaseIndexer
-
class
jina.
Message
(envelope, request, *args, **kwargs)[source]¶ Bases:
object
Message is one of the primitive data types in Jina.
It offers a Pythonic interface to allow users to access and manipulate a jina.jina_pb2.MessageProto object without working with Protobuf itself.
A container class for jina_pb2.MessageProto. Note, the Protobuf version of jina_pb2.MessageProto contains a jina_pb2.EnvelopeProto and a jina_pb2.RequestProto. Here, it contains:
a jina_pb2.EnvelopeProto object
- and one of:
a Request object wrapping jina_pb2.RequestProto
a jina_pb2.RequestProto object
It provides a generic view of jina_pb2.MessageProto, allowing one to access its members, request and envelope, as if using a jina_pb2.MessageProto object directly.
This class also collects all helper functions related to jina_pb2.MessageProto into one place.
- Parameters
envelope (Union[bytes, EnvelopeProto, None]) – Represents an Envelope, a part of the Message.
request (Union[bytes, RequestProto]) – Represents a Request
args – Additional positional arguments.
kwargs – Additional keyword arguments.
-
property
proto
¶ Get the MessageProto.
- Return type
MessageProto
- Returns
protobuf object
-
property
is_data_request
¶ Check if the request is not a control request
Warning
If request changes its type, e.g. by leveraging the oneof feature, this property won't be updated. This is not considered good practice.
- Return type
bool
- Returns
boolean which states if the request is a data request
-
dump
()[source]¶ Get the message in a list of bytes.
- Return type
List[bytes]
- Returns
array containing the encoded receiver id, the serialized envelope and the compressed serialized request
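The three-frame layout can be sketched as follows. This is an illustration under stated assumptions, not the library's code: dump_frames and load_frames are hypothetical names, the third frame is assumed to carry the compressed request payload, and zlib stands in for whatever compression is actually configured.

```python
import zlib


def dump_frames(receiver_id, envelope_bytes, request_bytes):
    """Pack a message into a list of byte frames (hypothetical sketch)."""
    return [
        receiver_id.encode(),          # frame 0: encoded receiver id
        envelope_bytes,                # frame 1: serialized envelope
        zlib.compress(request_bytes),  # frame 2: compressed payload (assumed zlib)
    ]


def load_frames(frames):
    """Inverse of dump_frames."""
    receiver_id, envelope_bytes, compressed = frames
    return receiver_id.decode(), envelope_bytes, zlib.decompress(compressed)


frames = dump_frames('pod-1', b'envelope', b'request' * 100)
print([len(f) for f in frames])
```

Keeping the envelope uncompressed in its own frame means routing information can be read without touching the larger payload.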
-
property
colored_route
¶ Get the string representation of the routes in a message.
- Return type
str
- Returns
colored route
-
add_route
(name, identity)[source]¶ Add a route to the envelope.
- Parameters
name (str) – the name of the pod service
identity (str) – the identity of the pod service
-
property
size
¶ Get the size in bytes.
To get the latest size, use it after dump()
- Returns
size of the message
-
property
response
¶ Get the response of the message in protobuf.
Note
This should be only called at Gateway
- Return type
- Returns
request object which contains the response
-
merge_envelope_from
(msgs)[source]¶ Extend the current envelope routes with msgs.
- Parameters
msgs (List[Message]) – List of msgs.
-
add_exception
(ex=None, executor=None)[source]¶ Add exception to the last route in the envelope
- Parameters
ex (Optional[Exception]) – Exception to be added
executor (BaseExecutor) – Executor related to the exception
- Return type
None
-
property
is_error
¶ Return if the envelope status is ERROR.
- Return type
bool
- Returns
boolean stating if the status code of the envelope is error
-
property
is_ready
¶ Return if the envelope status is READY.
- Return type
bool
- Returns
boolean stating if the status code of the envelope is ready
-
class
jina.
MultimodalDocument
(document=None, chunks=None, modality_content_map=None, copy=False, **kwargs)[source]¶ Bases:
jina.types.document.Document
MultimodalDocument is a data type created based on the Jina primitive data type Document.
It shares the same methods and properties with Document, while it focuses on modality at chunk level.
Warning
It assumes that every chunk of a document belongs to a different modality.
It assumes that every MultimodalDocument has at least two chunks.
Building a MultimodalDocument from modality_content_map expects you to assign Document.content as the value of the dictionary.
- Parameters
document (Optional[DocumentSourceType]) – the document to construct from. If bytes is given, deserialize a DocumentProto from it; if dict is given, parse a DocumentProto from it; if str is given, consider it a JSON string and parse a DocumentProto from it; finally, one can also give a DocumentProto directly, and depending on copy, a view or a copy is built from it.
chunks (Optional[Sequence[Document]]) – the chunks of the multimodal document to initialize with. Expected to receive a list of Document, with different modalities.
modality_content_map – a Python dict; the keys are the modalities and the values are the content of the Document
copy (bool) – when document is given as a DocumentProto object, build a view (i.e. weak reference) from it or a deep copy from it.
field_resolver – a map from field names defined in document (JSON, dict) to the field names defined in Protobuf. This is only used when the given document is a JSON string or a Python dict.
kwargs – other parameters to be set after the document is constructed
Note
When document is a JSON string or Python dictionary object, the constructor will only map the values from known fields defined in Protobuf; all unknown fields are mapped to document.tags. For example,
d = Document({'id': '123', 'hello': 'world', 'tags': {'good': 'bye'}})
assert d.id == '123'  # true
assert d.tags['hello'] == 'world'  # true
assert d.tags['good'] == 'bye'  # true
-
property
is_valid
¶ A valid MultimodalDocument should meet the following requirements:
Document should consist of at least 2 chunks.
The number of modalities is identical to the number of chunks.
- Return type
bool
- Returns
true if the document is valid
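Given the assumptions stated for this class (at least two chunks, every chunk in a different modality), the validity check can be sketched on a bare list of modality names. The function has_valid_modalities is hypothetical and only mirrors those two conditions.

```python
def has_valid_modalities(chunk_modalities):
    """True when there are >= 2 chunks and no modality is repeated."""
    distinct = set(chunk_modalities)
    # Chained comparison: 2 <= number of chunks, and chunks == distinct modalities
    return 2 <= len(chunk_modalities) == len(distinct)


print(has_valid_modalities(['text', 'image']))
print(has_valid_modalities(['text', 'text']))
```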
-
property
modality_content_map
¶ Get the mapping of modality and content. The mapping is represented as a dict; the keys are the modalities of the chunks, the values are the corresponding content of the chunks.
- Return type
Dict
- Returns
the mapping of modality and content extracted from chunks.
-
property
modalities
¶ Get all modalities of the MultimodalDocument.
- Return type
List[str]
- Returns
List of modalities extracted from chunks of the document.
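How the two properties relate can be shown with chunks modelled as plain dicts. The dict layout and the function names below are assumptions for this sketch; in Jina the chunks are Document objects.

```python
# Hypothetical chunks: one per modality
chunks = [
    {'modality': 'text', 'content': 'a red apple'},
    {'modality': 'image', 'content': b'\x89PNG'},
]


def modality_content_map(chunks):
    """Map each chunk's modality to its content."""
    return {c['modality']: c['content'] for c in chunks}


def modalities(chunks):
    """List the modalities, in chunk order."""
    return list(modality_content_map(chunks).keys())


print(modalities(chunks))
```

Because the mapping is keyed by modality, two chunks sharing a modality would overwrite each other, which is one reason the class assumes each chunk has a distinct modality.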
-
update_content_hash
(exclude_fields=('id', 'matches', 'content_hash'), include_fields=None)[source]¶ Update content hash of the document by including chunks when computing the hash
- Parameters
exclude_fields – a tuple of field names that are excluded when computing content hash
include_fields (Optional[Tuple[str]]) – a tuple of field names that are included when computing content hash
- Return type
None
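The effect of exclude_fields and include_fields can be sketched with hashlib and json. Everything here is an assumption made for illustration: the document is a plain dict, content_hash is a hypothetical function, include_fields (when given) restricts the hashed fields while exclude_fields is always applied, and blake2b over sorted JSON stands in for the real hashing scheme.

```python
import hashlib
import json


def content_hash(doc, exclude_fields=('id', 'matches', 'content_hash'),
                 include_fields=None):
    """Hash the selected fields of a dict-like document (hypothetical sketch)."""
    fields = doc.keys() if include_fields is None else include_fields
    selected = {k: doc[k] for k in sorted(fields)
                if k in doc and k not in exclude_fields}
    # Sorted JSON gives a stable byte representation of the selected fields
    payload = json.dumps(selected, sort_keys=True).encode('utf-8')
    return hashlib.blake2b(payload, digest_size=8).hexdigest()


d1 = {'id': '1', 'text': 'hello', 'chunks': ['a', 'b']}
d2 = dict(d1, id='2')          # differs only in an excluded field
d3 = dict(d1, chunks=['a'])    # differs in chunks
print(content_hash(d1) == content_hash(d2), content_hash(d1) == content_hash(d3))
```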
-
class
jina.
NdArray
(proto=None, is_sparse=False, dense_cls=<class 'jina.types.ndarray.dense.numpy.DenseNdArray'>, sparse_cls=<class 'jina.types.ndarray.sparse.scipy.SparseNdArray'>, *args, **kwargs)[source]¶ Bases:
jina.types.ndarray.BaseNdArray
NdArray is one of the primitive data types in Jina.
It offers a Pythonic interface to allow users to access and manipulate a jina.jina_pb2.NdArrayProto object without working with Protobuf itself.
A generic view of the Protobuf NdArray, unifying the view of DenseNdArray and SparseNdArray.
This class should be used in nearly all Jina contexts.
Simple usage:
# start from empty proto
a = NdArray()
# start from an existing proto
a = NdArray(doc.embedding)
# set value
a.value = np.random.random([10, 5])
# get value
print(a.value)
# set value to a TF sparse tensor
a.is_sparse = True
a.value = SparseTensor(...)
print(a.value)
Advanced usage:
NdArray also takes a dense NdArray and a sparse NdArray constructor as arguments. You can consider them as the backend for dense and sparse NdArray. The combination is your choice, it could be:
# numpy (dense) + scipy (sparse)
from .dense.numpy import DenseNdArray
from .sparse.scipy import SparseNdArray
NdArray(dense_cls=DenseNdArray, sparse_cls=SparseNdArray)
# numpy (dense) + pytorch (sparse)
from .dense.numpy import DenseNdArray
from .sparse.pytorch import SparseNdArray
NdArray(dense_cls=DenseNdArray, sparse_cls=SparseNdArray)
# numpy (dense) + tensorflow (sparse)
from .dense.numpy import DenseNdArray
from .sparse.tensorflow import SparseNdArray
NdArray(dense_cls=DenseNdArray, sparse_cls=SparseNdArray)
Once you set sparse_cls, it will only accept data of that particular type. That is, you cannot use an NdArray equipped with Tensorflow sparse to set/get Pytorch or Scipy sparse matrices.
- Parameters
proto (Optional[NdArrayProto]) – the protobuf message; when not given, a new one is created via get_null_proto()
is_sparse (bool) – if the ndarray is sparse, can be changed later
dense_cls (Type[BaseDenseNdArray]) – the to-be-used class for DenseNdArray when is_sparse=False
sparse_cls (Type[BaseSparseNdArray]) – the to-be-used class for SparseNdArray when is_sparse=True
args – additional positional arguments stored as member and used for the parent initialization
kwargs – additional key value arguments stored as member and used for the parent initialization
Set the constructor method.
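The dense/sparse backend selection can be pictured with a small pure-Python sketch. DenseBackend, SparseBackend and ArrayView are hypothetical classes invented for this illustration; they store lists and dicts rather than Protobuf messages, and only show how an is_sparse flag can pick between two pluggable backend classes at assignment time.

```python
class DenseBackend:
    def encode(self, value):
        return {'kind': 'dense', 'data': list(value)}


class SparseBackend:
    def encode(self, value):
        # Keep only the non-zero entries as index -> value
        return {'kind': 'sparse',
                'data': {i: v for i, v in enumerate(value) if v != 0}}


class ArrayView:
    """Hypothetical unified view over a dense and a sparse backend."""

    def __init__(self, is_sparse=False, dense_cls=DenseBackend,
                 sparse_cls=SparseBackend):
        self.is_sparse = is_sparse
        self.dense_cls = dense_cls
        self.sparse_cls = sparse_cls
        self._stored = None

    @property
    def value(self):
        return self._stored

    @value.setter
    def value(self, v):
        # The flag is read at assignment time, so it can be changed later
        backend = self.sparse_cls() if self.is_sparse else self.dense_cls()
        self._stored = backend.encode(v)


a = ArrayView()
a.value = [0, 2, 0]
dense_repr = a.value
a.is_sparse = True
a.value = [0, 2, 0]
sparse_repr = a.value
print(dense_repr, sparse_repr)
```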
-
property
value
¶ Get the value of protobuf and return in corresponding type.
- Returns
value
-
class
jina.
QueryLang
(querylang=None, copy=False)[source]¶ Bases:
jina.types.mixin.ProtoTypeMixin
QueryLang is one of the primitive data types in Jina.
It offers a Pythonic interface to allow users to access and manipulate a jina.jina_pb2.QueryLangProto object without working with Protobuf itself.
To create a QueryLang object from a Dict containing the name of a BaseDriver and the parameters to override, simply:
from jina import QueryLang
ql = QueryLang({'name': 'SliceQL', 'priority': 1, 'parameters': {'start': 3, 'end': 1}})
Warning
The BaseDriver needs to be a QuerySetReader to be able to read the QueryLang
One can also build a QueryLang from a JSON string, bytes, dict or directly from a protobuf object.
A QueryLang object (no matter how it is constructed) can be converted to a protobuf object by using:
# to protobuf object
ql.as_pb_object
- Parameters
querylang (Optional[QueryLangSourceType]) – the query language source to construct from, acceptable types include: jina_pb2.QueryLangProto, bytes, str, Dict, Tuple.
copy (bool) – when querylang is given as a QueryLangProto object, build a view (i.e. weak reference) from it or a deep copy from it.
Set constructor method.
-
property
priority
¶ Get the priority of this query language. The query language only takes effect if it has a higher priority than the internal one with the same name
- Return type
int
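The priority rule can be sketched as a small resolution function. The name resolve_parameters, the dict layout and the choice to merge the overriding parameters on top of the internal ones are assumptions for this example; only the "strictly higher priority wins" rule comes from the description above.

```python
def resolve_parameters(internal_priority, internal_params, ql):
    """Return the parameters a driver would end up using (hypothetical sketch)."""
    if ql['priority'] > internal_priority:
        # The incoming query language outranks the internal settings
        return {**internal_params, **ql['parameters']}
    return dict(internal_params)


internal = {'start': 0, 'end': 10}
high = {'name': 'SliceQL', 'priority': 1, 'parameters': {'end': 5}}
low = {'name': 'SliceQL', 'priority': 0, 'parameters': {'end': 5}}
print(resolve_parameters(0, internal, high), resolve_parameters(0, internal, low))
```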
-
property
name
¶ Get the name of the driver that the query language is attached to.
- Return type
str
-
class
jina.
QueryLangSet
(querylang_protos)[source]¶ Bases:
collections.abc.MutableSequence
QueryLangSet is a mutable sequence of QueryLang. It gives an efficient view of a list of QueryLang. One can iterate over it like a generator but also modify it, count it, and get an item.
- Parameters
querylang_protos (RepeatedCompositeContainer) – A list of QueryLangProto
Set constructor method.
-
insert
(index, ql)[source]¶ Insert ql at index into _querylangs_proto.
- Return type
None
-
append
(value)[source]¶ Append value to _querylangs_proto.
-
jina.
Ranker
¶ alias of
jina.executors.rankers.BaseRanker
-
class
jina.
Request
(request=None, envelope=None, copy=False)[source]¶ Bases:
jina.types.mixin.ProtoTypeMixin
Request is one of the primitive data types in Jina.
It offers a Pythonic interface to allow users to access and manipulate a jina.jina_pb2.RequestProto object without working with Protobuf itself.
A container for serialized jina_pb2.RequestProto that only triggers deserialization and decompression when it receives the first read access to one of its members.
It overrides __getattr__() to provide the same get/set interface as a jina_pb2.RequestProto object.
- Parameters
request (Union[bytes, dict, str, RequestProto, None]) – The request.
envelope (Optional[EnvelopeProto]) – EnvelopeProto object.
copy (bool) – Copy the request if copy is True.
Set constructor method.
- Parameters
request (Union[bytes, dict, str, RequestProto, None]) – request object as bytes, dictionary, string or protobuf instance
envelope (Optional[EnvelopeProto]) – envelope of the request
copy (bool) – if true, request is copied
-
is_used
¶ Return True when the request has been read or written at least once
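The lazy-deserialization pattern behind is_used can be sketched with the standard library. LazyPayload is a hypothetical class, and JSON plus zlib stand in for Protobuf serialization and the real compression; the point is only that the raw bytes are decoded on first attribute access and passed through untouched otherwise.

```python
import json
import zlib


class LazyPayload:
    """Hypothetical lazy container: decode only on first member access."""

    def __init__(self, raw):
        self._raw = raw        # compressed, serialized bytes
        self._decoded = None
        self.is_used = False

    def _load(self):
        if self._decoded is None:
            self._decoded = json.loads(zlib.decompress(self._raw))
            self.is_used = True
        return self._decoded

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._load()[name]
        except KeyError:
            raise AttributeError(name) from None

    def serialize(self):
        # Untouched payloads are forwarded as-is, without re-encoding
        if not self.is_used:
            return self._raw
        return zlib.compress(json.dumps(self._decoded).encode('utf-8'))


raw = zlib.compress(json.dumps({'request_id': 'abc'}).encode('utf-8'))
p = LazyPayload(raw)
untouched = p.serialize() is raw   # no decoding has happened yet
request_id = p.request_id          # first access triggers decoding
print(untouched, request_id, p.is_used)
```

A component that merely forwards a request therefore never pays the cost of decompressing and parsing it.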
-
property
body
¶ Return the request type, raise ValueError if request_type is not set.
- Returns
body property
-
as_typed_request
(request_type)[source]¶ Change the request class according to the one_of value in body.
- Parameters
request_type (str) – string representation of the request type
- Returns
self
-
property
request_type
¶ Return the request body type; when not set yet, return None.
- Return type
Optional[str]
- Returns
request type
-
property
proto
¶ Cast self to a jina_pb2.RequestProto. This will trigger is_used. Laziness will be broken and serialization will be recomputed when calling SerializeToString().
- Return type
RequestProto
- Returns
protobuf instance
-
SerializeToString
()[source]¶ Convert serialized data to string.
- Return type
bytes
- Returns
serialized request
-
property
queryset
¶ Get the queryset in QueryLangSet type.
- Return type
- Returns
query lang set
-
class
jina.
Response
[source]¶ Bases:
object
Response is the Request object returned from the flow. Right now it shares the same representation as Request. At 0.8.12, Response is a simple alias. But it does give more consistent semantics on the client API: send a Request and receive a Response.
-
jina.
Segmenter
¶