(Beta) DocArray v2#
Jina provides early support for DocArray v2, which is a rewrite of DocArray. DocArray v2 makes the dataclass feature of DocArray v1 a first-class citizen, and for this purpose it is built on top of pydantic. An important shift is that DocArray v2 adapts to users’ data, whereas DocArray v1 forces users to adapt to the Document schema.
Beta support: DocArray v2 is still in alpha. Its support in Jina is an experimental feature, and the API is subject to change.
DocArray v2 schema#
At the heart of DocArray v2 is a new schema that is more flexible and expressive than the original DocArray schema.
You can refer to the DocArray v2 readme for more details.
On the Jina side, this flexibility extends to every Executor, where you can now customize input and output schemas:
With DocArray v1 (the version currently used by default in Jina), a Document has a fixed schema and an Executor performs in-place operations on it.
With DocArray v2, an Executor defines its own input and output schemas. DocArray v2 also provides several predefined schemas that you can use out of the box.
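The contrast between a fixed schema and a user-defined one can be sketched with the standard library alone. This is only an illustration of the idea, not DocArray code: `FixedDocument` and `ProductDoc` are hypothetical stand-ins, and plain dataclasses are used in place of the pydantic-based classes that DocArray v2 builds on.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


# v1 style (hypothetical stand-in): one general-purpose schema for all data.
@dataclass
class FixedDocument:
    text: Optional[str] = None
    tensor: Optional[List[float]] = None
    tags: Dict[str, Any] = field(default_factory=dict)


# v2 style (hypothetical stand-in): the user declares a schema that mirrors the data.
@dataclass
class ProductDoc:
    title: str
    price: float
    image_url: str


# With a fixed schema, domain-specific fields end up in a catch-all such as `tags`.
v1_doc = FixedDocument(text="Red mug", tags={"price": 9.5, "image_url": "mug.png"})

# With a user-defined schema, every field is named and typed.
v2_doc = ProductDoc(title="Red mug", price=9.5, image_url="mug.png")

# The declared schema can be inspected programmatically.
v2_field_names = [f.name for f in fields(ProductDoc)]
```

In the second form the schema itself documents what an Executor consumes or produces, which is what makes per-Executor input and output schemas possible.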
(Beta) New Executor API#
To reflect the change with DocArray v2, the Executor API now supports schema definition. The design is inspired by FastAPI.
```python
from jina import Executor, requests
from docarray import BaseDocument, DocumentArray
from docarray.documents import ImageDoc
from docarray.typing import AnyTensor
import numpy as np


class InputDoc(BaseDocument):
    img: ImageDoc


class OutputDoc(BaseDocument):
    embedding: AnyTensor


class MyExec(Executor):
    @requests(on='/bar')
    def bar(
        self, docs: DocumentArray[InputDoc], **kwargs
    ) -> DocumentArray[OutputDoc]:
        docs_return = DocumentArray[OutputDoc](
            [OutputDoc(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return
```
For our Executor we define:
An input schema InputDoc and an output schema OutputDoc, which are Documents.
A bar endpoint, which takes a DocumentArray of InputDoc as input and returns a DocumentArray of OutputDoc.
Note that the type hint is more than just a hint: the Executor uses it to infer the actual schema of the endpoint.
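How annotations can double as a schema declaration is easiest to see in a small standalone sketch. The code below is not Jina's implementation; it only assumes that inference works by reading the annotation of the `docs` parameter and the return annotation. `infer_schemas` is a hypothetical helper, and plain dataclasses and `List[...]` stand in for Documents and DocumentArray.

```python
from dataclasses import dataclass
from typing import Callable, List, get_type_hints


@dataclass
class InputDoc:
    url: str


@dataclass
class OutputDoc:
    embedding: List[float]


def infer_schemas(func: Callable) -> dict:
    """Read the request/response schema from a function's annotations (hypothetical helper)."""
    hints = get_type_hints(func)
    return {"request": hints.get("docs"), "response": hints.get("return")}


def bar(docs: List[InputDoc], **kwargs) -> List[OutputDoc]:
    # One fixed-size embedding per input document.
    return [OutputDoc(embedding=[0.0] * 4) for _ in docs]


schemas = infer_schemas(bar)
```

Because the annotations are ordinary Python objects available at runtime, a framework can inspect them when the endpoint is registered rather than treating them as documentation only.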
You can also explicitly define the schema of the endpoint by using the request_schema and response_schema parameters of the requests decorator:
```python
class MyExec(Executor):
    @requests(
        on='/bar',
        request_schema=DocumentArray[InputDoc],
        response_schema=DocumentArray[OutputDoc],
    )
    def bar(self, docs, **kwargs):
        docs_return = DocumentArray[OutputDoc](
            [OutputDoc(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return
```
If there is no request_schema or response_schema, the type hint is used to infer the schema. If both an explicit schema and a type hint exist, the explicit request_schema and response_schema will be used.
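The precedence rule, where an explicit schema wins over a type hint, can be sketched as a small resolution function. This is an illustration under assumptions, not Jina's code: `resolve_schemas` is a hypothetical helper, and it resolves the request and response sides independently, a case the text above does not spell out.

```python
from typing import Callable, List, get_type_hints


def resolve_schemas(func: Callable, request_schema=None, response_schema=None):
    """Prefer explicitly passed schemas; fall back to the function's type hints."""
    hints = get_type_hints(func)
    return (
        request_schema if request_schema is not None else hints.get("docs"),
        response_schema if response_schema is not None else hints.get("return"),
    )


def bar(docs: List[int], **kwargs) -> List[float]:
    return [float(d) for d in docs]


# No explicit schemas: both are inferred from the annotations.
inferred = resolve_schemas(bar)

# Explicit schemas override the annotations.
explicit = resolve_schemas(bar, request_schema=List[str], response_schema=List[bytes])
```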
(Beta) Client API#
In the client, you similarly specify the schema that you expect the Flow to return. You can pass the return type by using the return_type parameter in the post method:
```python
from jina import Deployment

with Deployment(uses=MyExec) as dep:
    docs = dep.post(
        on='/bar',
        inputs=InputDoc(img=ImageDoc(tensor=np.zeros((3, 224, 224)))),
        return_type=DocumentArray[OutputDoc],
    )
    assert docs.embedding.shape == (100, 1)
    assert docs.__class__.document_type == OutputDoc
```
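One plausible reason a client needs a return type is that results arrive in an untyped, serialized form and have to be parsed back into objects. The sketch below illustrates that idea with the standard library only. `fake_server` and this `post` function are hypothetical and do not reflect how Jina's client is implemented.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Type, TypeVar

T = TypeVar("T")


@dataclass
class OutputDoc:
    embedding: List[float]


def fake_server(payload: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical remote endpoint: returns untyped, JSON-like dicts."""
    return [{"embedding": [0.0, 0.0, 0.0]} for _ in payload]


def post(inputs: List[Dict[str, Any]], return_type: Type[T]) -> List[T]:
    """Send inputs and parse each raw result into `return_type`."""
    raw = fake_server(inputs)
    return [return_type(**item) for item in raw]


docs = post(inputs=[{"url": "a.png"}], return_type=OutputDoc)
```

Without `return_type`, the caller in this sketch would be left with plain dictionaries and no guarantee about which fields they contain.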
See the Pydantic documentation for more details on the schema definition.