(Beta) DocArray v2
Jina provides early support for DocArray v2, which is a rewrite of DocArray. DocArray v2 makes the dataclass feature of DocArray v1 a first-class citizen, and for this purpose it is built on top of pydantic. An important shift is that DocArray v2 adapts to users’ data, whereas DocArray v1 forces users to adapt to the Document schema.
Warning
DocArray v2 is still in alpha, and its support in Jina is an experimental feature. The API is subject to change.
DocArray v2 schema
At the heart of DocArray v2 is a new schema that is more flexible and expressive than the original DocArray schema.
You can refer to the DocArray v2 readme for more details.
On the Jina side, this flexibility extends to every Executor, where you can now customize input and output schemas:
With DocArray v1 (the version currently used by default in Jina), a Document has a fixed schema and an Executor performs in-place operations on it.
With DocArray v2, an Executor defines its own input and output schemas. DocArray v2 also provides several predefined schemas, such as ImageDoc, that you can use out of the box.
(Beta) New Executor API
To reflect the change with DocArray v2, the Executor API now supports schema definition. The design is inspired by FastAPI.
from jina import Executor, requests
from docarray import BaseDocument, DocumentArray
from docarray.documents import ImageDoc
from docarray.typing import AnyTensor
import numpy as np


class InputDoc(BaseDocument):
    img: ImageDoc


class OutputDoc(BaseDocument):
    embedding: AnyTensor


class MyExec(Executor):
    @requests(on='/bar')
    def bar(
        self, docs: DocumentArray[InputDoc], **kwargs
    ) -> DocumentArray[OutputDoc]:
        docs_return = DocumentArray[OutputDoc](
            [OutputDoc(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return
For our Executor we define:
An input schema InputDoc and an output schema OutputDoc, which are Documents.
The bar endpoint, which takes a DocumentArray of InputDoc as input and returns a DocumentArray of OutputDoc.
Note that the type hint is actually more than just a hint: the Executor uses it to infer the actual schema of the endpoint.
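To illustrate how annotations can drive behavior at runtime, the sketch below reads a function's type hints with the standard library's typing.get_type_hints. It is not Jina's implementation: infer_schemas is a hypothetical helper, and InputDoc and OutputDoc here are plain placeholder classes rather than DocArray Documents, assumed only so the example runs without any dependencies.

```python
from typing import Any, Dict, List, get_type_hints


# Hypothetical stand-ins for Document schemas (plain classes, not DocArray types).
class InputDoc:
    pass


class OutputDoc:
    pass


def infer_schemas(func) -> Dict[str, Any]:
    """Read the annotation of the `docs` parameter and the return annotation."""
    hints = get_type_hints(func)
    return {
        'request_schema': hints.get('docs'),
        'response_schema': hints.get('return'),
    }


def bar(docs: List[InputDoc], **kwargs) -> List[OutputDoc]:
    # One placeholder output per input document.
    return [OutputDoc() for _ in docs]


schemas = infer_schemas(bar)
print(schemas['request_schema'])
print(schemas['response_schema'])
```

The same idea underlies FastAPI-style designs: the annotations are ordinary Python objects that a decorator can inspect when the function is registered.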
You can also explicitly define the schema of the endpoint by using the request_schema and response_schema parameters of the requests decorator:
class MyExec(Executor):
    @requests(
        on='/bar',
        request_schema=DocumentArray[InputDoc],
        response_schema=DocumentArray[OutputDoc],
    )
    def bar(self, docs, **kwargs):
        docs_return = DocumentArray[OutputDoc](
            [OutputDoc(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return
If request_schema and response_schema are not given, the type hint is used to infer the schema. If both the type hint and the explicit parameters exist, request_schema and response_schema are used.
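The precedence rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Jina's code: resolve_schemas is a hypothetical helper, built-in types stand in for Document schemas, and resolving each of the two schemas independently is an assumption the text does not confirm.

```python
from typing import Any, Callable, Optional, Tuple, get_type_hints


def resolve_schemas(
    func: Callable,
    request_schema: Optional[Any] = None,
    response_schema: Optional[Any] = None,
) -> Tuple[Any, Any]:
    """Prefer explicitly passed schemas; fall back to the function's type hints."""
    hints = get_type_hints(func)
    # Assumption: each schema is resolved on its own, so one can be explicit
    # while the other is inferred from the annotations.
    req = request_schema if request_schema is not None else hints.get('docs')
    resp = response_schema if response_schema is not None else hints.get('return')
    return req, resp


# Built-in types are used as placeholder schemas for the illustration.
def bar(docs: list, **kwargs) -> dict:
    return {}


inferred = resolve_schemas(bar)
explicit = resolve_schemas(bar, request_schema=str, response_schema=int)
print(inferred)
print(explicit)
```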
(Beta) Client API
In the client, you similarly specify the schema that you expect the Flow to return. You can pass the return type by using the return_type parameter in the client.post method:
from jina import Deployment

with Deployment(uses=MyExec) as dep:
    docs = dep.post(
        on='/bar',
        inputs=InputDoc(img=ImageDoc(tensor=np.zeros((3, 224, 224)))),
        return_type=DocumentArray[OutputDoc],
    )
    assert docs[0].embedding.shape == (100, 1)
    assert docs.__class__.document_type == OutputDoc