DocArray support#

Jina depends heavily on DocArray to provide the data that is processed inside Jina Executors and sent by our Clients. Recently, DocArray was heavily refactored for version 0.30.

Starting from that version, DocArray usage has changed drastically. Jina nevertheless works seamlessly with either generation of DocArray: it automatically detects the installed docarray version and uses the corresponding methods and APIs. Developers must still take into account that some APIs and usages have changed, especially when developing Executors.

The new version makes the dataclass feature of DocArray<0.30 a first-class citizen, and for this purpose it is built on top of Pydantic. An important shift is that the new DocArray adapts to users’ data, whereas DocArray<0.30 forces users to adapt to the Document schema.

Document schema#

At the heart of DocArray>=0.30 is a new schema that is more flexible and expressive than the original DocArray schema.

You can refer to the DocArray README for more details. Note that the names of the data structures also change in the new version of DocArray: for example, Document becomes BaseDoc and DocumentArray becomes DocList.

On the Jina side, this flexibility extends to every Executor, where you can now customize input and output schemas:

  • With DocArray<0.30, a Document has a fixed schema in both the input and the output.

  • With DocArray>=0.30 (the version currently used by default in Jina), an Executor defines its own input and output schemas. DocArray also provides several predefined schemas that you can use out of the box.

Executor API#

To reflect the changes in DocArray>=0.30, the Executor API supports schema definition. The design is inspired by FastAPI.

The main difference is that with docarray<0.30 there is only a single Document with a fixed schema, whereas with docarray>=0.30 users need to define their own Document by subclassing BaseDoc or by using one of the predefined Document types provided.

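With docarray>=0.30:

from jina import Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import ImageDoc
from docarray.typing import AnyTensor

import numpy as np


# Schema of the documents the Executor receives
class InputDoc(BaseDoc):
    img: ImageDoc


# Schema of the documents the Executor returns
class OutputDoc(BaseDoc):
    embedding: AnyTensor


class MyExec(Executor):
    # The type annotations declare the input and output schemas of the endpoint
    @requests(on='/bar')
    def bar(
        self, docs: DocList[InputDoc], **kwargs
    ) -> DocList[OutputDoc]:
        # Return one OutputDoc per input document, each with a placeholder embedding
        docs_return = DocList[OutputDoc](
            [OutputDoc(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return

With docarray<0.30:

from jina import Executor, requests
from docarray import Document, DocumentArray

import numpy as np


class MyExec(Executor):
    # Input and output both use the single, fixed Document schema
    @requests(on='/bar')
    def bar(
        self, docs: DocumentArray, **kwargs
    ):
        # Return one Document per input document, each with a placeholder embedding
        docs_return = DocumentArray(
            [Document(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return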

To ease the transition from the old to the new docarray versions, there is LegacyDocument, a predefined Document that aims to provide the same data type as the original Document in docarray<0.30.

Client API#

In the client, the big change is that with docarray>=0.30 you specify the schema that you expect the Deployment or Flow to return. You pass this return type through the return_type parameter of the client.post method:

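With docarray>=0.30:

from jina import Client
from docarray import DocList, BaseDoc
from docarray.documents import ImageDoc
from docarray.typing import AnyTensor


# Schema of the documents sent to the service
class InputDoc(BaseDoc):
    img: ImageDoc


# Schema of the documents expected back
class OutputDoc(BaseDoc):
    embedding: AnyTensor


c = Client(host='')
# return_type tells the client which schema to use for the response
c.post('/', DocList[InputDoc]([InputDoc(img=ImageDoc()) for _ in range(10)]), return_type=DocList[OutputDoc])

With docarray<0.30:

from jina import Client
from docarray import DocumentArray, Document

c = Client(host='')
# No return type is needed: responses always use the fixed Document schema
c.post('/', DocumentArray([Document() for _ in range(10)]))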

See also#