Migrate Jina 2 to Jina 3#

Jina 3 comes with many improvements, but to benefit from them you will have to make some changes to your existing Jina 2 code.

One of the major changes in Jina 3 is that DocArray is now an external dependency: the Document and DocumentArray data structures, previously bundled with Jina, now form their own library and include new features, improved performance, and increased flexibility.

Accordingly, most of the breaking changes that users will experience when updating to Jina 3 relate to Document and DocumentArray.

DocArray library

DocArray is our new library that includes the Document and DocumentArray data structures. As a standalone library, Document and DocumentArray are faster and more versatile than before, and underpin neural search apps as well as the Jina ecosystem, including Jina and Finetuner.

In general, the breaking changes aim for increased simplicity and consistency, making your life easier in the long run. This guide lists exactly what you will have to adapt.

Simple changes at a glance#

Many of the changes introduced in Jina 3 are easily adapted to a Jina 2 codebase. The modifications in the following table should, in most cases, be safe to perform without further thought or effort.

Jina 2	Jina 3
`doc.blob`	`doc.tensor`
`doc.buffer`	`doc.blob`
`docs.get_attributes('attribute')`	`docs[:, 'attribute']`
`['path1', 'path2']`	`'path1,path2'`
`docs.traverse_flat(paths)`	`docs['@paths']`
`docs.flatten()`	`docs[...]`
`doc.SerializeToString()`	`doc.to_bytes()`
`Document(bytes)`	`Document.from_bytes()`
`from jina import Document, DocumentArray`	`from docarray import Document, DocumentArray`
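
To get a first overview of how much of a codebase is affected, the table can be turned into a small scanner. The following is a minimal sketch using only the Python standard library; the function name `find_jina2_usages` and the regular expressions are illustrative assumptions, not part of Jina or DocArray, and the patterns are rough heuristics that can produce false positives (for example, `.flatten()` on a NumPy array). Note that `.blob` is deliberately not flagged, because it is valid in both versions with a different meaning.

```python
import re

# Jina 2 pattern -> Jina 3 replacement hint, taken from the table above.
# The regexes are illustrative heuristics, not an official migration tool.
JINA2_PATTERNS = [
    (re.compile(r"\.get_attributes\("), "docs[:, 'attribute']"),
    (re.compile(r"\.traverse_flat\("), "docs['@paths']"),
    (re.compile(r"\.flatten\(\)"), "docs[...]"),
    (re.compile(r"\.SerializeToString\(\)"), "doc.to_bytes()"),
    (re.compile(r"\.buffer\b"), "doc.blob"),
    (
        re.compile(r"from\s+jina\s+import\s+.*\b(Document|DocumentArray)\b"),
        "from docarray import Document, DocumentArray",
    ),
]


def find_jina2_usages(source: str):
    """Return (line_number, stripped_line, hint) for each Jina 2 pattern found."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, hint in JINA2_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, line.strip(), hint))
    return findings


# Hypothetical Jina 2 snippet used only to demonstrate the scanner.
legacy = (
    "from jina import Document, DocumentArray\n"
    "texts = docs.get_attributes('text')\n"
    "raw = doc.SerializeToString()\n"
)
for lineno, line, hint in find_jina2_usages(legacy):
    print(f"line {lineno}: {line!r} -> consider {hint}")
```

The scanner only reports candidates; each hit still needs a manual look, since the replacement expressions in the table use placeholder names such as `'attribute'` and `paths`.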

There are, however, some more nuanced changes in Jina 3 as well. These are outlined below.

Document: More natural attribute names and Pythonic serialization#

DocArray introduces more natural naming conventions for Document and DocumentArray attributes.

doc.blob is renamed to doc.tensor, to align with external libraries like PyTorch and TensorFlow
doc.buffer is renamed to doc.blob, to align with the industry standard
doc.SerializeToString() is removed in favour of doc.to_bytes() and doc.to_json()
Creating a Document from serialized data using Document(bytes) is removed in favour of Document.from_bytes(bytes) and Document.from_json(bytes)
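
Because the name `blob` is reused with a new meaning, the order in which the two renames are applied matters. The sketch below illustrates this with a plain-text rewrite using the standard library; `rename_document_attributes` is a hypothetical helper, not a Jina or DocArray function, and it assumes every `.blob` and `.buffer` in the input refers to a Document attribute.

```python
import re

# Order matters: rename .blob -> .tensor first, then .buffer -> .blob.
# The reverse order would turn every former .buffer into .tensor as well.
RENAMES = [
    (re.compile(r"\.blob\b"), ".tensor"),
    (re.compile(r"\.buffer\b"), ".blob"),
]


def rename_document_attributes(source: str) -> str:
    """Apply the two Jina 2 -> Jina 3 attribute renames to a source string."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source


# Hypothetical Jina 2 lines used only for demonstration.
legacy = "doc.blob = image_array\ndoc.buffer = raw_bytes\n"
print(rename_document_attributes(legacy))
```

Such a rewrite should be run exactly once per file: running it a second time would rename the freshly introduced `.blob` to `.tensor`.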

Flow and Client: Simplified `.post()` behavior#

client.post() and flow.post() now return a flattened DocumentArray instead of a list of Responses when no callback function is specified.

.post() can still be configured to return a list of Responses, by passing return_responses=True to the Client or Flow constructors.
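
Jina 2 code often collected documents by looping over the returned responses. The sketch below shows that manual flattening step in plain Python, which is the step that becomes unnecessary when `.post()` returns a flat result. `FakeResponse` is a hypothetical stand-in that models only a `.docs` attribute, and strings stand in for Documents; no Jina API is used, so the snippet runs without a Flow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response object; only .docs is modelled."""

    docs: List[str] = field(default_factory=list)


def flatten_responses(responses: List[FakeResponse]) -> List[str]:
    """Collect the docs of all responses into one flat list, preserving order."""
    return [doc for resp in responses for doc in resp.docs]


# Two responses carrying three documents in total.
responses = [FakeResponse(docs=["doc-a", "doc-b"]), FakeResponse(docs=["doc-c"])]
print(flatten_responses(responses))  # ['doc-a', 'doc-b', 'doc-c']
```

When migrating, a loop of this shape around the result of `.post()` can be dropped, unless return_responses=True is set and the list of Responses is still wanted.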

Consistent YAML parsing syntax#

In Jina 3, YAML syntax is aligned with GitHub Actions notation, which leads to the following changes:

Referencing environment variables using the syntax ${{ VAR }} is no longer allowed. The POSIX notation for environment variables, $var, has been deprecated. Instead, use ${{ ENV.VAR }}.
The syntax ${{ VAR }} now defaults to signifying a context variable, passed in a dict(). If you want to be explicit about the use of context variables, you can use ${{ CONTEXT.VAR }}.
Relative paths can point to other variables within the same .yaml file, and can be referenced using the syntax ${{root.path.to.var}}.

Environment variables vs. relative paths

Note that the only difference between environment variable syntax and relative path syntax is the inclusion of spaces in the former (${{ var }}) and the omission of spaces in the latter (${{path}}).
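
The substitution rules above can be sketched as a small resolver. This is not Jina's parser: `resolve` and its parameters are hypothetical, the regular expressions accept only simple dotted names, and a leading `root` in a relative path is assumed to mean the top level of the YAML document. The sketch only illustrates how the spaced and unspaced forms are told apart.

```python
import re
from typing import Any, Mapping

# Exactly one space inside the braces: environment or context variable.
SPACED = re.compile(r"\$\{\{ (\w+(?:\.\w+)*) \}\}")
# No spaces inside the braces: path to another value in the same document.
UNSPACED = re.compile(r"\$\{\{(\w+(?:\.\w+)*)\}\}")


def resolve(
    value: str,
    env: Mapping[str, str],
    context: Mapping[str, Any],
    document: Mapping[str, Any],
) -> str:
    """Substitute ${{ ... }} expressions in value following the rules above."""

    def spaced(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name.startswith("ENV."):
            return str(env[name[len("ENV."):]])
        if name.startswith("CONTEXT."):
            return str(context[name[len("CONTEXT."):]])
        # A bare name defaults to a context variable.
        return str(context[name])

    def unspaced(match: "re.Match[str]") -> str:
        parts = match.group(1).split(".")
        if parts[0] == "root":  # assumption: 'root' is the document top level
            parts = parts[1:]
        node: Any = document
        for key in parts:
            node = node[key]
        return str(node)

    value = SPACED.sub(spaced, value)
    return UNSPACED.sub(unspaced, value)


# Hypothetical inputs for demonstration.
env = {"MODEL_DIR": "/models"}
context = {"port": 8080}
document = {"executors": {"name": "encoder"}}

print(resolve("${{ ENV.MODEL_DIR }}", env, context, document))
print(resolve("${{ port }}", env, context, document))
print(resolve("${{root.executors.name}}", env, context, document))
```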

Common errors and solutions#

AttributeError: 'Document' object has no attribute 'buffer'

Solution

Replace doc.buffer with doc.blob in your entire codebase

RuntimeError: Could not infer dtype of NoneType while performing doc.embed()

Solution

Replace doc.blob with doc.tensor in your entire codebase

AttributeError: 'DocumentArray' object has no attribute 'get_attributes'

Solution

Replace docs.get_attributes('attribute') with docs[:, 'attribute']

AttributeError: 'Document' object has no attribute 'SerializeToString'

Solution

Replace doc.SerializeToString with doc.to_bytes or doc.to_json

ValueError: Failed to initialize docarray.document.Document from obj=b"..."

Solution

Replace Document(bytes) with Document.from_bytes(bytes)

TypeError: batch() got an unexpected keyword argument 'traversal_paths'

Solution

Replace docs.batch(traversal_paths='path', batch_size=bs) with docs['@path'].batch(batch_size=bs)

TypeError: batch() got an unexpected keyword argument 'require_attr'

Solution

Replace docs.batch(traversal_paths='path', require_attr='attr') with DocumentArray(filter(lambda x: bool(x.attr), docs)).batch(batch_size=bs)

AttributeError: 'Document' object has no attribute 'docs' when operating on the output of flow.post()

Solution

Remove the resp[i].docs access, as flow.post() already returns a DocumentArray by default
