Fluent Interface

Jina provides a simple fluent interface for Document that lets one process (often preprocess) a Document object by chaining methods. For example, to read an image file as a numpy.ndarray, resize it, normalize it and then store it to another file, one can simply do:

from jina import Document

d = (
    Document(uri='apple.png')
    .load_uri_to_image_blob()
    .set_image_blob_shape((64, 64))
    .set_image_blob_normalization()
    .dump_image_blob_to_file('apple1.png')
)
Figure: the original apple.png and the processed apple1.png.
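The chain above works because every method mutates the Document and then returns that same object, so the next method can be called on the result. The sketch below shows this pattern with a hypothetical MiniDoc class holding a plain list of numbers; it is not Jina's Document, and the method names scale, shift and clip are illustrative only.

```python
class MiniDoc:
    """Hypothetical stand-in for a Document, used only to show the pattern."""

    def __init__(self, values):
        self.values = list(values)

    def scale(self, factor):
        # Mutate this object in place...
        self.values = [v * factor for v in self.values]
        # ...and return it so that the next call can be chained.
        return self

    def shift(self, offset):
        self.values = [v + offset for v in self.values]
        return self

    def clip(self, low, high):
        self.values = [min(max(v, low), high) for v in self.values]
        return self


d = MiniDoc([1, 2, 3, 4]).scale(10).shift(-5).clip(0, 30)
print(d.values)  # [5, 15, 25, 30]
```

Returning self from every method is the whole trick: the chain reads left to right in the order the steps are applied.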

Important

Note that chained methods always modify the original Document in place. That means the above example is equivalent to:

from jina import Document

d = Document(uri='apple.png')

(d.load_uri_to_image_blob()
  .set_image_blob_shape((64, 64))
  .set_image_blob_normalization()
  .dump_image_blob_to_file('apple1.png'))
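A consequence of in-place modification is that the object returned by a chain is the original object, not a new one. If the unprocessed version is still needed, copy it first and chain on the copy. The sketch below demonstrates both points with a hypothetical MiniDoc class and the standard copy.deepcopy; whether a Jina Document should be copied this way or with a dedicated method is not covered here.

```python
import copy


class MiniDoc:
    """Hypothetical stand-in for a Document."""

    def __init__(self, values):
        self.values = list(values)

    def scale(self, factor):
        self.values = [v * factor for v in self.values]
        return self


original = MiniDoc([1, 2, 3])
result = original.scale(2)
# The chain returns the very same object, so `original` changed too.
print(result is original, original.values)  # True [2, 4, 6]

# To keep the unprocessed version, chain on a deep copy instead.
pristine = MiniDoc([1, 2, 3])
processed = copy.deepcopy(pristine).scale(2)
print(pristine.values, processed.values)  # [1, 2, 3] [2, 4, 6]
```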

Parallelization

The fluent interface is especially useful when processing a large DocumentArray or DocumentArrayMemmap, where one can leverage map() to process Documents in parallel and speed things up considerably.

The following example compares the time taken to preprocess ~6000 image Documents with a process pool, a thread pool and a plain for-loop.

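from jina import DocumentArray
from jina.logging.profile import TimeContext

# Create one Document per .jpg file in the working directory
docs = DocumentArray.from_files('*.jpg')

def foo(d):
    # Load the pixels, normalize them, then move the channel axis from last to first
    return (d.load_uri_to_image_blob()
            .set_image_blob_normalization()
            .set_image_blob_channel_axis(-1, 0))

# Parallel map using a pool of worker processes
with TimeContext('map-process'):
    for d in docs.map(foo, backend='process'):
        pass

# Parallel map using a pool of threads
with TimeContext('map-thread'):
    for d in docs.map(foo, backend='thread'):
        pass

# Sequential baseline
with TimeContext('for-loop'):
    for d in docs:
        foo(d)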
map-process ...	map-process takes 5 seconds (5.55s)
map-thread ...	map-thread takes 10 seconds (10.28s)
for-loop ...	for-loop takes 18 seconds (18.52s)
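The idea behind a map() with a thread or process backend can be sketched with the standard concurrent.futures module. The helper below, map_docs, is hypothetical and is not Jina's implementation; it only illustrates the general technique of handing each item to a pool and yielding the results in input order. One plausible reason the process backend wins in the timings above is that CPU-bound Python code in a thread pool is serialized by the GIL. Note also that in a process pool like this one, the function runs on pickled copies inside the worker processes, so its changes are visible only through the returned values.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def map_docs(func, docs, backend='thread', num_workers=4):
    """Hypothetical helper: apply `func` to every item in parallel."""
    pool_cls = ProcessPoolExecutor if backend == 'process' else ThreadPoolExecutor
    with pool_cls(max_workers=num_workers) as pool:
        # Executor.map preserves the input order of the results.
        yield from pool.map(func, docs)


def preprocess(pixels):
    # Stand-in for a chain such as load -> normalize -> move channel axis.
    return [p / 255 for p in pixels]


docs = [[0, 255], [51, 102]]
results = list(map_docs(preprocess, docs, backend='thread'))
print(results)  # [[0.0, 1.0], [0.2, 0.4]]
```

With backend='process', func must be picklable (for example, defined at module level), which is a general requirement of process pools.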

Methods

All the following methods can be chained.

Convert

Provide helper functions for Document to support conversion between blob, text and buffer.
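As a rough illustration of what converting between text, buffer and blob involves, the sketch below encodes text to bytes, decodes it back, and reinterprets bytes as a list of integers. The class ConvertibleDoc and its method names are hypothetical and do not mirror Jina's method names; the UTF-8 default is likewise an assumption.

```python
class ConvertibleDoc:
    """Hypothetical document with text, buffer and blob fields."""

    def __init__(self, text=None, buffer=None, blob=None):
        self.text, self.buffer, self.blob = text, buffer, blob

    def text_to_buffer(self, charset='utf-8'):
        # Encode the string into raw bytes.
        self.buffer = self.text.encode(charset)
        return self

    def buffer_to_text(self, charset='utf-8'):
        # Decode raw bytes back into a string.
        self.text = self.buffer.decode(charset)
        return self

    def buffer_to_blob(self):
        # Interpret each byte as an unsigned integer in [0, 255].
        self.blob = list(self.buffer)
        return self


d = ConvertibleDoc(text='hi!').text_to_buffer().buffer_to_blob()
print(d.buffer, d.blob)  # b'hi!' [104, 105, 33]
```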

TextData

Provide helper functions for Document to support text data.

ImageData

Provide helper functions for Document to support image data.
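Image normalization, as used in the examples above, typically scales each pixel to [0, 1] and then standardizes each channel as (pixel / 255 - mean) / std. The sketch below applies that formula to a channel-last image stored as nested lists. The mean and std values are the widely used ImageNet statistics; treating them as the defaults of set_image_blob_normalization is an assumption, and normalize_image is a hypothetical helper.

```python
# Assumed defaults: the common ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_image(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize an H x W x C image of uint8 pixels with the channel axis last."""
    return [
        [
            [(pixel[c] / 255 - mean[c]) / std[c] for c in range(len(pixel))]
            for pixel in row
        ]
        for row in image
    ]


img = [[[255, 0, 128]]]  # a 1 x 1 RGB image
out = normalize_image(img)
print([round(v, 3) for v in out[0][0]])  # [2.249, -2.036, 0.426]
```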

AudioData

Provide helper functions for Document to support audio data.

BufferData

Provide helper functions for Document to handle binary data.

DumpFile

Provide helper functions for Document to dump content to a file.

ContentProperty

Provide helper functions for Document to allow universal content property access.
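Universal content access means reading and writing one property without caring which underlying field holds the data. A minimal sketch, assuming three fields (text, buffer, blob) and dispatch on the Python type of the assigned value; ContentDoc and its content_type property are hypothetical.

```python
class ContentDoc:
    """Hypothetical document exposing one `content` property over three fields."""

    def __init__(self, content=None):
        self.text = self.buffer = self.blob = None
        if content is not None:
            self.content = content

    @property
    def content(self):
        # Return whichever field is set (at most one is set at a time).
        for value in (self.text, self.buffer, self.blob):
            if value is not None:
                return value
        return None

    @content.setter
    def content(self, value):
        # Clear all fields, then store the value in the field matching its type.
        self.text = self.buffer = self.blob = None
        if isinstance(value, str):
            self.text = value
        elif isinstance(value, bytes):
            self.buffer = value
        else:
            self.blob = value

    @property
    def content_type(self):
        for name in ('text', 'buffer', 'blob'):
            if getattr(self, name) is not None:
                return name
        return None


d = ContentDoc('hello')
d.content = b'\x00\x01'
print(d.content_type, d.text)  # buffer None
```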

VideoData

Provide helper functions for Document to support video data.

SingletonSugar

Provide sugary syntax for Document by inheriting methods from DocumentArray.
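One way to give a single Document an array-level method is to wrap the Document in a one-element array, call the array implementation, and return the Document for further chaining. The sketch below shows that delegation with hypothetical MiniArray and MiniDoc classes and a toy match that ranks candidates by absolute difference; it is not Jina's matching logic.

```python
class MiniArray(list):
    """Hypothetical array type whose batch method attaches nearest neighbours."""

    def match(self, candidates, limit=1):
        for doc in self:
            ranked = sorted(candidates, key=lambda c: abs(c.value - doc.value))
            doc.matches = ranked[:limit]
        return self


class MiniDoc:
    def __init__(self, value):
        self.value = value
        self.matches = []

    def match(self, candidates, limit=1):
        # Sugar: wrap this document in a singleton array, reuse the array
        # implementation, then return self so the call stays chainable.
        MiniArray([self]).match(candidates, limit=limit)
        return self


pool = [MiniDoc(1), MiniDoc(5), MiniDoc(9)]
query = MiniDoc(6).match(pool, limit=2)
print([m.value for m in query.matches])  # [5, 9]
```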

MeshData

Provide helper functions for Document to support 3D mesh data and point cloud data.