Basic Concepts#

This chapter introduces the basic terminology you will encounter in the docs. But first, let’s look at the code below:

from jina import DocumentArray, Executor, Flow, requests


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'goodbye!'


f = Flow(port=12345).add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2)

with f:
    f.block()

Because f.block() keeps the server running, the client code goes in a separate process:

from jina import Client, DocumentArray

c = Client(port=12345)
r = c.post('/', DocumentArray.empty(2))
print(r.texts)

Running it gives you:

['hello, world!goodbye!', 'hello, world!goodbye!']

What happens underneath is depicted in the following animation:

../../_images/arch-overview.svg
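To see why every text ends up as 'hello, world!goodbye!', the data path can be mimicked in plain Python without Jina. This is only a rough sketch: the `Doc` class and the `run_pipeline` helper are hypothetical stand-ins for Document and Flow, and networking, replicas and async execution are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: it only carries text."""
    text: str = ''


def foo_exec(docs):
    # Mirrors FooExec.add_text: append to every document in place.
    for d in docs:
        d.text += 'hello, world!'


def bar_exec(docs):
    # Mirrors BarExec.add_text.
    for d in docs:
        d.text += 'goodbye!'


def run_pipeline(docs, stages):
    """Pass the same documents through each stage, in order."""
    for stage in stages:
        stage(docs)
    return docs


docs = [Doc(), Doc()]  # plays the role of DocumentArray.empty(2)
result = run_pipeline(docs, [foo_exec, bar_exec])
texts = [d.text for d in result]
print(texts)
```

Each stage mutates the documents it receives, so the second stage sees the text the first one wrote. That ordering is what produces the concatenated string.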

The following concepts will be covered in the user guide:

Document#

Document is the fundamental data structure in Jina for representing multi-modal and cross-modal data. It is the essential element of IO in Jina. More information can be found in DocArray’s Docs.

DocumentArray#

DocumentArray is a list-like container of multiple Documents. More information can be found in DocArray’s Docs.

Executor#

Executor is a Python class with a group of functions that take a DocumentArray as input and output. Loosely speaking, each Executor is a microservice.
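One way to picture "a class with a group of functions" is a decorator that marks methods as request handlers, plus a base class that finds the marked methods. The sketch below is a toy built for illustration: `toy_requests`, `ToyExecutor` and `call` are hypothetical names, plain dicts stand in for Documents, and none of this is how Jina implements `@requests`.

```python
import asyncio


def toy_requests(func):
    """Toy marker: flags a method as a request handler (not Jina's decorator)."""
    func._is_handler = True
    return func


class ToyExecutor:
    """Hypothetical base class that discovers its marked handlers."""

    def handlers(self):
        found = []
        for name in dir(self):
            attr = getattr(self, name)
            if callable(attr) and getattr(attr, '_is_handler', False):
                found.append(attr)
        return found


class FooExec(ToyExecutor):
    @toy_requests
    async def add_text(self, docs, **kwargs):
        for d in docs:
            d['text'] += 'hello, world!'


async def call(executor, docs):
    # Await every marked handler with the same documents.
    for handler in executor.handlers():
        await handler(docs)
    return docs


docs = [{'text': ''}, {'text': ''}]
asyncio.run(call(FooExec(), docs))
print([d['text'] for d in docs])
```

The handlers are coroutines, as in the example at the top of this chapter, so the caller awaits them rather than calling them directly.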

Flow#

Flow ties multiple Executors together into a logical pipeline to achieve a task. If Executor is a microservice, then Flow is the end-to-end service.

Gateway#

Gateway is the entrypoint of a Flow. It exposes multiple protocols for external communication and routes all internal traffic.

Client#

Client connects to a Gateway to send data to it and receive data from it.

Deployment#

Deployment is an abstraction around Executor that lets the Gateway communicate with an Executor. It encapsulates and abstracts internal replication details.
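The idea of hiding replication behind a single interface can be sketched in plain Python. Everything below is hypothetical: `Replica` and `ToyDeployment` are illustrative names, dicts stand in for Documents, and round-robin scheduling is an assumption made for this sketch rather than a statement about how Jina balances load.

```python
from itertools import cycle


class Replica:
    """Hypothetical stand-in for one running copy of an Executor."""

    def __init__(self, name):
        self.name = name
        self.handled = 0

    def process(self, docs):
        self.handled += 1
        for d in docs:
            d['text'] += 'hello, world!'
        return docs


class ToyDeployment:
    """Hides the replicas behind a single `send` call.

    Round-robin dispatch is an assumption made for this sketch.
    """

    def __init__(self, replicas):
        self._replicas = [Replica(f'replica-{i}') for i in range(replicas)]
        self._next = cycle(self._replicas)

    def send(self, docs):
        # The caller never picks a replica; the deployment does.
        return next(self._next).process(docs)

    def load(self):
        # Number of requests each replica has handled so far.
        return [r.handled for r in self._replicas]


deployment = ToyDeployment(replicas=3)
for _ in range(6):
    deployment.send([{'text': ''}])
print(deployment.load())
```

The caller only ever talks to `deployment.send`, which is the role the Deployment plays for the Gateway: the number of replicas can change without the caller noticing.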

gRPC, Websocket, HTTP#

These are network protocols for transmitting data. Communication between the Gateway and a Deployment always uses gRPC.

TLS#

TLS is a security protocol designed to facilitate privacy and data security for communications over the Internet. The communication between Client and Gateway can be protected by TLS.

Two coding styles#

In the documentation, you often see two coding styles when describing a Jina project:

Pythonic#

The Flow and Executors are all written in Python files, and the entrypoint is via Python.

YAMLish#

The Executors are written in Python files, and the Flow is defined in a YAML file. The entrypoint is the Jina CLI: jina flow --uses flow.yml.

For example, the server-side code above follows the Pythonic style. It can be written in the YAMLish style as follows, with the Executors in executor.py and the Flow in flow.yml:

from jina import DocumentArray, Executor, requests


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'goodbye!'

flow.yml:

jtype: Flow
with:
  port: 12345
executors:
- uses: FooExec
  replicas: 3
  py_modules: executor.py
- uses: BarExec
  replicas: 2
  py_modules: executor.py

Then start the Flow from the command line:

jina flow --uses flow.yml

The YAMLish style separates the Flow representation from the logic code. It is more flexible to configure and should be used for more complex projects in production. In many integrations, such as JCloud and Kubernetes, the YAMLish style is preferred.

Note that the two coding styles can be converted to each other easily. To load a Flow YAML into Python and run it:

from jina import Flow

f = Flow.load_config('flow.yml')

with f:
    f.block()

To dump a Flow into YAML:

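from jina import Flow

# Assumes FooExec and BarExec are defined in executor.py, as in the YAMLish example
from executor import BarExec, FooExec

Flow().add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2).save_config(
    'flow.yml'
)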
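The two styles convert easily because each keyword argument in the Pythonic style corresponds to one key in the YAML file. The sketch below makes that mapping explicit with a hypothetical helper, `flow_to_yaml`, which renders a description of the Flow as YAML text. It is not Jina's `save_config`, and the file that `save_config` actually writes may contain other fields.

```python
def flow_to_yaml(port, executors):
    """Render a Flow description as YAML text (hypothetical helper)."""
    lines = ['jtype: Flow', 'with:', f'  port: {port}', 'executors:']
    for ex in executors:
        items = list(ex.items())
        first_key, first_val = items[0]
        # The first key opens a new list entry; the rest are indented under it.
        lines.append(f'- {first_key}: {first_val}')
        for key, val in items[1:]:
            lines.append(f'  {key}: {val}')
    return '\n'.join(lines)


spec = flow_to_yaml(
    port=12345,
    executors=[
        {'uses': 'FooExec', 'replicas': 3, 'py_modules': 'executor.py'},
        {'uses': 'BarExec', 'replicas': 2, 'py_modules': 'executor.py'},
    ],
)
print(spec)
```

The printed text matches the flow.yml shown earlier in this section: the arguments of `Flow(...)` land under `with:`, and each `.add(...)` call becomes one entry under `executors:`.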

Relationship between Jina and DocArray#

DocArray is a crucial upstream dependency of Jina. It is the data structure behind Jina. Without DocArray, Jina cannot run.

DocArray provides a rich set of APIs for local, monolithic development. Jina scales DocArray to the cloud. The picture below shows their relationship.

../../_images/docarray-jina.svg

In a common development journey, a brand-new project first moves horizontally left with DocArray, leveraging machine learning stacks to improve quality and completing the logic in a local environment. At this point, a POC is built. The project then moves vertically up with Jina, which enhances the POC with service endpoints, scalability and cloud-native features. Finally, you reach the point where your service is ready for production.