Basic Concepts
This chapter introduces the basic terminology you will encounter in the docs. But first, let’s look at the code below:
from jina import DocumentArray, Executor, Flow, requests


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'goodbye!'


# Server side: start the Flow and block until it is interrupted.
f = Flow(port=12345).add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2)

with f:
    f.block()

# Client side: run in a separate process while the Flow is serving.
from jina import Client, DocumentArray

c = Client(port=12345)
r = c.post('/', DocumentArray.empty(2))
print(r.texts)
Running it gives you:
['hello, world!goodbye!', 'hello, world!goodbye!']
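The output can be reasoned through with a plain-Python sketch that leaves Jina out entirely. It assumes only what the example shows: each Document starts with an empty text, and the two Executors run one after the other in the order they were added. ToyDoc, foo_exec, bar_exec and run_pipeline are hypothetical stand-ins rather than Jina APIs, and replicas are ignored because they do not change the result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Document; only the text field is modeled."""
    text: str = ''


def foo_exec(docs: List[ToyDoc]) -> None:
    # Mirrors FooExec.add_text: append to every doc in place.
    for d in docs:
        d.text += 'hello, world!'


def bar_exec(docs: List[ToyDoc]) -> None:
    # Mirrors BarExec.add_text.
    for d in docs:
        d.text += 'goodbye!'


def run_pipeline(
    docs: List[ToyDoc], steps: List[Callable[[List[ToyDoc]], None]]
) -> List[str]:
    # Apply each step in the order it was added, like Flow().add(...).add(...).
    for step in steps:
        step(docs)
    return [d.text for d in docs]


texts = run_pipeline([ToyDoc(), ToyDoc()], [foo_exec, bar_exec])
print(texts)
```

Because both steps mutate the same documents in sequence, the second string is always appended after the first.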
What happens underneath is depicted in the following animation:
The following concepts will be covered in the user guide:
- Document
Document is the fundamental data structure in Jina for representing multi-modal and cross-modal data. It is the essential element of IO in Jina. More information can be found in DocArray’s Docs.
- DocumentArray
DocumentArray is a list-like container of multiple Documents. More information can be found in DocArray’s Docs.
- Executor
Executor is a Python class that has a group of functions using DocumentArray as IO. Loosely speaking, each Executor is a microservice.
- Flow
Flow ties multiple Executors together into a logical pipeline to achieve a task. If Executor is a microservice, then Flow is the end-to-end service.
- Gateway
Gateway is the entrypoint of a Flow. It exposes multiple protocols for external communication and routes all internal traffic.
- Client
Client is for connecting to a Gateway and sending data to and receiving data from it.
- Deployment
Deployment is an abstraction around Executor that lets the Gateway communicate with an Executor. It encapsulates and abstracts internal replication details.
- gRPC, WebSocket, HTTP
They are network protocols for transmitting data. gRPC is always used for communication between Gateway and Deployment.
- TLS
TLS is a security protocol designed to facilitate privacy and data security for communications over the Internet. The communication between Client and Gateway can be protected by TLS.
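How a Deployment hides replicas from its caller can be pictured with a small sketch that does not use Jina. Round-robin dispatch is an assumption made purely for illustration, ToyExecutor and ToyDeployment are hypothetical names, and a real Deployment is reached over gRPC rather than through direct method calls.

```python
from itertools import cycle
from typing import List


class ToyExecutor:
    """Hypothetical stand-in for one Executor replica."""

    def __init__(self, replica_id: int) -> None:
        self.replica_id = replica_id

    def handle(self, texts: List[str]) -> List[str]:
        # Tag each text with the replica that processed it.
        return [f'{t}[replica {self.replica_id}]' for t in texts]


class ToyDeployment:
    """Hypothetical wrapper: callers see one endpoint, not the replicas."""

    def __init__(self, replicas: int) -> None:
        # Endless iterator over the replicas, in creation order.
        self._replicas = cycle([ToyExecutor(i) for i in range(replicas)])

    def handle(self, texts: List[str]) -> List[str]:
        # Assumed policy: hand each request to the next replica in turn.
        return next(self._replicas).handle(texts)


deployment = ToyDeployment(replicas=3)
results = [deployment.handle(['doc']) for _ in range(4)]
print(results)
```

The caller only ever talks to the deployment object; which replica served a request is an internal detail, which is the point of the abstraction.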
Two coding styles
In the documentation, you will often see two coding styles for describing a Jina project:
- Pythonic
The Flow and Executors are all written in Python files, and the entrypoint is via Python.
- YAMLish
The Executors are written in Python files, and the Flow is defined in a YAML file. The entrypoint is via the Jina CLI (jina flow --uses flow.yml).
For example, the server-side code above follows the Pythonic style. It can be written in the YAMLish style as follows:
# executor.py
from jina import DocumentArray, Executor, requests


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocumentArray, **kwargs):
        for d in docs:
            d.text += 'goodbye!'

# flow.yml
jtype: Flow
with:
  port: 12345
executors:
- uses: FooExec
  replicas: 3
  py_modules: executor.py
- uses: BarExec
  replicas: 2
  py_modules: executor.py

# Run from the command line:
jina flow --uses flow.yml
The YAMLish style separates the Flow representation from the logic code. It is more flexible to configure and should be used for more complex projects in production. In many integrations, such as JCloud and Kubernetes, the YAMLish style is preferred.
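The separation of representation from logic can be sketched without Jina: the pipeline is described as plain data, and names are resolved to code only when the pipeline is run. In this sketch the dictionary stands in for a parsed flow.yml, and REGISTRY, foo_exec, bar_exec and run are hypothetical names; Jina's actual loading mechanism is not shown.

```python
from typing import Callable, Dict, List


# Logic code: the part that would live in executor.py.
def foo_exec(texts: List[str]) -> List[str]:
    return [t + 'hello, world!' for t in texts]


def bar_exec(texts: List[str]) -> List[str]:
    return [t + 'goodbye!' for t in texts]


# Hypothetical lookup table from the names used in the spec to the code.
REGISTRY: Dict[str, Callable[[List[str]], List[str]]] = {
    'FooExec': foo_exec,
    'BarExec': bar_exec,
}

# Flow representation: plain data, standing in for a parsed flow.yml.
flow_spec = {
    'with': {'port': 12345},
    'executors': [
        {'uses': 'FooExec', 'replicas': 3},
        {'uses': 'BarExec', 'replicas': 2},
    ],
}


def run(spec: dict, texts: List[str]) -> List[str]:
    # Resolve each 'uses' name and apply the steps in declared order.
    for entry in spec['executors']:
        texts = REGISTRY[entry['uses']](texts)
    return texts


out = run(flow_spec, ['', ''])

# Reordering the pipeline needs a change to the data only, not to the logic.
swapped_spec = {'executors': list(reversed(flow_spec['executors']))}
out_swapped = run(swapped_spec, ['', ''])
print(out, out_swapped)
```

Since the pipeline is data, tooling can edit, validate, or deploy it without importing the Executor code, which is one reason a declarative file suits production integrations.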
Note that the two coding styles can be converted to each other easily. To load a Flow YAML into Python and run it:
from jina import Flow

f = Flow.load_config('flow.yml')

with f:
    f.block()
To dump a Flow into YAML:
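from jina import Flow

# Assumes FooExec and BarExec are importable from executor.py, as defined above.
from executor import BarExec, FooExec

Flow().add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2).save_config(
    'flow.yml'
)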
Relationship between Jina and DocArray
DocArray is a crucial upstream dependency of Jina. It is the data structure behind Jina. Without DocArray, Jina cannot run.
DocArray provides a rich set of APIs for local, monolithic development. Jina scales DocArray to the cloud. The picture below shows their relationship.
In a common development journey, a brand-new project first moves horizontally left with DocArray, leveraging all machine learning stacks to improve quality and completing the logic in a local environment. At this point, a POC is built. The project then moves vertically up with Jina, which enhances the POC with a service endpoint, scalability, and cloud-native features. Finally, you reach the point where your service is ready for production.