Jina YAML Syntax Reference
Jina configurations use YAML syntax and must have either a .yml or .yaml file extension. If you’re new to YAML and want to learn more, see Learn YAML in five minutes.
Executor YAML Syntax
All executors defined in jina.executors can be loaded from a YAML config via jina.executors.BaseExecutor.load_config() or via the CLI jina pod --uses.
The executor YAML config follows the syntax below.
!BasePbIndexer
with:
  index_filename: doc.gzip
metas:  # <- metas defined in :mod:`jina.executors.metas`
  name: doc_indexer  # a customized name
  workspace: $TEST_WORKDIR
- !SomeExecutorClass
  The class of the executor; it can be any class inherited from jina.executors.BaseExecutor. Note that it must start with ! to tell the YAML parser that the section below describes this class.
- with
  A list of arguments in the __init__() function of this executor. One can use environment variables here; they will be expanded.
- metas
  A list of meta arguments defined in jina.executors.metas.
If an executor has no __init__(), or its __init__() requires no arguments, then one does not need to write with at all.
In the minimal case, if you don’t want to specify any with or metas, you can simply write:
# encoder.yml
!AwesomeExecutor
Or skip the YAML file entirely and simply write:
from jina.executors import BaseExecutor
a = BaseExecutor.load_config('AwesomeExecutor')
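To make the roles of with and metas concrete, here is a minimal, library-free sketch of how a parsed executor config could be turned into an object. It does not use Jina's real loader: ToyIndexer, REGISTRY, and build_executor are hypothetical names, and the config is written as the plain dict a YAML parser would produce. The assumption illustrated is only that with feeds __init__() and metas become attributes.

```python
import os


class ToyIndexer:
    """Hypothetical stand-in for an executor class; not part of Jina."""

    def __init__(self, index_filename="default.bin"):
        self.index_filename = index_filename


# Hypothetical registry mapping a YAML tag (without the leading '!') to a class.
REGISTRY = {"ToyIndexer": ToyIndexer}


def build_executor(tag, config=None):
    """Instantiate the class named by `tag`.

    `with` supplies the __init__ keyword arguments and `metas` are set as
    attributes. Both sections are optional, as in the minimal case.
    """
    config = config or {}
    init_args = config.get("with", {})
    metas = config.get("metas", {})
    obj = REGISTRY[tag](**init_args)
    for key, value in metas.items():
        if isinstance(value, str):
            # Expand environment variables such as $TEST_WORKDIR.
            value = os.path.expandvars(value)
        setattr(obj, key, value)
    return obj


os.environ["TEST_WORKDIR"] = "/tmp/jina-test"
full = build_executor(
    "ToyIndexer",
    {
        "with": {"index_filename": "doc.gzip"},
        "metas": {"name": "doc_indexer", "workspace": "$TEST_WORKDIR"},
    },
)
minimal = build_executor("ToyIndexer")  # no with, no metas
print(full.index_filename, full.name, full.workspace)
print(minimal.index_filename)
```

The second call mirrors the one-line YAML above: with neither with nor metas, the class is built from its defaults.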
CompoundExecutor YAML Syntax
A compound executor is a set of executors bundled together, as defined in jina.executors.compound. It follows the syntax above with an additional feature: routing.
!CompoundExecutor
components:
  - !NumpyIndexer
    with:
      num_dim: -1
      index_key: HNSW32
      index_filename: vec.idx
    metas:
      name: my_vec_indexer
  - !BasePbIndexer
    with:
      index_filename: chunk.gzip
    metas:
      name: chunk_meta_indexer
with:
  routes:
    meta_add:
      chunk_meta_indexer: add
    meta_query:
      chunk_meta_indexer: query
    query:
      my_vec_indexer: query
    add:
      my_vec_indexer: add
metas:
  name: chunk_compound_indexer
  workspace: $TEST_WORKDIR
- components
  A list of executors. Note that metas.name must be specified if you want to reference this executor later in with.routes.
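The routing idea can be sketched without Jina at all. In the toy below, each route name maps to a single component name and the method to call on it, just like the routes block in the example. ToyComponent, ToyCompound, and the call() helper are hypothetical and only illustrate the lookup; how Jina exposes routed methods is not shown here.

```python
class ToyComponent:
    """Hypothetical component that reports which method was invoked."""

    def __init__(self, name):
        self.name = name

    def add(self, item):
        return f"{self.name}.add({item})"

    def query(self, item):
        return f"{self.name}.query({item})"


class ToyCompound:
    """Hypothetical compound that forwards a route to one named component."""

    def __init__(self, components, routes):
        # Components are looked up by name, which is why each needs a name.
        self._by_name = {c.name: c for c in components}
        self._routes = routes

    def call(self, route, *args):
        # Each route holds exactly one {component_name: method_name} pair.
        ((component_name, method_name),) = self._routes[route].items()
        method = getattr(self._by_name[component_name], method_name)
        return method(*args)


routes = {
    "meta_add": {"chunk_meta_indexer": "add"},
    "meta_query": {"chunk_meta_indexer": "query"},
    "query": {"my_vec_indexer": "query"},
    "add": {"my_vec_indexer": "add"},
}
compound = ToyCompound(
    [ToyComponent("my_vec_indexer"), ToyComponent("chunk_meta_indexer")],
    routes,
)
result_add = compound.call("meta_add", "doc1")
result_query = compound.call("query", "vec")
print(result_add)
print(result_query)
```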
Referencing Variables in Executor and CompoundExecutor YAML
In the YAML config, one can reference environment variables with $ENV, or use {path.variable} to reference a variable defined inside the YAML. For example,
components:
  - with:
      index_filename: metaproto
    metas:
      name: test_meta
      good_var:
        - 1
        - 2
      bad_var: '{root.metas.name}'  # expands to the string 'real-compound'
  - with:
      index_filename: npidx
    metas:
      name: test_numpy
      bad_var: '{root.components[0].metas.good_var[1]}'  # expands to 2, the second item of good_var
      float_var: '{root.float.val}'  # expands to the float 0.232
      mixed: '{root.float.val}-{root.components[0].metas.good_var[1]}-{root.metas.name}'  # expands to the string '0.232-2-real-compound'
      mixed_env: '{root.float.val}-$ENV1'  # expands to the string '0.232-a' when ENV1=a
      name_shortcut: '{this.name}'  # expands to the string 'test_numpy'
metas:
  name: real-compound
rootvar: 123
float:
  val: 0.232
- root.var
  Refers to the top-level variable var defined in the root.
- this.var
  Refers to the variable var defined at the same level.
Note
One must quote the string when using referenced values, i.e. write '{root.metas.name}' but not {root.metas.name}.
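The substitution rules can be sketched in a few lines of standard-library Python. This is not Jina's implementation: expand_tree, expand, and _lookup are hypothetical helpers, and the sketch assumes that a value consisting of a single reference keeps the referenced type, while a reference embedded in a longer string is converted to text. It resolves references against the unexpanded config, so chained references are not handled.

```python
import os
import re

# {root.some.path} or {this.some.path}
_REF = re.compile(r"\{(root|this)\.([^{}]+)\}")
# One path step: either a key name or an [index]
_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def _lookup(node, path):
    """Walk a path such as 'components[0].metas.good_var[1]'."""
    for key, index in _STEP.findall(path):
        node = node[int(index)] if index else node[key]
    return node


def expand(value, root, this):
    """Expand {root.x} / {this.x} references and $ENV variables in one value."""
    if not isinstance(value, str):
        return value
    scopes = {"root": root, "this": this}
    whole = _REF.fullmatch(value)
    if whole:
        # A lone reference keeps the type of the referenced value.
        return _lookup(scopes[whole.group(1)], whole.group(2))
    value = _REF.sub(
        lambda m: str(_lookup(scopes[m.group(1)], m.group(2))), value
    )
    return os.path.expandvars(value)


def expand_tree(node, root=None, this=None):
    """Recursively expand every string; `this` is the enclosing mapping."""
    root = node if root is None else root
    if isinstance(node, dict):
        return {k: expand_tree(v, root, node) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_tree(v, root, this) for v in node]
    return expand(node, root, this)


config = {
    "components": [
        {"metas": {"name": "test_meta", "good_var": [1, 2],
                   "bad_var": "{root.metas.name}"}},
        {"metas": {
            "name": "test_numpy",
            "bad_var": "{root.components[0].metas.good_var[1]}",
            "float_var": "{root.float.val}",
            "mixed": "{root.float.val}-{root.components[0].metas.good_var[1]}"
                     "-{root.metas.name}",
            "mixed_env": "{root.float.val}-$ENV1",
            "name_shortcut": "{this.name}",
        }},
    ],
    "metas": {"name": "real-compound"},
    "float": {"val": 0.232},
}
os.environ["ENV1"] = "a"
expanded = expand_tree(config)
second = expanded["components"][1]["metas"]
print(second)
```

The dict above is what a YAML parser would return for the quoted strings in the example, which is also why the quotes matter: an unquoted {root.metas.name} would be parsed as a YAML flow mapping rather than a string.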
Driver YAML Syntax
jina.drivers.Driver helps the jina.executors to handle the network traffic by interpreting the traffic data (e.g. Protobuf) into a format that the Executor can understand and handle (e.g. Numpy array). Drivers can be specified using the keywords requests and on:
!CompoundExecutor
components:
  - !Splitter
    metas:
      name: splitter
  - !Sentencizer
    with:
      min_sent_len: 3
      max_sent_len: 128
      punct_chars: '.,;!?:'
    metas:
      name: sentencizer
metas:
  name: crafter
  workspace: $WORKSPACE
  py_modules: splitter.py
requests:
  on:
    [SearchRequest, IndexRequest]:
      - !CraftDriver
        with:
          executor: splitter
          method: craft
      - !SegmentDriver
        with:
          executor: sentencizer
    ControlRequest:
      - !ControlReqDriver {}
- requests
  - on
    - request_type
      Possible values are QueryRequest, IndexRequest, TrainRequest, or a list of them.
      - !SomeDriverClass
        The class of the driver; it can be any class inherited from jina.drivers.BaseDriver. Note that it must start with ! to tell the YAML parser that the section below describes this class.
      - with
        A list of arguments in the __init__() function of this driver. One can use environment variables here; they will be expanded.
      - metas
        A list of meta arguments defined in jina.executors.metas.
Note
If no drivers are specified in the YAML file, the default drivers defined in the executors.requests.* files at jina.resources will be used.
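One way to picture the on mapping is as a dispatch table from request type to an ordered list of drivers. The sketch below is an illustration only: build_dispatch and resolve are hypothetical helpers, drivers are represented by their class names as plain strings, and a tuple key stands in for a YAML key that lists several request types.

```python
def build_dispatch(on):
    """Flatten an `on` mapping so each request type has its own driver list.

    A key may be one request-type name or a tuple of names, the Python
    analogue of the YAML key [SearchRequest, IndexRequest].
    """
    table = {}
    for request_types, drivers in on.items():
        if isinstance(request_types, str):
            request_types = (request_types,)
        for request_type in request_types:
            # extend() copies the entries, so the lists stay independent.
            table.setdefault(request_type, []).extend(drivers)
    return table


def resolve(on, defaults):
    """Use the configured drivers if any are given, else fall back to defaults."""
    return build_dispatch(on) if on else build_dispatch(defaults)


on = {
    ("SearchRequest", "IndexRequest"): ["CraftDriver", "SegmentDriver"],
    "ControlRequest": ["ControlReqDriver"],
}
# Hypothetical defaults, standing in for the packaged default driver files.
defaults = {"ControlRequest": ["ControlReqDriver"]}

table = resolve(on, defaults)
fallback = resolve({}, defaults)
print(table["SearchRequest"])
print(fallback)
```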
Flow YAML Syntax
jina.flow.Flow can be loaded from a YAML config file. It follows the syntax shown in the example below:
!Flow
with:
  sse_logger: true
pods:
  chunk_seg:
    driver_group: segment
    replicas: 3
  encode1:
    driver_group: index-meta-doc
    replicas: 2
    needs: chunk_seg
  encode2:
    driver_group: index-meta-doc
    replicas: 2
    needs: chunk_seg
  join_all:
    needs: [encode1, encode2]
A valid Flow specification starts with !Flow as the first line.
- with
  A list of arguments in the jina.flow.Flow.__init__() function.
- pods
  A map of jina.peapods.pod.BasePod contained in the flow. The key is the name of the pod and the value is a map of arguments accepted by jina pod. One can refer to a pod by its name in needs.
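The needs entries effectively describe a dependency graph between pods. As a rough illustration, the sketch below derives one valid processing order from the pods map of the example using the standard-library graphlib module (Python 3.9+). start_order is a hypothetical helper, and it assumes a pod without needs has no dependencies, which may differ from how Jina itself wires such pods.

```python
from graphlib import TopologicalSorter


def start_order(pods):
    """Return pod names ordered so that every pod comes after its needs."""
    graph = {}
    for name, args in pods.items():
        needs = args.get("needs", [])
        if isinstance(needs, str):
            # `needs` may name a single pod or a list of pods.
            needs = [needs]
        graph[name] = set(needs)
    # TopologicalSorter expects {node: predecessors}.
    return list(TopologicalSorter(graph).static_order())


pods = {
    "chunk_seg": {"driver_group": "segment", "replicas": 3},
    "encode1": {"driver_group": "index-meta-doc", "needs": "chunk_seg"},
    "encode2": {"driver_group": "index-meta-doc", "needs": "chunk_seg"},
    "join_all": {"needs": ["encode1", "encode2"]},
}
order = start_order(pods)
print(order)
```

The two encode pods depend only on chunk_seg, so their relative order is not fixed; only their position between chunk_seg and join_all is.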
The flows given by the following Python code and the YAML config are identical.
from jina.flow import Flow

f = (Flow(uses='my-driver.yml')
     .add(name='chunk_seg', driver_group='segment',
          uses='preprocess/gif2chunk.yml',
          replicas=3)
     .add(name='doc_idx', driver_group='index-meta-doc',
          uses='index/doc.yml')
     .add(name='tf_encode', driver_group='encode',
          uses='encode/encode.yml',
          replicas=3, needs='chunk_seg')
     .add(name='chunk_idx', driver_group='index-chunk-and-meta',
          uses='index/npvec.yml')
     .join(['doc_idx', 'chunk_idx'])
     )
!Flow  # my-flow.yml
with:
  driver_uses: my-driver.yml
pods:
  chunk_seg:
    driver_group: segment
    exec_uses: preprocess/gif2chunk.yml
    replicas: 3
  doc_idx:
    driver_group: index-meta-doc
    exec_uses: index/doc.yml
  tf_encode:
    driver_group: encode
    exec_uses: encode/encode.yml
    needs: chunk_seg
    replicas: 3
  chunk_idx:
    driver_group: index-chunk-and-meta
    exec_uses: index/npvec.yml
  join_all:
    driver_group: merge
    needs: [doc_idx, chunk_idx]
from jina.flow import Flow
g = Flow.load_config('my-flow.yml')
assert f == g  # passes: the two flows compare equal
Note that you can replace the value of replicas with an environment variable such as $REPLICAS in the YAML, and it will be expanded during load_config().
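As a small illustration of that substitution, the sketch below expands a $REPLICAS placeholder in a pod's argument map and converts the result to an integer. expand_replicas is a hypothetical helper built on os.path.expandvars; it is not what load_config() does internally, and the integer cast is an assumption about how the value is used.

```python
import os


def expand_replicas(pod_args):
    """Return a copy of pod_args with a '$VAR' replicas value expanded to an int."""
    args = dict(pod_args)
    replicas = args.get("replicas")
    if isinstance(replicas, str):
        # os.path.expandvars leaves unknown variables untouched, so int()
        # raises ValueError if the environment variable is not set.
        args["replicas"] = int(os.path.expandvars(replicas))
    return args


os.environ["REPLICAS"] = "3"
pod = expand_replicas({"driver_group": "segment", "replicas": "$REPLICAS"})
print(pod)
```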