Basics#

Tip

An Executor uses docarray.DocumentArray as its input and output data structure. Please read DocArray’s docs first to get an impression of how it works.

An Executor is a self-contained component that performs a group of tasks on a DocumentArray. It encapsulates functions that process DocumentArrays. Inside the Executor, these functions are decorated with @requests. To create an Executor, you only need to follow three principles:

  1. An Executor should subclass directly from the jina.Executor class. An Executor can also be a dataclass.

  2. An Executor class is a bag of functions with shared state or configuration (via self); it can contain an arbitrary number of functions with arbitrary names.

  3. Functions decorated with @requests will be invoked according to their on= endpoint. These functions can be coroutines (async def) or regular functions.
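The three principles can be illustrated with a plain-Python sketch of endpoint-based dispatch. This is not Jina's implementation: `requests_sketch`, `ToyExecutor`, and `dispatch` are hypothetical names made up for this example, and plain lists stand in for DocumentArrays. It only shows how functions tagged with an endpoint can share state via self and be called whether they are regular functions or coroutines.

```python
import asyncio
import inspect


def requests_sketch(on):
    """Hypothetical stand-in for @requests: tag a function with an endpoint."""

    def decorator(func):
        func._endpoint = on  # remember which endpoint this function serves
        return func

    return decorator


class ToyExecutor:
    """Plain class that mimics the three principles; not a jina.Executor."""

    def __init__(self):
        self.seen = []  # shared state, reachable from every function via self
        # Collect every tagged method into an endpoint -> function mapping.
        self.endpoints = {
            member._endpoint: member
            for _, member in inspect.getmembers(self, predicate=inspect.ismethod)
            if hasattr(member, "_endpoint")
        }

    @requests_sketch(on="/index")
    def add_items(self, docs):
        # Regular function: store the incoming items.
        self.seen.extend(docs)
        return docs

    @requests_sketch(on="/search")
    async def find(self, docs):
        # Coroutine: return only the items that were stored earlier.
        await asyncio.sleep(0)
        return [d for d in docs if d in self.seen]


def dispatch(executor, endpoint, docs):
    """Call the function bound to `endpoint`, awaiting it if it is a coroutine."""
    result = executor.endpoints[endpoint](docs)
    if inspect.isawaitable(result):
        result = asyncio.run(result)
    return result


toy = ToyExecutor()
dispatch(toy, "/index", ["a", "b"])
found = dispatch(toy, "/search", ["b", "c"])
```

The function names (`add_items`, `find`) are arbitrary; only the endpoint given to the decorator decides which function handles a request.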

Constructor#

Subclass#

Every new executor should be a subclass of Executor.

You can name your executor class freely.

__init__#

You do not need to implement __init__ if your Executor has no initial state or if it is a dataclass.

If your executor has __init__, it needs to carry **kwargs in the signature and call super().__init__(**kwargs) in the body:

from jina import Executor


class MyExecutor(Executor):
    def __init__(self, foo: str, bar: int, **kwargs):
        super().__init__(**kwargs)
        self.bar = bar
        self.foo = foo

Alternatively, as a dataclass:

from dataclasses import dataclass
from jina import Executor


@dataclass
class MyExecutor(Executor):
    bar: int
    foo: str

What is inside kwargs?

Here, kwargs is reserved for Jina to inject the metas and requests (the request-to-function mapping) values when the Executor is used inside a Flow. When the Executor is a dataclass, Jina injects these parameters in the same way as when a regular Executor calls super().__init__(**kwargs).

You can access the values of these arguments in the __init__ body via self.metas/self.requests/self.runtime_args, or modify their values before passing them to super().__init__().
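As a rough sketch of modifying an injected value before it reaches the base class, the example below uses a hypothetical stand-in base class, `ExecutorSketch`, instead of jina.Executor. It assumes that metas and runtime_args arrive as dictionaries inside kwargs and that the base class turns them into namespaces; treat both as assumptions about the mechanism, not a description of Jina's internals. The default name 'my-executor' is also made up.

```python
from types import SimpleNamespace


class ExecutorSketch:
    """Hypothetical stand-in for jina.Executor, used only for illustration."""

    def __init__(self, metas=None, requests=None, runtime_args=None, **kwargs):
        self.metas = SimpleNamespace(**(metas or {}))
        self.requests = requests or {}
        self.runtime_args = SimpleNamespace(**(runtime_args or {}))


class MyExecutor(ExecutorSketch):
    def __init__(self, foo: str, **kwargs):
        # Adjust an injected value before the base class sees it:
        # supply a default name if none was injected (assumed dict format).
        metas = dict(kwargs.get("metas") or {})
        metas.setdefault("name", "my-executor")
        kwargs["metas"] = metas
        super().__init__(**kwargs)
        # After super().__init__(), the values are available on self.
        self.foo = foo
        self.label = f"{self.metas.name}:{foo}"


ex = MyExecutor(foo="x", runtime_args={"shard_id": 0})
```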

Destructor#

You might need to execute some logic when your Executor’s destructor is called.

For example, you may want to persist data to disk (e.g. in-memory indexed data, a fine-tuned model, …). To do so, you can override the close() method and add your logic.

Jina will make sure that the close() method is executed when the Executor is terminated inside a Flow or when deployed in any cloud-native environment.

You can think of this as Jina using the Executor as a context manager, making sure that the close() method is always executed.

from jina import Executor


class MyExec(Executor):
    def close(self):
        print('closing...')
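The context-manager analogy can be sketched in plain Python. `IndexSketch` below is a hypothetical class, not a jina.Executor, and the JSON file format is chosen only for the example; it shows close() persisting in-memory data and the `with` statement guaranteeing that close() runs.

```python
import json
import os
import tempfile


class IndexSketch:
    """Plain class standing in for an Executor that keeps an in-memory index."""

    def __init__(self, path):
        self.path = path
        self.index = {}

    def close(self):
        # Persist the in-memory index to disk when the object is shut down.
        with open(self.path, "w") as f:
            json.dump(self.index, f)

    # The context-manager protocol guarantees that close() runs on exit,
    # even if the body of the `with` block raises.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.json")
    with IndexSketch(path) as ex:
        ex.index["doc1"] = [0.1, 0.2]
    # close() has already run here, so the data can be read back from disk.
    with open(path) as f:
        restored = json.load(f)
```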

Attributes#

If your Executor overrides __init__, it needs to carry **kwargs in the signature and call super().__init__(**kwargs):

from jina import Executor


class MyExecutor(Executor):
    def __init__(self, foo: str, bar: int, **kwargs):
        super().__init__(**kwargs)
        self.bar = bar
        self.foo = foo

This is important because when an Executor is instantiated in the context of a Flow, Jina adds extra arguments. Some of these arguments can be used when developing the internal logic of the Executor.

These special arguments are workspace, requests, metas, runtime_args.

Alternatively, you can declare your Executor as a dataclass. In this case you do not write a constructor, and Jina injects all of these special arguments without you having to call any specific method.

from dataclasses import dataclass
from jina import Executor


@dataclass
class MyExecutor(Executor):
    bar: int
    foo: str

workspace#

Each Executor has a special workspace that is reserved for that specific Executor instance. The .workspace property contains the path to this workspace.

This workspace is based on the workspace passed when adding the Executor: flow.add(..., workspace='path/to/workspace/'). The final workspace is generated by appending '/<executor_name>/<shard_id>/'.

This can be provided to the Executor via the Python or YAML API.

Default workspace

If the user hasn’t provided a workspace, the Executor uses a default workspace, which is defined in the JINA_DEFAULT_WORKSPACE_BASE environment variable.
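The way the final workspace path is put together can be sketched as follows. `resolve_workspace` is a hypothetical helper written for this example, not a Jina function; it assumes the explicitly passed workspace takes precedence over the environment variable, and the names 'encoder' and '/tmp/jina' are made up.

```python
import os


def resolve_workspace(executor_name, shard_id, workspace=None, env=os.environ):
    """Return the directory for one Executor instance (illustrative only)."""
    # Prefer the explicitly passed workspace, else fall back to the env var.
    base = workspace or env.get("JINA_DEFAULT_WORKSPACE_BASE")
    if not base:
        raise ValueError(
            "no workspace given and JINA_DEFAULT_WORKSPACE_BASE is unset"
        )
    # Append '<executor_name>/<shard_id>' to the base path.
    return os.path.join(base, executor_name, str(shard_id))


# Workspace passed explicitly, as with flow.add(..., workspace=...).
explicit = resolve_workspace("encoder", 0, workspace="path/to/workspace/")

# No workspace passed: the base comes from the environment mapping.
fallback = resolve_workspace(
    "encoder", 1, env={"JINA_DEFAULT_WORKSPACE_BASE": "/tmp/jina"}
)
```

Passing the environment as a parameter keeps the sketch easy to try out without touching the real shell configuration.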

Caution

After you install jina, the JINA_DEFAULT_WORKSPACE_BASE environment variable will be set in your .bashrc, .zshrc, or .fish file.

To change the default Executor workspace on your system, you can change the value of this environment variable. However, if you directly edit the corresponding command in your .bashrc (or .zshrc/.fish) file, your changes will be reverted the next time you install jina on your system.

Instead, you can add export JINA_DEFAULT_WORKSPACE_BASE=$YOUR_WORKSPACE after the # JINA_CLI_END comment.

requests#

By default, an Executor object contains requests as an attribute when loaded from the Flow. This attribute is a Dict describing the mapping between Executor methods and network endpoints: it holds endpoint strings as keys and references to functions as values.

These can be provided to the Executor via the Python or YAML API.

metas#

An Executor object contains metas as an attribute when loaded from the Flow. It is of SimpleNamespace type and contains some key-value information.

The metas are:

  • name: Name given to the Executor;

  • description: Description of the Executor (optional, reserved for future-use in auto-docs);

  • py_modules: List of Python modules needed to import the Executor. Each entry can be a Python package path, e.g. foo.bar.package.module, or a file path to a module needed to import the Executor.

These can be provided to the Executor via the Python or YAML API.
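Since metas is a SimpleNamespace, its entries are read as attributes. The snippet below builds such a namespace by hand purely for illustration; inside a Flow, Jina provides it. The name 'encoder' is made up, and treating a missing description as None is an assumption of this sketch.

```python
from types import SimpleNamespace

# Hand-built stand-in for the metas that Jina would provide.
metas = SimpleNamespace(name="encoder", py_modules=["foo.bar.package.module"])

# Entries are read as attributes.
name = metas.name

# description is optional, so read it defensively with a default.
description = getattr(metas, "description", None)

# vars() exposes the key-value pairs as a regular dict.
as_dict = vars(metas)
```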

runtime_args#

By default, an Executor object contains runtime_args as an attribute when loaded from the Flow. It is of SimpleNamespace type and contains information in key-value format. As the name suggests, runtime_args are determined dynamically at runtime, meaning that you don’t know their values before running the Executor. These values usually relate to the system/network environment around the Executor rather than to the Executor itself, for example shard_id and replicas. They are usually set with the add() method.

The runtime_args are:

  • name: Name given to the Executor. This is dynamically adapted from the name in metas and depends on some additional arguments like shard_id.

  • replicas: Number of replicas of the same Executor deployed with the Flow.

  • shards: Number of shards of the same Executor deployed with the Flow.

  • shard_id: Identifier of the shard corresponding to the given Executor instance.

  • workspace: Path to be used by the Executor. Note that the actual workspace directory used by the Executor is obtained by appending '/<executor_name>/<shard_id>/' to this value.

  • py_modules: Python package path, e.g. foo.bar.package.module, or file path to the modules needed to import the Executor. This is another way to pass py_modules to the Executor from the Flow.

These cannot be provided by the user through any API; they are generated by the Flow orchestration.
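Because these values are only known at runtime, code that reads them may want to tolerate their absence. The sketch below uses a hypothetical helper, `shard_label`, and hand-built namespaces; the assumption that the attributes can be missing (for example when the class is used outside a Flow) is made for this example only.

```python
from types import SimpleNamespace


def shard_label(runtime_args):
    """Describe which shard an instance serves, tolerating missing values."""
    shard_id = getattr(runtime_args, "shard_id", None)
    shards = getattr(runtime_args, "shards", None)
    if shard_id is None or shards is None:
        return "unsharded"
    return f"shard {shard_id} of {shards}"


# Stand-in for values generated by the Flow orchestration.
in_flow = shard_label(SimpleNamespace(shard_id=0, shards=2, replicas=1))

# Stand-in for an instance with no runtime information available.
standalone = shard_label(SimpleNamespace())
```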

See further#