# Guidelines for Adding a New Executor

New deep learning model? New indexing algorithm? When the existing executors/drivers do not fit your requirements and you cannot find a suitable one on Jina Hub, you can simply extend Jina to do what you need without touching the Jina codebase.

In this chapter, we walk through the guidelines for extending `jina.executors.BaseExecutor`. Generally speaking, the steps are as follows:

1. Decide which Executor class to inherit from;

2. Override `__init__()` and `post_init()`;

3. Override the core method of the base class;

4. (Optional) implement the persistence logic.

## Decide which Executor class to inherit from

The list of executors supported by the current version of Jina can be found here. As you can see, all executors inherit from `jina.executors.BaseExecutor`. So should your extension inherit directly from `BaseExecutor` as well? In general, no. As a rule of thumb, inherit from the executor whose logic is closest to yours.

If your algorithm is so unique that it does not fit any of the categories below, you may want to submit an issue for discussion before you start.

Note

| Inherit from class X | when … |
|---|---|
| `jina.executors.encoders.BaseEncoder` | You want to represent the chunks as vector embeddings. |
| `jina.executors.indexers.BaseIndexer` | You want to save and retrieve vectors and key-value information from storage. |
| `jina.executors.crafters.BaseCrafter` | You want to segment/transform the documents and chunks. |
| `jina.executors.crafters.BaseDocCrafter` | You want to transform the documents by modifying some fields. |
| `jina.executors.crafters.BaseChunkCrafter` | You want to transform the chunks by modifying some fields. |
| `jina.executors.crafters.BaseSegmenter` | You want to segment the documents into chunks. |
| `jina.executors.Chunk2DocRanker` | You want to rank the documents based on the scores of their matched chunks. |
| `jina.executors.CompoundExecutor` | You want to combine multiple executors in one. |
| `jina.executors.BaseClassifier` | You want to enrich the documents and chunks with a classifier. |

## Override `__init__()` and `post_init()`

### Override `__init__()`

You can put simple-type attributes that define the behavior of your Executor into `__init__()`. Simple types are all pickle-able types, including integers, booleans, strings, and tuples, lists, and dicts of simple types. For example:

```python
from jina.executors.crafters import BaseSegmenter


class GifPreprocessor(BaseSegmenter):
    def __init__(self, img_shape: int = 96, every_k_frame: int = 1, max_frame: int = None,
                 from_bytes: bool = False, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.img_shape = img_shape
        self.every_k_frame = every_k_frame
        self.max_frame = max_frame
        self.from_bytes = from_bytes
```


Remember to call `super().__init__(*args, **kwargs)` in your `__init__()`. Only then do you get the features provided by the base class (and ultimately `BaseExecutor`), such as YAML support and persistence.

Note

All attributes declared in `__init__()` are persisted by `save()` and restored by `load()`.
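Because "simple type" here effectively means "pickle-able", one quick way to decide whether a value belongs in `__init__()` or in `post_init()` is to try a pickle round trip. The helper below is a minimal, hypothetical sketch using only the standard library; `is_pickleable` is not part of Jina, and the candidate values are illustrative.

```python
import pickle
import threading


def is_pickleable(value) -> bool:
    """Hypothetical helper: True if `value` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(value))
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        # Locks, lambdas, sessions, gRPC stubs, ... end up here.
        return False


candidates = {
    'img_shape': 96,
    'from_bytes': False,
    'shape': (96, 96),
    'frames': [0, 5, 10],
    'options': {'max_frame': None},
    'lock': threading.Lock(),  # not pickle-able: belongs in post_init()
    'callback': lambda x: x,   # not pickle-able either
}
for name, value in candidates.items():
    print(f'{name:10s} -> {"__init__()" if is_pickleable(value) else "post_init()"}')
```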

### Override `post_init()`

What if the data you need to load is not of a simple type, for example a deep learning graph, a large pretrained model, a gRPC stub, a TensorFlow session, or a thread? Then you can put it into `post_init()`.

Another scenario is when you know a better persistence method than pickle. For example, a hyperparameter matrix stored as a NumPy `ndarray` is certainly pickle-able. However, you can simply read and write it via standard file IO, which is likely more efficient than pickle. In this case, load the data in `post_init()`.

Here is an example:

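```python
from jina.executors.encoders import BaseTextEncoder


class MyTextEncoder(BaseTextEncoder):  # hypothetical class name
    def __init__(self,
                 model_name: str = 'ernie_tiny',
                 max_length: int = 128,
                 *args,
                 **kwargs):
        super().__init__(*args, **kwargs)
        # simple, pickle-able attributes
        self.model_name = model_name
        self.max_length = max_length

    def post_init(self):
        # the model is not a simple type, so it is built here;
        # paddlehub is imported lazily (see the note below)
        import paddlehub as hub
        self.model = hub.Module(name=self.model_name)
        self.model.MAX_SEQ_LEN = self.max_length
```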
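To see why the split matters across a restart, here is a minimal, self-contained sketch of the idea. `ToyExecutor` is a hypothetical stand-in, not Jina's `BaseExecutor`: it calls `post_init()` after `__init__()` has finished, leaves the attributes created in `post_init()` out of the pickled state, and calls `post_init()` again when that state is restored. How closely this mirrors Jina's own implementation is an assumption. A `threading.Lock` plays the role of the model because it cannot be pickled.

```python
import pickle
import threading


class _PostInitMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)  # run __init__() as usual ...
        obj.post_init()  # ... then build the non-simple state
        return obj


class ToyExecutor(metaclass=_PostInitMeta):
    """Hypothetical stand-in for BaseExecutor; NOT Jina's implementation."""

    _post_init_vars = ()  # attributes created in post_init(), never pickled

    def __init__(self, *args, **kwargs):
        pass

    def post_init(self):
        pass

    def __getstate__(self):
        # Keep only the simple attributes set in __init__().
        return {k: v for k, v in self.__dict__.items() if k not in self._post_init_vars}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.post_init()  # rebuild what __getstate__() dropped

    def dumps(self) -> bytes:
        return pickle.dumps(self.__getstate__())

    @classmethod
    def loads(cls, blob: bytes):
        obj = cls.__new__(cls)  # bypass __init__(); the state comes from the blob
        obj.__setstate__(pickle.loads(blob))
        return obj


class ToyModelEncoder(ToyExecutor):
    _post_init_vars = ('model',)

    def __init__(self, model_name: str = 'tiny', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model_name = model_name

    def post_init(self):
        # A lock cannot be pickled; it stands in for a model, session or gRPC stub.
        self.model = threading.Lock()


enc = ToyModelEncoder(model_name='ernie_tiny')
clone = ToyModelEncoder.loads(enc.dumps())
print(clone.model_name, clone.model is not enc.model)  # ernie_tiny True
```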


Note

`post_init()` is also a good place to import package dependencies, e.g. `import x` or `from x import y`. Naively, one would put all imports at the top of the file. However, this raises a `ModuleNotFoundError` whenever the package is not installed locally, and a single missing dependency can then break the whole system.

As a rule of thumb, import packages only where you really need them. Often these dependencies are required only in `post_init()` and in the core method, which we cover next.
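The difference is easy to see with a dependency that is not installed. In this sketch, `surely_not_installed_pkg` is a deliberately nonexistent, hypothetical module name and `LazyImportEncoder` is a hypothetical class: defining and constructing it works, and the `ModuleNotFoundError` only surfaces when `post_init()` actually needs the package.

```python
class LazyImportEncoder:
    """Hypothetical encoder illustrating a deferred import."""

    def __init__(self, model_name: str = 'tiny'):
        self.model_name = model_name  # no third-party import needed here

    def post_init(self):
        # Import only where the dependency is really needed. A module-level
        # import would already fail when this file is imported.
        import surely_not_installed_pkg  # hypothetical, nonexistent package
        self.model = surely_not_installed_pkg.load(self.model_name)


enc = LazyImportEncoder()  # fine: nothing has been imported yet
missing = None
try:
    enc.post_init()
except ModuleNotFoundError as e:
    missing = e.name
print('missing dependency:', missing)  # missing dependency: surely_not_installed_pkg
```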

## Override the core method of the base class

Each Executor has a core method, which defines its algorithmic behavior. To make your own extension, you have to override the core method. The following table lists the core methods you may want to override. Note that some executors have multiple core methods.

| Base class | Core method(s) |
|---|---|
| `BaseEncoder` | `encode()` |
| `BaseCrafter` | `craft()` |
| `BaseIndexer` | `add()`, `query()` |
| `BaseRanker` | `score()` |
| `BaseClassifier` | `predict()` |
| `BaseEvaluator` | `evaluate()` |

Feel free to override other methods/properties as you need. But honestly, most extensions can be done by simply overriding the core methods listed above, nothing more. You can read the source code of our executors for details.
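As an illustration of overriding nothing but the core method, here is a toy encoder. To keep it runnable without Jina, `ToyBaseEncoder` is a hypothetical stand-in for `BaseEncoder`; the assumption that `encode()` takes a batch of texts and returns one embedding per input is ours, so check the signature of the base class you actually inherit from. The hashing-trick encoding is only a placeholder for a real model.

```python
import hashlib
import math
from typing import List, Sequence


class ToyBaseEncoder:
    """Hypothetical stand-in for BaseEncoder; NOT Jina's implementation."""

    def __init__(self, *args, **kwargs):
        pass

    def encode(self, data: Sequence[str], *args, **kwargs) -> List[List[float]]:
        raise NotImplementedError  # the core method every subclass overrides


class HashingTextEncoder(ToyBaseEncoder):
    """Maps each text to a fixed-size, L2-normalised bag-of-words vector."""

    def __init__(self, dim: int = 8, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dim = dim

    def encode(self, data, *args, **kwargs):
        embeddings = []
        for text in data:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # A stable hash (unlike built-in hash()) keeps results reproducible.
                h = int(hashlib.md5(token.encode()).hexdigest(), 16)
                vec[h % self.dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by 0
            embeddings.append([v / norm for v in vec])
        return embeddings


vectors = HashingTextEncoder(dim=8).encode(['hello jina', 'hello world'])
print(len(vectors), len(vectors[0]))  # 2 8
```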

## Implement the persistence logic

If you don’t override `post_init()`, then you don’t need to implement persistence logic. You get YAML and persistence support out of the box from `BaseExecutor`. Simple crafters and rankers fall into this category.

If you override `post_init()` but don’t care about persisting its state for the next run (i.e. when the executor process is restarted), or the state simply does not change during the run, then you don’t need to implement persistence logic either. Loading a fixed pretrained deep learning model falls into this category.

Persistence logic is only required when you implement customized loading logic in `post_init()` and the state changes during the run. In that case, you need to override `__getstate__()`. Many of the indexers fall into this category.

In the example below, the tokenizer is loaded in `post_init()` and saved in `__getstate__()`, which completes the persistence cycle.

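```python
from jina.executors.encoders import BaseEncoder


class CustomizedEncoder(BaseEncoder):

    def post_init(self):
        # tokenizer_dict, model_name, _tmp_model_path and model_abspath are
        # assumed to be defined elsewhere (module level, __init__() or base class)
        self.tokenizer = tokenizer_dict[self.model_name].from_pretrained(self._tmp_model_path)

    def __getstate__(self):
        # write the (possibly updated) tokenizer to disk before pickling the rest
        self.tokenizer.save_pretrained(self.model_abspath)
        return super().__getstate__()
```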
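The same cycle can be shown end to end without Jina. In this sketch, `ToyExecutor` is again a hypothetical stand-in for `BaseExecutor` (not its real implementation), and `ToyKVIndexer` is a hypothetical indexer whose store grows via `add()` during the run. Its `__getstate__()` first writes the store to a JSON file with plain file IO and then defers to the base class, while `post_init()` reads the file back on load.

```python
import json
import os
import pickle
import tempfile


class _PostInitMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        obj.post_init()  # post_init() runs after __init__() finishes
        return obj


class ToyExecutor(metaclass=_PostInitMeta):
    """Hypothetical stand-in for BaseExecutor; NOT Jina's implementation."""

    _post_init_vars = ()

    def __init__(self, *args, **kwargs):
        pass

    def post_init(self):
        pass

    def __getstate__(self):
        return {k: v for k, v in self.__dict__.items() if k not in self._post_init_vars}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.post_init()

    def dumps(self) -> bytes:
        return pickle.dumps(self.__getstate__())

    @classmethod
    def loads(cls, blob: bytes):
        obj = cls.__new__(cls)
        obj.__setstate__(pickle.loads(blob))
        return obj


class ToyKVIndexer(ToyExecutor):
    _post_init_vars = ('store',)

    def __init__(self, index_path: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.index_path = index_path  # simple attribute, pickled as usual

    def post_init(self):
        # Load the mutable state with plain file IO rather than pickle.
        self.store = {}
        if os.path.exists(self.index_path):
            with open(self.index_path) as f:
                self.store = json.load(f)

    def add(self, key: str, value: str) -> None:
        self.store[key] = value  # the state changes during the run

    def query(self, key: str):
        return self.store.get(key)

    def __getstate__(self):
        # Flush the changed state to disk, as CustomizedEncoder does above,
        # then let the base class handle the simple attributes.
        with open(self.index_path, 'w') as f:
            json.dump(self.store, f)
        return super().__getstate__()


path = os.path.join(tempfile.mkdtemp(), 'kv.json')
indexer = ToyKVIndexer(path)
indexer.add('doc1', 'hello')
blob = indexer.dumps()                 # "save"
restored = ToyKVIndexer.loads(blob)    # "load" after a restart
print(restored.query('doc1'))  # hello
```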


## How Can I Use My Extension?

You can use the extension by specifying `py_modules` in the YAML file. For example, suppose your extension Python file is called `my_encoder.py` and defines `MyEncoder`. Then you can define a YAML file (say `my.yml`) as follows:

```yaml
!MyEncoder
with:
  greetings: hello im external encoder
metas:
  py_modules: my_encoder.py
```


Note

You can also assign a list of files to `metas.py_modules` if your Python logic is split over multiple files. This YAML file and all Python extension files should be put in the same directory.

Then simply use it in the Jina CLI via `jina pod --uses=my.yml`, or via `Flow().add(uses='my.yml')` in the Flow API.

Warning

If you use a customized executor inside a `jina.executors.CompoundExecutor`, you only need to set `metas.py_modules` at the root level, not at the sub-component level.

## I Want to Contribute It to Jina

We are really glad to hear that! We have put considerable effort into helping you contribute and share your extensions with others.

You can easily pack your extension and share it with others via a Docker image. For more information, please check out Jina Hub. Just make a pull request there, and our CI/CD system will take care of building, testing and uploading.