Deploy with Kubernetes#

Deploying a Flow in Kubernetes is the recommended way of using Jina in production.

Since a Flow is composed of Executors which can run in different runtimes depending on how you want to deploy your Flow, Kubernetes can easily take over the lifetime management of Executors.

In general, Jina follows one principle when it comes to deploying in Kubernetes: you, the user, know your use case and requirements best. Jina generates configurations that run out of the box, but as a professional user you should treat them as a starting point to get you off the ground.

In this how-to you will go through how to deploy a simple Flow using Kubernetes, how to customize the Kubernetes configuration to your needs, and how to scale Executors using replicas and shards.


Do you know JCloud simplifies Flow deployment and hosting? It saves all the trouble for you so you can focus on what really matters!


To follow along with this how-to, you will need access to a Kubernetes cluster.

You can either set up minikube or use one of the many managed Kubernetes solutions in the cloud.

Deploy a simple Flow#

By simple in this context we mean a Flow without replicated or sharded Executors - you will see how to use those in Kubernetes later on.

For now, define a Flow. You can either do this through the Flow YAML interface or directly in Python, like we do here:

from jina import Flow

# Port 8080 matches the Gateway port that is forwarded later in this how-to.
f = (
    Flow(port=8080)
    .add(name='encoder', uses='jinahub+docker://CLIPEncoder')
    .add(name='indexer', uses='jinahub+docker://AnnLiteIndexer', uses_with={'dim': 512})
)

Here you can define essentially any Flow you like. Just make sure that all Executors are containerized, either by using ‘jinahub+docker’ or by containerizing your local Executors.

The example Flow here simply encodes and indexes text or image data using two Executors from the Jina Hub.

Next, you can generate Kubernetes YAML configs from the Flow. It is good practice to define a new Kubernetes namespace for that purpose:

f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')

You should expect the following file structure to be generated. Don’t worry if it is slightly different, as it can change from one Jina version to another:

└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        └── indexer.yml

You can inspect these files to see how Flow concepts are mapped to Kubernetes entities. And as always, feel free to modify these config files as you see fit for your use case.
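One quick way to get an overview of that mapping is to list the Kubernetes resource kinds declared in each generated file. The sketch below is only an illustration: the helper name summarize_kinds is hypothetical, and it scans for top-level "kind:" lines with a regular expression instead of parsing the YAML, so treat its result as a rough guide.

```python
import re
from pathlib import Path

# Matches a top-level "kind: Something" line; indented (nested) keys are ignored.
KIND_LINE = re.compile(r'^kind:[ \t]*(\S+)[ \t]*$', re.MULTILINE)


def summarize_kinds(root):
    """Map each .yml/.yaml file under `root` to the resource kinds it declares."""
    summary = {}
    for path in sorted(Path(root).rglob('*')):
        if path.suffix in ('.yml', '.yaml'):
            text = path.read_text(encoding='utf-8')
            # Keys are paths relative to `root`, with forward slashes.
            summary[path.relative_to(root).as_posix()] = KIND_LINE.findall(text)
    return summary
```

Calling summarize_kinds('./k8s_flow') after generating the configuration returns a dictionary keyed by relative file path, which you can print or compare between Jina versions.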

Caution: Executor YAML configurations

As a general rule, the configuration files produced by to_kubernetes_yaml() should run out of the box, and if you strictly follow this how-to they will.

However, there is one exception: if you use a local dockerized Executor, and this Executor’s configuration is stored in a file other than config.yaml, you will have to adapt this Executor’s Kubernetes YAML. To do this, open the file and replace config.yaml with the actual path to the Executor configuration.

The reason for this is that when a Flow is defined by being passed a Docker image, it has no knowledge of what Executor configuration was used to create that image. Since all of our tutorials use config.yaml for this purpose, the Flow can only use this as a best guess. Please adapt this if you used a differently named Executor configuration file.

Next you can actually apply these configuration files to your cluster, using kubectl. This will launch all Flow microservices.

First, create the namespace you defined earlier:

kubectl create namespace custom-namespace

Now you can deploy this Flow to your cluster in the following way:

kubectl apply -R -f ./k8s_flow

You can check that the pods were created:

kubectl get pods -n custom-namespace
NAME                              READY   STATUS    RESTARTS   AGE
encoder-8b5575cb9-bh2x8           1/1     Running   0          60m
gateway-7df8765bd9-xf5tf          1/1     Running   0          60m
indexer-8f676fc9d-4fh52           1/1     Running   0          60m
indexer-head-6fcc679d95-8mrm6     1/1     Running   0          60m

Note that the Jina gateway was deployed with name gateway-7df8765bd9-xf5tf.
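Because the pod name suffix changes with every deployment, it can help to read the gateway pod name from the kubectl output instead of copying it by hand. The following is a minimal sketch that assumes the default table layout of kubectl get pods shown above; the helper names parse_pods, find_gateway_pod and all_running are hypothetical. For anything more robust, the JSON output of kubectl get pods -o json is a better input.

```python
def parse_pods(kubectl_output):
    """Parse the table printed by `kubectl get pods` into a list of dicts."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    pods = []
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        current, desired = ready.split('/')  # e.g. "1/1"
        pods.append({'name': name, 'ready': current == desired, 'status': status})
    return pods


def find_gateway_pod(pods):
    """Return the name of the first pod whose name starts with 'gateway-'."""
    for pod in pods:
        if pod['name'].startswith('gateway-'):
            return pod['name']
    return None


def all_running(pods):
    """True when there is at least one pod and every pod is ready and Running."""
    return bool(pods) and all(p['ready'] and p['status'] == 'Running' for p in pods)
```

You could feed these helpers the captured text of kubectl get pods -n custom-namespace and pass the result of find_gateway_pod to the port-forwarding call.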

Once you see that all the Deployments in the Flow are ready, you can start indexing documents.

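import portforward

from jina.clients import Client
from docarray import DocumentArray

# Forward local port 8080 to port 8080 of the gateway pod while indexing.
with portforward.forward('custom-namespace', 'gateway-7df8765bd9-xf5tf', 8080, 8080):
    client = Client(host='localhost', port=8080)
    client.show_progress = True
    # Load the images, embed their content as data URIs, and send them to /index.
    docs = client.post(
        '/index',
        inputs=DocumentArray.from_files('./imgs/*.png').apply(
            lambda d: d.convert_uri_to_datauri()
        ),
    )

    print(f' Indexed documents: {len(docs)}')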

Scaling Executors: Flow with replicas and shards#

Jina supports two ways of scaling:

  • Replicas can be used with any Executor type and are typically used for performance and availability.

  • Shards are used for partitioning data and should only be used with indexers since they store state.

See the Jina documentation on scaling for more information about these mechanisms.
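The difference between the two mechanisms can be illustrated with a toy model. This is a conceptual sketch only, not Jina's routing logic: the hash-based partitioning rule and all function names are assumptions chosen for illustration.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard for a document id with a stable hash (toy partitioning rule)."""
    return zlib.crc32(doc_id.encode('utf-8')) % num_shards


def index_sharded(doc_ids, num_shards):
    """Shards partition the data: each document is stored in exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


def search_all_shards(shards):
    """A query over sharded data has to visit every shard and merge the results."""
    merged = []
    for shard in shards:
        merged.extend(shard)
    return merged


def replicate(doc_ids, num_replicas):
    """Replicas are identical copies, so any one of them can answer a request."""
    return [list(doc_ids) for _ in range(num_replicas)]
```

In this model, adding shards reduces how much data each instance holds, while adding replicas increases how many requests can be served in parallel.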

For shards, Jina creates a separate Deployment in Kubernetes per shard. Setting f.add(..., shards=num_shards) is sufficient to create a corresponding Kubernetes configuration.

For replicas, Jina uses Kubernetes native replica scaling and relies on a service mesh to load balance requests between replicas of the same Executor. Without a service mesh installed in your Kubernetes cluster, all the traffic will be routed to the same replica.

See Also

The impossibility of load balancing between different replicas is a limitation of Kubernetes in combination with gRPC. If you want to learn more about this limitation, see this Kubernetes Blog post.
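The effect can be pictured with a small simulation. gRPC multiplexes requests over long-lived HTTP/2 connections, so balancing that happens only when a connection is opened sends every request on that connection to the same pod. The sketch below is a simplified model under that assumption; the function names are hypothetical, and round-robin stands in for whatever strategy a real mesh proxy uses.

```python
from collections import Counter
from itertools import cycle


def connection_level(requests, replicas):
    """One long-lived connection: the replica is chosen once, at connect time."""
    chosen = replicas[0]  # whichever replica the connection happened to land on
    return Counter(chosen for _ in requests)


def request_level(requests, replicas):
    """A mesh proxy picks a replica for every request (round-robin in this toy)."""
    targets = cycle(replicas)
    return Counter(next(targets) for _ in requests)
```

With six requests and two replicas, the first function sends all six to one replica while the second splits them three and three, which is the behaviour a service mesh adds.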

Install a service mesh#

Service meshes work by attaching a tiny proxy to each of your Kubernetes pods, allowing for smart rerouting, load balancing, request retrying, and a host of other features.

Jina relies on a service mesh to load balance requests between replicas of the same Executor. You can use your favourite Kubernetes service mesh with your Jina Flow, but the configuration files generated by to_kubernetes_yaml() already include all necessary annotations for the Linkerd service mesh.


You can use any service mesh with Jina, but Jina Kubernetes configurations come with Linkerd annotations out of the box.

To install Linkerd, you first have to install the Linkerd CLI. After that, you install its control plane in your cluster. This is what will automatically set up and manage the service mesh proxies when you deploy your Flow.

Once the Flow is deployed on Kubernetes, you can use all the native Kubernetes tools like kubectl to perform operations on the Pods and Deployments.

You can use this to add or remove replicas, run rolling updates, and so on.
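As an illustration, scaling a Deployment from a script comes down to calling kubectl scale. The sketch below builds that command in Python; the helper names are hypothetical, and the Deployment name encoder and namespace custom-namespace are taken from the example in this how-to.

```python
import subprocess


def scale_command(deployment, replicas, namespace):
    """Build the `kubectl scale` argument list for a Deployment."""
    if replicas < 0:
        raise ValueError('replicas must not be negative')
    return [
        'kubectl', 'scale', 'deployment', deployment,
        f'--replicas={replicas}', '-n', namespace,
    ]


def scale(deployment, replicas, namespace):
    """Run the command; requires kubectl on PATH and access to the cluster."""
    subprocess.run(scale_command(deployment, replicas, namespace), check=True)
```

For example, scale('encoder', 3, 'custom-namespace') would ask Kubernetes for three encoder pods. Keeping command construction separate from execution makes the helper easy to check without a cluster.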


Many service meshes have the ability to perform retries themselves. Be careful about setting up service mesh level retries with Jina, as they may interact with Jina’s own retry policy and lead to unwanted behaviour.

Instead, you may want to disable Jina level retries by setting Flow(retries=0) in Python, or retries: 0 in the ‘with’ block of the Flow YAML.

Deploy your Flow with shards and replicas#

After your service mesh is installed, your cluster is ready to run a Flow with scaled Executors. You can adapt your Flow from above to work with two replicas for the encoder, and two shards for the indexer:

from jina import Flow

f = (
    Flow(port=8080)
    .add(name='encoder', uses='jinahub+docker://CLIPEncoder', replicas=2)
    .add(name='indexer', uses='jinahub+docker://AnnLiteIndexer', uses_with={'dim': 512}, shards=2)
)

Again, you can generate your Kubernetes configurations:

f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')

Now you should see the following file structure:

└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        ├── indexer-0.yml
        ├── indexer-1.yml
        └── indexer-head.yml

Hint: Cluster cleanup

If you already have the simple Flow from the first example running on your cluster, make sure to delete it first using kubectl delete -R -f ./k8s_flow.

Now apply your configuration as usual:

kubectl apply -R -f ./k8s_flow

Scaling the Gateway#

The Gateway is responsible for providing the API of the Flow. If you have a large Flow with many Clients and many replicated Executors, the Gateway can become the bottleneck. In this case you can also scale up the Gateway deployment to be backed by multiple Kubernetes Pods. This is done by the regular means of Kubernetes: either increase the number of replicas in the generated YAML configuration files or add replicas while running. To expose your Gateway replicas outside Kubernetes, you can add a load balancer as described under Exposing your Flow.


You can use a custom Docker image for the Gateway deployment. Just set the environment variable JINA_GATEWAY_IMAGE to the desired image before generating the configuration.
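In Python, this could look like the sketch below. It takes an existing Flow object, sets the variable, and then calls to_kubernetes_yaml(). The helper name is hypothetical, and restoring the previous value afterwards is a design choice of this example, not something Jina requires.

```python
import os


def export_with_gateway_image(flow, output_dir, namespace, image):
    """Set JINA_GATEWAY_IMAGE, then let the Flow write its Kubernetes YAML."""
    previous = os.environ.get('JINA_GATEWAY_IMAGE')
    os.environ['JINA_GATEWAY_IMAGE'] = image
    try:
        flow.to_kubernetes_yaml(output_dir, k8s_namespace=namespace)
    finally:
        # Put the environment back so later exports are not affected.
        if previous is None:
            os.environ.pop('JINA_GATEWAY_IMAGE', None)
        else:
            os.environ['JINA_GATEWAY_IMAGE'] = previous
```

With the Flow f from above, a call might be export_with_gateway_image(f, './k8s_flow', 'custom-namespace', 'registry.example.com/my-gateway:latest'), where the image name is a placeholder.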

Extra Kubernetes options#

You may have noticed that you can’t add basic Kubernetes features like Secrets, ConfigMaps or labels via the Pythonic interface. This is intentional, and it does not mean that we don’t support these features. On the contrary, we let you fully express your Kubernetes configuration through the Kubernetes API so that you can apply your own Kubernetes standards to Jina.


We recommend dumping the Kubernetes configuration files and then editing the files to suit your needs.

Here are possible configuration options you may need to add or change:

  • Add label selectors to the Deployments to suit your case

  • Add requests and limits for the resources of the different Pods

  • Set up persistent volume storage to save your data on disk

  • Pass custom configuration to your Executor with ConfigMap

  • Manage the credentials of your Executor with secrets

  • Edit the default rolling update configuration
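As one example of such an edit, the sketch below adds resource requests and limits to every container of a Deployment manifest that has already been loaded into a Python dictionary. The helper name and the resource values are hypothetical. Reading and writing the YAML files is left out and would need a YAML library of your choice.

```python
import copy


def with_resources(deployment, requests, limits):
    """Return a copy of a Deployment manifest with requests/limits on every container."""
    updated = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    containers = updated['spec']['template']['spec']['containers']
    for container in containers:
        container['resources'] = {
            'requests': dict(requests),
            'limits': dict(limits),
        }
    return updated
```

A call such as with_resources(manifest, {'cpu': '500m', 'memory': '1Gi'}, {'cpu': '1', 'memory': '2Gi'}) returns a new manifest that you can write back to the generated file before applying it.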

Exposing your Flow#

The previous examples use port-forwarding to send documents to the Flow. In real-world applications, you will likely want to expose your service so that users can reach it and you can serve search requests.


Exposing your Flow only works if the environment of your Kubernetes cluster supports external load balancers.

Once the Flow is deployed, you can expose a service.

kubectl expose deployment gateway --name=gateway-exposed --type LoadBalancer --port 80 --target-port 8080 -n custom-namespace
sleep 60 # wait until the external IP is configured

Export the external IP, which the client needs in the next step when sending documents to the Flow.

export EXTERNAL_IP=`kubectl get service gateway-exposed -n custom-namespace -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'`
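If you prefer to wait for the address from a script instead of sleeping for a fixed time, one option is to poll the service and read the same field from its JSON representation. The sketch below assumes the JSON comes from kubectl get service gateway-exposed -n custom-namespace -o json; the helper names are hypothetical. Note that some cloud providers report a hostname instead of an IP in this field, which this sketch does not handle.

```python
import json
import time


def external_ip(service_json):
    """Return the first load balancer IP from a Service's JSON, or None if pending."""
    service = json.loads(service_json)
    ingress = service.get('status', {}).get('loadBalancer', {}).get('ingress') or []
    for entry in ingress:
        if entry.get('ip'):
            return entry['ip']
    return None


def wait_for_external_ip(fetch_json, attempts=30, delay=2.0):
    """Poll `fetch_json()` until an IP shows up or the attempts run out."""
    for _ in range(attempts):
        ip = external_ip(fetch_json())
        if ip:
            return ip
        time.sleep(delay)
    return None
```

Here fetch_json is any callable that returns the service description as a JSON string, for example a small wrapper around subprocess.run that captures the kubectl output.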


The client sends images to the exposed Flow on $EXTERNAL_IP and retrieves the matches from the Flow. Finally, it prints the number of matches found for the first image.

Configure your Client to connect to the Flow via the external IP:

import os

from docarray import DocumentArray
from jina.clients import Client

# EXTERNAL_IP is the variable exported in the previous step.
host = os.environ['EXTERNAL_IP']
port = 80

client = Client(host=host, port=port)

client.show_progress = True
docs = DocumentArray.from_files("./imgs/*.png").apply(
    lambda d: d.convert_uri_to_datauri()
)
queried_docs = client.post("/search", inputs=docs)

matches = queried_docs[0].matches
print(f"Matched documents: {len(matches)}")

Key takeaways#

To put it succinctly, there are just three key takeaways about deploying a Jina Flow using Kubernetes:

  1. Use f.to_kubernetes_yaml() to generate Kubernetes configuration files from a Jina Flow object

  2. Modify the generated files freely - you know better what you need than we do!

  3. To enable replicated Executors, use a service mesh

See also#