JCloud extends Jina’s Flow YAML specification by introducing the special field jcloud. This lets you define resources and scaling policies for each Executor and gateway.
Here’s a Flow with two Executors that have specific resource needs: indexer demands a 10G ebs disk, whereas encoder demands two cores, 8G RAM and two dedicated GPUs.
jtype: Flow
executors:
  - name: encoder
    uses: jinahub+docker://Encoder
    jcloud:
      resources:
        cpu: 2
        memory: 8G
        gpu: 2
  - name: indexer
    uses: jinahub+docker://Indexer
    jcloud:
      resources:
        storage:
          type: ebs
          size: 10G
Allocate Executor resources#
Since each Executor has its own business logic, it may require different cloud resources. One Executor might need more RAM, whereas another might need a bigger disk.
In JCloud, you can pass highly customizable, fine-grained resource requests for each Executor using the jcloud.resources argument in your Flow YAML.
By default, 0.1 (1/10 of a core) CPU is allocated to each Executor. You can use the cpu argument under resources to change it.
JCloud offers the general Intel Xeon processor (Skylake 8175M or Cascade Lake 8259CL) by default.
A maximum of 16 cores is allowed per Executor.
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        cpu: 0.5
JCloud supports GPU workloads with two different usages: shared or dedicated.
If GPU is enabled, JCloud will provide NVIDIA A10G Tensor Core GPUs with 24G memory for workloads in both usage types.
When using GPU resources, it may take a few extra minutes before all Executors are ready to serve traffic.
An Executor using a shared GPU shares this GPU with up to four other Executors.
This enables time-slicing, which allows workloads that land on oversubscribed GPUs to interleave with one another.
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        gpu: shared
The tradeoffs with a shared GPU are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU. If your application is memory-intensive, we suggest using a dedicated GPU.
Using a dedicated GPU is the default way to provision GPU for an Executor. This automatically creates nodes or assigns the Executor to land on a GPU node. In this case, the Executor owns the whole GPU. You can assign between 1 and 4 GPUs.
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        gpu: 2
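The two GPU modes can be checked locally before deploying. The following is a minimal Python sketch and not part of JCloud or its CLI: the helper name validate_gpu is hypothetical, and the rules it applies are only those stated above (the string shared, or a dedicated count between 1 and 4).

```python
def validate_gpu(value):
    """Return the GPU request unchanged if it is allowed, else raise ValueError.

    Hypothetical helper: it mirrors the rules described in the text only
    ("shared", or between 1 and 4 dedicated GPUs).
    """
    if value == "shared":
        return "shared"
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 4:
        return value
    raise ValueError(f"gpu must be 'shared' or an integer from 1 to 4, got {value!r}")
```

Calling validate_gpu(2) or validate_gpu("shared") returns the value unchanged, while validate_gpu(5) raises ValueError.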
Spot vs on-demand instance#
For cost optimization, JCloud tries to deploy all Executors on spot capacity. This is ideal for stateless Executors, which can withstand interruptions and restarts. For stateful Executors (e.g. indexers), however, we recommend using on-demand capacity.
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      capacity: on-demand
By default, 100M of RAM is allocated to each Executor. You can use the memory argument under resources to change it.
A maximum of 16G RAM is allowed per Executor.
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        memory: 8G
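As an illustration of the CPU and memory limits in this section, here is a small Python sketch that fills in the stated defaults and rejects requests above the stated maximums. It is not JCloud code: the names memory_to_mb and check_resources are hypothetical, only the M and G suffixes seen in this document are handled, and treating 1G as 1024M is an assumption.

```python
import re

# Defaults and limits as stated in the text; unit handling is an assumption.
DEFAULT_CPU = 0.1
MAX_CPU = 16
DEFAULT_MEMORY = "100M"
MAX_MEMORY_MB = 16 * 1024  # assumes 1G = 1024M

_MEMORY_RE = re.compile(r"(\d+(?:\.\d+)?)([MG])")


def memory_to_mb(text):
    """Convert a string such as '8G' or '100M' to megabytes (as a float)."""
    match = _MEMORY_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported memory value: {text!r}")
    amount, unit = float(match.group(1)), match.group(2)
    return amount * 1024 if unit == "G" else amount


def check_resources(resources):
    """Return cpu/memory with defaults applied, or raise ValueError."""
    cpu = resources.get("cpu", DEFAULT_CPU)
    memory = resources.get("memory", DEFAULT_MEMORY)
    if not 0 < cpu <= MAX_CPU:
        raise ValueError(f"cpu must be above 0 and at most {MAX_CPU}, got {cpu}")
    if memory_to_mb(memory) > MAX_MEMORY_MB:
        raise ValueError(f"memory must be at most 16G, got {memory}")
    return {"cpu": cpu, "memory": memory}
```

For example, check_resources({"cpu": 0.5, "memory": "8G"}) passes the request through, while a request for 32 cores or 32G of RAM raises ValueError.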
By default, we attach an efs to all Executors in a Flow. The benefits of doing so are:
It can grow dynamically, so you don’t need to shrink/grow volumes as and when necessary.
All Executors in the Flow can share a disk.
The same disk can also be shared with another Flow by passing a workspace-id while deploying a Flow.
jc deploy flow.yml --workspace-id <prev-flow-id>
If your Executor needs high IO, you can use ebs instead. Please note that:
The disk cannot be shared with other Executors or Flows.
You must pass a storage size parameter (default:
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        storage:
          type: ebs
          size: 10G
  - name: executor2
    uses: jinahub+docker://Executor2
    jcloud:
      resources:
        storage:
          type: efs
Scale out Executors#
JCloud offers demand-based autoscaling thanks to its underlying Kubernetes architecture. This means you can maintain serverless deployments cost-effectively, without having to work out the right number of replicas yourself.
The easiest way to scale out your Executor is to use a Serverless Executor. This can be enabled by using jinahub+serverless:// instead of jinahub+docker:// in the Executor’s uses, such as:
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+serverless://Executor1
JCloud autoscaling leverages Knative behind the scenes, and jinahub+serverless uses a set of Knative configurations as defaults.
For more information about the Knative autoscaling configurations, please visit Knative autoscaling.
If jinahub+serverless:// doesn’t meet your requirements, you can further customize autoscaling configurations by using the autoscale argument on a per-Executor basis in the Flow YAML, such as:
jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      autoscale:
        min: 1
        max: 2
        metric: rps
        target: 50
Below are the defaults and requirements for the configurations:
min: Minimum number of replicas
max: Maximum number of replicas (int, up to 5)
metric: Metric for scaling
target: Target number after which replicas autoscale
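To make the roles of min, max and target concrete, here is a simplified Python model of target-based scaling: the desired replica count is the observed metric (for example, requests per second) divided by the per-replica target, rounded up and clamped to the range from min to max. This is an illustration only. The function name and the formula are assumptions, not JCloud's implementation, and a real autoscaler such as Knative's also averages over time windows and applies further logic.

```python
import math


def desired_replicas(observed, target, min_replicas, max_replicas):
    """Simplified model: ceil(observed / target), clamped to [min, max].

    Hypothetical helper for illustration; not how JCloud or Knative
    compute replica counts in full.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    wanted = math.ceil(observed / target)
    return max(min_replicas, min(max_replicas, wanted))
```

With the example configuration above (min 1, max 2, target 50), an observed load of 120 would call for three replicas in this model but is capped at two, and an idle Executor stays at one replica.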
After you deploy with an autoscaling configuration, serving the Flow works exactly as before. The only difference you may notice is that the initial requests take a few extra seconds, since JCloud needs to scale the deployments behind the scenes. From then on, JCloud handles the scaling and you only need to worry about your code.
JCloud supports Ingress gateways to expose your Flows to the public internet with TLS.
In JCloud, we use Let’s Encrypt for TLS.
The JCloud gateway is different from Jina’s gateway. In JCloud, a gateway works as a proxy to distribute internet traffic between Flows, each of which has a Jina gateway (which is responsible for managing external gRPC/HTTP/WebSocket traffic to your Executors).
By default, the JCloud gateway will close connections that have been idle for over 600 seconds. If you want a longer connection timeout threshold, you can change the timeout parameter under gateway:
jtype: Flow
jcloud:
  gateway:
    timeout: 600
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
Control gateway resources#
If you’d like to customize the gateway’s CPU or memory, you can specify the memory and cpu arguments under gateway.resources:
jtype: Flow
jcloud:
  gateway:
    resources:
      memory: 800M
      cpu: 0.4
executors:
  - name: encoder
    uses: jinahub+docker://Encoder
A Flow deployment without a gateway is often used for External Executors, which can be shared across different Flows. You can disable a gateway by setting expose_gateway: false:
jtype: Flow
jcloud:
  expose_gateway: false
executors:
  - name: custom
    uses: jinahub+docker://CustomExecutor
You can also deploy and expose multiple external Executors:
jtype: Flow
jcloud:
  expose_gateway: false
executors:
  - name: custom1
    uses: jinahub+docker://CustomExecutor1
  - name: custom2
    uses: jinahub+docker://CustomExecutor2
Other deployment options#
Specify Jina version#
To control Jina’s version while deploying a Flow to JCloud, you can pass the version argument in the Flow YAML:
jtype: Flow
jcloud:
  version: 3.4.11
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
You can use labels (as key-value pairs) to attach metadata to your Flows:
jtype: Flow
jcloud:
  labels:
    username: johndoe
    app: fashion-search
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
labels have the following restrictions:
Must be 63 characters or fewer.
Must begin and end with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between.
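The two restrictions above can be expressed as a length check plus a regular expression. The Python sketch below is a hypothetical helper, not part of JCloud. It checks a single string, since the text does not say whether the rules apply to label keys, values, or both, and it treats the empty string as invalid, which is an assumption.

```python
import re

# Starts and ends with an alphanumeric character; dashes, underscores,
# dots and alphanumerics are allowed in between.
_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def is_valid_label(text):
    """Return True if text satisfies the restrictions listed above."""
    return len(text) <= 63 and _LABEL_RE.fullmatch(text) is not None
```

Both values from the example above, johndoe and fashion-search, pass this check, whereas a string such as -search (leading dash) or one longer than 63 characters does not.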
The following keys are skipped if passed in the Flow YAML.