YAML specification#

Built on top of the Flow YAML specification, JCloud YAML extends it with a special field, jcloud. With it, you can define resources and scaling policies for each Executor and the Gateway.

Here’s a Flow with two Executors that have specific resource needs: indexer requests a 10G ebs disk, whereas encoder requests 2 CPU cores, 8G of RAM, and 2 dedicated GPUs.

jtype: Flow
executors:
  - name: indexer
    uses: jinahub+docker://Indexer
    jcloud:
      resources:
        storage: 
          type: ebs
          size: 10G
  - name: encoder
    uses: jinahub+docker://Encoder
    jcloud:
      resources:
        cpu: 2
        memory: 8G
        gpu: 2
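
For example, assuming the YAML above is saved as flow.yml, the Flow can be deployed with the jc CLI:

jc deploy flow.yml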

Allocate resources for Executors#

Since each Executor has its own business logic, it may require different cloud resources. One might need more RAM, whereas another might need a bigger disk.

In JCloud, you can pass highly customizable, fine-grained resource requests for each Executor using the jcloud.resources argument in your Flow YAML.

CPU#

By default, 0.1 (1/10 of a core) CPU is allocated to each Executor. You can use the cpu arg under resources to customize it.

JCloud offers general-purpose Intel Xeon processors (Skylake 8175M or Cascade Lake 8259CL) by default.

Note

A maximum of 16 cores is allowed per Executor.

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        cpu: 0.5

GPU#

JCloud supports GPU workloads with two different usages: shared or dedicated.

If GPU is enabled, JCloud provides NVIDIA A10G Tensor Core GPUs with 24G of memory for workloads in both usage types.

Note

When using GPU resources, it may take a few extra minutes until all Executors are ready to serve traffic.

Shared GPU#

An Executor using a shared GPU shares it with up to 4 other Executors. This enables time-slicing, which allows workloads that land on oversubscribed GPUs to interleave with one another.

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        gpu: shared

Caution

The trade-offs of a shared GPU are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU. If your application is memory-intensive, we suggest using a dedicated GPU.

Dedicated GPU#

Using a dedicated GPU is the default way to provision a GPU for an Executor. JCloud automatically creates GPU nodes or schedules the Executor onto an existing GPU node. In this case, the Executor owns the whole GPU. You can assign between 1 and 4 GPUs.

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        gpu: 2

Spot vs on-demand instance#

For cost optimization, JCloud tries to deploy all Executors on spot capacity. Spot instances are ideal for stateless Executors, which can withstand interruptions and restarts. For stateful Executors (e.g. indexers), however, we recommend using on-demand capacity.

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      capacity: on-demand

Memory#

By default, 100M of RAM is allocated to each Executor. You can use the memory arg under resources to customize it.

Note

A maximum of 16G of RAM is allowed per Executor.

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        memory: 8G

Storage#

JCloud supports two storage types: efs (default) and ebs. The former is a network file system, whereas the latter is a block device.

Note

By default, we attach an efs to all Executors in a Flow. The benefits of doing so are:

  • It grows in size dynamically, so you don’t need to shrink/grow volumes yourself.

  • All Executors in the Flow can share the same disk.

  • The same disk can also be shared with another Flow by passing a workspace-id while deploying the Flow.

jc deploy flow.yml --workspace-id <prev-flow-id>

If your Executor needs high IO, you can use ebs instead. Please note that:

  • The disk cannot be shared with other Executors / Flows.

  • You must pass a storage size (default: 1G, max: 10G).

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      resources:
        storage:
          type: ebs
          size: 10G
  - name: executor2
    uses: jinahub+docker://Executor2
    jcloud:
      resources:
        storage:
          type: efs

Scale out Executors#

On JCloud, demand-based autoscaling is offered out of the box thanks to the underlying Kubernetes architecture. This means you can maintain serverless deployments cost-effectively, without the headache of choosing the right number of replicas.

Autoscaling with jinahub+serverless://#

The easiest way to scale out your Executor is to use a Serverless Executor. Enable it by simply using jinahub+serverless:// instead of jinahub+docker:// in the Executor’s uses, such as:

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+serverless://Executor1

JCloud autoscaling leverages Knative behind the scenes, and jinahub+serverless uses a set of Knative configurations as defaults.

Note

For more information about the Knative Autoscaling configurations, please visit Knative Autoscaling.

Scale out manually#

If jinahub+serverless:// doesn’t meet your requirements, you can further customize the autoscaling configuration by using the autoscale argument on a per-Executor basis in the Flow YAML, such as:

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      autoscale:
        min: 1
        max: 2
        metric: rps
        target: 50

Below are the defaults and requirements for the configurations:

Name     | Default     | Allowed            | Description
---------|-------------|--------------------|-------------------------------------------------
min      | 1           | int                | Minimum number of replicas (0 means serverless)
max      | 2           | int, up to 5       | Maximum number of replicas
metric   | concurrency | concurrency / rps  | Metric for scaling
target   | 100         | int                | Target number after which replicas autoscale
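
As the table notes, setting min to 0 makes the Executor effectively serverless, scaling it down to zero replicas when idle. A minimal sketch (the Executor name and image are placeholders):

jtype: Flow
executors:
  - name: executor1
    uses: jinahub+docker://Executor1
    jcloud:
      autoscale:
        min: 0
        max: 2
        metric: concurrency
        target: 100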

After a JCloud deployment with autoscaling configured, serving the Flow works just the same; the only difference you may notice is that initial requests can take a few extra seconds while the deployments scale up behind the scenes. Let JCloud handle the scaling, and worry only about your code!

Config Gateway#

To expose your Flows to the public Internet with TLS, JCloud provides Ingress Gateways.

In JCloud, we use Let’s Encrypt for TLS.

Note

The JCloud gateway is different from Jina’s Gateway. In JCloud, a gateway works as a proxy to distribute Internet traffic between Flows, each of which has a Jina Gateway (which is responsible for managing external gRPC/HTTP/WebSocket traffic to your Executors).

Set timeout#

By default, the JCloud gateway closes connections that have been idle for over 600 seconds. If you want a longer connection timeout, change the timeout parameter under gateway.

jtype: Flow
jcloud:
  gateway:
    timeout: 600
executors:
  - name: executor1
    uses: jinahub+docker://Executor1

Control resources of the Gateway#

If you’d like to customize the Gateway’s CPU or memory, specify the memory / cpu args under jcloud.gateway.resources as follows:

jtype: Flow
jcloud:
  gateway:
    resources:
      memory: 800M
      cpu: 0.4
executors:
  - name: encoder
    uses: jinahub+docker://Encoder

Disable Gateway#

A Flow deployment without a Gateway is often used to provide external Executors, which can be shared across different Flows. You can disable the Gateway by setting expose_gateway: false:

jtype: Flow
jcloud:
  expose_gateway: false
executors:
  - name: custom
    uses: jinahub+docker://CustomExecutor

You can also deploy & expose multiple External Executors.

jtype: Flow
jcloud:
  expose_gateway: false
executors:
  - name: custom1
    uses: jinahub+docker://CustomExecutor1
  - name: custom2
    uses: jinahub+docker://CustomExecutor2
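
Once deployed, such Executors can be referenced from another Flow via Jina’s external-Executor fields (external, host, port). A minimal sketch, assuming the deployed Executor is reachable at a placeholder address (replace host and port with the endpoint reported after deployment):

jtype: Flow
executors:
  - name: custom1
    external: true                # use an already-running Executor instead of starting one
    host: your-executor-host      # placeholder: the deployed Executor's address
    port: 443                     # placeholder: the deployed Executor's port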

Other deployment options#

Specify Jina version#

To control the Jina version when deploying a Flow to JCloud, you can pass the version arg in the Flow YAML.

jtype: Flow
jcloud:
  version: 3.4.11
executors:
  - name: executor1
    uses: jinahub+docker://Executor1

Lifetime#

By default, a Flow that receives no traffic for 24 hours is automatically deleted.

To override this reclaim policy, you can use the retention_days parameter in the Flow YAML. Setting retention_days to x (0 < x < 365) keeps the Flow alive for x days; after that it is removed regardless of the reclaim policy above. Setting it to -1 keeps the Flow alive regardless of the reclaim policy.

Note

  • If the retention_days argument is set to x (0 < x < 365), the Flow is removed after x days, regardless of usage.

  • If the retention_days argument is set to -1, the Flow is never removed, regardless of usage.

  • If the retention_days argument is not set, or is set to 0, we check daily whether the Flow is idle and terminate it if it has not served any requests in the last 24 hours.

jtype: Flow
jcloud:
  retention_days: 7
executors:
  - name: executor1
    uses: jinahub+docker://Executor1