Jina Cloud Benchmark & Deployment

Purpose: Showcase how to scale Jina and how to deploy it to AWS


Infrastructure

This example benchmark prioritizes a real-world scenario, in which a user hosts Jina in the cloud. We use six machines in AWS, of instance type c5.2xlarge (8 vCPUs, 16 GB RAM), each with an additional 100 GB SSD. For the vector indexer machine, we increased the RAM to 32 GB.

The instances are split as follows:

  • Client: from this machine we create the Flows and issue the client requests (indexing and querying)

  • Flow: this machine hosts the Gateway to the Flow

  • Encoder: this machine processes the raw data (text or image) with the respective encoders

  • Indexer: this machine hosts the indexers

  • Ranker: this machine hosts the Redis servers and the rankers
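
As an illustration, this split could be expressed by placing each Pod on its own machine via the `host` argument. This is a minimal sketch using the current jina API and the hostnames from the .env file further below; the Pod names are illustrative, not the repository's actual Flow definition:

```python
from jina import Flow

# Sketch of the topology above: the Gateway runs on the 'flow' machine,
# and each Pod is placed on a dedicated host (names from the .env file).
f = (
    Flow(port=8000)
    .add(name='encoder', host='encoder')   # raw-data encoding machine
    .add(name='vec_idx', host='vector')    # vector indexer machine
    .add(name='kv_idx', host='ranker')     # key-value store / ranker machine
)
```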

Datasets

To run these tests, we prepared two scenarios: text and image search. These are the most common use cases for neural search.

Note that these tests do not assess the quality of the results; only the number of returned matches is validated.

Configuration

The configuration parameters of this experiment can be found in the .env file. These values are exported as environment variables and substituted into the YAML files that configure the Flow and the Pods.
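
For illustration, the substitution could look like this; the use of python-dotenv and the YAML file name are assumptions for this sketch, not taken from the repository:

```python
from dotenv import load_dotenv  # assumption: any .env loader works here
from jina import Flow

# Export the benchmark parameters so that $-prefixed placeholders in the
# YAML (e.g. $JINA_ENCODER_HOST, $JINA_SHARDS_ENCODER) resolve against
# the environment when the Flow is loaded.
load_dotenv('.env')

f = Flow.load_config('flow-index.yml')  # hypothetical YAML file name

with f:
    f.block()  # serve until interrupted
```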

The parameters are organized into four groups:

  • Infrastructure parameters: Configure which machine hosts which component.

  • Sharding parameters: Configure the parallelization and sharding of the different components.

  • Functional/indexers parameters: Configure the functional parameters that can affect search performance.

  • Client parameters: Configure how the client connects to the Flow.

Sharding and scheduling parameters

Jina allows for parallelization of data processing; you can read an overview of these options here. We use these options to parallelize the various Pods, and we also configure the scheduling strategy, as sketched below.
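
A minimal sketch of these options, assuming Jina 0.x-style Pod arguments (`parallel`, `polling`, `scheduling`) that the .env names suggest; the real Flow definition lives in the repository's YAML files:

```python
from jina import Flow

f = (
    Flow()
    # 'parallel' replicates a Pod; 'scheduling' picks how requests are
    # distributed among the replicas (here: load_balance).
    .add(name='encoder', parallel=2, scheduling='load_balance')
    # For indexers, polling='all' sends every request to all replicas,
    # turning them into shards that each hold a partition of the data.
    .add(name='vec_idx', parallel=8, polling='all')
    .add(name='kv_idx', parallel=2, polling='all')
)
```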

Functional/indexers parameters

As part of the benchmark, we also compare the performance of the NumpyIndexer with the more advanced indexers based on Annoy and Faiss. We also pass indexer-specific arguments: num_trees for Annoy and index_key for Faiss.
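
To make the two parameters concrete, here is how they map onto the underlying libraries (plain Faiss/Annoy usage, independent of the Jina wrappers); the dimensions and data are made up for this sketch:

```python
import numpy as np
import faiss
from annoy import AnnoyIndex

dim = 128
vectors = np.random.random((2000, dim)).astype('float32')

# Faiss: index_key is a faiss.index_factory() string; 'IVF50,Flat'
# builds an inverted-file index with 50 coarse clusters over flat vectors.
faiss_index = faiss.index_factory(dim, 'IVF50,Flat')
faiss_index.train(vectors)  # IVF indexes need a training pass
faiss_index.add(vectors)

# Annoy: num_trees controls the number of random-projection trees;
# more trees give better recall at the cost of a bigger index.
annoy_index = AnnoyIndex(dim, 'euclidean')
for i, v in enumerate(vectors):
    annoy_index.add_item(i, v)
annoy_index.build(100)  # num_trees=100, as in this benchmark

# search_k=-1 (the default) lets Annoy inspect n_trees * top_k candidates
print(annoy_index.get_nns_by_vector(vectors[0], 50, search_k=-1))
```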

Client

In order to simulate real-world scenarios, we also test using multiple concurrent clients. This is achieved with the Python multiprocessing library.

In order to stress-test the system, we also issue index and query requests continuously for several hours. Each client sends a fixed number of Documents per connection, split across requests of the same request_size.

The client also sets the TOP_K parameter, which limits the number of matches returned by the Flow; this, too, affects performance.
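
A simplified sketch of this client logic, written against the current jina Client API for illustration (the benchmark itself targets Jina 0.9, whose client interface differed); the defaults mirror the .env parameters below:

```python
import os
import time
from multiprocessing import Process

from jina import Client, Document

def index_worker():
    client = Client(host=os.environ['FLOW_HOST'],
                    port=int(os.environ['FLOW_PORT']))
    deadline = time.time() + int(os.environ['TIME_LOAD_INDEX'])
    docs_per_cycle = int(os.environ['DOCS_INDEX'])
    req_size = int(os.environ['REQ_SIZE'])
    # The elapsed time is only checked after each full cycle of
    # DOCS_INDEX documents, as noted in the .env comments.
    while time.time() < deadline:
        docs = [Document(text=f'doc {i}') for i in range(docs_per_cycle)]
        client.post('/index', inputs=docs, request_size=req_size)

if __name__ == '__main__':
    # CONCURRENCY_INDEX concurrent clients, one process each
    procs = [Process(target=index_worker)
             for _ in range(int(os.environ.get('CONCURRENCY_INDEX', '5')))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

A query worker would follow the same pattern, sending DOCS_QUERY Documents for TIME_LOAD_QUERY seconds and passing the TOP_K limit along with the search requests.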

Experiments

NOTE: For now, we only provide results for the image dataset. The text dataset works as well, if you want to test it.

To compute an estimated throughput, we used the following configuration:

  • Encoder: parallel: 2

  • Vector Indexer: shards: 8

  • Key-value Indexer: parallel: 2

  • scheduling: load_balance

See full environment configuration (the .env file):

## Infrastructure parameters
JINA_ENCODER_HOST=encoder
JINA_RANKER_HOST=ranker
JINA_REDIS_INDEXER_HOST=ranker
JINA_VEC_INDEXER_HOST=vector

## Flow parameters
FLOW_HOST=flow_host
FLOW_PORT=8000

## Sharding/Performance parameters
JINA_SHARDS_ENCODER=2
JINA_SHARDS_INDEXERS=8
JINA_SHARDS_REDIS=2
OMP_NUM_THREADS=1
SCHEDULING=load_balance

## Functional/Indexers parameters
JINA_ENCODER_DRIVER_BATCHING=16
JINA_DISTANCE_REVERSE=False
JINA_FAISS_IMAGE=docker://jinahub/pod.indexer.faissindexer:0.0.15-0.9.33
JINA_ANNOY_IMAGE=docker://jinahub/pod.indexer.annoyindexer:0.0.16-0.9.33
JINA_FAISS_INDEX_KEY='IVF50,Flat'
JINA_ANNOY_NUM_TREES=100
JINA_ANNOY_SEARCH_K=-1

## Client/run parameters
TOP_K=50
#Number of documents a client will try to index at every connection
DOCS_INDEX=1000
#Number of documents a client will try to query at every connection
DOCS_QUERY=1000
PYTHON_EXEC=python3
DATASET=image
#Number of seconds for which clients will try to index documents. (The time is checked after each cycle of indexing `DOCS_INDEX`)
TIME_LOAD_INDEX=18000
#Number of seconds for which clients will try to query documents. (The time is checked after each cycle of querying `DOCS_QUERY`)
TIME_LOAD_QUERY=3600
#Number of documents every request will contain
REQ_SIZE=50
#Number of concurrent clients indexing
CONCURRENCY_INDEX=5
#Number of concurrent clients querying
CONCURRENCY_QUERY=1
SLEEP_TIME=10

We provide two experiments, depending on the key-value store we employ.

Experiment 1 - Redis

In this experiment, we use Redis as the key-value store.

Results

| Operation | Vector Indexer | KV Indexer | Time (h) | Documents | QPS |
| --- | --- | --- | --- | --- | --- |
| Index | NumpyIndexer | Redis | 5 | 1448000 | 80.22 |
| Query | Faiss (index_key='IVF50,Flat') | Redis | 1 | 22000 | 5.89 |
| Query | Annoy (trees=100, k=-1) | Redis | 1 | 23000 | 6.14 |
| Query | NumpyIndexer | Redis | 1 | 8000 | 2.07 |

Experiment 2 - BinaryPbIndexer

In this experiment, we use the BinaryPbIndexer as the key-value store instead of Redis.

Results

| Operation | Vector Indexer | KV Indexer | Time (h) | Documents | QPS |
| --- | --- | --- | --- | --- | --- |
| Index | NumpyIndexer | BinaryPb | 5 | 1283000 | 71.05 |
| Query | Faiss (index_key='IVF50,Flat') | BinaryPb | 1 | 108000 | 29.77 |
| Query | Annoy (trees=100, k=-1) | BinaryPb | 1 | 113000 | 31.12 |
| Query | NumpyIndexer | BinaryPb | 1 | 9000 | 2.36 |

Test it yourself

To reproduce these benchmarks, follow this guide.