OpenTelemetry Support#

Hint

Prometheus-only based metrics collection will soon be deprecated. Refer to Monitor with Prometheus and Grafana for the old setup.

There are two major setups required to visualize/monitor your application’s signals using OpenTelemetry. The first setup is covered by Jina, which integrates the OpenTelemetry API and SDK at the application level. The Flow Instrumentation page covers in detail the steps required to enable OpenTelemetry in a Flow. A Client can also be instrumented, as documented in the Client Instrumentation section.

This section covers the second setup: the OpenTelemetry infrastructure required to collect, store and visualize the traces and metrics data exported by the Pods. This setup is the user’s responsibility, and this section only serves as an initial/introductory guide to running the OpenTelemetry infrastructure components.

Since OpenTelemetry is open source and primarily defines the API standards and specification, various providers implement that specification. This section follows the default recommendations from the OpenTelemetry documentation, which also fit the Jina implementation.

Exporting traces and metrics data#

Pods created using a Flow with tracing or metrics enabled use the SDK Exporters to send the data to a central Collector component. You can use this collector to further process and store the data for visualization and alerting.

The push/export-based mechanism also allows the application to start pushing data immediately on startup. This differs from the pull-based mechanism, where you need a separate registry or discovery service to identify data scraping targets.

You can configure the exporter backend host and port using the traces_exporter_host, traces_exporter_port, metrics_exporter_host and metrics_exporter_port Flow arguments. Even though the Collector is data-type agnostic (it accepts any type of data in the OpenTelemetry API data model), we provide separate configuration for Tracing and Metrics to give you more flexibility in choosing infrastructure components.
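
For example, this lets the same Flow send traces and metrics to a single Collector or to two different backends. The snippet below is only a sketch: the hostnames are placeholders for your own OTLP/gRPC endpoints.

from jina import Flow

# Sketch: traces and metrics can be exported to the same OTLP/gRPC endpoint,
# or to two different ones. Replace the placeholder addresses with your own.
f = Flow(
    tracing=True,
    traces_exporter_host='http://traces-collector.example.com',    # placeholder
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://metrics-collector.example.com',  # placeholder
    metrics_exporter_port=4317,
)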

Jina’s default exporter implementations are OTLPSpanExporter and OTLPMetricExporter. The exporters use the gRPC data transfer protocol. Environment variables can be used to further configure the exporter clients based on your requirements. The full list of exporter-related environment variables is documented by the OpenTelemetry Python SDK library. Apart from OTEL_EXPORTER_OTLP_PROTOCOL and OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, you can use all other library-version-specific environment variables to configure the exporter clients.
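
For example (a sketch only; check the documentation of your installed OpenTelemetry SDK version for the exact variables and accepted values), such environment variables can be set before the Flow is started:

import os

# Sketch: tune the OTLP exporter clients via standard OpenTelemetry environment
# variables before creating the Flow, so the SDK exporters pick them up.
os.environ['OTEL_EXPORTER_OTLP_TIMEOUT'] = '10'  # maximum time to wait for each export batch
os.environ['OTEL_EXPORTER_OTLP_COMPRESSION'] = 'gzip'  # compress exported payloads
# hypothetical header for an authenticated collector:
os.environ['OTEL_EXPORTER_OTLP_HEADERS'] = 'authorization=Bearer <your-token>'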

Collector#

The Collector is a huge ecosystem of components that support features like scraping, collecting, processing and further exporting data to storage backends. The Collector itself can also expose endpoints that allow scraping data. We recommend reading the official documentation to understand the full set of features and configuration required to run a Collector. Read the section below to understand the minimum set of components and the respective configuration required for operating with Jina.

We recommend using the OpenTelemetry Collector from the contrib repository. We also use:

  • Jaeger for collecting traces, visualizing tracing data and alerting based on tracing data.

  • Prometheus for collecting metric data and/or alerting.

  • Grafana for visualizing data from Prometheus/Jaeger and/or alerting based on the data queried.

Hint

Jaeger provides comprehensive out-of-the-box tools for end-to-end trace monitoring, visualization and alerting. You can substitute other tools to achieve the necessary goals of observability and performance analysis. The same applies to Prometheus and Grafana.

Docker Compose#

A minimal docker-compose.yml file can look like:

version: "3"
services:
  # Jaeger
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"

  otel-collector:
    image: otel/opentelemetry-collector:0.61.0
    command: [ "--config=/etc/otel-collector-config.yml" ]
    volumes:
      - ${PWD}/otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "8888" # Prometheus metrics exposed by the collector
      - "8889" # Prometheus exporter metrics
      - "4317:4317" # OTLP gRPC receiver
    depends_on:
      - jaeger

  prometheus:
    container_name: prometheus
    image: prom/prometheus:latest
    volumes:
      - ${PWD}/prometheus-config.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    container_name: grafana
    image: grafana/grafana-oss:latest
    ports:
      - 3000:3000

The corresponding OpenTelemetry Collector configuration below needs to be stored in file otel-collector-config.yml:

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  
  prometheus:
    endpoint: "0.0.0.0:8889"
    resource_to_telemetry_conversion:
      enabled: true
    # can be used to add additional labels
    const_labels:
      label1: value1

processors:
  batch:

service:
  extensions: []
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

This setup creates a gRPC Collector Receiver on port 4317 that collects data pushed by the Flow Pods. Collector exporters for the Jaeger and Prometheus backends are configured to export tracing and metrics data respectively. The final service section creates the collector pipelines, each combining a receiver (collecting data), a processor (batching) and an exporter (sending data to the backend).

The minimal Prometheus configuration needs to be stored in prometheus-config.yml:

scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 500ms
    static_configs:
      - targets: ['otel-collector:8889']
      - targets: ['otel-collector:8888']

The Prometheus configuration only needs to scrape the OpenTelemetry Collector to get all the data from applications instrumented with OpenTelemetry Metrics.
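
With docker-compose.yml, otel-collector-config.yml and prometheus-config.yml in the same working directory, you can bring up the whole stack with docker compose up (or docker-compose up on older Docker installations) and keep it running while the instrumented Flow below exports data.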

Running a Flow locally#

Run the Flow that we want to instrument locally and send a sample request. If the backends are running successfully, the Flow exports data to the Collector, which can then be queried and viewed.

First, start a Flow:

from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
import time

class MyExecutor(Executor):
    
    @requests
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        time.sleep(0.5)
        return docs

with Flow(
    port=54321,
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add(uses=MyExecutor) as f:
    f.block()

Second, execute requests using the instrumented jina.Client:

from jina import Client
from docarray import DocList, BaseDoc

client = Client(
    host='grpc://localhost:54321',
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
client.post('/', DocList[BaseDoc]([BaseDoc()]), return_type=DocList[BaseDoc])
client.teardown_instrumentation()

Hint

The jina.Client currently only supports OpenTelemetry Tracing.

Viewing Traces in Jaeger UI#

You can open the Jaeger UI at http://localhost:16686. You can find more information on the Jaeger UI in the official docs.

Hint

The list of available traces is documented in the Flow Instrumentation section.

Monitor with Prometheus and Grafana#

External entities (like Grafana) can access the metrics aggregated by Prometheus via the PromQL query language and let users visualize them with dashboards. Check out a comprehensive tutorial for more information.
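
As a quick sanity check that metrics are flowing, you can also run a PromQL query against Prometheus’ HTTP API directly. The sketch below assumes the Prometheus instance from the Docker Compose setup above, reachable on localhost:9090; replace the query with any metric exported by your instrumented Flow (see the Flow Instrumentation section for the available names).

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Ask Prometheus whether the otel-collector scrape target is up.
# 'up' is a built-in metric that Prometheus records for every scrape target.
params = urlencode({'query': 'up{job="otel-collector"}'})
with urlopen(f'http://localhost:9090/api/v1/query?{params}') as resp:
    print(json.load(resp)['data']['result'])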

Download a sample Grafana dashboard JSON file and import it into Grafana to get started with some pre-built graphs:

[Screenshot: sample Grafana dashboard visualizing histogram metrics]

Hint

A list of available metrics is in the Flow Instrumentation section. To update your existing Prometheus and Grafana configurations, refer to the OpenTelemetry migration guide.