Most transient errors can be attributed to network issues between the client and the target server, or between a server and its dependencies, such as a database. These errors can be:

- ignored, if the failing operation (for example, one produced by a generator or a sequence of operations) isn't relevant to the overall success.
- retried up to a certain limit, assuming that recovery logic kicks in to repair the transient error.
- accepted, if the operation cannot be successfully completed.
## Transient fault handling with retries
The `post()` method accepts the `max_attempts`, `initial_backoff`, `max_backoff` and `backoff_multiplier` parameters to control the capacity to retry requests when a transient connectivity error occurs, using an exponential backoff strategy. This can help to overcome transient network connectivity issues, which are broadly captured by the `ConnectionError` exception.
The `max_attempts` parameter determines the number of sending attempts, including the original request. The `initial_backoff`, `max_backoff` and `backoff_multiplier` parameters determine the randomized delay in seconds before retry attempts.

The initial retry attempt occurs after a delay of `random(0, initial_backoff)` seconds. In general, the n-th attempt occurs after
`random(0, min(initial_backoff * backoff_multiplier**(n-1), max_backoff))` seconds.
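The delay schedule above can be sketched in plain Python (an illustration of the formula, not Jina's internal implementation):

```python
import random


def retry_delay(attempt: int,
                initial_backoff: float = 0.8,
                backoff_multiplier: float = 1.5,
                max_backoff: float = 5.0) -> float:
    """Randomized delay in seconds before the n-th retry attempt (attempt >= 1).

    Mirrors random(0, min(initial_backoff * backoff_multiplier**(n-1), max_backoff)).
    """
    upper = min(initial_backoff * backoff_multiplier ** (attempt - 1), max_backoff)
    return random.uniform(0.0, upper)
```

With the defaults above, the first retry waits at most 0.8 s, the second at most 1.2 s, and the cap of 5 s is reached after a few attempts.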
## Handling gRPC retries for streaming and unary RPC methods
The `post()` method supports the `stream` boolean parameter (defaults to `True`). If set to `True`, the gRPC server-side streaming RPC method is invoked. If set to `False`, the server-side unary RPC method is invoked. Some important implications of using retries with gRPC are:
- The built-in gRPC retries are limited in scope and are implemented to work only under certain circumstances. More details are specified in the design document.
- If the `stream` parameter is set to `True` and the `inputs` parameter is an `Iterable`, the retry must be handled as shown below, because the result must be consumed to check for errors in the stream of responses. The gRPC service retry is still configured but cannot be guaranteed.
```python
from jina import Client
from docarray import BaseDoc
from jina.clients.base.retry import wait_or_raise_err
from jina.helper import run_async

client = Client(host='grpc://localhost:12345')

max_attempts = 5
initial_backoff = 0.8
backoff_multiplier = 1.5
max_backoff = 5


def input_generator():
    for _ in range(10):
        yield BaseDoc()


for attempt in range(1, max_attempts + 1):
    try:
        response = client.post(
            '/',
            inputs=input_generator(),
            request_size=2,
            timeout=0.5,
        )
        assert len(response) == 1
        break  # success: stop retrying
    except ConnectionError as err:
        # waits for the backoff delay, or re-raises once max_attempts is reached
        run_async(
            wait_or_raise_err,
            attempt=attempt,
            err=err,
            max_attempts=max_attempts,
            backoff_multiplier=backoff_multiplier,
            initial_backoff=initial_backoff,
            max_backoff=max_backoff,
        )
```
- If the `stream` parameter is set to `True` and the `inputs` parameter is a `DocList`, the retry is handled internally.
- If the `stream` parameter is set to `False`, the `post()` method invokes the unary RPC method and the retry is handled internally.
The retry parameters `max_attempts`, `initial_backoff`, `backoff_multiplier` and `max_backoff` of the `post()` method are used to set the gRPC retry service options. This improves the chances of success if the gRPC retry conditions are met.
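For reference, gRPC retry service options are expressed as a standard service config. The sketch below shows the generic gRPC shape of such a config with the retry parameters above plugged in; this is a generic gRPC illustration, not Jina's exact internal configuration:

```python
import json

# Field names follow the gRPC service-config retry policy; the values mirror
# the post() retry parameters. An empty method name matches every method.
retry_service_config = json.dumps(
    {
        "methodConfig": [
            {
                "name": [{}],
                "retryPolicy": {
                    "maxAttempts": 5,
                    "initialBackoff": "0.8s",
                    "maxBackoff": "5s",
                    "backoffMultiplier": 1.5,
                    "retryableStatusCodes": ["UNAVAILABLE"],
                },
            }
        ]
    }
)

# A config like this is passed as a channel option, e.g.:
# grpc.insecure_channel(address, options=[("grpc.service_config", retry_service_config)])
```

Note that gRPC only retries calls that fail with one of the listed status codes, which is why a retry "cannot be guaranteed" for arbitrary failures.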
## Continue streaming when an Executor error occurs
The `post()` method accepts a `continue_on_error` parameter. When set to `True`, the Client keeps trying to send the remaining requests. The `continue_on_error` parameter only applies to Exceptions caused by an Executor; in case of network connectivity issues, an Exception is still raised.
The `continue_on_error` parameter handles errors that the Executor returns as part of its response. These can be logical errors raised during the execution of the operation. This doesn't include transient errors, represented by `InternalNetworkError`, that are triggered during the communication between the Gateway and the Executor.
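The distinction can be illustrated with a plain-Python sketch (this is a generic illustration of the semantics described above, not Jina's implementation; `send_requests` and `send` are hypothetical names):

```python
def send_requests(requests, send, continue_on_error=True):
    """Keep dispatching remaining requests when one fails logically.

    `send` stands in for dispatching one request to an Executor.
    RuntimeError plays the role of a logical Executor error;
    ConnectionError plays the role of a transient network error,
    which is always re-raised regardless of continue_on_error.
    """
    results, failed = [], []
    for req in requests:
        try:
            results.append(send(req))
        except ConnectionError:
            raise  # network problems are never swallowed
        except RuntimeError as err:
            if not continue_on_error:
                raise
            failed.append((req, err))  # record the failure and keep going
    return results, failed
```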
The `retries` parameter of the Gateway controls the number of retries for the transient errors that arise in the communication between the Gateway and the Executor. Refer to the Network Errors section for more information.
## Retries with large inputs or long-running operations
When using the gRPC client, it is recommended to set the `stream` parameter to `False`, so that the unary RPC is invoked, which performs the retry internally with each request from the `inputs` iterator or generator. The `request_size` parameter must also be set so that each request is a smaller operation that can be retried without much overhead on the server.
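To see why a smaller `request_size` (as used in the code example above) reduces retry overhead: the inputs are split into `ceil(N / request_size)` requests, and a failure only forces the failed request, not the whole input set, to be resent. A hypothetical back-of-envelope sketch:

```python
import math


def retry_cost(num_docs: int, request_size: int) -> tuple:
    """Return (number_of_requests, docs_resent_per_failed_request).

    Hypothetical illustration: smaller requests mean each retry resends
    fewer documents, at the price of more round trips overall.
    """
    return math.ceil(num_docs / request_size), min(request_size, num_docs)
```

For example, 100 documents with `request_size=2` yield 50 requests of 2 documents each, so a single transient failure only resends 2 documents.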
The `stream` parameter only applies to the gRPC client; it has no effect on the HTTP and WebSocket clients.
Refer to the Callbacks section for dealing with successes and failures after retries.