Skip to main content

OTEL-017: Exporter missing retry_on_failure/sending_queue

Severity: warn (advisory)

Rule Details

Any exporter that pushes data over the network will eventually see a transient failure — TLS handshake timeout, backend restart, rate limit, DNS blip. Without retry_on_failure or sending_queue the failed batch is dropped on the floor. Pull-based exporters (prometheus, prometheusremotewrite) and diagnostic ones (debug, logging) do not need this; everything else does.

This rule fires when an exporter whose base type is not pull-based has neither retry_on_failure nor sending_queue configured.

Exporter-specific alternative retry mechanisms

Some exporters implement their own retry mechanism instead of the standard retry_on_failure/sending_queue fields. The rule recognises these alternatives and does not fire when they are configured:

ExporterAlternative fieldNotes
awsemfmax_retriesAWS CloudWatch EMF exporter (only exposes max_retries via the AWS SDK)
awsxraymax_retriesAWS X-Ray exporter (only exposes max_retries via the AWS SDK)
awscloudwatchlogsmax_retriesAWS CloudWatch Logs exporter — also supports the standard fields; either satisfies the rule
awss3s3uploader.retry_max_attempts / retry_modeAWS S3 exporter has no top-level retry_on_failure; SDK retry is active unless s3uploader.retry_mode: nop

Options

This rule has no options. The set of pull-based exporter types (debug, logging, prometheus, prometheusremotewrite) is exempted inside the policy.

Examples

Avoid
exporters:
otlp/backend:
endpoint: backend:4317
# no retry_on_failure or sending_queue
Prefer
exporters:
otlp/backend:
endpoint: backend:4317
retry_on_failure:
enabled: true
max_elapsed_time: 300s
sending_queue:
enabled: true
num_consumers: 10
queue_size: 5000
Prefer — AWS CloudWatch EMF / X-Ray

The awsemf and awsxray exporters use max_retries as their own retry mechanism (they do not expose retry_on_failure or sending_queue). Configuring it satisfies the rule:

exporters:
awsemf/production:
region: us-east-1
log_group_name: /ecs/otel
max_retries: 5
awsxray:
region: us-east-1
max_retries: 3
Prefer — AWS S3

The awss3 exporter has no top-level retry_on_failure; retries are delegated to the AWS SDK via nested s3uploader settings. Any of the following satisfies the rule:

exporters:
awss3:
s3uploader:
region: us-east-1
s3_bucket: my-bucket
retry_max_attempts: 5 # positive attempts => rule satisfied
# retry_mode: standard # anything other than "nop" also satisfies

Setting s3uploader.retry_mode: nop explicitly disables SDK retry — in that case the rule will fire, and you should add sending_queue instead.

When Not To Use It

Never for production. Leave enabled so any forgotten retry/queue configuration is flagged.

  • OTEL-048sending_queue explicitly disabled
  • OTEL-049sending_queue.queue_size below 10
  • OTEL-050sending_queue.queue_size above 50000
  • OTEL-053 — retry max_elapsed_time set to 0
  • OTEL-065sending_queue without persistent storage

Version

Available since augur v0.1.0.

Further Reading

Resources