OTEL-017: Exporter missing retry_on_failure/sending_queue
Severity: warn (advisory)
Rule Details
Any exporter that pushes data over the network will eventually see a transient failure — TLS handshake timeout, backend restart, rate limit, DNS blip. Without retry_on_failure or sending_queue the failed batch is dropped on the floor. Pull-based exporters (prometheus, prometheusremotewrite) and diagnostic ones (debug, logging) do not need this; everything else does.
This rule fires when an exporter whose base type is not in the exempt set (pull-based or diagnostic) has neither retry_on_failure nor sending_queue configured.
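To illustrate which exporters are checked, here is a minimal sketch. It assumes that "base type" means the part of the exporter name before any slash (the Collector's usual type/name convention); the instance names and endpoints are hypothetical.

```yaml
exporters:
  debug:                  # diagnostic exporter: exempt
    verbosity: detailed
  prometheus/metrics:     # base type "prometheus": exempt
    endpoint: 0.0.0.0:8889
  otlphttp/vendor:        # network push exporter with neither field: flagged
    endpoint: https://otel.example.com
```

Under that assumption, only otlphttp/vendor would be reported.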
Exporter-specific alternative retry mechanisms
Some exporters implement their own retry mechanism instead of the standard retry_on_failure/sending_queue fields. The rule recognises these alternatives and does not fire when they are configured:
| Exporter | Alternative field | Notes |
|---|---|---|
| awsemf | max_retries | AWS CloudWatch EMF exporter (only exposes max_retries via the AWS SDK) |
| awsxray | max_retries | AWS X-Ray exporter (only exposes max_retries via the AWS SDK) |
| awscloudwatchlogs | max_retries | AWS CloudWatch Logs exporter; also supports the standard fields, and either satisfies the rule |
| awss3 | s3uploader.retry_max_attempts / retry_mode | AWS S3 exporter has no top-level retry_on_failure; SDK retry is active unless s3uploader.retry_mode: nop |
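For awscloudwatchlogs, the table says either mechanism is accepted. The sketch below shows both forms side by side; the instance names, log group, and log stream are hypothetical, and the numeric values are illustrative rather than recommendations.

```yaml
exporters:
  # Option A: SDK-level retry through max_retries
  awscloudwatchlogs/sdk-retry:
    region: us-east-1
    log_group_name: /ecs/otel
    log_stream_name: collector
    max_retries: 5
  # Option B: the standard fields
  awscloudwatchlogs/standard:
    region: us-east-1
    log_group_name: /ecs/otel
    log_stream_name: collector
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
      queue_size: 5000
```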
Options
This rule has no options. The exempt exporter types (debug, logging, prometheus, prometheusremotewrite) are fixed inside the policy and cannot be changed through configuration.
Examples
Configuration that triggers the rule:
exporters:
  otlp/backend:
    endpoint: backend:4317
    # no retry_on_failure or sending_queue
Configuration that satisfies the rule:
exporters:
  otlp/backend:
    endpoint: backend:4317
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
The awsemf and awsxray exporters use max_retries as their own retry mechanism (they do not expose retry_on_failure or sending_queue). Configuring it satisfies the rule:
exporters:
  awsemf/production:
    region: us-east-1
    log_group_name: /ecs/otel
    max_retries: 5
  awsxray:
    region: us-east-1
    max_retries: 3
The awss3 exporter has no top-level retry_on_failure; retries are delegated to the AWS SDK via nested s3uploader settings. Any of the following satisfies the rule:
exporters:
  awss3:
    s3uploader:
      region: us-east-1
      s3_bucket: my-bucket
      retry_max_attempts: 5      # positive attempts => rule satisfied
      # retry_mode: standard     # anything other than "nop" also satisfies
Setting s3uploader.retry_mode: nop explicitly disables SDK retry — in that case the rule will fire, and you should add sending_queue instead.
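A minimal sketch of that fallback, assuming the installed awss3 exporter accepts a top-level sending_queue block; the bucket name and queue size are illustrative and reuse values from the earlier examples.

```yaml
exporters:
  awss3:
    s3uploader:
      region: us-east-1
      s3_bucket: my-bucket
      retry_mode: nop        # SDK retry disabled
    sending_queue:           # added so failed batches are buffered instead of dropped
      enabled: true
      queue_size: 5000
```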
When Not To Use It
Do not disable this rule for production configurations. Leave it enabled so that any forgotten retry or queue configuration is flagged.
Related Rules
- OTEL-048: sending_queue explicitly disabled
- OTEL-049: sending_queue.queue_size below 10
- OTEL-050: sending_queue.queue_size above 50000
- OTEL-053: retry max_elapsed_time set to 0
- OTEL-065: sending_queue without persistent storage
Version
Available since augur v0.1.0.
Further Reading
Resources
- Rule source: rules/policy/main/main.rego