[FAQ] OTEL unreachable and fallback

Question

When using io.Insights, how does io.Connect handle an unreachable OTEL collector?

Answer

When the endpoint configured in otel.metrics.url cannot accept metric exports, io.Connect relies on the retry behavior of the underlying OTLP exporter.

Connection config

  • io.Connect retries metric exports within the configured timeoutMillis window. This window determines the total time io.Connect will keep retrying an export if the exporter endpoint returns one of the retryable HTTP status codes: 429, 502, 503, or 504.
  • The default value of timeoutMillis is 10000 ms. You can change it by providing a numeric value under the metrics config property, e.g. "timeoutMillis": 60000.
  • The retry algorithm itself is inherited from the OTLP exporter. The default maximum number of export attempts is 5 (set by DEFAULT_EXPORT_MAX_ATTEMPTS).
  • The upstream OTLP retry behavior uses the Retry-After header when present, and otherwise falls back to exponential backoff with jitter.
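The interplay between the attempt cap, the timeoutMillis window, the Retry-After header, and jittered exponential backoff can be sketched as follows. This is an illustrative model of the behavior described above, not the actual exporter source; the function names, the initial backoff, and the multiplier are assumptions.

```typescript
// Illustrative sketch of the OTLP exporter retry schedule described above.
// DEFAULT_EXPORT_MAX_ATTEMPTS matches the documented cap of 5; the initial
// backoff and multiplier values here are assumptions for the example.
const DEFAULT_EXPORT_MAX_ATTEMPTS = 5;
const INITIAL_BACKOFF_MS = 1_000;
const BACKOFF_MULTIPLIER = 1.5;

/** Delay before the next attempt (attempt is 1-based). */
function nextDelayMs(attempt: number, retryAfterMs?: number): number {
  if (retryAfterMs !== undefined) {
    // A Retry-After hint from the server overrides the computed backoff.
    return retryAfterMs;
  }
  const base = INITIAL_BACKOFF_MS * BACKOFF_MULTIPLIER ** (attempt - 1);
  // Jitter: pick a random delay in [0, base).
  return Math.random() * base;
}

/** Retry a single export until success, attempt cap, or deadline. */
async function exportWithRetry(
  send: () => Promise<{ ok: boolean; retryAfterMs?: number }>,
  timeoutMillis: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMillis;
  for (let attempt = 1; attempt <= DEFAULT_EXPORT_MAX_ATTEMPTS; attempt++) {
    const res = await send();
    if (res.ok) return true;
    const delay = nextDelayMs(attempt, res.retryAfterMs);
    // Stop early if waiting would exceed the timeoutMillis window.
    if (Date.now() + delay > deadline) break;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  return false; // attempts exhausted: this export is dropped
}
```

Note that both limits apply at once: an export stops retrying as soon as either the attempt cap is reached or the next wait would overrun the timeoutMillis window.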

Data handling

  • If the retry attempts for a given export are exhausted, that export is dropped.
  • Subsequent metric exports continue normally.

Recovery

  • Once the collector path is healthy again, later metric exports will succeed without requiring changes to the metric configuration.
  • You can place a reverse proxy in front of the OTEL collector so that, if the collector is down, the proxy returns 502 Bad Gateway, which is one of the retryable status codes listed above.
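One way to set up such a reverse proxy is sketched below with nginx. The listen port and the upstream collector address are assumptions; adjust them to your deployment.

```nginx
# Illustrative nginx reverse proxy in front of an OTLP/HTTP collector.
# The port and upstream address are assumptions for this example.
server {
    listen 4318;

    location / {
        proxy_pass http://collector:4318;

        # Fail fast when the collector is unreachable; nginx then answers
        # 502 Bad Gateway, which the OTLP exporter treats as retryable.
        proxy_connect_timeout 2s;
    }
}
```

With this in place, exports issued while the collector is down are retried within the timeoutMillis window, and exports succeed again once the collector recovers.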