Rate limit exceeded (OpenAI)

What engineers usually see

  • Request rejected with rate limit error
  • No indication of current rate limit state
  • Cannot predict when requests will succeed
  • May be hitting different rate limit types (TPM, RPM, TPD)

Why this is hard to debug

OpenAI enforces several rate limit types, and the error response rarely tells you which one you tripped. You can't tell from the error alone whether it was RPM, TPM, or TPD, or how close you were to the limit — just a generic error, and you're left guessing.
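OpenAI does return per-request rate limit state in response headers (x-ratelimit-limit-requests, x-ratelimit-remaining-tokens, and so on), even when the error body is generic. A minimal sketch of pulling that state out — the header names are OpenAI's documented ones, but the parsing helper itself is illustrative, not part of any SDK:

```python
# Sketch: collect OpenAI's x-ratelimit-* response headers into a dict.
# Header names match OpenAI's documentation; the helper is illustrative.

def parse_rate_limit_headers(headers):
    """Extract rate-limit state from response headers (case-insensitive)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    keys = [
        "x-ratelimit-limit-requests",      # RPM ceiling
        "x-ratelimit-remaining-requests",  # requests left in the window
        "x-ratelimit-reset-requests",      # time until the request window resets
        "x-ratelimit-limit-tokens",        # TPM ceiling
        "x-ratelimit-remaining-tokens",    # tokens left in the window
        "x-ratelimit-reset-tokens",        # time until the token window resets
    ]
    return {k: lowered[k] for k in keys if k in lowered}

# Example with captured headers:
state = parse_rate_limit_headers({
    "x-ratelimit-limit-requests": "500",
    "x-ratelimit-remaining-requests": "0",
    "x-ratelimit-reset-requests": "120ms",
})
print(state["x-ratelimit-remaining-requests"])  # "0" → the RPM limit was hit
```

With the v1 Python SDK, client.chat.completions.with_raw_response.create(...) exposes .headers on the returned object, so the same helper works on a live request.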

Minimal repro

from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="YOUR_OPENAI_KEY",
    base_url="https://aibadgr.com/v1"
)

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "test"}]
    )
except RateLimitError as e:
    print(f"Rate limited: {e}")

This request routes through AI Badgr and returns a stable request ID that links to an execution record.

Note: AI Badgr is OpenAI-compatible and works as a drop-in proxy. No SDK changes required — only the base_url changes.
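Whichever limit you hit, the standard mitigation is retry with exponential backoff and jitter. A self-contained sketch — the fake_call stand-in simulates two rate-limited attempts so the example runs offline; in real code you would catch openai.RateLimitError around client.chat.completions.create:

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for openai.RateLimitError so this sketch runs offline."""

def with_backoff(call, max_retries=5, base_delay=0.5):
    """Retry `call` on rate limits, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # base_delay, 2*base_delay, 4*base_delay, ... plus up to 25% jitter
            delay = base_delay * (2 ** attempt)
            time.sleep(delay * (1 + random.random() * 0.25))

attempts = {"n": 0}
def fake_call():
    attempts["n"] += 1
    if attempts["n"] < 3:  # first two attempts hit the limit
        raise RateLimited()
    return "ok"

print(with_backoff(fake_call, base_delay=0.01))  # "ok" after two retries
```

Jitter matters: without it, every client that got rate-limited at the same moment retries at the same moment, and you hit the limit again in lockstep.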

What a per-request execution record makes visible

  • Which rate limit was hit (RPM, TPM, TPD)
  • Current usage vs limit
  • Retry strategy recommendations
  • Rate limit reset time
  • Historical rate limit patterns

Run 1 request → get receipt

Change your base URL to https://aibadgr.com/v1 and run your request.

The response includes an X-Badgr-Request-Id header that links to a receipt showing latency, retries, tokens, cost, and failure stage for that specific execution.
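A sketch of grabbing that header in Python — the header name comes from this page; the lookup helper is illustrative, and the commented-out live call assumes the v1 SDK's with_raw_response accessor:

```python
def badgr_request_id(headers):
    """Return the X-Badgr-Request-Id header value, whatever its casing."""
    for key, value in headers.items():
        if key.lower() == "x-badgr-request-id":
            return value
    return None

# With a live request (requires network and a valid key):
# raw = client.chat.completions.with_raw_response.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "test"}],
# )
# print(badgr_request_id(raw.headers))

print(badgr_request_id({"X-Badgr-Request-Id": "req_123"}))  # req_123
```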

Not the engineer?
Share this page with your dev and ask them to run one request through AI Badgr. That's all that's needed to get the receipt.

Debugging rate limits only makes sense when you can see what happened to a single request from start to finish, instead of trying to piece it together from scattered logs.