Rate Limits
Current limit
30 requests per minute per partner API key.
This is a hard ceiling enforced at the Zuplo edge before the origin receives the request.
How it's measured
- Sliding window, not a calendar-minute bucket. The 30-req budget is based on the trailing 60 seconds from the moment the request arrives.
- Per API key. If you have two keys (e.g. a dev key and a prod key from the same tenant), they have independent 30-req/min budgets.
- Every HTTP request counts, regardless of response status. A 422 that bounces on validation still consumes one unit of budget.
- Idempotent replays count. Re-sending a request_id that hits the 5-min cache still counts against the rate limit (Zuplo enforces the limit before the cache lookup).
What happens on breach
HTTP 429 Too Many Requests with a Retry-After response header. The Retry-After value is seconds to wait before the next safe request.
HTTP/1.1 429 Too Many Requests
Retry-After: 17
Content-Type: application/json
<Zuplo-emitted rate-limit body — see partnerdocs.collectivex.health/api for schema>
The exact body is Zuplo-emitted, so its shape is documented in the auto-generated OpenAPI reference at partnerdocs.collectivex.health/api rather than here. The Retry-After header is what your code should key off of — not the body.
Recommended client handling
Minimum viable
Honor Retry-After literally:
if r.status_code == 429:
    time.sleep(int(r.headers["Retry-After"]))
    # then retry with the same request_id
That's correct but pessimistic at low concurrency.
Exponential backoff with jitter (preferred)
For request bursts or batch jobs, wrap retries in exponential backoff with full jitter to avoid thundering-herd:
import random
import time

import httpx  # third-party HTTP client used in this example

def backoff_with_jitter(attempt: int, retry_after: int | None) -> float:
    """Base case: honor Retry-After. Fallback: exponential with full jitter."""
    if retry_after is not None:
        return float(retry_after)
    base = min(2 ** attempt, 32)    # 1, 2, 4, 8, 16, 32 (cap)
    return random.uniform(0, base)  # full jitter

def call_with_retry(request_body, max_retries=5):
    for attempt in range(max_retries + 1):
        r = httpx.post(
            f"{CXH_BASE_URL}/v1/oura/recommendation",
            headers={"Authorization": f"ApiKey {CXH_API_KEY}"},
            json=request_body,
        )
        if r.status_code != 429:
            return r
        retry_after = int(r.headers.get("Retry-After", 0)) or None
        sleep_s = backoff_with_jitter(attempt, retry_after)
        time.sleep(sleep_s)
    raise RuntimeError(f"Exceeded {max_retries} 429 retries")
Node.js / JavaScript
async function callWithRetry(body, { maxRetries = 5 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const r = await fetch(`${CXH_BASE_URL}/v1/oura/recommendation`, {
      method: "POST",
      headers: {
        "Authorization": `ApiKey ${CXH_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (r.status !== 429) return r;
    const retryAfter = parseInt(r.headers.get("Retry-After") || "0", 10);
    const base = Math.min(2 ** attempt, 32);
    const sleepMs = (retryAfter || Math.random() * base) * 1000;
    await new Promise((resolve) => setTimeout(resolve, sleepMs));
  }
  throw new Error(`Exceeded ${maxRetries} 429 retries`);
}
Planning your throughput
At 30 req/min sustained:
- 30 requests/min = 1 request every 2 seconds.
- ~43k requests/day if evenly paced.
- Burst budget is soft — the sliding window allows ~30 requests in a sub-second burst, then throttles until the window recovers.
For batch workloads (e.g. nightly digest, re-processing), rate-limit your own dispatcher at 25 req/min (leaves 5-req headroom for user-initiated traffic). A simple minimum-interval pacer (equivalent to a token bucket with capacity 1) works:
# Pace at 25 req/min = 1 request every 2.4 seconds
MIN_INTERVAL_S = 60.0 / 25  # 2.4
last_send = 0.0
for item in batch:
    elapsed = time.monotonic() - last_send
    if elapsed < MIN_INTERVAL_S:
        time.sleep(MIN_INTERVAL_S - elapsed)
    send(item)
    last_send = time.monotonic()
Getting a higher limit
30 req/min is the default for sandbox integration. Prod cutover may have a different ceiling — decided per partner contract.
To request a higher limit:
- Open a ticket via support with the subject Rate limit increase — <tenant-id>.
- Include: your current peak req/min (from your own metrics), projected peak, and a justification (user growth, feature launch, batch workload).
- Response time: typically 2–3 business days. Rate-limit changes require CollectiveX-side review of origin capacity.
- Sandbox limits can be raised temporarily for load testing — request a time window (e.g. "weekdays 10am–12pm UTC for 2 weeks").
Anti-patterns
- Don't retry immediately on 429. You'll just consume more budget and get throttled harder.
- Don't retry forever. Cap at 5–7 retries. If you're genuinely hitting the limit, your architecture needs a queue, not longer backoff.
- Don't parallelize requests for the same partner tenant across multiple callers without coordinating rate. Two independent callers each doing 25 req/min = 50 req/min total = constant throttling. Put a shared token-bucket in front.
- Don't confuse 429 with 503. 429 = you're going too fast; back off. 503 = we have a transient outage; also back off, but the retry-after semantics differ. Both merit honoring Retry-After if present.
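The shared token-bucket from the parallel-callers point above can be sketched as an in-process limiter. This assumes all callers for the tenant live in one process (multiple threads); for multi-process or multi-host deployments you would need an external store such as Redis. The class and its numbers are illustrative choices, not part of the CollectiveX API:

```python
import threading
import time

class SharedTokenBucket:
    """One process-wide limiter shared by every caller for a tenant.
    Sketch only: cross-process coordination needs an external store."""

    def __init__(self, rate_per_min: float = 25.0, capacity: float = 25.0):
        self.rate_per_s = rate_per_min / 60.0
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at capacity.
                self.tokens = min(
                    self.capacity,
                    self.tokens + (now - self.last) * self.rate_per_s,
                )
                self.last = now
                if self.tokens >= 1.0:
                    self.tokens -= 1.0
                    return
                wait = (1.0 - self.tokens) / self.rate_per_s
            time.sleep(wait)  # sleep outside the lock so others can refill-check
```

Every thread calls `bucket.acquire()` before dispatching; the aggregate rate then stays at 25 req/min no matter how many callers there are.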
Monitoring your own headroom
Every 200/4xx response carries:
X-RateLimit-Remaining: 27
X-RateLimit-Limit: 30
X-RateLimit-Reset: 42 # seconds until the sliding window resets
Use X-RateLimit-Remaining to drive client-side throttling — back off proactively when remaining drops below your safety margin (e.g. 5 or 6) instead of waiting for the 429.
A 429 indicates you exceeded the window by at least 1 request. Treat hitting any 429 as a sign your dispatch logic is too aggressive and tighten the throttle.
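One way to act on these headers proactively is to compute a sleep from the previous response. This is a sketch: the header names match the ones documented above, but the safety margin and pacing formula are illustrative choices, not prescribed by the API:

```python
SAFETY_MARGIN = 5  # back off before the edge does it for us

def throttle_from_headers(headers: dict) -> float:
    """Seconds to sleep before the next request, derived from the
    rate-limit headers of the previous response (illustrative policy)."""
    remaining = int(headers.get("X-RateLimit-Remaining", SAFETY_MARGIN + 1))
    reset_s = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > SAFETY_MARGIN:
        return 0.0  # plenty of headroom; don't slow down
    # Spread the remaining budget over the rest of the window.
    return reset_s / max(remaining, 1)
```

For example, with 3 requests left and 30 seconds until reset, this paces at one request every 10 seconds instead of burning the budget and eating a 429.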