Marc's thoughts
Most APIs will eventually need to rate limit along several dimensions (concurrency, RPS, per-IP, per-token, per-resource)
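One way to sketch multi-dimensional limiting is one limiter instance per dimension, with a request admitted only if every dimension still has headroom. The class, dimensions, and limits below are illustrative, not from the original:

```python
import time
from collections import defaultdict

class SlidingWindowLimiter:
    """Allows at most `limit` events per key in any trailing `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._events = defaultdict(list)  # key -> timestamps, oldest first

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        ts = self._events[key]
        cutoff = now - self.window
        while ts and ts[0] <= cutoff:  # drop events that fell out of the window
            ts.pop(0)
        if len(ts) >= self.limit:
            return False
        ts.append(now)
        return True

# One limiter per dimension (hypothetical limits). Note the short-circuit:
# a request denied by the second limiter has already charged the first.
per_ip = SlidingWindowLimiter(limit=100, window=60)
per_token = SlidingWindowLimiter(limit=1000, window=60)

def admit(ip, token):
    return per_ip.allow(ip) and per_token.allow(token)
```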
Rate limiting may also need to happen for reasons unrelated to any individual API user, such as load shedding when the platform itself is overloaded
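Load shedding can be as simple as a global cap on in-flight work, independent of who the caller is. A minimal sketch (the class name and cap are hypothetical):

```python
import threading

class LoadShedder:
    """Caps concurrent in-flight requests platform-wide; when the cap is hit,
    new work is rejected immediately regardless of which user sent it."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self):
        # Non-blocking: shed load instead of queuing when saturated.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()
```

A handler would call `try_acquire()` before doing work and `release()` in a `finally` block, typically returning 503 rather than 429 on rejection, since the fault is the platform's, not the caller's.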
"Swiss cheese approach": Cheaper rate limiters can be placed closer to downstream, those that do the less work (require less information like IP / anonymous vs current quota, user account etc)
Algorithms: at the very least use something like a sliding window, or something like a leaky bucket or the Generic Cell Rate Algorithm (GCRA). Fixed windows have a lot of flaws; for example, a 5000-per-minute fixed window lets a client fire all 5000 requests in the first second of each window
May need to separate the concept of rate limiting (internal, protecting the platform) from API quotas (external, documented, user-facing)
Maybe a hot take: it's very hard to describe the state of the rate limiter externally (in headers, for example). But maybe a higher-level concept (quota) can be expressed this way (https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/)
Clients should be encouraged to react to 429 gracefully, possibly falling back to client-side queuing. Predicting rate limits through structured responses is hard.

Thoughts
What about limiting based on complexity or cost of the request? For example, a
request that requires a lot of processing or data transfer could be limited more
than a simple request. This could help prevent abuse of the system while still
allowing users to make simple requests.
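One way to sketch cost-based limiting is a token bucket where each request drains tokens proportional to its cost, so a user can make many cheap calls or a few heavy ones within the same budget. The class name and numbers below are illustrative:

```python
class CostAwareBucket:
    """Token bucket where expensive requests drain more tokens than cheap ones."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, cost, now):
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True
```

The cost per request could be a static weight per endpoint, or something measured, like estimated rows scanned or bytes transferred.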
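Returning to the earlier point about reacting to 429 gracefully: a client-side retry loop might look like the sketch below. The `send` callable and parameters are hypothetical stand-ins for a real HTTP client; the one concrete piece is honoring a `Retry-After` header when the server provides it, falling back to exponential backoff when it doesn't:

```python
import time

def call_with_retries(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `send()` (which returns (status, headers)); on 429, wait per
    Retry-After when present, otherwise back off exponentially."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
        sleep(delay)
    return 429  # budget exhausted; surface to the caller or a client-side queue
```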