Overview
What this concept solves
The token bucket is one of the most widely deployed rate-limiting algorithms on the public internet. Stripe's engineering team describes using one in front of its API, AWS API Gateway documents its throttling as a token bucket, and many reverse proxies and service meshes offer it. The reason: it allows short bursts while still enforcing a strict long-term rate, and it is cheap to implement.
The mental model is exactly what the name suggests. A bucket holds tokens. A faucet drips new tokens in at a constant rate. Every incoming request tries to grab one token. If the bucket has one, the request goes through. If it's empty, the request is rejected (or queued).
Mechanics
How it works
Two numbers, that's it
A token bucket is described by two parameters:
- Capacity (B) — the maximum number of tokens the bucket can hold. This is your burst budget.
- Refill rate (r) — how many tokens are added per second. This is your sustained throughput.
Each arriving request costs one token (or sometimes more — for example, a heavy endpoint might cost 5 tokens). The algorithm is:
- On each request, first refill: tokens = min(B, tokens + elapsed × r).
- If tokens ≥ cost, subtract cost and allow the request.
- Otherwise, reject the request (or queue it, depending on the policy).
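The three steps can be sketched as a single pure function. This is a minimal sketch, assuming time is passed in explicitly in milliseconds; `step` and `BucketState` are illustrative names, not part of any library. Passing the clock in keeps the logic deterministic and easy to test.

```typescript
// Illustrative (hypothetical) state shape: the only two values a token bucket needs.
interface BucketState {
  tokens: number;       // current token count, may be fractional
  lastRefillMs: number; // timestamp of the last refill, in milliseconds
}

// One step of the algorithm: refill lazily, then try to pay `cost`.
// Returns whether the request is allowed plus the new state.
function step(
  state: BucketState,
  nowMs: number,
  capacity: number,        // B
  refillPerSecond: number, // r
  cost = 1,
): { allowed: boolean; state: BucketState } {
  // 1. Refill: tokens = min(B, tokens + elapsed × r)
  const elapsed = Math.max(0, (nowMs - state.lastRefillMs) / 1000);
  const tokens = Math.min(capacity, state.tokens + elapsed * refillPerSecond);

  // 2. Enough tokens: subtract the cost and allow.
  if (tokens >= cost) {
    return { allowed: true, state: { tokens: tokens - cost, lastRefillMs: nowMs } };
  }
  // 3. Otherwise reject; the refilled count is still kept.
  return { allowed: false, state: { tokens, lastRefillMs: nowMs } };
}

// Bucket of 10 refilling at 1 token/s, starting full at t = 0.
let s: BucketState = { tokens: 10, lastRefillMs: 0 };
const heavy = step(s, 0, 10, 1, 5); // a 5-token request
s = heavy.state;
console.log(heavy.allowed, s.tokens); // true 5
```

Returning a new state instead of mutating one makes the same function usable from an in-memory map, a database row, or an atomic script in a shared store.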
Why the lazy refill?
Notice the refill is computed on demand from the time elapsed since the last update; there is no background timer ticking tokens in. This is the trick that makes a token bucket O(1) in time and memory per client and easy to keep in a shared store: you only need to store tokens and lastRefill for each client.
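Because refill depends only on the stored timestamp, per-client state really is just two numbers. A minimal in-memory sketch, assuming clients are identified by a string key such as an API key; `KeyedLimiter` and its methods are hypothetical names. It also shows why idle clients can be forgotten: after B / r seconds of inactivity a bucket is full again, which is indistinguishable from a client never seen before.

```typescript
// Per-client state: exactly the two values the lazy refill needs.
type Entry = { tokens: number; lastRefillMs: number };

// Hypothetical keyed limiter: one token bucket per client key.
class KeyedLimiter {
  private readonly entries = new Map<string, Entry>();
  private readonly capacity: number;        // B, shared by all clients
  private readonly refillPerSecond: number; // r, shared by all clients

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  allow(key: string, nowMs: number, cost = 1): boolean {
    // Unknown clients start with a full bucket.
    const e = this.entries.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    const elapsed = Math.max(0, (nowMs - e.lastRefillMs) / 1000);
    e.tokens = Math.min(this.capacity, e.tokens + elapsed * this.refillPerSecond);
    e.lastRefillMs = nowMs;
    const ok = e.tokens >= cost;
    if (ok) e.tokens -= cost;
    this.entries.set(key, e);
    return ok;
  }

  // An entry idle for at least B / r seconds would refill to full,
  // which is the same as having no entry, so it can be dropped.
  evictIdle(nowMs: number): void {
    const fullAfterMs = (this.capacity / this.refillPerSecond) * 1000;
    this.entries.forEach((e, key) => {
      if (nowMs - e.lastRefillMs >= fullAfterMs) this.entries.delete(key);
    });
  }

  get size(): number {
    return this.entries.size;
  }
}

const limiter = new KeyedLimiter(2, 1);
console.log(limiter.allow("key-a", 0), limiter.allow("key-a", 0), limiter.allow("key-a", 0)); // true true false
console.log(limiter.allow("key-b", 0)); // true: key-b has its own bucket
```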
Bursts vs. sustained rate
The bucket capacity B controls how big a burst can be. The refill rate r controls the sustained rate. These two are independent — you can pick generous bursts with a tight long-run rate, or vice versa.
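The split between the two parameters shows up in a bound that follows directly from the definitions above, assuming every request costs one token. Over any window of T seconds, the bucket holds at most B tokens at the start and the faucet adds rT during the window, so at most B + rT requests can be paid for; a bucket that starts full and is drained as fast as tokens arrive reaches that bound.

```latex
N_{\max}(T) = \left\lfloor B + rT \right\rfloor
\qquad\qquad
\frac{N_{\max}(T)}{T} \le \frac{B}{T} + r \;\longrightarrow\; r \quad (T \to \infty)
```

For the prototype's bucket (B = 10, r = 1 token/s), a 1-second window admits at most 11 requests and a 60-second window at most 70: the burst term dominates short windows, the refill term dominates long ones.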
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
A bucket of size 10 refills at one token per second. Each request consumes a token. Hit 'Burst of 10' to drain it instantly — then watch the bucket recover at the steady refill rate.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Drain it on purpose
Click 'Burst of 10' once. Watch the bucket empty and the next clicks get rejected. Time how long until you can send one again — that's the refill rate doing its job.
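The wait you are timing can also be computed directly: the missing tokens divided by the refill rate. A small sketch; `secondsUntilAllowed` is a hypothetical helper. A server could round the result up to whole seconds and send it in an HTTP `Retry-After` header alongside a 429 response.

```typescript
// Hypothetical helper: seconds until a request of `cost` tokens would be
// allowed, given the current (already refilled) token count and rate r.
function secondsUntilAllowed(
  tokens: number,
  refillPerSecond: number,
  cost = 1,
): number {
  const missing = cost - tokens;
  return missing <= 0 ? 0 : missing / refillPerSecond;
}

// Prototype right after 'Burst of 10': 0 tokens, refill 1 token/s.
console.log(secondsUntilAllowed(0, 1)); // 1
// Rounded up to whole seconds, e.g. for a Retry-After value.
console.log(Math.ceil(secondsUntilAllowed(0.25, 1))); // 1
```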
The 'burst then idle' trick
Send a burst, wait 10 seconds without clicking, then burst again. The bucket refilled fully during your idle period, so you get a fresh burst budget: about 20 requests go through in roughly 10 seconds, twice the refill rate over that span.
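The same experiment can be replayed with a simulated clock. This sketch uses a hypothetical `SimBucket` whose time is passed in, in seconds, rather than read from the wall clock, and counts how many of 20 requests get through with and without the 10-second pause.

```typescript
// Hypothetical token bucket driven by an explicit clock (seconds),
// so replays are deterministic.
class SimBucket {
  private tokens: number;
  private last = 0; // time of the previous refill, in seconds
  private readonly capacity: number;
  private readonly rate: number; // tokens per second

  constructor(capacity: number, rate: number) {
    this.capacity = capacity;
    this.rate = rate;
    this.tokens = capacity; // start full
  }

  tryConsumeAt(t: number, cost = 1): boolean {
    this.tokens = Math.min(this.capacity, this.tokens + (t - this.last) * this.rate);
    this.last = t;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Send `n` requests at time `t`; return how many were allowed.
function burst(bucket: SimBucket, t: number, n: number): number {
  let allowed = 0;
  for (let i = 0; i < n; i++) if (bucket.tryConsumeAt(t)) allowed++;
  return allowed;
}

// Prototype settings: capacity 10, refill 1 token/s.
const noPause = new SimBucket(10, 1);
console.log(burst(noPause, 0, 10) + burst(noPause, 0, 10)); // 10

const withPause = new SimBucket(10, 1);
console.log(burst(withPause, 0, 10) + burst(withPause, 10, 10)); // 20
```

Averaged over a full minute, those 20 requests are still well under the 60 tokens the faucet supplies; the bucket only permits them to arrive clustered.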
Predict the steady state
If the bucket starts full and you click 'Send request' once per second forever, what does the token count converge to? Try it for 30 seconds and confirm.
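To check your prediction without the prototype, this sketch replays one unit-cost request per second against a full bucket and records the token count right after each request. The clock is simulated and `tokensAfterEachClick` is a hypothetical helper; run it and compare with your answer.

```typescript
// Hypothetical helper: replays `clicks` unit-cost requests, one per second,
// starting from a full bucket, and returns the token count after each one.
function tokensAfterEachClick(
  capacity: number,
  refillPerSecond: number,
  clicks: number,
): number[] {
  let tokens = capacity;
  const history: number[] = [];
  for (let i = 0; i < clicks; i++) {
    // One second has passed since the previous click (none before the first).
    if (i > 0) tokens = Math.min(capacity, tokens + 1 * refillPerSecond);
    if (tokens >= 1) tokens -= 1; // allowed: pay one token
    history.push(tokens);
  }
  return history;
}

console.log(tokensAfterEachClick(10, 1, 30).join(" "));
```

Changing `refillPerSecond` to 0.5 shows the other regime: requests arrive faster than tokens, so the count drifts down until the bucket empties and some clicks start being rejected.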
In practice
When to use it — and what you give up
When to reach for it
- Public APIs where occasional bursts are user-friendly but sustained abuse must be capped.
- Per-user or per-API-key quotas at the edge — gateway, reverse proxy, or service mesh.
- Anywhere you want a simple two-number contract: "X requests/sec, with bursts up to Y".
Real-world example
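Stripe's engineering blog describes a token-bucket request limiter in front of its API, and its documented default limits have been on the order of 100 operations per second in live mode (25 in test mode). AWS API Gateway documents its throttling as a token bucket configured by a steady-state rate and a burst limit.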
Pros
- Allows short bursts — feels good for real user traffic that is bursty by nature.
- Two numbers per client (token count and last-refill time): trivial in memory and easy to keep in Redis or another shared counter store.
- Lazy O(1) refill with no background scheduler.
- Easy to tune: capacity = burst budget, refill = sustained rate.
Cons
- Brief over-rate windows are intentional — if downstream cannot handle a burst, this is the wrong tool.
- In distributed setups, contention on the shared counter requires careful design (Redis Lua scripts, sharding, or approximate per-node limits).
- Heterogeneous request costs need careful pricing or one expensive call can drain the bucket.
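One way to price heterogeneous requests is a per-endpoint cost table checked against the bucket capacity up front. This is a sketch: the routes, costs, and `validateCosts` helper are hypothetical. The capacity check reflects a property of the algorithm itself: a request whose cost exceeds B can never be admitted, because the bucket never holds more than B tokens.

```typescript
// Hypothetical per-endpoint prices, in tokens.
const COSTS: Record<string, number> = {
  "GET /items": 1,
  "POST /search": 5,
  "POST /export": 50,
};

// Hypothetical check: lists routes that are mispriced. A cost above
// `capacity` can never be paid, so such a route would always be refused.
function validateCosts(costs: Record<string, number>, capacity: number): string[] {
  const problems: string[] = [];
  for (const route of Object.keys(costs)) {
    const cost = costs[route];
    if (!(cost > 0)) problems.push(`${route}: cost must be positive`);
    else if (cost > capacity) problems.push(`${route}: cost ${cost} exceeds capacity ${capacity}`);
  }
  return problems;
}

// With a burst budget of 25, the export endpoint can never be served.
console.log(validateCosts(COSTS, 25));
```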
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
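```typescript
// A minimal token bucket. Lazy refill: no timer needed.
class TokenBucket {
  private tokens: number;     // current token count, may be fractional
  private lastRefill: number; // ms timestamp of the last refill

  constructor(
    private readonly capacity: number,        // max burst budget (B)
    private readonly refillPerSecond: number, // sustained rate (r)
  ) {
    if (!(capacity > 0) || !(refillPerSecond > 0)) {
      throw new RangeError("capacity and refillPerSecond must be positive");
    }
    this.tokens = capacity; // start full
    this.lastRefill = Date.now();
  }

  tryConsume(cost = 1): boolean {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }

  private refill(): void {
    const now = Date.now();
    // Date.now() is wall-clock time and can jump backwards; clamp the
    // elapsed time so a clock adjustment never removes tokens.
    const elapsed = Math.max(0, (now - this.lastRefill) / 1000);
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond,
    );
    this.lastRefill = now;
  }
}

// Usage: 100 req/s sustained, bursts up to 25
const bucket = new TokenBucket(25, 100);
if (bucket.tryConsume()) {
  // ... handle the request
} else {
  // ... return 429 Too Many Requests
}
```
References & further reading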
- Wikipedia, "Token bucket" (en.wikipedia.org). Clear canonical description of the algorithm and its bandwidth/burstiness semantics.
- Stripe Engineering, "Scaling your API with rate limiters" (stripe.com). Public engineering blog post walking through their token-bucket choices in production.
- RFC 2697, "A Single Rate Three Color Marker" (rfc-editor.org). Formal two-token-bucket meter (CIR/CBS/EBS) used to meter and mark network traffic.
- Cloudflare, "How we built rate limiting capable of scaling to millions of domains" (blog.cloudflare.com). Distributed limiter with Redis. Worth the read for the consistency trade-offs.
Knowledge check
Did the prototype land?
- Which parameter of a token bucket controls how big a sudden burst can be?
- Why is refill computed on each request instead of by a background timer?
- A token bucket has capacity 100 and refill rate 10/sec. What is the maximum number of requests that can be served in 1 minute, starting from a full bucket?