In this first X-Files post, I’ll cover the simple yet powerful golang.org/x/time/rate. This package provides a
Limiter that controls throughput to an arbitrary resource. There’s a ton of utility in
rate, and to show its virtues, I will use it to enforce an SLA on an HTTP service.
SLAs and Rate Limiting
When creating a new service, it is prudent to specify a service level agreement, or SLA. This promises an expected availability given some throughput from consumers. The benefits of an SLA are twofold. First, it provides a metric by which the success of a service is measurable. Is it exceeding expectations, or are changes necessary to achieve the SLA? Second, it is enforceable as a means to protect upstream dependencies (like a database) from a cascading failure. One of the simplest ways to impose such an SLA is rate limiting incoming requests.
The “Hello, World!” Microservice
The nascent Hello, World! microservice exposes one HTTP endpoint that makes a call to an upstream service and writes (ahem) “Hello, World!” to the response. The
http.HandlerFunc might look something like this:
The upstream dependency has some interesting characteristics. First, it takes an average of about 25ms to complete
upstream.Call(). Next, it concurrently handles up to 10 calls before queueing them. Finally, if a call takes longer than 500ms to complete, the method returns a timeout error.
To protect itself from drowning in traffic, the service should conform to an SLA. As an initial step, performing load tests will help derive the right thresholds.
Woof… At approximately 415rps, the upstream service is unable to handle the volume and begins queueing. As the RPS increases, enqueued calls must wait longer and longer to complete. When the service reaches about 450rps, the upstream calls start exceeding the 500ms threshold and failing at a logarithmic rate as the throughput continues to grow. Those requests lucky enough to make it through have a median latency (P50) of ~275ms and a 95th percentile (P95) just under 500ms.
This behavior is not ideal and usually indicative of a critical failure upstream. But now there is enough information to make an SLA! While the service may be showing signs of degradation at 425rps, the success rate holds steady at 100%. Likewise, the P95 latency at this rate is ~165ms. For this example, it is safe to assume consumers will tolerate up to a P95 of 275ms. Given these limits, a target SLA can be defined:
The Hello, World! service guarantees at least 99.99% availability up to 425rps. The P95 for all requests will not exceed 275ms.
Trickle Through with time.Ticker
With SLA in hand, preventing a flood against the upstream service is the first order of business. The service remains stable at 425rps and should limit throughput heading upstream. To throttle the
http.HandlerFunc, the channel exposed by a
time.Ticker acts as a semaphore:
Why not use the time.Ticker channel outright? The channel returned from the ticker is unbuffered and would enforce serial access to the endpoint. This is not the desired behavior for any web service, especially when the upstream service handles up to 10 concurrent requests. So, a buffered channel fronts the ticker, prefilled to the specified burst. A
cancel function is returned in the event the limiter is no longer used. If
cancel is not called, the
time.Ticker, buffered channel, and goroutine would all end up leaking! Since this handler is going to live on for the life of the process, though, it can be ignored.
The RateLimit function itself decorates any
http.HandlerFunc with the limiter logic. This is a common middleware pattern in Go. Applying this to the existing handler and registering it with the mux:
Now that everything’s all squared away, running the tests again reveals an improved availability:
Whelp, since only a safe number of requests makes it through to the upstream service at a time, the service achieves 100% availability through 600rps. This satisfies the 4-nines part of the SLA. Yet, starting at 425rps, latencies grow linearly as requests pile up behind the limiter, exceeding two seconds at 600rps.
Not only does this violate the 275ms P95 stipulation, but it also has the potential to wreak havoc on the service itself. If the service is memory-bound or has limited resources (e.g., file descriptors or threads), it would soon refuse new connections or die. In fact, for reproducibility, the test harness boosts the file descriptor limit from the default 1,024 (on macOS) to a million! Regardless, as long as the SLA holds at 425rps, beyond that rate the limiter should drop any request that would exceed the deadline.
To cancel the read on the limiter, a
select statement can short circuit the request with a timeout:
The middleware’s signature now adds a
wait parameter. This specifies how long the
time.Timer should run before terminating an enqueued request. If the limiter lets the request through, the middleware calls
t.Stop(). While this isn’t required, it frees up the timer’s resources immediately.
Using the same
burst as before, the handler now needs a deadline. Given that the P95 at 425rps is ~165ms, a timeout of 75ms should bring it up to about 240ms, safely below the 275ms rule. Running load tests again, this resolves the unbounded latencies:
Sweet! The success rate remains at 100% through 435rps before falling, and the P95 latencies stay below 275ms. All the failures take exactly 75ms, indicating the rate limiter's timeout is effective.
Still, 75ms is an eternity for a computer to be waiting, and a traffic spike could yet overwhelm the process. If the limiter knew its queue depth, it could determine whether a request will ever make it through. If there's no chance, the limiter should cancel the request without waiting for the deadline.
Smarter queueing with the rate.Limiter token bucket
Alright, hold up. Before implementing this strategy on top of the limiter, there are a couple of things to note. First, the Go wiki suggests avoiding
time.Ticker if the rate exceeds tens of ticks per second. This is likely a fidelity issue with the clock, the performance of channels, or both. At 425rps, the limiter exceeds that heuristic by an order of magnitude, so a token bucket solution is preferable. Second, the rate package already implements this strategy with Reservations: promised access to a resource after some time, like getting a table at a restaurant. Since a Reservation encodes the time at which it can be fulfilled, the limiter can jettison doomed requests from the start.
For the purposes of this service,
rate.Limiter#Wait achieves this desired behavior:
This version behaves the same as the previous one, trading channels and selects for
context.Context and error handling. Running the tests, though, shows the benefit of this package:
While the latencies of these graphs match the previous almost exactly, the failures are now sub-millisecond! This indicates requests doomed to fail are terminated early, preventing a potential pileup on the service. It would be prudent, of course, to provide a retry policy and return backoff information to consumers.
The service now conforms to its SLA, protecting itself, its dependencies, and its consumers by shedding excess load. Though a contrived example, this strategy is useful in determining boundaries for a real-world service. Adjusting the parameters driving the limiter (its limit, burst, and wait) will eke out as much availability and performance as desired.
More Features and Uses of rate
The rate package is not limited to gating HTTP endpoints. Here are some other interesting uses to consider:
Fine-Grained Limitations. The example laid out above treats all requests as equal. But the middleware can be set up to limit by arbitrary properties on the request, such as header values. The tollbooth package abstracts a lot of this behavior.
Runtime Configurable. One powerful feature not covered here is that a rate.Limiter can be adjusted on the fly to use a different fill rate or burst. Controlled with a distributed feature-flag tool like dcdr, these values can be modified without a deployment by calling SetLimit. This is a powerful way of scaling features or responding to environmental changes in the system.
vegeta is a powerful tool for load testing HTTP services, but it only handles HTTP endpoints. For APIs that expose other protocols, like raw TCP or UDP, a rate.Limiter can help simulate a sustained load. statstee uses one to simulate StatsD traffic.
Migrations Throttling. As systems refactor and grow, data migrations become a necessity. While the changes resulting from a migration may be simple, the volume can be dangerous. By using a rate limiter in a migration pipeline, throttling becomes a cinch.
Traffic Shaping. Fronting an io.Reader or io.Writer with a rate.Limiter can shape the flow of bytes to a particular bandwidth. This could simulate network conditions (like a 3G connection) when applied as an HTTP middleware.
What about service clusters?
A distributed system here would complicate things, but the mechanisms would remain similar. For instance, in a typical microservice architecture, multiple instances of the same service execute in a cluster with a load balancer spreading the traffic between them. While the strategies above work for a single node, it is not ideal for each to manage its own rates. Using a shared rate limiter avoids this concern, and it just so happens that Lyft recently open-sourced Ratelimit, a very powerful generic rate-limiting service.
I’ve only demonstrated a small number of use cases for
rate.Limiter. I’m curious to see how others have used this spartan but powerful package. Feel free to comment with some novel examples that I haven’t covered here!