As developers and architects, we often use caching mechanisms at our API layer to reduce the strain we put on our backend services. Some backend service operations, however, cannot be cached by nature, and we need to pass user requests through to them. When these operations are considered “expensive,” we should apply protection mechanisms to avoid possible service degradation.
Typical “expensive” service operations that cannot be cached include:
- Bulk Processing
- Bulk Insert/Updates
- User-based Transactions involving integrations to supplier/partners
In an ideal world, the Service Owners of these services should perform performance baselining before releasing the service to production. Baselining reveals how well the service performs under standard load. The relevant performance metrics differ by service type, but for a bulk processing service, for example, a good candidate is “maximum response time.”
When communicating this “maximum response time” value to clients, the Service Owner’s aim is to set expectations on the customer side. The Service Owner also hopes that the customer will understand that this is an expensive operation and should not be called excessively. Unfortunately, this expectation is not always fulfilled.
Consider this scenario for a “Bulk Processing Service”: the Service Owner determined that the service can process each bulk request in a maximum of 30 seconds. This is written into a contract and attached to the service’s API specification.
The customer, who runs large transactions against the API, however, finds a throughput of 1 bulk request per 30 seconds too slow; they want 1 bulk request per 10 seconds. Without communicating with the service provider, they deploy 3 API clients and run them in parallel with the same credentials. This raises the overall throughput to the level they expect. The customer is happy, but the backend service is not.
The clients will always want more from the service, while the service provider has limited capacity. The service provider should therefore consider implementing extra measures to maintain an acceptable service level within its available capacity.
In this article, we will discuss how we can enforce certain service levels in our API layer to protect our backend services. Two mechanisms we will introduce are:
- Total API Hit Protection
- Parallel Call Execution Protection
Total API Hit Protection
In this approach, the API provider decides on a maximum API hit limit and starts tracking usage in the API’s protection layer. Each time there is a hit against the API, a counter is increased by 1. If the counter reaches the predefined limit, the API starts returning 5XX errors stating that the user has reached the API hit limit. At the top of the hour, the counter is reset to zero.
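As a rough sketch, such an hourly fixed-window counter could look like the following; the limit value, the dictionary-based store, and the function name are illustrative assumptions rather than any particular gateway’s API:

```python
import time
from collections import defaultdict

# Hypothetical hourly limit; a real deployment would read this per user or per service level.
MAX_HITS_PER_HOUR = 10_000

# In-memory counters keyed by (credential, hour window). A shared cache (e.g. Redis)
# would replace this dictionary when the protection layer runs on multiple nodes,
# and expired windows would be evicted rather than kept forever.
_hit_counters = defaultdict(int)

def check_total_hit_limit(credential: str) -> bool:
    """Return True if the request may proceed, False if the hourly limit is reached."""
    window = int(time.time() // 3600)   # window changes at the top of each hour (UTC)
    key = (credential, window)
    if _hit_counters[key] >= MAX_HITS_PER_HOUR:
        return False                    # caller should respond with the limit-reached error
    _hit_counters[key] += 1
    return True
```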
“Total API Hit Protection” shields the API from buggy clients and Denial of Service (DoS) attacks. A legitimate client would not usually send ten thousand bulk requests per hour unless its code contains a bug, and excessive call volumes are also typical of DoS/DDoS attacks.
Parallel Call Execution Protection
The Parallel Call Execution Protection mechanism tries to limit parallel API calls from the same user. As discussed earlier, legitimate API clients can use parallel execution to increase their throughput, but this puts extra strain on the “expensive” backend services. Illegitimate clients can also use this method in a DoS attack.
Parallel Call Execution Protection works by assigning a parameter called the “minimum number of seconds between API calls.”
If, for instance, this parameter is set to 5 seconds, the client has to wait 5 seconds before sending its next request for the same user. If the client calls the API without waiting 5 seconds, the API returns an HTTP 5XX error telling the client how many seconds it still has to wait (say, 3 seconds).
The Parallel Call Execution Protection mechanism works by keeping track of the last attempt’s timestamp against a user credential (username, security token, IP address, etc.).
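A minimal sketch of this timestamp-based check, assuming the credential has already been extracted from the request, might be:

```python
import time

MIN_SECONDS_BETWEEN_CALLS = 5   # illustrative value; would come from the user's service level

# Last accepted request timestamp per credential (username, security token, IP address, ...).
_last_call_at = {}

def check_min_interval(credential: str) -> float:
    """Return 0 if the call may proceed, otherwise the number of seconds left to wait."""
    now = time.time()
    last = _last_call_at.get(credential)
    if last is not None and now - last < MIN_SECONDS_BETWEEN_CALLS:
        return MIN_SECONDS_BETWEEN_CALLS - (now - last)   # e.g. "wait 3 more seconds"
    _last_call_at[credential] = now
    return 0.0
```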
Since these protection mechanisms will apply to all APIs, it would be wise to deploy them as a shared library or a shared cloud function that intercepts all requests and applies the protections. The shared library should also use in-memory caches for request tracking, as lookup delays add to the overall request processing time.
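Reusing the two helper functions sketched above, such a shared interception point could be as simple as a decorator applied to every API handler; the 503 payload format here is an assumption for illustration only:

```python
from functools import wraps

def protected(handler):
    """Hypothetical shared-library decorator that applies both checks before any API handler."""
    @wraps(handler)
    def wrapper(credential: str, *args, **kwargs):
        if not check_total_hit_limit(credential):
            return {"status": 503, "error": "hourly API hit limit reached"}
        wait = check_min_interval(credential)
        if wait > 0:
            return {"status": 503, "error": f"retry in {wait:.0f} seconds"}
        return handler(credential, *args, **kwargs)
    return wrapper
```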
Service Level Management at the API Layer
Different API users may require different service levels. For example, a system user representing an internal service can be given a higher total API hit limit than an external user. The API Protection Layer, therefore, should be able to apply different service levels per user credential. This is achieved by implementing Service Level Management (SLM) practices at the API Protection Layer.
A service level (SL) is composed of Service Level Indicators (SLIs), which correspond to the protection limits we discussed earlier. For example, a “Gold” service level can include the following SLIs:
- Max API Hit: 1 million/hour
- Min Wait Period: 3 seconds
While a “Silver” service level can include the following:
- Max API Hit: 1 thousand/hour
- Min Wait Period: 5 seconds
The service level basically tells the service user under which circumstances the service is expected to perform successfully. At the Silver level, for example, the service is expected to perform well for up to 1,000 calls per hour.
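These levels could be captured as a simple configuration map that the protection layer reads at request time. The names and figures below mirror the Gold/Silver examples; the structure itself is just one possible representation:

```python
# Service level catalogue: each level maps to the SLI/protection limits it guarantees.
SERVICE_LEVELS = {
    "Gold":   {"max_hits_per_hour": 1_000_000, "min_wait_seconds": 3},
    "Silver": {"max_hits_per_hour": 1_000,     "min_wait_seconds": 5},
}
```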
When we provide access to a new web service client, we usually provide them with the API specification (in YAML, HTML, Word, or another format). If we decide to apply SLM to our API layer, we also need to assign an initial service level to this client (the service level can be re-negotiated at any phase of the service lifecycle). Service levels can also be assigned to organizations, which are in turn propagated to the user level.
| Customer | Username | Service Level |
|----------|--------------|---------------|
| ABC | murat.balkan | Gold |
| XYZ | tcbcan | Silver |
After the service levels are set, the API Protection Layer needs to look up each user to find their assigned service level. The SLI parameters attached to that service level are then applied to the API requests.
The protection implementation should also define a “Default” service level to be used as a fallback when the lookup returns no record for a given user. In a successfully executed SLM implementation, each service instance (in this case, the API Access Service) should be assigned a service level and tracked under an SLA instance.
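Building on the service level catalogue sketched earlier, the per-user lookup with a “Default” fallback could look like the following; the user-to-level mapping mirrors the example table, and the Default figures are assumptions:

```python
# Fallback level for users with no assigned service level (figures are assumptions).
SERVICE_LEVELS["Default"] = {"max_hits_per_hour": 100, "min_wait_seconds": 10}

# Username-to-service-level assignments, mirroring the example table above.
USER_SERVICE_LEVELS = {
    "murat.balkan": "Gold",
    "tcbcan": "Silver",
}

def resolve_service_level(username: str) -> dict:
    """Return the SLI parameters for a user, falling back to the Default level."""
    level_name = USER_SERVICE_LEVELS.get(username, "Default")
    return SERVICE_LEVELS[level_name]
```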
The two methods explained above can be used to throttle client usage, especially against the “expensive” API operations mentioned earlier. These mechanisms can also be applied to non-expensive APIs, but certain design principles need to be followed. For example, if a UI layer reuses the same credentials to access a backend API on behalf of many users, parallel calls are inevitable: real user behavior follows random patterns, which may produce nearly simultaneous requests. This can lead to legitimate requests being blocked as false positives, which we should avoid by not sharing the same credentials across the UI layer.