The term "Microservice Architecture" has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics: organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.
Services were present in the monolith world too, but they differ in where the business logic processing happens. Since each microservice looks only at its own business domain, a lot of responsibility is offloaded to the clients.
Let us consider a simple search page which shows items matching your search term in a 4 x 5 grid. The search results page (client) may actually end up talking to 3 - 4 different microservices:
- search service : to get the actual search results
- product service : to get the latest product description and assets
- price service : to get the latest prices
- stock service : to get the last minute stock details
These calls have to be composed carefully in order to get any meaningful performance. Imperative programming styles for call composition don't scale. The problems are further compounded by the fact that at any given moment there could be multiple clients talking to these microservices.
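As a rough illustration of non-imperative call composition, the sketch below issues the per-item lookups concurrently with Python's asyncio instead of one after another. The four service functions are hypothetical stand-ins (the sleeps simulate network latency), not real client libraries.

```python
import asyncio

# Hypothetical stand-ins for the four remote services.
# Each sleep simulates the latency of a network call.
async def search_service(term):
    await asyncio.sleep(0.05)
    return ["sku-1", "sku-2"]

async def product_service(sku):
    await asyncio.sleep(0.05)
    return {"sku": sku, "title": f"Product {sku}"}

async def price_service(sku):
    await asyncio.sleep(0.05)
    return 9.99

async def stock_service(sku):
    await asyncio.sleep(0.05)
    return True

async def enrich(sku):
    # The three lookups for one item are independent, so run them concurrently.
    product, price, in_stock = await asyncio.gather(
        product_service(sku), price_service(sku), stock_service(sku)
    )
    return {**product, "price": price, "in_stock": in_stock}

async def search_page(term):
    # The search call must finish first; every item is then enriched in parallel.
    skus = await search_service(term)
    return await asyncio.gather(*(enrich(sku) for sku in skus))

if __name__ == "__main__":
    for item in asyncio.run(search_page("lamp")):
        print(item)
```

With sequential calls the latency would grow with the number of items on the grid; composed this way it stays close to two round trips, one for the search and one for the slowest enrichment call.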
It is thus our assertion that while building microservices is fun, building clients which remain resilient, performant and meaningful over extended periods of time has become much more challenging.
In this blog post we will talk about the challenges which such microservice clients face and how the problem has been solved.
Communication mechanism for microservice clients: API Gateway Pattern
Let’s assume we are building the product detail page of an e-commerce application for two types of client, namely:
- desktop based web browser (HTML5) and
- native mobile app
In addition, the application must expose product details via a REST API for use by 3rd party applications.
A product details UI can display a lot of information about a product. For example:
- Basic information about the book such as title, author, price, etc.
- Your purchase history for the book
- Availability
- Buying options
- Other items that are frequently bought with this book
- Other items bought by customers who bought this book
- Customer reviews
- Sellers ranking
Since the application uses the Microservices pattern, the product details data is spread over multiple services. For example:
- Catalog service - basic information about the product such as title, author
- Order service - purchase history for the product
- Recommendation service - other items suggested alongside the product
- Inventory service - product availability
- Review service - customer reviews
- Customer service - help chat …
Consequently, the code that displays the product details needs to fetch information from all of these services. So the client ends up calling the services as shown in the figure below:
Problem:
How do the clients of a Microservices-based application access the individual services?
- The granularity of APIs provided by microservices is often different from what a client needs. Microservices typically provide fine-grained APIs, which means that clients need to interact with multiple services. For example, as described above, a client needing the details for a product has to fetch data from numerous services.
- Different clients need different data. For example, the desktop browser version of a product details page is typically more elaborate than the mobile version.
- Network performance is different for different types of clients. For example, a mobile network is typically much slower and has much higher latency than a non-mobile network. And, of course, any WAN is much slower than a LAN. This means that a native mobile client uses a network that has very different performance characteristics from the LAN used by a server-side web application. The server-side web application can make multiple requests to backend services without impacting the user experience, whereas a mobile client can only make a few.
- The number of service instances and their locations (host+port) changes dynamically.
- Partitioning into services can change over time and should be hidden from clients.
Solution:
Rather than provide a one-size-fits-all style API, a much better approach is for clients to make a small number of requests per page, perhaps as few as one, over the Internet to a front-end server known as an API gateway, which is shown in the figure below. For example, the Netflix API gateway runs client-specific adapter code that provides each client with an API that is best suited to its requirements.
The API gateway might also implement security, e.g. verify that the client is authorized to perform the request.
The API gateway sits between the application’s clients and the microservices. It provides APIs that are tailored to the client. The API gateway provides a coarse-grained API to mobile clients and a finer-grained API to desktop clients that use a high-performance network. In this example, the desktop clients make multiple requests to retrieve information about a product, whereas a mobile client makes a single request.
The API gateway handles incoming requests by making requests to some number of microservices over the high-performance LAN. Netflix, for example, describes how each request fans out to on average six backend services. In this example, fine-grained requests from a desktop client are simply proxied to the corresponding service, whereas each coarse-grained request from a mobile client is handled by aggregating the results of calling multiple services.
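The two request styles can be sketched as follows. This is a minimal in-process illustration, assuming hypothetical service functions and a hypothetical `ApiGateway` class; a real gateway would make network calls over the LAN rather than call local functions.

```python
# Hypothetical in-process stand-ins for backend microservices.
def catalog_service(product_id):
    return {"title": "Example Book", "author": "A. Writer"}

def inventory_service(product_id):
    return {"available": True}

def review_service(product_id):
    return {"reviews": ["Great read"]}

SERVICES = {
    "catalog": catalog_service,
    "inventory": inventory_service,
    "reviews": review_service,
}

class ApiGateway:
    def proxy(self, service_name, product_id):
        """Fine-grained API (desktop): forward the request to a single service."""
        return SERVICES[service_name](product_id)

    def product_details(self, product_id):
        """Coarse-grained API (mobile): one request, aggregated from several services."""
        details = {"id": product_id}
        for name, service in SERVICES.items():
            details[name] = service(product_id)
        return details

if __name__ == "__main__":
    gateway = ApiGateway()
    print(gateway.proxy("inventory", 42))   # desktop: one of several requests
    print(gateway.product_details(42))      # mobile: a single request
```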
Not only does the API gateway optimize communication between clients and the application, but it also encapsulates the details of the microservices. This enables the microservices to evolve without impacting the clients. For example, two microservices might be merged. Another microservice might be partitioned into two or more services. Only the API gateway needs to be updated to reflect these changes. The clients are unaffected.
Handling Partial Failure: Circuit Breaker
One issue we have to address when implementing an API Gateway is the problem of partial failure. This issue arises in all distributed systems whenever one service calls another service that is either responding slowly or is unavailable. The API Gateway should never block indefinitely waiting for a downstream service. However, how it handles the failure depends on the specific scenario and which service is failing.
If composite services have been developed (services merged in the API gateway) which depend on other core services, the failure of any core service will have an impact on the composite service. In general we call this type of problem a chain of failures, where an error in one component can cause errors in other components that depend on the failing component. This needs special attention in a microservice-based system landscape, where a potentially large number of separately deployed microservices communicate with each other.
Solution:
For example, if the recommendation service is unresponsive in the product details scenario, the API Gateway should return the rest of the product details to the client since they are still useful to the user. The recommendations could either be empty or replaced by, for example, a hardwired top ten list. If, however, the product information service is unresponsive, then the API Gateway should return an error to the client.
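A minimal sketch of this policy, assuming hypothetical `catalog_service` and `recommendation_service` functions and an illustrative `TOP_TEN` default: both calls get a timeout, a failure of the essential call propagates as an error, and a failure of the optional call is replaced by the default list.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TOP_TEN = ["bestseller-1", "bestseller-2"]  # hardwired default list (illustrative)

# Hypothetical stand-ins for the remote services.
def catalog_service(product_id):
    return {"title": "Example Book"}

def recommendation_service(product_id):
    time.sleep(0.6)  # simulate an unresponsive service
    return ["rec-1"]

def product_details(product_id, timeout=0.2):
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        catalog_future = pool.submit(catalog_service, product_id)
        recs_future = pool.submit(recommendation_service, product_id)
        # Essential data: a timeout or error here propagates to the caller.
        details = dict(catalog_future.result(timeout=timeout))
        # Optional data: degrade to the default list instead of failing.
        try:
            details["recommendations"] = recs_future.result(timeout=timeout)
        except Exception:
            details["recommendations"] = TOP_TEN
        return details
    finally:
        # Do not wait for the slow call; the gateway must never block on it.
        pool.shutdown(wait=False)

if __name__ == "__main__":
    print(product_details(42))
```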
The API Gateway could also return cached data if that was available. For example, since product prices change infrequently, the API Gateway could return cached pricing data if the pricing service is unavailable. The data can be cached by the API Gateway itself or be stored in an external cache such as Redis or Memcached. By returning either default data or cached data, the API Gateway limits the impact of system failures on the user experience.
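The cached-data fallback can be sketched as below. `PriceClient` and its `fetch` argument are hypothetical names, and the cache is a plain in-memory dictionary standing in for the gateway's own cache or an external one such as Redis or Memcached.

```python
import time

class PriceClient:
    """Wraps a price lookup and remembers the last good value per product."""

    def __init__(self, fetch, max_age_seconds=3600.0):
        self._fetch = fetch              # callable that talks to the pricing service
        self._max_age = max_age_seconds  # how stale a cached price may be
        self._cache = {}                 # product_id -> (price, stored_at)

    def get_price(self, product_id):
        try:
            price = self._fetch(product_id)
        except Exception:
            cached = self._cache.get(product_id)
            if cached and time.monotonic() - cached[1] <= self._max_age:
                return cached[0]  # pricing service is down: serve the cached price
            raise  # nothing usable in the cache: surface the failure
        self._cache[product_id] = (price, time.monotonic())
        return price
```

The maximum age is a design choice: prices that change infrequently tolerate a long window, while data such as stock levels would need a much shorter one.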
Netflix Hystrix is an incredibly useful library for writing code that invokes remote services. Hystrix times out calls that exceed the specified threshold. It implements a circuit breaker pattern, which stops the client from waiting needlessly for an unresponsive service. A circuit breaker typically applies state transitions like the following:
If the error rate for a service exceeds a specified threshold, Hystrix trips the circuit breaker and all requests will fail immediately for a specified period of time. Hystrix lets you define a fallback action when a request fails, such as reading from a cache or returning a default value. If you are using the JVM you should definitely consider using Hystrix. And, if you are running in a non-JVM environment, you should use an equivalent library.
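To make the state transitions concrete, here is a minimal circuit breaker sketch in Python. It is not the Hystrix API: the `CircuitBreaker` class and its parameters are hypothetical, and for brevity it trips on a count of consecutive failures rather than on an error rate.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # one trial call is allowed through
        return "open"

    def call(self, func, *args, fallback=None):
        if self.state == "open":
            # Fail fast: do not touch the remote service at all.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = func(*args)
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip, or re-trip after a failed trial
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote invocation, for example `breaker.call(fetch_recommendations, product_id, fallback=lambda: [])`, where `fetch_recommendations` is whatever function performs the remote call.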
Handling Scale-Out: Service Discovery
While using an API Gateway is better than having the clients talk directly to the services, we can do a little better. Here is another problem to consider: what happens when we scale a service in a microservice-based architecture? For example, consider the following diagram, where we have scaled the catalog service but the API Gateway has no idea about the new service instance. How can it take advantage of the additional instance?
We would have to change the API Gateway so it knows about the second instance. With features like auto-scaling provided by the cloud platform we are deployed to, and the fact that we might have 100s of services, modifying the API Gateway by hand would just not scale.
Solution:
To solve this problem, microservice applications typically use a Service Discovery application which allows all microservices to register themselves and then broadcast their existence to other services in the application.
With our Service Discovery application now deployed, each service instance registers itself with the Service Discovery application and in turn also queries the Service Discovery application for what other services are available to it. This solves the scaling problem with our API Gateway. As we bring up more instances of a given service, they will all register themselves with the Service Discovery application, and the API Gateway will periodically query the Service Discovery application to make sure it has an updated list of services. Once the new instances have been registered with the Service Discovery application, the API Gateway will get them and then be able to leverage those new instances when making requests to the service. All this can be done dynamically without changing any code in the API Gateway, which is exactly what we want.
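The register-and-query cycle can be sketched as below. `ServiceRegistry` and `Gateway` are hypothetical classes kept in one process for illustration; a real Service Discovery application would be a separate networked service, and the gateway would call `refresh` on a timer rather than by hand.

```python
import itertools

class ServiceRegistry:
    """In-memory stand-in for a Service Discovery application."""

    def __init__(self):
        self._instances = {}  # service name -> list of "host:port" strings

    def register(self, name, address):
        instances = self._instances.setdefault(name, [])
        if address not in instances:
            instances.append(address)

    def deregister(self, name, address):
        if address in self._instances.get(name, []):
            self._instances[name].remove(address)

    def lookup(self, name):
        return list(self._instances.get(name, []))

class Gateway:
    def __init__(self, registry):
        self._registry = registry
        self._known = {}                   # local copy of the instance lists
        self._counter = itertools.count()  # drives round-robin selection

    def refresh(self, name):
        """Pull the current instance list; done periodically in a real system."""
        self._known[name] = self._registry.lookup(name)

    def pick_instance(self, name):
        instances = self._known.get(name, [])
        if not instances:
            raise LookupError(f"no known instances of {name}")
        return instances[next(self._counter) % len(instances)]

if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("catalog", "10.0.0.1:8080")
    gateway = Gateway(registry)
    gateway.refresh("catalog")
    registry.register("catalog", "10.0.0.2:8080")  # scale out: second instance
    gateway.refresh("catalog")                      # gateway learns about it, no code change
    print([gateway.pick_instance("catalog") for _ in range(4)])
```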
There are additional benefits to using a Service Discovery application as well. Consider the case where we have the same microservices deployed across multiple datacenters. Each datacenter has a deployment that looks like the architecture above. The Service Discovery applications in each datacenter can share data with each other about the services running in their datacenter. This would allow the application to be unaffected by a complete outage of a given service in one datacenter, provided the service is still available in another datacenter. This type of functionality gives our application even more resiliency to failures.
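One way to picture the cross-datacenter case is a lookup that prefers local instances and falls back to another datacenter when none are registered locally. The `resolve` function and the shape of `registries` below are assumptions made for illustration only.

```python
def resolve(service, local_dc, registries):
    """Return (datacenter, addresses) for a service, preferring the local datacenter.

    registries maps a datacenter name to {service name: [addresses]},
    i.e. the data the Service Discovery applications share with each other.
    """
    local = registries.get(local_dc, {}).get(service, [])
    if local:
        return local_dc, list(local)
    # Local outage: look for the same service in any other datacenter.
    for dc, services in registries.items():
        if dc != local_dc and services.get(service):
            return dc, list(services[service])
    raise LookupError(f"{service} is not available in any datacenter")

if __name__ == "__main__":
    registries = {
        "east": {"catalog": [], "review": ["10.0.0.5:9000"]},
        "west": {"catalog": ["10.1.0.1:8080"]},
    }
    print(resolve("catalog", "east", registries))  # falls back to the west datacenter
    print(resolve("review", "east", registries))   # served locally
```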