Sunday, 6 December 2015

Challenges Faced by Microservice Clients and Their Resolution

The term "Microservice Architecture" has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics: organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.

Services existed in the monolith world too; what differs is where the business logic processing happens. Since each microservice looks only at its own business domain, a lot of responsibility is offloaded to the clients.

Let us consider a simple search page which shows items matching your search term in a 4 x 5 grid. The search results page (client) may actually end up talking to 3 - 4 different microservices:

 - search service : to get the actual search results
 - product service : to get the latest product description and assets
 - price service : to get the latest prices
 - stock service : to get the last minute stock details

These calls have to be composed efficiently, for example by issuing them concurrently, in order to get any meaningful performance; imperative programming styles for call composition don't scale. The problems are further compounded by the fact that at any given moment in time there could be multiple clients talking to these microservices.
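A minimal Java sketch of such a composition, assuming Java 9 or later and the JDK's CompletableFuture. The search call runs first because the other services need its item ids; after that, the product, price and stock lookups for each item are started concurrently. The class names and the stubbed service methods (searchIds, productInfo, price, stock) are hypothetical stand-ins for remote calls.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the microservices; real ones would be remote HTTP calls.
class SearchPageServices {
    static List<String> searchIds(String term) { return List.of(term + "-1", term + "-2"); }
    static String productInfo(String id) { return "Product " + id; }
    static double price(String id) { return 9.99; }
    static int stock(String id) { return 3; }
}

class SearchPageComposer {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    List<String> render(String term) {
        // The search must finish first: the other services need its item ids.
        List<String> ids = SearchPageServices.searchIds(term);
        // For every item, start the product, price and stock lookups at the same time.
        List<CompletableFuture<String>> cells = ids.stream().map(id -> {
            CompletableFuture<String> info =
                CompletableFuture.supplyAsync(() -> SearchPageServices.productInfo(id), pool);
            CompletableFuture<Double> price =
                CompletableFuture.supplyAsync(() -> SearchPageServices.price(id), pool);
            CompletableFuture<Integer> stock =
                CompletableFuture.supplyAsync(() -> SearchPageServices.stock(id), pool);
            // Merge the three independent results once all of them are available.
            return info.thenCombine(price, (name, p) -> name + " @ " + p)
                       .thenCombine(stock, (cell, q) -> cell + " (" + q + " in stock)");
        }).collect(Collectors.toList());
        // Wait for all cells; the per-item lookups overlap instead of adding up.
        return cells.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    void shutdown() { pool.shutdown(); }
}
```

With this shape, the time spent on the per-item lookups is roughly that of the slowest lookup rather than the sum of all of them.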

It is thus our assertion that while building microservices is fun, building clients which remain resilient, performant and meaningful over extended periods of time has become much more challenging.

In this blog post we will talk about the challenges which such microservice clients face and how these problems have been solved.

Communication mechanism for microservice clients: API Gateway Pattern

Let’s assume we are building the product details page of an e-commerce application for two types of client, namely:
 - desktop based web browser (HTML5) and
 - native mobile app

In addition, the application must expose product details via a REST API for use by 3rd party applications.
A product details UI can display a lot of information about a product. For example:
 - Basic information about the book such as title, author, price, etc.
 - Your purchase history for the book
 - Availability
 - Buying options
 - Other items that are frequently bought with this book
 - Other items bought by customers who bought this book
 - Customer reviews
 - Sellers ranking

Since the application uses the Microservices pattern, the product details data is spread over multiple services. For example:

 - Catalog service - basic information about the product such as title, author
 - Order service - purchase history for the product
 - Recommendation service - recommendations for the product
 - Inventory service - product availability
 - Review service - customer reviews
 - Customer service - help chat …

Consequently, the code that displays the product details needs to fetch information from all of these services. So the client ends up calling the services as shown in the figure below:

Problem:

How do the clients of a Microservices-based application access the individual services?

The granularity of APIs provided by microservices is often different from what a client needs. Microservices typically provide fine-grained APIs, which means that clients need to interact with multiple services. For example, as described above, a client needing the details for a product needs to fetch data from numerous services.
 - Different clients need different data. For example, the desktop browser version of a product details page is typically more elaborate than the mobile version.
 - Network performance is different for different types of clients. For example, a mobile network is typically much slower and has much higher latency than a non-mobile network. And, of course, any WAN is much slower than a LAN. This means that a native mobile client uses a network with very different performance characteristics from the LAN used by a server-side web application. The server-side web application can make multiple requests to backend services without impacting the user experience, whereas a mobile client can only make a few.
 - The number of service instances and their locations (host+port) change dynamically.
 - Partitioning into services can change over time and should be hidden from clients.


Solution:

Rather than providing a one-size-fits-all API, a much better approach is for clients to make a small number of requests per page, perhaps as few as one, over the Internet to a front-end server known as an API gateway, which is shown in the figure below. For example, the Netflix API gateway runs client-specific adapter code that provides each client with an API that's best suited to its requirements.
 

The API gateway might also implement security, e.g. verify that the client is authorized to perform the request.

The API gateway sits between the application’s clients and the microservices. It provides APIs that are tailored to the client. The API gateway provides a coarse-grained API to mobile clients and a finer-grained API to desktop clients that use a high-performance network. In this example, the desktop client makes multiple requests to retrieve information about a product, whereas a mobile client makes a single request.

The API gateway handles incoming requests by making requests to some number of microservices over the high-performance LAN. Netflix, for example, describes how each request fans out to on average six backend services. In this example, fine-grained requests from a desktop client are simply proxied to the corresponding service, whereas each coarse-grained request from a mobile client is handled by aggregating the results of calling multiple services.
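The difference between proxying and aggregating can be sketched in a few lines of Java. This is a hypothetical toy, not Netflix's gateway: ToyApiGateway, its "/service/id" path format and the function-based services are assumptions, and the aggregation calls the services one after another for brevity where a real gateway would issue them concurrently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// A toy API gateway. Backend services are modeled as functions from a product id
// to a response body; in practice they would be HTTP calls over the internal LAN.
class ToyApiGateway {
    private final Map<String, Function<String, String>> services = new LinkedHashMap<>();

    void register(String name, Function<String, String> service) { services.put(name, service); }

    // Desktop clients call fine-grained endpoints such as "/catalog/42";
    // the gateway simply proxies the request to the matching service.
    String proxy(String path) {
        String[] parts = path.split("/");          // ["", serviceName, id]
        Function<String, String> service = services.get(parts[1]);
        if (service == null) throw new IllegalArgumentException("No route for " + path);
        return service.apply(parts[2]);
    }

    // Mobile clients make one coarse-grained request; the gateway calls every
    // registered service and aggregates the results into a single response.
    Map<String, String> productDetails(String productId) {
        Map<String, String> aggregate = new LinkedHashMap<>();
        services.forEach((name, service) -> aggregate.put(name, service.apply(productId)));
        return aggregate;
    }
}
```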

Not only does the API gateway optimize communication between clients and the application, but it also encapsulates the details of the microservices. This enables the microservices to evolve without impacting the clients. For example, two microservices might be merged, or another microservice might be partitioned into two or more services. Only the API gateway needs to be updated to reflect these changes; the clients are unaffected.
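One simple way to picture this encapsulation, assuming a hypothetical route table inside the gateway: clients address public paths, and the gateway maps each path prefix to an internal service name. Merging or splitting services then means editing this table only. GatewayRoutes and the service names are illustrative, not part of any real gateway product.

```java
import java.util.HashMap;
import java.util.Map;

// The gateway's view of where each public API resource lives. Clients only ever
// see the public paths; the backend service names are an internal detail.
class GatewayRoutes {
    private final Map<String, String> routes = new HashMap<>();

    void route(String publicPrefix, String backendService) { routes.put(publicPrefix, backendService); }

    String resolve(String publicPath) {
        // Pick the longest public prefix that matches (segment boundaries ignored for brevity).
        String best = null;
        for (String prefix : routes.keySet()) {
            if (publicPath.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) throw new IllegalArgumentException("No route for " + publicPath);
        return routes.get(best);
    }
}
```

Because the longest matching prefix wins, splitting a service stays local to the table too: adding a route for "/api/products/prices" to a separate price service leaves the other product paths untouched.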

Handling Partial Failure: Circuit Breaker

One issue we have to address when implementing an API Gateway is the problem of partial failure. This issue arises in all distributed systems whenever one service calls another service that is either responding slowly or is unavailable. The API Gateway should never block indefinitely waiting for a downstream service. However, how it handles the failure depends on the specific scenario and which service is failing.
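Hystrix enforces such a limit with its own timeouts. As a library-neutral sketch, the JDK's CompletableFuture.completeOnTimeout (Java 9+) can bound how long the gateway waits and substitute a fallback value. BoundedCall is a hypothetical helper name, and this is not Hystrix's API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class BoundedCall {
    // Runs the downstream call asynchronously and waits at most timeoutMillis for it.
    // A timeout or an exception both yield the fallback value. The slow call itself
    // is not cancelled; only the caller stops waiting for it.
    static <T> T callWithTimeout(Supplier<T> downstream, long timeoutMillis, T fallback) {
        return CompletableFuture.supplyAsync(downstream)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS) // Java 9+
                .exceptionally(error -> fallback)
                .join();
    }
}
```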

If composite services (services merged in the API gateway) depend on other core services, the failure of any of those core services will have an impact on the composite service. In general we call this type of problem a chain of failures, where an error in one component can cause errors to occur in other components that depend on the failing component. This needs special attention in a microservice-based system landscape, where a potentially large number of separately deployed microservices communicate with each other.

Solution:

For example, if the recommendation service is unresponsive in the product details scenario, the API Gateway should return the rest of the product details to the client since they are still useful to the user. The recommendations could either be empty or replaced by, for example, a hardwired top ten list. If, however, the product information service is unresponsive, then the API Gateway should return an error to the client.
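The decision described here, where optional data degrades gracefully while missing essential data fails the whole request, might be coded like this. It is only a sketch: ProductDetailsComposer, the supplier-based service calls and the two-item stand-in for the "top ten" list are hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical product-details composition in the gateway: catalog data is
// essential, while recommendations are optional and may be replaced by a default.
class ProductDetailsComposer {
    static final List<String> TOP_SELLERS_FALLBACK = List.of("bestseller-1", "bestseller-2");

    static String compose(Supplier<String> catalog, Supplier<List<String>> recommendations) {
        String info;
        try {
            info = catalog.get();
        } catch (RuntimeException e) {
            // Without basic product information the page is useless: surface an error.
            throw new IllegalStateException("Product information unavailable", e);
        }
        List<String> recs;
        try {
            recs = recommendations.get();
        } catch (RuntimeException e) {
            // Recommendations are nice to have: fall back to a hardwired list.
            recs = TOP_SELLERS_FALLBACK;
        }
        return info + " | recommended: " + recs;
    }
}
```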

The API Gateway could also return cached data if that was available. For example, since product prices change infrequently, the API Gateway could return cached pricing data if the pricing service is unavailable. The data can be cached by the API Gateway itself or be stored in an external cache such as Redis or Memcached. By returning either default data or cached data, the API Gateway ensures that system failures do not impact the user experience.
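A minimal sketch of the cached-data fallback for prices, in which an in-process ConcurrentHashMap stands in for Redis or Memcached. CachedPriceClient and the Function-based pricing service are assumptions for illustration; the value served from the cache may be stale, which is the trade-off accepted here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Wraps a pricing lookup with a last-known-good cache kept in the gateway's memory.
// Successful responses refresh the cache; failures are answered from it if possible.
class CachedPriceClient {
    private final Function<String, Double> pricingService;
    private final Map<String, Double> lastKnownPrices = new ConcurrentHashMap<>();

    CachedPriceClient(Function<String, Double> pricingService) { this.pricingService = pricingService; }

    double price(String productId) {
        try {
            double price = pricingService.apply(productId);
            lastKnownPrices.put(productId, price);
            return price;
        } catch (RuntimeException e) {
            Double cached = lastKnownPrices.get(productId);
            if (cached != null) return cached;   // possibly stale, but still useful
            throw e;                             // nothing cached: propagate the failure
        }
    }
}
```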

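Netflix Hystrix is an incredibly useful library for writing code that invokes remote services. Hystrix times out calls that exceed the specified threshold. It implements a circuit breaker pattern, which stops the client from waiting needlessly for an unresponsive service. A circuit breaker typically moves between three states:

 - Closed : requests flow to the service normally while failures are tracked
 - Open : after too many failures, requests fail immediately without calling the service
 - Half-open : after a waiting period, a trial request is let through; success closes the circuit again, failure reopens it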

If the error rate for a service exceeds a specified threshold, Hystrix trips the circuit breaker and all requests will fail immediately for a specified period of time. Hystrix lets you define a fallback action when a request fails, such as reading from a cache or returning a default value. If you are using the JVM you should definitely consider using Hystrix. And, if you are running in a non-JVM environment, you should use an equivalent library.
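The closed/open/half-open cycle can be illustrated with a small, single-threaded Java sketch. It is not Hystrix: Hystrix trips on an error percentage over a rolling window of requests, whereas this hypothetical SimpleCircuitBreaker counts consecutive failures, and the injectable clock exists only to make the behaviour easy to test.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Not thread-safe; intended only to show the state transitions.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures that trip the breaker
    private final long openMillis;        // how long to fail fast before a trial call
    private final LongSupplier clock;     // injectable time source, in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();    // fail fast: the service is not called at all
            }
            state = State.HALF_OPEN;      // waiting period over: let a trial call through
        }
        try {
            T result = remote.get();
            state = State.CLOSED;         // success (including a trial call) closes the circuit
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;       // trip, or re-trip after a failed trial call
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

In production code the clock would simply be System::currentTimeMillis.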

Handling Scale-Out: Service Discovery

While using an API Gateway is better than having the clients talk directly to the services, we can do a little better. Here is another problem to consider: what happens when we scale a service in a microservice-based architecture? For example, consider the following diagram, where we have scaled the catalog service but the API Gateway has no idea about the new service instance. How can it take advantage of the additional instance?


We would have to change the API Gateway so it knows about the second instance. With features like auto-scaling provided by the cloud platform we are deployed to, and the fact that we might have hundreds of services, modifying the API Gateway by hand would simply not scale.

Solution:

To solve this problem, microservice applications typically use a Service Discovery application which allows all microservices to register themselves and then broadcast their existence to other services in the application. 

With our Service Discovery application now deployed, each service instance registers itself with it and in turn queries it for the other services available. This solves the scaling problem with our API Gateway. As we bring up more instances of the catalog service, they will all register themselves with the Service Discovery application, and the API Gateway will periodically query the Service Discovery application to make sure it has an updated list of services. Once the new instances have been registered, the API Gateway will pick them up and be able to leverage them when making requests to the service. All this happens dynamically without changing any code in the API Gateway, which is exactly what we want.
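The register-then-query flow can be sketched with an in-memory registry. ServiceRegistry and DiscoveringGateway are hypothetical names, not Eureka's API, and the refresh method stands in for a periodic background task.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// In-memory stand-in for the Service Discovery application.
class ServiceRegistry {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();

    // Each service instance calls this on startup with its own host:port.
    void register(String service, String hostPort) {
        instances.computeIfAbsent(service, name -> new CopyOnWriteArrayList<>()).add(hostPort);
    }

    // Returns a point-in-time copy of every service and its registered instances.
    Map<String, List<String>> snapshot() {
        Map<String, List<String>> copy = new HashMap<>();
        instances.forEach((name, list) -> copy.put(name, List.copyOf(list)));
        return copy;
    }
}

// The gateway keeps its own copy of the registry and refreshes it periodically.
class DiscoveringGateway {
    private final ServiceRegistry registry;
    private volatile Map<String, List<String>> known = Map.of();

    DiscoveringGateway(ServiceRegistry registry) { this.registry = registry; }

    // Meant to run on a schedule (for example from a ScheduledExecutorService);
    // new instances become visible here without any change to the gateway's code.
    void refresh() { known = registry.snapshot(); }

    List<String> instancesOf(String service) { return known.getOrDefault(service, List.of()); }
}
```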

 

There are additional benefits to using a Service Discovery application as well. Consider the case where we have the same microservices deployed across multiple datacenters, each with a deployment that looks like the architecture above. The Service Discovery applications in each datacenter can share data about the services running in their datacenter with each other. This allows the application to be unaffected by a complete outage of a given service in one datacenter, provided the service is still available in another datacenter. This type of functionality gives our application even more resiliency to failures.
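How a client might use registries shared across datacenters, assuming (hypothetically) that it sees a map from datacenter to services to instances: it prefers local instances and falls back to another datacenter only when the service is missing locally. MultiDatacenterLookup is an illustrative name, not part of any Netflix component.

```java
import java.util.List;
import java.util.Map;

// Prefer instances in the local datacenter; use a peer datacenter only when the
// service has no instances locally (e.g. a complete local outage of that service).
class MultiDatacenterLookup {
    private final String localDc;
    private final Map<String, Map<String, List<String>>> registryByDc; // dc -> service -> instances

    MultiDatacenterLookup(String localDc, Map<String, Map<String, List<String>>> registryByDc) {
        this.localDc = localDc;
        this.registryByDc = registryByDc;
    }

    List<String> instancesOf(String service) {
        List<String> local = registryByDc.getOrDefault(localDc, Map.of()).getOrDefault(service, List.of());
        if (!local.isEmpty()) return local;
        for (Map.Entry<String, Map<String, List<String>>> dc : registryByDc.entrySet()) {
            if (dc.getKey().equals(localDc)) continue;
            List<String> remote = dc.getValue().getOrDefault(service, List.of());
            if (!remote.isEmpty()) return remote;
        }
        return List.of();   // the service is unavailable everywhere
    }
}
```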




Understanding of Various Components Provided by Netflix OSS Stack

In this post we will try to understand the various components of the Netflix OSS stack while building a bespoke e-commerce shopping application based on microservices and REST principles.

Various Components provided by Netflix OSS stack:

Spring Cloud integrates the Netflix components into the Spring environment in a very nice way, using auto-configuration and convention over configuration, similar to how Spring Boot works.

The table below maps the generic components in the operations model to the actual components that can be used to build a microservice based application:

Operation Component | Netflix / Spring Component
Service Discovery | Eureka (Netflix)
Dynamic Routing, Client-side load balancing | Ribbon (Netflix)
Circuit Breaker | Hystrix (Netflix)
Monitoring the services | Netflix Hystrix Dashboard and Turbine
Router/Filter/Server-side load balancing | Zuul (Netflix)
External Configuration Management | Archaius (Netflix), Spring Cloud Config Server
OAuth 2.0 protected API* | Spring Security OAuth2


* Protecting service API with OAuth is not specific to microservices and can be applied to any service based architecture.

Eureka

Eureka is the service discovery component provided by Netflix. Service discovery is one of the key needs in a microservices-based architecture; doing it manually would be tough and error-prone. Eureka provides a server and a client component. The server component can be configured and deployed to be highly available, with each server replicating state about the registered services to the others.

Service Discovery: Eureka Client:

When a client (i.e. a microservice) registers with Eureka, it provides basic metadata such as host, port, health-check URL and home page URL. The client then keeps sending heartbeats to the server. If the server doesn’t receive a heartbeat from a specific client within a configurable period, the instance is removed from the registry.
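The heartbeat mechanism amounts to a lease that each instance must keep renewing. The sketch below, with the hypothetical LeaseRegistry class and an injectable clock, stores the registration metadata together with the time of the last heartbeat and evicts instances whose lease has expired. It illustrates the idea, not Eureka's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class LeaseRegistry {
    // Registration metadata plus the time of the most recent heartbeat.
    static final class Instance {
        final String hostPort;
        final String healthCheckUrl;
        volatile long lastHeartbeat;
        Instance(String hostPort, String healthCheckUrl, long now) {
            this.hostPort = hostPort;
            this.healthCheckUrl = healthCheckUrl;
            this.lastHeartbeat = now;
        }
    }

    private final Map<String, Instance> instances = new ConcurrentHashMap<>();
    private final long leaseMillis;   // how long a registration survives without a heartbeat
    private final LongSupplier clock;

    LeaseRegistry(long leaseMillis, LongSupplier clock) { this.leaseMillis = leaseMillis; this.clock = clock; }

    void register(String id, String hostPort, String healthCheckUrl) {
        instances.put(id, new Instance(hostPort, healthCheckUrl, clock.getAsLong()));
    }

    // Sent by the instance itself at a regular interval to renew its lease.
    boolean heartbeat(String id) {
        Instance instance = instances.get(id);
        if (instance == null) return false;   // unknown instance: it must register again
        instance.lastHeartbeat = clock.getAsLong();
        return true;
    }

    // Run periodically by the server: drop every instance whose lease has expired.
    void evictExpired() {
        long now = clock.getAsLong();
        instances.values().removeIf(i -> now - i.lastHeartbeat > leaseMillis);
    }

    List<String> registeredIds() { return new ArrayList<>(instances.keySet()); }
}
```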

Service Discovery: Eureka Server:

The Eureka server does not have a backend store, but the service instances in the registry all have to send heartbeats to keep their registrations up to date (so this can be done in memory). Clients also have an in-memory cache of eureka registrations (so they don’t have to go to the registry for every single request to a service).

By default every Eureka server is also a Eureka client and requires (at least one) service URL to locate a peer. If you don’t provide it the service will run and work, but it will shower your logs with a lot of noise about not being able to register with the peer.

Netflix Ribbon - Dynamic Routing and Load Balancer

Netflix Ribbon can be used by service consumers to lookup services at runtime. Ribbon uses the information available in Eureka to locate appropriate service instances. If more than one instance is found, Ribbon will apply load balancing to spread the requests over the available instances. Ribbon does not run as a separate service but instead as an embedded component in each service consumer.
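Client-side load balancing can be as simple as cycling through the instances the consumer currently knows about. This hypothetical RoundRobinBalancer mirrors the idea of a round-robin rule (Ribbon offers one among several policies) but is not Ribbon's API; the instance list comes from any supplier, for example a locally cached registry lookup.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Embedded in the service consumer: picks the next instance for each request.
class RoundRobinBalancer {
    private final Supplier<List<String>> instanceSource;  // e.g. a cached registry lookup
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(Supplier<List<String>> instanceSource) { this.instanceSource = instanceSource; }

    String choose() {
        List<String> instances = instanceSource.get();
        if (instances.isEmpty()) throw new IllegalStateException("No instances available");
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```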

Netflix Zuul – Filter/Router/Server-side (Edge Server)

Zuul is (of course) our gatekeeper to the outside world, not allowing any unauthorized external requests to pass through. Zuul also provides a well-known entry point to the microservices in the system landscape. Using dynamically allocated ports is convenient to avoid port conflicts and to minimize administration, but it of course makes it harder for a service consumer to know where a given service is running. Zuul uses Ribbon to look up available services and routes each external request to an appropriate service instance. In this blog post we will only use Zuul to provide a well-known entry point, leaving the security aspects for coming blog posts.
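Zuul's request processing is organized around filters. The hypothetical EdgeServer below shows only the gatekeeper idea: "pre" filters run before routing and can reject a request so that it never reaches a backend service. The header check is deliberately naive and is not how Zuul or OAuth2 validate tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical edge-server pipeline loosely modeled on the idea of pre-routing filters.
class EdgeServer {
    // Returns a rejection response, or Optional.empty() to let the request continue.
    interface PreFilter { Optional<String> reject(Map<String, String> headers); }

    private final List<PreFilter> preFilters = new ArrayList<>();
    private final Function<String, String> router;   // maps a path to a backend response

    EdgeServer(Function<String, String> router) { this.router = router; }

    void addPreFilter(PreFilter filter) { preFilters.add(filter); }

    String handle(String path, Map<String, String> headers) {
        for (PreFilter filter : preFilters) {
            Optional<String> rejection = filter.reject(headers);
            if (rejection.isPresent()) return rejection.get();   // never reaches a service
        }
        return router.apply(path);
    }
}
```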

Hystrix – Circuit Breaker:

Netflix Hystrix provides a framework for fault handling using a circuit breaker pattern. If some composite services depend on other core services, then a failure of any of the core services can jeopardize the whole composite system if faults are not handled properly.

Netflix Hystrix provides circuit breaker capabilities to a service consumer. If a service doesn’t respond (e.g. due to a timeout or a communication error), Hystrix can redirect the call to an internal fallback method in the service consumer. If a service repeatedly fails to respond, Hystrix will open the circuit and fail fast (i.e. call the internal fallback method without trying to call the service) on every subsequent call until the service is available again. To determine whether the service is available again, Hystrix allows some requests to try out the service even if the circuit is open. Hystrix executes embedded within its service consumer.

Netflix Hystrix dashboard and Netflix Turbine - Monitor Dashboard

Hystrix dashboard can be used to provide a graphical overview of circuit breakers and Turbine can, based on information in Eureka, provide the dashboard with information from all circuit breakers in a system landscape.

Archaius: External Configuration Management:

Archaius is the Netflix client-side configuration library. It is the library used by all of the Netflix OSS components for configuration. Archaius is an extension of the Apache Commons Configuration project. It allows configuration updates either by polling a source for changes or by having a source push changes to the client. Archaius uses Dynamic<Type>Property classes as handles to properties.
Archaius has its own set of configuration files and loading priorities. Spring applications should generally not use Archaius directly, but the need to configure the Netflix tools natively remains. Spring Cloud has a Spring Environment Bridge so Archaius can read properties from the Spring Environment.
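The idea behind Archaius' dynamic properties, a handle that always returns the latest polled value, can be sketched without the library. PolledConfig and DynamicInt are hypothetical names that only imitate the concept; with Archaius itself you would obtain handles such as DynamicIntProperty from its property factory.

```java
import java.util.Map;
import java.util.function.Supplier;

// A polled configuration source; each poll replaces the visible set of values.
class PolledConfig {
    private final Supplier<Map<String, String>> source;   // e.g. reads a file or a config server
    private volatile Map<String, String> current = Map.of();

    PolledConfig(Supplier<Map<String, String>> source) {
        this.source = source;
        poll();
    }

    // In a real setup this would run on a schedule.
    void poll() { current = Map.copyOf(source.get()); }

    DynamicInt intProperty(String name, int defaultValue) { return new DynamicInt(name, defaultValue); }

    // A handle that reads the latest polled value on every get(), so callers
    // never hold on to a stale copy of the setting.
    final class DynamicInt {
        private final String name;
        private final int defaultValue;

        DynamicInt(String name, int defaultValue) { this.name = name; this.defaultValue = defaultValue; }

        int get() {
            String raw = current.get(name);
            return raw == null ? defaultValue : Integer.parseInt(raw);
        }
    }
}
```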

The following diagram depicts the overall ecosystem with the interactions between the various Netflix components.

 

The above diagram explains a microservice-based architecture for an e-commerce portal. We have elaborated one main service, which deals with product information. The whole service-based architecture can be logically divided into 3 layers. The core services each produce one unit of information, such as item information, price information, and ratings and reviews. The main product composite service consumes these core services and produces the full set of information related to a product. All the services are registered with Eureka.

Ribbon is responsible for load balancing across multiple instances of the core services. To avoid a service outage due to a failing service or temporary network problems, it is very common to have more than one instance of the same service type running and to use a load balancer to spread the incoming calls over the instances. Since we are using dynamically allocated ports and a service discovery server, it is very easy to add a new instance. For example, simply start a new review service; it will allocate a new port dynamically and register itself with the service discovery server.

The service product-composite is also enhanced with a Hystrix based circuit breaker so that if any of the core services fails it will fall back to some default (static) source. The Hystrix dashboard is also configured to monitor the status of the core services.

Similar to the composite service, we have created one more API service layer on top of it to protect the service with OAuth2. This layer also has a circuit breaker and a load balancer to handle fault tolerance and load balancing for the product composite service. The corresponding OAuth client should be configured in the service consumer.

All the components maintain their configuration data through the Archaius configuration management component.