Feign, Hystrix, Ribbon and Eureka are great tools, all nicely packed in Spring Cloud, allowing us to achieve great resilience in our massively distributed applications with great ease!!! This is true, at least up to the "ease" part... To be honest, it is easier to get all the great resilience patterns working together with those tools than without, but making everything work as intended takes some studying, time and testing.
Unfortunately (or not), I'm not going to explain how to set all this up here; I'll just point out some tricks for error management with those tools. I chose this topic because I’ve struggled a lot with it (really)!!!
If you are looking for a getting started tutorial on those tools I recommend the following articles:
There will be code in this article, but not that much; you can find the missing parts in this repository.
Let's say, after some trouble, you ended up with a dependency set looking like this one:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-eureka</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-hystrix</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-feign</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-ribbon</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>
Ok, so you are aiming at the full set:
- Eureka client, to get your service instances from your Eureka server
- Ribbon, which can provide a proper client-side load-balancer using service names and not URLs (and decorate RestTemplate to use names and load-balancing)
- Hystrix, with lots of built-in anti-fragile patterns; another awesome tool, but you need to keep an eye on it (not part of this article...)
- Feign, for really easy-to-write REST clients
This article uses the following versions of Spring Cloud:
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.13.RELEASE</version>
</parent>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Edgware.SR3</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
These tools need configuration. Let's assume you have set up something similar in your application.yml:
spring:
  application:
    name: my-awesome-app
eureka:
  client:
    serviceUrl:
      defaultZone: http://my-eureka-instance:port/eureka/
feign:
  hystrix:
    enabled: true
hystrix:
  threadpool:
    default:
      coreSize: 15
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 2000
ribbon:
  ReadTimeout: 400
  ConnectTimeout: 100
  OkToRetryOnAllOperations: true
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 1
This configuration will work if your application can register to Eureka using its hostname and application port. For production / cloud / any environment with proxies you need additional properties:
- eureka.instance.hostname with the real hostname to use to reach your service
- eureka.instance.nonSecurePort with the non-secure port to use, or eureka.instance.securePort with eureka.instance.securePortEnabled=true
Also, this configuration isn't authenticated; it can be a good idea to add authentication to Eureka, depending on your network.
From the Ribbon configuration I see you have confidence in your Web Services: 400ms for a ReadTimeout is quite short, and the shorter the better!
We can also notice that all your services are idempotent, because you accept having 4 calls instead of 1 if your network / servers start to get messy. Yes, this Ribbon configuration will make up to 4 requests if the response times out, because it is actually doing (1 + MaxAutoRetries) x (1 + MaxAutoRetriesNextServer) = 4. So if you set 2 and 3 respectively, you will have up to 12 requests from Ribbon alone.
This gets us to the 2000ms Hystrix timeout: with a shorter value, Hystrix would time out while Ribbon is still making requests whose results the application no longer waits for, so this seems legit (due to the Ribbon configuration: (400 + 100) * 4 = 2000).
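To make this arithmetic concrete, here is a minimal sketch in plain Java. Nothing here comes from Ribbon itself: RibbonRetryBudget and its method names are made up for the example, and it assumes the worst case where every attempt burns the full connect and read timeouts.

```java
/** Worst-case numbers implied by a Ribbon retry configuration (illustration only, not a Ribbon API). */
class RibbonRetryBudget {

    /** Requests sent in the worst case: attempts per server times the number of servers tried. */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (1 + maxAutoRetries) * (1 + maxAutoRetriesNextServer);
    }

    /** Time spent in the worst case, assuming every attempt uses both timeouts entirely. */
    static long worstCaseMillis(int connectTimeoutMs, int readTimeoutMs,
                                int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (long) (connectTimeoutMs + readTimeoutMs)
                * maxAttempts(maxAutoRetries, maxAutoRetriesNextServer);
    }

    public static void main(String[] args) {
        System.out.println(maxAttempts(1, 1));                // 4
        System.out.println(maxAttempts(2, 3));                // 12
        System.out.println(worstCaseMillis(100, 400, 1, 1));  // 2000
    }
}
```

If the last number is above your Hystrix timeout, Hystrix will give up while Ribbon is still retrying, which is exactly the situation described above.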
Everything goes well, and you quickly understand that, for all FeignClients without a fallback, you only get a HystrixRuntimeException for any error. This exception mainly says that something went wrong and you don't have a fallback, but its cause can tell you a little bit more. You quickly build an ExceptionHandler to display nicer messages to users (because you don't want to put fallbacks on all FeignClients).
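As an illustration of the "look at the cause" part, here is a small JDK-only sketch of what such a handler could do before building its message. Causes.rootCause is a hypothetical helper, and the plain RuntimeException stands in for HystrixRuntimeException so that the example runs without Hystrix on the classpath.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper: digs out the deepest cause of a wrapping exception. */
class Causes {

    static Throwable rootCause(Throwable error) {
        // Remember visited exceptions so that a cyclic cause chain cannot loop forever
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = error;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for what Hystrix throws when a command fails without fallback
        Throwable wrapped = new RuntimeException("failed and no fallback available",
                new IllegalStateException("wrapped again", new IOException("Read timed out")));
        System.out.println(rootCause(wrapped).getMessage()); // Read timed out
    }
}
```

In a real handler you would probably map a few known root causes (timeouts, connection refused...) to user-friendly messages and keep a generic one for the rest.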
One day you call a new external service, and this service can return normal responses with HTTP 404 for some resources, so you add decode404 = true to your @FeignClient to get a response and avoid circuit breaking on those (if this option is not set, a 404 will count towards circuit breaking). But you don't get responses; what you get is:
...
Caused by: feign.codec.DecodeException: Could not extract response: no suitable HttpMessageConverter found for response type [class ...
...
This is because the 404 from this service has a different form than "normal" responses (it can be a simple String saying that the resource wasn't found). A cool idea here would be to allow Optional<?> and ResponseEntity<?> types in FeignClients to get an empty body for those 404s.
Auto-configured Spring Cloud Feign can map to ResponseEntity<?> but will fail to deserialize incompatible objects. It cannot, by default, put results in Optional<?>, so it is still a cool feature to implement.
One way to achieve this is to define a Decoder similar to this:
package fr.ippon.feign;

import java.io.IOException;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Optional;

import org.springframework.http.ResponseEntity;
import org.springframework.util.Assert;

import feign.FeignException;
import feign.Response;
import feign.Util;
import feign.codec.DecodeException;
import feign.codec.Decoder;

public class NotFoundAwareDecoder implements Decoder {

    private final Decoder delegate;

    public NotFoundAwareDecoder(Decoder delegate) {
        Assert.notNull(delegate, "Can't build this decoder with a null delegated decoder");
        this.delegate = delegate;
    }

    @Override
    public Object decode(Response response, Type type) throws IOException, DecodeException, FeignException {
        if (!(type instanceof ParameterizedType)) {
            return delegate.decode(response, type);
        }

        if (isParameterizedTypeOf(type, Optional.class)) {
            return decodeOptional(response, type);
        }

        if (isParameterizedTypeOf(type, ResponseEntity.class)) {
            return decodeResponseEntity(response, type);
        }

        return delegate.decode(response, type);
    }

    private boolean isParameterizedTypeOf(Type type, Class<?> clazz) {
        ParameterizedType parameterizedType = (ParameterizedType) type;
        return parameterizedType.getRawType().equals(clazz);
    }

    private Object decodeOptional(Response response, Type type) throws IOException {
        if (response.status() == 404) {
            return Optional.empty();
        }

        Type enclosedType = Util.resolveLastTypeParameter(type, Optional.class);
        Object decodedValue = delegate.decode(response, enclosedType);

        if (decodedValue == null) {
            return Optional.empty();
        }

        return Optional.of(decodedValue);
    }

    private Object decodeResponseEntity(Response response, Type type) throws IOException {
        if (response.status() == 404) {
            return ResponseEntity.notFound().build();
        }

        return delegate.decode(response, type);
    }
}
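If the ParameterizedType checks in this decoder look a bit magical, the JDK-only sketch below shows what they rely on: the generic return type of the client method, as exposed by reflection. The UserClient interface and the ReturnTypeInspection class are made up for the example (this is not Feign code), and the check is a slightly more defensive variant of the isParameterizedTypeOf method above.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Optional;

class ReturnTypeInspection {

    /** Hypothetical client, only here to provide generic return types to inspect. */
    interface UserClient {
        Optional<String> findName(String id);

        String getName(String id);
    }

    /** Same idea as in the decoder, with the instanceof check folded in. */
    static boolean isParameterizedTypeOf(Type type, Class<?> clazz) {
        return type instanceof ParameterizedType
                && ((ParameterizedType) type).getRawType().equals(clazz);
    }

    /** Generic return type of a UserClient method taking a single String. */
    static Type returnTypeOf(String methodName) {
        try {
            return UserClient.class.getMethod(methodName, String.class).getGenericReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("Unknown method " + methodName, e);
        }
    }

    /** The T of an Optional<T> (or of any other single-parameter generic type). */
    static Type firstTypeArgument(Type type) {
        return ((ParameterizedType) type).getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        Type optionalName = returnTypeOf("findName");
        System.out.println(isParameterizedTypeOf(optionalName, Optional.class));            // true
        System.out.println(isParameterizedTypeOf(returnTypeOf("getName"), Optional.class)); // false
        System.out.println(firstTypeArgument(optionalName) == String.class);                // true
    }
}
```

The enclosed type is what ends up being handed to the delegate decoder for anything other than a 404.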
Then, a @Configuration file:
package fr.ippon.feign;

import org.springframework.beans.factory.ObjectFactory;
import org.springframework.boot.autoconfigure.web.HttpMessageConverters;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.netflix.feign.EnableFeignClients;
import org.springframework.cloud.netflix.feign.support.ResponseEntityDecoder;
import org.springframework.cloud.netflix.feign.support.SpringDecoder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;

import feign.codec.Decoder;

@Configuration
@EnableCircuitBreaker
@EnableDiscoveryClient
public class FeignConfiguration {

    @Bean
    public Decoder notFoundAwareDecoder(ObjectFactory<HttpMessageConverters> messageConverters) {
        return new NotFoundAwareDecoder(new ResponseEntityDecoder(new SpringDecoder(messageConverters)));
    }
}
Of course it is up to you to fit it to your exact needs, but this way you will be able to get proper responses.
All this really cool stuff can change from one Spring Cloud minor version to another (e.g. Hystrix enabled by default to Hystrix disabled by default), so unless you never miss anything in any update (I don't think that is possible), I strongly recommend adding good integration tests for your usage of this stack (unit tests will not be of any help here).
But having integration tests for this stack can be quite complicated. If we want to be as close as possible to reality we need:
- A Eureka instance.
- Some applications registered in Eureka.
One way to do this is to set up a dynamic test environment with Eureka and some applications but, depending on your organization, this can be really hard to achieve. Another way is to start all this in a single JVM managed by JUnit, so that integration with any build tool and CI platform is really easy.
The drawback of this can be strange behaviors due to the Spring auto-configuration mechanism; it’s up to you to choose between containers and this approach, depending on what you can do.
To achieve this we will need to solve:
- Use SpringApplication.run(...) and play with the resulting ConfigurableApplicationContext.
- Use --server.port in SpringApplication.run(...) with SocketUtils.findAvailableTcpPort(), not even a problem.
- Use --spring.config.location with a specific configuration in our SpringApplication.run(...) and we can have separate configurations.
- Eureka server port. For this one we will need to ensure that Eureka is the first one to start (not needed for production, our client can handle this very well, but it would be annoying for tests) and then give the Eureka port one way or another to the other applications.
- Use --spring.jmx.enabled=false (or change the default domain using --spring.jmx.default-domain with a different name) and we are OK.
- The Netflix tools use Archaius to manage their configuration, not the default Spring configuration system. Archaius takes Spring Boot configuration into account when the first application starts on the JVM; for the next ones it isn't taken into account at the moment I'm writing this (check ArchaiusAutoConfiguration.configureArchaius(...): there is a static AtomicBoolean used to ensure that the configuration isn't loaded twice, and in the "else" there is a TODO and a warn log). For our tests we will go for an ugly fix for this: reloading this configuration in an ApplicationListener<ApplicationReadyEvent> will do the trick.
I have done this here using mainly JUnit rules to handle the application parts; feel free to take it if you like it and adapt those tests to your needs.
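To give an idea of the port and configuration points listed above, here is a minimal JDK-only sketch that builds the arguments one could pass to SpringApplication.run(...) for each application. TestAppArguments is a made-up name, the ServerSocket trick is a plain-JDK substitute for SocketUtils.findAvailableTcpPort(), and handing the Eureka port over through eureka.client.serviceUrl.defaultZone is just one possible way of doing it, not necessarily what the repository does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/** Hypothetical helper building command-line arguments for an application started from a test. */
class TestAppArguments {

    /** Asks the OS for a free port by binding to port 0, then releases it immediately. */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Arguments for one application: own port, own configuration, no JMX, known Eureka port. */
    static String[] forApplication(int serverPort, int eurekaPort, String configLocation) {
        return new String[] {
            "--server.port=" + serverPort,
            "--spring.config.location=" + configLocation,
            "--spring.jmx.enabled=false",
            // Assumption: the Eureka server started first by the test listens on localhost
            "--eureka.client.serviceUrl.defaultZone=http://localhost:" + eurekaPort + "/eureka/"
        };
    }

    public static void main(String[] args) {
        int eurekaPort = findFreePort();
        // "classpath:/client-app.yml" is a placeholder configuration name
        for (String argument : forApplication(findFreePort(), eurekaPort, "classpath:/client-app.yml")) {
            System.out.println(argument);
        }
    }
}
```

Keep in mind that a port found this way can, in theory, be taken by another process before the application binds to it; for tests this is usually an acceptable risk.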
At the time of this writing, the project takes ~45sec to build, which is very slow considering that most of this time is spent on integration tests of already battle-tested code... but I really don’t want to miss a breaking change in my usage of this great stack, so I consider this time to be fair enough.
If you don’t need it, remove the part testing circuit breaking on all HTTP error codes, since those tests are very slow due to the sleeping phase…
Once again, really take the time to write strong integration tests for your usage of this stack to avoid really bad surprises after some months!!!
Depending on what you want to build, what we have here can be more than enough on the application side but if you are planning to use this in the real world, you really need some good metrics and alerts (at least to keep an eye on your fallbacks and circuit breaker openings).
For this you can check Hystrix dashboard and Turbine, which provide lots of useful metrics and dashboards.
You will then need to bind it to your alerting system. This will need some work, and you are going to handle LOTS of data since those tools are really verbose (if you want to persist that data, pay attention to your eviction strategy and choose a solid enough timeseries infrastructure). Depending on your needs and your organization's tools, a simple metrics Counter on your fallbacks can do a good job. Once set up in your applications, this will only need a @Counted(...) on your fallback methods.
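To show the idea behind such a counter without pulling in a metrics library, here is a JDK-only sketch: a LongAdder plays the role of the metrics Counter, and CountedFallback is a made-up wrapper, not something provided by Hystrix or by any metrics library.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical wrapper counting how many times a fallback had to be used. */
class CountedFallback<T> implements Supplier<T> {

    // LongAdder stays cheap even when many threads hit the fallback at the same time
    private final LongAdder invocations = new LongAdder();
    private final Supplier<T> fallback;

    CountedFallback(Supplier<T> fallback) {
        this.fallback = fallback;
    }

    @Override
    public T get() {
        invocations.increment();
        return fallback.get();
    }

    long count() {
        return invocations.sum();
    }

    public static void main(String[] args) {
        CountedFallback<String> fallback = new CountedFallback<>(() -> "default value");
        fallback.get();
        fallback.get();
        System.out.println(fallback.count()); // 2
    }
}
```

With a real metrics library the annotation does this bookkeeping for you; what matters is that the number ends up somewhere your alerting can read it.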
It is also possible that the few tools discussed here are not antifragile enough for your needs. In that case, you can start by checking:
- Retryer.Default to see the default retry strategy, but this is kind of misleading in two ways:
  - Spring Cloud Feign configures Retryer.NEVER_RETRY by default (check FeignClientsConfiguration.feignRetryer()).
  - Even if you set the Retryer Bean to Retryer.Default, you won't get Feign-level retries by default, because it is also important to check ErrorDecoder.Default to see that we have a RetryableException only when there is a well formatted date in the Retry-After HTTP header.
- An ErrorDecoder that ends up in RetryableException in the cases you want (or add the Retry-After header in your services).
- Setting the Retryer to the one actually retrying.
- Customizing the Feign.Builder Bean (be careful to keep the @Scope("prototype")) to suit your needs.
This stack really is great and every developer using it daily will enjoy it, at least after one guy in the team spends days setting all this up to check some “vital” points:
- Eureka is secured and not a SPOF (even without Eureka up and running the apps can talk to each other, at least for a fair amount of time)
In my opinion, this is a really great stack that needs a lot of work and understanding. So, make sure to use it only if you need it, and otherwise stick to RestTemplate until you have time to give it a good try!