Configuring gRPC retries

I never really understood the difference between service-oriented architecture and microservices. Yeah, SOAP is archaic now and JSON saved us all, but in the end they all look like applications making requests and expecting responses over the network. And the similarities don’t end there: what should the application do when it a) doesn’t receive a response, or b) receives a response it doesn’t expect?

The problem comes down to two basic facts: first, applications are non-sentient things we code with our limited knowledge; and second, they run on computer hardware and communicate over the network, both physical things with a tendency to break every now and then.

If at least one side of the communication were a sentient being (like some people), they could simply repeat whatever they requested in the first place, by clicking the refresh button in their browser for example. Some of those sentient beings might even be able to decide whether it was safe to refresh, during a credit card transaction for example.

Or, if the network and the hardware those applications run on were 100% reliable and there were no randomness or entropy in the entire universe, then every request would get the expected response. But if everything worked as expected and all results were known beforehand, would any application need to send a request at all? … Anyway.

In the real world, engineers don’t throw meaningless philosophy at technical problems. They solve them by increasing their limited knowledge, one drop at a time. In this instance, the solution is to expect more of the normally unexpected conditions: a) put a timeout on responses, and b) retry the requests that fail in unexpected ways. To do that, they need to learn and code more stuff into their applications.

But software developers are generally lazy. That’s why most of them are in the business to begin with: program a device to do some stuff so someone doesn’t have to do it manually. So when gRPC says “instead of coding all the error handling, retry counts, delays and changes in delays, just say the word and I will retry for you”, it’s too good an offer to pass up. We still need to figure out how to tell gRPC what we want, though.
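To appreciate what gRPC takes off our hands, here is a minimal Python sketch of the hand-rolled version: a cap on attempts, randomized exponential backoff, and an overall deadline. The names (`call_with_retries`, `TransientError`, `flaky`) are hypothetical, and the jittered backoff only loosely mirrors what gRPC does internally.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for an 'UNAVAILABLE'-style failure worth retrying."""


def call_with_retries(call, max_attempts=5, initial_backoff=0.1,
                      max_backoff=30.0, multiplier=3.0, deadline=1.0):
    """Call `call()` until it succeeds, retrying TransientError with randomized
    exponential backoff, all within an overall deadline given in seconds."""
    give_up_at = time.monotonic() + deadline
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait a random amount up to the current backoff ceiling.
            delay = random.uniform(0, backoff)
            if time.monotonic() + delay >= give_up_at:
                raise  # the next attempt would start past the deadline
            time.sleep(delay)
            # Grow the ceiling for the next retry, but never past max_backoff.
            backoff = min(backoff * multiplier, max_backoff)


# Usage: a fake call that fails twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("server unavailable")
    return "ok"


print(call_with_retries(flaky))
```

Any other exception type (a bug, a bad request) propagates on the first attempt, which is the point of having a list of retryable errors at all.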

The documentation talks about a configuration file to specify during client creation, but not its format. Someone on Stack Overflow shows an example, but not where to put it. In short, it’s easy to get lost and start coding from scratch. Yeah, as lazy as developers might be, they love coding the hell out of everything. Just chill, here it goes.

First of all, the config file mentioned above:

{
    "methodConfig": [
        {
            "name": [
                { "service": "mynamespace.MyService" },
                { "service": "mynamespace.MyOtherService" }
            ],
            "timeout": "1s",
            "retryPolicy": {
                "maxAttempts" : 5,
                "initialBackoff" : "0.1s",
                "maxBackoff": "30s",
                "backoffMultiplier": 3,
                "retryableStatusCodes": [ "UNAVAILABLE" ]
            }
        }
    ]
}
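To get a feel for what those numbers mean, here is a small Python sketch that reads the same config, checks the retry policy against the rules as I understand them from gRPC’s retry design (gRFC A6: maxAttempts of at least 2 and capped at 5 by default, positive backoffs and multiplier, at least one retryable status code), and prints the upper bound of the random delay before each retry. The helper names are hypothetical; this is a back-of-the-envelope model, not gRPC’s implementation.

```python
import json

# The same service config as above, inlined so the sketch is self-contained.
CONFIG = """
{
    "methodConfig": [{
        "name": [
            {"service": "mynamespace.MyService"},
            {"service": "mynamespace.MyOtherService"}
        ],
        "timeout": "1s",
        "retryPolicy": {
            "maxAttempts": 5,
            "initialBackoff": "0.1s",
            "maxBackoff": "30s",
            "backoffMultiplier": 3,
            "retryableStatusCodes": ["UNAVAILABLE"]
        }
    }]
}
"""


def seconds(duration):
    """Parse a JSON duration string such as "0.1s" into float seconds."""
    if not duration.endswith("s"):
        raise ValueError(f"not a duration: {duration!r}")
    return float(duration[:-1])


def check_policy(policy):
    """Raise ValueError if the retry policy breaks the (assumed) A6 rules."""
    if policy["maxAttempts"] < 2:
        raise ValueError("maxAttempts must be 2 or more")
    if seconds(policy["initialBackoff"]) <= 0 or seconds(policy["maxBackoff"]) <= 0:
        raise ValueError("backoffs must be positive")
    if policy["backoffMultiplier"] <= 0:
        raise ValueError("backoffMultiplier must be positive")
    if not policy["retryableStatusCodes"]:
        raise ValueError("retryableStatusCodes must not be empty")


def backoff_ceilings(policy, attempt_cap=5):
    """Upper bound of the random delay before each retry (attempts - 1 retries)."""
    attempts = min(policy["maxAttempts"], attempt_cap)  # assumed default cap of 5
    initial = seconds(policy["initialBackoff"])
    maximum = seconds(policy["maxBackoff"])
    multiplier = policy["backoffMultiplier"]
    # Retry n (1-based) waits random(0, min(initial * multiplier**(n-1), maximum)).
    return [min(initial * multiplier ** (n - 1), maximum) for n in range(1, attempts)]


config = json.loads(CONFIG)
method = config["methodConfig"][0]
policy = method["retryPolicy"]
check_policy(policy)
print("timeout:", seconds(method["timeout"]), "s")
print("retry delay ceilings:", [round(c, 3) for c in backoff_ceilings(policy)])
```

With this config the ceilings come out at about 0.1, 0.3, 0.9 and 2.7 seconds, so the 30s maxBackoff never comes into play. And if I read the retry design correctly, the 1s timeout covers the whole call including its retries, so the later attempts may simply run out of time.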

We name this file something like service_config.json and place it somewhere our application can read it. Now we need to make the channel use that config, which can be a bit ugly depending on the language. For C#, it goes like this:

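using System.Collections.Generic;
using System.IO;
using Grpc.Core;

// Read the service config and hand it to the channel as a channel option.
var config = File.ReadAllText("service_config.json");
var options = new List<ChannelOption>();
options.Add(new ChannelOption("grpc.service_config", config));
var channel = new Channel(
        "my-grpc-server:4456",
        ChannelCredentials.Insecure,
        options);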

For Python:

import grpc

with open("service_config.json") as f:  # closes the file after reading
    config = f.read()
channel = grpc.insecure_channel(
        "my-grpc-server:4456",
        options=[("grpc.service_config", config)])
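If you’d rather not ship a separate file, the same config can be built in code and serialized with `json.dumps`, since the channel option just takes the JSON text. A minimal sketch; `retry_channel_options` and its defaults are hypothetical, chosen to reproduce the config above.

```python
import json


def retry_channel_options(services, timeout="1s", max_attempts=5,
                          initial_backoff="0.1s", max_backoff="30s",
                          multiplier=3, codes=("UNAVAILABLE",)):
    """Build a channel options list carrying a retry-enabled service config."""
    service_config = {
        "methodConfig": [{
            "name": [{"service": s} for s in services],
            "timeout": timeout,
            "retryPolicy": {
                "maxAttempts": max_attempts,
                "initialBackoff": initial_backoff,
                "maxBackoff": max_backoff,
                "backoffMultiplier": multiplier,
                "retryableStatusCodes": list(codes),
            },
        }]
    }
    # gRPC expects the service config as a JSON string under this option name.
    return [("grpc.service_config", json.dumps(service_config))]


options = retry_channel_options(["mynamespace.MyService", "mynamespace.MyOtherService"])
# channel = grpc.insecure_channel("my-grpc-server:4456", options=options)
```

Keeping the policy in code trades the ability to tweak it without a redeploy for one less file to lose track of.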

To check whether gRPC really retries your requests for you, you can set a couple of environment variables to get it talking:

export GRPC_VERBOSITY=debug
export GRPC_TRACE=server_channel,client_channel_call
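If the client is a Python script and touching the shell is inconvenient, the same variables can be set from inside the script. This sketch rests on one assumption: that gRPC’s core library reads them when it initializes, so they need to be in place before `grpc` is first imported in the process.

```python
import os

# Assumption: gRPC reads these at initialization, so set them before the
# first `import grpc` anywhere in this process.
os.environ["GRPC_VERBOSITY"] = "debug"
os.environ["GRPC_TRACE"] = "server_channel,client_channel_call"

# import grpc  # import only after the variables above are set
```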

Now get a hammer to break your server/network to see if it really works. Cheers!

UPDATE: Just go and check the official example for Java.

UPDATE2: And here comes the official example for C#.