Tuesday 28 February 2017

Part 1: Scaling micro-services with Scribble types

Video

"'Types for documentation' is one of the main benefits of types espoused by practitioners (who don't care so much for formal type safety theorems and such). Explicit languages and tools for protocols are indeed needed to obtain analogous benefits in the application domains... like microservices."
Dr Ray Hu, main scribble contributor.

"Types are the leaven of computer programming; they make it digestible."
Robin Milner, Turing award recipient, Fellow of the Royal Society.

"Types" is one of those words that we often use without necessarily understanding it or appreciating its role. Types are the basis of all computer languages, and as type theory has advanced (e.g. type inference) so have languages (e.g. Java 8, Python, Scala, Go). They were the basis of our move to structured programming, embodied by languages such as Pascal and Ada. They were the basis of our move towards object-oriented languages, from Smalltalk to Java.

Our notion of types is usually little more than a doff of the cap towards Alan Turing, without having read the paper to understand quite why. And that is all okay. The practitioners continue to take advantage of results, and the academics continue to understand more about what types mean and how wide their scope is, both for languages and for the very process of developing software solutions. Scribble is the embodiment of a new understanding of types and what they might do for us [Benjamin Pierce]. It is a language for defining protocols, by which we mean the ordered interactions that are considered valid between two or more participants that wish to exchange messages of some sort. We might consider it a way of describing a bunch of sequence diagrams that illustrate such a conversation.
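To make the idea of "ordered interactions between participants" concrete, here is a minimal sketch in plain Python. It is not Scribble syntax and does not use the Scribble tools; the role names, message labels and the project function are all hypothetical, chosen only for illustration. It writes a conversation down as data and then derives the view that a single participant has of it, which is roughly what one lifeline of a sequence diagram shows.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """One step of the conversation: sender passes a labelled message to receiver."""
    sender: str
    receiver: str
    label: str


# Hypothetical ordering protocol; roles and labels are illustrative assumptions.
ORDER_PROTOCOL = [
    Interaction("Customer", "Shop", "order"),
    Interaction("Shop", "Bank", "charge"),
    Interaction("Bank", "Shop", "paid"),
    Interaction("Shop", "Customer", "confirm"),
]


def project(protocol, role):
    """Return, in order, the send and receive actions that `role` takes part in."""
    local = []
    for step in protocol:
        if step.sender == role:
            local.append(f"send {step.label} to {step.receiver}")
        elif step.receiver == role:
            local.append(f"receive {step.label} from {step.sender}")
    return local


if __name__ == "__main__":
    # Print the conversation as seen by the Shop alone.
    for action in project(ORDER_PROTOCOL, "Shop"):
        print(action)
```

The sketch only handles a straight-line conversation; it has no notion of choice or repetition, which a real protocol language would need to express.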

"Scale" is a funny old term too. We use it in one way when sometimes the receiver thinks of it in another. To be clear, we look at how Scribble can be used to scale our ability to consume and deliver change, independent of the increasing velocity of change and independent of the increasing complexity of the set of micro-services we need.

What this means is that it doesn't matter how fast change needs to happen, how many micro-services we have, or how complex the interconnectivity between them is: Scribble will maintain that pace of change from the first iteration onwards. All of this is set in the context of an agile continuous integration and delivery pipeline and a cloud execution venue.

Scribble is a language for defining multi-party chatter between any number of end-points or, if you prefer, for describing the choreography between a set of micro-services. We call this a business protocol rather than a choreography, but they are one and the same.

We all know by now how micro-services and the use of stateless service idioms can scale to internet size, supporting millions of concurrent users all consuming and providing data. We know this in part because of Netflix, Uber and Spotify, as well as many others. We also know that the real world is not stateless, and so our services need to reflect this.

When we cannot reflect the world back in the terms in which we understand it, the complexity we operate under reduces our ability to generate value through innovation, and customer flexibility falls because it takes ever more time to integrate what we need with what we have. A simple new product for a retail company, a bank, an insurance company, in fact more or less any enterprise, needs to be marketed, have orders taken and have sales booked using the same systems that are used today.

We might say that we will take a vertical slice through an organisation, taking a bit of the existing business functionality to market the product and account for its sales. But this requires integration if we reuse anything and re-development if we do not.

We can use bounded contexts to further split things and set off several tracks of an agile train for each bounded context that deals with the product and its lifecycle, and we can find the tracks and manage the integration and development needed to do it all.

We can decide that the bounded context is the product, set off tracks of related micro-services, and away we go.

It all sounds fairly methodical, but the reality is that it all starts well, if you are lucky, and seems to be working until the number of micro-services deployed exceeds the span of control, normally around 7.

We might provide a summarised, aggregated view of what these micro-services are doing that maps to the business idiom, so that the business can understand how things are doing in its own terms. But that aggregated view is simply masking the underlying complexity that has arisen. Some of the complexity derives from the chatter that exists between micro-services, the choreography.

We can claim to have made it more robust against failure by using horizontal scaling inherent in stateless services.

We can claim to have made it easier to change through loose coupling and asynchronous messaging.

But in truth these mask the problem. We talk about this chatter and the loose coupling as somehow being the solution to the "choreography", but in fact we have not written it down. Rather, it has evolved without much constraint, and the result is incorrect behaviour, retries in production, hacked fixes and emergency remediation. If we had only written it down so that we could think of it as a type, then all of our APIs would be typed correctly and we would have an immediate view of how the system is performing against a described choreography, which we choose to call a business protocol.
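As an illustration of what "an immediate view of how the system is performing against a described choreography" could look like, here is a minimal sketch in plain Python. It is not the Scribble runtime or any part of its tooling; the ProtocolMonitor class, the roles and the message labels are hypothetical. Once the expected conversation is written down, each observed message can be checked against it, and anything out of step is recorded as a deviation rather than discovered later in production.

```python
class ProtocolMonitor:
    """Checks observed messages against a written-down, strictly ordered protocol."""

    def __init__(self, expected):
        self.expected = list(expected)  # ordered (sender, receiver, label) triples
        self.position = 0               # index of the next message we expect
        self.deviations = []            # (position, message) pairs that did not fit

    def observe(self, sender, receiver, label):
        """Record one observed message; return True if it matches the next expected step."""
        message = (sender, receiver, label)
        if self.position < len(self.expected) and message == self.expected[self.position]:
            self.position += 1
            return True
        self.deviations.append((self.position, message))
        return False

    def complete(self):
        """True only if every expected step was seen and nothing deviated."""
        return self.position == len(self.expected) and not self.deviations


# Hypothetical business protocol; names are illustrative assumptions.
EXPECTED = [
    ("Customer", "Shop", "order"),
    ("Shop", "Bank", "charge"),
    ("Bank", "Shop", "paid"),
    ("Shop", "Customer", "confirm"),
]

if __name__ == "__main__":
    monitor = ProtocolMonitor(EXPECTED)
    monitor.observe("Customer", "Shop", "order")
    # The Shop confirms before the Bank has been asked to charge: a deviation.
    monitor.observe("Shop", "Customer", "confirm")
    print(monitor.deviations)
```

A strictly linear check like this is the simplest possible design; it says nothing about branching conversations or several conversations running at once.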

The problem today is worse when we try to implement what is stateful in the real world as stateless in the micro-service world. The additional complexity this adds leaves the implementation, and so what is measured and monitored, semantically distant from the real-world stateful behaviour. With nothing written down, and the choreography existing only as tacit knowledge within the teams that own the lifecycle of a set of micro-services, the risk and complexity it adds as things scale up become a significant business burden. If teams have a high attrition rate, knowledge leaches away and over time things just become legacy.

We might of course say that we can throw away micro-services that no longer fit our needs. And of course we can. But when this becomes the dominant idiom the cost starts to bite. Our flexibility, our ability to rapidly evolve micro-services, also starts to drop as we are forced into ever more testing cycles to determine what the choreography rules are and need to be.

We do have the benefit of really good CI/CD pipelines and highly flexible cloud execution venues. Much of the complexity of supporting software solutions, from requirements through development and deployment, benefits from the DevOps revolution. But the cost is certainly not optimal. Cost models tend to have a residency charge, and as micro-services scale you can think of paying a residency charge for all of them, including all of the replicas needed for horizontal scaling. That mounts up, and when you step back and consider the usage of the set of instantiated micro-services amortised over the business transactions, you can identify waste in the system. That waste is the set of micro-services that did not get used in a business transaction. That is a lot of waste.
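The waste described above can be put into rough numbers. The following is a minimal sketch in plain Python in which every figure is made up: the service names, the replica counts, the residency charge and the two sample business transactions are hypothetical, and the charge is kept in whole cents to avoid floating-point rounding. It finds the deployed micro-services that no business transaction touched and adds up what their replicas cost while sitting resident.

```python
def idle_services(deployed, transactions):
    """Return the deployed services that no business transaction used."""
    used = set()
    for services in transactions:
        used.update(services)
    return set(deployed) - used


def idle_cost(replicas, cents_per_replica_hour, transactions):
    """Residency charge, in cents per hour, paid for replicas of unused services."""
    idle = idle_services(replicas, transactions)
    return sum(replicas[name] * cents_per_replica_hour for name in idle)


# Hypothetical deployment: service name -> number of replicas kept resident.
REPLICAS = {"orders": 3, "billing": 2, "loyalty": 4, "returns": 2}

# Hypothetical sample of business transactions and the services each one touched.
TRANSACTIONS = [{"orders", "billing"}, {"orders"}]

if __name__ == "__main__":
    print(sorted(idle_services(REPLICAS, TRANSACTIONS)))
    print(idle_cost(REPLICAS, 10, TRANSACTIONS), "cents per hour spent on idle replicas")
```

In this made-up sample, six of the eleven resident replicas belong to services that took part in no transaction at all.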

All of these problems would go away if we could only write down the choreography, the business protocol, in a way that guides our developers to do the right thing as regards micro-service communication, from inception through all of the iterations to delivering software and value out of the end of an agile CI/CD pipeline. If we could only write down the business protocol in such a way as to ensure the services are doing what they should at runtime and spot any deviation from it. If we could only .... in such a way as to dynamically instantiate only that which we use, to eliminate waste.

What I want to do next is show you how we can write down a choreography and make manifest solutions to these problems. It is not all done yet but I hope you get the idea from the video below:

Video Link