Wednesday 8 March 2017

OSGi, Modularisation and Micro-services

I have now refreshed my memory on OSGi thanks to Richard Nicholson's comments. Richard is correct of course. Modularisation is a key component of scale too. But in today's world of Docker, Kubernetes, OpenShift and the plethora of other DevOps/CDCI related tooling, we only deal with the software and hardware infrastructure and not the behaviour of a solution to a business problem. Now that we have execution venues (PaaS and SaaS) we can focus more of our attention on the complexities of delivering that behaviour over a collection of loosely coupled micro-services.
For read-only (stateless) behaviour we have an internet-scalable pattern that works well, is transparent (that is, we can monitor and understand it in the business domain with little additional effort) and is flexible against changing needs, through DevOps and CDCI pipelining.
But for stateful behaviours we are less well equipped, and we often try to map stateful problems onto stateless idioms to achieve scale, at the cost of flexibility and transparency. The purpose of scribble is to bridge that gap and make it just as easy, if not easier, to build stateful solutions that are closer to the business domain they support, with no loss of flexibility.
That is to say that, with scribble, stateful solutions can be deployed just as fast as stateless ones, so we do not slow the velocity of change; we reduce the additional effort needed to monitor in the business domain by ensuring our stateful interactions are a reflection of the business domain; and we enable rapid recovery in the event of a "modular" container failure by being able to instantiate behaviour on the fly and in a specific state.
From a Total Cost of Ownership (TCO) perspective, we reduce the cost of development by removing errors related to stateful interaction and by moving SIT (systems integration testing) into a virtual SIT world; we maintain the same cost of deployment into test and production environments; we reduce the cost of monitoring in the business domain and add further value by being able to measure what we observe against what was expected; and we lower the operational costs associated with tenancy through lazy, just-in-time behavioural instantiation.

Monday 6 March 2017

A Living Chronology on scaling micro-services

I decided to post this because I wanted to give more context to my previous blog entitled "Scaling micro-services with scribble types".

This chronology, which I kick off today, is all about the journey to solving some of the hardest problems in scaling stateful micro-services, of which the first entry was that previous post. In the next entries I will chart the progress, describe the challenges along the way and how they get resolved. In so doing I plan to take input from anyone. The blog is one means of gathering suggestions, which, if you do give any, I would like to share.

Having looked long and hard at the gap between practitioners and academics in this digital world, and having looked at how we try to overcome it and improve through lean and agile ways of working, I think that involving as many people as I can will lead to better, tangible results from some key academic gains of recent years. Foremost of these gains is session types, which derive from Robin Milner's pi-calculus.

In effect we take session types, embodied in scribble, and use scribble to help us implement just-in-time, behavioural serverless architectures that lower the cost of ownership; to help us write better collections of micro-services that work faster; to help lower the cost of value delivered whilst increasing the quality of what gets delivered; and to help us understand, in real time, the processes that get enacted by micro-services on our behalf, which results in lower cost and higher volumes of business transactions.

There may well be many pitfalls ahead but thus far it is all looking good. I look forward to wider collaboration.

Tuesday 28 February 2017

Part1: Scaling micro-services with scribble types

Video

"'Types for documentation' is one of the main benefits of types espoused by practitioners (who don't care so much for formal type safety theorems and such). Explicit languages and tools for protocols are indeed needed to obtain analogous benefits in the application domains... like microservices."
Dr Ray Hu, main scribble contributor.

"Types are the leaven of computer programming; they make it digestible."
Robin Milner, Turing award recipient, Fellow of the Royal Society.

"Types" is one of those words that we often use but do not necessarily understand nor appreciate their role. Types are the basis of all computer languages and as type theory has advanced (i.e. type inference) so have languages (i.e. Java 8, Python, Scala, Go). They were the basis of our move to structured programming embodied by languages such as Pascal and Ada, They were the basis of our move towards object oriented language from SmallTalk to Java.

Our notion of types is usually little more than a doff of the cap towards Alan Turing, without having read the paper to understand quite why. And that is all okay. The practitioners continue to take advantage of results, and the academics continue to understand more about what types mean and how wide their scope is, applying both to languages and to the very process of developing software solutions. Scribble is the embodiment of a new understanding of types and what they might do for us [Benjamin Pierce]. It is a language for defining protocols, by which we mean the ordered interactions that might be considered valid between two or more participants that wish to exchange messages of some sort. We might consider it as a way of describing a bunch of sequence diagrams that illustrate such a conversation.

"Scale" is a funny old term too. We use it in one way when sometimes the receiver thinks of it in another way. To be clear we look at how scribble can be used to scale our ability to consume and deliver change independent of the increasing velocity of change and independent to the increase in the complexity of the set of micro-services we need.

What this means is that it doesn't matter how fast change needs to happen, nor how many micro-services we have, nor how complex the interconnectivity between them: scribble will maintain that pace of change from the first iteration onwards. All of this is set in the context of an agile continuous integration and delivery pipeline and a cloud execution venue.

Scribble is a language for defining multi-party chatter between any number of end-points, or, if you prefer, for describing the choreography of the same between a set of micro-services. We call this a business protocol rather than a choreography, but they are one and the same.
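
As a flavour of what such a business protocol looks like, here is a minimal sketch in scribble of a hypothetical quote-and-order conversation. The protocol name, roles, message labels and payload types are all illustrative rather than taken from any real system, the payload type declarations are omitted for brevity, and the exact surface syntax may vary a little between versions of the scribble tooling.

    module demo.Ordering;

    // Illustrative only: payload types such as ItemId, Price and OrderId
    // would normally be declared with scribble "type" declarations.
    global protocol Ordering(role Buyer, role Seller, role Shipper) {
        requestQuote(ItemId) from Buyer to Seller;
        choice at Seller {
            quote(Price) from Seller to Buyer;           // Seller offers a price ...
            choice at Buyer {
                accept() from Buyer to Seller;           // ... Buyer takes it,
                ship(OrderId) from Seller to Shipper;    // so the order is shipped
                delivered(OrderId) from Shipper to Buyer;
            } or {
                reject() from Buyer to Seller;           // ... or Buyer walks away
                noOrder() from Seller to Shipper;
            }
        } or {
            outOfStock(ItemId) from Seller to Buyer;     // or the item is unavailable
            noOrder() from Seller to Shipper;
        }
    }

Every valid run of the conversation is a path through this protocol, which is exactly the "bunch of sequence diagrams" reading mentioned above.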

We all know by now how micro-services and the use of stateless service idioms can scale to internet size, supporting millions of concurrent users all consuming and providing data. We know this in part because of Netflix, Uber and Spotify, as well as many others. We also know that the world is not stateless, and so our services need to reflect this.

When we cannot reflect what we build back in the same terms as we understand the business, the complexity we are operating under reduces our ability to generate value through innovation, and customer flexibility reduces because it starts to take ever more time to integrate what we need with what we have. A simple new product for a retail company, a bank, an insurance company, in fact more-or-less any enterprise, still needs marketing, order taking and sales booking using the same systems that are used today.

We might say that we will take a vertical slice through the organisation, reusing a bit of the existing business functionality to do the marketing and account for sales of the product. But this requires integration if we reuse anything, and re-development if we do not.

We can use bounded contexts to split things further and set off several tracks of an agile train, one for each bounded context that deals with the product and its lifecycle, and we can fund the tracks and manage the integration and development needed to do it all.

We can decide that the bounded context is the product and a set of tracks of related micro-services, and away we go.

It all sounds fairly methodical, but the reality is that it all starts well, if you are lucky, and seems to be working until the number of micro-services deployed exceeds the span of control - normally around seven.

We might provide a summarised, aggregated view of what these micro-services are doing that maps to the business idiom, so that the business can understand how things are doing in their own terms. But that aggregated view is simply masking the underlying complexity that has arisen. Some of the complexity derives from the chatter that exists between micro-services, the choreography.

We can claim to have made it more robust against failure by using the horizontal scaling inherent in stateless services.

We can claim to have made it easier to change through loose coupling and asynchronous messaging.

But in truth these mask the problem. We talk about this chatter and the loose coupling as somehow being the solution to the "choreography", but in fact we have not written it down. Rather, it has evolved without much constraint, resulting in incorrect behaviour, retries in production, hacked fixes and emergency remediation. If only we had written it down so that we could think of it as a type, then all of our APIs would be typed correctly and we would have an immediate view of how the system is performing against a described choreography - which we choose to call a business protocol.
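
To make the "typed correctly" point concrete: given a written-down global protocol such as the hypothetical Ordering sketch earlier, scribble tooling can project it into a local protocol for each role, and it is that local view against which an individual micro-service's API and messaging logic can be checked or generated. Roughly, the Buyer's side of that sketch projects to something like the following (again an illustration; the exact projected form depends on the tool version):

    local protocol Ordering_Buyer at Buyer(role Buyer, role Seller, role Shipper) {
        requestQuote(ItemId) to Seller;
        choice at Seller {
            quote(Price) from Seller;           // wait for a price
            choice at Buyer {
                accept() to Seller;             // take it, then expect delivery
                delivered(OrderId) from Shipper;
            } or {
                reject() to Seller;             // or decline
            }
        } or {
            outOfStock(ItemId) from Seller;
        }
    }

A Buyer implementation that tries to send accept() before it has received quote(Price), or that forgets to handle outOfStock, simply does not conform to this local type, and that deviation can be flagged at development time or observed at runtime.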

The problem today is worse when we try to implement what is stateful in the real world as stateless in the micro-service world. The additional complexity this adds results in the implementation, and so what is measured and monitored, being semantically distant from the real-world stateful behaviour. With nothing written down, the choreography exists only as tacit knowledge within the teams that own the lifecycle of a set of micro-services, and the risk and complexity it adds as things scale up become a significant business burden. If teams have a high attrition rate, knowledge leaches away and over time things just become legacy.

We might of course say that we can throw away micro-services that no longer fit our needs. And of course we can. But when this becomes the dominant idiom the cost starts to bite. The flexibility, our ability to rapidly evolve micro-services, also starts to drop as we are forced into ever more testing cycles to see if we can determine what the choreography rules are and need to be.

We do have the benefit of really good CDCI pipelines and highly flexible cloud execution venues. Much of the complexity needed to support software solutions, from requirements through development and deployment, is taken care of by the DevOps revolution. But the cost is certainly not optimal. Cost models tend to have a residency charge, and as micro-services scale you can think of paying a residency charge for all of them, including all of the replicas needed for horizontal scaling. That mounts up, and when you step back and consider the usage of the set of instantiated micro-services amortised over the business transactions, you can identify waste in the system. That waste is the set of micro-services that did not get used in a business transaction. That is a lot of waste.

All of these problems go away if we could only write down the choreography, the business protocol, in a way that guides our developers to do the right thing as regards micro-service communication, from inception through all of the iterations to delivering software and value out of the end of a CDCI agile pipeline. If we could only write down the business protocol in such a way as to ensure the services are doing what they should at runtime and spot any deviation from it. If we could only write it down in such a way as to dynamically instantiate only that which we use, and so eliminate waste.

What I want to do next is show you how we can write down a choreography and make manifest solutions to these problems. It is not all done yet but I hope you get the idea from the video below:

Video Link

Friday 31 May 2013

Revolutionising Software Quality Assurance

Executive Summary

In this blog we examine current software quality assurance and software delivery methods, and look deeper to understand why software defects occur. Understanding those root causes is really the only assured way to move to methods of software quality assurance and software delivery that guarantee increased quality with attendant gains in delivery efficiency. Having identified the root causes, we go on to map out how we can revolutionise both our expectations of software quality and software delivery by leveraging automation founded on mathematics and engineering practice.

The context

We are increasingly seeing terms such as "Software Quality Assurance", "Total Quality", "Zero Defects" and "CMMI" in our world of IT. From heads of procurement, quality directors and CIOs, the clarion cry for better quality can be heard both in the corridors and in the board rooms with suppliers. We even hear it within the software service industry itself, as we all try to deliver faster and cheaper, with fewer defects, higher quality and lower risk of failure.

Over the past half century our software industry has come up with one approach to software delivery after another, from the standard waterfall to the V-model to agile delivery. And yet we are not happy. Waterfall projects don't seem to reflect the needs of the business fast enough in an ever-changing world, the V-model is similar, and agile delivery is challenged by our need for scale, visibility and predictability.

We all want to deliver faster, cheaper without compromising quality and yet we share the view of the great and the good that "quality is the pimple on the arse of progress".

In the IT world we do look at how quality and process efficiency can help manage the tension between velocity of delivery and quality of output in a changing world. The move towards using "lean" execution derives from the automotive industry. The adoption of Quality Function Deployment (QFD) and its sister House of Quality (HoQ) into six-sigma, a sign of quality in itself, was a result of that same lean movement from Toyota.

According to a one-time examiner for the Institute of Quality Assurance, whose experience applied largely to the aerospace industry, "Quality Assurance is essentially about closing the loop between a customer's requirements and what is delivered" [8], something we often miss. When we test, we test against requirements, but are they a true reflection of what the customer wants and needs?

In the practice of Software Quality Assurance we often take a leaf out of the standard works on quality engineering, just as we do with lean. But we fail to understand that "both mechanical and electrical/electronic items or systems exhibit variance which does not exist in the digital software arena and much of traditional Quality Control is about that. A piece of code is non-variant. It isn’t right within defined tolerances it is either right or wrong".

We need to truly understand why software defects occur in order to postulate any solution for software quality assurance and software delivery methods that support our desired notion of quality.

The root cause of software defects

The fundamental observation that cries out to be heard is that software is either right or wrong. This needs to be at the forefront of our desire for software that is delivered faster in such a way that quality is increased and, by extension, customer wants and needs are satisfied. All of the delivery methods and quality methods that get deployed fail to address this fundamental observation and instead rely on people to interpret customer wants and desires in a succession of refinements: from customer interviews to wireframes, to the written word and the painted screen, into programming languages or configurations, and finally into executable software that does stuff.

They say the devil is in the detail, and the devil in this detail is always rooted in translations "from one to another". The more ambiguous they are, the higher the propensity for errors. We say errors rather than defects because "defect" is too soft a word. A defect in metal casting may occur because air bubbles get trapped. In software we only have errors, because the software is wrong. In a perfect world, a computer program is just maths and it can be proven right or wrong against a set of axioms. And if it is wrong it can be said to be erroneous.

Thus the root cause of software defects is ambiguity that leads to erroneous interpretations from one level of refinement to another, each of which moves us further away from the customer's requirements.

The way forward

If we step back and consider this problem of ambiguity, it lies in the fact that the language employed at any level does not have sufficient semantics and structure to support formal abstraction or, indeed, refinement.

On the one hand, refinement is the process we enact when we add detail, but in adding detail we want to preserve the semantics. In our complex world the only languages that have rigour and really precise semantics are mathematically based: Java has an interpreter and a runtime that enforce semantics; the translation of UML class diagrams into Java is based on formal semantics. On the other hand, abstraction is really an ability to pull out some structure from something more detailed; a code review often looks at the structure to see if it matches some higher-level description. Ideally we want languages that support requirements at different levels so that we can show refinement and abstraction formally. We want the same for design, and we already have it for coding.

JT observed, "There are however methods such as mathematical methods that can provide high confidence that the system will be fault free and that is what QA is about". So what mathematical methods can we use? What are the limitations of those methods and will their use give us total confidence or are we missing something?

If we break the SDLC down into the standard phases and examine them one at a time, we start to see that the axioms upon which proof relies are themselves statistical in nature.

Those axioms are an expression of a customer’s wants and needs, whereas designs, code and executables have a more formal algebraic relationship that lends itself to proof.

This latter step we take for granted, but consider a simple piece of Java (or any programming language). The computer on which that code executes does not understand the programming language; it understands only binary configured for its specific CPU and operating system. We use a compiler to do the translation and we never question its correctness in doing so. Under the covers the compiler is running mini proof-checkers to make type judgements, and that ensures the translation is correct.

There is no reason, if we can find the right maths, why we cannot do the same between requirements and design, and between design and code. And if we could, and hide it all away just as a compiler does, we could use it to ensure designs express requirements and code expresses designs. The maths we need to leverage, and hide, has to be "Turing complete" so that it expresses or captures all that can be computed. It has to be capable of dealing with the complexity of integration so that it captures the way in which components and services talk to each other. That integration should include the ability, in this modern mobile and cloud age, to capture changing connections (e.g. moving from one cell to another in a mobile network, or from one cloud to another). It needs to be able to capture the very basics of business transactions, what they might mean and how we might understand them. The pi-calculus, from the late Prof Robin Milner, a Turing award recipient, does all of this. We won't go into detail; it is sufficient to know that such a mathematics exists and can be used.
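
For readers who want a flavour of the maths being hidden away, the process syntax of the polyadic pi-calculus (see reference 10) is remarkably small. The sketch below is the standard grammar, reproduced purely as an illustration of how little machinery is needed; nothing here is specific to scribble or to any particular tool:

    P \;::=\; \mathbf{0}                                % the finished (inert) process
      \;\mid\; \bar{x}\langle \tilde{y} \rangle . P     % send the tuple of names \tilde{y} on channel x, then behave as P
      \;\mid\; x(\tilde{y}) . P                         % receive a tuple of names on x, binding \tilde{y}, then behave as P
      \;\mid\; (P \,|\, Q)                              % P and Q running in parallel
      \;\mid\; (\nu x) P                                % create a fresh, private channel name x scoped to P
      \;\mid\; \,!P                                     % unboundedly many parallel copies of P

Because a name received on a channel can itself be used as a channel, the calculus captures exactly the "changing connections" mentioned above, which is what makes it a fit for mobile and cloud-style systems.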

But such algebraic proofs are not sufficient if the axioms upon which they are based are statistical and not absolute. Thus proving an executable implements some code which implements some design which meets some requirements is all well and good if we can say that the requirements are correct, because then we know through proof, not testing, that the executable is correct. But, alas, we can never prove requirements are correct.

If the fundamental tenet of Software Quality is that the software can be said to meet the wants and needs of the customer who requested it, then we need to look deeper into what sort of requirements need to be met and how we know, or have confidence in, those requirements. If we can do this, then leveraging an algebraic proof from requirements to executable code can reduce the time to deliver and reduce the number of errors, because we can get rid of the translation errors, and quality, as a measure of confidence, can rise to the same level we have for our requirements.

We mentioned HoQ before, and its adoption alongside QFD into six-sigma. But few six-sigma practitioners, even those that are black belts, use either QFD or HoQ. And yet the seeds of confidence in requirements being correct lie here. Joining up HoQ and the mathematics of pi-calculus in this way provides what is needed. In effect we ensure that the confidence levels are high in the requirements being correct, which means the axioms against which the more algebraic proofs are made share the same confidence. This contrasts with how we do things today, where the confidence level that requirements are correct is often not very high and is compounded through design, code and test. This is why testing costs so much and why coding is inefficient, with a lot of re-work happening through iterations in test and back to coding.

HoQ works through a simple statistical approach that relates stakeholders to requirements and enables clear alignment, as requirements are refined, back to the stakeholders who are impacted by the change and back to the higher-level requirements, business drivers and goals that necessitate change. Combining cloud technology and HTML5 enables HoQ to be used on tablets such as iPads and Android devices as well as on all other devices. This combination of mobility and low set-up costs provides a way to engage stakeholders in a structured discussion in order to tease out requirements, prioritise them in a balanced way based on the differing views of the stakeholders, and capture dependencies along the way. This technique brings well-known statistical confidence to the process at each stage and through the refinement into actionable requirements. It encourages consensus across the stakeholders, which increases the confidence in the actionable requirements.
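
As a rough illustration of the kind of weighted scoring a HoQ relationship matrix rests on (this is the generic QFD-style calculation, adapted to the stakeholder-to-requirement alignment described above, not a description of any particular tool), the balanced priority of an actionable requirement can be expressed as:

    p_j \;=\; \sum_{i} w_i \, r_{ij}

where w_i is the normalised weight given to stakeholder i, and r_{ij} is the strength of the relationship that stakeholder asserts between their goals and requirement j (QFD practice commonly uses a 1-3-9 scale). Requirements are then ranked by p_j, and the spread of the r_{ij} across stakeholders gives a simple measure of how much consensus sits behind each priority.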

If our actionable requirements are now deemed to be correct at a high, balanced confidence level, we have confidence in the axioms against which we can use algebraic proof. If we can then show that a design is correct by proof against the requirements, we can assert that the confidence in the design is at the same level as the confidence in the requirements. Furthermore, we can then generate artefacts to drive coding, structure what is coded from that design, and check that, when the coding comes together in systems integration testing, it is conformant to that design. This provides us with the same level of confidence that what is coded is also correct against the requirements, as it can be algebraically checked against the design itself.

As JT points out, "The art of standing in the position of the customer or end user is so often missing. For some reason the independence of thought required to forget the processes that stand between requirement and assurance and to close that loop is often missing in both hardware and software." We can of course leverage all of this mathematics, automate things that were not automated before, and still deliver something that does not meet the customer's expectations. This stems from an inability to put ourselves in the position of the customer. To do this we can leverage HoQ, because one of the stakeholders is the customer or end user of what we produce. So at one level we can ensure that their needs are met and prioritised correctly. At another level, if we ensure alignment, both statistically and algebraically, of those customer and end-user needs, we can drive automation in testing and in conformance checking to ensure that the customer and end-user needs are shown to be met.

Conclusions

We have looked at the root cause of software defects (errors) and found it to be ambiguous communication and the multiple interpretations that result. We have made clear reference to the customer or end user and looked at how we can put ourselves in their position and so ensure their needs are met. We have taken the postulation that mathematics can provide a better solution to ridding ourselves of, or minimising, software defects, and we have shown how we might do this using statistical methods for requirements and algebraic proofs thereafter. Thus the use of mathematics, postulated by JT, is not a pipe dream but a practical reality that can help revolutionise both software quality assurance and delivery methods, as we use automation techniques based on both statistics and hard-core algebra to reduce ambiguity, speed up delivery and increase the quality of the result.

Last but not least, if you hadn't guessed, the ZDLC Platform does exactly what we have presented. And it's not just us that think this; look at Ovum too.


References

1.  John R. Hauser (1993). "How Puritan-Bennett Used the House of Quality." Sloan Management Review, Spring, 61-70.
2.  John R. Hauser & Don Clausing (1988). "The House of Quality." Harvard Business Review, May-June, 63-73.
3.  John Terninko (1997). Step-by-Step QFD: Customer-Driven Product Design, 2nd edition. St. Lucie Press.
4.  Larry M. Shillito (1994). Advanced QFD: Linking Technology to Market and Company Needs. John Wiley & Sons.
5.  Ronald G. Day (1993). Quality Function Deployment: Linking a Company with Its Customers. ASQC Quality Press.
6.  Jennifer Tapke, Allyson Muller, Greg Johnson & Josh Sieck. "House of Quality."
7.  Juan A. Marin-Garcia & Tomas Bonavia. "Strategic Priorities and Lean Manufacturing Practices in Automotive Suppliers. Ten Years After."
8.  Private letter from John Talbot (former examiner for the Institute of Quality Assurance).
9.  Private copy of "The Book of Kimbleisms", Richard Kimble.
10. Robin Milner. "The Polyadic pi-Calculus: A Tutorial." LFCS Report ECS-LFCS-91-180, School of Informatics, University of Edinburgh.

Thursday 30 May 2013

We have been busy doing cool stuff

For over three years, since I joined Cognizant, we have been busy building out a new platform for the engineering of software. We call it the Zero Deviation Lifecycle, and it brings together all of the work I have been doing since about 2000 and combines it with thinking from Dr Bippin Makoond. Visit the ZDLC blog page.

Wednesday 5 December 2012

The passing of Kohei Honda. A great scientist and a great man

I met Kohei (and Nobuko) through Alexis Richardson, who pioneered work on messaging with AMQP. Alexis was a pi-calculus guy, as was I, and he knew both Kohei and Nobuko (Kohei's wife). On the other hand, I knew Professor Robin Milner quite well; he was my mentor. That all happened in 2002, some 10 years ago now. Kohei joined W3C, along with Nobuko and Robin, as invited experts to work on WS-CDL, the basis of Testable Architecture, Savara and so many things that we take for granted. All based around pi-calculus, for which we all owe Robin Milner a huge debt. But pi was always hard to understand except for the gifted mathematician. Kohei, with a fellow researcher, Vasco Vasconcelos, came up with an elegant addendum to pi called sessions, and so began work on session typing. This work, above all else, informed WS-CDL, Savara, Testable Architecture and many others as to how to use pi-calculus so as to gain practical benefits. Indeed it is the core of WS-CDL in its search to understand behaviours over a distributed system.

Kohei became an invited expert on ISO's UNIFY project to unify messaging standards for financial information exchange, and an invited expert to the Japanese banking system, all of which demonstrated that he really understood how academia could support business.

Kohei was a picture of enthusiasm, with a vision and a feel for aesthetic qualities in all that he did. This was never more apparent than in his work on session types and scribble (his language, which has become known as the son of WS-CDL) applied to business problems.

I have been so lucky to have had the chance to work with Kohei over a long period. I have been even luckier in being able to introduce him to some other great people: Dr Bippin Makoond, Matthew Arnott from the tsunami early warning project in the USA, Matthew Rawling, CTO at UBS and chair of UNIFY, Dr Gary Brown at Red Hat (my long-time friend and colleague), and all my old colleagues in W3C. Spreading his enthusiasm and explaining complex things in easy ways was a gift that few of us have but that Kohei had in spades.

I was humbled and grateful to know him as a colleague and as a friend. He was gentle, with the heart of a giant and the mind of a great scientist. His work will stand the test of time and his presence will, for this soul, be forever there.

I shall miss him greatly and my heart and support goes out to Nobuko and Kohei’s wider family for their loss.

Monday 6 June 2011

Albertina Sisulu, Who Helped Lead Apartheid Fight, Dies at 92

Some of you may know that I am half South African, born of an English/Irish father and a Cape Coloured mother. I grew up in Dover in the days when racism in England was rife, and I, like many others, suffered at the hands of it.

I started working with an Anti-Apartheid organisation in the mid-1980s and continued until we gained freedom and democracy in South Africa. In 1988 I was very lucky to meet Ma Sisulu. Not many people manage to silence me without doing anything, but Ma Sisulu I was in complete awe of. I also met Lindiwe, who at that time was studying in the UK, and I met Zwelakhe, who was the subject of one of the longest banning orders meted out by the apartheid regime.

Ma Sisulu for me is a true patriot for all that is fair and just. She was and will remain the mother of the Rainbow Nation. And I feel honoured to have met her and sad to see her passing.

Amandla!