On Tech

Author: Steve Smith

Announcing the Build Quality In book

Continuous Delivery and DevOps experience reports from the wild

I am delighted to announce that Matthew Skelton and I are co-authors of a new book, Build Quality In – a collection of Continuous Delivery and DevOps experience reports from the wild.

The interdependent disciplines of Continuous Delivery and DevOps can transform the fortunes of an organisation, but they are hard. Matthew and I want to help people on their Continuous Delivery and/or DevOps journey, by sharing the experiences of those who have done it – what worked, what didn’t, and the highs and lows of trying to build quality into an organisation.

Build Quality In will be an anthology of first-person narratives from Continuous Delivery and DevOps practitioners. We have an incredible group of contributors who have agreed to give up their own time to tell their stories, with Dave Farley writing the Continuous Delivery foreword. Further announcements on contributors and a DevOps foreword are pending.

Matthew and I are donating 70% of author royalties for Build Quality In to Code Club – a not-for-profit organisation that runs a UK-wide network of free volunteer-led after-school coding clubs for children aged 9-11. We passionately believe that diversity within the IT industry must improve, and we want to contribute in some small way.

Right now we are in our launch phase. If you are interested in learning more about Build Quality In, please visit our Leanpub landing page. If you are interested in purchasing our book, please register on the Leanpub landing page and indicate how much you would be willing to pay. Over the next few weeks the first contributors will submit their draft articles, and we will announce publication via Build Quality In on Twitter and a pending book website.

Register your interest

More releases with less risk

Continuous Delivery reduces defect probability and cost

Continuous Delivery often challenges conventional wisdom within the IT industry, and by advocating the rapid release of value-add to reduce risk it contradicts the traditional belief that a low release cadence is an effective risk reduction strategy. How can releasing software more frequently reduce both defect probability and defect cost?

The probability of a defect is the likelihood of a change within a changeset unexpectedly impeding value-add and imposing an opportunity cost. Given that the defect probability of a changeset is proportional to its size, we can calculate the defect probability of a change as follows:

Fix More With Less - Defect Probability

n = number of changesets
probability = (1 / 2^n) * 100 [percentage]

The above formula indicates that decreasing changeset size by increasing the number of changesets will reduce defect probability, and this is confirmed by Don Reinertsen’s assertion that “many smaller experiments produce less variation than one big one“. For example, if a change is released in 1 changeset there is a 1 in 2 chance or 50% probability of failure. If it was instead released in 3 changesets there would be a 1 in 8 chance or 12.5% probability of failure.
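
This model is simple enough to sketch in code. A minimal Python rendering, purely for illustration (the function name is an assumption, not from the original post):

def defect_probability(changesets: int) -> float:
    # Probability of failure when a change is split into n changesets,
    # using the 1 / 2^n model above. Returns a percentage.
    return (1 / 2 ** changesets) * 100

assert defect_probability(1) == 50.0   # one large changeset
assert defect_probability(3) == 12.5   # three smaller changesets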

The cost of a defect is the product of cost per unit time and duration, where cost per unit time represents economic impact and duration represents lifetime.

cost = cost per unit time * duration [currency]

A defect has an inception date at its outset, a discovery date when diagnosed, and a resolution date when fixed. The interactions between these dates and cost per unit time enable a division of defect cost into sunk cost and opportunity cost. The sunk cost of a defect represents the economic damage already incurred at the point of discovery, while opportunity cost represents the economic damage still to be incurred.

Fix More With Less - Defect Cost

sunk cost duration = discovery date – inception date [unit time]
sunk cost = cost per unit time * sunk cost duration [currency]

opportunity cost duration = resolution date – discovery date [unit time]
opportunity cost = cost per unit time * opportunity cost duration [currency]

cost = sunk cost + opportunity cost [currency]
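
As a minimal sketch, these formulas translate directly into Python; the function name and dates below are illustrative assumptions, with a constant cost per unit time measured in whole days:

from datetime import date

def defect_cost(cost_per_day: int, inception: date, discovery: date, resolution: date) -> int:
    # Sunk cost accrues between inception and discovery,
    # opportunity cost between discovery and resolution
    sunk = cost_per_day * (discovery - inception).days
    opportunity = cost_per_day * (resolution - discovery).days
    return sunk + opportunity

# A defect inflicting £20,000 per day, discovered after 6 days and fixed 12 days later
assert defect_cost(20_000, date(2014, 6, 1), date(2014, 6, 7), date(2014, 6, 19)) == 360_000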

As cost per unit time is controlled by market conditions, it is far easier to reduce opportunity cost duration by shortening lead times. This can be accomplished via batch size reduction, as Mary and Tom Poppendieck have observed that “time through the system is directly proportional to the amount of work-in-process” due to Little’s Law:

lead time = work in process [units] / completion rate [units per time period]

Little’s Law is universal for all stable systems in which these variables are consistent long-term averages, and it is mathematical proof that reducing batch size will reduce lead time. For example, if a jug contains 4 litres of water and pours 2 litres per second then it will empty in 2 seconds. If instead the jug contained 2 litres of water and still poured 2 litres per second it would empty in 1 second.
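
Little’s Law is trivial to encode and check against the jug example; a brief illustrative Python sketch:

def lead_time(work_in_process: float, completion_rate: float) -> float:
    # Little's Law: average lead time = work in process / completion rate
    return work_in_process / completion_rate

assert lead_time(4, 2) == 2   # 4 litres poured at 2 litres per second
assert lead_time(2, 2) == 1   # halving the batch halves the lead time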

Releasing smaller changesets more frequently into production can also reduce sunk cost duration, as small batches accelerate feedback. A smaller batch size will decrease the lead time and complexity associated with each changeset, creating faster feedback loops that will reduce the time required to discover a defect.

Consider an organisation with an average changeset size of 24 changes and an average lead time of 12 days. How can we reduce the defect probability of the next production release R1?

Fix More With Less - Defect Probability Smaller Changeset

n = 1
probability = (1 / 2^1) * 100 = 50%

Based on the binomial probabilities involved we recommend to the organisation that it reduce defect probability by applying batch size reduction to R1 and splitting its changeset into 2 smaller releases R1 and R2. This would decrease defect probability from 50% to 25%.

Fix More With Less - Defect Probability Larger Changeset

n = 2
probability = (1 / 2^2) * 100 = 25%

Unfortunately the organisation ignores our advice to release smaller changesets, and the release of R1 at a later date introduces a defect D1 that remains undiscovered for 6 days. D1 impedes enough value-add that a cost per unit time of £20,000 per day is estimated, meaning a sunk cost of £120,000 has already been incurred and an opportunity cost of £240,000 is forecast. The organisation immediately triages D1 for a fix, but how can we reduce its opportunity cost?

Fix More With Less - Defect Cost Large

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 12 days * £20,000 = £240,000
overall cost = sunk cost + opportunity cost = £360,000

Given the organisation currently has an average batch size of 24 changes per changeset and a 12 day average lead time, Little’s Law computes an average completion rate of 2 changes per day and informs us that a reduced batch size of 12 changes per changeset would produce a 6 day lead time.

completion rate = work in process / lead time
completion rate = 24 changes per changeset / 12 days = 2 changes per day

lead time = work in process / completion rate
lead time = 12 changes per changeset / 2 changes per day = 6 days

Based on Little’s Law we again recommend to the organisation a halved batch size of 12 changes per changeset, and this time our advice is accepted. A fix for D1 is included in the next changeset released into production in 6 days, which produces an opportunity cost saving of £120,000.

Fix More With Less - Defect Cost Smaller Opportunity Cost

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 6 days * £20,000 = £120,000
overall cost = sunk cost + opportunity cost = £240,000

As well as decreasing the total cost of D1 by 33%, the new lead time of 6 days increases the rate of feedback for future production defects. When a subsequent release introduces defect D2 at a lower cost per unit time of £10,000 per day the reduced size and complexity of the offending changeset means D2 is discovered in only 3 days.

Fix More With Less - Defect Cost Smaller Sunk Cost

cost per unit time = £10,000
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 6 days * £10,000 = £60,000
overall cost = sunk cost + opportunity cost = £90,000

When we triage D2 we discover its cost per unit time has decreased to £1,000 per day, meaning its sunk cost is a poor indicator of opportunity cost and its Cost of Delay is lower than expected. Based upon the new 6 day lead time we recommend to the organisation that it defer a D2 fix for at least one release in order to implement pending value-add of greater value than the £12,000 opportunity cost of D2.

Fix More With Less - Defect Cost Even Smaller Opportunity Cost

cost per unit time = £10,000 per day before discovery, £1,000 per day after triage
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 12 days * £1,000 = £12,000
overall cost = sunk cost + opportunity cost = £42,000

The assumption within many IT organisations that risk is directly proportional to rate of change is flawed, as it assumes a constant large batch size. Risk is actually proportional to size of change, and a low release cadence of large changesets is not as effective a risk reduction strategy as a high release cadence of small changesets. Continuous Delivery enables the frequent release of small changesets, rapidly delivering value-add while reducing both the probability and cost of defects.

Organisation antipattern: Consumer Release Testing

Consumer Release Testing is high cost, low value risk management theatre

Despite the historical advice of Harold Dodge that “you cannot inspect quality into a product” and the contemporary advice of Don Reinertsen that “testing is probably the single most common critical-path queue”, the Release Testing antipattern remains prevalent in the IT industry, and is by no means limited to standalone applications.

Consider the development of a consumer application that requires data from a provider application in order to fulfil its business capabilities. The consumer team contains developers and testers collaborating upon the Testing Pyramid strategy, which recommends unit/acceptance tests over end-to-end tests on the basis that test execution time is proportional to System Under Test scope. This means the necessary provider interactions are test-driven by the consumer team using the Test Stub pattern, which creates a lightweight provider implementation to supply canned responses back to the consumer.

Consumer Release Testing - Product Team Stubbed Provider

By using a stub the consumer interactions with the provider can be tested in a minimal System Under Test, which ensures that changes made by the consumer team produce fast and deterministic feedback. Success and failure scenarios (e.g. socket failure, socket timeout, provider error code) can be rapidly developed without relying upon a running provider instance, and the consumer team should be capable of rapidly responding to changing requirements in the future.
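
As an illustration, a Test Stub might look like the following Python sketch; the customer lookup scenario and all names are assumptions rather than details from the original post:

import socket

class StubProvider:
    # Lightweight provider implementation supplying canned responses
    def __init__(self, response=None, error=None):
        self._response = response
        self._error = error

    def fetch_customer(self, customer_id):
        if self._error:
            raise self._error   # simulate socket failure, socket timeout, etc.
        return self._response

class Consumer:
    # The consumer code under test, depending only on the provider interface
    def __init__(self, provider):
        self._provider = provider

    def customer_name(self, customer_id):
        try:
            return self._provider.fetch_customer(customer_id)["name"]
        except (socket.timeout, ConnectionError):
            return "unavailable"

# Success and failure scenarios, with no running provider instance required
assert Consumer(StubProvider(response={"name": "Jane"})).customer_name(1) == "Jane"
assert Consumer(StubProvider(error=socket.timeout())).customer_name(1) == "unavailable"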

However, in many IT organisations the consumer team will be hindered by Consumer Release Testing – a phase of post-development end-to-end regression testing of the full consumer and provider stack, performed by a segregated testing team on the critical path.

Consumer Release Testing - Consumer Release Testing

The desire for provider risk mitigation is understandable given that consumer revenues are to an extent dependent upon the provider, but Consumer Release Testing exacerbates the original flaws of Release Testing:

  1. Extensive end-to-end testing – including both consumer and provider in System Under Test scope increases test execution time and maintenance costs
  2. Independent testing phase – dividing authority and responsibility for the consumer results in quality issues and feedback delays
  3. Critical path constraints – working on the critical path means the release testers will always be pressured to reduce test coverage to meet pre-agreed deadlines

If we extend the analysis of the Release Testing antipattern, it is evident that Consumer Release Testing is itself risk management theatre – it is highly unlikely to uncover any substantial defects in consumer/provider interactions without a significant increase in test coverage, which would drive up product lead times and opportunity costs.

A far more effective risk reduction strategy is to accept the conventional wisdom that testing is an activity not a phase, and move the blameless release testers into the consumer product team. This ensures that all team members are equally invested in product quality and empowers testers to focus upon higher-value activities such as exploratory testing, which has been described by Elisabeth Hendrickson as “particularly good at revealing vulnerabilities that no one thought to look for before“. For example, some exploratory testing off the critical path of the consumer against a running provider instance might uncover some additional error scenarios that would then be fed into the automated unit/acceptance tests.

Consumer Release Testing - Product Team Real Provider

A high value, low cost alternative to Consumer Release Testing is for the consumer and provider to actively cooperate in risk reduction, which can result in a substantial reduction in provider risk. The probability of a provider failure can be decreased by independently testing the conflated concerns of end-to-end testing as follows:

  • Connectivity: the consumer can test provider expectations of consumer connections via release time smoke tests and run time monitoring (see the sketch after this list)
  • Compatibility: the provider can test consumer expectations of messaging via build time Consumer Driven Contracts issued by the consumer
  • Conduct: the consumer can test its expectations of provider behaviour via build time API Examples issued by the provider
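
As a sketch of the first point, a release time connectivity check can be as simple as the following Python fragment; the host, port, and function name are illustrative assumptions:

import socket

def provider_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    # Release time smoke test: can the consumer open a connection to the provider?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run immediately after a consumer release, before declaring it healthy
if not provider_reachable("provider.internal", 443):
    raise SystemExit("smoke test failed: provider unreachable")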

The cost of a provider failure can be reduced via incremental release strategies such as consumer-side Feature Toggles and provider-side Blue-Green Deployments. These practices encourage a provider release to be gradually phased into production usage, so that the consumer can switch back to the previous provider version if necessary.

This approach is a viable alternative to Consumer Release Testing, but it is of limited value without provider cooperation. If the provider cannot or will not participate in risk reduction then the consumer must assess risk based upon historical provider lead times. As large batch sizes increase risk an infrequent provider release schedule is indicative of heightened risk, and if the cost of failure is significant then a limited form of Consumer Release Testing may be deemed justifiable. In those circumstances the consumer development team should perform end-to-end tests off the critical path using a lightweight test client, so that the slow feedback loops and non-determinism of Consumer Release Testing are diminished.

Application pattern: Proxy Consumer Driven Contracts

Proxy Consumer Driven Contracts simulate provider-side testing

When Continuous Delivery is applied to an estate of interdependent applications we want to release as frequently as possible to minimise batch size and lead times, and Consumer Driven Contracts is an effective build-time method of verifying consumer-provider interactions. However, Consumer Driven Contracts assumes the provider will readily share responsibility for conversation integrity with the consumer, and that can be difficult to accomplish if the provider has different practices and/or is a third-party organisation. How can Consumer Driven Contracts be adapted to an environment in which the provider cannot or will not engage in joint risk mitigation?

For example, consider an organisation in which the applications Consumer A and Consumer B are dependent upon a customer details capability offered by a third-party Provider application. The Provider Contract is communicated by the third-party organisation as an API document, and it is used by Consumer A and Consumer B to consume name/address and name/address/email respectively. Consumer Contracts are made available to the provider for testing, but as the third-party organisation has more lucrative customers and a slower release cycle it is unwilling to engage in Consumer Driven Contracts.

Consumer Driven Contracts - Consumer Contracts

In this situation a new Provider release could cause a runtime communications failure with Consumer A and/or Consumer B with little or no warning. It is likely that some form of consumer-side integration testing will have to be adopted as a risk reduction strategy, unless the provider role within Consumer Driven Contracts can be simulated.

In Proxy Consumer Driven Contracts a provider proxy is created that assumes responsibility for testing the Consumer Driven Contracts as if it was the provider itself. Consumer A and Consumer B supply their Consumer Contracts to a Provider Proxy, which acts in a provider role.

Consumer Driven Contracts - Proxy Consumer Driven Contracts

When a new Provider release is announced by the third-party organisation the updated API document is stored within the Provider Proxy source tree. This triggers a Provider Proxy commit build that will transform the human-readable Provider Contract into a machine-readable equivalent, and use it to validate the Consumer Driven Contracts previously supplied by Consumer A and Consumer B. This means incompatibilities can be detected mere seconds after new Provider documentation is supplied by the third-party organisation.
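
A minimal sketch of that commit build check in Python, assuming the machine-readable rendering of the API document reduces each contract to a set of field names (the file format and field names are assumptions):

import json

def offered_fields(api_doc_path: str) -> set:
    # Parse the machine-readable rendering of the new Provider API document
    with open(api_doc_path) as f:
        return set(json.load(f)["fields"])

def broken_expectations(provider_fields: set, consumer_fields: set) -> set:
    # Consumer expectations that the new Provider release no longer satisfies
    return consumer_fields - provider_fields

# Simulate a new Provider release that has dropped the email field
provider = {"name", "address"}
assert broken_expectations(provider, {"name", "address"}) == set()               # Consumer A unaffected
assert broken_expectations(provider, {"name", "address", "email"}) == {"email"}  # Consumer B breaks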

Proxy Consumer Driven Contracts incur a greater maintenance cost than Consumer Driven Contracts, but if the provider cannot or will not become involved in risk reduction then a simulation of provider-side unit testing should be preferable to a lengthy period of consumer-side integration testing.

Organisation antipattern: Release Testing

Release Testing is high cost, low value risk management theatre

Described by Elisabeth Hendrickson as originating with the misguided belief that “testers test, programmers code, and the separation of the two disciplines is important“, the traditional segregation of development and testing into separate phases has disastrous consequences for product quality and validates Jez Humble’s adage that “bad behavior arises when you abstract people away from the consequences of their actions“. When a development team has authority for changes and a testing team has responsibility for quality, there will be an inevitable increase in defects and feedback loops that will inflate lead times and increase organisational vulnerability to opportunity costs.

Release Testing - Develop and Test

Agile software development aims to solve this problem by establishing cross-functional product teams, in which testing is explicitly recognised as a continuous activity and there is a shared commitment to product quality. Developers and testers collaborate upon a testing strategy described by Lisa Crispin as the Testing Pyramid, in which Test Driven Development drives the codebase design and Acceptance Test Driven Development documents the product design. The Testing Pyramid values unit and acceptance tests over manual and end-to-end tests due to the execution times and well-publicised limitations of the latter, such as Martin Fowler stating that “end-to-end tests are more prone to non-determinism“.

Release Testing - Product Team

Given Continuous Delivery is predicated upon the optimisation of product integrity, lead times, and organisational structure in order to deliver business value faster, the creation of cross-functional product teams is a textbook example of how to optimise an organisation for Continuous Delivery. However, many organisations are prevented from fully realising the benefits of product teams due to Release Testing – a risk reduction strategy that aims to reduce defect probability via manual and/or automated end-to-end regression testing independent of the product team.

Release Testing - Release Testing

While Release Testing is traditionally seen as a guarantee of product quality, it is in reality a fundamentally flawed strategy of disproportionately costly testing due to the following characteristics:

  1. Extensive end-to-end testing – as end-to-end tests are slow and less deterministic they require long execution times and incur substantial maintenance costs. This ensures end-to-end testing cannot conceivably cover all scenarios and results in an implicit reduction of test coverage
  2. Independent testing phase – a regression testing phase brazenly re-segregates development and testing, creating a product team with authority for changes and a release testing team with responsibility for quality. This results in quality issues, longer feedback delays, and substantial wait times
  3. Critical path constraints – post-development testing must occur on the critical path, leaving release testers under constant pressure to complete their testing to a deadline. This will usually result in an explicit reduction of test coverage in order to meet expectations

As Release Testing is divorced from the development of value-add by the product team, the regression tests tend to either duplicate existing test scenarios or invent new test scenarios shorn of any business context. Furthermore, the implicit and explicit constraints of end-to-end testing on the critical path invariably prevent Release Testing from achieving any meaningful amount of test coverage or significant reduction in defect probability.

This means Release Testing has a considerable transaction cost and limited value, and attempts to reduce the costs or increase the value of Release Testing are a zero-sum game. Reducing transaction costs requires fewer end-to-end tests, which will decrease execution time but also decrease the potential for defect discovery. Increasing value requires more end-to-end tests, which will marginally increase the potential for defect discovery but will also increase execution time. We can therefore conclude that Release Testing is an example of what Jez Humble refers to as Risk Management Theatre – a process providing an artificial sense of value at a disproportionate cost:

Release Testing is high cost, low value Risk Management Theatre

To undo the detrimental impact of Release Testing upon product quality and lead times, we must heed the advice of W. Edwards Deming that “we cannot rely on mass inspection to improve quality“. Rather than try to inspect quality into each product increment, we must instead build quality in by replacing Release Testing with feedback-driven product development activities in which release testers become valuable members of the product team. By moving release testers into the product team everyone is able to collaborate in tight feedback loops, and the existing end-to-end tests can be assessed for removal, replacement, or retention. This will reduce both the wait waste and overprocessing waste in the value stream, empowering the team to focus upon valuable post-development activities such as automated smoke testing of environment configuration and the manual exploratory testing of product features.

Release Testing - Final Product Team

A far more effective risk reduction strategy than Release Testing is batch size reduction, which can attain a notable reduction in defect probability with a minimal transaction cost. Championed by Eric Ries asserting that “small batches reduce risk“, releasing smaller changesets into production more frequently decreases the complexity of each changeset, therefore reducing both the probability and cost of defect occurrence. In addition, batch size reduction reduces overheads and improves product increment flow, which will produce a further improvement in lead times.

Release Testing is not the fault of any developer, or any tester. It is a systemic fault that causes blameless teams of individuals to be bedevilled by a sub-optimal organisational structure that actively harms lead times and product quality in the name of risk management theatre. Ultimately, we need to embrace the inherent lessons of Agile software development and Continuous Delivery – product quality is the responsibility of everyone, and testing is an activity not a phase.

Application pattern: Consumer Driven Contracts

Consumer Driven Contracts enable independent, interdependent releases

When applying Continuous Delivery to an application estate, our ability to rapidly release individual applications is constrained by inter-application dependencies. We want to release applications independently in order to minimise batch size and cycle time, but the traditional risk reduction methods associated with inter-application dependencies compromise lead times and are fundamentally incompatible with Continuous Delivery. How can we enable the independent evolution of interdependent applications with minimal risk?

For example, consider an estate initially comprised of two independently versioned applications – a Provider application that offers name/address/phone details, and a Consumer A application that uses name/address details.

Consumer Driven Contracts - Consumer and Provider Contract

In this scenario, the interdependent relationship between Provider and Consumer A means an independent release of either application will increase the probability of a runtime communications failure between the applications. As Consumer A has little knowledge of the Provider a new Consumer A artifact might unintentionally alter its usage of the Provider API, and conversely as the Provider has zero knowledge of Consumer A a new Provider artifact might break API compatibility for an unchanged version of Consumer A.

The fear of change and uncertainty associated with this inter-application dependency is exacerbated when another consumer with different requirements is added to the application estate, such as a Consumer B application requesting name/address/email details from the Provider.

Consumer Driven Contracts - Consumers and Provider Contract

The introduction of Consumer B adds another inter-application dependency to our estate and increases the complexity of our independent release strategy. We must be able to release a new Provider artifact capable of servicing Consumer B without endangering the integrity of existing API conversations with Consumer A, but this is difficult when the Provider is unaware of Consumer A and Consumer B requirements. To solve this problem a number of risk reduction methods may be proposed, each of which comes at a significant cost:

  • Coupled releases. Releasing new versions of the Provider, Consumer A, and Consumer B simultaneously is a direct impediment to Continuous Delivery, with increased holding costs and transaction costs resulting in artificially inflated batch sizes and lengthy lead times
  • End-to-end testing. An end-to-end test suite covering all possible API interactions would be costly to maintain and a drain upon lead times, hence Jez Humble arguing that it “delays feedback on the effect of the change on the system as a whole” and JB Rainsberger stating that “integrated tests are a scam”
  • Multiple Provider API operations. This shares the cost of change between the Provider and Consumer B without impacting Consumer A, but it increases Provider application complexity and gives Consumer A no incentive to migrate to the same service as Consumer B

Given that all of the above methods will increase the cost of change and cause the evolutionary design of our application estate to stagnate, we need a more adaptive solution that will favour Build Quality In over Risk Management Theatre. We can reduce the consumer-side risk of each inter-application dependency by implementing the Tolerant Reader and API Examples patterns within Consumer A and Consumer B, but this solution is incomplete as it cannot address the provider-side risk – a new Provider artifact could still harm unchanged versions of Consumer A and/or Consumer B running in production.

Consumer Driven Contracts is a pattern originally described by Ian Robinson for Service-Oriented Architectures that has since proven applicable to any scenario in which evolving consumer-provider relationships need to be documented and continually verified. Characterised by Ian as an attempt to create a vocabulary for describing the roles of the participants in a loosely coupled relationship, the Consumer Driven Contracts pattern defines three related contract types:

  1. A Provider Contract is a description of a service offered by a provider
  2. A Consumer Contract is a description of the extent to which a consumer utilises a Provider Contract
  3. A Consumer Driven Contract is a description of how a provider satisfies an aggregate of Consumer Contracts

With Consumer Driven Contracts, the subset of information within the Provider Contract that a consumer is actually dependent upon forms a Consumer Contract that is communicated back to the Provider at build time. It is then the responsibility of the Provider to run a suite of unit tests in its commit build against each of its Consumer Contracts to ensure that no provider change could unexpectedly cause a communications failure at run time.

Consumer Driven Contracts - Contract Exchange
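
As an illustrative sketch, a Consumer Driven Contract can be as simple as an executable assertion in the Provider commit build; the field-set representation below is an assumption, not part of the pattern definition:

# The aggregate of Consumer Contracts, persisted in the Provider
# codebase and verified on every commit
PROVIDER_CONTRACT = {"name", "address", "phone", "email"}

CONSUMER_CONTRACTS = {
    "Consumer A": {"name", "address"},
    "Consumer B": {"name", "address", "email"},
}

def test_consumer_driven_contracts():
    # Fail the Provider build if a change would break any known consumer
    for consumer, expected in CONSUMER_CONTRACTS.items():
        missing = expected - PROVIDER_CONTRACT
        assert not missing, f"{consumer} would break: missing {missing}"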

Consumer Driven Contracts are relatively cheap to implement, yet enormously powerful. As the Consumer Contracts supplied to the provider are executable specifications they provide an automated documentation mechanism that increases the visibility of existing inter-application relationships, and enable a testing strategy that favours the fast feedback of unit tests over the complexity and unreliability of end-to-end tests. Given that Consumer Driven Contracts enable us to visualise inter-application relationships and detect potentially harmful changes within seconds, Ian was not exaggerating when he stated that Consumer Driven Contracts “give us the fine-grained insight and rapid feedback we require to plan changes and assess their impact“.

If we apply the Consumer Driven Contracts vocabulary to our earlier example, it is apparent that our Provider application is offering a name/address/phone/email Provider Contract, with Consumer A encapsulating a private name/address Consumer Contract and Consumer B encapsulating a private name/address/email Consumer Contract.

Consumer Driven Contracts - Consumer Contracts

These Consumer Contracts should be elevated to first class concepts and persisted within the Provider application as a Consumer Driven Contract, so that the Provider is made aware of consumer expectations and can be independently released without fear of harming any inter-application dependencies.

Consumer Driven Contracts - Consumer Driven Contracts

Regardless of original intent, the Consumer Driven Contracts pattern is of immense value to Continuous Delivery and is an essential tool when we wish to independently release interdependent applications.

Application pattern: Vertical Divide and Conquer

Divide and conquer vertically to reduce cost of change

Application architecture is an oft-overlooked aspect of Continuous Delivery, and an application that encapsulates orthogonal business capabilities is a direct impediment to our stated aim of rapidly releasing small changesets to improve our cycle time. How can we better align technical capabilities with business capabilities, and consequently improve our release cadence?

For example, consider a Fruit Basket application that contains unrelated Apples and Bananas business capabilities, both of which rely upon a messaging service to integrate with a third-party endpoint.

Divide Conquer Monolith

This is clearly an ineffective architecture as it encapsulates unrelated business capabilities with different rates of change, violating Bob Martin’s Single Responsibility Principle and Kevlin Henney’s assertion that “an effective architecture is one that generally reduces the significance of design decisions“. A Fruit Basket release that contains new Apples functionality must also include the unchanged Bananas functionality, creating an inflated changeset that increases transaction costs and the probability of failure.

We can reduce the cost of change and enable Continuous Delivery by using Divide and Conquer to split Fruit Basket into independent applications, but it is important to assess the merits of different strategies. Horizontal Divide and Conquer favours a division of technical capabilities, and would result in separate Fruit Basket and Messaging applications.

Horizontal Divide Conquer

While Horizontal Divide and Conquer allows for independent releases of Fruit Basket and Messaging, it ignores the fact that the variation between individual business capabilities will be greater than the variation between business and technical capabilities. Over time there will be far more orthogonal Apples/Bananas requirements than orthogonal Fruit Basket/Messaging requirements, with each Fruit Basket release still a bloated changeset – and when an Apples or Bananas requirement needs a Messaging change, the changeset will grow even larger.

In order to have a significant impact upon cycle time, we must value the decoupling of business capabilities over the deduplication of technical capabilities and adopt a Vertical Divide and Conquer strategy. Vertical Divide and Conquer favours a division of business capabilities, and would result in separate Apples and Bananas applications.

Vertical Divide Conquer

By creating independent Apples and Bananas applications we align our technical capabilities with our business capabilities, and respect the different rates of change in the Apples and Bananas business domains. This will ensure each Apples and Bananas release consists of a minimal changeset, unlocking batch size reduction benefits such as reduced transaction costs, reduced risk, and improved cycle time.

If we identify Messaging duplication as an issue after Apples and Bananas have been decoupled, we can further improve our architecture by extracting Messaging as an independently versioned library. This will further shrink Apples and Bananas changesets, and the introduction of a Messaging Published Interface will make it easier to reason about responsibilities and collaborators.

Optimise Vertical Divide Conquer

The corollary to Vertical Divide and Conquer is Conway’s Law, which tells us that for our vertically aligned business and technical capabilities to be truly successful, we must also re-structure our organisation so that our development teams are vertically aligned, with singular responsibility for specific business capabilities.

Application pattern: API Examples

API Examples enable consumer unit testing of producer APIs

When an application consumes data from a remote service, we wish to verify the correctness of consumer-producer interactions via a testing strategy that encompasses the following characteristics:

  • Fast feedback
  • 100% scenario coverage
  • Representative test data
  • Auto-detect API changes

The simplest method of verifying parser behaviour would be to use Test Driven Development and create a suite of unit tests reliant upon self-generated test data. These tests could provide feedback in milliseconds, and would be able to cover all happy/sad path scenarios. However, consumer ownership of test data increases the probability of errors, as highlighted by Brandon Byars’ warning that “hard-coding assumptions about data available at the time of the test can be a fragile approach“, and it leaves the consumer entirely unaware of API changes when a new producer version is released.

Consumer Producer Unit Testing

To address these concerns, we could write some integration tests to trigger interactions between running instances of the producer and consumer applications to implicitly test parser behaviour. This could encourage the use of representative test data and warn the consumer of producer API changes, but the increase in run time from milliseconds to minutes would result in significant feedback delays and a corresponding reduction in scenario coverage. Given JB Rainsberger’s oft-quoted assertion that “integrated tests are a scam… you write integrated tests because you can’t write perfect unit tests“, it seems prudent to explore how we might equip our unit testing strategy with representative test data and an awareness of API changes.

Consumer Producer Integration Testing

API Examples is an application pattern originally devised by Daniel Worthington-Bodart, in which a new version of a producer application is accompanied by a sibling artifact that solely contains example API requests and example API responses. These example files should be raw textual data recorded from the acceptance tests of the producer application, meaning that all happy/sad path scenarios known to the producer become freely available for unit testing within the consumer commit build without any binary dependencies or feedback delays. This approach satisfies Brandon’s recommendation that “each service publish a cohesive set of golden test data that it guarantees to be stable“, and when combined with a regular update policy ensures new versions of the consumer application will have early warning of API changes.

Consumer Producer API Examples
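
A sketch of how a consumer commit build might exercise the recorded examples, assuming the producer ships its example responses as JSON files in a sibling artifact (the directory layout and field names are assumptions):

import json
from pathlib import Path

def parse_customer(response_body: str) -> dict:
    # The consumer's parser under test (illustrative)
    payload = json.loads(response_body)
    return {"name": payload["name"], "address": payload["address"]}

def test_parser_against_recorded_examples():
    # Every response recorded from the producer's acceptance tests
    # must be handled by the consumer's parser
    for example in Path("api-examples/responses").glob("*.json"):
        parsed = parse_customer(example.read_text())
        assert parsed["name"]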

As API Examples are exercised within the consumer commit build, they can warn a new consumer version of an API change but cannot warn an existing consumer version already in production. The solution to this problem is for the consumer to derive its parser behaviour from the API Examples and publish it as a Consumer Driven Contract – a testable specification embedded within the producer commit build to document how the consumer uses the API and to immediately warn the producer if an API change will harm a consumer.

Consumer Producer Examples and Contracts

Application pattern: Verify Branch By Abstraction

Verify Branch By Abstraction extends Branch By Abstraction to reduce risk

A fundamental requirement of Continuous Delivery is that the codebase must always be in a releasable state, with each successful pipeline commit resulting in an application artifact ready for production use. This means major changes to application architecture are a significant challenge – how can an architecture evolve without impacting the flow of customer value or increasing the risk of production failure?

In the Continuous Delivery book, Dave Farley and Jez Humble strongly advocate Trunk-Based Development and recommend the Branch By Abstraction pattern by Paul Hammant for major architectural changes. Branch By Abstraction refers to the creation of an abstraction layer between the desired changes and the remainder of the application, enabling evolutionary design of the application architecture while preserving the cadence of value delivery.

For example, consider an application consuming a legacy component that we wish to replace with a newer alternative. The scope of this change spans multiple releases, and has the potential to impede the development of other business features.

Verify Branch by Abstraction 1

To apply Branch By Abstraction we model an abstraction layer in front of the old and new component entry points, and delegate to the old component at build-time while the new component is under development. When development is complete, we release an application artifact that delegates to the new component and delete the old component.

Verify Branch by Abstraction 2

While this form of Branch By Abstraction successfully decouples design lead time from release lead time, there is still an increased risk of production failure when we switch to the new component. Dave and Jez have shown that “shipping semi-complete functionality along with the rest of your application is a good practice because it means you’re always integrating and testing“, but our new component could still cause a production failure. This can be alleviated by introducing a run-time configuration toggle behind the abstraction layer, so that for an agreed period of time we can dynamically revert to the old component in the event of a failure.

Verify Branch by Abstraction 3

Enhancing Branch By Abstraction with a run-time configuration toggle reduces the cost of failure when switching to the new component, but there remains a heightened probability of failure – although our abstraction layer compels the same interface, it cannot guarantee the same behaviour. This can be ensured by applying Verify Branch By Abstraction, in which our run-time configuration toggle delegates to a verifying implementation that calls both components with the same inputs and fails fast if they do not produce the same output. This reduces the feedback loop for old and new component incompatibilities, and increases confidence in our evolving architecture.

Verify Branch by Abstraction 4
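
A minimal sketch of the verifying implementation behind the abstraction layer, in Python; the component and method names are illustrative assumptions:

class VerifyingComponent:
    # Calls the old and new components with the same inputs and
    # fails fast if their outputs diverge
    def __init__(self, old, new, use_new: bool):
        self._old = old
        self._new = new
        self._use_new = use_new   # run-time configuration toggle

    def calculate(self, *args):
        old_result = self._old.calculate(*args)
        new_result = self._new.calculate(*args)
        if old_result != new_result:
            # Shrink the feedback loop for old/new incompatibilities
            raise AssertionError(f"divergence: old={old_result} new={new_result}")
        return new_result if self._use_new else old_result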

The value of Branch By Abstraction has been summarised by Jez as “your code is working at all times throughout the re-structuring, enabling continuous delivery“, and Verify Branch By Abstraction is an extension that provides risk mitigation. As with its parent pattern, it requires a substantial investment from the development team to accept the increased maintenance costs associated with managing multiple technologies for a period of time, but the preservation of release cadence and reduced risk means that Verify Branch By Abstraction is a valuable Trunk Based Development strategy.

Organisation antipattern: Project Teams

Projects kill teams and flow

Given the No Projects definition of a project as “a fixed amount of time and money assigned to deliver a large batch of value add“, it is not surprising that for many organisations a new project heralds the creation of a Project Team:

A project team is a temporary organisational unit responsible for the implementation and delivery of a project

When a new project is assigned a higher priority than business as usual and the Iron Triangle is in full effect, there can be intense pressure to deliver on time and on budget. As a result a Project Team appears to be an attractive option, as costs and progress can be monitored in isolation, and additional personnel can be diverted to the project when necessary. Unfortunately, in addition to managing the increased risk, variability, and overheads associated with a large batch of value-add, a Project Team is fatally compromised by its coupling to the project lifecycle.

The process of forming a team of complementary personnel that establish a shared culture and become highly productive is denied to Project Teams from start to finish. At the start of project implementation, the presence of a budget and a deadline means a Project Team is formed via:

  1. Cannibalisation – impairs productivity as entering team members incur a context switching overhead
  2. Recruitment – devalues cultural fit and required skills as hiring practices are compromised

Furthermore, at the end of project delivery the absence of a budget or a deadline means a Project Team is disbanded via:

  1. Cannibalisation – impairs productivity as exiting team members incur a context switching overhead
  2. Termination – devalues cultural fit and acquired skills as people are undervalued

This maximisation of resource efficiency clearly has a detrimental effect upon flow efficiency. Cannibalising a team member objectifies them as a fungible resource, and devalues their mastery of a particular domain. Project-driven recruitment of a team member ignores Johanna Rothman’s advice that “when you settle for second best, you often get third or fourth best” and “if a candidate’s cultural preferences do not match your organisation, that person will not fit“. Terminating a team member denigrates their accumulated domain knowledge and skills, and can significantly impact staff morale. Overall this strategy is predicated upon the notion that there will be no further business change – an extremely dangerous assumption, given Allan Kelly’s warning that “the same people are unlikely to work together again“.

The inherent flaws in the Project Team model can be validated by an examination of any professional sports team that has enjoyed a period of sustained success. For example, when Sir Alex Ferguson was interviewed about his management style at Manchester United he described his initial desire to create a “continuity of supply to the first team… the players all grow up together, producing a bond“. This approach fostered a winning culture that valued long-term goals over short-term gains, and led to 20 years of unrivalled dominance. It is unlikely that Manchester United would have experienced the same amount of success had their focus been upon a particular season at the expense of others.

Therefore, the alternative to building a Project Team is to grow a Product Team:

A product team is a permanent organisational unit responsible for the continuous improvement of a product

Following Johanna’s advice to “keep teams of people together and flow the projects through cross-functional teams“, Product Teams are decoupled from project lifecycles and are empowered to pull in work as required. This enables a team to form a shared culture that reduces variability and improves stability, which as observed by Tobias Mayer “leads to enhanced focus and high performance“. Over a period of time a Product Team will master the relevant business and technical domains, which will fuel product innovation and produce a return on investment that rewards us for making the correct strategic decision of favouring products over projects.
