No Release Testing

This series of articles explains why Release Testing – end-to-end regression testing on the critical path – is a wasteful practice that impedes Continuous Delivery and is unlikely to uncover business critical defects.

  1. Organisation Antipattern: Release Testing – introduces the Release Testing antipattern and why it cannot discover defects
  2. Organisation Antipattern: Consumer Release Testing – introduces the consumer-side variant of the Release Testing antipattern
  3. More Releases With Less Risk – describes how releasing smaller changesets more frequently can reduce probability and cost of failure
  4. Release Testing Is Risk Management Theatre – explains why Release Testing is so ineffective, and offers batch size reduction as an alternative

Application antipattern: Hardcoded Stub

A Hardcoded Stub constrains test determinism and execution times

When testing interactions between interdependent applications we always want to minimise the scope of the System Under Test to ensure deterministic and rapid feedback. This is often accomplished by creating a Stub of the provider application – a lightweight implementation of the provider that supplies canned API responses on demand.

For example, consider an ecommerce website with a microservice architecture. The estate includes a customer-facing Books frontend that relies upon a backend Authentication service for user access controls.

Hardcoded Stub - No Stub

As the Authentication service makes remote calls to a third party, an Authentication Stub is supplied to Books for its automated acceptance testing and manual exploratory testing.

Hardcoded Stub - Stub

A common Stub implementation is a Hardcoded Stub, in which provider behaviour is defined at build time and controlled at run time by magic inputs. For the Authentication Stub that would mean a static pool of pre-authenticated users [1], accessed by magic username via the standard Authentication API [2].

Hardcoded Stub - Hardcoded Stub Single Consumer

While the Authentication Stub has the advantage of not requiring any test setup, the implicit Books dependence upon pre-defined Authentication behaviours will impair Books test determinism and execution times:

  • Changes in the Authentication Stub can cause one to many Books tests to fail unexpectedly, increasing rework
  • Adding/removing/updating Authentication behaviours requires a new Authentication Stub release, increasing feedback loops
  • Concurrent test scenarios are constrained by the size of the Authentication Stub user pool, increasing test execution times

An inability to perform concurrent testing will have a significant impact upon lead times – parallel acceptance tests reduce build times, and parallel exploratory tests speed up tester feedback. This problem is exacerbated when multiple consumers rely on the same Hardcoded Stub, such as a Music frontend tested against the same Authentication Stub as the Books frontend. The same pool of pre-authenticated users [1] is offered to both consumers [2 and 3].

Hardcoded Stub - Hardcoded Stub Multiple Consumers

In this situation the simultaneous testing of Books and Music is bottlenecked by the pre-defined capacity of the Authentication Stub, despite their real-world independence. Test data management becomes a key issue, as testers will have to manually coordinate their use of the pre-authenticated users. A Books test could easily impact a Music test or vice versa – for example, a Books tester could accidentally lock out a user about to be used by Music. Such problems can easily lead to wait times within the value stream and inflated lead times.

The root cause of these problems is the overly-contextual nature of a Hardcoded Stub. Rather than predicting test scenarios upfront and providing tightly controlled pathways through provider behaviours, a better approach is to use a Configurable Test Stub – a Configurable Test Double primed by different automated tests and/or exploratory testers to compose provider behaviours. This would mean an Authentication Stub with a private, test-only API able to create users in a desired authentication state and return their generated credentials [1a and 2a] before the standard Authentication API is used [1b and 2b].

Hardcoded Stub - Configurable Stub Multiple Consumers

By pushing responsibility for Authentication behaviours onto Books and Music, test data management is decentralised and tests become atomic. The Authentication Stub will have a much lower rate of change, Consumer Driven Contracts can be used to safeguard conversation integrity, and both Books and Music can parallelise their test suites to substantially reduce execution times.
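The priming flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration rather than a real stub implementation: the class name, the `create_user` and `authenticate` methods, and the state names are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ConfigurableAuthenticationStub:
    """In-memory stand-in for the Authentication provider (hypothetical API)."""

    users: dict = field(default_factory=dict)

    # Private, test-only API: each test primes the stub with the user it needs.
    def create_user(self, state: str = "authenticated") -> dict:
        """Create a user in the requested state and return generated credentials."""
        username = f"user-{uuid.uuid4().hex[:8]}"
        password = uuid.uuid4().hex
        self.users[username] = {"password": password, "state": state}
        return {"username": username, "password": password}

    # Standard API: what the consumer under test actually calls.
    def authenticate(self, username: str, password: str) -> str:
        user = self.users.get(username)
        if user is None or user["password"] != password:
            return "rejected"
        return user["state"]


stub = ConfigurableAuthenticationStub()
books_user = stub.create_user(state="authenticated")  # step 1a
music_user = stub.create_user(state="locked")         # step 2a
books_result = stub.authenticate(**books_user)        # step 1b
music_result = stub.authenticate(**music_user)        # step 2b
```

Because every test creates its own user with generated credentials, there is no shared pool for Books and Music to contend over, and a lockout in one test cannot leak into another.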

A Hardcoded Stub may be an acceptable starting point for testing consumer/provider interactions, but it is unwieldy with a large test suite and unscalable with multiple consumers. A Configurable Test Stub will prevent nondeterministic test results from creeping into consumers and ensure fast feedback.

Organisation antipattern: Passive Disaster Recovery

Passive Disaster Recovery is Risk Management Theatre

When an IT organisation is vulnerable to a negative Black Swan – an extremely low probability, extremely high cost event causing ruinous financial loss – a traditional countermeasure to minimise downtime and opportunity costs is Passive Disaster Recovery. This is where a secondary production environment is established in a separate geographic location to the primary production environment, with every product increment released into Production and Disaster Recovery retained in a cold standby state.

For example, consider an organisation hosting version v1040 of a customer-facing service in its Production environment. In the event of a catastrophic failure, customers should be immediately routed to the Disaster Recovery environment and receive the same quality of service.

Organisation Antipattern - Disaster Recovery Environment - Vision

Regardless of physical/virtual hosting and manual/automated infrastructure provisioning, Passive Disaster Recovery is predicated upon the fundamentally flawed assumption that active and passive environments will be identical at any given point in time. Over time the unused Disaster Recovery environment will suffer from hardware, infrastructure, configuration, and software drift until it consists of Snowflake Servers that will likely require significant manual intervention if and when Disaster Recovery is activated. With negative Black Swan opportunity costs incurred at a rapid pace the entire future of the organisation might be placed in jeopardy.

Organisation Antipattern - Disaster Recovery Environment - Failover Drift

Passive Disaster Recovery remains common due to an industry-wide underestimation of negative Black Swan events. It is easier for an individual or an organisation to appreciate the extremely low probability of a disastrous business event rather than the extremely high opportunity cost, and as a result a Disaster Recovery environment tends to be procured when a business project begins and left to decay into Risk Management Theatre when the capex funding ends.

Continuous Delivery advocates a radically different approach to Disaster Recovery as it is explicitly focussed upon reducing the time, risk, and opportunity cost of delivering high quality services. One of its principles is Bring The Pain Forward – increasing the cadence of high cost, low frequency events to drive down transaction costs – and applying it to Disaster Recovery means moving from passive to active standby via Blue Green Releases and rotating production responsibility between two near-identical environments.

Organisation Antipattern - Disaster Recovery Environment - Blue Green Releases

In the above diagram, the Blue production environment is currently hosting v1040 and the Green environment is being upgraded with v1041. Once v1041 passes its automated smoke tests and manual exploratory tests it is signed off and customers are seamlessly rerouted from Blue to Green. A short period of time afterwards Blue is upgraded in the background and awaits the next production release.

Organisation Antipattern - Disaster Recovery Environment - Green Blue Releases

As well as enabling zero downtime releases and a cheap rollback mechanism, Blue Green Releases provides an effective Disaster Recovery strategy as the standby production environment is always active and in a known good state. If the Green environment suffers a complete outage customers can be switched to the Blue environment with complete confidence, and vice versa.

Organisation Antipattern - Disaster Recovery Environment - Blue Green Failover
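The release and failover rotation can be modelled as a small state machine. The sketch below is illustrative only: `Environment`, `BlueGreenRouter`, and the `smoke_test` callback are hypothetical names, and real traffic switching would happen in a load balancer or DNS layer rather than in application code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Environment:
    name: str
    version: str
    healthy: bool = True


class BlueGreenRouter:
    """Routes customers to one environment while the other stands by (hypothetical)."""

    def __init__(self, live: Environment, standby: Environment) -> None:
        self.live = live
        self.standby = standby

    def release(self, version: str, smoke_test: Callable[[Environment], bool]) -> bool:
        """Upgrade the standby, verify it, then reroute customers to it."""
        previous = self.standby.version
        self.standby.version = version
        if not smoke_test(self.standby):
            self.standby.version = previous  # customers never saw the bad build
            return False
        self.live, self.standby = self.standby, self.live
        # The article upgrades the old live environment a short time later;
        # here it happens immediately to keep the sketch small.
        self.standby.version = version
        return True

    def failover(self) -> Environment:
        """Reroute customers to the standby after an outage in the live environment."""
        if not self.standby.healthy:
            raise RuntimeError("no healthy standby environment available")
        self.live, self.standby = self.standby, self.live
        return self.live


router = BlueGreenRouter(
    live=Environment("Blue", "v1040"),
    standby=Environment("Green", "v1040"),
)
switched = router.release("v1041", smoke_test=lambda env: env.healthy)
router.live.healthy = False    # Green suffers a complete outage
recovered = router.failover()  # customers return to Blue
```

The point of the sketch is that `failover` reuses the same swap that `release` performs on every production release, so the recovery path is exercised routinely rather than only in an emergency.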

By practicing Blue Green Releases an organisation is effectively rehearsing its Disaster Recovery strategy on every production release, and this can lead to advanced practices such as Chaos Engineering, Fault Injection, and Game Days. It requires a continuous investment in hardware and infrastructure, but it will reduce exposure to negative Black Swans and may even offer a strategic advantage over competitors.

Pipeline antipattern: Artifact Promotion

Promoting artifacts between repositories is a poor man’s metadata

Note: this antipattern used to be known as Mutable Binary Location

A Continuous Delivery pipeline is an automated representation of the value stream of an organisation, and rules are often codified in a pipeline to reflect the real-world journey of a product increment. This means artifact status as well as artifact content must be tracked as an artifact progresses towards production.

One way of implementing this requirement is to establish multiple artifact repositories, and promote artifacts through those repositories as they successfully pass different pipeline stages. As an artifact enters a new repository it becomes accessible to later stages of the pipeline and inaccessible to earlier stages.

For example, consider an organisation with a single QA environment and multiple repositories used to house in-progress artifacts. When an artifact is committed and undergoes automated testing it resides within the development repository.

Pipeline Antipattern Artifact Promotion - Development

When that artifact passes automated testing it is signed off for QA, which will trigger a move of that artifact from the development repository to the QA repository. It now becomes available for release into the QA environment.

Pipeline Antipattern Artifact Promotion - QA

When that artifact is pulled into the QA environment and successfully passes exploratory testing it is signed off for production by a tester. The artifact will be moved from the QA repository to the production repository, enabling a production release at a later date.

Pipeline Antipattern Artifact Promotion - Production

A variant of this strategy is for multiple artifact repositories to be managed by a single repository manager, such as Artifactory or Nexus.

Pipeline Antipattern Artifact Promotion - Repository Manager

This strategy fulfils the basic need of restricting which artifacts can be pulled into pre-production and production environments, but its reliance upon repository tooling to represent artifact status introduces a number of problems:

  • Reduced feedback – an unknown artifact can only be reported as not found, yet it could be an invalid version, an artifact in an earlier stage, or a failed artifact
  • Orchestrator complexity – the pipeline runner has to manage multiple repositories, knowing which repository to use for which environment
  • Inflexible architecture – if an environment is added to or removed from the value stream the toolchain will have to change
  • Lack of metrics – pipeline activity data is limited to vendor-specific repository data, making it difficult to track wait times and cycle times

A more flexible approach better aligned with Continuous Delivery is to establish artifact status as a first-class concept in the pipeline and introduce per-binary metadata support.

Pipeline Antipattern Artifact Promotion - Metadata

When a single repository is used, all artifacts reside in the same location alongside their versioned metadata, which provides a definitive record of artifact activity throughout the pipeline. This means unknown artifacts can easily be identified, the complexity of the pipeline orchestrator can be reduced, and any value stream design can be supported over time with no changes to the repository itself.

Furthermore, as the collection of artifact metadata stored in the repository indicates which artifact passed/failed which environment at any given point in time, it becomes trivial to build pipeline dashboards that can display pending releases, application cycle times, and where delays are occurring in the value stream. This is a crucial enabler of organisational change for Continuous Delivery, as it indicates where bottlenecks are occurring in the value stream – likely between people working in separate teams in separate silos.
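A single repository with per-artifact metadata might look something like the following sketch. The class and method names, the stage names, and the version numbers are all hypothetical; a real pipeline would persist this history in its repository manager or a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class StageResult:
    stage: str
    passed: bool
    recorded_at: datetime


@dataclass
class ArtifactRepository:
    """Single repository holding every artifact's stage history (hypothetical)."""

    history: Dict[str, List[StageResult]] = field(default_factory=dict)

    def publish(self, version: str) -> None:
        """Register a new artifact version with an empty history."""
        self.history.setdefault(version, [])

    def record(self, version: str, stage: str, passed: bool,
               at: Optional[datetime] = None) -> None:
        """Append a pass/fail result for one pipeline stage."""
        when = at or datetime.now(timezone.utc)
        self.history[version].append(StageResult(stage, passed, when))

    def status(self, version: str) -> str:
        """Explain where an artifact is, rather than just 'not found'."""
        if version not in self.history:
            return "unknown version"
        results = self.history[version]
        if not results:
            return "published, no stage results yet"
        latest = results[-1]
        return f"{'passed' if latest.passed else 'failed'} {latest.stage}"

    def passed_stage(self, version: str, stage: str) -> bool:
        """True if the most recent result for the stage was a pass."""
        results = [r for r in self.history.get(version, []) if r.stage == stage]
        return bool(results) and results[-1].passed


repo = ArtifactRepository()
repo.publish("1.0.7")
repo.record("1.0.7", "commit", True)
repo.record("1.0.7", "qa", False)
repo.publish("1.0.8")
repo.record("1.0.8", "commit", True)
```

Here `repo.status` can distinguish an invalid version from a failed artifact or one still in an earlier stage, and the timestamps on each `StageResult` are the raw material for wait time and cycle time reporting.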

Organisation antipattern: Dual Value Streams

Dual Value Streams conceal transaction and opportunity costs

The goal of Continuous Delivery is to optimise cycle time in order to increase product revenues, and cycle time is measured as the average lead time of the value stream from code checkin to production release. This was memorably summarised by Mary and Tom Poppendieck as the Poppendieck Question:

“How long would it take your organization to deploy a change that involves just one single line of code? Do you do this on a repeatable, reliable basis?”

The Poppendieck Question is an excellent lead-in to the Continuous Delivery value proposition, but the problem with using it to assess the cycle time of an organisation yet to adopt Continuous Delivery is there will often be two very different answers – one for features, and one for fixes. For example, consider an organisation with a quarterly release cycle. The initial answer to the Poppendieck Question would be “90 days” or similar.

Dual Value Streams - Feature Value Stream

However, when the transaction cost of releasing software is disproportionately high a truncated value stream will often emerge for production defect fixes, in which value stream activities are deliberately omitted to slash cycle time. This results in Dual Value Streams – a Feature Value Stream with a cycle time of months, and a Fix Value Stream with a cycle time of days. If our example organisation can release a defect fix in a few days, the correct answer to the Poppendieck Question becomes “90 days or 3 days”.

Dual Value Streams - Fix Value Stream

Fix Value Streams exist because production defect fixes have a clear financial value that is easily communicated and outweighs the high transaction cost of Feature Value Streams. An organisation will be imbued with a sense of urgency, as a sunk cost has demonstrably been incurred and by releasing a fix faster an opportunity cost can be reduced. People in siloed teams will collaborate upon a fix, and by using a minimal changeset it becomes possible to reason about which value stream activities can be discarded e.g. omitting capacity testing for a UI fix.

Dual Value Streams is an organisational antipattern because it is a local optimisation with little overall benefit to the organisation. There has been an investment in a release mechanism with a smaller batch size and a lower transaction cost, but as it is reserved for defect fixes it cannot add new customer value to the product. The long-term alternative is for organisations to adopt Continuous Delivery and invest in a single value stream with a minimal overall transaction cost. If our example organisation folded its siloed teams into cross-functional teams and moved activities off the critical path a fortnightly release cycle would become a distinct possibility.

Dual Value Streams - Value Stream

Dual Value Streams is an indicator of organisational potential for Continuous Delivery. When people are aware of the opportunity costs associated with releasing software as well as the transaction costs they are more inclined to work together in a cross-functional manner. When changesets contain a small number of changes it becomes easier to collectively reason about which value stream activities are useful and which should be moved off the critical path or retired.

Furthermore, a Fix Value Stream implicitly validates the use of smaller batch sizes as a risk reduction strategy. Defect fixes are released in small changes to minimise both opportunity costs and the probability of any further errors. Given that strategy works for fixes, why not release features more frequently and measure an organisation against a value-centric Poppendieck Question?

“How long would it take your organization to release a single value-adding line of code? Do you do this on a repeatable, reliable basis?”

Pipeline pattern: Analysis Stage

Separate out analysis to preserve commit stage processing time

The entry point of a Continuous Delivery pipeline is its Commit Stage, and as such it manages the compilation, unit testing, analysis, and packaging of source code whenever a change is committed to version control. As the commit stage is responsible for identifying defective code it represents a vital feedback loop for developers, and for that reason Dave Farley and Jez Humble recommend a commit stage that is “ideally less than five minutes and no more than ten” – if the build process is too slow or non-deterministic, the pace of development can soon grind to a halt.

Both compilation and unit testing tasks can be optimised for performance, particularly when the commit stage is hosted on a multi-processor Continuous Integration server. Modern compilers require only a few seconds for compilation, and a unit test suite that follows the Michael Feathers strategy of no database/filesystem/network/user interface access should run in parallel in seconds. However, it is more difficult to optimise analysis tasks as they tend to involve third-party tooling reliant upon byte code manipulation.

When a significant percentage of commit stage time is consumed by static analysis tooling, it may become necessary to trade off unit test feedback against static analysis feedback and move the static analysis tooling into a separate Analysis Stage. The analysis stage is triggered by a successful run of the commit stage, and analyses the uploaded artifact(s) and source code in parallel with the acceptance testing stage. If a failure is detected the relevant pipeline metadata is updated and Stop The Line applies. That artifact cannot be used elsewhere in the pipeline and further development efforts should cease until the issue is resolved.

For example, consider an organisation that has implemented a standard Continuous Delivery pipeline. The commit stage has an average processing time of 5 minutes, of which 1 minute is spent upon static analysis.

Over time the codebase grows to the extent that commit stage time increases to 6 minutes, of which 1 minute 30 seconds is spent upon static analysis. With static analysis time growing from 20% to 25% the decision is made to create a separate Analysis stage, which reduces commit time to 4 minutes 30 seconds and improves the developer feedback loop.

Static analysis is the definitive example of an automated task that periodically needs human intervention. Regardless of tool choice there will always be a percentage of false positives and false negatives, and therefore a pipeline that implements an Analysis Stage must also offer a capability for an authenticated human user to override prior results for one or more application versions.

Organisation pattern: Trunk Based Development Branching

Trunk Based Development supports Optimistic and Pessimistic Release Branching

Trunk Based Development is a style of software development in which all developers commit their changes to a single shared trunk in source control, and every commit yields a production-ready build. It is a prerequisite for Continuous Delivery as it ensures that all code is continuously integrated into a single workstream, that developers always work against the latest code, and that merge/integration pain is minimised. Trunk Based Development is compatible with a Release Branching strategy of short-lived release branches that are used for post-development defect fixes. That strategy might be optimistic and defer branch creation until a defect occurs, or be pessimistic and always create a release branch.

For example, consider an application developed using Trunk Based Development. The most recent commits to trunk were source revisions a and b which yielded application versions 610 and 611 respectively, and version 610 is intended to be the next production release.

Trunk Based Development Branching - Optimistic Release Branching

With Optimistic Release Branching, the release of version 610 is immediate as there is no upfront branching. If a defect is subsequently found then a decision must be made where to commit the fix, as trunk has progressed since 610 from a to b. If the risk of pulling forward from a to b is acceptable then the simple solution is to commit the fix to trunk as c, and consequently release version 612.

Trunk Based Development Branching - Optimistic Release Branching Low Risk Defect

However, if the risk of pulling forward from a to b is unacceptable then a 610.x release branch is created from a, with the fix committed to the branch as c and released as version 610.1. That fix is then merged back into trunk as d to produce the next release candidate 612, and the 610.x branch is earmarked for termination.

Trunk Based Development Branching - Optimistic Release Branching High Risk Defect

With Pessimistic Release Branching, the release of version 610 is accompanied by the upfront creation of a 610.x release branch in anticipation of defect(s). If a defect is found in version 610 then as with Optimistic Branching a decision must be made as to where the defect fix should be committed. If the risk of pulling forward from a to b is deemed insignificant then trunk can be pulled forward from a to b and the fix committed to trunk as c for release as version 612. The 610.x branch is therefore terminated without ever being used.

Trunk Based Development Branching - Pessimistic Release Branching Low Risk Defect

If on the other hand the risk is deemed significant then the fix is committed to the 610.x branch as c and released as version 610.1. The fix is then merged back into trunk as d to produce version 612, which will also receive its own branch upon release.

Trunk Based Development Branching - Pessimistic Release Branching High Risk Defect

The choice between Optimistic Branching and Pessimistic Branching for Trunk Based Development is dependent upon product quality and lead times. If product quality is poor and lead times are long, then the upfront cost of Pessimistic Branching may be justifiable. Alternatively, if post-development defects are rare and production releases are frequent then Optimistic Branching may be preferable.
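The four scenarios walked through above reduce to a small decision table, sketched here as a Python function. The function name, argument names, and returned keys are invented for illustration, and the version numbers are those of the article's running example.

```python
def plan_fix(strategy: str, pull_forward_acceptable: bool) -> dict:
    """Decide where a fix for version 610 is committed and what gets released.

    strategy is "optimistic" (branch only when needed) or "pessimistic"
    (always create the 610.x branch when 610 is released).
    """
    if strategy not in ("optimistic", "pessimistic"):
        raise ValueError(f"unknown strategy: {strategy}")

    if pull_forward_acceptable:
        # Low risk: fix goes to trunk as c and version 612 is released.
        return {
            "commit_fix_to": "trunk",
            "release": "612",
            "branch_created": strategy == "pessimistic",
            "branch_used": False,
        }

    # High risk: fix goes to the 610.x branch as c, is released as 610.1,
    # and is then merged back into trunk as d to produce 612.
    return {
        "commit_fix_to": "610.x",
        "release": "610.1",
        "branch_created": True,
        "branch_used": True,
    }


optimistic_low = plan_fix("optimistic", pull_forward_acceptable=True)
pessimistic_low = plan_fix("pessimistic", pull_forward_acceptable=True)
optimistic_high = plan_fix("optimistic", pull_forward_acceptable=False)
```

The only difference between the two strategies is visible in the low risk case, where Pessimistic Branching has paid for a branch that is never used.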

Announcing the Build Quality In book

Continuous Delivery and DevOps experience reports from the wild

I am delighted to announce Matthew Skelton and I are co-authors of a new book Build Quality In – a collection of Continuous Delivery and DevOps experience reports from the wild.

The interdependent disciplines of Continuous Delivery and DevOps can transform the fortunes of an organisation, but they are hard. Matthew and I want to help people on their Continuous Delivery and/or DevOps journey, by sharing the experiences of those who have done it – what worked, what didn’t, and the highs and lows of trying to build quality into an organisation.

Build Quality In will be an anthology of first-person narratives from Continuous Delivery and DevOps practitioners. We have an incredible group of contributors who have agreed to give up their own time to tell their stories, with Dave Farley writing the Continuous Delivery foreword. Further announcements on contributors and a DevOps foreword are pending.

Matthew and I are donating 70% of author royalties for Build Quality In to Code Club – a not-for-profit organisation that runs a UK-wide network of free volunteer-led after-school coding clubs for children aged 9-11. We passionately believe that diversity within the IT industry must improve, and we want to contribute in some small way.

Right now we are in our launch phase. If you are interested in learning more about Build Quality In, please visit our Leanpub landing page. If you are interested in purchasing our book, please register on the Leanpub landing page and indicate how much you would be willing to pay. Over the next few weeks the first contributors will submit their draft articles, and we will announce publication via Build Quality In on Twitter and a pending book website.

Register your interest

More releases with less risk

Continuous Delivery reduces defect probability and cost

Continuous Delivery often challenges conventional wisdom within the IT industry, and by advocating the rapid release of value-add to reduce risk it contradicts the traditional belief that a low release cadence is an effective risk reduction strategy. How can releasing software more frequently reduce both defect probability and defect cost?

The probability of a defect is the likelihood of a change within a changeset unexpectedly impeding value-add and imposing an opportunity cost. Given the defect probability of a changeset is proportional to its size we can calculate the defect probability of a change as follows:

Fix More With Less - Defect Probability

n = number of changesets
probability = (1 / 2^n) * 100 [percentage]

The above formula indicates that decreasing changeset size by increasing the number of changesets will reduce defect probability, and this is confirmed by Don Reinertsen’s assertion that “many smaller experiments produce less variation than one big one”. For example, if a change is released in 1 changeset there is a 1 in 2 chance or 50% probability of failure. If it was instead released in 3 changesets there would be a 1 in 8 chance or 12.5% probability of failure.
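As a quick check of the formula, reading n as an exponent (which is what the 50% and 12.5% figures imply), the calculation can be written as a short Python function. The function name is an assumption made for this sketch.

```python
def defect_probability(changesets: int) -> float:
    """Percentage defect probability when a change is split into n changesets."""
    if changesets < 1:
        raise ValueError("at least one changeset is required")
    return (1 / 2 ** changesets) * 100


probabilities = {n: defect_probability(n) for n in (1, 2, 3)}
# probabilities == {1: 50.0, 2: 25.0, 3: 12.5}
```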

The cost of a defect is the product of cost per unit time and duration, where cost per unit time represents economic impact and duration represents lifetime.

cost = cost per unit time [currency] * duration [unit time]

A defect has an inception date at its outset, a discovery date when diagnosed, and a resolution date when fixed. The interactions between these dates and cost per unit time enable a division of defect cost into sunk cost and opportunity cost. The sunk cost of a defect represents the economic damage already incurred at the point of discovery, while opportunity cost represents the economic damage still to be incurred.

Fix More With Less - Defect Cost

sunk cost duration = discovery date – inception date [unit time]
sunk cost = cost per unit time * sunk cost duration [currency]

opportunity cost duration = resolution date – discovery date [unit time]
opportunity cost = cost per unit time * opportunity cost duration [currency]

cost = sunk cost + opportunity cost [currency]
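The same breakdown can be expressed as a small Python function. The function name and the calendar dates are hypothetical; the durations and the £20,000 per day rate are taken from the D1 example later in this article.

```python
from datetime import date


def defect_cost(inception: date, discovery: date, resolution: date,
                cost_per_day: int) -> dict:
    """Split the cost of a defect into sunk cost and opportunity cost."""
    sunk_days = (discovery - inception).days
    opportunity_days = (resolution - discovery).days
    sunk_cost = cost_per_day * sunk_days
    opportunity_cost = cost_per_day * opportunity_days
    return {
        "sunk_cost": sunk_cost,
        "opportunity_cost": opportunity_cost,
        "cost": sunk_cost + opportunity_cost,
    }


# Undiscovered for 6 days, then a further 12 days until a fix is released.
d1 = defect_cost(
    inception=date(2024, 1, 1),
    discovery=date(2024, 1, 7),
    resolution=date(2024, 1, 19),
    cost_per_day=20_000,
)
# d1 == {"sunk_cost": 120000, "opportunity_cost": 240000, "cost": 360000}
```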

As cost per unit time is controlled by market conditions it is far easier to reduce opportunity cost duration by shortening lead times. This can be accomplished via batch size reduction, as Mary and Tom Poppendieck have observed that “time through the system is directly proportional to the amount of work-in-process” due to Little’s Law:

lead time = work in progress [units] / completion rate [units per time period]

Little’s Law is universal for all stable systems in which these variables are consistent long-term averages, and it is mathematical proof that reducing batch size will reduce lead time. For example, if a jug contains 4 litres of water and pours 2 litres per second then it will empty in 2 seconds. If instead the jug contained 2 litres of water and still poured 2 litres per second it would empty in 1 second.
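The jug example translates directly into code. This is a minimal sketch with a hypothetical function name, and it assumes the long-term averages that Little's Law requires.

```python
def lead_time(work_in_progress: float, completion_rate: float) -> float:
    """Little's Law: lead time = work in progress / completion rate."""
    if completion_rate <= 0:
        raise ValueError("completion rate must be positive")
    return work_in_progress / completion_rate


full_jug = lead_time(work_in_progress=4, completion_rate=2)  # 4 litres at 2 litres/second
half_jug = lead_time(work_in_progress=2, completion_rate=2)  # 2 litres at 2 litres/second
# full_jug == 2.0 seconds, half_jug == 1.0 second
```

Halving the work in progress while the completion rate stays constant halves the lead time, which is the argument for batch size reduction.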

Releasing smaller changesets more frequently into production can also reduce sunk cost duration, as small batches accelerate feedback. A smaller batch size will decrease the lead time and complexity associated with each changeset, creating faster feedback loops that will reduce the time required to discover a defect.

Consider an organisation with an average changeset size of 24 changes and an average lead time of 12 days. How can we reduce the defect probability of the next production release R1?

Fix More With Less - Defect Probability Smaller Changeset

n = 1
probability = (1 / 2^1) * 100 = 50%

Based on the binomial probabilities involved we recommend to the organisation that it reduce defect probability by applying batch size reduction to R1 and splitting its changeset into 2 smaller releases R1 and R2. This would decrease defect probability from 50% to 25%.

Fix More With Less - Defect Probability Larger Changeset

n = 2
probability = (1 / 2^2) * 100 = 25%

Unfortunately the organisation ignores our advice to release smaller changesets, and the release of R1 at a later date introduces a defect D1 that remains undiscovered for 6 days. D1 impedes a sufficient amount of value-add that a cost per unit time of £20,000 per day is estimated, which means a sunk cost of £120,000 has already been incurred and an opportunity cost of £240,000 is forecast. The organisation immediately triages D1 for a fix, but how can we reduce its opportunity cost?

Fix More With Less - Defect Cost Large

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 12 days * £20,000 = £240,000
overall cost = sunk cost + opportunity cost = £360,000

Given the organisation currently has an average batch size of 24 changes per changeset and a 12 day average lead time, Little’s Law computes an average completion rate of 2 changes per day and informs us that a reduced batch size of 12 changes per changeset would produce a 6 day lead time.

completion rate = work in process / lead time
completion rate = 24 changes per changeset / 12 days = 2 changes per day

lead time = work in process / completion rate
lead time = 12 changes per changeset / 2 changes per day = 6 days

Based on Little’s Law we again recommend to the organisation a halved batch size of 12 changes per changeset, and this time our advice is accepted. A fix for D1 is included in the next changeset released into production in 6 days, which produces an opportunity cost saving of £120,000.

Fix More With Less - Defect Cost Smaller Opportunity Cost

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 6 days * £20,000 = £120,000
overall cost = sunk cost + opportunity cost = £240,000

As well as decreasing the total cost of D1 by 33%, the new lead time of 6 days increases the rate of feedback for future production defects. When a subsequent release introduces defect D2 at a lower cost per unit time of £10,000 per day the reduced size and complexity of the offending changeset means D2 is discovered in only 3 days.

Fix More With Less - Defect Cost Smaller Sunk Cost

cost per unit time = £10,000
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 6 days * £10,000 = £60,000
overall cost = sunk cost + opportunity cost = £90,000

When we triage D2 we discover its cost per unit time has decreased to £1,000 per day, meaning its sunk cost is a poor indicator of opportunity cost and its Cost of Delay is lower than expected. Based upon the new 6 day lead time we recommend to the organisation that it defer a D2 fix for at least one release in order to implement pending value-add of greater value than the £12,000 opportunity cost of D2.

Fix More With Less - Defect Cost Even Smaller Opportunity Cost

cost per unit time = £10,000 per day for 3 days, then £1,000 per day for 12 days
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 12 days * £1,000 = £12,000
overall cost = sunk cost + opportunity cost = £42,000

The assumption within many IT organisations that risk is directly proportional to rate of change is flawed, as it assumes a constant large batch size. Risk is actually proportional to size of change, and a low release cadence of large changesets is not as effective a risk reduction strategy as a high release cadence of small changesets. By enabling the release of smaller changesets, Continuous Delivery delivers value-add rapidly and reduces both the probability and cost of defects.

Organisation antipattern: Consumer Release Testing

Consumer Release Testing is high cost, low value risk management theatre

Despite the historical advice of Harold Dodge that “you cannot inspect quality into a product” and the contemporary advice of Don Reinertsen that “testing is probably the single most common critical-path queue”, the Release Testing antipattern remains prevalent in the IT industry, and is by no means limited to standalone applications.

Consider the development of a consumer application that requires data from a provider application in order to fulfill its business capabilities. The consumer team contains developers and testers collaborating upon the Testing Pyramid strategy, which recommends unit/acceptance tests over end-to-end tests on the basis that test execution time is proportional to System Under Test scope. This means the necessary provider interactions are test-driven by the consumer team using the Test Stub pattern, which creates a lightweight provider implementation to supply canned responses back to the consumer.

Consumer Release Testing - Product Team Stubbed Provider

By using a stub the consumer interactions with the provider can be tested in a minimal System Under Test, which ensures that changes made by the consumer team produce fast and deterministic feedback. Success and failure scenarios (e.g. socket failure, socket timeout, provider error code) can be rapidly developed without relying upon a running provider instance, and the consumer team should be capable of rapidly responding to changing requirements in the future.
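
A minimal sketch of this idea in Python follows. Everything here is hypothetical: `ProviderStub`, `Response` and `fetch_data` are invented names, the stub is in-process rather than a separate lightweight service, and the consumer's policy of returning `None` on any provider failure is only one possible design.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: dict


class ProviderStub:
    """Hypothetical stand-in for the provider that returns a canned outcome."""

    def __init__(self, canned):
        # Either a Response to return or an Exception instance to raise.
        self._canned = canned

    def get(self, path: str) -> Response:
        if isinstance(self._canned, Exception):
            raise self._canned
        return self._canned


def fetch_data(provider, path: str):
    """Consumer call under test: the body on success, None on provider failure."""
    try:
        response = provider.get(path)
    except (TimeoutError, ConnectionError):
        return None  # socket timeout or socket failure
    if response.status != 200:
        return None  # provider error code
    return response.body


ok = fetch_data(ProviderStub(Response(200, {"id": 1})), "/items/1")
timed_out = fetch_data(ProviderStub(TimeoutError()), "/items/1")
refused = fetch_data(ProviderStub(ConnectionError()), "/items/1")
errored = fetch_data(ProviderStub(Response(503, {})), "/items/1")
```

Each scenario runs without a provider instance, so the success and failure paths give the same result on every run.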

However, in many IT organisations the consumer team will be hindered by Consumer Release Testing – a phase of post-development end-to-end regression testing of the full consumer and provider stack, performed by a segregated testing team on the critical path.

Consumer Release Testing - Consumer Release Testing

The desire for provider risk mitigation is understandable given that consumer revenues are to an extent dependent upon the provider, but Consumer Release Testing exacerbates the original flaws of Release Testing:

  1. Extensive end-to-end testing – including both consumer and provider in System Under Test scope increases test execution time and maintenance costs
  2. Independent testing phase – dividing authority and responsibility for the consumer results in quality issues and feedback delays
  3. Critical path constraints – working on the critical path means the release testers will always be pressured to reduce test coverage to meet pre-agreed deadlines

By extending the Release Testing strategy it is evident that Consumer Release Testing is itself risk management theatre – it is highly unlikely to uncover any substantial defects in consumer/provider interactions without a significant increase in test coverage, which will drive up product lead times and opportunity costs.

A far more effective risk reduction strategy is to accept the conventional wisdom that testing is an activity not a phase, and move the blameless release testers into the consumer product team. This ensures that all team members are equally invested in product quality and empowers testers to focus upon higher-value activities such as exploratory testing, which has been described by Elisabeth Hendrickson as “particularly good at revealing vulnerabilities that no one thought to look for before”. For example, some exploratory testing off the critical path of the consumer against a running provider instance might uncover some additional error scenarios that would then be fed into the automated unit/acceptance tests.

Consumer Release Testing - Product Team Real Provider

A high value, low cost alternative to Consumer Release Testing is for the consumer and provider to actively cooperate in risk reduction, which can result in a substantial reduction in provider risk. The probability of a provider failure can be decreased by independently testing the conflated concerns of end-to-end testing as follows:

  • Connectivity: the consumer can test provider expectations of consumer connections via release time smoke tests and run time monitoring
  • Compatibility: the provider can test consumer expectations of messaging via build time Consumer Driven Contracts issued by the consumer
  • Conduct: the consumer can test its expectations of provider behaviour via build time API Examples issued by the provider
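
To make the Compatibility check more concrete, here is a rough Python sketch of a Consumer Driven Contract verified in the provider build. The contract format (field name mapped to expected type), the field names and the `contract_violations` helper are all assumptions made for illustration; dedicated contract-testing tools use richer formats.

```python
# Hypothetical contract issued by the consumer: the fields it reads and their types.
CONSUMER_CONTRACT = {"id": int, "title": str}


def contract_violations(contract, response):
    """Return a list of ways the provider response breaks the consumer contract."""
    violations = []
    for field, expected_type in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            violations.append(f"wrong type for field: {field}")
    return violations


# The provider may add fields freely; only fields the consumer relies on are checked.
compatible = contract_violations(
    CONSUMER_CONTRACT, {"id": 7, "title": "Example", "extra": True}
)
broken = contract_violations(CONSUMER_CONTRACT, {"id": "7"})

print(compatible)  # []
print(broken)      # ['wrong type for field: id', 'missing field: title']
```

Because the check runs at provider build time, an incompatible change fails the provider build before it reaches the consumer.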

The cost of a provider failure can be reduced via incremental release strategies such as consumer-side Feature Toggles and provider-side Blue-Green Deployments. These practices encourage a provider release to be gradually phased into production usage, so that the consumer can switch back to the previous provider version if necessary.
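
One way a consumer-side Feature Toggle might phase in a new provider version is a percentage rollout, sketched below in Python. The function name, the user identifier and the use of a CRC32 hash for bucketing are assumptions for illustration, not a prescribed implementation.

```python
import zlib


def use_new_provider(user_id: str, rollout_percent: int) -> bool:
    """Decide whether this user is routed to the new provider version.

    The user is hashed into a stable bucket from 0 to 99, so the same user
    always gets the same answer for a given rollout percentage.
    """
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return bucket < rollout_percent


# 0% routes everyone to the previous version; 100% routes everyone to the new one.
print(use_new_provider("user-42", 0))    # False
print(use_new_provider("user-42", 100))  # True
```

Setting the percentage back to 0 switches every consumer request back to the previous provider version, which is what limits the cost of a provider failure.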

This approach is a viable alternative to Consumer Release Testing, but it is of limited value without provider cooperation. If the provider cannot or will not participate in risk reduction then the consumer must assess risk based upon historical provider lead times. As large batch sizes increase risk, an infrequent provider release schedule is indicative of heightened risk, and if the cost of failure is significant then a limited form of Consumer Release Testing may be deemed justifiable. In those circumstances the consumer development team should perform end-to-end tests off the critical path using a lightweight test client, so that the slow feedback loops and non-determinism of Consumer Release Testing are diminished.

