On Tech

Author: Steve Smith

The Strangler Pipeline – Challenges

The Strangler Pipeline introduced a Repeatable Reliable Process for start/stop, deployment, and database migration

Previous entries in the Strangler Pipeline series:

  1. The Strangler Pipeline – Introduction

To start our Continuous Delivery journey at Sky Network Services, we created a cross-team working group and identified the following challenges:

  • Slow platform build times. Developers used brittle, slow Maven/Ruby scripts to construct platforms of applications
  • Different start/stop methods. Developers used a Ruby script to start/stop individual applications, server administrators used a Perl script to start/stop platforms of applications
  • Different deployment methods. Developers used a Ruby script to deploy applications, server administrators used a Perl script to deploy platforms of applications driven by a Subversion tag
  • Different database migration methods. Developers used Maven to migrate applications, database administrators used a set of Perl scripts to migrate platforms of applications driven by the same Subversion tag

As automated release management is not our core business function, we initially examined a number of commercial and open-source off-the-shelf products such as ThoughtWorks Go, LinkedIn Glu, Ant Hill Pro, and Jenkins. However, despite identifying Go as an attractive option, we reluctantly decided to build a custom pipeline. As our application estate already consisted of ~30 applications, we were concerned that the migration cost of introducing a new release management product would be disproportionately high. Furthermore, a well-established Continuous Integration solution of Artifactory Pro and a 24-agent TeamCity build farm was in situ, and to recommend discarding such a large financial investment with no identifiable upfront value would have been professional irresponsibility bordering upon consultancy. We listened to Bodart’s Law and reconciled ourselves to building a low-cost, highly scalable pipeline capable of supporting our applications in order of business and operational value.

With trust between Development and Operations at a low ebb, our first priority was to improve platform build times. With Maven used to build and release the entire application estate, the use of non-unique snapshots in conjunction with the Maven Release plugin meant that a platform build could take up to 60 minutes, recompile the application binaries, and frequently fail due to transitive dependencies. To overcome this problem we decreed that using the Maven Release plugin violated Build Your Binaries Only Once, and we placed Maven in a bounded CI context of clean-verify. Standalone application binaries were built at fixed versions using the Axel Fontaine solution, and a custom Ant script was written to transform Maven snapshots into releasable artifacts. As a result of these changes, platform build times shrank from 60 minutes to 10 minutes, improving release cadence and restoring trust between Development and Operations.

In the meantime, some of our senior Operations staff had been drawing up a new process for starting/stopping applications. While the existing release procedure of deploy -> stop -> migrate -> set current version -> start was compatible with the Decouple Deployment From Release principle, the start/stop scripts used by Operations were coupled to Apache Tomcat wrapper scripts due to prior use. The Operations team were aware that new applications were being developed for Jetty and Java Web Server, and collectively it was acknowledged that the existing model left Operations in the undesirable state of Responsibility Without Authority. To resolve this Operations proposed that all future application binaries should be ZIP archives containing zero-parameter start and stop shell scripts, and this became the first version of our Binary Interface. This strategy empowered Development teams to choose whichever technology was most appropriate to solve business problems, and decoupled Operations teams from knowledge of different start/stop implementations.

Although the Binary Interface proved over time to be successful, the understandable desire to decommission the Perl deployment scripts meant that early versions of the Binary Interface also called for deployment, database migration, and symlinking scripts to be provided in each ZIP archive. It was successfully argued that this conflated the need for binary-specific start/stop policies with application-neutral deploy/migrate policies, and as a result the latter responsibilities were earmarked for our pipeline.

Implementing a cross-team plan of action for database migration has proven far more challenging. The considerable amount of customer-sensitive data in our Production databases encouraged risk aversion, and there was a sizeable technology gap: different Development teams used different Maven plugins, and database administrators used a set of unfathomable Perl scripts run from a Subversion tag. That risk aversion and gulf in knowledge meant that a cross-team migration strategy was slow to emerge, and its implementation remains in progress. However, we did achieve a Quick Win and resolved the insidious Subversion coupling when a source code move in Subversion caused an unnecessary database migration failure. A pipeline stage was introduced to deliver application SQL from Artifactory to the Perl script source directories on the database servers. While this solution did not provide full database migration, it resolved an immediate problem for all teams and better positioned us for full database migration at a later date.

With the benefit of hindsight, it is clear that the above tooling discrepancies, disparate release processes, and communications issues were rooted in Development and Operations historically working in separate silos, as forewarned by Conway’s Law. These problems were solved by Development and Operations teams coming together to create and implement cross-team policies, and this formed a template for future co-operation on the Strangler Pipeline.

Pipeline Pattern: Stage Strangler

The Strangler Pattern reduces the pipeline entry cost for multiple applications

When adding an application into a Continuous Delivery pipeline, we must assess its compatibility with the Repeatable Reliable Process already used by the pipeline to release application artifacts. If the new application produces artifacts that are deemed incompatible, then we can use an Artifact Interface to hide the implementation details. However, if the new application has an existing release mechanism that is radically different, then we must balance our desire for a uniform Repeatable Reliable Process with business expectations.

Assuming that the rationale for pipelining the new application is to de-risk its release process and improve its time-to-market, spending a significant amount of time re-engineering the pipeline and/or application would conflict with Bodart’s Law and harm our value proposition. In this situation we should be pragmatic and adopt a separate, application-specific Repeatable Reliable Process and manage the multiple release mechanisms within the pipeline via a Stage Interface and the Strangler Pattern.

The Strangler Pattern is a legacy code pattern named after Strangler Fig plants, which grow in rainforests where there is intense competition for sunlight. Strangler plants germinate in the rainforest canopy, growing down and around a host tree an inch at a time until the roots are reached and the host tree dies. The Strangler Pattern uses this as an analogy to describe how to replace legacy systems, with a Strangler application created to wrap around the legacy application and gradually replace it one feature at a time until decommissioning. The incremental progress of the Strangler Pattern facilitates a higher release cadence and de-risks system cutover, as well as allowing new features to be developed alongside the transfer of existing features.

To use the Strangler Pattern in Continuous Delivery, we first define a Stage Interface as follows:

Stage#run(Application, Version, Environment)

For each pipeline stage we can then create a default implementation to act as the Repeatable Reliable Process for as many applications as possible, and consider each incoming application on its merits. If the existing release mechanism of a new application is unwanted, then we can use our default stage implementation. If the legacy release mechanism retains some value or is too costly to replace at this point in time, then we can use our Stage Interface to conceal a fresh implementation that wraps around the legacy release mechanism until a strangulation time of our choosing.
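As an illustration only, a minimal Java sketch of this arrangement might look as follows; the class and method names are hypothetical, and strings stand in for the real Application, Version, and Environment types:

  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical sketch of a Stage Interface with Strangler Pattern dispatch.
  interface Stage {
      void run(String application, String version, String environment);
  }

  // The default implementation is the standard Repeatable Reliable Process.
  class DefaultDeployStage implements Stage {
      public void run(String application, String version, String environment) {
          // fetch the versioned artifact from the binary repository and deploy it
      }
  }

  // A wrapper around a legacy release mechanism, retained until it is strangled.
  class LegacyApplesDeployStage implements Stage {
      public void run(String application, String version, String environment) {
          // delegate to the existing Apples release scripts
      }
  }

  // The pipeline selects an implementation per application, defaulting to the
  // Repeatable Reliable Process. Strangling a legacy stage is simply a matter
  // of removing its entry from the map.
  class DeployStage {
      private final Map<String, Stage> legacyStages = new HashMap<>();
      private final Stage defaultStage = new DefaultDeployStage();

      DeployStage() {
          legacyStages.put("apples", new LegacyApplesDeployStage());
          // an Oranges-specific wrapper would be registered in the same way
      }

      void run(String application, String version, String environment) {
          legacyStages.getOrDefault(application, defaultStage)
                      .run(application, version, environment);
      }
  }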

In the below example, our pipeline supports three applications – Apples, Oranges, and Pears. Apples and Oranges delegate to their own specific implementations, whereas Pears uses our standard Repeatable Reliable Process. A deploy of Apples will delegate to the Apples-specific pipeline stage implementation, which wraps the Apples legacy release mechanism.

In a similar fashion, deploying Oranges to an environment will delegate to the Oranges-specific pipeline stage implementation and its legacy release mechanism.

Whereas deploying Pears to an environment uses the standard Repeatable Reliable Process.

If and when we consider it valuable, we can update the pipeline and/or Apples application to support the standard Repeatable Reliable Process and subsequently strangle the Apples-specific pipeline stage implementation. Both Oranges and Pears are unaffected by this change.

Finally, we can strangle the Oranges-specific pipeline stage implementation at a time of our choosing and attain a single Repeatable Reliable Process for all applications.

It is worth noting that it does not matter if the legacy pipeline stage implementations are never strangled, as a significant amount of return on investment has still been delivered. Our applications are managed by our Continuous Delivery pipeline with a minimum of integration effort and a minimum of impact upon both applications and pipeline.

Continuous Delivery and organisational change

Continuous Delivery unaccompanied by organisational change will not reduce cycle time

Our Continuous Delivery value proposition describes a goal of reducing cycle time – the average time for a software release to propagate through to Production – in order to improve our time-to-market, saving time and money that can be invested back into product development and growing revenues. However, it is important to bear in mind that like any cross-organisation transformational programme Continuous Delivery is susceptible to Conway’s Law:

Any organisation that designs a system (defined broadly) will produce a design whose structure is a copy of the organisation’s communication structure

This extraordinary sociological observation predicts that multiple teams working on the same problem will produce disparate solutions, and that the structure of an organisation must be adaptable if product development is to remain sustainable. As a Continuous Delivery pipeline will likely traverse multiple organisational units (particularly in silo-based organisations), these are pertinent warnings that were addressed by Dave Farley and Jez Humble in the principles of Continuous Delivery:

  1. Repeatable Reliable Process
  2. Automate Almost Everything
  3. Keep Everything In Version Control
  4. Bring The Pain Forward
  5. Build Quality In
  6. Done Means Released
  7. Everybody Is Responsible
  8. Continuous Improvement

The majority of these principles are clearly focussed upon culture and behaviours, yet some Continuous Delivery implementations are entirely based upon Repeatable Reliable Process and Automate Almost Everything at the expense of more challenging principles such as Everybody Is Responsible.

For example, in our siloed organisation we are asked to improve the cycle time of an application from 28 days to 14 days, with the existing deployment and migration mechanisms being manual processes that each take 20 minutes to perform. We introduce a Continuous Delivery pipeline in which we Automate Almost Everything, we Keep Everything In Version Control, and we establish our Repeatable Reliable Process. However, despite deployment and migration now taking only 5 minutes each, our cycle time is unaffected! How is this possible?

To explain this disheartening situation, we need to use Lean Thinking and examine the value stream of our application. While our new release mechanism has reduced the machine time of each pipeline stage (i.e. time spent releasing an artifact), the process lead time (i.e. time required to release and sign off an artifact) is largely unaffected. This is because process lead time includes wait time, and in a siloed organisation there are likely to be significant handoff periods both during and between pipeline stages, which are “fraught with opportunities for waste”. If the deployment and migration mechanisms have each been reduced to 5 minutes but a 3 hour handoff from server administrator to database administrator remains, our Repeatable Reliable Process will never affect our cycle time.
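To make the arithmetic concrete with the figures above, process lead time is wait time plus machine time:

  before automation:  20 min deploy + 3 hour handoff + 20 min migration  ≈  3 hours 40 minutes
  after automation:    5 min deploy + 3 hour handoff +  5 min migration  ≈  3 hours 10 minutes

A saving of 30 minutes per release is imperceptible against a 28 day cycle time, which is why automation alone cannot halve it.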

To accomplish organisational change alongside Continuous Delivery, the most effective method of breaking down silo barriers is to visualise your value stream and act upon waste. Donella Meadows recommended that to effect organisational change you must “arrange the structures and conditions to reduce the probability of destructive behaviours and to encourage the possibility of beneficial ones”, and a pipeline containing a Repeatable Reliable Process is an excellent starting point – but it is not the end. Visualise your pipeline, educate people on the unseen inefficiencies caused by your organisational structure, and encourage an Everybody Is Responsible mentality.

Updating a Pipeline

Pipeline updates must minimise risk to protect the Repeatable Reliable Process

We want to quickly deliver new features to users, and in Continuous Delivery Dave Farley and Jez Humble showed that “to achieve these goals – low cycle time and high quality – we need to make frequent, automated releases”. The pipeline constructed to deliver those releases should be no different, and should itself be frequently and automatically released into Production. However, this conflicts with the Continuous Delivery principle of Repeatable Reliable Process – a single application release mechanism for all environments, used thousands of times to minimise errors and build confidence – leading us to ask:

Is the Repeatable Reliable Process principle endangered if a new pipeline version is released?

To answer this question, we can use a risk impact/probability graph to assess if an update will significantly increase the risk of a pipeline operation becoming less repeatable and/or reliable.

[Figure: Pipeline Risk – risk impact/probability graph]

This leads to the following assessment:

  1. An update is unlikely to increase the impact of an operation failing to be repeatable and/or reliable, as the cost of failure is permanently high due to pipeline responsibilities
  2. An update is unlikely to increase the probability of an operation failing to be repeatable, unless the Published Interface at the pipeline entry point is modified. In that situation, the button push becomes more likely to fail, but not more costly
  3. An update is likely to increase the probability of an operation failing to be reliable. This is where stakeholders understandably become more risk averse, searching for a suitable release window and/or pinning a particular pipeline version to a specific artifact version throughout its value stream. These measures may reduce risk for a specific artifact, but do not reduce the probability of failure in the general case

Based on the above, we can now answer our original question as follows:

A pipeline update may endanger the Repeatable Reliable Process principle, and is more likely to impact reliability than repeatability

We can minimise the increased risk of a pipeline update by using the following techniques:

  • Change inspection. If change sets can be shown to be benign with zero impact upon specific artifacts and/or environments, then a new pipeline version is less likely to increase risk aversion
  • Artifact backwards compatibility. If the pipeline uses an Artifact Interface and knows nothing of artifact composition, then a new pipeline version is less likely to break application compatibility
  • Configuration static analysis. If each defect has its root cause captured in a static analysis test (see the sketch after this list), then a new pipeline version is less likely to cause a failure
  • Increased release cadence. If the frequency of pipeline releases is increased, then a new pipeline version is more likely to possess shallow defects, smaller feedback loops, and cheaper rollback
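As an illustration of configuration static analysis, such a test can be as small as a unit test that fails the pipeline build whenever a previously seen root cause reappears. A minimal sketch, assuming JUnit 4 and environment configuration held in properties files (the file name and key below are hypothetical):

  import static org.junit.Assert.assertTrue;

  import java.io.FileInputStream;
  import java.io.IOException;
  import java.util.Properties;
  import org.junit.Test;

  // Hypothetical configuration static analysis test: it guards against the
  // recurrence of a specific defect whose root cause was a missing setting.
  public class EnvironmentConfigurationTest {

      @Test
      public void productionConfigurationDeclaresADatabaseUrl() throws IOException {
          Properties config = new Properties();
          try (FileInputStream in = new FileInputStream("config/production.properties")) {
              config.load(in);
          }
          // Root cause of a previous deployment failure: a missing database URL.
          assertTrue("production.properties must define database.url",
                  config.containsKey("database.url"));
      }
  }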

Finally, it is important to note that a frequently-changing pipeline version may be a symptom of over-centralisation. A pipeline should not possess responsibility without authority and should devolve environment configuration, application configuration, etc. to separate, independently versioned entities.

Pipeline Pattern: Artifact Container

A pipeline should be decoupled from artifact content

Note – this pattern was previously known as Binary Interface

In a Continuous Delivery pipeline, a simple Commit stage implementation may equate an application artifact with the compiled artifact(s) e.g. a JAR or a WAR:

[Figure: Binaries in Single Application Pipeline]

This approach may suffice for a single application pipeline, but the coupling between start/stop behaviour and artifact file type means that details of java -jar, $CATALINA_HOME/bin/startup.sh, etc. seep into the pipeline start/stop stages and Operations documentation for manually starting/stopping artifacts. This becomes more of an issue when a pipeline manages multiple applications comprised of different web server technologies, different build tools, and/or different programming languages:

Each new artifact type introduced into the pipeline requires a notable increase in complexity, as conditional behaviour must be incorporated into different pipeline stages and Operations must retain knowledge of multiple start/stop methods. This threatens the Continuous Delivery principle of Repeatable Reliable Process and is a significant barrier to pipeline scalability.

The solution is to introduce an Artifact Container as the output of the Commit Stage, so that artifacts appear identical to the pipeline:

The advantage of this strategy is that it minimises the amount of application-specific knowledge that can leak into the pipeline, empowering development teams to use whatever tools they deem necessary regardless of release management. A change in web server, build tool, or programming language should not necessitate a new pipeline version.
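As an illustration, a pipeline start stage written against the Artifact Container contract needs no knowledge of what it is starting. The sketch below assumes the archive has already been extracted and that, per the Binary Interface, its root contains zero-parameter start.sh and stop.sh scripts; the class and method names are hypothetical rather than our actual implementation:

  import java.io.File;
  import java.io.IOException;

  // Hypothetical start stage: it only relies on the Artifact Container contract,
  // so whether the archive holds a Tomcat WAR, a Jetty distribution, or a
  // standalone JAR is invisible to the pipeline.
  class StartStage {

      void run(File extractedArtifactDirectory) throws IOException, InterruptedException {
          Process process = new ProcessBuilder(
                  new File(extractedArtifactDirectory, "start.sh").getAbsolutePath())
                  .directory(extractedArtifactDirectory)
                  .inheritIO()
                  .start();
          int exitCode = process.waitFor();
          if (exitCode != 0) {
              throw new IOException("start.sh failed with exit code " + exitCode);
          }
      }
  }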

Pipeline Antipattern: Deployment Build

Continuous Integration “Has A” Continuous Delivery is the wrong way around

Eric Minick has written a thought-provoking assessment of Continuous Delivery and Continuous Integration tooling, which includes a variant of The Golden Hammer:

“When all you have is a Continuous Integration system, everything looks like a build”

This leads to an antipattern Eric and I refer to as Deployment Build, in which application deployments are tacked onto a Continuous Integration system by treating them as pseudo-builds. While this approach may be cheap to set up, it creates a number of problems:

  • Ambiguous language – mis-communication is more likely when a deployment button is mis-labelled as a build
  • Noisy user interface – endless buttons such as “Deploy Apples To QA”, “Deploy Apples To Production”, and “Deploy Oranges To QA” hinder feedback
  • Lax security – all downstream servers, including Production, must be accessible from the Continuous Integration system
  • Increased risk – a system failure will impede Operations as well as Development

Eric describes how Deployment Build drove UrbanCode to create uDeploy independently of AntHill Pro, while ThoughtWorks Go has Continuous Delivery at its heart. Jenkins now has a Continuous Delivery plugin, although to say Continuous Integration "has a" Continuous Delivery capability is incorrect. The correct relationship is the inverse.

Pipeline Pattern: Aggregate Artifact

Aggregate Artifacts can incrementally deliver complex applications

When pipelining inter-dependent applications, the strength of the pipeline architecture directly correlates to the assembly cost and scalability of the packaging solution. If the Uber-Artifact approach is tacitly accepted as a poor implementation choice, is there an alternative?

The inherent value of any packaging solution is the version manifest mapping of package name/version to constituent artifacts, and there is no reason why that manifest cannot be managed as an artifact itself. In terms of Domain-Driven Design a version manifest is a naturally occurring Aggregate, with the package name/version equating to an Aggregate Root and the constituent artifacts represented as Entities, suggesting a name of Aggregate Artifact.

In an Aggregation Pipeline, the multiple pipelines of an Integration Pipeline are collapsed into a single pipeline with multiple commit stages. A successful commit of a constituent artifact triggers the commit of an Aggregate Artifact containing the new constituent version to the binary repository. At a later date the release stage fetches the aggregate artifact and examines the pipeline metadata for each constituent. Each constituent already known to the target environment is ignored, while the previously unknown constituents are released.

There are a number of advantages to this approach:

  • Consistent release mechanism. Whether an artifact is released independently or as part of an aggregate, the same process can be used
  • No duplication of artifact persistence. Committing an aggregate artifact to the binary repository does not necessitate the re-persistence of its constituents
  • High version visibility. An aggregate artifact is human and machine readable and can be published in multiple formats e.g. email, PDF/HTML release notes
  • Lightweight incremental release process. As an aggregate artifact is a manifest, a version diff against earlier releases is easy to implement

As Aggregate Artifact persistence can be as low-tech as a properties file, the cost of the aggregate commit stage is extremely low. This means that a single Aggregate Artifact can scale to support many constituents (of which some may be Aggregate Artifacts themselves), and that failure scenarios can be easily handled.

For example, if a release of Fruit Basket 1.0 fails with the successful constituent Apples 23 and the unsuccessful constituent Oranges 49, then Stop The Line applies to Fruit Basket 1.0 and Oranges 49. Once a fix has been committed for Oranges 49, a new Fruit Basket 1.1 aggregate containing Oranges 50 and the previously successful Apples 23 can be quickly created and incrementally released to the environment.
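A minimal sketch of how such an aggregate and its incremental release might work, using the Fruit Basket example and a simple in-memory name/version map in place of a real properties file and binary repository (all names are illustrative):

  import java.util.LinkedHashMap;
  import java.util.Map;

  // Hypothetical Aggregate Artifact: a name -> version manifest whose release
  // skips constituents already recorded against the target environment.
  class AggregateArtifact {

      private final Map<String, String> constituents = new LinkedHashMap<>();

      void add(String application, String version) {
          constituents.put(application, version);
      }

      void releaseTo(Map<String, String> environmentVersions) {
          constituents.forEach((application, version) -> {
              if (version.equals(environmentVersions.get(application))) {
                  System.out.println("Skipping " + application + " " + version);
              } else {
                  System.out.println("Releasing " + application + " " + version);
                  environmentVersions.put(application, version);
              }
          });
      }

      public static void main(String[] args) {
          AggregateArtifact fruitBasket = new AggregateArtifact(); // Fruit Basket 1.1
          fruitBasket.add("apples", "23");
          fruitBasket.add("oranges", "50");

          Map<String, String> environment = new LinkedHashMap<>();
          environment.put("apples", "23"); // already released by Fruit Basket 1.0

          fruitBasket.releaseTo(environment); // only Oranges 50 is released
      }
  }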

Pipeline Antipattern: Uber-Artifact

Pipelining inter-dependent applications as uber-artifacts is unscalable

Achieving the Continuous Delivery of an application is a notable feat in any organisation, but how do we build on such success and pipeline more complex, inter-dependent applications? In Continuous Delivery, Dave Farley and Jez Humble suggest an Integration Pipeline architecture as follows:

[Figure: Integration Pipeline]

In an Integration Pipeline, the successful commit of a set of related application artifacts triggers their packaging as a single releasable artifact. That artifact then undergoes the normal test-release cycle, with an increased focus upon fast feedback and visibility of binary status.

Although Eric Minick’s assertion that this approach is “broken for complex architectures” seems overly harsh, it is true that its success is predicated upon the quality of the tooling, specifically the packaging implementation.

For example, a common approach is the Uber-Artifact (also known as Build Of Builds or Mega Build), where an archive is created containing the application artifacts and a version manifest. This suffers from a number of problems:

  • Inconsistent release mechanism. The artifact deploy process (copy) differs from the uber-artifact deploy process (copy and unzip)
  • Duplicated artifact persistence. Committing an uber-artifact to the artifact repository re-commits the constituent artifacts within the archive
  • Lack of version visibility. The version manifest must be extracted from the uber-artifact to determine constituent versions
  • Non-incremental release mechanism. An uber-artifact cannot easily diff constituent versions and must be fully extracted to the target environment

Of the above, the most serious problem is the barrier to incremental releases, as it directly impairs pipeline scalability. As the application estate grows over time in size and/or complexity, an inability to identify and skip the re-release of unchanged application artifacts can only increase cycle time.

Returning to the intent of the Integration Pipeline architecture, we simply require a package that expresses the relationship between the related application artifacts. In an uber-artifact, the value resides in the version manifest – so why not make that the artifact?

The Strangler Pipeline – Introduction

Continuously Delivering greenfield and legacy applications en masse

I recently gave a talk at Agile Horizons 2012 on behalf of my amazing employer Sky Network Services, detailing our yearlong Continuous Delivery journey and the evolution of our Strangler Pipeline. As a follow-up I intend to write a series of articles on our pipeline, as it is a narrative far removed from the “pipelining a single greenfield application” model often found in Continuous Delivery experience reports.

Sky Network Services is an agile, innovative technology company that produces telecommunications middleware for BSkyB. Despite a plethora of talented technical/non-technical staff and an enviable reputation for delivering quality software, an in-house analysis in mid-2011 identified a number of problems:

  • Many applications used different methods of deployment, start, stop, and database migration in different environments
  • There was little visibility of which application versions were progressing through the test environments at any given time
  • Releasing a minor bug fix for an application necessitated a re-release of the parent platform
  • Development teams and Operations teams were constrained to separate silos

At this point we were attracted to the Continuous Delivery value proposition, albeit with the additional challenge of scaling our pipeline to manage an estate of legacy/greenfield applications that in the past year has doubled in size.

In this series of articles I aim to cover:

  1. Challenges – how we solved the more common Continuous Delivery challenges
  2. Scaling Up – how we scaled our pipeline to manage our ever-growing application estate
  3. Legacy and Greenfield – how we simultaneously release legacy and greenfield applications
  4. Autonomation – how we established a Continuous Delivery transformation across a Waterscrumfall organisation

The Merit of Metadata

Metadata increases feedback and ensures value stream integrity

In Continuous Delivery, Dave Farley and Jez Humble describe the Lean production principles that underpin Continuous Delivery, and how a pipeline encapsulates a value stream – the journey a customer feature undertakes from discovery to real world consumption.

In a pipeline each stage represents a step in the value stream, meaning that for application XYZ an example value stream of [Development -> Acceptance -> UAT -> Performance -> Production] could be defined as follows:

[Figure: Pipeline with Metadata]

In the above pipeline, each stage ends with a discrete piece of metadata (“created XYZ 2.1”, “XYZ 2.1 passed acceptance tests”, etc.) being written back to the binary repository, indicating that one or more new customer features have progressed in the value stream.

Unfortunately, pipelines are often constructed without metadata support:

[Figure: Pipeline without Metadata]

In this situation the lack of activity data reduces each stage to a fire-and-forget operation, constraining feedback and unnecessarily exposing the value stream to obtuse, time-consuming errors. For example, QA could mistakenly test new features that have not passed automated regression tests, or Operations could mistakenly release features that have not been signed off.

With metadata the following safeguards can be easily implemented (two of which are sketched in code after the list):

  • Check if binary actually exists e.g. “can XYZ 1.2 be retrieved for deploy to Production”
  • Prevent binary re-entering a previous stage e.g. “once XYZ 1.2 has passed or failed Acceptance, the result is final”
  • Ensure binary has successfully passed sufficient dependencies to enter a stage e.g. “XYZ 1.2 can only enter Production once it has successfully passed UAT and Performance”
  • Introduce a manual sign-off process for critical environments e.g. “XYZ 1.2 can only pass UAT when exploratory testing is complete”
  • Visualise pipeline activity e.g. “I can see XYZ 1.2 was successfully released to Production a week ago, and that 1.4 is the next viable candidate as 1.3 failed Acceptance”
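As an illustration, two of these safeguards can be expressed in a few lines of Java, assuming stage results are stored as pipeline metadata alongside each binary in the repository; the class below is hypothetical, not a real repository API:

  import java.util.Collections;
  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical metadata store, e.g. "XYZ 1.2" -> { "Acceptance" -> "passed" }.
  class PipelineMetadata {

      private final Map<String, Map<String, String>> results = new HashMap<>();

      // Each stage writes its outcome back once, and the result is final:
      // a binary cannot re-enter a previous stage.
      void record(String binary, String stage, String result) {
          Map<String, String> stageResults =
                  results.computeIfAbsent(binary, key -> new HashMap<>());
          if (stageResults.containsKey(stage)) {
              throw new IllegalStateException(binary + " has already completed " + stage);
          }
          stageResults.put(stage, result);
      }

      // A binary may only enter a stage once its upstream dependencies have
      // passed, e.g. Production requires both UAT and Performance.
      boolean mayEnter(String binary, String... requiredStages) {
          Map<String, String> stageResults =
                  results.getOrDefault(binary, Collections.emptyMap());
          for (String stage : requiredStages) {
              if (!"passed".equals(stageResults.get(stage))) {
                  return false;
              }
          }
          return true;
      }
  }

A manual sign-off for a critical environment is then no more than a human recording a "passed" result for the relevant stage.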

These features ensure fast feedback is always available and that the pipeline is an accurate representation of the underlying value stream. An absence of metadata unnecessarily hinders these goals and suggests a failure to understand the core values of Continuous Delivery.
