Pipeline


Promoting artifacts between repositories is a poor man’s metadata

Note: this antipattern used to be known as Mutable Binary Location

A Continuous Delivery pipeline is an automated representation of the value stream of an organisation, and rules are often codified in a pipeline to reflect the real-world journey of a product increment. This means artifact status as well as artifact content must be tracked as an artifact progresses towards production.

One way of implementing this requirement is to establish multiple artifact repositories, and promote artifacts through those repositories as they successfully pass different pipeline stages. As an artifact enters a new repository it becomes accessible to later stages of the pipeline and inaccessible to earlier stages.

For example, consider an organisation with a single QA environment and multiple repositories used to house in-progress artifacts. When an artifact is committed and undergoes automated testing it resides within the development repository.

Pipeline Antipattern Artifact Promotion - Development

When that artifact passes automated testing it is signed off for QA, which will trigger a move of that artifact from the development repository to the QA repository. It now becomes available for release into the QA environment.

Pipeline Antipattern Artifact Promotion - QA

When that artifact is pulled into the QA environment and successfully passes exploratory testing it is signed off for production by a tester. The artifact will be moved from the QA repository to the production repository, enabling a production release at a later date.

Pipeline Antipattern Artifact Promotion - Production

A variant of this strategy is for multiple artifact repositories to be managed by a single repository manager, such as Artifactory or Nexus.

Pipeline Antipattern Artifact Promotion - Repository Manager
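To make the coupling concrete, here is a minimal sketch of the orchestration this strategy implies; the repository URLs, stage names, and artifact identifiers are hypothetical, and a real orchestrator would copy and delete binaries rather than print:

    # Hypothetical sketch: the orchestrator must know which repository backs
    # which stage, and must physically move artifacts to record a sign-off.
    STAGE_REPOSITORIES = {
        "development": "https://repo.example.com/development",
        "qa": "https://repo.example.com/qa",
        "production": "https://repo.example.com/production",
    }

    def promote(artifact: str, version: str, from_stage: str, to_stage: str) -> None:
        """Promote an artifact by moving it between stage-specific repositories."""
        source = f"{STAGE_REPOSITORIES[from_stage]}/{artifact}/{version}"
        target = f"{STAGE_REPOSITORIES[to_stage]}/{artifact}/{version}"
        print(f"move {source} -> {target}")

    promote("shop-service", "1.4.2", "development", "qa")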

This strategy fulfils the basic need of restricting which artifacts can be pulled into pre-production and production environments, but its reliance upon repository tooling to represent artifact status introduces a number of problems:

  • Reduced feedback – an unknown artifact can only be reported as not found, yet it could be an invalid version, an artifact in an earlier stage, or a failed artifact
  • Orchestrator complexity – the pipeline runner has to manage multiple repositories, knowing which repository to use for which environment
  • Inflexible architecture – if an environment is added to or removed from the value stream the toolchain will have to change
  • Lack of metrics – pipeline activity data is limited to vendor-specific repository data, making it difficult to track wait times and cycle times

A more flexible approach better aligned with Continuous Delivery is to establish artifact status as a first-class concept in the pipeline and introduce per-binary metadata support.

Pipeline Antipattern Artifact Promotion - Metadata

When a single repository is used, all artifacts reside in the same location alongside their versioned metadata, which provides a definitive record of artifact activity throughout the pipeline. This means unknown artifacts can easily be identified, the complexity of the pipeline orchestrator can be reduced, and any value stream design can be supported over time with no changes to the repository itself.
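As a minimal sketch of per-binary metadata, assuming a single repository and a simple pass/fail record per stage (the class and names below are illustrative, not a specific tool's API):

    # Hypothetical sketch: artifact status held as versioned metadata in a
    # single repository, rather than implied by repository location.
    from dataclasses import dataclass, field

    @dataclass
    class ArtifactMetadata:
        artifact: str
        version: str
        stage_results: dict = field(default_factory=dict)  # stage name -> "pass" / "fail"

        def record(self, stage: str, passed: bool) -> None:
            self.stage_results[stage] = "pass" if passed else "fail"

        def passed(self, stage: str) -> bool:
            """An unknown artifact is simply one with no recorded result for a stage."""
            return self.stage_results.get(stage) == "pass"

    metadata = ArtifactMetadata("shop-service", "1.4.2")
    metadata.record("automated-testing", passed=True)
    print(metadata.passed("automated-testing"))  # True, so eligible for release into QA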

Furthermore, as the collection of artifact metadata stored in the repository indicates which artifact passed or failed which environment at any given point in time, it becomes trivial to build pipeline dashboards that display pending releases, application cycle times, and where delays are occurring in the value stream. This is a crucial enabler of organisational change for Continuous Delivery, as it indicates where bottlenecks are occurring – likely between people working in separate teams in separate silos.
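If each metadata entry also carried a timestamp – an assumption beyond the sketch above – wait times and cycle times fall out of simple arithmetic; the dates below are purely illustrative:

    # Hypothetical sketch: deriving wait times from timestamped stage results.
    from datetime import datetime

    stage_timestamps = {
        "automated-testing": datetime(2014, 6, 2, 10, 5),
        "qa-sign-off": datetime(2014, 6, 4, 16, 30),
        "production-release": datetime(2014, 6, 9, 9, 0),
    }

    qa_wait = stage_timestamps["qa-sign-off"] - stage_timestamps["automated-testing"]
    release_wait = stage_timestamps["production-release"] - stage_timestamps["qa-sign-off"]
    print(f"waited {qa_wait} for QA sign-off, {release_wait} for the production release")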


Separate out analysis to preserve commit stage processing time

The entry point of a Continuous Delivery pipeline is its Commit Stage, which manages the compilation, unit testing, analysis, and packaging of source code whenever a change is committed to version control. As the commit stage is responsible for identifying defective code it represents a vital feedback loop for developers, and for that reason Dave Farley and Jez Humble recommend a commit stage that is “ideally less than five minutes and no more than ten” – if the build process is too slow or non-deterministic, the pace of development can soon grind to a halt.

Both compilation and unit testing tasks can be optimised for performance, particularly when the commit stage is hosted on a multi-processor Continuous Integration server. Modern compilers require only a few seconds for compilation, and a unit test suite that follows the Michael Feathers strategy of no database/filesystem/network/user interface access should run in parallel in seconds. However, it is more difficult to optimise analysis tasks as they tend to involve third-party tooling reliant upon byte code manipulation.

When a significant percentage of commit stage time is consumed by static analysis tooling, it may become necessary to trade off unit test feedback against static analysis feedback and move the static analysis tooling into a separate Analysis Stage. The analysis stage is triggered by a successful run of the commit stage, and analyses the uploaded artifact(s) and source code in parallel with the acceptance testing stage. If a failure is detected the relevant pipeline metadata is updated and Stop The Line applies: that artifact cannot be used elsewhere in the pipeline, and further development efforts should cease until the issue is resolved.
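A minimal sketch of that orchestration, assuming the stage functions and metadata structure are placeholders rather than any particular CI server's API:

    # Hypothetical sketch: commit stage success triggers acceptance testing and
    # static analysis in parallel; an analysis failure stops the line via metadata.
    from concurrent.futures import ThreadPoolExecutor

    def run_acceptance_tests(version: str) -> bool:
        return True   # placeholder for the real acceptance test stage

    def run_static_analysis(version: str) -> bool:
        return False  # placeholder for the real analysis stage

    def on_commit_stage_success(version: str, metadata: dict) -> None:
        with ThreadPoolExecutor() as executor:
            acceptance = executor.submit(run_acceptance_tests, version)
            analysis = executor.submit(run_static_analysis, version)
        metadata[version] = {
            "acceptance": "pass" if acceptance.result() else "fail",
            "analysis": "pass" if analysis.result() else "fail",
        }
        if metadata[version]["analysis"] == "fail":
            print(f"Stop The Line: {version} cannot be used elsewhere in the pipeline")

    on_commit_stage_success("1.4.2", metadata={})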

For example, consider an organisation that has implemented a standard Continuous Delivery pipeline. The commit stage has an average processing time of 5 minutes, of which 1 minute is spent upon static analysis.

Over time the codebase grows to the extent that commit stage time increases to 6 minutes, of which 1 minute 30 seconds is spent upon static analysis. With static analysis time growing from 20% to 25% the decision is made to create a separate Analysis stage, which reduces commit time to 4 minutes 30 seconds and improves the developer feedback loop.

Static analysis is the definitive example of an automated task that periodically needs human intervention. Regardless of tool choice there will always be a percentage of false positives and false negatives, and therefore a pipeline that implements an Analysis Stage must also offer a capability for an authenticated human user to override prior results for one or more application versions.
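A sketch of what such an override might look like, reusing the metadata structure from the earlier sketch; the authentication check is deliberately simplified and the function name is illustrative:

    # Hypothetical sketch: an authenticated human overrides a static analysis result,
    # recording who overrode it and why rather than silently erasing the failure.
    def override_analysis_result(metadata: dict, version: str, user: str,
                                 authenticated: bool, reason: str) -> None:
        if not authenticated:
            raise PermissionError("only authenticated users may override analysis results")
        metadata.setdefault(version, {})["analysis"] = "pass"
        metadata[version]["analysis-override"] = {"by": user, "reason": reason}

    metadata = {"1.4.2": {"analysis": "fail"}}
    override_analysis_result(metadata, "1.4.2", "jane", authenticated=True,
                             reason="false positive in third-party bytecode")
    print(metadata["1.4.2"])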


Our founder Steve Smith has written a detailed introduction to Continuous Delivery for the DZone 2014 report on Continuous Delivery.

“Introducing Continuous Delivery” describes the origins of Continuous Delivery, explores the problems with a manual release process, and outlines the key principles that underpin Continuous Delivery.

Read the full article – “Introducing Continuous Delivery” (external)


Use Cost of Delay to value Continuous Delivery features

When building a Continuous Delivery pipeline, we want to value and prioritise our backlog of planned features to maximise our return on investment. The time-honoured, ineffective IT approach of valuation by intuition and prioritisation by cost is particularly ill-suited to Continuous Delivery, due to its focus upon one-off infrastructure improvements to enable product flow. How can we value and prioritise our backlog of planned pipeline features to maximise economic benefits?

To value our backlog, we can calculate the Cost of Delay of each feature – its economic value over a period of time if it was immediately available. Described by Don Reinertsen as “the golden key that unlocks many doors”, Cost of Delay can be calculated by quantifying the value of change or the cost of the status quo via the following economic benefit types:

  • Increase Revenue – improve profit margin
  • Protect Revenue – sustain profit margin
  • Reduce Costs – reduce costs currently incurred
  • Avoid Costs – reduce costs potentially incurred

Cost of Delay allows us to quantify the opportunity cost between a feature being available now or later, and using money as the unit of measurement transforms stakeholder conversations from cost-cutting to delivering value. Calculation accuracy is less important than the process of collaborative information discovery, with assumptions and probabilities preferably co-owned by stakeholders and published via information radiator.

Cost of Delay = economic value over time if immediately available

To prioritise our backlog, we can use Cost of Delay Divided By Duration (CD3) – a variant of the Weighted Shortest Job First scheduling policy. With CD3 we divide Cost of Delay by duration, with a higher score resulting in a higher priority. This is an effective scheduling policy as the duration denominator promotes batch size reduction.

CD3 = Cost of Delay / Duration

As the goal of Continuous Delivery is to decrease cycle time by reducing the transaction cost of releasing software, a pipeline feature will likely yield an Avoid Cost or Reduce Cost benefit intrinsically linked to release cadence. We can therefore calculate the Cost of Delay as one of the below, with a code sketch after the formulas:

  1. Reduce Cost: Automate action(s) to decrease wait times within release processing time

    = (wait time in minutes / cycle time in days) * minute price in £

  2. Avoid Cost: Automate action(s) to decrease probability of repeating release processing time due to rework

    = (processing time in minutes / cycle time in days) * minute price in £ * % cost probability per year
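Both formulas translate directly into code; this is a minimal sketch with parameter names taken from the formulas above:

    # Sketch of the two Cost of Delay calculations above, returning £ per day.
    def reduce_cost_cod(wait_time_minutes: float, cycle_time_days: float,
                        minute_price: float) -> float:
        """Daily cost of the wait time that automation would remove."""
        return (wait_time_minutes / cycle_time_days) * minute_price

    def avoid_cost_cod(processing_time_minutes: float, cycle_time_days: float,
                       minute_price: float, cost_probability: float) -> float:
        """Daily cost of potentially repeating release processing time due to rework."""
        return (processing_time_minutes / cycle_time_days) * minute_price * cost_probability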

For example, consider an organisation building a Continuous Delivery pipeline to support its Apples, Bananas, and Oranges applications by fully automating its release scripts. The rate of business change is variable, with an Apples cycle time of 1 month, a Bananas cycle time of 2 months, and an Oranges cycle time of 3 months. Our pipeline has already fully automated the deploy, stop, and start actions for our Apples and Bananas applications but lacks support for our Oranges application, our test framework, and our database migrator.
Application Estate

Once our development team have provided their cost estimates, how do we determine which feature to implement next without resorting to intuition?

Backlog Duration

We begin by agreeing with our pipeline stakeholders an arbitrary price of £10000 for a minute of our time, and calculate the Cost of Delay for supporting the Oranges application as:
Support Oranges application

= (wait time / cycle time) * minute price
= ((20 + 20 + 20) / 90) * 10000
= 0.67 * 10000
= £6700 per day

Given that the test framework has failed twice in the past year, each time causing a repeat of release processing time specifically due to its lack of pipeline support, the Cost of Delay is:
Support test framework

= (100 / months in a year) * occurrences
= (100 / 12) * 2
= 16% cost probability per year

= (processing time / cycle time) * minute price * % cost probability
= ((100 / 30) + (100 / 60) + (160 / 90)) * 10000 * 16%
= 6.78 * 10000 * 16%
= £10848 per day (£5328 Apples, £2672 Bananas, £2848 Oranges)

The Cost of Delay for supporting the database migrator is:

Support database migrator

= (wait time / cycle time) * minute price
= ((45 / 30) + (45 / 60) + (45 / 90)) * 10000
= 2.75 * 10000
= £27500 per day (£15000 Apples, £7500 Bananas, £5000 Oranges)
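The three figures above can be reproduced with a few lines of arithmetic (small differences are due to the intermediate rounding in the worked examples):

    # Reproducing the worked example: Cost of Delay per day for each feature.
    MINUTE_PRICE = 10000  # agreed price of a minute of our time, in £

    oranges = ((20 + 20 + 20) / 90) * MINUTE_PRICE                          # ~ £6700 per day
    test_framework = sum((t / c) * MINUTE_PRICE * 0.16
                         for t, c in [(100, 30), (100, 60), (160, 90)])      # ~ £10848 per day
    migrator = sum((45 / c) * MINUTE_PRICE for c in (30, 60, 90))            # £27500 per day
    print(round(oranges), round(test_framework), round(migrator))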

Now that we have established the value of the planned pipeline features, we can use CD3 to produce an optimal work queue. CD3 confirms that support for the database migrator is our most urgent priority:

Backlog CD3

This example shows that using Cost of Delay and CD3 within Continuous Delivery validates Mary Poppendieck’s argument that “basing development decisions on economic models helps the development team make good tradeoff decisions”. As well as learning that support for the database migrator is twice as valuable as any current alternative, we can offer new options to our pipeline stakeholders – for example, if an Apples-specific database migrator required only 5 days, it would become our most desirable feature (£15000 per day / 5 days = CD3 score of 3000).
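The CD3 scoring itself is a one-line calculation; the sketch below reproduces the Apples-specific database migrator example above:

    # CD3 = Cost of Delay / Duration; a higher score means a higher priority.
    def cd3(cost_of_delay_per_day: float, duration_days: float) -> float:
        return cost_of_delay_per_day / duration_days

    print(cd3(15000, 5))  # 3000.0, the Apples-specific database migrator example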


Pipeline updates must minimise risk to protect the Repeatable Reliable Process

We want to quickly deliver new features to users, and in Continuous Delivery Dave Farley and Jez Humble showed that “to achieve these goals – low cycle time and high quality – we need to make frequent, automated releases”. The pipeline constructed to deliver those releases should be no different: it should itself be frequently and automatically released into Production. However, this conflicts with the Continuous Delivery principle of Repeatable Reliable Process – a single application release mechanism for all environments, used thousands of times to minimise errors and build confidence – leading us to ask:

Is the Repeatable Reliable Process principle endangered if a new pipeline version is released?

To answer this question, we can use a risk impact/probability graph to assess if an update will significantly increase the risk of a pipeline operation becoming less repeatable and/or reliable.

Pipeline Risk

This leads to the following assessment:

  1. An update is unlikely to increase the impact of an operation failing to be repeatable and/or reliable, as the cost of failure is permanently high due to pipeline responsibilities
  2. An update is unlikely to increase the probability of an operation failing to be repeatable, unless the Published Interface at the pipeline entry point is modified. In that situation, the button push becomes more likely to fail, but not more costly
  3. An update is likely to increase the probability of an operation failing to be reliable. This is where stakeholders understandably become more risk averse, searching for a suitable release window and/or pinning a particular pipeline version to a specific artifact version throughout its value stream. These measures may reduce risk for a specific artifact, but do not reduce the probability of failure in the general case

Based on the above, we can now answer our original question as follows:

A pipeline update may endanger the Repeatable Reliable Process principle, and is more likely to impact reliability than repeatability

We can minimise the increased risk of a pipeline update by using the following techniques:

  • Change inspection. If change sets can be shown to be benign with zero impact upon specific artifacts and/or environments, then a new pipeline version is less likely to increase risk aversion
  • Artifact backwards compatibility. If the pipeline uses an Artifact Interface and knows nothing of artifact composition, then a new pipeline version is less likely to break application compatibility
  • Configuration static analysis. If each defect has its root cause captured in a static analysis test, then a new pipeline version is less likely to cause a failure (see the sketch after this list)
  • Increased release cadence. If the frequency of pipeline releases is increased, then defects in a new pipeline version are more likely to be shallow, feedback loops shorter, and rollback cheaper
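As a minimal sketch of the configuration static analysis idea, assume pipeline configuration is plain data and that a past defect was caused by a stage referencing an undefined environment (a hypothetical root cause); the check below would then guard every subsequent pipeline release:

    # Hypothetical sketch: a static analysis test capturing one defect's root cause.
    def check_stage_environments(pipeline_config: dict) -> list:
        """Flag stages that target an environment the configuration does not define."""
        known = set(pipeline_config.get("environments", []))
        return [f"stage '{stage['name']}' targets unknown environment '{stage['environment']}'"
                for stage in pipeline_config.get("stages", [])
                if stage["environment"] not in known]

    config = {"environments": ["qa", "production"],
              "stages": [{"name": "exploratory-testing", "environment": "qa"},
                         {"name": "production-release", "environment": "prod"}]}  # "prod" typo
    print(check_stage_environments(config))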

Finally, it is important to note that a frequently-changing pipeline version may be a symptom of over-centralisation. A pipeline should not possess responsibility without authority and should devolve environment configuration, application configuration, etc. to separate, independently versioned entities.

