Pretty Good Pipeline (PGP)

With apologies to Phil Zimmermann, I’ve put together a “pretty good pipeline” using Azure Pipelines, a multi-stage YAML file, and a couple of ARM templates to deploy a typical .NET Core 3.1 API to an Azure App Service. This won’t be a walkthrough or a “how to” so much as general guidance, with links to relevant documentation for more information. From a high level, the pipeline looks something like this diagram:

I am not going to talk about containerization, A/B testing, the latest trends in DevOps, or continuous delivery where a dozen production releases are all in a day’s work.  This is just vanilla guidance for a typical mid-sized shop. If this still interests you then read on.

Why would we want to go to the trouble of implementing a pipeline? The short answer is that we want to reduce risk. The long answer is that we’re professionals who care about our craft, and so we want a repeatable, automated process to build stable, tested, and reliable environments. Reliability must be built into every step of the pipeline. These are “lines of defense” to maintain quality. My example code for this pipeline is TimeApi, a small .NET Core 3.1 microservice that returns the current Unix time. It’s publicly visible and hosted on Azure DevOps:
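At its core, TimeApi just returns seconds since the Unix epoch. As a rough sketch of that contract (written in JavaScript to match the Postman tests later in this article; the function name is mine, not the actual C# implementation):

```javascript
// Current Unix time: whole seconds since 1970-01-01T00:00:00Z,
// wrapped in the { value: ... } shape the api/v1/time endpoint returns.
function currentUnixTime() {
    return { value: Math.floor(Date.now() / 1000) };
}

console.log(currentUnixTime().value);
```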

Pull Requests

Pull requests or “PRs” are an excellent workflow method and they are your first line of defense for maintaining quality in a production-ready master branch. In this project I have configured a branch policy to require a minimum number of reviewers. I have one reviewer (me) and I’m allowed to approve my own changes. On your team you might choose a senior developer or the team lead in this role. If I were you I’d require unit tests in the PR commit.
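As a side note, the branch policy itself can be scripted rather than clicked together in the portal, using the Azure DevOps CLI extension. A sketch, where the repository ID is a placeholder you’d look up with az repos list:

```shell
# Require one reviewer on master; allow the PR author to approve their own changes
az repos policy approver-count create \
    --blocking true \
    --enabled true \
    --branch master \
    --repository-id <repo-id> \
    --minimum-approver-count 1 \
    --creator-vote-counts true
```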

Build Stage

My multi-stage YAML file is configured with four stages: Build, Development, Sandbox, and Staging. For the Build stage I’ve set a continuous integration (CI) trigger on the master branch. When a PR is approved, the code merges into master and kicks off the CI build. This is important because, as you can see below, all unit tests run as part of the build. This is your second line of defense. If a new commit fails your unit tests, red alert: fire up the lava lamp and fix it immediately.

# -----------------------------------------
# TimeApi
# -----------------------------------------

trigger:
  - master

pool:
  vmImage: 'ubuntu-latest'

stages:
  - stage: Build

    jobs:
      - job: Build

        steps:
          - task: UseDotNet@2
            displayName: 'Use .NET Core 3.x'
            inputs:
              version: 3.x

          - task: DotNetCoreCLI@2
            displayName: Restore
            inputs:
              command: restore
              projects: '**/*.csproj'

          - task: DotNetCoreCLI@2
            displayName: Build
            inputs:
              projects: '**/*.csproj'
              arguments: '--configuration Release'

          - task: DotNetCoreCLI@2
            displayName: Unit Tests
            inputs:
              command: test
              projects: '**/*Tests/*.csproj'
              arguments: '--configuration Release'

          # Publish app service to drop point
          - task: DotNetCoreCLI@2
            displayName: Publish
            inputs:
              command: publish
              arguments: '--configuration Release --output $(build.artifactstagingdirectory)'

          # Copy ARM template(s) to drop point
          - task: CopyFiles@2
            displayName: 'Copy ARM Templates'
            inputs:
              Contents: '**/azuredeploy*.json'
              TargetFolder: '$(build.artifactstagingdirectory)'

          # Push all artifacts to drop point
          - task: PublishBuildArtifacts@1
            displayName: 'Publish Artifacts'

Development Stage

The development stage has a deployment job to exercise and test the YAML file and ARM templates. This pipeline creates or updates a resource group named services-dev-rg, followed by the app service itself. Your goal is idempotent infrastructure that gets created exactly the same way on every run. Feel free to blow away the resource group and try again. The important point here is that you are not creating resources in the portal manually but instead automating the process with an ARM template. When you’ve got it right you can add the Sandbox stage, with its own deployment job, to the YAML file. At that point everything is scripted (infrastructure as code) and there are no surprises.
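A deployment job for this stage might look roughly like the following sketch; the service connection name, subscription ID variable, location, and template path are placeholders for your own values:

```yaml
- stage: Development
  jobs:
    - deployment: DeployDev
      environment: Development
      strategy:
        runOnce:
          deploy:
            steps:
              # Create or update services-dev-rg and its app service from the ARM template
              - task: AzureResourceManagerTemplateDeployment@3
                inputs:
                  deploymentScope: 'Resource Group'
                  azureResourceManagerConnection: 'My Service Connection'
                  subscriptionId: '$(subscriptionId)'
                  action: 'Create Or Update Resource Group'
                  resourceGroupName: 'services-dev-rg'
                  location: 'East US'
                  templateLocation: 'Linked artifact'
                  csmFile: '$(Pipeline.Workspace)/**/azuredeploy.json'
                  deploymentMode: 'Incremental'
```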

I’m taking advantage of an Azure Key Vault to pull secrets into my pipeline. See my previous article on JSON variable substitution for a detailed walkthrough of using a key vault.
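Fetching those secrets is a single task. A minimal sketch, where the vault name and service connection are placeholders:

```yaml
# Pull secrets from the key vault and expose them as pipeline variables
- task: AzureKeyVault@2
  inputs:
    azureSubscription: 'My Service Connection'
    KeyVaultName: 'timeapi-kv'
    SecretsFilter: '*'   # or a comma-separated list of secret names
    RunAsPreJob: false
```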

Sandbox Stage

By now the CI build and development stage deployment job are running great. Now you can extend the pipeline and create an identical sandbox (QA/Test) environment for QA or end user testing. Although I could have, I didn’t actually need to create the environments in Azure DevOps ahead of time; the first time the YAML file ran, the tool set them up for me. Once they were set up I added an approval gate, with myself in the role of QA:

This allows the person in the QA role to kick off the deployment into the Sandbox environment. Without this gate the CI build would push every approved PR commit all the way up through development and into sandbox. Maybe that’s what you want: if this isn’t an active project and there’s only one PR every once in a while, I’d remove the approval gate. But having it in place gives your tester more control over when to accept a new release in the sandbox environment.

The tester is going to do manual exploratory testing, of course. But as a third line of defense I recommend using Postman in this stage to run integration tests. If you look in the project repo you can see two Postman files in the solution root. You can also automate your Postman tests in your pipeline using newman as a test runner. Here are the three tests that run against the api/v1/time endpoint:

pm.test("Status code is 200", function() {
    pm.response.to.have.status(200);
});

pm.test("Response body contains value not null", function() {
    var jsonData = pm.response.json();
    pm.expect(jsonData.value).not.eql(null);
});

pm.test("Response body contains value not 0", function() {
    var jsonData = pm.response.json();
    // Require a nonzero integer so a non-numeric value like "abc" also fails
    pm.expect(String(jsonData.value)).to.match(/^[1-9][0-9]*$/);
});
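To run the collection from the pipeline instead of by hand, a step along these lines works; the Postman file names below are placeholders for the two files in the solution root:

```yaml
# Install newman and run the Postman collection against the sandbox environment
- script: |
    npm install -g newman
    newman run TimeApi.postman_collection.json \
      --environment TimeApi.postman_environment.json \
      --reporters cli,junit \
      --reporter-junit-export $(Common.TestResultsDirectory)/newman-results.xml
  displayName: 'Run Postman Tests'

# Surface the newman results in the pipeline's Tests tab
- task: PublishTestResults@2
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '$(Common.TestResultsDirectory)/newman-results.xml'
```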

Putting on my QA hat, I open Postman and run them manually:

As QA I want these tests to run automatically against every new release, and when a test fails I want to know why. For example, the third test checks for a long integer that is not zero. If I get back a string like “abc” instead, the test fails. And if that got into production it would break the contract of the API endpoint.

Staging (Pre-Prod)

My staging environment is actually a slot in the production app service with its own host name:

- task: AzureRmWebAppDeployment@4
  displayName: 'Azure App Service Deploy'
  inputs:
    azureSubscription: 'My Service Connection'
    resourceGroupName: 'services-prd-rg'
    appType: 'webApp'
    WebAppName: 'timeapi-prd'
    deployToSlotOrASE: true
    slotName: staging
    package: '$(Pipeline.Workspace)/**/*.zip'
    JSONFiles: '**/appsettings.json'

In my production environment I see the slot alongside my production app service. You have to set your service plan to S1 or greater. With an S1 plan you get five free slots.

I can go to the staging app at URL https://timeapi-prd-staging.azurewebsites.net/swagger and see the API running there. TimeApi does not have a database. (See my previous article on adding a database to your pipeline.) If you have a database in your project you have a decision to make. Do you have your staging app use the same connection string and test in production? Or do you have a separate database? Not that long ago it would have been a non-starter to test in production. But with techniques like feature flags and canary releases this is a viable option today. I’m not going to address these advanced techniques here. This is a “pretty good” pipeline after all.

Production

At this point going to production is a very low risk blue-green deployment pattern. All we have to do is swap the two slots and we’re live. There will be some downtime as the swap occurs and the app service restarts. You can configure for zero-downtime swaps but I’m not going to address that technique here. Let’s assume you have a maintenance window where a few minutes of downtime is good enough for your shop.
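The swap itself is one task in the production stage. A sketch, reusing the names from the staging deployment above:

```yaml
# Swap the staging slot into production (blue-green swap)
- task: AzureAppServiceManage@0
  inputs:
    azureSubscription: 'My Service Connection'
    Action: 'Swap Slots'
    WebAppName: 'timeapi-prd'
    ResourceGroupName: 'services-prd-rg'
    SourceSlot: 'staging'
```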

Have we achieved a reliable and stable release? Well, PR commits are reviewed, unit tests run automatically in the build stage, the infrastructure is identical from development on up through production, we have both exploratory and automated Postman tests, and there’s a final “sanity check” in staging before the swap to go live. That’s a pretty good pipeline. Even so, it would be prudent to plan for a rollback.

Rolling Back

Rolling back the app service is trivial: you just swap the two slots again and you’re back to the previous release. But Donovan Brown is way ahead of us and points out that it’s rarely that simple. We have to consider dependencies like the database too. For this reason he recommends always keeping your database one version backward compatible. Let’s take an example. Suppose I have a table that looks like this in production:

CREATE TABLE [dbo].[Widget]
(
    [Id] UNIQUEIDENTIFIER NOT NULL,
    [Name] NVARCHAR(50) NULL, 
    [Shape] NVARCHAR(50) NULL,
    CONSTRAINT [PK_Widget] PRIMARY KEY NONCLUSTERED ([Id])
)

And I have this domain object mapped to that table:

public class Widget
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Shape { get; set; }
}

Suppose in the next release I introduce a new column called Color to the Widget class. That’s not going to be a problem when I roll back. But suppose I’ve completely deprecated the Shape column in favor of a new business domain concept my customers call Contour. My object in v2 now looks like this:

public class Widget 
{ 
    public Guid Id { get; set; } 
    public string Name { get; set; } 
    public string Contour { get; set; } 
}

If I merely update my SQL script and change the column name of “Shape” to “Contour” then I have introduced risk. I cannot roll back the app service. To stay one version backward compatible my new script should look like this:

CREATE TABLE [dbo].[Widget]
(
    [Id] UNIQUEIDENTIFIER NOT NULL,
    [Name] NVARCHAR(50) NULL,
    [Shape] NVARCHAR(50) NULL,    -- v1
    [Contour] NVARCHAR(20) NULL,  -- v2
    CONSTRAINT [PK_Widget] PRIMARY KEY NONCLUSTERED ([Id])
)

When I release v3 I can safely remove the Shape column. This adds more work and takes careful planning on your part because you have to keep track of the versioning in the database scripts. But it’s a small price to pay because you can always swap back from blue to green (or green to blue) and know that your database is good to go.