CI/CD & Release Process

A deployment pipeline is a bet on repeatability. Every manual step (a hand-edited config, a whispered "don't forget to run migrations") is a bet against it. The pipeline should encode everything the team knows about shipping safely, so that releasing is a boring, predictable event rather than a high-stakes ceremony.

Why this matters

Releasing software is where Integrity meets the real world. The code can be clean, the tests can pass, the architecture can be sound, but if the path from merge to production is fragile, manual, or poorly understood, none of that matters when something goes wrong at 5pm on a Friday. A well-built pipeline means the team ships with confidence at the end of every sprint, and every release is traceable, reversible, and auditable.

CI/CD is also where Teamwork becomes structural. When the pipeline is the single path to production, every developer follows the same process. There are no shortcuts, no "just this once" manual deploys, no tribal knowledge about which buttons to press. The pipeline is the shared understanding of how software gets released.

The standard

Pipeline platform

Bitbucket Pipelines and CircleCI are the primary CI/CD platforms for S&P projects. Bitbucket Pipelines integrates directly with Bitbucket repositories. CircleCI is increasingly used, especially for projects that need more complex pipeline orchestration, better parallelism, or GCP OIDC integration. GitHub Actions is used for projects hosted on GitHub (including some client projects). GitLab CI applies to GitLab-hosted client projects.

Regardless of platform, pipeline configuration lives in version-controlled files alongside the code, never in a web UI:

Platform	Config file
Bitbucket Pipelines	`bitbucket-pipelines.yml`
CircleCI	`.circleci/config.yml`
GitHub Actions	`.github/workflows/*.yml`
GitLab CI	`.gitlab-ci.yml`

When setting up a project on a different platform (client GitHub/GitLab, or choosing CircleCI over Bitbucket Pipelines), translate the S&P pipeline stage by stage rather than redesigning it:

S&P default	Equivalent on other platforms
Bitbucket Pipelines / CircleCI	GitHub Actions / GitLab CI
`bitbucket-pipelines.yml` / `.circleci/config.yml`	`.github/workflows/*.yml` / `.gitlab-ci.yml`
Pipeline variables / CircleCI contexts	Repository secrets / CI/CD variables
Deployment environments	GitHub environments / GitLab environments

The pipeline stages, quality gates, and deployment strategy remain the same regardless of platform. Only the YAML syntax and platform-specific features (caching, parallelism, OIDC) change.

Pipeline architecture

Every S&P project follows a four-stage pipeline. Stages run sequentially because each stage validates a prerequisite for the next. There is no value in building an artifact that fails linting, or deploying an artifact that fails tests.

lint --> test --> build --> deploy
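
The fail-fast ordering can be illustrated with a minimal TypeScript sketch. `Stage` and `runPipeline` are hypothetical names, not part of any CI platform; real pipelines express the same ordering in the platform's YAML. The sketch only shows why a failed stage prevents every later stage from running.

```typescript
// Hypothetical types for illustration only.
type Stage = { name: string; run: () => Promise<boolean> };
type PipelineResult = { completed: string[]; failedStage: string | null };

// Runs stages strictly in order and stops at the first failure.
async function runPipeline(stages: Stage[]): Promise<PipelineResult> {
  const completed: string[] = [];
  for (const stage of stages) {
    const ok = await stage.run();
    if (!ok) {
      // Later stages depend on this one having passed, so they never start.
      return { completed, failedStage: stage.name };
    }
    completed.push(stage.name);
  }
  return { completed, failedStage: null };
}
```

If `test` fails, `build` and `deploy` are never invoked, which is the behaviour the sequential pipeline is meant to guarantee.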

For the runnable stage definitions (caching, service containers, image build and push), see DevOps Reference -- Pipeline configuration. The four stages below explain what each does and why.

Stage 1: Lint

Run static analysis and formatting checks. This catches issues that don't require the application to compile or run. This stage includes:

ESLint with the project's ruleset (see Code Standards)
Prettier format verification (not auto-fix: CI should verify, not modify)
TypeScript type checking (tsc --noEmit)
Commitlint for commit message format (if configured)

Stage 2: Test

Run the automated test suite. This is where the Testing Strategy pays off.

For backend services (NestJS), this runs unit tests and integration tests against a PostgreSQL service container. For frontend applications (React/Next.js), this runs component tests and any Playwright or Cypress E2E tests. Coverage reports are stored as artifacts for later review.

If the test suite takes longer than 10 minutes, investigate. Long test suites slow the feedback loop and encourage developers to skip running tests locally. Common culprits: unnecessary database resets between tests, test files that import the entire application, serial execution of tests that could run in parallel.

Stage 3: Build

Build the deployment artifact. For S&P projects, this means a Docker container image.

Tag every image with the short commit SHA. This creates a direct link between the code and the artifact and keeps rollback unambiguous. Tagging the same image as latest is also permitted: it gives tooling a stable, predictable pull target without needing to pin a specific version. Always push both tags together. Deploy and roll back using the commit SHA tag, never latest.

Build artifacts are immutable. The same image that passes tests in CI is the image that deploys to staging and then to production. If you need different configuration per environment, use environment variables, not different builds.
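
As a sketch of the tagging rule, the helper below derives both image references from one commit. The function name `imageTags`, the repository path, and the 7-character short SHA length are assumptions made for illustration.

```typescript
// Hypothetical helper: builds the two image references that are pushed together.
function imageTags(repository: string, commitSha: string): { sha: string; latest: string } {
  if (!/^[0-9a-f]{7,40}$/.test(commitSha)) {
    throw new Error(`Not a commit SHA: ${commitSha}`);
  }
  // Assumption: a 7-character short SHA, matching git's usual abbreviation.
  const shortSha = commitSha.slice(0, 7);
  return {
    sha: `${repository}:${shortSha}`, // use this reference to deploy and roll back
    latest: `${repository}:latest`,   // convenience pull target only
  };
}

// Example with a made-up repository path and commit.
const exampleTags = imageTags(
  "registry.example.com/acme/api",
  "3f9c2ab81d4e5f60718293a4b5c6d7e8f9012345",
);
```

Here `exampleTags.sha` is `registry.example.com/acme/api:3f9c2ab`, and that is the only reference a deploy step should receive.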

Stage 4: Deploy

Deploy the artifact to the target environment. The deploy stage is triggered differently depending on the branch:

Branch	Deploys to	Trigger
`development`	Development environment	Automatic on merge
`staging`	Staging environment	Automatic on merge
`main`	Production environment	Manual trigger (with approval)

Production deploys are never automatic. They require an explicit manual trigger in the pipeline UI, ensuring a human has decided this is the right time to release. Development and staging deploys are automatic because they're internal environments where fast feedback matters more than ceremony.
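
The branch rules in the table can be written down as data. This is a minimal sketch; `DeployTarget` and `deployTargetFor` are hypothetical names, and on a real project the mapping lives in the pipeline config file.

```typescript
type DeployTarget = {
  environment: "development" | "staging" | "production";
  trigger: "automatic" | "manual";
};

// Mirrors the branch table: only these three branches ever deploy.
const DEPLOY_TARGETS = new Map<string, DeployTarget>([
  ["development", { environment: "development", trigger: "automatic" }],
  ["staging", { environment: "staging", trigger: "automatic" }],
  ["main", { environment: "production", trigger: "manual" }],
]);

// Returns null for feature branches and anything else that must not deploy.
function deployTargetFor(branch: string): DeployTarget | null {
  return DEPLOY_TARGETS.get(branch) ?? null;
}
```

Keeping the mapping explicit makes it easy to see that no branch maps to production with an automatic trigger.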

Environment strategy

S&P projects use three environments. Each environment has a distinct purpose and its own infrastructure, credentials, and data.

Development: The integration environment. Code merged to the development branch deploys here automatically. This is where the team validates that features work together before they're considered release candidates. Data is synthetic, refreshed regularly. Breaking things here is expected and acceptable.

Staging: The pre-production environment. Mirrors production infrastructure (same Terraform modules, same services, same scaling configuration) but with its own database, credentials, and secrets. This is where QA validates the sprint's work, where penetration testing happens (see Security), and where the release candidate is proven. Data is synthetic or anonymized, never production data.

Production: The live environment serving real users. Deploys happen via manual pipeline trigger at the end of the sprint. Access is restricted, changes are auditable, and rollback procedures are documented and tested.

Environment isolation is a hard rule. Production credentials cannot access staging resources. Staging data never flows to development. Each environment has its own secrets manager paths, service accounts, and database instances. See Security -- Environment isolation for the full policy.

Sprint-based release cadence

S&P ships at the end of each sprint, typically every two weeks. This is a deliberate choice, not a limitation.

Why not continuous deployment? Because S&P builds client-facing products where release coordination matters. Clients need predictable release schedules for their own planning. QA needs a stable window to validate the sprint's work. And the team needs a rhythm that separates "writing code" from "shipping code" so that neither activity suffers from the distraction of the other.

The sprint release cycle:

Sprint days 1-8: Development. Features merge to development branch, auto-deploy to dev environment.
Sprint days 8-9: Code freeze on staging branch. QA validates the release candidate on the staging environment. Bug fixes for the release candidate merge to staging directly.
Sprint day 10 (release day): Staging is validated. The staging branch merges to main. The production deployment pipeline is triggered manually.

Code freeze does not mean all development stops. It means the staging branch is stabilised for release while new development continues on feature branches targeting the next sprint.

Hotfixes bypass the sprint cadence. When a production issue requires an immediate fix:

Branch from main (not development: the development branch may contain unreleased work).
Fix, test, get code review.
Merge to main and trigger production deployment.
Cherry-pick the fix back to development and staging to prevent regression in the next release.

Hotfixes are the exception, not the process. If we're shipping hotfixes every sprint, the testing and QA process needs attention, not the deployment pipeline.

Release day and time. The default release window is the last working day of the sprint, during business hours in the client's primary timezone. Agree the exact day and time at project kickoff and document it in the project's communication agreement: clients need predictability more than they need a specific hour. Avoid Friday afternoon releases unless the client explicitly prefers them and the team has weekend on-call coverage. Hotfixes ship when the fix is validated, not on a schedule.

Release management

Release procedures are deliberately prescriptive. The cost of ambiguity when we're shipping to production is high.

A successful release is not "the pipeline turned green." It is a coordinated event where the team, stakeholders, and client know what is shipping, how it was validated, and what to do if something goes wrong. The technical pipeline handles the deploy; release management handles everything around it.

Stakeholder alignment

Communicate before you deploy, not after something breaks.

At sprint planning (or when scope stabilises):

Confirm which stories are in the release scope and which are deferred.
Flag anything that needs client awareness: breaking API changes, downtime, new third-party dependencies, data migrations, or UX changes that affect training materials.

3-5 business days before release day:

Share the deployable scope document (see below) with the project DM, client PO, and any other stakeholders listed in the project's communication agreement.
Confirm the release date and time still work. If scope has grown or critical bugs remain open, negotiate a deferral early, not on release morning.
Surface known risks explicitly: "This release includes a database migration that adds a column; rollback is application-only" is the kind of detail stakeholders need.

For client-facing products: Treat release communication with the same care as incident communication. Surprises erode trust faster than a one-sprint delay.

Release documentation

Every production release produces three documents. They can live in Confluence, Jira, or the project repo. The location matters less than that they exist, are linked together, and are shared before deployment.

1. Deployable scope document

A single source of truth for what is (and is not) in this release. Create it when development merges to staging and update it until release day.

Include:

Release version (v1.4.0) and sprint number
Jira release version with the list of included stories (and explicitly excluded stories that were originally planned)
Database migrations in this release and whether they are backward-compatible
New environment variables, secrets, or infrastructure changes required in production
Known limitations, feature flags, or partial rollouts
Dependencies on external systems (client-side changes, third-party API updates, app store submissions)

Link this document in the project Slack channel when you share it with stakeholders.

2. Test plan

The test plan describes how the release candidate will be validated before production. It is not a wish list; it is the evidence that justifies the deploy.

Include:

Link to the Qase test run for this release (mandatory on all S&P projects, see Testing Strategy)
Scope of automated testing: which suites ran, on which branch, with what result
Scope of manual and exploratory testing: bug bash date, targeted exploratory sessions, areas tested vs out of scope
Regression coverage: which critical user flows were verified on staging
Sign-off criteria and who provides sign-off (QA engineer, developer wearing the QA hat, or client UAT where applicable)
Known gaps: anything not tested and why (e.g., "payment flow not tested: no staging credentials for Stripe; verified in dev only")

The test plan should be shared with stakeholders alongside the deployable scope document. On client projects with formal UAT, the test plan is the basis for client sign-off.

3. Release notes

User-facing summary of what changed. Written for the client and their users, not for the engineering team.

Include:

New features and improvements (what users will notice)
Bug fixes (grouped, not one line per Jira ticket)
Breaking changes or required user actions ("Users will need to re-authenticate after this release")
Known issues deferred to the next sprint

Maintain a CHANGELOG.md in the repo for projects that need a persistent history. For client-facing releases, publish release notes in Confluence or deliver them via the channel agreed in the communication agreement.

Testing by environment

Each environment has a distinct testing purpose. Do not skip a layer because "it passed in dev."

Environment	When	Who	What gets validated
Development	Automatic on merge to `development`	Developer who built the feature	Feature works end-to-end with realistic data; no regressions in adjacent features; environment config is correct. Automated tests (lint, unit, integration) run in CI on every PR.
Staging	Automatic on merge to `staging`; release candidate soaks for at least 24 hours	QA (or developer wearing QA hat) + team	Full release testing gates: CI green, smoke tests, bug bash or targeted exploratory testing, regression on critical flows, QA sign-off. Migrations tested with production-like data volumes.
Production	Manual trigger on release day	Release owner (typically the tech lead or on-call engineer)	Post-deploy smoke tests, health checks, 30-minute monitoring window. Error rates at or below pre-deployment baseline. Client communication sent.

Development is where individual features are proven. Staging is where the sprint's work is proven as a release candidate. Production verification confirms the deploy succeeded; it is not a substitute for staging validation.

If staging and production differ in any way that affects testing (missing integration, different feature flag defaults, smaller database), document the difference in the deployable scope document. Undocumented environment differences are the most common cause of "it worked in staging" production failures.

Definition of a successful release

A release is successful when all of the following are true:

Scope matches expectation. Everything in the deployable scope document is live in production. Nothing shipped that was not in scope (unless a hotfix was explicitly agreed).
Validation is complete. All release testing gates passed on staging. QA sign-off is recorded. The test plan gaps are documented and accepted.
Production is healthy. Smoke tests pass. Application health checks are green. Error rates and key business metrics are at or below the pre-deployment baseline for at least 30 minutes.
Stakeholders are informed. Pre-release communication was sent. Post-release confirmation was sent with the release version and link to release notes.
The release is traceable. The production deployment maps to a specific commit SHA, a semantic version tag, and a Jira release version. Release notes and the deployable scope document are archived.
Rollback is ready. The previous production image tag is known and verified available. The team reviewed the rollback procedure before triggering the deploy.

If any criterion fails after production deployment, initiate rollback immediately (see Rollback procedures). Do not declare the release successful and debug in production while users are affected.

Release day communication

Release day communication follows a fixed sequence. Adapt the channels (Slack, email, client portal) to the project's communication agreement, but do not change the sequence.

Before deployment (release owner sends):

Confirmation that release is proceeding as scheduled (or notification of deferral with reason)
Link to deployable scope document and release notes
Expected duration and any user-visible impact (downtime, maintenance window, required user actions)
Who is the release owner and who to contact if issues arise

During deployment:

Post when deployment starts
Post when deployment completes and smoke tests begin
Post if deployment is paused, rolled back, or delayed: silence during a release is as bad as silence during an incident

After successful deployment (within 1 hour):

Confirmation that the release is live, with version number
Link to release notes
Any post-release actions for the client (clear cache, notify users, update training docs)
Mark the Jira release as shipped

After a failed deployment or rollback:

Notify stakeholders immediately with what happened and current system status
Do not wait for root cause analysis before communicating: "We rolled back to v1.3.0, production is stable, investigating" is sufficient
Follow the Observability & Incidents process if user impact occurred

Versioning and rollback (summary)

Semantic versioning: Every production release gets a vMAJOR.MINOR.PATCH tag on main after successful deployment. Sprint releases increment MINOR. Hotfixes increment PATCH. See Versioning and tagging for full rules.
Rollback: Rollback is the first response to post-deployment user impact, not a last resort. The release owner reviews the rollback procedure before every production deploy. See Rollback procedures for execution steps.

Release checklist

This checklist is prescriptive. Release day is not the time for judgment calls about what can be skipped.

Before triggering production deployment:

After production deployment:

Smoke tests pass in production (critical user flows verified manually or via automated smoke suite).
Application health checks are green (API responds, database connected, external integrations reachable).
Error rates in monitoring are at or below pre-deployment baseline for at least 30 minutes.
Stakeholders are notified of successful deployment with version number and link to release notes.
The Jira release is marked as shipped. Release notes are published.

If any post-deployment check fails, initiate the rollback procedure immediately. Do not debug in production while users are affected. Roll back, stabilise, then investigate.
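
The 30-minute error-rate check can be sketched as a small decision function. The names `ErrorRateSample` and `postDeployVerdict` are hypothetical, and the sketch assumes error rates arrive as periodic samples from monitoring. A real setup would likely smooth out noise before comparing against the baseline.

```typescript
type ErrorRateSample = { minutesAfterDeploy: number; errorRate: number };
type Verdict = "healthy" | "rollback" | "keep-watching";

// Compares post-deploy samples with the pre-deployment baseline.
function postDeployVerdict(
  baselineErrorRate: number,
  samples: ErrorRateSample[],
  windowMinutes = 30,
): Verdict {
  const inWindow = samples.filter((s) => s.minutesAfterDeploy <= windowMinutes);
  // Any sample above baseline fails the check: roll back first, investigate later.
  if (inWindow.some((s) => s.errorRate > baselineErrorRate)) return "rollback";
  // Only declare success once the full window has been observed.
  const windowCovered = inWindow.some((s) => s.minutesAfterDeploy >= windowMinutes);
  return windowCovered ? "healthy" : "keep-watching";
}
```

The function has no "probably fine" outcome: until the window is covered the release is still being watched, and one breach is enough to trigger rollback.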

Versioning and tagging

S&P projects use semantic versioning (vMAJOR.MINOR.PATCH):

MAJOR: Breaking changes to the API or user-facing behaviour. Rare for most S&P projects.
MINOR: New features, non-breaking. This is the typical sprint release increment.
PATCH: Bug fixes and hotfixes.
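
The increment rules are mechanical enough to sketch in a few lines. `ReleaseKind` and `nextVersion` are hypothetical names used for illustration. The sketch assumes tags always follow the vMAJOR.MINOR.PATCH form with no pre-release suffix.

```typescript
type ReleaseKind = "breaking" | "sprint" | "hotfix";

// Computes the next tag from the current production tag.
function nextVersion(current: string, kind: ReleaseKind): string {
  const match = /^v(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (!match) throw new Error(`Not a vMAJOR.MINOR.PATCH tag: ${current}`);
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  switch (kind) {
    case "breaking":
      return `v${major + 1}.0.0`; // MAJOR resets MINOR and PATCH
    case "sprint":
      return `v${major}.${minor + 1}.0`; // MINOR resets PATCH
    case "hotfix":
      return `v${major}.${minor}.${patch + 1}`;
  }
}
```

For example, a sprint release after `v1.3.2` becomes `v1.4.0`, and a hotfix on top of that becomes `v1.4.1`.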

Tags are created on the main branch after a successful production deployment, not before. The tag marks the exact commit that is running in production.

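# After successful production deployment.
# Replace <deployed-sha> with the commit SHA of the image now running in production.
git tag -a v1.4.0 -m "Sprint 12 release: user dashboard, notification preferences" <deployed-sha>
git push origin v1.4.0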

The tag message should be a brief summary of what's in the release, not a changelog, but enough context that someone reading the tag list can understand what each release contained.

For projects that produce client-facing changelogs, maintain a CHANGELOG.md following Keep a Changelog format. Update it as part of the release preparation, not retroactively.

Database migration strategy

Database migrations are the riskiest part of most deployments. A migration that locks a table, drops a column, or corrupts data can take down production in ways that a code rollback cannot fix. Treat migrations with more caution than code changes.

ORM migration tools:

S&P projects use TypeORM or Prisma with PostgreSQL. Both provide migration frameworks:

Tool	Generate command	Generated from	Apply command
TypeORM	`typeorm migration:generate`	Entity changes	`typeorm migration:run`
Prisma	`prisma migrate dev`	Schema changes	`prisma migrate deploy`

Migration rules:

Migrations run before the application starts. The deployment pipeline executes migrations as a separate step before rolling out new application instances. Never rely on application startup to run migrations: if two instances start simultaneously, you get a race condition.
Migrations must be backward-compatible. The new code and the old code must both work with the migrated database schema. This is critical for zero-downtime deployments where old and new instances coexist during the rollout.

Backward-compatible migrations follow this pattern:
- Adding a column: add it as nullable or with a default value.
- Removing a column: stop reading it in the code first (deploy), then remove it in the next release.
- Renaming a column: add the new column, deploy code that writes to both, migrate data, deploy code that reads from the new column, drop the old column. This takes 2-3 releases.
- Adding a constraint: add it as NOT VALID first, then validate in a separate migration.
Never run destructive migrations without a backup. Before any migration that drops tables, columns, or modifies data, ensure a point-in-time database backup exists and has been verified.
Test migrations on staging with production-like data volumes. A migration that takes 100ms on a dev database with 10 rows can lock a production table with 10 million rows for minutes.
Migrations are versioned and sequential. Never modify a migration that has already been applied to staging or production. If you need to change something, write a new migration.
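
To make the "add a constraint as NOT VALID first" pattern concrete, here is a minimal sketch of the two migrations. It is not tied to TypeORM or Prisma: `QueryRunnerLike` is a hypothetical stand-in for whatever query runner the ORM passes to a migration, and the `orders` table and constraint name are invented for the example. The SQL is standard PostgreSQL.

```typescript
// Hypothetical stand-in for the ORM's query runner.
interface QueryRunnerLike {
  query(sql: string): Promise<unknown>;
}

// Migration 1: add the constraint without checking existing rows.
// New and updated rows are enforced immediately.
class AddAmountPositiveCheck {
  async up(runner: QueryRunnerLike): Promise<void> {
    await runner.query(
      "ALTER TABLE orders ADD CONSTRAINT orders_amount_positive CHECK (amount > 0) NOT VALID",
    );
  }
  async down(runner: QueryRunnerLike): Promise<void> {
    await runner.query("ALTER TABLE orders DROP CONSTRAINT orders_amount_positive");
  }
}

// Migration 2 (a separate, later migration): validate existing rows.
class ValidateAmountPositiveCheck {
  async up(runner: QueryRunnerLike): Promise<void> {
    await runner.query("ALTER TABLE orders VALIDATE CONSTRAINT orders_amount_positive");
  }
  async down(): Promise<void> {
    // Nothing to undo: dropping the constraint belongs to migration 1's down().
  }
}
```

Splitting the work keeps the expensive scan of existing rows out of the step that adds the constraint, so the first migration stays short even on a large table.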

Pipeline integration. The migration runs as a Cloud Run job (or equivalent one-off task) that completes before the application instances are updated. If the migration fails, the deployment stops: old application instances continue serving traffic with the old schema. For the pipeline step, see DevOps Reference -- Database migrations in the pipeline.

Deployment strategies

The deployment strategy determines how new code replaces old code in a running environment. The right strategy depends on the risk tolerance and the infrastructure.

Rolling update (default for most S&P projects)

New instances start alongside old instances. Traffic shifts gradually as new instances pass health checks and old instances drain. If a new instance fails its health check, the rollout stops automatically.

This is the default on Google Cloud Run, Kubernetes, and most container orchestrators. It works well when:

Migrations are backward-compatible (old and new code coexist during rollout).
Health checks are meaningful (not just "the process started" but "the app can serve requests").
The rollout can be monitored in real time.
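
A "meaningful" health check verifies dependencies, not just that the process is alive. The sketch below shows one way to structure it. `Probe` and `readiness` are hypothetical names, and the probes themselves (database ping, integration ping) are left to the project.

```typescript
type Probe = { name: string; check: () => Promise<boolean> };
type HealthReport = { healthy: boolean; failed: string[] };

// The instance is ready only when every dependency probe succeeds.
async function readiness(probes: Probe[]): Promise<HealthReport> {
  const results = await Promise.all(
    probes.map(async (probe) => {
      try {
        return { name: probe.name, ok: await probe.check() };
      } catch {
        // A probe that throws counts as a failed dependency.
        return { name: probe.name, ok: false };
      }
    }),
  );
  const failed = results.filter((r) => !r.ok).map((r) => r.name);
  return { healthy: failed.length === 0, failed };
}
```

An HTTP health endpoint would typically return a non-2xx status when `healthy` is false, so the orchestrator stops the rollout instead of shifting traffic to a broken instance.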

Blue-green deployment (for zero-downtime, instant rollback)

Two identical environments exist: blue (current production) and green (new version). Traffic switches from blue to green atomically. If something goes wrong, switch back to blue.

Use blue-green when:

The client requires zero-downtime deployments with instant rollback.
The application cannot tolerate mixed-version traffic during a rolling update.
The project justifies the cost of running two production environments.

Blue-green is more expensive (double infrastructure during deployment) and more complex (database migrations must be compatible with both versions). Don't use it by default: use it when rolling updates are insufficient.

Feature flags (decoupling deploy from release)

Feature flags allow code to be deployed without being activated. This separates "the code is in production" from "the feature is available to users."

// Feature flag check: the flag provider determines who sees the feature
if (featureFlags.isEnabled('new-dashboard', { userId: user.id })) {
  return this.newDashboardService.getData(user.id);
}
return this.dashboardService.getData(user.id);

Use feature flags when:

A feature spans multiple sprints and you want to merge incrementally without exposing incomplete work.
A feature needs gradual rollout (percentage-based, by user segment, by client).
A feature needs a kill switch: the ability to disable it in production without a deployment.

Feature flag tooling options for S&P projects: LaunchDarkly (managed, full-featured), Unleash (self-hosted, open source), or a simple database-backed implementation for projects that only need basic on/off flags.
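
For the "simple implementation" end of that range, a flag provider can be very small. The sketch below keeps rules in memory and supports an on/off switch plus percentage rollout. `FlagRule`, `bucketFor`, and `SimpleFeatureFlags` are hypothetical names, and a real version would load rules from the database rather than a constructor argument.

```typescript
type FlagRule = { enabled: boolean; rolloutPercentage: number };

// FNV-1a-style hash so the same user always lands in the same bucket (0-99).
function bucketFor(flag: string, userId: string): number {
  const input = `${flag}:${userId}`;
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

class SimpleFeatureFlags {
  private readonly rules: Map<string, FlagRule>;

  constructor(rules: Map<string, FlagRule>) {
    this.rules = rules;
  }

  isEnabled(flag: string, context: { userId: string }): boolean {
    const rule = this.rules.get(flag);
    // Unknown flags and switched-off flags are always off (the kill switch).
    if (!rule || !rule.enabled) return false;
    return bucketFor(flag, context.userId) < rule.rolloutPercentage;
  }
}
```

Hashing the flag name together with the user ID means a user's bucket differs between flags, so the same users are not always first in line for every rollout.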

Clean up flags after rollout. Feature flags that live forever become dead code and conditional complexity. When a feature is fully rolled out and stable, remove the flag and the old code path. Track flag cleanup as a tech debt item with a target date.

Rollback procedures

Rollback is not a last resort. It is the first response to any post-deployment issue that affects users.

When to roll back:

Error rates spike above the pre-deployment baseline.
Health checks fail on new instances.
Users report critical functionality is broken.
Monitoring alerts fire for issues that didn't exist before deployment.

Do not attempt to debug and fix forward while users are affected. Roll back first, stabilise, then investigate. The only exception is when the rollback itself would cause more damage than the issue (e.g., a data migration that cannot be reversed).

Application rollback (the common case). Redeploy the previous image tag, or re-run the last successful production deployment in your CI platform. For the exact commands, see DevOps Reference -- Rollback commands.

Application rollback is fast because the previous container image still exists in the registry. This is why image tags use commit SHAs: you always know exactly which version to roll back to.

Database rollback (the dangerous case):

If a database migration needs to be reversed:

Check if the migration has a down method. TypeORM migrations generate up and down methods. Prisma migrations do not generate rollback SQL by default: write it manually for any non-trivial migration.
Test the rollback migration on staging first. Never run an untested rollback migration against production.
If no rollback migration exists, restore from backup using point-in-time recovery to the moment before the migration ran. This is why pre-migration backups are non-negotiable.

Database rollbacks are inherently riskier than application rollbacks. This is why backward-compatible migrations are so important: if the migration is backward-compatible, you can roll back the application without touching the database.

Document every rollback. After a rollback, write a brief incident note: what was deployed, what went wrong, when the rollback was initiated, how long users were affected, and what will change to prevent recurrence. This feeds into the Observability & Incidents process.

Artifact management

Container images are stored in Artifact Registry, or in Google Container Registry (GCR) on older projects. Google has deprecated GCR in favour of Artifact Registry, so new projects should use Artifact Registry. Every image is tagged with the commit SHA. Production images are retained for at least 90 days to support rollbacks. Non-production images can be garbage-collected more aggressively (30 days).
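
The retention rule can be expressed as a small predicate, for example inside a scheduled cleanup script. `ImageRecord` and `isEligibleForCleanup` are hypothetical names. In practice the registry's own cleanup policies may be the better tool, and any cleanup should also skip the image currently deployed.

```typescript
type ImageRecord = {
  tag: string;
  environment: "production" | "non-production";
  pushedAt: Date;
};

// Retention periods from the policy above, in days.
const RETENTION_DAYS = { production: 90, "non-production": 30 } as const;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the image is older than the retention period for its environment.
function isEligibleForCleanup(image: ImageRecord, now: Date): boolean {
  const ageDays = (now.getTime() - image.pushedAt.getTime()) / MS_PER_DAY;
  return ageDays > RETENTION_DAYS[image.environment];
}
```

A 61-day-old image is kept if it is a production image and eligible for cleanup if it is not.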

Build artifacts (coverage reports, test results, security scan outputs) are stored as CI platform artifacts (Bitbucket Pipelines artifacts, CircleCI workspaces/artifacts, GitHub Actions artifacts). Retention depends on the platform and plan: default is often 14-30 days. For audit purposes, security scan results should also be archived to longer-term storage.

npm packages (for shared internal libraries): Use a private registry (npm organization, GitHub Packages, or Artifact Registry). Publish from CI, never from a developer's machine. The CI pipeline is the only path from code to artifact.

Pipeline security

The CI/CD pipeline has access to credentials that can deploy code to production. It is a high-value target. Treat pipeline security with the same rigour as application security.

Secrets management in CI:

Store secrets in the CI platform's secrets management, scoped to the appropriate environment: Bitbucket deployment variables, CircleCI contexts/project variables, or GitHub Actions secrets/environments. Never hardcode secrets in pipeline config files (bitbucket-pipelines.yml, .circleci/config.yml, .github/workflows/*.yml).
Use environment-scoped secrets for environment-specific values (production database URL, API keys). On Bitbucket Pipelines, these are only available to steps with the matching deployment directive. CircleCI and GitHub Actions have equivalent environment/context scoping.
For projects using GCP, authenticate via Workload Identity Federation (keyless) rather than long-lived service account keys. A service account key is itself a secret that can leak.
Review pipeline variable access periodically. When a team member leaves the project, audit which secrets they had access to.

SAST and dependency scanning in CI:

The security scan pipeline defined in Security runs as part of the CI process. This includes:

pnpm audit for known npm vulnerabilities
Trivy for dependency scanning and container image scanning
Semgrep for static application security testing
Gitleaks for secret detection in commits

These gates run on every PR. A failing security scan blocks the merge. See Security -- Automated security pipeline for configuration details.

Pipeline permissions:

Production deployment steps require manual approval. No automated process should deploy to production without a human decision.
Branch permissions protect main, staging, and development from direct pushes. All changes go through pull requests.
Pipeline configuration changes (bitbucket-pipelines.yml, .circleci/config.yml, .github/workflows/*.yml) should be reviewed with the same scrutiny as application code: a malicious pipeline change can exfiltrate secrets or deploy compromised code.

Performance budgets in CI

Performance regressions are deployment bugs. Catching them in CI is cheaper than catching them in production.

Bundle size checks (frontend). Configure bundle size limits in package.json or a bundlesize config file. The CI step fails if any bundle exceeds its budget. Budgets should be based on real performance targets: a 200KB JavaScript budget is not arbitrary if it's derived from a 3G mobile loading time target.

For Next.js projects, @next/bundle-analyzer generates a visual breakdown of what's in each bundle. Use it to investigate when a budget is exceeded.
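
The logic behind a bundle budget check is simple enough to sketch directly. This is not the bundlesize tool's API; `BundleBudget` and `budgetViolations` are hypothetical names, and the bundle names are invented for the example.

```typescript
type BundleBudget = { name: string; maxBytes: number };

// Returns one message per violated budget; an empty array means the step passes.
function budgetViolations(sizes: Map<string, number>, budgets: BundleBudget[]): string[] {
  const violations: string[] = [];
  for (const budget of budgets) {
    const actual = sizes.get(budget.name);
    if (actual === undefined) {
      // A missing bundle usually means the build output changed: fail loudly.
      violations.push(`${budget.name}: no build output found`);
    } else if (actual > budget.maxBytes) {
      violations.push(`${budget.name}: ${actual} bytes exceeds budget of ${budget.maxBytes} bytes`);
    }
  }
  return violations;
}
```

A CI step would exit non-zero whenever the returned array is non-empty, so raising a budget has to be an explicit, reviewed change.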

Lighthouse CI (for web applications):

Run Lighthouse in CI to catch performance, accessibility, and SEO regressions, with minimum score thresholds that fail the build below target. For the bundlesize and Lighthouse CI config, see DevOps Reference -- Performance budgets.

Performance budgets are not one-time configurations. As the application grows, budgets may need adjustment, but always with a conscious decision and a documented reason, not a silent increase. If a budget is consistently exceeded, investigate the cause rather than raising the limit.

API response time checks (backend):

For backend services, include response time assertions in integration tests. If a critical endpoint's P95 response time exceeds 500ms in the test environment, that's worth investigating before it becomes a production problem. This isn't a replacement for production monitoring (see Observability); it's an early warning system.
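
A P95 assertion needs a percentile calculation over measured response times. The sketch below uses the nearest-rank method, which is one of several common percentile definitions. `percentile` and `exceedsP95Budget` are hypothetical helpers, and the samples would come from timing repeated requests inside the integration test.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("No samples to measure");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error(`Percentile out of range: ${p}`);
  return value;
}

// True when the endpoint's P95 is above the budget (500ms by default).
function exceedsP95Budget(samplesMs: number[], budgetMs = 500): boolean {
  return percentile(samplesMs, 95) > budgetMs;
}
```

With 20 samples, one slow outlier does not breach the budget but two do, which is the kind of tolerance a P95 target is meant to express.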

Multi-project and monorepo pipelines

Some S&P projects use a monorepo structure (API + Web + shared libraries in one repository). In a monorepo, the pipeline should be selective: don't rebuild and retest the frontend when only backend code changed.

Change detection. Filter pipeline steps by changed path so that each application is built and tested only when its own code changes. For the path-filtering config, see DevOps Reference -- Monorepo change detection.

For shared library changes, rebuild and test all consuming applications. A change to packages/shared/ should trigger both the API and web pipelines. Keep pipeline configuration DRY by extracting common steps into YAML anchors: a monorepo with three applications should not have three copies of the same lint step.
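
The path-based selection can be sketched as a function from changed files to affected applications. The `apps/api/` and `apps/web/` directory names are assumptions about the repository layout (only `packages/shared/` comes from the text above), and `affectedApps` is a hypothetical helper. CI platforms provide their own path-filter syntax for the real pipeline.

```typescript
type App = "api" | "web";

// Maps changed file paths to the applications whose pipelines must run.
function affectedApps(changedFiles: string[]): App[] {
  const affected = new Set<App>();
  for (const file of changedFiles) {
    if (file.startsWith("packages/shared/")) {
      // Shared library change: every consuming application is affected.
      affected.add("api");
      affected.add("web");
    } else if (file.startsWith("apps/api/")) {
      affected.add("api");
    } else if (file.startsWith("apps/web/")) {
      affected.add("web");
    }
  }
  return Array.from(affected).sort();
}
```

In this sketch, files outside the known directories trigger nothing. A more cautious variant would rebuild everything when root-level files such as the lockfile or pipeline config change.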

Critical thinking

When to invest in pipeline sophistication

A two-person project that ships a single service does not need blue-green deployments, canary releases, and a feature flag platform. Start with the basics (lint, test, build, deploy with rolling updates) and add complexity when you have evidence that the basics are insufficient.

Signs that you need more:

Rollbacks happen frequently and take too long (consider blue-green).
Features span multiple sprints and block other work (consider feature flags).
The test suite takes 20+ minutes and developers avoid running it (consider parallelisation, selective testing).
Deployment failures are hard to diagnose (consider better health checks, structured deployment logs).

Pipeline as code vs pipeline UI

The pipeline definition lives in a version-controlled config file in the repository (bitbucket-pipelines.yml, .circleci/config.yml, or .github/workflows/*.yml), not in a web UI. This is non-negotiable. Pipeline-as-code means the pipeline is versioned, reviewable, and reproducible. If the pipeline breaks, you can git blame to find when and why.

Some CI platforms tempt you with visual pipeline editors or UI-only configuration. Resist. UI-configured pipelines can't be code-reviewed, can't be branched, and can't be audited. The only pipeline configuration that should exist outside the repo is secrets (stored in the CI platform's secrets management).

The flaky test problem

Flaky tests (tests that pass and fail intermittently without code changes) erode trust in the pipeline. When the pipeline fails randomly, the team learns to ignore failures and retry until it passes. This is exactly the behaviour that lets real bugs through.

When a test is flaky:

Quarantine it immediately: move it to a separate test suite that doesn't block the pipeline.
File a ticket to fix it within the current sprint. Flaky tests are bugs, not tech debt.
Fix the root cause (race condition, time-dependent logic, external service dependency, shared test state).
Move it back to the main suite.

Never "fix" a flaky test by adding retries to the test itself. Retries mask the problem and make the test suite slower.
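Spotting candidates for quarantine can be partly automated. The sketch below assumes you can export test results as (commit, test name, passed) records, which is a hypothetical format, and flags any test that both passed and failed on the same commit, matching the definition of flakiness above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Run = Tuple[str, str, bool]  # (commit SHA, test name, passed)


def flaky_tests(runs: Iterable[Run]) -> Set[str]:
    """Tests with both a pass and a fail recorded against the same commit."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    # Two distinct outcomes for identical code means the result is not deterministic.
    return {test for (_commit, test), seen in outcomes.items() if len(seen) == 2}


# Hypothetical history: the checkout test flips on commit abc123 without a code change.
history = [
    ("abc123", "test_checkout", True),
    ("abc123", "test_checkout", False),
    ("abc123", "test_login", True),
    ("def456", "test_login", False),  # failed on a different commit: not flaky by this rule
]
print(sorted(flaky_tests(history)))
```

This only surfaces flakiness that shows up across reruns of the same commit; it says nothing about the root cause, which still needs the investigation described in the steps above.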

Build time discipline

CI minutes cost money (Bitbucket Pipelines, CircleCI, and GitHub Actions all bill by usage on paid tiers) and they cost the team's attention. A 20-minute pipeline means 20 minutes between "I pushed" and "I know if it's okay." That delay compounds across the team and across the day.

Keep the pipeline fast:

Use dependency caching aggressively (pnpm cache, Docker layer cache).
Run independent stages in parallel where the platform supports it.
Avoid installing unnecessary dependencies. If the lint step doesn't need sharp or puppeteer, don't install them.
Use Docker multi-stage builds to keep the build context small.
For monorepos, only rebuild what changed.

If the pipeline consistently takes longer than 15 minutes for a PR check, treat it as a problem to solve, not a cost of doing business.
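Dependency caching works because the cache key is derived from the lockfile: the same lockfile gives the same key and a cache hit, and any dependency change produces a new key. CI platforms provide this natively, so the sketch below only illustrates the idea; the function name and the key format are hypothetical.

```python
import hashlib


def cache_key(lockfile_bytes: bytes, prefix: str = "deps") -> str:
    """Derive a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{digest}"


# Hypothetical lockfile contents, for illustration only.
before = b"lockfileVersion: '9.0'\nleft-pad: 1.3.0\n"
after = b"lockfileVersion: '9.0'\nleft-pad: 1.3.1\n"

print(cache_key(before) == cache_key(before))  # unchanged lockfile: cache hit
print(cache_key(before) == cache_key(after))   # changed dependency: cache miss
```

If builds reinstall from scratch despite an unchanged lockfile, the key is usually being derived from something that changes every run, such as a timestamp or commit SHA.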

Environment parity

Staging should mirror production as closely as possible. The most common deployment failures come from differences between environments: a missing environment variable, a different database version, a service that exists in production but not in staging.

But perfect parity is expensive. A pragmatic approach: use the same Terraform modules for both environments, the same Docker images, and the same pipeline. Allow differences only in scale (fewer instances, smaller databases) and in data (synthetic vs real). Document every known difference between staging and production. If it's documented, it's a conscious trade-off; if it's not, it's a latent failure.
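One way to keep that documentation honest is to compare the two environments' settings and flag anything that differs without being on the documented list. The sketch below is a minimal illustration, assuming each environment's non-secret configuration can be exported as a flat key-value mapping; all names and values are hypothetical.

```python
from typing import Dict, Set

_MISSING = object()  # sentinel so an absent key never equals a real value


def undocumented_differences(
    staging: Dict[str, str],
    production: Dict[str, str],
    documented: Set[str],
) -> Set[str]:
    """Keys that differ between environments and are not documented as intentional."""
    keys = set(staging) | set(production)
    differing = {
        key
        for key in keys
        if staging.get(key, _MISSING) != production.get(key, _MISSING)
    }
    return differing - documented


# Hypothetical settings, for illustration only.
staging = {"INSTANCE_COUNT": "1", "DB_VERSION": "15", "FEATURE_X": "on"}
production = {"INSTANCE_COUNT": "4", "DB_VERSION": "16", "PAYMENTS_URL": "https://example.invalid"}
documented = {"INSTANCE_COUNT"}  # scale differences are an accepted trade-off

print(sorted(undocumented_differences(staging, production, documented)))
```

Here the database version mismatch and the two variables present in only one environment are reported, while the documented difference in instance count is not.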

Checklist

For every project

For every release

For every sprint

Pipeline build time is under 15 minutes for PR checks.
No new flaky tests have been added (existing flaky tests are being addressed).
Dependency caches are functioning (not reinstalling from scratch every build).
Pipeline variable access has been reviewed if team membership changed.

AI tips

Generate pipeline configuration. Describe your project structure (NestJS monorepo, Next.js frontend, PostgreSQL), CI platform (Bitbucket Pipelines, CircleCI, or GitHub Actions), and deployment target (Cloud Run, Vercel). Ask AI to generate the pipeline config. AI handles YAML syntax well but often misses platform-specific features: deployment environments, caches, OIDC auth, and artifact passing between steps need careful review.
Debug pipeline failures. Paste the failing pipeline log and ask AI to identify the root cause. AI is good at parsing dense CI output, spotting version mismatches, and recognising common issues (missing environment variables, Docker build context problems, permission errors).
Write database migration rollback scripts. Describe the forward migration and ask AI to generate the corresponding rollback SQL. AI is reliable for simple migrations (add/drop column, create index) but needs careful review for data migrations where the reverse operation may lose information.
Optimise Docker build times. Share your Dockerfile and ask AI to identify layer caching opportunities, unnecessary COPY operations, and multi-stage build improvements. AI consistently catches common Docker antipatterns (copying node_modules before package.json, missing .dockerignore).
Draft release notes from git history. Provide the git log between two release tags and ask AI to categorise changes (features, fixes, internal improvements) and generate human-readable release notes. Review for accuracy. AI may misclassify a refactor as a feature or miss the user-facing impact of a change.
Draft deployable scope and test plan. Provide the Jira release version, list of merged PRs, and Qase test run summary. Ask AI to generate a deployable scope document (what's in, what's out, migration notes, env changes) and a test plan outline. Review for accuracy. AI won't know about verbal deferrals or untested areas unless you tell it.
Review pipeline security. Paste your pipeline YAML and ask AI to audit it for security issues: exposed secrets, missing environment scoping, overly broad permissions, unsigned artifacts. This is a useful pre-review check before the pipeline goes through code review.

Resources

S&P internal:

S&P Git Workflow (Confluence) -- Existing branching strategy documentation
Engineering-forward backend template -- Pipeline configuration templates
Source Control -- Branching strategy and merge conventions
Security -- Automated security pipeline -- SAST, SCA, and secret scanning configuration
Testing Strategy -- Test suite structure that feeds the CI test stage
Observability & Incidents -- Post-deployment monitoring and incident response

Industry references:

12-Factor App -- Foundational principles for cloud-deployed applications (particularly III: Config, V: Build/release/run, X: Dev/prod parity)
Microsoft Code-with-Engineering Playbook -- CI/CD -- Engineering best practices for CI/CD
Bitbucket Pipelines documentation
CircleCI documentation
GitHub Actions documentation
Keep a Changelog -- Changelog formatting standard
Semantic Versioning -- Versioning convention
k6 -- Load testing tool (referenced in pipeline performance testing)
LaunchDarkly -- Feature flag management platform
Unleash -- Open-source feature flag platform
