CI/CD & Release Process
A deployment pipeline is a bet on repeatability. Every manual step (a hand-edited config, a whispered "don't forget to run migrations") is a bet against it. The pipeline should encode everything the team knows about shipping safely, so that releasing is a boring, predictable event rather than a high-stakes ceremony.
Why this matters
Releasing software is where Integrity meets the real world. The code can be clean, the tests can pass, the architecture can be sound, but if the path from merge to production is fragile, manual, or poorly understood, none of that matters when something goes wrong at 5pm on a Friday. A well-built pipeline means the team ships with confidence at the end of every sprint, and every release is traceable, reversible, and auditable.
CI/CD is also where Teamwork becomes structural. When the pipeline is the single path to production, every developer follows the same process. There are no shortcuts, no "just this once" manual deploys, no tribal knowledge about which buttons to press. The pipeline is the shared understanding of how software gets released.
The standard
Pipeline platform
Bitbucket Pipelines and CircleCI are the primary CI/CD platforms for S&P projects. Bitbucket Pipelines integrates directly with Bitbucket repositories. CircleCI is used increasingly, especially for projects that need more complex pipeline orchestration, better parallelism, or GCP OIDC integration. GitHub Actions is used for projects hosted on GitHub (including some client projects). GitLab CI applies to GitLab-hosted client projects.
Regardless of platform, pipeline configuration lives in version-controlled files alongside the code, never in a web UI:
| Platform | Config file |
|---|---|
| Bitbucket Pipelines | bitbucket-pipelines.yml |
| CircleCI | .circleci/config.yml |
| GitHub Actions | .github/workflows/*.yml |
| GitLab CI | .gitlab-ci.yml |
When setting up a project on a different platform (client GitHub/GitLab, or choosing CircleCI over Bitbucket Pipelines), translate the S&P pipeline stage by stage; don't redesign it:
| S&P default | Equivalent on other platforms |
|---|---|
| Bitbucket Pipelines / CircleCI | GitHub Actions / GitLab CI |
| bitbucket-pipelines.yml / .circleci/config.yml | .github/workflows/*.yml / .gitlab-ci.yml |
| Pipeline variables / CircleCI contexts | Repository secrets / CI/CD variables |
| Deployment environments | GitHub environments / GitLab environments |
The pipeline stages, quality gates, and deployment strategy remain the same regardless of platform. Only the YAML syntax and platform-specific features (caching, parallelism, OIDC) change.
Pipeline architecture
Every S&P project follows a four-stage pipeline. Stages run sequentially because each stage validates a prerequisite for the next. There is no value in building an artifact that fails linting, or deploying an artifact that fails tests.
lint --> test --> build --> deploy
Stage 1: Lint
Run static analysis and formatting checks. This catches issues that don't require the application to compile or run.
# bitbucket-pipelines.yml (excerpt: same stages apply in .circleci/config.yml and GitHub Actions workflows)
lint:
  step:
    name: Lint & format check
    caches:
      - node
    script:
      - pnpm install --frozen-lockfile
      - pnpm lint
      - pnpm format:check
      - pnpm typecheck
This stage includes:
- ESLint with the project's ruleset (see Code Standards)
- Prettier format verification (not auto-fix: CI should verify, not modify)
- TypeScript type checking (tsc --noEmit)
- Commitlint for commit message format (if configured)
Stage 2: Test
Run the automated test suite. This is where the Testing Strategy pays off.
test:
  step:
    name: Test
    caches:
      - node
    services:
      - postgres
    script:
      - pnpm install --frozen-lockfile
      - pnpm test:ci
      - pnpm test:e2e:ci
    artifacts:
      - coverage/**
For backend services (NestJS), this runs unit tests and integration tests against a PostgreSQL service container. For frontend applications (React/Next.js), this runs component tests and any Playwright or Cypress E2E tests. Coverage reports are stored as artifacts for later review.
If the test suite takes longer than 10 minutes, investigate. Long test suites slow the feedback loop and encourage developers to skip running tests locally. Common culprits: unnecessary database resets between tests, test files that import the entire application, serial execution of tests that could run in parallel.
Stage 3: Build
Build the deployment artifact. For S&P projects, this means a Docker container image.
build:
  step:
    name: Build & push image
    services:
      - docker
    script:
      - export IMAGE_TAG="${BITBUCKET_COMMIT:0:8}" # CircleCI: ${CIRCLE_SHA1:0:8}; GitHub Actions: ${GITHUB_SHA:0:8}
      - docker build -t ${REGISTRY}/${APP_NAME}:${IMAGE_TAG} .
      - docker push ${REGISTRY}/${APP_NAME}:${IMAGE_TAG}
The image tag is the short commit SHA. This creates a direct, unambiguous link between the code and the artifact. Never use latest as a deployment tag: it tells you nothing about what is actually running.
Build artifacts are immutable. The same image that passes tests in CI is the image that deploys to staging and then to production. If you need different configuration per environment, use environment variables, not different builds.
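The tagging rule can be sketched in TypeScript as follows. This is a minimal illustration, not S&P tooling: `buildImageRef` and the registry path in the example are hypothetical names. It derives the tag from the first eight characters of the commit SHA and rejects anything that is not a SHA, which rules out mutable tags such as latest.

```typescript
// Hypothetical helper: builds an immutable image reference from a commit SHA.
function buildImageRef(registry: string, appName: string, commitSha: string): string {
  // Accept only full or abbreviated hex SHAs, so "latest" or a branch name is rejected.
  if (!/^[0-9a-f]{8,40}$/i.test(commitSha)) {
    throw new Error(`Expected a commit SHA of 8-40 hex characters, got "${commitSha}"`);
  }
  // Same 8-character prefix the pipeline uses for IMAGE_TAG.
  const imageTag = commitSha.slice(0, 8).toLowerCase();
  return `${registry}/${appName}:${imageTag}`;
}

// Example with made-up values:
console.log(
  buildImageRef("registry.example/apps", "api", "3f9c2ab17d04e5a6b7c8d9e0f1a2b3c4d5e6f708"),
);
```

Because the tag is computed from the commit alone, staging and production resolve to the same image for a given commit; only environment variables differ.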
Stage 4: Deploy
Deploy the artifact to the target environment. The deploy stage is triggered differently depending on the branch:
| Branch | Deploys to | Trigger |
|---|---|---|
| development | Development environment | Automatic on merge |
| staging | Staging environment | Automatic on merge |
| main | Production environment | Manual trigger (with approval) |
Production deploys are never automatic. They require an explicit manual trigger in the pipeline UI, ensuring a human has decided this is the right time to release. Development and staging deploys are automatic because they're internal environments where fast feedback matters more than ceremony.
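The branch table above can be read as a small lookup plus one rule. The sketch below is illustrative only; the type and function names (`DeployTarget`, `resolveDeployTarget`, `mayDeployNow`) are assumptions, and in a real project this logic lives in the pipeline configuration rather than in application code.

```typescript
interface DeployTarget {
  environment: "development" | "staging" | "production";
  trigger: "automatic" | "manual-approval";
}

// Mirrors the branch table; in this sketch any other branch never deploys.
const DEPLOY_TARGETS = new Map<string, DeployTarget>([
  ["development", { environment: "development", trigger: "automatic" }],
  ["staging", { environment: "staging", trigger: "automatic" }],
  ["main", { environment: "production", trigger: "manual-approval" }],
]);

function resolveDeployTarget(branch: string): DeployTarget | undefined {
  return DEPLOY_TARGETS.get(branch);
}

// A deploy may start without human action only when the target allows it.
function mayDeployNow(branch: string, approvedByHuman: boolean): boolean {
  const target = resolveDeployTarget(branch);
  if (target === undefined) return false;
  return target.trigger === "automatic" || approvedByHuman;
}

console.log(mayDeployNow("staging", false)); // automatic environment
console.log(mayDeployNow("main", false)); // production without approval
```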
Environment strategy
S&P projects use three environments. Each environment has a distinct purpose and its own infrastructure, credentials, and data.
Development: The integration environment. Code merged to the development branch deploys here automatically. This is where the team validates that features work together before they're considered release candidates. Data is synthetic, refreshed regularly. Breaking things here is expected and acceptable.
Staging: The pre-production environment. Mirrors production infrastructure (same Terraform modules, same services, same scaling configuration) but with its own database, credentials, and secrets. This is where QA validates the sprint's work, where penetration testing happens (see Security), and where the release candidate is proven. Data is synthetic or anonymized, never production data.
Production: The live environment serving real users. Deploys happen via manual pipeline trigger at the end of the sprint. Access is restricted, changes are auditable, and rollback procedures are documented and tested.
Environment isolation is a hard rule. Production credentials cannot access staging resources. Staging data never flows to development. Each environment has its own secrets manager paths, service accounts, and database instances. See Security -- Environment isolation for the full policy.
Sprint-based release cadence
S&P ships at the end of each sprint, typically every two weeks. This is a deliberate choice, not a limitation.
Why not continuous deployment? Because S&P builds client-facing products where release coordination matters. Clients need predictable release schedules for their own planning. QA needs a stable window to validate the sprint's work. And the team needs a rhythm that separates "writing code" from "shipping code" so that neither activity suffers from the distraction of the other.
The sprint release cycle:
- Sprint days 1-8: Development. Features merge to the development branch and auto-deploy to the dev environment.
- Sprint days 8-9: Code freeze on the staging branch. QA validates the release candidate on the staging environment. Bug fixes for the release candidate merge to staging directly.
- Sprint day 10 (release day): Staging is validated. The staging branch merges to main. The production deployment pipeline is triggered manually.
Code freeze does not mean all development stops. It means the staging branch is stabilised for release while new development continues on feature branches targeting the next sprint.
Hotfixes bypass the sprint cadence. When a production issue requires an immediate fix:
- Branch from main (not development: the development branch may contain unreleased work).
- Fix, test, get code review.
- Merge to main and trigger production deployment.
- Cherry-pick the fix back to development and staging to prevent regression in the next release.
Hotfixes are the exception, not the process. If we're shipping hotfixes every sprint, the testing and QA process needs attention, not the deployment pipeline.
Release day and time. The default release window is the last working day of the sprint, during business hours in the client's primary timezone. Agree the exact day and time at project kickoff and document it in the project's communication agreement: clients need predictability more than they need a specific hour. Avoid Friday afternoon releases unless the client explicitly prefers them and the team has weekend on-call coverage. Hotfixes ship when the fix is validated, not on a schedule.
Release management
Release procedures are deliberately prescriptive. The cost of ambiguity when we're shipping to production is high.
A successful release is not "the pipeline turned green." It is a coordinated event where the team, stakeholders, and client know what is shipping, how it was validated, and what to do if something goes wrong. The technical pipeline handles the deploy; release management handles everything around it.
Stakeholder alignment
Communicate before you deploy, not after something breaks.
At sprint planning (or when scope stabilises):
- Confirm which stories are in the release scope and which are deferred.
- Flag anything that needs client awareness: breaking API changes, downtime, new third-party dependencies, data migrations, or UX changes that affect training materials.
3-5 business days before release day:
- Share the deployable scope document (see below) with the project DM, client PO, and any other stakeholders listed in the project's communication agreement.
- Confirm the release date and time still work. If scope has grown or critical bugs remain open, negotiate a deferral early, not on release morning.
- Surface known risks explicitly: "This release includes a database migration that adds a column: rollback is application-only" is the kind of detail stakeholders need.
For client-facing products: Treat release communication with the same care as incident communication. Surprises erode trust faster than a one-sprint delay.
Release documentation
Every production release produces three documents. They can live in Confluence, Jira, or the project repo; the location matters less than that they exist, are linked together, and are shared before deployment.
1. Deployable scope document
A single source of truth for what is (and is not) in this release. Create it when development merges to staging and update it until release day.
Include:
- Release version (v1.4.0) and sprint number
- Jira release version with the list of included stories (and explicitly excluded stories that were originally planned)
- Database migrations in this release and whether they are backward-compatible
- New environment variables, secrets, or infrastructure changes required in production
- Known limitations, feature flags, or partial rollouts
- Dependencies on external systems (client-side changes, third-party API updates, app store submissions)
Link this document in the project Slack channel when you share it with stakeholders.
2. Test plan
The test plan describes how the release candidate will be validated before production. It is not a wish list; it is the evidence that justifies the deploy.
Include:
- Link to the Qase test run for this release (mandatory on all S&P projects, see Testing Strategy)
- Scope of automated testing: which suites ran, on which branch, with what result
- Scope of manual and exploratory testing: bug bash date, targeted exploratory sessions, areas tested vs out of scope
- Regression coverage: which critical user flows were verified on staging
- Sign-off criteria and who provides sign-off (QA engineer, developer wearing the QA hat, or client UAT where applicable)
- Known gaps: anything not tested and why (e.g., "payment flow not tested: no staging credentials for Stripe; verified in dev only")
The test plan should be shared with stakeholders alongside the deployable scope document. On client projects with formal UAT, the test plan is the basis for client sign-off.
3. Release notes
User-facing summary of what changed. Written for the client and their users, not for the engineering team.
Include:
- New features and improvements (what users will notice)
- Bug fixes (grouped, not one line per Jira ticket)
- Breaking changes or required user actions ("Users will need to re-authenticate after this release")
- Known issues deferred to the next sprint
Maintain a CHANGELOG.md in the repo for projects that need a persistent history. For client-facing releases, publish release notes in Confluence or deliver them via the channel agreed in the communication agreement.
Testing by environment
Each environment has a distinct testing purpose. Do not skip a layer because "it passed in dev."
| Environment | When | Who | What gets validated |
|---|---|---|---|
| Development | Automatic on merge to development | Developer who built the feature | Feature works end-to-end with realistic data; no regressions in adjacent features; environment config is correct. Automated tests (lint, unit, integration) run in CI on every PR. |
| Staging | Automatic on merge to staging; release candidate soaks for at least 24 hours | QA (or developer wearing QA hat) + team | Full release testing gates: CI green, smoke tests, bug bash or targeted exploratory testing, regression on critical flows, QA sign-off. Migrations tested with production-like data volumes. |
| Production | Manual trigger on release day | Release owner (typically the tech lead or on-call engineer) | Post-deploy smoke tests, health checks, 30-minute monitoring window. Error rates at or below pre-deployment baseline. Client communication sent. |
Development is where individual features are proven. Staging is where the sprint's work is proven as a release candidate. Production verification confirms the deploy succeeded; it is not a substitute for staging validation.
If staging and production differ in any way that affects testing (missing integration, different feature flag defaults, smaller database), document the difference in the deployable scope document. Undocumented environment differences are the most common cause of "it worked in staging" production failures.
Definition of a successful release
A release is successful when all of the following are true:
- Scope matches expectation. Everything in the deployable scope document is live in production. Nothing shipped that was not in scope (unless a hotfix was explicitly agreed).
- Validation is complete. All release testing gates passed on staging. QA sign-off is recorded. The test plan gaps are documented and accepted.
- Production is healthy. Smoke tests pass. Application health checks are green. Error rates and key business metrics are at or below the pre-deployment baseline for at least 30 minutes.
- Stakeholders are informed. Pre-release communication was sent. Post-release confirmation was sent with the release version and link to release notes.
- The release is traceable. The production deployment maps to a specific commit SHA, a semantic version tag, and a Jira release version. Release notes and the deployable scope document are archived.
- Rollback is ready. The previous production image tag is known and verified available. The team reviewed the rollback procedure before triggering the deploy.
If any criterion fails after production deployment, initiate rollback immediately (see Rollback procedures). Do not declare the release successful and debug in production while users are affected.
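The "production is healthy" criterion and the rollback rule combine into a simple decision. The sketch below assumes error-rate samples are already available from monitoring; `ErrorRateSample` and `judgeRelease` are hypothetical names, and a real check would likely allow for noise rather than compare single samples strictly.

```typescript
interface ErrorRateSample {
  minutesAfterDeploy: number; // when the sample was taken
  errorRate: number; // e.g. fraction of failed requests, 0..1
}

type ReleaseVerdict = "healthy" | "rollback" | "keep-watching";

function judgeRelease(
  baselineErrorRate: number,
  samples: ErrorRateSample[],
  windowMinutes = 30,
): ReleaseVerdict {
  // Any sample above the pre-deployment baseline: roll back first, investigate later.
  if (samples.some((sample) => sample.errorRate > baselineErrorRate)) return "rollback";
  // Healthy only once the full monitoring window has been observed.
  const observedMinutes = samples.reduce(
    (latest, sample) => Math.max(latest, sample.minutesAfterDeploy),
    0,
  );
  return observedMinutes >= windowMinutes ? "healthy" : "keep-watching";
}

console.log(judgeRelease(0.01, [{ minutesAfterDeploy: 5, errorRate: 0.05 }]));
```

The three-way result makes the waiting period explicit: a release is not declared successful merely because nothing has failed yet.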
Release day communication
Release day communication follows a fixed sequence. Adapt the channels (Slack, email, client portal) to the project's communication agreement; the sequence does not change.
Before deployment (release owner sends):
- Confirmation that release is proceeding as scheduled (or notification of deferral with reason)
- Link to deployable scope document and release notes
- Expected duration and any user-visible impact (downtime, maintenance window, required user actions)
- Who is the release owner and who to contact if issues arise
During deployment:
- Post when deployment starts
- Post when deployment completes and smoke tests begin
- Post if deployment is paused, rolled back, or delayed: silence during a release is as bad as silence during an incident
After successful deployment (within 1 hour):
- Confirmation that the release is live, with version number
- Link to release notes
- Any post-release actions for the client (clear cache, notify users, update training docs)
- Mark the Jira release as shipped
After a failed deployment or rollback:
- Notify stakeholders immediately with what happened and current system status
- Do not wait for root cause analysis before communicating: "We rolled back to v1.3.0, production is stable, investigating" is sufficient
- Follow the Observability & Incidents process if user impact occurred
Versioning and rollback (summary)
- Semantic versioning: Every production release gets a vMAJOR.MINOR.PATCH tag on main after successful deployment. Sprint releases increment MINOR. Hotfixes increment PATCH. See Versioning and tagging for full rules.
- Rollback: Rollback is the first response to post-deployment user impact, not a last resort. The release owner reviews the rollback procedure before every production deploy. See Rollback procedures for execution steps.
Release checklist
This checklist is prescriptive. Release day is not the time for judgment calls about what can be skipped.
Before triggering production deployment:
- Deployable scope document is published and shared with stakeholders.
- Test plan is published, Qase test run is linked, and QA sign-off is recorded (Jira comment, Slack message, or Confluence release note).
- Release notes are drafted and ready to publish on success.
- Stakeholders have been notified of the release date, time, and scope (~3-5 business days in advance).
- All Jira tickets in the sprint are in "Done" or explicitly moved to the next sprint.
- QA has signed off on the staging environment.
- The staging environment has been running the release candidate for at least 24 hours without regressions.
- Database migrations have been tested on staging and executed successfully.
- The release branch (staging or release branch) is merged to main.
- No unresolved critical or high-severity bugs exist for this release.
- Environment variables and secrets required for new features are configured in production.
- Release version is decided (vMAJOR.MINOR.PATCH) and CHANGELOG.md is updated.
- The team has reviewed the rollback procedure and confirmed the previous version is available.
- Release owner is identified and release day communication is prepared.
After production deployment:
- Smoke tests pass in production (critical user flows verified manually or via automated smoke suite).
- Application health checks are green (API responds, database connected, external integrations reachable).
- Error rates in monitoring are at or below pre-deployment baseline for at least 30 minutes.
- Stakeholders are notified of successful deployment with version number and link to release notes.
- The Jira release is marked as shipped. Release notes are published.
If any post-deployment check fails, initiate the rollback procedure immediately. Do not debug in production while users are affected. Roll back, stabilise, then investigate.
Versioning and tagging
S&P projects use semantic versioning (vMAJOR.MINOR.PATCH):
- MAJOR: Breaking changes to the API or user-facing behaviour. Rare for most S&P projects.
- MINOR: New features, non-breaking. This is the typical sprint release increment.
- PATCH: Bug fixes and hotfixes.
Tags are created on the main branch after a successful production deployment, not before. The tag marks the exact commit that is running in production.
# After successful production deployment
git tag -a v1.4.0 -m "Sprint 12 release: user dashboard, notification preferences"
git push origin v1.4.0
The tag message should be a brief summary of what's in the release, not a changelog, but enough context that someone reading the tag list can understand what each release contained.
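The increment rules are mechanical enough to sketch in code. `nextReleaseTag` below is a hypothetical helper, not an existing S&P script; it assumes tags always follow the vMAJOR.MINOR.PATCH form with no pre-release suffix.

```typescript
type ReleaseKind = "breaking" | "sprint" | "hotfix";

function nextReleaseTag(currentTag: string, kind: ReleaseKind): string {
  const match = /^v(\d+)\.(\d+)\.(\d+)$/.exec(currentTag);
  if (match === null) throw new Error(`Not a vMAJOR.MINOR.PATCH tag: "${currentTag}"`);
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  switch (kind) {
    case "breaking": // MAJOR bump resets MINOR and PATCH
      return `v${major + 1}.0.0`;
    case "sprint": // regular sprint release: MINOR bump resets PATCH
      return `v${major}.${minor + 1}.0`;
    case "hotfix": // hotfix: PATCH bump only
      return `v${major}.${minor}.${patch + 1}`;
  }
}

console.log(nextReleaseTag("v1.3.0", "sprint")); // v1.4.0
console.log(nextReleaseTag("v1.4.0", "hotfix")); // v1.4.1
```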
For projects that produce client-facing changelogs, maintain a CHANGELOG.md following Keep a Changelog format. Update it as part of the release preparation, not retroactively.
Database migration strategy
Database migrations are the riskiest part of most deployments. A migration that locks a table, drops a column, or corrupts data can take down production in ways that a code rollback cannot fix. Treat migrations with more caution than code changes.
ORM migration tools:
S&P projects use TypeORM or Prisma with PostgreSQL. Both provide migration frameworks:
| Tool | Generate command | How it generates | Run command |
|---|---|---|---|
| TypeORM | typeorm migration:generate | Auto-generates from entity changes | typeorm migration:run |
| Prisma | prisma migrate dev | Generates from schema changes | prisma migrate deploy |
Migration rules:
- Migrations run before the application starts. The deployment pipeline executes migrations as a separate step before rolling out new application instances. Never rely on application startup to run migrations: if two instances start simultaneously, you get a race condition.
- Migrations must be backward-compatible. The new code and the old code must both work with the migrated database schema. This is critical for zero-downtime deployments where old and new instances coexist during the rollout.
  Backward-compatible migrations follow this pattern:
  - Adding a column: add it as nullable or with a default value.
  - Removing a column: stop reading it in the code first (deploy), then remove it in the next release.
  - Renaming a column: add the new column, deploy code that writes to both, migrate data, deploy code that reads from the new column, drop the old column. This takes 2-3 releases.
  - Adding a constraint: add it as NOT VALID first, then validate in a separate migration.
- Never run destructive migrations without a backup. Before any migration that drops tables, columns, or modifies data, ensure a point-in-time database backup exists and has been verified.
- Test migrations on staging with production-like data volumes. A migration that takes 100ms on a dev database with 10 rows can lock a production table with 10 million rows for minutes. Test with realistic data volumes.
- Migrations are versioned and sequential. Never modify a migration that has already been applied to staging or production. If you need to change something, write a new migration.
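The column-rename pattern is the easiest one to get wrong, so here is one possible way to split it across releases, written as a plan generator. `planColumnRename` and `MigrationPhase` are hypothetical names, the SQL is plain PostgreSQL, and the three-release split is an assumption: a project may need a different split depending on how long old and new code must coexist.

```typescript
interface MigrationPhase {
  release: number; // which release ships this phase
  codeChange: string; // what the application does from this release on
  sql: string[]; // statements in that release's migration (may be empty)
}

function planColumnRename(
  table: string,
  oldColumn: string,
  newColumn: string,
  sqlType: string,
): MigrationPhase[] {
  return [
    {
      release: 1,
      codeChange: `write to both ${oldColumn} and ${newColumn}; keep reading ${oldColumn}`,
      // Nullable, so code from the previous release keeps working untouched.
      sql: [`ALTER TABLE ${table} ADD COLUMN ${newColumn} ${sqlType} NULL;`],
    },
    {
      release: 2,
      codeChange: `read and write ${newColumn} only`,
      // Backfill rows written before dual writes started.
      sql: [`UPDATE ${table} SET ${newColumn} = ${oldColumn} WHERE ${newColumn} IS NULL;`],
    },
    {
      release: 3,
      codeChange: "no code references the old column",
      sql: [`ALTER TABLE ${table} DROP COLUMN ${oldColumn};`],
    },
  ];
}

for (const phase of planColumnRename("users", "name", "full_name", "text")) {
  console.log(`Release ${phase.release}: ${phase.codeChange}`);
}
```

On a large table the backfill in release 2 would normally be batched; a single UPDATE is shown only to keep the sketch short.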
Pipeline integration:
# bitbucket-pipelines.yml or .circleci/config.yml (excerpt)
deploy-staging:
  step:
    name: Deploy to staging
    deployment: staging
    script:
      - gcloud run jobs execute migrate-${APP_NAME}-staging --wait
      - gcloud run services update ${APP_NAME}-staging --image=${REGISTRY}/${APP_NAME}:${IMAGE_TAG}
The migration runs as a Cloud Run job (or equivalent one-off task) that completes before the application instances are updated. If the migration fails, the deployment stops: old application instances continue serving traffic with the old schema.
Deployment strategies
The deployment strategy determines how new code replaces old code in a running environment. The right strategy depends on the risk tolerance and the infrastructure.
Rolling update (default for most S&P projects)
New instances start alongside old instances. Traffic shifts gradually as new instances pass health checks and old instances drain. If a new instance fails its health check, the rollout stops automatically.
This is the default on Google Cloud Run, Kubernetes, and most container orchestrators. It works well when:
- Migrations are backward-compatible (old and new code coexist during rollout).
- Health checks are meaningful (not just "the process started" but "the app can serve requests").
- The rollout can be monitored in real time.
Blue-green deployment (for zero-downtime, instant rollback)
Two identical environments exist: blue (current production) and green (new version). Traffic switches from blue to green atomically. If something goes wrong, switch back to blue.
Use blue-green when:
- The client requires zero-downtime deployments with instant rollback.
- The application cannot tolerate mixed-version traffic during a rolling update.
- The project justifies the cost of running two production environments.
Blue-green is more expensive (double infrastructure during deployment) and more complex (database migrations must be compatible with both versions). Don't use it by default: use it when rolling updates are insufficient.
Feature flags (decoupling deploy from release)
Feature flags allow code to be deployed without being activated. This separates "the code is in production" from "the feature is available to users."
// Feature flag check: the flag provider determines who sees the feature
if (featureFlags.isEnabled('new-dashboard', { userId: user.id })) {
  return this.newDashboardService.getData(user.id);
}
return this.dashboardService.getData(user.id);
Use feature flags when:
- A feature spans multiple sprints and you want to merge incrementally without exposing incomplete work.
- A feature needs gradual rollout (percentage-based, by user segment, by client).
- A feature needs a kill switch: the ability to disable it in production without a deployment.
Feature flag tooling options for S&P projects: LaunchDarkly (managed, full-featured), Unleash (self-hosted, open source), or a simple database-backed implementation for projects that only need basic on/off flags.
Clean up flags after rollout. Feature flags that live forever become dead code and conditional complexity. When a feature is fully rolled out and stable, remove the flag and the old code path. Track flag cleanup as a tech debt item with a target date.
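For projects that only need basic flags, the "simple implementation" option might look like the sketch below. It is an assumption-laden illustration: `SimpleFeatureFlags` and `FlagRule` are hypothetical names, rules are held in an in-memory Map standing in for a database table, and user ids are assumed to be strings. It covers the kill switch and a percentage rollout that keeps each user in a stable bucket.

```typescript
interface FlagRule {
  enabled: boolean; // kill switch: false disables the feature for everyone
  rolloutPercent: number; // 0-100, share of users who see the feature
}

// FNV-1a style 32-bit hash over UTF-16 code units: deterministic, so a user
// lands in the same bucket on every request and every instance.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class SimpleFeatureFlags {
  private readonly rules: Map<string, FlagRule>;

  constructor(rules: Map<string, FlagRule>) {
    this.rules = rules;
  }

  isEnabled(flagName: string, context: { userId: string }): boolean {
    const rule = this.rules.get(flagName);
    if (rule === undefined || !rule.enabled) return false; // unknown flags default to off
    const bucket = fnv1a32(`${flagName}:${context.userId}`) % 100; // 0..99
    return bucket < rule.rolloutPercent;
  }
}

const exampleFlags = new SimpleFeatureFlags(
  new Map([["new-dashboard", { enabled: true, rolloutPercent: 25 }]]),
);
console.log(exampleFlags.isEnabled("new-dashboard", { userId: "user-42" }));
```

Including the flag name in the hashed key means a user who falls in the first 25% for one flag is not automatically in the first 25% for every other flag.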
Rollback procedures
Rollback is not a last resort. It is the first response to any post-deployment issue that affects users.
When to roll back:
- Error rates spike above the pre-deployment baseline.
- Health checks fail on new instances.
- Users report critical functionality is broken.
- Monitoring alerts fire for issues that didn't exist before deployment.
Do not attempt to debug and fix forward while users are affected. Roll back first, stabilise, then investigate. The only exception is when the rollback itself would cause more damage than the issue (e.g., a data migration that cannot be reversed).
Application rollback (the common case):
# Redeploy the previous image tag
gcloud run services update ${APP_NAME} \
--image=${REGISTRY}/${APP_NAME}:${PREVIOUS_IMAGE_TAG} \
--region=${REGION}
# Or re-run the last successful production deployment in your CI platform (Bitbucket Pipelines, CircleCI, or GitHub Actions)
Application rollback is fast because the previous container image still exists in the registry. This is why image tags use commit SHAs: you always know exactly which version to roll back to.
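Knowing which tag to roll back to presumes a record of what was deployed. As a sketch, assuming the team keeps an ordered deploy history (the `ProductionDeploy` shape and `findRollbackTag` are hypothetical, not an existing S&P tool), the value for PREVIOUS_IMAGE_TAG is the most recent successful deploy that differs from the current one:

```typescript
interface ProductionDeploy {
  imageTag: string; // short commit SHA the image was tagged with
  succeeded: boolean; // did the deploy pass its post-deploy checks?
}

// history is ordered oldest to newest; the last entry is the release being rolled back.
function findRollbackTag(history: ProductionDeploy[]): string | undefined {
  const current = history[history.length - 1];
  for (let i = history.length - 2; i >= 0; i--) {
    const candidate = history[i];
    if (
      candidate !== undefined &&
      candidate.succeeded &&
      candidate.imageTag !== current?.imageTag
    ) {
      return candidate.imageTag;
    }
  }
  return undefined; // nothing safe to roll back to
}

console.log(
  findRollbackTag([
    { imageTag: "a1b2c3d4", succeeded: true },
    { imageTag: "e5f6a7b8", succeeded: false },
    { imageTag: "0c9d8e7f", succeeded: true },
  ]),
);
```

Failed deploys are skipped so that a rollback never lands on a version that was itself rolled back.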
Database rollback (the dangerous case):
If a database migration needs to be reversed:
- Check if the migration has a down method. TypeORM migrations generate up and down methods. Prisma migrations do not generate rollback SQL by default: write it manually for any non-trivial migration.
- Test the rollback migration on staging first. Never run an untested rollback migration against production.
- If no rollback migration exists, restore from backup. This is why pre-migration backups are non-negotiable. Use point-in-time recovery to restore to the moment before the migration ran.
Database rollbacks are inherently riskier than application rollbacks. This is why backward-compatible migrations are so important: if the migration is backward-compatible, you can roll back the application without touching the database.
Document every rollback. After a rollback, write a brief incident note: what was deployed, what went wrong, when the rollback was initiated, how long users were affected, and what will change to prevent recurrence. This feeds into the Observability & Incidents process.
Artifact management
Container images are stored in Google Container Registry (GCR) or Artifact Registry. Every image is tagged with the commit SHA. Production images are retained for at least 90 days to support rollbacks. Non-production images can be garbage-collected more aggressively (30 days).
Build artifacts (coverage reports, test results, security scan outputs) are stored as CI platform artifacts (Bitbucket Pipelines artifacts, CircleCI workspaces/artifacts, GitHub Actions artifacts). Retention depends on the platform and plan: default is often 14-30 days. For audit purposes, security scan results should also be archived to longer-term storage.
npm packages (for shared internal libraries): Use a private registry (npm organization, GitHub Packages, or Artifact Registry). Publish from CI, never from a developer's machine. The CI pipeline is the only path from code to artifact.
Pipeline security
The CI/CD pipeline has access to credentials that can deploy code to production. It is a high-value target. Treat pipeline security with the same rigour as application security.
Secrets management in CI:
- Store secrets in the CI platform's secrets management, scoped to the appropriate environment: Bitbucket deployment variables, CircleCI contexts/project variables, or GitHub Actions secrets/environments. Never hardcode secrets in pipeline config files (bitbucket-pipelines.yml, .circleci/config.yml, .github/workflows/*.yml).
- Use environment-scoped secrets for environment-specific values (production database URL, API keys). On Bitbucket Pipelines, these are only available to steps with the matching deployment directive. CircleCI and GitHub Actions have equivalent environment/context scoping.
- For projects using GCP, authenticate via Workload Identity Federation (keyless) rather than long-lived service account keys. A service account key is itself a secret that can leak.
- Review pipeline variable access periodically. When a team member leaves the project, audit which secrets they had access to.
SAST and dependency scanning in CI:
The security scan pipeline defined in Security runs as part of the CI process. This includes:
- pnpm audit for known npm vulnerabilities
- Trivy for dependency scanning and container image scanning
- Semgrep for static application security testing
- Gitleaks for secret detection in commits
These gates run on every PR. A failing security scan blocks the merge. See Security -- Automated security pipeline for configuration details.
Pipeline permissions:
- Production deployment steps require manual approval. No automated process should deploy to production without a human decision.
- Branch permissions protect main, staging, and development from direct pushes. All changes go through pull requests.
- Pipeline configuration changes (bitbucket-pipelines.yml, .circleci/config.yml, .github/workflows/*.yml) should be reviewed with the same scrutiny as application code: a malicious pipeline change can exfiltrate secrets or deploy compromised code.
Performance budgets in CI
Performance regressions are deployment bugs. Catching them in CI is cheaper than catching them in production.
Bundle size checks (frontend):
# bitbucket-pipelines.yml or .circleci/config.yml (excerpt)
performance:
  step:
    name: Performance budget
    script:
      - pnpm build
      - npx bundlesize
Configure bundle size limits in package.json or a bundlesize config file. The CI step fails if any bundle exceeds its budget. Budgets should be based on real performance targets: a 200KB JavaScript budget is not arbitrary if it's derived from a 3G mobile loading time target.
For Next.js projects, @next/bundle-analyzer generates a visual breakdown of what's in each bundle. Use it to investigate when a budget is exceeded.
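To make the "derived from a loading time target" idea concrete, here is a rough sketch of the arithmetic. The throughput, time target, and latency figures are illustrative assumptions, not measured values, and the function names are hypothetical. The point is only that a budget is throughput multiplied by the time left after fixed latency.

```typescript
// Kilobytes that fit in the time window at a given throughput, after subtracting
// fixed latency overhead (connection setup, round trips).
function deriveBudgetKB(throughputKbps: number, targetSeconds: number, latencySeconds: number): number {
  const transferSeconds = Math.max(0, targetSeconds - latencySeconds);
  const kilobits = throughputKbps * transferSeconds;
  return Math.floor(kilobits / 8); // 8 bits per byte
}

interface BundleSize {
  name: string;
  transferKB: number; // compressed size as sent over the wire
}

// Returns the bundles that exceed the budget; an empty array means the check passes.
function overBudget(bundles: BundleSize[], budgetKB: number): BundleSize[] {
  return bundles.filter((bundle) => bundle.transferKB > budgetKB);
}

// Assumed inputs: 400 kbps effective throughput, 5 s target, 1 s of latency overhead.
const exampleBudgetKB = deriveBudgetKB(400, 5, 1); // 200
```

Writing the derivation down next to the budget means a later change to the limit has to argue with the inputs (the target or the assumed network) instead of silently raising a number.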
Lighthouse CI (for web applications):
Run Lighthouse in CI to catch performance, accessibility, and SEO regressions. Configure minimum score thresholds:
{
  "ci": {
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "categories:accessibility": ["error", { "minScore": 0.9 }],
        "categories:seo": ["warn", { "minScore": 0.9 }]
      }
    }
  }
}
Performance budgets are not one-time configurations. As the application grows, budgets may need adjustment, but always with a conscious decision and a documented reason, not a silent increase. If a budget is consistently exceeded, investigate the cause rather than raising the limit.
API response time checks (backend):
For backend services, include response time assertions in integration tests. If a critical endpoint's P95 response time exceeds 500ms in the test environment, that's worth investigating before it becomes a production problem. This isn't a replacement for production monitoring (see Observability); it's an early warning system.
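A P95 assertion can be sketched as follows, assuming the integration test has already collected a list of response times in milliseconds. The helper names are hypothetical, and the nearest-rank method used here is one of several percentile definitions.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent of
// all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Throws when the P95 of the collected samples exceeds the limit.
function assertP95Within(samplesMs: number[], limitMs: number): void {
  const p95 = percentile(samplesMs, 95);
  if (p95 > limitMs) {
    throw new Error(`P95 response time ${p95}ms exceeds limit of ${limitMs}ms`);
  }
}
```

With few samples a single slow request dominates the P95, so an assertion like this is only meaningful when the test issues enough requests for the percentile to be stable.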
Multi-project and monorepo pipelines
Some S&P projects use a monorepo structure (API + Web + shared libraries in one repository). In a monorepo, the pipeline should be selective: don't rebuild and retest the frontend when only backend code changed.
Change detection:
# bitbucket-pipelines.yml (excerpt; CircleCI and GitHub Actions support equivalent path filtering)
# Only run API pipeline when API code changes
pipelines:
  pull-requests:
    '**':
      - step:
          name: Detect changes
          script:
            - |
              if git diff --name-only origin/development...HEAD | grep -q "^apps/api/"; then
                echo "API_CHANGED=true" >> .env.pipeline
              fi
              if git diff --name-only origin/development...HEAD | grep -q "^apps/web/"; then
                echo "WEB_CHANGED=true" >> .env.pipeline
              fi
For shared library changes, rebuild and test all consuming applications. A change to packages/shared/ should trigger both the API and web pipelines. Keep pipeline configuration DRY by extracting common steps into YAML anchors. A monorepo with three applications should not have three copies of the same lint step.
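The selection rule, including the shared-library case, can be expressed as a small pure function over the list of changed files. This is a sketch under the assumption that the repository uses the apps/api/, apps/web/, and packages/shared/ layout mentioned above; the type and function names are hypothetical.

```typescript
type AppName = "api" | "web";

// Path prefixes follow the layout described above; adjust to the real repository.
const APP_PREFIXES: Record<AppName, string> = {
  api: "apps/api/",
  web: "apps/web/",
};
const SHARED_PREFIX = "packages/shared/";

// Given the paths from `git diff --name-only`, return the apps whose pipelines should run.
function affectedApps(changedFiles: string[]): AppName[] {
  const apps = Object.keys(APP_PREFIXES) as AppName[];
  // A shared library change affects every consuming application.
  if (changedFiles.some((file) => file.startsWith(SHARED_PREFIX))) {
    return apps;
  }
  return apps.filter((app) =>
    changedFiles.some((file) => file.startsWith(APP_PREFIXES[app])),
  );
}
```

Keeping the rule in one function makes it easy to unit test, which matters because a wrong change-detection rule silently skips tests instead of failing loudly.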
Critical thinking
When to invest in pipeline sophistication
A two-person project that ships a single service does not need blue-green deployments, canary releases, and a feature flag platform. Start with the basics (lint, test, build, deploy with rolling updates) and add complexity when you have evidence that the basics are insufficient.
Signs that you need more:
- Rollbacks happen frequently and take too long (consider blue-green).
- Features span multiple sprints and block other work (consider feature flags).
- The test suite takes 20+ minutes and developers avoid running it (consider parallelisation, selective testing).
- Deployment failures are hard to diagnose (consider better health checks, structured deployment logs).
Pipeline as code vs pipeline UI
The pipeline definition lives in a version-controlled config file in the repository (bitbucket-pipelines.yml, .circleci/config.yml, or .github/workflows/*.yml), not in a web UI. This is non-negotiable. Pipeline-as-code means the pipeline is versioned, reviewable, and reproducible. If the pipeline breaks, you can git blame to find when and why.
Some CI platforms tempt you with visual pipeline editors or UI-only configuration. Resist. UI-configured pipelines can't be code-reviewed, can't be branched, and can't be audited. The only pipeline configuration that should exist outside the repo is secrets (stored in the CI platform's secrets management).
The flaky test problem
Flaky tests (tests that pass and fail intermittently without code changes) erode trust in the pipeline. When the pipeline fails randomly, the team learns to ignore failures and retry until it passes. This is exactly the behaviour that lets real bugs through.
When a test is flaky:
- Quarantine it immediately: move it to a separate test suite that doesn't block the pipeline.
- File a ticket to fix it within the current sprint. Flaky tests are bugs, not tech debt.
- Fix the root cause (race condition, time-dependent logic, external service dependency, shared test state).
- Move it back to the main suite.
Never "fix" a flaky test by adding retries to the test itself. Retries mask the problem and make the test suite slower.
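Spotting flaky tests early is easier with a mechanical definition. One possible definition, sketched below, is "the same test both passed and failed on the same commit". The record shape and function name are assumptions, and the run history would have to come from the CI platform or the test reporter.

```typescript
interface TestRun {
  testName: string;
  commitSha: string;
  passed: boolean;
}

// Returns the names of tests that have both a pass and a fail recorded for the
// same commit, i.e. the outcome changed without a code change.
function findFlakyTests(runs: TestRun[]): string[] {
  const outcomesByTestAndCommit = new Map<string, { testName: string; outcomes: Set<boolean> }>();
  for (const run of runs) {
    const key = JSON.stringify([run.testName, run.commitSha]);
    const entry = outcomesByTestAndCommit.get(key) ?? {
      testName: run.testName,
      outcomes: new Set<boolean>(),
    };
    entry.outcomes.add(run.passed);
    outcomesByTestAndCommit.set(key, entry);
  }
  const flaky = new Set<string>();
  outcomesByTestAndCommit.forEach((entry) => {
    if (entry.outcomes.size === 2) flaky.add(entry.testName);
  });
  return Array.from(flaky).sort();
}
```

A test that fails on one commit and passes on the next is not flagged, since that pattern is what an ordinary fix looks like.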
Build time discipline
CI minutes cost money (Bitbucket Pipelines, CircleCI, and GitHub Actions all bill by usage on paid tiers) and they cost the team's attention. A 20-minute pipeline means 20 minutes between "I pushed" and "I know if it's okay." That delay compounds across the team and across the day.
Keep the pipeline fast:
- Use dependency caching aggressively (pnpm cache, Docker layer cache).
- Run independent stages in parallel where the platform supports it.
- Avoid installing unnecessary dependencies. If the lint step doesn't need sharp or puppeteer, don't install them.
- Use Docker multi-stage builds to keep the final image small, and a .dockerignore to keep the build context small.
- For monorepos, only rebuild what changed.
If the pipeline consistently takes longer than 15 minutes for a PR check, treat it as a problem to solve, not a cost of doing business.
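"Consistently" is worth pinning down so that one slow run does not trigger an investigation and a steady drift does not go unnoticed. A simple sketch, assuming recent PR pipeline durations are available in minutes, is to compare the median against the 15-minute threshold. The function names and the choice of median are assumptions.

```typescript
// Median of the given durations; the median is robust to a single outlier run.
function medianMinutes(durationsMinutes: number[]): number {
  if (durationsMinutes.length === 0) throw new Error("no durations");
  const sorted = durationsMinutes.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? (sorted[mid] as number)
    : ((sorted[mid - 1] as number) + (sorted[mid] as number)) / 2;
}

// True when the typical recent PR check is slower than the budget.
function exceedsBuildTimeBudget(durationsMinutes: number[], budgetMinutes: number = 15): boolean {
  return medianMinutes(durationsMinutes) > budgetMinutes;
}
```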
Environment parity
Staging should mirror production as closely as possible. The most common deployment failures come from differences between environments: a missing environment variable, a different database version, a service that exists in production but not in staging.
But perfect parity is expensive. A pragmatic approach: use the same Terraform modules for both environments, the same Docker images, and the same pipeline. Allow differences only in scale (fewer instances, smaller databases) and in data (synthetic vs real). Document every known difference between staging and production. If it's documented, it's a conscious trade-off. If it's not documented, it's a latent failure.
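The "documented or latent" distinction can be checked automatically for environment variables. The sketch below compares which keys exist in each environment (not their values, which are expected to differ) and reports any difference that is not on a documented allow-list. All names, including the example variable names in the comment, are hypothetical.

```typescript
type EnvConfig = Record<string, string>;

interface ParityReport {
  missingInStaging: string[];
  missingInProduction: string[];
}

// Reports keys present in one environment but not the other, excluding keys
// listed in `documented` (the known, intentional differences).
function undocumentedKeyDifferences(
  staging: EnvConfig,
  production: EnvConfig,
  documented: string[],
): ParityReport {
  const allowed = new Set(documented);
  const onlyIn = (present: EnvConfig, absent: EnvConfig): string[] =>
    Object.keys(present)
      .filter((key) => !(key in absent) && !allowed.has(key))
      .sort();
  return {
    missingInStaging: onlyIn(production, staging),
    missingInProduction: onlyIn(staging, production),
  };
}

// Example shape of a documented difference: ["SEED_SYNTHETIC_DATA"]
```

Only key names are needed for this check, so secret values never have to leave the CI platform's secrets management.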
Checklist
For every project
- Pipeline is defined in code (bitbucket-pipelines.yml, .circleci/config.yml, or .github/workflows/*.yml), not in a web UI.
- Pipeline runs lint, test, build, deploy, in that order.
- development, staging, and main branches are protected (no direct pushes).
- Production deployment requires manual trigger, not automatic.
- Secrets are stored in the CI platform's secrets management, scoped to the correct environment.
- Container images are tagged with commit SHA, not latest.
- Security scanning (Trivy, Semgrep, Gitleaks, pnpm audit) runs on every PR.
- Database migrations run as a separate step before application deployment.
- A rollback procedure is documented and the team knows how to execute it.
- Previous production images are retained for at least 90 days.
For every release
- Deployable scope document published and shared with stakeholders.
- Test plan published with Qase test run linked.
- Release notes drafted before deploy, published after success.
- Stakeholders notified 3-5 business days before release day.
- Release owner identified; release day communication sent (before, during, after).
- QA has signed off on the staging environment.
- Staging has been running the release candidate for at least 24 hours.
- All database migrations have been tested on staging.
- No unresolved critical or high-severity bugs.
- Environment variables and secrets for new features are configured in production.
- Release is tagged with a semantic version after successful deployment.
- Post-deployment smoke tests pass.
- Monitoring confirms error rates are at baseline for 30 minutes.
- Rollback procedure has been reviewed.
For every sprint
- Pipeline build time is under 15 minutes for PR checks.
- No new flaky tests have been added (existing flaky tests are being addressed).
- Dependency caches are functioning (not reinstalling from scratch every build).
- Pipeline variable access has been reviewed if team membership changed.
AI tips
- Generate pipeline configuration. Describe your project structure (NestJS monorepo, Next.js frontend, PostgreSQL), CI platform (Bitbucket Pipelines, CircleCI, or GitHub Actions), and deployment target (Cloud Run, Vercel). Ask AI to generate the pipeline config. AI handles YAML syntax well but often misses platform-specific features: deployment environments, caches, OIDC auth, and artifact passing between steps need careful review.
- Debug pipeline failures. Paste the failing pipeline log and ask AI to identify the root cause. AI is good at parsing dense CI output, spotting version mismatches, and recognising common issues (missing environment variables, Docker build context problems, permission errors).
- Write database migration rollback scripts. Describe the forward migration and ask AI to generate the corresponding rollback SQL. AI is reliable for simple migrations (add/drop column, create index) but needs careful review for data migrations where the reverse operation may lose information.
- Optimise Docker build times. Share your Dockerfile and ask AI to identify layer caching opportunities, unnecessary COPY operations, and multi-stage build improvements. AI consistently catches common Docker antipatterns (copying node_modules before package.json, missing .dockerignore).
- Draft release notes from git history. Provide the git log between two release tags and ask AI to categorise changes (features, fixes, internal improvements) and generate human-readable release notes. Review for accuracy. AI may misclassify a refactor as a feature or miss the user-facing impact of a change.
- Draft deployable scope and test plan. Provide the Jira release version, list of merged PRs, and Qase test run summary. Ask AI to generate a deployable scope document (what's in, what's out, migration notes, env changes) and a test plan outline. Review for accuracy. AI won't know about verbal deferrals or untested areas unless you tell it.
- Review pipeline security. Paste your pipeline YAML and ask AI to audit it for security issues: exposed secrets, missing environment scoping, overly broad permissions, unsigned artifacts. This is a useful pre-review check before the pipeline goes through code review.
Resources
S&P internal:
- S&P Git Workflow (Confluence) -- Existing branching strategy documentation
- Engineering-forward backend template -- Pipeline configuration templates
- Source Control -- Branching strategy and merge conventions
- Security -- Automated security pipeline -- SAST, SCA, and secret scanning configuration
- Testing Strategy -- Test suite structure that feeds the CI test stage
- Observability & Incidents -- Post-deployment monitoring and incident response
Industry references:
- 12-Factor App -- Foundational principles for cloud-deployed applications (particularly III: Config, V: Build/release/run, X: Dev/prod parity)
- Microsoft Code-with-Engineering Playbook -- CI/CD -- Engineering best practices for CI/CD
- Bitbucket Pipelines documentation
- CircleCI documentation
- GitHub Actions documentation
- Keep a Changelog -- Changelog formatting standard
- Semantic Versioning -- Versioning convention
- k6 -- Load testing tool (referenced in pipeline performance testing)
- LaunchDarkly -- Feature flag management platform
- Unleash -- Open-source feature flag platform