Testing Strategy
Testing is how you prove your software works: not to a machine, but to yourself, your team, and your users. A good testing strategy doesn't aim for perfection; it aims for confidence. Confidence that the feature you just built does what it claims, that the bug you just fixed won't come back, and that the release you're about to ship won't wake someone up at 3am.
Why this matters
Every team says they value quality. The difference between teams that actually deliver quality and teams that just talk about it is whether they've built testing into how they work: not as an afterthought, not as a gate at the end, but as a continuous practice woven through every stage of development.
At S&P, this connects directly to Integrity (we ship what we promise), Care (we respect our users' time and trust), and Teamwork (a well-tested codebase is one your colleagues can change with confidence). Testing isn't overhead; it's how we maintain the ability to move fast without breaking things.
The standard
The testing model: the modified testing trophy
S&P follows a modified version of Kent C. Dodds' testing trophy, adapted for our NestJS + React/Next.js stack. The core principle: the more your tests resemble the way your software is used, the more confidence they give you.
The layers of the trophy, from top to bottom:
- Manual / exploratory QA (last layer, human judgment)
- E2E tests (critical user journeys)
- Integration tests (the bulk of your suite)
- Unit tests (complex pure logic only)
- Static analysis (TypeScript strict + ESLint)
Why not the classic test pyramid? The pyramid was designed for a world where integration tests were slow and expensive. In a modern NestJS + React stack, an integration test that hits a real PostgreSQL container runs in milliseconds. Writing hundreds of unit tests for NestJS controllers that just delegate to services gives you a high test count and low confidence. Integration tests cover more surface area per test and catch the bugs that actually reach production, the ones that live in the seams between components.
What this means in practice:
| Layer | What it covers | Proportion | Speed |
|---|---|---|---|
| Static analysis | Type errors, lint violations, dead code | Foundation (not counted) | Instant |
| Unit tests | Pure functions, complex business logic, algorithms, utilities | ~20% of test suite | < 1ms per test |
| Integration tests | API endpoints with real DB, React components with MSW, service interactions | ~60% of test suite | < 100ms per test |
| E2E tests | Critical user journeys across the full stack | ~20% of test suite | 1-10s per test |
| Manual / exploratory QA | Edge cases, usability, flows that automation can't cover | Last layer (not counted in suite) | Human-paced |
Static analysis:
Static analysis is not optional. It catches an entire class of bugs at zero runtime cost, before any test runs. This is the floor, not the ceiling.
What we require:
- TypeScript in strict mode (`"strict": true` in `tsconfig.json`). No `any` as an escape hatch in production code. If you need `any`, you're missing a type definition: write it.
- ESLint or Biome with project-relevant rules enforced in CI. Not as a suggestion; as a blocking check.
- Prettier or Biome for formatting so nobody spends review cycles on semicolons or indentation.
These run on every commit via pre-commit hooks and in CI. If static analysis fails, nothing else runs. There's no point running a test suite against code that doesn't type-check.
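As a minimal sketch of the "write the type definition" rule, the example below receives an untyped JSON payload as `unknown` and narrows it with a type guard instead of reaching for `any`. The `ExchangeRate` shape and both function names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical payload shape; in a real project this mirrors the actual API contract.
interface ExchangeRate {
  currency: string;
  rate: number;
}

// Type guard: narrows `unknown` to ExchangeRate without resorting to `any`.
function isExchangeRate(value: unknown): value is ExchangeRate {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.currency === 'string' &&
    typeof candidate.rate === 'number' &&
    Number.isFinite(candidate.rate)
  );
}

// JSON.parse returns `any`; assigning the result to `unknown` forces validation
// before the value can be used as an ExchangeRate.
function parseExchangeRate(json: string): ExchangeRate {
  const parsed: unknown = JSON.parse(json);
  if (!isExchangeRate(parsed)) {
    throw new Error('Payload is not a valid ExchangeRate');
  }
  return parsed;
}
```

With this in place the compiler rejects any use of the payload that skips the guard, which is the kind of bug static analysis is meant to catch before a test runs.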
Unit tests:
Unit tests are for pure logic that's complex enough to get wrong: algorithms, calculations, data transformations, validation rules, custom utility functions. If the function takes inputs and returns outputs with no side effects, and the logic isn't trivial, unit-test it.
What to unit test:
- Business logic in service methods that doesn't touch the database or external APIs
- Data transformation and mapping functions
- Validation logic (custom validators, parsing rules)
- Utility functions with non-trivial logic (date formatting, currency calculations, string processing)
- Custom React hooks with complex state management
What not to unit test:
- NestJS controllers that delegate to services (test these at integration level)
- Simple CRUD service methods (integration tests cover these with more confidence)
- React components that just render props (test these via integration or visual regression)
- Framework wiring (decorators, middleware registration, module configuration)
- Getter/setter functions or trivial one-liners
Test runner: Jest is the current standard across S&P projects. When NestJS 12 delivers first-class Vitest support, new projects should adopt Vitest for its faster execution and native ESM support. Existing projects migrate opportunistically; don't rewrite a working test suite for a speed improvement.
Structure every test with AAA:
```typescript
describe('PriceCalculator', () => {
  describe('calculateDiscount', () => {
    it('applies percentage discount when order exceeds minimum threshold', () => {
      // Arrange
      const calculator = new PriceCalculator();
      const order = { subtotal: 150, discountCode: 'SAVE10' };

      // Act
      const result = calculator.calculateDiscount(order);

      // Assert
      expect(result).toBe(135);
    });
  });
});
```
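The test above assumes a `PriceCalculator` whose `calculateDiscount` returns the order total after the discount. One hypothetical implementation that would satisfy it is sketched here. The 10% rate for `SAVE10` and the minimum threshold of 100 are assumptions made for illustration, not values from an S&P project.

```typescript
interface Order {
  subtotal: number;
  discountCode?: string;
}

// Assumed threshold, for illustration only.
const MINIMUM_SUBTOTAL_FOR_DISCOUNT = 100;

// Assumed discount table: unknown or missing codes give no discount.
function discountPercentFor(code: string | undefined): number {
  switch (code) {
    case 'SAVE10':
      return 10;
    default:
      return 0;
  }
}

class PriceCalculator {
  // Returns the total after the discount, or the unchanged subtotal when the
  // code is unknown or the order does not exceed the minimum threshold.
  calculateDiscount(order: Order): number {
    const percent = discountPercentFor(order.discountCode);
    if (percent === 0 || order.subtotal <= MINIMUM_SUBTOTAL_FOR_DISCOUNT) {
      return order.subtotal;
    }
    return order.subtotal - (order.subtotal * percent) / 100;
  }
}
```

This is the kind of logic worth unit-testing: it is pure, and the boundary (an order exactly at the threshold) is easy to get wrong.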
Naming convention: Test names describe the behaviour, not the implementation. Follow the pattern: "[what is being tested] [under what conditions] [expected outcome]." Someone reading the test name alone should understand what broke when it fails.
Integration tests:
Integration tests are where you get the most confidence per test. They exercise real interactions: an HTTP request hitting your API, flowing through middleware, calling a service, querying a real database, and returning a response. On the frontend, they render a component, simulate user interactions, and verify what appears on screen with realistic (but controlled) network responses.
Backend integration tests (NestJS)
The approach: Test your API endpoints through HTTP using Supertest, against a real PostgreSQL database running in Testcontainers. No mocking the database. No SQLite stand-ins. The database under test is the same engine you run in production.
Why Testcontainers: A PostgreSQL container spins up in seconds, gives you a real database with real constraints, real JSON operations, and real query behaviour. Tests that pass against Testcontainers pass against production. Tests that pass against SQLite? You're guessing.
What to test at this level:
Every integration test should verify at least one of five outcomes:
- Response data: The API returns the correct status code and body
- State changes: The database reflects the expected change (row created, updated, deleted)
- Outgoing calls: External services received the expected request (use MSW or nock to intercept)
- Messages and events: The expected message was placed on the queue or event was emitted
- Error handling: Invalid input returns the correct error response and doesn't corrupt state
Test isolation: Each test creates its own data and cleans up after itself. No shared seeds across tests. No reliance on insertion order. Tests run in parallel: shared state is the number one cause of flaky integration tests.
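One way to keep tests from sharing data is a small factory that gives every test its own uniquely keyed records. The sketch below is illustrative only: `UserRecord`, `buildUser`, and the email pattern are hypothetical names, and a real suite would persist the result through its own repository or ORM.

```typescript
interface UserRecord {
  id: string;
  email: string;
  role: 'admin' | 'member';
}

// Random per-process prefix so parallel workers sharing one database do not
// generate the same identifiers.
const runId = Math.random().toString(36).slice(2, 10);
let sequence = 0;

// Builds a user with unique identifying fields. Callers override only the
// fields the test actually cares about.
function buildUser(overrides: Partial<UserRecord> = {}): UserRecord {
  sequence += 1;
  return {
    id: `user-${runId}-${sequence}`,
    email: `user-${runId}-${sequence}@example.test`,
    role: 'member',
    ...overrides,
  };
}
```

A test that needs an admin calls `buildUser({ role: 'admin' })` and never depends on a seed row that another test might modify or delete.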
For NestJS integration test setup (Testcontainers, Supertest, test module bootstrap, database seeding), see Backend Reference -- Testing setup.
Frontend integration tests (React / Next.js)
The approach: Render components using React Testing Library, intercept network requests with MSW (Mock Service Worker), and interact with the component the way a user would: clicking buttons, filling forms, reading what appears on screen.
What to test at this level:
- User flows within a page or feature (fill form, submit, see confirmation)
- Component behaviour with different API responses (success, error, loading, empty state)
- Conditional rendering based on user roles or feature flags
- Form validation feedback visible to the user
Query priority: Use queries that reflect how users find elements: getByRole, getByLabelText, getByText. Avoid getByTestId except as a last resort for elements with no accessible role or text. If you can't query an element without a test ID, that's often an accessibility problem worth fixing.
For React Testing Library + MSW setup (Vitest config, render helpers, code examples), query priority table, and accessibility testing patterns, see Frontend Reference -- Testing setup.
E2E tests:
E2E tests are the most expensive tests to write and maintain, so they cover only what matters most: the critical user journeys that, if broken, would mean the product is fundamentally not working. Think of E2E as your "the building is on fire" alarm, not your smoke detector.
Framework: Playwright. Playwright drives the browser from outside the page (over the Chrome DevTools Protocol for Chromium, and its own equivalent protocols for Firefox and WebKit), supports all three engines, has native parallelism, and integrates well with CI platforms (Bitbucket Pipelines, CircleCI, GitHub Actions).
For Playwright setup, what to cover with E2E, stability patterns, and code examples, see QA Reference -- E2E testing with Playwright.
Local testing:
Testing starts on the developer's machine, not in CI. If tests are slow, painful, or confusing to run locally, developers will skip them, and no amount of CI enforcement fixes a broken local workflow.
What every developer should be able to do in under 30 seconds:
- Run the full unit + integration test suite for the module they're working on
- Run a single test file in watch mode while developing
- See clear, readable output when a test fails: what was expected, what actually happened, and where
Local setup requirements:
- Docker running (for Testcontainers in backend integration tests)
- `npm test` or the equivalent runs the relevant suite with sensible defaults
- Watch mode enabled by default during development (`--watch` flag)
- No manual database setup, seed scripts, or environment juggling to run tests
Frontend local testing:
- Run the dev server (`npm run dev`) and verify your changes in the browser before pushing
- For UI work: test the golden path and at least two edge cases (empty state, error state) in the browser manually
- Run `npm test` to execute the integration test suite before opening a PR
Backend local testing:
- Run integration tests against Testcontainers to verify API behaviour
- Use a tool like Postman, Insomnia, or `curl` for ad-hoc API exploration during development
- Test error paths explicitly: what happens when the input is invalid, the database constraint is violated, the external service is down?
Dev environment testing
When a feature branch is deployed to a shared development environment, it should be tested beyond what automated tests cover. This is where the developer verifies their work in a production-like setting.
What to verify on the dev environment:
- The feature works end-to-end with real (or realistic) data, not just test fixtures
- The feature interacts correctly with other services and dependencies in the environment
- Environment-specific configuration (API keys, feature flags, URLs) is correct
- No regressions in adjacent features that share the same data or UI surface
Who does this: The developer who built the feature. This is not a handoff to QA; it's the developer confirming their work before inviting others to review it.
Huddle testing and bug bashes: structured internal testing
This is a new practice at S&P. The goal is to catch the bugs, UX friction, and edge cases that automated tests miss by putting the software in front of real people who use it with real intent before it reaches users.
Huddle testing is a per-feature, 15-20 minute session where the developer who built a feature walks 1-2 colleagues through it, then watches them use it without guidance. Bug bashes are per-sprint, cross-functional sessions, run 2-3 days before the sprint-end release, where the whole team systematically tries to break the software before it ships.
For the complete huddle testing format, bug bash facilitation guide, and focus area checklist, see QA Reference -- Huddle testing and bug bashes.
Manual and exploratory testing:
Manual testing is the last layer of quality assurance, the one that catches what automation can't. Automated tests verify that the system does what the code says it should. Manual testing verifies that the system does what the user actually needs, that the experience makes sense, and that edge cases humans would hit in the real world are handled gracefully.
Manual testing is not a replacement for automation; it's the complement. If you find yourself repeatedly testing the same flow manually, that's a signal to automate it. Manual testing should focus on the novel, the subjective, and the unexpected.
For what manual testing covers, when it happens in the workflow, and exploratory testing charters, see QA Reference -- Manual and exploratory testing.
Test case management: Qase:
Every S&P project uses Qase as the test case management platform. Qase is where test cases live, test runs are tracked, and testing coverage is visible to the whole team. This is mandatory, not optional, not "nice to have."
For Qase project setup, test case structure, test run workflow, and Jira integration, see QA Reference -- Test case management with Qase.
QA integration:
S&P projects vary in QA staffing: some have dedicated QA engineers, others rely on developers alone. Regardless of staffing, the testing responsibilities are the same. What changes is who fills each role.
Shift-left principle: QA involvement starts at sprint planning, not after development is "done."
For the full QA activity matrix, patterns for teams with and without dedicated QA, and the QA onboarding checklist, see QA Reference -- QA integration patterns.
Release testing gates
Release procedures are deliberately prescriptive. The cost of ambiguity when you're shipping to production is too high.
Every sprint-end release follows this sequence. Each gate must pass before proceeding to the next. No exceptions, no "it's probably fine." For stakeholder communication, release documentation, and release day procedures, see CI/CD & Release Process -- Release management.
Gate 1: CI pipeline green
All automated tests pass on the release branch:
- Static analysis (TypeScript + ESLint): zero errors
- Unit tests: 100% pass rate
- Integration tests: 100% pass rate
- E2E critical path tests: 100% pass rate
If the pipeline is red, the release stops. If the failure is a flaky test, fix or quarantine the flaky test; do not re-run the pipeline and hope.
Gate 2: QA/Staging deployment and smoke test
The release candidate is deployed to staging. A smoke test suite (automated, under 90 seconds) runs immediately and verifies:
- The application starts and serves requests
- Database connectivity is working
- Authentication flow completes successfully
- The top 3-5 most critical features respond correctly
- No errors in application logs during smoke test
Gate 3: Bug bash or targeted exploratory testing
For sprints with significant feature work: run the full bug bash (described above).
For sprints with only minor changes or fixes: a targeted exploratory session (30 minutes) by the relevant developer(s) and QA (if available), focused on the changed areas and their adjacent features.
- Bug bash or targeted session completed
- All critical and high-severity findings addressed or consciously deferred with a Jira ticket
- No P0/P1 bugs remaining open for the release scope
Gate 4: QA sign-off (where applicable)
On projects with dedicated QA:
- QA has verified all acceptance criteria for stories in the release
- Regression testing on key flows is complete
- QA approves the release candidate
On projects without dedicated QA, the developer wearing the QA hat for the sprint provides the sign-off.
Gate 5: Production deployment and verification
- Deploy to production
- Run production smoke test (same checks as staging, adapted for production URLs)
- Monitor error rates and key metrics for 30 minutes post-deploy
- If error rates spike or smoke tests fail: roll back immediately, investigate, and re-enter the gate sequence
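The "each gate must pass before proceeding" rule amounts to a short-circuiting sequence. Below is a minimal sketch, assuming each gate can be reduced to a function that reports pass or fail. The `Gate` type and `runGates` helper are hypothetical and not part of any S&P tooling; real checks would call CI, staging, or Jira and would most likely be asynchronous.

```typescript
interface Gate {
  name: string;
  // Returns true when the gate passes.
  check: () => boolean;
}

interface GateResult {
  passed: boolean;
  completed: string[]; // names of gates that passed, in order
  failedAt?: string; // name of the first gate that failed, if any
}

// Runs gates in order and stops at the first failure, so later gates
// (for example the production deployment) are never attempted.
function runGates(gates: readonly Gate[]): GateResult {
  const completed: string[] = [];
  for (const gate of gates) {
    if (!gate.check()) {
      return { passed: false, completed, failedAt: gate.name };
    }
    completed.push(gate.name);
  }
  return { passed: true, completed };
}
```

The returned `failedAt` value gives the release owner a single, unambiguous place to resume once the failure is fixed.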
Testing in CI:
Not every test needs to block every merge. The goal is fast feedback on what matters and thorough validation before release.
Blocks PR merge (runs on every push to a PR):
- Static analysis (TypeScript + ESLint)
- Unit tests
- Integration tests
- E2E critical-path smoke subset (the 5-10 most important journeys, under 5 minutes)
Runs post-merge on staging (does not block individual PRs):
- Full E2E suite
- Performance benchmarks (if configured)
- Visual regression tests (if configured)
Runs before release (blocks release, not individual merges):
- Complete test suite across all layers
- Staging smoke tests
- Security scanning (dependency audit, SAST)
Parallelisation: Split test suites across parallel CI workers using timing-based sharding (split by historical run duration, not file count). Bitbucket Pipelines, CircleCI, and GitHub Actions all support parallel jobs natively. Target: the full PR-blocking suite completes in under 10 minutes.
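Timing-based sharding can be approximated with a greedy longest-first assignment: sort test files by historical duration, then repeatedly give the next file to whichever worker currently has the least total time. The sketch below assumes durations are already available from earlier runs. `TestFileTiming`, `Shard`, and `shardByTiming` are hypothetical names, and CI platforms that offer timing-based splitting may use a different algorithm internally.

```typescript
interface TestFileTiming {
  file: string;
  durationMs: number;
}

interface Shard {
  files: string[];
  totalMs: number;
}

function shardByTiming(timings: readonly TestFileTiming[], workerCount: number): Shard[] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new Error('workerCount must be a positive integer');
  }
  const shards: Shard[] = [];
  for (let i = 0; i < workerCount; i += 1) {
    shards.push({ files: [], totalMs: 0 });
  }
  // Longest files first, so large files are spread out before small ones fill the gaps.
  const sorted = timings.slice().sort((a, b) => b.durationMs - a.durationMs);
  for (const timing of sorted) {
    // Pick the shard with the smallest accumulated duration so far.
    const lightest = shards.reduce((min, shard) => (shard.totalMs < min.totalMs ? shard : min));
    lightest.files.push(timing.file);
    lightest.totalMs += timing.durationMs;
  }
  return shards;
}
```

For five files taking 900, 600, 500, 400, and 100 ms on two workers, this yields shards of 1300 ms and 1200 ms. Splitting the same files by count in that order (three and two) would give 2000 ms and 500 ms, so the slowest worker would finish well after the other.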
Flaky test management: A flaky test is worse than no test: it trains the team to ignore failures. When a test flakes:
- Immediately move it to a quarantine group (runs but doesn't block)
- Create a Jira ticket to fix or delete it within the current sprint
- Track flaky test rates in your CI platform (Bitbucket Pipelines has built-in flaky test detection; CircleCI and GitHub Actions require test result reporting or third-party tools)
- If a quarantined test isn't fixed within one sprint, delete it and write a better one
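Where the CI platform does not detect flakiness for you, a rough signal can be computed from exported test results: a test that both passed and failed on the same commit is a strong flakiness candidate, because the code did not change between runs. The sketch assumes results have already been collected into simple records. The `TestResult` shape and `findFlakyTests` are hypothetical.

```typescript
interface TestResult {
  testName: string;
  commitSha: string;
  passed: boolean;
}

interface Outcome {
  testName: string;
  sawPass: boolean;
  sawFail: boolean;
}

// Returns the sorted names of tests that have both a pass and a fail
// recorded against at least one commit.
function findFlakyTests(results: readonly TestResult[]): string[] {
  const outcomes = new Map<string, Outcome>();
  for (const result of results) {
    // One entry per (test, commit) pair.
    const key = JSON.stringify([result.testName, result.commitSha]);
    const outcome =
      outcomes.get(key) ?? { testName: result.testName, sawPass: false, sawFail: false };
    if (result.passed) {
      outcome.sawPass = true;
    } else {
      outcome.sawFail = true;
    }
    outcomes.set(key, outcome);
  }
  const flaky = new Set<string>();
  outcomes.forEach((outcome) => {
    if (outcome.sawPass && outcome.sawFail) {
      flaky.add(outcome.testName);
    }
  });
  return Array.from(flaky).sort();
}
```

A test that fails on one commit and passes on a later one is not reported, since that pattern is what a genuine fix looks like.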
Test coverage:
Coverage tells you what code your tests execute. It does not tell you whether your tests verify the right things. A test that calls every line but asserts nothing has 100% coverage and 0% value.
S&P's approach:
- Track coverage on critical paths: Authentication, payments, data processing, and core business logic should maintain at least 80% line coverage. This is a diagnostic floor: if coverage drops below it, investigate what's untested rather than blindly adding tests to raise the number.
- No global coverage mandate. We don't chase a repository-wide percentage. We'd rather have 60% coverage with meaningful assertions than 90% coverage with tests that just execute code without checking outcomes.
- Use coverage reports to find gaps, not to prove quality. Run coverage tools (Istanbul/c8) and look at what's uncovered. Untested error handlers, uncovered catch blocks, and unreachable branches are the interesting findings, not the percentage at the top.
- Never set coverage as a merge gate. Coverage gates incentivise gaming: writing tests that touch lines without verifying behaviour. Review the tests themselves in code review; that's where quality lives.
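The critical-path floor can be checked as a report rather than a gate. Below is a minimal sketch, assuming coverage has been exported as a map of file path to line totals with a `pct` field, a shape intended to resemble Istanbul's `json-summary` output. The path prefixes and the `findCoverageGaps` helper are hypothetical, and the function only lists files to investigate; it does not fail a build.

```typescript
// Minimal subset of a per-file coverage summary (assumed shape).
interface FileCoverage {
  lines: { total: number; covered: number; pct: number };
}

type CoverageSummary = Record<string, FileCoverage>;

interface CoverageGap {
  file: string;
  linePct: number;
}

// Lists critical-path files whose line coverage is below the floor,
// lowest coverage first. The aggregate "total" entry is ignored.
function findCoverageGaps(
  summary: CoverageSummary,
  criticalPrefixes: readonly string[],
  floorPct = 80,
): CoverageGap[] {
  return Object.keys(summary)
    .filter((file) => file !== 'total')
    .filter((file) => criticalPrefixes.some((prefix) => file.startsWith(prefix)))
    .filter((file) => summary[file].lines.pct < floorPct)
    .map((file) => ({ file, linePct: summary[file].lines.pct }))
    .sort((a, b) => a.linePct - b.linePct);
}
```

The output is a starting point for reading the uncovered lines in each file, which is where untested error handlers and catch blocks tend to show up.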
Critical thinking
- Match test investment to risk. A CRUD screen for an internal admin tool doesn't need the same test suite as the payment flow. Calibrate your testing effort to the blast radius of failure and the cost of a bug reaching production.
- Don't test the framework. NestJS, React, PostgreSQL, and Playwright are heavily tested by their maintainers. You don't need to verify that `@Injectable()` works or that React renders a `<div>`. Test your logic, not theirs.
- Flaky tests are not normal. If your test suite has tests that "sometimes fail," you don't have a test problem, you have a reliability problem. Either the test has a race condition, the feature has a race condition, or the test infrastructure is under-resourced. All three are worth fixing.
- More tests is not always better. A test suite that takes 45 minutes to run will be skipped locally, dreaded in CI, and gradually ignored. A fast, focused suite that runs in 5 minutes gets run constantly and catches issues early. Prune ruthlessly.
- Integration tests can replace most unit tests. If an integration test already covers the behaviour, a unit test for the same behaviour is redundant. Test the behaviour once, at the level that gives the most confidence.
- E2E tests are not a safety net for missing lower-level tests. If your E2E test fails and you can't reproduce the bug with a unit or integration test, that's a sign your lower layers are undertested. Fix the gap, then the E2E test becomes confirmation, not discovery.
- Testing is not QA's job. Every developer is responsible for the correctness of their code. QA provides an additional perspective, exploratory coverage, and process rigour, but if the only thing standing between your bug and production is a QA engineer, the process has already failed.
- Team-size pragmatism. A two-person project doesn't need a formal bug bash. A 15-person project shipping to 100k users absolutely does. Scale the ceremony to the risk, not to the playbook.
Checklist
Before opening a PR
- I ran the relevant test suite locally and it passes.
- I added or updated unit tests for any new business logic.
- I added or updated integration tests for any new API endpoints or component behaviour.
- For new critical user journeys: I added or updated E2E tests.
- I tested edge cases, not just the happy path (empty input, invalid data, error responses).
- Test names describe the behaviour being verified, not the implementation.
Before merging
- CI pipeline is green (static analysis + unit + integration + E2E smoke).
- Code review includes review of test quality, not just feature code.
Before sprint release
- Gate 1: CI pipeline green on the release branch.
- Gate 2: Staging deployment and automated smoke test pass.
- Gate 3: Bug bash or targeted exploratory testing completed. No open P0/P1 bugs.
- Gate 4: QA sign-off (or developer QA-hat sign-off) received.
- Gate 5: Production deployment, smoke test, and 30-minute monitoring complete.
Test case management (ongoing)
- Qase project is set up and linked to Jira.
- Test cases exist for all acceptance criteria in the current sprint.
- Regression test suite is maintained and updated when features change.
- Test runs are created and tracked in Qase before each release.
- Failed test cases have linked Jira issues.
Test health (ongoing)
- No tests in quarantine for more than one sprint.
- Critical path coverage is at or above 80%.
- Full PR-blocking CI suite runs in under 10 minutes.
- E2E suite runs in under 15 minutes.
AI tips
- Generating test cases for Qase (AI Prompt Library): Use AI to generate structured test cases from multiple inputs: Figma designs (export as images or description), user stories from Jira, and (for existing projects) the actual codebase. Feed AI the acceptance criteria, screenshots of the UI, and relevant source files, then ask it to generate Qase-formatted test cases with preconditions, numbered steps, and expected results. AI is especially good at generating variations: happy path, error states, boundary values, permission-based scenarios, and cross-browser edge cases. For existing projects, point AI at the codebase to generate regression test cases that cover existing behaviour before changes are made. Review and refine the output. AI generates comprehensive coverage but may miss domain-specific edge cases that only someone who knows the business logic would catch.
- Generating test skeletons: Describe a function or API endpoint to AI and ask it to generate the test structure (describe blocks, test names, arrange/act/assert scaffolding) for the key scenarios. Then fill in the assertions yourself. AI is good at identifying which cases to test; you're better at knowing what the correct behaviour actually is.
- Finding untested edge cases: Paste a function or endpoint and ask AI "what edge cases am I probably not testing?" AI is surprisingly good at spotting null inputs, boundary conditions, concurrent access patterns, and error paths that developers overlook because they built the happy path first.
- Writing test data factories: If you need realistic test data (users, orders, products), describe the shape and constraints to AI and let it generate factory functions. Review the output for realistic values. AI sometimes generates data that's technically valid but would never appear in production.
- Debugging flaky tests: Describe the test, how it fails, and when it fails to AI. Flaky tests usually have a small set of root causes (race conditions, time-dependent assertions, shared state, network timing). AI can often pattern-match to the cause faster than manual investigation.
- E2E test scenarios: Before writing Playwright tests, describe the user journey to AI and ask it to outline the test steps and assertions. This often surfaces steps you'd forget (checking loading states, verifying URL changes, confirming data persistence after navigation).
Resources
- Kent C. Dodds. Testing Trophy and Testing Classifications
- Kent C. Dodds. Write tests. Not too many. Mostly integration.
- Martin Fowler. Practical Test Pyramid
- Node.js Best Practices. Testing and Quality
- Bulletproof React. Testing
- Microsoft Code-with-Engineering Playbook. Automated Testing
- Playwright Documentation
- React Testing Library. Guiding Principles
- Testcontainers for Node.js
- MSW. Mock Service Worker
- Bitbucket Pipelines. Flaky Test Management
- CircleCI. Parallelism
- GitHub Actions. Matrix strategies
- Qase. Test case management platform (S&P standard)
- Qase + Jira Integration. Link test cases to Jira issues
- S&P Engineering Principles
- S&P Code Review