QA Reference

Quality at S&P is everyone's responsibility. QA does not sit at the end of a pipeline waiting to reject work -- it provides the structure, rigor, and testing discipline that make quality repeatable. Dedicated QA engineers bring depth to exploratory testing, test case management, and release verification. But the baseline expectation is the same whether a project has a QA team or not: every feature ships tested, every release passes gates, and every bug has a ticket. This appendix is the lookup reference for QA-specific implementation details -- use the heading list to jump to the concept you need.

Scope

This appendix covers QA-specific implementation patterns, tooling, and workflows for S&P projects. For cross-cutting practices, see the main playbook sections:

Testing Strategy -- testing philosophy, trophy model, release gates
Code Review -- review as a quality gate
CI/CD & Release Process -- pipeline stages, deployment gates

The processes described here apply regardless of tech stack. Examples reference S&P's standard tooling (Playwright, Qase, Jira, Bitbucket Pipelines / CircleCI / GitHub Actions), but the principles translate to any project.

E2E testing with Playwright

E2E tests are the most expensive tests to write and maintain. They cover only the critical user journeys -- the ones where breakage means the product is fundamentally not working. Playwright is the S&P standard for E2E testing.

Why Playwright

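Playwright runs outside the browser process and drives it over a debugging protocol: the Chrome DevTools Protocol for Chromium, and Playwright's own protocol for its patched Firefox and WebKit builds. This architecture gives it meaningful advantages over alternatives: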

Multi-browser support. Chromium, Firefox, and WebKit from a single API. No adapters, no compatibility layers.
Native parallelism. Worker-based parallel execution out of the box. No third-party plugins to shard tests.
CI integration. First-class Docker images, tracing, video recording, and screenshot capture. All essential for debugging failures in CI where you cannot reproduce locally.
Auto-waiting. Playwright waits for elements to be actionable before interacting. This eliminates the largest class of flaky E2E tests -- timing issues that explicit waits and sleep calls only paper over.
API testing. Playwright's request context allows API-level setup and teardown within the same test, which is critical for data seeding (see below).

Industry momentum is decisively in Playwright's favour. New projects should not consider alternatives unless there is a specific technical constraint.

Polyglot note: Cypress remains a viable alternative for teams already invested in it -- migration is not worth the cost unless the test suite is being rewritten anyway. Selenium is appropriate only for legacy projects or cross-platform mobile testing via Appium. For new S&P projects, Playwright is the default unless one of those constraints applies.

What to cover with E2E

E2E tests guard the critical paths. These are the journeys that, if broken, mean the product cannot be used for its primary purpose:

Authentication flow -- login, logout, session expiry, password reset
The core CRUD operation for the product's primary entity (e.g., creating and managing orders, projects, or users)
Payment or billing flows -- if applicable, these are high-consequence and difficult to test at lower levels
Critical integrations -- third-party OAuth, webhook receipt, file upload/download
The onboarding or first-run experience -- a broken onboarding means zero new users

What NOT to cover with E2E

Form validations. Integration tests handle these faster and more reliably. An E2E test that checks 15 validation messages is expensive to maintain and provides little confidence beyond what a component test gives you.
Admin-only configuration screens. Unless they can break user-facing features. The blast radius of an admin bug is usually limited.
Cosmetic or layout concerns. Visual regression tools (Playwright screenshots compared with expect(page).toHaveScreenshot(), or dedicated tools like Percy/Chromatic) handle this better than assertion-based E2E tests.

Keeping E2E stable

Flaky E2E tests are worse than no E2E tests -- they train the team to ignore failures. These patterns keep the suite reliable:

Seed data via API, not the UI. Each test creates its own state through API calls (using Playwright's request context) or direct database setup. Seeding data through the UI is slow, brittle, and couples your test to unrelated UI flows.
Tests are independent. No test depends on another test having run first. No shared state between tests. If you need to test a multi-step flow, put the entire flow in one test.
Retry as a safety net, not a crutch. Configure 1-2 retries in CI. If a test needs retries to pass reliably, that is a signal to fix the test or the feature -- not to increase the retry count.
Target under 15 minutes for the full E2E suite. If it is slower, you have too many E2E tests, they are doing too much UI-based setup, or they are not running in parallel.
Use test fixtures (Playwright's test.extend) to encapsulate setup and teardown. This keeps tests focused on the journey, not on boilerplate.
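The retry guidance above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `TestOutcome` shape (it is not Playwright's reporter format) that records each attempt in execution order. It lists tests that passed only after a retry, so they can be ticketed instead of quietly absorbed.

```typescript
// Hypothetical result shape for illustration; adapt it to whatever your reporter emits.
type AttemptResult = 'passed' | 'failed';

interface TestOutcome {
  title: string;
  attempts: AttemptResult[]; // in execution order; each retry appends an entry
}

// A test counts as flaky if it failed at least once but its final attempt passed.
function findFlakyTests(outcomes: TestOutcome[]): string[] {
  return outcomes
    .filter((outcome) => {
      const last = outcome.attempts[outcome.attempts.length - 1];
      const failedBefore = outcome.attempts.some((attempt) => attempt === 'failed');
      return last === 'passed' && failedBefore;
    })
    .map((outcome) => outcome.title);
}

const sampleOutcomes: TestOutcome[] = [
  { title: 'login', attempts: ['passed'] },
  { title: 'checkout', attempts: ['failed', 'passed'] }, // passed only on retry
  { title: 'upload', attempts: ['failed', 'failed', 'failed'] }, // genuine failure
];

const flakyTitles = findFlakyTests(sampleOutcomes);
```

A genuine failure is deliberately excluded: it already fails the build. The list only surfaces the tests that retries would otherwise hide.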

Code example

A typical Playwright E2E test for authentication:

import { test, expect } from '@playwright/test';

test('user can log in and see their dashboard', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('securepassword');
  await page.getByRole('button', { name: 'Log in' }).click();

  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  await expect(page.getByText('Welcome back')).toBeVisible();
});

A test with API-seeded data:

import { test, expect } from '@playwright/test';

test('user sees their projects on the dashboard', async ({ page, request }) => {
  // Seed data via API -- fast, reliable, independent of UI
  const response = await request.post('/api/test/seed', {
    data: {
      user: { email: 'user@example.com', password: 'securepassword' },
      projects: [
        { name: 'Project Alpha', status: 'active' },
        { name: 'Project Beta', status: 'archived' },
      ],
    },
  });
  expect(response.ok()).toBeTruthy();

  // Test the actual journey
  await page.goto('/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('securepassword');
  await page.getByRole('button', { name: 'Log in' }).click();

  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  await expect(page.getByText('Project Alpha')).toBeVisible();
  await expect(page.getByText('Project Beta')).not.toBeVisible(); // archived
});
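When several tests seed data the same way, the seeding call can be pulled into a helper. The sketch below is one possible shape, not a prescribed implementation. `seedTestData`, `SeedPayload`, and the two interfaces are hypothetical names. The interfaces describe only the small part of Playwright's request context that the helper uses, which the real `request` fixture should satisfy structurally. The `/api/test/seed` endpoint is the one from the example above and is project-specific.

```typescript
// Minimal structural subset of Playwright's request context that this helper relies on.
interface SeedResponse {
  ok(): boolean;
  status(): number;
}

interface SeedRequestContext {
  post(url: string, options: { data: unknown }): Promise<SeedResponse>;
}

interface SeedPayload {
  user: { email: string; password: string };
  projects: { name: string; status: 'active' | 'archived' }[];
}

// Posts seed data and fails loudly, so a broken seed never looks like a UI bug.
async function seedTestData(request: SeedRequestContext, payload: SeedPayload): Promise<void> {
  const response = await request.post('/api/test/seed', { data: payload });
  if (!response.ok()) {
    throw new Error(`Seeding failed with HTTP ${response.status()}`);
  }
}
```

Inside a test, this would be called as `await seedTestData(request, { user, projects })` before the first `page.goto`.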

Playwright configuration for CI

Playwright needs specific configuration to run reliably in CI (Bitbucket Pipelines, CircleCI, GitHub Actions). The key differences from local development: headless mode is mandatory, retries absorb transient infrastructure issues, and artifacts (traces, screenshots) are essential for debugging failures you cannot reproduce locally.

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 2 : undefined,
  reporter: process.env.CI
    ? [['html', { open: 'never' }], ['junit', { outputFile: 'test-results/e2e-results.xml' }]]
    : 'html',
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'on-first-retry',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
    // Add Firefox and WebKit for release testing, not for every PR
    ...(process.env.CI_RELEASE
      ? [
          { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
          { name: 'webkit', use: { ...devices['Desktop Safari'] } },
        ]
      : []),
  ],
});

Key configuration decisions:

Setting	Value	Reasoning
`fullyParallel`	`true`	Tests must be independent anyway -- run them in parallel for speed
`retries`	`2` in CI, `0` locally	CI retries catch infrastructure flakiness; local retries mask real bugs
`workers`	`2` in CI	Standard CI runners have limited CPU; 2 workers balances speed and stability
`trace`	`on-first-retry`	Traces are large; capture them only when debugging a failure
`forbidOnly`	`true` in CI	Prevents accidentally merging a `.only()` that skips the rest of the suite

Project structure for E2E tests

Keep E2E tests in a dedicated directory at the project root, separate from unit and integration tests. This makes it clear which tests are E2E (expensive, slow, require infrastructure) versus unit/integration (fast, isolated).

project-root/
  e2e/
    fixtures/
      auth.fixture.ts          # Reusable auth setup (login, create user)
      seed.fixture.ts          # Data seeding utilities
    pages/
      login.page.ts            # Page Object: login page locators and actions
      dashboard.page.ts        # Page Object: dashboard interactions
    tests/
      auth.spec.ts             # Authentication journey tests
      onboarding.spec.ts       # First-run experience tests
      core-crud.spec.ts        # Primary entity CRUD tests
      payments.spec.ts         # Payment flow tests (if applicable)
  playwright.config.ts         # Playwright config at the root, so testDir './e2e' resolves correctly
  src/                         # Application source
  tests/                       # Unit and integration tests

Page Objects are optional but recommended for projects with more than 10 E2E tests. They encapsulate locators and common actions for a page, which reduces duplication and makes tests resilient to UI changes. For smaller suites, inline locators in tests are fine -- do not over-engineer the structure before you need it.
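As an illustration of the Page Object idea, here is a minimal sketch of what `login.page.ts` might contain. The `LoginPage` class and the two interfaces are hypothetical. The interfaces list only the methods the class calls, chosen so that Playwright's `Page` should satisfy `LoginPageDriver` structurally. In a real project you would more likely type the constructor argument as `Page` from `@playwright/test`.

```typescript
// Minimal structural subset of Playwright's Page/Locator API used below.
interface LoginLocator {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface LoginPageDriver {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LoginLocator;
  getByRole(role: 'button', options: { name: string }): LoginLocator;
}

// Page Object for the login page: locators and actions live here, not in the tests.
class LoginPage {
  private readonly driver: LoginPageDriver;

  constructor(driver: LoginPageDriver) {
    this.driver = driver;
  }

  async open(): Promise<void> {
    await this.driver.goto('/login');
  }

  async logIn(email: string, password: string): Promise<void> {
    await this.driver.getByLabel('Email').fill(email);
    await this.driver.getByLabel('Password').fill(password);
    await this.driver.getByRole('button', { name: 'Log in' }).click();
  }
}
```

If the login form changes (a renamed button, an extra field), only this class needs updating. Every test that calls `logIn` stays as it is.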

Test case management with Qase

Every S&P project uses Qase as the test case management platform. Qase is where test cases live, test runs are tracked, and testing coverage is visible to the whole team. This is mandatory -- not optional, not "nice to have."

Why a TCM tool matters

Without structured test case management, testing knowledge lives in people's heads. When the QA engineer goes on holiday or a developer rotates off the project, the testing knowledge walks out the door. Qase makes test cases a persistent, shared, reviewable artifact -- just like code.

A spreadsheet is not a substitute. Spreadsheets have no meaningful version history for test cases, no integration with Jira, no test run tracking, and no way to link a failed test to the bug ticket that resulted from it. Qase exists because spreadsheets failed at this job.

What goes into Qase

Content	When it is created	Who creates it
Test cases for acceptance criteria	During sprint planning / before development	QA or developer wearing QA hat
Regression test suite	When a feature reaches stable	QA or developer
Smoke test checklist	At project setup	QA or tech lead
Exploratory test charters	Before bug bash	QA or designated tester
Edge case scenarios	As discovered during development or testing	Anyone

Test case structure

Each test case in Qase should include:

Title -- Clear, action-oriented. Start with a verb. Good: "Verify user can reset password via email link." Bad: "Password reset."
Preconditions -- What state the system needs to be in before the test starts. Be specific: "User with email user@test.com exists and has a verified account."
Steps -- Numbered, specific actions the tester performs. Each step should be unambiguous enough that someone unfamiliar with the feature can execute it.
Expected results -- What should happen at each step or at the end. Observable outcomes, not implementation details.
Priority -- Critical, High, Medium, Low. Critical means "if this fails, the release is blocked."
Type -- Functional, regression, smoke, exploratory.
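The fields above can be read as a small data structure. The sketch below is a hypothetical local model (it is not Qase's API schema) with a review helper that flags drafts missing a required part. The title check is only a rough heuristic, and the sample steps are illustrative.

```typescript
// Hypothetical local model of a test case; this is not Qase's API schema.
type CasePriority = 'Critical' | 'High' | 'Medium' | 'Low';
type CaseKind = 'functional' | 'regression' | 'smoke' | 'exploratory';

interface TestCaseDraft {
  title: string;
  preconditions: string;
  steps: string[];
  expectedResults: string[];
  priority: CasePriority;
  kind: CaseKind;
}

// Returns review comments; an empty list means the draft has every required part.
function reviewTestCase(draft: TestCaseDraft): string[] {
  const comments: string[] = [];
  const titleWords = draft.title.trim().split(/\s+/).filter((word) => word.length > 0);
  // Rough heuristic: a two-word title such as "Password reset." names a topic, not an outcome.
  if (titleWords.length < 3) comments.push('Title should describe an action and its outcome.');
  if (draft.preconditions.trim() === '') comments.push('Preconditions are missing.');
  if (draft.steps.length === 0) comments.push('Steps are missing.');
  if (draft.expectedResults.length === 0) comments.push('Expected results are missing.');
  return comments;
}

const passwordResetCase: TestCaseDraft = {
  title: 'Verify user can reset password via email link',
  preconditions: 'User with email user@test.com exists and has a verified account.',
  steps: ['Open the login page', 'Request a password reset', 'Follow the emailed link', 'Set a new password'],
  expectedResults: ['User can log in with the new password'],
  priority: 'Critical',
  kind: 'functional',
};

const reviewComments = reviewTestCase(passwordResetCase);
```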

Naming conventions for test suites:

Level	Convention	Example
Project	`[Project Name]`	`ClientPortal`
Suite (feature area)	`[Feature] - [Scope]`	`Authentication - Login`
Sub-suite	`[Feature] - [Specific flow]`	`Authentication - Password Reset`
Test case	Verb-first, describes the outcome	`Verify user receives reset email within 60 seconds`

Keep the hierarchy shallow -- two levels of nesting is usually enough. Deep nesting makes test cases hard to find and harder to maintain.

Test runs

Before each release, create a test run in Qase from the relevant test suite. This is the formal record of what was tested, what passed, what failed, and what was skipped.

Test run workflow:

Create the run from the regression suite (or a subset relevant to the release scope).
Assign testers -- distribute test cases across available team members.
Execute and record results -- passed, failed, blocked, or skipped. For failures, add notes describing the actual behaviour.
Link defects -- failed test cases get a linked Jira issue created directly from Qase.
Review the run -- after execution, the run summary shows coverage and pass rate. This is your release readiness signal.

Track results in the tool, not in Slack messages or spreadsheets. The test run becomes the audit trail for what was tested and what was not.
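To make the release readiness signal concrete, the sketch below summarizes a run from exported results. The `RunResult` shape and `summarizeRun` are hypothetical, not part of Qase. Two assumptions are baked in: the pass rate is computed over executed cases only (passed plus failed), and any failed Critical case blocks the release, following the priority definition earlier in this appendix.

```typescript
type RunStatus = 'passed' | 'failed' | 'blocked' | 'skipped';
type RunPriority = 'Critical' | 'High' | 'Medium' | 'Low';

// Hypothetical export shape for one executed test case in a run.
interface RunResult {
  caseTitle: string;
  priority: RunPriority;
  status: RunStatus;
}

interface RunSummary {
  total: number;
  executed: number; // passed + failed; blocked and skipped cases are not counted
  passRate: number; // 0..1 over executed cases
  criticalFailures: string[];
  releaseBlocked: boolean;
}

function summarizeRun(results: RunResult[]): RunSummary {
  const executed = results.filter((r) => r.status === 'passed' || r.status === 'failed');
  const passedCount = executed.filter((r) => r.status === 'passed').length;
  const criticalFailures = results
    .filter((r) => r.priority === 'Critical' && r.status === 'failed')
    .map((r) => r.caseTitle);
  return {
    total: results.length,
    executed: executed.length,
    passRate: executed.length === 0 ? 0 : passedCount / executed.length,
    criticalFailures,
    // Assumption: a failed Critical case blocks the release regardless of pass rate.
    releaseBlocked: criticalFailures.length > 0,
  };
}
```

Skipped and blocked cases still appear in `total`, so a run with a high pass rate but low `executed` count is visible as thin coverage rather than as a clean release.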

Jira integration

Qase integrates with Jira bidirectionally. Set this up during project initialization -- not after the first release crisis:

Failed test to bug ticket. When a test case fails during a test run, create a linked Jira issue directly from Qase. The Jira ticket automatically includes a link back to the failed test case, providing traceability from bug to test to fix.
Story to test case. Link test cases to the Jira stories they verify. This makes it visible during sprint planning which stories have test coverage and which do not.
Test run to release. Tag test runs with the sprint or release version. Over time, this builds a history of release quality that is useful for retrospectives and stakeholder reporting.

Qase project setup checklist

When starting a new S&P project, set up Qase before the first sprint:

Create the Qase project. Use the same name as the Jira project for consistency.
Configure the Jira integration (Settings > Integrations > Jira Cloud).
Create the top-level test suites matching major feature areas.
Create a smoke test suite with 5-10 critical path test cases.
Create a regression suite (initially empty -- it grows as features stabilize).
Add all team members with appropriate roles (admin for QA lead, member for developers).
Document the Qase project URL in the project's README or Confluence space.

Huddle testing and bug bashes

Huddle testing and bug bashes are structured internal testing practices. They catch the bugs, UX friction, and edge cases that automated tests miss -- by putting the software in front of real people who use it with real intent before it reaches users.

This is a deliberately new practice at S&P. The goal is to build a culture where testing is a team activity, not a department function.

Huddle testing (per-feature)

When a feature is complete and deployed to staging, the developer who built it runs a short huddle test with 1-2 colleagues. Ideally, at least one participant is unfamiliar with the feature -- fresh eyes catch what the builder's eyes skip.

Format (15-20 minutes):

The developer gives a 2-minute walkthrough of what the feature does and what problem it solves.
The colleague(s) use the feature without guidance -- try to accomplish the task it is designed for.
Note any confusion, friction, unexpected behaviour, or bugs. The developer observes but does not help unless asked.
Developer captures findings as Jira tickets or addresses them immediately if trivial.

When to huddle test:

Any user-facing feature or significant UI change -- always.
Backend-only changes -- not needed. The integration test suite covers these.
Refactors or infrastructure work -- not needed unless the change affects observable behaviour.
Bug fixes -- only if the fix changes how the user interacts with the feature.

What makes a good huddle test participant: Someone who will use the feature the way a real user would, not the way the developer built it. Product managers and designers are excellent huddle testers because they approach the feature from the user's perspective, not the implementation perspective.

Bug bashes (per-sprint)

A bug bash is a time-boxed, cross-functional session where the team systematically breaks the software before it ships. S&P runs one per sprint, 2-3 days before the sprint-end release.

Format:

Phase	Duration	What happens
Setup	5-10 min	The facilitator (rotating role) defines the scope: which features to focus on, which areas are high-risk, what is new since last sprint. Distribute any exploratory test charters from Qase.
Testing	45-60 min	Everyone tests independently. Focus on exploratory testing -- try to break things, test edge cases, use the product in unexpected ways.
Debrief	15-20 min	Gather findings, triage bugs by severity, create Jira tickets tagged with `bug-bash-sprint-XX`.

Who participates:

Developers, QA (where available), designers, product managers. The more diverse the perspectives, the more bugs you find. A designer catches UX issues a developer will not see. A PM tests the actual user workflow, not just the happy path the developer built for.

What to focus on:

New features shipped this sprint -- the highest-risk area by definition.
Areas adjacent to changes -- regressions hide where you are not looking.
Cross-browser and cross-device behaviour -- test on at least one non-Chrome browser and one mobile viewport.
Data edge cases -- empty states, very long strings, special characters, large datasets, zero-item lists.
Permission and role boundaries -- test as different user roles, especially the transitions between what each role can and cannot do.

Cadence: Every sprint. This is not optional and not the first thing to cut when the sprint is busy. Skipping the bug bash to save an hour risks the 3am production incident that the bash would have caught.

Bug bash facilitation tips

The facilitator role rotates each sprint. Good facilitation makes the difference between a productive session and a waste of an hour:

Prepare a focus list before the session. List the features and areas to test, ranked by risk. Share it in Slack or email the day before so people can think about what to try.
Assign areas, not tasks. Tell people "focus on the checkout flow" not "test that the coupon code field validates correctly." Exploratory testing works because testers choose their own path.
Timebox strictly. The debrief is where value is captured. Do not let the testing phase run long and eat into triage time.
Triage ruthlessly. Not every finding is a blocker. Categorize as P0 (blocks release), P1 (fix this sprint), P2 (backlog), or "not a bug" -- and move on.
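The triage step can be sketched as a small grouping helper for the debrief. All names here are hypothetical. Zero-padding the sprint number to two digits is an assumption about what `XX` in `bug-bash-sprint-XX` stands for.

```typescript
type TriageLevel = 'P0' | 'P1' | 'P2' | 'not-a-bug';

interface BashFinding {
  summary: string;
  level: TriageLevel;
}

// Builds the Jira label for a sprint. Two-digit zero-padding is an assumption about "XX".
function bugBashLabel(sprint: number): string {
  return `bug-bash-sprint-${String(sprint).padStart(2, '0')}`;
}

// Groups finding summaries by triage level so the debrief can walk P0 first.
function groupByLevel(findings: BashFinding[]): Record<TriageLevel, string[]> {
  const grouped: Record<TriageLevel, string[]> = { P0: [], P1: [], P2: [], 'not-a-bug': [] };
  for (const finding of findings) {
    grouped[finding.level].push(finding.summary);
  }
  return grouped;
}

// Everything except "not a bug" gets a ticket.
function needsTicket(finding: BashFinding): boolean {
  return finding.level !== 'not-a-bug';
}
```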

Manual and exploratory testing

Manual testing is the last layer of quality assurance -- the one that catches what automation cannot. Automated tests verify that the system does what the code says it should. Manual testing verifies that the system does what the user actually needs, that the experience makes sense, and that edge cases humans would hit in the real world are handled gracefully.

What manual testing covers that automation does not

Usability and UX issues. Confusing flows, misleading labels, unexpected behaviour that is technically "correct" but practically wrong.
Visual inconsistencies. Layout that feels off, spacing that is wrong, animations that are jarring -- things that are not caught by pixel-diff visual regression tools.
Complex multi-step workflows. Where the order and timing of user actions matters in ways that are difficult to encode in automated tests.
Cross-feature interactions. Feature A and feature B both work in isolation, but break when used together. Automation tests features individually; humans test them in combination.
Real-world data patterns. What happens when a user pastes formatted text from Word, uploads a 0-byte file, uses a 50-character hyphenated name, or enters an address with special characters?

When manual testing happens

Manual testing is not a phase -- it happens continuously at different intensities:

When	Who	Focus
During development	Developer	Self-test on staging before marking story as done -- not just "does it work" but "would I be comfortable if a client used this right now?"
After feature deployment	Developer + colleague	Huddle testing (see above)
Before sprint release	QA / team	Bug bash (see above)
Structured QA sessions	QA	Testing against test cases in Qase -- systematic verification of acceptance criteria and regression areas
Pre-release verification	QA / developer QA-hat	Final check against acceptance criteria before release sign-off

Exploratory testing charters

Unstructured "just click around" testing finds some bugs, but structured exploratory testing finds more. A charter gives the tester a mission without dictating the exact steps:

Charter template:

Target:    [Feature or area to explore]
Mission:   [What you are trying to learn or break]
Duration:  [Time box -- typically 30-60 minutes]
Notes:     [Anything discovered during the session]
Bugs:      [Jira ticket IDs for any defects found]

Example charters:

Target	Mission
User registration	Explore edge cases in the registration flow: special characters in names, disposable email addresses, concurrent registrations with the same email
Dashboard performance	Explore how the dashboard behaves with large datasets: 1000+ items, slow network (throttle in DevTools), rapid navigation between pages
Permission boundaries	Explore what happens when a user with "viewer" role attempts actions restricted to "editor" -- try URL manipulation, API calls, and UI interactions
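If charters are kept alongside the code or generated for a bug bash, the template above can be modelled as a small type with a renderer. This is a sketch. `ExploratoryCharter` and `renderCharter` are hypothetical names, and the column width simply matches the template's layout.

```typescript
interface ExploratoryCharter {
  target: string;
  mission: string;
  durationMinutes: number;
  notes: string[];
  bugs: string[]; // Jira ticket IDs for any defects found
}

// Renders a charter in the same five-line layout as the template.
function renderCharter(charter: ExploratoryCharter): string {
  const rows: Array<[string, string]> = [
    ['Target:', charter.target],
    ['Mission:', charter.mission],
    ['Duration:', `${charter.durationMinutes} minutes`],
    ['Notes:', charter.notes.join('; ')],
    ['Bugs:', charter.bugs.join(', ')],
  ];
  // Labels are padded to 11 characters so values line up; empty values leave just the label.
  return rows.map(([label, value]) => (label.padEnd(11) + value).trimEnd()).join('\n');
}

const registrationCharter: ExploratoryCharter = {
  target: 'User registration',
  mission: 'Explore edge cases in the registration flow',
  durationMinutes: 45,
  notes: [],
  bugs: [],
};

const renderedCharter = renderCharter(registrationCharter);
```

Notes and bugs start empty and are filled in during the session, which mirrors how the template is used.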

Session-based testing

For larger testing efforts (new product launch, major feature overhaul), use session-based testing management (SBTM):

Plan sessions. Each session has a charter, a time box (60-90 minutes), and an assigned tester.
Execute sessions. The tester follows the charter, taking notes on findings, questions, and areas that need more investigation.
Debrief sessions. After each session, the tester reports findings to the team. Bugs get Jira tickets. Areas that need more exploration get new charters.
Track coverage. Keep a simple matrix of features vs. sessions completed. This shows where exploratory testing has been thorough and where gaps remain.
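The coverage matrix in the last step can be as simple as a count of completed sessions per feature. The sketch below uses hypothetical names and data. It keeps features with zero sessions in the result, because those gaps are exactly what the matrix exists to show.

```typescript
interface ExploratorySession {
  feature: string;
  tester: string;
  minutes: number;
}

// Counts completed sessions per feature; features with zero sessions stay in the result.
function sessionCoverage(features: string[], sessions: ExploratorySession[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const feature of features) {
    counts.set(feature, 0);
  }
  for (const session of sessions) {
    const current = counts.get(session.feature);
    // Sessions against features outside the tracked list are ignored.
    if (current !== undefined) {
      counts.set(session.feature, current + 1);
    }
  }
  return counts;
}

// Lists the features that no session has covered yet.
function coverageGaps(counts: Map<string, number>): string[] {
  const gaps: string[] = [];
  counts.forEach((sessionCount, feature) => {
    if (sessionCount === 0) gaps.push(feature);
  });
  return gaps;
}
```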

Relationship to automation

Manual testing and automation complement each other -- neither replaces the other. The rule of thumb: if you find yourself manually testing the same flow more than three times, that is a signal to automate it. Manual testing should focus on the novel, the subjective, and the unexpected. Automation should handle the repetitive and the regression.

QA integration patterns

S&P projects vary in QA staffing -- some have dedicated QA engineers, others rely on developers alone. Regardless of staffing, the testing responsibilities are the same. What changes is who fills each role.

Shift-left principle

QA involvement starts at sprint planning, not after development is "done." Whether the QA role is filled by a dedicated engineer or a developer wearing the QA hat, the activities are the same:

Activity	When	Who
Review acceptance criteria for testability	Sprint planning	QA / designated reviewer
Write test scenarios for the story	Before development starts	QA / designated reviewer
Write unit + integration tests	During development	Developer
Exploratory testing on staging	After feature is deployed to staging	QA / another developer
E2E tests for new critical paths	During or after development	Developer or QA
Bug bash participation	End of sprint	Everyone

With dedicated QA

When a project has a dedicated QA engineer, the division of responsibility is clear:

QA writes and maintains the E2E test suite. QA owns the Playwright tests because they understand the critical paths from the user's perspective, not just the implementation perspective.
QA performs structured exploratory testing on staging. Using charters and session-based testing (described above), not ad-hoc clicking.
QA signs off on stories before they move to "done." This is a quality gate, not a bottleneck. If QA consistently blocks stories, the problem is upstream (unclear acceptance criteria, incomplete development), not with QA.
Developers are still responsible for unit and integration tests. QA does not write unit tests. That is the developer's job.
QA manages Qase. Test case creation, test run management, regression suite maintenance, and Jira integration.

Without dedicated QA

When a project does not have a dedicated QA engineer, the same activities still need to happen. The difference is how they are distributed:

Developers pair-review each other's work on staging. Not just code review -- actually use the feature. Open the browser, click through the flow, test the edge cases.
Rotate the "QA hat" each sprint. One developer is designated to focus on exploratory testing and E2E test writing for the sprint. This is a real allocation -- it should be visible in sprint planning, not an afterthought.
The huddle testing practice becomes critical. Without a QA engineer to catch issues, the team must catch them collectively. Skipping huddle tests on a project without QA is asking for production bugs.
Test case management still happens in Qase. The developer wearing the QA hat owns Qase for the sprint -- creating test runs, recording results, linking defects.

QA onboarding checklist for new projects

When QA joins a new project (or a developer takes on the QA role), complete these steps in the first sprint:

Get access. Jira, Qase, staging environment, CI/CD pipeline dashboard, Slack channels.
Understand the product. Read the product brief or PRD. Walk through the application as a user before reading any code.
Review existing test coverage. What is automated? What is manual? Where are the gaps?
Set up Qase. If not already done, follow the Qase project setup checklist above.
Create the smoke test suite. Identify the 5-10 most critical user journeys and create test cases for them.
Attend sprint planning. QA input on acceptance criteria starts from sprint one, not "once you're up to speed."
Run a baseline exploratory session. Spend 60-90 minutes exploring the application with no specific charter. Document initial findings -- bugs, UX concerns, areas that need deeper testing.
Set up the E2E framework. If Playwright is not already configured, set it up with the CI configuration described in this appendix.
Schedule the first bug bash. Even if it is a team of two, establish the cadence from the start.

Resources

Playwright Documentation -- official guides, API reference, best practices
Playwright Best Practices -- the Playwright team's own recommendations for reliable tests
Qase -- test case management platform (S&P standard)
Qase + Jira Integration -- setup guide for linking test cases to Jira issues
Qase Documentation -- full documentation for test case management, test runs, and reporting
ISTQB Foundation Level Syllabus -- foundational testing concepts and terminology
ISTQB Glossary -- standard definitions for testing terms
Session-Based Test Management -- James Bach's original SBTM methodology
Exploratory Testing -- foundational reading on structured exploratory approaches
Microsoft Code-with-Engineering Playbook -- Automated Testing -- patterns for test automation in CI/CD
Bitbucket Pipelines (Parallel Steps) -- parallelising test execution in Bitbucket Pipelines
CircleCI (Parallelism) -- splitting test suites across parallel containers
GitHub Actions (Matrix strategies) -- running tests in parallel across job variants
