AI-Assisted Engineering

AI is a force multiplier, not a replacement. The value isn't in generating code faster, it's in compressing the feedback loop between thinking and validating. A developer who uses AI well doesn't produce more code; they produce better code with less wasted effort. But only if they know when to trust the output, when to challenge it, and when to close the tool and think for themselves.

Why this matters

Software engineering has always been about managing complexity. AI tools don't eliminate that complexity, they shift where the effort goes. Instead of typing boilerplate, you spend time reviewing generated output. Instead of searching documentation, you spend time verifying that the AI's answer is current and correct. The net result can be a significant productivity gain, but only if the team adopts AI deliberately, with clear standards for when it helps and when it hurts.

S&P's value of Evolution means we adopt tools that genuinely improve how we work, not tools that are merely new. AI in engineering is worth adopting because the evidence is clear: it accelerates repetitive tasks, surfaces edge cases humans miss, and lowers the barrier to exploring unfamiliar parts of the stack. But evolution also means honest evaluation: recognising where AI falls short and building practices that account for its limitations.

The standard

The human-in-the-loop principle

Every AI-assisted workflow at S&P follows one rule: a human reviews and approves AI output before it affects production code, documentation, or infrastructure. This is not optional, and it is not a formality. It is the engineering practice.

Why this matters concretely:

AI models generate plausible output, not provably correct output. Plausible code passes a glance review. Correct code passes a thorough one.
AI has no understanding of your project's business rules, client constraints, or team conventions unless you provide that context explicitly.
AI-generated code can introduce security vulnerabilities, performance regressions, and subtle logic errors that compile and pass basic tests but fail in production.

The human-in-the-loop is not a bottleneck to optimise away. It is the quality gate. Treat AI output the way you'd treat a PR from a new contractor: assume competence, but verify everything.

The 4D Framework for AI Fluency

The human-in-the-loop rule says you must stay in control. The 4D Framework says how. Developed by Anthropic with Rick Dakan and Joseph Feller, it breaks effective and responsible AI collaboration into four competencies: Delegation, Description, Discernment, and Diligence. We adopt it because it gives the team one shared vocabulary for what "using AI well" actually means, and because each competency already maps onto a practice covered in this section. Treat the framework as the mental model; the subsections that follow are the depth.

Delegation: decide what to hand off. Before you prompt, decide whether a task belongs to you, the AI, or a collaboration of both. This means knowing your goal and breaking the work into parts (goal and task awareness), knowing the tool's real strengths and limits (platform awareness), and matching each part to whoever does it best (task delegation). The engineering judgment lives in "When NOT to use AI" below: architecture decisions and novel domain logic stay with you, scaffolding and boilerplate go to the AI. Delegating a task the AI cannot do well wastes more time than doing it yourself.

Description: communicate the work clearly. This is prompting, but the framework's point is sharper: describing a task precisely is a professional communication competency, not a trick. It spans the output you want (product), the iterative dialogue to get there (process), and how an AI feature should behave for end users (performance). The depth is in "Prompt engineering for engineering tasks" below: be specific, provide architectural context, prefer specs over prose.

Discernment: judge what comes back. AI is optimised to sound fluent and confident, not to be correct (see Critical thinking). Discernment is the discipline of not trusting output at face value: evaluating the result (product), judging whether the collaboration is actually working (process), and assessing how an AI feature behaves in front of users (performance). This is the competency that most separates effective AI users from the rest, and it has its own home in "Evaluating AI output" and "AI-assisted code review" below.

Diligence: own the result. You are accountable for what you ship, regardless of who or what wrote it. Diligence covers ethical and responsible use (creation), being transparent about AI involvement where industry or legal norms require it (transparency), and fact-checking, testing, and vouching for output before it deploys (deployment). This maps to "Privacy and data considerations" below, the human-in-the-loop principle above, and the rule in Critical thinking that "the AI wrote it" is never an excuse. This is where S&P's value of Integrity is non-negotiable.

These four are not stages you complete once. They are competencies you apply on every AI-assisted task, in roughly that order: delegate, describe, discern, and stay diligent throughout.

Prompt engineering for engineering tasks

Prompting is not a dark art. It is context management. The quality of AI output depends heavily on the quality of the context you provide. These patterns apply whether you're using Claude, Copilot, or any other AI coding tool.

Be specific about what you want

Vague prompts produce vague output. Compare:

Bad:  "Write a NestJS service for users"
Good: "Write a NestJS service for user registration that validates email
       uniqueness against PostgreSQL, hashes passwords with bcrypt, and
       returns a UserResponseDto (without the password hash). Use the
       repository pattern with TypeORM. Throw ConflictException if the
       email already exists."

The good prompt constrains the output enough that the AI can produce something close to what you actually need. The bad prompt produces a generic CRUD service that you'll rewrite anyway.

Provide architectural context

AI doesn't know your project's structure unless you tell it. When generating code that needs to fit into an existing codebase, include:

The module/service pattern your project uses
Relevant interfaces or DTOs the generated code should conform to
Error handling conventions (see Code Standards)
Naming conventions (file naming, variable casing, function naming)

"We use the following project structure in our NestJS monorepo:
- src/modules/{feature}/
  - {feature}.controller.ts
  - {feature}.service.ts
  - {feature}.repository.ts
  - dto/create-{feature}.dto.ts
  - dto/update-{feature}.dto.ts

All services inject repositories, not the TypeORM EntityManager directly.
All DTOs use class-validator decorators. All errors use NestJS built-in
exception classes (NotFoundException, ConflictException, etc.)

Generate the payments module following this pattern."

Spec-driven prompting

Instead of describing what you want in prose, write a specification (TypeScript interface, OpenAPI fragment, test case) and ask AI to implement it. Specs are unambiguous; prose descriptions are not.

// Spec-driven: give AI this interface, ask it to implement the service
interface AppointmentService {
  create(dto: CreateAppointmentDto): Promise<Appointment>;
  findByDateRange(start: Date, end: Date, clinicId: string): Promise<Appointment[]>;
  cancel(id: string, reason: string): Promise<void>;
}

This produces better output than "write an appointment service that handles creation, date range queries, and cancellation." The interface constrains the output to exactly what you need.
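As a rough illustration of how much the interface pins down, here is a minimal in-memory sketch that satisfies it. The shapes of `Appointment` and `CreateAppointmentDto`, the `InMemoryAppointmentService` class, and the choice to record cancellations in a `cancelledReason` field are all assumptions made for this example; a real implementation would sit on the project's repositories.

```typescript
// Hypothetical shapes: the spec in the text names these types but does not define them.
interface CreateAppointmentDto {
  clinicId: string;
  startsAt: Date;
}

interface Appointment extends CreateAppointmentDto {
  id: string;
  cancelledReason?: string;
}

// The spec from the text, unchanged.
interface AppointmentService {
  create(dto: CreateAppointmentDto): Promise<Appointment>;
  findByDateRange(start: Date, end: Date, clinicId: string): Promise<Appointment[]>;
  cancel(id: string, reason: string): Promise<void>;
}

// In-memory stand-in, useful for checking that the contract is implementable.
class InMemoryAppointmentService implements AppointmentService {
  private readonly appointments: Appointment[] = [];

  async create(dto: CreateAppointmentDto): Promise<Appointment> {
    const appointment: Appointment = { ...dto, id: String(this.appointments.length + 1) };
    this.appointments.push(appointment);
    return appointment;
  }

  async findByDateRange(start: Date, end: Date, clinicId: string): Promise<Appointment[]> {
    // Inclusive range, non-cancelled appointments only: both are assumptions.
    return this.appointments.filter(
      (a) =>
        a.clinicId === clinicId &&
        a.cancelledReason === undefined &&
        a.startsAt.getTime() >= start.getTime() &&
        a.startsAt.getTime() <= end.getTime(),
    );
  }

  async cancel(id: string, reason: string): Promise<void> {
    const target = this.appointments.filter((a) => a.id === id)[0];
    if (target === undefined) {
      throw new Error(`Appointment ${id} not found`);
    }
    target.cancelledReason = reason;
  }
}
```

Writing even a stand-in like this surfaces questions the prose version hides (is the date range inclusive? do cancelled appointments still appear?), which are worth settling before asking for the real implementation.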

The most effective AI workflow is not "generate the whole thing in one prompt." It is:

Start with the interface. Ask AI to generate the types, DTOs, or API contract first. Review and adjust.
Generate the implementation. With the correct interface locked in, ask AI to implement it.
Generate the tests. With the implementation visible, ask AI to generate test cases. Add edge cases it missed.
Review and refine. Read every line. AI-generated code is a first draft, not a final product.

This mirrors how you'd work with a junior developer: define the contract, let them implement it, review their work, iterate. It also matches the feature implementation order defined in the CLAUDE.md template below: backend domain first, then API layer, then frontend, then tests.

Context window management

AI tools have limited context. What you include (and what you leave out) determines the output quality.

Include:

Relevant type definitions and interfaces
The function or module the generated code interacts with
Error handling patterns from your codebase
A representative example of similar code from the project

Omit:

Unrelated modules or services
Configuration files (unless directly relevant)
Entire test suites (provide one representative test instead)
Long README files or documentation (summarise what's relevant)

When working with large codebases, curating context is more valuable than dumping everything in. A focused 200-line context produces better output than an unfocused 2000-line context.
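One way to make curation concrete is to treat context as a budget. The sketch below is illustrative only: the `ContextSnippet` shape, the `priority` field, and the 200-line default are assumptions for this example, not part of any tool's API.

```typescript
// Hypothetical shape for one candidate piece of context.
interface ContextSnippet {
  label: string;    // e.g. "CreateAppointmentDto"
  priority: number; // lower number = more relevant to the task
  text: string;
}

function countLines(text: string): number {
  return text.split("\n").length;
}

// Greedily keep the most relevant snippets that still fit within the line budget.
function curateContext(snippets: ContextSnippet[], maxLines: number = 200): ContextSnippet[] {
  const byRelevance = snippets.slice().sort((a, b) => a.priority - b.priority);
  const selected: ContextSnippet[] = [];
  let used = 0;
  for (const snippet of byRelevance) {
    const lines = countLines(snippet.text);
    if (used + lines <= maxLines) {
      selected.push(snippet);
      used += lines;
    }
  }
  return selected;
}
```

With a budget like this, an entire test suite simply does not fit, which nudges you toward the one representative test recommended above.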

Claude Code and CLI tooling

Claude Code is S&P's primary AI coding tool. It operates directly in the terminal, reads your codebase, and executes commands, which makes it powerful and demands disciplined usage.

CLAUDE.md: project context that persists

Every S&P project should have a CLAUDE.md file at the repository root. This file is automatically loaded whenever Claude Code operates on the project, providing persistent context about:

Project structure and conventions
Tech stack and versions
Naming conventions and patterns
Common pitfalls or project-specific rules
Links to relevant documentation

# <Project Name> -- AI Context

## Core philosophy
- Spec-driven development: `contracts/api-service/openapi.yaml` generated
  from NestJS is source of truth for all clients
- Use the generated API client in web; never call the API directly
- SOLID principles are respected across backend services

## Key invariants
- Never modify generated code in `packages/api-client/src/` -- fully generated
- TypeORM schema changes require a migration; never use `synchronize: true`
  in production
- All routes authenticated via Firebase unless explicitly marked public
- CASL policies must exist before exposing any write endpoint
- Never finish a task with type errors -- `pnpm type-check` must pass
- Never use `any`
- Commits must pass all tests locally and on CI before merging

## Per-area workflows

### Backend changes (`apps/backend/`)
Use `backend-feature` skill for every NestJS task.

### After any DTO/controller change -- API sync (never skip)
1. `pnpm --filter api export:openapi`
2. `npx @stoplight/spectral-cli lint contracts/api-service/openapi.yaml`
   -- stop on errors
3. `pnpm generate:api-client`
4. `pnpm type-check` -- stop on failures

### Entity changes (`*.entity.ts`) -- Migration guard
Both `up()` and `down()` required. Run `pnpm migration:run` before proceeding.

### Frontend changes (`apps/web/`)
Use generated API client -- never call API directly.

### Infrastructure changes
Always run `tofu plan` and review before `tofu apply`. Never apply to
production without explicit confirmation.

## Pre-planning guards
Before planning any feature:
1. Verify proposed design fits current architecture (coupling, testability)
2. Define failing tests (red) before implementation

## Feature implementation order
1. Backend domain logic (entities, services, business rules)
2. API layer (controllers, DTOs, validation, authorization)
3. API sync (export spec, lint, regenerate client, type-check)
4. Frontend feature (consume typed API client)
5. i18n check
6. E2E tests (critical paths)
7. Unit tests (complex logic)
8. Type-check across all packages
9. Security scan -- fix findings with CVSS > 6.9

## What NOT to do
- Never use raw SQL queries -- always use TypeORM QueryBuilder or repositories
- Never store files on the local filesystem -- use GCS signed URLs
- Never skip DTO validation -- every endpoint has an input DTO
  with class-validator

The template above reflects a typical S&P NestJS monorepo. Other stacks keep the same categories with different tools and paths. The key sections every CLAUDE.md should have:

Core philosophy: 1-3 sentences on the project's development approach and source of truth
Key invariants: Hard constraints AI must never violate. These are not suggestions. When AI output violates an invariant, reject the output rather than fixing it. The violation means the AI lacked context or the constraint is missing from the file
Per-area workflows: What skill or process to follow for each part of the codebase. This prevents AI from applying backend patterns to frontend code or skipping the API sync step after a DTO change
Pre-planning guards: What to verify before generating implementation code. Architecture fit and test-first planning catch problems when they are cheap to fix
Feature implementation order: The recommended sequence for building features. AI is better at each step when the previous steps are done, because each step adds context
What NOT to do: Project-specific anti-patterns. Explicit prohibitions are more effective than implicit conventions

Keep CLAUDE.md concise. It is not a README, it is a context file for AI tools. Every irrelevant line pushes useful context out of the window. Point to detailed docs (setup guides, command references) rather than inlining them.

Update CLAUDE.md whenever conventions change, in the same PR as the convention change. Stale AI rules are worse than no rules: they teach AI outdated patterns that pass review because reviewers trust the AI "knows the conventions." An outdated CLAUDE.md actively degrades AI output quality, just like outdated architecture diagrams mislead developers (see Architecture & System Design).

Skills: extending AI capabilities

Skills are reusable instruction packages that teach AI a specific workflow. They are to AI what npm packages are to Node: someone else encoded their expertise so you don't start from scratch every time.

Built-in vs community vs custom:

Category	Source	Trust level	Maintenance
Built-in	AI tool vendor (Anthropic, Cursor)	High: tested, maintained	Vendor handles updates
Community	Open-source repos, skill registries	Medium: evaluate before adopting	External maintainer; may go stale
Custom	Your team, for your project	Highest value, highest cost	You maintain it

Evaluating community skills. Apply the same rigor as evaluating npm dependencies:

What does it actually do? Read the full skill definition, not just the description.
Does it match your project's conventions? A skill that generates React class components when your project uses function components is actively harmful.
Is it maintained? Check last update, open issues, responsiveness.
Does it conflict with other skills or with your CLAUDE.md? Conflicting instructions produce unpredictable output.

When to build a custom skill. When you catch yourself giving AI the same multi-step instructions more than three times. When a workflow is project-specific enough that no community skill covers it.

High-value custom skills for S&P projects:

backend-feature: NestJS module creation following S&P patterns (entity, service, controller, DTOs, CASL policy, tests)
frontend-feature: Next.js feature scaffolding following Feature Sliced Design conventions
openapi-sync: The full export-lint-generate-typecheck pipeline triggered after any DTO or controller change
migration-guard: Entity change detection, migration generation, up/down verification

Skills live in .claude/commands/ and are version-controlled with the project. Treat them like any other code artifact: review changes, keep them current, delete ones that are no longer useful.

AI workflow orchestration

An AI workflow is a defined sequence of AI-assisted steps for a repeatable task. Instead of asking AI to "build a feature" (too broad, inconsistent results), you break the work into focused steps where each step's output feeds the next step's context.

Pre-planning guards. Before AI generates implementation code, run guard steps that check prerequisites:

Architecture review: Does this feature belong where it's being placed? Does the proposed design fit the current module boundaries? Surface coupling issues and testability concerns before 500 lines of generated code need to be thrown away.
Test-first planning: Define failing tests before implementation. Every task in the plan should have a corresponding test expectation. This constrains the AI to generate code that satisfies specific assertions rather than plausible-looking code that satisfies nothing verifiable.

These guards catch problems when they are cheap to fix: before implementation, not after. They are not optional. A feature plan that skips architecture review or test-first planning should be explicitly flagged as incomplete.

Chaining skills in order. Skills can reference each other in sequence. The architecture review runs before test planning, which runs before the implementation skill. Each step adds context that makes the next step more accurate.

Invariant enforcement. Hard constraints from CLAUDE.md apply here too: when AI output violates an invariant, reject it rather than patching it. The violation signals a context gap that needs to be fixed in the configuration, not in the generated code.
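Rejection is easier to apply consistently when the invariants are executable. The following is a deliberately naive sketch: the `Invariant` shape, the regular expressions, and `checkInvariants` are assumptions for illustration, and a text scan is no substitute for real tooling such as the `@typescript-eslint/no-explicit-any` lint rule.

```typescript
// Hypothetical representation of a CLAUDE.md invariant as a text pattern.
interface Invariant {
  rule: string;
  pattern: RegExp; // matches a violation
}

const invariants: Invariant[] = [
  { rule: "Never use `any`", pattern: /(:\s*any\b|\bas\s+any\b|<any>)/ },
  // Simplified: the template only forbids this setting in production.
  { rule: "No `synchronize: true`", pattern: /synchronize\s*:\s*true/ },
];

interface InvariantReport {
  accepted: boolean;
  violations: string[];
}

// Reject (rather than patch) any output that breaks a hard constraint.
function checkInvariants(generatedSource: string, rules: Invariant[] = invariants): InvariantReport {
  const violations = rules
    .filter((inv) => inv.pattern.test(generatedSource))
    .map((inv) => inv.rule);
  return { accepted: violations.length === 0, violations };
}
```

A rejected result is a prompt to look at the context the AI was given, not at the generated code.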

MCP servers for tool integration

The Model Context Protocol (MCP) allows Claude Code to interact with external tools: Jira, Confluence, Slack, databases, APIs. This turns AI from a code generator into a workflow assistant.

Practical uses at S&P:

Jira integration: Query issue details, read acceptance criteria, and use them as context when generating implementation code or test cases.
Confluence integration: Read project documentation, architecture decisions, and design specs directly, eliminating the copy-paste step.
Database access: Query schema information, inspect data patterns, and generate migrations based on the actual current state of the database.
Slack integration: Summarise channel discussions that are relevant to the current task.

MCP connections should respect the same access controls as the developer using them. If you wouldn't query production data directly, don't have your MCP server do it for you. The convenience of AI doesn't override environment isolation principles.

Hooks for automation

Claude Code hooks let you automate actions that should happen before or after AI operations: linting generated code, running type checks, formatting output, or appending standard headers. Configure hooks in .claude/settings.json:

Pre-commit hooks: Run ESLint and Prettier on AI-generated files before they're staged.
Post-generation hooks: Automatically run tsc --noEmit to catch type errors in generated code.
Custom validation: Run project-specific checks (e.g., verifying that generated DTOs include all required decorators).

Hooks are the safety net for AI output. They catch the mechanical errors (formatting, typing, linting) so you can focus your review on logic, architecture, and correctness.
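As an example of the custom validation idea, a hook could run a small script over generated DTO files. The sketch below assumes a simple convention (every property in a DTO class sits on one line and is directly preceded by at least one decorator line) and uses a line-based scan. `findUndecoratedProperties` is a hypothetical helper, and how such a script is wired into a hook depends on your Claude Code configuration.

```typescript
// Returns the names of DTO properties that have no decorator directly above them.
// Assumption: one property per line, single-line decorators on the lines above it.
function findUndecoratedProperties(dtoSource: string): string[] {
  const propertyLine = /^(?:readonly\s+)?([A-Za-z_]\w*)[?!]?\s*:\s*[^;]+;$/;
  const missing: string[] = [];
  let decorated = false;

  for (const rawLine of dtoSource.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("@") === 0) {
      decorated = true; // a decorator precedes whatever comes next
      continue;
    }
    const match = propertyLine.exec(line);
    if (match !== null && !decorated) {
      missing.push(match[1]);
    }
    if (line !== "") {
      decorated = false; // any other code line resets the state
    }
  }
  return missing;
}
```

A non-empty result would fail the hook, so the missing decorator is caught mechanically rather than in review.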

AI-assisted code review

AI does not replace code reviewers. It makes them more effective by handling the tedious parts so the reviewer can focus on what matters: design, logic, and whether the change actually solves the problem.

What AI is good at in code review:

Spotting inconsistencies with project conventions (naming, error handling patterns, file structure)
Identifying missing error handling, validation, or null checks
Detecting potential security issues (see Security: input validation, auth checks, secret handling)
Flagging overly complex code and suggesting simplifications
Checking that tests cover the stated requirements

What AI is not good at in code review:

Evaluating whether the architectural approach is right for the problem
Understanding business context and client requirements
Judging whether a trade-off is acceptable given the project's timeline and constraints
Catching subtle race conditions or distributed system bugs that require understanding the full system

The recommended workflow:

Author runs AI review on their own PR before requesting human review. Fix the obvious issues AI catches.
Human reviewer focuses on design, logic, and business context, the things AI can't evaluate.
The result: faster reviews, fewer nitpick comments, more substantive feedback. This complements the code review process in Code Review.

AI for test generation

AI is one of the most effective tools for generating test scaffolding. It identifies scenarios humans overlook because they built the happy path first and stopped thinking.

The workflow:

Write the implementation first (or at least define the interface).
Ask AI to generate test cases: specify the testing framework (Jest), the patterns your project uses (see Testing Strategy), and include relevant fixtures or factories.
Review every assertion. AI generates structurally correct tests that sometimes assert the wrong behaviour. The test name says "should reject invalid email" but the assertion checks the wrong field or the wrong status code.
Add the edge cases AI missed: domain-specific business rules, client-specific validation logic, and integration scenarios that depend on state only you know about.

For Qase test case management, AI can generate structured manual test cases from Figma designs, Jira stories, and the codebase. The Testing Strategy AI tips section covers this in detail, including the prompt library approach.

Where AI-generated tests go wrong:

Testing implementation details instead of behaviour (mocking everything, asserting on internal state)
Generating tests that pass but don't actually verify anything meaningful (empty assertions, overly loose matchers)
Using test data that is technically valid but nonsensical (a user with email "test@test.test" and name "asdf")
Missing the business-critical edge cases because they weren't in the prompt context
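The gap between a test that passes and a test that verifies something is easiest to see side by side. The sketch below uses plain boolean checks as a stand-in for Jest assertions so that it runs on its own; `validateRegistration`, its rules, and its status codes are hypothetical.

```typescript
// Hypothetical function under test.
interface ValidationResult {
  ok: boolean;
  field?: string;
  statusCode: number;
}

function validateRegistration(input: { email: string; password: string }): ValidationResult {
  if (input.email.indexOf("@") < 1) {
    return { ok: false, field: "email", statusCode: 422 };
  }
  if (input.password.length < 12) {
    return { ok: false, field: "password", statusCode: 422 };
  }
  return { ok: true, statusCode: 201 };
}

// "should reject invalid email", written loosely: the input also has a short
// password and the check only looks at `ok`, so this would still pass if the
// email rule were deleted from the implementation.
function looseCheck(): boolean {
  const result = validateRegistration({ email: "not-an-email", password: "short" });
  return result.ok === false;
}

// Same intent, but the input isolates the email rule and the check pins the
// field and status code that the test name promises.
function strictCheck(): boolean {
  const result = validateRegistration({ email: "not-an-email", password: "a-long-enough-password" });
  return result.ok === false && result.field === "email" && result.statusCode === 422;
}
```

Both checks pass against the current implementation; only the strict one would fail if email validation regressed, which is the property to look for when reviewing generated assertions.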

AI for documentation

Documentation is the task most developers avoid and AI handles well enough to remove the excuse.

Good uses:

Generating JSDoc comments for public APIs and service methods from the implementation
Drafting README sections for new modules or packages
Converting inline comments into structured documentation
Generating OpenAPI descriptions from NestJS controller decorators
Creating onboarding guides from the codebase structure

The standard: AI-generated documentation is a first draft. Review it for accuracy, remove AI-typical filler ("This module provides a robust and scalable solution for..."), and ensure it reflects what the code actually does, not what the AI infers it should do.

Do not generate documentation for the sake of documentation. If a function's name and signature make its purpose obvious, a JSDoc comment adds noise, not clarity. Apply the same judgment you'd apply to hand-written docs.

AI for debugging and root cause analysis

When you're stuck on a bug, AI can accelerate the diagnostic process, but only if you give it structured context rather than "it doesn't work."

An effective debugging prompt includes:

The error message or unexpected behaviour (exact text, not paraphrased)
The relevant code (the function that fails, not the entire module)
What you've already tried and ruled out
The environment (Node version, database, relevant config)

"This NestJS integration test fails intermittently:

  Error: Connection terminated unexpectedly

It passes locally 9/10 times but fails in CI (CircleCI/Bitbucket Pipelines) roughly 50%
of the time. The test uses Testcontainers with PostgreSQL 16. I've already
verified the container starts before tests run and increased the connection
timeout to 30s. The pipeline uses a Docker-in-Docker service. What are the
likely root causes?"

AI is good at pattern-matching known issues (connection pool exhaustion, race conditions, Docker networking quirks). It is not good at diagnosing bugs that require understanding your application's specific state machine or business logic. If AI doesn't solve it in two rounds of conversation, switch to traditional debugging: logs, breakpoints, bisecting.

Evaluating AI output

This is the most important skill in AI-assisted engineering. Generating output is easy. Knowing whether the output is correct is the actual work.

Security review of AI-generated code

Never trust AI-generated code on security-sensitive paths without explicit review. AI models produce patterns they've seen in training data, which includes both secure and insecure code. Common security issues in AI output:

Using dangerouslySetInnerHTML or equivalent without sanitisation
Generating SQL with string interpolation instead of parameterised queries
Hardcoding credentials or using weak defaults in generated configuration
Missing authorization checks on generated endpoints
Using deprecated or vulnerable library versions
Generating CORS configurations that are too permissive

Cross-reference every AI-generated endpoint, auth flow, or data handling path against the Security checklist. AI does not understand your threat model.
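The string-interpolation issue is the easiest of these to recognise in review. The sketch below only builds query descriptions and never touches a database; the `{ text, values }` shape with `$1` placeholders follows the convention used by node-postgres, while the `users` table and both function names are assumptions for illustration.

```typescript
interface SqlQuery {
  text: string;
  values: string[];
}

// Typical unsafe pattern: user input becomes part of the SQL text itself.
function findUserUnsafe(email: string): SqlQuery {
  return { text: `SELECT id FROM users WHERE email = '${email}'`, values: [] };
}

// Parameterised form: the SQL text is constant and the input travels separately.
function findUserSafe(email: string): SqlQuery {
  return { text: "SELECT id FROM users WHERE email = $1", values: [email] };
}

const hostileInput = "x' OR '1'='1";
console.log(findUserUnsafe(hostileInput).text); // the injected OR clause is now part of the query
console.log(findUserSafe(hostileInput).text);   // same text regardless of input
```

The CLAUDE.md template earlier in this section goes further and routes all access through TypeORM repositories or QueryBuilder; the comparison here is only meant to show what to look for when generated code bypasses them.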

Performance implications

AI generates code that works. It does not always generate code that performs well at scale.

Watch for:

N+1 queries in generated database access code (AI loves to write for loops with individual queries)
Missing database indexes for generated query patterns
Loading entire datasets into memory when pagination or streaming is appropriate
Synchronous operations where async processing is needed
Missing connection pool configuration in generated database setup

Review AI-generated data access patterns against expected load. A query that works fine with 100 rows can cripple a database with 100,000 rows.
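The N+1 pattern is worth recognising on sight. The sketch below uses a hypothetical in-memory `FakeOrderStore` that counts the queries it receives, so it runs without a database; the store, its method names, and the data are assumptions for illustration only.

```typescript
interface Order {
  id: number;
  userId: number;
}

// Stand-in for a repository; counts how many "queries" it receives.
class FakeOrderStore {
  queryCount = 0;
  private readonly orders: Order[];

  constructor(orders: Order[]) {
    this.orders = orders;
  }

  findByUserId(userId: number): Order[] {
    this.queryCount += 1;
    return this.orders.filter((o) => o.userId === userId);
  }

  findByUserIds(userIds: number[]): Order[] {
    this.queryCount += 1;
    return this.orders.filter((o) => userIds.indexOf(o.userId) !== -1);
  }
}

// N+1 style: one query per user inside a loop.
function countOrdersPerUserLoop(store: FakeOrderStore, userIds: number[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const userId of userIds) {
    counts[userId] = store.findByUserId(userId).length;
  }
  return counts;
}

// Batched style: one query for all users, grouped in memory.
function countOrdersPerUserBatched(store: FakeOrderStore, userIds: number[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const userId of userIds) {
    counts[userId] = 0;
  }
  for (const order of store.findByUserIds(userIds)) {
    counts[order.userId] = (counts[order.userId] || 0) + 1;
  }
  return counts;
}
```

Both functions return the same counts, but the loop version issues one query per user while the batched version issues one in total, and that difference grows linearly with the number of users.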

Catching hallucinations

AI models generate confidently wrong output. This is not a bug, it is a fundamental property of how they work. Common hallucination patterns in engineering:

Invented APIs: AI suggests a method or function that doesn't exist in the library version you're using. Always verify against official documentation.
Outdated patterns: AI generates code using deprecated APIs or patterns from older framework versions. Check that the generated code matches your project's versions.
Fabricated configuration: AI invents configuration options that sound plausible but don't exist. Verify against the tool's actual docs.
Wrong version assumptions: AI assumes a different Node.js, NestJS, or React version than what your project uses, leading to subtle incompatibilities.

The verification habit: When AI generates code that calls an external API or uses a library feature, spend 30 seconds checking the official docs. This single habit catches most hallucinations before they become bugs.

When NOT to use AI

AI is not universally helpful. These are the situations where it actively hurts more than it helps:

Creative architecture decisions. When you're deciding whether to use a message queue vs. direct API calls, whether to split a service, or how to structure a new domain model, you need first-principles thinking, not pattern matching. AI will give you a plausible answer, and that's the danger. Use AI to explore options after you've formed your own opinion, not as a substitute for the thinking. Architecture decisions should follow the process in Architecture & System Design.

Sensitive data handling. Never paste production data, real credentials, PII, or client-confidential information into AI tools, including AI tools that claim to be private or local. Assume every AI interaction is logged. If the task requires real data, use anonymised or synthetic data. This aligns with the Security environment isolation principle.

Novel or highly domain-specific logic. AI is trained on common patterns. If your feature requires business logic that is specific to the client's domain and unlikely to appear in training data, AI will generate plausible-looking code that doesn't actually implement the correct behaviour. In these cases, write the logic yourself and use AI for the surrounding scaffolding.

Critical incident response. When production is down, don't debug through an AI chat. Follow the incident response process: containment, notification, investigation. AI can help with post-incident analysis once the immediate issue is resolved.

Code you don't understand. If AI generates code and you can't explain what every line does, don't commit it. Either learn what it does, or write it yourself. Shipping code nobody on the team understands is how you create debugging nightmares that no amount of AI will solve.

Cost awareness and token efficiency

AI tools cost money. Not in a "we should be careful" way, but in a "this shows up on the monthly bill" way. Token usage scales with context size and response length.

Practical cost management:

Keep prompts focused. Dumping your entire codebase into context is expensive and produces worse output than curated context.
Use CLAUDE.md and project skills to avoid repeating boilerplate context in every prompt.
For repetitive tasks (generating test files for multiple modules), write a skill once rather than prompting the same pattern repeatedly.
Use token-efficient tools like RTK where available to reduce overhead on routine operations (git status, file reading, linting output).
Monitor team AI spend monthly. If one project's AI costs spike, investigate whether the team is using AI effectively or just burning context on unstructured conversations.

The value test: If an AI-assisted task takes more time (prompting + reviewing + fixing) than doing it manually, stop using AI for that task. AI should save time, not create busywork.
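The value test is simple arithmetic, sketched below. The `TaskTiming` shape and function names are assumptions for illustration, and the numbers you feed in are your own honest estimates.

```typescript
interface TaskTiming {
  promptingMinutes: number;
  reviewingMinutes: number;
  fixingMinutes: number;
  manualMinutes: number; // honest estimate of doing the task by hand
}

// Total time spent when the task is done with AI assistance.
function aiAssistedMinutes(t: TaskTiming): number {
  return t.promptingMinutes + t.reviewingMinutes + t.fixingMinutes;
}

// True when the AI-assisted route takes longer than doing the task manually.
function shouldStopUsingAi(t: TaskTiming): boolean {
  return aiAssistedMinutes(t) > t.manualMinutes;
}
```

For example, a task with 5 minutes of prompting, 15 of review, and 20 of fixes costs 40 minutes; if the manual estimate is 30, the test says to stop using AI for that task.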

Privacy and data considerations

AI tools process your input on external servers. This has real implications for client projects.

Hard rules:

Never paste production data, real API keys, client credentials, or PII into any AI tool.
Never paste proprietary client business logic that is covered by NDA without confirming your agreement permits it.
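Prevent AI tools from reading sensitive files (.env, credentials, certificates). Use .gitignore and .claudeignore where the tool honours them, but do not rely on them alone: in Claude Code, add Read deny rules under permissions in .claude/settings.json.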
When using MCP integrations with databases, configure them to access development or staging environments only, never production.
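When a task genuinely needs realistic records, the first rule implies transforming them before anything reaches a prompt. The following is a minimal sketch, assuming a hypothetical `CustomerRecord` shape; which fields count as PII on a given project is a decision for the tech lead and the client, not something this helper can know.

```typescript
interface CustomerRecord {
  id: number;
  fullName: string;
  email: string;
  plan: string; // non-identifying field the task actually needs
}

// Replace identifying fields with synthetic values derived from position only,
// so nothing from the real record's identity carries over into the output.
function anonymiseCustomers(records: CustomerRecord[]): CustomerRecord[] {
  return records.map((record, index) => ({
    id: index + 1,
    fullName: `Customer ${index + 1}`,
    email: `customer${index + 1}@example.com`,
    plan: record.plan,
  }));
}
```

Only the output of a step like this should ever be pasted into an AI tool, and only after someone has checked that the fields passed through unchanged really are non-identifying.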

Project-level configuration: At project kickoff, the tech lead should decide and document which AI tools are approved for the project and what data sensitivity constraints apply. Some clients have explicit AI usage policies in their contracts. When in doubt, ask the client. This decision should be captured as an ADR in the project repo (see Architecture & System Design).

Team adoption strategy

Adopting AI tools is not a switch you flip. It is a practice you build, and it works best when adoption is deliberate rather than chaotic.

The champions model:

Start with one or two AI champions per team. These are engineers who are already using AI tools effectively and can help others. They don't need to be senior: enthusiasm and willingness to share what works matter more than title.
Regular sharing sessions. A 15-minute bi-weekly slot where someone shares a real workflow improvement: "Here's how I used Claude Code to generate our migration tests." Concrete examples beat abstract evangelism.
Project-level skills and context. Champions create the initial CLAUDE.md, build the first custom skills, and configure MCP integrations. These artifacts lower the adoption barrier for everyone else.
Gradual expansion. Don't mandate AI tool usage. Let the results speak. When developers see their teammate completing a tedious task in minutes that used to take hours, adoption follows naturally.

What doesn't work:

Mandating that everyone use AI tools by a specific date. People learn at different rates.
Treating AI usage as a productivity metric. Measuring "prompts per day" incentivises quantity over quality.
Assuming one training session is sufficient. AI tool workflows evolve rapidly: ongoing sharing is more valuable than a one-time workshop.

Keeping tools updated

AI tools evolve faster than most software. Models improve, new features ship, capabilities expand. A prompt that didn't work three months ago might work now. A workflow that was optimal last quarter might be obsolete.

Practices:

Review AI tool changelogs quarterly. When capabilities change, update your CLAUDE.md files and custom skills to take advantage of new features.
When a new model version is available, test it on representative tasks from your project before switching the whole team.
Deprecate custom skills and workarounds that are no longer needed. AI tool configuration accumulates cruft just like code does.
Share discoveries. When someone finds that a new AI capability eliminates a manual workflow, post it in the team's Slack channel. This connects to S&P's Learning & Growth culture.
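
The second practice, testing a new model version on representative tasks, is easier to act on when the results are recorded in a comparable form. The following is a minimal sketch, assuming the team has already run each task against both versions and judged each output pass or fail by its own checks. The TaskResult shape, the function name, and the model labels are hypothetical.

```typescript
// Hypothetical record of one representative task run against one model version.
interface TaskResult {
  task: string;    // e.g. "generate migration tests"
  model: string;   // version label; the labels used below are placeholders
  passed: boolean; // did the output pass the team's own checks?
}

interface Comparison {
  passRate: Record<string, number>; // model label -> fraction of its runs that passed
  regressions: string[];            // tasks that pass on current but not on candidate
}

function compareModelVersions(
  results: TaskResult[],
  current: string,
  candidate: string,
): Comparison {
  const passRate: Record<string, number> = {};
  for (const model of [current, candidate]) {
    const runs = results.filter((r) => r.model === model);
    const passes = runs.filter((r) => r.passed).length;
    passRate[model] = runs.length === 0 ? 0 : passes / runs.length;
  }

  // Set of task names that passed for a given model.
  const passedOn = (model: string): Set<string> =>
    new Set(results.filter((r) => r.model === model && r.passed).map((r) => r.task));

  const candidatePassed = passedOn(candidate);
  // A task also counts as a regression if the candidate was never run on it.
  const regressions = Array.from(passedOn(current))
    .filter((task) => !candidatePassed.has(task))
    .sort();

  return { passRate, regressions };
}
```

The regressions list matters more than the headline pass rate: a candidate can score higher overall and still break a task the team depends on.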

Cross-reference: AI tips in other sections

This section is the deep dive. Every other section in the playbook includes a short "AI tips" callout with contextual guidance for that specific practice. Here's where to find them:

Section	AI tips focus
Engineering Principles	Using AI for decision records, trade-off analysis
Code Review	AI-assisted review workflow, catching what AI misses
Testing Strategy	Test generation, Qase test case generation, edge case discovery
Developer Experience	Dev environment setup, tooling configuration
Code Standards	Linting, formatting, convention enforcement
Security	OWASP review, scan triage, security header generation
Architecture & System Design	C4 diagram generation, ADR drafting, capacity estimation

Each section's AI tips are specific to that practice. This section covers the general principles and deeper workflows that apply across all practices.

Critical thinking

AI fluency is not the same as AI accuracy. AI generates grammatically perfect, well-structured, confident output. This makes it harder to spot errors, not easier. A hand-typed typo in a SQL query is easy to see. An AI-generated query that joins the wrong tables but still runs and returns plausible rows is hard to see. Fluent output demands more careful review, not less.
Prompt engineering has diminishing returns. There's a point where refining a prompt further takes longer than writing the code yourself. If your third attempt at a prompt still isn't producing what you need, the task probably requires human judgment that the AI doesn't have. Close the AI tool and write it yourself.
AI output reflects training data, not best practices. If 80% of code in the training data uses a certain pattern, AI will recommend it, even if that pattern is outdated, insecure, or inappropriate for your use case. AI is a mirror of collective practice, not an oracle of best practice. Your engineering judgment is the filter.
Don't build critical workflows around AI tool availability. AI tools have outages, rate limits, and pricing changes. Your development process should work without AI, just slower. If the team can't ship a feature because the AI tool is down, the tool has become a hard dependency rather than a productivity aid.
The "AI wrote it" excuse doesn't fly. If AI-generated code introduces a bug, the developer who committed it owns the bug. You reviewed it, you approved it, you committed it. The same code review standards from Code Review apply to every line regardless of who (or what) wrote it.
Adoption is not uniform, and that's fine. Some developers will use AI for everything. Some will use it sparingly. Some will prefer not to use it at all. Respect that. The goal is to make AI tools available and supported, not to enforce a specific level of usage. Merit is about the quality of the output, not the tools used to produce it.
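
The first point is easiest to see in code. The sketch below is an invented example (the Customer and Order shapes and the data are assumptions): two joins that both type-check, run, and return tidy-looking totals, but only one matches orders to customers on the right key.

```typescript
interface Customer {
  id: number;
  name: string;
}

interface Order {
  id: number;
  customerId: number;
  total: number;
}

const customers: Customer[] = [
  { id: 1, name: "Acme" },
  { id: 2, name: "Globex" },
];

// Both orders belong to Globex (customerId 2).
const orders: Order[] = [
  { id: 1, customerId: 2, total: 100 },
  { id: 2, customerId: 2, total: 50 },
];

// Sums order totals per customer name, joining on whatever key the caller picks.
function totalsByCustomer(joinKey: (order: Order) => number): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const order of orders) {
    const customer = customers.find((c) => c.id === joinKey(order));
    if (customer) {
      totals[customer.name] = (totals[customer.name] ?? 0) + order.total;
    }
  }
  return totals;
}

// Plausible but wrong: joins on the order's own id, which happens to overlap customer ids.
const wrongTotals = totalsByCustomer((order) => order.id);
// wrongTotals -> { Acme: 100, Globex: 50 }

// Correct: joins on the foreign key.
const rightTotals = totalsByCustomer((order) => order.customerId);
// rightTotals -> { Globex: 150 }
```

Nothing in the wrong version fails: there is no exception and no empty result, only numbers that look reasonable. Catching it takes a reviewer who knows what the totals should be, or a test with data where the two keys do not overlap.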

Checklist

For every AI-assisted task

The prompt includes relevant context (project conventions, interfaces, constraints) rather than relying on the AI to guess
AI-generated code has been read line by line, not just skimmed
Generated code follows project conventions (file structure, naming, error handling, DTOs)
Security-sensitive code paths have been reviewed against the Security checklist
Generated database queries have been checked for N+1 patterns and missing indexes
External API calls and library methods in AI output have been verified against official documentation
Tests for AI-generated code include edge cases beyond what the AI suggested
No production data, real credentials, or client PII was included in the prompt
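
The N+1 item is worth a concrete picture, because generated data-access code often looks clean while issuing one query per row. The sketch below uses an in-memory stand-in that counts round trips. FakeDb and its methods are hypothetical and do not correspond to any real ORM API.

```typescript
// In-memory stand-in for a database that counts round trips (hypothetical API).
class FakeDb {
  queryCount = 0;

  private readonly authors = new Map<number, string>([
    [1, "Ada"],
    [2, "Linus"],
  ]);

  private readonly posts = [
    { id: 10, authorId: 1 },
    { id: 11, authorId: 2 },
    { id: 12, authorId: 1 },
  ];

  findPosts(): { id: number; authorId: number }[] {
    this.queryCount++;
    return this.posts.slice();
  }

  findAuthorById(id: number): string | undefined {
    this.queryCount++;
    return this.authors.get(id);
  }

  // One round trip for many ids, like a WHERE id IN (...) query.
  findAuthorsByIds(ids: number[]): Map<number, string> {
    this.queryCount++;
    const found = new Map<number, string>();
    for (const id of ids) {
      const author = this.authors.get(id);
      if (author !== undefined) found.set(id, author);
    }
    return found;
  }
}

// N+1 shape: one query for the posts, then one more per post.
function listPostsNPlusOne(db: FakeDb): string[] {
  return db
    .findPosts()
    .map((post) => `${post.id}:${db.findAuthorById(post.authorId) ?? "unknown"}`);
}

// Batched shape: one query for the posts, one for all the authors they reference.
function listPostsBatched(db: FakeDb): string[] {
  const posts = db.findPosts();
  const authorIds = Array.from(new Set(posts.map((post) => post.authorId)));
  const authors = db.findAuthorsByIds(authorIds);
  return posts.map((post) => `${post.id}:${authors.get(post.authorId) ?? "unknown"}`);
}
```

With the three posts above, the first function issues four queries and the second issues two, while both return the same list. The gap grows linearly with the number of rows, which is why it rarely shows up against small development datasets.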

For project AI setup

CLAUDE.md file exists at the repository root with current project context
Custom skills are created for recurring project-specific tasks
.claudeignore is configured to exclude sensitive files
MCP integrations are configured for development/staging environments only
AI tool usage policy is documented for the project (which tools, what data constraints)
At least one team member is designated as the AI champion for the project
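
The MCP item can be backed by a small guard that runs before a database integration is started. The sketch below is an assumption-heavy illustration: the host suffixes are placeholders for whatever naming your environments use, and how a given MCP server receives its connection string is deliberately left out. It uses an allow-list, so an unrecognised host is rejected rather than trusted.

```typescript
// Placeholder suffixes; replace with the naming scheme your environments actually use.
const ALLOWED_HOST_SUFFIXES: readonly string[] = [
  ".dev.example.internal",
  ".staging.example.internal",
];

// Returns true only when the connection URL points at a known non-production host.
// Anything unparseable or unrecognised is treated as unsafe.
function isNonProductionTarget(connectionUrl: string): boolean {
  let host: string;
  try {
    host = new URL(connectionUrl).hostname.toLowerCase();
  } catch {
    return false; // not a valid URL: refuse rather than guess
  }
  if (host === "localhost") {
    return true;
  }
  return ALLOWED_HOST_SUFFIXES.some((suffix) => host.endsWith(suffix));
}

// Usage: fail fast before handing the URL to any integration.
function assertNonProductionTarget(connectionUrl: string): void {
  if (!isNonProductionTarget(connectionUrl)) {
    throw new Error("Refusing to connect an AI integration to an unapproved database host");
  }
}
```

A guard like this complements, and does not replace, network-level controls and read-only credentials on the database side.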

For team adoption

AI tools are available and configured for all team members who want to use them
Regular sharing sessions are scheduled (bi-weekly recommended)
AI cost is monitored monthly at the project level
CLAUDE.md and custom skills are reviewed and updated quarterly
New team members are introduced to the project's AI tooling setup during onboarding

AI tips

Since this section is the AI deep-dive, these tips focus on meta-level AI usage: using AI to improve how you use AI.

Refine your CLAUDE.md. Ask AI to review your project's CLAUDE.md and suggest improvements based on the conventions it observes in the codebase. AI is good at spotting inconsistencies between what the CLAUDE.md says and what the code actually does.
Generate custom skills from repetitive prompts. If you find yourself writing the same type of prompt repeatedly (e.g., "generate a NestJS module with these patterns"), ask AI to convert that prompt into a reusable custom skill with parameterised inputs.
Audit AI-generated code in bulk. When a sprint included significant AI-generated code, review the aggregate patterns: Did AI introduce any anti-patterns consistently? Are there recurring issues (missing error handling, inconsistent naming) that should be added to the CLAUDE.md as explicit rules?
Use AI to onboard new team members. Point new joiners at the CLAUDE.md and custom skills as part of their onboarding. Then have them ask AI to explain the project's architecture, conventions, and patterns. This surfaces gaps in your documentation that you can then fix.
Retrospect on AI effectiveness. During sprint retros, occasionally ask: "Where did AI save us time this sprint? Where did it waste time?" Track the patterns. Double down on what works; stop doing what doesn't.

Resources

S&P internal:

S&P Engineering Principles
S&P Code Review
S&P Testing Strategy: includes the AI prompt library for Qase test case generation
S&P Developer Experience & Setup
S&P Code Standards
S&P Security
S&P Architecture & System Design
S&P Learning & Development (Confluence)

AI fluency:

Anthropic AI Fluency: Framework & Foundations. The 4D Framework (Delegation, Description, Discernment, Diligence)

Community skills worth evaluating:

OpenSpec. Spec-driven development tooling
E2E Testing Patterns. End-to-end test generation
Improve Codebase Architecture. Architecture analysis and pre-planning guard
TDD skill. Test-driven development with AI
Karpathy Guidelines. Behavioral guidelines for AI coding assistants
Caveman. Token-optimized communication mode

Spec-driven development:

Spectral. OpenAPI linting for contract validation
Spectral API Rulesets. Pre-built API design rules
