What Are These Templates?
These five artifacts operationalize the five foundational AI skills taught in the AI-ESG curriculum. They are working documents designed for teams building and deploying AI systems—not compliance theater.
Use them to justify deployment decisions, design prompts, architect workflows, evaluate outputs, and document ethical guardrails.
| # | Template | Foundational Skill |
|---|----------|--------------------|
| 1 | Strategic Brief | AI Strategy |
| 2 | Prompting Toolkit | Prompting |
| 3 | Workflow Blueprint | Workflow Integration |
| 4 | Evaluation Rubric | Critical Evaluation |
| 5 | Ethics Memo | Ethics & Trust |
Strategic AI Brief

Justifies an AI deployment to business and risk stakeholders. Defines the opportunity, risk tolerance, control architecture, financial model, and governance checkpoints.
Use When: Proposing a new AI capability (e.g., "Deploy AI ticket classifier"), seeking budget approval, or needing sign-off from risk/compliance teams.
Contents:
- Executive summary (GO / conditional GO / NO GO recommendation)
- Business problem, AI capability, deployment context
- Risk threshold definition (critical/major/minor/negligible consequences)
- Control architecture (pre-action controls, circuit breakers, audit trails)
- Financial model (cost, savings, ROI, risk-adjusted scenarios)
- Governance checkpoints (pilot phases, quarterly reviews, annual renewal)
- Sign-off matrix (business sponsor, risk/compliance, tech lead)
Foundational Skills:
AI Strategy
Critical Evaluation
Ethics & Trust
Example Use Case: Support team proposes AI-powered ticket classification to reduce routing time. Brief documents: (1) 4 FTEs saved = $24K/month savings, (2) hallucination risk managed via confidence threshold + human review, (3) circuit breaker: pause if weekly error rate >2%, (4) needs sign-off from CFO (ROI), Compliance (liability), and VP Engineering (feasibility).
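The brief's circuit breaker and savings math can be sketched in a few lines. The 2% weekly error threshold and $24K/month savings are from the example above; the $6K monthly AI operating cost is a hypothetical placeholder, not a figure from the brief.

```python
def circuit_breaker_tripped(weekly_errors: int, weekly_total: int,
                            threshold: float = 0.02) -> bool:
    """True if the weekly error rate exceeds the brief's 2% pause threshold."""
    return weekly_total > 0 and weekly_errors / weekly_total > threshold

def net_monthly_savings(fte_savings: float = 24_000,
                        ai_operating_cost: float = 6_000) -> float:
    """$24K/month FTE savings from the brief; operating cost is illustrative."""
    return fte_savings - ai_operating_cost
```

A governance checkpoint can then be as simple as: if `circuit_breaker_tripped(...)` returns true at the weekly review, pause auto-routing until the root cause is found.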
Workflow Blueprint

Designs an AI-native workflow where AI is embedded in the core decision path (not "off to the side"), with human checkpoints, stop cards, and audit trails baked in from the start.
Use When: Architecting a new AI-assisted process, defining handoff rules between AI and humans, designing pilot & rollout phases, or documenting SLAs and escalation paths.
Contents:
- Workflow diagram (visual flow of AI decisions, human checkpoints, escalations)
- Inputs & outputs (what data flows in/out, source, format, frequency)
- AI task definition (what the AI does, latency SLA, cost, what it does NOT do)
- Decision logic & stop cards (confidence-based routing, circuit breaker conditions)
- Handoff points (AI → human review, human → AI learning loop, escalation path)
- Audit trail specification (JSON schema for every decision, retention policy)
- KPI monitoring (daily dashboard, weekly spot-check, monthly review)
- Pilot & validation phases (week 1–4 rollout strategy with go/no-go criteria)
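As a minimal sketch of the per-decision audit trail, here is one JSON record per routing decision. The field names are hypothetical, not the blueprint's actual schema:

```python
import json
from datetime import datetime, timezone

def audit_record(ticket_id: str, category: str, confidence: float,
                 action: str, prompt_version: str) -> str:
    """Serialize one routing decision as a JSON audit entry (illustrative fields)."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ticket_id": ticket_id,
        "predicted_category": category,
        "confidence": confidence,
        "action": action,            # e.g. auto_route / human_review / escalate
        "prompt_version": prompt_version,
    })
```

Appending one such line per decision gives you a replayable log for the weekly spot-check and for retention-policy enforcement.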
Foundational Skills:
Workflow Integration
Critical Evaluation
AI Strategy
Example Use Case: Ticket classification workflow: (1) AI classifies with a confidence score; (2) if confidence >0.85 → auto-route; (3) if 0.65–0.85 → flag for review (<4 hr SLA); (4) if <0.65 OR category="Safety/Fraud" → escalate immediately. Audit trail logs every decision. Weekly: 5% spot-check of auto-routed tickets. Monthly: recalibrate thresholds. Pilot: Week 1 at 5% traffic, then expand if accuracy >92%.
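The confidence-based routing above maps directly to a small decision function. Thresholds and the safety-category name are taken from the example; everything else is an illustrative sketch:

```python
def route_ticket(category: str, confidence: float,
                 safety_categories: tuple = ("Safety/Fraud",)) -> str:
    """Route a classified ticket per the blueprint's decision logic."""
    # Safety/fraud or very low confidence escalates immediately.
    if category in safety_categories or confidence < 0.65:
        return "escalate"
    # High confidence routes without human touch.
    if confidence > 0.85:
        return "auto_route"
    # Mid band: flag for human review within the 4-hour SLA.
    return "human_review"
```

Keeping the thresholds as named parameters (rather than hard-coded literals scattered through the code) makes the monthly recalibration a one-line config change.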
Output Evaluation Rubric

Provides explicit, repeatable criteria for assessing AI output quality. Scales from ten to thousands of evaluations while maintaining consistency.
Use When: Establishing baseline accuracy, spot-checking outputs, conducting blind evaluations, feeding results into model retraining, or proving system safety to regulators.
Contents:
- Evaluation dimensions (correctness, confidence calibration, reasoning quality, boundary handling)
- Rubric template with example (classification tasks, text generation, data analysis)
- Evaluation tracking sheet (test case log, weekly summary, root cause analysis)
- Blind evaluation protocol (3-rater inter-rater agreement >80%)
- Automation options (human + sampling, AI-assisted evaluation, automated test suite)
- Continuous improvement loop (monthly aggregation, prompt retraining, deployment)
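The blind-evaluation protocol's >80% bar can be checked with simple percent agreement across the three raters. This is a sketch only (percent agreement, not Cohen's or Fleiss' kappa, which the protocol may actually prescribe):

```python
def three_rater_agreement(ratings: list) -> float:
    """Fraction of items on which all three raters gave the same label.

    `ratings` is a list of (rater1, rater2, rater3) label tuples.
    """
    agree = sum(1 for r in ratings if len(set(r)) == 1)
    return agree / len(ratings)
```

If `three_rater_agreement(...)` falls below 0.8, the rubric itself is ambiguous and should be tightened before scaling up evaluation volume.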
Foundational Skills:
Critical Evaluation
Prompting
Workflow Integration
Example Use Case: Testing ticket classifier. Rubric scores 4 dimensions: correctness (0/1), confidence calibration (0–1), reasoning quality (0–1), ambiguity handling (0/1). Week 1: Evaluate all 100 tickets/day (establish baseline: 80% accuracy). Week 2+: Sample 10% daily (10 tickets). If accuracy drops <92%, increase sampling to 25% and investigate drift. Monthly: Aggregate, identify top errors (e.g., "confuses Account Access with Technical Support"), retrain prompt, re-evaluate.
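The four-dimension scoring and the sampling escalation rule from the example can be sketched as follows (the 92% target and 10%/25% sampling rates come from the example; the function names are illustrative):

```python
def score_ticket(correct: bool, calibration: float,
                 reasoning: float, ambiguity_ok: bool) -> dict:
    """Score one evaluation across the rubric's four dimensions."""
    return {
        "correctness": 1 if correct else 0,          # 0/1
        "confidence_calibration": calibration,       # 0–1
        "reasoning_quality": reasoning,              # 0–1
        "ambiguity_handling": 1 if ambiguity_ok else 0,  # 0/1
    }

def accuracy(scores: list) -> float:
    """Aggregate correctness across a batch of scored evaluations."""
    return sum(s["correctness"] for s in scores) / len(scores)

def sampling_rate(current_accuracy: float, target: float = 0.92) -> float:
    """Escalate daily sampling from 10% to 25% when accuracy drifts below target."""
    return 0.25 if current_accuracy < target else 0.10
```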
Ethics Impact Memo

Documents ethical risks, hard guardrails, and trust metrics. Ethics is treated as system design, not compliance theater. Guardrails are engineered to survive model upgrades and organizational pressure.
Use When: Designing a new system, preparing for regulatory audit, documenting risk mitigation, or proving to customers that your system is trustworthy.
Contents:
- System overview (name, owner, model(s), deployment date, review schedule)
- Ethical risks (bias, hallucination, over-reliance, false negatives, data leakage)
- Risk mitigation for each (testing, monitoring, escalation)
- 5 hard guardrails (category boundary, safety escalation, confidence threshold, audit trail, prompt versioning)
- 5 trust metrics (accuracy by segment, safety detection rate, false escalation rate, human override rate, harm tracking)
- Decision records (why we made this choice, alternatives considered, conditions for change)
- Governance checklist (monthly/quarterly/annual reviews)
Foundational Skills:
Ethics & Trust
Critical Evaluation
AI Strategy
Example Use Case: Ticket classifier raises 5 ethical risks: (1) Bias in routing (premium vs. free customers), (2) hallucination (assigns fake category), (3) over-reliance (agents rubber-stamp high-confidence), (4) missed safety cases (fraud underdetected), (5) data leakage (PII in logs). Guardrails: (1) hard-coded allowed categories, (2) keyword scan for safety + auto-escalate, (3) low-confidence triggers human review, (4) audit trail logs all decisions, (5) prompt versioned in git. Trust metrics: accuracy by segment (±2%), safety detection rate (≥95%), override rate (3–5%), harm count (0).
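Guardrails 1–3 from the example are cheap to hard-code in front of the model. This is a minimal sketch; the category list and safety keywords are illustrative assumptions, not the memo's actual values:

```python
# Guardrail 1: hard-coded category boundary (illustrative set).
ALLOWED_CATEGORIES = {"Billing", "Account Access", "Technical Support", "Safety/Fraud"}

# Guardrail 2: safety keyword scan (illustrative keywords).
SAFETY_KEYWORDS = ("fraud", "unauthorized", "stolen")

def apply_guardrails(category: str, ticket_text: str, confidence: float,
                     threshold: float = 0.65) -> str:
    """Apply the memo's first three hard guardrails before accepting a classification."""
    if category not in ALLOWED_CATEGORIES:
        return "reject"        # hallucinated category never enters the workflow
    if any(kw in ticket_text.lower() for kw in SAFETY_KEYWORDS):
        return "escalate"      # safety cases bypass auto-routing entirely
    if confidence < threshold:
        return "human_review"  # low confidence always gets a human
    return "accept"
```

Because these checks run outside the model, they survive model upgrades unchanged, which is the point of engineering them as guardrails rather than prompt instructions.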
How These Templates Work Together
| Phase | Question | Use This Artifact | Output |
|-------|----------|-------------------|--------|
| Planning | Should we deploy this AI? | Strategic AI Brief | GO/NO GO decision, sign-off from stakeholders |
| Design | How do we prompt the model? | Prompting Toolkit | Production prompt library v1.0, test results |
| Architecture | How does AI fit into our process? | Workflow Blueprint | Workflow diagram, decision logic, SLAs, audit trail spec |
| Validation | Is the output actually good? | Output Evaluation Rubric | Baseline accuracy, weekly monitoring, improvement recommendations |
| Governance | Is it trustworthy & safe? | Ethics Impact Memo | Risk register, guardrails, trust metrics, quarterly reviews |
Recommended Reading Order
1️⃣ READ: Strategic AI Brief
└─ Understand your deployment goal, risk tolerance, stakeholder needs
2️⃣ READ: Ethics Impact Memo
└─ Name the risks and design guardrails BEFORE you build anything
3️⃣ BUILD: Prompting Toolkit
└─ Design your prompt, test it, establish baseline accuracy
4️⃣ BUILD: Workflow Blueprint
└─ Define decision logic, human checkpoints, audit trail, monitoring
5️⃣ VALIDATE: Output Evaluation Rubric
└─ Measure quality continuously, feed results into prompt retraining
🔄 REPEAT: Monthly prompt refresh + quarterly governance review
Why this order? Strategy comes first to clarify what you're building; ethics comes next so safety is designed in rather than bolted on. Then design → build → validate → iterate.
Linked to AI-ESG Curriculum Modules
| Module | Sci-Fi Metaphor | Recommended Template |
|--------|-----------------|----------------------|
| Module 1: The 201 Gap | The Teleporter Problem / Jagged Frontier | Strategic AI Brief (define frontier & risk threshold) |
| Module 2: Framing the Relationship | HAL 9000 / JARVIS | Prompting Toolkit (design transparent AI instructions) |
| Module 3: Unmasking the Liability Sponge | The Red Shirt / Tricorder | Workflow Blueprint (define human accountability, evidence collection) |
| Module 4: You Are The Liability Sponge | The Asimov Constraint | Output Evaluation Rubric (measure circuit-breaker effectiveness) |
| Module 5: Escaping the Liability Sponge | The Lucas Cycle / Seil | Prompting Toolkit (hard-code wisdom, version control) |
| Module 6: The Refusal Stack | The Refusal Stack | Ethics Impact Memo (defense-in-depth guardrails) |
| Module 7: The Upside | The Mentat | Workflow Blueprint (human-AI partnership design) |
AI-ESG Integrated Strategist Curriculum
Artifact Templates Index | Foundational Skills Framework