SOC 2 is now table stakes for enterprise AI sales. The 2025–2026 audit cycle introduces AI-specific sub-criteria under CC6 and CC7: auditors check whether LLM inference endpoints are authenticated, whether model outputs are logged, and whether you have incident response procedures for prompt injection attacks. custodia scan . generates a SOC 2-mapped security report that documents code-level evidence for each criterion — formatted for auditor submission.
SOC 2 was designed for traditional SaaS — database access, API authentication, encryption. When your product processes user requests through an LLM, you introduce a new attack surface that the original trust criteria weren't written to cover.
AICPA (the body that governs SOC 2) has updated audit guidance to explicitly include AI system controls. Auditors with AI-company experience now ask: How do you prevent prompt injection? What happens to user data in the inference payload? Who can modify the system prompt? Is there a kill switch?
Without code-level answers to these questions, you fail the AI-specific sub-criteria even if your traditional SOC 2 controls are solid.
Focus: CC6, CC7, and Availability — the criteria where AI companies most often have gaps.
CC6.1: Logical access security software, infrastructure, and architectures are implemented. AI-specific auditor focus: is LLM inference gated by authentication? Are training data buckets access-controlled with least privilege? Is there role-based access to model configs?
// ❌ CC6.1 — LLM inference endpoint unprotected
// No auth middleware → anyone can call your model
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const result = await openai.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
    model: 'gpt-4o',
  });
  return Response.json({ result });
}

// ✅ CC6.1 — Auth gating on inference endpoint
import { auth } from '@clerk/nextjs/server';

export async function POST(req: Request) {
  const { userId, orgId } = await auth();
  if (!userId) return new Response('Unauthorized', { status: 401 });

  // Log for audit trail (CC6.2)
  await auditLog.write({
    event: 'llm.inference',
    userId, orgId,
    timestamp: Date.now(),
  });

  const { prompt } = await req.json();
  const result = await openai.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
    model: 'gpt-4o',
  });
  return Response.json({ result });
}

CC6.2: New internal and external users are registered and authorized. AI-specific auditor focus: all user provisioning goes through a defined process, and access is removed on offboarding. For AI systems: who can add new models, fine-tune, or update system prompts?
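CC6.2 for AI systems comes down to who can change what the model is told. A minimal sketch of role-gated system prompt updates with an audit trail (the role names, the in-memory store, and the helper functions are illustrative assumptions, not a prescribed API):

```typescript
// CC6.2 sketch: only designated roles may modify the system prompt,
// and every change is recorded for the audit trail.
type Role = 'admin' | 'prompt_editor' | 'viewer';

interface PromptChange {
  userId: string;
  role: Role;
  before: string;
  after: string;
  timestamp: number;
}

// Stand-in for a durable, append-only audit log (assumption)
const promptAudit: PromptChange[] = [];
let systemPrompt = 'You are a helpful assistant.';

export function updateSystemPrompt(userId: string, role: Role, next: string): boolean {
  // Least privilege: viewers cannot touch the prompt
  if (role !== 'admin' && role !== 'prompt_editor') return false;
  promptAudit.push({ userId, role, before: systemPrompt, after: next, timestamp: Date.now() });
  systemPrompt = next;
  return true;
}

export function getSystemPrompt(): string { return systemPrompt; }
export function auditEntryCount(): number { return promptAudit.length; }
```

The denied-change path matters as much as the happy path: an auditor will ask what happens when an unauthorized user tries.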
CC6.8: Measures against malware and unauthorized software. AI-specific auditor focus: your LLM inference pipeline must not execute arbitrary code from model outputs. Prompt injection leading to code execution is a CC6.8 finding.
CC7.1: Vulnerabilities are identified and the risk is managed. AI-specific auditor focus: auditors will ask when you last ran a security scan, whether you have a process for finding and remediating OWASP vulnerabilities, and whether you can show scan history.
CC7.2: Security events are identified and responded to. AI-specific auditor focus: you need monitoring on LLM inference: anomalous prompt lengths, high-frequency requests from single IPs, and unusual output patterns indicating prompt injection. Rate limiting is evidence for this criterion.
// ❌ CC7.2 — No monitoring on inference
// Model DoS and injection attempts go undetected
export async function POST(req: Request) {
  const { prompt } = await req.json();
  // No rate limiting, no logging, no anomaly detection
  const result = await model.infer(prompt);
  return Response.json(result);
}

// ✅ CC7.2 — Monitoring + rate limiting
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, '1 m'),
});

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for') ?? 'unknown';
  const { success } = await ratelimit.limit(ip);
  if (!success) {
    securityLog.warn({ event: 'rate_limit.exceeded', ip });
    return new Response('Too Many Requests', { status: 429 });
  }

  const { prompt } = await req.json();
  securityLog.info({ event: 'llm.inference', ip, promptLen: prompt.length });
  return Response.json(await model.infer(prompt));
}

CC7.4: Incidents are identified, managed, and documented. AI-specific auditor focus: an incident response plan is required documentary evidence. For AI, your plan must cover prompt injection attacks, model output failures, and data leakage via inference.
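A CC7.4 plan needs a trigger: something that turns a suspicious inference into a tracked, documented incident. A heuristic sketch (the regex patterns and the incident record shape are illustrative assumptions; real detectors combine many signals):

```typescript
// CC7.4 sketch: flag suspected prompt injection and open a documented incident.
// Patterns are illustrative, not a complete detector.
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /reveal (your )?system prompt/i,
];

interface Incident {
  id: string;
  kind: 'prompt_injection_suspected';
  prompt: string;
  openedAt: number;
}

// Stand-in for a ticketing / incident-management system (assumption)
const incidents: Incident[] = [];

export function screenPrompt(prompt: string): Incident | null {
  if (!INJECTION_PATTERNS.some((re) => re.test(prompt))) return null;
  const incident: Incident = {
    id: `inc-${incidents.length + 1}`,
    kind: 'prompt_injection_suspected',
    prompt,
    openedAt: Date.now(),
  };
  // Documented evidence: when it was opened, what triggered it, how it was classified
  incidents.push(incident);
  return incident;
}
```

Even a crude screen like this gives an auditor what they need to see: detection wired to a record, not a policy document sitting in a drawer.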
A1.1: Capacity planning processes are in place. AI-specific auditor focus: for LLM-dependent systems, you must demonstrate max_tokens limits, fallback behavior when the upstream model API is unavailable, and documented SLA targets.
// ❌ A1.1 — No fallback, no token cap
// Provider outage = your app returns 500 to all users
const result = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  // No max_tokens, no timeout, no fallback
});

// ✅ A1.1 — Timeout + fallback + token cap
const result = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  max_tokens: 1000,
}, { timeout: 10_000 }) // per-request timeout via openai-node request options
  .catch(async (err) => {
    // Fallback to cheaper model on failure
    logger.error({ event: 'llm.primary_failed', err });
    return openai.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: prompt }],
      max_tokens: 500,
    });
  });

A1.2: Environmental protections and monitoring procedures. AI-specific auditor focus: availability monitoring, meaning uptime checks on inference endpoints, alerting on error rate spikes, and runbooks for LLM provider outages.
Auditors collect evidence through an Evidence Request List (ERL). For AI companies, expect these to appear in your ERL:
Common AI-specific ERL items map to CC6.1, CC6.2, CC6.7, CC7.1, CC7.2, CC7.4, CC9.2, and A1.2.

If you sell software to enterprise customers, you will be asked for SOC 2 Type II. Enterprise buyers increasingly require it before signing contracts. For AI companies specifically, SOC 2 is now often required alongside EU AI Act and NIST AI RMF documentation in vendor security questionnaires.
CC6 (Logical and Physical Access Controls) covers how your system restricts access to data and functionality. For AI systems, auditors specifically check: is LLM inference gated by authentication? Are training data buckets access-controlled with least privilege? Is there role-based access to model configs? Are system prompt changes logged?
SOC 2 Type I (point-in-time assessment) typically takes 1-2 months. SOC 2 Type II (audit over observation period) requires a minimum 6-month observation period plus 1-2 months for audit fieldwork. The observation period starts as soon as controls are in place — implement controls now to start your clock.
Authentication on all data endpoints (CC6), audit logging of access events (CC6, CC7), encryption of data in transit and at rest (CC6), monitoring and alerting on anomalies (CC7), incident response procedures (CC7), and availability metrics. For AI systems: logging of model inference requests, access controls on training pipelines, and data retention policies for model inputs/outputs.
Yes. Custodia maps security findings to SOC 2 trust criteria. The PDF report from custodia scan includes a SOC 2 mapping section showing which CC criteria have gaps in your code, suitable for sharing with auditors and for internal compliance tracking.
Custodia maps every finding to SOC 2, NIST AI RMF, and EU AI Act criteria. One scan. Auditor-ready PDF output.