Rate limiting is not just a DDoS control. It protects login flows from brute force, signup flows from abuse, AI endpoints from cost explosions, exports from queue starvation, and search from scraping. The right question is not “do we have rate limiting?” but “which actions deserve which policy?”
Startups often delay rate limiting because it feels like infrastructure polish. Then the first abuse event hits and everyone learns the hard way that auth endpoints, invite flows, LLM calls, and export jobs are economic attack surfaces, not just performance concerns.
The mistake is treating every request equally. Login attempts, checkout creation, vector search, AI summarization, and bulk export generation do not have the same risk profile. One limit everywhere either blocks real users or leaves the expensive paths wide open.
Why Rate Limiting Is a Product Control
For a startup, abusive traffic is rarely just an infrastructure problem. It hits the business model directly. A login brute-force attack turns into account takeover risk. A signup flood turns into spam accounts and reputation damage. An AI prompt flood turns into a surprise bill. An export storm turns into job-queue starvation for real customers.
That is why rate limiting belongs next to authorization and billing logic, not off in a forgotten network corner. The system needs to understand that “generate 100 exports” is a more sensitive action than “load dashboard chrome,” even if both technically count as HTTP requests.
The most effective startup posture is layered: per-IP throttles for broad abuse, per-user limits for authenticated actions, and stricter quotas for high-cost or high-risk operations.
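To make the layering concrete, here is a minimal sketch of the idea, not a production implementation. It assumes an in-memory fixed-window counter (a real deployment would use a shared store such as Redis), and the names `FixedWindowCounter`, `LAYER_LIMITS`, and `checkLayers` and all the numbers are hypothetical.

```typescript
// Hypothetical in-memory fixed-window counter. Illustration only:
// a multi-instance deployment needs a shared store instead of a Map.
type Limit = { max: number; windowMs: number };

class FixedWindowCounter {
  private hits = new Map<string, { windowStart: number; count: number }>();

  // Records the hit and returns true if allowed; returns false if the key is at its limit.
  allow(key: string, limit: Limit, now: number): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= limit.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= limit.max) return false;
    entry.count += 1;
    return true;
  }
}

type Layer = 'ip' | 'user' | 'action';

// Placeholder numbers: broad per-IP throttle, tighter per-user limit,
// strictest limit on the sensitive action itself.
const LAYER_LIMITS: Record<Layer, Limit> = {
  ip: { max: 100, windowMs: 60 * 1000 },
  user: { max: 30, windowMs: 60 * 1000 },
  action: { max: 3, windowMs: 60 * 1000 },
};

const counter = new FixedWindowCounter();

// Returns the first layer that rejects the request, or null if every layer allows it.
function checkLayers(ip: string, userId: string, action: string, now: number): Layer | null {
  const keys: [Layer, string][] = [
    ['ip', `ip:${ip}`],
    ['user', `user:${userId}`],
    ['action', `action:${action}:${userId}`],
  ];
  for (const [layer, key] of keys) {
    if (!counter.allow(key, LAYER_LIMITS[layer], now)) return layer;
  }
  return null;
}
```

In this sketch a request rejected by the action layer has already been counted against the IP and user layers. That is a deliberate simplification; whether rejected requests should consume the broader budgets is a policy choice.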
The Abuse Patterns Startups Actually Face
Login and OTP brute force
Credential stuffing, password guessing, and OTP spraying all depend on unbounded auth attempts.
Signup and invite abuse
Open signup paths get farmed for free trials, spam, and fake workspace creation when there is no per-actor friction.
AI cost flooding
Inference endpoints can become a financial DoS vector long before they become a classic uptime problem.
Export and report queue starvation
A few abusive users can monopolize worker capacity if expensive jobs are not throttled separately from cheap reads.
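Queue starvation in particular is better handled with a concurrency cap than with a request-rate limit. The following is a rough sketch under stated assumptions: `ConcurrencyGate`, `exportGate`, and `runExport` are hypothetical names, the cap of 2 is a placeholder, and the state lives in process memory rather than in a shared store.

```typescript
// Hypothetical per-workspace concurrency gate for expensive jobs.
class ConcurrencyGate {
  private running = new Map<string, number>();
  private maxPerWorkspace: number;

  constructor(maxPerWorkspace: number) {
    this.maxPerWorkspace = maxPerWorkspace;
  }

  // Reserves a slot and returns true, or returns false when the workspace is at its cap.
  tryAcquire(workspaceId: string): boolean {
    const current = this.running.get(workspaceId) ?? 0;
    if (current >= this.maxPerWorkspace) return false;
    this.running.set(workspaceId, current + 1);
    return true;
  }

  // Frees a slot; call this when the job finishes or fails.
  release(workspaceId: string): void {
    const current = this.running.get(workspaceId) ?? 0;
    if (current <= 1) this.running.delete(workspaceId);
    else this.running.set(workspaceId, current - 1);
  }
}

const exportGate = new ConcurrencyGate(2); // placeholder cap

// Runs the job only if the workspace has a free slot; the slot is released even if the job throws.
async function runExport<T>(workspaceId: string, job: () => Promise<T>): Promise<T | 'rejected'> {
  if (!exportGate.tryAcquire(workspaceId)) return 'rejected';
  try {
    return await job();
  } finally {
    exportGate.release(workspaceId);
  }
}
```

A caller that receives `'rejected'` would typically answer with a 429 or queue the job for later, so cheap reads from other customers keep flowing.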
Protect the Expensive and Sensitive Paths First
If you only rate-limit generic traffic, the attacker will simply pivot to the endpoints that cost you the most.
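    import Anthropic from '@anthropic-ai/sdk';

    // Unprotected: no authentication and no limit, so any caller can trigger paid inference.
    const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    export async function POST(req: Request) {
      const body = await req.json();
      const response = await anthropic.messages.create({
        model: 'claude-sonnet-4-6',
        messages: [{ role: 'user', content: body.prompt }],
        max_tokens: 1200,
      });
      return Response.json(response);
    }

    // The Ratelimit and Redis calls below match the Upstash packages, which are assumed here.
    import { Ratelimit } from '@upstash/ratelimit';
    import { Redis } from '@upstash/redis';

    // auth() and runAiCall() are assumed to come from the app's own auth layer and AI wrapper.
    // Sliding window: at most 5 requests per key per minute.
    const ratelimit = new Ratelimit({
      redis: Redis.fromEnv(),
      limiter: Ratelimit.slidingWindow(5, '1 m'),
    });

    export async function POST(req: Request) {
      // Reject anonymous callers before spending anything.
      const { userId } = await auth();
      if (!userId) {
        return Response.json({ error: 'Unauthorized' }, { status: 401 });
      }
      // Key the limit per user and per feature so AI traffic has its own budget.
      const { success } = await ratelimit.limit(`ai:${userId}`);
      if (!success) {
        return Response.json({ error: 'Too many requests' }, { status: 429 });
      }
      return Response.json(await runAiCall(req));
    }

An AI endpoint typically needs both a short-window request limit and a larger daily quota tied to account plan or workspace budget.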
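One way to combine the two checks is sketched below. It is an illustration under assumptions, not a recommended configuration: the plan names, the daily quota numbers, and the helpers `checkAiRequest`, `withinLimit`, and `record` are all hypothetical, and state is kept in memory for brevity. Only the 5-per-minute burst figure echoes the example above.

```typescript
type Plan = 'free' | 'pro'; // hypothetical plan names

const BURST = { max: 5, windowMs: 60 * 1000 };
const DAILY_QUOTA: Record<Plan, number> = { free: 50, pro: 1000 }; // placeholder numbers
const DAY_MS = 24 * 60 * 60 * 1000;

type WindowCount = { start: number; count: number };
const burstWindows = new Map<string, WindowCount>();
const dailyWindows = new Map<string, WindowCount>();

// True if one more request fits in the current window (or the window has expired).
function withinLimit(store: Map<string, WindowCount>, key: string, max: number, windowMs: number, now: number): boolean {
  const w = store.get(key);
  if (!w || now - w.start >= windowMs) return true;
  return w.count < max;
}

// Counts one request, starting a fresh window if the old one has expired.
function record(store: Map<string, WindowCount>, key: string, windowMs: number, now: number): void {
  const w = store.get(key);
  if (!w || now - w.start >= windowMs) store.set(key, { start: now, count: 1 });
  else w.count += 1;
}

type Decision = { allowed: true } | { allowed: false; reason: 'burst' | 'daily_quota' };

function checkAiRequest(userId: string, plan: Plan, now: number): Decision {
  const burstKey = `ai:burst:${userId}`;
  const dailyKey = `ai:daily:${userId}`;
  if (!withinLimit(dailyWindows, dailyKey, DAILY_QUOTA[plan], DAY_MS, now)) {
    return { allowed: false, reason: 'daily_quota' };
  }
  if (!withinLimit(burstWindows, burstKey, BURST.max, BURST.windowMs, now)) {
    return { allowed: false, reason: 'burst' };
  }
  // Only requests that will actually run are counted against either budget.
  record(burstWindows, burstKey, BURST.windowMs, now);
  record(dailyWindows, dailyKey, DAY_MS, now);
  return { allowed: true };
}
```

Returning the reason lets the handler tell the user whether to retry in a minute or upgrade a plan, which are very different messages.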
Use different keys and policies for auth, AI, exports, and search. If one limit governs everything, it governs nothing well.
An Endpoint-by-Endpoint Startup Policy Matrix
Login, magic link, OTP, password reset
Critical: Strict per-IP and per-identifier limits with clear lockout behavior. These are abuse magnets and should fail closed quickly.
Signup and invite acceptance
High: Moderate short-window limits plus abuse monitoring to stop spam account creation without breaking real onboarding.
AI generation or analysis endpoints
High: Low request ceilings, plan-aware quotas, and explicit cost controls. This is where denial of wallet starts.
Exports and long-running jobs
High: Low concurrency and queue caps per user or workspace so one customer cannot monopolize worker capacity.
Search and list endpoints
Medium: Moderate rate limits plus pagination caps to reduce scraping, inference, and backend load.
Webhook receivers
Medium: Do not rely on rate limiting alone. Pair it with signature verification and idempotency to prevent replay-driven abuse.
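A matrix like this is easier to keep honest when it lives in code as data. The sketch below is one possible shape, assuming hypothetical names (`EndpointPolicy`, `POLICIES`, `limitKeys`). Every number is a placeholder, and the scopes for signup, search, and webhooks are assumptions, since the matrix only states scopes for login and exports.

```typescript
type Risk = 'critical' | 'high' | 'medium';
type Scope = 'ip' | 'identifier' | 'user' | 'workspace';
type Action = 'login' | 'signup' | 'ai' | 'export' | 'search' | 'webhook';

interface EndpointPolicy {
  risk: Risk;
  scopes: Scope[];        // which actors the limit is keyed on
  max: number;            // requests allowed per window (placeholder values)
  windowSeconds: number;
  maxConcurrent?: number; // only meaningful for long-running jobs
}

const POLICIES: Record<Action, EndpointPolicy> = {
  login: { risk: 'critical', scopes: ['ip', 'identifier'], max: 5, windowSeconds: 300 },
  signup: { risk: 'high', scopes: ['ip'], max: 10, windowSeconds: 3600 },
  ai: { risk: 'high', scopes: ['user', 'workspace'], max: 5, windowSeconds: 60 },
  export: { risk: 'high', scopes: ['user', 'workspace'], max: 10, windowSeconds: 3600, maxConcurrent: 2 },
  search: { risk: 'medium', scopes: ['ip', 'user'], max: 60, windowSeconds: 60 },
  webhook: { risk: 'medium', scopes: ['ip'], max: 300, windowSeconds: 60 },
};

type RequestContext = Partial<Record<Scope, string>>;

// Builds one limiter key per scope the policy names, skipping scopes the request does not carry.
function limitKeys(action: Action, ctx: RequestContext): string[] {
  const keys: string[] = [];
  for (const scope of POLICIES[action].scopes) {
    const value = ctx[scope];
    if (value !== undefined) keys.push(`${action}:${scope}:${value}`);
  }
  return keys;
}
```

For example, a login attempt carrying both an IP and an email yields two keys, so the attempt counts against the address and the targeted account at once. The webhook entry covers only the rate-limit half; signature verification and idempotency are separate checks that this sketch does not show.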