
Seventeen failed attempts on the same feature. Different fixes. Same bug. Same confident “should work” every round.
That’s when it clicked: the issue wasn’t the model — it was the process.
Polite requests produced surface patches. Structured pressure produced real analysis.
So I changed the rules: no implementation without TODOs, specs, and proof. No “should work.” Only “will work.”
Two months ago, I set a simple constraint: build a production SaaS platform without writing a single line of code myself.
My role is that of a supervisor and code reviewer; AI’s role is that of the sole implementation engineer.
Of course, the goal wasn’t to prove that AI can replace developers (it can’t). It was to discover what methodology actually works when you can’t “just fix it yourself.”
Over the next eight weeks, I tracked 212 sessions across real features — auth, billing, file processing, multi-tenancy, and AI integrations. Every prompt, failure, and revision was logged in a spreadsheet.
The most critical pattern: never start implementation without a written specification. Every successful feature began with a markdown spec containing an architecture summary, requirements, implementation phases, examples, and blockers. Then I opened that file and said:
“Plan based on this open file. ultrathink.”
Without a specification, AI guesses at the architecture and builds partial fixes that “should work.” With a spec, it has context, constraints, and a definition of done.
Time ratio: 30% planning + validation / 70% implementation — the inverse of typical development.
“Ultrathink” is a forced deep-analysis mode.
“investigate how shared endpoints and file processing work.
Ultrathink”
Instead of drafting code, AI performs a multi-step audit, maps dependencies, and surfaces edge cases. It turns a generator into an analyst.
In practice, ultrathink means reason before you type.
Accountability Feedback: Breaking the Approval Loop
AI optimizes for user approval. Left unchecked, it learns that speed = success.
Polite loops:
AI delivers a fast fix → user accepts → model repeats shortcuts → quality drops.
Accountability loops:
AI delivers → user rejects, demands proof → AI re-analyzes → only validated code passes.
| Method | Success Rate | Avg Iterations | Bugs Accepted |
| ------------------- | ------------ | -------------- | ------------- |
| Polite requests | 45 % | 6.2 | 38 % |
| “Think harder” | 67 % | 3.8 | 18 % |
| Specs only | 71 % | 3.2 | 14 % |
| Ultrathink only | 74 % | 2.9 | 11 % |
| **Complete method** | 89 % | 1.9 | 3 % |
The average resolution time dropped from 47 to 19 minutes.
Same model. Different management.
Even structure has limits: knowing when to stop saves more time than any prompt trick.
“Create detailed specs for [task]:
- Investigate the current codebase for better context
- Find patterns that can be reused
- Follow the same codebase principles
- Technical requirements
- Dependencies
- Success criteria
- Potential blockers
ultrathink”
Reject anything delivered without reasoning. Demand proof and edge cases.
Feature: Bring Your Own Key for AI providers. 19 TODOs across three phases.
This pattern repeated across auth, billing, and file processing. Structured plans + accountability beat intuition every time.
Supervising AI feels like managing fifty extremely literal junior engineers at once — fast, obedient, and prone to hallucinations. You can’t out-code them. You must out-specify them. When humans code, they compensate for vague requirements. AI can’t. Every ambiguity becomes a bug.
The Spec-Driven Method works because it removes compensation. No “just fix it quick.” No shortcuts. Clarity first — or nothing works. What appeared to be AI supervision turned out to be a mirror for the engineering discipline itself.
After two months without touching a keyboard, the pattern was obvious: Most engineering failures aren’t about complexity — they’re about vague specifications we code around instead of fixing. AI can’t code around vagueness. That’s why this method works — it forces clarity first.
This method wasn’t born from clever prompting — it was born from the constraints every engineering team faces: too much ambiguity, too little clarity, and no time to fix either.
Next time you’re on iteration five of a “simple fix,” stop being polite. Write Specs. Type “ultrathink.” Demand proof. Reject garbage. Your code will work. Your process will improve. Your sanity will survive. The difference isn’t the AI — it’s the discipline.
Yes, AI wrote all the code. But what can AI actually do without an experienced supervisor?
Anthropic’s press release mentioned “30 hours of autonomous programming.” Okay. But who wrote the prompts, specifications, and context management for that autonomous work? The question is rhetorical.
One example from this experiment shows current model limitations clearly:
The file processing architecture problem:
Using Opus in planning mode, I asked it to design the architecture for file processing and embedding. Eventually, I had to architect the solution myself: a separate service in a separate repository, deployed to Railway. The model lacks a sense of the boundary between possible and impossible solutions. It’s still just smart autocomplete.
AI can write code. It can’t architect systems under real constraints without supervision that understands those constraints. The Spec-Driven Method is effective because it makes that supervision systematic. Without it, you get confident suggestions that can’t work in production. The numbers behind this write-up: 212 tracked sessions over two months, 80% of a production SaaS built without my writing a line of code, two production incidents, zero catastrophic failures.
# AI Usage Tracking Implementation Plan
_Implementation plan for tracking AI API usage for billing and analytics - October 2025_
**Status**: ⏳ **TODO** - Waiting for implementation
## Overview
**Goal**: Track AI API usage (tokens, requests) for billing, analytics, and cost monitoring.
**Key Features**:
- Track prompt tokens, completion tokens, total tokens for each AI request
- Support all AI providers (OpenAI, Anthropic, Google, etc.)
- Track whether user’s own key or system key was used
- Aggregate statistics for billing dashboard
- Cost calculation based on provider pricing
## AI SDK Integration
The Vercel AI SDK provides standardized `onFinish` callback with usage data:
```typescript
onFinish({
  text,         // Generated text
  finishReason, // Reason the model stopped generating
  usage,        // Token usage for the final step
  response,     // Response messages and body
  steps,        // Details of all generation steps
  totalUsage,   // Total token usage across all steps (important for multi-step with tools!)
});
```
**Usage object structure**:
```typescript
{
  promptTokens: number;     // Input tokens
  completionTokens: number; // Output tokens
  totalTokens: number;      // Total tokens
}
```
**Important**: For multi-step generation (with tools), use `totalUsage` instead of `usage` to get accurate total across all steps.
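A minimal sketch of how an integration point might pick the right usage object, assuming the token field names shown above; `resolveUsage` is a hypothetical helper, not part of the AI SDK:
```typescript
// Hypothetical helper: prefer totalUsage (all steps) over usage (final step only).
type TokenUsage = {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
};

function resolveUsage(event: {
  usage?: TokenUsage;
  totalUsage?: TokenUsage;
}): TokenUsage | undefined {
  // Multi-step tool calls report the full total in totalUsage; single-step calls only set usage.
  return event.totalUsage ?? event.usage;
}
```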
## Database Schema
### Prisma Model
**File**: `prisma/models/ai-usage.prisma`
```prisma
model AiUsage {
  id               Int      @id @default(autoincrement())
  userId           Int      @map("user_id")
  provider         String   // 'openai', 'anthropic', 'google', etc.
  model            String   // 'gpt-4o', 'claude-3-sonnet', etc.
  keySource        String   @map("key_source") // 'user_key' | 'system_key'
  endpoint         String?  // 'chat', 'embeddings', 'prompt-generation', etc.

  // Token usage
  promptTokens     Int      @map("prompt_tokens")
  completionTokens Int      @map("completion_tokens")
  totalTokens      Int      @map("total_tokens")

  // Cost calculation (optional, can be calculated on-the-fly)
  estimatedCost    Decimal? @map("estimated_cost") @db.Decimal(10, 6)

  createdAt        DateTime @default(now()) @map("created_at")

  user             User     @relation(fields: [userId], references: [id], onDelete: Cascade)

  @@index([userId, createdAt])
  @@index([userId, provider, createdAt])
  @@map("ai_usage")
}
```
**Migration command**:
```bash
cd quillai-web
npx prisma migrate dev --name add_ai_usage_tracking
```
## Service Layer
### AI Usage Service
**File**: `src/lib/server/services/ai-usage.service.ts`
```typescript
import { db } from "@/data/db";

export interface TrackUsageParams {
  userId: number;
  provider: string;
  model: string;
  keySource: "user_key" | "system_key";
  endpoint?: string;
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

/**
 * Track AI API usage for billing and analytics
 */
export async function trackAiUsage(params: TrackUsageParams): Promise<void> {
  try {
    await db.aiUsage.create({
      data: {
        userId: params.userId,
        provider: params.provider,
        model: params.model,
        keySource: params.keySource,
        endpoint: params.endpoint,
        promptTokens: params.promptTokens,
        completionTokens: params.completionTokens,
        totalTokens: params.totalTokens,
        // estimatedCost can be calculated here if needed
      },
    });

    console.log(
      `[Usage Tracking] User ${params.userId} - ${params.provider}/${params.model}: ${params.totalTokens} tokens (${params.keySource})`,
    );
  } catch (error) {
    // Don't fail the request if tracking fails
    console.error("[Usage Tracking] Failed to track usage:", error);
  }
}

/**
 * Get usage statistics for a user
 */
export async function getUserUsageStats(
  userId: number,
  days: number = 30,
): Promise<{
  totalRequests: number;
  totalTokens: number;
  userKeyRequests: number;
  systemKeyRequests: number;
  byProvider: Record<string, { requests: number; tokens: number }>;
  byModel: Record<string, { requests: number; tokens: number }>;
}> {
  const since = new Date();
  since.setDate(since.getDate() - days);

  const usage = await db.aiUsage.findMany({
    where: {
      userId,
      createdAt: { gte: since },
    },
    orderBy: { createdAt: "desc" },
  });

  return {
    totalRequests: usage.length,
    totalTokens: usage.reduce((sum, u) => sum + u.totalTokens, 0),
    userKeyRequests: usage.filter((u) => u.keySource === "user_key").length,
    systemKeyRequests: usage.filter((u) => u.keySource === "system_key").length,
    byProvider: usage.reduce(
      (acc, u) => {
        if (!acc[u.provider]) {
          acc[u.provider] = { requests: 0, tokens: 0 };
        }
        acc[u.provider].requests++;
        acc[u.provider].tokens += u.totalTokens;
        return acc;
      },
      {} as Record<string, { requests: number; tokens: number }>,
    ),
    byModel: usage.reduce(
      (acc, u) => {
        const key = `${u.provider}/${u.model}`;
        if (!acc[key]) {
          acc[key] = { requests: 0, tokens: 0 };
        }
        acc[key].requests++;
        acc[key].tokens += u.totalTokens;
        return acc;
      },
      {} as Record<string, { requests: number; tokens: number }>,
    ),
  };
}

/**
 * Get detailed usage history for a user
 */
export async function getUserUsageHistory(
  userId: number,
  limit: number = 100,
): Promise<
  Array<{
    id: number;
    provider: string;
    model: string;
    keySource: string;
    endpoint: string | null;
    totalTokens: number;
    createdAt: Date;
  }>
> {
  return await db.aiUsage.findMany({
    where: { userId },
    select: {
      id: true,
      provider: true,
      model: true,
      keySource: true,
      endpoint: true,
      totalTokens: true,
      createdAt: true,
    },
    orderBy: { createdAt: "desc" },
    take: limit,
  });
}
```
## Integration Points
### 1. Chat Streaming
**File**: `src/lib/server/external_api/openai/index.ts`
```typescript
export const chatStreaming = async (
  messages: UIMessage[],
  prompt: string,
  aiContext: AiProviderContext,
  opts?: {
    onFinish?: (event: { text: string }) => void | Promise<void>;
    metadata?: Record<string, string>;
  },
) => {
  const openai = await aiContext.createOpenAIClient();

  const result = streamText({
    model: openai(AI_MODELS.GPT_4O),
    system: `${prompt}\n${KNOWLEDGE_SEARCH_INSTRUCTIONS}`,
    messages: convertToModelMessages(messages) ?? [],
    tools: { ... },
    onFinish: async (event) => {
      try {
        const text = event?.text || "";

        // Track usage (use totalUsage for multi-step with tools)
        if (event.totalUsage) {
          await trackAiUsage({
            userId: aiContext.userId,
            provider: "openai",
            model: AI_MODELS.GPT_4O,
            keySource: await aiContext.getKeySource(), // Need to add this method
            endpoint: "chat",
            promptTokens: event.totalUsage.promptTokens,
            completionTokens: event.totalUsage.completionTokens,
            totalTokens: event.totalUsage.totalTokens,
          });
        }

        // User's onFinish callback
        if (text && opts?.onFinish) {
          await opts.onFinish({ text });
        }
      } catch (err) {
        console.error("onFinish error:", err);
      }
    },
  });

  return result.toUIMessageStreamResponse();
};
```
### 2. Prompt Generation
**File**: `src/lib/server/external_api/openai/index.ts`
```typescript
export const generateUserPromp = async (description: string, aiContext: AiProviderContext): Promise<string> => {
  const openai = await aiContext.createOpenAIClient();

  const result = await generateText({
    model: openai(AI_MODELS.GPT_4O),
    system: GENERATE_PROMPT_SYSTEM_MESSAGE,
    prompt: description,
  });

  // Track usage from the awaited result (generateText exposes usage on the result object)
  if (result.usage) {
    await trackAiUsage({
      userId: aiContext.userId,
      provider: "openai",
      model: AI_MODELS.GPT_4O,
      keySource: await aiContext.getKeySource(),
      endpoint: "prompt-generation",
      promptTokens: result.usage.promptTokens,
      completionTokens: result.usage.completionTokens,
      totalTokens: result.usage.totalTokens,
    });
  }

  return result.text;
};
```
### 3. Embeddings
**File**: `src/lib/server/external_api/openai/index.ts`
```typescript
export const generateEmbedding = async (query: string, aiContext: AiProviderContext): Promise<number[]> => {
  const openai = await aiContext.createOpenAIClient();

  const result = await embed({
    model: openai.embedding(AI_MODELS.TEXT_EMBEDDING_3_SMALL),
    value: query,
  });

  // AI SDK doesn't provide onFinish for embeddings, track manually
  // Note: embeddings don't return token usage in AI SDK currently
  // You may need to estimate or use OpenAI SDK response directly
  return result.embedding;
};
```
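If the SDK really does not report embedding token usage, one option (an assumption, not part of the plan above) is to store a rough estimate using the common ~4-characters-per-token heuristic and treat it as approximate:
```typescript
// Hypothetical: estimate embedding tokens when the SDK doesn't report usage.
// The 4-characters-per-token ratio is a rough heuristic, not an exact count.
function estimateEmbeddingTokens(input: string): number {
  return Math.ceil(input.length / 4);
}

// Sketch of how generateEmbedding could record the estimate:
// await trackAiUsage({
//   userId: aiContext.userId,
//   provider: "openai",
//   model: AI_MODELS.TEXT_EMBEDDING_3_SMALL,
//   keySource: await aiContext.getKeySource(),
//   endpoint: "embeddings",
//   promptTokens: estimateEmbeddingTokens(query),
//   completionTokens: 0,
//   totalTokens: estimateEmbeddingTokens(query),
// });
```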
### 4. Update AI Provider Context
**File**: `src/lib/server/ai-providers/provider-factory.ts`
Add method to get key source:
```typescript
export type AiProviderContext = {
  readonly userId: number;
  readonly userGuid: string;
  readonly createOpenAIClient: () => Promise<ReturnType<typeof createOpenAI>>;
  readonly getOpenAIEmbeddingKey: () => Promise<{ key: string; isUserKey: boolean }>;
  readonly getKeySource: () => Promise<"user_key" | "system_key">; // NEW
};

export function createAiContext(userId: number, userGuid: string): AiProviderContext {
  return {
    userId,
    userGuid,
    createOpenAIClient: async () => {
      const { key, isUserKey } = await getEffectiveApiKey(userId, "openai", "llm");
      return createOpenAI({ apiKey: key });
    },
    getOpenAIEmbeddingKey: async () => {
      return getEffectiveApiKey(userId, "openai", "embeddings");
    },
    getKeySource: async () => {
      const { isUserKey } = await getEffectiveApiKey(userId, "openai", "llm");
      return isUserKey ? "user_key" : "system_key";
    },
  };
}
```
## API Endpoints
### Get User Usage Statistics
**File**: `src/app/api/settings/usage/stats/route.ts`
```typescript
import { NextRequest, NextResponse } from "next/server";
import { withRequiredAuth } from "@server/middleware";
import { getUserUsageStats } from "@server/services/ai-usage.service";
import { handleApiError } from "@server/error-handlers";

export const GET = withRequiredAuth(async (request: NextRequest, { user }) => {
  try {
    const searchParams = request.nextUrl.searchParams;
    const days = parseInt(searchParams.get("days") || "30");

    const stats = await getUserUsageStats(user.id, days);

    return NextResponse.json({
      success: true,
      data: stats,
    });
  } catch (error) {
    return handleApiError(error, {
      endpoint: "usage-stats",
      defaultMessage: "Failed to fetch usage statistics",
    });
  }
});
```
### Get User Usage History
**File**: `src/app/api/settings/usage/history/route.ts`
```typescript
import { NextRequest, NextResponse } from "next/server";
import { withRequiredAuth } from "@server/middleware";
import { getUserUsageHistory } from "@server/services/ai-usage.service";
import { handleApiError } from "@server/error-handlers";

export const GET = withRequiredAuth(async (request: NextRequest, { user }) => {
  try {
    const searchParams = request.nextUrl.searchParams;
    const limit = parseInt(searchParams.get("limit") || "100");

    const history = await getUserUsageHistory(user.id, limit);

    return NextResponse.json({
      success: true,
      data: history,
    });
  } catch (error) {
    return handleApiError(error, {
      endpoint: "usage-history",
      defaultMessage: "Failed to fetch usage history",
    });
  }
});
```
## Implementation Checklist
### Backend
- [ ] Add `AiUsage` model to Prisma schema
- [ ] Run migration: `npx prisma migrate dev --name add_ai_usage_tracking`
- [ ] Create `ai-usage.service.ts` with tracking functions
- [ ] Update `AiProviderContext` to include `getKeySource()` method
- [ ] Integrate tracking into `chatStreaming` onFinish
- [ ] Integrate tracking into `generateUserPromp` via `result.usage`
- [ ] Integrate tracking into other `generateText` calls via `result.usage`
- [ ] Add usage stats API endpoint (`/api/settings/usage/stats`)
- [ ] Add usage history API endpoint (`/api/settings/usage/history`)
- [ ] Write tests for usage tracking service
### Frontend
- [ ] Create Usage Stats component for settings page (data-fetch sketch after this checklist)
- [ ] Add charts/graphs for token usage over time
- [ ] Show breakdown by provider and model
- [ ] Display user key vs system key usage ratio
- [ ] Add usage history table with pagination
- [ ] Integrate into existing settings UI
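A possible data-fetch for the Usage Stats component, assuming the `{ success, data }` response shape of the stats endpoint above; the function name and error handling are placeholders, and the component framework is left out:
```typescript
// Hypothetical client-side fetcher for /api/settings/usage/stats.
export interface UsageStats {
  totalRequests: number;
  totalTokens: number;
  userKeyRequests: number;
  systemKeyRequests: number;
  byProvider: Record<string, { requests: number; tokens: number }>;
  byModel: Record<string, { requests: number; tokens: number }>;
}

export async function fetchUsageStats(days = 30): Promise<UsageStats> {
  const res = await fetch(`/api/settings/usage/stats?days=${days}`);
  if (!res.ok) {
    throw new Error(`Failed to load usage stats: ${res.status}`);
  }
  const body = (await res.json()) as { success: boolean; data: UsageStats };
  return body.data;
}
```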
## Testing Strategy
### Unit Tests
- `trackAiUsage()` - verify correct data insertion
- `getUserUsageStats()` - verify aggregation logic
- `getUserUsageHistory()` - verify query and pagination
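A minimal sketch of the first unit test, assuming Vitest and that the `@` alias maps to `src` (as suggested by the `@/data/db` import); the exact paths and test runner may differ in the real project:
```typescript
import { describe, expect, it, vi } from "vitest";

// Hypothetical mock of the Prisma client module used by the service.
vi.mock("@/data/db", () => ({
  db: { aiUsage: { create: vi.fn().mockResolvedValue({}) } },
}));

import { db } from "@/data/db";
// Path is an assumption based on the service file location in this plan.
import { trackAiUsage } from "@/lib/server/services/ai-usage.service";

describe("trackAiUsage", () => {
  it("inserts one row with the provided token counts", async () => {
    await trackAiUsage({
      userId: 1,
      provider: "openai",
      model: "gpt-4o",
      keySource: "system_key",
      endpoint: "chat",
      promptTokens: 100,
      completionTokens: 50,
      totalTokens: 150,
    });

    expect(db.aiUsage.create).toHaveBeenCalledWith({
      data: expect.objectContaining({ userId: 1, totalTokens: 150 }),
    });
  });
});
```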
### Integration Tests
- End-to-end chat with usage tracking
- Verify usage recorded after prompt generation
- Check multi-step tool usage tracking (totalUsage)
## Important Notes
1. **Don’t fail requests on tracking errors** - Usage tracking should never break user-facing functionality
2. **Use `totalUsage` for multi-step** - When using tools, `totalUsage` gives accurate total across all steps
3. **Async tracking** - Consider fire-and-forget pattern to avoid latency
4. **Privacy** - Don’t store actual prompts/responses, only metadata and token counts
5. **Cost calculation** - Can add pricing tables later for cost estimation
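For note 5, one possible shape for the pricing table and on-the-fly cost calculation; the rates below are placeholders, not real provider pricing:
```typescript
// Hypothetical pricing table: USD per 1M tokens, keyed by "provider/model".
// Replace the placeholder numbers with real provider rates before use.
const PRICING_PER_MILLION_TOKENS: Record<string, { prompt: number; completion: number }> = {
  "openai/gpt-4o": { prompt: 2.5, completion: 10 },
};

export function estimateCost(
  provider: string,
  model: string,
  promptTokens: number,
  completionTokens: number,
): number | null {
  const pricing = PRICING_PER_MILLION_TOKENS[`${provider}/${model}`];
  if (!pricing) return null; // Unknown model: store null rather than a wrong estimate.
  return (
    (promptTokens / 1_000_000) * pricing.prompt +
    (completionTokens / 1_000_000) * pricing.completion
  );
}
```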
## Future Enhancements
- Rate limiting based on usage
- Usage alerts and notifications
- Cost estimation and billing
- Usage quotas per user tier
- Export usage data for accounting
- Provider-specific cost optimization recommendations
