Exercise Auto-Canonicalization

Overview

When a therapist creates a new exercise, the system automatically checks for duplicates within the same tenant. If a match is found, the new exercise is merged into the existing canonical exercise -- tags, favorites, templates, messages, personalized videos, and references from incomplete workouts are transparently repointed.

The feature uses a two-stage pipeline: a fast deterministic exact-name match, followed by an AI-powered fuzzy match for near-duplicates (e.g., "Bicep Curl" vs "Bicep Curls", "DB Press" vs "Dumbbell Press"). It runs asynchronously via BullMQ and is gated by the exerciseAutoCanonicalization tenant feature flag.

Pipeline

Trigger

Exercise creation (via createExerciseShared in exerciseService.ts or the legacy POST /exercises route) checks the exerciseAutoCanonicalization feature flag. If enabled, it enqueues a canonicalization job. The job runs asynchronously -- exercise creation returns immediately to the user.
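The gating step can be sketched as below. This is only an illustration: `TenantFlags`, `EnqueueFn`, and `maybeEnqueueCanonicalization` are hypothetical names, and the enqueue callback is synchronous here so the snippet runs without a queue.

```typescript
// Hypothetical shapes; the real flag lookup and queue helper are not shown in this document.
type TenantFlags = { exerciseAutoCanonicalization: boolean };
type EnqueueFn = (exerciseId: number) => void;

// Hands the exercise to the queue only when the tenant flag is enabled.
// Returns true when a job was enqueued, false when the feature is off.
function maybeEnqueueCanonicalization(
  flags: TenantFlags,
  exerciseId: number,
  enqueue: EnqueueFn,
): boolean {
  if (!flags.exerciseAutoCanonicalization) return false;
  enqueue(exerciseId);
  return true;
}
```

The caller only hands off the exercise ID; the pipeline itself runs later in the worker.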

Stage A: Deterministic Exact Match

Queries for an existing exercise in the same tenant where:

- nameNormalized matches exactly
- isActive === true
- Not already merged (canonicalExerciseId is null)
- Visible to the creator (isPublic === true OR same createdById)

If found, merges immediately with confidence: 1.0 and matchType: "EXACT".
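The Stage A conditions can be expressed as an in-memory predicate. This is a sketch over a plain array rather than the actual database query; `ExerciseRow` and `findExactMatch` are hypothetical names, and excluding the source row itself is an assumption.

```typescript
// Hypothetical row shape carrying only the fields Stage A looks at.
interface ExerciseRow {
  id: number;
  tenantId: number;
  nameNormalized: string | null;
  isActive: boolean;
  isPublic: boolean;
  createdById: number;
  canonicalExerciseId: number | null;
}

// Returns the first row that satisfies every Stage A condition, if any.
function findExactMatch(source: ExerciseRow, rows: ExerciseRow[]): ExerciseRow | undefined {
  return rows.find(
    (row) =>
      row.id !== source.id && // assumption: never match the new exercise against itself
      row.tenantId === source.tenantId &&
      row.nameNormalized !== null &&
      row.nameNormalized === source.nameNormalized &&
      row.isActive &&
      row.canonicalExerciseId === null && // not already merged
      (row.isPublic || row.createdById === source.createdById), // visible to the creator
  );
}
```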

nameNormalized is computed by lowercasing the name, trimming leading and trailing whitespace, and collapsing internal whitespace to single spaces. It is stored on the Exercise model and covered by the composite index on (tenantId, nameNormalized, isActive).
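A minimal sketch of that normalization follows. The function name `normalizeExerciseName` is hypothetical, and treating every whitespace character (tabs, newlines) as collapsible is an assumption.

```typescript
// Lowercase, trim, and collapse runs of whitespace into single spaces.
function normalizeExerciseName(raw: string): string {
  return raw.trim().toLowerCase().replace(/\s+/g, " ");
}

console.log(normalizeExerciseName("  Bicep   Curl ")); // prints "bicep curl"
```

Note that this normalization alone does not make "Bicep Curl" and "Bicep Curls" equal; near-duplicates like these are left to Stage B.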

Stage B: AI Fuzzy Match

Runs only when Stage A finds no exact match.

Lexical prefilter:

1. Split the exercise name into words and keep words longer than 2 characters.
2. Use the longest word as a case-insensitive contains filter.
3. Query up to 5 candidate exercises with the same tenant/visibility constraints.
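The prefilter steps can be sketched as two small functions operating on plain strings instead of a database query. `pickPrefilterTerm` and `prefilterCandidates` are hypothetical names, and keeping the first word on a length tie is an assumption.

```typescript
// Steps 1 and 2: pick the longest word with more than 2 characters, or null if none qualifies.
function pickPrefilterTerm(exerciseName: string): string | null {
  const words = exerciseName.split(/\s+/).filter((w) => w.length > 2);
  if (words.length === 0) return null;
  // On a tie the earlier word is kept (an assumption, not confirmed by the source).
  return words.reduce((longest, w) => (w.length > longest.length ? w : longest));
}

// Step 3: case-insensitive "contains" filter, capped at `limit` candidates.
function prefilterCandidates(exerciseName: string, names: string[], limit = 5): string[] {
  const term = pickPrefilterTerm(exerciseName);
  if (term === null) return [];
  const needle = term.toLowerCase();
  return names.filter((n) => n.toLowerCase().includes(needle)).slice(0, limit);
}
```

For "DB Press", the word "DB" is dropped for being too short, so "Press" becomes the filter term and a candidate such as "Dumbbell Press" passes the prefilter.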

If no candidates pass the prefilter, the pipeline writes an audit record with matchType: "AI_SKIP" and exits.

AI call:

- Model: gpt-4.1-mini-2025-04-14
- API: OpenAI Responses API with json_schema structured output (strict: true)
- Prompt: Given the new exercise name and up to 5 candidate names, decide whether to MERGE or SKIP, specify which candidate, and provide a confidence score (0-1)

Response schema:

{
  "decision": "MERGE" | "SKIP",
  "targetExerciseId": number | null,
  "confidence": number,
  "reason": string
}

Confidence threshold: 0.93. A merge is executed only when:

1. decision === "MERGE"
2. confidence >= 0.93
3. targetExerciseId exists in the candidate list (guards against hallucinated IDs)
4. The target exercise is still active
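The four merge conditions can be combined into one guard, sketched below. The `AiDecision` type mirrors the response schema above; `Candidate`, `CONFIDENCE_THRESHOLD`, and `shouldMerge` are hypothetical names, and the `isActive` field stands in for whatever freshness check the real service performs.

```typescript
type AiDecision = {
  decision: "MERGE" | "SKIP";
  targetExerciseId: number | null;
  confidence: number;
  reason: string;
};

// Hypothetical candidate shape; isActive is assumed to reflect the target's current state.
type Candidate = { id: number; isActive: boolean };

const CONFIDENCE_THRESHOLD = 0.93;

// True only when all four conditions from the list above hold.
function shouldMerge(ai: AiDecision, candidates: Candidate[]): boolean {
  if (ai.decision !== "MERGE") return false;
  if (ai.confidence < CONFIDENCE_THRESHOLD) return false;
  // Rejects IDs the model invented: the target must be one of the offered candidates.
  const target = candidates.find((c) => c.id === ai.targetExerciseId);
  return target !== undefined && target.isActive;
}
```

Checking the returned ID against the candidate list matters because structured output guarantees the shape of the response, not that the ID refers to a real candidate.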

Below-threshold or SKIP decisions are recorded as matchType: "AI_SKIP".

Merge Transaction

When a match is confirmed, executeMerge() runs everything in a single Prisma $transaction:

1. UserFavoriteExercise

If the same user already has a favorite for the target, delete the source favorite. Otherwise, repoint to the target.

2. ExerciseTag

If the same tag already exists on the target, delete the source tag. Otherwise, repoint to the target.
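Favorites and tags follow the same "delete the duplicate, otherwise repoint" rule. The sketch below applies it to an in-memory list rather than inside a Prisma transaction; `Link`, `ownerKey`, and `repointOrDrop` are hypothetical names, with `ownerKey` standing for the user ID (favorites) or tag ID (tags).

```typescript
// Hypothetical join row: ownerKey is a userId for favorites or a tagId for tags.
type Link = { ownerKey: number; exerciseId: number };

// Moves source links to the target, dropping any that would duplicate an existing target link.
function repointOrDrop(links: Link[], sourceId: number, targetId: number): Link[] {
  const ownersOnTarget = new Set(
    links.filter((l) => l.exerciseId === targetId).map((l) => l.ownerKey),
  );
  const result: Link[] = [];
  for (const link of links) {
    if (link.exerciseId !== sourceId) {
      result.push(link); // unrelated or already on the target: keep as-is
      continue;
    }
    if (ownersOnTarget.has(link.ownerKey)) continue; // duplicate: delete the source link
    ownersOnTarget.add(link.ownerKey);
    result.push({ ...link, exerciseId: targetId }); // repoint to the target
  }
  return result;
}
```

Deleting before repointing avoids violating a uniqueness constraint on the owner and exercise pair, assuming the real tables have one.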

3. ExerciseInWorkout

Only incomplete workouts are repointed (where isWorkoutCompleted === false). Completed workouts retain their original exercise reference for historical accuracy.
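As an illustration of the completed-workout rule, the sketch below repoints only rows from incomplete workouts. `WorkoutExercise` and `repointIncomplete` are hypothetical names, and placing `isWorkoutCompleted` directly on the row is an assumption about the data shape.

```typescript
// Hypothetical row; the completion flag is assumed to be available on each row.
type WorkoutExercise = { id: number; exerciseId: number; isWorkoutCompleted: boolean };

// Repoints source references in incomplete workouts; completed ones keep the original exercise.
function repointIncomplete(
  rows: WorkoutExercise[],
  sourceId: number,
  targetId: number,
): WorkoutExercise[] {
  return rows.map((row): WorkoutExercise =>
    row.exerciseId === sourceId && !row.isWorkoutCompleted
      ? { ...row, exerciseId: targetId }
      : row,
  );
}
```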

4. TemplateExercise

Always repointed (templates are forward-looking).

5. Message

Always repointed.

6. PersonalizedExerciseVideo

Active videos: If the target already has an active video for the same patient, the source video is deactivated (INACTIVE). Otherwise, the source video is repointed and remains active.
Inactive videos: Simply repointed.
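The video rules can be sketched over an in-memory list as follows. `PatientVideo` and `mergeVideos` are hypothetical names; the text does not say whether a deactivated source video is also repointed, so this sketch leaves its exercise reference unchanged as an assumption.

```typescript
type VideoStatus = "ACTIVE" | "INACTIVE";
// Hypothetical row shape for a personalized video.
type PatientVideo = { id: number; patientId: number; exerciseId: number; status: VideoStatus };

function mergeVideos(videos: PatientVideo[], sourceId: number, targetId: number): PatientVideo[] {
  // Patients who already have an active video on the target exercise.
  const patientsWithActiveTarget = new Set(
    videos
      .filter((v) => v.exerciseId === targetId && v.status === "ACTIVE")
      .map((v) => v.patientId),
  );
  return videos.map((v): PatientVideo => {
    if (v.exerciseId !== sourceId) return v;
    if (v.status === "ACTIVE" && patientsWithActiveTarget.has(v.patientId)) {
      // Conflict: the target's video wins, the source video is deactivated.
      // Assumption: the deactivated video is not repointed.
      return { ...v, status: "INACTIVE" };
    }
    // No conflict (or already inactive): repoint, status unchanged.
    return { ...v, exerciseId: targetId };
  });
}
```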

7. Copy Missing Metadata

If the target is missing any of these fields but the source has them, copy from source: description, videoUrl, photoUrl, cloudflareStreamId, kinescopeVideoId, uploadedVideoKey, uploadedPhotoKey.
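One way to compute the fields to copy is to build a patch object, sketched below. `missingMetadataPatch` and the types are hypothetical, all fields are assumed to be nullable strings, and treating null, undefined, and the empty string as "missing" is an assumption.

```typescript
const METADATA_FIELDS = [
  "description",
  "videoUrl",
  "photoUrl",
  "cloudflareStreamId",
  "kinescopeVideoId",
  "uploadedVideoKey",
  "uploadedPhotoKey",
] as const;

type MetadataField = (typeof METADATA_FIELDS)[number];
// Assumption: every metadata field is an optional, nullable string.
type ExerciseMetadata = Partial<Record<MetadataField, string | null>>;

// Returns only the fields the target lacks and the source can supply.
function missingMetadataPatch(source: ExerciseMetadata, target: ExerciseMetadata): ExerciseMetadata {
  const patch: ExerciseMetadata = {};
  for (const field of METADATA_FIELDS) {
    const targetValue = target[field];
    const sourceValue = source[field];
    const targetMissing = targetValue === null || targetValue === undefined || targetValue === "";
    if (targetMissing && sourceValue) patch[field] = sourceValue;
  }
  return patch;
}
```

Existing target values are never overwritten, so the canonical exercise keeps its own description and media when it already has them.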

8. Retire Source

The source exercise is marked isActive: false, its canonicalExerciseId is set to the target's ID, and its mergedAt is set to the current timestamp.

9. Audit Log

An ExerciseMergeAudit record is written with status: "MERGED".

Database Models

Exercise (canonicalization fields)

model Exercise {
  // ... existing fields
  nameNormalized      String?
  canonicalExerciseId Int?
  canonicalExercise   Exercise?  @relation("CanonicalExercise", ...)
  mergedInto          Exercise[] @relation("CanonicalExercise")
  mergedAt            DateTime?

  @@index([tenantId, nameNormalized, isActive])
}

The self-referential canonicalExerciseId creates a tree: merged exercises point to their canonical target, and a canonical exercise can see all exercises merged into it via mergedInto.
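A reader of this structure can follow canonicalExerciseId pointers to reach the canonical exercise. The sketch below does this over an in-memory map; `ExerciseNode` and `resolveCanonicalId` are hypothetical names, and whether chains longer than one hop occur in practice is not stated in this document.

```typescript
// Hypothetical node carrying only the self-reference.
type ExerciseNode = { id: number; canonicalExerciseId: number | null };

// Follows canonicalExerciseId pointers until reaching an exercise that was never merged.
function resolveCanonicalId(startId: number, byId: Map<number, ExerciseNode>): number {
  const seen = new Set<number>();
  let currentId = startId;
  for (;;) {
    const node = byId.get(currentId);
    if (node === undefined || node.canonicalExerciseId === null) return currentId;
    // Defensive guard: a cycle should not exist, but would otherwise loop forever.
    if (seen.has(currentId)) throw new Error(`Cycle detected at exercise ${currentId}`);
    seen.add(currentId);
    currentId = node.canonicalExerciseId;
  }
}
```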

ExerciseMergeAudit

model ExerciseMergeAudit {
  id               Int      @id @default(autoincrement())
  sourceExerciseId Int
  targetExerciseId Int?
  tenantId         Int
  matchType        String   // "EXACT", "AI_MERGE", "AI_SKIP", "AI_ERROR"
  confidence       Float?
  reason           String?
  status           String   // "MERGED", "SKIPPED", "FAILED"
  createdAt        DateTime @default(now())

  @@index([tenantId, createdAt, status])
}

matchType values:

- EXACT -- Stage A deterministic match
- AI_MERGE -- Stage B AI match above threshold
- AI_SKIP -- Stage B: AI said skip, below threshold, or no candidates found
- AI_ERROR -- Stage B: AI call threw an exception
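The mapping from pipeline outcome to matchType can be sketched as a small classifier. `StageOutcome` and `toMatchType` are hypothetical names, and the candidate-list and still-active guards are omitted here for brevity.

```typescript
type MatchType = "EXACT" | "AI_MERGE" | "AI_SKIP" | "AI_ERROR";

// Hypothetical summary of how a pipeline run ended.
type StageOutcome =
  | { kind: "exact" }
  | { kind: "noCandidates" }
  | { kind: "aiDecision"; decision: "MERGE" | "SKIP"; confidence: number }
  | { kind: "aiThrew" };

function toMatchType(outcome: StageOutcome, threshold = 0.93): MatchType {
  switch (outcome.kind) {
    case "exact":
      return "EXACT";
    case "noCandidates":
      return "AI_SKIP"; // prefilter found nothing to compare against
    case "aiThrew":
      return "AI_ERROR";
    case "aiDecision":
      return outcome.decision === "MERGE" && outcome.confidence >= threshold
        ? "AI_MERGE"
        : "AI_SKIP"; // explicit SKIP or below-threshold MERGE
  }
}
```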

BullMQ Queue

Queue name: exercise-canonicalization
Job ID: canonicalize-{exerciseId} (prevents duplicates)
Retry: 3 attempts with exponential backoff (10s, 20s, 40s)
Retention: 100 completed, 200 failed jobs
Worker concurrency: 1 (in src/worker.ts)
Dev fallback: When REDIS_URL is not set, runs synchronously in-process
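The settings above could translate into job options like the following. This is a plain-object sketch with no BullMQ import so that it runs standalone; the option names (jobId, attempts, backoff, removeOnComplete, removeOnFail) follow BullMQ's job options, but whether the real queue is configured exactly this way is an assumption, as are the helper names and the 10000 ms base delay inferred from the listed backoff values.

```typescript
const BASE_DELAY_MS = 10000; // assumed base delay, inferred from the 10s first backoff

// Hypothetical helper returning options shaped like BullMQ job options.
function canonicalizationJobOptions(exerciseId: number) {
  return {
    jobId: `canonicalize-${exerciseId}`, // same ID for the same exercise, so duplicates are rejected
    attempts: 3,
    backoff: { type: "exponential", delay: BASE_DELAY_MS },
    removeOnComplete: 100, // keep the last 100 completed jobs
    removeOnFail: 200, // keep the last 200 failed jobs
  };
}

// BullMQ's built-in exponential strategy (without jitter) waits delay * 2^(attemptsMade - 1).
function exponentialBackoffMs(attemptsMade: number, baseDelayMs = BASE_DELAY_MS): number {
  return Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs);
}

console.log([1, 2, 3].map((n) => exponentialBackoffMs(n) / 1000)); // prints [ 10, 20, 40 ]
```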

Key Files

| File | Role |
| --- | --- |
| `OktaPT-API/src/services/canonicalizationService.ts` | Full pipeline: Stage A + Stage B + merge transaction |
| `OktaPT-API/src/queue/canonicalizationQueue.ts` | BullMQ queue definition and enqueue helper |
| `OktaPT-API/src/services/exerciseService.ts` | `createExerciseShared()` -- triggers canonicalization after exercise creation |
| `OktaPT-API/src/worker.ts` | Background worker process |
| `OktaPT-API/src/services/__tests__/exerciseScopeService.test.ts` | Tests for exercise scope (related) |