Exercise Auto-Canonicalization

Overview

When a therapist creates a new exercise, the system automatically checks for duplicates within the same tenant. If a match is found, the new exercise is merged into the existing canonical exercise -- all workout references, tags, favorites, templates, and personalized videos are transparently repointed.

The feature uses a two-stage pipeline: a fast deterministic exact-name match, followed by an AI-powered fuzzy match for near-duplicates (e.g., "Bicep Curl" vs "Bicep Curls", "DB Press" vs "Dumbbell Press"). It runs asynchronously via BullMQ and is gated by the exerciseAutoCanonicalization tenant feature flag.

Pipeline

Trigger

Exercise creation (via createExerciseShared in exerciseService.ts or the legacy POST /exercises route) checks the exerciseAutoCanonicalization feature flag. If enabled, it enqueues a canonicalization job. The job runs asynchronously -- exercise creation returns immediately to the user.

Stage A: Deterministic Exact Match

Queries for an existing exercise in the same tenant where:

  • nameNormalized matches exactly
  • isActive === true
  • Not already merged (canonicalExerciseId is null)
  • Visible to the creator (isPublic === true OR same createdById)

If found, merges immediately with confidence: 1.0 and matchType: "EXACT".
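The Stage A conditions map naturally onto a Prisma-style where clause. The following is a minimal sketch, not the project's actual query: the function name, parameter names, and the exclusion of the source exercise itself are assumptions made for illustration.

```typescript
// Hypothetical helper: builds a Prisma-style `where` object for the exact match.
type ExactMatchParams = {
  tenantId: number;
  nameNormalized: string;
  creatorId: number;
  sourceExerciseId: number;
};

function exactMatchWhere(p: ExactMatchParams) {
  return {
    tenantId: p.tenantId,
    nameNormalized: p.nameNormalized,
    isActive: true,
    // Not already merged into another exercise.
    canonicalExerciseId: null,
    // Assumption: the newly created exercise must not match itself.
    id: { not: p.sourceExerciseId },
    // Visible to the creator: public, or created by the same user.
    OR: [{ isPublic: true }, { createdById: p.creatorId }],
  };
}
```

The first three fields line up with the (tenantId, nameNormalized, isActive) index, so the lookup should stay cheap even for large tenants.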

nameNormalized is computed by lowercasing the name, trimming leading and trailing whitespace, and collapsing internal whitespace to single spaces. It is stored on the Exercise model and covered by the composite index on (tenantId, nameNormalized, isActive).
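The normalization rule is small enough to show in full. This is a sketch under the stated rule only; the function name is hypothetical and the real implementation may differ in details such as Unicode handling.

```typescript
// Hypothetical helper implementing the three normalization steps described above.
function normalizeExerciseName(name: string): string {
  return name
    .toLowerCase()          // 1. lowercase
    .trim()                 // 2. trim leading/trailing whitespace
    .replace(/\s+/g, " ");  // 3. collapse internal whitespace runs to one space
}
```

With this rule, "  Bicep   Curl " and "bicep curl" share one normalized form, while "Bicep Curls" does not, which is why the latter is left to Stage B.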

Stage B: AI Fuzzy Match

Runs only when Stage A finds no exact match.

Lexical prefilter:

1. Split the exercise name into words and keep the words longer than 2 characters.
2. Use the longest word as a case-insensitive contains filter.
3. Query up to 5 candidate exercises with the same tenant/visibility constraints.
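Steps 1 and 2 of the prefilter can be sketched as a pure function. The function name is hypothetical, and the tie-breaking rule (the first of several equally long words wins) is an assumption, since the text does not specify one.

```typescript
// Hypothetical helper: picks the keyword used for the case-insensitive contains filter.
// Returns null when no word is longer than 2 characters.
function pickPrefilterKeyword(name: string): string | null {
  const words = name
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 2);
  if (words.length === 0) return null;
  // Strict ">" keeps the earliest word on ties (assumed behaviour).
  return words.reduce((longest, w) => (w.length > longest.length ? w : longest));
}
```

For "DB Press" the keyword is "Press", so "Dumbbell Press" can surface as a candidate even though the two names share no other token.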

If no candidates pass the prefilter, the pipeline writes an audit record with matchType: "AI_SKIP" and exits.

AI call:

  • Model: gpt-4.1-mini-2025-04-14
  • API: OpenAI Responses API with json_schema structured output (strict: true)
  • Prompt: Given the new exercise name and up to 5 candidate names, decide whether to MERGE or SKIP, specify which candidate, and provide a confidence score (0-1)

Response schema:

{
  "decision": "MERGE" | "SKIP",
  "targetExerciseId": number | null,
  "confidence": number,
  "reason": string
}

Confidence threshold: 0.93. A merge is executed only when:

1. decision === "MERGE"
2. confidence >= 0.93
3. targetExerciseId exists in the candidate list (guards against hallucinated IDs)
4. The target exercise is still active
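The four conditions can be expressed as a single guard. This is a sketch, not the actual service code: the type and function names are hypothetical, and it assumes each candidate carries an up-to-date isActive flag, whereas the real pipeline may re-check the target against the database.

```typescript
// Shape of the structured AI response described above.
type AiDecision = {
  decision: "MERGE" | "SKIP";
  targetExerciseId: number | null;
  confidence: number;
  reason: string;
};

// Assumed candidate shape for illustration.
type Candidate = { id: number; isActive: boolean };

const CONFIDENCE_THRESHOLD = 0.93;

// Returns the merge target, or null when any of the four conditions fails.
function resolveMergeTarget(ai: AiDecision, candidates: Candidate[]): Candidate | null {
  if (ai.decision !== "MERGE") return null;                 // condition 1
  if (ai.confidence < CONFIDENCE_THRESHOLD) return null;    // condition 2
  // Condition 3: only IDs we actually offered are accepted.
  const target = candidates.find((c) => c.id === ai.targetExerciseId);
  if (!target) return null;
  if (!target.isActive) return null;                        // condition 4
  return target;
}
```

Looking the ID up in the candidate list, rather than trusting it directly, is what keeps a hallucinated targetExerciseId from ever reaching the merge transaction.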

Below-threshold or SKIP decisions are recorded as matchType: "AI_SKIP".

Merge Transaction

When a match is confirmed, executeMerge() runs everything in a single Prisma $transaction:

1. UserFavoriteExercise

If the same user already has a favorite for the target, delete the source favorite. Otherwise, repoint to the target.

2. ExerciseTag

If the same tag already exists on the target, delete the source tag. Otherwise, repoint to the target.
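Favorites and tags follow the same delete-or-repoint pattern, which can be sketched as one generic planning function. All names here are hypothetical; the sketch only decides which source rows to delete and which to repoint, and leaves the actual writes to the surrounding transaction.

```typescript
// Hypothetical planner: splits source rows into duplicates (delete) and the rest (repoint).
// `keyOf` extracts the value that must stay unique on the target,
// e.g. userId for favorites or tagId for tags.
function planDedupeOrRepoint<T extends { id: number }>(
  sourceRows: T[],
  targetRows: T[],
  keyOf: (row: T) => number,
): { deleteIds: number[]; repointIds: number[] } {
  const keysOnTarget = new Set(targetRows.map(keyOf));
  const deleteIds: number[] = [];
  const repointIds: number[] = [];
  for (const row of sourceRows) {
    if (keysOnTarget.has(keyOf(row))) {
      deleteIds.push(row.id);   // target already has an equivalent row
    } else {
      repointIds.push(row.id);  // safe to move to the target
    }
  }
  return { deleteIds, repointIds };
}
```

Deleting duplicates before repointing avoids violating a uniqueness constraint on, for example, (userId, exerciseId), assuming the schema defines one.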

3. ExerciseInWorkout

Only incomplete workouts are repointed (where isWorkoutCompleted === false). Completed workouts retain their original exercise reference for historical accuracy.

4. TemplateExercise

Always repointed (templates are forward-looking).

5. Message

Always repointed.

6. PersonalizedExerciseVideo

  • Active videos: If the target already has an active video for the same patient, the source video is deactivated (INACTIVE). Otherwise, the source video is repointed and remains active.
  • Inactive videos: Simply repointed.
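The video rule can be sketched as a planning step. The field names (patientId, status), the "ACTIVE" status value, and the function name are assumptions for illustration. The text does not say whether a deactivated source video is also repointed, so the sketch only records the decision.

```typescript
type VideoStatus = "ACTIVE" | "INACTIVE";
type Video = { id: number; patientId: number; status: VideoStatus };
type VideoAction = { videoId: number; action: "REPOINT" | "DEACTIVATE" };

// Hypothetical planner: decides what happens to each video on the source exercise.
function planVideoMerge(sourceVideos: Video[], targetVideos: Video[]): VideoAction[] {
  // Patients who already have an active video on the target exercise.
  const patientsWithActiveTarget = new Set(
    targetVideos.filter((v) => v.status === "ACTIVE").map((v) => v.patientId),
  );
  return sourceVideos.map((v): VideoAction => {
    // Only an active source video can conflict with an active target video.
    const conflict = v.status === "ACTIVE" && patientsWithActiveTarget.has(v.patientId);
    return { videoId: v.id, action: conflict ? "DEACTIVATE" : "REPOINT" };
  });
}
```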

7. Copy Missing Metadata

If the target is missing any of these fields but the source has them, copy from source: description, videoUrl, photoUrl, cloudflareStreamId, kinescopeVideoId, uploadedVideoKey, uploadedPhotoKey.
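A sketch of the metadata backfill, computing only the fields that need to be written to the target. The helper name is hypothetical, the fields are assumed to be nullable strings, and treating an empty string as "missing" is also an assumption.

```typescript
// The seven fields listed above.
const COPYABLE_FIELDS = [
  "description",
  "videoUrl",
  "photoUrl",
  "cloudflareStreamId",
  "kinescopeVideoId",
  "uploadedVideoKey",
  "uploadedPhotoKey",
] as const;

type CopyableField = (typeof COPYABLE_FIELDS)[number];
type ExerciseMetadata = Partial<Record<CopyableField, string | null>>;

// Hypothetical helper: returns the subset of fields to copy from source to target.
function missingMetadataPatch(source: ExerciseMetadata, target: ExerciseMetadata): ExerciseMetadata {
  const patch: ExerciseMetadata = {};
  for (const field of COPYABLE_FIELDS) {
    const targetValue = target[field];
    const sourceValue = source[field];
    // Assumption: null, undefined, and "" all count as missing.
    const targetMissing = targetValue === null || targetValue === undefined || targetValue === "";
    if (targetMissing && sourceValue) patch[field] = sourceValue;
  }
  return patch;
}
```

Existing values on the target are never overwritten, so the canonical exercise keeps its own description or video when both sides have one.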

8. Retire Source

The source exercise is marked isActive: false, canonicalExerciseId set to the target's ID, and mergedAt set to the current timestamp.

9. Audit Log

An ExerciseMergeAudit record is written with status: "MERGED".

Database Models

Exercise (canonicalization fields)

model Exercise {
  // ... existing fields
  nameNormalized      String?
  canonicalExerciseId Int?
  canonicalExercise   Exercise?  @relation("CanonicalExercise", ...)
  mergedInto          Exercise[] @relation("CanonicalExercise")
  mergedAt            DateTime?

  @@index([tenantId, nameNormalized, isActive])
}

The self-referential canonicalExerciseId creates a tree: merged exercises point to their canonical target, and a canonical exercise can see all exercises merged into it via mergedInto.
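Following canonicalExerciseId pointers from a merged exercise leads to its canonical exercise. The sketch below walks an in-memory map rather than the database; the function name and the map shape are hypothetical, and whether chains longer than one hop occur in practice is not stated here.

```typescript
// Hypothetical resolver: `canonicalOf` maps exerciseId -> canonicalExerciseId (null if canonical).
function resolveCanonicalId(exerciseId: number, canonicalOf: Map<number, number | null>): number {
  const seen = new Set<number>();
  let current = exerciseId;
  while (true) {
    // Defensive check: a cycle would otherwise loop forever.
    if (seen.has(current)) throw new Error(`cycle detected at exercise ${current}`);
    seen.add(current);
    const parent = canonicalOf.get(current) ?? null;
    if (parent === null) return current; // reached the canonical exercise
    current = parent;
  }
}
```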

ExerciseMergeAudit

model ExerciseMergeAudit {
  id               Int      @id @default(autoincrement())
  sourceExerciseId Int
  targetExerciseId Int?
  tenantId         Int
  matchType        String   // "EXACT", "AI_MERGE", "AI_SKIP", "AI_ERROR"
  confidence       Float?
  reason           String?
  status           String   // "MERGED", "SKIPPED", "FAILED"
  createdAt        DateTime @default(now())

  @@index([tenantId, createdAt, status])
}

matchType values:

  • EXACT -- Stage A deterministic match
  • AI_MERGE -- Stage B AI match above threshold
  • AI_SKIP -- Stage B: AI said skip, below threshold, or no candidates found
  • AI_ERROR -- Stage B: AI call threw an exception

BullMQ Queue

  • Queue name: exercise-canonicalization
  • Job ID: canonicalize-{exerciseId} (prevents duplicates)
  • Retry: 3 attempts with exponential backoff (10s, 20s, 40s)
  • Retention: 100 completed, 200 failed jobs
  • Worker concurrency: 1 (in src/worker.ts)
  • Dev fallback: When REDIS_URL is not set, runs synchronously in-process
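The queue settings above correspond to standard BullMQ job options. The sketch below defines them as plain values without importing BullMQ, so it stays self-contained. The constant and function names are hypothetical, and the 10,000 ms base delay is inferred from the first backoff interval rather than taken from the actual configuration.

```typescript
const QUEUE_NAME = "exercise-canonicalization";

// Deterministic job ID: enqueueing the same exercise twice yields the same ID,
// which lets the queue reject the duplicate.
function canonicalizationJobId(exerciseId: number): string {
  return `canonicalize-${exerciseId}`;
}

// Option names follow BullMQ's job options; the values mirror the list above.
const canonicalizationJobOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 10_000 }, // base delay in ms (assumed)
  removeOnComplete: 100, // keep the last 100 completed jobs
  removeOnFail: 200,     // keep the last 200 failed jobs
};

// Assumed usage with a BullMQ Queue instance (job name "canonicalize" is illustrative):
// await queue.add("canonicalize", { exerciseId }, {
//   jobId: canonicalizationJobId(exerciseId),
//   ...canonicalizationJobOptions,
// });
```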

Key Files

File | Role
OktaPT-API/src/services/canonicalizationService.ts | Full pipeline: Stage A + Stage B + merge transaction
OktaPT-API/src/queue/canonicalizationQueue.ts | BullMQ queue definition and enqueue helper
OktaPT-API/src/services/exerciseService.ts | createExerciseShared() -- triggers canonicalization after exercise creation
OktaPT-API/src/worker.ts | Background worker process
OktaPT-API/src/services/__tests__/exerciseScopeService.test.ts | Tests for exercise scope (related)