Housekeeping Tasks
Scheduled maintenance and cleanup tasks for the platform.
Database Backfill Tasks
1. Scrub CDN Host from Asset URLs
Priority: Medium | Status: Pending
Asset URLs in the database (e.g., `flat_file.url`) currently store full URLs with hardcoded CDN hosts like `cdn-dev.emprops.ai` or `cdn.emprops.ai`. This causes issues when:
- Switching environments (dev/staging/prod)
- Changing CDN providers
- Using environment-specific CDN URLs via `NEXT_PUBLIC_CDN_URL`
Current State: `https://cdn-dev.emprops.ai/generations/uuid/image.png`

Target State: `/generations/uuid/image.png`

Implementation:
- Create migration script to strip the host from all `flat_file.url` entries (see the sketch after this list)
- Update API endpoints to construct full URLs using the `CDN_URL` env var
- Update frontend to handle both relative paths and full URLs during transition (resolver sketch at the end of this task)
- Run backfill on staging, verify, then production
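A minimal sketch of the strip logic such a migration script could apply, assuming it runs under Node; the `stripCdnHost` name and the `KNOWN_CDN_HOSTS` list are illustrative, and only `cdn-dev.emprops.ai` / `cdn.emprops.ai` come from this task:

```typescript
// Hypothetical helper for the backfill script: strip known CDN hosts from a stored
// URL, leaving already-relative paths untouched so re-running the script is safe.
const KNOWN_CDN_HOSTS = new Set(["cdn-dev.emprops.ai", "cdn.emprops.ai"]);

export function stripCdnHost(stored: string): string {
  if (stored.startsWith("/")) return stored; // already relative -> idempotent

  try {
    const parsed = new URL(stored);
    if (KNOWN_CDN_HOSTS.has(parsed.hostname)) {
      // Keep path + query so any query params survive the rewrite.
      return parsed.pathname + parsed.search;
    }
  } catch {
    // Not a parseable absolute URL; leave the value alone for manual review.
  }
  return stored;
}

// stripCdnHost("https://cdn-dev.emprops.ai/generations/uuid/image.png")
//   -> "/generations/uuid/image.png"
```

The migration itself would then iterate `flat_file` rows and write back only the values that actually change.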
Affected Tables:
- `flat_file.url`
- Any other tables storing CDN asset URLs
Files with hardcoded CDN domains to update after backfill:
- `apps/emprops-studio/next.config.js`
- `apps/emprops-studio/utils/index.ts`
- `apps/emprops-studio/lib/image-loader.js`
- `apps/emprops-studio/lib/imgproxy.js`
- `apps/emprops-studio/components/ImgproxyImage/index.tsx`
- `apps/emprops-studio/pages/api/flat-files/index.ts`
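For the transition bullet above (handling both relative paths and full URLs), a hedged sketch of a shared resolver; `NEXT_PUBLIC_CDN_URL` is the env var named in this task, while the function name and fallback behaviour are assumptions:

```typescript
// Hypothetical resolver for the transition period: pre-backfill rows may still hold
// absolute URLs, while new and backfilled rows hold relative paths.
export function resolveAssetUrl(stored: string): string {
  // Already absolute (not yet backfilled, or a non-CDN host) -> pass through unchanged.
  if (/^https?:\/\//i.test(stored)) return stored;

  // NEXT_PUBLIC_CDN_URL per this task, e.g. "https://cdn.emprops.ai".
  const base = (process.env.NEXT_PUBLIC_CDN_URL ?? "").replace(/\/+$/, "");
  return `${base}/${stored.replace(/^\/+/, "")}`;
}

// resolveAssetUrl("/generations/uuid/image.png")
//   -> "https://cdn.emprops.ai/generations/uuid/image.png" (with that env value)
```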
Infrastructure Tasks
2. Implement Redis Memory Offload with Schedule
Priority: High | Status: Pending
Implement scheduled offloading of Redis data to reduce memory pressure and costs.
Problem:
- Redis accumulates job history, completion records, and attestation data
- Memory usage grows unbounded over time
- No automatic cleanup of old data
Solution: Implement a scheduled job that:
- Archives completed job data older than N days to PostgreSQL or cold storage
- Removes stale worker heartbeats and machine registrations
- Cleans up orphaned job metadata
- Compresses or removes old event stream entries
Implementation Options:
Option A: Cron-based offload script
```bash
# Run daily at 2am
0 2 * * * node /path/to/redis-offload.js
```

Option B: Built-in Redis key expiration
- Set TTL on job completion records
- Set TTL on worker heartbeats
- Use Redis Streams with MAXLEN for event streams
Option C: Hybrid approach
- Use TTL for ephemeral data (heartbeats, short-term status)
- Use scheduled offload for archival data (job history, attestations)
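A sketch of the TTL side (Option B, and the ephemeral half of Option C) using ioredis; the `worker:heartbeat:*` key shape is an assumption, and the 5-minute TTL and stream cap simply mirror the suggested retention policy further down, not settled values:

```typescript
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Ephemeral data: let Redis expire it instead of offloading it.
async function markHeartbeat(workerId: string): Promise<void> {
  // Assumed key shape; 5-minute TTL matches the suggested retention table below.
  await redis.set(`worker:heartbeat:${workerId}`, Date.now().toString(), "EX", 300);
}

// Event streams: cap length with MAXLEN so they cannot grow unbounded.
async function appendEvent(stream: string, field: string, value: string): Promise<void> {
  // "~" trims approximately, which is cheaper than exact trimming.
  await redis.xadd(stream, "MAXLEN", "~", "10000", "*", field, value);
}
```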
Data to offload:
- `worker:completion:*` - Job completion attestations (archive to DB)
- `worker:failure:*` - Job failure records (archive to DB)
- `workflow:failure:*` - Workflow failure records (archive to DB)
- Old job metadata from completed jobs
- Stale machine registrations
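And a sketch of the archival half (Option A, or the scheduled part of Option C) that the daily cron entry above would invoke; the hash layout, the `completed_at` field, and the `archive()` stub are assumptions standing in for the real record shape and the PostgreSQL write:

```typescript
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Hot-retention windows (days) taken from the suggested policy table below.
const RETENTION_DAYS: Record<string, number> = {
  "worker:completion:*": 7,
  "worker:failure:*": 14,
  "workflow:failure:*": 14,
};

// Stand-in for the real archive step (PostgreSQL insert or cold-storage write).
async function archive(key: string, record: Record<string, string>): Promise<void> {
  console.log(`would archive ${key}`, record);
}

// Scan one key pattern, archive records older than the cutoff, then delete them.
async function offloadPattern(pattern: string, days: number): Promise<void> {
  const cutoff = Date.now() - days * 24 * 60 * 60 * 1000;
  const scanner = redis.scanStream({ match: pattern, count: 100 });

  for await (const keys of scanner) {
    for (const key of keys as string[]) {
      const record = await redis.hgetall(key); // assumes these records are hashes
      const completedAt = Number(record.completed_at ?? 0); // assumed field name
      if (completedAt && completedAt < cutoff) {
        await archive(key, record);
        await redis.del(key);
      }
    }
  }
}

(async () => {
  for (const [pattern, days] of Object.entries(RETENTION_DAYS)) {
    await offloadPattern(pattern, days);
  }
  await redis.quit();
})();
```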
Retention Policy (suggested):
| Data Type | Hot (Redis) | Archive (DB) |
|---|---|---|
| Active jobs | Indefinite | N/A |
| Completed jobs | 7 days | 90 days |
| Failed jobs | 14 days | 180 days |
| Worker heartbeats | 5 minutes TTL | N/A |
| Machine registrations | 1 hour TTL | N/A |
3. Stale Worktree Cleanup
Priority: Low (run monthly) | Status: Script available
Git worktrees accumulate `node_modules`, `.next`, `dist`, and other build artifacts that can consume 50-100GB+ over time.
Script Location: `scripts/cleanup-stale-worktrees.sh`
Usage:
```bash
# Preview what would be cleaned (dry run)
./scripts/cleanup-stale-worktrees.sh --dry-run

# Actually clean stale worktrees (>30 days old)
./scripts/cleanup-stale-worktrees.sh

# Custom threshold (e.g., 14 days)
./scripts/cleanup-stale-worktrees.sh --days 14
```

What it does:
- Identifies worktrees with no commits in N days (default: 30)
- Removes `node_modules/`, `.next/`, `dist/`, `.turbo/`, `target/` (Rust)
- Preserves all git history - just run `pnpm install` to restore
What it skips:
- Current active worktree (user-decks)
- Recently active worktrees
Completed Tasks
[DONE] Stale Worktree Cleanup - 2026-01-07
Initial cleanup freed ~85GB:
- emerge-turbo-worktrees: 107GB → 34GB
- vibe-factory-worktrees: 12GB → 229MB
- Removed 4 broken empty worktrees
Move tasks here when completed, with the completion date and any notes.
