Plain-English steps to handle maintenance items and red/yellow alerts.
Maintenance overview
What it means: These are scheduled tasks that prevent surprises (patches, backups, failover drills).
Status colors: Green = not due soon. Yellow = due soon (inside warning window). Red = overdue.
Due math: next due = last done + cadence. “Due in” shows days remaining (negative = overdue).
If you see RED: Open the matching runbook below and do the steps. Update the collector config on TrueNAS at /mnt/Piscine/infra/mission-control-collector/repo/config/maintenance.json when done (this file is not in the Mission Control UI repo).
Advanced notes:
Cadence and warn windows are defined per item in /mnt/Piscine/infra/mission-control-collector/repo/config/maintenance.json (TrueNAS).
Overall status is the worst status across all items: a single RED item makes the overall status RED. Runbook links are anchored to the IDs in that file.
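The due math and status colors above can be sketched in a few lines of Python. This is an illustration only, not the collector's actual code: the function names are hypothetical, and it assumes an item due today (0 days remaining) is still YELLOW rather than RED, since only negative values are described as overdue.

```python
from datetime import date, timedelta

# Severity order used for the worst-case roll-up across items.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def days_until_due(last_done: date, cadence_days: int, today: date) -> int:
    """Return days remaining until the next due date (negative = overdue)."""
    next_due = last_done + timedelta(days=cadence_days)
    return (next_due - today).days


def status_for(due_in_days: int, warn_days: int) -> str:
    """Map days remaining to a status color."""
    if due_in_days < 0:
        return "red"      # overdue
    if due_in_days <= warn_days:
        return "yellow"   # inside the warning window (boundary is an assumption)
    return "green"        # not due soon


def overall_status(statuses) -> str:
    """Worst status across items; an empty list counts as green."""
    return max(statuses, key=SEVERITY.__getitem__, default="green")


# Example: a 90-day item last done on 2024-01-01 is next due on 2024-03-31.
due_in = days_until_due(date(2024, 1, 1), 90, date(2024, 3, 26))
print(due_in, status_for(due_in, warn_days=7))
```

With a 7-day warning window, the example item shows 5 days remaining and is YELLOW on 2024-03-26.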
What to do when something is RED
Find the item in the dashboard (Maintenance table) and click its Runbook link.
Follow the step-by-step list below for that item.
Verify success (checks included per item).
Update /mnt/Piscine/infra/mission-control-collector/repo/config/maintenance.json with the new last_done_at, result, and notes, then rerun the collector.
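The final step above (recording the new last_done_at, result, and notes) could be scripted roughly as follows. The file layout is an assumption: this sketch supposes a top-level "items" list whose entries each carry an "id", and it writes last_done_at as an ISO 8601 UTC timestamp. Check the real maintenance.json before using anything like this. The function name and the item ID in the usage comment are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def record_completion(config_path, item_id, result, notes, done_at=None):
    """Update one maintenance item in place and return the updated entry."""
    done_at = done_at or datetime.now(timezone.utc)

    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)

    # Assumed schema: {"items": [{"id": ..., "last_done_at": ..., ...}]}
    for item in config["items"]:
        if item["id"] == item_id:
            item["last_done_at"] = done_at.isoformat()
            item["result"] = result
            item["notes"] = notes
            break
    else:
        raise KeyError(f"no maintenance item with id {item_id!r}")

    # Write to a temp file in the same directory, then swap it in, so a
    # crash mid-write cannot leave a half-written config behind.
    directory = os.path.dirname(os.path.abspath(config_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")
    os.replace(tmp_path, config_path)
    return item


# Hypothetical usage (the item ID is made up for illustration):
# record_completion(
#     "/mnt/Piscine/infra/mission-control-collector/repo/config/maintenance.json",
#     "restore-test", "pass", "Restored to staging without issues")
```

The script only edits the file; rerunning the collector afterwards is still a separate manual step.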
Backup & Restore SOP
Choose the right restore method (website backups, server backups, or snapshots), with copyable Copilot prompts.
Website backups live on TrueNAS at /mnt/Piscine/Home/1. Local Reach Web Design/Automated Backups/website-backups/<domain>/<YYYY-MM-DD>, where the date folder uses the HST date.
Weekly schedule: Sundays 03:30 UTC via /mnt/Piscine/infra/backup-websites/run-weekly.sh.
Confirm success: The UI shows a fresh “Generated” timestamp, and each site is marked pass or needs_review, with counts.
If something is flagged: Take a snapshot/backup first, then investigate: identify the file, the timeframe, and whether the change matches a known maintenance window.
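Because the schedule is given in UTC but the backup folders use HST dates, the two can look a day apart: a run at Sunday 03:30 UTC is Saturday 17:30 HST (UTC-10, no daylight saving). The sketch below shows that conversion and a simple way to find the newest dated folder for a domain. The helper names are hypothetical, and the 7-day freshness threshold is an assumption based on the weekly schedule.

```python
from datetime import date, datetime, timedelta, timezone
from pathlib import Path

# Hawaii Standard Time is a fixed UTC-10 offset (no daylight saving).
HST = timezone(timedelta(hours=-10), "HST")


def hst_date_of(moment_utc: datetime) -> date:
    """Return the HST calendar date for a timezone-aware datetime."""
    return moment_utc.astimezone(HST).date()


def latest_backup_date(folder_names):
    """Newest YYYY-MM-DD name in the list, or None; other names are skipped."""
    dates = []
    for name in folder_names:
        try:
            dates.append(datetime.strptime(name, "%Y-%m-%d").date())
        except ValueError:
            continue  # not a dated backup folder
    return max(dates, default=None)


def list_backup_folders(backup_root, domain):
    """Names of the subfolders under <backup_root>/<domain>."""
    return [p.name for p in Path(backup_root, domain).iterdir() if p.is_dir()]


def is_fresh(latest, now_utc: datetime, max_age_days: int = 7) -> bool:
    """True if the newest backup is at most max_age_days old in HST terms."""
    if latest is None:
        return False
    return (hst_date_of(now_utc) - latest).days <= max_age_days


# A Sunday 03:30 UTC run lands in a folder dated the previous Saturday (HST).
run = datetime(2024, 6, 9, 3, 30, tzinfo=timezone.utc)
print(hst_date_of(run))
```

For the example run on Sunday 2024-06-09 at 03:30 UTC, the HST date is 2024-06-08.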
Quarterly restore test (snapshot/backup)
What this is: Restore a backup/snapshot to staging and confirm the restored site works.
Why it matters: Proves backups are usable; avoids surprises during real incidents.
When it’s due: Every 90 days (the warning window starts 7 days before the due date).
Checklist:
Pick the latest HST-dated website backup for a key site (e.g., LRWD or traveltechus).
Restore to a staging environment or new VM.
Update hosts/DB config as needed for staging.
Load the site and run a basic sanity check (homepage, login if applicable).
Record time-to-restore and any issues.
Confirm success: Restored site loads in staging, data intact, and you documented restore time.
Troubleshooting:
Backup won’t import: check archive integrity, then try a different snapshot.
Permissions/config issues: fix file ownership and update the env/DB connection strings for staging.
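For the "check archive integrity" step, one option is to read every file in the archive end to end, which surfaces truncation and checksum errors before you attempt an import. This sketch assumes the backup is a tar archive (plain or compressed); the SOP does not state the backup format, so treat that as an assumption and adapt it if the backups are stored differently. The function name is hypothetical.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return (ok, detail) after reading every regular file in a tar archive."""
    try:
        # "r:*" lets tarfile detect gzip, bzip2, xz, or no compression.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                extracted = archive.extractfile(member)
                # Read in 1 MiB chunks so large backups do not fill memory.
                while extracted.read(1024 * 1024):
                    pass
        return True, "ok"
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        # Truncated, corrupted, unreadable, or not a tar archive at all.
        return False, f"{type(exc).__name__}: {exc}"
```

If the check fails on the newest backup, run it on the previous dated backup before restoring from that one.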
MCI site collaboration
Purpose: Karen works locally with Codex and feature branches; Nick deploys when ready.