Mission Control Runbooks

Maintenance overview

What it means: These are scheduled tasks that prevent surprises (patches, backups, failover drills).
Status colors: Green = not due soon. Yellow = due soon (inside warning window). Red = overdue.
Due math: next due = last done + cadence. “Due in” shows days remaining (negative = overdue).
If you see RED: Open the matching runbook below and do the steps. Update the collector config on mc-ops-01-warp:/opt/localreach/mission-control/collector/config/maintenance.json when done (this file is not in the Mission Control UI repo).

Advanced notes:

Cadence and warn windows are defined per item in /opt/localreach/mission-control/collector/config/maintenance.json on mc-ops-01, reached from the Mac as mc-ops-01-warp.
Overall status is the worst status across all items, so a single RED item turns the overall status RED. Runbook links are anchored to the item IDs in that file.
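
The due math and color rules above can be sketched in a few lines. This is an illustration only, not the collector's code: the function names are hypothetical, dates are treated at day granularity, and it assumes the warning window is inclusive (an item due in exactly the warn number of days shows Yellow).

```python
from datetime import date, timedelta

# Higher number = worse status.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def item_status(last_done: date, cadence_days: int, warn_days: int, today: date):
    """Return (next_due, due_in_days, color) for one maintenance item."""
    next_due = last_done + timedelta(days=cadence_days)  # next due = last done + cadence
    due_in = (next_due - today).days                     # negative = overdue
    if due_in < 0:
        color = "red"
    elif due_in <= warn_days:   # assumption: window boundary is inclusive
        color = "yellow"
    else:
        color = "green"
    return next_due, due_in, color

def overall_status(colors):
    """Worst-case rollup across items; an empty list is treated as green."""
    return max(colors, key=SEVERITY.__getitem__, default="green")

# A 30-day item last done on 2025-01-01 with a 7-day warn window:
print(item_status(date(2025, 1, 1), 30, 7, date(2025, 1, 20)))  # due in 11 days, green
print(item_status(date(2025, 1, 1), 30, 7, date(2025, 2, 3)))   # due in -3 days, red
```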

What to do when something is RED

Find the item in the dashboard (Maintenance table) and click its Runbook link.
Follow the step-by-step list below for that item.
Verify success (checks included per item).
Update mc-ops-01-warp:/opt/localreach/mission-control/collector/config/maintenance.json with the new last_done_at, result, and notes, then rerun the collector.
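
As a rough sketch of the final step, the helper below records a completion on an already-parsed copy of maintenance.json. The field names last_done_at, result, and notes come from this runbook; the top-level "items" list, the "id" key, the UTC ISO 8601 timestamp format, and the example item ID are assumptions, so check the real file before relying on any of them.

```python
from datetime import datetime, timezone

def record_completion(config: dict, item_id: str, result: str,
                      notes: str, done_at: datetime) -> dict:
    """Set last_done_at, result, and notes on the item with the given id.

    Assumes a layout like {"items": [{"id": "...", ...}]} (hypothetical)
    and a timezone-aware done_at.
    """
    for item in config.get("items", []):
        if item.get("id") == item_id:
            item["last_done_at"] = done_at.astimezone(timezone.utc).isoformat(timespec="seconds")
            item["result"] = result
            item["notes"] = notes
            return config
    raise KeyError(f"no maintenance item with id {item_id!r}")

# Hypothetical item id and values, for illustration only.
config = {"items": [{"id": "patch-window", "last_done_at": None}]}
record_completion(config, "patch-window", "ok", "All three hosts patched.",
                  datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc))
print(config["items"][0]["last_done_at"])  # 2025-03-01T12:00:00+00:00
```

Raising on an unknown ID, rather than silently appending a new item, keeps a typo from creating an entry the dashboard never links to a runbook.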

Backup & Restore SOP

Choose the right restore method (website backups, server backups, or snapshots). Includes copyable Copilot prompts.

Website backups are encrypted with restic and stored in Backblaze B2 bucket lrwd-mission-control-backups under mission-control/website-backups.
Runner path: mc-ops-01:/opt/localreach/mission-control/backup-websites/run-weekly.sh. The runner creates temporary HST-dated packages, uploads them to B2/restic, then removes the local tarballs from OVH.
Manual run commands live on Website Backups, including all-sites, single-site, and emergency restore shortcuts.
Included domains: localreachwebdesign.com, laspanishlessons.com, clsps.net, kitsapcriminaldefense.com, dukeplumb.com, alohachallengecoins.com, traveltechus.com, blog.traveltechus.com, majorchangeinitiative.org, blog.majorchangeinitiative.org.
Mission Control backup badges are sourced from backup-status.json and show OK when the most recent backup is no more than 7 days old.
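
The 7-day badge rule amounts to an age comparison. A minimal sketch, assuming the last-backup time has already been read from backup-status.json as a timezone-aware datetime; the function name and the "STALE" label are hypothetical, and only "OK" is named in this runbook.

```python
from datetime import datetime, timedelta, timezone

def backup_badge(last_backup_at: datetime, now: datetime, max_age_days: int = 7) -> str:
    """Return "OK" when the latest backup is no older than max_age_days."""
    age = now - last_backup_at
    # "STALE" is a placeholder label for anything older than the window.
    return "OK" if age <= timedelta(days=max_age_days) else "STALE"

now = datetime(2025, 3, 10, 9, 0, tzinfo=timezone.utc)
print(backup_badge(datetime(2025, 3, 5, 9, 0, tzinfo=timezone.utc), now))  # OK
print(backup_badge(datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc), now))  # STALE
```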

Open Backup & Restore SOP

Monthly patch window (OS + packages)

What this is: Apply OS/package updates and reboot if needed.
Why it matters: Security fixes and stability; missing patches can leave vulnerabilities.
When it’s due: Every 30 days (warn inside 7 days).
Checklist:
- Open Runbook Prompts (West/East/mc-ops-01 patch + reboot).
- SSH to each server (west-01-warp, east-01-warp, mc-ops-01-warp).
- Run apt update, apt full-upgrade, and cleanup with the runbook prompt for that host.
- If prompted, schedule/perform reboot during low traffic.
- After reboot, verify key sites load (LRWD, traveltechus) and Mission Control runner timers are active.
Confirm success: No pending critical updates; dashboard server tiles show services OK and no reboot required.
Troubleshooting:
- If a package is held/broken: check /var/log/apt, consider apt --fix-broken install.
- If services fail after update: check systemctl status <service>, review logs, restart.
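
For the "no reboot required" check, Debian and Ubuntu hosts conventionally signal a pending reboot with the file /var/run/reboot-required and list the triggering packages in /var/run/reboot-required.pkgs. The sketch below reads those files when run on the host itself. It assumes the servers follow that convention, and the function name is hypothetical; this is not how the dashboard tile is known to be computed.

```python
from pathlib import Path

def reboot_status(run_dir: Path = Path("/var/run")) -> dict:
    """Report whether a reboot is pending and which packages asked for it."""
    flag = run_dir / "reboot-required"
    pkgs_file = run_dir / "reboot-required.pkgs"
    packages = []
    if pkgs_file.exists():
        # One package name per line; skip blank lines.
        packages = [line.strip() for line in pkgs_file.read_text().splitlines() if line.strip()]
    return {"reboot_required": flag.exists(), "packages": packages}

if __name__ == "__main__":
    print(reboot_status())
```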

Monthly alert path test (email/SMS)

What this is: A controlled end-to-end test that proves a real HTTP failure flows from origin to collector to alert Worker to email/SMS.
Why it matters: It validates the full live alert path, not just a synthetic Worker dry run.
When it’s due: Every 30 days (warn inside 7 days).
Critical safety rule: Do not break /lb-health.txt, disable Cloudflare shared pools, or alter the shared LB monitor for this test. The LRWD LB health path is shared by multiple managed sites and dropping it can affect other client sites.
Checklist:
- Open Runbook Prompts (LRWD HTTP alert path test).
- Temporarily make only LRWD public page paths such as / and /pricing.html return a failure on one origin.
- Verify /lb-health.txt remains 200 before, during, and after the test.
- Wait for the collector and alert Worker to report the HTTP failure, then confirm email/SMS was received.
- Restore the origin config, rerun/await the collector, and verify the alert Worker returns clear from updated collector input.
Confirm success: Email/SMS arrives for the expected HTTP failure, then /api/alert-system-health and the Backup Integrity alert line return ALERTS: clear after restore.
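
The safety rule and the checklist reduce to two observations on the origin under test: /lb-health.txt must still return 200, and the chosen public paths should return a failure. A minimal sketch of that check follows. The function names and the base URL are assumptions, "failure" is taken to mean any status of 400 or above, and how to address a single origin directly is left to the runbook prompt.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable

def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises on 4xx/5xx; the code is still the answer

def alert_test_state(fetch_status: Callable[[str], int], base_url: str,
                     public_paths: Iterable[str] = ("/", "/pricing.html")) -> dict:
    """Check that the LB health path is intact and list failing public paths."""
    health = fetch_status(base_url + "/lb-health.txt")
    failing = [p for p in public_paths if fetch_status(base_url + p) >= 400]
    return {"lb_health_ok": health == 200, "failing_public_paths": failing}

# Example call (placeholder URL, not a real origin address):
# alert_test_state(http_status, "https://origin.example")
```

During the test, lb_health_ok should be True with both public paths listed as failing; before and after it, lb_health_ok should be True with an empty list. Connection-level errors (DNS, TLS, timeouts) are deliberately not caught, so they surface instead of being mistaken for an HTTP failure.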

Monthly malware scan (WordPress)

What this is: A pragmatic WordPress hygiene scan run over SSH against each site’s public URL and configured origins.
What it includes: Public/origin HTML checks, core checksum verification (WP-CLI), uploads PHP exposure, expanded DB/snippet persistence surfaces, new admin detection, active-theme PHP drift, heuristic grep sweeps, and monthly/full ClamAV when available.
What it does not do: No auto-cleanup, no quarantine, no deletes. It produces indicators and an auditable report.
Where reports live (UI): WordPress Malware Scans (Access-protected report downloads).
Where reports live (runner): mc-ops-01:/var/lib/mission-control/artifacts/wp-malware-scan/<site_id>/<YYYY-MM>/
How to run (from your Mac):
- ssh -t mc-ops-01-warp "sudo -u mission-control /opt/localreach/mission-control/collector/run-wp-malware-scans-ovh.sh"
- ssh -t mc-ops-01-warp "sudo -u mission-control /opt/localreach/mission-control/collector/run-wp-malware-scans-ovh.sh --only-site-id kitsap" (single site)
Prompt: Open Runbook Prompts (mc-ops-01 malware scan).
Confirm success: UI shows a fresh “Generated” timestamp and each site is pass or needs_review with counts.
If something is flagged: Take a snapshot/backup first, then do targeted investigation (identify the file, timeframe, and whether it matches a known maintenance window).
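
To locate a given month's report on the runner, the directory pattern listed above can be built from a site ID and a date. A small sketch with a hypothetical helper name; it only constructs the path and does not check that the directory exists.

```python
from datetime import date
from pathlib import PurePosixPath

ARTIFACT_ROOT = PurePosixPath("/var/lib/mission-control/artifacts/wp-malware-scan")

def report_dir(site_id: str, when: date) -> PurePosixPath:
    """Build <root>/<site_id>/<YYYY-MM> for the month containing `when`."""
    return ARTIFACT_ROOT / site_id / when.strftime("%Y-%m")

print(report_dir("kitsap", date(2025, 3, 9)))
# /var/lib/mission-control/artifacts/wp-malware-scan/kitsap/2025-03
```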

Project workspace SOPs

Use these standards when creating, migrating, syncing, or cleaning up Local Reach Web Design project folders.

Canonical root: iCloud Shared Projects. GitHub: private repos by default. WARP aliases preferred.

Standard workspace rule

Active project work lives under ~/Library/Mobile Documents/com~apple~CloudDocs/Shared Projects/.
Each active project should carry AGENTS.md, README.md, deployment.md, SOP.md, .gitignore, docs/, and source/assets folders as applicable.
Do not treat iCloud as source control. Commit useful project history to Git and push approved repos to private GitHub.
Keep secrets, .env files, SSH keys, database dumps, uploads, backup archives, node_modules, .next, dist, and cache folders out of Git and normal synced project source.
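
One way to apply the exclusion rule before a commit is to screen candidate paths against the categories listed above. The sketch below is illustrative and deliberately incomplete: the directory names "uploads", "cache", and ".cache" and the dump suffixes are assumptions about how those categories appear on disk, and it does not detect secrets or SSH keys, which still need manual review.

```python
from pathlib import PurePosixPath
from typing import Optional

# Folder names that should never be committed (cache names are assumptions).
BLOCKED_DIRS = {"node_modules", ".next", "dist", "uploads", "cache", ".cache"}
# Assumed file suffixes for database dumps.
DUMP_SUFFIXES = (".sql", ".sql.gz")

def blocked_reason(path: str) -> Optional[str]:
    """Return why a repo-relative path should stay out of Git, or None."""
    p = PurePosixPath(path)
    for part in p.parts:
        if part in BLOCKED_DIRS:
            return f"inside {part}/"
    if p.name == ".env" or p.name.startswith(".env."):
        return "env file"
    if p.name.endswith(DUMP_SUFFIXES):
        return "database dump"
    return None

print(blocked_reason("site/node_modules/lodash/index.js"))      # inside node_modules/
print(blocked_reason("wp-content/themes/child/functions.php"))  # None
```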

Migration workflow

Inventory first: current location, Git state, deployment files, docs, hidden files, large folders, generated artifacts, and possible duplicate copies.
Dry-run copy before moving. Review exclusions before execution.
Copy with metadata preservation, verify the new project opens/builds, then update project docs to the new canonical path.
Move old Desktop sources to a rollback/reference holding folder only after validation. Delete nothing without a separate approval.

Live-site source rule

For current production websites, live production is the source of truth unless a project-specific deployment.md says otherwise.
Extract only safe source candidates into Git: child themes, custom plugins, MU plugins after secret review, static source, and non-secret templates.
Never commit full WordPress runtime trees, uploads, caches, wp-config.php, credentials, SMTP secrets, Turnstile/recaptcha secrets, or customer form submissions.
Before deploying source back to live, require a fresh task, rollback point, dry-run review, explicit approval, and post-deploy Mission Control verification.

Operational references

Quarterly restore test (snapshot/backup)

What this is: Restore a B2/restic website backup and prove the package is usable.
Why it matters: Proves backups are usable; avoids surprises during real incidents.
When it’s due: Every 90 days (warn inside 7 days).
Checklist:
- Open Runbook Prompts (B2/restic restore test).
- Pick a key site from Website Backups.
- Run the Emergency Restore helper into ~/Desktop/Emergency Restore.
- Validate files.tar.gz and manifest.json, plus db.sql.gz for WordPress sites.
- Only restore to staging/new VM if the test scope requires booting the site.
- Record time-to-restore and any issues.
Confirm success: Restored package is complete and integrity checks pass; staging loads if staging was part of the test.
Troubleshooting:
- Backup won’t import: check archive integrity, try a different snapshot.
- Permissions/config issues: fix file ownership, update env/DB strings for staging.
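
The validation step can be partly automated. The sketch below checks that the three files named in the checklist are present, that manifest.json parses, that files.tar.gz can be listed, and that db.sql.gz decompresses end to end. The function name is hypothetical, the manifest's contents are not inspected because its schema is not described here, and listing a tar archive is a weaker check than extracting it.

```python
import gzip
import json
import tarfile
import zlib
from pathlib import Path
from typing import List

def validate_package(pkg_dir: Path, wordpress: bool = True) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    required = ["files.tar.gz", "manifest.json"] + (["db.sql.gz"] if wordpress else [])
    for name in required:
        if not (pkg_dir / name).is_file():
            problems.append(f"{name}: missing")
    if problems:
        return problems
    try:
        json.loads((pkg_dir / "manifest.json").read_text(encoding="utf-8"))
    except ValueError as err:  # covers invalid JSON and bad encoding
        problems.append(f"manifest.json: not valid JSON ({err})")
    try:
        with tarfile.open(pkg_dir / "files.tar.gz", "r:gz") as tar:
            if not tar.getmembers():  # walks every member header
                problems.append("files.tar.gz: archive is empty")
    except (tarfile.TarError, OSError, EOFError) as err:
        problems.append(f"files.tar.gz: unreadable ({err})")
    if wordpress:
        try:
            with gzip.open(pkg_dir / "db.sql.gz", "rb") as fh:
                while fh.read(1 << 20):  # read to the end so the gzip CRC is checked
                    pass
        except (OSError, EOFError, zlib.error) as err:
            problems.append(f"db.sql.gz: unreadable ({err})")
    return problems

if __name__ == "__main__":
    # Point this at the restored site folder inside the restore directory.
    print(validate_package(Path.home() / "Desktop" / "Emergency Restore"))
```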

MCI site collaboration

Purpose: Karen works locally with Codex and feature branches; Nick deploys when ready.

Repo: localreachwd/mci-site
Local preview: http://localhost:3000
Server preview: mci.localreachwebdesign.com (CloudPanel + PM2)

Daily flow (Karen)

Sync and start dev: git pull, npm ci (if needed), npm run dev, open http://localhost:3000.
Edit locally with Codex and refresh the browser to validate changes.
Push a branch and tell Nick the branch name + what changed.

Deploy flow (Nick)

Review/merge Karen's branch into main.
Deploy to server preview and verify the site.

Open MCI collaboration prompts

Quarterly failover drill (LB)

What this is: Practice failing all LB-backed traffic from West to East and back.
Why it matters: Ensures the load balancer + origins really fail over when needed.
When it’s due: Every 90 days (warn inside 7 days).
Checklist:
- Open Runbook Prompts (current shared LB failover drill).
- During a quiet window, drain/disable the West pool in Cloudflare LB.
- Confirm traffic serves from East (dashboard LB tile should show Active: lrwd-east).
- Run basic site checks from East for the current LB-backed domains in the snapshot, currently localreachwebdesign.com, clsps.net, and kitsapcriminaldefense.com.
- Re-enable West and confirm Active returns to lrwd-west.
- Document any issues or slow steps.
Confirm success: LB shows the primary (lrwd-west) healthy after re-enable; the LB-backed sites load from both regions.
Troubleshooting:
- If LB won’t fail over: check origin health, SSL, and DNS settings in Cloudflare.
- If East is slow/down: inspect east-01 services (nginx/php/mysql/redis) and logs.