Plain-English steps to handle maintenance items and red/yellow alerts.
Maintenance overview
What it means: These are scheduled tasks that prevent surprises (patches, backups, failover drills).
Status colors: Green = not due soon. Yellow = due soon (inside warning window). Red = overdue.
Due math: next due = last done + cadence. “Due in” shows days remaining (negative = overdue).
If you see RED: Open the matching runbook below and do the steps. Update the collector config on mc-ops-01-warp:/opt/localreach/mission-control/collector/config/maintenance.json when done (this file is not in the Mission Control UI repo).
Advanced notes:
Cadence and warn windows are defined per item in /opt/localreach/mission-control/collector/config/maintenance.json on mc-ops-01, reached from the Mac as mc-ops-01-warp.
Overall status is the worst case across items: any RED item makes the overall status RED. Runbook links are anchored to the IDs in that file.
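The due math and color rules above can be sketched in a few lines of Python. This is a minimal illustration, not the collector's actual code: the field names (cadence_days, warn_days), the sample dates, and the choice to treat "due today" as yellow rather than red are assumptions. Only the formula (next due = last done + cadence), the meaning of a negative "Due in", and the worst-case rollup come from this page.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Severity order used for the worst-case rollup.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class Item:
    id: str            # hypothetical item ID
    last_done: date
    cadence_days: int  # assumed field name
    warn_days: int     # assumed field name


def due_in_days(item: Item, today: date) -> int:
    """Days remaining until the next due date (negative = overdue)."""
    next_due = item.last_done + timedelta(days=item.cadence_days)
    return (next_due - today).days


def status(item: Item, today: date) -> str:
    remaining = due_in_days(item, today)
    if remaining < 0:
        return "red"      # overdue
    if remaining <= item.warn_days:
        return "yellow"   # inside the warning window (boundary is an assumption)
    return "green"


def overall(items: list[Item], today: date) -> str:
    """Worst status across all items; green when there are no items."""
    return max((status(i, today) for i in items),
               key=SEVERITY.__getitem__, default="green")


# Sample data (dates are made up for illustration).
TODAY = date(2025, 6, 1)
ALERT_TEST = Item("alert-path-test", date(2025, 5, 8), cadence_days=30, warn_days=7)
FAILOVER = Item("failover-drill", date(2025, 2, 20), cadence_days=90, warn_days=7)
PATCHING = Item("patching", date(2025, 5, 25), cadence_days=30, warn_days=7)
```

With these sample dates, the alert test is inside its 7-day window, the failover drill is overdue, and the rollup therefore reports red even though patching is green.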
What to do when something is RED
Find the item in the dashboard (Maintenance table) and click its Runbook link.
Follow the step-by-step list below for that item.
Verify success (checks included per item).
Update mc-ops-01-warp:/opt/localreach/mission-control/collector/config/maintenance.json with the new last_done_at, result, and notes, then rerun the collector.
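The final update step could be scripted roughly as follows. This is a sketch under stated assumptions: the real layout of maintenance.json is not shown on this page, so the top-level "items" list, the per-item "id" key, and the ISO 8601 UTC timestamp format are all hypothetical. Only the field names last_done_at, result, and notes come from the step above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_completion(config_path, item_id, result, notes, now=None):
    """Set last_done_at, result, and notes on one item and return that item."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    now = now or datetime.now(timezone.utc)

    for item in config["items"]:       # ASSUMPTION: top-level "items" list
        if item.get("id") == item_id:  # ASSUMPTION: each item carries an "id"
            item["last_done_at"] = now.isoformat(timespec="seconds")
            item["result"] = result
            item["notes"] = notes
            break
    else:
        raise KeyError(f"no maintenance item with id {item_id!r}")

    # Write to a temp file and rename, so a failed write cannot leave
    # a truncated config behind for the collector to read.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(config, indent=2) + "\n")
    tmp.replace(path)
    return item
```

Run it on the host that holds the file, then rerun the collector so the dashboard picks up the new last_done_at.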
Backup & Restore SOP
Choose the right restore method (Website backups vs Server backups vs Snapshots), with copyable Copilot prompts.
Website backups are encrypted with restic and stored in the Backblaze B2 bucket lrwd-mission-control-backups under mission-control/website-backups.
Runner path: mc-ops-01:/opt/localreach/mission-control/backup-websites/run-weekly.sh. The runner creates temporary HST-dated packages, uploads them to B2/restic, then removes the local tarballs from OVH.
Manual run commands live on Website Backups, including all-sites, single-site, and emergency restore shortcuts.
Server patching
SSH to each server (west-01-warp, east-01-warp, mc-ops-01-warp).
Run apt update, apt full-upgrade, and cleanup with the runbook prompt for that host.
If a reboot is required, schedule or perform it during a low-traffic window.
After reboot, verify key sites load (LRWD, traveltechus) and Mission Control runner timers are active.
Confirm success: No pending critical updates; dashboard server tiles show services OK and no reboot required.
Troubleshooting:
If a package is held or broken: check the logs under /var/log/apt and consider running apt --fix-broken install.
If services fail after the update: run systemctl status <service>, review the logs, then restart the service.
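The "no reboot required" check above can also be done from the shell side. The sketch below assumes the hosts are Debian or Ubuntu systems that write the usual /var/run/reboot-required marker (and the companion reboot-required.pkgs list) after an upgrade. That is common on apt-based systems but is not confirmed by this page, and the dashboard may detect reboots differently.

```python
from pathlib import Path


def reboot_status(run_dir="/var/run"):
    """Return (reboot_required, packages) based on the apt marker files."""
    marker = Path(run_dir) / "reboot-required"
    pkgs_file = Path(run_dir) / "reboot-required.pkgs"

    if not marker.exists():
        return False, []

    packages = []
    if pkgs_file.exists():
        # One package name per line; drop blanks and duplicates.
        lines = pkgs_file.read_text().splitlines()
        packages = sorted({line.strip() for line in lines if line.strip()})
    return True, packages


if __name__ == "__main__":
    required, packages = reboot_status()
    if required:
        print("Reboot required by:", ", ".join(packages) or "(unknown)")
    else:
        print("No reboot required")
```

The run_dir parameter exists only so the function can be exercised against a scratch directory; on a real host the default applies.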
Monthly alert path test (email/SMS)
What this is: A controlled end-to-end test that proves a real HTTP failure flows from origin to collector to alert Worker to email/SMS.
Why it matters: It validates the full live alert path, not just a synthetic Worker dry run.
When it’s due: Every 30 days (warn inside 7 days).
Critical safety rule: Do not break /lb-health.txt, disable Cloudflare shared pools, or alter the shared LB monitor for this test. The LRWD LB health path is shared by multiple managed sites and dropping it can affect other client sites.
Temporarily make only LRWD public page paths such as / and /pricing.html return a failure on one origin.
Verify /lb-health.txt remains 200 before, during, and after the test.
Wait for the collector and alert Worker to report the HTTP failure, then confirm email/SMS was received.
Restore the origin config, rerun/await the collector, and verify the alert Worker returns clear from updated collector input.
Confirm success: Email/SMS arrives for the expected HTTP failure, then /api/alert-system-health and the Backup Integrity alert line return ALERTS: clear after restore.
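The safety rule lends itself to a small guard that can be run before, during, and after the test. The sketch below is illustrative only: the origin base URL is a placeholder, treating any status of 400 or above (or no response) as a "failure" is an assumption, and how to address a single origin directly (host header, internal name, and so on) is not covered here. The paths /lb-health.txt, / and /pricing.html come from the steps above.

```python
import urllib.error
import urllib.request

HEALTH_PATH = "/lb-health.txt"
TEST_PATHS = ["/", "/pricing.html"]


def fetch_status(url, timeout=10):
    """Return the HTTP status code, or None if the request never completed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def check_phase(statuses, phase):
    """Check one phase ('before', 'during', 'after'); return a list of problems."""
    problems = []
    if statuses.get(HEALTH_PATH) != 200:
        problems.append(f"{HEALTH_PATH} must stay 200, got {statuses.get(HEALTH_PATH)}")
    for path in TEST_PATHS:
        code = statuses.get(path)
        failing = code is None or code >= 400  # ASSUMPTION: what counts as failure
        if phase == "during" and not failing:
            problems.append(f"{path} should be failing during the test, got {code}")
        if phase != "during" and failing:
            problems.append(f"{path} should be healthy {phase} the test, got {code}")
    return problems


def probe(origin_base):
    """Fetch every relevant path from one origin (origin_base is hypothetical)."""
    return {p: fetch_status(origin_base + p) for p in [HEALTH_PATH, *TEST_PATHS]}
```

For example, check_phase(probe("https://origin.example"), "during") should return an empty list while the test is active. Any entry mentioning /lb-health.txt means the test should be stopped and the origin restored.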
Monthly malware scan (WordPress)
What this is: A pragmatic WordPress hygiene scan run over SSH against each site’s public URL and configured origins.
What it includes: Public/origin HTML checks, core checksum verification (WP-CLI), uploads PHP exposure, expanded DB/snippet persistence surfaces, new admin detection, active-theme PHP drift, heuristic grep sweeps, and monthly/full ClamAV when available.
What it does not do: No auto-cleanup, no quarantine, no deletes. It produces indicators and an auditable report.
Confirm success: UI shows a fresh “Generated” timestamp and each site is pass or needs_review with counts.
If something is flagged: Take a snapshot/backup first, then do targeted investigation (identify the file, timeframe, and whether it matches a known maintenance window).
Project workspace SOPs
Use these standards when creating, migrating, syncing, or cleaning up Local Reach Web Design project folders.
Active project work lives under ~/Library/Mobile Documents/com~apple~CloudDocs/Shared Projects/.
Each active project should carry AGENTS.md, README.md, deployment.md, SOP.md, .gitignore, docs/, and source/assets folders as applicable.
Do not treat iCloud as source control. Commit useful project history to Git and push approved repos to private GitHub.
Keep secrets, .env files, SSH keys, database dumps, uploads, backup archives, node_modules, .next, dist, and cache folders out of Git and normal synced project source.
Migration workflow
Inventory first: current location, Git state, deployment files, docs, hidden files, large folders, generated artifacts, and possible duplicate copies.
Dry-run copy before moving. Review exclusions before execution.
Copy with metadata preservation, verify the new project opens/builds, then update project docs to the new canonical path.
Move old Desktop sources to a rollback/reference holding folder only after validation. Delete nothing without a separate approval.
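The dry-run step can be approximated with a short script that lists what a copy would include and what it would skip, without touching any files. This is a sketch, not the tool used for real migrations: the exclusion names are drawn from the list earlier in this section, but the exact glob patterns (for example *.sql for database dumps or id_rsa* for SSH keys) are illustrative guesses and should be reviewed per project.

```python
import os
from fnmatch import fnmatch
from pathlib import Path

# Folder names skipped entirely (".cache" is an assumed cache folder name).
EXCLUDE_DIRS = {"node_modules", ".next", "dist", ".cache"}
# File patterns skipped; illustrative, review before a real migration.
EXCLUDE_FILES = [".env", ".env.*", "*.sql", "id_rsa*", "id_ed25519*", "*.pem"]


def plan_copy(src):
    """Return (to_copy, to_skip) as sorted relative paths. Nothing is copied."""
    src = Path(src)
    to_copy, to_skip = [], []
    for root, dirs, files in os.walk(src):
        rel_root = Path(root).relative_to(src)
        for d in list(dirs):
            if d in EXCLUDE_DIRS:
                to_skip.append(str(rel_root / d) + "/")
                dirs.remove(d)  # prune so os.walk does not descend into it
        for name in files:
            rel = str(rel_root / name)
            if any(fnmatch(name, pat) for pat in EXCLUDE_FILES):
                to_skip.append(rel)
            else:
                to_copy.append(rel)
    return sorted(to_copy), sorted(to_skip)


if __name__ == "__main__":
    copy, skip = plan_copy(".")
    print(f"{len(copy)} files would be copied, {len(skip)} entries skipped")
    for entry in skip:
        print("  skip:", entry)
```

Hidden files such as .gitignore are kept in the plan on purpose, since the inventory step calls them out. Reviewing the skip list is the "review exclusions before execution" checkpoint.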
Live-site source rule
For current production websites, live production is the source of truth unless a project-specific deployment.md says otherwise.
Extract only safe source candidates into Git: child themes, custom plugins, MU plugins after secret review, static source, and non-secret templates.
Never commit full WordPress runtime trees, uploads, caches, wp-config.php, credentials, SMTP secrets, Turnstile/recaptcha secrets, or customer form submissions.
Before deploying source back to live, require a fresh task, rollback point, dry-run review, explicit approval, and post-deploy Mission Control verification.
Failover drill
What this is: Practice failing all LB-backed traffic from West to East and back.
Why it matters: Ensures the load balancer + origins really fail over when needed.
When it’s due: Every 90 days (warn inside 7 days).
Checklist:
During a quiet window, drain or disable the West pool in the Cloudflare LB.
Confirm traffic serves from East (dashboard LB tile should show Active: lrwd-east).
Run basic site checks from East for the current LB-backed domains in the snapshot, currently localreachwebdesign.com, clsps.net, and kitsapcriminaldefense.com.
Re-enable West and confirm Active returns to lrwd-west.