Plain-English guides for each tile. Click the anchors below to jump to a section.
mon-lrwd, an HTTPS check for localreachwebdesign.com/lb-health.txt. This is a server-availability signal shared by the sites currently attached to the shared WEST/EAST pools, not an independent per-site application check. The live snapshot currently marks LRWD, CLSPS, and KCD as LB-backed; Duke and Aloha have EAST standby checks but are not currently marked LB-backed.
PAGE checks store only safe metadata: URLs checked, status, body byte count, title/text presence, expected-marker counts, and issue names. They do not store full page HTML or injected payloads.
Edge (public cert) and Origin (installed server cert on WEST/EAST when applicable).
mc-ops-01, not the older NAS backup folder cache.
CLOUDPANEL: 2/2 OK means WEST and EAST CloudPanel application metadata backups were uploaded to B2/restic by the weekly CloudPanel app-state timer. This is not a replacement for per-site website file/database backups.
ALERTS: clear means /api/alert-system-health reached the alert Worker dry run and found no active alert conditions. Active alerts or API errors turn only the alert line red; the backup integrity value stays tied to B2/restic snapshot status.
mc-ops-01 and rerun the backup job. If the CloudPanel line is red, check mission-control-backup-cloudpanel-apps.timer and the CloudPanel app-state status JSON on mc-ops-01. If the alert line is red, open the active dashboard fault, fix the source condition, then verify the alert health line clears after collector input updates.
replication.east_backups.
clsps.net, kitsapcriminaldefense.com, dukeplumb.com, and alohachallengecoins.com are treated as WordPress and require a current WP files+DB sync signal.
Z.
Paste this at the top of future chats to provide full Mission Control context.
Mission Control Context Prompt (Leveling Prompt — UPDATED)
This is a leveling prompt. Acknowledge you understand this prompt, then WAIT for my task.
Use: Codex in `danger-full-access` sandbox mode (no approval prompts).
1) Mission Control summary (single source of truth)
- Mission Control is a static status dashboard for Local Reach Web Design.
- All UI renders from one snapshot JSON; the UI must not invent data.
2) Architecture flow (how updates move)
- Mission Control UI is static HTML/CSS/JS (no server rendering).
- The collector runs on the offsite runner mc-ops-01, not the Mac.
- Repo clone: /opt/localreach/mission-control/collector
- Wrapper: /opt/localreach/mission-control/collector/run-collector-ovh.sh
- Logs: /var/log/mission-control/collector.log and collector.err.log
- Manual run from Mac: ssh -t mc-ops-01-warp "sudo -u mission-control /opt/localreach/mission-control/collector/run-collector-ovh.sh"
- Validation output key: snapshots/latest-ovh-validation.json
- Production output key after cutover: snapshots/latest.json
- TrueNAS is historical/rollback context; current production collector and backup runner operations are on mc-ops-01.
- The collector uploads the snapshot to Cloudflare R2 using scoped credentials from /etc/mission-control/mission-control.env.
- The Mission Control site reads that snapshot via a Cloudflare Pages Function:
  - API endpoint: /api/snapshot
  - Implemented in: functions/api/snapshot.js
  - Function loads snapshot from R2 and returns it to the browser
- WordPress malware scan status is served by /api/wp-malware-scans from R2.
- Alert system health is served by /api/alert-system-health, which performs a sanitized dry run of the alert Worker.
- The browser loads /api/snapshot and renders from that data.
- /api/snapshot is protected by Cloudflare Access in production.
  - Symptom: HTTP 302 redirect to Access login (Access domain).
  - Debug: confirm latest.json locally first, then fetch via Access/service token.
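The Access symptom above can be told apart from a healthy response with a small check. This is a minimal sketch, assuming the caller fetched /api/snapshot with redirects disabled and passes in the status code and Location header. The name classifySnapshotResponse and its return labels are hypothetical, and the cloudflareaccess.com host match is an assumption about the Access login domain, not something confirmed by this document.

```javascript
// Hypothetical helper: classify an /api/snapshot response without following redirects.
function classifySnapshotResponse(status, location) {
  if (status === 200) return "ok";
  const isRedirect = status >= 300 && status < 400;
  // Assumption: the Access login URL lives on a *.cloudflareaccess.com host.
  if (isRedirect && typeof location === "string" && location.includes("cloudflareaccess.com")) {
    return "access-login";
  }
  if (isRedirect) return "redirect";
  return "error";
}
```

An "access-login" result points at step 3 of the debug order (Access is blocking) rather than at the collector or the R2 upload.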
3) Dashboard layout model (3 layers)
Layer A - Impact Now (tiles, always visible)
Goal: "Are customers impacted right now?"
- Cloudflare Load Balancer status (active region + pool health)
- HTTP checks:
  - Headers = time-to-headers (fetch resolves when headers arrive)
  - Total = time-to-body (after reading response)
  - Edge = Cloudflare colo from cf-ray (cf_colo)
  - PAGE = customer-facing page sanity check: body size, visible text/title, expected marker, fatal/error text, and known suspicious public HTML indicators
- SSL expiry runway
- Backup integrity + alert system health
- Maintenance Due summary (next item + counts)
- Collector status (last run, age, trigger)
- Replication status tile (WEST->EAST content freshness)
- Sites (Layer A): show per-site Backup Freshness badge (✓/✗) from snapshot.backups.
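The Headers, Total, and Edge values in the HTTP checks above can be sketched as follows. This is an illustration only, not the collector's actual code: timeHomeCheck and coloFromCfRay are hypothetical names, and the sketch assumes a runtime with global fetch and performance (for example Node 18 or later). The output field names follow the snapshot fields listed in section 5.

```javascript
// Extract the Cloudflare colo code from a cf-ray value shaped like "<ray-id>-SEA".
function coloFromCfRay(cfRay) {
  if (typeof cfRay !== "string") return null;
  const idx = cfRay.lastIndexOf("-");
  if (idx === -1 || idx === cfRay.length - 1) return null;
  return cfRay.slice(idx + 1).toUpperCase();
}

// Hypothetical timing helper for one URL.
async function timeHomeCheck(url) {
  const start = performance.now();
  const res = await fetch(url); // resolves once response headers arrive
  const headersMs = performance.now() - start;
  await res.arrayBuffer(); // read the full body
  const totalMs = performance.now() - start;
  const cfRay = res.headers.get("cf-ray"); // null when the header is absent
  return {
    headers_ms: Math.round(headersMs),
    total_ms: Math.round(totalMs),
    cf_ray: cfRay,
    cf_colo: coloFromCfRay(cfRay),
  };
}
```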
Layer B - Will something break soon?
Goal: "What is trending toward failure?"
- Server resources / disk / early warnings (as implemented)
Layer C - Deep ops tables
Goal: "What needs action / details?"
- Sites table
- Servers table
- Maintenance table with runbook links
4) Key repos + files
Mission Control site repo (static dashboard):
- index.html (tile layout / sections)
- app.js (fetch snapshot, render tiles/tables)
- styles.css (UI styles)
- help.html (help docs and anchors)
- functions/api/snapshot.js (returns snapshot from R2)
- functions/api/wp-malware-scans.js (returns malware scan summary from R2)
- functions/api/alert-system-health.js (returns sanitized alert Worker dry-run status)
Collector repo (runs on mc-ops-01):
- /opt/localreach/mission-control/collector/collector.mjs (collects statuses + builds snapshot)
- /etc/mission-control/mission-control.env contains scoped Cloudflare/R2/B2 credentials
- Output: /opt/localreach/mission-control/collector/latest.json plus Cloudflare R2 snapshot upload
Backup system (runs on mc-ops-01):
- /opt/localreach/mission-control/backup-websites/run-weekly.sh (website backups, supports --only)
- /opt/localreach/mission-control/collector/backup-status.json (backup status input)
- Encrypted repository: b2:lrwd-mission-control-backups:mission-control/website-backups
5) Snapshot structure (conceptual)
Snapshot JSON is the single source of truth and contains sections like:
- cloudflare_lb
- sites[].http.home.headers_ms (or legacy ttfb_ms) + total_ms + cf_ray + cf_colo
- sites[].customer_experience with safe page-check metadata and issue labels
- ssl
- maintenance
- collector
- backups (from mc-ops-01 backup-status.json)
- backup_integrity
- wp_malware_scan summary via /api/wp-malware-scans
- alert system health via /api/alert-system-health
- replication.lrwd_west_to_east:
  - east_received_iso = timestamp of content on EAST
  - last_end_iso = last sync completion time
  - age_hours
  - status = ok/warn/fail
  - note (optional)
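Because sites[].http.home may carry either headers_ms or the legacy ttfb_ms, a reader of the snapshot needs a fallback. A minimal sketch, assuming both fields are plain numbers when present; homeHeadersMs is a hypothetical helper name, not a function known to exist in app.js.

```javascript
// Read time-to-headers for a site's home check, falling back to the legacy ttfb_ms field.
function homeHeadersMs(site) {
  const home = site && site.http && site.http.home;
  if (!home) return null;
  if (typeof home.headers_ms === "number") return home.headers_ms;
  if (typeof home.ttfb_ms === "number") return home.ttfb_ms;
  return null; // neither field present: the UI must not invent a value
}
```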
6) Replication (WEST->EAST) specifics
There is a pull-sync service on EAST that copies localreachwebdesign.com from WEST to EAST.
- EAST uses systemd: wsync.service + wsync.timer
- Script: /usr/local/sbin/wsync.sh reads /root/sync-map.csv
- Log: /var/log/wsync.log
- Current schedule is managed by systemd timers; use the live timer state and snapshot freshness instead of assuming a fixed local schedule.
- Collector checks EAST via SSH and includes replication status in snapshot.
- Collector uses /etc/mission-control/ssh/config and /etc/mission-control/ssh/mission-control-collector on mc-ops-01 for origin SSH checks.
- Replication tile shows last sync + last content timestamp + age/status.
- Hot standby readiness uses nginx_active plus a localhost HTTP check with Host header; timestamps are informational only.
7) Website backups (encrypted, Backblaze B2)
- Full-restore website backups run on mc-ops-01 via /opt/localreach/mission-control/backup-websites/run-weekly.sh.
- Included domains: localreachwebdesign.com, laspanishlessons.com, clsps.net, kitsapcriminaldefense.com, dukeplumb.com, alohachallengecoins.com, traveltechus.com, blog.traveltechus.com, majorchangeinitiative.org, blog.majorchangeinitiative.org.
- Cron schedule timezone may vary (UTC/PST/etc.); folder naming is ALWAYS HST.
- Encrypted backups live in Backblaze B2:
  b2:lrwd-mission-control-backups:mission-control/website-backups
- OVH local work packages are temporary and are removed after successful B2/restic upload.
- Durable website backups are restored from B2/restic via the Emergency Restore helper.
- Each run writes files.tar.gz, db.sql.gz (WP only), and manifest.json with run_date_hst + timestamp_utc before upload.
- Status file:
  /opt/localreach/mission-control/collector/backup-status.json
- Per-site last_success_utc is updated ONLY after a successful backup for that site.
- On failure: record an error for that site and DO NOT overwrite last_success_utc.
- Backup freshness UI should turn RED once the last successful backup is older than 7 days (7 days or less = OK):
* expected cadence: at least every 7 days
* ok=true when age_days <= 7.0
* ok=false when age_days > 7.0 or last_success_utc missing/invalid
- Collector embeds snapshot.backups from that file; UI shows ✓/✗ next to each site.
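The freshness rule and the "do not overwrite last_success_utc on failure" rule above can be sketched together. This is an illustration under assumptions, not the backup runner's real code: backupFreshness, recordBackupResult, and the error field are hypothetical names, while last_success_utc, ok, and age_days come from the rules above.

```javascript
const MAX_AGE_DAYS = 7.0;

// Compute freshness from last_success_utc; missing or invalid timestamps are never ok.
function backupFreshness(lastSuccessUtc, now = new Date()) {
  const t = typeof lastSuccessUtc === "string" ? Date.parse(lastSuccessUtc) : NaN;
  if (Number.isNaN(t)) return { ok: false, age_days: null };
  const ageDays = (now.getTime() - t) / 86400000; // milliseconds per day
  return { ok: ageDays <= MAX_AGE_DAYS, age_days: ageDays };
}

// Record one site's backup result; a failure keeps the previous last_success_utc.
function recordBackupResult(entry, result, nowIso) {
  if (result.ok) {
    return { ...entry, last_success_utc: nowIso, error: null };
  }
  // Assumption: failures are stored in a hypothetical "error" field.
  return { ...entry, error: result.error };
}
```

Keeping the old last_success_utc on failure means a site that keeps failing ages past 7 days and turns red on its own, instead of looking fresh because the job merely ran.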
8) Alert emails/SMS
- Alert Worker: mission-control-alerts on Cloudflare Workers.
- Schedule: every 15 minutes.
- Relay: Google Apps Script sends through the Local Reach Gmail account to configured email/SMS recipients.
- Trigger scope: dashboard API failure, stale collector, HTTP home/deep failure, customer-facing page failure, urgent SSL or TLS handshake errors, backup/integrity failures, malware review/error, Cloudflare LB hard failure, and LB side degradation/failover sustained for at least 15 minutes.
- The Worker does not send all-clear emails; when no alert conditions remain, it silently stores a clear fingerprint.
- Monthly maintenance includes "Monthly alert path test (email/SMS)" to prove the end-to-end path. This test must not break /lb-health.txt or shared Cloudflare LB pools.
9) Timezone rules (display-only)
- Collector/snapshot timestamps remain UTC ISO (do not change sources).
- UI requirement: display ALL timestamps in UTC/Zulu ISO format and label them with trailing "Z".
- If UI formatting is inconsistent, fix by routing date-time rendering through UTC/Zulu helpers.
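A UTC/Zulu helper of the kind described above might look like this. It is a minimal sketch: formatZulu and the "invalid" placeholder are hypothetical, and dropping milliseconds is a display choice made here, not a stated requirement.

```javascript
// Format any parseable timestamp as a UTC ISO string with a trailing "Z".
function formatZulu(value) {
  if (value === null || value === undefined) return "invalid";
  const d = new Date(value);
  if (Number.isNaN(d.getTime())) return "invalid";
  // toISOString() always renders in UTC; strip the milliseconds for display.
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}
```

Routing every date-time in the UI through one helper like this keeps the source timestamps untouched while making the display consistent.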
10) Debug order for missing data
1) Collector output local (latest.json)
2) Upload succeeded (R2 updated)
3) /api/snapshot returns it (or Cloudflare Access is blocking)
4) app.js renders it
11) Status thresholds (summary; app.js is truth if this drifts)
- If this summary conflicts with app.js, app.js is the source of truth.
- Cloudflare LB: red if lb.error or active pool unhealthy; yellow if missing pools, active != primary, or failover not ready; green when active == primary and failover pool healthy.
- HTTP: red if any site http.home.ok or http.deep.ok is false; warn if TTFB >= 1500ms (HTTP_TTFB_WARN_MS); fail if TTFB > 2200ms (HTTP_TTFB_FAIL_MS).
- Customer-facing PAGE: red if configured public page checks fail body-size, visible-content, expected-marker, fatal/error-page, or suspicious public HTML indicator rules.
- SSL: warn when days_remaining < 30; fail when days_remaining < 7; OK when >= 30.
- Replication: hot-standby tile is readiness-first. In hot-standby mode, each site needs readiness checks plus a fresh sync signal from replication.east_backups (max 36h) to be green. For WordPress sites (clsps.net, kitsapcriminaldefense.com, dukeplumb.com, alohachallengecoins.com), WP files+DB sync signal is required.
- Collector: stale if generated_at/collector.last_run_at older than 12 minutes; invalid timestamp = red.
- Servers: fail if disk.root_used_pct >= 90 or server.ok === false (or server.severity red); warn if disk.root_used_pct >= 80, reboot_required true, updates_pending >= 100, or any service ok === false.
- Maintenance: overdue (days_until_due < 0) is red; warn windows come from the maintenance item config.
- Backups: ok if backup-status.json says ok === true (age_days <= 7.0); otherwise red.
- Alert system health: Backup Integrity includes a separate ALERTS line that shows clear only when /api/alert-system-health can reach the alert Worker dry run and reports zero active alerts.
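Several of the thresholds above reduce to small pure functions. The sketch below restates the HTTP, SSL, collector, and server rules exactly as summarized here; app.js remains the source of truth. The function names and the shape of server.services (an array of objects with an ok flag) are assumptions, while the constants and field names come from the summary.

```javascript
const HTTP_TTFB_WARN_MS = 1500;
const HTTP_TTFB_FAIL_MS = 2200;

function ttfbStatus(ms) {
  if (ms > HTTP_TTFB_FAIL_MS) return "fail";
  if (ms >= HTTP_TTFB_WARN_MS) return "warn";
  return "ok";
}

function sslStatus(daysRemaining) {
  if (daysRemaining < 7) return "fail";
  if (daysRemaining < 30) return "warn";
  return "ok";
}

// Stale when the last run is older than 12 minutes; an invalid timestamp is red.
function collectorStatus(lastRunIso, now = new Date()) {
  const t = Date.parse(lastRunIso);
  if (Number.isNaN(t)) return "red";
  return now.getTime() - t > 12 * 60 * 1000 ? "stale" : "ok";
}

function serverStatus(server) {
  const disk = server.disk ? server.disk.root_used_pct : undefined;
  // Assumption: services is an array of { ok: boolean } entries.
  const services = Array.isArray(server.services) ? server.services : [];
  if (disk >= 90 || server.ok === false || server.severity === "red") return "fail";
  if (
    disk >= 80 ||
    server.reboot_required === true ||
    server.updates_pending >= 100 ||
    services.some((s) => s.ok === false)
  ) {
    return "warn";
  }
  return "ok";
}
```

Note the boundary cases: a TTFB of exactly 2200 ms is still a warning, 7 days of SSL runway is a warning rather than a failure, and a collector run exactly 12 minutes old is not yet stale.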
12) How to work on changes (rules)
- Make minimal edits and keep layout consistent with the 3-layer model.
- Do not change snapshot schema unless explicitly required.
- If adding a new status:
  1) Add collector logic in collector.mjs
  2) Add to latest.json payload
  3) Ensure upload to R2 works
  4) Ensure /api/snapshot returns it
  5) Render in app.js
  6) Add help section in help.html
13) What I want from you (Codex)
When I ask for Mission Control changes:
- Assume this architecture and layout.
- Ask for the smallest missing detail only if absolutely necessary.
- Provide step-by-step "replace this / add this after this" instructions.
- Prefer safe, incremental changes and include quick verification commands.
14) Collector debug flags (only when needed)
- MC_DEBUG_CF_LB=1 enables extra Cloudflare LB debug lines:
  - CF LB pools attached | <id>:<name>
  - CF LB pools origins | <name>:<origin_count>
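A flag-gated debug helper in the style described above could look like the following. This is a guess at the shape, not the collector's actual code: lbDebugLines is a hypothetical name, the pool object fields (id, name, origins) are assumed, and emitting one pair of lines per pool is also an assumption. Only the flag name and the two line formats come from this document.

```javascript
// Build the extra LB debug lines only when MC_DEBUG_CF_LB=1 is set.
function lbDebugLines(pools, env = process.env) {
  if (env.MC_DEBUG_CF_LB !== "1") return [];
  const lines = [];
  for (const pool of pools) {
    lines.push(`CF LB pools attached | ${pool.id}:${pool.name}`);
    lines.push(`CF LB pools origins | ${pool.name}:${pool.origins.length}`);
  }
  return lines;
}
```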
15) Completion requirement (deploy + version)
- When work is complete: commit, push, and deploy to the live Mission Control site.
- Report the deployed version as: UI v: <git short sha> (example: UI v: bdc9ab4)
/var/log/mission-control/collector.log (and collector.err.log) on mc-ops-01.
ssh -t mc-ops-01-warp "sudo -u mission-control /opt/localreach/mission-control/collector/run-collector-ovh.sh".
mc-ops-01 using the mission-control service account.
/etc/mission-control/mission-control.env for Cloudflare, R2, B2, and Access credentials.
systemctl status <name>, check logs, restart if safe.