From d9a6057c753a5dfc144dc38b7bbe09453b0788a6 Mon Sep 17 00:00:00 2001 From: claude-timemachine Date: Tue, 2 Jun 2026 21:19:45 +0200 Subject: [PATCH] design: reshape cloud-svc as control plane (two-port split) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Earlier draft archived cloud-svc entirely. Better shape: keep it as a control plane for the restic backend. Two listeners in one process: - provisioning :9091 on automc-net (called by discord-bot) - operator :9092 on 127.0.0.1 (called by automc-setup wizard) Players still hit restic-rest-server (data plane) directly with their per-user password. cloud-svc never sits in the player data path — limits its public exposure to zero. --- DESIGN.md | 115 ++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 90 insertions(+), 25 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index a599840..a9f3f81 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -2,7 +2,11 @@ Per-Discord-user state sync for Minecraft. Pulls on launch, pushes on exit. Single JAR drops into Prism / MMC / ATLauncher / frazclient as a pre-launch + post-exit hook. -**Backend:** `restic-rest-server` with `--private-repos --append-only`. No custom server code. cloud-sync.jar is a restic CLI wrapper. +**Data plane:** `restic-rest-server` with `--private-repos --append-only`. Clients hit this directly with their per-user password. + +**Control plane:** `cloud-svc` Go service with two listeners — a provisioning port reachable from automc-net (called by discord-bot) and a loopback admin port (called by automc-setup wizard). Players never touch cloud-svc. + +**Client:** `cloud-sync.jar` subprocesses restic. ~200 LOC. ## Why this shape @@ -16,13 +20,15 @@ Per-Discord-user state sync for Minecraft. Pulls on launch, pushes on exit. 
Sing | Encryption at rest | Per-repo password, built in | | Multi-machine support | Restic tags + hostname; if we ever want it, free | -cloud-svc as I'd been building it was a worse re-implementation of all the above. Pivoting before it ships. +cloud-svc as originally designed was a worse re-implementation of all the above. Pivoting before it ships; cloud-svc gets reshaped into the control plane described below. ## Topology ```mermaid flowchart LR pl["player PC"]:::external + op["operator +(via SSH)"]:::external jar["cloud-sync.jar (in launcher's pre/post hooks)"]:::deploy @@ -33,13 +39,24 @@ on first run)"]:::deploy subgraph john["john (192.168.65.33)"] rp{{"reverse proxy :443"}}:::deploy - ao{{"restic-rest-server + + subgraph net["automc-net"] + ao{{"restic-rest-server --private-repos --append-only :8002"}}:::deploy + bot{{"discord-bot"}}:::deploy + cs_int{{"cloud-svc +provisioning :9091 +(automc-net only)"}}:::deploy + end + + cs_admin{{"cloud-svc +admin :9092 +(127.0.0.1 only)"}}:::deploy + store[/"/srv/cloud-data //..."/]:::pvc - bot{{"discord-bot"}}:::deploy htp[/"/etc/restic-users htpasswd"/]:::pvc end @@ -50,15 +67,39 @@ htpasswd"/]:::pvc rp -->|"loopback"| ao ao -->|"reads"| htp ao -->|"writes"| store + bot -.->|"on /register: -htpasswd -B add"| htp +POST /admin/users"| cs_int + cs_int -.->|"htpasswd add +restic init +key add"| htp + cs_int -.->|"mints repo"| store bot -.->|"DM password"| pl + op -.->|"SSH then +automc-setup cloud ..."| cs_admin + cs_admin -.->|"list / prune / revoke"| htp + cs_admin -.->|"prune via +operator master key"| store + classDef deploy fill:#d5e8d4,stroke:#82b366,color:#000 classDef pvc fill:#f5f5f5,stroke:#666,color:#000 classDef external fill:#f5f5f5,stroke:#666,color:#000,stroke-dasharray:5 5 ``` +`cloud-svc` runs as **one process with two listeners**: + +| Listener | Bind | Reachable from | Endpoints | +|---|---|---|---| +| Provisioning | `automc-net:9091` (no PublishPort) | discord-bot via service-net DNS | `POST 
/admin/users` only | +| Operator | `127.0.0.1:9092` | john's loopback (SSH session) | `GET/DELETE /admin/users`, `POST /admin/users/{id}/prune`, `GET /admin/users/{id}/quota`, etc. | + +The split means a compromised discord-bot can mint new accounts but cannot enumerate, prune, or revoke existing ones. Operator-only ops require shell access on john. + +Auth model: +- Provisioning listener: shared service token (env `CLOUD_PROVISIONING_KEY`), discord-bot uses same value from its own env +- Operator listener: no auth — loopback bind is the boundary, same pattern as `server-manager:127.0.0.1:8080` + ## Auth & identity | Element | Value | @@ -68,9 +109,9 @@ htpasswd -B add"| htp | URL pattern | `rest:https://cloud.tm.center//` | | Server isolation | `--private-repos` enforces URL path matches authenticated user | -discord-bot's `/register` flow extends to mint a random password, `htpasswd -B`-add it to the file, DM the password to the player. Existing flow stays untouched for non-cloud cases. +discord-bot's `/register` flow extends to call `POST cloud-svc:9091/admin/users` with the player's Discord ID. cloud-svc mints a random password, `htpasswd -B`-adds it to the file, runs `restic init` + `restic key add operator-master`, and returns the password. discord-bot DMs it to the player. discord-bot itself never touches restic or htpasswd directly. -Revocation = `htpasswd -D` removes the user. No token store, no scope checks, no auth-service involvement. +Revocation = operator runs `automc-setup cloud revoke ` which hits the loopback admin port. No token store, no scope checks, no auth-service involvement. ## Client flow @@ -138,40 +179,64 @@ Recommendation: **option 1** (multi-key per repo). On `/register`, the bot calls - Operator UI for "this player has 25 GB of cloud data, what's in it?" - Cross-machine sync UX (you can play on PC A then PC B; latest snapshot wins. No conflict UI because restic doesn't merge — restore-latest is destructive by design.) 
-## Migration from cloud-svc +## cloud-svc — reshape, not delete -cloud-svc was never deployed. No user data to migrate. Action: -- Archive `Timemachine/cloud-svc` repo (mark archived, leave commits + DESIGN.md as a record) -- Delete `cloud_pull` / `cloud_push` from `frazclient/client.py` -- Remove `automc_cloud_svc.md` memory entry, replace with `automc_cloud_sync.md` pointing here +cloud-svc gets a new purpose: control plane for the restic backend. Throw away: + +- Manifest types + validation (`manifest.go`) +- Blob storage + tarball extraction (`storage.go` body) +- Player-facing `/v1/*` endpoints (`server.go` body) +- Snapshot ID generation, content hash cross-check + +Keep: +- Project skeleton (go.mod, Dockerfile, Makefile, CI) +- Auth-cache pattern from `auth.go` (reused for provisioning token verification) +- Per-user mutex pattern from `storage.go` (still needed to serialize concurrent provisioning calls) +- Config loader from `config.go` (adds new vars) + +New code: +- Two `http.Server` instances, one per listener +- htpasswd writer that respects bcrypt + file locking +- restic CLI subprocesser (init repo, add key, prune) +- `time.Ticker` for nightly prune job + +Estimate: ~300 LOC kept, ~600 LOC new. Net smaller than current cloud-svc. + +Also delete `cloud_pull` / `cloud_push` from `frazclient/client.py` (these get obsoleted by `cloud-sync.jar` calls). ## Topology consequences for `automc/docs/network-exposure.md` -Same one public endpoint (`cloud.tm.center :443`), same reverse-proxy hardening checklist, same threat surface. Differences: +| Layer | Bind | Public? 
| +|---|---|---| +| `restic-ao` (data plane) | `127.0.0.1:8002` | Via reverse proxy at `cloud.tm.center:443` | +| `cloud-svc` provisioning listener | `automc-net:9091` (no PublishPort) | No | +| `cloud-svc` admin listener | `127.0.0.1:9092` | No | -| Old (cloud-svc) | New (restic-ao) | +Only one public HTTPS endpoint changes from the original plan: it now fronts `restic-ao` instead of `cloud-svc`. Same reverse-proxy hardening checklist applies. Threat surface differences: + +| Old (cloud-svc as data path) | New (restic-ao as data path) | |---|---| -| Bearer token via auth-service `/auth/verify-key` | Basic auth via htpasswd in restic-rest-server | -| Token leak = one user's data | Password leak = one user's data | +| Bearer token via auth-service `/auth/verify-key` | HTTP Basic via htpasswd in restic-rest-server | | Custom Go service, 33 tests | Upstream restic-rest-server, well-audited | -| `127.0.0.1:9091` loopback bind | `127.0.0.1:8002` (existing restic-ao quadlet) | -| 60s in-memory cache of verified tokens | rest-server reads htpasswd per request | +| Player-facing endpoints | None — cloud-svc not public | -Net: fewer moving parts, smaller attack surface. +Operator endpoints are loopback-only and require SSH access to john to reach. No new public surface from the control plane. ## Repo layout post-pivot | Repo | Purpose | |---|---| | `Timemachine/cloud-sync` (this) | Kotlin/Gradle JAR that subprocesses restic | -| `Timemachine/cloud-svc` | **Archived.** Snapshot of the abandoned path; commits + DESIGN.md kept as decision record | -| `Timemachine/discord-bot` | Extended `/register` flow to mint htpasswd creds + init restic repo | -| `Timemachine/automc` | `setup` wizard renders the restic-ao quadlet with the new flags; `database/schema.sql` unchanged | +| `Timemachine/cloud-svc` | **Reshaped** — control plane only. Two-port Go service for provisioning + operator ops. NOT archived. 
| +| `Timemachine/discord-bot` | Extended `/register` flow calls cloud-svc to provision; DMs returned password | +| `Timemachine/automc` | `setup` wizard adds `automc-setup cloud {list,prune,revoke,quota}` subcommands hitting cloud-svc's loopback admin port. Quadlet templates for both restic-ao (new flags) and cloud-svc (two listeners). `database/schema.sql` unchanged. | ## Pre-implementation checklist - [ ] User reviews this design doc -- [ ] Confirm: server-side prune via operator master password (option 1 above) -- [ ] Confirm: archive cloud-svc rather than delete +- [x] **Confirmed (2026-06-02): cloud-svc reshapes to control plane, not archived** +- [x] **Confirmed (2026-06-02): two-port split — automc-net for provisioning, loopback for operator** +- [ ] Confirm: server-side prune via operator master password key on each repo - [ ] Confirm: cloud-sync.jar auto-downloads restic binary vs requires it pre-installed -- [ ] Confirm: nightly prune at 04:00 UTC vs after-each-push +- [ ] Confirm: nightly prune cadence (default proposal: daily 04:00 UTC) +- [ ] Confirm: shared service token between discord-bot and cloud-svc provisioning port (env var on both)
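
The "one process with two listeners" shape in this patch could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the cloud-svc implementation: the `Bearer` scheme for the shared token, the helper names `authorized`, `provisioningMux` and `operatorMux`, and the in-container bind `:9091` for the automc-net listener are all hypothetical. The handler bodies are placeholders for the htpasswd and restic steps the design describes.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"log"
	"net/http"
	"os"
	"strings"
)

// authorized reports whether an Authorization header carries the shared
// provisioning token. The "Bearer " scheme is an assumption; the design
// only specifies a shared service token in CLOUD_PROVISIONING_KEY.
func authorized(header, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	// Constant-time comparison avoids leaking token bytes through timing.
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// provisioningMux exposes only POST /admin/users, guarded by the token.
func provisioningMux(token string) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/admin/users", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.Header().Set("Allow", http.MethodPost)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		// Placeholder: htpasswd add, restic init and key add would run here.
		http.Error(w, "not implemented", http.StatusNotImplemented)
	})
	return mux
}

// operatorMux has no auth; the loopback bind is the boundary.
func operatorMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/admin/users", func(w http.ResponseWriter, r *http.Request) {
		// Placeholder: list and revoke operations would run here.
		http.Error(w, "not implemented", http.StatusNotImplemented)
	})
	return mux
}

func main() {
	token := os.Getenv("CLOUD_PROVISIONING_KEY")
	if token == "" {
		log.Fatal("CLOUD_PROVISIONING_KEY is not set")
	}

	// Assumption: inside the container, ":9091" is reachable only from
	// automc-net because no port is published to the host.
	prov := &http.Server{Addr: ":9091", Handler: provisioningMux(token)}
	oper := &http.Server{Addr: "127.0.0.1:9092", Handler: operatorMux()}

	// Run both listeners; exit if either one stops unexpectedly.
	errc := make(chan error, 2)
	go func() { errc <- prov.ListenAndServe() }()
	go func() { errc <- oper.ListenAndServe() }()
	if err := <-errc; err != nil && !errors.Is(err, http.ErrServerClosed) {
		log.Fatal(err)
	}
}
```

Keeping the two muxes separate means an operator-only route cannot be reached through the provisioning port by accident, which is the property the split relies on.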