design: reshape cloud-svc as control plane (two-port split)
Earlier draft archived cloud-svc entirely. Better shape: keep it as a control plane for the restic backend.

Two listeners in one process:
- provisioning :9091 on automc-net (called by discord-bot)
- operator :9092 on 127.0.0.1 (called by automc-setup wizard)

Players still hit restic-rest-server (data plane) directly with their per-user password. cloud-svc never sits in the player data path, which limits its public exposure to zero.
@@ -2,7 +2,11 @@
|
|||||||
|
|
||||||
Per-Discord-user state sync for Minecraft. Pulls on launch, pushes on exit. Single JAR drops into Prism / MMC / ATLauncher / frazclient as a pre-launch + post-exit hook.
|
Per-Discord-user state sync for Minecraft. Pulls on launch, pushes on exit. Single JAR drops into Prism / MMC / ATLauncher / frazclient as a pre-launch + post-exit hook.
|
||||||
|
|
||||||
**Backend:** `restic-rest-server` with `--private-repos --append-only`. No custom server code. cloud-sync.jar is a restic CLI wrapper.
|
**Data plane:** `restic-rest-server` with `--private-repos --append-only`. Clients hit this directly with their per-user password.
|
||||||
|
|
||||||
|
**Control plane:** `cloud-svc` Go service with two listeners — a provisioning port reachable from automc-net (called by discord-bot) and a loopback admin port (called by automc-setup wizard). Players never touch cloud-svc.
|
||||||
|
|
||||||
|
**Client:** `cloud-sync.jar` subprocesses restic. ~200 LOC.
|
||||||
|
|
||||||
## Why this shape
|
## Why this shape
|
||||||
|
|
||||||
@@ -16,13 +20,15 @@ Per-Discord-user state sync for Minecraft. Pulls on launch, pushes on exit. Sing
|
|||||||
| Encryption at rest | Per-repo password, built in |
|
| Encryption at rest | Per-repo password, built in |
|
||||||
| Multi-machine support | Restic tags + hostname; if we ever want it, free |
|
| Multi-machine support | Restic tags + hostname; if we ever want it, free |
|
||||||
|
|
||||||
cloud-svc as I'd been building it was a worse re-implementation of all the above. Pivoting before it ships.
|
cloud-svc as originally designed was a worse re-implementation of all the above. Pivoting before it ships; cloud-svc gets reshaped into the control plane described below.
|
||||||
|
|
||||||
## Topology
|
## Topology
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
flowchart LR
|
flowchart LR
|
||||||
pl["player PC"]:::external
|
pl["player PC"]:::external
|
||||||
|
op["operator
|
||||||
|
(via SSH)"]:::external
|
||||||
jar["cloud-sync.jar
|
jar["cloud-sync.jar
|
||||||
(in launcher's
|
(in launcher's
|
||||||
pre/post hooks)"]:::deploy
|
pre/post hooks)"]:::deploy
|
||||||
@@ -33,13 +39,24 @@ on first run)"]:::deploy
|
|||||||
subgraph john["john (192.168.65.33)"]
|
subgraph john["john (192.168.65.33)"]
|
||||||
rp{{"reverse proxy
|
rp{{"reverse proxy
|
||||||
:443"}}:::deploy
|
:443"}}:::deploy
|
||||||
|
|
||||||
|
subgraph net["automc-net"]
|
||||||
ao{{"restic-rest-server
|
ao{{"restic-rest-server
|
||||||
--private-repos
|
--private-repos
|
||||||
--append-only
|
--append-only
|
||||||
:8002"}}:::deploy
|
:8002"}}:::deploy
|
||||||
|
bot{{"discord-bot"}}:::deploy
|
||||||
|
cs_int{{"cloud-svc
|
||||||
|
provisioning :9091
|
||||||
|
(automc-net only)"}}:::deploy
|
||||||
|
end
|
||||||
|
|
||||||
|
cs_admin{{"cloud-svc
|
||||||
|
admin :9092
|
||||||
|
(127.0.0.1 only)"}}:::deploy
|
||||||
|
|
||||||
store[/"/srv/cloud-data
|
store[/"/srv/cloud-data
|
||||||
/<discord_id>/..."/]:::pvc
|
/<discord_id>/..."/]:::pvc
|
||||||
bot{{"discord-bot"}}:::deploy
|
|
||||||
htp[/"/etc/restic-users
|
htp[/"/etc/restic-users
|
||||||
htpasswd"/]:::pvc
|
htpasswd"/]:::pvc
|
||||||
end
|
end
|
||||||
@@ -50,15 +67,39 @@ htpasswd"/]:::pvc
|
|||||||
rp -->|"loopback"| ao
|
rp -->|"loopback"| ao
|
||||||
ao -->|"reads"| htp
|
ao -->|"reads"| htp
|
||||||
ao -->|"writes"| store
|
ao -->|"writes"| store
|
||||||
|
|
||||||
bot -.->|"on /register:
|
bot -.->|"on /register:
|
||||||
htpasswd -B add"| htp
|
POST /admin/users"| cs_int
|
||||||
|
cs_int -.->|"htpasswd add
|
||||||
|
restic init
|
||||||
|
key add"| htp
|
||||||
|
cs_int -.->|"mints repo"| store
|
||||||
bot -.->|"DM password"| pl
|
bot -.->|"DM password"| pl
|
||||||
|
|
||||||
|
op -.->|"SSH then
|
||||||
|
automc-setup cloud ..."| cs_admin
|
||||||
|
cs_admin -.->|"list / prune / revoke"| htp
|
||||||
|
cs_admin -.->|"prune via
|
||||||
|
operator master key"| store
|
||||||
|
|
||||||
classDef deploy fill:#d5e8d4,stroke:#82b366,color:#000
|
classDef deploy fill:#d5e8d4,stroke:#82b366,color:#000
|
||||||
classDef pvc fill:#f5f5f5,stroke:#666,color:#000
|
classDef pvc fill:#f5f5f5,stroke:#666,color:#000
|
||||||
classDef external fill:#f5f5f5,stroke:#666,color:#000,stroke-dasharray:5 5
|
classDef external fill:#f5f5f5,stroke:#666,color:#000,stroke-dasharray:5 5
|
||||||
```
|
```
|
||||||
|
|
||||||
|
`cloud-svc` runs as **one process with two listeners**:
|
||||||
|
|
||||||
|
| Listener | Bind | Reachable from | Endpoints |
|
||||||
|
|---|---|---|---|
|
||||||
|
| Provisioning | `automc-net:9091` (no PublishPort) | discord-bot via service-net DNS | `POST /admin/users` only |
|
||||||
|
| Operator | `127.0.0.1:9092` | john's loopback (SSH session) | `GET/DELETE /admin/users`, `POST /admin/users/{id}/prune`, `GET /admin/users/{id}/quota`, etc. |
|
||||||
|
|
||||||
|
The split means a compromised discord-bot can mint new accounts but cannot enumerate, prune, or revoke existing ones. Operator-only ops require shell access on john.
|
||||||
|
|
||||||
|
Auth model:
|
||||||
|
- Provisioning listener: shared service token (env `CLOUD_PROVISIONING_KEY`); discord-bot reads the same value from its own environment
|
||||||
|
- Operator listener: no auth — loopback bind is the boundary, same pattern as `server-manager:127.0.0.1:8080`
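The one-process, two-listener split with this auth model can be sketched in Go as follows. This is a minimal sketch under stated assumptions, not the actual cloud-svc code: the `Bearer` header format, the `tokenOK` and `newServers` helpers, and the placeholder handlers are all hypothetical. The bind `:9091` assumes the container is attached only to automc-net with no published port.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"os"
)

// tokenOK reports whether an Authorization header carries the shared key.
// An empty key never matches, so an unset env var fails closed.
// Assumption: the token travels as "Bearer <key>"; the design does not say.
func tokenOK(header, key string) bool {
	if key == "" {
		return false
	}
	want := []byte("Bearer " + key)
	return subtle.ConstantTimeCompare([]byte(header), want) == 1
}

// notImplemented stands in for the real handlers.
func notImplemented(w http.ResponseWriter, _ *http.Request) {
	http.Error(w, "not implemented", http.StatusNotImplemented)
}

// newServers builds one http.Server per listener. The provisioning server
// only accepts POST /admin/users with a valid token; the operator server
// has no auth because its loopback bind is the boundary.
func newServers(key string) (prov, op *http.Server) {
	provMux := http.NewServeMux()
	provMux.HandleFunc("/admin/users", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if !tokenOK(r.Header.Get("Authorization"), key) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		notImplemented(w, r)
	})

	opMux := http.NewServeMux()
	opMux.HandleFunc("/admin/users", notImplemented)  // list
	opMux.HandleFunc("/admin/users/", notImplemented) // per-user ops

	prov = &http.Server{Addr: ":9091", Handler: provMux}
	op = &http.Server{Addr: "127.0.0.1:9092", Handler: opMux}
	return prov, op
}

func main() {
	prov, op := newServers(os.Getenv("CLOUD_PROVISIONING_KEY"))
	go func() { log.Fatal(op.ListenAndServe()) }()
	log.Fatal(prov.ListenAndServe())
}
```

Keeping the two muxes separate means an operator route cannot be reached through the provisioning port by accident, since it is simply never registered there.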
|
||||||
|
|
||||||
## Auth & identity
|
## Auth & identity
|
||||||
|
|
||||||
| Element | Value |
|
| Element | Value |
|
||||||
@@ -68,9 +109,9 @@ htpasswd -B add"| htp
|
|||||||
| URL pattern | `rest:https://cloud.tm.center/<discord_id>/` |
|
| URL pattern | `rest:https://cloud.tm.center/<discord_id>/` |
|
||||||
| Server isolation | `--private-repos` enforces URL path matches authenticated user |
|
| Server isolation | `--private-repos` enforces URL path matches authenticated user |
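The isolation rule in the last row can be illustrated with a small Go sketch. It only models the rule as stated here (first URL path segment must equal the authenticated user); it is not rest-server's code, and `repoURL` and `pathAllowed` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// repoURL builds the restic repository URL for a Discord ID,
// following the URL pattern in the table above.
func repoURL(discordID string) string {
	return "rest:https://cloud.tm.center/" + discordID + "/"
}

// pathAllowed models the private-repos rule: the first path segment
// of the request must be exactly the authenticated user name.
func pathAllowed(user, urlPath string) bool {
	if user == "" {
		return false
	}
	first, _, _ := strings.Cut(strings.TrimPrefix(urlPath, "/"), "/")
	return first == user
}

func main() {
	fmt.Println(repoURL("111"))
	fmt.Println(pathAllowed("111", "/111/config")) // own repo
	fmt.Println(pathAllowed("111", "/222/config")) // someone else's repo
}
```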
|
||||||
|
|
||||||
discord-bot's `/register` flow extends to mint a random password, `htpasswd -B`-add it to the file, DM the password to the player. Existing flow stays untouched for non-cloud cases.
|
discord-bot's `/register` flow extends to call `POST cloud-svc:9091/admin/users` with the player's Discord ID. cloud-svc mints a random password, `htpasswd -B`-adds it to the file, runs `restic init` + `restic key add operator-master`, and returns the password. discord-bot DMs it to the player. discord-bot itself never touches restic or htpasswd directly.
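The password-minting step of this flow could look roughly like the Go sketch below. It assumes 24 random bytes encoded as URL-safe base64, since the design does not specify a length or alphabet, and `mintPassword` is a hypothetical helper. The bcrypt hashing, `restic init`, and key-add steps are deliberately left out.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"log"
)

// mintPassword returns a random password built from n bytes of
// crypto/rand output, encoded as unpadded URL-safe base64 so it can be
// pasted from a Discord DM without escaping.
func mintPassword(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

func main() {
	pw, err := mintPassword(24) // 24 bytes encode to 32 characters
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(pw))
}
```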
|
||||||
|
|
||||||
Revocation = `htpasswd -D` removes the user. No token store, no scope checks, no auth-service involvement.
|
Revocation = operator runs `automc-setup cloud revoke <discord_id>`, which hits the loopback admin port. No token store, no scope checks, no auth-service involvement.
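What the admin port does on revoke is not spelled out here. Assuming it still drops the user's htpasswd entry, as the earlier `htpasswd -D` approach did, the core of it could be sketched in Go like this. `removeUser` is a hypothetical helper that works on file content only; file locking and the atomic rewrite are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// removeUser returns htpasswd content without the entry for user, plus
// a flag saying whether an entry was actually removed. Lines have the
// form "name:hash"; anything else is kept untouched.
func removeUser(content, user string) (string, bool) {
	var kept []string
	removed := false
	for _, line := range strings.Split(content, "\n") {
		name, _, ok := strings.Cut(line, ":")
		if ok && name == user {
			removed = true
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n"), removed
}

func main() {
	file := "111:$2y$05$aaaa\n222:$2y$05$bbbb\n"
	out, ok := removeUser(file, "222")
	fmt.Printf("%q %v\n", out, ok)
}
```

Returning the flag lets the endpoint tell "revoked" apart from "no such user" without a second pass over the file.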
|
||||||
|
|
||||||
## Client flow
|
## Client flow
|
||||||
|
|
||||||
@@ -138,40 +179,64 @@ Recommendation: **option 1** (multi-key per repo). On `/register`, the bot calls
|
|||||||
- Operator UI for "this player has 25 GB of cloud data, what's in it?"
|
- Operator UI for "this player has 25 GB of cloud data, what's in it?"
|
||||||
- Cross-machine sync UX (you can play on PC A then PC B; latest snapshot wins. No conflict UI because restic doesn't merge — restore-latest is destructive by design.)
|
- Cross-machine sync UX (you can play on PC A then PC B; latest snapshot wins. No conflict UI because restic doesn't merge — restore-latest is destructive by design.)
|
||||||
|
|
||||||
## Migration from cloud-svc
|
## cloud-svc — reshape, not delete
|
||||||
|
|
||||||
cloud-svc was never deployed. No user data to migrate. Action:
|
cloud-svc gets a new purpose: control plane for the restic backend. Throw away:
|
||||||
- Archive `Timemachine/cloud-svc` repo (mark archived, leave commits + DESIGN.md as a record)
|
|
||||||
- Delete `cloud_pull` / `cloud_push` from `frazclient/client.py`
|
- Manifest types + validation (`manifest.go`)
|
||||||
- Remove `automc_cloud_svc.md` memory entry, replace with `automc_cloud_sync.md` pointing here
|
- Blob storage + tarball extraction (`storage.go` body)
|
||||||
|
- Player-facing `/v1/*` endpoints (`server.go` body)
|
||||||
|
- Snapshot ID generation, content hash cross-check
|
||||||
|
|
||||||
|
Keep:
|
||||||
|
- Project skeleton (go.mod, Dockerfile, Makefile, CI)
|
||||||
|
- Auth-cache pattern from `auth.go` (reused for provisioning token verification)
|
||||||
|
- Per-user mutex pattern from `storage.go` (still needed to serialize concurrent provisioning calls)
|
||||||
|
- Config loader from `config.go` (adds new vars)
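For reference, the per-user mutex pattern mentioned in the list above usually looks something like the following Go sketch. It is a generic illustration, not the code in `storage.go`; the `userLocks` type and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// userLocks hands out one mutex per Discord ID so that two provisioning
// calls for the same user run one after the other, while calls for
// different users proceed in parallel.
type userLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

// get returns the mutex for id, creating it on first use.
func (u *userLocks) get(id string) *sync.Mutex {
	u.mu.Lock()
	defer u.mu.Unlock()
	if u.locks == nil {
		u.locks = make(map[string]*sync.Mutex)
	}
	l, ok := u.locks[id]
	if !ok {
		l = &sync.Mutex{}
		u.locks[id] = l
	}
	return l
}

// withUser runs fn while holding the lock for id.
func (u *userLocks) withUser(id string, fn func()) {
	l := u.get(id)
	l.Lock()
	defer l.Unlock()
	fn()
}

func main() {
	var locks userLocks
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locks.withUser("1234", func() { count++ })
		}()
	}
	wg.Wait()
	fmt.Println(count) // 100: every increment ran under the same lock
}
```

The map only grows in this sketch; whether entries need to be evicted depends on how many users the service expects.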
|
||||||
|
|
||||||
|
New code:
|
||||||
|
- Two `http.Server` instances, one per listener
|
||||||
|
- htpasswd writer that respects bcrypt + file locking
|
||||||
|
- restic CLI subprocesser (init repo, add key, prune)
|
||||||
|
- `time.Ticker` for nightly prune job
|
||||||
|
|
||||||
|
Estimate: ~300 LOC kept, ~600 LOC new. Net smaller than current cloud-svc.
|
||||||
|
|
||||||
|
Also delete `cloud_pull` / `cloud_push` from `frazclient/client.py`; calls to `cloud-sync.jar` make them obsolete.
|
||||||
|
|
||||||
## Topology consequences for `automc/docs/network-exposure.md`
|
## Topology consequences for `automc/docs/network-exposure.md`
|
||||||
|
|
||||||
Same one public endpoint (`cloud.tm.center :443`), same reverse-proxy hardening checklist, same threat surface. Differences:
|
| Layer | Bind | Public? |
|
||||||
|
|---|---|---|
|
||||||
|
| `restic-ao` (data plane) | `127.0.0.1:8002` | Via reverse proxy at `cloud.tm.center:443` |
|
||||||
|
| `cloud-svc` provisioning listener | `automc-net:9091` (no PublishPort) | No |
|
||||||
|
| `cloud-svc` admin listener | `127.0.0.1:9092` | No |
|
||||||
|
|
||||||
| Old (cloud-svc) | New (restic-ao) |
|
There is still exactly one public HTTPS endpoint. Compared with the original plan, it now fronts `restic-ao` instead of `cloud-svc`, and the same reverse-proxy hardening checklist applies. Threat surface differences:
|
||||||
|
|
||||||
|
| Old (cloud-svc as data path) | New (restic-ao as data path) |
|
||||||
|---|---|
|
|---|---|
|
||||||
| Bearer token via auth-service `/auth/verify-key` | Basic auth via htpasswd in restic-rest-server |
|
| Bearer token via auth-service `/auth/verify-key` | HTTP Basic via htpasswd in restic-rest-server |
|
||||||
| Token leak = one user's data | Password leak = one user's data |
|
|
||||||
| Custom Go service, 33 tests | Upstream restic-rest-server, well-audited |
|
| Custom Go service, 33 tests | Upstream restic-rest-server, well-audited |
|
||||||
| `127.0.0.1:9091` loopback bind | `127.0.0.1:8002` (existing restic-ao quadlet) |
|
| Player-facing endpoints | None — cloud-svc not public |
|
||||||
| 60s in-memory cache of verified tokens | rest-server reads htpasswd per request |
|
|
||||||
|
|
||||||
Net: fewer moving parts, smaller attack surface.
|
Operator endpoints are loopback-only and require SSH access to john to reach. No new public surface from the control plane.
|
||||||
|
|
||||||
## Repo layout post-pivot
|
## Repo layout post-pivot
|
||||||
|
|
||||||
| Repo | Purpose |
|
| Repo | Purpose |
|
||||||
|---|---|
|
|---|---|
|
||||||
| `Timemachine/cloud-sync` (this) | Kotlin/Gradle JAR that subprocesses restic |
|
| `Timemachine/cloud-sync` (this) | Kotlin/Gradle JAR that subprocesses restic |
|
||||||
| `Timemachine/cloud-svc` | **Archived.** Snapshot of the abandoned path; commits + DESIGN.md kept as decision record |
|
| `Timemachine/cloud-svc` | **Reshaped** — control plane only. Two-port Go service for provisioning + operator ops. NOT archived. |
|
||||||
| `Timemachine/discord-bot` | Extended `/register` flow to mint htpasswd creds + init restic repo |
|
| `Timemachine/discord-bot` | Extended `/register` flow calls cloud-svc to provision; DMs returned password |
|
||||||
| `Timemachine/automc` | `setup` wizard renders the restic-ao quadlet with the new flags; `database/schema.sql` unchanged |
|
| `Timemachine/automc` | `setup` wizard adds `automc-setup cloud {list,prune,revoke,quota}` subcommands hitting cloud-svc's loopback admin port. Quadlet templates for both restic-ao (new flags) and cloud-svc (two listeners). `database/schema.sql` unchanged. |
|
||||||
|
|
||||||
## Pre-implementation checklist
|
## Pre-implementation checklist
|
||||||
|
|
||||||
- [ ] User reviews this design doc
|
- [ ] User reviews this design doc
|
||||||
- [ ] Confirm: server-side prune via operator master password (option 1 above)
|
- [x] **Confirmed (2026-06-02): cloud-svc reshapes to control plane, not archived**
|
||||||
- [ ] Confirm: archive cloud-svc rather than delete
|
- [x] **Confirmed (2026-06-02): two-port split — automc-net for provisioning, loopback for operator**
|
||||||
|
- [ ] Confirm: server-side prune via an operator master key added to each repo
|
||||||
- [ ] Confirm: cloud-sync.jar auto-downloads restic binary vs requires it pre-installed
|
- [ ] Confirm: cloud-sync.jar auto-downloads restic binary vs requires it pre-installed
|
||||||
- [ ] Confirm: nightly prune at 04:00 UTC vs after-each-push
|
- [ ] Confirm: nightly prune cadence (default proposal: daily 04:00 UTC)
|
||||||
|
- [ ] Confirm: shared service token between discord-bot and cloud-svc provisioning port (env var on both)
|
||||||
|
|||||||