cloud-sync/DESIGN.md
claude-timemachine 698a7a037c
design: pivot to restic-rest-server as the backend
cloud-svc was a worse re-implementation of what restic-rest-server
already does (--private-repos + --append-only + native retention +
chunk-level dedup). Pivoting before either ships in production.

cloud-sync.jar becomes a restic CLI wrapper. ~200 LOC instead of
~2000+ in the custom-server path. Server-side prune via operator
master password (option 1 — multi-key per repo).

Open questions flagged at end of doc for confirmation.
2026-06-02 20:44:48 +02:00


# cloud-sync — design
Per-Discord-user state sync for Minecraft. Pulls on launch, pushes on exit. Single JAR drops into Prism / MMC / ATLauncher / frazclient as a pre-launch + post-exit hook.
**Backend:** `restic-rest-server` with `--private-repos --append-only`. No custom server code. cloud-sync.jar is a restic CLI wrapper.
## Why this shape
| Concern | How restic solves it |
|---|---|
| Snapshot semantics | Native — every `restic backup` is a snapshot |
| Deduplication | Chunk-level (not just file-level), built in |
| Retention policy | `restic forget --keep-last/daily/weekly/monthly` |
| Append-only enforcement | `restic-rest-server --append-only`: even with a valid password, clients can't delete |
| Per-user isolation | `--private-repos`: URL path must contain the authenticated username |
| Encryption at rest | Per-repo password, built in |
| Multi-machine support | Restic tags + hostname; if we ever want it, free |
cloud-svc as I'd been building it was a worse re-implementation of all the above. Pivoting before it ships.
## Topology
```mermaid
flowchart LR
    pl["player PC"]:::external
    jar["cloud-sync.jar<br/>(in launcher's<br/>pre/post hooks)"]:::deploy
    restic["restic binary<br/>(auto-downloaded<br/>on first run)"]:::deploy

    subgraph john["john (192.168.65.33)"]
        rp{{"reverse proxy<br/>:443"}}:::deploy
        ao{{"restic-rest-server<br/>--private-repos<br/>--append-only<br/>:8002"}}:::deploy
        store[/"/srv/cloud-data<br/>/<discord_id>/..."/]:::pvc
        bot{{"discord-bot"}}:::deploy
        htp[/"/etc/restic-users<br/>htpasswd"/]:::pvc
    end

    pl --> jar --> restic
    restic ==>|"rest:https<br/><discord_id>:<password>"| rp
    rp -->|"loopback"| ao
    ao -->|"reads"| htp
    ao -->|"writes"| store
    bot -.->|"on /register:<br/>htpasswd -B add"| htp
    bot -.->|"DM password"| pl

    classDef deploy fill:#d5e8d4,stroke:#82b366,color:#000
    classDef pvc fill:#f5f5f5,stroke:#666,color:#000
    classDef external fill:#f5f5f5,stroke:#666,color:#000,stroke-dasharray:5 5
```
## Auth & identity
| Element | Value |
|---|---|
| User identity | Discord ID (immutable, from discord-bot's existing account-card flow) |
| User credential | One random password per user: it is the restic repo password, and its bcrypt hash in the `/etc/restic-users` htpasswd file authenticates the HTTP requests |
| URL pattern | `rest:https://cloud.tm.center/<discord_id>/` |
| Server isolation | `--private-repos` enforces URL path matches authenticated user |
discord-bot's `/register` flow is extended to mint a random password, add it to the htpasswd file with `htpasswd -B`, and DM the password to the player. The existing flow stays untouched for non-cloud cases.
Revocation = `htpasswd -D` removes the user. No token store, no scope checks, no auth-service involvement.
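
A minimal sketch of the credential half of `/register` and of revocation, assuming the bot shells out to Apache's `htpasswd`. The function names, the 32-character password length, and the `HTPASSWD_FILE` variable are illustrative assumptions, not taken from discord-bot.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and password length are assumptions.
HTPASSWD_FILE="${HTPASSWD_FILE:-/etc/restic-users}"

# Print a random 32-character alphanumeric password.
mint_password() {
  local pw
  # 64 random bytes, base64-encoded, with non-alphanumerics stripped.
  pw="$(head -c 64 /dev/urandom | base64 | tr -dc 'A-Za-z0-9')"
  printf '%.32s\n' "$pw"
}

# Add or update a user with a bcrypt hash (-B). The password is read from
# stdin (-i) so it does not appear in the process list.
add_user() {
  local discord_id="$1" password="$2"
  printf '%s\n' "$password" | htpasswd -B -i "$HTPASSWD_FILE" "$discord_id"
}

# Revoke access by deleting the user's entry from the htpasswd file.
revoke_user() {
  local discord_id="$1"
  htpasswd -D "$HTPASSWD_FILE" "$discord_id"
}
```

An alphanumeric password has the side benefit that it can be embedded in a `rest:https://` URL without percent-encoding.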
## Client flow
### `cloud-sync.jar pull`
```
1. Load creds from <pack-folder>/.cloud-token (format: discord_id:password on one line)
2. Locate or auto-download restic binary into <jar dir>/restic-<version>/
3. restic -r rest:https://<url>/<discord_id>/ snapshots --latest 1 --json
4. If no snapshots → exit 0 (first run on this machine, nothing to restore)
5. restic restore latest --target <pack-folder> --include-file cloud-scope.txt
```
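
The pull steps could be sketched in shell roughly as follows. This is an illustration, not the JAR's implementation: the function names are hypothetical, the location of `cloud-scope.txt` inside the pack folder is an assumption, and the restic binary is assumed to be on `PATH` already (step 2 is skipped). `--include-file` is the restore flag for reading include patterns from a file in recent restic releases, as far as I know.

```bash
#!/usr/bin/env bash
# Sketch of `cloud-sync.jar pull` as shell. Function names are hypothetical.
CLOUD_HOST="${CLOUD_HOST:-cloud.tm.center}"

# Step 1: read "discord_id:password" from the first line of the token file
# and set DISCORD_ID / REPO_PASSWORD. Splits on the first ':' only.
load_creds() {
  local token_file="$1" line
  IFS= read -r line <"$token_file" || [ -n "$line" ] || return 1
  DISCORD_ID="${line%%:*}"
  REPO_PASSWORD="${line#*:}"
  # Reject lines with no ':' or with an empty id or password.
  [ -n "$DISCORD_ID" ] && [ "$DISCORD_ID" != "$line" ] && [ -n "$REPO_PASSWORD" ]
}

# Build the REST repository URL with basic-auth credentials embedded.
repo_url() {
  printf 'rest:https://%s:%s@%s/%s/\n' \
    "$DISCORD_ID" "$REPO_PASSWORD" "$CLOUD_HOST" "$DISCORD_ID"
}

# Steps 3-5: restore the latest snapshot, or do nothing if the repo is empty.
pull() {
  local pack_folder="$1" snapshots
  load_creds "$pack_folder/.cloud-token" || return 1
  RESTIC_REPOSITORY="$(repo_url)"
  RESTIC_PASSWORD="$REPO_PASSWORD"
  export RESTIC_REPOSITORY RESTIC_PASSWORD
  snapshots="$(restic snapshots --latest 1 --json)" || return 1
  # Assumption: an empty repo prints an empty JSON list (or null).
  case "$snapshots" in
    ''|'[]'|'null') return 0 ;;
  esac
  restic restore latest --target "$pack_folder" \
    --include-file "$pack_folder/cloud-scope.txt"
}
```

Passing the repository and password through `RESTIC_REPOSITORY` and `RESTIC_PASSWORD` keeps both out of the command line.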
### `cloud-sync.jar push`
```
1. Same creds + restic locator as pull
2. restic backup --files-from cloud-scope.txt --exclude-file cloud-exclude.txt   (run from inside <pack-folder>; naming <pack-folder> as a backup target would pull in the whole folder)
3. restic forget --keep-last 20 --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```
Step 3 never succeeds from a client. `restic-rest-server --append-only` permits `forget --prune` only if the client supplies `Force-Allow-Forget: true`, and we do not enable that, so the server refuses the request. **Pruning happens server-side via a nightly cron** running `restic forget` with the operator's full-access password against the repo. Clients can only add, never remove.
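
A minimal sketch of the client-side push under that constraint: backup only, no `forget`. It assumes the paths in `cloud-scope.txt` are relative to the pack folder, so the backup runs from inside it, and that `RESTIC_REPOSITORY` and `RESTIC_PASSWORD` are already exported. `--exclude-file` is restic's backup flag for reading exclude patterns from a file. The `push` name and the `RESTIC` override (handy for dry runs) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of `cloud-sync.jar push`. Names are hypothetical.

# Back up the scoped files from inside the pack folder.
# Set RESTIC to another command to dry-run (defaults to the real binary).
push() {
  local pack_folder="$1"
  (
    cd "$pack_folder" || exit 1
    # No `forget` here: the append-only server would reject it, and
    # pruning is left to the nightly server-side job.
    "${RESTIC:-restic}" backup \
      --files-from cloud-scope.txt \
      --exclude-file cloud-exclude.txt
  )
}
```

Running in a subshell keeps the `cd` from leaking into the caller's working directory.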
## `cloud-scope.json` → restic args
| Input | Becomes |
|---|---|
| `include: ["options.txt", "config/", "journeymap/data/"]` | Listed in `cloud-scope.txt`, passed as `--files-from cloud-scope.txt` |
| `exclude: ["config/simple-mod-sync*", "**/*.log"]` | Listed in `cloud-exclude.txt`, passed as `--exclude-from cloud-exclude.txt` |
| `max_size_mb_per_file: 50` | Passed as `--exclude-larger-than 50M`, restic's built-in per-file size cap for `backup` |
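
The first two rows of the mapping amount to writing one pattern per line into a list file. A small sketch, assuming the JSON has already been parsed by the caller; `write_list` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write one pattern per line to a list file.
# Usage: write_list <out-file> <pattern>...
write_list() {
  local out_file="$1"
  shift
  if [ "$#" -eq 0 ]; then
    # No patterns: produce an empty file rather than a single blank line.
    : >"$out_file"
    return 0
  fi
  printf '%s\n' "$@" >"$out_file"
}

# Example calls (commented out so that sourcing this file has no side effects):
# write_list cloud-scope.txt   "options.txt" "config/" "journeymap/data/"
# write_list cloud-exclude.txt "config/simple-mod-sync*" "**/*.log"
```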
## Retention policy
Server-side cron (e.g., daily at 04:00 UTC) walks all per-user repos:
```bash
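# Walk every per-user repo directory and apply the retention policy.
for repo in /srv/cloud-data/*/; do
  user=$(basename "$repo")   # directory name is the Discord ID
  echo "pruning repo for ${user}"
  restic -r "$repo" --password-file /etc/restic-master-pass \
    forget --keep-last=20 --keep-daily=7 --keep-weekly=4 --keep-monthly=6 --prune
done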
```
This requires the operator to have a "master password" that opens any user's repo — restic doesn't have that natively. **Options:**
1. **Init each user's repo with TWO keys** — one for the user, one for the operator-side pruner. restic supports multi-key per repo.
2. **Run the cron with each user's own password** — requires storing all user passwords server-side; defeats the encryption.
3. **Don't auto-prune** — let users push forever, trust quota at the rest-server level.
Recommendation: **option 1** (multi-key per repo). On `/register`, the bot calls `restic -r <repo> --password-file <operator> key add` to add the player's password as a SECOND key. The pruner cron uses the operator master password.
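
Option 1 could be sketched as two restic calls: `init` with the operator password, then `key add` with the player's password supplied through `--new-password-file`. The function name, the file paths in the usage note, and the `RESTIC` override are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the /register flow.
# Usage: init_user_repo <repo-path> <operator-pass-file> <player-pass-file>
init_user_repo() {
  local repo="$1" operator_pass_file="$2" player_pass_file="$3"
  # First key: the operator's, used later by the pruner cron.
  "${RESTIC:-restic}" -r "$repo" --password-file "$operator_pass_file" init &&
    # Second key: the player's, unlocked with the operator key.
    "${RESTIC:-restic}" -r "$repo" --password-file "$operator_pass_file" \
      key add --new-password-file "$player_pass_file"
}
```

For example, `init_user_repo /srv/cloud-data/<discord_id> /etc/restic-master-pass <player-pass-file>`, where the player password file is a short-lived temporary file that the bot deletes afterwards.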
## What's in v1
- restic-rest-server with `--private-repos --append-only --htpasswd-file`
- discord-bot `/register` extension: mint password, htpasswd add, `restic init` repo, `restic key add` player key
- cloud-sync.jar that subprocesses restic for pull/push
- Auto-download restic binary on first run from upstream GitHub release
- Server-side nightly prune cron with operator-side master password key
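
For the auto-download item, a sketch of how the release URL might be assembled. It assumes upstream assets keep the `restic_<version>_<os>_<arch>.bz2` naming (`.zip` on Windows) seen on the GitHub releases page; the function names are hypothetical and the version is whatever the caller chooses to pin.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the asset naming scheme is an assumption based on
# how upstream release files are currently named.

# Usage: restic_asset_name <version> <os> <arch>
restic_asset_name() {
  local version="$1" os="$2" arch="$3" ext
  case "$os" in
    windows) ext="zip" ;;   # Windows builds ship as .zip
    *)       ext="bz2" ;;   # other platforms ship as a bzip2-compressed binary
  esac
  printf 'restic_%s_%s_%s.%s\n' "$version" "$os" "$arch" "$ext"
}

# Usage: restic_download_url <version> <os> <arch>
restic_download_url() {
  printf 'https://github.com/restic/restic/releases/download/v%s/%s\n' \
    "$1" "$(restic_asset_name "$1" "$2" "$3")"
}
```

Verifying the download against the checksum file published with each release would be a sensible addition before executing the binary.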
## What's deferred
- restic version pinning / auto-update of the binary (treat like packwiz-installer self-update)
- Server-side `restic check` cron for repo integrity
- Per-user quota at the rest-server level (rest-server supports `--max-size` per-user via `.maxsize` file in each repo)
- Operator UI for "this player has 25 GB of cloud data, what's in it?"
- Cross-machine sync UX (you can play on PC A then PC B; latest snapshot wins. No conflict UI because restic doesn't merge — restore-latest is destructive by design.)
## Migration from cloud-svc
cloud-svc was never deployed. No user data to migrate. Action:
- Archive `Timemachine/cloud-svc` repo (mark archived, leave commits + DESIGN.md as a record)
- Delete `cloud_pull` / `cloud_push` from `frazclient/client.py`
- Remove `automc_cloud_svc.md` memory entry, replace with `automc_cloud_sync.md` pointing here
## Topology consequences for `automc/docs/network-exposure.md`
Same one public endpoint (`cloud.tm.center :443`), same reverse-proxy hardening checklist, same threat surface. Differences:
| Old (cloud-svc) | New (restic-ao) |
|---|---|
| Bearer token via auth-service `/auth/verify-key` | Basic auth via htpasswd in restic-rest-server |
| Token leak = one user's data | Password leak = one user's data |
| Custom Go service, 33 tests | Upstream restic-rest-server, well-audited |
| `127.0.0.1:9091` loopback bind | `127.0.0.1:8002` (existing restic-ao quadlet) |
| 60s in-memory cache of verified tokens | rest-server reads htpasswd per request |
Net: fewer moving parts, smaller attack surface.
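
For reference, the server side of the "New" column reduces to one command line. A sketch, assuming the upstream binary name `rest-server` and the paths used elsewhere in this doc; the actual deployment is the restic-ao quadlet, whose exec line would carry the same flags. The function name and the `REST_SERVER` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical launcher; the real deployment is a quadlet unit.
start_rest_server() {
  # Loopback bind only: the reverse proxy on :443 is the public endpoint.
  "${REST_SERVER:-rest-server}" \
    --path /srv/cloud-data \
    --listen 127.0.0.1:8002 \
    --htpasswd-file /etc/restic-users \
    --private-repos \
    --append-only
}
```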
## Repo layout post-pivot
| Repo | Purpose |
|---|---|
| `Timemachine/cloud-sync` (this) | Kotlin/Gradle JAR that subprocesses restic |
| `Timemachine/cloud-svc` | **Archived.** Snapshot of the abandoned path; commits + DESIGN.md kept as decision record |
| `Timemachine/discord-bot` | Extended `/register` flow to mint htpasswd creds + init restic repo |
| `Timemachine/automc` | `setup` wizard renders the restic-ao quadlet with the new flags; `database/schema.sql` unchanged |
## Pre-implementation checklist
- [ ] User reviews this design doc
- [ ] Confirm: server-side prune via operator master password (option 1 above)
- [ ] Confirm: archive cloud-svc rather than delete
- [ ] Confirm: cloud-sync.jar auto-downloads restic binary vs requires it pre-installed
- [ ] Confirm: nightly prune at 04:00 UTC vs after-each-push