# cloud-svc design

Steam-Cloud-style per-user file sync for Minecraft clients. Player launches their client → state pulled. Player exits → state pushed. Across machines, conflicts are resolved via per-file mtime, plus a pre-launch dialog when ambiguity remains.

## Identity

User identity = Discord ID (already issued by automc's account-card flow). Cloud token is a long-lived API key with scope `cloud:rw`, issued via `auth-service` and tied to the Discord ID.

```
client.py ──── Authorization: Bearer <token> ────► cloud-svc
                                                       │
                                                       ▼  POST /auth/verify-key
                                                  auth-service ──► returns { discord_id, scopes }
```

cloud-svc never sees Discord IDs directly from the client: it always asks auth-service. Token revocation is a single DB UPDATE on `api_keys.revoked`.

## On-disk layout

`~/automc/cloud-data/` on `john`, owned by `automc` user. Per-Discord-ID prefix:

```
cloud-data/
  <discord_id>/
    manifest.json              ← latest snapshot's per-file mtime + sha256 map
    snapshots/
      <snapshot_id>.tar.zst    ← full content tarball (zstd-compressed)
    blobs/                     ← content-addressed dedupe (sha256-prefixed dirs)
      ab/
        abcdef0123...          ← raw file content, referenced by manifest
      cd/
        cdef0123...
```

**Why both tarball AND blob store?**

- Tarballs are the snapshot's immutable historical record (good for restore-to-version).
- Blob store is the on-line read path: per-file fetch on conflict resolution. Tarballs would force decompress-everything for one file.
- Same content in two places = waste. Resolved by **hard-linking** the blob from inside the tarball at write time (cloud-svc writes one canonical copy, tarball entries are hardlinks). Linux supports this in tar.

If hardlinks turn out to be a pain across the rootless-podman boundary, fall back to blob-only and synthesize tarballs on demand for download. Defer the call.

## Snapshot retention

| Trigger | Action |
|---|---|
| `cloud_push` from client | New snapshot row. Tar written. Manifest updated. |
| Snapshot count > `RETAIN_LATEST` (default 30) | Oldest deleted. Hardlinked blobs that lose all refs are GC'd in a periodic job. |
| Per-user quota exceeded | Reject push with HTTP 413 + JSON `{error: "quota", used, limit}`. Client surfaces in UI. |

Snapshot IDs are ULIDs (timestamp-sortable, unique without coordination). Tarball name = `01J<26 chars>.tar.zst`.

## Per-file metadata

`manifest.json` (per user, latest snapshot only):

```json
{
  "snapshot_id": "01J9XQK4Z3...",
  "created_at": "2026-06-02T18:30:00Z",
  "files": {
    "options.txt":                  { "sha256": "ab...", "size": 5234,  "mtime": "2026-06-02T18:25:11Z" },
    "config/voicechat-client.json": { "sha256": "cd...", "size": 432,   "mtime": "2026-06-01T22:14:03Z" },
    "journeymap/data/sp/world1/...": { "sha256": "ef...", "size": 88123, "mtime": "2026-06-02T18:29:42Z" }
  }
}
```

Stored only for the **latest** snapshot. Older snapshots' manifests live alongside `<snapshot_id>.tar.zst`. Reading old manifest = open tar header.

Manifest mtime is **the file's mtime at push time on the client**, not the upload time. This is the source of truth for conflict resolution.

## HTTP API

All endpoints under `/v1/`. JSON unless noted. Bearer auth required.

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/v1/manifest` | Return latest manifest for caller. 200 + JSON, or 204 if no snapshots yet. |
| `GET` | `/v1/blob/{sha256}` | Stream raw file content. 200 + bytes, or 404. Used during conflict resolution to fetch a specific file the client wants from the remote. |
| `POST` | `/v1/snapshot` | Upload new snapshot. Body = multipart `{manifest.json, snapshot.tar.zst}`. Server validates manifest matches tar contents (hash check), assigns snapshot_id, stores. Returns `{snapshot_id, snapshot_url}`. |
| `GET` | `/v1/snapshots` | List caller's snapshot IDs + timestamps (newest first). Used for restore UI / debugging. |
| `GET` | `/v1/snapshot/{id}` | Download a specific historical tarball. |
| `DELETE` | `/v1/snapshot/{id}` | Delete a specific snapshot (e.g., compromised data). Cannot delete the latest. |
| `GET` | `/v1/quota` | `{used_bytes, limit_bytes, snapshots, snapshot_limit}` |

All authentication errors return `401 {error: "auth"}`. Quota errors return `413 {error: "quota", ...}`. Schema validation failures return `400`. Unknown user (token revoked / Discord ID stripped) returns `403`.

## Pull semantics (client side)

```
1. GET /v1/manifest → remote manifest
2. Walk local include-paths, compute (path, mtime, sha256) for each file
3. For each path in (remote ∪ local):
     remote_only        → DOWNLOAD via GET /v1/blob/{sha256}, write file, set mtime to remote.mtime
     local_only         → no-op (will push on exit)
     both, sha matches  → no-op
     both, sha differs:
       |remote.mtime − local.mtime| ≤ 2s → CONFLICT: surface in dialog (checked first)
       remote.mtime > local.mtime        → AUTO_REMOTE: download, overwrite, set mtime
       local.mtime > remote.mtime        → AUTO_LOCAL: keep local (will push on exit)
```

The 2s threshold absorbs FS-level mtime rounding, so it is checked before the newer-wins comparison; identical mtimes fall under it.

## Push semantics (client side)

```
1. Walk local include-paths, build per-file (path, mtime, sha256)
2. GET /v1/manifest, build delta:
     in local, not in remote            → new
     in local, sha differs from remote  → changed
     in remote, not in local            → DELETED (manifest entry, no blob)
3. Build manifest.json for new snapshot: {snapshot_id, created_at, files: {}}
4. Build tarball: only new + changed files (deleted entries omitted)
5. POST /v1/snapshot with manifest + tarball
6. On 200, save snapshot_id to local state file (used for next pull's known-base)
7. On 413 (quota), surface to user; offer pruning or scope reduction
```

## Conflict UI

Pre-launch (after packwiz, before MC starts). When `pull` finds files in CONFLICT state, render a dialog.
Cross-platform via stdlib `tkinter`:

```
┌──────────────────────────────────────────────────────────┐
│ Cloud sync - manual resolve needed                       │
├──────────────────────────────────────────────────────────┤
│ Some files differ between this machine and your cloud:   │
│                                                          │
│ File                           Local        Remote       │
│ options.txt                    18:25 +0200  18:24 +0200  │
│   ( ) keep local    ( ) use remote                       │
│                                                          │
│ config/voicechat-client.json   22:14 +0200  22:13 +0200  │
│   ( ) keep local    ( ) use remote                       │
│                                                          │
│ [Use local for all] [Use remote for all] [Cancel launch] │
│                                                          │
│ [Continue launch]                                        │
└──────────────────────────────────────────────────────────┘
```

Defaults per-row to "use remote" (matches Steam's default: pull is destructive but consistent). User can override per file.

**Cancel launch** = abort `client.py`, return non-zero. Player can fix manually then re-run.

## What syncs (configurable per distribution)

`cloud-scope.json` next to `client.py`:

```json
{
  "include": [
    "options.txt",
    "optionsof.txt",
    "optionsshaders.txt",
    "config/",
    "journeymap/data/",
    "screenshots/"
  ],
  "exclude": [
    "config/simple-mod-sync*",
    "config/packwiz*",
    "**/.tmp",
    "**/cache/"
  ],
  "max_size_mb_per_file": 50,
  "max_total_mb": 200
}
```

Defaults are baked into client.py if the file is absent. JourneyMap (`journeymap/data/`) tracks per-server worlds, waypoints, and settings, so it is explicitly included.

## Auth-service integration

cloud-svc → auth-service contract (already exists in `auth-service/server.go`):

```
POST http://auth-service:9090/auth/verify-key
Authorization:                  # cloud-svc's own service token
Body: { "key": "" }

200 { "user_id": "", "scopes": ["cloud:rw"] }
401 { "error": "invalid" }
403 { "error": "revoked" }
```

cloud-svc caches verified tokens in-memory for 60s to avoid hammering auth-service. Cache invalidated on 401 from client (forced refresh).

## Encryption at rest

**Optional, deferred.** Today: blobs are raw on `john`'s disk, owned by `automc` user, mode 0600. Filesystem permissions are the only barrier. Acceptable for pre-prod.
For production: per-user symmetric key (derived from Discord ID + master secret) encrypts blobs with AES-GCM. Manifest stored with the encrypted blob mappings; client provides key per request. Adds significant complexity, so defer until production scale.

## Quadlet template

```ini
[Unit]
Description=automc cloud-svc (player state sync)
After=automc-pg.service auth-service.service

[Container]
ContainerName=cloud-svc
Image=git.timemachine.center/timemachine/cloud-svc:latest
Network=automc-net
NetworkAlias=cloud-svc
Environment=TZ={{ tz }}
EnvironmentFile=%h/automc/secrets/cloud-svc.env
PublishPort=127.0.0.1:9091:9091
Volume=%h/automc/cloud-data:/data:Z

[Service]
Restart=always

[Install]
WantedBy=default.target
```

Bound to `127.0.0.1:9091`: players reach it only via SSH tunnel during dev. In real deployment, exposed via a reverse-proxy on the same hostname they use for the packwiz pack URL (e.g., `packs.timemachine.center/cloud/...`).

## Out of scope (v1)

- **Selective restore from old snapshot**: UI for "go back to last Tuesday's state". The API supports it (`GET /v1/snapshot/{id}` + manual extraction); the UI is deferred.
- **Multi-device live conflict** (player on PC + laptop simultaneously): single-machine assumption, race documented.
- **Compression tuning**: zstd level 3 default. May tune up to 6 if disk pressure observed.
- **Anti-replay on tokens**: straight bearer auth. If a token leaks, revoke it. Not a primary attack surface.
- **Cross-server modpack-aware filtering**: cloud-scope.json is per-distribution, not per-server. Different servers might want different scopes; defer.
## Stack

- **Go 1.24+** (matches other automc services)
- `jackc/pgx/v5` for pg (if metadata stored there; alternative: sqlite per-deploy)
- `klauspost/compress/zstd` for tarball compression
- Standard `archive/tar` for tarball assembly
- No external HTTP framework: `net/http` + `gorilla/mux` style routing (or stdlib `http.ServeMux` Go 1.22+ pattern syntax)

## File / module layout

```
cloud-svc/
  cmd/cloud-svc/main.go
  internal/
    api/                  ← HTTP handlers per endpoint
    storage/              ← on-disk blob + tarball R/W, GC
    manifest/             ← manifest.json types + (de)serialize + validate
    auth/                 ← auth-service client (token verify + cache)
    quota/                ← per-user quota tracking
  database/migrations/    ← if pg-backed metadata
  Dockerfile
  Makefile
  .gitea/workflows/ci.yaml
  docs/ARCHITECTURE.md
```

~1500 LOC Go total estimate.

## Open questions

1. **Metadata in automc-pg or sqlite-per-instance?**
   - pg: shared with other services, easier ops, schema migrations same pipeline.
   - sqlite: zero coupling, faster local I/O, harder to query externally.
   - Recommendation: **pg** for consistency with the rest of automc.
2. **Quota source**: hardcoded per-user, or per-user-row in DB?
   - DB row allows admin to bump limits per player. Use `users.cloud_quota_bytes` column (nullable, default to global).
3. **Reverse proxy in front of cloud-svc**: needed for player-facing URL (`packs.timemachine.center/cloud/...`)?
   - Either nginx fronting cloud-svc, or expose cloud-svc directly with its own TLS via Caddy/something. Defer.
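
## Appendix: pull classification sketch

The per-path decision table under "Pull semantics" can be sketched in Python, the client's language. This is a minimal illustration under stated assumptions, not the actual `client.py` code: `FileState`, `classify`, and `MTIME_TOLERANCE` are hypothetical names, and the sketch assumes the 2s tolerance is tested before the newer-wins comparison so that near-simultaneous edits reach the dialog.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the 2s tolerance from the design, applied before newer-wins.
MTIME_TOLERANCE = timedelta(seconds=2)


@dataclass(frozen=True)
class FileState:
    """One manifest entry: content hash plus client-side mtime (hypothetical type)."""
    sha256: str
    mtime: datetime


def classify(local: Optional[FileState], remote: Optional[FileState]) -> str:
    """Return the pull action for one path that exists locally, remotely, or both."""
    if local is None and remote is None:
        raise ValueError("path must exist on at least one side")
    if local is None:
        return "DOWNLOAD"   # remote_only: fetch blob, set mtime to remote.mtime
    if remote is None:
        return "NOOP"       # local_only: will be pushed on exit
    if local.sha256 == remote.sha256:
        return "NOOP"       # identical content, nothing to do
    if abs(remote.mtime - local.mtime) <= MTIME_TOLERANCE:
        return "CONFLICT"   # too close to call: surface in the dialog
    return "AUTO_REMOTE" if remote.mtime > local.mtime else "AUTO_LOCAL"
```

Returning a plain action string keeps the classifier free of I/O, so the download, overwrite, and dialog steps can be driven by a separate loop over `remote ∪ local` and the table itself can be unit-tested in isolation.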