initial: Steam-Cloud-style per-user state sync skeleton
HTTP API + on-disk storage + auth-service token verification + dev mode. 31 tests pass, vet clean. See DESIGN.md for the architecture and README.md for the operator surface. Pending: pg-backed per-user quota override, snapshot retention / blob GC, tarball-vs-manifest content cross-check, end-to-end deploy on john.
# cloud-svc design

Steam-Cloud-style per-user file sync for Minecraft clients. When a player launches their client, state is pulled; when they exit, state is pushed. Across machines, conflicts are resolved by per-file mtime, plus a pre-launch dialog when ambiguity remains.

## Identity

User identity = Discord ID (already issued by automc's account-card flow). The cloud token is a long-lived API key with scope `cloud:rw`, issued via `auth-service` and tied to the Discord ID.

```
client.py ──── Authorization: Bearer <cloud-token> ────► cloud-svc
                                                         │
                                                         ▼ POST /auth/verify-key
                                                   auth-service ──► returns { user_id (Discord ID), scopes }
```

cloud-svc never accepts a Discord ID directly from the client; it always resolves the token through auth-service. Token revocation is a single DB UPDATE on `api_keys.revoked`.

## On-disk layout

`~/automc/cloud-data/` on `john`, owned by the `automc` user. Per-Discord-ID prefix:

```
cloud-data/
  <discord_id>/
    manifest.json              ← latest snapshot's per-file mtime + sha256 map
    snapshots/
      <snapshot_id>.tar.zst    ← full content tarball (zstd-compressed)
    blobs/                     ← content-addressed dedupe (sha256-prefixed dirs)
      ab/
        abcdef0123...          ← raw file content, referenced by manifest
      cd/
        cdef0123...
```

**Why both tarball AND blob store?**
- Tarballs are the snapshot's immutable historical record (good for restore-to-version).
- Blob store is the online read path: per-file fetch on conflict resolution. Tarballs alone would force decompressing everything to read one file.
- Same content in two places = waste. Proposed fix: **hard-link** the blob from inside the tarball at write time (cloud-svc writes one canonical copy, tarball entries are hardlinks). Caveat: a tar hardlink entry can only reference another member of the same archive, and a zstd-compressed tarball cannot share disk blocks with a blob file, so this needs to be verified before it is relied on.

If hardlinks turn out to be a pain (including across the rootless-podman boundary), fall back to blob-only and synthesize tarballs on demand for download. Defer the call.

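A minimal sketch of the blob addressing shown in the tree above, written in Python for illustration only (the service itself is Go). The helper name `blob_path` and the input validation are assumptions; only the `<discord_id>/blobs/<2-char prefix>/<hash>` shape comes from the layout.

```python
import re
from pathlib import PurePosixPath

# A sha256 digest rendered as exactly 64 lowercase hex characters
_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def blob_path(data_root: str, discord_id: str, sha256_hex: str) -> PurePosixPath:
    """Return <root>/<discord_id>/blobs/<first two hex chars>/<full hash>."""
    # Accepting only a well-formed digest also rules out path traversal
    if not _SHA256_RE.fullmatch(sha256_hex):
        raise ValueError("expected a lowercase hex sha256 digest")
    # Discord IDs are numeric snowflakes, so anything else is rejected
    if not (discord_id.isascii() and discord_id.isdigit()):
        raise ValueError("expected a numeric Discord ID")
    return PurePosixPath(data_root) / discord_id / "blobs" / sha256_hex[:2] / sha256_hex
```

The two-character fan-out keeps any single directory from accumulating every blob a user owns.
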
## Snapshot retention

| Trigger | Action |
|---|---|
| `cloud_push` from client | New snapshot row. Tar written. Manifest updated. |
| Snapshot count > `RETAIN_LATEST` (default 30) | Oldest deleted. Hardlinked blobs that lose all refs are GC'd in a periodic job. |
| Per-user quota exceeded | Reject push with HTTP 413 + JSON `{error: "quota", used, limit}`. Client surfaces in UI. |

Snapshot IDs are ULIDs (timestamp-sortable, unique without coordination). A ULID is 26 Crockford base32 characters in total, so the tarball name is `<26-char ULID>.tar.zst`.

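The retention rule in the table can be sketched as follows. This is an illustrative Python sketch, not the service's Go code; `snapshots_to_delete` is a hypothetical helper, and it relies only on the fact that ULIDs sort lexicographically by creation time.

```python
RETAIN_LATEST = 30  # default from the retention table


def snapshots_to_delete(snapshot_ids, retain=RETAIN_LATEST):
    """Return the oldest snapshot IDs that exceed the retention limit."""
    # ULIDs encode the timestamp first, so a plain string sort is oldest-first
    ordered = sorted(snapshot_ids)
    excess = len(ordered) - retain
    return ordered[:excess] if excess > 0 else []
```

Blob GC is deliberately not part of this sketch; the table leaves it to a periodic job.
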
## Per-file metadata

`manifest.json` (per user, latest snapshot only):

```json
{
  "snapshot_id": "01J9XQK4Z3...",
  "created_at": "2026-06-02T18:30:00Z",
  "files": {
    "options.txt": { "sha256": "ab...", "size": 5234, "mtime": "2026-06-02T18:25:11Z" },
    "config/voicechat-client.json": { "sha256": "cd...", "size": 432, "mtime": "2026-06-01T22:14:03Z" },
    "journeymap/data/sp/world1/...": { "sha256": "ef...", "size": 88123, "mtime": "2026-06-02T18:29:42Z" }
  }
}
```

Stored only for the **latest** snapshot. Older snapshots' manifests live alongside their `.tar.zst`; reading an old manifest means opening the tar header. The manifest `mtime` is **the file's mtime at push time on the client**, not the upload time. This is the source of truth for conflict resolution.

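A minimal sketch of how a client might build one `files` entry, assuming whole-second UTC timestamps in the format shown in the example manifest. The function name `manifest_entry` is hypothetical.

```python
import hashlib
import os
from datetime import datetime, timezone


def manifest_entry(path: str) -> dict:
    """Build one manifest `files` entry: sha256, size and client-side mtime."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    st = os.stat(path)
    # Truncate to whole seconds and render in UTC with a trailing Z
    mtime = datetime.fromtimestamp(int(st.st_mtime), tz=timezone.utc)
    return {
        "sha256": digest.hexdigest(),
        "size": st.st_size,
        "mtime": mtime.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```
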
## HTTP API

All endpoints under `/v1/`. JSON unless noted. Bearer auth required.

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/v1/manifest` | Return latest manifest for caller. 200 + JSON, or 204 if no snapshots yet. |
| `GET` | `/v1/blob/{sha256}` | Stream raw file content. 200 + bytes, or 404. Used during conflict resolution to fetch a specific file the client wants from the remote. |
| `POST` | `/v1/snapshot` | Upload new snapshot. Body = multipart `{manifest.json, snapshot.tar.zst}`. Server validates manifest matches tar contents (hash check), assigns snapshot_id, stores. Returns `{snapshot_id, snapshot_url}`. |
| `GET` | `/v1/snapshots` | List caller's snapshot IDs + timestamps (newest first). Used for restore UI / debugging. |
| `GET` | `/v1/snapshot/{id}` | Download a specific historical tarball. |
| `DELETE` | `/v1/snapshot/{id}` | Delete a specific snapshot (e.g., compromised data). Cannot delete the latest. |
| `GET` | `/v1/quota` | `{used_bytes, limit_bytes, snapshots, snapshot_limit}` |

All authentication errors return `401 {error: "auth"}`. Quota errors return `413 {error: "quota", ...}`. Schema validation failures return `400`. Unknown user (token revoked / Discord ID stripped) returns `403`.

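On the client side, the status codes above could be mapped to outcomes roughly like this. It is a sketch under assumptions: `CloudError`, `interpret_manifest_response`, and the error-kind labels are hypothetical names, and treating 204 as an empty manifest is one reasonable reading of the table.

```python
import json


class CloudError(Exception):
    """Raised for any non-success response; `kind` is a short label."""

    def __init__(self, kind, detail=None):
        super().__init__(kind)
        self.kind = kind
        self.detail = detail or {}


def interpret_manifest_response(status: int, body: bytes):
    """Translate a GET /v1/manifest response into a manifest dict or an error."""
    if status == 200:
        return json.loads(body)
    if status == 204:
        # No snapshots yet: behave as if the remote manifest were empty
        return {"snapshot_id": None, "files": {}}
    kinds = {400: "schema", 401: "auth", 403: "unknown_user", 413: "quota"}
    try:
        detail = json.loads(body) if body else {}
    except ValueError:
        detail = {}  # error body was not JSON
    raise CloudError(kinds.get(status, "unexpected"), detail)
```
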
## Pull semantics (client side)

```
1. GET /v1/manifest → remote manifest
2. Walk local include-paths, compute (path, mtime, sha256) for each file
3. For each path in (remote ∪ local):
     remote_only       → DOWNLOAD via GET /v1/blob/{sha256}, write file, set mtime to remote.mtime
     local_only        → no-op (will push on exit)
     both, sha matches → no-op
     both, sha differs (checked in this order):
       |remote.mtime − local.mtime| ≤ 2s → CONFLICT: surface in dialog
       remote.mtime > local.mtime        → AUTO_REMOTE: download, overwrite, set mtime
       local.mtime > remote.mtime        → AUTO_LOCAL: keep local (will push on exit)
```

The 2s threshold absorbs FS-level mtime rounding (FAT, for example, stores mtimes with 2-second resolution). The threshold check runs first because a sub-2s difference would otherwise also satisfy one of the two auto rules.

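The decision table above can be sketched in Python as follows. The function names `classify` and `plan_pull` are hypothetical; the rule order (threshold check before the newer-wins rules) and the 2-second window come from the text.

```python
from datetime import datetime, timedelta

CONFLICT_WINDOW = timedelta(seconds=2)


def _parse(ts: str) -> datetime:
    # Manifest timestamps look like 2026-06-02T18:25:11Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def classify(local, remote):
    """Classify one path; local/remote are manifest entries or None."""
    if local is None and remote is None:
        raise ValueError("path present on neither side")
    if local is None:
        return "DOWNLOAD"          # remote_only
    if remote is None:
        return "NOOP"              # local_only: pushed on exit
    if local["sha256"] == remote["sha256"]:
        return "NOOP"
    local_m, remote_m = _parse(local["mtime"]), _parse(remote["mtime"])
    # Threshold first: too close to call, so ask the user
    if abs(remote_m - local_m) <= CONFLICT_WINDOW:
        return "CONFLICT"
    return "AUTO_REMOTE" if remote_m > local_m else "AUTO_LOCAL"


def plan_pull(local_files, remote_files):
    """Map every path in (remote ∪ local) to its pull action."""
    paths = sorted(set(local_files) | set(remote_files))
    return {p: classify(local_files.get(p), remote_files.get(p)) for p in paths}
```
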
## Push semantics (client side)

```
1. Walk local include-paths, build per-file (path, mtime, sha256)
2. GET /v1/manifest, build delta:
     in local, not in remote           → new
     in local, sha differs from remote → changed
     in remote, not in local           → DELETED (manifest entry, no blob)
3. Build manifest.json for new snapshot:
     {snapshot_id, created_at, files: {<full current set>}}
     (snapshot_id is filled in by the server; see POST /v1/snapshot)
4. Build tarball: only new + changed files (deleted entries omitted)
5. POST /v1/snapshot with manifest + tarball
6. On 200, save snapshot_id to local state file (used for next pull's known-base)
7. On 413 (quota), surface to user; offer pruning or scope reduction
```

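Step 2 and step 4 above could look like this in Python. A minimal sketch: `build_delta` and `tarball_members` are hypothetical names, and both inputs are assumed to be the `files` maps of the local scan and the remote manifest.

```python
def build_delta(local_files, remote_files):
    """Split paths into new / changed / deleted relative to the remote manifest."""
    new = sorted(p for p in local_files if p not in remote_files)
    changed = sorted(
        p for p in local_files
        if p in remote_files and local_files[p]["sha256"] != remote_files[p]["sha256"]
    )
    deleted = sorted(p for p in remote_files if p not in local_files)
    return {"new": new, "changed": changed, "deleted": deleted}


def tarball_members(delta):
    """Only new and changed files are packed; deleted entries are omitted."""
    return sorted(delta["new"] + delta["changed"])
```

Unchanged files appear in neither list, which is what keeps the uploaded tarball small.
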
## Conflict UI

Pre-launch (after packwiz, before MC starts). When `pull` finds files in CONFLICT state, render a dialog. Cross-platform via stdlib `tkinter`. Because CONFLICT only arises when the two mtimes are within 2s of each other, the dialog shows seconds:

```
┌──────────────────────────────────────────────────────────────┐
│ Cloud sync: manual resolve needed                            │
├──────────────────────────────────────────────────────────────┤
│ Some files differ between this machine and your cloud:       │
│                                                              │
│ File                          Local           Remote         │
│ options.txt                   18:25:11 +0200  18:25:10 +0200 │
│   ( ) keep local  ( ) use remote                             │
│                                                              │
│ config/voicechat-client.json  22:14:03 +0200  22:14:02 +0200 │
│   ( ) keep local  ( ) use remote                             │
│                                                              │
│ [Use local for all]  [Use remote for all]  [Cancel launch]   │
│                                                              │
│                                            [Continue launch] │
└──────────────────────────────────────────────────────────────┘
```

Each row defaults to "use remote" (matches Steam's default: pull is destructive but consistent). The user can override per file.

**Cancel launch** = abort `client.py`, return non-zero. The player can fix things manually, then re-run.

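The state behind the dialog is small enough to sketch without any `tkinter` code. The helpers below are hypothetical and only model the per-row default, the two bulk buttons, and the resulting download list.

```python
def default_choices(conflict_paths):
    """Every row starts on 'remote', matching the dialog default."""
    return {path: "remote" for path in conflict_paths}


def apply_bulk(choices, side):
    """Model the 'Use local for all' / 'Use remote for all' buttons."""
    if side not in ("local", "remote"):
        raise ValueError("side must be 'local' or 'remote'")
    return {path: side for path in choices}


def files_to_download(choices):
    """Paths whose remote copy should overwrite the local file."""
    return sorted(path for path, side in choices.items() if side == "remote")
```
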
## What syncs (configurable per distribution)

`cloud-scope.json` next to `client.py`:

```json
{
  "include": [
    "options.txt",
    "optionsof.txt",
    "optionsshaders.txt",
    "config/",
    "journeymap/data/",
    "screenshots/"
  ],
  "exclude": [
    "config/simple-mod-sync*",
    "config/packwiz*",
    "**/.tmp",
    "**/cache/"
  ],
  "max_size_mb_per_file": 50,
  "max_total_mb": 200
}
```

Defaults are baked into `client.py` if the file is absent. JourneyMap (`journeymap/data/`) tracks per-server worlds, waypoints, and settings, so it is explicitly included.

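The matching rules for `include` and `exclude` are not spelled out, so the following is only one plausible interpretation: a trailing `/` means "everything under this directory", a leading `**/` matches at any depth including the top level, and exclude wins over include. `in_scope` and `_matches` are hypothetical names; matching uses the stdlib `fnmatch.fnmatchcase`, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase


def _matches(path: str, pattern: str) -> bool:
    candidates = [pattern]
    if pattern.startswith("**/"):
        # Let "**/x" also match "x" at the top level
        candidates.append(pattern[3:])
    for pat in candidates:
        if pat.endswith("/"):
            pat += "*"  # directory pattern: anything underneath it
        if fnmatchcase(path, pat):
            return True
    return False


def in_scope(path: str, scope: dict) -> bool:
    """True if a POSIX-style relative path is included and not excluded."""
    included = any(_matches(path, p) for p in scope.get("include", []))
    excluded = any(_matches(path, p) for p in scope.get("exclude", []))
    return included and not excluded
```
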
## Auth-service integration

cloud-svc → auth-service contract (already exists in `auth-service/server.go`):

```
POST http://auth-service:9090/auth/verify-key
Authorization: <SM_API_KEY>           # cloud-svc's own service token
Body: { "key": "<player-token>" }

200 { "user_id": "<discord_id>", "scopes": ["cloud:rw"] }
401 { "error": "invalid" }
403 { "error": "revoked" }
```

cloud-svc caches verified tokens in-memory for 60s to avoid hammering auth-service, so a revocation can take up to 60s to be enforced. Cache invalidated on 401 from client (forced refresh).

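The 60s cache could be shaped like the sketch below. It is written in Python for illustration only (the service is Go), `TokenCache` is a hypothetical name, and the injectable `clock` exists purely so the expiry logic can be exercised without sleeping.

```python
import time


class TokenCache:
    """In-memory cache of verified tokens with a fixed time-to-live."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token -> (expires_at, identity)

    def get(self, token):
        """Return the cached identity, or None if missing or expired."""
        entry = self._entries.get(token)
        if entry is None:
            return None
        expires_at, identity = entry
        if self._clock() >= expires_at:
            del self._entries[token]  # expired: force a fresh verify
            return None
        return identity

    def put(self, token, identity):
        self._entries[token] = (self._clock() + self._ttl, identity)

    def invalidate(self, token):
        """Drop an entry early (the forced-refresh path)."""
        self._entries.pop(token, None)
```
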
## Encryption at rest

**Optional, deferred.** Today: blobs are raw on `john`'s disk, owned by the `automc` user, mode 0600. Filesystem permissions are the only barrier. Acceptable for pre-prod.

For production: a per-user symmetric key (derived from Discord ID + master secret) encrypts blobs with AES-GCM. Manifest stored with the encrypted blob mappings; client provides key per request. Adds significant complexity, so defer until production scale.

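The derivation function for the per-user key is not specified. As one possible shape, the sketch below uses HMAC-SHA256 keyed by the master secret, which yields 32 bytes (the key size for AES-256-GCM). The function name and the label string are assumptions, and a standard KDF such as HKDF may be the better choice in practice; the AES-GCM step itself is not shown because it needs a third-party library.

```python
import hashlib
import hmac


def derive_user_key(master_secret: bytes, discord_id: str) -> bytes:
    """Derive a deterministic 32-byte per-user key from the master secret."""
    # The label (an assumption) keeps this derivation separate from other uses
    message = b"cloud-svc/blob-key/" + discord_id.encode("ascii")
    return hmac.new(master_secret, message, hashlib.sha256).digest()
```
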
## Quadlet template

```ini
[Unit]
Description=automc cloud-svc (player state sync)
After=automc-pg.service auth-service.service

[Container]
ContainerName=cloud-svc
Image=git.timemachine.center/timemachine/cloud-svc:latest
Network=automc-net
NetworkAlias=cloud-svc
Environment=TZ={{ tz }}
EnvironmentFile=%h/automc/secrets/cloud-svc.env
PublishPort=127.0.0.1:9091:9091
Volume=%h/automc/cloud-data:/data:Z

[Service]
Restart=always

[Install]
WantedBy=default.target
```

Bound to `127.0.0.1:9091`: players reach it only via SSH tunnel during dev. In a real deployment, it is exposed via a reverse proxy on the same hostname they use for the packwiz pack URL (e.g., `packs.timemachine.center/cloud/...`).

## Out of scope (v1)

- **Selective restore from old snapshot**: UI for "go back to last Tuesday's state". The API supports it (`GET /v1/snapshot/{id}` + manual extraction); the UI is deferred.
- **Multi-device live conflict** (player on PC + laptop simultaneously): single-machine assumption, race documented.
- **Compression tuning**: zstd level 3 default. May tune up to 6 if disk pressure is observed.
- **Anti-replay on tokens**: straight bearer auth. If a token leaks, revoke it. Not a primary attack surface.
- **Cross-server modpack-aware filtering**: `cloud-scope.json` is per-distribution, not per-server. Different servers might want different scopes; defer.

## Stack

- **Go 1.24+** (matches other automc services)
- `jackc/pgx/v5` for pg (if metadata is stored there; alternative: sqlite per-deploy)
- `klauspost/compress/zstd` for tarball compression
- Standard `archive/tar` for tarball assembly
- No external HTTP framework: `net/http` with `gorilla/mux`-style routing (or stdlib `http.ServeMux` with the Go 1.22+ pattern syntax)

## File / module layout

```
cloud-svc/
  cmd/cloud-svc/main.go
  internal/
    api/                 ← HTTP handlers per endpoint
    storage/             ← on-disk blob + tarball R/W, GC
    manifest/            ← manifest.json types + (de)serialize + validate
    auth/                ← auth-service client (token verify + cache)
    quota/               ← per-user quota tracking
  database/migrations/   ← if pg-backed metadata
  Dockerfile
  Makefile
  .gitea/workflows/ci.yaml
  docs/ARCHITECTURE.md
```

Estimated total: ~1500 lines of Go.

## Open questions

1. **Metadata in automc-pg or sqlite-per-instance?**
   - pg: shared with other services, easier ops, schema migrations in the same pipeline.
   - sqlite: zero coupling, faster local I/O, harder to query externally.
   - Recommendation: **pg** for consistency with the rest of automc.
2. **Quota source**: hardcoded per-user, or per-user-row in DB?
   - DB row allows an admin to bump limits per player. Use a `users.cloud_quota_bytes` column (nullable, defaulting to the global limit).
3. **Reverse proxy in front of cloud-svc**: needed for player-facing URL (`packs.timemachine.center/cloud/...`)?
   - Either nginx fronting cloud-svc, or expose cloud-svc directly with its own TLS via Caddy/something. Defer.

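For open question 2, the nullable-override idea could work as sketched below. Everything here is illustrative: the 200 MiB global default is a placeholder, and `effective_limit` / `check_push` are hypothetical names. The 413 body mirrors the `{error: "quota", used, limit}` shape from the retention table.

```python
GLOBAL_QUOTA_BYTES = 200 * 1024 * 1024  # placeholder global default (assumption)


def effective_limit(cloud_quota_bytes, global_default=GLOBAL_QUOTA_BYTES):
    """A NULL (None) per-user override falls back to the global limit."""
    return global_default if cloud_quota_bytes is None else cloud_quota_bytes


def check_push(used_bytes, incoming_bytes, cloud_quota_bytes=None):
    """Return (status, body): 413 with a quota error, or 200 with no body."""
    limit = effective_limit(cloud_quota_bytes)
    if used_bytes + incoming_bytes > limit:
        return 413, {"error": "quota", "used": used_bytes, "limit": limit}
    return 200, None
```
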