# cloud-svc design

Steam-Cloud-style per-user file sync for Minecraft clients. Player launches their client → state pulled. Player exits → state pushed. Across machines, conflicts resolved via per-file mtime + a pre-launch dialog when ambiguity remains.

## Identity

User identity = Discord ID (already issued by automc's account-card flow). Cloud token is a long-lived API key with scope `cloud:rw`, issued via `auth-service` and tied to the Discord ID.

```
client.py ──── Authorization: Bearer <cloud-token> ────► cloud-svc
                                                             │
                                                             ▼ POST /auth/verify-key
                                                       auth-service ──► returns { discord_id, scopes }
```

cloud-svc never sees Discord IDs directly from the client; it always asks auth-service. Token revocation is a single DB UPDATE on `api_keys.revoked`.

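As a minimal sketch of the client side of this flow, the request below carries the cloud token as a bearer credential. The helper name `manifest_request` and the base URL in the test are illustrative assumptions, not part of `client.py` as described here.

```python
import urllib.request


def manifest_request(base_url: str, cloud_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for the caller's manifest.

    Hypothetical helper: only the header format and the /v1/manifest path
    come from this document.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/manifest",
        headers={"Authorization": f"Bearer {cloud_token}"},
        method="GET",
    )
```

The request never contains a Discord ID; cloud-svc resolves the identity from the token through auth-service.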
## On-disk layout

`~/automc/cloud-data/` on `john`, owned by `automc` user. Per-Discord-ID prefix:

```
cloud-data/
  <discord_id>/
    manifest.json              ← latest snapshot's per-file mtime + sha256 map
    snapshots/
      <snapshot_id>.tar.zst    ← full content tarball (zstd-compressed)
    blobs/                     ← content-addressed dedupe (sha256-prefixed dirs)
      ab/
        abcdef0123...          ← raw file content, referenced by manifest
      cd/
        cdef0123...
```

**Why both tarball AND blob store?**
- Tarballs are the snapshot's immutable historical record (good for restore-to-version).
- Blob store is the on-line read path: per-file fetch on conflict resolution. Tarballs would force decompress-everything for one file.
- Same content in two places = waste. Resolved by **hard-linking** the blob from inside the tarball at write time (cloud-svc writes one canonical copy, tarball entries are hardlinks). tar does have a hard-link entry type, but it points at another member of the same archive, not at a file outside it, so whether this removes the duplicate bytes from a `.tar.zst` still needs to be verified.

If hardlinks turn out to be a pain across the rootless-podman boundary, fall back to blob-only and synthesize tarballs on demand for download. Defer the call.

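The content-addressed layout can be sketched as a pure path computation. This is an illustration in Python (the service itself is planned in Go), and `blob_path` is a hypothetical name; only the `blobs/<first two hex chars>/<full sha256>` shape comes from the tree above.

```python
import hashlib
from pathlib import PurePosixPath


def blob_path(user_root: str, content: bytes) -> PurePosixPath:
    """Return where a blob with this content would live under a user's prefix."""
    digest = hashlib.sha256(content).hexdigest()
    # The first two hex characters pick the shard directory, e.g. blobs/ab/abcdef...
    return PurePosixPath(user_root) / "blobs" / digest[:2] / digest
```

Because the path depends only on the bytes, two files with identical content map to the same blob, which is what makes the dedupe work.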
## Snapshot retention

| Trigger | Action |
|---|---|
| `cloud_push` from client | New snapshot row. Tar written. Manifest updated. |
| Snapshot count > `RETAIN_LATEST` (default 30) | Oldest deleted. Hardlinked blobs that lose all refs are GC'd in a periodic job. |
| Per-user quota exceeded | Reject push with HTTP 413 + JSON `{error: "quota", used, limit}`. Client surfaces in UI. |

Snapshot IDs are ULIDs (timestamp-sortable, unique without coordination). A ULID is 26 characters in total, so the tarball name = `01J<23 chars>.tar.zst`.

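Because ULIDs sort lexicographically by creation time, the retention rule reduces to a sort and a slice. A minimal sketch, assuming the IDs are plain strings; `snapshots_to_prune` is a hypothetical name and blob GC is left out.

```python
def snapshots_to_prune(snapshot_ids: list[str], retain_latest: int = 30) -> list[str]:
    """Return the snapshot IDs that fall outside the newest `retain_latest`."""
    # ULIDs encode the timestamp in their leading characters, so a plain
    # string sort in reverse order yields newest-first.
    newest_first = sorted(snapshot_ids, reverse=True)
    return newest_first[retain_latest:]
```

With 35 snapshots and the default of 30, the five oldest IDs are returned; with fewer than 30 the result is empty.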
## Per-file metadata

`manifest.json` (per user, latest snapshot only):

```json
{
  "snapshot_id": "01J9XQK4Z3...",
  "created_at": "2026-06-02T18:30:00Z",
  "files": {
    "options.txt": { "sha256": "ab...", "size": 5234, "mtime": "2026-06-02T18:25:11Z" },
    "config/voicechat-client.json": { "sha256": "cd...", "size": 432, "mtime": "2026-06-01T22:14:03Z" },
    "journeymap/data/sp/world1/...": { "sha256": "ef...", "size": 88123, "mtime": "2026-06-02T18:29:42Z" }
  }
}
```

Stored only for the **latest** snapshot. Older snapshots' manifests live alongside `.tar.zst`. Reading old manifest = open tar header. Manifest mtime is **the file's mtime at push time on the client**, not the upload time. This is the source of truth for conflict resolution.

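One way the client could produce a single `files` entry is sketched below. It assumes whole-second UTC timestamps in the `...Z` form used in the example manifest; `manifest_entry` is a hypothetical helper name.

```python
import hashlib
import os
from datetime import datetime, timezone


def manifest_entry(path: str) -> dict:
    """Compute the {sha256, size, mtime} record for one local file."""
    st = os.stat(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    # Use the file's own mtime (truncated to seconds), not the current time.
    mtime = datetime.fromtimestamp(int(st.st_mtime), tz=timezone.utc)
    return {
        "sha256": h.hexdigest(),
        "size": st.st_size,
        "mtime": mtime.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```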
## HTTP API

All endpoints under `/v1/`. JSON unless noted. Bearer auth required.

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/v1/manifest` | Return latest manifest for caller. 200 + JSON, or 204 if no snapshots yet. |
| `GET` | `/v1/blob/{sha256}` | Stream raw file content. 200 + bytes, or 404. Used during conflict resolution to fetch a specific file the client wants from the remote. |
| `POST` | `/v1/snapshot` | Upload new snapshot. Body = multipart `{manifest.json, snapshot.tar.zst}`. Server validates manifest matches tar contents (hash check), assigns snapshot_id, stores. Returns `{snapshot_id, snapshot_url}`. |
| `GET` | `/v1/snapshots` | List caller's snapshot IDs + timestamps (newest first). Used for restore UI / debugging. |
| `GET` | `/v1/snapshot/{id}` | Download a specific historical tarball. |
| `DELETE` | `/v1/snapshot/{id}` | Delete a specific snapshot (e.g., compromised data). Cannot delete the latest. |
| `GET` | `/v1/quota` | `{used_bytes, limit_bytes, snapshots, snapshot_limit}` |

All authentication errors return `401 {error: "auth"}`. Quota errors return `413 {error: "quota", ...}`. Schema validation failures return `400`. An unknown user (token revoked / Discord ID stripped) gets `403`.

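The status codes above could be handled on the client roughly as follows. This is a sketch, not the shipped client: `CloudError` and `interpret_response` are hypothetical names, and the reason labels `"schema"` and `"unknown-user"` are invented here (only `"auth"` and `"quota"` appear in the API description).

```python
import json


class CloudError(Exception):
    """Raised for any non-success status from cloud-svc (hypothetical type)."""

    def __init__(self, status: int, reason: str):
        super().__init__(f"{status}: {reason}")
        self.status = status
        self.reason = reason


def interpret_response(status: int, body: bytes):
    """Map a JSON endpoint's status and body to a value or an error."""
    if status == 200:
        return json.loads(body)
    if status == 204:
        return None  # e.g. GET /v1/manifest before the first snapshot exists
    reasons = {400: "schema", 401: "auth", 403: "unknown-user", 413: "quota"}
    raise CloudError(status, reasons.get(status, "unexpected"))
```

Treating 204 as "no remote state" lets the pull step fall through to a plain first push.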
## Pull semantics (client side)

```
1. GET /v1/manifest → remote manifest
2. Walk local include-paths, compute (path, mtime, sha256) for each file
3. For each path in (remote ∪ local):
     remote_only       → DOWNLOAD via GET /v1/blob/{sha256}, write file, set mtime to remote.mtime
     local_only        → no-op (will push on exit)
     both, sha matches → no-op
     both, sha differs:
       |diff| ≤ 2s OR same mtime           → CONFLICT: surface in dialog
       remote.mtime > local.mtime (by > 2s) → AUTO_REMOTE: download, overwrite, set mtime
       local.mtime > remote.mtime (by > 2s) → AUTO_LOCAL: keep local (will push on exit)
```

The 2s threshold absorbs FS-level mtime rounding (FAT, for example, stores mtimes at 2-second granularity). It is checked first, so the newer-wins rules only apply when the two mtimes are more than 2s apart.

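The per-path decision can be sketched as a small pure function. It assumes mtimes have already been parsed to epoch seconds, and it reads the 2s rule as taking precedence over newer-wins, which is an interpretation of the pseudocode rather than something it states outright. The names `classify` and `MTIME_TOLERANCE_S`, and the `"LOCAL_ONLY"` and `"NOOP"` labels, are illustrative.

```python
from typing import Optional

MTIME_TOLERANCE_S = 2.0  # absorbs filesystem mtime rounding


def classify(local: Optional[dict], remote: Optional[dict]) -> str:
    """Decide the pull action for one path.

    Each argument is None (file absent on that side) or a dict with
    "sha256" (hex string) and "mtime" (epoch seconds).
    """
    if local is None and remote is None:
        raise ValueError("path is on neither side")
    if local is None:
        return "DOWNLOAD"
    if remote is None:
        return "LOCAL_ONLY"  # no-op now, pushed on exit
    if local["sha256"] == remote["sha256"]:
        return "NOOP"
    diff = remote["mtime"] - local["mtime"]
    if abs(diff) <= MTIME_TOLERANCE_S:
        return "CONFLICT"  # too close to call, ask the user
    return "AUTO_REMOTE" if diff > 0 else "AUTO_LOCAL"
```

Keeping this function free of I/O makes the boundary cases (exactly 2s apart, identical mtimes) easy to unit-test.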
## Push semantics (client side)

```
1. Walk local include-paths, build per-file (path, mtime, sha256)
2. GET /v1/manifest, build delta:
     in local, not in remote          → new
     in local, sha differs from remote → changed
     in remote, not in local          → DELETED (manifest entry, no blob)
3. Build manifest.json for new snapshot:
     {snapshot_id, created_at, files: {<full current set>}}
4. Build tarball: only new + changed files (deleted entries omitted)
5. POST /v1/snapshot with manifest + tarball
6. On 200, save snapshot_id to local state file (used for next pull's known-base)
7. On 413 (quota), surface to user; offer pruning or scope reduction
```

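Step 2 is a three-way set comparison keyed on path. A minimal sketch, assuming both sides have been reduced to `{path: sha256}` maps; `push_delta` is a hypothetical name.

```python
def push_delta(local: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """Compare {path: sha256} maps and bucket paths as new, changed, or deleted."""
    new = sorted(p for p in local if p not in remote)
    changed = sorted(p for p in local if p in remote and local[p] != remote[p])
    deleted = sorted(p for p in remote if p not in local)
    return {"new": new, "changed": changed, "deleted": deleted}
```

The tarball for step 4 would then hold `delta["new"] + delta["changed"]`, while the manifest lists every path still present locally.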
## Conflict UI

Pre-launch (after packwiz, before MC starts). When `pull` finds files in CONFLICT state, render a dialog. Cross-platform via stdlib `tkinter`:

```
┌──────────────────────────────────────────────────────────┐
│ Cloud sync - manual resolve needed                       │
├──────────────────────────────────────────────────────────┤
│ Some files differ between this machine and your cloud:   │
│                                                          │
│ File                          Local        Remote        │
│ options.txt                   18:25 +0200  18:24 +0200   │
│   ( ) keep local   ( ) use remote                        │
│                                                          │
│ config/voicechat-client.json  22:14 +0200  22:13 +0200   │
│   ( ) keep local   ( ) use remote                        │
│                                                          │
│ [Use local for all] [Use remote for all] [Cancel launch] │
│                                                          │
│                                        [Continue launch] │
└──────────────────────────────────────────────────────────┘
```

Defaults per-row to "use remote" (matches Steam's default: pull is destructive but consistent). User can override per file.

**Cancel launch** = abort `client.py`, return non-zero. Player can fix manually then re-run.

## What syncs (configurable per distribution)

`cloud-scope.json` next to `client.py`:

```json
{
  "include": [
    "options.txt",
    "optionsof.txt",
    "optionsshaders.txt",
    "config/",
    "journeymap/data/",
    "screenshots/"
  ],
  "exclude": [
    "config/simple-mod-sync*",
    "config/packwiz*",
    "**/.tmp",
    "**/cache/"
  ],
  "max_size_mb_per_file": 50,
  "max_total_mb": 200
}
```

Defaults are baked into client.py if the file is absent. JourneyMap (`journeymap/data/`) tracks per-server worlds, waypoints, and settings, so it is explicitly included.

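The document does not pin down the matching rules for `include` and `exclude`, so the sketch below states its own: an include entry ending in `/` is a directory prefix and anything else is an exact path; exclude entries are `fnmatch`-style globs, a trailing `/` means "any directory with this name", and a leading `**/` also matches at the top level. All of that, and the names `in_scope`, `_excluded`, and `_variants`, are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def _variants(pattern: str):
    """Yield the pattern, plus a top-level form for patterns starting with '**/'."""
    yield pattern
    if pattern.startswith("**/"):
        yield pattern[3:]


def _excluded(path: str, pattern: str) -> bool:
    if pattern.endswith("/"):
        # Directory pattern: test every parent directory of the path.
        parts = path.split("/")[:-1]
        candidates = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
        pattern = pattern[:-1]
    else:
        candidates = [path]
    return any(fnmatchcase(c, v) for c in candidates for v in _variants(pattern))


def in_scope(path: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if a '/'-separated relative path should be synced."""
    included = any(
        path.startswith(inc) if inc.endswith("/") else path == inc
        for inc in include
    )
    return included and not any(_excluded(path, pat) for pat in exclude)
```

`fnmatchcase` is used instead of `fnmatch` so the result does not depend on the host OS's case rules; note that `*` in `fnmatch` patterns also matches `/`.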
## Auth-service integration

cloud-svc → auth-service contract (already exists in `auth-service/server.go`):

```
POST http://auth-service:9090/auth/verify-key
Authorization: <SM_API_KEY>        # cloud-svc's own service token
Body: { "key": "<player-token>" }

200 { "user_id": "<discord_id>", "scopes": ["cloud:rw"] }
401 { "error": "invalid" }
403 { "error": "revoked" }
```

cloud-svc caches verified tokens in-memory for 60s to avoid hammering auth-service, so a revoked token can keep working for up to 60s. Cache invalidated on 401 from client (forced refresh).

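The 60s cache can be sketched as a small TTL map in front of the verify call. This is written in Python for illustration (the service itself is planned in Go); `TokenCache`, its methods, and the injected `verify` and `clock` callables are hypothetical.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Cache successful token verifications for a fixed time-to-live."""

    def __init__(
        self,
        verify: Callable[[str], Optional[dict]],
        ttl_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._verify = verify  # returns the identity dict, or None if rejected
        self._ttl = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def lookup(self, token: str) -> Optional[dict]:
        now = self._clock()
        hit = self._entries.get(token)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh, skip auth-service
        identity = self._verify(token)
        if identity is None:
            self._entries.pop(token, None)  # never cache rejections
            return None
        self._entries[token] = (now + self._ttl, identity)
        return identity

    def invalidate(self, token: str) -> None:
        """Drop a cached entry so the next lookup goes back to auth-service."""
        self._entries.pop(token, None)
```

Injecting the clock keeps expiry testable without sleeping; only successful verifications are cached, so a bad token is re-checked on every request.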
## Encryption at rest

**Optional, deferred.** Today: blobs are raw on `john`'s disk, owned by `automc` user, mode 0600. Filesystem permissions are the only barrier. Acceptable for pre-prod.

For production: per-user symmetric key (derived from Discord ID + master secret) encrypts blobs with AES-GCM. Manifest stored with the encrypted blob mappings; client provides key per request. Adds significant complexity, so defer until production scale.

## Quadlet template

```ini
[Unit]
Description=automc cloud-svc (player state sync)
After=automc-pg.service auth-service.service

[Container]
ContainerName=cloud-svc
Image=git.timemachine.center/timemachine/cloud-svc:latest
Network=automc-net
NetworkAlias=cloud-svc
Environment=TZ={{ tz }}
EnvironmentFile=%h/automc/secrets/cloud-svc.env
PublishPort=127.0.0.1:9091:9091
Volume=%h/automc/cloud-data:/data:Z

[Service]
Restart=always

[Install]
WantedBy=default.target
```

Bound to `127.0.0.1:9091`, so players reach it only via SSH tunnel during dev. In real deployment, exposed via a reverse-proxy on the same hostname they use for the packwiz pack URL (e.g., `packs.timemachine.center/cloud/...`).

## Out of scope (v1)

- **Selective restore from old snapshot**: UI for "go back to last Tuesday's state". The API supports it (`GET /v1/snapshot/{id}` + manual extraction); the UI is deferred.
- **Multi-device live conflict** (player on PC + laptop simultaneously): single-machine assumption, race documented.
- **Compression tuning**: zstd level 3 default. May tune up to 6 if disk pressure observed.
- **Anti-replay on tokens**: straight bearer auth. If a token leaks, revoke it. Not a primary attack surface.
- **Cross-server modpack-aware filtering**: cloud-scope.json is per-distribution, not per-server. Different servers might want different scopes; defer.

## Stack

- **Go 1.24+** (matches other automc services)
- `jackc/pgx/v5` for pg (if metadata stored there; alternative: sqlite per-deploy)
- `klauspost/compress/zstd` for tarball compression
- Standard `archive/tar` for tarball assembly
- No external HTTP framework: `net/http` + `gorilla/mux` style routing (or stdlib `http.ServeMux` Go 1.22+ pattern syntax)

## File / module layout

```
cloud-svc/
  cmd/cloud-svc/main.go
  internal/
    api/          ← HTTP handlers per endpoint
    storage/      ← on-disk blob + tarball R/W, GC
    manifest/     ← manifest.json types + (de)serialize + validate
    auth/         ← auth-service client (token verify + cache)
    quota/        ← per-user quota tracking
  database/migrations/   ← if pg-backed metadata
  Dockerfile
  Makefile
  .gitea/workflows/ci.yaml
  docs/ARCHITECTURE.md
```

~1500 LOC Go total estimate.

## Open questions

1. **Metadata in automc-pg or sqlite-per-instance?**
   - pg: shared with other services, easier ops, schema migrations same pipeline.
   - sqlite: zero coupling, faster local I/O, harder to query externally.
   - Recommendation: **pg** for consistency with the rest of automc.
2. **Quota source**: hardcoded per-user, or per-user-row in DB?
   - DB row allows admin to bump limits per player. Use `users.cloud_quota_bytes` column (nullable, default to global).
3. **Reverse proxy in front of cloud-svc**: needed for player-facing URL (`packs.timemachine.center/cloud/...`)?
   - Either nginx fronting cloud-svc, or expose cloud-svc directly with its own TLS via Caddy/something. Defer.
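
For question 2, the "nullable, default to global" rule is a one-line fallback. A sketch under that reading, shown in Python for illustration; `effective_quota` and `quota_error` are hypothetical names, and only the `{error, used, limit}` payload shape comes from the retention table.

```python
from typing import Optional


def effective_quota(user_quota_bytes: Optional[int], global_limit_bytes: int) -> int:
    """Per-user override if set (column is non-NULL), otherwise the global limit."""
    return global_limit_bytes if user_quota_bytes is None else user_quota_bytes


def quota_error(used: int, limit: int) -> dict:
    """Body for the HTTP 413 response when a push would exceed the quota."""
    return {"error": "quota", "used": used, "limit": limit}
```

Checking `is None` rather than truthiness keeps an explicit override of `0` meaningful, for example to block pushes for one account.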