HTTP API + on-disk storage + auth-service token verification + dev mode. 31 tests pass, vet clean. See DESIGN.md for the architecture and README.md for the operator surface. Pending: pg-backed per-user quota override, snapshot retention / blob GC, tarball-vs-manifest content cross-check, end-to-end deploy on john.
cloud-svc design
Steam-Cloud-style per-user file sync for Minecraft clients. Player launches their client → state pulled. Player exits → state pushed. Across machines, conflicts resolved via per-file mtime + a pre-launch dialog when ambiguity remains.
Identity
User identity = Discord ID (already issued by automc's account-card flow). Cloud token is a long-lived API key with scope cloud:rw, issued via auth-service and tied to the Discord ID.
client.py ──── Authorization: Bearer <cloud-token> ────► cloud-svc
│
▼ POST /auth/verify-key
auth-service ──► returns { user_id (= Discord ID), scopes }
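cloud-svc never accepts a Discord ID directly from the client; it always resolves the token through auth-service. Token revocation is a single DB UPDATE on api_keys.revoked, though a verification already cached by cloud-svc can stay accepted for up to 60s (see Auth-service integration).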
On-disk layout
~/automc/cloud-data/ on john, owned by automc user. Per-Discord-ID prefix:
cloud-data/
  <discord_id>/
    manifest.json              ← latest snapshot's per-file mtime + sha256 map
    snapshots/
      <snapshot_id>.tar.zst    ← full content tarball (zstd-compressed)
    blobs/                     ← content-addressed dedupe (sha256-prefixed dirs)
      ab/
        abcdef0123...          ← raw file content, referenced by manifest
      cd/
        cdef0123...
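The two-character fan-out in the blob store can be sketched as follows. This is a minimal illustration in Python (the service itself is planned in Go), not the actual storage code: `blob_path` and `store_blob` are hypothetical names, and keeping blobs/ under the per-user directory is an assumption.

```python
import hashlib
from pathlib import Path


def blob_path(root: Path, sha256_hex: str) -> Path:
    """Map a digest to <root>/blobs/<first two hex chars>/<full digest>."""
    # Strict validation also keeps a digest taken from a URL from escaping root.
    if len(sha256_hex) != 64 or any(c not in "0123456789abcdef" for c in sha256_hex):
        raise ValueError("expected a lowercase hex sha256 digest")
    return root / "blobs" / sha256_hex[:2] / sha256_hex


def store_blob(root: Path, content: bytes) -> str:
    """Write content under its own digest; identical content is stored once."""
    digest = hashlib.sha256(content).hexdigest()
    target = blob_path(root, digest)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_bytes(content)
        tmp.replace(target)  # rename is atomic on the same filesystem
    return digest
```

Because the path is derived only from the digest, a second push of the same file content is a no-op on disk.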
Why both tarball AND blob store?
- Tarballs are the snapshot's immutable historical record (good for restore-to-version).
- Blob store is the on-line read path: per-file fetch on conflict resolution. Tarballs would force decompress-everything for one file.
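- Same content in two places = waste. The intended fix is to hard-link the blob from the tarball at write time (cloud-svc writes one canonical copy, tarball entries are hardlinks). Caveat: hard-link entries in a tar archive refer to other members of the same archive, and a zstd-compressed tarball is a single file on disk, so its contents cannot share storage with files under blobs/. As written this does not dedupe; see the fallback below.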
If hardlinks turn out to be a pain across the rootless-podman boundary, fall back to blob-only and synthesize tarballs on demand for download. Defer the call.
Snapshot retention
| Trigger | Action |
|---|---|
| cloud_push from client | New snapshot row. Tar written. Manifest updated. |
| Snapshot count > RETAIN_LATEST (default 30) | Oldest deleted. Hardlinked blobs that lose all refs are GC'd in a periodic job. |
| Per-user quota exceeded | Reject push with HTTP 413 + JSON {error: "quota", used, limit}. Client surfaces in UI. |
Snapshot IDs are ULIDs (timestamp-sortable, unique without coordination). A ULID is 26 characters in total, so a tarball name looks like 01J<23 more chars>.tar.zst.
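A minimal sketch of why ULIDs make retention simple: because the ID starts with a millisecond timestamp, plain string sorting gives age order. It is written in Python for illustration (the service is Go and would likely use an existing ULID library); `new_ulid` and `snapshots_to_prune` are hypothetical names.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford base32 alphabet
RETAIN_LATEST = 30  # default from the retention table


def new_ulid(now_ms=None, randomness=None):
    """Encode a 48-bit millisecond timestamp + 80 random bits as 26 chars."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    rnd = int.from_bytes(os.urandom(10), "big") if randomness is None else randomness
    value = (ts << 80) | rnd
    chars = []
    for _ in range(26):  # 26 symbols of 5 bits each, least significant first
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def snapshots_to_prune(snapshot_ids, retain=RETAIN_LATEST):
    """Return the oldest IDs beyond the retention count."""
    ordered = sorted(snapshot_ids)  # lexicographic order == chronological order
    return ordered[: max(len(ordered) - retain, 0)]
```

With 33 snapshots and the default of 30, `snapshots_to_prune` returns the three oldest IDs; blob GC for anything they alone referenced would run separately.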
Per-file metadata
manifest.json (per user, latest snapshot only):
{
"snapshot_id": "01J9XQK4Z3...",
"created_at": "2026-06-02T18:30:00Z",
"files": {
"options.txt": { "sha256": "ab...", "size": 5234, "mtime": "2026-06-02T18:25:11Z" },
"config/voicechat-client.json": { "sha256": "cd...", "size": 432, "mtime": "2026-06-01T22:14:03Z" },
"journeymap/data/sp/world1/...": { "sha256": "ef...", "size": 88123,"mtime": "2026-06-02T18:29:42Z" }
}
}
manifest.json is kept only for the latest snapshot. Older snapshots' manifests live with their .tar.zst, so reading an old manifest means opening that tarball. Each mtime in the manifest is the file's mtime on the client at push time, not the upload time. This is the source of truth for conflict resolution.
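As an illustration of how the per-file records could be produced on the client, here is a minimal sketch. `file_entry` and `build_files_map` are hypothetical names, and truncating mtimes to whole seconds is an assumption based on the timestamp format shown in the example manifest.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_entry(path: Path) -> dict:
    """Return the {sha256, size, mtime} record for one file."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    stat = path.stat()
    mtime = datetime.fromtimestamp(int(stat.st_mtime), tz=timezone.utc)
    return {
        "sha256": digest.hexdigest(),
        "size": stat.st_size,
        "mtime": mtime.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def build_files_map(root: Path) -> dict:
    """Walk root and key each record by its POSIX-style relative path."""
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files[path.relative_to(root).as_posix()] = file_entry(path)
    return files
```

Using `as_posix()` keeps manifest keys identical across Windows and Linux clients, which matters when the same account syncs between machines.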
HTTP API
All endpoints under /v1/. JSON unless noted. Bearer auth required.
| Method | Path | Purpose |
|---|---|---|
| GET | /v1/manifest | Return latest manifest for caller. 200 + JSON, or 204 if no snapshots yet. |
| GET | /v1/blob/{sha256} | Stream raw file content. 200 + bytes, or 404. Used during conflict resolution to fetch a specific file the client wants from the remote. |
| POST | /v1/snapshot | Upload new snapshot. Body = multipart {manifest.json, snapshot.tar.zst}. Server validates manifest matches tar contents (hash check), assigns snapshot_id, stores. Returns {snapshot_id, snapshot_url}. |
| GET | /v1/snapshots | List caller's snapshot IDs + timestamps (newest first). Used for restore UI / debugging. |
| GET | /v1/snapshot/{id} | Download a specific historical tarball. |
| DELETE | /v1/snapshot/{id} | Delete a specific snapshot (e.g., compromised data). Cannot delete the latest. |
| GET | /v1/quota | {used_bytes, limit_bytes, snapshots, snapshot_limit} |
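The hash check mentioned for POST /v1/snapshot could look roughly like this. It is a sketch under assumptions, in Python rather than the service's Go: the tar stream is assumed to be already zstd-decompressed and seekable, only tar members are checked against the manifest (the push flow sends just new and changed files), and `validate_snapshot` and `build_tar` are hypothetical names.

```python
import hashlib
import io
import tarfile


def validate_snapshot(manifest_files: dict, tar_stream) -> list:
    """Return a list of problems; an empty list means every member matches."""
    problems = []
    with tarfile.open(fileobj=tar_stream, mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue
            entry = manifest_files.get(member.name)
            if entry is None:
                problems.append(f"{member.name}: not in manifest")
                continue
            data = tar.extractfile(member).read()
            if len(data) != entry["size"]:
                problems.append(f"{member.name}: size mismatch")
            elif hashlib.sha256(data).hexdigest() != entry["sha256"]:
                problems.append(f"{member.name}: sha256 mismatch")
    return problems


def build_tar(files: dict) -> io.BytesIO:
    """Helper for the usage example: pack {name: bytes} into an in-memory tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

A non-empty result would map to the 400 schema-validation response; a real implementation would stream large members instead of reading each one fully into memory.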
Authentication errors return 401 {error: "auth"}. Quota errors return 413 {error: "quota", ...}. Schema validation failures return 400. An unknown user (token revoked / Discord ID stripped) returns 403.
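From the client's side, the status codes above could be handled along these lines. This is a sketch only: `manifest_request` and `interpret_response` are hypothetical helpers, the outcome labels are invented for illustration, and the base URL in the usage note is the dev-mode loopback address from the Quadlet template.

```python
import json
import urllib.request


def manifest_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the GET /v1/manifest request with bearer auth (nothing is sent yet)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/manifest",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def interpret_response(status: int, body: bytes):
    """Map a status code to (outcome, payload) following the error list above."""
    if status == 200:
        return "ok", json.loads(body)
    if status == 204:
        return "empty", None  # no snapshots yet
    payload = json.loads(body) if body else {}
    outcomes = {
        400: "bad_request",
        401: "auth",
        403: "forbidden",
        404: "not_found",
        413: "quota",
    }
    return outcomes.get(status, "unexpected"), payload
```

For example, `manifest_request("http://127.0.0.1:9091", token)` can be passed to `urllib.request.urlopen`, with `urllib.error.HTTPError` caught so that its `code` and body feed `interpret_response`.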
Pull semantics (client side)
1. GET /v1/manifest → remote manifest
2. Walk local include-paths, compute (path, mtime, sha256) for each file
3. For each path in (remote ∪ local):
   remote_only → DOWNLOAD via GET /v1/blob/{sha256}, write file, set mtime to remote.mtime
   local_only → no-op (will push on exit)
   both, sha matches → no-op
   both, sha differs:
     |remote.mtime − local.mtime| ≤ 2s → CONFLICT: surface in dialog
     remote.mtime > local.mtime + 2s → AUTO_REMOTE: download, overwrite, set mtime
     local.mtime > remote.mtime + 2s → AUTO_LOCAL: keep local (will push on exit)
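The 2s threshold absorbs FS-level mtime rounding (FAT32, for example, stores modification times at 2-second resolution). The tolerance check takes precedence over the newer-wins rules.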
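The decision table above can be sketched as a pure function. This is an illustration, not client.py's actual code: `classify` and `plan_pull` are hypothetical names, and the tolerance check is applied before the newer-wins comparison so the three outcomes never overlap.

```python
from datetime import datetime, timedelta

MTIME_TOLERANCE = timedelta(seconds=2)


def parse_mtime(value: str) -> datetime:
    """Parse the manifest's UTC timestamps, e.g. 2026-06-02T18:25:11Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify(local, remote):
    """local/remote are manifest entries, or None when the path is absent."""
    if local is None:
        return "DOWNLOAD"
    if remote is None:
        return "NOOP"  # local-only: pushed on exit
    if local["sha256"] == remote["sha256"]:
        return "NOOP"
    delta = parse_mtime(remote["mtime"]) - parse_mtime(local["mtime"])
    if abs(delta) <= MTIME_TOLERANCE:
        return "CONFLICT"
    return "AUTO_REMOTE" if delta > timedelta(0) else "AUTO_LOCAL"


def plan_pull(local_files, remote_files):
    """Classify every path in the union of both manifests."""
    paths = sorted(set(local_files) | set(remote_files))
    return {p: classify(local_files.get(p), remote_files.get(p)) for p in paths}
```

Keeping the classification free of I/O makes it easy to unit-test; the caller then performs downloads for DOWNLOAD and AUTO_REMOTE and collects CONFLICT rows for the dialog.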
Push semantics (client side)
1. Walk local include-paths, build per-file (path, mtime, sha256)
2. GET /v1/manifest, build delta:
in local, not in remote → new
in local, sha differs from remote → changed
in remote, not in local → DELETED (manifest entry, no blob)
3. Build manifest.json for new snapshot:
{created_at, files: {<full current set>}} (snapshot_id is assigned by the server and returned from POST /v1/snapshot)
4. Build tarball: only new + changed files (deleted entries omitted)
5. POST /v1/snapshot with manifest + tarball
6. On 200, save snapshot_id to local state file (used for next pull's known-base)
7. On 413 (quota), surface to user; offer pruning or scope reduction
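Step 2's delta and step 4's tarball contents can be sketched as below. `build_delta` and `tarball_members` are hypothetical names; the inputs are the files maps of the local and remote manifests.

```python
def build_delta(local_files: dict, remote_files: dict) -> dict:
    """Split paths into new / changed / deleted relative to the remote manifest."""
    new = sorted(p for p in local_files if p not in remote_files)
    changed = sorted(
        p
        for p in local_files
        if p in remote_files
        and local_files[p]["sha256"] != remote_files[p]["sha256"]
    )
    deleted = sorted(p for p in remote_files if p not in local_files)
    return {"new": new, "changed": changed, "deleted": deleted}


def tarball_members(delta: dict) -> list:
    """Only new and changed files are packed; deleted paths carry no content."""
    return sorted(delta["new"] + delta["changed"])
```

If the remote returned 204 (no snapshots yet), passing an empty dict as `remote_files` makes every local file "new", so the first push uploads the full set.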
Conflict UI
Pre-launch (after packwiz, before MC starts). When pull finds files in CONFLICT state, render a dialog. Cross-platform via stdlib tkinter:
┌──────────────────────────────────────────────────────────┐
│ Cloud sync — manual resolve needed │
├──────────────────────────────────────────────────────────┤
│ Some files differ between this machine and your cloud: │
│ │
│ File Local Remote │
│ options.txt 18:25 +0200 18:24 +0200 │
│ ( ) keep local ( ) use remote │
│ │
│ config/voicechat-client.json 22:14 +0200 22:13 +0200 │
│ ( ) keep local ( ) use remote │
│ │
│ [Use local for all] [Use remote for all] [Cancel launch] │
│ │
│ [Continue launch] │
└──────────────────────────────────────────────────────────┘
Defaults per-row to "use remote" (matches Steam's default — pull is destructive but consistent). User can override per file.
Cancel launch = abort client.py, return non-zero. Player can fix manually then re-run.
What syncs (configurable per distribution)
cloud-scope.json next to client.py:
{
"include": [
"options.txt",
"optionsof.txt",
"optionsshaders.txt",
"config/",
"journeymap/data/",
"screenshots/"
],
"exclude": [
"config/simple-mod-sync*",
"config/packwiz*",
"**/.tmp",
"**/cache/"
],
"max_size_mb_per_file": 50,
"max_total_mb": 200
}
Defaults are baked into client.py if the file is absent. JourneyMap (journeymap/data/) tracks per-server worlds, waypoints, settings — explicitly included.
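One possible reading of the include/exclude rules, as a sketch. The glob semantics are an assumption (the design does not pin them down): a pattern ending in "/" is treated as a directory prefix, everything else is matched with `fnmatch`, where `*` also crosses "/" boundaries. `in_scope` and `within_size` are hypothetical names, and "MB" is taken to mean MiB.

```python
import fnmatch

SCOPE = {
    "include": ["options.txt", "optionsof.txt", "optionsshaders.txt",
                "config/", "journeymap/data/", "screenshots/"],
    "exclude": ["config/simple-mod-sync*", "config/packwiz*",
                "**/.tmp", "**/cache/"],
    "max_size_mb_per_file": 50,
    "max_total_mb": 200,
}


def _matches(path: str, pattern: str) -> bool:
    if pattern.endswith("/"):
        # Directory pattern: match anything underneath it.
        return fnmatch.fnmatchcase(path, pattern + "*")
    return fnmatch.fnmatchcase(path, pattern)


def in_scope(path: str, scope: dict) -> bool:
    """A path syncs if some include pattern matches and no exclude pattern does."""
    included = any(_matches(path, p) for p in scope["include"])
    excluded = any(_matches(path, p) for p in scope["exclude"])
    return included and not excluded


def within_size(size_bytes: int, scope: dict) -> bool:
    """Per-file size cap; MB is interpreted as MiB here (assumption)."""
    return size_bytes <= scope["max_size_mb_per_file"] * 1024 * 1024
```

Under this reading, `config/voicechat-client.json` syncs while `config/packwiz.toml` and anything under a `cache/` directory do not. Excludes win over includes.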
Auth-service integration
cloud-svc → auth-service contract (already exists in auth-service/server.go):
POST http://auth-service:9090/auth/verify-key
Authorization: <SM_API_KEY> # cloud-svc's own service token
Body: { "key": "<player-token>" }
200 { "user_id": "<discord_id>", "scopes": ["cloud:rw"] }
401 { "error": "invalid" }
403 { "error": "revoked" }
cloud-svc caches verified tokens in-memory for 60s to avoid hammering auth-service. Cache invalidated on 401 from client (forced refresh).
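The 60s verification cache could be as small as the sketch below. It is written in Python for illustration (the service is Go), `TokenCache` is a hypothetical name, and the injectable clock exists only to make expiry testable.

```python
import time


class TokenCache:
    """In-memory cache of verified tokens with a fixed time-to-live."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token -> (expires_at, identity)

    def get(self, token):
        """Return the cached identity, or None if missing or expired."""
        entry = self._entries.get(token)
        if entry is None:
            return None
        expires_at, identity = entry
        if self._clock() >= expires_at:
            del self._entries[token]
            return None
        return identity

    def put(self, token, identity):
        """Store the result of a successful /auth/verify-key call."""
        self._entries[token] = (self._clock() + self._ttl, identity)

    def invalidate(self, token):
        """Drop an entry, e.g. after a 401, to force re-verification."""
        self._entries.pop(token, None)
```

A miss from `get` means cloud-svc calls auth-service and stores the answer with `put`. A production version would also need locking for concurrent requests and might key on a hash of the token rather than the raw value.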
Encryption at rest
Optional, deferred. Today: blobs are raw on john's disk, owned by automc user, mode 0600. Filesystem permissions are the only barrier. Acceptable for pre-prod.
For production: per-user symmetric key (derived from Discord ID + master secret) encrypts blobs with AES-GCM. Manifest stored with the encrypted blob mappings; client provides key per request. Adds significant complexity — defer until production scale.
Quadlet template
[Unit]
Description=automc cloud-svc (player state sync)
After=automc-pg.service auth-service.service
[Container]
ContainerName=cloud-svc
Image=git.timemachine.center/timemachine/cloud-svc:latest
Network=automc-net
NetworkAlias=cloud-svc
Environment=TZ={{ tz }}
EnvironmentFile=%h/automc/secrets/cloud-svc.env
PublishPort=127.0.0.1:9091:9091
Volume=%h/automc/cloud-data:/data:Z
[Service]
Restart=always
[Install]
WantedBy=default.target
Bound to 127.0.0.1:9091 — players reach it only via SSH tunnel during dev. In real deployment, exposed via a reverse-proxy on the same hostname they use for the packwiz pack URL (e.g., packs.timemachine.center/cloud/...).
Out of scope (v1)
- Selective restore from old snapshot: UI for "go back to last Tuesday's state". The API supports it (GET /v1/snapshot/{id} + manual extraction); the UI is deferred.
- Multi-device live conflict (player on PC + laptop simultaneously): single-machine assumption, race documented.
- Compression tuning — zstd level 3 default. May tune up to 6 if disk pressure observed.
- Anti-replay on tokens — straight bearer auth. If a token leaks, revoke it. Not a primary attack surface.
- Cross-server modpack-aware filtering — cloud-scope.json is per-distribution, not per-server. Different servers might want different scopes; defer.
Stack
- Go 1.24+ (matches other automc services)
- jackc/pgx/v5 for pg (if metadata stored there; alternative: sqlite per-deploy)
- klauspost/compress/zstd for tarball compression
- Standard archive/tar for tarball assembly
- No external HTTP framework: net/http + gorilla/mux style routing (or stdlib http.ServeMux Go 1.22+ pattern syntax)
File / module layout
cloud-svc/
cmd/cloud-svc/main.go
internal/
api/ ← HTTP handlers per endpoint
storage/ ← on-disk blob + tarball R/W, GC
manifest/ ← manifest.json types + (de)serialize + validate
auth/ ← auth-service client (token verify + cache)
quota/ ← per-user quota tracking
database/migrations/ ← if pg-backed metadata
Dockerfile
Makefile
.gitea/workflows/ci.yaml
docs/ARCHITECTURE.md
~1500 LOC Go total estimate.
Open questions
- Metadata in automc-pg or sqlite-per-instance?
- pg: shared with other services, easier ops, schema migrations same pipeline.
- sqlite: zero coupling, faster local I/O, harder to query externally.
- Recommendation: pg for consistency with the rest of automc.
- Quota source: hardcoded per-user, or per-user-row in DB?
  - DB row allows admin to bump limits per player. Use users.cloud_quota_bytes column (nullable, default to global).
- Reverse proxy in front of cloud-svc: needed for player-facing URL (packs.timemachine.center/cloud/...)?
  - Either nginx fronting cloud-svc, or expose cloud-svc directly with its own TLS via Caddy/something. Defer.