The Admin stack now has a usable operator UI for managing the fleet.
End-to-end verified locally: Client pushes → Admin dashboard reflects
the activity within the CA refresh window.
Backend (Admin-only)
- FleetQueryService: dashboard headline (totals, active count, today's
measurements + kWh from the hourly_per_device CA) and per-customer
detail (sites, devices, last 50 measurements, last 20 ingest events).
- /api/fleet/dashboard and /api/fleet/customers/{id}/detail endpoints.
- DTOs added; Program.cs wires the service + endpoints under RunMode=Admin.
Frontend
- DashboardPage now branches on RunMode — Admin renders the fleet
headline (statistic cards + customer summary table with lag tags),
Client keeps the existing placeholder.
- AdminCustomerDetailPage drills into one customer: descriptions card +
tabs for Recent ingest (with rejection counts, batch sizes, time-spread
for visible firmware-replay waves), Recent measurements, Sites, Devices.
- AdminCustomersPage rows are clickable → /admin/customers/:id (skips
the click when target is a button/popover so action buttons still work).
- App.tsx adds the /admin/customers/:id route, RequireRole-gated.
Grafana
- grafana/dashboards-admin/fleet-overview.json — 4 stat panels (active
customers, total, last-24h samples, today's kWh) plus 2 time series
(per-customer active power, per-customer hourly kWh). Reads from
fleet.hourly_per_device CA.
- grafana/dashboards-admin/customer-drilldown.json — parameterized by
$customer (template variable querying fleet.Customers). Per-device
active power, cumulative kWh, recent ingest events table.
Docs
- README: Phase 15 section describing the new admin UI surface +
pointer to the grafana/dashboards-admin folder.
- OPERATIONS: new "Fleet aggregator (Admin stack)" section covering
one-time provisioning (Admin portal + Admin Grafana), end-to-end
customer-onboarding workflow (register on Admin → drop token in
customer .env → restart → verify in UI/SQL), common ops (rotate
token, disable, investigate, compression stats, force CA refresh,
decommission), and Admin-DB backup notes.
- README decommissioning note now mentions deleting from fleet.Customers
if the customer was registered for aggregation.
Verified end-to-end
- Phase 14's Client + Admin stacks rebuilt with Phase 15 code.
- /api/fleet/dashboard returns correct totals (1 customer, 1 active,
measurements + kWh derived from CA).
- /api/fleet/customers/{id}/detail returns sites, devices, recent
measurements, recent ingest events.
- Ingested a fresh measurement on Client → after CA refresh, totals
in Admin dashboard advance correctly.
- All 53 tests still passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tau Acuvim Portal — Operations
Per-customer deployment loop. For background, architecture, and security model, read the README first.
Contents
- Prerequisites (per host)
- Provisioning a new customer
- Updating a customer's stack
- Rotating secrets
- Backup & restore
- Health & monitoring
- Troubleshooting
- Decommissioning a customer
Prerequisites (per host)
These exist once on the host running customer stacks; not per customer.
- Docker Engine (or Docker Desktop on Windows hosts).
- External Traefik instance, running on the same host and joined to a Docker network named traefik-public. Configured with:
  - Two entrypoints: web (80), websecure (443).
  - A certificate resolver named le (Let's Encrypt via DNS-01 or HTTP-01).
  - HTTP → HTTPS redirect.
  - Docker provider with exposedByDefault: false.
- Wildcard DNS + TLS cert for *.portal.example.com (or whatever your customer subdomain pattern is).
- The traefik-public Docker network exists:
  docker network create traefik-public # one-time
- The portal image is built (or pull-able from a registry):
  cd /path/to/portal
  docker compose -f docker-compose.prod.yml build
Provisioning a new customer
Goal: spin up an isolated stack for customer ABC0001 (Compose project abc0001 — lowercase required) at abc0001.portal.example.com.
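The lowercase requirement is easy to get wrong when scripting. A minimal sketch of deriving both names from the customer code; the helper names are hypothetical (not part of the repo) and the base domain is the placeholder used throughout this document:

```shell
# Hypothetical helpers: derive per-customer names from the customer code.
# Compose project names must be lowercase, so the code is lowercased first.
compose_project_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Second argument is the base domain; defaults to this doc's placeholder.
customer_host() {
  printf '%s.%s' "$(compose_project_name "$1")" "${2:-portal.example.com}"
}
```

For example, compose_project_name ABC0001 prints abc0001, which is the value to use for COMPOSE_PROJECT_NAME in step 3.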
1. Create the customer directory
A common pattern: one directory per customer holding only an .env file (the compose files are shared from the repo). Adjust to your fleet-management tool of choice (Ansible, Portainer, Helm-on-K8s later).
/srv/portal/abc0001/
└── .env
2. Generate strong secrets
openssl rand -base64 32 # POSTGRES_PASSWORD
openssl rand -base64 32 # GRAFANA_ADMIN_PASSWORD
openssl rand -base64 32 # Authentication__DefaultAdminPassword
3. Fill in .env
COMPOSE_PROJECT_NAME=abc0001
CUSTOMER_HOST=abc0001.portal.example.com
POSTGRES_DB=power_monitoring
POSTGRES_USER=power_user
POSTGRES_PASSWORD=<from step 2>
Authentication__DefaultAdminEmail=admin@abc0001.example.com
Authentication__DefaultAdminPassword=<from step 2>
GRAFANA_ADMIN_PASSWORD=<from step 2>
Grafana__EmbedPathPrefix=/grafana
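Before bringing the stack up, it can help to confirm that no secret was left blank. The following is a sketch only: the function name is hypothetical, and the list of keys treated as required is an assumption taken from the example .env above, not from the compose file.

```shell
# Print every required key that is missing or empty in the given .env file.
# Returns 0 when nothing is missing, 1 otherwise.
# Assumption: the "required" list mirrors the example .env in this section.
missing_env_keys() {
  env_file=$1
  status=0
  for key in COMPOSE_PROJECT_NAME CUSTOMER_HOST POSTGRES_DB POSTGRES_USER \
             POSTGRES_PASSWORD Authentication__DefaultAdminEmail \
             Authentication__DefaultAdminPassword GRAFANA_ADMIN_PASSWORD; do
    # A key counts as present only if "KEY=" is followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}
# Usage: missing_env_keys /srv/portal/abc0001/.env || echo "fix .env before 'up -d'"
```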
4. Decide Grafana auth mode
Anonymous is off in the prod compose by default. Pick one of the three options from the README's Security notes and wire it before exposing the stack to anyone:
- (a) Traefik forwardAuth → add the middleware to the traefik.http.routers.${COMPOSE_PROJECT_NAME}-grafana labels and implement /api/auth/check on the portal.
- (b) Grafana auth.proxy → set GF_AUTH_PROXY_ENABLED=true, GF_AUTH_PROXY_HEADER_NAME=X-WEBAUTH-USER env vars; ensure Traefik (or the portal) sets the header and that no client can.
- (c) Render tokens → minted by a portal endpoint; SPA appends ?auth_token=... to the iframe URL.
Without any of these, Grafana refuses anonymous access in prod (intended safe default — iframe will show a login page).
5. Bring it up
cd /srv/portal/abc0001
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml up -d
6. Verify
# Wait for healthy
docker ps --filter "label=com.docker.compose.project=abc0001"
# Health checks
curl -fs https://abc0001.portal.example.com/health # → Healthy
curl -fs https://abc0001.portal.example.com/health/ready # → Healthy
# Migration + seed in the logs
docker logs abc0001_portal | grep -E "Applied migration|Seeded|hypertable"
# expect:
# Applied migration 'InitialCreate'
# TimescaleDB hypertable for monitoring.PowerMeasurements is ready
# Seeded default admin admin@abc0001.example.com
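The log check in step 6 can be made pass/fail instead of eyeballed. A minimal sketch, assuming the three first-start markers listed above are the ones that matter; the function name is hypothetical:

```shell
# Read portal log text on stdin and print each expected first-start marker
# that is absent. Returns 0 only when all markers were found.
check_first_start_logs() {
  logs=$(cat)
  status=0
  for marker in "Applied migration" \
                "TimescaleDB hypertable for monitoring.PowerMeasurements is ready" \
                "Seeded default admin"; do
    # -F: match the marker as a fixed string, not a regex.
    if ! printf '%s\n' "$logs" | grep -Fq "$marker"; then
      echo "missing: $marker"
      status=1
    fi
  done
  return "$status"
}
# Usage (container name as in the example above):
#   docker logs abc0001_portal 2>&1 | check_first_start_logs
```

This only applies to a first start: on later starts the seed line is legitimately absent.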
7. First login + handover
- Sign in as Authentication__DefaultAdminEmail with the password from step 2.
- Settings → Users → create the customer's real admin account; toggle Admin on.
- Sign out, sign in as the customer admin, change the default admin password (or delete the default admin account if the customer admin is the only one needed).
- Settings → Branding → upload customer logo, apply colours.
- Settings → Rates → seed at least one municipality + tariff for cost calc.
- Sites → create the customer's sites/devices so the ingest pipeline knows where measurements belong.
Updating a customer's stack
Code-only update (no migrations, no compose changes)
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
up -d --build portal
Brief downtime while the new container starts. The DB is untouched.
Update with new migrations
Same command — MigrateAsync on startup applies pending migrations before the app accepts traffic. Watch the logs:
docker logs -f abc0001_portal | grep -E "Applied migration|Failed|hypertable"
If a migration fails the container will exit; fix forward, push a corrected image, retry.
Compose changes (env vars, ports, labels)
Edit the customer's .env (or the central docker-compose.prod.yml) and:
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml up -d
Compose recreates only the containers whose definition changed.
Rolling many customers
There's no built-in fan-out — pick your orchestrator (Ansible playbook, simple bash loop, Portainer stacks). Update one customer first, verify, then roll the rest.
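As one possible shape for the "simple bash loop" option, here is a sketch that assumes the one-directory-per-customer layout from provisioning step 1. The function name and the DRY_RUN switch are hypothetical; the docker compose invocation is the same one used elsewhere in this document.

```shell
# Roll the shared compose file across every customer directory under a root.
# Assumptions: each customer directory holds a .env; DRY_RUN=1 prints the
# commands instead of running them; the roll stops at the first failure.
roll_customers() {
  root=$1
  compose_file=$2
  for env_file in "$root"/*/.env; do
    [ -f "$env_file" ] || continue   # skip when the glob matched nothing
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker compose --env-file $env_file -f $compose_file up -d"
    else
      docker compose --env-file "$env_file" -f "$compose_file" up -d || return 1
    fi
  done
}
# Usage: DRY_RUN=1 roll_customers /srv/portal /path/to/portal/docker-compose.prod.yml
```

Running it with DRY_RUN=1 first shows which customers would be touched, which fits the "update one, verify, then roll the rest" advice.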
Rotating secrets
Database password
# 1. Change the password inside Postgres
docker exec -it abc0001_timescale psql -U power_user -d power_monitoring \
-c "ALTER USER power_user WITH PASSWORD '<new>';"
# 2. Update .env ('|' as the sed delimiter: base64 secrets can contain '/')
sed -i 's|^POSTGRES_PASSWORD=.*|POSTGRES_PASSWORD=<new>|' .env
# 3. Recreate the portal + grafana to pick up new env vars
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
up -d portal grafana
Grafana admin password
sed -i 's|^GRAFANA_ADMIN_PASSWORD=.*|GRAFANA_ADMIN_PASSWORD=<new>|' .env
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
up -d grafana
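Caveat: Grafana applies GF_SECURITY_ADMIN_PASSWORD only when it first initialises its database, so with the persistent <PREFIX>_grafana-data volume the recreate alone does not change an existing admin password. Also reset it inside the Grafana container (via docker exec) with grafana cli admin reset-admin-password '<new>', and keep .env in sync for any future fresh volume.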
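Both rotations above edit .env with sed, which interprets the new secret as replacement text, so characters special to sed (the delimiter, &, backslashes) need care. A minimal sketch of an alternative that never passes the value through sed; the function name is hypothetical:

```shell
# Set KEY=VALUE in an env file: drop any existing assignment, then append.
# The value is written with printf, so '/', '+', '=' and '&' need no escaping.
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Copy every line except the existing assignment for this key.
  # grep exits 1 when it outputs nothing; only a real error (2) aborts.
  grep -v "^${key}=" "$file" > "$tmp" || [ $? -eq 1 ] || return 1
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the original file keeps its owner and mode.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
# Usage: set_env_var .env POSTGRES_PASSWORD "$(openssl rand -base64 32)"
```

The updated key moves to the end of the file; line order is not significant in a .env file.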
Default admin password
Once the customer admin exists and has changed their own password, the default admin can be deleted from the Settings → Users UI. After that, Authentication__DefaultAdminPassword is only used if the row is re-seeded (which happens only when no account with that email exists).
Backup & restore
What to back up
| Volume | What's in it | Frequency |
|---|---|---|
| <PREFIX>_timescale-data | All customer data (Identity, branding, tariffs, sites, devices, measurements) | Daily, more for high-write customers |
| <PREFIX>_grafana-data | Grafana's internal SQLite (user prefs, plugin state). Dashboards re-provision from JSON so this is not authoritative. | Weekly is plenty |
| <PREFIX>_portal-branding | Uploaded logos | Daily |
| <PREFIX>_portal-keys | Data Protection key ring (cookie signing). Losing this invalidates all sessions but doesn't lose data. | Weekly |
Postgres dump
docker exec abc0001_timescale \
pg_dump -U power_user -d power_monitoring -F c -f /tmp/backup.dump
docker cp abc0001_timescale:/tmp/backup.dump ./abc0001-$(date +%Y%m%d).dump
docker exec abc0001_timescale rm /tmp/backup.dump
For consistent hypertable backups, use the pg_dump bundled in the TimescaleDB container, as the command above does; it handles hypertables natively on PG12+.
Volume snapshot
For non-DB volumes, simplest is a tar from the volume's mountpoint, or use your storage layer's snapshot facility (LVM, ZFS, EBS, etc.).
Restore
# Fresh DB
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env down timescaledb
docker volume rm abc0001_timescale-data
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env up -d timescaledb
# Restore
docker cp abc0001-YYYYMMDD.dump abc0001_timescale:/tmp/backup.dump
docker exec abc0001_timescale \
pg_restore -U power_user -d power_monitoring --clean --if-exists /tmp/backup.dump
docker exec abc0001_timescale rm /tmp/backup.dump
# Start everything
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env up -d
The TimescaleBootstrapper is idempotent — it will not error on a restored hypertable.
Health & monitoring
Liveness / readiness
- GET /health: liveness. Use as Traefik / load-balancer health check.
- GET /health/ready: readiness (DB reachable). Use for orchestration "in service" decisions.
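A rollout or restore script can gate on the readiness endpoint rather than sleeping for a fixed time. A minimal sketch of a generic retry helper; the function name, attempt count, and delay are illustrative assumptions:

```shell
# Retry a command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
# Example: wait up to about a minute for readiness.
#   wait_until 30 2 curl -fs -o /dev/null https://abc0001.portal.example.com/health/ready
```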
Logs
Serilog writes JSON to stdout; the Docker logging driver of your choice (json-file, journald, gelf to a central log store) picks it up.
docker logs abc0001_portal --tail 200 --follow
Notable lines:
- "Database connection resolved via …": confirms how this container resolved its DB at startup.
- "Applied migration '…'": one per pending migration.
- "TimescaleDB hypertable for monitoring.PowerMeasurements is ready": bootstrapper succeeded.
- "Seeded default admin …": first start only; absence on subsequent starts is correct.
DB health from the host
docker exec abc0001_timescale pg_isready -U power_user -d power_monitoring
TimescaleDB chunks
docker exec -it abc0001_timescale psql -U power_user -d power_monitoring -c \
"SELECT chunk_name, range_start, range_end, total_bytes
FROM chunks_detailed_size('monitoring.\"PowerMeasurements\"');"
Troubleshooting
| Symptom | First check |
|---|---|
| Portal container restart-looping | docker logs <PREFIX>_portal — usually a missing env var (default-admin password in prod, missing Postgres password) or a migration failure. |
| /health/ready returns Unhealthy | Postgres container down, or wrong creds. docker logs <PREFIX>_timescale. |
| Grafana iframe loads but no charts | Datasource UID mismatch — confirm grafana/provisioning/datasources/timescaledb.yml has uid: timescaledb and the dashboard JSON references the same. |
| Grafana iframe shows login screen in prod | Expected if no auth mode is wired yet (anonymous off by default). Pick a mode (see README → Security). |
| Branded logo missing after restart | <PREFIX>_portal-branding volume not mounted, or filesystem perms wrong. The container runs as user app; volume must be writable by uid 1000. |
| Ingest returns accepted: 0, rejected: N | Devices don't exist for those externalIds. Create them via the Sites screen first. |
| Cookie auth seems random / sessions lost on restart | <PREFIX>_portal-keys volume not mounted — Data Protection re-keys on every start. |
| Hypertable error on startup | Pre-existing non-empty plain table being converted. migrate_data => TRUE should handle it; if not, restore from backup and check for manual monitoring."PowerMeasurements" schema changes. |
Decommissioning a customer
# Final backup
docker exec abc0001_timescale pg_dump -U power_user -d power_monitoring \
-F c -f /tmp/final.dump
docker cp abc0001_timescale:/tmp/final.dump ./abc0001-final-$(date +%Y%m%d).dump
# Stop and remove containers
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml down
# Remove volumes (destroys data — confirm backup first)
docker volume rm \
abc0001_timescale-data \
abc0001_grafana-data \
abc0001_portal-branding \
abc0001_portal-keys
# Remove customer dir
rm -rf /srv/portal/abc0001
# DNS record + cert (manual or via your DNS automation)
# If using the fleet aggregator: also delete the customer from the Admin
# Customers page (UI Delete) or via psql against the central DB:
# DELETE FROM fleet."Customers" WHERE "Code" = 'ABC0001';
# (cascades to Sites, Devices, PowerMeasurements, IngestEvents)
Fleet aggregator (Admin stack)
For background and the full design see docs/FLEET-DESIGN.md. This section covers the day-to-day ops.
One-time: provisioning the Admin stack
# 1. Create a dedicated Postgres DB for the central fleet
docker exec <timescale-container> createdb -U power_user admin_fleet
# 2. Spin up the Admin portal (same image as a customer stack, different env)
docker run -d --name admin-portal --restart unless-stopped \
--network <shared-network> \
-e Application__RunMode=Admin \
-e ASPNETCORE_ENVIRONMENT=Production \
-e Application__PublicUrl=https://admin.portal.example.com \
-e Database__ConnectionString='Host=<host>;Port=5432;Database=admin_fleet;Username=power_user;Password=<secret>' \
-e Authentication__DefaultAdminEmail=ops@yourco.example \
-e Authentication__DefaultAdminPassword=<strong> \
-v admin-portal-keys:/data/keys \
-v admin-portal-branding:/data/branding \
tau-acuvim-portal:latest
# 3. (Optional) Spin up an Admin-side Grafana pointed at admin_fleet
docker run -d --name admin-grafana --restart unless-stopped \
--network <shared-network> \
-e GF_SECURITY_ADMIN_PASSWORD=<strong> \
-e GF_SECURITY_ALLOW_EMBEDDING=true \
-e GF_AUTH_ANONYMOUS_ENABLED=false \
-e POSTGRES_DB=admin_fleet \
-e POSTGRES_USER=power_user \
-e POSTGRES_PASSWORD=<secret> \
-v admin-grafana-data:/var/lib/grafana \
-v /srv/portal/grafana/provisioning:/etc/grafana/provisioning:ro \
-v /srv/portal/grafana/dashboards-admin:/var/lib/grafana/dashboards:ro \
grafana/grafana:11.4.0
Behind Traefik: add labels on admin-portal and admin-grafana mirroring the per-customer pattern, with Host(`admin.portal.example.com`) and (for Grafana) && PathPrefix(`/grafana`). Choose a Grafana auth mode from README Security (forwardAuth / auth.proxy / render tokens) before exposing.
Onboarding a new customer end-to-end
# A. Admin side — register and capture token (one-time per customer)
# 1. Sign in to https://admin.portal.example.com
# 2. Customers → "Register customer" → Code=ABC0001, Name=Acme Corp
# 3. Copy the token shown ONCE.
# B. Customer side — spin up their stack (per OPERATIONS "Provisioning a new customer")
# AND add to their .env:
cat >> /srv/portal/abc0001/.env <<EOF
Application__RunMode=Client
FleetIngest__Enabled=true
FleetIngest__Url=https://admin.portal.example.com/api/fleet/ingest
FleetIngest__Token=<token from step A.3>
FleetIngest__IntervalSeconds=60
FleetIngest__BatchSize=5000
EOF
# Then restart the customer's portal so the push service starts.
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env \
up -d portal
# C. Verify
# 1. In Admin UI → Customers, ABC0001 should show "Last push" advance within a minute.
# 2. Click the row → Customer detail → "Recent ingest" tab should list sites/devices
# batches (and measurements once any are ingested locally).
# 3. From the host:
docker exec <admin-timescale> psql -U power_user -d admin_fleet -c \
'SELECT "BatchType","RowsAccepted","ReceivedAt" FROM fleet."IngestEvents" ORDER BY "ReceivedAt" DESC LIMIT 10;'
Common ops
| What | Where |
|---|---|
| Rotate a customer's push token | Admin UI → Customers → row's "Rotate token" button. Update customer's .env and restart their portal. Brief push gap (until restart) is expected. |
| Disable a customer (stop accepting their data) | Admin UI → Customers → Edit → Active off. Ingest returns 401 immediately; data already in fleet.* is untouched. |
| Investigate "why hasn't ABC0001 shown up?" | Customer detail page → Recent ingest tab. Check for 401s, rejected rows, error messages. Or: SELECT * FROM fleet."IngestEvents" WHERE "CustomerId" = '<id>' ORDER BY "ReceivedAt" DESC; |
| Inspect compression | SELECT * FROM hypertable_compression_stats('fleet."PowerMeasurements"'); |
| Force a continuous aggregate refresh | CALL refresh_continuous_aggregate('fleet.hourly_per_device', NULL, NULL); |
| Decommission a customer from the fleet | Admin UI → Customers → Delete (cascades sites/devices/measurements/events). Customer's local stack is untouched; their portal will get 401s on push until they disable FleetIngest__Enabled or you re-register them. |
Backing up the central DB
Same pg_dump pattern as a customer DB (see above), targeting admin_fleet. Includes hypertable chunks; restore with pg_restore then run the Admin portal once to re-bootstrap the continuous aggregate refresh policy (FleetTimescaleBootstrapper is idempotent).