# Tau Acuvim Portal: Operations

Per-customer deployment loop. For background, architecture, and security model, read the [README](./README.md) first.

---

## Contents

1. [Prerequisites (per host)](#prerequisites-per-host)
2. [Provisioning a new customer](#provisioning-a-new-customer)
3. [Updating a customer's stack](#updating-a-customers-stack)
4. [Rotating secrets](#rotating-secrets)
5. [Backup & restore](#backup--restore)
6. [Health & monitoring](#health--monitoring)
7. [Troubleshooting](#troubleshooting)
8. [Decommissioning a customer](#decommissioning-a-customer)
9. [Fleet aggregator (Admin stack)](#fleet-aggregator-admin-stack)

---

## Prerequisites (per host)

These exist once on the host running customer stacks, not per customer.

1. **Docker Engine** (or Docker Desktop on Windows hosts).
2. **External Traefik instance** running on the same host, joined to a Docker network named `traefik-public`. Configured with:
   - Two entrypoints: `web` (80), `websecure` (443).
   - A certificate resolver named `le` (Let's Encrypt via DNS-01 or HTTP-01).
   - HTTP → HTTPS redirect.
   - Docker provider with `exposedByDefault: false`.
3. **Wildcard DNS + TLS cert** for `*.portal.example.com` (or whatever your customer subdomain pattern is).
4. **The `traefik-public` Docker network exists**:
   ```
   docker network create traefik-public   # one-time
   ```
5. **The portal image is built** (or pull-able from a registry):
   ```
   cd /path/to/portal
   docker compose -f docker-compose.prod.yml build
   ```

---

## Provisioning a new customer

Goal: spin up an isolated stack for customer `ABC0001` (Compose project `abc0001`, lowercase required) at `abc0001.portal.example.com`.

### 1. Create the customer directory

A common pattern: one directory per customer holding only an `.env` file (the compose files are shared from the repo). Adjust to your fleet-management tool of choice (Ansible, Portainer, Helm-on-K8s later).

```
/srv/portal/abc0001/
└── .env
```

### 2. Generate strong secrets

```bash
openssl rand -base64 32   # POSTGRES_PASSWORD
openssl rand -base64 32   # GRAFANA_ADMIN_PASSWORD
openssl rand -base64 32   # Authentication__DefaultAdminPassword
```

### 3. Fill in `.env`

Copy `.env.example` to the customer's directory and fill in:

```ini
COMPOSE_PROJECT_NAME=abc0001
CUSTOMER_HOST=abc0001.portal.example.com
Application__RunMode=Client

POSTGRES_DB=power_monitoring
POSTGRES_USER=power_user
POSTGRES_PASSWORD=

Authentication__DefaultAdminEmail=admin@abc0001.example.com
Authentication__DefaultAdminPassword=

GRAFANA_ADMIN_PASSWORD=
Grafana__EmbedPathPrefix=/grafana

# Pre-brand the stack so the customer's first sign-in already shows their
# colours and name. Only applied on first boot; later changes are via the UI.
WhiteLabel__ApplicationName=Acme Corp Power Monitoring
WhiteLabel__PrimaryColor=#0c4a6e
WhiteLabel__SecondaryColor=#0e7490
WhiteLabel__AccentColor=#06b6d4
WhiteLabel__FooterText=© Acme Corp
```

See `.env.example` for the full annotated set including the optional `FleetIngest__*` block (added later, when you enable fleet aggregation for this customer).

### 4. Grafana auth (already wired)

Production Grafana embedding uses Traefik `forwardAuth` → portal `/api/auth/check`, defined inline on the Grafana router in `docker-compose.prod.yml`. Every Grafana sub-request is gated on a valid portal cookie; anonymous access is off. No per-customer action required.

To switch to a different mode (e.g. `auth.proxy` for per-user Grafana folders), see the README section on Grafana embedding and production auth.

### 5. Bring it up

```bash
cd /srv/portal/abc0001
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml up -d
```

### 6. Verify

```bash
# Wait for healthy
docker ps --filter "label=com.docker.compose.project=abc0001"

# Health checks
curl -fs https://abc0001.portal.example.com/health         # → Healthy
curl -fs https://abc0001.portal.example.com/health/ready   # → Healthy

# Migration + seed in the logs
docker logs abc0001_portal | grep -E "Applied migration|Seeded|hypertable"
# expect:
#   Applied migration 'InitialCreate'
#   TimescaleDB hypertable for monitoring.PowerMeasurements is ready
#   Seeded default admin admin@abc0001.example.com
```

### 7. First login + handover

1. Sign in as `Authentication__DefaultAdminEmail` with the password from step 2.
2. **Settings → Users** → create the customer's real admin account; toggle Admin on.
3. Sign out, sign in as the customer admin, **change the default admin password** (or delete the default admin account if the customer admin is the only one needed).
4. **Settings → Branding** → upload customer logo, apply colours.
5. **Settings → Rates** → seed at least one municipality + tariff for cost calc.
6. **Sites** → create the customer's sites/devices so the ingest pipeline knows where measurements belong.

---

## Updating a customer's stack

### Code-only update (no migrations, no compose changes)

```bash
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
  up -d --build portal
```

Expect brief downtime while the new container starts. The DB is untouched.

### Update with new migrations

Same command: `MigrateAsync` on startup applies pending migrations before the app accepts traffic. Watch the logs:

```bash
docker logs -f abc0001_portal | grep -E "Applied migration|Failed|hypertable"
```

If a migration fails, the container exits; fix forward, push a corrected image, and retry.
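After an update, it can help to block until the readiness endpoint answers before moving on to the next customer. The following is a minimal sketch of such a poll, not part of the portal tooling: the `wait_ready` function name, its defaults (30 attempts, 2 seconds apart), and the `PROBE` override variable are all assumptions introduced here for illustration. Only `curl -fs` against `/health/ready` comes from the steps above.

```bash
#!/bin/sh
# Hypothetical helper: poll a readiness URL until it succeeds or attempts run out.
# Usage: wait_ready URL [ATTEMPTS] [DELAY_SECONDS]
# PROBE (assumed name) overrides the probe command; it defaults to curl.
wait_ready() {
  local url attempts delay i
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Unquoted on purpose so the default splits into "curl", "-fs", "-o", "/dev/null".
    if ${PROBE:-curl -fs -o /dev/null} "$url"; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempt(s)" >&2
  return 1
}
```

For example, `wait_ready https://abc0001.portal.example.com/health/ready || docker logs abc0001_portal --tail 50` prints the recent logs only when the stack never became ready.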
### Compose changes (env vars, ports, labels)

Edit the customer's `.env` (or the central `docker-compose.prod.yml`) and:

```bash
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml up -d
```

Compose recreates only the containers whose definition changed.

### Rolling many customers

There's no built-in fan-out; pick your orchestrator (Ansible playbook, simple bash loop, Portainer stacks). Update one customer first, verify, then roll the rest.

---

## Rotating secrets

### Database password

```bash
# 1. Change the password inside Postgres
docker exec -it abc0001_timescale psql -U power_user -d power_monitoring \
  -c "ALTER USER power_user WITH PASSWORD '';"

# 2. Update .env
sed -i 's/^POSTGRES_PASSWORD=.*/POSTGRES_PASSWORD=/' .env

# 3. Recreate the portal + grafana to pick up new env vars
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
  up -d portal grafana
```

### Grafana admin password

```bash
sed -i 's/^GRAFANA_ADMIN_PASSWORD=.*/GRAFANA_ADMIN_PASSWORD=/' .env
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
  up -d grafana
```

`GF_SECURITY_ADMIN_PASSWORD` is re-applied on container start.

### Default admin password

Once the customer admin exists and has changed their own password, the default admin can be deleted from the **Settings → Users** UI. After that, `Authentication__DefaultAdminPassword` is only used if the row is re-seeded (which happens only when no account with that email exists).

---

## Backup & restore

### What to back up

| Volume | What's in it | Frequency |
|---|---|---|
| `_timescale-data` | All customer data (Identity, branding, tariffs, sites, devices, measurements) | Daily, more often for high-write customers |
| `_grafana-data` | Grafana's internal SQLite (user prefs, plugin state). Dashboards re-provision from JSON so this is **not authoritative**. | Weekly is plenty |
| `_portal-branding` | Uploaded logos | Daily |
| `_portal-keys` | Data Protection key ring (cookie signing). Losing this invalidates all sessions but doesn't lose data. | Weekly |

### Postgres dump

```bash
docker exec abc0001_timescale \
  pg_dump -U power_user -d power_monitoring -F c -f /tmp/backup.dump
docker cp abc0001_timescale:/tmp/backup.dump ./abc0001-$(date +%Y%m%d).dump
docker exec abc0001_timescale rm /tmp/backup.dump
```

For consistent hypertable backups, use the `pg_dump` shipped in the Timescale container, as the commands above do; it supports hypertables natively as of PG12+.

### Volume snapshot

For non-DB volumes, the simplest option is a `tar` from the volume's mountpoint, or use your storage layer's snapshot facility (LVM, ZFS, EBS, etc.).

### Restore

```bash
# Fresh DB
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env down timescaledb
docker volume rm abc0001_timescale-data
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env up -d timescaledb

# Restore
docker cp abc0001-YYYYMMDD.dump abc0001_timescale:/tmp/backup.dump
docker exec abc0001_timescale \
  pg_restore -U power_user -d power_monitoring --clean --if-exists /tmp/backup.dump
docker exec abc0001_timescale rm /tmp/backup.dump

# Start everything
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env up -d
```

The TimescaleBootstrapper is idempotent: it will not error on a restored hypertable.

---

## Health & monitoring

### Liveness / readiness

- `GET /health`: liveness. Use as the Traefik / load-balancer health check.
- `GET /health/ready`: readiness (DB reachable). Use for orchestration "in service" decisions.

### Logs

Serilog writes JSON to stdout; the Docker logging driver of your choice (json-file, journald, gelf to a central log store) picks it up.

```bash
docker logs abc0001_portal --tail 200 --follow
```

Notable lines:

- `Database connection resolved via …`: confirms how this container resolved its DB at startup.
- `Applied migration '…'`: one per pending migration.
- `TimescaleDB hypertable for monitoring.PowerMeasurements is ready`: bootstrapper succeeded.
- `Seeded default admin …`: first start only; absence on subsequent starts is correct.

### DB health from the host

```bash
docker exec abc0001_timescale pg_isready -U power_user -d power_monitoring
```

### TimescaleDB chunks

```bash
# chunks_detailed_size() has no range columns, so join it to timescaledb_information.chunks
docker exec -it abc0001_timescale psql -U power_user -d power_monitoring -c \
  "SELECT c.chunk_name, c.range_start, c.range_end, s.total_bytes
   FROM timescaledb_information.chunks c
   JOIN chunks_detailed_size('monitoring.\"PowerMeasurements\"') s USING (chunk_schema, chunk_name)
   WHERE c.hypertable_schema = 'monitoring' AND c.hypertable_name = 'PowerMeasurements';"
```

---

## Troubleshooting

| Symptom | First check |
|---|---|
| Portal container restart-looping | `docker logs _portal`. Usually a missing env var (default-admin password in prod, missing Postgres password) or a migration failure. |
| `/health/ready` returns Unhealthy | Postgres container down, or wrong creds. `docker logs _timescale`. |
| Grafana iframe loads but no charts | Datasource UID mismatch. Confirm `grafana/provisioning/datasources/timescaledb.yml` has `uid: timescaledb` and the dashboard JSON references the same. |
| Grafana iframe shows login screen in prod | Expected if no auth mode is wired yet (anonymous off by default). Pick a mode (see README → Security). |
| Branded logo missing after restart | `_portal-branding` volume not mounted, or filesystem perms wrong. The container runs as user `app`; volume must be writable by uid 1000. |
| Ingest returns `accepted: 0, rejected: N` | Devices don't exist for those `externalId`s. Create them via the Sites screen first. |
| Cookie auth seems random / sessions lost on restart | `_portal-keys` volume not mounted, so Data Protection re-keys on every start. |
| Hypertable error on startup | Pre-existing non-empty plain table being converted. `migrate_data => TRUE` should handle it; if not, restore from backup and check for manual `monitoring."PowerMeasurements"` schema changes. |

---

## Decommissioning a customer

```bash
# Final backup
docker exec abc0001_timescale pg_dump -U power_user -d power_monitoring \
  -F c -f /tmp/final.dump
docker cp abc0001_timescale:/tmp/final.dump ./abc0001-final-$(date +%Y%m%d).dump

# Stop and remove containers
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml down

# Remove volumes (destroys data; confirm backup first)
docker volume rm \
  abc0001_timescale-data \
  abc0001_grafana-data \
  abc0001_portal-branding \
  abc0001_portal-keys

# Remove customer dir
rm -rf /srv/portal/abc0001

# DNS record + cert (manual or via your DNS automation)

# If using the fleet aggregator: also delete the customer from the Admin
# Customers page (UI Delete) or via psql against the central DB:
#   DELETE FROM fleet."Customers" WHERE "Code" = 'ABC0001';
#   (cascades to Sites, Devices, PowerMeasurements, IngestEvents)
```

---

## Fleet aggregator (Admin stack)

For background and the full design see [docs/FLEET-DESIGN.md](./docs/FLEET-DESIGN.md). This section covers the day-to-day ops.

### One-time: provisioning the Admin stack

```bash
# 1. Create a dedicated Postgres DB for the central fleet
docker exec createdb -U power_user admin_fleet

# 2. Spin up the Admin portal (same image as a customer stack, different env)
docker run -d --name admin-portal --restart unless-stopped \
  --network \
  -e Application__RunMode=Admin \
  -e ASPNETCORE_ENVIRONMENT=Production \
  -e Application__PublicUrl=https://admin.portal.example.com \
  -e Database__ConnectionString='Host=;Port=5432;Database=admin_fleet;Username=power_user;Password=' \
  -e Authentication__DefaultAdminEmail=ops@yourco.example \
  -e Authentication__DefaultAdminPassword= \
  -v admin-portal-keys:/data/keys \
  -v admin-portal-branding:/data/branding \
  tau-acuvim-portal:latest

# 3. (Optional) Spin up an Admin-side Grafana pointed at admin_fleet
docker run -d --name admin-grafana --restart unless-stopped \
  --network \
  -e GF_SECURITY_ADMIN_PASSWORD= \
  -e GF_SECURITY_ALLOW_EMBEDDING=true \
  -e GF_AUTH_ANONYMOUS_ENABLED=false \
  -e POSTGRES_DB=admin_fleet \
  -e POSTGRES_USER=power_user \
  -e POSTGRES_PASSWORD= \
  -v admin-grafana-data:/var/lib/grafana \
  -v /srv/portal/grafana/provisioning:/etc/grafana/provisioning:ro \
  -v /srv/portal/grafana/dashboards-admin:/var/lib/grafana/dashboards:ro \
  grafana/grafana:11.4.0
```

Behind Traefik: add labels on `admin-portal` and `admin-grafana` mirroring the per-customer pattern, with ``Host(`admin.portal.example.com`)`` and (for Grafana) ``&& PathPrefix(`/grafana`)``. Choose a Grafana auth mode from README Security (forwardAuth / auth.proxy / render tokens) before exposing.

### Onboarding a new customer end-to-end

```bash
# A. Admin side: register and capture token (one-time per customer)
#   1. Sign in to https://admin.portal.example.com
#   2. Customers → "Register customer" → Code=ABC0001, Name=Acme Corp
#   3. Copy the token shown ONCE.

# B. Customer side: spin up their stack (per OPERATIONS "Provisioning a new customer")
#    AND add to their .env:
cat >> /srv/portal/abc0001/.env <<EOF
FleetIngest__IntervalSeconds=60
FleetIngest__BatchSize=5000
EOF
#   3. Restart the customer's portal so the push service starts.
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env \
  up -d portal

# C. Verify
#   1. In Admin UI → Customers, ABC0001 should show "Last push" advance within a minute.
#   2. Click the row → Customer detail → "Recent ingest" tab should list sites/devices
#      batches (and measurements once any are ingested locally).
#   3. From the host:
docker exec psql -U power_user -d admin_fleet -c \
  'SELECT "BatchType","RowsAccepted","ReceivedAt" FROM fleet."IngestEvents" ORDER BY "ReceivedAt" DESC LIMIT 10;'
```

### Common ops

| What | Where |
|---|---|
| Rotate a customer's push token | Admin UI → Customers → row's "Rotate token" button. Update customer's `.env` and restart their portal. Brief push gap (until restart) is expected. |
| Disable a customer (stop accepting their data) | Admin UI → Customers → Edit → Active off. Ingest returns 401 immediately; data already in `fleet.*` is untouched. |
| Investigate "why hasn't ABC0001 shown up?" | Customer detail page → Recent ingest tab. Check for 401s, rejected rows, error messages. Or: `SELECT * FROM fleet."IngestEvents" WHERE "CustomerId" = '' ORDER BY "ReceivedAt" DESC;` |
| Inspect compression | `SELECT * FROM hypertable_compression_stats('fleet."PowerMeasurements"');` |
| Force a continuous aggregate refresh | `CALL refresh_continuous_aggregate('fleet.hourly_per_device', NULL, NULL);` |
| Decommission a customer from the fleet | Admin UI → Customers → Delete (cascades sites/devices/measurements/events). Customer's local stack is untouched; their portal will get 401s on push until they disable `FleetIngest__Enabled` or you re-register them. |

### Backing up the central DB

Same `pg_dump` pattern as a customer DB (see above), targeting `admin_fleet`. The dump includes hypertable chunks; restore with `pg_restore`, then run the Admin portal once to re-bootstrap the continuous aggregate refresh policy (`FleetTimescaleBootstrapper` is idempotent).
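
The central backup can be wrapped the same way as the per-customer dump. This is a rough sketch under stated assumptions: the function names `dump_name` and `backup_fleet_db` are hypothetical, and the Postgres container name is passed in as an argument because this document does not state it. The `pg_dump`, `docker cp`, and cleanup steps mirror the customer dump shown earlier.

```bash
#!/bin/sh
# Hypothetical helper: print a dated dump file name, e.g. admin_fleet-20240131.dump.
# Usage: dump_name PREFIX [YYYYMMDD]
dump_name() {
  local prefix stamp
  prefix="$1"
  stamp="${2:-$(date +%Y%m%d)}"
  printf '%s-%s.dump\n' "$prefix" "$stamp"
}

# Hypothetical helper: dump the central DB from a running Postgres container.
# Usage: backup_fleet_db CONTAINER [DB] [USER]
# CONTAINER is an assumption: supply the name of your central Postgres container.
backup_fleet_db() {
  local container db user out
  container="$1"
  db="${2:-admin_fleet}"
  user="${3:-power_user}"
  out="$(dump_name "$db")"
  # Dump inside the container, copy it out, clean up, then report the file name.
  docker exec "$container" pg_dump -U "$user" -d "$db" -F c -f /tmp/backup.dump &&
    docker cp "$container:/tmp/backup.dump" "./$out" &&
    docker exec "$container" rm /tmp/backup.dump &&
    echo "$out"
}
```

Calling `backup_fleet_db` with your central Postgres container name writes the dump to the current directory and prints its file name, which makes it easy to hand off to an upload or rotation step.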