New top-level portal/ project, peer to console/ and firmware/. Delivers a .NET 10 + React 18 + TimescaleDB + Grafana stack, one container set per customer behind Traefik. Built in 12 phases per FrontEndPrompt spec; no changes to existing console or firmware.

Backend (src/Tau.Acuvim.Portal/):
- .NET 10 minimal API, Serilog, ASP.NET Identity (cookie auth, lockout).
- Single AppDbContext with identity / app / monitoring schemas.
- MigrateAsync + TimescaleBootstrapper (idempotent hypertable creation) + IdentityBootstrapper (seeded admin + branding) on startup.
- Pure CostCalculator + DB-backed RateService for tariffs (effective-dated, TOU periods, VAT, fixed charges, per-municipality timezone).
- BrandingService with logo upload to mounted volume.
- Time-series ingest + bucketed query services (time_bucket aggregates, ON CONFLICT for idempotent re-delivery).
- ConfigOverviewService with redaction-by-construction (passwords never in payload).
- DataProtection keys persisted to /data/keys volume for cookie survival across container restarts.

Frontend (frontend/):
- React 18 + TypeScript + Vite + Ant Design 5 + TanStack Query.
- BrandingProvider + ThemedRoot for live re-themed white-labelling.
- RequireAuth / RequireRole guards.
- Pages: Login, Dashboard, Dashboards (embedded Grafana), Sites (admin), Settings tabs (Branding / Rates / Users / Grafana / App config).

Infra:
- Dev (docker-compose.yml) and prod (docker-compose.prod.yml) compose files. Three services per customer; Traefik subdomain + same-origin /grafana path-prefix routing wired with labels.
- Grafana 11 with provisioned timescaledb datasource (uid pinned) and starter power-overview.json dashboard with device template variable.
- Compose project name documented as lowercase (Compose v2 requirement).

Tests (tests/Tau.Acuvim.Portal.Tests/):
- xUnit, 40 tests. Covers CostCalculator (period match, TZ, overlap, VAT, fixed), ConnectionStringResolver (all 4 precedence branches incl. Production refusal), TariffValidator, DayOfWeekFlag.
- All passing locally against .NET 10.

Docs:
- README.md (onboarding + 11 spec sections), OPERATIONS.md (per-customer provisioning, secret rotation, backup, troubleshooting), TESTING.md (manual integration scenarios, frontend test scaffolding recipe).

Production safety guards:
- Refuses to start if Authentication:DefaultAdminPassword is unchanged default in Production.
- Refuses to start if Database:AutoProvisionLocalTimescaleDb=true in Production.
- Prod Grafana ships with anonymous off and auth mode unset (three options documented in README Security) so iframe refuses to load until a deliberate prod auth choice is made.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Tau Acuvim Portal — Operations

Per-customer deployment loop. For background, architecture, and security model, read the [README](./README.md) first.

---

## Contents

1. [Prerequisites (per host)](#prerequisites-per-host)
2. [Provisioning a new customer](#provisioning-a-new-customer)
3. [Updating a customer's stack](#updating-a-customers-stack)
4. [Rotating secrets](#rotating-secrets)
5. [Backup & restore](#backup--restore)
6. [Health & monitoring](#health--monitoring)
7. [Troubleshooting](#troubleshooting)
8. [Decommissioning a customer](#decommissioning-a-customer)

---
## Prerequisites (per host)

These exist once on the host running customer stacks; not per customer.

1. **Docker Engine** (or Docker Desktop on Windows hosts) with the Compose v2 plugin (`docker compose`).
2. **External Traefik instance** running on the same host, joined to a Docker network named `traefik-public`. Configured with:
   - Two entrypoints: `web` (80), `websecure` (443).
   - A certificate resolver named `le` (Let's Encrypt via DNS-01 or HTTP-01).
   - HTTP → HTTPS redirect.
   - Docker provider with `exposedByDefault: false`.
3. **Wildcard DNS + TLS cert** for `*.portal.example.com` (or whatever your customer subdomain pattern is). Let's Encrypt issues wildcard certificates only through the DNS-01 challenge; with HTTP-01, Traefik requests a separate certificate for each customer host instead.
4. **The `traefik-public` Docker network exists**:

   ```bash
   docker network create traefik-public  # one-time
   ```

5. **The portal image is built** (or pull-able from a registry):

   ```bash
   cd /path/to/portal
   docker compose -f docker-compose.prod.yml build
   ```
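Item 4 is a one-time step because `docker network create` fails when the network already exists. If host setup is scripted, a minimal idempotent sketch (the `ensure_network` function is hypothetical, not part of the repo) checks with `docker network inspect` first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a Docker network only if it does not exist yet.
# Usage: ensure_network traefik-public
ensure_network() {
  local name="${1:-traefik-public}"
  # `docker network inspect` exits non-zero when the network is missing.
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name" >/dev/null || return 1
    echo "created network $name"
  fi
}
```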
---

## Provisioning a new customer

Goal: spin up an isolated stack for customer `ABC0001` (Compose project `abc0001`; lowercase is required) at `abc0001.portal.example.com`.

### 1. Create the customer directory

A common pattern: one directory per customer holding only an `.env` file (the compose files are shared from the repo). Adjust to your fleet-management tool of choice (Ansible, Portainer, Helm-on-K8s later).

```
/srv/portal/abc0001/
└── .env
```
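The lowercase rule comes from Compose v2, which accepts only lowercase letters, digits, dashes, and underscores in project names, starting with a letter or digit. A minimal sketch for deriving and checking the name before creating the directory, assuming customer codes such as `ABC0001` simply map to their lowercase form; both function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for turning a customer code into a Compose project name.

# Lowercase a customer code, e.g. ABC0001 -> abc0001.
project_name_for() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Compose v2 rule: only lowercase letters, digits, '-' and '_',
# starting with a lowercase letter or digit.
is_valid_project_name() {
  # Reject empty input and embedded newlines (grep would test each line separately).
  [[ -n "$1" && "$1" != *$'\n'* ]] || return 1
  # LC_ALL=C keeps [a-z] limited to ASCII lowercase letters.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9_-]*$'
}

# Example:
#   name="$(project_name_for ABC0001)"
#   is_valid_project_name "$name" && mkdir -p "/srv/portal/$name"
```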
### 2. Generate strong secrets

```bash
openssl rand -base64 32  # POSTGRES_PASSWORD
openssl rand -base64 32  # GRAFANA_ADMIN_PASSWORD
openssl rand -base64 32  # Authentication__DefaultAdminPassword
```

### 3. Fill in `.env`

```ini
COMPOSE_PROJECT_NAME=abc0001
CUSTOMER_HOST=abc0001.portal.example.com

POSTGRES_DB=power_monitoring
POSTGRES_USER=power_user
POSTGRES_PASSWORD=<from step 2>

Authentication__DefaultAdminEmail=admin@abc0001.example.com
Authentication__DefaultAdminPassword=<from step 2>

GRAFANA_ADMIN_PASSWORD=<from step 2>
Grafana__EmbedPathPrefix=/grafana
```
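Steps 2 and 3 can also be scripted. The sketch below is illustrative only: `write_customer_env` and `gen_secret` are hypothetical helpers, the admin email is passed in rather than derived, and secrets are 32 random bytes from `/dev/urandom` encoded as base64, the same output shape as `openssl rand -base64 32`. It refuses to overwrite an existing `.env` and creates the file readable by its owner only:

```bash
#!/usr/bin/env bash
# Hypothetical helper that combines steps 2 and 3.
# Usage: write_customer_env <project> <base-domain> <admin-email> <customer-dir>
#   e.g. write_customer_env abc0001 portal.example.com admin@abc0001.example.com /srv/portal/abc0001

# 32 random bytes, base64-encoded on a single line (44 characters).
gen_secret() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

write_customer_env() {
  local project="$1" domain="$2" admin_email="$3" dir="$4"
  local env_file="$dir/.env"

  # Never overwrite an existing customer's secrets.
  if [[ -e "$env_file" ]]; then
    echo "refusing to overwrite $env_file" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1

  # umask 077 inside a subshell: the new file is readable by its owner only.
  (
    umask 077
    cat >"$env_file" <<EOF
COMPOSE_PROJECT_NAME=$project
CUSTOMER_HOST=$project.$domain

POSTGRES_DB=power_monitoring
POSTGRES_USER=power_user
POSTGRES_PASSWORD=$(gen_secret)

Authentication__DefaultAdminEmail=$admin_email
Authentication__DefaultAdminPassword=$(gen_secret)

GRAFANA_ADMIN_PASSWORD=$(gen_secret)
Grafana__EmbedPathPrefix=/grafana
EOF
  )
}
```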
### 4. Decide Grafana auth mode

Anonymous access is **off** in the prod compose by default. Pick one of the three options from the README's Security notes and wire it up before exposing the stack to anyone:

- (a) Traefik `forwardAuth`: add the middleware to the `traefik.http.routers.${COMPOSE_PROJECT_NAME}-grafana` labels and implement `/api/auth/check` on the portal.
- (b) Grafana `auth.proxy`: set the `GF_AUTH_PROXY_ENABLED=true` and `GF_AUTH_PROXY_HEADER_NAME=X-WEBAUTH-USER` env vars; ensure Traefik (or the portal) sets the header and strips any client-supplied copy, so no client can set it directly.
- (c) Render tokens: minted by a portal endpoint; the SPA appends `?auth_token=...` to the iframe URL.

Without any of these, Grafana refuses anonymous access in prod. This is the intended safe default: the iframe shows Grafana's login page.

### 5. Bring it up

```bash
cd /srv/portal/abc0001
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml up -d
```

### 6. Verify

```bash
# Repeat until every container's STATUS shows (healthy)
docker ps --filter "label=com.docker.compose.project=abc0001"

# Health checks
curl -fs https://abc0001.portal.example.com/health        # → Healthy
curl -fs https://abc0001.portal.example.com/health/ready  # → Healthy

# Migration + seed in the logs (2>&1 also captures anything the container wrote to stderr)
docker logs abc0001_portal 2>&1 | grep -E "Applied migration|Seeded|hypertable"
# expect:
#   Applied migration 'InitialCreate'
#   TimescaleDB hypertable for monitoring.PowerMeasurements is ready
#   Seeded default admin admin@abc0001.example.com
```
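For scripted provisioning, the log check can become a pass/fail gate. A minimal sketch (the `check_startup_logs` name is hypothetical) that reads logs on stdin and reports which of the expected first-start lines are missing; it matches plain substrings, so it works whether the lines arrive as text or inside Serilog JSON. On later restarts the seed line is absent by design, so use it only after a first start:

```bash
#!/usr/bin/env bash
# Hypothetical gate: fail unless every expected first-start marker appears in the logs.
# Usage: docker logs abc0001_portal 2>&1 | check_startup_logs
check_startup_logs() {
  local logs marker missing=0
  logs="$(cat)"  # buffer stdin so each marker can be searched separately
  for marker in \
    "Applied migration" \
    "TimescaleDB hypertable for monitoring.PowerMeasurements is ready" \
    "Seeded default admin"; do
    if ! grep -qF -- "$marker" <<<"$logs"; then
      echo "missing: $marker" >&2
      missing=1
    fi
  done
  return "$missing"
}
```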
### 7. First login + handover

1. Sign in with the email set in `Authentication__DefaultAdminEmail` and the `Authentication__DefaultAdminPassword` generated in step 2.
2. **Settings → Users**: create the customer's real admin account and toggle Admin on.
3. Sign out, sign in as the customer admin, and **change the default admin password** (or delete the default admin account if the customer admin is the only one needed).
4. **Settings → Branding**: upload the customer logo and apply colours.
5. **Settings → Rates**: seed at least one municipality + tariff so cost calculation works.
6. **Sites**: create the customer's sites/devices so the ingest pipeline knows where measurements belong.

---
## Updating a customer's stack

Run these commands from the customer directory (e.g. `/srv/portal/abc0001`); they reference `.env` by relative path.

### Code-only update (no migrations, no compose changes)

```bash
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
  up -d --build portal
```

Expect brief downtime while the new container starts. The DB is untouched.

### Update with new migrations

Same command: `MigrateAsync` on startup applies pending migrations before the app accepts traffic. Watch the logs:

```bash
docker logs -f abc0001_portal 2>&1 | grep -E "Applied migration|Failed|hypertable"
```

If a migration fails, the container exits; fix forward, build or push a corrected image, and retry.

### Compose changes (env vars, ports, labels)

Edit the customer's `.env` (or the central `docker-compose.prod.yml`), then run:

```bash
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml up -d
```

Compose recreates only the containers whose definition changed.

### Rolling many customers

There's no built-in fan-out, so pick your orchestrator (Ansible playbook, simple bash loop, Portainer stacks). Update one customer first, verify it, then roll out to the rest.
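As one example of the simple bash loop option, the sketch below updates a canary customer first and stops at the first failure. It assumes the `/srv/portal/<project>/.env` layout from the provisioning section and readiness at `https://$CUSTOMER_HOST/health/ready`; `update_one`, `verify_one`, `roll_customers`, and `PORTAL_COMPOSE_FILE` are hypothetical names, and the two per-customer steps are plain functions so they can be replaced with your own checks:

```bash
#!/usr/bin/env bash
# Hypothetical canary-first rollout over customer directories (/srv/portal/<project>/).
PORTAL_COMPOSE_FILE="${PORTAL_COMPOSE_FILE:-/path/to/portal/docker-compose.prod.yml}"

# Rebuild and restart one customer's portal (same command as a code-only update).
update_one() {
  (cd "$1" && docker compose --env-file .env -f "$PORTAL_COMPOSE_FILE" up -d --build portal)
}

# Poll /health/ready for about 60 s; curl retries refused connections and 5xx responses.
verify_one() {
  local host
  host="$(grep '^CUSTOMER_HOST=' "$1/.env" | cut -d= -f2-)"
  curl -fs --retry 10 --retry-delay 6 --retry-connrefused \
    "https://$host/health/ready" >/dev/null
}

# Usage: roll_customers <canary-dir> <all customer dirs...>
#   e.g. roll_customers /srv/portal/abc0001 /srv/portal/*/
roll_customers() {
  local canary="${1%/}" canary_done=0 dir
  shift
  for dir in "$canary" "$@"; do
    dir="${dir%/}"
    if [[ "$dir" == "$canary" ]]; then
      # The canary runs first; skip it when it reappears in the full list.
      (( canary_done )) && continue
      canary_done=1
    fi
    echo "updating $dir"
    if ! update_one "$dir" || ! verify_one "$dir"; then
      echo "stopping rollout: $dir failed" >&2
      return 1
    fi
  done
}
```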

---

## Rotating secrets

### Database password

```bash
NEW_PW="$(openssl rand -base64 32)"

# 1. Change the password inside Postgres
docker exec abc0001_timescale psql -U power_user -d power_monitoring \
  -c "ALTER USER power_user WITH PASSWORD '${NEW_PW}';"

# 2. Update .env ('|' as the sed delimiter, because base64 output can contain '/')
sed -i "s|^POSTGRES_PASSWORD=.*|POSTGRES_PASSWORD=${NEW_PW}|" .env

# 3. Recreate the portal + grafana to pick up new env vars
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
  up -d portal grafana
```

The TimescaleDB container only reads `POSTGRES_PASSWORD` when it initialises an empty data volume, so step 1 is what actually changes the password; step 2 keeps `.env` in sync for the services that connect with it.
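`sed` replacements break when the new value contains the delimiter (base64 output can contain `/`) or the replacement metacharacters `&` and `\`. A delimiter-independent alternative, sketched with a hypothetical `set_env_var` helper, rewrites the file with `awk` and passes the value through the environment so none of its characters are interpreted; the rewritten file is created by `mktemp` and therefore ends up readable by its owner only:

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file without sed escaping pitfalls.
# Usage: set_env_var POSTGRES_PASSWORD "$NEW_PW" .env
set_env_var() {
  local key="$1" value="$2" file="$3" tmp
  tmp="$(mktemp "${file}.XXXXXX")" || return 1   # temp file next to the target
  # KEY and VALUE travel via ENVIRON, so awk never treats them as code or regex.
  KEY="$key" VALUE="$value" awk '
    BEGIN { prefix = ENVIRON["KEY"] "=" }
    index($0, prefix) == 1 { print prefix ENVIRON["VALUE"]; found = 1; next }
    { print }
    END { if (!found) print prefix ENVIRON["VALUE"] }
  ' "$file" >"$tmp" || { rm -f "$tmp"; return 1; }
  mv "$tmp" "$file"
}
```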
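### Grafana admin password

```bash
NEW_PW="$(openssl rand -base64 32)"

# Change the stored admin password with the Grafana CLI...
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml \
  exec grafana grafana cli admin reset-admin-password "${NEW_PW}"

# ...and keep .env in sync for fresh volumes and restores
sed -i "s|^GRAFANA_ADMIN_PASSWORD=.*|GRAFANA_ADMIN_PASSWORD=${NEW_PW}|" .env
```

Grafana applies `GF_SECURITY_ADMIN_PASSWORD` only when it first initialises its `grafana-data` volume; once that volume exists, changing the env var and recreating the container does not change the password.

### Default admin password

Once the customer admin exists and has changed their own password, the default admin can be deleted from the **Settings → Users** UI. After that, `Authentication__DefaultAdminPassword` is only used if the account is re-seeded, which happens only when no account with that email exists. After a deletion, the next container start may therefore recreate the account with the `.env` password, so keep that value strong and secret.

---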
## Backup & restore

### What to back up

`<PREFIX>` is the Compose project name (e.g. `abc0001`).

| Volume | What's in it | Frequency |
|---|---|---|
| `<PREFIX>_timescale-data` | All customer data (Identity, branding, tariffs, sites, devices, measurements) | Daily; more often for high-write customers |
| `<PREFIX>_grafana-data` | Grafana's internal SQLite (user prefs, plugin state). Dashboards re-provision from JSON, so this is **not authoritative**. | Weekly is plenty |
| `<PREFIX>_portal-branding` | Uploaded logos | Daily |
| `<PREFIX>_portal-keys` | Data Protection key ring (cookie signing). Losing it invalidates all sessions but loses no data. | Weekly |

### Postgres dump

```bash
docker exec abc0001_timescale \
  pg_dump -U power_user -d power_monitoring -F c -f /tmp/backup.dump
docker cp abc0001_timescale:/tmp/backup.dump ./abc0001-$(date +%Y%m%d).dump
docker exec abc0001_timescale rm /tmp/backup.dump
```
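Dump names embed the date as `YYYYMMDD`, so they sort chronologically as plain strings. A small sketch for picking the newest dump to restore, assuming a customer's dumps are collected in one directory (the `latest_dump` helper is hypothetical); the strict eight-digit pattern also skips `-final-` dumps:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest <project>-YYYYMMDD.dump in a directory.
# Usage: latest_dump abc0001 /path/to/dumps
latest_dump() {
  local project="$1" dir="${2:-.}" f newest=""
  # Glob results come back sorted, so the last existing match has the latest date.
  for f in "$dir/$project"-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].dump; do
    [[ -e "$f" ]] && newest="$f"
  done
  [[ -n "$newest" ]] || { echo "no dumps for $project in $dir" >&2; return 1; }
  printf '%s\n' "$newest"
}
```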
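Plain `pg_dump` in custom format, as above, is the logical-backup method TimescaleDB documents; it captures hypertables and their chunks. Expect warnings about circular foreign-key constraints on TimescaleDB's internal catalog tables; they are harmless. A restore needs the same TimescaleDB extension version as the source database and must be wrapped in `timescaledb_pre_restore()` / `timescaledb_post_restore()`.

### Volume snapshot

For non-DB volumes, the simplest option is a `tar` of the volume's contents, or your storage layer's snapshot facility (LVM, ZFS, EBS, etc.).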
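For the `tar` route, one common approach is a throwaway container that mounts the named volume read-only alongside a host directory. A minimal sketch; the `snapshot_volume` name, the `alpine` image, and the archive naming are assumptions rather than part of the repo:

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive a named Docker volume into a host directory.
# Usage: snapshot_volume abc0001_portal-branding /path/to/backups
snapshot_volume() {
  local volume="$1" dest="${2:-.}" archive abs_dest
  archive="${volume}-$(date +%Y%m%d).tar.gz"
  mkdir -p "$dest" || return 1
  abs_dest="$(cd "$dest" && pwd)"   # bind mounts need an absolute host path
  # Mount the volume read-only at /data and the destination at /backup,
  # then tar the volume's contents relative to /data.
  docker run --rm \
    -v "${volume}:/data:ro" \
    -v "${abs_dest}:/backup" \
    alpine tar czf "/backup/${archive}" -C /data . || return 1
  echo "${abs_dest}/${archive}"
}
```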
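### Restore

```bash
# Stop the stack (containers only; named volumes are kept), then recreate an empty DB volume
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env down
docker volume rm abc0001_timescale-data
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env up -d timescaledb

# Wait for the fresh database (-h localhost skips the socket-only server used during init)
until docker exec abc0001_timescale pg_isready -h localhost -U power_user -d power_monitoring; do
  sleep 2
done

# Restore, bracketed by TimescaleDB's pre/post-restore functions
docker cp abc0001-YYYYMMDD.dump abc0001_timescale:/tmp/backup.dump
docker exec abc0001_timescale psql -U power_user -d power_monitoring \
  -c "SELECT timescaledb_pre_restore();"
docker exec abc0001_timescale \
  pg_restore -U power_user -d power_monitoring /tmp/backup.dump
docker exec abc0001_timescale psql -U power_user -d power_monitoring \
  -c "SELECT timescaledb_post_restore();"
docker exec abc0001_timescale rm /tmp/backup.dump

# Start everything
docker compose -f /path/to/portal/docker-compose.prod.yml --env-file .env up -d
```

Stopping the portal first keeps it from migrating and seeding the empty database before the restore runs. The TimescaleBootstrapper is idempotent, so it will not error on a restored hypertable.

---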
## Health & monitoring

### Liveness / readiness

- `GET /health`: liveness. Use as the Traefik / load-balancer health check.
- `GET /health/ready`: readiness (DB reachable). Use for orchestration "in service" decisions.

### Logs

Serilog writes JSON to stdout; the Docker logging driver of your choice (json-file, journald, gelf to a central log store) picks it up.

```bash
docker logs abc0001_portal --tail 200 --follow
```

Notable lines:

- `Database connection resolved via …`: confirms how this container resolved its DB at startup.
- `Applied migration '…'`: one per pending migration.
- `TimescaleDB hypertable for monitoring.PowerMeasurements is ready`: the bootstrapper succeeded.
- `Seeded default admin …`: first start only; its absence on later starts is correct.

### DB health from the host

```bash
docker exec abc0001_timescale pg_isready -U power_user -d power_monitoring
```

### TimescaleDB chunks

```bash
docker exec -it abc0001_timescale psql -U power_user -d power_monitoring -c \
  "SELECT c.chunk_name, c.range_start, c.range_end, s.total_bytes
     FROM timescaledb_information.chunks c
     JOIN chunks_detailed_size('monitoring.\"PowerMeasurements\"') s
       ON s.chunk_schema = c.chunk_schema AND s.chunk_name = c.chunk_name
    WHERE c.hypertable_schema = 'monitoring' AND c.hypertable_name = 'PowerMeasurements'
    ORDER BY c.range_start;"
```

`chunks_detailed_size()` reports only sizes; each chunk's time range comes from the `timescaledb_information.chunks` view.

---

## Troubleshooting

| Symptom | First check |
|---|---|
| Portal container restart-looping | `docker logs <PREFIX>_portal`. Usually a missing or rejected env var (default-admin password left at its default in prod, missing Postgres password) or a migration failure. |
| `/health/ready` returns Unhealthy | Postgres container down, or wrong credentials. Check `docker logs <PREFIX>_timescale`. |
| Grafana iframe loads but no charts | Datasource UID mismatch: confirm `grafana/provisioning/datasources/timescaledb.yml` has `uid: timescaledb` and the dashboard JSON references the same UID. |
| Grafana iframe shows login screen in prod | Expected if no auth mode is wired yet (anonymous is off by default). Pick a mode (see README → Security). |
| Branded logo missing after restart | `<PREFIX>_portal-branding` volume not mounted, or filesystem permissions wrong. The container runs as user `app`; the volume must be writable by that user's UID, which is 1654 (`APP_UID`) in the official .NET 8+ images unless the Dockerfile creates its own user. |
| Ingest returns `accepted: 0, rejected: N` | No devices exist for those `externalId`s. Create them via the Sites screen first. |
| Cookie auth seems random / sessions lost on restart | `<PREFIX>_portal-keys` volume not mounted, so Data Protection generates a new key ring on every start. |
| Hypertable error on startup | A pre-existing, non-empty plain table is being converted. `migrate_data => TRUE` should handle it; if not, restore from backup and check for manual schema changes to `monitoring."PowerMeasurements"`. |

---
## Decommissioning a customer

```bash
# Final backup, copied OUTSIDE the customer directory because that directory is deleted below
# (/srv/portal/archive is an example path)
mkdir -p /srv/portal/archive
docker exec abc0001_timescale pg_dump -U power_user -d power_monitoring \
  -F c -f /tmp/final.dump
docker cp abc0001_timescale:/tmp/final.dump \
  /srv/portal/archive/abc0001-final-$(date +%Y%m%d).dump

# Stop and remove containers
docker compose --env-file .env -f /path/to/portal/docker-compose.prod.yml down

# Remove volumes (destroys data; confirm the backup first)
docker volume rm \
  abc0001_timescale-data \
  abc0001_grafana-data \
  abc0001_portal-branding \
  abc0001_portal-keys

# Remove the customer directory (leave it first)
cd /srv/portal && rm -rf /srv/portal/abc0001

# DNS record + cert (manual or via your DNS automation)
```
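Because removing the volumes is irreversible, a scripted decommission can refuse to continue unless the final dump looks valid. A minimal sketch (the `require_backup` helper is hypothetical) that checks the file is non-empty and starts with `PGDMP`, the signature of custom-format (`-F c`) dumps; `pg_restore --list <file>` is a stronger check where the PostgreSQL client tools are installed:

```bash
#!/usr/bin/env bash
# Hypothetical guard: abort unless the final dump exists and looks like a custom-format dump.
# Usage: require_backup /srv/portal/archive/abc0001-final-YYYYMMDD.dump && docker volume rm ...
require_backup() {
  local dump="$1"
  if [[ ! -s "$dump" ]]; then
    echo "no usable backup at $dump; not removing volumes" >&2
    return 1
  fi
  # Custom-format archives written by pg_dump -F c begin with the bytes "PGDMP".
  if [[ "$(head -c 5 "$dump")" != "PGDMP" ]]; then
    echo "$dump is not a pg_dump custom-format archive" >&2
    return 1
  fi
}
```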