Per-customer cost over a UTC time range, joining the fleet hierarchy:
hourly_per_device CA × Devices × Sites × Municipalities × active Tariff
× period (in the municipality's local time) → per-bucket kWh × rate.
Per-day rollup + per-device breakdown. VAT applied per bucket so cost
stays correct across mid-window tariff changes.
Backend
- FleetCostService.ComputeAsync(customerId, fromUtc, toUtc):
- Loads device→site→municipality mapping (small per customer).
- Loads tariffs (with periods) grouped by municipality, ordered by
EffectiveFrom desc for active-tariff lookup.
- Reads hourly buckets in range from fleet.hourly_per_device via
raw ADO (the CA isn't an EF entity).
- For each bucket: pick active tariff for the bucket's date, convert
bucket start to local time via Municipality.TimeZoneId, pick the
matching period (or default rate), compute base + VAT.
- Rolls up per UTC date + per device. Tracks BucketsWithoutTariff
when site has no muni or no tariff covers the bucket date.
- New DTOs: FleetCostDto / FleetCostDayDto / FleetCostDeviceDto.
- Endpoint: GET /api/fleet/customers/{id}/cost?from&to. AdminOnly.
Validation: to > from, range <= 366 days (BadRequest on violation).
- Period-selection helper duplicated from CostCalculator (5 lines;
generic abstraction across TariffPeriod / FleetTariffPeriod is
more code than the duplication).
- Fixed monthly charges deliberately NOT applied (whole-month billing
concept; FixedMonthlyChargesIncluded=false in the response).
Frontend
- AdminCustomerDetailPage gets a Cost tab:
- RangePicker with quick ranges (Today, Last 7d, Last 30d, This
month). Default last 7 days.
- 4 Statistic cards: total kWh, base cost, VAT, total.
- Warning alerts: when buckets-without-tariff > 0; always-on info
that fixed monthly charges aren't included.
- Per-day table + per-device table.
Verified end-to-end on the running stack
- Patched DEV0001's existing site to LocalMunicipalityId=1
(Phase 23 test municipality with Domestic TOU tariff).
- Ingested 3 measurements at 10:00 / 10:20 / 10:40 UTC with kWh
totals 2000 / 2020 / 2050 → hourly CA bucket has delta = 50 kWh.
- Total kWh in window = 56 (50 from new bucket + 6 from earlier
Phase 14 backfill bucket).
- Tariff resolution: 10:00-12:00 UTC = 12:00-14:00 SAST, which is
neither Peak (17:00-20:00 weekdays) nor Off-Peak (weekends only)
→ defaults to 2.50/kWh.
- 56 × 2.50 = 140.00 base ✓
- 140 × 0.15 = 21.00 VAT ✓
- Total 161.00 ✓ exactly what the API returned.
Docs: FLEET-DESIGN.md §11 row updated — tariff sync + cost compute
both marked as shipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
18 KiB
Fleet ingest — design (locked)
Status: approved, ready for Phase 13 implementation.
This is the design reference for the Admin / Fleet aggregation feature. Read this before touching any code in Phases 13–15.
1. Goals & non-goals
In scope (v1):
- Mirror every customer's raw
monitoring.PowerMeasurements,Sites, andDevicesinto a central DB on an Admin stack, withCustomerIdattached. - Admin operator sees fleet-wide and per-customer dashboards.
- Customer stacks keep working if Admin is unreachable; pushes resume when it comes back.
- Same portal binary in two modes (
RunMode=Client/Admin), config-selected.
Not in scope (v1):
- Cross-customer cost / tariff aggregation (Admin shows kWh + power; per-customer instance still owns billing).
- Branding / user sync (Admin has its own brand + users — separate fleets of humans).
- Bidirectional sync (Admin → Customer commands, OTA, etc.).
- Multi-region or sharded central DB.
2. Topology
[Traefik public]
├─ abc0001.portal.example.com → Client stack (portal + timescale + grafana)
├─ abc0002.portal.example.com → Client stack
├─ ...
└─ admin.portal.example.com → Admin stack (portal + timescale + grafana)
[Client stack abc0001] --batches POST--> [Admin stack /api/fleet/ingest]
[Client stack abc0002] --batches POST--> (central TimescaleDB)
[Client stack ...] --batches POST-->
Admin stack reuses docker-compose.prod.yml and the same image, with:
COMPOSE_PROJECT_NAME=adminCUSTOMER_HOST=admin.portal.example.comRunMode=Admin- Its own Postgres volume (the central fleet DB)
The whole feature is opt-in. With no Admin stack deployed, customer stacks set FleetIngest__Enabled=false and work exactly as today.
3. RunMode mechanics
Application.RunMode is Client (default) or Admin. Single binary; Program.cs branches:
| Concern | Client | Admin |
|---|---|---|
| DbContext target | local Postgres | central Postgres |
Endpoints /api/sites/*, /api/measurements/*, /api/ingest/measurements, /api/admin/sites/* |
mapped | not mapped |
Endpoints /api/fleet/*, /api/admin/customers/* |
not mapped | mapped |
FleetPushService (BackgroundService) |
started if FleetIngest__Enabled=true |
not registered |
FleetIngestService |
not registered | registered |
| EF migration set | app + monitoring |
app + fleet |
| Identity / Branding / Users / Settings | same | same |
| Settings → Rates tab | shown | hidden |
| Sidebar nav | Dashboard / Dashboards / Sites / Settings | Dashboard / Dashboards / Customers / Settings |
Startup guards:
- Reject
RunMode=AdminifDatabase:ConnectionStringis empty (no auto-provision for Admin — there is no obvious local default for a fleet DB). - Reject
RunMode=Client + FleetIngest__Enabled=trueifFleetIngest__UrlorFleetIngest__Tokenis empty. - Reject the wrong mode against an existing DB by sniffing for a marker table:
RunMode=Clientagainst a DB containingfleet.Customers→ fatal.RunMode=Adminagainst a DB containingmonitoring.PowerMeasurements→ fatal.
- Stops the "I pointed dev at the prod DB" disaster.
4. Schemas
4.1 Central DB (Admin) — fleet schema
fleet.Customers — registry
| Column | Type | Notes |
|---|---|---|
| Id | uuid PK | Admin-generated, stable |
| Code | varchar(50) UNIQUE | e.g. ABC0001 — human handle (preserve case for display, lowercase for compose) |
| Name | varchar(200) | |
| TokenHash | varchar(64) UNIQUE indexed | SHA-256 hex of the push token |
| TokenIssuedAt | timestamptz | |
| TokenRotatedAt | timestamptz nullable | |
| IsActive | bool | If false, ingest returns 401 |
| FirstSeenAt, LastSeenAt | timestamptz nullable | When their first / most-recent push landed |
| CreatedAt | timestamptz |
fleet.Sites — mirrors customer-side, identity preserved
| Column | Type | Notes |
|---|---|---|
| Id | uuid | Customer-side Site UUID, preserved |
| CustomerId | uuid FK→Customers | |
| Name, Address, IsActive | … | |
| LocalMunicipalityId | int nullable | Opaque on Admin |
| ReceivedAt | timestamptz | When Admin upserted this row |
| PK | (CustomerId, Id) |
fleet.Devices — same pattern
| Column | Type | Notes |
|---|---|---|
| Id | uuid | Preserved |
| CustomerId | uuid FK | |
| SiteId | uuid | (CustomerId, SiteId) FK→Sites composite |
| Name, ExternalId, Description, IsActive | … | |
| ReceivedAt | timestamptz | |
| PK | (CustomerId, Id) |
fleet.PowerMeasurements — hypertable, the big one
| Column | Type | Notes |
|---|---|---|
| Time | timestamptz | Hypertable partition |
| CustomerId | uuid | |
| DeviceId | uuid | (CustomerId, DeviceId) FK→Devices |
| ActivePowerKw, ReactivePowerKvar, ApparentPowerKva, PowerFactor, VoltageV, FrequencyHz, EnergyImportedKwh, EnergyExportedKwh | double | |
| Source | varchar(50) nullable | |
| PK | (Time, CustomerId, DeviceId) | Time must be in any unique constraint on a hypertable |
| Indexes | (CustomerId, Time DESC); (CustomerId, DeviceId, Time DESC) |
fleet.IngestEvents — observability
| Column | Type | Notes |
|---|---|---|
| Id | uuid PK | |
| CustomerId | uuid FK | |
| ReceivedAt | timestamptz | |
| BatchType | varchar(20) | sites / devices / measurements |
| RowsAccepted, RowsRejected | int | |
| BatchBytes | int | |
| ClientHwm | varchar(50) | Cursor the client claims this batch covers |
| TimeSpread | interval nullable | max(Time) - min(Time) in the batch — visible burst back-fills |
| Error | varchar(500) nullable |
4.2 Client DB additions
One new table: app.FleetPushState
| Column | Type | Notes |
|---|---|---|
| ResourceType | varchar(20) PK | sites / devices / measurements |
| LastCursor | timestamptz nullable | For measurements: max(ReceivedAt) pushed. For sites/devices: max(UpdatedAt) |
| LastSyncedAt | timestamptz nullable | |
| LastError | varchar(500) nullable | |
| ConsecutiveFailures | int default 0 | Drives exponential backoff |
Plus: add ReceivedAt timestamptz (default NOW()) to monitoring.PowerMeasurements, indexed on (ReceivedAt). See §5.1 for the rationale.
4.3 Migration assembly split
Two DbContext classes — ClientDbContext (existing AppDbContext, renamed) and AdminDbContext. Each owns its migrations folder. Only the one matching RunMode is registered with DI. Ugly-but-reliable wins over single-context-with-tooling-gymnastics.
5. Push pipeline
5.1 Late arrivals — push by ReceivedAt, not Time
The firmware (ESP32 / TTGO T-Call) buffers samples offline and replays them when connectivity returns. Late arrivals are a designed-for case, not an edge.
If we pushed by Time > LastCursor, a device offline for 6 hours would have its back-fill skipped (its samples' Time is in the past).
Fix: every row carries ReceivedAt assigned by the local DB on insert. Push selects WHERE ReceivedAt > LastCursor[measurements]. Back-fills are picked up on the next push tick after the firmware replays.
Time stays as-is for queries; nothing else changes about how the data is used.
5.2 Burst handling
When a firmware replay drops thousands of rows in one local insert, the next push tick sees a huge backlog. To avoid one customer monopolising network / Admin:
- Per request: up to
FleetIngest__BatchSizerows (default 5000) orFleetIngest__BatchMaxBytes(default 1 MB), whichever hits first. - Per tick: at most 3 successful batches per resource type, then yield.
- Result: a 30k-row back-fill drains over ~5 minutes (10 ticks @ 60s) without starving sites/devices syncing.
- Observable:
fleet.IngestEvents.TimeSpreadshows themax(Time) - min(Time)per batch; a wave of back-fill is obvious in Admin logs without grepping.
5.3 Cadence
- Default push interval: 60 s.
- Configurable:
FleetIngest__IntervalSeconds.
5.4 Order of operations per tick
- Sites:
UpdatedAt > LastCursor[sites]→ POST batch (full upsert, all rows; small data). - Devices: same.
- Measurements:
ReceivedAt > LastCursor[measurements]ordered byReceivedAt→ POST batches until drained or limit.
Sites and devices first so Admin's measurement FK insert doesn't reject rows for unknown devices.
5.5 Endpoint contract
POST /api/fleet/ingest
Headers:
X-Customer-Token: <opaque 32-byte hex>
X-Batch-Type: sites | devices | measurements
X-Push-Cursor: <ISO8601> (highest ReceivedAt/UpdatedAt in this batch)
Content-Type: application/json
Body:
JSON array of rows matching the type
Response 200:
{ accepted: int, rejected: int, errors: [{ row: int, reason: string }] }
Response 401: unknown / inactive / wrong token
Response 413: batch too large
Response 429: rate-limited
Response 5xx: transient — retry
CustomerId is derived from the token server-side. Clients never send it. Prevents one customer from forging rows for another.
5.6 Failure handling
| Failure | Client behavior |
|---|---|
| Network timeout / 5xx | Exponential backoff (1m → 2m → 5m → 10m, cap 30m); ConsecutiveFailures++ |
| 401 | Stop trying for this resource; log loudly; surface in Settings → App config ("Admin link down") |
200 with rejected > 0 on measurements |
Re-push sites/devices on next tick before retrying measurements |
| 413 | Halve BatchSize for this resource and retry |
| 429 | Honor Retry-After; otherwise back off 1 minute |
Push runs on its own DI scope and its own DbContext. If the push DB query hangs, only push is blocked — never user-facing API.
6. Auth — token lifecycle
- Token = 32 random bytes, hex-encoded (64 chars). Generated server-side via
RandomNumberGenerator. - Stored as SHA-256 hex hash, UNIQUE indexed → ingest endpoint does a single O(log N) indexed lookup. Bcrypt is the wrong tool for a high-throughput, high-entropy machine token.
- Shown once in the Admin UI at creation/rotation. Admin only stores the hash. Lost token → rotate.
- Rotation: dual-token with 24h grace window (shipped after the initial design). On rotate, the current
TokenHashis copied intoPreviousTokenHashwithPreviousTokenExpiresAt = now + 24h, then a fresh current is generated.FindByTokenAsyncmatches either current OR (previous AND not expired ANDIsActive). Customer ops has the full window to update.env+ restart without dropping a push tick. A second rotation within the grace window overwrites the previous slot — at most one previous token is ever honoured at a time. Grace duration isCustomerService.DefaultTokenGracePeriod(parameterizable onRotateTokenAsyncfor tests; lift to config if you need per-customer tuning). Customers.IsActive=false→ all that customer's pushes get 401 until reactivated.
7. Continuous aggregates + retention
7.1 Aggregates (created post-migration by FleetTimescaleBootstrapper)
fleet_hourly_per_device—time_bucket('1 hour', Time),CustomerId,DeviceId→ avg/max/minActivePowerKw,max(EnergyImported) - min(EnergyImported)asKwhDelta, sample count.fleet_daily_per_customer—time_bucket('1 day', Time),CustomerId→ totals across all that customer's devices.- Realtime (
materialized_only = false) so late back-fills appear in CA queries immediately, with the unmaterialized tail served live from the hypertable until next refresh. - Refresh policy:
fleet_hourly_per_device: every 5 min;start_offset = 30 days,end_offset = 1 hour.fleet_daily_per_customer: every 1 hour;start_offset = 365 days,end_offset = 1 day.
- Wide
start_offsetmeans firmware back-fills within those windows get materialized on the next refresh tick. TimescaleDB's invalidation log means only chunks that actually changed get re-processed.
7.2 Retention vs compression
Retention policy: NONE. All data kept indefinitely for historical / trend analysis.
Compression policy (this is what makes "forever" cheap):
fleet.PowerMeasurements: compress chunks older than 7 days.- Compression settings:
segmentby = (CustomerId, DeviceId),orderby = Time DESC. - Typical ratio for power-monitoring data: 5–10×. Point queries by
(CustomerId, DeviceId, Time range)stay fast because of the segmentby clustering. - Bootstrapper enables compression and adds the policy idempotently.
CA tables are tiny by comparison; no compression needed there.
8. Admin UI surface
| Page | What |
|---|---|
| Dashboard | Fleet headline — customer count, active customers (push in last hour), aggregate live active power, total kWh today, last-24h lag chart |
| Customers | Table — Code / Name / Last seen / Sites / Devices / Today kWh / Status / Actions (rotate token, disable, view) |
| Customer detail | Sites + devices for the selected customer + Grafana embed scoped to customer_id |
| Dashboards | Grafana embed — fleet-overview default, customer-drilldown parameterized |
| Settings → Branding | Admin's own brand |
| Settings → Users | Admin operator accounts |
| Settings → Grafana | Read-only info card |
| Settings → App config | Read-only config snapshot (RunMode visible at the top) |
Hidden in Admin mode: Sites top-level (no local sites), Settings → Rates tab (no local tariffs).
9. Observability
Two questions a real operator asks, two answers:
- "Is customer X pushing?" — Customers page shows
LastSeenAt,ConsecutiveFailures, last batch size, recent burst spread. - "Why did this batch not land?" —
SELECT * FROM fleet.IngestEvents WHERE CustomerId = … ORDER BY ReceivedAt DESC LIMIT 50;
Plus Serilog:
- Client:
[INF] Pushed N measurements to fleet ingest (cursor=…) accepted=N rejected=0 - Admin:
[INF] Accepted N measurements from {Code} ({CustomerId}), rejected 0
10. Per-stack config
Customer .env additions:
RunMode=Client # default — can omit
FleetIngest__Enabled=true
FleetIngest__Url=https://admin.portal.example.com/api/fleet/ingest
FleetIngest__Token=<32-byte hex token from Admin Customers page>
FleetIngest__IntervalSeconds=60
FleetIngest__BatchSize=5000
FleetIngest__BatchMaxBytes=1048576
FleetIngest__Enabled=false → push service doesn't start; customer runs as today, no data leaves.
Admin .env:
RunMode=Admin
COMPOSE_PROJECT_NAME=admin
CUSTOMER_HOST=admin.portal.example.com
Database__ConnectionString=Host=timescaledb;... # central fleet DB (separate volume)
# All other vars same as Client compose.
11. Open seams (deferred, with the obvious extension paths noted)
| Seam | Where v2 picks it up |
|---|---|
Both shipped. fleet.Municipalities + fleet.Tariffs + fleet.TariffPeriods mirror the customer-side rate hierarchy scoped by CustomerId (per-customer per-municipality rates preserved). FleetCostService joins fleet.hourly_per_device × Devices × Sites × Municipalities × active Tariff × period (in municipality local time) → per-bucket cost. Endpoint: GET /api/fleet/customers/{id}/cost?from&to. Per-day rollup + per-device breakdown. VAT applied per bucket from the active tariff. Fixed monthly charges intentionally excluded (whole-month billing concept). 366-day range cap. UI: Cost tab on customer detail with RangePicker, summary cards, daily + per-device tables. |
|
| Per-customer Postgres RLS for multi-Admin-user setups | Add current_setting('app.customer_filter')-based RLS policies on fleet.* tables; Admin role to customer-scope mapping in IdentityRole.CustomerId. |
| Bidirectional Admin → Customer commands | New WebSocket or long-poll channel on customer side; gated by mutual cert or a second token. |
| Branding sync (for the "Admin sees customer's brand when drilling in" niceness) | Push branding row from customer; Admin renders the customer's brand on Customer-detail pages. |
| Forward compatibility | Versioned URL /v1/fleet/ingest. Admin tolerates unknown JSON fields. Strictness on response shape only. |
Shipped. See §6 — Customers.PreviousTokenHash + PreviousTokenExpiresAt; ingest accepts either while previous is unexpired. Default 24h. |
|
| Sharded Admin (10000+ customers) | Customer's FleetIngest__Url already supports any host — point different customers at different Admin instances; aggregate at a tier above with Promscale or similar if needed. |
| Hard-delete / GDPR | Admin Customers page → "Decommission" action: DELETE FROM fleet.* WHERE CustomerId = … cascade. Logged. |
12. Phase split
| Phase | What | End-state |
|---|---|---|
| 13 | RunMode plumbing + Fleet schema + AdminDbContext split + Customers registry + token issuance UI. No data movement. |
Spin up an Admin stack, register a customer, see their token. Client stack picks up RunMode=Client + push config but doesn't yet push. |
| 14 | FleetPushService (Client), /api/fleet/ingest + FleetIngestService (Admin), IngestEvents, ReceivedAt migration on Client, FleetPushState, continuous aggregates SQL, compression policy SQL. |
Two local stacks: one Client pushes to one Admin. Rows visible in fleet.PowerMeasurements. Late-arrival firmware-replay scenario verified with fixtures. |
| 15 | Admin UI (Dashboard, Customers list, Customer detail, Customers token rotation), Grafana provisioning of fleet-overview.json + customer-drilldown.json, README + OPERATIONS updates for Admin deployment + customer onboarding workflow. |
Full feature usable end-to-end. |
13. Sanity checks (carried over from the design conversation)
- No central data without the central DB existing. Whole feature is opt-in via
FleetIngest__Enabled. - Customer model is unchanged. Every customer keeps full control of their own data. We copy roll-ups (and raw measurements for fidelity), not authority.
- One ingest endpoint, no fan-out. N customers all POST to the same
/api/fleet/ingest. Admin scales vertically first, shards horizontally later if needed (each customer'sFleetIngest__Urlcan point at a specific Admin instance).