Tau.Acuvim/docs/acuvim-spec-06.md
Renier Forster 84a0668c54 Initial commit: Tau Acuvim IoT monitoring system
2026-05-16 19:05:32 +02:00


# Phase 6: OTA Firmware Updates
## Objective
Implement over-the-air firmware update capability via both WiFi and GSM. The device checks for updates from the console application's firmware server, downloads and validates the binary, and performs a safe update with rollback protection. Both push (console-initiated) and pull (device-initiated) update models are supported.
## Prerequisites
- Phase 5 complete (all transport and storage working)
- Console application firmware hosting endpoint (Phase 8, or temporary HTTP server for testing)
- Dual OTA partition table configured
## Deliverables
1. OTA partition table (two app partitions for safe rollback)
2. HTTP-based firmware download over WiFi and GSM
3. Push OTA via MQTT command from console
4. Pull OTA via periodic version check
5. Update validation and automatic rollback on boot failure
6. Firmware version reporting
---
## 6.1 Partition Table
### Custom Partition Table for OTA
Replace the default partition table with an OTA-capable layout. Create `firmware/partitions_ota.csv`:
```csv
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x5000,
otadata, data, ota, 0xe000, 0x2000,
app0, app, ota_0, 0x10000, 0x1E0000,
app1, app, ota_1, 0x1F0000, 0x1E0000,
littlefs, data, spiffs, 0x3D0000, 0x20000,
nvs_keys, data, nvs_keys, 0x3F0000, 0x1000,
```
**Layout summary (fits within 4MB flash; the last partition ends at 0x3F1000):**
| Partition | Size | Purpose |
|-----------|------|---------|
| nvs | 20KB | Non-volatile storage (config) |
| otadata | 8KB | OTA state tracking |
| app0 | 1,920KB | Application slot 0 |
| app1 | 1,920KB | Application slot 1 |
| littlefs | 128KB | Web UI files |
| nvs_keys | 4KB | NVS encryption keys (optional) |
**Update `platformio.ini`:**
```ini
board_build.partitions = partitions_ota.csv
```
**Note:** Each app partition is 1,920KB (about 1.9MB). Monitor the firmware binary size; if it approaches this limit, optimize the build or enlarge the app partitions at the expense of LittleFS.
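The layout rules behind this table can be checked at compile time. The following is a minimal, host-compilable sketch: the `Partition` struct and the `layoutIsValid` helper are hypothetical names, the offsets and sizes are copied from `partitions_ota.csv`, and the 4MB flash size is an assumption to replace with the module's actual capacity.

```cpp
#include <cstddef>
#include <cstdint>

// One row of partitions_ota.csv (offset and size in bytes).
struct Partition {
    const char* name;
    uint32_t offset;
    uint32_t size;
    bool isApp;
};

// Values copied from the partition table above.
constexpr Partition kLayout[] = {
    {"nvs",      0x009000, 0x005000, false},
    {"otadata",  0x00E000, 0x002000, false},
    {"app0",     0x010000, 0x1E0000, true},
    {"app1",     0x1F0000, 0x1E0000, true},
    {"littlefs", 0x3D0000, 0x020000, false},
    {"nvs_keys", 0x3F0000, 0x001000, false},
};
constexpr size_t kPartitionCount = sizeof(kLayout) / sizeof(kLayout[0]);

// True when partitions are in ascending order without overlap, every offset is
// 4KB-aligned, app partitions start on a 64KB boundary, and all fit in flashSize.
constexpr bool layoutIsValid(const Partition* p, size_t n, uint32_t flashSize) {
    uint32_t end = 0;
    for (size_t i = 0; i < n; ++i) {
        if (p[i].offset < end) return false;                           // overlap
        if (p[i].offset % 0x1000 != 0) return false;                   // 4KB alignment
        if (p[i].isApp && p[i].offset % 0x10000 != 0) return false;    // 64KB for apps
        end = p[i].offset + p[i].size;
        if (end > flashSize) return false;                             // past end of flash
    }
    return true;
}

// Assumed flash size: 4MB.
static_assert(layoutIsValid(kLayout, kPartitionCount, 4u * 1024u * 1024u),
              "partition layout does not fit the assumed flash size");
```

Because the check is a `static_assert`, a bad edit to the table fails the build rather than producing a device that cannot boot.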
## 6.2 OTA Manager
### `ota_manager.h` / `ota_manager.cpp`
```cpp
#include <Arduino.h>
#include <functional>

enum class OtaStatus {
IDLE,
CHECKING,
DOWNLOADING,
INSTALLING,
SUCCESS,
FAILED,
ROLLING_BACK
};
struct FirmwareInfo {
String version; // Semantic version (e.g., "1.2.0")
String url; // Download URL
uint32_t size; // Binary size in bytes
String checksum; // SHA256 hash, e.g. "sha256:abcdef..."
String releaseNotes; // Optional description
bool mandatory; // Force update regardless of version
};
class OtaManager {
public:
void begin(ConfigManager& config, TransportManager& transport);
void loop(); // Periodic version check
// Pull model (device checks for updates)
bool checkForUpdate(FirmwareInfo& info); // Query server for latest version
bool isUpdateAvailable();
// Push model (console sends update command)
bool startUpdate(const String& url, const String& expectedChecksum = "");
// Status
OtaStatus getStatus();
uint8_t getProgress(); // 0-100%
String getStatusMessage();
String getCurrentVersion();
// Rollback
bool markValid(); // Mark current firmware as good
bool rollback(); // Revert to previous firmware
void onProgress(std::function<void(uint8_t percent)> callback);
void onComplete(std::function<void(bool success, const String& message)> callback);
private:
bool performUpdate(const String& url);
bool validateChecksum(const String& expected);
bool compareVersions(const String& current, const String& available);
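ConfigManager* config = nullptr;
TransportManager* transport = nullptr;
OtaStatus status = OtaStatus::IDLE;
uint8_t progress = 0;
unsigned long lastCheck = 0;
std::function<void(uint8_t)> progressCallback; // set by onProgress()
std::function<void(bool, const String&)> completeCallback; // set by onComplete()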
};
```
## 6.3 Firmware Version Scheme
### Semantic Versioning
Format: `MAJOR.MINOR.PATCH` (e.g., `1.2.3`)
- **MAJOR:** Breaking changes (protocol changes, incompatible config format)
- **MINOR:** New features (new register groups, new settings)
- **PATCH:** Bug fixes, optimizations
### Version in Firmware
Define in `version.h`:
```cpp
#define FW_VERSION_MAJOR 1
#define FW_VERSION_MINOR 0
#define FW_VERSION_PATCH 0
#define FW_VERSION "1.0.0"
#define FW_BUILD_DATE __DATE__
#define FW_BUILD_TIME __TIME__
```
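Keeping `FW_VERSION` as a separate literal means it can drift from the three numeric macros. One possible way to avoid that is to derive the string with the preprocessor. The `FW_STR` and `FW_STR_HELPER` macros below are hypothetical additions, not part of the original `version.h`.

```cpp
#define FW_VERSION_MAJOR 1
#define FW_VERSION_MINOR 0
#define FW_VERSION_PATCH 0

// Two-level expansion so the macro's value (not its name) is stringified.
#define FW_STR_HELPER(x) #x
#define FW_STR(x) FW_STR_HELPER(x)

// Adjacent string literals are concatenated by the compiler: "1" "." "0" "." "0"
#define FW_VERSION \
    FW_STR(FW_VERSION_MAJOR) "." FW_STR(FW_VERSION_MINOR) "." FW_STR(FW_VERSION_PATCH)

// Length check only: "1.0.0" plus the terminating NUL is 6 bytes.
static_assert(sizeof(FW_VERSION) == sizeof("1.0.0"), "unexpected version string length");
```

With this variant a release bump only touches the three numeric macros.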
### Version Comparison
```cpp
// Returns true if available > current.
// Unparseable input never triggers an update.
bool OtaManager::compareVersions(const String& current, const String& available) {
    int curMaj = 0, curMin = 0, curPat = 0;
    int avlMaj = 0, avlMin = 0, avlPat = 0;
    // sscanf returns the number of fields parsed; require all three
    if (sscanf(current.c_str(), "%d.%d.%d", &curMaj, &curMin, &curPat) != 3) return false;
    if (sscanf(available.c_str(), "%d.%d.%d", &avlMaj, &avlMin, &avlPat) != 3) return false;
    if (avlMaj != curMaj) return avlMaj > curMaj;
    if (avlMin != curMin) return avlMin > curMin;
    return avlPat > curPat;
}
```
## 6.4 Pull Model (Device-Initiated Check)
### Version Check Endpoint
The device periodically queries the console application:
```
GET {console_url}/api/firmware/check?device_id={id}&current_version={ver}&hardware={hw}
Response (update available):
{
"update_available": true,
"version": "1.2.0",
"url": "https://console.example.com/api/firmware/download/1.2.0",
"size": 1234567,
"checksum": "sha256:abcdef1234567890...",
"release_notes": "Added THD monitoring improvements",
"mandatory": false
}
Response (no update):
{
"update_available": false,
"current_version": "1.0.0"
}
```
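The query string should be percent-encoded, since a device ID or hardware name may contain spaces or other reserved characters. The sketch below builds the check URL using portable `std::string` (on the device, Arduino `String` would take its place). `urlEncode` and `buildCheckUrl` are hypothetical helper names, and the base URL is assumed to have no trailing slash.

```cpp
#include <cstdio>
#include <string>

// Percent-encode everything except RFC 3986 unreserved characters.
std::string urlEncode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') ||
                                c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // "%XX" plus NUL
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Assembles GET {console_url}/api/firmware/check?device_id=...&current_version=...&hardware=...
std::string buildCheckUrl(const std::string& consoleUrl, const std::string& deviceId,
                          const std::string& currentVersion, const std::string& hardware) {
    return consoleUrl + "/api/firmware/check?device_id=" + urlEncode(deviceId) +
           "&current_version=" + urlEncode(currentVersion) +
           "&hardware=" + urlEncode(hardware);
}
```

For example, a hardware name of `ttgo t-call` is sent as `ttgo%20t-call`.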
### Check Interval
- Default: every 6 hours
- Configurable via NVS (`ota_check_interval_hours`)
- Also checks once after boot, following a 60-second delay that lets connections stabilize
- Can be triggered manually via MQTT command or web UI
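The interval logic can be sketched as a pair of pure functions over a `millis()`-style counter. The function names are hypothetical, and treating an interval of 0 as "periodic checks disabled" is an assumption the spec does not state. Unsigned subtraction keeps the comparison correct when the 32-bit counter wraps (roughly every 49.7 days).

```cpp
#include <cstdint>

constexpr uint32_t kBootDelayMs = 60u * 1000u;  // 60-second settle time after boot

// True once the one-off post-boot check should run.
bool bootCheckDue(uint32_t uptimeMs, bool alreadyDone) {
    return !alreadyDone && uptimeMs >= kBootDelayMs;
}

// True when at least intervalHours have elapsed since lastCheckMs.
// nowMs and lastCheckMs are millis()-style counters that wrap at 2^32.
bool periodicCheckDue(uint32_t nowMs, uint32_t lastCheckMs, uint8_t intervalHours) {
    if (intervalHours == 0) return false;  // assumption: 0 disables periodic checks
    const uint32_t intervalMs = static_cast<uint32_t>(intervalHours) * 3600u * 1000u;
    return static_cast<uint32_t>(nowMs - lastCheckMs) >= intervalMs;
}
```

The largest `uint8_t` interval (255 hours, 918,000,000 ms) still fits in 32 bits, so the multiplication cannot overflow.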
### Auto-Update Policy
Configurable behavior when an update is found:
- **Auto-update:** Download and install immediately (default for patch versions)
- **Notify only:** Report available update to console, wait for push command
- **Manual only:** Never auto-update, require explicit trigger
Add to `DeviceConfig`:
```cpp
uint8_t ota_check_interval_hours; // default: 6
uint8_t ota_auto_update; // 0=manual, 1=notify, 2=auto
```
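How the policy value combines with the `mandatory` flag is not spelled out in this section. One possible reading, shown as a small decision function, is that a mandatory update is installed regardless of policy. That precedence, the `UpdateAction` enum, and `decideAction` are assumptions for illustration.

```cpp
#include <cstdint>

enum class UpdateAction { NONE, NOTIFY, INSTALL };

// policy follows DeviceConfig::ota_auto_update: 0=manual, 1=notify, 2=auto.
UpdateAction decideAction(uint8_t policy, bool updateAvailable, bool mandatory) {
    if (!updateAvailable) return UpdateAction::NONE;
    if (mandatory) return UpdateAction::INSTALL;  // assumption: mandatory overrides policy
    switch (policy) {
        case 2:  return UpdateAction::INSTALL;    // auto-update
        case 1:  return UpdateAction::NOTIFY;     // report to console, wait for push
        default: return UpdateAction::NONE;       // manual only (and unknown values)
    }
}
```

Unknown policy values fall back to the most conservative behavior, so a corrupted NVS entry cannot trigger an unattended install.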
## 6.5 Push Model (Console-Initiated)
### MQTT OTA Command
The console publishes to `{prefix}/{device_id}/cmd`:
```json
{
"cmd": "ota_update",
"request_id": "req-12345",
"url": "https://console.example.com/api/firmware/download/1.2.0",
"version": "1.2.0",
"checksum": "sha256:abcdef1234567890...",
"mandatory": false
}
```
### Device Response Flow
```
1. Receive OTA command
2. Publish ACK: {"request_id":"req-12345","status":"accepted","message":"Starting update..."}
3. Publish progress: {"request_id":"req-12345","status":"downloading","progress":45}
4. Publish progress: {"request_id":"req-12345","status":"installing","progress":100}
5. Reboot
6. After reboot, publish: {"status":"updated","version":"1.2.0"}
OR if rollback: {"status":"rollback","version":"1.0.0","reason":"Boot validation failed"}
```
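Publishing a message for every percent would flood a slow GPRS link, so progress reports are usually throttled. The sketch below formats the progress payload shown in the flow above and decides when to publish. The helper names and the step-based throttling are assumptions, and the request ID and status strings are assumed to need no JSON escaping.

```cpp
#include <cstdint>
#include <string>

// Builds {"request_id":"...","status":"...","progress":N}.
std::string progressMessage(const std::string& requestId, const std::string& status,
                            uint8_t progress) {
    return "{\"request_id\":\"" + requestId + "\",\"status\":\"" + status +
           "\",\"progress\":" + std::to_string(progress) + "}";
}

// Publish when progress advanced by at least `step` percent, and always at 100%.
bool shouldPublish(uint8_t lastPublished, uint8_t current, uint8_t step) {
    if (current == lastPublished) return false;  // nothing new to report
    return current >= 100 || (current - lastPublished) >= step;
}
```

With a step of 10, a full download produces about ten progress messages instead of one hundred.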
## 6.6 Update Process
### Download and Install
```cpp
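bool OtaManager::performUpdate(const String& url) {
    status = OtaStatus::DOWNLOADING;
    progress = 0;
    // Use HTTPClient with either WiFiClient or TinyGSMClient.
    // Note: the stock arduino-esp32 HTTPClient::begin() expects a WiFiClient-derived
    // client, so the GSM path may need a Client-based HTTP library instead.
    HTTPClient http;
    Client& client = transport->getClient();
    http.begin(client, url);
    int httpCode = http.GET();
    if (httpCode != HTTP_CODE_OK) {
        http.end();
        status = OtaStatus::FAILED;
        return false;
    }
    int contentLength = http.getSize();
    if (contentLength <= 0) {  // missing Content-Length (e.g. chunked) is not supported
        http.end();
        status = OtaStatus::FAILED;
        return false;
    }
    // Start OTA update (fails if the image does not fit the inactive app partition)
    if (!Update.begin(contentLength)) {
        http.end();
        status = OtaStatus::FAILED;
        return false;
    }
    // Stream firmware to flash
    Stream* stream = http.getStreamPtr();
    uint8_t buf[1024];
    int totalRead = 0;
    unsigned long lastData = millis();
    const unsigned long STALL_TIMEOUT_MS = 30000;  // assumed value: abort if no data arrives
    while (http.connected() && totalRead < contentLength) {
        int available = stream->available();
        if (available > 0) {
            int read = stream->readBytes(buf, min(available, (int)sizeof(buf)));
            if (Update.write(buf, read) != (size_t)read) break;  // flash write error
            totalRead += read;
            lastData = millis();
            progress = (totalRead * 100) / contentLength;
            if (progressCallback) progressCallback(progress);
        } else if (millis() - lastData > STALL_TIMEOUT_MS) {
            break;  // connection stalled
        }
        yield();
    }
    http.end();
    if (totalRead != contentLength) {  // truncated download or write failure
        Update.abort();
        status = OtaStatus::FAILED;
        return false;
    }
    // The SHA256 check (validateChecksum) belongs here, before the image is finalized.
    status = OtaStatus::INSTALLING;
    if (Update.end(true)) {
        status = OtaStatus::SUCCESS;
        return true;
    } else {
        status = OtaStatus::FAILED;
        return false;
    }
}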
```
### GSM Considerations for OTA
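- SIM800L is 2G (GPRS) only; its rated maximum downlink is 85.6 kbps (about 10 KB/s), and real-world throughput is usually lower
- A 1MB firmware therefore takes at least ~100 seconds over GPRS, and can take several minutes on a weak link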
- SIM7600 (4G) will be much faster (~100+ KB/s)
- Use chunked download with progress reporting
- Implement download resume if connection drops (Range header)
- Timeout: 10 minutes for GSM download, 2 minutes for WiFi
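The resume and timeout points above can be sketched with a few small helpers. The names are hypothetical, and resuming assumes the firmware server honors HTTP `Range` requests by answering `206 Partial Content`; a plain `200` response means the download has to restart from byte zero.

```cpp
#include <cstdint>
#include <string>

// Value for the HTTP "Range" request header when resuming after bytesReceived bytes.
std::string rangeHeaderValue(uint32_t bytesReceived) {
    return "bytes=" + std::to_string(bytesReceived) + "-";
}

// 206 means the server is sending only the requested remainder.
bool serverResumed(int httpCode) {
    return httpCode == 206;
}

// Overall download timeout from the list above: 10 minutes on GSM, 2 minutes on WiFi.
constexpr uint32_t downloadTimeoutMs(bool overGsm) {
    return (overGsm ? 10u : 2u) * 60u * 1000u;
}

// Rough transfer-time estimate in seconds, rounded up; 0 if the rate is unknown.
constexpr uint32_t estimateSeconds(uint32_t sizeBytes, uint32_t bytesPerSecond) {
    return bytesPerSecond == 0 ? 0 : (sizeBytes + bytesPerSecond - 1) / bytesPerSecond;
}
```

As a worked example, a 1 MiB image at 10 KiB/s gives `estimateSeconds(1048576, 10240) == 103`, which fits comfortably inside the 10-minute GSM timeout.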
## 6.7 Rollback Protection
### Boot Validation
ESP32's OTA library tracks which partition is "pending verification." On first boot after update:
```cpp
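#include "esp_ota_ops.h"

// Arduino-ESP32 may mark a new image valid before setup() runs; overriding this
// weak hook to return true is intended to defer validation to the self-test below.
extern "C" bool verifyRollbackLater() { return true; }

void setup() {
    // Check if this is first boot after OTA: the running image is still pending verification
    const esp_partition_t* running = esp_ota_get_running_partition();
    esp_ota_img_states_t state;
    if (esp_ota_get_state_partition(running, &state) == ESP_OK &&
        state == ESP_OTA_IMG_PENDING_VERIFY) {
        // Run self-test
        bool healthy = selfTest();
        if (healthy) {
            esp_ota_mark_app_valid_cancel_rollback();
            Serial.println("OTA update validated");
        } else {
            Serial.println("OTA self-test failed, rolling back");
            esp_ota_mark_app_invalid_rollback_and_reboot();
        }
    }
}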
```
### Self-Test Criteria
Before marking the new firmware as valid:
1. Serial port initialized (basic hardware check)
2. NVS configuration loads successfully
3. WiFi or GSM connects within 60 seconds
4. Modbus communication with Acuvim II succeeds (at least one read)
5. MQTT connection succeeds
6. No crash within first 30 seconds
If any critical test fails, the device rolls back to the previous firmware automatically.
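The criteria above can be modeled as a list of named checks, each flagged as critical or not. The sketch below is host-compilable and uses hypothetical names (`SelfTestCheck`, `SelfTestResult`, `runSelfTest`); which checks count as critical is left to the caller, since this section does not assign that per item.

```cpp
#include <functional>
#include <string>
#include <vector>

struct SelfTestCheck {
    std::string name;           // e.g. "nvs", "modbus", "mqtt"
    bool critical;              // a failed critical check triggers rollback
    std::function<bool()> run;  // returns true when the check passes
};

struct SelfTestResult {
    bool healthy;               // false if any critical check failed
    std::string firstFailure;   // name of the first failed critical check, else empty
};

// Runs checks in order and stops at the first critical failure.
// Non-critical failures are tolerated and do not affect the result.
SelfTestResult runSelfTest(const std::vector<SelfTestCheck>& checks) {
    SelfTestResult result{true, ""};
    for (const auto& check : checks) {
        if (check.run()) continue;
        if (check.critical) {
            result.healthy = false;
            result.firstFailure = check.name;
            return result;
        }
    }
    return result;
}
```

The `firstFailure` name is a natural value for the `reason` field of the rollback notification published after the device reverts.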
### Rollback Flow
```
1. New firmware boots
2. Self-test fails (e.g., MQTT won't connect)
3. esp_ota_mark_app_invalid_rollback_and_reboot() called
4. ESP32 reboots into previous firmware
5. Previous firmware publishes rollback notification to MQTT
6. Console marks update as failed for this device
```
## 6.8 Web UI Updates
### Device Tab Addition
```
┌──────────────────────────────────────┐
│ Firmware │
│ │
│ Current Version: v1.0.0 │
│ Build Date: May 16 2026 │
│ Partition: app0 │
│ │
│ Auto-Update: [Notify Only ▼] │
│ Check Interval: [6] hours │
│ │
│ [Check for Update] │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Update Available: v1.2.0 │ │
│ │ Size: 1.2 MB │ │
│ │ Notes: Added THD improvements │ │
│ │ │ │
│ │ [████████░░░░░░░░] 45% │ │
│ │ [Install Update] │ │
│ └──────────────────────────────────┘ │
│ │
│ Manual Upload: │
│ [Choose File] firmware.bin │
│ [Upload & Install] │
└──────────────────────────────────────┘
```
### Manual Upload Endpoint
For cases where the device has no internet access (local AP mode):
```
POST /api/ota/upload
Content-Type: multipart/form-data
Body: firmware binary file
Response:
{
"success": true,
"message": "Firmware uploaded. Installing and rebooting..."
}
```
This allows uploading a `.bin` file directly through the web browser.
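Since a browser upload accepts any file, a cheap sanity check before writing to flash helps reject obviously wrong files. The sketch below checks the ESP32 application image magic byte (`0xE9`) and the app partition size from section 6.1. `looksLikeFirmware` is a hypothetical helper, and passing this check does not replace checksum validation.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint8_t kEspImageMagic = 0xE9;         // first byte of an ESP32 app image
constexpr uint32_t kAppPartitionSize = 0x1E0000; // app0/app1 size from partitions_ota.csv

// firstBytes: start of the uploaded file; totalSize: full upload size in bytes.
bool looksLikeFirmware(const uint8_t* firstBytes, size_t len, uint32_t totalSize) {
    if (firstBytes == nullptr || len < 1) return false;
    if (firstBytes[0] != kEspImageMagic) return false;     // not an ESP32 app image
    return totalSize > 0 && totalSize <= kAppPartitionSize; // must fit one app slot
}
```

Running the check on the first received chunk lets the handler return an error before any flash sector is erased.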
### API Endpoints
```
GET /api/ota/status
Response:
{
"current_version": "1.0.0",
"build_date": "May 16 2026",
"partition": "app0",
"update_available": true,
"available_version": "1.2.0",
"auto_update": "notify",
"check_interval_hours": 6,
"last_check": 1716000000,
"ota_status": "idle",
"ota_progress": 0
}
POST /api/ota/check
Response:
{
"update_available": true,
"version": "1.2.0",
"size": 1234567,
"release_notes": "Added THD improvements"
}
POST /api/ota/install
Body:
{
"url": "https://console.example.com/api/firmware/download/1.2.0"
}
Response:
{
"success": true,
"message": "Download started..."
}
POST /api/ota/rollback
Response:
{
"success": true,
"message": "Rolling back to previous firmware..."
}
```
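The `ota_status` field in the status response needs a string form of the `OtaStatus` enum. A possible mapping is sketched below: `"idle"`, `"downloading"`, and `"installing"` appear in this document, while the remaining strings and both helper names are assumptions.

```cpp
#include <cstdint>
#include <string>

enum class OtaStatus { IDLE, CHECKING, DOWNLOADING, INSTALLING, SUCCESS, FAILED, ROLLING_BACK };

const char* otaStatusToString(OtaStatus s) {
    switch (s) {
        case OtaStatus::IDLE:         return "idle";
        case OtaStatus::CHECKING:     return "checking";      // assumed string
        case OtaStatus::DOWNLOADING:  return "downloading";
        case OtaStatus::INSTALLING:   return "installing";
        case OtaStatus::SUCCESS:      return "success";       // assumed string
        case OtaStatus::FAILED:       return "failed";        // assumed string
        case OtaStatus::ROLLING_BACK: return "rolling_back";  // assumed string
    }
    return "unknown";
}

// Builds the two OTA fields of the /api/ota/status response body.
std::string otaStatusFields(OtaStatus s, uint8_t progress) {
    return std::string("\"ota_status\":\"") + otaStatusToString(s) +
           "\",\"ota_progress\":" + std::to_string(progress);
}
```

Keeping the mapping in one function means the web API and the MQTT progress messages cannot disagree on status names.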
## 6.9 Security
### Firmware Integrity
- **Checksum validation:** SHA256 hash comparison before applying update
- **Size validation:** Compare Content-Length with expected size
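- **HTTPS:** Download firmware over HTTPS when possible (WiFi). GSM may fall back to plain HTTP if TLS memory usage on the SIM800L path is too high.
- **Signed firmware (future):** ESP32 supports Secure Boot V2 with RSA-3072 signed images on chip revision 3 (ECO3) and later; earlier revisions are limited to Secure Boot V1. Verify the module's chip revision before planning to enable this for production.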
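The checksum field in the examples carries a `sha256:` prefix, so the comparison has to strip it and ignore hex case. The sketch below covers only that parsing and comparison step; computing the digest itself (for instance with the SHA-256 routines available on the ESP32) is out of scope here. `parseSha256` and `checksumMatches` are hypothetical names.

```cpp
#include <cctype>
#include <string>

// Accepts "sha256:<64 hex chars>" or a bare 64-char hex string.
// On success writes the lowercase hex digest to hexOut and returns true.
bool parseSha256(const std::string& field, std::string& hexOut) {
    const std::string prefix = "sha256:";
    std::string hex = (field.compare(0, prefix.size(), prefix) == 0)
                          ? field.substr(prefix.size())
                          : field;
    if (hex.size() != 64) return false;  // SHA-256 is 32 bytes, 64 hex digits
    for (char& c : hex) {
        if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    hexOut = hex;
    return true;
}

// Compares the server-supplied checksum field with a locally computed hex digest.
bool checksumMatches(const std::string& expectedField, const std::string& computedHex) {
    std::string expected, computed;
    return parseSha256(expectedField, expected) &&
           parseSha256(computedHex, computed) &&
           expected == computed;
}
```

A malformed or truncated checksum is treated as a mismatch, so the update is rejected rather than installed unverified.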
### Preventing Bricked Devices
- Dual partition scheme ensures there's always a working firmware to fall back to
- Self-test on boot with automatic rollback
- Manual upload via web UI as last resort (AP mode always works)
- Factory firmware can be flashed via USB as absolute fallback
## 6.10 Testing & Validation
| Test | Method | Pass Criteria |
|------|--------|---------------|
| OTA check (pull) | Set up test HTTP server | Device detects available update |
| OTA download WiFi | Trigger update over WiFi | Firmware downloaded, installed, reboots |
| OTA download GSM | Disable WiFi, trigger OTA | Firmware downloaded over GPRS |
| Push OTA via MQTT | Publish OTA command | Device downloads and installs |
| Progress reporting | Monitor MQTT during update | Progress updates received (0-100%) |
| Checksum validation | Serve firmware with wrong checksum | Update rejected |
| Rollback | Upload firmware that fails self-test | Auto-rollback to previous version |
| Manual upload | Upload .bin via web UI | Firmware installed via browser |
| Version comparison | Check with same version | No update triggered |
| No internet | Check without connectivity | Graceful failure, no crash |
| Large firmware | Upload near-max-size binary | Succeeds within partition limit |
| Interrupted download | Kill network mid-download | Clean failure, no corruption |
| Post-update config | Update firmware | All NVS config preserved |
## 6.11 Phase 6 Completion Criteria
- [ ] Dual OTA partition table configured and working
- [ ] Pull model: device checks console for firmware updates
- [ ] Push model: MQTT command triggers firmware update
- [ ] OTA works over WiFi
- [ ] OTA works over GSM
- [ ] SHA256 checksum validation before applying update
- [ ] Progress reported via MQTT and web UI
- [ ] Self-test on first boot after update
- [ ] Automatic rollback on failed self-test
- [ ] Manual firmware upload via web UI (AP mode)
- [ ] Version reported in heartbeat and status API
- [ ] All NVS configuration preserved across updates
- [ ] OTA settings configurable (auto-update policy, check interval)
---
**Previous Phase:** [Phase 5 — SD Card Offline Buffering](acuvim-spec-05.md)
**Next Phase:** [Phase 7 — Heartbeat, Health & Device Registration](acuvim-spec-07.md)