Tau.Acuvim/docs/acuvim-spec-06.md
Renier Forster 84a0668c54 Initial commit: Tau Acuvim IoT monitoring system
2026-05-16 19:05:32 +02:00


Phase 6: OTA Firmware Updates

Objective

Implement over-the-air firmware update capability via both WiFi and GSM. The device checks for updates from the console application's firmware server, downloads and validates the binary, and performs a safe update with rollback protection. Supports both push (console-initiated) and pull (device-initiated) update models.

Prerequisites

  • Phase 5 complete (all transport and storage working)
  • Console application firmware hosting endpoint (Phase 8, or temporary HTTP server for testing)
  • Dual OTA partition table configured

Deliverables

  1. OTA partition table (two app partitions for safe rollback)
  2. HTTP-based firmware download over WiFi and GSM
  3. Push OTA via MQTT command from console
  4. Pull OTA via periodic version check
  5. Update validation and automatic rollback on boot failure
  6. Firmware version reporting

6.1 Partition Table

Custom Partition Table for OTA

Replace the default partition table with an OTA-capable layout. Create firmware/partitions_ota.csv:

# Name,    Type,  SubType,  Offset,    Size,     Flags
nvs,       data,  nvs,      0x9000,    0x5000,
otadata,   data,  ota,      0xe000,    0x2000,
app0,      app,   ota_0,    0x10000,   0x1E0000,
app1,      app,   ota_1,    0x1F0000,  0x1E0000,
littlefs,  data,  spiffs,   0x3D0000,  0x20000,
nvs_keys,  data,  nvs_keys, 0x3F0000,  0x1000,

Layout summary (the layout ends at 0x3F1000, so it occupies just under 4MB even on an 8MB flash part):

| Partition | Size | Purpose |
|-----------|------|---------|
| nvs | 20KB | Non-volatile storage (config) |
| otadata | 8KB | OTA state tracking |
| app0 | 1,920KB | Application slot 0 |
| app1 | 1,920KB | Application slot 1 |
| littlefs | 128KB | Web UI files |
| nvs_keys | 4KB | NVS encryption keys (optional) |

Update platformio.ini:

board_build.partitions = partitions_ota.csv

Note: Each app partition is 1,920KB (~1.9MB). Monitor the firmware binary size; if it approaches this limit, optimize the build or enlarge the app partitions by reducing LittleFS. LittleFS is only 128KB, so shrinking it frees at most 64KB per app slot.
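
The offsets and sizes in partitions_ota.csv can be sanity-checked on a development machine before flashing. The following is a minimal sketch under stated assumptions, not part of the firmware: the names Partition, kLayout, isContiguous, appsAligned, and layoutEnd are hypothetical, and the values are copied from the table above. It checks that the partitions are contiguous, that the app slots start on 0x10000 (64KB) boundaries as ESP-IDF requires for app partitions, and where the layout ends.

```cpp
#include <cstddef>
#include <cstdint>

// One row of partitions_ota.csv (hypothetical helper type).
struct Partition {
    const char* name;
    uint32_t offset;
    uint32_t size;
    bool isApp;  // true for the two OTA application slots
};

// Values copied from the partition table in this section.
static const Partition kLayout[] = {
    {"nvs",      0x9000,   0x5000,   false},
    {"otadata",  0xE000,   0x2000,   false},
    {"app0",     0x10000,  0x1E0000, true},
    {"app1",     0x1F0000, 0x1E0000, true},
    {"littlefs", 0x3D0000, 0x20000,  false},
    {"nvs_keys", 0x3F0000, 0x1000,   false},
};
static const std::size_t kLayoutCount = sizeof(kLayout) / sizeof(kLayout[0]);

// True when every partition starts exactly where the previous one ends.
inline bool isContiguous(const Partition* p, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        if (p[i].offset != p[i - 1].offset + p[i - 1].size) return false;
    }
    return true;
}

// True when every app partition starts on a 64KB boundary.
inline bool appsAligned(const Partition* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i].isApp && (p[i].offset % 0x10000u) != 0) return false;
    }
    return true;
}

// First byte address after the last partition.
inline uint32_t layoutEnd(const Partition* p, std::size_t n) {
    return n == 0 ? 0 : p[n - 1].offset + p[n - 1].size;
}
```

Comparing layoutEnd() against the flash size of the target module is a quick way to catch a layout that no longer fits after an edit.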

6.2 OTA Manager

ota_manager.h / ota_manager.cpp

enum class OtaStatus {
    IDLE,
    CHECKING,
    DOWNLOADING,
    INSTALLING,
    SUCCESS,
    FAILED,
    ROLLING_BACK
};

struct FirmwareInfo {
    String version;           // Semantic version (e.g., "1.2.0")
    String url;               // Download URL
    uint32_t size;             // Binary size in bytes
    String checksum;           // Hash of the binary, e.g. "sha256:<hex digest>"
    String releaseNotes;       // Optional description
    bool mandatory;            // Force update regardless of version
};

class OtaManager {
public:
    void begin(ConfigManager& config, TransportManager& transport);
    void loop();                                // Periodic version check

    // Pull model (device checks for updates)
    bool checkForUpdate(FirmwareInfo& info);     // Query server for latest version
    bool isUpdateAvailable();

    // Push model (console sends update command)
    bool startUpdate(const String& url, const String& expectedChecksum = "");

    // Status
    OtaStatus getStatus();
    uint8_t getProgress();                       // 0-100%
    String getStatusMessage();
    String getCurrentVersion();

    // Rollback
    bool markValid();                            // Mark current firmware as good
    bool rollback();                             // Revert to previous firmware

    void onProgress(std::function<void(uint8_t percent)> callback);
    void onComplete(std::function<void(bool success, const String& message)> callback);

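private:
    bool performUpdate(const String& url);
    bool validateChecksum(const String& expected);
    bool compareVersions(const String& current, const String& available);

    ConfigManager* config = nullptr;
    TransportManager* transport = nullptr;
    OtaStatus status = OtaStatus::IDLE;
    uint8_t progress = 0;
    unsigned long lastCheck = 0;

    // Callbacks registered via onProgress() / onComplete()
    std::function<void(uint8_t percent)> progressCallback;
    std::function<void(bool success, const String& message)> completeCallback;
};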

6.3 Firmware Version Scheme

Semantic Versioning

Format: MAJOR.MINOR.PATCH (e.g., 1.2.3)

  • MAJOR: Breaking changes (protocol changes, incompatible config format)
  • MINOR: New features (new register groups, new settings)
  • PATCH: Bug fixes, optimizations

Version in Firmware

Define in version.h:

#define FW_VERSION_MAJOR 1
#define FW_VERSION_MINOR 0
#define FW_VERSION_PATCH 0
#define FW_VERSION "1.0.0"
#define FW_BUILD_DATE __DATE__
#define FW_BUILD_TIME __TIME__

Version Comparison

// Returns true if available > current.
// Malformed version strings never trigger an update.
bool OtaManager::compareVersions(const String& current, const String& available) {
    int curMaj = 0, curMin = 0, curPat = 0;
    int avlMaj = 0, avlMin = 0, avlPat = 0;

    // sscanf returns the number of fields parsed; anything but 3 is invalid
    if (sscanf(current.c_str(), "%d.%d.%d", &curMaj, &curMin, &curPat) != 3) return false;
    if (sscanf(available.c_str(), "%d.%d.%d", &avlMaj, &avlMin, &avlPat) != 3) return false;

    if (avlMaj != curMaj) return avlMaj > curMaj;
    if (avlMin != curMin) return avlMin > curMin;
    return avlPat > curPat;
}
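
The comparison above depends on Arduino's String, which makes it awkward to unit test on a host machine. The sketch below is one way to express the same rule with the standard library only; SemVer, parseSemVer, and isNewer are hypothetical names, and rejecting trailing characters such as "1.2.3x" is an assumption about desired strictness rather than something the spec states.

```cpp
#include <cstdio>
#include <string>
#include <tuple>

// Parsed MAJOR.MINOR.PATCH triple (hypothetical helper type).
struct SemVer {
    int majorPart;
    int minorPart;
    int patchPart;
};

// Parses "MAJOR.MINOR.PATCH". Returns false for missing fields, negative
// numbers, or trailing characters.
inline bool parseSemVer(const std::string& text, SemVer& out) {
    int a = 0, b = 0, c = 0;
    char extra = 0;
    // The extra %c conversion only succeeds if something follows the patch number.
    const int fields = std::sscanf(text.c_str(), "%d.%d.%d%c", &a, &b, &c, &extra);
    if (fields != 3 || a < 0 || b < 0 || c < 0) return false;
    out.majorPart = a;
    out.minorPart = b;
    out.patchPart = c;
    return true;
}

// True only when both strings parse and 'available' is strictly newer.
inline bool isNewer(const std::string& current, const std::string& available) {
    SemVer cur, avl;
    if (!parseSemVer(current, cur) || !parseSemVer(available, avl)) return false;
    // std::tie compares lexicographically: major first, then minor, then patch.
    return std::tie(avl.majorPart, avl.minorPart, avl.patchPart) >
           std::tie(cur.majorPart, cur.minorPart, cur.patchPart);
}
```

Numeric comparison matters here: "1.10.0" is newer than "1.9.0", which a plain string comparison would get wrong.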

6.4 Pull Model (Device-Initiated Check)

Version Check Endpoint

The device periodically queries the console application:

GET {console_url}/api/firmware/check?device_id={id}&current_version={ver}&hardware={hw}

Response (update available):
{
  "update_available": true,
  "version": "1.2.0",
  "url": "https://console.example.com/api/firmware/download/1.2.0",
  "size": 1234567,
  "checksum": "sha256:abcdef1234567890...",
  "release_notes": "Added THD monitoring improvements",
  "mandatory": false
}

Response (no update):
{
  "update_available": false,
  "current_version": "1.0.0"
}
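
The check request above could be assembled as follows. This is a sketch assuming query values should be percent-encoded; urlEncode and buildCheckUrl are hypothetical helpers, and the device ID and hardware string in the usage note are made-up example values.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encodes everything except RFC 3986 unreserved characters.
inline std::string urlEncode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Builds the version check URL using the path and query parameter names
// given in this section.
inline std::string buildCheckUrl(const std::string& consoleUrl,
                                 const std::string& deviceId,
                                 const std::string& currentVersion,
                                 const std::string& hardware) {
    return consoleUrl + "/api/firmware/check?device_id=" + urlEncode(deviceId) +
           "&current_version=" + urlEncode(currentVersion) +
           "&hardware=" + urlEncode(hardware);
}
```

For example, buildCheckUrl("https://console.example.com", "acuvim-01", "1.0.0", "ttgo t-call") encodes the space in the hardware name as %20.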

Check Interval

  • Default: every 6 hours
  • Configurable via NVS (ota_check_interval_hours)
  • Also checks once after boot, following a 60-second delay that lets connections stabilize
  • Can be triggered manually via MQTT command or web UI
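
The timing rules above can be sketched as two small predicates. This assumes timestamps come from a 32-bit millisecond counter such as Arduino's millis(), which wraps after about 49.7 days; isCheckDue, isBootCheckDue, and kBootDelayMs are hypothetical names.

```cpp
#include <cstdint>

// Delay before the first check after boot (60 seconds, per this section).
static const uint32_t kBootDelayMs = 60u * 1000u;

// True when at least intervalHours have passed since lastCheckMs.
// Unsigned subtraction stays correct across a wrap of the millisecond counter.
inline bool isCheckDue(uint32_t nowMs, uint32_t lastCheckMs, uint8_t intervalHours) {
    const uint32_t intervalMs = static_cast<uint32_t>(intervalHours) * 3600u * 1000u;
    return static_cast<uint32_t>(nowMs - lastCheckMs) >= intervalMs;
}

// True once the boot delay has elapsed and the boot-time check has not run yet.
inline bool isBootCheckDue(uint32_t nowMs, bool bootCheckDone) {
    return !bootCheckDone && nowMs >= kBootDelayMs;
}
```

Comparing elapsed time (now minus last) rather than absolute deadlines is what keeps the schedule correct when the counter wraps.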

Auto-Update Policy

Configurable behavior when an update is found:

  • Auto-update: Download and install immediately (default for patch versions)
  • Notify only: Report available update to console, wait for push command
  • Manual only: Never auto-update, require explicit trigger

Add to DeviceConfig:

uint8_t ota_check_interval_hours;  // default: 6
uint8_t ota_auto_update;           // 0=manual, 1=notify, 2=auto
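
A minimal sketch of how the stored policy might translate into an action once a check completes. The enum and function names are hypothetical. Two behaviors are assumptions not stated in this spec: a mandatory update installs regardless of policy, and an out-of-range stored value falls back to manual. The "default for patch versions" rule is not modeled.

```cpp
#include <cstdint>

// Mirrors ota_auto_update: 0=manual, 1=notify, 2=auto.
enum class OtaPolicy : uint8_t { Manual = 0, Notify = 1, Auto = 2 };

// What the device does after a version check (hypothetical).
enum class OtaAction { None, NotifyConsole, Install };

// Converts the raw NVS value; unknown values fall back to Manual (assumption).
inline OtaPolicy policyFromConfig(uint8_t raw) {
    return raw <= 2 ? static_cast<OtaPolicy>(raw) : OtaPolicy::Manual;
}

inline OtaAction decideOtaAction(OtaPolicy policy, bool updateAvailable, bool mandatory) {
    if (!updateAvailable) return OtaAction::None;
    if (mandatory) return OtaAction::Install;  // assumption: mandatory overrides policy
    switch (policy) {
        case OtaPolicy::Auto:   return OtaAction::Install;
        case OtaPolicy::Notify: return OtaAction::NotifyConsole;
        case OtaPolicy::Manual: return OtaAction::None;
    }
    return OtaAction::None;
}
```

Keeping the decision in a pure function makes the policy matrix easy to cover in the firmware unit tests.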

6.5 Push Model (Console-Initiated)

MQTT OTA Command

The console publishes to {prefix}/{device_id}/cmd:

{
  "cmd": "ota_update",
  "request_id": "req-12345",
  "url": "https://console.example.com/api/firmware/download/1.2.0",
  "version": "1.2.0",
  "checksum": "sha256:abcdef1234567890...",
  "mandatory": false
}

Device Response Flow

1. Receive OTA command
2. Publish ACK: {"request_id":"req-12345","status":"accepted","message":"Starting update..."}
3. Publish progress: {"request_id":"req-12345","status":"downloading","progress":45}
4. Publish progress: {"request_id":"req-12345","status":"installing","progress":100}
5. Reboot
6. After reboot, publish: {"status":"updated","version":"1.2.0"}
   OR if rollback: {"status":"rollback","version":"1.0.0","reason":"Boot validation failed"}

6.6 Update Process

Download and Install

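bool OtaManager::performUpdate(const String& url) {
    status = OtaStatus::DOWNLOADING;
    progress = 0;

    // Use HTTPClient with either WiFiClient or TinyGSMClient.
    // Caveat: the stock arduino-esp32 HTTPClient::begin() takes a WiFiClient&
    // (NetworkClient& in core 3.x), so the GSM path may need a Client-based
    // HTTP library instead.
    HTTPClient http;
    Client& client = transport->getClient();
    http.begin(client, url);

    int httpCode = http.GET();
    if (httpCode != HTTP_CODE_OK) {
        http.end();
        status = OtaStatus::FAILED;
        return false;
    }

    int contentLength = http.getSize();
    if (contentLength <= 0) {
        http.end();
        status = OtaStatus::FAILED;
        return false;
    }

    // Start OTA update (fails if the image does not fit the inactive slot)
    if (!Update.begin(contentLength)) {
        http.end();
        status = OtaStatus::FAILED;
        return false;
    }

    status = OtaStatus::INSTALLING;

    // Stream firmware to flash
    Stream& stream = http.getStream();
    uint8_t buf[1024];
    int totalRead = 0;
    unsigned long lastData = millis();

    while (http.connected() && totalRead < contentLength) {
        int available = stream.available();
        if (available > 0) {
            int read = stream.readBytes(buf, min(available, (int)sizeof(buf)));
            if (Update.write(buf, read) != (size_t)read) break;  // flash write failed
            totalRead += read;
            lastData = millis();
            progress = (uint8_t)(((int64_t)totalRead * 100) / contentLength);
            if (progressCallback) progressCallback(progress);
        } else if (millis() - lastData > 30000UL) {
            break;  // stalled; the 30 s limit is an illustrative value
        }
        yield();
    }
    http.end();

    // Reject short downloads instead of finalizing a truncated image
    if (totalRead != contentLength) {
        Update.abort();
        status = OtaStatus::FAILED;
        return false;
    }

    // Checksum verification (see 6.9) belongs here, before the image is finalized

    if (Update.end()) {
        status = OtaStatus::SUCCESS;
        return true;
    } else {
        status = OtaStatus::FAILED;
        return false;
    }
}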

GSM Considerations for OTA

  • SIM800L is 2G only — download speeds ~10-20 KB/s
  • A 1MB firmware takes ~50-100 seconds over GPRS
  • SIM7600 (4G) will be much faster (~100+ KB/s)
  • Use chunked download with progress reporting
  • Implement download resume if connection drops (Range header)
  • Timeout: 10 minutes for GSM download, 2 minutes for WiFi
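
The figures above can be turned into a few helpers for planning a GSM download. This is a sketch: estimateSeconds, rangeHeaderValue, and downloadTimeoutMs are hypothetical names, the throughput values are the rough estimates quoted above, and resuming with a Range header only works if the server answers with 206 Partial Content.

```cpp
#include <cstdint>
#include <string>

// Rounded-up transfer time for a given sustained throughput.
inline uint32_t estimateSeconds(uint64_t bytes, uint64_t bytesPerSecond) {
    if (bytesPerSecond == 0) return UINT32_MAX;  // no throughput: effectively never
    return static_cast<uint32_t>((bytes + bytesPerSecond - 1) / bytesPerSecond);
}

// Value for an HTTP "Range" request header that resumes after the bytes
// already written to flash.
inline std::string rangeHeaderValue(uint32_t bytesAlreadyWritten) {
    return "bytes=" + std::to_string(bytesAlreadyWritten) + "-";
}

// Overall download timeout from this section: 10 minutes on GSM, 2 on WiFi.
inline uint32_t downloadTimeoutMs(bool overGsm) {
    return overGsm ? 10u * 60u * 1000u : 2u * 60u * 1000u;
}
```

At 10 KB/s a 1MB image needs a little over 100 seconds, so the 10-minute GSM timeout leaves headroom for a near-full 1,920KB slot at that rate, but not for much slower links.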

6.7 Rollback Protection

Boot Validation

ESP32's OTA library tracks which partition is "pending verification." On first boot after update:

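#include "esp_ota_ops.h"

// Note: rollback only works when the bootloader is built with
// CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE. Some Arduino core versions also
// auto-validate the image before setup() unless verifyRollbackLater() is
// overridden; check the core in use.
void setup() {
    // Check if this is first boot after OTA: the running image stays in the
    // PENDING_VERIFY state until it is explicitly marked valid.
    const esp_partition_t* running = esp_ota_get_running_partition();
    esp_ota_img_states_t otaState;

    if (esp_ota_get_state_partition(running, &otaState) == ESP_OK &&
        otaState == ESP_OTA_IMG_PENDING_VERIFY) {
        // Run self-test
        bool healthy = selfTest();

        if (healthy) {
            esp_ota_mark_app_valid_cancel_rollback();
            Serial.println("OTA update validated");
        } else {
            Serial.println("OTA self-test failed, rolling back");
            esp_ota_mark_app_invalid_rollback_and_reboot();
        }
    }
}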

Self-Test Criteria

Before marking the new firmware as valid:

  1. Serial port initialized (basic hardware check)
  2. NVS configuration loads successfully
  3. WiFi or GSM connects within 60 seconds
  4. Modbus communication with Acuvim II succeeds (at least one read)
  5. MQTT connection succeeds
  6. No crash within first 30 seconds

If any critical test fails, the device rolls back to the previous firmware automatically.
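
One way to structure selfTest() is as a list of named checks, each flagged critical or not. The sketch below shows that pattern with the standard library only; SelfTestCheck, SelfTestResult, and runSelfTest are hypothetical names, and which of the six criteria count as critical is a decision this spec leaves open.

```cpp
#include <functional>
#include <string>
#include <vector>

// A single self-test step (hypothetical helper type).
struct SelfTestCheck {
    std::string name;
    bool critical;               // a failed critical check triggers rollback
    std::function<bool()> run;   // returns true when the check passes
};

struct SelfTestResult {
    bool healthy;
    std::string firstFailure;    // name of the failed critical check, if any
};

// Runs checks in order and stops at the first failed critical check.
// Failed non-critical checks are ignored in this sketch.
inline SelfTestResult runSelfTest(const std::vector<SelfTestCheck>& checks) {
    SelfTestResult result{true, ""};
    for (const auto& check : checks) {
        if (check.run()) continue;
        if (check.critical) {
            result.healthy = false;
            result.firstFailure = check.name;
            break;
        }
    }
    return result;
}
```

The firstFailure name could be reused as the "reason" field of the rollback notification, so the console can see which check failed.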

Rollback Flow

1. New firmware boots
2. Self-test fails (e.g., MQTT won't connect)
3. esp_ota_mark_app_invalid_rollback_and_reboot() called
4. ESP32 reboots into previous firmware
5. Previous firmware publishes rollback notification to MQTT
6. Console marks update as failed for this device

6.8 Web UI Updates

Device Tab Addition

┌──────────────────────────────────────┐
│ Firmware                             │
│                                      │
│ Current Version: v1.0.0              │
│ Build Date: May 16 2026              │
│ Partition: app0                      │
│                                      │
│ Auto-Update: [Notify Only ▼]         │
│ Check Interval: [6] hours            │
│                                      │
│ [Check for Update]                   │
│                                      │
│ ┌──────────────────────────────────┐ │
│ │ Update Available: v1.2.0        │ │
│ │ Size: 1.2 MB                    │ │
│ │ Notes: Added THD improvements   │ │
│ │                                 │ │
│ │ [████████░░░░░░░░] 45%         │ │
│ │        [Install Update]         │ │
│ └──────────────────────────────────┘ │
│                                      │
│ Manual Upload:                       │
│ [Choose File] firmware.bin           │
│        [Upload & Install]            │
└──────────────────────────────────────┘

Manual Upload Endpoint

For cases where the device has no internet access (local AP mode):

POST /api/ota/upload
Content-Type: multipart/form-data
Body: firmware binary file

Response:
{
  "success": true,
  "message": "Firmware uploaded. Installing and rebooting..."
}

This allows uploading a .bin file directly through the web browser.
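
Before writing an uploaded file to flash, the handler could run a cheap pre-check on the first received chunk. This is a sketch with hypothetical names (UploadCheck, precheckUpload); it relies on two facts: ESP32 application images begin with the magic byte 0xE9, and each app slot in this layout is 0x1E0000 bytes.

```cpp
#include <cstddef>
#include <cstdint>

static const uint8_t kEspImageMagic = 0xE9;         // first byte of an ESP32 app image
static const uint32_t kAppPartitionSize = 0x1E0000; // app0/app1 size from section 6.1

enum class UploadCheck { Ok, Empty, TooLarge, BadMagic };

// Inspects the declared size and the first bytes of an upload.
inline UploadCheck precheckUpload(const uint8_t* firstChunk, std::size_t chunkLen,
                                  uint32_t totalSize) {
    if (firstChunk == nullptr || chunkLen == 0 || totalSize == 0) return UploadCheck::Empty;
    if (totalSize > kAppPartitionSize) return UploadCheck::TooLarge;
    if (firstChunk[0] != kEspImageMagic) return UploadCheck::BadMagic;
    return UploadCheck::Ok;
}
```

This only rejects obviously wrong files early; it does not replace checksum validation or the boot-time self-test.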

API Endpoints

GET /api/ota/status
Response:
{
  "current_version": "1.0.0",
  "build_date": "May 16 2026",
  "partition": "app0",
  "update_available": true,
  "available_version": "1.2.0",
  "auto_update": "notify",
  "check_interval_hours": 6,
  "last_check": 1716000000,
  "ota_status": "idle",
  "ota_progress": 0
}

POST /api/ota/check
Response:
{
  "update_available": true,
  "version": "1.2.0",
  "size": 1234567,
  "release_notes": "Added THD improvements"
}

POST /api/ota/install
Body:
{
  "url": "https://console.example.com/api/firmware/download/1.2.0"
}
Response:
{
  "success": true,
  "message": "Download started..."
}

POST /api/ota/rollback
Response:
{
  "success": true,
  "message": "Rolling back to previous firmware..."
}

6.9 Security

Firmware Integrity

  • Checksum validation: SHA256 hash comparison before applying update
  • Size validation: Compare Content-Length with expected size
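  • HTTPS: Download firmware over HTTPS when possible (WiFi). GSM may fall back to HTTP if TLS memory usage is too high on the SIM800L; in that case the SHA256 check is the only integrity protection, so the expected checksum should arrive over a trusted channel.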
  • Signed firmware (future): ESP32 supports secure boot with RSA-3072 signed images (Secure Boot V2, which requires chip revision 3 or later). Can be enabled later for production if the hardware supports it.
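
The checksum strings in this spec carry an algorithm prefix ("sha256:..."), so validation has to split the field before comparing digests. The sketch below covers only that parsing and comparison step; computing the SHA256 of the downloaded image is out of scope here. parseSha256Field and checksumMatches are hypothetical names, and rejecting any non-sha256 prefix is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Extracts a lowercase 64-character hex digest from "sha256:<hex>".
// Returns false for other algorithms or malformed digests.
inline bool parseSha256Field(const std::string& field, std::string& hexOut) {
    const std::string prefix = "sha256:";
    if (field.compare(0, prefix.size(), prefix) != 0) return false;
    std::string hex = field.substr(prefix.size());
    if (hex.size() != 64) return false;  // SHA256 is 32 bytes, 64 hex characters
    for (char& c : hex) {
        if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    hexOut = hex;
    return true;
}

// Case-insensitive comparison of the expected field against a computed digest.
inline bool checksumMatches(const std::string& expectedField,
                            const std::string& computedHex) {
    std::string expected;
    if (!parseSha256Field(expectedField, expected)) return false;
    if (computedHex.size() != expected.size()) return false;
    unsigned diff = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const int c = std::tolower(static_cast<unsigned char>(computedHex[i]));
        diff |= static_cast<unsigned>(c ^ expected[i]);
    }
    return diff == 0;
}
```

Treating a missing or unknown prefix as a failure means a malformed checksum field rejects the update rather than silently skipping validation.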

Preventing Bricked Devices

  • Dual partition scheme ensures there's always a working firmware to fall back to
  • Self-test on boot with automatic rollback
  • Manual upload via web UI as last resort (AP mode stays reachable as long as the running firmware boots)
  • Factory firmware can be flashed via USB as absolute fallback

6.10 Testing & Validation

| Test | Method | Pass Criteria |
|------|--------|---------------|
| OTA check (pull) | Set up test HTTP server | Device detects available update |
| OTA download WiFi | Trigger update over WiFi | Firmware downloaded, installed, reboots |
| OTA download GSM | Disable WiFi, trigger OTA | Firmware downloaded over GPRS |
| Push OTA via MQTT | Publish OTA command | Device downloads and installs |
| Progress reporting | Monitor MQTT during update | Progress updates received (0-100%) |
| Checksum validation | Serve firmware with wrong checksum | Update rejected |
| Rollback | Upload firmware that fails self-test | Auto-rollback to previous version |
| Manual upload | Upload .bin via web UI | Firmware installed via browser |
| Version comparison | Check with same version | No update triggered |
| No internet | Check without connectivity | Graceful failure, no crash |
| Large firmware | Upload near-max-size binary | Succeeds within partition limit |
| Interrupted download | Kill network mid-download | Clean failure, no corruption |
| Post-update config | Update firmware | All NVS config preserved |

6.11 Phase 6 Completion Criteria

  • Dual OTA partition table configured and working
  • Pull model: device checks console for firmware updates
  • Push model: MQTT command triggers firmware update
  • OTA works over WiFi
  • OTA works over GSM
  • SHA256 checksum validation before applying update
  • Progress reported via MQTT and web UI
  • Self-test on first boot after update
  • Automatic rollback on failed self-test
  • Manual firmware upload via web UI (AP mode)
  • Version reported in heartbeat and status API
  • All NVS configuration preserved across updates
  • OTA settings configurable (auto-update policy, check interval)

Previous Phase: Phase 5 (SD Card Offline Buffering)

Next Phase: Phase 7 (Heartbeat, Health & Device Registration)