Tau.Acuvim/docs/acuvim-spec-05.md
Renier Forster 84a0668c54 Initial commit: Tau Acuvim IoT monitoring system
2026-05-16 19:05:32 +02:00

Phase 5: SD Card Offline Buffering

Objective

Add SD card storage as a last-resort data buffer when both WiFi and GSM are unavailable. Telemetry data is written to the SD card in a replay-friendly format and automatically drained to MQTT when connectivity is restored.

Prerequisites

  • Phase 4 complete (transport failover working)
  • MicroSD card (FAT32 formatted, up to 32GB)
  • SD card module or breakout (SPI interface)

Deliverables

  1. SPI-based SD card driver with hot-plug detection
  2. Offline telemetry buffer (newline-delimited JSON files)
  3. Automatic queue drain on connectivity restore
  4. File rotation and cleanup (prevent SD from filling up)
  5. SD card status in web UI and telemetry

5.1 SD Card Hardware

SPI Pin Mapping (from Phase 1)

SD_CS    = GPIO 15     # Chip Select
SD_MOSI  = GPIO 2      # Master Out Slave In
SD_MISO  = GPIO 14     # Master In Slave Out
SD_CLK   = GPIO 12     # Clock

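Important: These pins are routed to the HSPI peripheral. HSPI is independent of the internal flash, which uses the ESP32's dedicated SPI0/SPI1 flash interface (not VSPI), so SD traffic does not contend with flash access. Ensure that no other peripheral is attached to the same SPI bus or pins. GPIO 2, 12, and 15 are also ESP32 strapping pins: pull-ups on the SD lines (GPIO 12 in particular) can change boot behavior, so verify that the board still boots reliably with a card inserted.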

SPI Initialization

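#include <SPI.h>
#include <SD.h>

SPIClass hspi(HSPI);

// Call once from setup(); returns false if no card is present or the mount fails.
// (initSdCard is an illustrative helper name.)
bool initSdCard() {
    hspi.begin(SD_CLK, SD_MISO, SD_MOSI, SD_CS);  // Route HSPI to the pins above
    return SD.begin(SD_CS, hspi);                 // Mount the FAT filesystem
}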

5.2 SD Card Manager

sd_manager.h / sd_manager.cpp

class SdManager {
public:
    bool begin();                              // Initialize SD card
    bool isAvailable();                        // Card present and mounted
    bool isHealthy();                          // Card mounted and writable
    void checkCard();                          // Hot-plug check (see 5.7)

    // Buffering
    bool bufferTelemetry(const AcuvimData& data);  // Write one record
    uint32_t getQueuedCount();                      // Records waiting to send
    bool hasQueuedData();                           // Any records to drain

    // Drain (replay to MQTT)
    bool drainNext(String& payload);           // Get next record for sending
    void confirmDrained(const String& filename, uint32_t position);
    void drainBatch(MqttClient& mqtt, uint8_t maxRecords);  // Drain a few records (used in loop(), see 5.5)
    void drainAll(MqttClient& mqtt);           // Drain entire queue

    // Maintenance
    uint64_t getTotalSpace();                  // Total SD capacity
    uint64_t getUsedSpace();                   // Used space
    uint64_t getFreeSpace();                   // Available space
    uint32_t getFileCount();                   // Number of buffer files
    bool cleanup(uint32_t maxAgeDays = 30);    // Delete old files
    bool format();                             // Format SD card (destructive)

private:
    bool mounted;
    String currentFileName;
    uint32_t recordsInCurrentFile;

    String generateFileName();
    bool rotateFile();
    bool shouldRotate();
    void ensureDirectories();                  // Create /telemetry, /drain, /logs if missing
};

5.3 File Format and Structure

Directory Structure

SD Card Root/
├── /telemetry/
│   ├── 2026-05-16_001.jsonl      # Newline-delimited JSON
│   ├── 2026-05-16_002.jsonl
│   └── 2026-05-17_001.jsonl
├── /drain/                        # Files currently being drained
│   └── (moved here during drain, deleted after)
├── /errors/                       # Corrupt files set aside during drain (see 5.6)
└── /logs/                         # Optional diagnostic logs
    └── boot.log

File Naming Convention

  • Format: YYYY-MM-DD_NNN.jsonl
  • NNN: Sequential file number within the day (001, 002, ...)
  • .jsonl: JSON Lines format (one JSON object per line)
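
A minimal sketch of the naming convention above, written as host-side C++ so it can be checked off-target. The helper name makeBufferFileName and its parameters are assumptions for illustration and are not part of SdManager; on the device the date would come from the system clock.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "/telemetry/YYYY-MM-DD_NNN.jsonl".
// The "/telemetry" prefix follows the directory structure described above.
std::string makeBufferFileName(int year, int month, int day, int seq) {
    assert(seq >= 1 && seq <= 999);  // NNN is three digits in this sketch
    char buf[40];
    std::snprintf(buf, sizeof(buf), "/telemetry/%04d-%02d-%02d_%03d.jsonl",
                  year, month, day, seq);
    return std::string(buf);
}
```

Because both the date and the sequence number are zero-padded, a plain lexicographic sort of the file names gives chronological order, which is what listing files "sorted by name (oldest first)" relies on.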

Record Format

Each line is a complete, self-contained telemetry JSON (same format as MQTT payload from Phase 2):

{"ts":1716000000,"dev":"ACV-AABBCCDDEEFF","v":{"a":230.1,"b":231.4,"c":229.8},"i":{"a":15.2,"b":14.8,"c":15.5},"p":{"total":10.5},"f":50.01,"e":{"imp_act":12345.6},"src":"sd"}

  • Added "src":"sd" field to indicate this was buffered data (vs. real-time)
  • Each line is independently parseable (no array wrapping)
  • File can be read line by line without loading entire file into memory

File Rotation

Rotate to a new file when:

  • Current file exceeds 500KB (~2,000-3,000 records depending on content)
  • Date changes (new day = new file)
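  • Current file reaches 5,000 records (hard limit; this is a backstop, because at roughly 180-250 bytes per record the 500KB size limit is normally reached first)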
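
The three rotation conditions can be combined into a single predicate. The sketch below is one possible shape for shouldRotate(); the BufferFileState struct, the dayStamp encoding, and the reading of "500KB" as 500 * 1024 bytes are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMaxFileBytes = 500 * 1024;  // "500KB", assumed to mean 500 KiB
constexpr uint32_t kMaxRecords   = 5000;        // hard record limit

// Hypothetical snapshot of the file currently being written.
struct BufferFileState {
    uint32_t sizeBytes;    // current file size
    uint32_t recordCount;  // records written so far
    uint32_t dayStamp;     // date the file was created, e.g. 20260516
};

// True when any of the three rotation conditions is met.
bool shouldRotate(const BufferFileState& f, uint32_t todayStamp) {
    assert(todayStamp >= f.dayStamp);  // the clock is assumed not to run backwards
    return f.sizeBytes > kMaxFileBytes ||
           f.recordCount >= kMaxRecords ||
           f.dayStamp != todayStamp;
}
```

Keeping the check free of SD calls means it can be evaluated before every write without touching the card.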

5.4 Write Strategy

Buffering Flow

Acuvim poll cycle:
  ├── Transport connected?
  │   ├── YES: Publish to MQTT directly
  │   │        └── Also check: hasQueuedData()?
  │   │            └── YES: drain queued records in background
  │   └── NO: Buffer to SD card
  │        └── SD available?
  │            ├── YES: Write to current .jsonl file
  │            └── NO: Data is lost (log warning)
  └── Continue polling
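
The decision tree above reduces to two small pure functions. This is a sketch only: the Route enum and both function names are hypothetical, and "transport connected" is assumed to mean that both the transport and the MQTT session are up.

```cpp
#include <cassert>

// Hypothetical outcome of the per-record routing decision.
enum class Route { PublishLive, BufferToSd, Drop };

// Where a freshly polled record should go.
Route routeRecord(bool transportConnected, bool sdAvailable) {
    if (transportConnected) return Route::PublishLive;  // publish to MQTT directly
    if (sdAvailable) return Route::BufferToSd;          // write to current .jsonl file
    return Route::Drop;                                 // data is lost (log warning)
}

// Whether the background drain should run on this loop iteration.
bool shouldDrain(bool transportConnected, bool hasQueuedData) {
    return transportConnected && hasQueuedData;
}
```

Separating the decision from the side effects makes the flow easy to unit test without hardware.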

Write Implementation

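bool SdManager::bufferTelemetry(const AcuvimData& data) {
    if (!mounted) return false;

    // Start a new file on size, date, or record-count limits (see 5.3)
    if (shouldRotate() && !rotateFile()) return false;

    File file = SD.open(currentFileName, FILE_APPEND);
    if (!file) return false;

    // Serialize AcuvimData to compact JSON (same format as MQTT)
    StaticJsonDocument<512> doc;
    // ... populate doc from data ...
    doc["src"] = "sd";  // Mark as buffered (not real-time) data

    String line;
    serializeJson(doc, line);
    size_t written = file.println(line);  // Bytes written, including the line ending
    file.close();                         // Close immediately so the record reaches the card

    if (written < line.length()) return false;  // Short or failed write
    recordsInCurrentFile++;
    return true;
}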

Design notes:

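  • Open file, write, close immediately so each record is flushed to the card; a power failure can cost at most the record being written
  • Do NOT keep files open between writes (SD card can be removed)
  • FILE_APPEND mode ensures data is added to end of file
  • Each record is one complete line; a power cut mid-write can leave a truncated last line, which the drain should skip rather than treat the whole file as corrupt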
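
To illustrate how a reader can tolerate a write that was cut short, the sketch below splits raw file content into complete records and drops a trailing fragment that has no line terminator. It works on an in-memory string for clarity; firmware would read the file line by line instead. The function name completeLines is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns only the records that end with '\n'. A final fragment without a
// terminator is treated as a torn write and ignored.
std::vector<std::string> completeLines(const std::string& content) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (true) {
        std::size_t nl = content.find('\n', start);
        if (nl == std::string::npos) break;  // remainder (if any) is a torn tail
        std::string line = content.substr(start, nl - start);
        // Arduino's println() terminates lines with "\r\n"; strip the '\r'
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) lines.push_back(line);
        start = nl + 1;
    }
    return lines;
}
```

A record that passes this check can still fail JSON parsing, so the drain would need to handle that case separately.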

5.5 Drain Strategy

Drain Flow

When connectivity is restored, queued records are replayed to MQTT:

drainAll() logic:
  1. List files in /telemetry/ sorted by name (oldest first)
  2. For each file:
     a. Move file to /drain/ directory (prevents double-read)
     b. Open file, read line by line
     c. For each line:
        - Parse JSON
        - Publish to MQTT telemetry topic
        - Throttle: max 10 records/second (avoid MQTT flood)
     d. After all lines published: delete file from /drain/
     e. If MQTT disconnects mid-drain: stop; the partially drained file stays in /drain/ and the remaining files stay in /telemetry/ until the next drain
  3. Log drain summary (records sent, time taken)
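
Step 1 can be sketched as a filter plus a sort. The function name drainOrder is hypothetical, and the input is assumed to be the bare file names from a directory listing of /telemetry/.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True if s ends with suffix.
bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Keeps only .jsonl buffer files and orders them oldest first.
// Lexicographic order equals chronological order for YYYY-MM-DD_NNN names.
std::vector<std::string> drainOrder(const std::vector<std::string>& names) {
    std::vector<std::string> out;
    for (const std::string& n : names) {
        if (endsWith(n, ".jsonl")) out.push_back(n);
    }
    std::sort(out.begin(), out.end());
    return out;
}
```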

Drain Throttling

  • Max drain rate: 10 records per second
  • Yield between publishes to allow main loop tasks (Modbus polling, MQTT keepalive)
  • Drain runs as a background task in the main loop, not a blocking operation
  • During drain, live telemetry continues to publish normally (live data takes priority)
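
One simple way to enforce the 10 records per second cap without blocking is a minimum-interval check against a millisecond clock. The class below is a sketch under that assumption; DrainThrottle and tryAcquire are hypothetical names, and nowMs stands in for the value of millis() on the device.

```cpp
#include <cassert>
#include <cstdint>

// Allows at most one record per minIntervalMs (100 ms = 10 records/second).
class DrainThrottle {
public:
    explicit DrainThrottle(uint32_t minIntervalMs = 100) : intervalMs_(minIntervalMs) {
        assert(minIntervalMs > 0);
    }

    // Returns true if a record may be published at time nowMs.
    // Unsigned subtraction keeps the comparison correct across counter wraparound.
    bool tryAcquire(uint32_t nowMs) {
        if (started_ && static_cast<uint32_t>(nowMs - lastMs_) < intervalMs_) {
            return false;  // too soon, let the main loop continue
        }
        started_ = true;
        lastMs_ = nowMs;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
    bool started_ = false;
};
```

Because tryAcquire() never waits, a caller that is refused simply returns to the main loop, so Modbus polling and MQTT keepalive are not delayed.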

Drain Integration in Main Loop

void loop() {
    mqtt.loop();
    transport.loop();

    // Normal telemetry polling
    if (pollTimer.elapsed()) {
        AcuvimData data;
        if (acuvim.readAll(data)) {
            if (transport.isConnected() && mqtt.isConnected()) {
                mqtt.publishTelemetry(data);
            } else if (sdManager.isAvailable()) {
                sdManager.bufferTelemetry(data);
            }
        }
    }

    // Background drain (non-blocking): at most 5 records per loop iteration.
    // drainBatch() is also expected to enforce the 10 records/second cap internally,
    // because loop() runs far more often than twice per second.
    if (transport.isConnected() && mqtt.isConnected() && sdManager.hasQueuedData()) {
        sdManager.drainBatch(mqtt, 5);
    }
}

5.6 Storage Management

Capacity Planning

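| Poll Interval | Record Size | Records/Hour | MB/Day | Days on 4GB |
|---------------|-------------|--------------|--------|-------------|
| 5 seconds     | ~250 bytes  | 720          | ~4.3   | ~900        |
| 10 seconds    | ~250 bytes  | 360          | ~2.2   | ~1,800      |
| 30 seconds    | ~250 bytes  | 120          | ~0.7   | ~5,500      |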

At 5-second polling, a 4GB SD card holds approximately 2.5 years of data. Storage is not a concern.
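
The capacity figures follow from simple arithmetic, reproduced here as a sketch. It assumes decimal units (1 MB = 1,000,000 bytes, 4 GB = 4,000 MB) and ignores filesystem overhead; the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Records produced per hour at a given poll interval.
uint32_t recordsPerHour(uint32_t pollIntervalSec) {
    assert(pollIntervalSec > 0);
    return 3600u / pollIntervalSec;
}

// Buffered volume per day in MB, assuming ~250 bytes per record.
double megabytesPerDay(uint32_t pollIntervalSec, uint32_t recordBytes = 250) {
    return recordsPerHour(pollIntervalSec) * 24.0 * recordBytes / 1e6;
}

// Days of continuous buffering that fit on a card of the given size.
double daysOnCard(double cardMegabytes, uint32_t pollIntervalSec) {
    return cardMegabytes / megabytesPerDay(pollIntervalSec);
}
```

For a 5-second interval this gives 720 records per hour, about 4.32 MB per day, and roughly 926 days on a 4,000 MB card, in line with the rounded table values.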

Cleanup Policy

  • Default: delete files older than 30 days
  • Configurable via NVS (sd_retention_days)
  • If SD card is >90% full: delete oldest files regardless of age
  • Cleanup runs once per hour
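
The policy above can be expressed as a selection pass over the buffer files. The sketch below is one interpretation: BufferFile and selectForCleanup are hypothetical names, file age is assumed to be known in whole days, and ">90% full" is read as strictly more than 90% of total bytes in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one buffer file.
struct BufferFile {
    std::string name;    // e.g. "2026-05-16_001.jsonl"
    uint32_t ageDays;    // days since the file's date
    uint64_t sizeBytes;  // size on the card
};

// Returns the names of files to delete, oldest first.
std::vector<std::string> selectForCleanup(std::vector<BufferFile> files,
                                          uint64_t usedBytes, uint64_t totalBytes,
                                          uint32_t retentionDays = 30) {
    assert(totalBytes > 0);
    std::sort(files.begin(), files.end(),
              [](const BufferFile& a, const BufferFile& b) { return a.name < b.name; });
    std::vector<std::string> doomed;
    for (const BufferFile& f : files) {
        const bool tooOld = f.ageDays > retentionDays;
        const bool overFull = usedBytes * 10 > totalBytes * 9;  // more than 90% used
        if (!tooOld && !overFull) continue;
        doomed.push_back(f.name);
        usedBytes -= std::min(usedBytes, f.sizeBytes);  // account for the freed space
    }
    return doomed;
}
```

Files past the retention period are always selected; younger files are selected only while the card remains above the 90% threshold.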

Error Handling

  • SD card removed: set mounted = false, log warning, data lost until reinserted
  • SD card full: delete oldest file, retry write
  • Corrupt file during drain: skip file, move to /errors/, continue with next
  • SD write failure: retry once, then skip and log
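
The "retry once, then skip and log" rule for write failures can be captured in a small wrapper. This is a sketch: writeWithRetry and its callback parameters are hypothetical, with writeFn standing in for a call such as bufferTelemetry() and onGiveUp for the logging step.

```cpp
#include <cassert>
#include <functional>

// Runs writeFn up to two times (the initial attempt plus one retry).
// Calls onGiveUp, if provided, when both attempts fail.
bool writeWithRetry(const std::function<bool()>& writeFn,
                    const std::function<void()>& onGiveUp) {
    assert(static_cast<bool>(writeFn));
    for (int attempt = 0; attempt < 2; ++attempt) {
        if (writeFn()) return true;
    }
    if (onGiveUp) onGiveUp();  // skip this record and log the failure
    return false;
}
```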

5.7 Hot-Plug Detection

Check SD card presence periodically (every 30 seconds):

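void SdManager::checkCard() {
    // Presence probe; a card-detect pin, if wired, is a more reliable signal
    if (mounted && !SD.exists("/")) {
        // Card was removed: release the driver so a later SD.begin() re-initializes it
        SD.end();
        mounted = false;
        Serial.println("SD card removed");
    } else if (!mounted) {
        // Try to remount
        if (SD.begin(SD_CS, hspi)) {
            mounted = true;
            ensureDirectories();
            Serial.println("SD card inserted");
        }
    }
}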
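
The mount logic in checkCard() is a two-state machine, and separating the transition from the SD calls makes it testable without hardware. In this sketch, CardState, CardEvent, and updateCardState are hypothetical names; probeOk stands for the presence check while mounted and for the remount attempt while unmounted.

```cpp
#include <cassert>

enum class CardEvent { None, Removed, Inserted };

struct CardState {
    bool mounted;
};

// Applies one periodic check result and reports the transition, if any.
CardEvent updateCardState(CardState& s, bool probeOk) {
    if (s.mounted && !probeOk) {
        s.mounted = false;
        return CardEvent::Removed;   // log once, stop buffering
    }
    if (!s.mounted && probeOk) {
        s.mounted = true;
        return CardEvent::Inserted;  // recreate directories, resume buffering
    }
    return CardEvent::None;          // no change, nothing to log
}
```

Reporting an event only on a transition keeps the log quiet while a card stays absent, which matches the expectation of no error logs for devices without an SD card.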

5.8 Devices Without SD Card

As with GSM (Phase 4), a missing SD card is handled gracefully:

  • On boot, attempt SD card initialization
  • If no SD card: set sd_available = false
  • No error logs for expected absence
  • Telemetry flow: transport only (WiFi/GSM), data lost if no transport
  • Web UI shows "No SD card" status

5.9 Web UI Updates

Status Page Addition

┌──────────────────────────────────────┐
│ Storage                              │
│ ● SD Card: 4.0 GB (0.1% used)       │
│   Queued: 0 records                  │
│   Files: 3                           │
│   Retention: 30 days                 │
│                                      │
│ [Drain Now]  [Cleanup]               │
└──────────────────────────────────────┘

API Endpoints

GET /api/sd/status
Response:
{
  "available": true,
  "total_bytes": 4294967296,
  "used_bytes": 4194304,
  "free_bytes": 4290772992,
  "queued_records": 0,
  "file_count": 3,
  "retention_days": 30
}
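
The storage line on the status page can be derived from total_bytes and used_bytes. The sketch below assumes the UI's "GB" is binary (1 GB = 1024^3 bytes), which is what makes 4294967296 bytes display as 4.0 GB; formatStorageLine is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats the storage summary, e.g. "SD Card: 4.0 GB (0.1% used)".
std::string formatStorageLine(uint64_t totalBytes, uint64_t usedBytes) {
    assert(totalBytes > 0 && usedBytes <= totalBytes);
    const double totalGiB = static_cast<double>(totalBytes) / (1024.0 * 1024.0 * 1024.0);
    const double usedPct =
        100.0 * static_cast<double>(usedBytes) / static_cast<double>(totalBytes);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "SD Card: %.1f GB (%.1f%% used)", totalGiB, usedPct);
    return std::string(buf);
}
```

With the example response values (4294967296 total, 4194304 used) this yields the same text as the status page mockup.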

POST /api/sd/drain
Response:
{
  "success": true,
  "message": "Drain started. 0 records queued."
}

POST /api/sd/cleanup
Response:
{
  "success": true,
  "deleted_files": 2,
  "freed_bytes": 1048576
}

5.10 Testing & Validation

| Test | Method | Pass Criteria |
|------|--------|---------------|
| SD card init | Insert formatted SD card | Mounted, directories created |
| Buffer write | Disable WiFi and GSM | Records written to .jsonl file |
| File format | Read .jsonl on PC | Valid JSON per line, parseable |
| File rotation | Write >500KB | New file created with incremented number |
| Date rotation | Cross midnight while buffering | New date-prefixed file |
| Drain on reconnect | Restore WiFi after buffering | Records published to MQTT in order |
| Drain throttle | Buffer 1000 records, reconnect | Drain rate <= 10/sec, main loop responsive |
| Live + drain | Reconnect during polling | Live telemetry published + drain in background |
| No SD card | Boot without SD card | Graceful, no errors, WiFi/GSM only |
| SD card removal | Remove SD during operation | Detected within 30s, no crash |
| SD card reinsert | Reinsert after removal | Remounted, buffering resumes |
| Cleanup | Create files > 30 days old | Old files deleted |
| SD full | Fill SD card | Oldest file deleted, write continues |
| Power loss during write | Kill power mid-write | At most last line lost, file intact |

5.11 Phase 5 Completion Criteria

  • SD card initializes and creates directory structure
  • Telemetry buffered to SD when no transport available
  • JSONL format verified (parseable per line)
  • File rotation works (size and date based)
  • Automatic drain on connectivity restore
  • Drain is non-blocking and throttled
  • Live telemetry continues during drain
  • Hot-plug detection (removal and reinsertion)
  • Graceful handling when no SD card present
  • Cleanup of old files works
  • Storage stats visible in web UI
  • SD status included in device status API

Previous Phase: Phase 4 - GSM & Transport Failover
Next Phase: Phase 6 - OTA Firmware Updates