Tau.Acuvim/docs/acuvim-spec-05.md
Renier Forster 84a0668c54 Initial commit: Tau Acuvim IoT monitoring system
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-16 19:05:32 +02:00


# Phase 5: SD Card Offline Buffering
## Objective
Add SD card storage as a last-resort data buffer when both WiFi and GSM are unavailable. Telemetry data is written to the SD card in a replay-friendly format and automatically drained to MQTT when connectivity is restored.
## Prerequisites
- Phase 4 complete (transport failover working)
- MicroSD card (FAT32 formatted, up to 32GB)
- SD card module or breakout (SPI interface)
## Deliverables
1. SPI-based SD card driver with hot-plug detection
2. Offline telemetry buffer (newline-delimited JSON files)
3. Automatic queue drain on connectivity restore
4. File rotation and cleanup (prevent SD from filling up)
5. SD card status in web UI and telemetry
---
## 5.1 SD Card Hardware
### SPI Pin Mapping (from Phase 1)
```
SD_CS = GPIO 15 # Chip Select
SD_MOSI = GPIO 2 # Master Out Slave In
SD_MISO = GPIO 14 # Master In Slave Out
SD_CLK = GPIO 12 # Clock
```
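**Important:** The SD card is driven by the HSPI controller through these pins. This bus is independent of the internal flash, which uses the ESP32's dedicated SPI0/SPI1 controllers (VSPI is the other general-purpose SPI bus, not the flash bus). Make sure no other peripheral is assigned to these pins. GPIO 2, 12, and 15 are also ESP32 strapping pins, so the levels the SD module presents on them at reset can affect boot.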
### SPI Initialization
```cpp
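#include <SPI.h>
#include <SD.h>

SPIClass hspi(HSPI);  // Dedicated SPI bus instance for the SD card

// SD_CLK, SD_MISO, SD_MOSI and SD_CS are the pin constants from Phase 1
bool initSdCard() {
    hspi.begin(SD_CLK, SD_MISO, SD_MOSI, SD_CS);  // Argument order: SCK, MISO, MOSI, SS
    return SD.begin(SD_CS, hspi);                 // false if no card is present or the mount fails
}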
```
## 5.2 SD Card Manager
### `sd_manager.h` / `sd_manager.cpp`
```cpp
class SdManager {
public:
    bool begin();                                   // Initialize SD card
    void checkCard();                               // Periodic hot-plug check (see 5.7)
    bool isAvailable();                             // Card present and mounted
    bool isHealthy();                               // Card mounted and writable
    // Buffering
    bool bufferTelemetry(const AcuvimData& data);   // Write one record
    uint32_t getQueuedCount();                      // Records waiting to send
    bool hasQueuedData();                           // Any records to drain
    // Drain (replay to MQTT)
    bool drainNext(String& payload);                // Get next record for sending
    void confirmDrained(const String& filename, uint32_t position);  // Record progress within a file being drained
    void drainBatch(MqttClient& mqtt, uint8_t maxRecords);  // Non-blocking: drain up to maxRecords per call (see 5.5)
    void drainAll(MqttClient& mqtt);                // Drain entire queue
    // Maintenance
    uint64_t getTotalSpace();                       // Total SD capacity
    uint64_t getUsedSpace();                        // Used space
    uint64_t getFreeSpace();                        // Available space
    uint32_t getFileCount();                        // Number of buffer files
    bool cleanup(uint32_t maxAgeDays = 30);         // Delete old files
    bool format();                                  // Format SD card (destructive)
private:
    bool mounted;
    String currentFileName;
    uint32_t recordsInCurrentFile;
    String generateFileName();
    bool ensureDirectories();                       // Create the directories from 5.3 if missing
    bool rotateFile();
    bool shouldRotate();
};
```
## 5.3 File Format and Structure
### Directory Structure
```
SD Card Root/
├── /telemetry/
│   ├── 2026-05-16_001.jsonl    # Newline-delimited JSON
│   ├── 2026-05-16_002.jsonl
│   └── 2026-05-17_001.jsonl
├── /drain/                     # Files currently being drained
│   └── (moved here during drain, deleted after)
├── /errors/                    # Corrupt files skipped during drain (see 5.6)
└── /logs/                      # Optional diagnostic logs
    └── boot.log
### File Naming Convention
- Format: `YYYY-MM-DD_NNN.jsonl`
- `NNN`: Sequential file number within the day (001, 002, ...)
- `.jsonl`: JSON Lines format (one JSON object per line)
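The naming convention can be sketched as a small formatting helper. This is a minimal, host-compilable illustration rather than the firmware's `generateFileName()`: the function name `makeBufferFileName`, its parameters, and the `/telemetry/` prefix baked into the result are assumptions made for the example.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "/telemetry/YYYY-MM-DD_NNN.jsonl".
// The date is assumed to come from a synchronized device clock.
std::string makeBufferFileName(int year, int month, int day, uint32_t seq) {
    char buf[64];
    // Zero-padded fields keep the names fixed-width.
    std::snprintf(buf, sizeof(buf), "/telemetry/%04d-%02d-%02d_%03u.jsonl",
                  year, month, day, static_cast<unsigned>(seq));
    return std::string(buf);
}
```

Because every field is zero-padded, sorting the names alphabetically also sorts them chronologically, as long as the sequence number stays within three digits.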
### Record Format
Each line is a complete, self-contained telemetry JSON (same format as MQTT payload from Phase 2):
```json
{"ts":1716000000,"dev":"ACV-AABBCCDDEEFF","v":{"a":230.1,"b":231.4,"c":229.8},"i":{"a":15.2,"b":14.8,"c":15.5},"p":{"total":10.5},"f":50.01,"e":{"imp_act":12345.6},"src":"sd"}
```
- The added `"src":"sd"` field marks the record as buffered data replayed from the SD card (vs. real-time)
- Each line is independently parseable (no array wrapping)
- File can be read line by line without loading entire file into memory
### File Rotation
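Rotate to a new file when any of these conditions is met:
- Current file exceeds 500KB (~2,000-3,000 records depending on content)
- Date changes (new day = new file)
- Current file reaches 5,000 records (hard limit; only reached before the size limit if records average under ~100 bytes)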
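The three rotation rules combine into a single predicate. The sketch below assumes the file's state is tracked in a small struct. `BufferFileState`, the constant names, and the reading of 500KB as 500 × 1024 bytes are illustrative assumptions, and the firmware's `shouldRotate()` takes no arguments.

```cpp
#include <cstdint>
#include <string>

// Hypothetical snapshot of the file currently being appended to.
struct BufferFileState {
    std::string date;      // "YYYY-MM-DD" taken from the file name
    uint32_t sizeBytes;    // current size on the card
    uint32_t recordCount;  // lines written so far
};

constexpr uint32_t kMaxFileBytes = 500u * 1024u;  // assumed meaning of "500KB"
constexpr uint32_t kMaxFileRecords = 5000u;       // hard record limit

// True when any rotation rule applies.
bool shouldRotate(const BufferFileState& f, const std::string& today) {
    return f.sizeBytes > kMaxFileBytes ||
           f.recordCount >= kMaxFileRecords ||
           f.date != today;
}
```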
## 5.4 Write Strategy
### Buffering Flow
```
Acuvim poll cycle:
├── Transport connected?
│   ├── YES: Publish to MQTT directly
│   │   └── Also check: hasQueuedData()?
│   │       └── YES: drain queued records in background
│   └── NO: Buffer to SD card
│       └── SD available?
│           ├── YES: Write to current .jsonl file
│           └── NO: Data is lost (log warning)
└── Continue polling
```
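The decision tree above reduces to a pure function, which makes the routing easy to unit-test away from the hardware. The enum and function names are hypothetical. The sketch treats "transport connected" as requiring both a transport link and an MQTT session, matching the check in the main loop in 5.5.

```cpp
// Hypothetical outcome of one poll cycle's routing decision.
enum class RecordRoute { PublishLive, BufferToSd, Dropped };

RecordRoute routeRecord(bool transportConnected, bool mqttConnected, bool sdAvailable) {
    if (transportConnected && mqttConnected) return RecordRoute::PublishLive;
    if (sdAvailable) return RecordRoute::BufferToSd;
    return RecordRoute::Dropped;  // no transport and no card: log a warning
}
```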
### Write Implementation
```cpp
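bool SdManager::bufferTelemetry(const AcuvimData& data) {
    if (!mounted) return false;
    // Start a new file when a rotation rule applies (see 5.3)
    if (shouldRotate() && !rotateFile()) return false;
    File file = SD.open(currentFileName, FILE_APPEND);
    if (!file) return false;
    // Serialize AcuvimData to compact JSON (same format as MQTT)
    StaticJsonDocument<512> doc;
    // ... populate doc from data ...
    doc["src"] = "sd";
    String line;
    serializeJson(doc, line);
    // println() returns the number of bytes written (record plus "\r\n")
    size_t written = file.println(line);
    file.close();  // Close immediately so the data is flushed to the card
    if (written < line.length()) return false;  // Short or failed write
    recordsInCurrentFile++;
    return true;
}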
```
**Design notes:**
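- Open the file, write, and close it immediately, so a crash or power failure costs at most the record being written
- Do NOT keep files open between writes (the SD card can be removed at any time)
- `FILE_APPEND` mode ensures data is added to the end of the file
- Writes are not strictly atomic: power loss mid-write can leave a truncated final line, so the drain must skip lines that fail to parse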
## 5.5 Drain Strategy
### Drain Flow
When connectivity is restored, queued records are replayed to MQTT:
```
drainAll() logic:
1. List files in /telemetry/ sorted by name (oldest first)
2. For each file:
   a. Move file to /drain/ directory (prevents double-read)
   b. Open file, read line by line
   c. For each line:
      - Parse JSON
      - Publish to MQTT telemetry topic
      - Throttle: max 10 records/second (avoid MQTT flood)
   d. After all lines published: delete file from /drain/
   e. If MQTT disconnects mid-drain: stop, keep the current file in /drain/ to resume later, leave remaining files in /telemetry/
3. Log drain summary (records sent, time taken)
```
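Step 1 works because the `YYYY-MM-DD_NNN.jsonl` names are zero-padded, so a plain string sort yields oldest-first order. The sketch below shows that ordering on a list of names already read from the directory. `drainOrder` and `hasJsonlSuffix` are hypothetical helpers, and the actual directory listing (via the SD library) is left out.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical filter: true for names ending in ".jsonl".
static bool hasJsonlSuffix(const std::string& name) {
    const std::string ext = ".jsonl";
    return name.size() > ext.size() &&
           name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
}

// Returns the buffer files in the order they should be drained.
std::vector<std::string> drainOrder(std::vector<std::string> names) {
    names.erase(std::remove_if(names.begin(), names.end(),
                               [](const std::string& n) { return !hasJsonlSuffix(n); }),
                names.end());
    std::sort(names.begin(), names.end());  // lexical order == chronological order
    return names;
}
```

The ordering only holds while the per-day sequence number fits in three digits; a file numbered 1000 would sort before 999.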
### Drain Throttling
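- Max drain rate: 10 records per second, enforced across loop iterations (a per-iteration batch limit alone does not cap the rate if the loop runs fast)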
- Yield between publishes to allow main loop tasks (Modbus polling, MQTT keepalive)
- Drain runs as a background task in the main loop, not a blocking operation
- During drain, live telemetry continues to publish normally (live data takes priority)
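One way to hold the 10 records/second ceiling without blocking is a minimum-interval gate checked before each publish. The class below is a sketch under that assumption. `DrainThrottle` is a hypothetical name, and `nowMs` stands for a millisecond tick such as the value returned by Arduino's `millis()`.

```cpp
#include <cstdint>

// Hypothetical rate gate: allows at most maxPerSecond publishes per second.
class DrainThrottle {
public:
    explicit DrainThrottle(uint32_t maxPerSecond)
        : intervalMs_(maxPerSecond ? 1000u / maxPerSecond : 1000u) {}

    // Returns true if a record may be published at time nowMs.
    bool tryAcquire(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the tick counter wraps.
        if (started_ && static_cast<uint32_t>(nowMs - lastMs_) < intervalMs_) {
            return false;
        }
        lastMs_ = nowMs;
        started_ = true;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
    bool started_ = false;
};
```

A caller that gets `false` simply returns to the main loop and tries again on a later iteration, so Modbus polling and MQTT keepalive are never delayed by the drain.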
### Drain Integration in Main Loop
```cpp
void loop() {
    mqtt.loop();
    transport.loop();
    // Normal telemetry polling
    if (pollTimer.elapsed()) {
        AcuvimData data;
        if (acuvim.readAll(data)) {
            if (transport.isConnected() && mqtt.isConnected()) {
                mqtt.publishTelemetry(data);
            } else if (sdManager.isAvailable()) {
                sdManager.bufferTelemetry(data);
            }
        }
    }
    // Background drain (non-blocking, processes a few records per loop)
    if (transport.isConnected() && mqtt.isConnected() && sdManager.hasQueuedData()) {
        sdManager.drainBatch(mqtt, 5);  // Drain up to 5 records per loop iteration
    }
}
```
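The batch semantics used in the loop (send at most N records per call, stop at the first failed publish so nothing is skipped) can be modelled in memory. This is not the SD-backed `drainBatch()`: `InMemoryDrainQueue` is a hypothetical stand-in that replaces the files with a `std::deque` and the MQTT client with a callback returning `true` on success.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Hypothetical in-memory model of the drain queue.
class InMemoryDrainQueue {
public:
    void push(const std::string& record) { pending_.push_back(record); }
    bool hasQueuedData() const { return !pending_.empty(); }

    // Publishes up to maxRecords, oldest first, and returns how many were sent.
    // A failed publish ends the batch; the record stays queued for a retry.
    std::size_t drainBatch(const std::function<bool(const std::string&)>& publish,
                           std::size_t maxRecords) {
        std::size_t sent = 0;
        while (sent < maxRecords && !pending_.empty()) {
            if (!publish(pending_.front())) break;
            pending_.pop_front();  // remove only after a confirmed publish
            ++sent;
        }
        return sent;
    }

private:
    std::deque<std::string> pending_;
};
```

Removing a record only after a successful publish gives at-least-once delivery: a record may be sent twice after a disconnect, but it is never dropped.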
## 5.6 Storage Management
### Capacity Planning
| Poll Interval | Record Size | Records/Hour | MB/Day | Days on 4GB |
|---------------|-------------|--------------|--------|-------------|
| 5 seconds | ~250 bytes | 720 | ~4.3 | ~900 |
| 10 seconds | ~250 bytes | 360 | ~2.2 | ~1,800 |
| 30 seconds | ~250 bytes | 120 | ~0.7 | ~5,500 |
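At 5-second polling, a 4GB SD card holds roughly 900 days (about 2.5 years) of data, and the default 30-day retention caps the buffer at about 130 MB (4.3 MB/day × 30). Storage capacity is not a concern.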
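The table values follow from simple arithmetic, reproduced here so other poll intervals or card sizes can be checked. `estimateCapacity` and `CapacityEstimate` are illustrative names. The sketch assumes decimal megabytes (1 MB = 10^6 bytes), a 4GB card treated as 4,000 MB, and a poll interval greater than zero.

```cpp
#include <cstdint>

// Hypothetical result type for the capacity estimate.
struct CapacityEstimate {
    uint32_t recordsPerHour;
    double mbPerDay;
    double daysOnCard;
};

// pollIntervalSec must be > 0; cardMb is the usable card size in MB.
CapacityEstimate estimateCapacity(uint32_t pollIntervalSec, uint32_t recordBytes, double cardMb) {
    CapacityEstimate e;
    e.recordsPerHour = 3600u / pollIntervalSec;
    e.mbPerDay = e.recordsPerHour * 24.0 * recordBytes / 1e6;
    e.daysOnCard = cardMb / e.mbPerDay;
    return e;
}
```

For a 5-second interval and 250-byte records this gives 720 records/hour, 4.32 MB/day, and about 926 days on a 4,000 MB card, which the table rounds to ~900.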
### Cleanup Policy
- Default: delete files older than 30 days
- Configurable via NVS (`sd_retention_days`)
- If SD card is >90% full: delete oldest files regardless of age
- Cleanup runs once per hour
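The two deletion rules (age-based and fullness-based) can be combined in one pass over the files, oldest first. The sketch below only selects which files to delete; the deletion itself is left out. `BufferFile`, `selectForCleanup`, and the idea of passing ages and sizes in as plain values are assumptions for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one buffer file on the card.
struct BufferFile {
    std::string name;
    uint32_t ageDays;
    uint64_t sizeBytes;
};

// filesOldestFirst must be sorted oldest first. Returns the names to delete.
std::vector<std::string> selectForCleanup(const std::vector<BufferFile>& filesOldestFirst,
                                          uint32_t retentionDays,
                                          uint64_t usedBytes, uint64_t totalBytes) {
    std::vector<std::string> doomed;
    for (const BufferFile& f : filesOldestFirst) {
        const bool expired = f.ageDays > retentionDays;
        const bool overFull = usedBytes * 10 > totalBytes * 9;  // more than 90% used
        if (!expired && !overFull) break;  // every later file is newer, so stop
        doomed.push_back(f.name);
        usedBytes -= std::min(usedBytes, f.sizeBytes);  // account for the freed space
    }
    return doomed;
}
```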
### Error Handling
- SD card removed: set `mounted = false`, log warning; records that cannot be published live are lost until the card is reinserted
- SD card full: delete oldest file, retry write
- Corrupt file during drain: skip file, move to `/errors/`, continue with next
- SD write failure: retry once, then skip and log
## 5.7 Hot-Plug Detection
Check SD card presence periodically (every 30 seconds):
```cpp
void SdManager::checkCard() {
    if (mounted && !SD.exists("/")) {
        // Card was removed
        mounted = false;
        SD.end();  // Release the driver so the next SD.begin() performs a real remount
        Serial.println("SD card removed");
    } else if (!mounted) {
        // Try to remount
        if (SD.begin(SD_CS, hspi)) {
            mounted = true;
            ensureDirectories();
            Serial.println("SD card inserted");
        }
    }
}
```
## 5.8 Devices Without SD Card
As with GSM (Phase 4), a missing SD card is handled gracefully:
- On boot, attempt SD card initialization
- If no SD card: set `sd_available = false`
- No error logs for expected absence
- Telemetry flow: transport only (WiFi/GSM), data lost if no transport
- Web UI shows "No SD card" status
## 5.9 Web UI Updates
### Status Page Addition
```
┌──────────────────────────────────────┐
│ Storage │
│ ● SD Card: 4.0 GB (0.1% used) │
│ Queued: 0 records │
│ Files: 3 │
│ Retention: 30 days │
│ │
│ [Drain Now] [Cleanup] │
└──────────────────────────────────────┘
```
### API Endpoints
```
GET /api/sd/status
Response:
{
  "available": true,
  "total_bytes": 4294967296,
  "used_bytes": 4194304,
  "free_bytes": 4290772992,
  "queued_records": 0,
  "file_count": 3,
  "retention_days": 30
}

POST /api/sd/drain
Response:
{
  "success": true,
  "message": "Drain started. 0 records queued."
}

POST /api/sd/cleanup
Response:
{
  "success": true,
  "deleted_files": 2,
  "freed_bytes": 1048576
}
```
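As an illustration of how the `/api/sd/status` body relates to the manager's counters, the sketch below formats the response from a plain struct and derives `free_bytes` as total minus used. `SdStatus` and `sdStatusJson` are hypothetical names, and the dependency-free `snprintf` formatting is chosen only to keep the example self-contained. The firmware may build the response differently, for example with the JSON library it already uses for telemetry.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical snapshot of the values exposed by the status endpoint.
struct SdStatus {
    bool available;
    uint64_t totalBytes;
    uint64_t usedBytes;
    uint32_t queuedRecords;
    uint32_t fileCount;
    uint32_t retentionDays;
};

// Formats the status as compact JSON with the field names from the endpoint above.
std::string sdStatusJson(const SdStatus& s) {
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "{\"available\":%s,\"total_bytes\":%" PRIu64 ",\"used_bytes\":%" PRIu64
                  ",\"free_bytes\":%" PRIu64 ",\"queued_records\":%" PRIu32
                  ",\"file_count\":%" PRIu32 ",\"retention_days\":%" PRIu32 "}",
                  s.available ? "true" : "false", s.totalBytes, s.usedBytes,
                  s.totalBytes - s.usedBytes, s.queuedRecords, s.fileCount, s.retentionDays);
    return std::string(buf);
}
```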
## 5.10 Testing & Validation
| Test | Method | Pass Criteria |
|------|--------|---------------|
| SD card init | Insert formatted SD card | Mounted, directories created |
| Buffer write | Disable WiFi and GSM | Records written to .jsonl file |
| File format | Read .jsonl on PC | Valid JSON per line, parseable |
| File rotation | Write >500KB | New file created with incremented number |
| Date rotation | Cross midnight while buffering | New date-prefixed file |
| Drain on reconnect | Restore WiFi after buffering | Records published to MQTT in order |
| Drain throttle | Buffer 1000 records, reconnect | Drain rate <= 10/sec, main loop responsive |
| Live + drain | Reconnect during polling | Live telemetry published + drain in background |
| No SD card | Boot without SD card | Graceful, no errors, WiFi/GSM only |
| SD card removal | Remove SD during operation | Detected within 30s, no crash |
| SD card reinsert | Reinsert after removal | Remounted, buffering resumes |
| Cleanup | Create files > 30 days old | Old files deleted |
| SD full | Fill SD card | Oldest file deleted, write continues |
| Power loss during write | Kill power mid-write | At most last line lost, file intact |
## 5.11 Phase 5 Completion Criteria
- [ ] SD card initializes and creates directory structure
- [ ] Telemetry buffered to SD when no transport available
- [ ] JSONL format verified (parseable per line)
- [ ] File rotation works (size and date based)
- [ ] Automatic drain on connectivity restore
- [ ] Drain is non-blocking and throttled
- [ ] Live telemetry continues during drain
- [ ] Hot-plug detection (removal and reinsertion)
- [ ] Graceful handling when no SD card present
- [ ] Cleanup of old files works
- [ ] Storage stats visible in web UI
- [ ] SD status included in device status API
---
**Previous Phase:** [Phase 4 — GSM & Transport Failover](acuvim-spec-04.md)
**Next Phase:** [Phase 6 — OTA Firmware Updates](acuvim-spec-06.md)