# Phase 10: Integration Testing, Security & Production Hardening

## Objective

Perform end-to-end integration testing across all components, harden security for production deployment, optimize performance, and prepare the system for field deployment. This phase validates that all previous phases work together as a complete system.

## Prerequisites

- Phases 1-9 complete
- Test environment with physical hardware (ESP32 + Acuvim II)
- Staging MQTT broker and console deployment
- Multiple test devices (with and without GSM, SD card)

## Deliverables

1. End-to-end integration test suite
2. Security hardening (TLS, credentials, firmware signing)
3. Performance optimization (memory, bandwidth, latency)
4. Field deployment checklist
5. Monitoring and alerting for production
6. Documentation (deployment guide, troubleshooting)

---

## 10.1 End-to-End Integration Tests

### Test Scenarios

#### Scenario 1: Full Device Lifecycle

```
1. Factory-fresh ESP32 boots (no config)
2. AP mode activates, captive portal accessible
3. Connect phone, configure WiFi + MQTT via web UI
4. Device connects to WiFi, registers with console
5. Console shows device as "online"
6. Telemetry streams to MQTT, visible in console dashboard
7. Heartbeat appears at configured interval
8. Push config change from console (e.g., change poll interval)
9. Device applies new config, confirms via response
10. Trigger OTA update from console
11. Device downloads, installs, reboots
12. Device comes back online with new firmware version
13. Console reflects updated version
```

#### Scenario 2: Transport Failover

```
1. Device operating normally on WiFi
2. WiFi router powered off
3. Device detects WiFi loss within 30 seconds
4. Device switches to GSM, MQTT reconnects
5. Console sees transport change in heartbeat (wifi -> gsm)
6. Telemetry continues via GSM
7. WiFi router powered on
8. Device detects WiFi available, switches back
9. Console sees transport change (gsm -> wifi)
10. Verify no data loss during transitions
```

#### Scenario 3: Complete Connectivity Loss + SD Buffering

```
1. Device operating normally on WiFi
2. Disable WiFi and GSM simultaneously
3. Device detects no transport
4. Telemetry written to SD card
5. Verify SD files contain valid JSONL
6. Re-enable WiFi after 10 minutes
7. Device connects, MQTT connects
8. SD queue drains automatically
9. Console receives buffered records (with "src":"sd" marker)
10. Verify chronological order preserved
11. Verify live telemetry resumes alongside drain
```
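
Steps 5 and 10 of Scenario 3 can be checked with a small script run against a copy of the SD files. The following is only a sketch: the `ts` key (epoch seconds) and the `v` field are hypothetical names chosen for illustration, and only the `"src":"sd"` marker comes from this spec.

```python
import json

def check_buffered_records(lines, ts_key="ts"):
    """Parse JSONL lines and report whether timestamps are non-decreasing.

    Returns (records, in_order). Raises ValueError on a malformed line.
    The timestamp key name is an assumption, not taken from the firmware.
    """
    records = []
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {n} is not valid JSON: {exc}") from exc
    stamps = [r[ts_key] for r in records]
    in_order = all(a <= b for a, b in zip(stamps, stamps[1:]))
    return records, in_order

# Hypothetical sample of two buffered records
sample = [
    '{"ts": 1700000000, "src": "sd", "v": 229.8}',
    '{"ts": 1700000005, "src": "sd", "v": 230.1}',
]
records, in_order = check_buffered_records(sample)
```

In practice the same function could be applied to the lines of each rotated file in turn, using whatever timestamp field the firmware actually writes.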

#### Scenario 4: Crash Recovery

```
1. Device operating normally
2. Simulate crash (panic or watchdog timeout in test build)
3. Device reboots automatically
4. Watchdog reset reason logged
5. Device reconnects, registers
6. Boot count incremented
7. Console shows reset event
8. Repeat 5 times -> factory reset triggered
9. Device enters AP mode for reconfiguration
```
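
The crash-loop rule in step 8 can be modeled as a small counter that survives reboots. This is a sketch of one plausible policy, not the firmware's implementation: the function name is hypothetical, and resetting the counter after a clean boot is an assumption the spec does not state.

```python
CRASH_LOOP_THRESHOLD = 5  # from step 8: five consecutive crashes trigger a factory reset

def next_boot_action(crash_count, reset_was_crash):
    """Return (new_crash_count, action) for a single boot.

    crash_count is the value persisted from the previous boot (e.g., in NVS).
    Assumption: a boot that was not caused by a crash clears the counter.
    """
    if not reset_was_crash:
        return 0, "normal"
    crash_count += 1
    if crash_count >= CRASH_LOOP_THRESHOLD:
        return 0, "factory_reset"
    return crash_count, "normal"

# Simulate five crash-induced reboots in a row
count = 0
actions = []
for _ in range(5):
    count, action = next_boot_action(count, reset_was_crash=True)
    actions.append(action)
```

A model like this is mainly useful for writing down the expected behavior before running the scenario on hardware.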

#### Scenario 5: OTA Rollback

```
1. Upload intentionally broken firmware to console
2. Deploy to test device
3. Device downloads and installs
4. Device reboots into new firmware
5. Self-test fails (e.g., can't connect to MQTT)
6. Device rolls back to previous firmware
7. Device comes back online with old version
8. Console shows rollback event
```

#### Scenario 6: WiFi-Only Device (No GSM Module)

```
1. Boot device without SIM800L/SIM7600 module
2. GSM auto-detected as unavailable
3. Transport priority forced to WiFi-only
4. Web UI hides GSM configuration
5. All features work normally via WiFi
6. OTA works via WiFi
7. If WiFi drops, data buffers to SD
```

#### Scenario 7: Console Manages Multiple Devices

```
1. Register 5+ devices with console
2. Dashboard shows all devices
3. Group devices (e.g., "Building A", "Building B")
4. Deploy firmware to a group
5. All devices in group update
6. Send config command to group
7. All devices apply config
8. One device goes offline -> console detects and alerts
```

### Automated Test Framework

For the ESP32 firmware, testing is primarily hardware-in-the-loop:

```
Test Categories:
├── Unit tests (PlatformIO native test runner)
│   ├── JSON serialization/deserialization
│   ├── Version comparison logic
│   ├── Config validation
│   └── Register parsing (Float32 from Modbus)
│
├── Integration tests (requires hardware)
│   ├── Modbus read from Acuvim II
│   ├── WiFi connect/disconnect
│   ├── MQTT publish/subscribe
│   ├── GSM connect (if available)
│   ├── SD card write/read
│   └── NVS save/load
│
└── System tests (full system, manual + scripted)
    ├── Scenarios 1-7 above
    ├── 72-hour stability test
    └── Power cycle resilience
```
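
The "Register parsing (Float32 from Modbus)" unit test can be prototyped off-target before porting the expected values to the firmware test runner. The sketch below assumes the high-order word arrives in the first register (big-endian word order); that ordering is an assumption here and should be confirmed against the meter's register map. The function name is hypothetical.

```python
import struct

def registers_to_float32(high_word, low_word):
    """Combine two 16-bit Modbus registers into an IEEE 754 single-precision float.

    Assumes big-endian byte order and high word first.
    """
    raw = struct.pack(">HH", high_word, low_word)
    return struct.unpack(">f", raw)[0]

# 230.0 is encoded as 0x43660000 in IEEE 754 single precision
voltage = registers_to_float32(0x4366, 0x0000)
```

If the meter turns out to send the low word first, swapping the two arguments at the call site is enough to adapt the check.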

For the console application:

```
Test Categories:
├── Unit tests (xUnit)
│   ├── Service logic
│   ├── Version comparison
│   └── Data transformations
│
├── Integration tests
│   ├── Database operations
│   ├── MQTT message handling
│   ├── API endpoints
│   └── SignalR events
│
└── E2E tests (Playwright)
    ├── Login flow
    ├── Device list and detail
    ├── Config push
    ├── Firmware deployment
    └── Alert management
```
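
Version comparison appears in both test trees, and the usual pitfall is comparing version strings lexically (where "1.10.0" sorts before "1.9.3"). A minimal sketch of a numeric comparison follows, assuming dotted numeric versions such as `1.9.3` with an optional leading `v`; the version format and function names are assumptions, not taken from the firmware or console code.

```python
def parse_version(text):
    """Turn '1.10.0' or 'v1.10.0' into a tuple of ints, e.g. (1, 10, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))

def is_newer(candidate, current):
    """True when candidate is strictly newer than current (numeric comparison)."""
    return parse_version(candidate) > parse_version(current)

needs_update = is_newer("1.10.0", "1.9.3")
```

The same cases (equal versions, multi-digit components, prefixed versions) make reasonable inputs for the equivalent firmware and xUnit tests.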

## 10.2 Security Hardening

### MQTT Security

```
1. Enable TLS on MQTT broker (port 8883)
   - Generate server certificate (Let's Encrypt or self-signed for internal)
   - Configure Mosquitto with TLS listener

2. MQTT Authentication
   - Unique username/password per device
   - Console has its own MQTT credentials
   - Disable anonymous access

3. MQTT Authorization (ACL)
   - Device ACV-001 can only publish/subscribe to:
       acuvim/ACV-001/#
       devices/register
   - Console can publish/subscribe to all topics

4. ESP32 TLS Configuration
   - Embed CA certificate in firmware (for server verification)
   - WiFiClientSecure for WiFi TLS
   - Note: SIM800L has limited TLS support (TLS 1.0/1.1 only)
     SIM7600 supports TLS 1.2
```
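
The ACL rule in item 3 can be expressed as a topic-filter check, which is handy for testing the intended policy independently of the broker. This sketch implements standard MQTT filter matching (`+` for one level, `#` for the remainder) and is not Mosquitto's ACL engine; the function names are hypothetical.

```python
def topic_matches(pattern, topic):
    """MQTT topic filter matching: '+' matches one level, '#' matches the rest."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            return True  # multi-level wildcard also matches the parent level itself
        if i >= len(t_levels):
            return False  # topic is shorter than the filter
        if p != "+" and p != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)

def device_allowed(device_id, topic):
    """Apply the per-device ACL from item 3 above."""
    allowed = [f"acuvim/{device_id}/#", "devices/register"]
    return any(topic_matches(f, topic) for f in allowed)

own_topic_ok = device_allowed("ACV-001", "acuvim/ACV-001/telemetry")
other_topic_ok = device_allowed("ACV-001", "acuvim/ACV-002/cmd")
```

The broker-side ACL remains the real enforcement point; a check like this only documents which topics each device is expected to reach.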

### Console Application Security

```
1. HTTPS everywhere (Let's Encrypt)
2. JWT secret: minimum 256-bit, stored in environment variable
3. Password hashing: bcrypt with cost factor 12
4. Rate limiting on auth endpoints (10 attempts per minute)
5. CORS: restrict to console domain only
6. Input validation on all API endpoints
7. SQL injection protection (EF Core parameterized queries)
8. XSS prevention (React's default escaping + CSP headers)
9. File upload validation (firmware binaries only, size limit)
10. API key rotation capability
```
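
To make the "10 attempts per minute" rule in item 4 concrete, here is a sliding-window limiter sketch. It only illustrates the policy; the console backend would normally rely on its framework's rate-limiting middleware, and the class name and the per-client key are assumptions.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_attempts per client within any window_seconds span."""

    def __init__(self, max_attempts=10, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # client key -> timestamps of accepted attempts

    def allow(self, client_key, now):
        attempts = self._attempts[client_key]
        # Drop attempts that have aged out of the window
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True

limiter = SlidingWindowLimiter()
# Eleven login attempts from one client, one per second
results = [limiter.allow("203.0.113.5", float(t)) for t in range(11)]
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would read a monotonic clock instead.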

### Firmware Security

```
1. Secure Boot (optional, production)
   - Enable ESP32 secure boot v2
   - Sign firmware with RSA-3072 key
   - Burn eFuse to prevent unsigned firmware

2. Flash Encryption (optional, production)
   - Encrypt flash contents
   - Prevent firmware extraction via JTAG/UART

3. OTA Security
   - HTTPS firmware download
   - SHA256 checksum verification
   - Version downgrade prevention (optional)

4. Credentials Storage
   - WiFi passwords in NVS (encrypted NVS available)
   - MQTT passwords in NVS
   - Never expose passwords in API responses or logs
```
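
The SHA256 check under "OTA Security" amounts to hashing the downloaded image and comparing it with the digest published alongside the release. A minimal sketch follows, assuming the expected digest is delivered as a hex string; the function name and the sample bytes are hypothetical.

```python
import hashlib
import hmac

def firmware_checksum_ok(image_bytes, expected_hex):
    """Return True when the SHA-256 of the image matches the expected hex digest."""
    actual = hashlib.sha256(image_bytes).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(actual, expected_hex.strip().lower())

image = b"example-firmware-image"            # stand-in for the downloaded binary
published = hashlib.sha256(image).hexdigest()  # digest the console would publish
intact = firmware_checksum_ok(image, published)
tampered = firmware_checksum_ok(image + b"\x00", published)
```

A checksum only detects corruption or substitution if the digest itself arrives over a trusted channel, which is why it is paired with HTTPS download in the list above.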

### Network Security Checklist

| Item | Status | Notes |
|------|--------|-------|
| MQTT TLS enabled | | Port 8883, valid certificate |
| MQTT auth enabled | | Per-device credentials |
| MQTT ACL configured | | Topic-level access control |
| Console HTTPS | | Let's Encrypt or equivalent |
| JWT tokens | | Secure secret, expiry, refresh |
| CORS configured | | Console domain only |
| Rate limiting | | Auth endpoints |
| Input validation | | All API endpoints |
| Firmware checksums | | SHA256 on every OTA |
| Secure boot | | Optional for production |
| NVS encryption | | Optional for production |

## 10.3 Performance Optimization

### ESP32 Memory

```
Optimization targets:
- Free heap should stay above 50KB during normal operation
- Total firmware binary should be under 1.5MB (fit in OTA partition)
- Stack sizes: main task 8KB, MQTT task 4KB

Techniques:
- Use StaticJsonDocument where possible (stack allocation)
- Limit MQTT payload to 1KB max
- Reuse buffers (don't allocate per message)
- Use PSRAM for large buffers (SD card read buffer)
- Monitor with esp_get_free_heap_size() in heartbeat
- Detect heap corruption with CONFIG_HEAP_POISONING_COMPREHENSIVE in debug builds
```

### MQTT Optimization

```
- QoS 0 for telemetry (acceptable loss)
- QoS 1 for commands and heartbeat (at-least-once delivery)
- Keep-alive interval: 60 seconds
- Clean session: false (broker retains subscriptions)
- Compact JSON keys (saves ~40% payload size)
- Consider MessagePack for high-frequency telemetry (future)
```
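
The effect of compact JSON keys is easy to measure on a representative record. The key names and values below are hypothetical, so the saving this sketch computes will differ from the ~40% quoted above, which depends on the real telemetry schema.

```python
import json

# Hypothetical telemetry record with descriptive keys, and the same data with short keys
verbose = {"voltage_l1": 230.1, "current_l1": 12.4, "active_power": 2853.2, "frequency": 50.0}
compact = {"v1": 230.1, "i1": 12.4, "p": 2853.2, "f": 50.0}

def payload_size(record):
    """Size in bytes of the record serialized without whitespace."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))

# Fraction of bytes saved by switching to compact keys
saving = 1 - payload_size(compact) / payload_size(verbose)
```

Running the same measurement on an actual telemetry message gives a quick sanity check on the payload budget before considering a binary format.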

### Console Database Optimization

```
- Index on (device_id, timestamp) for telemetry queries
- Partition telemetry table by month (for scale)
- Data retention policy: auto-delete telemetry older than N days
- Connection pooling: max 20 connections
- Consider TimescaleDB extension for time-series queries
- Cache device status in memory (invalidate on heartbeat)
- Aggregate older telemetry (5-min averages after 7 days, hourly after 30 days)
```
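
The aggregation rule in the last item is a bucketed average. The sketch below shows the arithmetic on in-memory samples; in the console this would more likely be a scheduled SQL job, and the function name and sample layout are assumptions.

```python
from collections import defaultdict

def downsample(samples, bucket_seconds=300):
    """Average (epoch_seconds, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    The default of 300 seconds corresponds to the 5-minute averages above.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# Three samples: two fall in the first 5-minute bucket, one in the second
averaged = downsample([(0, 10.0), (150, 20.0), (300, 30.0)])
```

Passing `bucket_seconds=3600` gives the hourly averages used for data older than 30 days.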

### GSM Data Optimization

```
- Longer poll interval on GSM (30s vs 5s on WiFi)
- Batch telemetry (6 records per publish)
- Skip non-essential fields on GSM (e.g., THD)
- Monitor monthly data usage per device
- Alert if device exceeds data budget
```
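
A rough data budget follows directly from the poll interval and batch size above. In this sketch the record size (150 bytes) and per-publish overhead (60 bytes) are hypothetical placeholders; TCP/TLS overhead, keep-alives, and reconnects are not modeled, so real usage will be higher.

```python
def monthly_bytes(poll_interval_s, records_per_publish, record_bytes,
                  overhead_bytes=0, days=30):
    """Estimate application-level bytes sent per month for telemetry only."""
    records_per_day = 86_400 // poll_interval_s
    publishes_per_day = records_per_day / records_per_publish
    per_publish = records_per_publish * record_bytes + overhead_bytes
    return int(publishes_per_day * per_publish * days)

# 30 s polling, 6 records per publish (6 x 150 B stays under the 1 KB payload limit)
batched = monthly_bytes(30, 6, 150, overhead_bytes=60)
# Same polling rate with one record per publish, for comparison
unbatched = monthly_bytes(30, 1, 150, overhead_bytes=60)
```

With these placeholder sizes, batching lowers the estimate because the fixed per-publish overhead is paid once per six records instead of once per record.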

## 10.4 Monitoring and Observability

### Console Application Monitoring

```
Metrics to track:
- API response times (P50, P95, P99)
- MQTT message throughput (messages/second)
- Database query times
- Active WebSocket connections
- Error rates by endpoint
- Device registration rate
- OTA deployment success rate

Tools:
- Health check endpoint: GET /health
- Prometheus metrics: GET /metrics (optional)
- Structured logging with Serilog (JSON to file or ELK)
- Application Insights or Grafana dashboards
```

### Device Fleet Monitoring (via Console)

```
Dashboard metrics:
- Total devices: online / degraded / offline
- Average telemetry latency (device timestamp vs. received timestamp)
- MQTT broker connection status
- Failed OTA count
- Active alerts by severity
- Data throughput (messages per minute)
- SD card buffer status across fleet
```

## 10.5 Production Deployment Checklist

### Pre-Deployment

```
Infrastructure:
[ ] Server/VPS provisioned (minimum: 2 CPU, 4GB RAM, 50GB SSD)
[ ] Domain name configured with DNS
[ ] SSL/TLS certificates obtained
[ ] PostgreSQL installed and secured
[ ] Mosquitto installed and configured (TLS, auth, ACL)
[ ] Docker and Docker Compose installed
[ ] Firewall configured (ports: 443, 8883)
[ ] Backup strategy for database
[ ] Monitoring/alerting configured

Console Application:
[ ] appsettings.Production.json configured
[ ] JWT secret set as environment variable
[ ] Database migrations applied
[ ] Admin user created
[ ] MQTT credentials configured
[ ] Firmware storage directory created
[ ] CORS origins set to production domain
[ ] Logging configured (file rotation, retention)

Device Firmware:
[ ] Production MQTT broker address set
[ ] TLS CA certificate embedded
[ ] Console URL configured
[ ] OTA endpoint configured
[ ] Firmware version set correctly
[ ] Debug logging disabled
[ ] Watchdog enabled
[ ] AP password changed from default
[ ] Tested with production broker
```

### First Device Deployment

```
Physical Setup:
[ ] ESP32 mounted and powered (USB or battery)
[ ] RS485 wired to Acuvim II (A, B, GND)
[ ] SIM card inserted (if GSM required)
[ ] SD card inserted (if offline buffer required)
[ ] Antenna connected (GSM antenna if applicable)

Commissioning:
[ ] Power on device
[ ] Connect phone to AP (Acuvim-XXXXXX)
[ ] Open captive portal
[ ] Configure WiFi
[ ] Configure MQTT (broker, port, credentials)
[ ] Set console URL
[ ] Verify device appears in console
[ ] Verify telemetry flowing
[ ] Verify heartbeat received
[ ] Test remote command (ping)
[ ] Test OTA update
[ ] Document installation (photos, device ID, location)
```

## 10.6 Troubleshooting Guide

### Common Issues

| Symptom | Likely Cause | Resolution |
|---------|-------------|------------|
| No AP visible | Firmware not loaded | Re-flash via USB |
| AP visible, no captive portal | DNS redirect not working | Navigate manually to 192.168.4.1 |
| WiFi won't connect | Wrong password or channel issue | Re-enter credentials, try manual SSID entry |
| MQTT not connecting | Wrong broker/port/credentials | Check with MQTT test button |
| No Modbus data | Wiring, wrong address, wrong baud | Check A/B/GND wiring, verify slave address |
| GSM not connecting | No SIM, wrong APN, no signal | Check SIM inserted, APN settings, antenna |
| OTA fails | Download timeout, wrong URL | Check console URL, try WiFi instead of GSM |
| Device offline | Power loss, crash loop, network issue | Check physical power, serial monitor |
| Data gaps | Transport was down, SD was full | Check SD queue, verify drain completed |
| High memory usage | Memory leak in firmware | Check min_free_heap trend in heartbeats |

### Diagnostic Commands (via MQTT)

Each line below is a separate command payload. The `//` annotations are explanatory only and must be omitted when publishing, since JSON does not allow comments.

```json
{"cmd": "ping"}                                         // Connectivity test
{"cmd": "get_status"}                                   // Full device status
{"cmd": "get_config"}                                   // Current configuration
{"cmd": "restart"}                                      // Restart device
{"cmd": "factory_reset", "params": {"confirm": true}}   // Last resort
```

### Serial Monitor

For local debugging, connect USB and monitor at 115200 baud:

```
pio device monitor --baud 115200
```

Firmware logs boot sequence, connection events, Modbus results, and errors.

## 10.7 72-Hour Stability Test

### Test Protocol

```
1. Deploy firmware to 3+ devices
2. Configure all transports (WiFi, GSM, SD)
3. Run for 72 hours continuously
4. Monitor:
   a. Memory trend (min_free_heap in heartbeats, should be flat)
   b. Telemetry gaps (query for missing timestamps)
   c. Error rate (Modbus errors, MQTT disconnects)
   d. Boot count (should be 1 unless intentional restart)
   e. Transport switches (should be stable if network is stable)
5. Induce failures during test:
   a. Kill WiFi for 1 hour (verify GSM failover + SD buffer)
   b. Kill MQTT broker for 30 minutes (verify SD buffer + drain)
   c. Power cycle one device (verify clean recovery)
6. Pass criteria:
   - No memory leak (min_free_heap stable within ±5%)
   - No unplanned reboots
   - Zero data loss (all records accounted for)
   - All induced failures recovered automatically
   - Console shows accurate device status throughout
```
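
Two of the checks in this protocol, telemetry gaps (4b) and heap stability (the ±5% pass criterion), reduce to simple arithmetic over exported data. The sketch below assumes timestamps in epoch seconds and takes the first heartbeat as the heap baseline; both choices, the one-second tolerance, and the function names are assumptions.

```python
def find_gaps(timestamps, interval_s, tolerance_s=1):
    """Return (previous, next) timestamp pairs spaced further apart than expected."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b - a > interval_s + tolerance_s]

def heap_is_stable(min_free_heap_samples, tolerance=0.05):
    """True when every sample stays within +/- tolerance of the first sample."""
    baseline = min_free_heap_samples[0]
    return all(abs(s - baseline) <= tolerance * baseline
               for s in min_free_heap_samples)

# Records expected every 5 s; the jump from 10 to 25 is a gap
gaps = find_gaps([0, 5, 10, 25, 30], interval_s=5)
# Heartbeat min_free_heap values in bytes (hypothetical)
stable = heap_is_stable([100_000, 99_000, 101_500])
leaking = heap_is_stable([100_000, 96_000, 90_000])
```

Gaps that line up with the induced failures in step 5 should later be filled by the SD drain, so this check is most informative after the drain has completed.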

## 10.8 Documentation Deliverables

```
docs/
├── acuvim-spec-01.md through acuvim-spec-10.md (these specs)
├── deployment-guide.md       # Server setup, Docker, SSL
├── device-commissioning.md   # Step-by-step field installation
├── modbus-register-map.md    # Complete Acuvim II register reference
├── mqtt-protocol.md          # Topic structure, payload formats, commands
├── troubleshooting.md        # Common issues and resolutions
├── api-reference.md          # REST API documentation (or Swagger)
└── hardware-wiring.md        # ESP32 + RS485 + SIM800/SIM7600 wiring diagrams
```

## 10.9 Phase 10 Completion Criteria

- [ ] All 7 integration test scenarios pass
- [ ] MQTT TLS enabled and tested
- [ ] MQTT per-device authentication working
- [ ] Console HTTPS enabled
- [ ] JWT authentication tested end-to-end
- [ ] Memory stability verified (72-hour test)
- [ ] No data loss during transport failover
- [ ] SD card buffering and drain verified in production
- [ ] OTA update and rollback verified in production
- [ ] Crash loop recovery working
- [ ] Console monitoring dashboard accurate
- [ ] Deployment documentation complete
- [ ] Troubleshooting guide complete
- [ ] 3+ devices deployed and stable for 72 hours

---

**Previous Phase:** [Phase 9: Console Application Frontend](acuvim-spec-09.md)

## Project Summary

| Phase | Focus | Estimated Effort |
|-------|-------|-----------------|
| 1 | Project Setup + Modbus RS485 | 1-2 weeks |
| 2 | WiFi + MQTT + NVS Config | 1-2 weeks |
| 3 | Captive Portal Web Interface | 1-2 weeks |
| 4 | GSM Integration + Failover | 1-2 weeks |
| 5 | SD Card Offline Buffering | 1 week |
| 6 | OTA Firmware Updates | 1-2 weeks |
| 7 | Heartbeat + Health + Registration | 1 week |
| 8 | Console Backend (.NET) | 2-3 weeks |
| 9 | Console Frontend (React) | 2-3 weeks |
| 10 | Integration + Production | 1-2 weeks |
| | **Total** | **12-20 weeks** |

Effort estimates assume a single developer working full-time. Phases 1-7 (firmware) can be developed in parallel with Phases 8-9 (console) if two developers are available.