# Phase 10: Integration Testing, Security & Production Hardening

## Objective

Perform end-to-end integration testing across all components, harden security for production deployment, optimize performance, and prepare the system for field deployment. This phase validates that all previous phases work together as a complete system.

## Prerequisites

- Phases 1-9 complete
- Test environment with physical hardware (ESP32 + Acuvim II)
- Staging MQTT broker and console deployment
- Multiple test devices (with and without GSM, SD card)

## Deliverables

1. End-to-end integration test suite
2. Security hardening (TLS, credentials, firmware signing)
3. Performance optimization (memory, bandwidth, latency)
4. Field deployment checklist
5. Monitoring and alerting for production
6. Documentation (deployment guide, troubleshooting)

---

## 10.1 End-to-End Integration Tests

### Test Scenarios

#### Scenario 1: Full Device Lifecycle

```
1. Factory-fresh ESP32 boots (no config)
2. AP mode activates, captive portal accessible
3. Connect phone, configure WiFi + MQTT via web UI
4. Device connects to WiFi, registers with console
5. Console shows device as "online"
6. Telemetry streams to MQTT, visible in console dashboard
7. Heartbeat appears at configured interval
8. Push config change from console (e.g., change poll interval)
9. Device applies new config, confirms via response
10. Trigger OTA update from console
11. Device downloads, installs, reboots
12. Device comes back online with new firmware version
13. Console reflects updated version
```

#### Scenario 2: Transport Failover

```
1. Device operating normally on WiFi
2. WiFi router powered off
3. Device detects WiFi loss within 30 seconds
4. Device switches to GSM, MQTT reconnects
5. Console sees transport change in heartbeat (wifi -> gsm)
6. Telemetry continues via GSM
7. WiFi router powered on
8. Device detects WiFi available, switches back
9. Console sees transport change (gsm -> wifi)
10. Verify no data loss during transitions
```

#### Scenario 3: Complete Connectivity Loss + SD Buffering

```
1. Device operating normally on WiFi
2. Disable WiFi and GSM simultaneously
3. Device detects no transport
4. Telemetry written to SD card
5. Verify SD files contain valid JSONL
6. Re-enable WiFi after 10 minutes
7. Device connects, MQTT connects
8. SD queue drains automatically
9. Console receives buffered records (with "src":"sd" marker)
10. Verify chronological order preserved
11. Verify live telemetry resumes alongside drain
```

#### Scenario 4: Crash Recovery

```
1. Device operating normally
2. Simulate crash (panic or watchdog timeout in test build)
3. Device reboots automatically
4. Watchdog reset reason logged
5. Device reconnects, registers
6. Boot count incremented
7. Console shows reset event
8. Repeat 5 times -> factory reset triggered
9. Device enters AP mode for reconfiguration
```

#### Scenario 5: OTA Rollback

```
1. Upload intentionally broken firmware to console
2. Deploy to test device
3. Device downloads and installs
4. Device reboots into new firmware
5. Self-test fails (e.g., can't connect to MQTT)
6. Device rolls back to previous firmware
7. Device comes back online with old version
8. Console shows rollback event
```

#### Scenario 6: WiFi-Only Device (No GSM Module)

```
1. Boot device without SIM800L/SIM7600 module
2. GSM auto-detected as unavailable
3. Transport priority forced to WiFi-only
4. Web UI hides GSM configuration
5. All features work normally via WiFi
6. OTA works via WiFi
7. If WiFi drops, data buffers to SD
```

#### Scenario 7: Console Manages Multiple Devices

```
1. Register 5+ devices with console
2. Dashboard shows all devices
3. Group devices (e.g., "Building A", "Building B")
4. Deploy firmware to a group
5. All devices in group update
6. Send config command to group
7. All devices apply config
8. One device goes offline -> console detects and alerts
```

### Automated Test Framework

For the ESP32 firmware, testing is primarily hardware-in-the-loop:

```
Test Categories:
├── Unit tests (PlatformIO native test runner)
│   ├── JSON serialization/deserialization
│   ├── Version comparison logic
│   ├── Config validation
│   └── Register parsing (Float32 from Modbus)
│
├── Integration tests (requires hardware)
│   ├── Modbus read from Acuvim II
│   ├── WiFi connect/disconnect
│   ├── MQTT publish/subscribe
│   ├── GSM connect (if available)
│   ├── SD card write/read
│   └── NVS save/load
│
└── System tests (full system, manual + scripted)
    ├── Scenarios 1-7 above
    ├── 72-hour stability test
    └── Power cycle resilience
```

For the console application:

```
Test Categories:
├── Unit tests (xUnit)
│   ├── Service logic
│   ├── Version comparison
│   └── Data transformations
│
├── Integration tests
│   ├── Database operations
│   ├── MQTT message handling
│   ├── API endpoints
│   └── SignalR events
│
└── E2E tests (Playwright)
    ├── Login flow
    ├── Device list and detail
    ├── Config push
    ├── Firmware deployment
    └── Alert management
```

## 10.2 Security Hardening

### MQTT Security

```
1. Enable TLS on MQTT broker (port 8883)
   - Generate server certificate (Let's Encrypt or self-signed for internal)
   - Configure Mosquitto with TLS listener

2. MQTT Authentication
   - Unique username/password per device
   - Console has its own MQTT credentials
   - Disable anonymous access

3. MQTT Authorization (ACL)
   - Device ACV-001 can only publish/subscribe to:
     acuvim/ACV-001/#
     devices/register
   - Console can publish/subscribe to all topics

4. ESP32 TLS Configuration
   - Embed CA certificate in firmware (for server verification)
   - WiFiClientSecure for WiFi TLS
   - Note: SIM800L has limited TLS support (TLS 1.0/1.1 only)
     SIM7600 supports TLS 1.2
```

### Console Application Security

```
1. HTTPS everywhere (Let's Encrypt)
2. JWT secret: minimum 256-bit, stored in environment variable
3. Password hashing: bcrypt with cost factor 12
4. Rate limiting on auth endpoints (10 attempts per minute)
5. CORS: restrict to console domain only
6. Input validation on all API endpoints
7. SQL injection protection (EF Core parameterized queries)
8. XSS prevention (React's default escaping + CSP headers)
9. File upload validation (firmware binaries only, size limit)
10. API key rotation capability
```

### Firmware Security

```
1. Secure Boot (optional, production)
   - Enable ESP32 secure boot v2
   - Sign firmware with RSA-3072 key
   - Burn eFuse to prevent unsigned firmware

2. Flash Encryption (optional, production)
   - Encrypt flash contents
   - Prevent firmware extraction via JTAG/UART

3. OTA Security
   - HTTPS firmware download
   - SHA256 checksum verification
   - Version downgrade prevention (optional)

4. Credentials Storage
   - WiFi passwords in NVS (encrypted NVS available)
   - MQTT passwords in NVS
   - Never expose passwords in API responses or logs
```

### Network Security Checklist

| Item | Status | Notes |
|------|--------|-------|
| MQTT TLS enabled | | Port 8883, valid certificate |
| MQTT auth enabled | | Per-device credentials |
| MQTT ACL configured | | Topic-level access control |
| Console HTTPS | | Let's Encrypt or equivalent |
| JWT tokens | | Secure secret, expiry, refresh |
| CORS configured | | Console domain only |
| Rate limiting | | Auth endpoints |
| Input validation | | All API endpoints |
| Firmware checksums | | SHA256 on every OTA |
| Secure boot | | Optional for production |
| NVS encryption | | Optional for production |

## 10.3 Performance Optimization

### ESP32 Memory

```
Optimization targets:
- Free heap should stay above 50KB during normal operation
- Total firmware binary should be under 1.5MB (fit in OTA partition)
- Stack sizes: main task 8KB, MQTT task 4KB

Techniques:
- Use StaticJsonDocument where possible (stack allocation)
- Limit MQTT payload to 1KB max
- Reuse buffers (don't allocate per message)
- Use PSRAM for large buffers (SD card read buffer)
- Monitor with esp_get_free_heap_size() in heartbeat
- Profile with CONFIG_HEAP_POISONING_COMPREHENSIVE in debug builds
```

### MQTT Optimization

```
- QoS 0 for telemetry (acceptable loss)
- QoS 1 for commands and heartbeat (at-least-once delivery)
- Keep-alive interval: 60 seconds
- Clean session: false (broker retains subscriptions)
- Compact JSON keys (saves ~40% payload size)
- Consider MessagePack for high-frequency telemetry (future)
```

### Console Database Optimization

```
- Index on (device_id, timestamp) for telemetry queries
- Partition telemetry table by month (for scale)
- Data retention policy: auto-delete telemetry older than N days
- Connection pooling: max 20 connections
- Consider TimescaleDB extension for time-series queries
- Cache device status in memory (invalidate on heartbeat)
- Aggregate older telemetry (5-min averages after 7 days, hourly after 30 days)
```

### GSM Data Optimization

```
- Longer poll interval on GSM (30s vs 5s on WiFi)
- Batch telemetry (6 records per publish)
- Skip non-essential fields on GSM (e.g., THD)
- Monitor monthly data usage per device
- Alert if device exceeds data budget
```

## 10.4 Monitoring and Observability

### Console Application Monitoring

```
Metrics to track:
- API response times (P50, P95, P99)
- MQTT message throughput (messages/second)
- Database query times
- Active WebSocket connections
- Error rates by endpoint
- Device registration rate
- OTA deployment success rate

Tools:
- Health check endpoint: GET /health
- Prometheus metrics: GET /metrics (optional)
- Structured logging with Serilog (JSON to file or ELK)
- Application Insights or Grafana dashboards
```

### Device Fleet Monitoring (via Console)

```
Dashboard metrics:
- Total devices: online / degraded / offline
- Average telemetry latency (device timestamp vs. received timestamp)
- MQTT broker connection status
- Failed OTA count
- Active alerts by severity
- Data throughput (messages per minute)
- SD card buffer status across fleet
```

## 10.5 Production Deployment Checklist

### Pre-Deployment

```
Infrastructure:
[ ] Server/VPS provisioned (minimum: 2 CPU, 4GB RAM, 50GB SSD)
[ ] Domain name configured with DNS
[ ] SSL/TLS certificates obtained
[ ] PostgreSQL installed and secured
[ ] Mosquitto installed and configured (TLS, auth, ACL)
[ ] Docker and Docker Compose installed
[ ] Firewall configured (ports: 443, 8883)
[ ] Backup strategy for database
[ ] Monitoring/alerting configured

Console Application:
[ ] appsettings.Production.json configured
[ ] JWT secret set as environment variable
[ ] Database migrations applied
[ ] Admin user created
[ ] MQTT credentials configured
[ ] Firmware storage directory created
[ ] CORS origins set to production domain
[ ] Logging configured (file rotation, retention)

Device Firmware:
[ ] Production MQTT broker address set
[ ] TLS CA certificate embedded
[ ] Console URL configured
[ ] OTA endpoint configured
[ ] Firmware version set correctly
[ ] Debug logging disabled
[ ] Watchdog enabled
[ ] AP password changed from default
[ ] Tested with production broker
```

### First Device Deployment

```
Physical Setup:
[ ] ESP32 mounted and powered (USB or battery)
[ ] RS485 wired to Acuvim II (A, B, GND)
[ ] SIM card inserted (if GSM required)
[ ] SD card inserted (if offline buffer required)
[ ] Antenna connected (GSM antenna if applicable)

Commissioning:
[ ] Power on device
[ ] Connect phone to AP (Acuvim-XXXXXX)
[ ] Open captive portal
[ ] Configure WiFi
[ ] Configure MQTT (broker, port, credentials)
[ ] Set console URL
[ ] Verify device appears in console
[ ] Verify telemetry flowing
[ ] Verify heartbeat received
[ ] Test remote command (ping)
[ ] Test OTA update
[ ] Document installation (photos, device ID, location)
```

## 10.6 Troubleshooting Guide

### Common Issues

| Symptom | Likely Cause | Resolution |
|---------|-------------|------------|
| No AP visible | Firmware not loaded | Re-flash via USB |
| AP visible, no captive portal | DNS redirect not working | Navigate manually to 192.168.4.1 |
| WiFi won't connect | Wrong password or channel issue | Re-enter credentials, try manual SSID entry |
| MQTT not connecting | Wrong broker/port/credentials | Check with MQTT test button |
| No Modbus data | Wiring, wrong address, wrong baud | Check A/B/GND wiring, verify slave address |
| GSM not connecting | No SIM, wrong APN, no signal | Check SIM inserted, APN settings, antenna |
| OTA fails | Download timeout, wrong URL | Check console URL, try WiFi instead of GSM |
| Device offline | Power loss, crash loop, network issue | Check physical power, serial monitor |
| Data gaps | Transport was down, SD was full | Check SD queue, verify drain completed |
| High memory usage | Memory leak in firmware | Check min_free_heap trend in heartbeats |

### Diagnostic Commands (via MQTT)

```json
{"cmd": "ping"}                                          // Connectivity test
{"cmd": "get_status"}                                    // Full device status
{"cmd": "get_config"}                                    // Current configuration
{"cmd": "restart"}                                       // Restart device
{"cmd": "factory_reset", "params": {"confirm": true}}    // Last resort
```

### Serial Monitor

For local debugging, connect USB and monitor at 115200 baud:

```
pio device monitor --baud 115200
```

The firmware logs the boot sequence, connection events, Modbus results, and errors.

## 10.7 72-Hour Stability Test

### Test Protocol

```
1. Deploy firmware to 3+ devices
2. Configure all transports (WiFi, GSM, SD)
3. Run for 72 hours continuously
4. Monitor:
   a. Memory trend (min_free_heap in heartbeats; should be flat)
   b. Telemetry gaps (query for missing timestamps)
   c. Error rate (Modbus errors, MQTT disconnects)
   d. Boot count (should be 1 unless intentional restart)
   e. Transport switches (should be stable if network is stable)
5. Induce failures during test:
   a. Kill WiFi for 1 hour (verify GSM failover + SD buffer)
   b. Kill MQTT broker for 30 minutes (verify SD buffer + drain)
   c. Power cycle one device (verify clean recovery)
6. Pass criteria:
   - No memory leak (min_free_heap stable ± 5%)
   - No unplanned reboots
   - Zero data loss (all records accounted for)
   - All induced failures recovered automatically
   - Console shows accurate device status throughout
```

## 10.8 Documentation Deliverables

```
docs/
├── acuvim-spec-01.md through acuvim-spec-10.md  (these specs)
├── deployment-guide.md        # Server setup, Docker, SSL
├── device-commissioning.md    # Step-by-step field installation
├── modbus-register-map.md     # Complete Acuvim II register reference
├── mqtt-protocol.md           # Topic structure, payload formats, commands
├── troubleshooting.md         # Common issues and resolutions
├── api-reference.md           # REST API documentation (or Swagger)
└── hardware-wiring.md         # ESP32 + RS485 + SIM800/SIM7600 wiring diagrams
```

## 10.9 Phase 10 Completion Criteria

- [ ] All 7 integration test scenarios pass
- [ ] MQTT TLS enabled and tested
- [ ] MQTT per-device authentication working
- [ ] Console HTTPS enabled
- [ ] JWT authentication tested end-to-end
- [ ] Memory stability verified (72-hour test)
- [ ] No data loss during transport failover
- [ ] SD card buffering and drain verified in production
- [ ] OTA update and rollback verified in production
- [ ] Crash loop recovery working
- [ ] Console monitoring dashboard accurate
- [ ] Deployment documentation complete
- [ ] Troubleshooting guide complete
- [ ] 3+ devices deployed and stable for 72 hours

---

**Previous Phase:** [Phase 9: Console Application Frontend](acuvim-spec-09.md)

## Project Summary

| Phase | Focus | Estimated Effort |
|-------|-------|------------------|
| 1 | Project Setup + Modbus RS485 | 1-2 weeks |
| 2 | WiFi + MQTT + NVS Config | 1-2 weeks |
| 3 | Captive Portal Web Interface | 1-2 weeks |
| 4 | GSM Integration + Failover | 1-2 weeks |
| 5 | SD Card Offline Buffering | 1 week |
| 6 | OTA Firmware Updates | 1-2 weeks |
| 7 | Heartbeat + Health + Registration | 1 week |
| 8 | Console Backend (.NET) | 2-3 weeks |
| 9 | Console Frontend (React) | 2-3 weeks |
| 10 | Integration + Production | 1-2 weeks |
| | **Total** | **12-20 weeks** |

Effort estimates assume a single developer working full-time. Phases 1-7 (firmware) can be developed in parallel with Phases 8-9 (console) if two developers are available.
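
As a supplement to the 72-hour stability test in section 10.7, the two numeric pass criteria (min_free_heap stable within ± 5%, and no telemetry gaps) could be checked from exported heartbeat and telemetry data with a small host-side script. The sketch below is illustrative only and not part of the specified tooling: the function names `heap_is_stable` and `find_gaps`, the use of the first ten samples as the heap baseline, and the 1.5x slack factor for gap detection are all assumptions, since the spec does not say how "stable" or "gap" should be measured.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def heap_is_stable(samples: Sequence[int], baseline_count: int = 10,
                   tolerance: float = 0.05) -> bool:
    """Return True if every sample stays within +/- tolerance of the baseline.

    Assumption: the baseline is the mean of the first `baseline_count`
    min_free_heap samples; the spec only states "stable within 5%".
    """
    if baseline_count < 1 or len(samples) < baseline_count:
        raise ValueError("not enough samples to establish a baseline")
    baseline = mean(samples[:baseline_count])
    return all(abs(s - baseline) <= tolerance * baseline for s in samples)


def find_gaps(timestamps: Sequence[int], interval_s: int,
              slack: float = 1.5) -> List[Tuple[int, int]]:
    """Return (previous, next) timestamp pairs that are too far apart.

    Timestamps are epoch seconds. Two consecutive records count as a gap
    when they are more than slack * interval_s apart (slack is an
    assumed tolerance for jitter, not a value from the spec).
    """
    ordered = sorted(timestamps)
    limit = interval_s * slack
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


if __name__ == "__main__":
    # Hypothetical data: heap in bytes, telemetry every 5 seconds.
    heap = [100_000] * 10 + [99_000, 98_000]
    print("heap stable:", heap_is_stable(heap))
    print("gaps:", find_gaps([0, 5, 10, 40, 45], interval_s=5))
```

With the sample data above, the heap check passes (the largest deviation is 2% of the baseline) and the gap check reports the single pair `(10, 40)`. Comparing against an early baseline rather than a running average is a deliberate choice here: a slow leak shifts a running average along with it and can go unnoticed.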