Solar Inverter Real-Time Data Pipeline
Replaced weekly manual log pulls with continuous 5-second telemetry, cutting anomaly detection from days to under 5 minutes. Caught three degrading inverters weeks before failure, avoiding ~45 MWh of lost generation.
The Challenge
The client's solar sites were running blind. Inverters fed data to legacy monitoring systems, but it was raw, disconnected, and impossible to correlate across sites. When an inverter degraded or faulted, nobody knew until someone manually reviewed logs days or weeks later — by which time they'd lost revenue, missed the window for quick fixes, and sometimes violated grid compliance rules. The operations team had no real-time visibility.
The Impact
Anomaly detection dropped from days to under 5 minutes — fast enough to catch a degrading inverter before it fails. The team identified and replaced three inverters weeks before failure, avoiding an estimated 45 MWh of lost generation (tens of thousands of dollars in preserved revenue). They also stopped logging into four separate manufacturer portals daily, freeing roughly 8 hours per week for actual preventive maintenance.
What We Built
A pipeline that continuously streams data from every inverter, 24/7. Instead of waiting for someone to manually review logs, the system watches each inverter's power output in real time and flags deviations from expected performance within minutes. If an inverter is degrading, the team knows immediately — not days later. Every site has a dashboard showing its performance; the fleet has a dashboard showing all sites at once.
Technical Diagrams
Pipeline Architecture
Monitoring Dashboard
Background
The client manages a portfolio of distributed solar generation assets — rooftop installations, carport arrays, and small ground-mount systems across multiple commercial and industrial sites. Each site runs a mix of string inverters and microinverters from different manufacturers, all nominally compliant with the SunSpec Modbus standard but with significant variation in register maps and data formats.
Their existing monitoring relied on manufacturer-provided cloud platforms (SolarEdge, Enphase, SMA Sunny Portal), each with its own dashboard, alerting logic, and data export format. Fleet-wide visibility required logging into multiple platforms and manually comparing data in spreadsheets. When an inverter underperformed or faulted, the operations team typically discovered it during a weekly manual review — meaning lost generation revenue and delayed maintenance response.
The goal was a unified data pipeline that could ingest telemetry from any SunSpec-compliant inverter, normalize it into a consistent schema, stream it in near-real-time to a central analytics platform, and flag anomalies automatically.
Technical Approach
Edge Data Collection
We built the edge collector in Rust for two reasons: deterministic memory usage on resource-constrained edge hardware (Raspberry Pi 4 and industrial gateways), and the ability to maintain precise polling intervals without GC pauses.
The collector handles:
- SunSpec discovery — automatically scans Modbus register blocks to identify inverter model, capabilities, and register map. Handles manufacturer-specific deviations from the SunSpec standard gracefully
- Concurrent polling — polls up to 32 inverters per RS-485 bus with configurable intervals (default 5 seconds), managing Modbus transaction IDs and timeout recovery
- Local buffering — writes telemetry to a local WAL (write-ahead log) before forwarding, ensuring no data loss during network outages. On reconnection, the backlog drains automatically
- Avro serialization — each reading is serialized as an Avro record with a schema registry reference, enabling schema evolution without breaking downstream consumers
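The local-buffering behavior above can be sketched in a few lines of Rust. This is a simplified, in-memory illustration (the `Reading` and `WalBuffer` names are ours, not the collector's actual types): the production collector persists the WAL to disk so readings survive a process restart, and serializes readings as Avro rather than plain structs.

```rust
use std::collections::VecDeque;

/// One telemetry reading from an inverter (simplified; the real
/// collector serializes these as Avro records with a schema reference).
#[derive(Debug, Clone, PartialEq)]
struct Reading {
    inverter_id: u16,
    timestamp_s: u64,
    ac_power_w: f64,
}

/// In-memory sketch of the write-ahead buffer: every reading is
/// appended locally first, and `drain_backlog` forwards the backlog
/// once the uplink is available again.
struct WalBuffer {
    backlog: VecDeque<Reading>,
}

impl WalBuffer {
    fn new() -> Self {
        Self { backlog: VecDeque::new() }
    }

    fn append(&mut self, r: Reading) {
        self.backlog.push_back(r);
    }

    /// Forward buffered readings through `send`; stop at the first
    /// failure so unsent readings stay queued for the next attempt.
    fn drain_backlog<F: FnMut(&Reading) -> bool>(&mut self, mut send: F) -> usize {
        let mut sent = 0;
        while let Some(r) = self.backlog.front() {
            if !send(r) {
                break;
            }
            self.backlog.pop_front();
            sent += 1;
        }
        sent
    }
}

fn main() {
    let mut wal = WalBuffer::new();
    for t in 0..3 {
        wal.append(Reading { inverter_id: 7, timestamp_s: t * 5, ac_power_w: 4200.0 });
    }
    // Simulate an uplink that fails after the first reading...
    let mut budget = 1;
    let sent = wal.drain_backlog(|_| {
        if budget > 0 { budget -= 1; true } else { false }
    });
    assert_eq!(sent, 1);
    assert_eq!(wal.backlog.len(), 2);
    // ...then a healthy uplink that drains the rest, in order.
    let sent = wal.drain_backlog(|_| true);
    assert_eq!(sent, 2);
    println!("backlog drained");
}
```

Stopping at the first failed send is what preserves ordering: readings leave the buffer strictly in arrival order, so downstream consumers never see gaps filled out of sequence.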
Streaming Infrastructure
Telemetry flows from edge collectors to a Kafka cluster over TLS-encrypted connections. The topic architecture partitions data by site ID, ensuring ordered processing per installation while allowing horizontal scaling of consumers.
Key design decisions:
- Exactly-once semantics — Kafka transactions combined with idempotent producers ensure no duplicate readings, which is critical for accurate energy yield calculations
- Schema Registry — Confluent Schema Registry manages Avro schema versions, allowing us to add new telemetry fields (e.g., reactive power measurements added in phase 2) without disrupting existing consumers
- Retention policy — raw telemetry retained in Kafka for 72 hours, providing a replay window for reprocessing or backfilling after consumer updates
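The per-site ordering guarantee follows from how records are keyed. A minimal sketch of the partitioning rule (`partition_for` is an illustrative stand-in for the producer's partitioner, and the hash function here is Rust's `DefaultHasher` rather than Kafka's murmur2): keying every record by site ID means all readings from one site hash to the same partition, so Kafka preserves their order, while different sites spread across partitions for parallel consumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a site ID to a partition. All records sharing a key land on the
/// same partition, which is what gives ordered processing per site.
fn partition_for(site_id: &str, num_partitions: u64) -> u64 {
    let mut h = DefaultHasher::new();
    site_id.hash(&mut h);
    h.finish() % num_partitions
}

fn main() {
    let p = partition_for("site-042", 12);
    assert!(p < 12);
    // Same site always maps to the same partition → per-site ordering.
    assert_eq!(p, partition_for("site-042", 12));
    println!("site-042 -> partition {}", p);
}
```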
Time-Series Storage & Analytics
A Kafka Connect sink writes validated telemetry into TimescaleDB, a PostgreSQL extension optimized for time-series workloads. The schema uses hypertables partitioned by time, with chunk intervals, rollups, and compression tuned to the query patterns:
- 1-hour chunks for real-time dashboards (last 24 hours of data)
- Continuous aggregates for hourly and daily rollups used in trend analysis
- Compression policies that reduce storage footprint by roughly 10x for data older than 30 days
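To make the rollups concrete, here is a sketch in Rust of what the hourly continuous aggregate computes (the real rollup runs inside TimescaleDB; `hourly_energy_wh` is an illustrative name): bucket the 5-second power samples by hour and integrate them into energy per bucket.

```rust
use std::collections::BTreeMap;

/// Roll 5-second power samples (timestamp in seconds, power in watts)
/// up into energy per hour bucket, in watt-hours. Each sample
/// contributes power × interval, converted from watt-seconds to Wh.
fn hourly_energy_wh(samples: &[(u64, f64)], interval_s: f64) -> BTreeMap<u64, f64> {
    let mut buckets = BTreeMap::new();
    for &(ts, power_w) in samples {
        let hour = ts / 3600;
        *buckets.entry(hour).or_insert(0.0) += power_w * interval_s / 3600.0;
    }
    buckets
}

fn main() {
    // Two 5-second samples of 7.2 kW in hour 0 → 20 Wh in that bucket.
    let samples = vec![(0u64, 7200.0), (5, 7200.0)];
    let rollup = hourly_energy_wh(&samples, 5.0);
    assert_eq!(rollup[&0], 20.0);
    println!("{:?}", rollup);
}
```

Pre-computing these buckets is why trend queries over months of data stay fast: dashboards read a few thousand hourly rows instead of scanning millions of raw 5-second readings.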
Anomaly Detection
A stream processor compares each inverter’s actual power output against an expected power curve derived from:
- Irradiance data from on-site pyranometers or satellite-based estimates
- Inverter nameplate capacity and derating curves
- Historical performance baselines per device
Deviations beyond configurable thresholds trigger alerts via PagerDuty with contextual metadata: which inverter, what the expected vs. actual output was, and suggested diagnostic steps. The system flags both sudden faults (communication loss, grid disconnect) and gradual degradation (declining conversion efficiency over weeks).
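The core comparison reduces to a small amount of arithmetic. A minimal sketch, with illustrative numbers (the derate factor, STC reference of 1000 W/m², and 20% threshold below are assumptions for the example, not the system's configured values): expected output scales nameplate capacity by measured irradiance relative to standard test conditions and a derating factor, and a reading is flagged when actual output falls below expected by more than the configured fraction.

```rust
/// Expected AC power given measured irradiance, nameplate capacity,
/// and an aggregate derating factor (soiling, temperature, wiring).
fn expected_power_w(irradiance_w_m2: f64, nameplate_w: f64, derate: f64) -> f64 {
    (irradiance_w_m2 / 1000.0) * nameplate_w * derate
}

/// Flag a reading when actual output falls below expected by more
/// than `threshold_frac` (e.g. 0.20 = 20% underperformance).
fn is_anomalous(actual_w: f64, expected_w: f64, threshold_frac: f64) -> bool {
    expected_w > 0.0 && (expected_w - actual_w) / expected_w > threshold_frac
}

fn main() {
    // 800 W/m² irradiance, 10 kW nameplate, 0.9 derate → 7.2 kW expected.
    let expected = expected_power_w(800.0, 10_000.0, 0.9);
    assert_eq!(expected, 7200.0);
    // Actual output 30% below expected trips a 20% threshold...
    assert!(is_anomalous(5000.0, expected, 0.20));
    // ...while a ~3% shortfall does not.
    assert!(!is_anomalous(7000.0, expected, 0.20));
    println!("expected {} W", expected);
}
```

The same comparison serves both alert classes: a sudden fault shows up as a large instantaneous deviation, while gradual degradation shows up as a small but persistent one against the device's historical baseline.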
Visualization
Grafana dashboards provide three levels of visibility:
- Fleet overview — total generation, availability percentage, and alert summary across all sites
- Site detail — per-inverter performance heatmaps, string-level comparison, and environmental conditions
- Device drill-down — individual inverter telemetry with overlay of expected vs. actual power curves
Results
The pipeline transformed the client’s operations monitoring:
- Continuous 5-second telemetry replaced weekly manual log pulls, providing real-time visibility across the entire fleet
- Anomaly detection latency dropped from days (manual discovery) to under 5 minutes (automated alerting)
- The operations team identified and replaced three degrading inverters 2–3 weeks before projected failure, avoiding an estimated 45 MWh of lost generation
- The system reliably processes 50,000+ data points per minute with sub-second end-to-end latency from edge to dashboard
- The normalized data layer eliminated the need to log into four separate manufacturer portals, saving the operations team roughly 8 hours per week