Telemetry Synchronisation

Overview
Telemetry Flow
- Basic Flow
- Edge Root Rule Chain
Synchronisation Modes
Sync Configuration
Offline Behaviour
- During Offline Period
- Reconnection Process
Telemetry Storage
- Local Storage (Edge)
- Cloud Storage
Data Consistency
- Timestamp Handling
- Conflict Resolution
Performance Optimisation
Monitoring Sync Status
Troubleshooting
Best Practices
Next Steps

Overview

Telemetry synchronisation between IndustryOS Edge and the Cloud Platform is controlled by the edge rule chains. You decide which data stays local, which data syncs to the cloud, and when synchronisation occurs.

Telemetry Flow

Basic Flow

Device
  │
  │ MQTT/HTTP/CoAP
  ▼
Edge Transport Layer
  │
  ▼
Edge Rule Engine
  ├─→ Save to Local PostgreSQL (always)
  ├─→ Update Dashboard (websocket)
  ├─→ Check Alarm Conditions
  └─→ Push to Cloud (conditional)
        │
        │ gRPC (port 7070)
        ▼
     Cloud Platform

Edge Root Rule Chain

The default Edge Root Rule Chain controls telemetry flow:

Message Type Switch
  │
  ├──[Post telemetry]─→ Save Timeseries
  │                         │
  │                         ▼
  │                    Push to Cloud
  │
  ├──[Post attributes]─→ Save Server Attributes
  │                         │
  │                         ▼
  │                    Push to Cloud
  │
  └──[RPC Request]────→ Handle RPC locally

Synchronisation Modes

Mode 1: Full Sync

All telemetry syncs to cloud:

// No filtering - all messages pushed
Save Timeseries → Push to Cloud

Use Cases:

Cloud-based analytics on full dataset
Compliance requirements (all data in cloud)
Edge used primarily for local dashboards

Bandwidth Impact:

High (all data transmitted)
Example: 100 devices × 1 msg/sec × 100 bytes = 10 KB/sec = 864 MB/day
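The arithmetic in the example above can be reproduced with a small helper. This is only a sketch: the function name dailySyncBytes is hypothetical, and the 100-byte message size is the document's illustrative figure, not a measured payload size.

```javascript
// Estimate the raw telemetry volume pushed to the cloud per day.
// Inputs are illustrative; protocol overhead is not included.
function dailySyncBytes(devices, msgsPerSecPerDevice, bytesPerMsg) {
  const SECONDS_PER_DAY = 86400;
  return devices * msgsPerSecPerDevice * bytesPerMsg * SECONDS_PER_DAY;
}

const fullSync = dailySyncBytes(100, 1, 100);
console.log(`${fullSync / 1e6} MB/day`); // 864 MB/day
```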

Mode 2: Filtered Sync

Only telemetry that meets specific conditions syncs:

// Script filter node
var temp = msg.temperature;
var threshold = metadata.ss_tempThreshold || 30;

if (temp > threshold) {
    return {msg: msg, metadata: metadata, msgType: "Push to Cloud"};
}
return {msg: msg, metadata: metadata, msgType: "Local Only"};

Use Cases:

Exception monitoring (only anomalies)
Bandwidth-constrained environments
Privacy-sensitive data (most stays local)

Bandwidth Impact:

Low (1-5% of full sync typical)
Example: 100 devices, 5% anomaly rate = 43.2 MB/day vs. 864 MB/day

Mode 3: Aggregated Sync

Periodic summaries sync:

Telemetry → Aggregate Node (1 hour window)
            │
            ├─→ Calculate: AVG, MIN, MAX, COUNT
            │
            ▼
         Push to Cloud (1 msg/hour instead of 3600)

Configuration:

{
  "interval": 3600,
  "intervalTimeUnit": "SECONDS",
  "aggregateKeys": ["temperature", "humidity"],
  "aggregateFunctions": ["AVG", "MIN", "MAX", "COUNT"]
}
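To make the four aggregate functions concrete, here is a minimal sketch of what a one-window summary for a single key might compute. The function name aggregateWindow and the output shape are assumptions for illustration, not the aggregate node's actual implementation.

```javascript
// Summarise the numeric samples collected for one key during one window.
// Returns the four aggregates named in the configuration above.
function aggregateWindow(samples) {
  if (samples.length === 0) {
    return { COUNT: 0 }; // nothing arrived in this window
  }
  let sum = 0;
  let min = Infinity;
  let max = -Infinity;
  for (const value of samples) {
    sum += value;
    if (value < min) min = value;
    if (value > max) max = value;
  }
  return { AVG: sum / samples.length, MIN: min, MAX: max, COUNT: samples.length };
}

console.log(aggregateWindow([20, 22, 24]));
// { AVG: 22, MIN: 20, MAX: 24, COUNT: 3 }
```

One summary object per key per hour replaces the raw samples for that hour, which is where the bandwidth saving comes from.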

Use Cases:

Trend analysis (hourly/daily averages)
Long-term historical data
Bandwidth optimisation

Bandwidth Impact:

Minimal (99.97% reduction)
Example: 100 devices × 24 msg/day × 100 bytes = 240 KB/day vs. 864 MB/day

Mode 4: On-Demand Sync

Manually triggered sync:

// Only push on explicit command
if (metadata.pushCommand === "true") {
    metadata.pushCommand = "false"; // Reset
    return {msg: msg, metadata: metadata, msgType: "Push to Cloud"};
}
return {msg: msg, metadata: metadata, msgType: "Local Only"};

Trigger Methods:

Dashboard button
Scheduled task
External API call
Alarm condition

Use Cases:

Offline-first deployments
Periodic batch uploads (e.g., nightly)
Data sovereignty requirements

Sync Configuration

Rule Chain Configuration

1. Navigate to Rule Chains:

Edge UI → Rule Chains → Edge Root Rule Chain

2. Add Filter Node:

Drag Script node
Add after “Save Timeseries”
Configure filter logic

3. Connect to Push Node:

Script output → Push to Cloud node

Example Filter Scripts:

Threshold Filter:

var temp = msg.temperature;
return temp > 30 || temp < 10; // Only extremes

Delta Filter (change detection):

var current = msg.temperature;
var last = metadata.lastTemp || current;
var delta = Math.abs(current - last);

if (delta > 0.5) {
    metadata.lastTemp = current;
    return true; // Push to cloud
}
return false; // Local only

Time-Based Filter:

// Only sync during business hours (08:00-18:00 in the edge host's local time zone)
var hour = new Date().getHours();
return hour >= 8 && hour < 18;
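The delta filter above relies on the previous reading being available when the next message arrives. As a sketch of the same change-detection idea with explicit state, the example below keeps the last pushed value per device in a Map. The names lastPushed and shouldPush are hypothetical, and how state is actually persisted between messages in the rule engine is not covered by this document.

```javascript
// Hypothetical in-memory state: last value pushed to the cloud, per device.
const lastPushed = new Map();

// Return true when the value moved more than minDelta since the last push.
// The first reading for a device is always pushed so the cloud has a baseline.
function shouldPush(deviceId, value, minDelta = 0.5) {
  const last = lastPushed.get(deviceId);
  if (last === undefined || Math.abs(value - last) > minDelta) {
    lastPushed.set(deviceId, value);
    return true;
  }
  return false;
}

console.log(shouldPush("dev-1", 20.0)); // true  (first reading)
console.log(shouldPush("dev-1", 20.3)); // false (change of 0.3)
console.log(shouldPush("dev-1", 20.6)); // true  (change of 0.6 since last push)
```

Comparing against the last pushed value rather than the last seen value means slow drift is still reported once it accumulates past the threshold.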

Queue Configuration

File: /etc/industryos-edge/conf/industryos-edge.conf

# Queue settings
export EDGE_STORAGE_MAX_READ_RECORDS_COUNT="1000"  # Batch size
export CLOUD_RPC_TIMEOUT="60000"                   # Timeout (ms)
export CLOUD_RPC_KEEP_ALIVE_TIME="10"              # Keepalive (sec)

Queue Behaviour:

Messages stored in PostgreSQL
Survives edge restart
Automatic drain when cloud available
Oldest messages sent first (FIFO)

Bandwidth Management

Compression:

# Enable gRPC compression
export CLOUD_RPC_COMPRESSION="true"

Typical Compression Ratios:

JSON telemetry: 60-80% reduction
Example (at 80% reduction): 864 MB/day → ~173 MB/day
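To get a feel for how well JSON telemetry compresses, the sketch below gzips a synthetic batch with Node's built-in zlib module. It assumes gzip as the algorithm, which this document does not specify for CLOUD_RPC_COMPRESSION, and the batch contents are invented for illustration.

```javascript
const zlib = require("zlib");

// Build a hypothetical batch of 1,000 JSON telemetry messages.
const batch = [];
for (let i = 0; i < 1000; i++) {
  batch.push({
    ts: 1710504000000 + i * 1000,
    values: { temperature: 25 + (i % 10) / 10, humidity: 60 },
  });
}

const raw = Buffer.from(JSON.stringify(batch), "utf8");
const compressed = zlib.gzipSync(raw);
const reductionPct = (1 - compressed.length / raw.length) * 100;

console.log(
  `raw: ${raw.length} B, gzip: ${compressed.length} B, reduction: ${reductionPct.toFixed(1)}%`
);
```

Synthetic data like this is far more repetitive than real sensor readings, so treat the printed ratio as an upper bound and measure against a sample of your own telemetry.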

Offline Behaviour

During Offline Period

Edge Continues:

Accept device telemetry
Save to local PostgreSQL
Process rule chains
Update dashboards
Create alarms
Queue messages for cloud

Queue Growth:

Time Offline: 24 hours
Devices: 100
Message Rate: 1/sec/device
Queue Growth: 100 × 1 × 86,400 = 8,640,000 messages
Storage: ~864 MB (100 bytes/message)

Queue Limits:

# Maximum queue size
max_queue_size: 100000  # messages

# Behaviour when full:
# - Drop oldest messages (default)
# - Stop accepting new telemetry (optional)
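The default drop-oldest behaviour can be pictured as a bounded FIFO queue. The sketch below is an in-memory illustration only: BoundedQueue and secondsUntilFull are hypothetical names, and the real queue is stored in PostgreSQL as described earlier.

```javascript
// Bounded FIFO queue that discards the oldest message when full.
class BoundedQueue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.items = [];
    this.dropped = 0; // count of messages lost to the size limit
  }

  push(message) {
    if (this.items.length >= this.maxSize) {
      this.items.shift(); // drop oldest
      this.dropped += 1;
    }
    this.items.push(message);
  }

  // Remove and return up to n of the oldest messages.
  takeBatch(n) {
    return this.items.splice(0, n);
  }

  get size() {
    return this.items.length;
  }
}

// How long an outage can last before the queue starts dropping messages.
function secondsUntilFull(maxQueueSize, devices, msgsPerSecPerDevice) {
  return maxQueueSize / (devices * msgsPerSecPerDevice);
}

console.log(secondsUntilFull(100000, 100, 1)); // 1000
```

With the figures used in this section (100 devices at 1 msg/sec), a 100,000-message limit is reached after about 1,000 seconds, roughly 17 minutes, so a 24-hour outage would need a much larger limit to avoid dropped messages.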

Reconnection Process

Step-by-Step:

1. Detect Cloud Availability:
- Periodic connection attempts (30 sec interval)
- Exponential backoff on failures
2. Re-establish gRPC:
- Authenticate with edge key/secret
- Verify cloud accepts connection
3. Sync Metadata:
- Entity updates (new devices/assets)
- Attribute changes
- Alarm states
4. Drain Queue:
- Batch size: 1000 messages
- Interval: 100ms between batches
- Priority: Alarms > Entities > Telemetry
5. Resume Normal Operation:
- Real-time telemetry sync
- Bidirectional communication
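The retry timing in the first step can be sketched as a delay schedule. Only the 30-second base interval comes from this document; the doubling factor and the 600-second cap are assumptions chosen for illustration, and retryDelaySeconds is a hypothetical name.

```javascript
// Delay before the next connection attempt after a number of consecutive failures.
// Doubles from the base interval and stops growing at the cap.
function retryDelaySeconds(failedAttempts, baseSeconds = 30, capSeconds = 600) {
  return Math.min(capSeconds, baseSeconds * 2 ** failedAttempts);
}

const schedule = [0, 1, 2, 3, 4, 5].map((n) => retryDelaySeconds(n));
console.log(schedule); // [ 30, 60, 120, 240, 480, 600 ]
```

Capping the delay keeps the edge from waiting arbitrarily long after an extended outage.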

Queue Drain Rate:

Batch Size: 1000 messages
Interval: 100ms
Drain Rate: 10,000 messages/sec
Time to drain 100k messages: ~10 seconds
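The drain order and timing described above can be sketched as follows. The message shape ({ id, type }) and the names PRIORITY, planDrain, and drainSeconds are hypothetical, and the estimate ignores the time each batch spends on the network.

```javascript
// Lower number drains first: Alarms > Entities > Telemetry.
const PRIORITY = { ALARM: 0, ENTITY: 1, TELEMETRY: 2 };

// Split the queue into batches, highest priority first.
// Array.prototype.sort is stable, so FIFO order is kept within each type.
function planDrain(queue, batchSize = 1000) {
  const ordered = [...queue].sort((a, b) => PRIORITY[a.type] - PRIORITY[b.type]);
  const batches = [];
  for (let i = 0; i < ordered.length; i += batchSize) {
    batches.push(ordered.slice(i, i + batchSize));
  }
  return batches;
}

// Lower bound on drain time: one batch per interval.
function drainSeconds(queueSize, batchSize = 1000, intervalMs = 100) {
  return (Math.ceil(queueSize / batchSize) * intervalMs) / 1000;
}

console.log(drainSeconds(100000)); // 10
```

Under the same settings, a backlog of 8,640,000 messages would take at least 864 seconds to drain.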

Telemetry Storage

Local Storage (Edge)

PostgreSQL Tables:

ts_kv (time-series):

CREATE TABLE ts_kv (
  entity_id UUID,
  key VARCHAR,
  ts BIGINT,  -- Timestamp (milliseconds)
  bool_v BOOLEAN,
  str_v VARCHAR,
  long_v BIGINT,
  dbl_v DOUBLE PRECISION,
  json_v JSONB
) PARTITION BY RANGE (ts);

-- Monthly partitions
CREATE TABLE ts_kv_2024_03 PARTITION OF ts_kv
  FOR VALUES FROM (1709251200000) TO (1711929600000);
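The partition bounds are epoch milliseconds for the first instant of the month and of the following month, in UTC. A small sketch for deriving them, where monthlyPartition is a hypothetical helper name:

```javascript
// Compute the name and [from, to) bounds of a monthly ts_kv partition.
// month is 1-12; Date.UTC rolls month 12 over into January of the next year.
function monthlyPartition(year, month) {
  const from = Date.UTC(year, month - 1, 1);
  const to = Date.UTC(year, month, 1);
  const name = `ts_kv_${year}_${String(month).padStart(2, "0")}`;
  return { name, from, to };
}

console.log(monthlyPartition(2024, 3));
// { name: 'ts_kv_2024_03', from: 1709251200000, to: 1711929600000 }
```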

Retention Policy:

-- Drop partitions older than 90 days
DROP TABLE ts_kv_2023_12;

Indexes:

CREATE INDEX idx_ts_kv_entity_ts ON ts_kv(entity_id, ts DESC);
CREATE INDEX idx_ts_kv_key ON ts_kv(key);

Cloud Storage

Cloud Receives:

Filtered telemetry (based on edge rules)
Entity metadata (device names, types)
Timestamps (from edge, preserving original time)

Cloud Benefits:

Unlimited retention (no partition cleanup)
Cross-edge analytics
Historical reporting
Compliance archives

Data Consistency

Timestamp Handling

Device Timestamp:

// Device sends with timestamp
{
  "ts": 1710504000000,
  "values": {
    "temperature": 25.5
  }
}

Server Timestamp:

// Device sends without timestamp (edge adds)
{
  "temperature": 25.5
}
// Edge adds: ts = current_time()

Best Practice:

Use device timestamps for time-sensitive data
Use server timestamps for simplicity
Ensure device time sync (NTP)
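The two payload shapes above can be normalised into one form before storage. This is a minimal sketch under the assumption that a payload counts as device-timestamped only when it has a numeric ts and a values object; normaliseTelemetry is a hypothetical name, not an edge API.

```javascript
// Return { ts, values } for either payload shape.
// now is injectable so the behaviour can be checked deterministically.
function normaliseTelemetry(payload, now = Date.now()) {
  const hasDeviceTs =
    typeof payload.ts === "number" &&
    payload.values !== null &&
    typeof payload.values === "object";
  if (hasDeviceTs) {
    return { ts: payload.ts, values: payload.values }; // keep device time
  }
  return { ts: now, values: payload }; // fall back to server time
}

console.log(normaliseTelemetry({ temperature: 25.5 }, 1710504000000));
// { ts: 1710504000000, values: { temperature: 25.5 } }
```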

Conflict Resolution

Scenario: The same telemetry arrives both via the edge and directly at the cloud

Resolution:

Deduplication: Cloud checks timestamp + device ID
Last Write Wins: Most recent timestamp kept
Merge: Combine non-overlapping keys

Example:

Edge sends:  {ts: 1000, temp: 25, humidity: 60}
Cloud has:   {ts: 1000, temp: 25}
Result:      {ts: 1000, temp: 25, humidity: 60}  // Merged
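The merge in the example above can be sketched as an upsert keyed by device ID and timestamp. This is one interpretation of the three rules, not the cloud's actual logic: the in-memory store and the upsert function are hypothetical, and it assumes that when the same key appears twice the later write wins.

```javascript
// Hypothetical in-memory stand-in for cloud time-series storage.
const store = new Map();

// Records with the same device ID and timestamp are treated as one data point.
// Non-overlapping keys are combined; for overlapping keys the later write wins.
function upsert(deviceId, record) {
  const key = `${deviceId}:${record.ts}`;
  const existing = store.get(key) || {};
  const merged = { ...existing, ...record.values };
  store.set(key, merged);
  return merged;
}

upsert("dev-1", { ts: 1000, values: { temp: 25 } }); // direct to cloud
console.log(upsert("dev-1", { ts: 1000, values: { temp: 25, humidity: 60 } })); // via edge
// { temp: 25, humidity: 60 }
```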

Performance Optimisation

Batch Processing

Configuration:

# Batch settings
ts_kv_batch_size: 1000
ts_kv_batch_max_delay: 100  # milliseconds

Benefits:

Reduce database I/O
Improve throughput (10x+)
Lower CPU usage

Tradeoff:

Slight latency increase (up to 100 ms, the ts_kv_batch_max_delay value)
Acceptable for most use cases
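The two batch settings interact as "flush when the batch is full or when the oldest pending row has waited long enough". The sketch below illustrates that rule; WriteBatcher is a hypothetical class, not the edge's writer, and it expects the caller to invoke maybeFlush periodically (for example from a timer) so that a partly filled batch is not held indefinitely.

```javascript
// Collect rows and hand them to flush() in batches.
class WriteBatcher {
  constructor(flush, batchSize = 1000, maxDelayMs = 100) {
    this.flush = flush;
    this.batchSize = batchSize;
    this.maxDelayMs = maxDelayMs;
    this.pending = [];
    this.firstAt = null; // arrival time of the oldest pending row
  }

  add(row, now = Date.now()) {
    if (this.pending.length === 0) this.firstAt = now;
    this.pending.push(row);
    this.maybeFlush(now);
  }

  // Flush when the batch is full or the oldest row has waited maxDelayMs.
  maybeFlush(now = Date.now()) {
    const full = this.pending.length >= this.batchSize;
    const stale = this.pending.length > 0 && now - this.firstAt >= this.maxDelayMs;
    if (full || stale) {
      this.flush(this.pending);
      this.pending = [];
      this.firstAt = null;
    }
  }
}

const batcher = new WriteBatcher((rows) => console.log(`writing ${rows.length} rows`), 3, 100);
batcher.add("a", 0);
batcher.add("b", 10);
batcher.add("c", 20); // writing 3 rows
```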

Partition Management

Automatic Partition Creation:

-- Edge creates partitions automatically
-- Future partitions: +2 months
-- Example: Current = March, creates April, May

Partition Cleanup:

# Cron job (monthly)
0 0 1 * * psql -c "DROP TABLE IF EXISTS ts_kv_$(date -d '4 months ago' +\%Y_\%m);"

Query Optimisation

Efficient Queries:

-- Good: Uses index
SELECT * FROM ts_kv
WHERE entity_id = '<UUID>'
  AND ts >= 1710504000000
  AND ts < 1710590400000
ORDER BY ts DESC;

-- Bad: No entity_id or ts range, so every partition is searched
SELECT * FROM ts_kv
WHERE key = 'temperature'
  AND dbl_v > 30;

Monitoring Sync Status

Edge Status Page

Navigate: Edge UI → System → Edge Status

Metrics:

Cloud connection: CONNECTED / DISCONNECTED
Last sync time: 2024-03-15 10:30:00
Queue size: 42 messages
Bytes sent: 1.2 MB
Bytes received: 600 KB
Sync errors: 0

Cloud Events Page

Navigate: Edge UI → System → Cloud Events

Event Types:

ENTITY_ASSIGNED: Dashboard/rule chain assigned from cloud
ENTITY_DELETED: Entity deleted on cloud
ATTRIBUTE_UPDATED: Attribute changed on cloud
RELATION_UPDATED: Relation added/removed

Health Check API

curl http://localhost:8080/api/edge/health

Response:

{
  "status": "UP",
  "cloudConnection": "CONNECTED",
  "queueSize": 42,
  "lastSyncTime": "2024-03-15T10:30:00Z",
  "syncErrors": 0,
  "totalSyncedMessages": 1500000,
  "totalSyncedBytes": 157286400
}
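A monitoring script could turn this response into a list of problems. The sketch below works only on the fields shown in the sample response; evaluateHealth is a hypothetical helper, and the 50,000-message queue threshold is an assumed default that you would tune for your deployment.

```javascript
// Return a list of human-readable problems found in a health response.
// An empty list means everything looks healthy.
function evaluateHealth(health, maxQueueSize = 50000) {
  const problems = [];
  if (health.status !== "UP") {
    problems.push(`edge status is ${health.status}`);
  }
  if (health.cloudConnection !== "CONNECTED") {
    problems.push(`cloud connection is ${health.cloudConnection}`);
  }
  if (health.queueSize > maxQueueSize) {
    problems.push(`queue size ${health.queueSize} exceeds ${maxQueueSize}`);
  }
  if (health.syncErrors > 0) {
    problems.push(`${health.syncErrors} sync errors reported`);
  }
  return problems;
}

const sample = {
  status: "UP",
  cloudConnection: "CONNECTED",
  queueSize: 42,
  lastSyncTime: "2024-03-15T10:30:00Z",
  syncErrors: 0,
};
console.log(evaluateHealth(sample)); // []
```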

Troubleshooting

Issue: Telemetry Not Syncing

Check 1: Cloud Connection

curl -s http://localhost:8080/api/edge/health | grep cloudConnection
# Expected: "cloudConnection": "CONNECTED"

Check 2: Rule Chain

Verify “Push to Cloud” node exists
Check node connections
Review filter scripts

Check 3: Queue

curl -s http://localhost:8080/api/edge/health | grep queueSize
# If growing: Cloud connection issue
# If zero: Rule chain issue

Issue: High Queue Size

Causes:

Slow/intermittent cloud connection
Excessive telemetry rate
Insufficient bandwidth

Solutions:

1. Improve Connectivity:
- Check network stability
- Increase bandwidth
- Configure proxy (if needed)
2. Reduce Sync Rate:
- Add filtering logic
- Increase aggregation window
- Reduce device message rate

3. Increase Batch Size:

export EDGE_STORAGE_MAX_READ_RECORDS_COUNT="5000"

Issue: Duplicate Telemetry on Cloud

Cause:

Edge and device both pushing to cloud

Solution:

Remove “Push to Cloud” from edge rule chain
OR disable direct cloud connection on devices

Best Practices

1. Filter Early

Apply filters before “Push to Cloud” node:

Save Timeseries → Filter Script → Push to Cloud
                       │
                       └───→ [Filtered Out] → End

2. Use Aggregation for Trends

Raw data locally, aggregates to cloud:

Local: 1 msg/sec (full resolution)
Cloud: 1 msg/hour (averages)
Reduction: 99.97%

3. Monitor Queue Size

Set alerts:

// Alert rule
if (metadata.queueSize > 50000) {
    sendNotification("High edge queue size");
}

4. Configure Retention

Match to use case:

Real-time dashboards: 7 days
Historical analysis: 90 days
Compliance: Use cloud (unlimited)

5. Test Offline Scenarios

Periodically test:

Disconnect cloud
Generate telemetry
Verify local storage
Reconnect cloud
Verify queue drain
