Alexander Petrov

June 4, 2025

Cost-Effective IoT Data Processing on a Shoestring Budget

A friend running a pre-revenue startup asked me how to collect and analyze IoT data affordably. With a tight budget, the challenge was to build an efficient, scalable solution without breaking the bank. Here's a streamlined approach that delivers robust performance at minimal cost.

Traditional Approach
A typical IoT data pipeline (based on my mostly ad-tech experience) might include:
  • Gateway: An HTTP API, often a web server or AWS Lambda.
  • Queue: A managed service like Kafka or Amazon SQS.
  • Storage: Databases like PostgreSQL, ClickHouse, Delta Lake, or Apache Iceberg.
  • Processing: Analytics using PostgreSQL, ClickHouse, or Apache Spark.

Optimized Low-Cost Solution
To slash costs without sacrificing functionality:
  1. Storage: Use Delta Lake (or Apache Iceberg) on Amazon S3 for scalable, cost-efficient data storage.
  2. Processing: Run DuckDB on-demand in the same AWS region as S3 for fast, lightweight analytics.
  3. Queue Replacement: Skip expensive managed queues like Kafka or SQS. Instead, use SQLite with Write-Ahead Logging (WAL) and tuning within each web server, periodically exporting data to Delta Lake via a lightweight script (a minimal sketch follows this list).
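Here's roughly what the buffering side looks like. This is a minimal sketch, assuming a Python web server; the table layout, file path, and PRAGMA choices are illustrative rather than a drop-in implementation.

```python
# Sketch of the per-server SQLite buffer (paths and schema are illustrative).
# WAL mode lets the export job read while the HTTP handlers keep writing.
import sqlite3

def open_buffer(path="/var/lib/ingest/buffer.db"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # concurrent reader + single writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fsync at WAL checkpoints, not every commit
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            received_at TEXT DEFAULT CURRENT_TIMESTAMP,
            payload BLOB NOT NULL
        )
    """)
    return conn

def record_event(conn, payload: bytes):
    # Called from the HTTP handler: one tiny transaction per request.
    with conn:
        conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
```

With synchronous=NORMAL you trade a small durability window on power loss for much higher write throughput, which is usually an acceptable deal for sensor data.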

This setup handles 300 requests/second (roughly 780 million requests/month) of 1KB payloads on a modest t3.medium instance (2 vCPUs, 4GB RAM).

Monthly Cost Breakdown
  • S3 (Storage & Requests): $12.60
  • HTTP Compute (t3.medium): $30.37
  • Data Transfer (Inbound & EC2-to-S3): Free (same region)
  • Analytics Node (m5.4xlarge, $0.768/hour, 18 hours): $13.82
    Total: $56.79/month
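If you want to sanity-check the numbers, the arithmetic is simple. The t3.medium rate below is the on-demand us-east-1 price I'm assuming; the S3 line is taken as-is from the breakdown above.

```python
# Back-of-the-envelope check of the monthly figures (assumed on-demand us-east-1 rates).
HOURS_PER_MONTH = 730

s3 = 12.60                            # storage + requests, as listed above
gateway = 0.0416 * HOURS_PER_MONTH    # t3.medium, ~$30.37
analytics = 0.768 * 18                # m5.4xlarge for 18 hours, ~$13.82

print(round(gateway, 2), round(analytics, 2), round(s3 + gateway + analytics, 2))
```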

Why It Works
  • Delta Lake on S3 ensures low-cost, scalable storage with robust data management.
  • DuckDB provides fast, in-process analytics, spun up only when needed, minimizing compute costs (see the sketch after this list).
  • SQLite eliminates queue overhead, with periodic exports ensuring reliable data transfer to Delta Lake.
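The analytics step really is just a script you run when you need answers. Here's a minimal sketch, assuming a recent DuckDB with the httpfs and delta extensions; the bucket path and query are placeholders, and if the delta extension isn't available, pointing read_parquet at the table's Parquet files works similarly.

```python
# Sketch of the on-demand analytics step (bucket path and query are placeholders).
# DuckDB reads the lake straight from S3, so the big node only runs while queries do.
import duckdb

con = duckdb.connect()          # in-process, nothing to operate
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("INSTALL delta")    # Delta Lake reader, available in recent DuckDB versions
con.execute("LOAD delta")
# Assumes AWS credentials are visible to DuckDB (e.g. via an S3 secret or instance role).

rows = con.execute("""
    SELECT date_trunc('day', CAST(received_at AS TIMESTAMP)) AS day,
           count(*) AS events
    FROM delta_scan('s3://my-iot-lake/events')
    GROUP BY 1
    ORDER BY 1
""").fetchall()

for day, events in rows:
    print(day, events)
```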

Tips for Success
  • SQLite Tuning: Optimize WAL settings and monitor write performance to handle high request rates.
  • Data Export: Use a scheduled script (e.g., cron job) to export SQLite data to Delta Lake, with error handling to prevent data loss (sketched after these tips).
  • Analytics: Skip Spark and run DuckDB on an m5.4xlarge instance for ~18 hours/month, or explore Spot Instances for further savings if analytics needs grow.
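The export script itself can stay very small. Here's a minimal sketch, assuming the deltalake (delta-rs) Python package and the buffer table from the ingest sketch above; the bucket path is a placeholder, and writing to S3 may need storage_options (credentials or a locking provider) depending on your setup.

```python
# Sketch of the cron-driven export job (names and paths are illustrative).
# Rows are deleted from the local buffer only after the Delta append succeeds,
# so a failed run simply retries the same batch next time.
import sqlite3
import pyarrow as pa
from deltalake import write_deltalake

BUFFER_DB = "/var/lib/ingest/buffer.db"
TABLE_URI = "s3://my-iot-lake/events"    # placeholder bucket/path

def export_batch(limit=50_000):
    conn = sqlite3.connect(BUFFER_DB)
    rows = conn.execute(
        "SELECT id, received_at, payload FROM events ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    if not rows:
        return 0

    batch = pa.table({
        "id": [r[0] for r in rows],
        "received_at": [r[1] for r in rows],
        "payload": [r[2] for r in rows],
    })
    write_deltalake(TABLE_URI, batch, mode="append")   # raises on failure, rows stay buffered

    with conn:
        conn.execute("DELETE FROM events WHERE id <= ?", (rows[-1][0],))
    return len(rows)

if __name__ == "__main__":
    print(f"exported {export_batch()} rows")
```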

This setup offers startups a scalable, budget-friendly way to process IoT data while keeping costs under $60/month. It’s proof you don’t need deep pockets to build a powerful data pipeline.

About Alexander Petrov


I build products for fun and profit.