How we cut analytics costs by 67% by migrating from AWS Glue to self-hosted ClickHouse for a fintech processing 30K transactions daily.

Context.

The customer is a fintech company in the digital payments space. They process around 30K transactions daily with multiple terabytes of data across their analytics infrastructure.

Their analytics platform was built entirely on AWS managed services: AWS Glue for ETL, Amazon Athena for querying, and AWS DMS for CDC replication from PostgreSQL. The platform worked fine initially. It was reliable and easy to manage. But as data volumes grew, monthly costs crossed $13,000 and were heading toward $35,000 with projected 3x growth. The analytics solution no longer justified its cost to the business. One2N helped them migrate to a self-hosted open-source stack and cut their costs by 80% while maintaining production-grade reliability.

Problem Statement.

Escalating infrastructure costs: Analytics spend exceeding $13,000/month with no optimization levers available in fully managed services.

Vendor lock-in: Entire pipeline built on AWS-specific services, limiting negotiating power and strategic flexibility.

Scalability concerns: Projected 3x volume growth would push monthly costs beyond $35,000, unsustainable for unit economics.

No operational control: No way to tune compression, storage, or resource allocation with fully managed services.

Outcome/Impact.

67% Cost reduction

$40K Annual Savings

99.9%+ Uptime

4.2s P95 Query Latency

Cost savings: Monthly spend dropped from $13,000 to $2,000. That's $11,000/month, or roughly $132K a year.

Production-ready reliability: 1-shard, 2-replica ClickHouse cluster with 99.9%+ availability across availability zones.

Query performance maintained: P95 latency of 4.2 seconds, on par with what they had on Athena.

Full operational control: The team now owns compression algorithms, indexing strategies, and resource allocation.

No more lock-in: The entire stack is open-source (ClickHouse, PeerDB, Prefect, dbt), giving the customer the flexibility to move anywhere.

Solution.

The original setup had DMS pulling CDC from PostgreSQL, dumping to S3, Glue processing the data, Athena for queries. Reliable, but expensive at scale.

We broke the migration into four phases, tackling the biggest cost drivers first.

Phase 1: Replace AWS DMS with PeerDB (~$1,500 → ~$30/month)

DMS was eating $1,500/month just for CDC replication. We swapped it for PeerDB, an open-source CDC tool built specifically for PostgreSQL → ClickHouse pipelines. Same 30-second sync intervals, 98% cost reduction. Sometimes the specialized tool just wins.
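To make the swap concrete, here's a minimal sketch of what a PeerDB mirror setup can look like. PeerDB exposes a Postgres-compatible SQL endpoint, so peers and mirrors are created with plain SQL statements. The hosts, credentials, port, and option names below (initial snapshot, sync interval) are illustrative assumptions, not the customer's configuration; check the PeerDB docs for the current syntax.

```python
# Sketch: create a PostgreSQL -> ClickHouse CDC mirror through PeerDB's
# Postgres-compatible SQL endpoint. Hosts, credentials, port, and option
# names are placeholders/assumptions; consult the PeerDB docs for exact syntax.
import psycopg2

PEERDB_DSN = "host=peerdb.internal port=9900 user=peerdb password=***"  # placeholder endpoint

statements = [
    # Source peer: the production PostgreSQL database.
    """CREATE PEER pg_source FROM POSTGRES WITH (
           host = 'pg.internal', port = 5432,
           user = 'replicator', password = '***', database = 'payments'
       )""",
    # Target peer: the self-hosted ClickHouse cluster.
    """CREATE PEER ch_target FROM CLICKHOUSE WITH (
           host = 'clickhouse.internal', port = 9000,
           user = 'default', password = '***', database = 'analytics'
       )""",
    # CDC mirror with an initial snapshot and ~30-second sync cadence.
    """CREATE MIRROR txn_cdc FROM pg_source TO ch_target
       WITH TABLE MAPPING (public.transactions:transactions)
       WITH (do_initial_snapshot = true, sync_interval = 30)""",
]

with psycopg2.connect(PEERDB_DSN) as conn:
    conn.autocommit = True
    with conn.cursor() as cur:
        for sql in statements:
            cur.execute(sql)
```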

Phase 2: Build the ClickHouse cluster (~$4,000 → ~$1,600/month)

For the analytical database, we went with ClickHouse. The key decision here was keeping it simple: 1 shard, 2 replicas. No distributed query complexity.

The setup:

  • 2× EC2 r6g.4xlarge (16 vCPUs, 128GB RAM) on Graviton for better price-performance

  • 3× EC2 t4g.medium for ClickHouse Keeper coordination across AZs

  • EBS gp3 starting at 250GB, scaling to 1.5TB per node

We ran compression tests on their actual data—ZSTD gave us 5.4x average compression. Their projected 27TB of raw data fits comfortably in ~5TB compressed. Single-node territory for years. No need to over-engineer with sharding yet.
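To illustrate both decisions, here's a rough sketch using the clickhouse-connect Python client: a replicated table with a ZSTD codec, and the system.parts query we rely on to measure compression ratios. The cluster name, schema, macros, and ZSTD level are placeholders, not the customer's actual setup.

```python
# Sketch: one replicated table (1 shard, 2 replicas) with ZSTD compression,
# plus a query against system.parts to check the achieved compression ratio.
# Cluster name, table/column names, Keeper path macros, and the ZSTD level
# are placeholders for illustration.
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse.internal", port=8123)

client.command("""
    CREATE TABLE IF NOT EXISTS analytics.transactions ON CLUSTER main
    (
        txn_id      UInt64,
        created_at  DateTime,
        amount      Decimal(18, 2),
        status      LowCardinality(String),
        payload     String CODEC(ZSTD(3))   -- heavier codec on the wide column
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/transactions', '{replica}')
    PARTITION BY toYYYYMM(created_at)
    ORDER BY (created_at, txn_id)
""")

# Compression ratio per table, computed from active parts metadata.
rows = client.query("""
    SELECT
        table,
        formatReadableSize(sum(data_uncompressed_bytes)) AS raw,
        formatReadableSize(sum(data_compressed_bytes))   AS compressed,
        round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
    FROM system.parts
    WHERE active AND database = 'analytics'
    GROUP BY table
""").result_rows

for table, raw, compressed, ratio in rows:
    print(f"{table}: {raw} -> {compressed} ({ratio}x)")
```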

Phase 3: Replace Glue with Prefect + dbt (~$2,800 → ~$30/month)

Glue was the other big spend. We replaced it with Prefect for orchestration and dbt for SQL transformations. The team now has version-controlled, testable transformation logic instead of Spark jobs running in a black box.
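As a sketch of what this looks like in practice (the dbt project path, flow name, and retry settings are illustrative, not the customer's actual configuration), a Prefect flow can shell out to dbt so scheduling, retries, and logs live in one place:

```python
# Sketch: a Prefect flow that shells out to dbt for SQL transformations.
# The dbt project path, flow name, and retry settings are hypothetical placeholders.
import subprocess

from prefect import flow, task, get_run_logger

DBT_PROJECT_DIR = "/opt/analytics/dbt"  # hypothetical project location


@task(retries=2, retry_delay_seconds=60)
def dbt_run() -> None:
    """Build the dbt models; Prefect handles retries and surfaces failures."""
    logger = get_run_logger()
    result = subprocess.run(
        ["dbt", "run", "--project-dir", DBT_PROJECT_DIR],
        capture_output=True,
        text=True,
    )
    logger.info(result.stdout)
    result.check_returncode()  # non-zero exit marks the task (and flow) as failed


@task
def dbt_test() -> None:
    """Run dbt tests so bad data fails the flow instead of reaching dashboards."""
    subprocess.run(["dbt", "test", "--project-dir", DBT_PROJECT_DIR], check=True)


@flow(name="clickhouse-transformations")
def transform() -> None:
    dbt_run()
    dbt_test()


if __name__ == "__main__":
    transform()
```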

Phase 4: Observability with OpenTelemetry + Last9

Moving off managed services means you own the monitoring too. We set up comprehensive metrics tracking with OpenTelemetry and Last9 for visualization and alerting. Same visibility they had with CloudWatch, at a fraction of the cost.
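For pipeline-level metrics (rows synced, sync lag, dbt run durations), a small OTLP exporter in the Python code is enough, since Last9 ingests standard OTLP. The endpoint, credentials, and metric names below are placeholders; this is a minimal sketch rather than the actual setup.

```python
# Sketch: push custom pipeline metrics over OTLP to a Last9-compatible endpoint.
# The endpoint URL, auth header, and metric names are hypothetical placeholders.
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

exporter = OTLPMetricExporter(
    endpoint="otlp.last9.example:443",           # placeholder OTLP endpoint
    headers={"authorization": "Basic <token>"},  # placeholder credentials
)
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=30_000)
provider = MeterProvider(
    resource=Resource.create({"service.name": "analytics-pipeline"}),
    metric_readers=[reader],
)
metrics.set_meter_provider(provider)

meter = metrics.get_meter("analytics.pipeline")
rows_synced = meter.create_counter("cdc.rows_synced", unit="rows")
sync_lag = meter.create_histogram("cdc.sync_lag", unit="s")

# Emit from the pipeline code after each sync batch.
rows_synced.add(1_250, {"table": "transactions"})
sync_lag.record(18.4, {"table": "transactions"})
```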

Why these choices?

Why no sharding? Their data fits in a single node with room to grow 3x. Sharding adds distributed query complexity, rebalancing headaches, and operational overhead. When they need it, the path is clear. But not yet.

Why Graviton? ~20% better price-performance than x86 for ClickHouse workloads. Easy win.

Why all open-source? No licensing costs, full operational transparency, and they're never locked in again. If they want to move clouds or go on-prem in certain regions, nothing stops them.

Tech stack used.

Analytical DB: ClickHouse
Compute: AWS EC2 (Graviton r6g, t4g)
CDC Replication: PeerDB
Storage: AWS EBS gp3
Orchestration: Prefect
Transformations: dbt
Observability: OpenTelemetry + Last9
