Migrating from AWS Glue to Self-Hosted ClickHouse: 70% Cost Reduction for a Fintech Client.

Context.

The customer is a B2B fintech company in the cross-border payments space. Their daily transaction volume ranges from $50M to $70M, and their analytics infrastructure queries multiple terabytes of data.

Their analytics platform was built entirely on AWS managed services: AWS DMS for CDC replication from PostgreSQL, AWS Glue jobs for ETL, AWS Glue tables backed by S3 as their database, and Amazon Athena for querying. The platform worked fine initially, but as data volumes grew, monthly costs crossed $13,000 and were heading toward $35,000 with projected 3x growth.

The analytics solution did not justify its cost to the business. One2N helped them migrate to a self-hosted open-source stack, cutting costs by 70% while maintaining production-grade reliability.

Problem Statement.

Escalating infrastructure costs: Analytics spend exceeding $13,000/month with no optimization levers available in fully managed services.

Scalability concerns: Projected 3x volume growth would push monthly costs beyond $35,000, unsustainable for unit economics.

No operational control: Can't tune compression, storage optimization, or resource allocation with managed services.

Vendor lock-in: Entire pipeline built on AWS-specific services, limiting negotiating power and strategic flexibility.

Outcome/Impact.

70% cost reduction

$108K annual savings

81% compression in DB size

Cost savings: Monthly spend dropped from $13,000 to $4,000. That's $9,000/month, or roughly $108K per year.

Production-ready reliability: 1-shard, 2-replica ClickHouse cluster with 99%+ availability across availability zones.

Query performance maintained: P95 latency of 4.2 seconds, on par with what they had on Athena.

Full operational control: Team now owns compression algorithms, indexing strategies, and resource allocation.

No more lock-in: The entire stack is open-source (ClickHouse, PeerDB, Prefect, dbt), which gives the customer the flexibility to move anywhere.

Solution.

Previous Setup vs Updated Setup

The original setup relied on a fully managed AWS stack: DMS for CDC from PostgreSQL, S3 for storage, Glue for processing, and Athena for queries. It was reliable, but at terabyte scale, it was costing a fortune with no way to optimize. We broke the migration into four logical phases to ensure a smooth transition with zero downtime.

Phase 1: Building the ClickHouse Foundation (~$6,400 to ~$3,600/month)

The heart of the new stack is a self-hosted ClickHouse cluster. We opted for a pragmatic 1-shard, 2-replica (1S2R) architecture.

Why only 1 shard? Their current data fits comfortably in a single node with plenty of room to grow. Sharding adds significant operational overhead (distributed query complexity, rebalancing), so we decided to keep it simple and only scale horizontally when the data actually demands it.
The Infrastructure: We used Graviton-based r6g.4xlarge instances. Graviton gives us a ~20% price-performance edge over x86 for ClickHouse workloads. To ensure fault tolerance, we deployed a 2-node ClickHouse Keeper cluster spread across multiple availability zones.
Observability: Moving off managed services means you own the monitoring. We wired in OpenTelemetry and Last9 from day one to track cluster health, replication lag, and query performance.
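
To make the 1-shard, 2-replica layout concrete, here is a minimal sketch of how a replicated table might be declared on such a cluster. The database, table, cluster name, columns, and codec choices are all hypothetical and not taken from the client's setup; the `{shard}` and `{replica}` placeholders are ClickHouse macros that each node is assumed to define in its server configuration.

```python
def replicated_table_ddl(database: str, table: str, cluster: str) -> str:
    """Build a CREATE TABLE statement for a replicated table.

    All names and columns are illustrative assumptions. With one shard,
    both replicas resolve {shard} to the same value and hold identical data.
    """
    return f"""
CREATE TABLE {database}.{table} ON CLUSTER {cluster}
(
    txn_id      UInt64,
    created_at  DateTime CODEC(Delta, ZSTD(3)),
    currency    LowCardinality(String),
    amount      Decimal(18, 4) CODEC(ZSTD(3))
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{{shard}}/{database}/{table}',
    '{{replica}}'
)
PARTITION BY toYYYYMM(created_at)
ORDER BY (created_at, txn_id)
""".strip()


if __name__ == "__main__":
    # Hypothetical names, used only to render the statement.
    print(replicated_table_ddl("analytics", "transactions", "analytics_1s2r"))
```

Because replication is handled by the table engine and coordinated through Keeper, adding shards later would mean layering a Distributed table on top rather than rewriting this definition.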

Phase 2: Transitioning to PeerDB for CDC (~$3,500 to ~$60/month)

With the cluster ready, we needed a more cost-effective way to keep data flowing. AWS DMS was costing $3,500/month just for replication. We replaced it with PeerDB, an open-source tool built specifically for PostgreSQL-to-ClickHouse pipelines. It maintained the same 30-second sync intervals but slashed the cost by 98%. It’s a classic case of a specialized tool outperforming a general-purpose managed service.

Phase 3: Streamlining ETL with Prefect and dbt (~$3,000 to ~$60/month)

Next, we replaced the "black box" of Glue Spark jobs. Prefect handles the orchestration, managing the dependencies between the datalake and datamart layers and executing the jobs that carry the transformation logic. dbt lets the team maintain schemas with version-controlled, testable SQL instead of complex Spark code.
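
The dependency handling between the datalake and datamart layers can be illustrated with a small standard-library sketch. This is not Prefect code and not the client's pipeline; the model names are hypothetical, and the example only shows the ordering guarantee an orchestrator provides: a datamart model runs after every datalake model it reads from.

```python
from graphlib import TopologicalSorter

# Hypothetical models. Each key maps to the set of models it depends on.
DEPENDENCIES = {
    "datalake.transactions": set(),
    "datalake.customers": set(),
    "datamart.daily_volume": {"datalake.transactions"},
    "datamart.customer_summary": {"datalake.transactions", "datalake.customers"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every model follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)

if __name__ == "__main__":
    for step, model in enumerate(order, start=1):
        print(f"{step}. {model}")
```

In a real deployment each entry would typically map to an orchestrated task that invokes the corresponding dbt model, with retries and scheduling handled by the orchestrator.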

Phase 4: Consuming Analytics with Apache Superset (~$20 to ~$60/month)

Finally, we replaced the Athena query layer with Apache Superset. While Athena’s pay-per-query model seems cheap initially, it becomes a deterrent for exploration at scale. By running Superset on a dedicated instance, we gave the analytics team unlimited query access with no per-scan charges. They now have full-featured dashboards, alerts, and regulatory reporting on a predictable, fixed-cost infrastructure.
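
As a quick check, the approximate per-phase figures from the four phase headings can be tallied to see how they relate to the headline numbers. The sketch below uses only those figures; the dictionary keys are shorthand labels, and because the inputs are rounded estimates the totals land near, not exactly on, the $13,000 and $4,000 quoted earlier.

```python
# Approximate monthly cost per phase in USD, as (before, after),
# taken from the phase headings.
PHASE_COSTS = {
    "Phase 1: ClickHouse foundation": (6400, 3600),
    "Phase 2: CDC with PeerDB": (3500, 60),
    "Phase 3: ETL with Prefect and dbt": (3000, 60),
    "Phase 4: Analytics with Superset": (20, 60),
}

before = sum(b for b, _ in PHASE_COSTS.values())
after = sum(a for _, a in PHASE_COSTS.values())
monthly_savings = before - after
reduction_pct = 100 * monthly_savings / before

if __name__ == "__main__":
    print(f"Before: ${before:,}/month")
    print(f"After: ${after:,}/month")
    print(f"Savings: ${monthly_savings:,}/month, ${monthly_savings * 12:,}/year")
    print(f"Reduction: {reduction_pct:.1f}%")
```

Note that Phase 4 is the one line item that went up slightly: the dedicated Superset instance costs more than the previous Athena bill, in exchange for unmetered querying.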

Why go all-in on Open Source?

By choosing ClickHouse, PeerDB, Prefect, and dbt, we eliminated licensing fees and vendor lock-in. The client now has full operational transparency and the flexibility to move their stack anywhere, be it on-prem or to another cloud provider, without rebuilding their entire data culture.

Tech stack used.

Component	Technology
Analytical DB	ClickHouse
CDC Replication	PeerDB
Orchestration	Prefect
Schema Migrations	dbt
Observability	OpenTelemetry + Last9
Compute	AWS EC2 (Graviton r6g, t4g)
Storage	AWS EBS gp3
Dashboards and Alerts	Superset
