Careers @One2N

About the role:

We are looking for a Staff Site Reliability Engineer who can operate at a staff level across multiple teams and clients. If you care about designing reliable platforms, influencing system architecture, and raising reliability standards across teams, you’ll enjoy working at One2N.

At One2N, you will work with our startup and enterprise clients, solving One-to-N scale problems where the proof of concept is already established and the focus is on scalability, maintainability, and long-term reliability. In this role, you will drive reliability, observability, and infrastructure architecture across systems, influencing design decisions, defining best practices, and guiding teams to build resilient, production-grade systems.

Key responsibilities:

Own and drive reliability and infrastructure strategy across multiple products or client engagements.
Design and evolve platform engineering and self-serve infrastructure patterns used by product engineering teams.
Lead architecture discussions around observability, scalability, availability, and cost efficiency.
Define and standardize monitoring, alerting, SLOs/SLIs, and incident management practices.
Build and review production-grade CI/CD and IaC systems used across teams.
Act as an escalation point for complex production issues and incident retrospectives.
Partner closely with engineering leads, product teams, and clients to influence system design decisions early.
Mentor junior engineers through design reviews, technical guidance, and best practices.
Improve Developer Experience (DX) by reducing cognitive load, toil, and operational friction.
Help teams mature their on-call processes, reliability culture, and operational ownership.
Stay ahead of trends in cloud-native infrastructure, observability, and platform engineering, and bring relevant ideas into practice.

About you:

8+ years of experience in SRE, DevOps, or software engineering roles.
Strong experience designing and operating Kubernetes-based systems on AWS at scale.
Deep hands-on expertise in observability and telemetry, including tools like OpenTelemetry, Datadog, Grafana, Prometheus, ELK, Honeycomb, or similar.
Proven experience with infrastructure as code (Terraform, Pulumi) and cloud architecture design.
Strong understanding of distributed systems, microservices, and containerized workloads.
Ability to write and review production-quality code (Golang, Python, Java, or similar).
Solid Linux fundamentals and experience debugging complex system-level issues.
Experience driving cross-team technical initiatives.
Excellent analytical and problem-solving skills, keen attention to detail, and a passion for continuous improvement.
Strong written and verbal communication and collaboration skills, with the ability to work effectively in a fast-paced, agile environment.

Nice to have:

Experience working in consulting or multi-client environments.
Exposure to cost optimization or large-scale AWS account management.
Experience building internal platforms or shared infrastructure used by multiple teams.
Prior experience influencing or defining engineering standards across organizations.

Staff Site Reliability Engineer

Full-time, Location: Pune/Bangalore

Allows Remote

Site Reliability Engineer

Full-time, Location: Pune

Senior Site Reliability Engineer

Full-time, Location: Pune

Looking for other roles?

Check out our Careers Page for all open positions.