For the majority of my engineering career, I’ve dismissed DORA metrics as corporate fluff. Every time stakeholders mentioned implementing CI/CD solutions, they inevitably asked for a DORA metrics dashboard. As an engineer focused on solving real problems, I couldn't understand why anyone would prioritize measuring deployment frequency over fixing actual engineering challenges.
But recently, I’ve found myself questioning this stance. Are DORA metrics genuinely useful measurement tools, or are they sophisticated measurement theater that creates the illusion of progress while missing what actually matters?
In this post, I want to shed light on how blind adherence to DORA metrics can do more harm than good, and how an engineering leader can take a practical, strategic approach to linking DORA to business outcomes.
The surface appeal of DORA
The DevOps Research and Assessment (DORA) metrics promise simplicity in a complex world. Four clean numbers that supposedly capture your entire software delivery performance:
Deployment Frequency: How often you release to production
Lead Time for Changes: Time from code commit to production deployment
Change Failure Rate: Percentage of deployments causing production failures
Time to Recovery: How quickly you restore service after incidents
Based on 30,000+ global respondents, DORA classifies teams as Elite, High, Medium, or Low performers. It's seductive in its simplicity - a clear path from "Low" to "Elite" that engineering leaders can present to stakeholders.
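To make these definitions concrete, here is a minimal sketch of how the four numbers might be computed from raw delivery data. The in-memory records, field names, and time window are assumptions for illustration; in practice you would pull deployment and incident events from your CI/CD system and incident tracker.

```python
from datetime import datetime
from statistics import median

# Hypothetical records; real data would come from your CI/CD system
# and incident tracker. The field layout is an assumption for illustration.
deployments = [
    # (commit_time, deploy_time, caused_failure)
    (datetime(2024, 6, 3, 9, 0), datetime(2024, 6, 3, 15, 0), False),
    (datetime(2024, 6, 4, 10, 0), datetime(2024, 6, 5, 11, 0), True),
    (datetime(2024, 6, 6, 8, 0), datetime(2024, 6, 6, 12, 0), False),
]
incidents = [
    # (detected_at, resolved_at)
    (datetime(2024, 6, 5, 12, 0), datetime(2024, 6, 5, 14, 0)),
]
period_days = 7  # measurement window

# Deployment frequency: deployments per day over the window
deployment_frequency = len(deployments) / period_days

# Lead time for changes: median time from commit to production deployment
lead_time = median(deploy - commit for commit, deploy, _ in deployments)

# Change failure rate: share of deployments causing a production failure
change_failure_rate = sum(failed for _, _, failed in deployments) / len(deployments)

# Time to recovery: median time from detection to restoration
time_to_recovery = median(resolved - detected for detected, resolved in incidents)

print(f"Deployment frequency: {deployment_frequency:.2f}/day")
print(f"Lead time for changes: {lead_time}")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"Time to recovery: {time_to_recovery}")
```

The arithmetic is trivial; the hard part, as the rest of this post argues, is deciding what these numbers should be used for.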
The assembly line problem
Here's where reality gets messy. I recently worked with a customer achieving ~2,500 deployments per month. Impressive deployment frequency by any standard, but dig deeper and you discover that their lead time for changes was long and their change failure rate was through the roof.
This taught me to think of DORA metrics like manufacturing assembly-line measurements. They give you speed and defect rates, but they don’t really tell you:
Whether workers have the right tools
If quality control processes are effective
Whether the line can handle increased demand
If workers are burning out from unrealistic pace demands
DORA metrics optimize for velocity and reliability of change, but we need to look beyond the metrics and focus on the actual engineering problems.
The trap of Goodhart's Law
This brings us to Goodhart's Law:
"When a measure becomes a target, it ceases to be a good measure."
I've observed teams subconsciously gaming DORA metrics rather than improving actual delivery capability:
Gaming Deployment Frequency: Breaking features into tiny, meaningless deployments
Gaming Lead Time: Reducing change scope rather than improving process efficiency
Gaming Change Failure Rate: Being so conservative that innovation stagnates
Gaming Recovery Time: Defining "incidents" so narrowly that real problems go unmeasured
The metrics improve, but the real delivery capability doesn't.
What DORA metrics actually measure (and don't)
DORA metrics excel at measuring delivery process mechanics, but they're blind to delivery purpose and business impact.
What they measure:
Process efficiency and reliability
Team operational discipline
Basic delivery pipeline health
Relative performance against benchmarks
What they miss:
User value creation and satisfaction
Engineering team sustainability and satisfaction
Technical debt accumulation
Strategic alignment of delivery efforts
Innovation and experimentation capability
Most critically, DORA discussions rarely include end-user impact. Users don't care about your deployment frequency; they care about whether your software reliably solves their problems.
The risk of measurement theater
Organizations often implement DORA metrics as measurement theater: beautiful dashboards that create the appearance of scientific management without driving meaningful improvement. Teams spend more time discussing why their metrics are yellow than actually improving their development practices.
Organizations invest significant effort collecting and analyzing DORA metrics while the actual cultural and structural barriers to delivery excellence remain unaddressed.
So are they useful or fluff?
After deep analysis, here's my nuanced verdict: DORA metrics are useful training wheels, not permanent solutions.
They're useful when:
Your organization lacks any delivery measurement
You need vocabulary to discuss delivery performance with stakeholders
You're identifying obvious operational problems
You're establishing baseline delivery discipline
You're comparing relative performance across teams or time periods
They become fluff when:
You optimize for metrics instead of outcomes
You treat them as ends rather than means
You ignore their disconnection from user value
You use them for performance management rather than learning
You assume correlation with business success without validation
A better path forward: The Maturity Model
Instead of abandoning measurement or blindly implementing DORA, I view DORA through a three-stage maturity model. The first stage is what I have seen work and have led implementations of from an engineer’s perspective. The second and third stages are harder to reach, but they are a worthwhile aim for engineering leaders who want to drive value for the business and its customers.
Stage 1: DORA Foundation - Collection
The foundation stage is about establishing measurement literacy and baseline awareness in your organization. This part is straightforward. You implement basic DORA collection in explicit "learning mode," focusing on understanding what the data reveals about your current state rather than optimizing for specific targets. The emphasis should be on conversation enablement over optimization, creating simple dashboards with "this is what we're learning" messaging.
Let’s take an example of using deployment frequency in a way that can help teams transform. When teams first start measuring their deployment patterns, the initial discoveries often surprise everyone involved. For one implementation, we began by simply counting deployments per week across different teams within the same organization. What emerged was a striking variation that nobody had previously quantified or even fully recognized: Team A was deploying twelve times per week while Team B deployed once per month. This wasn't a judgment about team quality, but rather a data point that initiated conversations about pipeline automation, approval processes, and risk tolerance.
As the measurement period extended, patterns became visible that had been invisible in day-to-day experience. For instance, teams found that most deployments occurred between Tuesday and Thursday, avoiding Mondays and Fridays. The Monday avoidance guards against disrupting the week with issues that emerged over the weekend. And the Friday avoidance? It reflects caution about rolling out changes that might need weekend fixes, which can suggest teams aren’t fully confident in their ability to roll back changes and handle incidents effectively.
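As a rough illustration of this kind of Stage 1 analysis, here is a minimal sketch that surfaces both the per-team deployment frequency gap and the day-of-week clustering. It assumes a hypothetical CSV export of deployment events with team and deployed_at columns; the file name and schema are placeholders, not a prescribed format.

```python
import pandas as pd

# Hypothetical export of deployment events; file name and columns are assumptions.
events = pd.read_csv("deployments.csv", parse_dates=["deployed_at"])

# Deployments per team per ISO week: surfaces the Team A vs Team B gap.
weekly = (
    events.groupby(["team", events["deployed_at"].dt.isocalendar().week])
    .size()
    .rename("deployments")
)
print(weekly.groupby("team").describe())

# Day-of-week distribution: makes the Tuesday-to-Thursday clustering visible.
print(events["deployed_at"].dt.day_name().value_counts())
```

Even output this simple tends to prompt the "why is that?" conversations that Stage 1 is really about.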
The key question to ask is "What story is our data telling us?" This approach works because it creates space for teams to discover insights about their own processes without the pressure of immediate optimization. I've seen teams have genuine “aha” moments when they first visualize their delivery patterns, recognizing bottlenecks and inefficiencies that had been hidden in the complexity of daily work.
Stage 2: DORA Plus - Where theory meets practice
The second stage involves connecting delivery metrics to business and user outcomes. Here's where the work gets challenging, but I’ve witnessed organizations making significant progress in correlating engineering delivery performance with actual business value creation.
Building on the deployment frequency foundation, teams in this stage should track how their delivery patterns affect business outcomes. Teams deploying multiple times per week often see higher feature adoption rates than teams deploying weekly or less. Higher deployment frequency enables faster iteration cycles, allowing teams to respond quickly to user feedback and refine features based on real usage patterns rather than assumptions.
The business impact can be concrete in such cases. Features deployed through rapid iteration cycles, with multiple deployments per week allowing for quick adjustments, can achieve higher adoption rates than features shipped through traditional batch release cycles. Teams can release a basic version, observe user behavior, and deploy improvements within days rather than waiting weeks for the next release window. This responsive development approach means features evolve to meet actual user needs rather than initial specifications.
Customer satisfaction can provide another lens for understanding deployment frequency value. Weeks with higher deployment frequency can correlate with improvements in customer satisfaction scores. The mechanism here is subtle but important: more frequent deployments mean smaller changes, which create less user disruption and allow faster resolution of user-reported issues. When problems do emerge, the smaller change sets make identification and rollback much faster, minimizing user frustration.
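One way to start testing that relationship rather than assuming it is to join weekly deployment counts with a weekly satisfaction score and look at the correlation. A minimal sketch follows; the dataframe and column names are hypothetical, and a correlation over a handful of weeks is suggestive at best, not evidence of causation.

```python
import pandas as pd

# Hypothetical weekly rollups, joined from a CI/CD system and a
# customer feedback tool; the numbers are illustrative only.
weeks = pd.DataFrame({
    "week": ["2024-W20", "2024-W21", "2024-W22", "2024-W23", "2024-W24"],
    "deployments": [3, 7, 5, 9, 4],
    "csat": [4.1, 4.4, 4.2, 4.5, 4.2],
})

# Pearson correlation between deployment frequency and satisfaction.
corr = weeks["deployments"].corr(weeks["csat"])
print(f"Deployment frequency vs CSAT correlation: {corr:.2f}")
```

A scatter plot of the two series over a longer window is usually more honest than a single number, but even this crude check moves the conversation from belief to evidence.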
Another example: customer issues resolved within twenty-four hours tend to maintain high customer retention, while issues that take a week may result in lower retention. Data like this changes how customer success teams view engineering delivery capabilities; lead time isn't just an efficiency metric but directly affects customer experience and business sustainability.
The driving question for this stage becomes "How does delivery performance connect to user value?" Organizations that navigate this stage create clear measurement frameworks that connect engineering practices to business outcomes, moving beyond vanity metrics to understand genuine impact on customers and revenue.
Stage 3: Custom Value Metrics - Strategic Differentiation
The most mature organizations develop metrics that measure their unique competitive advantages and value propositions, moving beyond standard DORA measurements to track the delivery capabilities that create strategic differentiation.
At this advanced stage, deployment frequency transforms from an engineering efficiency metric into a strategic competitive capability. Consider a fintech company operating in a heavily regulated environment where regulatory changes create both opportunities and requirements for rapid response. This organization developed a custom metric called "Regulatory Response Velocity," which measures the time from regulation announcement to compliant feature deployment in production.
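A custom metric like this is often just a few timestamps away once the underlying events are tracked. Here is a minimal sketch of how a "Regulatory Response Velocity" number might be computed; the record shape and field names are assumptions for illustration, not the company's actual implementation.

```python
from datetime import date
from statistics import mean

# Hypothetical regulation-to-deployment records.
regulatory_changes = [
    {"regulation": "REG-A", "announced": date(2024, 3, 1), "compliant_deploy": date(2024, 3, 14)},
    {"regulation": "REG-B", "announced": date(2024, 5, 10), "compliant_deploy": date(2024, 5, 27)},
]

# Regulatory Response Velocity: days from announcement to compliant deployment.
response_days = [
    (c["compliant_deploy"] - c["announced"]).days for c in regulatory_changes
]
for change, days in zip(regulatory_changes, response_days):
    print(f'{change["regulation"]}: {days} days to compliant deployment')
print(f"Average regulatory response: {mean(response_days):.1f} days")
```

The point is not the code but the choice of events to measure: announcement and compliant deployment are business moments, not pipeline moments.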
This company consistently achieved two-week regulatory response times, compared to competitors’ average of three to six months. The competitive advantage was substantial and measurable. Their deployment frequency capabilities, built through years of automation and process refinement, enabled them to be first to market with compliant solutions for new regulatory requirements. This speed became their primary sales differentiator, allowing them to capture higher market share in emerging regulatory segments.
When market opportunities emerge, response time creates decisive advantages. The question for this stage is "What delivery capabilities drive our unique value creation?" Organizations reaching this maturity level have moved beyond measuring what everyone else measures to understanding and optimizing the delivery capabilities that give them an advantage in the marketplace.
Implementation roadmap and call to action
The progression demonstrated above through deployment frequency applies equally to the other DORA metrics: lead time for changes, mean time to recovery, and change failure rate can all follow similar maturation paths from basic engineering measurement to strategic business capability.
The key insight is that organizations shouldn't attempt to implement all metrics at all stages simultaneously, but should thoughtfully mature their measurement capability in alignment with strategic priorities. Start with Stage 1 to establish baseline measurement literacy across your key delivery metrics. Identify your most important business outcome metrics, whether revenue growth, customer satisfaction, market response capability, or innovation velocity.
Choose the one or two DORA metrics most relevant to your strategic advantages and progress them through Stages 2 and 3 while maintaining the foundation metrics for organizational learning.
The real question behind the metrics
The question isn't whether DORA metrics work, but what worldview they encode and whether that worldview serves your actual goals.
DORA metrics encode several assumptions:
Motion equals progress: More deployments and faster changes are inherently better
Efficiency equals effectiveness: Optimized processes lead to better outcomes
Process optimization drives value: Internal improvements automatically benefit users
Measurement drives improvement: What gets measured gets better
These assumptions work well for operational discipline but break down when applied to strategic value creation and innovation.
Practical takeaways
Start with DORA if you're measurement-immature, but explicitly frame them as temporary training wheels.
Pair delivery metrics with business and user outcome metrics. Do not set DORA metric targets without understanding underlying capabilities.
Watch for gaming behaviors and course-correct quickly when metrics become targets.
Do not compare teams with different contexts using the same benchmarks.
Avoid falling into the trap of measurement theater and dashboard delusion: beautiful dashboards are not the goal.
Graduate to custom metrics as your measurement sophistication and strategic clarity grow.
Focus on correlation with user value, not just operational efficiency improvements.
Don't ignore the cultural and tooling prerequisites for meaningful measurement.
Invest in measurement capability development, not just metric collection infrastructure.
Create psychological safety around metric discussions to enable honest assessment and evolution.
The bottom line
DORA metrics aren't inherently useful or fluff; they reflect a confusion of levels about what we're trying to optimize. They're genuinely useful for organizations building basic measurement maturity and delivery discipline. They become dangerous limitations for organizations that need more sophisticated approaches to value delivery and strategic differentiation.
The goal isn't to have perfect DORA scores; it's to consistently deliver value to users through sustainable engineering practices. Sometimes those goals align beautifully. Often they require looking beyond the dashboard to understand what truly drives delivery excellence in your unique context.
If you are exploring DORA metrics or have written them off as fluff, let's talk: a no-strings-attached call where we can discuss how your organization can leverage DORA metrics to achieve real results. Reach out to us.
For the majority of my engineering career, I’ve dismissed DORA metrics as corporate fluff. Every time stakeholders mentioned implementing CI/CD solutions, they inevitably asked for a DORA metrics dashboards. As an engineer focused on solving real problems, I couldn't understand why anyone would prioritise measuring deployment frequency over fixing actual engineering challenges.
But recently, I’ve found myself questioning this stance. Are DORA metrics genuinely useful measurement tools, or are they a sophisticated measurement theater that just creates the illusion of progress while missing what actually matters?
In this post, I want to shed light on how blind adherence to DORA metrics can do more harm than good and how an engineering leader can leverage a practical strategic approach to actually link DORA with business outcomes.
The surface appeal of DORA
The DevOps Research and Assessment (DORA) metrics promise simplicity in a complex world. Four clean numbers that supposedly capture your entire software delivery performance:
Deployment Frequency: How often you release to production
Lead Time for Changes: Time from code commit to production deployment
Change Failure Rate: Percentage of deployments causing production failures
Time to Recovery: How quickly you restore service after incidents
Based on 30,000+ global respondents, DORA classifies teams as Elite, High, Medium, or Low performers. It's seductive in its simplicity - a clear path from "Low" to "Elite" that engineering leaders can present to stakeholders.
The assembly line problem
Here's where reality gets messy. I recently worked with a customer achieving ~2,500 deployments per month. Impressive deployment frequency by any standard but dig deeper, and you discover the lead time for changes were large, and their change failure rate was through the roof.
This taught me to think of DORA metrics, like manufacturing assembly line measurements. They give you the speed and defect rates, but they don’t really tell you:
Whether workers have the right tools
If quality control processes are effective
Whether the line can handle increased demand
If workers are burning out from unrealistic pace demands
DORA metrics optimize for velocity and reliability of change, but we need to look beyond the metrics and focus on the actual engineering problems.
The trap of Goodhart's Law
This brings us to Goodhart's Law:
"When a measure becomes a target, it ceases to be a good measure."
I've observed teams subconsciously gaming DORA metrics rather than improving actual delivery capability:
Gaming Deployment Frequency: Breaking features into tiny, meaningless deployments
Gaming Lead Time: Reducing change scope rather than improving process efficiency
Gaming Change Failure Rate: Being so conservative that innovation stagnates
Gaming Recovery Time: Defining "incidents" so narrowly that real problems go unmeasured
The metrics improve, but the real delivery capability doesn't.
What DORA Metrics actually measure (and Don't)
DORA metrics excel at measuring delivery process mechanics, but they're blind to delivery purpose and business impact
What they measure:
Process efficiency and reliability
Team operational discipline
Basic delivery pipeline health
Relative performance against benchmarks
What they miss:
User value creation and satisfaction
Engineering team sustainability and satisfaction
Technical debt accumulation
Strategic alignment of delivery efforts
Innovation and experimentation capability
Most critically, DORA discussions rarely include end-user impact. Users don't care about your deployment frequency, they care about whether your software reliably solves their problems.
The risk of a Measurement Theater
Organizations often implement DORA metrics as a "Measurement Theater" creating beautiful dashboards that generate the appearance of scientific management without driving meaningful improvement. Teams spend more time discussing why their metrics are yellow than actually improving their development practices.
Organizations invest significant effort collecting and analyzing DORA metrics while the actual cultural and structural barriers to delivery excellence remain unaddressed.
So are they useful or fluff?
After deep analysis, here's my nuanced verdict. DORA metrics are useful training wheels, not permanent solutions.
They're useful when:
Your organization lacks any delivery measurement
You need vocabulary to discuss delivery performance with stakeholders
You're identifying obvious operational problems
You're establishing baseline delivery discipline
You're comparing relative performance across teams or time periods
They become fluff when:
You optimize for metrics instead of outcomes
You treat them as ends rather than means
You ignore their disconnection from user value
You use them for performance management rather than learning
You assume correlation with business success without validation
A better path forward: The Maturity Model
Instead of abandoning measurement or blindly implementing DORA, I view DORA in a three-stage maturation approach. The 1st stage is what I have seen work and led in implementing from an engineer’s perspective. The 2nd and 3rd stages are possibilities and challenges to achieve but are a very good aim for engineering leaders to drive value from the business and customers.
Stage 1: DORA Foundation - Collection
The foundation stage is about establishing measurement literacy and baseline awareness in your organization. This part is straightforward. You implement basic DORA collection in explicit "learning mode," focusing on understanding what the data reveals about your current state rather than optimizing for specific targets. The emphasis should be on conversation enablement over optimization, creating simple dashboards with "this is what we're learning" messaging.
Let’s take an example of using deployment frequency in a way which can help teams to transform. When teams first start measuring their deployment patterns, the initial discoveries often surprise everyone involved. For one of the implementation, we began by simply counting deployments per week across different teams within the same organization. What emerged was a striking variation that nobody had previously quantified or even fully recognized. Team A was deploying twelve times per week while Team B deployed once per month. This wasn't a judgment about team quality, but rather a data point that initiated conversations about pipeline automation, approval processes, and risk tolerance.
As the measurement period extended, patterns became visible that had been invisible in day to day experience. For instance, Teams found that most deployments occur between Tuesday and Thursday, avoiding Mondays and Fridays. The Monday avoidance is to guard against teams wanting to avoid disrupting their week with issues that emerge over the weekend. And the Friday avoidance? It’s about being careful not to roll out changes that might need weekend fixes, which could suggest they’re not totally confident in their ability to roll back things and handle incidents effectively.
The key question to ask is "What story is our data telling us?" This approach works because it creates space for teams to discover insights about their own processes without the pressure of immediate optimization. I've seen teams have genuine “aha” moments when they first visualize their delivery patterns, recognizing bottlenecks and inefficiencies that had been hidden in the complexity of daily work.
Stage 2: DORA Plus - Where theory meets practice
The second stage involves connecting delivery metrics to business and user outcomes. Here's where the work gets challenging, but I’ve witnessed organizations making significant progress in correlating engineering delivery performance with actual business value creation.
Building on the deployment frequency foundation, teams in this stage should track their delivery patterns affect business outcomes. Teams deploying more per week have higher feature adoption rates compared to teams deploying weekly. Higher deployment frequency enables faster iteration cycles, allowing teams to respond quickly to user feedback and refine features based on real usage patterns rather than assumptions.
The business impact can be concrete in such cases. Features deployed through rapid iteration cycles, with multiple deployments per week allowing for quick adjustments, achieved adoption success rates that are higher than features deployed through traditional batch release cycles. Teams could release a basic version, observe user behavior, and deploy improvements within days rather than waiting weeks for the next release window. This responsive development approach meant features evolved to meet actual user needs rather than initial specifications.
Customer satisfaction can provide another lens for understanding deployment frequency value. Weeks with higher deployment frequency consistently can correlate to improvements in customer satisfaction scores. The mechanism here is subtle but important, more frequent deployments meant smaller changes, which created less user disruption and allowed faster resolution of user-reported issues. When problems do emerge, the smaller change sets made identification and rollback much faster, minimizing user frustration.
Another example can be customer issues resolved within twenty-four hours maintain high customer retention, while issues taking a week may result in lower retention. This data translates how customer success teams view engineering delivery capabilities, lead time wasn't just an efficiency metric but directly affected customer experience and business sustainability.
The driving question for this stage becomes "How does delivery performance connect to user value?" Organizations that navigate this stage create clear measurement frameworks that connect engineering practices to business outcomes, moving beyond vanity metrics to understand genuine impact on customers and revenue.
Stage 3: Custom Value Metrics - Strategic Differentiation
The most mature organizations develop metrics that measure their unique competitive advantages and value propositions, moving beyond standard DORA measurements to track the delivery capabilities that create strategic differentiation.
At this advanced stage, deployment frequency transforms from an engineering efficiency metric into a strategic competitive capability. Consider a fintech company operating in a heavily regulated environment where regulatory changes create both opportunities and requirements for rapid response. This organization developed a custom metric called "Regulatory Response Velocity" which measures the time from regulation announcement to compliant feature deployment in production.
This company consistently achieved two-week regulatory response times, compared to competitors’ average of three to six months. The competitive advantage was substantial and measurable. Their deployment frequency capabilities, built through years of automation and process refinement, enabled them to be first to market with compliant solutions for new regulatory requirements. This speed became their primary sales differentiator, allowing them to capture higher market share in emerging regulatory segments.
When market opportunities emerges, the response time create decisive advantages. The question for this stage is "What delivery capabilities drive our unique value creation?" organizations reaching this maturity level have moved beyond measuring what everyone else measures to understanding and optimizing the delivery capabilities that gives advantages in the marketplace.
Implementation roadmap and Call to Action
The progression demonstrated above through deployment frequency applies equally to other DORA metrics, lead time for changes, mean time to recovery, and change failure rate can all follow similar maturation paths from basic engineering measurement to strategic business capability.
The key insight is that organizations shouldn't attempt to implement all metrics at all stages simultaneously, but should thoughtfully mature their measurement capability in alignment with strategic priorities. Start with Stage 1 for establishing baseline measurement literacy across your key delivery metrics. Identify your most important business outcome metrics, whether revenue growth, customer satisfaction, market response capability, or innovation velocity.
Choose the one or two DORA metrics most relevant to your strategic advantages and progress them through Stages 2 and 3 while maintaining the foundation metrics for organizational learning.
The real question behind the metrics
The question isn't whether DORA metrics work, but what worldview they encode and whether that worldview serves your actual goals.
DORA metrics encode several assumptions:
Motion equals progress: More deployments and faster changes are inherently better
Efficiency equals effectiveness: Optimized processes lead to better outcomes
Process optimization drives value: Internal improvements automatically benefit users
Measurement drives improvement: What gets measured gets better
These assumptions work well for operational discipline but break down when applied to strategic value creation and innovation.
Practical takeaways
Start with DORA if you're measurement-immature, but explicitly frame them as temporary training wheels
Pair delivery metrics with business and user outcome metrics. Do not set DORA metric targets without understanding underlying capabilities.
Watch for gaming behaviors and course-correct quickly when metrics become targets.
Do not compare teams with different contexts using the same benchmarks.
Avoid falling into the trap of measurement theater and dashboard delusion to create beautiful dashboards.
Graduate to custom metrics as your measurement sophistication and strategic clarity grow
Focus on correlation with user value, not just operational efficiency improvements
Avoid ignoring the cultural and tooling prerequisites for meaningful measurement
Invest in measurement capability development, not just metric collection infrastructure
Create psychological safety around metric discussions to enable honest assessment and evolution
The bottom line
DORA metrics aren't inherently useful or fluff, they represent a level confusion about what we're trying to optimize. They're genuinely useful for organizations building basic measurement maturity and delivery discipline. They become dangerous limitations for organizations that need some approaches to value delivery and strategic differentiation.
The goal isn't to have perfect DORA scores it's to consistently deliver value to users through sustainable engineering practices. Sometimes those goals align beautifully. Often they require looking beyond the dashboard to understand what truly drives delivery excellence in your unique context.
If you are exploring DORA metrics or have written them off as fluff, let's talk. A no strings attached call where we can discuss how we can help your organization leverage DORA metrics to achieve real results. Reach out to us
For the majority of my engineering career, I’ve dismissed DORA metrics as corporate fluff. Every time stakeholders mentioned implementing CI/CD solutions, they inevitably asked for a DORA metrics dashboards. As an engineer focused on solving real problems, I couldn't understand why anyone would prioritise measuring deployment frequency over fixing actual engineering challenges.
But recently, I’ve found myself questioning this stance. Are DORA metrics genuinely useful measurement tools, or are they a sophisticated measurement theater that just creates the illusion of progress while missing what actually matters?
In this post, I want to shed light on how blind adherence to DORA metrics can do more harm than good and how an engineering leader can leverage a practical strategic approach to actually link DORA with business outcomes.
The surface appeal of DORA
The DevOps Research and Assessment (DORA) metrics promise simplicity in a complex world. Four clean numbers that supposedly capture your entire software delivery performance:
Deployment Frequency: How often you release to production
Lead Time for Changes: Time from code commit to production deployment
Change Failure Rate: Percentage of deployments causing production failures
Time to Recovery: How quickly you restore service after incidents
Based on 30,000+ global respondents, DORA classifies teams as Elite, High, Medium, or Low performers. It's seductive in its simplicity - a clear path from "Low" to "Elite" that engineering leaders can present to stakeholders.
The assembly line problem
Here's where reality gets messy. I recently worked with a customer achieving ~2,500 deployments per month. Impressive deployment frequency by any standard but dig deeper, and you discover the lead time for changes were large, and their change failure rate was through the roof.
This taught me to think of DORA metrics, like manufacturing assembly line measurements. They give you the speed and defect rates, but they don’t really tell you:
Whether workers have the right tools
If quality control processes are effective
Whether the line can handle increased demand
If workers are burning out from unrealistic pace demands
DORA metrics optimize for velocity and reliability of change, but we need to look beyond the metrics and focus on the actual engineering problems.
The trap of Goodhart's Law
This brings us to Goodhart's Law:
"When a measure becomes a target, it ceases to be a good measure."
I've observed teams subconsciously gaming DORA metrics rather than improving actual delivery capability:
Gaming Deployment Frequency: Breaking features into tiny, meaningless deployments
Gaming Lead Time: Reducing change scope rather than improving process efficiency
Gaming Change Failure Rate: Being so conservative that innovation stagnates
Gaming Recovery Time: Defining "incidents" so narrowly that real problems go unmeasured
The metrics improve, but the real delivery capability doesn't.
What DORA Metrics actually measure (and Don't)
DORA metrics excel at measuring delivery process mechanics, but they're blind to delivery purpose and business impact
What they measure:
Process efficiency and reliability
Team operational discipline
Basic delivery pipeline health
Relative performance against benchmarks
What they miss:
User value creation and satisfaction
Engineering team sustainability and satisfaction
Technical debt accumulation
Strategic alignment of delivery efforts
Innovation and experimentation capability
Most critically, DORA discussions rarely include end-user impact. Users don't care about your deployment frequency, they care about whether your software reliably solves their problems.
The risk of a Measurement Theater
Organizations often implement DORA metrics as a "Measurement Theater" creating beautiful dashboards that generate the appearance of scientific management without driving meaningful improvement. Teams spend more time discussing why their metrics are yellow than actually improving their development practices.
Organizations invest significant effort collecting and analyzing DORA metrics while the actual cultural and structural barriers to delivery excellence remain unaddressed.
So are they useful or fluff?
After deep analysis, here's my nuanced verdict. DORA metrics are useful training wheels, not permanent solutions.
They're useful when:
Your organization lacks any delivery measurement
You need vocabulary to discuss delivery performance with stakeholders
You're identifying obvious operational problems
You're establishing baseline delivery discipline
You're comparing relative performance across teams or time periods
They become fluff when:
You optimize for metrics instead of outcomes
You treat them as ends rather than means
You ignore their disconnection from user value
You use them for performance management rather than learning
You assume correlation with business success without validation
A better path forward: The Maturity Model
Instead of abandoning measurement or blindly implementing DORA, I view DORA in a three-stage maturation approach. The 1st stage is what I have seen work and led in implementing from an engineer’s perspective. The 2nd and 3rd stages are possibilities and challenges to achieve but are a very good aim for engineering leaders to drive value from the business and customers.
Stage 1: DORA Foundation - Collection
The foundation stage is about establishing measurement literacy and baseline awareness in your organization. This part is straightforward. You implement basic DORA collection in explicit "learning mode," focusing on understanding what the data reveals about your current state rather than optimizing for specific targets. The emphasis should be on conversation enablement over optimization, creating simple dashboards with "this is what we're learning" messaging.
Let’s take an example of using deployment frequency in a way which can help teams to transform. When teams first start measuring their deployment patterns, the initial discoveries often surprise everyone involved. For one of the implementation, we began by simply counting deployments per week across different teams within the same organization. What emerged was a striking variation that nobody had previously quantified or even fully recognized. Team A was deploying twelve times per week while Team B deployed once per month. This wasn't a judgment about team quality, but rather a data point that initiated conversations about pipeline automation, approval processes, and risk tolerance.
As the measurement period extended, patterns became visible that had been invisible in day to day experience. For instance, Teams found that most deployments occur between Tuesday and Thursday, avoiding Mondays and Fridays. The Monday avoidance is to guard against teams wanting to avoid disrupting their week with issues that emerge over the weekend. And the Friday avoidance? It’s about being careful not to roll out changes that might need weekend fixes, which could suggest they’re not totally confident in their ability to roll back things and handle incidents effectively.
The key question to ask is "What story is our data telling us?" This approach works because it creates space for teams to discover insights about their own processes without the pressure of immediate optimization. I've seen teams have genuine “aha” moments when they first visualize their delivery patterns, recognizing bottlenecks and inefficiencies that had been hidden in the complexity of daily work.
Stage 2: DORA Plus - Where theory meets practice
The second stage involves connecting delivery metrics to business and user outcomes. Here's where the work gets challenging, but I’ve witnessed organizations making significant progress in correlating engineering delivery performance with actual business value creation.
Building on the deployment frequency foundation, teams in this stage should track their delivery patterns affect business outcomes. Teams deploying more per week have higher feature adoption rates compared to teams deploying weekly. Higher deployment frequency enables faster iteration cycles, allowing teams to respond quickly to user feedback and refine features based on real usage patterns rather than assumptions.
The business impact can be concrete in such cases. Features deployed through rapid iteration cycles, with multiple deployments per week allowing for quick adjustments, achieved adoption success rates that are higher than features deployed through traditional batch release cycles. Teams could release a basic version, observe user behavior, and deploy improvements within days rather than waiting weeks for the next release window. This responsive development approach meant features evolved to meet actual user needs rather than initial specifications.
Customer satisfaction can provide another lens for understanding deployment frequency value. Weeks with higher deployment frequency consistently can correlate to improvements in customer satisfaction scores. The mechanism here is subtle but important, more frequent deployments meant smaller changes, which created less user disruption and allowed faster resolution of user-reported issues. When problems do emerge, the smaller change sets made identification and rollback much faster, minimizing user frustration.
Another example can be customer issues resolved within twenty-four hours maintain high customer retention, while issues taking a week may result in lower retention. This data translates how customer success teams view engineering delivery capabilities, lead time wasn't just an efficiency metric but directly affected customer experience and business sustainability.
The driving question for this stage becomes "How does delivery performance connect to user value?" Organizations that navigate this stage create clear measurement frameworks that connect engineering practices to business outcomes, moving beyond vanity metrics to understand genuine impact on customers and revenue.
Stage 3: Custom Value Metrics - Strategic Differentiation
The most mature organizations develop metrics that measure their unique competitive advantages and value propositions, moving beyond standard DORA measurements to track the delivery capabilities that create strategic differentiation.
At this advanced stage, deployment frequency transforms from an engineering efficiency metric into a strategic competitive capability. Consider a fintech company operating in a heavily regulated environment where regulatory changes create both opportunities and requirements for rapid response. This organization developed a custom metric called "Regulatory Response Velocity" which measures the time from regulation announcement to compliant feature deployment in production.
This company consistently achieved two-week regulatory response times, compared to competitors’ average of three to six months. The competitive advantage was substantial and measurable. Their deployment frequency capabilities, built through years of automation and process refinement, enabled them to be first to market with compliant solutions for new regulatory requirements. This speed became their primary sales differentiator, allowing them to capture higher market share in emerging regulatory segments.
When market opportunities emerges, the response time create decisive advantages. The question for this stage is "What delivery capabilities drive our unique value creation?" organizations reaching this maturity level have moved beyond measuring what everyone else measures to understanding and optimizing the delivery capabilities that gives advantages in the marketplace.
Implementation roadmap and Call to Action
The progression demonstrated above through deployment frequency applies equally to other DORA metrics, lead time for changes, mean time to recovery, and change failure rate can all follow similar maturation paths from basic engineering measurement to strategic business capability.
The key insight is that organizations shouldn't attempt to implement all metrics at all stages simultaneously, but should thoughtfully mature their measurement capability in alignment with strategic priorities. Start with Stage 1 for establishing baseline measurement literacy across your key delivery metrics. Identify your most important business outcome metrics, whether revenue growth, customer satisfaction, market response capability, or innovation velocity.
Choose the one or two DORA metrics most relevant to your strategic advantages and progress them through Stages 2 and 3 while maintaining the foundation metrics for organizational learning.
The real question behind the metrics
The question isn't whether DORA metrics work, but what worldview they encode and whether that worldview serves your actual goals.
DORA metrics encode several assumptions:
Motion equals progress: More deployments and faster changes are inherently better
Efficiency equals effectiveness: Optimized processes lead to better outcomes
Process optimization drives value: Internal improvements automatically benefit users
Measurement drives improvement: What gets measured gets better
These assumptions work well for operational discipline but break down when applied to strategic value creation and innovation.
Practical takeaways
Start with DORA if you're measurement-immature, but explicitly frame them as temporary training wheels
Pair delivery metrics with business and user outcome metrics. Do not set DORA metric targets without understanding underlying capabilities.
Watch for gaming behaviors and course-correct quickly when metrics become targets.
Do not compare teams with different contexts using the same benchmarks.
Avoid falling into the trap of measurement theater and dashboard delusion to create beautiful dashboards.
Graduate to custom metrics as your measurement sophistication and strategic clarity grow
Focus on correlation with user value, not just operational efficiency improvements
Avoid ignoring the cultural and tooling prerequisites for meaningful measurement
Invest in measurement capability development, not just metric collection infrastructure
Create psychological safety around metric discussions to enable honest assessment and evolution
The bottom line
DORA metrics aren't inherently useful or fluff, they represent a level confusion about what we're trying to optimize. They're genuinely useful for organizations building basic measurement maturity and delivery discipline. They become dangerous limitations for organizations that need some approaches to value delivery and strategic differentiation.
The goal isn't to have perfect DORA scores it's to consistently deliver value to users through sustainable engineering practices. Sometimes those goals align beautifully. Often they require looking beyond the dashboard to understand what truly drives delivery excellence in your unique context.
If you are exploring DORA metrics or have written them off as fluff, let's talk. A no strings attached call where we can discuss how we can help your organization leverage DORA metrics to achieve real results. Reach out to us
For the majority of my engineering career, I’ve dismissed DORA metrics as corporate fluff. Every time stakeholders mentioned implementing CI/CD solutions, they inevitably asked for a DORA metrics dashboards. As an engineer focused on solving real problems, I couldn't understand why anyone would prioritise measuring deployment frequency over fixing actual engineering challenges.
But recently, I’ve found myself questioning this stance. Are DORA metrics genuinely useful measurement tools, or are they a sophisticated measurement theater that just creates the illusion of progress while missing what actually matters?
In this post, I want to shed light on how blind adherence to DORA metrics can do more harm than good and how an engineering leader can leverage a practical strategic approach to actually link DORA with business outcomes.
The surface appeal of DORA
The DevOps Research and Assessment (DORA) metrics promise simplicity in a complex world. Four clean numbers that supposedly capture your entire software delivery performance:
Deployment Frequency: How often you release to production
Lead Time for Changes: Time from code commit to production deployment
Change Failure Rate: Percentage of deployments causing production failures
Time to Recovery: How quickly you restore service after incidents
Based on 30,000+ global respondents, DORA classifies teams as Elite, High, Medium, or Low performers. It's seductive in its simplicity - a clear path from "Low" to "Elite" that engineering leaders can present to stakeholders.
The assembly line problem
Here's where reality gets messy. I recently worked with a customer achieving ~2,500 deployments per month. Impressive deployment frequency by any standard but dig deeper, and you discover the lead time for changes were large, and their change failure rate was through the roof.
This taught me to think of DORA metrics, like manufacturing assembly line measurements. They give you the speed and defect rates, but they don’t really tell you:
Whether workers have the right tools
If quality control processes are effective
Whether the line can handle increased demand
If workers are burning out from unrealistic pace demands
DORA metrics optimize for velocity and reliability of change, but we need to look beyond the metrics and focus on the actual engineering problems.
The trap of Goodhart's Law
This brings us to Goodhart's Law:
"When a measure becomes a target, it ceases to be a good measure."
I've observed teams subconsciously gaming DORA metrics rather than improving actual delivery capability:
Gaming Deployment Frequency: Breaking features into tiny, meaningless deployments
Gaming Lead Time: Reducing change scope rather than improving process efficiency
Gaming Change Failure Rate: Being so conservative that innovation stagnates
Gaming Recovery Time: Defining "incidents" so narrowly that real problems go unmeasured
The metrics improve, but the real delivery capability doesn't.
What DORA Metrics actually measure (and Don't)
DORA metrics excel at measuring delivery process mechanics, but they're blind to delivery purpose and business impact
What they measure:
Process efficiency and reliability
Team operational discipline
Basic delivery pipeline health
Relative performance against benchmarks
What they miss:
User value creation and satisfaction
Engineering team sustainability and satisfaction
Technical debt accumulation
Strategic alignment of delivery efforts
Innovation and experimentation capability
Most critically, DORA discussions rarely include end-user impact. Users don't care about your deployment frequency, they care about whether your software reliably solves their problems.
The risk of a Measurement Theater
Organizations often implement DORA metrics as a "Measurement Theater" creating beautiful dashboards that generate the appearance of scientific management without driving meaningful improvement. Teams spend more time discussing why their metrics are yellow than actually improving their development practices.
Organizations invest significant effort collecting and analyzing DORA metrics while the actual cultural and structural barriers to delivery excellence remain unaddressed.
So are they useful or fluff?
After deep analysis, here's my nuanced verdict. DORA metrics are useful training wheels, not permanent solutions.
They're useful when:
Your organization lacks any delivery measurement
You need vocabulary to discuss delivery performance with stakeholders
You're identifying obvious operational problems
You're establishing baseline delivery discipline
You're comparing relative performance across teams or time periods
They become fluff when:
You optimize for metrics instead of outcomes
You treat them as ends rather than means
You ignore their disconnection from user value
You use them for performance management rather than learning
You assume correlation with business success without validation
A better path forward: The Maturity Model
Instead of abandoning measurement or blindly implementing DORA, I view DORA in a three-stage maturation approach. The 1st stage is what I have seen work and led in implementing from an engineer’s perspective. The 2nd and 3rd stages are possibilities and challenges to achieve but are a very good aim for engineering leaders to drive value from the business and customers.
Stage 1: DORA Foundation - Collection
The foundation stage is about establishing measurement literacy and baseline awareness in your organization. This part is straightforward. You implement basic DORA collection in explicit "learning mode," focusing on understanding what the data reveals about your current state rather than optimizing for specific targets. The emphasis should be on conversation enablement over optimization, creating simple dashboards with "this is what we're learning" messaging.
Let’s take an example of using deployment frequency in a way which can help teams to transform. When teams first start measuring their deployment patterns, the initial discoveries often surprise everyone involved. For one of the implementation, we began by simply counting deployments per week across different teams within the same organization. What emerged was a striking variation that nobody had previously quantified or even fully recognized. Team A was deploying twelve times per week while Team B deployed once per month. This wasn't a judgment about team quality, but rather a data point that initiated conversations about pipeline automation, approval processes, and risk tolerance.
As the measurement period extended, patterns became visible that had been invisible in day to day experience. For instance, Teams found that most deployments occur between Tuesday and Thursday, avoiding Mondays and Fridays. The Monday avoidance is to guard against teams wanting to avoid disrupting their week with issues that emerge over the weekend. And the Friday avoidance? It’s about being careful not to roll out changes that might need weekend fixes, which could suggest they’re not totally confident in their ability to roll back things and handle incidents effectively.
The key question to ask is "What story is our data telling us?" This approach works because it creates space for teams to discover insights about their own processes without the pressure of immediate optimization. I've seen teams have genuine “aha” moments when they first visualize their delivery patterns, recognizing bottlenecks and inefficiencies that had been hidden in the complexity of daily work.
Stage 2: DORA Plus - Where theory meets practice
The second stage involves connecting delivery metrics to business and user outcomes. Here's where the work gets challenging, but I’ve witnessed organizations making significant progress in correlating engineering delivery performance with actual business value creation.
Building on the deployment frequency foundation, teams in this stage should track their delivery patterns affect business outcomes. Teams deploying more per week have higher feature adoption rates compared to teams deploying weekly. Higher deployment frequency enables faster iteration cycles, allowing teams to respond quickly to user feedback and refine features based on real usage patterns rather than assumptions.
The business impact can be concrete in such cases. Features deployed through rapid iteration cycles, with multiple deployments per week allowing for quick adjustments, achieved adoption success rates that are higher than features deployed through traditional batch release cycles. Teams could release a basic version, observe user behavior, and deploy improvements within days rather than waiting weeks for the next release window. This responsive development approach meant features evolved to meet actual user needs rather than initial specifications.
Customer satisfaction can provide another lens for understanding deployment frequency value. Weeks with higher deployment frequency consistently can correlate to improvements in customer satisfaction scores. The mechanism here is subtle but important, more frequent deployments meant smaller changes, which created less user disruption and allowed faster resolution of user-reported issues. When problems do emerge, the smaller change sets made identification and rollback much faster, minimizing user frustration.
Another example can be customer issues resolved within twenty-four hours maintain high customer retention, while issues taking a week may result in lower retention. This data translates how customer success teams view engineering delivery capabilities, lead time wasn't just an efficiency metric but directly affected customer experience and business sustainability.
The driving question for this stage becomes "How does delivery performance connect to user value?" Organizations that navigate this stage create clear measurement frameworks that connect engineering practices to business outcomes, moving beyond vanity metrics to understand genuine impact on customers and revenue.
Stage 3: Custom Value Metrics - Strategic Differentiation
The most mature organizations develop metrics that measure their unique competitive advantages and value propositions, moving beyond standard DORA measurements to track the delivery capabilities that create strategic differentiation.
At this advanced stage, deployment frequency transforms from an engineering efficiency metric into a strategic competitive capability. Consider a fintech company operating in a heavily regulated environment where regulatory changes create both opportunities and requirements for rapid response. This organization developed a custom metric called "Regulatory Response Velocity" which measures the time from regulation announcement to compliant feature deployment in production.
This company consistently achieved two-week regulatory response times, compared to competitors’ average of three to six months. The competitive advantage was substantial and measurable. Their deployment frequency capabilities, built through years of automation and process refinement, enabled them to be first to market with compliant solutions for new regulatory requirements. This speed became their primary sales differentiator, allowing them to capture higher market share in emerging regulatory segments.
When market opportunities emerges, the response time create decisive advantages. The question for this stage is "What delivery capabilities drive our unique value creation?" organizations reaching this maturity level have moved beyond measuring what everyone else measures to understanding and optimizing the delivery capabilities that gives advantages in the marketplace.
Implementation roadmap and call to action
The progression demonstrated above through deployment frequency applies equally to the other DORA metrics: lead time for changes, mean time to recovery, and change failure rate can all follow similar maturation paths from basic engineering measurement to strategic business capability.
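For teams still at the foundation stage, all four metrics can be computed from data most organizations already have. The sketch below assumes hypothetical deployments.csv and incidents.csv exports; the column names are placeholders for whatever your CI/CD system and incident tracker actually produce, and the definitions (median lead time, share of deployments flagged as causing failures) are deliberately simplified.

```python
# Minimal sketch: baseline versions of the four DORA metrics from two
# hypothetical event logs -- one row per deployment, one row per incident.
import pandas as pd

deploys = pd.read_csv(
    "deployments.csv", parse_dates=["committed_at", "deployed_at"]
)  # columns: deploy_id, committed_at, deployed_at, caused_failure (0/1)
incidents = pd.read_csv(
    "incidents.csv", parse_dates=["started_at", "resolved_at"]
)  # columns: incident_id, started_at, resolved_at

# Deployment frequency: deployments per week over the observed window.
weeks = (deploys["deployed_at"].max() - deploys["deployed_at"].min()).days / 7
deployment_frequency = len(deploys) / max(weeks, 1)

# Lead time for changes: commit-to-production, taken as a median.
lead_time = (deploys["deployed_at"] - deploys["committed_at"]).median()

# Change failure rate: share of deployments flagged as causing a failure.
change_failure_rate = deploys["caused_failure"].mean()

# Time to recovery: median incident duration.
time_to_recovery = (incidents["resolved_at"] - incidents["started_at"]).median()

print(f"Deployment frequency: {deployment_frequency:.1f} per week")
print(f"Median lead time for changes: {lead_time}")
print(f"Change failure rate: {change_failure_rate:.1%}")
print(f"Median time to recovery: {time_to_recovery}")
```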
The key insight is that organizations shouldn't attempt to implement all metrics at all stages simultaneously, but should thoughtfully mature their measurement capability in alignment with strategic priorities. Start with Stage 1 to establish baseline measurement literacy across your key delivery metrics. Identify your most important business outcome metrics, whether that's revenue growth, customer satisfaction, market response capability, or innovation velocity.
Choose the one or two DORA metrics most relevant to your strategic advantages and progress them through Stages 2 and 3 while maintaining the foundation metrics for organizational learning.
The real question behind the metrics
The question isn't whether DORA metrics work, but what worldview they encode and whether that worldview serves your actual goals.
DORA metrics encode several assumptions:
Motion equals progress: More deployments and faster changes are inherently better
Efficiency equals effectiveness: Optimized processes lead to better outcomes
Process optimization drives value: Internal improvements automatically benefit users
Measurement drives improvement: What gets measured gets better
These assumptions work well for operational discipline but break down when applied to strategic value creation and innovation.
Practical takeaways
Start with DORA if you're measurement-immature, but explicitly frame them as temporary training wheels
Pair delivery metrics with business and user outcome metrics. Do not set DORA metric targets without understanding underlying capabilities.
Watch for gaming behaviors and course-correct quickly when metrics become targets.
Do not compare teams with different contexts using the same benchmarks.
Avoid measurement theater and dashboard delusion; beautiful dashboards are not the goal
Graduate to custom metrics as your measurement sophistication and strategic clarity grow
Focus on correlation with user value, not just operational efficiency improvements
Avoid ignoring the cultural and tooling prerequisites for meaningful measurement
Invest in measurement capability development, not just metric collection infrastructure
Create psychological safety around metric discussions to enable honest assessment and evolution
The bottom line
DORA metrics aren't inherently useful or fluff; the debate reflects a confusion of levels about what we're trying to optimize. They're genuinely useful for organizations building basic measurement maturity and delivery discipline. They become dangerous limitations for organizations that need more sophisticated approaches to value delivery and strategic differentiation.
The goal isn't to have perfect DORA scores; it's to consistently deliver value to users through sustainable engineering practices. Sometimes those goals align beautifully. Often they require looking beyond the dashboard to understand what truly drives delivery excellence in your unique context.
If you are exploring DORA metrics or have written them off as fluff, let's talk: a no-strings-attached call where we can discuss how to help your organization leverage DORA metrics to achieve real results. Reach out to us.