Youssef Ameachaq's Blog

Youssef Ameachaq

Measuring What Matters: Rethinking Productivity in the Age of AI


If you’ve been managing an engineering team lately, you might have noticed something strange: your team seems to be getting more done, but it’s hard to tell if they’re actually more productive. AI tools like GitHub Copilot, ChatGPT, and Claude can generate code, tests, and even draft entire pull requests. That sounds great, right? But here’s the catch: the numbers we used to rely on (story points, pull requests merged, lines of code) don’t always reflect the real work anymore.

I’ve been thinking a lot about this, and honestly, it can feel a little unsettling. As a manager, you want to celebrate wins, but you also want to make sure your team is building things that matter, not just producing more lines of code because an AI helped.

Why old metrics don’t tell the whole story

Traditional engineering metrics were built in a world where humans wrote almost every line of code. Now, AI can write hundreds of lines in minutes. But fast doesn’t always mean better. A pull request generated by AI might take less time to write but more time to review, debug, or integrate into the system. That’s why relying only on traditional metrics can give you a false sense of productivity.

Recent reports show that over 90% of engineering organizations are using AI tools in some way (Jellyfish, 2025). Some studies suggest developers can be 30–55% faster on small coding tasks with AI help. But in real-world teams, the results are mixed. AI can save time on routine work, yet create new kinds of friction—extra review, unexpected defects, or integration challenges (GitLab, 2024; GitHub, 2022).

Shifting the focus: outcomes, health, and team experience

So, if old metrics fail, what should we measure? I’ve been using a simple framework that feels practical and human: focus on three things.

1. Outcomes and impact

This is about the real value delivered. Did a feature improve the product? Did it help customers? Simple ways to measure this include lead time for customer-impacting changes and adoption rates of new features.

2. Engineering health

AI might write code quickly, but code quality still matters. Keep an eye on post-release defect rates, the amount of rework needed, and tech debt. Tracking defects that come specifically from AI-generated code is helpful too.

3. Team flow and experience

AI changes how developers spend their time. Sometimes it frees them up to be more creative, but it can also create frustration. Measuring review time, developer sentiment (even a simple “Did AI help you today?” poll), or onboarding time for new hires can give you insight into how AI affects the team experience.
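To make the framework concrete, here’s a minimal sketch of how a sprint’s worth of data could be turned into one signal per bucket: average lead time for outcomes, the share of defects linked to AI-generated code for engineering health, and the daily pulse-poll tally for team experience. All field names and numbers below are hypothetical, just to show the shape of the calculation.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sprint data: change timestamps, defect flags, and pulse answers.
changes = [
    {"started": "2025-01-06", "shipped": "2025-01-09"},
    {"started": "2025-01-07", "shipped": "2025-01-13"},
]
defects = [{"ai_assisted": True}, {"ai_assisted": True}, {"ai_assisted": False}]
pulse = [True, True, False, True]  # answers to "Did AI help you today?"

def days_between(start, end):
    fmt = "%Y-%m-%d"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days

# 1. Outcomes: average lead time for customer-impacting changes.
lead_time = mean(days_between(c["started"], c["shipped"]) for c in changes)

# 2. Engineering health: share of defects traced back to AI-generated code.
ai_defect_share = sum(d["ai_assisted"] for d in defects) / len(defects)

# 3. Team experience: share of "yes" answers in the daily pulse poll.
ai_helped_share = sum(pulse) / len(pulse)

print(f"lead time: {lead_time:.1f} days")
print(f"AI-linked defects: {ai_defect_share:.0%}")
print(f"'AI helped me today': {ai_helped_share:.0%}")
```

Even a rough version of this, pulled from your ticket tracker and a one-question poll, is enough to start a conversation with the team.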

A small experiment you can try

One thing I’ve found helpful is to run a short pilot. Pick a team that will use AI tools more actively for a sprint, and another similar team that doesn’t. Track the usual metrics—lead time, review time, defect rate—and add a tiny daily pulse survey asking developers if AI helped or caused extra work. At the end of the sprint, compare results. You’ll quickly see where AI actually helps and where it introduces friction.
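The end-of-sprint comparison doesn’t need fancy tooling. Here’s a small sketch of what it might look like, with made-up team names and numbers: side by side, you can see the mixed picture the studies describe, such as shorter lead times for the AI-active team but more review hours and defects.

```python
from statistics import mean

# Hypothetical end-of-sprint metrics for the pilot and control teams.
sprint = {
    "ai_team":      {"lead_time_days": [2, 3, 4, 2], "review_hours": [5, 6, 7], "defects": 4},
    "control_team": {"lead_time_days": [4, 5, 3, 6], "review_hours": [3, 4, 4], "defects": 2},
}

def summarize(name, m):
    # One line per team: average lead time, average review time, defect count.
    return (f"{name}: lead time {mean(m['lead_time_days']):.1f}d, "
            f"review {mean(m['review_hours']):.1f}h, defects {m['defects']}")

for name, metrics in sprint.items():
    print(summarize(name, metrics))
```

In this fabricated example the AI-active team ships faster on average but spends more time in review and logs more defects, which is exactly the kind of trade-off the pilot is designed to surface.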

Keeping it simple and safe

AI isn’t magic, and it certainly isn’t incapable of making mistakes. A few simple rules can make a big difference: keep a human review on AI-generated code before it merges, track defects that trace back to AI output, and keep asking the team whether the tools are actually helping.

These steps don’t need to be heavy-handed. They’re just small ways to protect the team and product.

The bigger picture

The real lesson I’ve learned is that productivity isn’t just about lines of code or PR counts anymore. It’s about impact, quality, and the team experience. AI can help us go faster, but human judgment and care still matter. As managers, our job is to measure the things that truly matter and make sure our teams are doing meaningful work—not just faster work.


References