Nikos Vasileiou

November 11, 2022

15 Engineering KPI ideas

As an Engineering Manager in a Software company, you will most likely want to answer the following crucial question: “How are we doing?”

Setting Key Performance Indicators (KPIs) is a crucial step towards getting those answers on a weekly, monthly or quarterly basis. You want a recurring process for continuous improvement: celebration when things are going great, and proactive action when things are trending down.

In this context, I'd like to write down my favourites: what I consider good KPIs to implement and follow in your company. In short:

1. Deployment frequency
2. Lead time
3. Uptime
4. Incident frequency
5. Response time
6. Operating costs
7. Rollback frequency
8. Bugs creation vs. resolution rate
9. Code crash frequency
10. Static code analysis issues
11. Security patch rate
12. Pen-test vulnerabilities
13. Employee mood rating
14. Engagement in chat apps
15. Time to hire

Let’s break them down into categories, and explain why they matter.

Agility metrics

The agility metrics category covers engineering KPIs that indirectly reveal our agile practices. Are we releasing software in small, frequent increments? Or are we working in a waterfall way via infrequent, long releases?

1. Deployment frequency
How frequently do we deploy to production? How many times per week? Per day? Is our deployment frequency trending up or down over time? High deployment frequency is an indicator of agility and incremental release. This is how modern software companies work and operate.
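
As a rough illustration, deployment frequency can be tracked by counting deployments per calendar week. The sketch below assumes you can export deployment dates from your CI/CD tool; the `deployments_per_week` helper and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO week, keyed by (ISO year, ISO week)."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


# Hypothetical deployment dates, e.g. exported from a CI/CD tool.
deploys = [
    date(2022, 10, 31), date(2022, 11, 1), date(2022, 11, 3),
    date(2022, 11, 8), date(2022, 11, 9), date(2022, 11, 9),
    date(2022, 11, 11),
]

weekly = deployments_per_week(deploys)
for (year, week), count in weekly.items():
    print(f"{year}-W{week}: {count} deployments")
```

Comparing the weekly counts over several months shows whether the trend is going up or down.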

2. Lead time
How much time does it take for a task or project to get from start to finish? Low lead times are an indicator of agile thinking and releasing through small increments. High lead times are an indicator of taking on big projects, and an opportunity to break things down or reduce waste in the development process.
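
One possible way to summarise lead time is the median number of days between starting and finishing a task. This is only a sketch: the `(started, finished)` pairs are made-up sample data, and the median is my choice here because a single long-running project distorts it less than it distorts an average.

```python
from datetime import datetime
from statistics import median


def lead_times_in_days(tasks):
    """Return the lead time of each (started, finished) pair, in days."""
    return [(finished - started).total_seconds() / 86400
            for started, finished in tasks]


# Hypothetical tasks, e.g. exported from an issue tracker.
tasks = [
    (datetime(2022, 11, 1, 9), datetime(2022, 11, 2, 9)),   # 1 day
    (datetime(2022, 11, 1, 9), datetime(2022, 11, 4, 9)),   # 3 days
    (datetime(2022, 11, 2, 9), datetime(2022, 11, 12, 9)),  # 10 days
]

durations = lead_times_in_days(tasks)
median_lead_time = median(durations)
print(f"Median lead time: {median_lead_time:.1f} days")
```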


DevOps metrics

The DevOps metrics category contains KPIs that are related to our operations. Are we deploying and operating scalable and healthy services? Or are we falling behind and need to rethink our architecture or modernise our stack?

3. Uptime
What is our uptime over the last month, quarter or year? Are we operating within our SLAs with our customers? For example, is our uptime 99%, 99.9% or 99.99%? Digits matter. Higher uptime is an indicator of a resilient, redundant architecture, with fault tolerance and great QA processes in place. Low uptime should ring the bell that we are not serving our users the right way.
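
To see why the digits matter, it helps to translate an uptime target into a downtime budget. The sketch below assumes a 30-day period; the function name is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Downtime budget, in minutes, for an uptime target over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - uptime_percent) / 100


for target in (99, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {budget:.1f} minutes "
          f"of downtime in 30 days")
```

Over 30 days, 99% leaves room for 432 minutes (about 7 hours) of downtime, 99.9% for about 43 minutes, and 99.99% for just over 4 minutes.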

4. Incident frequency
How frequently do we encounter incidents? An incident could be an uptime incident or a feature incident: broken functionality, slow queue consumption, slow page loads, or other issues that affect the user or integration experience in an unusual way. A downward trend in incident frequency is a great indicator of good operations and quality software. Frequent incidents are an indicator that we are dropping the ball, and we need to take a step back as a team and fortify our product.

5. Response time
Are our transactions measured in milliseconds, or does it take seconds to respond? High response time is an indicator that we are not scaling well. Maybe our databases are suffering, or we have slow queries, or we need faster servers.
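
Averages can hide slow requests, so one common approach (my suggestion, not the only option) is to look at percentiles of response time. Here is a minimal sketch using the nearest-rank method; the latency samples are invented for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds for ten requests.
latencies_ms = [120, 95, 110, 105, 2300, 130, 98, 101, 115, 90]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"p50 = {p50} ms, p95 = {p95} ms")
```

In this sample the typical request takes around 100 ms, but the 95th percentile exposes the one request that took more than two seconds.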

6. Operating costs
Are we operating in the cloud? Operating cost is one core KPI we can literally "not afford" to neglect. Increasing costs signal an opportunity to reduce waste, by shutting down unused resources or rethinking architectures to make them more cost-effective.


Code quality

These KPIs are crucial for answering the question: “Are we releasing quality code?”

7. Rollback frequency
How frequently do we roll back releases? High rollback frequency is an indicator that we are not testing enough. Have we introduced staging or canary deployments, with promotion to production only after adequate testing has been performed?
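
A simple way to express this KPI is the share of deployments that ended in a rollback. The sketch below assumes you already have both counts for the period; the function name and numbers are hypothetical.

```python
def rollback_rate(rollbacks, deployments):
    """Percentage of deployments that were rolled back."""
    if deployments == 0:
        return 0.0  # nothing was deployed, so nothing was rolled back
    return 100 * rollbacks / deployments


rate = rollback_rate(rollbacks=3, deployments=40)
print(f"Rollback rate: {rate:.1f}%")
```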

8. Bugs creation vs. resolution rate
How frequently do we introduce bugs into the system? How fast are we solving them? A high bug creation rate is an indicator of lacking proper QA. A high resolution rate is an indicator that we care about the quality of our product and our users' experience.
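
Creation and resolution rates are easiest to read together, as the size of the open bug backlog over time. This is a sketch under the assumption that you have weekly counts of created and resolved bugs; all numbers are illustrative.

```python
def backlog_trend(created_per_week, resolved_per_week, starting_backlog=0):
    """Return the open bug count at the end of each week."""
    backlog = starting_backlog
    trend = []
    for created, resolved in zip(created_per_week, resolved_per_week):
        backlog += created - resolved
        trend.append(backlog)
    return trend


# Hypothetical weekly counts from a bug tracker.
created = [12, 9, 15, 7]
resolved = [10, 11, 9, 12]

trend = backlog_trend(created, resolved, starting_backlog=20)
print(trend)
```

A backlog that keeps growing means bugs are being created faster than they are resolved.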

9. Code crash frequency
Integrating with Sentry or similar tools can give us a great metric for code quality. An increasing rate of stack trace issues is an indicator of low-quality code, or of unhandled edge cases in our algorithms.


Security metrics

Cybersecurity is a hot topic for internet businesses. Monitoring our security posture and taking prompt action is something we need to stay on top of.

10. Static code analysis issues
Several static code analysis tools can proactively discover security issues. Keeping track of what’s happening in our code is a crucial metric to watch, and an item for your DevSecOps roadmap.

11. Security patch rate
Automatic scanning of our code repositories can give us indicators of open CVEs from our third-party library dependencies. Tools like npm, Snyk, Trivy and others can report how exposed we are to critical, high, medium and low vulnerabilities. A frequent patching rate and a falling number of open vulnerabilities are good indicators that security is taken seriously within the company.

12. Pen-test vulnerabilities
Running frequent penetration tests, either internally or through third parties, can reveal security issues. How good are we at addressing them over time? How are we improving from pen-test to pen-test? Are we resolving the issues, or are we falling short because of other priorities?


People metrics

None of the above is possible unless we invest in happy, engaged and productive people. Here are a few indicators to consider when managing a healthy team.

13. Employee mood rating
Several tools can be used to monitor employee mood. People can check in weekly and rate how they feel. A declining mood rating for our team is an indicator that morale is low. Maybe it’s time for some inspiring goal setting, some fun activity, or even some time off to cool down and recharge?

14. Engagement in chat apps
Are you working remotely? Chances are you already use a chat collaboration tool like Slack, Microsoft Teams or Google Chat. Some tools can give you metrics on how engaged people are in the app. Declining engagement in chat apps, where people collaborate, can be an indicator of isolation or disengagement, and is worth investigating further.

15. Time to hire
People come and go. No matter the circumstances, we should be able to hire quality people fast. Time to hire is a good indicator of whether we are recruiting fast enough or need to up our game in the field.

•·················•·················•

I hope you find these KPI ideas inspiring for your business. Do you have related ideas of your own? I'd love to hear your thoughts.

- Nikos Vasileiou

About Nikos Vasileiou

Hello friends!
I am Nikos, CTO at Transifex & co-founder of Team O’clock.

I’ve created the Agile Squads framework and co-authored Hey Authors, a blog aggregator for the HEY World community.