My Nightmares With Engineering Estimates
We asked the Product community how they do their engineering estimates today.
The answers clustered around two approaches: these days, most of us either use a variation of Agile or skip estimates entirely. And nearly all of us pad our estimates with buffers.
I’ve done it both ways. I’ve done Agile and I’ve run teams without any estimates. I actually use a third option now, something entirely different that no one mentioned. It looks like I’m very much in the minority.
But before we get to how I do my engineering estimates today, I want to show you what happened to my projects when I used the most common methods: Agile or no estimates.
No Estimates: KISSmetrics
There was a time when estimates weren’t in my vocabulary. The thought of getting an estimate didn’t even cross my mind.
I only thought about what needed to be built. The big ideas. The specs. The designs. The details of what needed to be done next. After all, as long as we’re working on the most important thing at that moment, why does it matter that we estimate how long it will take? Engineers hate estimates, teams hate estimates, managers hate estimates, and the real work never lines up to estimates anyway. Why not throw them out?
I was in the no estimates camp and it felt great. Little did I know that the initial feeling of relief wouldn’t last.
In 2009, when we first built KISSmetrics and the team was just a handful of people, we didn’t think about estimates. We jumped right into building a product we had discovered after thorough research.
The initial version of KISSmetrics was built in only 30 days by one engineer. Estimates for 30 days’ worth of work could only get in the way, we thought. We wanted to test the hypotheses we had learned from research, and there was no better way than building an MVP really fast.
We were so busy furiously building during those 30 days that we didn’t take time to think about what was coming next. Based on our research, we had a few features queued up that would help grow the business. We spec’d out the features and thought through all the details, but didn’t think beyond that. We had #noestimates.
We were on top of the world, with great early validation of our hypotheses. We thought we had a plan of what to do next.
The product was usable and customers actually loved it. But it had been built with slow, ugly code. Code that didn’t enable us to scale beyond a few hundred initial users.
Our customers didn’t experience much pain from that – but we did. We couldn’t add features because of how fast we had built the product. Even worse, it took 7 servers for every 30 customers. We couldn’t grow the business.
That’s when it hit us.
We (finally) realized it would take our lead engineer, John, 90 days just to make sure the product backend could keep up with customer demand and let us add new features.
Ninety days – aka 480 developer hours – is what it would take us to bring on more customers and to build out more features.
We were screwed. All because of no estimates.
Just as things were getting good, just as we were starting to develop Product/Market fit, we realized that our lack of estimates had forced us into a hole that broke our momentum.
In the early stages of a product initiative, you just go and do things. You build as quickly as you can. As the Facebook mantra used to be: “Move fast and break things.”
That works well until you need to grow, and things start falling apart fast because of communication gaps, a lack of documentation, and zero process.
We knowingly and purposely built our MVP with ugly, buggy code that we knew would need to be refactored. That wasn’t the mistake; that’s what you have to do to survive in the early days. But we had neglected to plan for what came next. That’s what broke us.
We never estimated how long it would take to refactor, so we never actually knew how long anything would take. We lost traction, and our growth trajectory slowed as a result.
Had we estimated during our 30-day buildout instead of after, things could have turned out really differently. The business could have grown faster and been bigger today.
Here are the three common pitfalls of using no estimates that I ran into.
No Tradeoffs Made
Had we used estimates early on at KISSmetrics, we would have made educated tradeoffs based on an understanding of how long things would take. With that visibility, we would have had more resources ready, done more things in parallel, or descoped features to keep our momentum steady.
Instead we were stuck waiting, twiddling our thumbs for 90 days until the product was ready for scale. We lost customers. We lost time. And we lost momentum.
Without knowing how long different parts of a feature take, it’s impossible to make tradeoff decisions about what to build and what to hold off on. When there are no estimates and no tradeoffs, momentum slows and launch dates are missed.
Making smart tradeoffs upfront is how you maintain your momentum as a startup.
Too Much Time Communicating
Using no estimates requires more communication with the team on a daily – if not hourly – basis. There has to be constant conversation to gauge status and account for hiccups.
While frequent meetings or Slack rooms abuzz every day with conversations can be exciting and fun, too much communication can also be frustrating and distracting for the engineers.
This tweet explains it well:
"Why do programmers need long periods of silence in order to do their job? https://t.co/F48s7Fpu96" via @hnshah
I plotted it on a graph 📈 pic.twitter.com/9DAkMBywAL— Rishi Athanikar (@ARishi_) January 21, 2017
During our first 120 days building KISSmetrics, we had daily check-ins with engineering. We’d always end up asking questions that sent the engineers down random research rabbit holes, since we had no structure and no reliable estimates.
By overcompensating with communication, we actually made things worse for ourselves. Too many unnecessary projects were queued up and we lost valuable time working on the wrong things.
Tasks Aren’t Broken Up
Without estimates, big specs never get dug into, pieced apart and researched. There’s no pressure to estimate how long anything will take, so the discipline of pulling big tasks apart into small pieces isn’t required. Instead of a thorough, organized process at the start of a project, technical research happens on an as-needed basis when the buildout hits a snag.
At KISSmetrics, we realized much too late that it would take an extra 90 days just to get our product in good enough shape to take on more customers and build more features. If we had done technical research and estimates up front, that never would have happened. Breaking the tasks up ahead of time would have revealed our most dangerous assumptions and helped us avoid the problem entirely.
As a result of no estimates, things that should take weeks take months. And overbuilding happens in all the wrong places, since no one bothers to figure out which tasks take the longest.
Agile Points: Crazy Egg
After our challenges with no estimates, we tried what I thought was the next best thing: points.
At Crazy Egg, we were adding a new feature called Scrollmap. It had a few technical challenges we had to address during Product development. Our software was going from monitoring clicks on a website to also monitoring how people scrolled on web pages. Analyzing scrolling was something we hadn’t done before, and we needed to implement it in a way that didn’t cause unexpected errors for visitors on the websites Crazy Egg was running on.
There were a handful of products that already existed in the market that had a Scrollmap. So we were able to do competitive research to understand the right problems to solve with the new feature.
We had a spec scoped out for engineering based on what we learned from research. The research pointed us to exactly what people cared about most: how far down website visitors scrolled on web pages.
After we finished spec’ing, we broke the spec up into bite-size tasks with the engineers. We used Pivotal Tracker, an Agile project management tool, to track and assign points to every task.
At the time, we had a simple 3-point system:
1 point was 1 hour or less
2 points was between 1 and 2 hours
3 points was 4 hours
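To put that scale in concrete terms, here’s a minimal sketch in Python of the arithmetic behind those buckets. The names (`POINT_HOURS`, `estimate_range`) and the sample task list are mine, purely for illustration; this isn’t how Pivotal Tracker works, just the mapping we had in our heads.

```python
# Illustrative sketch of our 3-point scale (the names are mine, not
# Pivotal Tracker's). Each point value maps to a (min, max) hour range.
POINT_HOURS = {
    1: (0.0, 1.0),  # 1 point: 1 hour or less
    2: (1.0, 2.0),  # 2 points: between 1 and 2 hours
    3: (4.0, 4.0),  # 3 points: 4 hours
}

def estimate_range(task_points):
    """Total (min_hours, max_hours) for a list of per-task point values."""
    low = sum(POINT_HOURS[p][0] for p in task_points)
    high = sum(POINT_HOURS[p][1] for p in task_points)
    return low, high

# Ten hypothetical tasks from a spec:
tasks = [1, 1, 2, 3, 3, 1, 2, 2, 3, 1]
print(estimate_range(tasks))  # (15.0, 22.0) for 19 points
```

Even these ten small tasks leave a 7-hour window between best and worst case, and that window only widens as the task list grows.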
The spec we had created and shared with engineering was super detailed. However, we didn’t have a Product prioritization process in place, so we never asked engineering to do technical research that would inform our spec.
Within a day of us sharing the spec with engineering, we got back an estimate. When I saw it, my immediate reaction was: that’s WAY too long.
With our first pass at assigning points to the tasks, the feature came out to a total of 84 points, or somewhere around 100 hours.
Our customers were hungry for the feature, and we didn’t want to waste any time. 84 points was just too many points and too much time.
So what did we do? We did what everyone does. We de-scoped.
We de-scoped a lot.
We de-scoped too much. We focused on the points and the timelines, not what the customer genuinely wanted. And we paid for it.
We were so focused on reducing points that we forgot our research.
So we cut out whichever tasks took the longest and weren’t strictly necessary to ship the feature, instead of being thoughtful about what was needed and which tradeoffs were worth making. Points made it black and white for us: fewer points meant a faster release.
As a result, we ended up launching something that came out quickly, but was incomplete in our customers’ eyes. We had ignored the research and instead just built the fastest thing possible.
Scrollmap ended up not being used by our customers because it wasn’t better than competitors’ features, or even equal to them. After releasing the feature, we quickly learned that nobody seemed to care about it: 95% of the people who used it for the first time didn’t use it again. They didn’t find it valuable enough, and it simply wasn’t providing them with the information they needed. Since the feature wasn’t good enough, we had to spend weeks adding to it after we had already released it.
We missed out on one of the biggest opportunities for growth.
We knew exactly what the market wanted. We had all the research. We could have built a feature people loved right from the start. And we could have launched it, built a ton of momentum on day 1, and kept going. Instead we burned a bunch of our users and had to build all that momentum inch by inch. We made it harder on ourselves.
And the worst part is, we knew better than that. We had spent the time and effort to do thorough research before we started building. We could have nailed it, but we didn’t.
Speaking A Different Language
With points, it can be like Product and Engineering are speaking a different language.
Engineering treats points as a way to estimate difficulty. It’s simple: more points, more difficulty.
Meanwhile, Product is trying to launch on time and wants to understand how long things take in order to make tradeoff decisions. Product wants to create what solves the customer’s challenges, not what takes the least amount of points to build.
When breaking out the engineering tasks for Crazy Egg’s Scrollmap feature, we ran into this exact problem. Although Product had done the research and knew exactly what customers needed, we ignored those learnings. Instead, we focused on building something fast once we realized how difficult the feature would be to build.
The feature started out with Product thinking about the right customer needs, but once Engineering got involved, everyone started focusing on shipping speed instead.
Making The Wrong Tradeoffs
Points put everyone in a singular mindset: de-scoping so there are fewer points. That pressure is nearly impossible to resist. I’ve rarely seen a Product team manage it, including ones that I’ve run.
When points are the sole measure and there is no prioritization methodology in place, teams ship products that don’t match customers’ needs and expectations.
At Crazy Egg, we released a new feature that didn’t end up hitting the mark. Because of this we had to go back and modify the newly released feature just to be competitive. Then we had to re-launch that feature.
We could have done a better job in the first place but we didn’t.
All because we were too focused on points. We wasted time, lost customers and missed the opportunity for a big launch that would have helped us grow faster.
Time Buckets Aren’t Actual Time
Another challenge I’ve come up against is that point-based time buckets don’t add up to hours that Product can clearly understand. Or Engineering either!
When 1 point means an hour or less, a 10-minute task and a 1-hour task both equal one point. Because of that, Product can’t truly understand how much time things actually take, and therefore can’t make tradeoffs and prioritize.
For example, using the scale we used at Crazy Egg, 100 points could equal any of the following in hours:
- 17 hours, if the 100 points were all 1-point tasks that each took 10 minutes
- 50 hours, if the 100 points were all 1-point tasks that each took 30 minutes
- 75 hours, if the 100 points were all 2-point tasks that each took 1.5 hours
- 100 hours, if the 100 points were all 1-point tasks that each took 1 hour
- 133 hours, if the 100 points were all 3-point tasks that each took 4 hours
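Here’s a quick sketch (my own illustration, not a tool we used at Crazy Egg) that makes the spread explicit: hold the total fixed at 100 points, vary only the task mix, and use the 10-minute floor from the first bullet above.

```python
# My own illustration: the same 100-point total maps to very different
# hour totals depending on the task mix. The (min, max) hours per task
# follow our scale, using 10 minutes (1/6 hour) as the 1-point floor.
BUCKET_HOURS = {1: (1 / 6, 1.0), 2: (1.0, 2.0), 3: (4.0, 4.0)}

for points, (lo, hi) in BUCKET_HOURS.items():
    n_tasks = 100 / points  # tasks needed to reach 100 points
    print(f"all {points}-point tasks: {n_tasks * lo:.0f} to {n_tasks * hi:.0f} hours")

# Output:
# all 1-point tasks: 17 to 100 hours
# all 2-point tasks: 50 to 100 hours
# all 3-point tasks: 133 to 133 hours
```

The same 100 points can mean anywhere from 17 to 133 hours, an almost 8x spread.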
Teams use points with the assumption that the estimates will be slightly off. In reality, the variance is huge. When points are the only measure, no one actually knows how long something will take. Once we got the 84-point estimate for the Scrollmap feature, we felt like it would take too long to build exactly what we had discovered customers needed. So we started de-scoping as quickly as we could.
We made assumptions about how long an 84-point estimate would take to complete without actually knowing. We forgot what our customers wanted and made knee-jerk sacrifices to lower the point count.
By trusting the point system, we built and released a feature that nobody cared for. We made severe cuts when the full scope might not even have taken that long. Then we had to painfully go back and improve it instead of getting it right from the start.
Buffers: KISSmetrics
We’re all using buffers. It was by far the most recommended tactic when I asked everyone how they do their estimates.
And buffers cause even more damage than no estimates or Agile points.
We had just come off a really rough first 120 days of building KISSmetrics. We had finished fortifying and refactoring our Product so that it could support more customers. And we finally had the ability to add new features to our Product.
But we were nervous. We had Product PTSD. The last time we built using no estimates, everything fell apart. So we decided to do two things:
- Estimate how many days (instead of weeks) initiatives would take.
- Add 15-30% buffers to everything we estimated to account for unknowns.
The last thing we wanted was to get burned again. So we were conservative. On some days, we hit our estimates. On others, we were a few days behind. And sometimes we were early. We thought it all evened out and we had the right method in place. Until things started going wrong.
We’d hit snags. Major ones. Unexpected delays that buffers just didn’t account for. We’d go down one path and then realize it was the wrong one. That we’d have to rebuild something because we had made the wrong decision.
We even started adding buffers on top of buffers for our sales prospects and customers. We’d take the expected release date, which already had a 15-30% buffer, and add another 20% just in case. And we’d still miss those dates. Buffers always seemed to lead to even more buffers.
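To see how quickly stacked buffers compound, here’s a back-of-the-envelope sketch. The 30-day base estimate is a hypothetical number of my choosing; the buffer percentages are the ones from our story.

```python
# Back-of-the-envelope compounding of stacked buffers. The 30-day base
# is hypothetical; the buffer percentages are the ones we used.
base_days = 30        # engineering's raw estimate (hypothetical)
eng_buffer = 1.30     # the 15-30% buffer for unknowns (upper end)
sales_buffer = 1.20   # the extra 20% added for prospects and customers

quoted_days = base_days * eng_buffer * sales_buffer
print(quoted_days)    # 46.8 -- a 30-day estimate quoted as ~47 days
```

That’s a 56% cushion on top of the original estimate, and we still missed those dates.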
Buffers caused us to be lazy in our planning and engineering estimates. Estimates were thrown out quickly with little research. It was easy to say how long something would take because buffers allowed room to make mistakes. People weren’t thorough up front. And we suffered as a result.
They Aren’t Actual Estimates
It’s ironic that the reason to add a buffer is to make an estimate more accurate, when in practice it makes estimates even less accurate. It actually makes Product development more variable and erratic.
Estimates + random cushion = not an actual estimate
A buffer is like automatically admitting that your estimate is wrong. That leads to everyone on the team adding buffers on top of buffers until the estimate is over-inflated. It turns into a “they added a buffer, so why shouldn’t we?” round-robin.
Buffers turn estimates into a never-ending struggle to get an actual believable estimate.
That’s exactly what happened to us at KISSmetrics. We were releasing a new analytics report that we knew would be really useful to customers. If customers loved the report, it would make us significantly more revenue per customer. It was called the Power Report, and it was going to help us land larger customers faster than we were.
We took 3 months longer than we had originally estimated to release the feature. Why?
Engineering added a buffer to the estimate it communicated to the Product team, which led Product to add a buffer to the date it communicated to the Sales team. And the Sales team? Well, they were already adding their own buffer of a week to any ship date they were told. That’s 3 buffers stacked on top of each other.
The worst outcome of all: we still missed that date. Buffers failed, and so did we.
Lack of Discipline
Buffers allow teams to be less disciplined in technical research. There’s always room for “things you may have forgotten” or “the unexpected.” Using buffers haphazardly is another form of admitting that you haven’t thought through potential scenarios and potential issues.
When we use buffers, we accept sloppy thinking. When we have sloppy thinking in our spec docs, we’re in a world of hurt later on.
It’s not really about the estimates or whether they’re accurate. It’s whether our processes allow for sloppy thinking. Sloppy thinking in one area means the rest is sloppy too.
Teams win by getting disciplined at the very beginning and staying disciplined.
At KISSmetrics, we weren’t digging in with technical research up front and exploring potential challenges. We would estimate, then just slap on a buffer and hope we were right. But most of the time, we weren’t.
This was the root cause of why our estimates for the Power Report were off by 3 months.
Because of our comfort with engineering buffers, we had underestimated just how complex the feature actually was to build. We didn’t dig in deep enough early on because we thought the buffers would cover us.
Our sloppy estimates didn’t just affect the Power Report feature. We ended up with Product delays on other new features because we were still building out Power Report.
Lack of Accountability
Buffers instill a culture of unaccountability. Engineering adds buffers. Product adds buffers. Sales adds buffers. And yet those buffer-based deadlines don’t get hit.
Using buffers is like saying you’re wrong right away. That creates a culture where it’s OK to be off on your estimates. It’s OK to not be thorough and really dig in at the start. And it’s OK to not know what went wrong.
Based on the ~100 responses to my question, I learned that teams use buffers ranging from 10% of the initial estimate all the way up to 5 times it. If an estimate is being doubled, quadrupled, or multiplied by five, why is it even needed in the first place?
With the automatic get-out-of-jail-free card that buffers provide, you never truly have to go deep in the first place. And you never have to go back and check whether your estimates were right.
When we missed our deadlines at KISSmetrics back when we used buffers, we didn’t go back to see why. We lacked accountability.
How do I do estimates today?
I’ve been developing software for over fourteen years now, and there’s a better way.
A way to make the right tradeoffs and Product prioritization decisions. It’ll require you to eat your veggies, not just that delicious junk food.
You won’t have to rely on story points for your time estimates. You will have accountability across your whole team. And most important of all, you’ll ship Product that customers actually love to use.
Got a story about engineering estimates? Share it with a comment below.