5 Common Mistakes Teams Make When Tracking Performance

Karolina Szczur

June 3, 2020

Often, there are significant challenges in introducing web performance monitoring to teams and organisations. Not only do we have to convince decision-makers that performance is worth investing in, but we also need to explain core concepts surrounding site speed. It’s not an easy task. Even when we succeed in establishing a culture of valuing and paying attention to performance, there are still hurdles ahead of us.

Building a performance product means being deeply embedded in the space and continually talking to existing and potential customers as well as fellow performance experts, which puts us in a unique position. We hear the most common questions and hurdles from a wide variety of voices. From that expertise and feedback, patterns emerge.

In this post, we will describe the most common, high-level reasons why teams struggle to be successful in improving site speed.


Relying on single test results

When looking at time series charts portraying performance monitoring data, it’s understandable to search for any deviations from regularly reported numbers. A spike downwards or upwards, even if occurring only once, becomes a point of interest and analysis. We want to make sure that there’s no persistent, potentially degrading change affecting the user experience, and rightly so.

Unless we are leveraging deployment tracking and know there has been a recent release that might have caused a change, a one-off spike is likely to be noise. To be effective at understanding and analysing speed data, we have to be aware of several crucial facts about both statistics and performance monitoring:

  • Variability is inevitable.
  • Two data points don’t make a trend.
  • Data has meaning when it’s analysed with context and over time.

Knowing that variability is guaranteed, and understanding when a change is persistent, will not only make data less confusing to digest but also teach your team when it’s time to react. Many people from different backgrounds can view performance dashboards. Not knowing what constitutes a meaningful change that needs investigation can result in false positives, a lack of trust in monitoring and cycles spent looking for the causes of performance regressions or improvements that aren’t there.
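As a rough illustration of telling a one-off spike from a persistent change, the sketch below compares the median of the most recent tests with the median of an earlier baseline window. The function name, window sizes, 10% threshold and sample readings are all assumptions made for this example, not values from any particular monitoring tool.

```python
from statistics import median


def is_persistent_change(samples, baseline_size=10, recent_size=5, threshold=0.10):
    """Return True when the recent median moved away from the baseline median.

    Medians are used so that a single outlier does not dominate either window.
    The window sizes and the 10% relative threshold are illustrative assumptions.
    """
    if len(samples) < baseline_size + recent_size:
        return False  # not enough data to call anything a trend
    baseline = median(samples[-(baseline_size + recent_size):-recent_size])
    recent = median(samples[-recent_size:])
    if baseline == 0:
        return False
    return abs(recent - baseline) / baseline > threshold


# Hypothetical metric readings in milliseconds, one per test run.
steady = [2400, 2450, 2380, 2420, 2410, 2390, 2440, 2400, 2430, 2410]
one_spike = steady + [2420, 3900, 2400, 2410, 2430]
regression = steady + [2950, 3000, 2980, 3020, 2990]

print(is_persistent_change(one_spike))   # False: a single outlier barely moves the median
print(is_persistent_change(regression))  # True: every recent test is slower
```

A check like this is no substitute for context such as deployment tracking, but it shows why a lone data point rarely justifies an investigation.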

What we recommend:

  • Make sure that everyone working with monitoring data understands the sources of variability as well as is able to identify trends in metrics.
  • Use deployment tracking to be able to correlate releases to changes in data and disregard monitoring noise.
  • Rely on time-series metrics to spot changes, and look at individual tests only to verify specific issues.

Testing only in production

A significant amount of performance testing happens after changes have already reached your customers. This approach makes it far too easy for teams to ship degradations in user experience. If testing is infrequent, and several tests are needed to confirm a regression, it might take days to notice a change that drastically affects how your audience perceives your product.

While a lot of us are used to performance tooling analysing production environments, there’s a more robust way to test. Leveraging command line interfaces that can be integrated with CI and Pull Request Reviews is a powerful way to ensure regressions either don’t happen or are shipped knowingly.
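To make the CI idea concrete, here is a minimal sketch of a budget check that a pipeline step could run before a change is merged. The metric names, budget values and the example measurements are hypothetical; in practice the measurements would come from whatever testing tool your pipeline invokes.

```python
# Hypothetical budgets: metric name -> maximum allowed value.
BUDGETS = {"lcp_ms": 2500, "tbt_ms": 300, "cls": 0.1}


def budget_violations(metrics, budgets=BUDGETS):
    """Return a human-readable message for every budget that is not met."""
    violations = []
    for name, limit in budgets.items():
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: no measurement reported")
        elif value > limit:
            violations.append(f"{name}: {value} exceeds budget of {limit}")
    return violations


def main(metrics):
    """Print any violations and return an exit code suitable for CI."""
    problems = budget_violations(metrics)
    for problem in problems:
        print(problem)
    return 1 if problems else 0


# Hypothetical measurements for a pull request build.
example = {"lcp_ms": 3100, "tbt_ms": 250, "cls": 0.05}
exit_code = main(example)  # a real CI script would pass this to sys.exit()
print("exit code:", exit_code)
```

Returning a non-zero exit code is what lets the CI system block or flag the change, so the regression is discussed in the pull request rather than discovered in production.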


When teams see the performance impact of their work directly where it happens, they can make conscious choices about what gets released.


A lot of performance work boils down to introducing visibility and transparency to the state of user experience. Limiting this visibility to a single point in time, when changes are already released, hinders the ability of your team to ship excellent products.

What we recommend:


Using too many monitoring tools

Some organisations choose to rely on multiple tools to report performance metrics. There can be value in such an approach, especially when different platforms have mutually exclusive feature sets. Teams well-versed in performance might be able to successfully leverage several tools without introducing confusion and doubt in the reported metrics.

However, in many cases, using multiple tools introduces more trouble than gain. Because each platform has a unique approach to running performance tests (for example, using WebPageTest, Lighthouse, PageSpeed, or a combination of Open Source projects with internal tooling), each will return different results.


By using several performance monitoring tools, we are sacrificing one of the most critical success factors: having a single, reliable source of truth.


With teams looking at multiple results from multiple platforms and different environments, it’s hard to establish which one becomes the benchmark we base our decisions and work on. Suddenly, tools that couldn’t possibly provide similar results (due to infrastructure set up, geographical regions and underlying technologies; more on this in the section below) are competing with each other for credibility, distracting your team from the most critical objective—improving your speed and user experience.

Running several tools can only be successful when teams have an in-depth understanding of both performance and the internals of the platforms they are using. On top of these prerequisites, organisations have to agree on which information comes from which source, to avoid confusion and decisions being made based on the wrong set of metrics. With so many requirements for success, using multiple platforms turns from empowerment into a source of headaches.

What we recommend:

  • Reduce the number of monitoring tools to one (or two if there are valuable, mutually exclusive features).
  • If using multiple platforms, ensure there’s always a single source of truth for reported metrics and everyone with access to data understands those prerequisites.
  • Be aware that each platform will report different measurements.
  • Choose a tool that fits your entire organisation’s needs, not just your development team’s.

Expecting the same results from different tools

A widespread source of confusion, partially described above, is expecting each performance tool to return identical or very close scoring. This might only be possible when using tools from a single vendor. For example, you might be able to see very similar scoring using PageSpeed Insights, the web.dev measure portal or the Lighthouse tab in Chrome DevTools. Google manages all of these tools, and there has been a significant effort put into unifying those results.

The reality of commercial platforms is very different. Firstly, each product might be leveraging various performance utilities to run tests. The most common sources include Lighthouse, WebPageTest and Chrome UX Report (CrUX) data. That fact, combined with any additional tooling that might be in place, makes the results impossible to compare.

But the differences don’t end here. Contrasting test conditions will have a significant impact on test results too. The geographical location the tests run from, applied network speeds, CPU and GPU capabilities of the test agents, plus the presence or lack of simulation settings, are all factors influencing the end metrics you receive.
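One way to make these differences visible is to write down the conditions each tool tests under and compare them side by side. The sketch below does that with a small record type; the field names and the two example profiles are invented for illustration and do not describe any real platform's settings.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunConditions:
    """Conditions a test ran under (hypothetical fields for illustration)."""
    location: str
    download_kbps: int
    cpu_slowdown: float
    simulated_throttling: bool


def differing_conditions(a, b):
    """List the names of the conditions that differ between two profiles."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Two made-up profiles standing in for two different monitoring tools.
tool_a = RunConditions("Sydney", 1600, 4.0, True)
tool_b = RunConditions("Sydney", 10000, 1.0, False)

print(differing_conditions(tool_a, tool_b))
# ['download_kbps', 'cpu_slowdown', 'simulated_throttling']
```

Even with the same test location, three of the four conditions differ here, which is enough to explain why the two sets of numbers should not be expected to match.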


Since each platform runs on different infrastructure and with varying environment settings, it’s impossible to expect the same results.


Using several performance tools and expecting the same results can not only become a source of frustration but also result in a lack of trust in testing platforms. If teams are aware of the reasoning behind variance, they can have more confidence in both observed data and the result of their work.

What we recommend:

  • Use a primary monitoring tool to establish a performance baseline and observe trends.
  • Know that each platform will report different measurements and why.

Focusing on the wrong metrics

There are dozens of web performance metrics. Tracking all of them would be a difficult task and, most likely, a futile effort. It’s not surprising that teams are often unsure how to define a metric set. What makes this task even harder is that metrics get quietly deprecated and different ones take their place as the recommended best practice. Knowing which metric is relevant, and in what use case, is crucial knowledge, although difficult to obtain.

Because of those complexities, some organisations focus on metrics that are easier to analyse, but not necessarily helpful in their context. That’s when it becomes incredibly hard to move the needle on speed and even harder to make recommendations about switching to a more reliable metric.

Researching and defining a core set of metrics that are tracked in each project, no matter its setup or context, will remove a lot of uncertainty. When that set covers different aspects of user experience (perceived load speed, responsiveness and visual stability of the interface), a team already has an excellent performance baseline in place.

With the understanding of what each type of metric portrays, we can make informed choices on what’s most relevant to track for our context. Adding a randomised set of metrics will result in more confusion rather than visibility and actionability.

What we recommend:

  • Create a set of core metrics (that cover the load and interactivity of your pages) that are always tracked: Performance Score, Web Vitals (LCP, TBT, CLS), TTI, FCP and TTFB.
  • Tailor secondary metrics depending on your use case (for example, use runtime metrics for JavaScript-heavy applications and sites; use file size metrics to keep an eye on resources being delivered to customers).
  • Don’t track historical metrics such as onDomContentLoaded, onLoad or FMP. onDomContentLoaded and onLoad have no relevance in explaining a user experience, and FMP is officially deprecated.
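The recommendations above can be turned into a simple review of whatever metric list a project currently tracks. This is only a sketch: the two sets mirror the metrics named in this post, while the function name and the example input are assumptions.

```python
# Core and historical metrics as listed in the recommendations above.
CORE_METRICS = {"Performance Score", "LCP", "TBT", "CLS", "TTI", "FCP", "TTFB"}
HISTORICAL_METRICS = {"onDomContentLoaded", "onLoad", "FMP"}


def review_metric_set(tracked):
    """Report core metrics that are missing and historical ones still tracked."""
    tracked = set(tracked)
    return {
        "missing_core": sorted(CORE_METRICS - tracked),
        "historical": sorted(tracked & HISTORICAL_METRICS),
    }


# Hypothetical metric list from an existing dashboard.
report = review_metric_set(["LCP", "CLS", "onLoad", "FMP", "TTFB"])
print(report["missing_core"])  # ['FCP', 'Performance Score', 'TBT', 'TTI']
print(report["historical"])    # ['FMP', 'onLoad']
```

Secondary, use-case-specific metrics are deliberately left out of the check, since those are meant to vary from project to project.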

Knowing the most common roadblocks in improving speed will make your team more effective at thinking strategically about performance. While establishing a working performance system might be challenging, the benefits of better speed and user experience are well worth the effort.

Karolina Szczur

Karolina is the Product Design Lead at Calibre. With years of design experience under her belt, she’s responsible for the comprehensive user experience spanning over product, visuals and content. Find her on Twitter or LinkedIn.
