June 3, 2020
Often, there are significant challenges in introducing web performance monitoring to teams and organisations. Not only do we have to convince decision-makers that performance is worth investing in, but we also need to explain core concepts surrounding site speed. It’s not an easy task. Even when we succeed in establishing a culture of valuing and paying attention to performance, there are still hurdles ahead of us.
Building a performance product (being deeply embedded in the space, continually talking to existing and potential customers as well as fellow performance experts) puts us in a unique position. We can hear the most common questions and hurdles from a wide variety of voices. From expertise and feedback, patterns emerge.
In this post, we will describe the most common, high-level reasons why teams struggle to be successful in improving site speed.
When looking at time series charts portraying performance monitoring data, it’s understandable to search for any deviations from regularly reported numbers. A spike downwards or upwards, even if occurring only once, becomes a point of interest and analysis. We want to make sure that there’s no persistent, potentially degrading change affecting the user experience, and rightly so.
Unless we are leveraging deployment tracking and know there has been a recent release that might have caused a change, a one-off spike is likely to be noise. To be effective at understanding and analysing speed data, we have to be aware of several crucial facts about both statistics and performance monitoring.
Knowing that variability is guaranteed, and understanding when a change is persistent, will not only make data less confusing to digest but also teach your team when it’s time to react. Many people from different backgrounds can view performance dashboards. Not knowing what constitutes a meaningful change that needs investigation can result in false positives, a lack of trust in monitoring and cycles spent looking for the causes of regressions or improvements that aren’t there.
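One way to tell a one-off spike from a persistent change is to compare the median of the most recent test runs against the median of an earlier baseline. The snippet below is a minimal sketch of that idea, not a description of how any particular monitoring product works. The function name, the 10% threshold and the sample readings are all assumptions made for illustration.

```python
from statistics import median


def is_persistent_change(history, recent, threshold=0.10):
    """Return True when the median of `recent` differs from the median of
    `history` by more than `threshold` (a fraction of the baseline)."""
    if not history or not recent:
        raise ValueError("both series need at least one measurement")
    baseline = median(history)
    if baseline == 0:
        raise ValueError("baseline median must be non-zero")
    current = median(recent)
    return abs(current - baseline) / baseline > threshold


# Hypothetical load-time readings in milliseconds (illustrative only).
baseline_runs = [2100, 2050, 2200, 2150, 2080, 2120]
one_off_spike = [2110, 3400, 2090]    # a single outlier among normal runs
sustained_shift = [2600, 2650, 2580]  # every recent run is slower

print(is_persistent_change(baseline_runs, one_off_spike))    # False
print(is_persistent_change(baseline_runs, sustained_shift))  # True
```

Because the median ignores a single extreme value, the lone 3400 ms reading does not trigger an alert, while three consistently slower runs do. The right threshold and window size depend on how noisy your own measurements are.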
A significant amount of performance testing happens after changes have already reached your customers. This approach exposes teams to shipping degradations in user experience way too easily. If we test infrequently, and need several tests to confirm a regression, it might take days to notice a change that might drastically affect how your audience perceives your product.
While a lot of us are used to performance tooling analysing production environments, there’s a more bulletproof way to test. Leveraging command line interfaces that can be integrated with CI and Pull Request Reviews is a powerful way to ensure regressions don’t happen or happen with awareness.
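As a rough illustration of what such a check could look like inside a CI step, the sketch below compares metrics measured for a pull request build against a set of budgets and produces a non-zero exit code when any budget is exceeded. The metric names, budget values and measured numbers are hypothetical. In practice the measurements would come from whichever testing tool your pipeline runs.

```python
# Hypothetical budgets: metric name -> maximum allowed value.
BUDGETS = {
    "largest_contentful_paint_ms": 2500,
    "total_blocking_time_ms": 300,
    "cumulative_layout_shift": 0.1,
}


def find_violations(measured, budgets):
    """Return (metric, measured value, budget) for every metric over budget.
    Metrics that were not measured are skipped rather than treated as failures."""
    violations = []
    for metric, limit in budgets.items():
        value = measured.get(metric)
        if value is not None and value > limit:
            violations.append((metric, value, limit))
    return violations


def check(measured, budgets=BUDGETS):
    """Print each violation and return an exit code (0 = pass, 1 = fail)."""
    violations = find_violations(measured, budgets)
    for metric, value, limit in violations:
        print(f"{metric}: {value} exceeds budget of {limit}")
    return 1 if violations else 0


# Illustrative numbers standing in for a pull request build's test report.
pull_request_metrics = {
    "largest_contentful_paint_ms": 2900,
    "total_blocking_time_ms": 180,
    "cumulative_layout_shift": 0.05,
}

exit_code = check(pull_request_metrics)
print("exit code:", exit_code)
```

A CI job would pass the returned code to `sys.exit` so that the pull request is marked as failing, which makes the regression visible before it ships rather than after.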
When teams see the performance impact of their work directly where it happens, they can make conscious choices about what gets released.
A lot of performance work boils down to introducing visibility and transparency to the state of user experience. Limiting this visibility to a single point in time, when changes are already released, hinders the ability of your team to ship excellent products.
Some organisations choose to rely on multiple tools to report performance metrics. There can be value in such an approach, especially when different platforms have mutually exclusive feature sets. Teams well-versed in performance might be able to successfully leverage several tools without introducing confusion and doubt in the reported metrics.
However, in many cases, using multiple tools introduces more trouble than gain. Because each platform has a unique approach to running performance tests (for example, using WebPageTest, Lighthouse, PageSpeed, or a combination of Open Source projects with internal tooling), each will return different results.
By using a few performance monitoring tools, we are sacrificing one of the most critical success factors—having a single, reliable source of truth.
With teams looking at multiple results from multiple platforms and different environments, it’s hard to establish which one becomes the benchmark we base our decisions and work on. Suddenly, tools that couldn’t possibly provide similar results (due to infrastructure setup, geographical regions and underlying technologies; more on this in the section below) are competing with each other for credibility, distracting your team from the most critical objective: improving your speed and user experience.
Having a few tools at work can only be successful when teams have an in-depth understanding of both performance and the internals of the platforms they are using. On top of these prerequisites, organisations have to agree which information comes from which source, to avoid confusion and decisions being made based on the wrong set of metrics. With so many requirements for success, using multiple platforms turns from a source of empowerment into a source of headaches.
A widespread source of confusion, partially described above, is expecting each performance tool to return identical or very close scores. This might only be possible when using tools from a single vendor. For example, you might be able to see very similar scores using PageSpeed Insights, the web.dev Measure portal or the Lighthouse tab in Chrome DevTools. Google manages all of these tools, and there has been a significant effort put into unifying those results.
The reality of commercial platforms is very different. Firstly, each product might be leveraging various performance utilities to run tests. The most common sources include Lighthouse, WebPageTest and Chrome UX Report (CrUX) data. That fact, combined with any additional tooling that might be in place, makes a direct comparison of results impossible.
But the differences don’t end here. Contrasting test conditions will have a significant impact on test results too. The geographical location the tests run from, the applied network speeds, the CPU and GPU capabilities of the test agents, plus the presence or lack of simulation settings, are all factors influencing the end metrics you receive.
Since each platform runs on different infrastructure and with varying environment settings, it’s impossible to expect the same results.
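To get a feel for how much test conditions alone can move a number, consider a deliberately simplified model in which the time to fetch a resource is a handful of network round trips plus the time to transfer its bytes. Real page loads are far more complex, and the two connection profiles below are hypothetical rather than the settings of any specific tool, but the arithmetic shows why identical pages produce different results on different infrastructure.

```python
def estimated_fetch_ms(size_kb, bandwidth_kbps, rtt_ms, round_trips=4):
    """Simplified estimate: round trips (connection setup and request)
    plus raw transfer time. Ignores caching, parallelism and CPU work."""
    transfer_ms = size_kb * 8 * 1000 / bandwidth_kbps  # kilobytes -> kilobits
    return round_trips * rtt_ms + transfer_ms


# Hypothetical test-agent profiles (illustrative values only).
profiles = {
    "fast connection": {"bandwidth_kbps": 10_000, "rtt_ms": 20},
    "throttled mobile": {"bandwidth_kbps": 1_600, "rtt_ms": 150},
}

for name, settings in profiles.items():
    estimate = estimated_fetch_ms(size_kb=500, **settings)
    print(f"{name}: {estimate:.0f} ms")
# fast connection: 480 ms
# throttled mobile: 3100 ms
```

The same 500 KB resource differs by more than a factor of six between the two profiles, before CPU speed or geographical distance is even taken into account.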
Using several performance tools and expecting the same results can not only become a source of frustration but also result in a lack of trust in testing platforms. If teams are aware of the reasoning behind variance, they can have more confidence in both observed data and the result of their work.
There are dozens of web performance metrics. Tracking all of them would be a difficult task and, most likely, a futile effort. It’s not surprising that teams are often unsure how to define a metric set. What makes this task even harder is metrics getting quietly deprecated and different ones taking their place as best practice recommendations. Knowing which metric is relevant and in what use case is crucial knowledge, although difficult to obtain.
Because of those complexities, some organisations focus on metrics that are easier to analyse, but not necessarily helpful in their context. That’s when it becomes incredibly hard to move the needle on speed and even harder to make recommendations about switching to a more reliable metric.
Researching and defining a core set of metrics that are tracked in each project, no matter its setup or context, will remove a lot of uncertainty. When the chosen metrics portray different aspects of user experience (perceived load speed, responsiveness and visual stability of the interface), a team already has an excellent performance baseline in place.
With the understanding of what each type of metric portrays, we can make informed choices on what’s most relevant to track for our context. Adding a randomised set of metrics will result in more confusion rather than visibility and actionability.
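A quick way to sanity-check a proposed metric set is to map each metric to the aspect of user experience it portrays and see whether any aspect is left uncovered. The mapping below is an assumption made for illustration and is not a list prescribed by this post. Adjust it to the metrics and definitions your team has agreed on.

```python
# Assumed mapping of metric -> aspect of user experience (illustrative).
METRIC_ASPECTS = {
    "First Contentful Paint": "perceived load speed",
    "Largest Contentful Paint": "perceived load speed",
    "First Input Delay": "responsiveness",
    "Total Blocking Time": "responsiveness",
    "Cumulative Layout Shift": "visual stability",
}

REQUIRED_ASPECTS = {"perceived load speed", "responsiveness", "visual stability"}


def missing_aspects(metric_set):
    """Return the aspects that no metric in `metric_set` covers.
    Metrics absent from METRIC_ASPECTS are ignored."""
    covered = {METRIC_ASPECTS[m] for m in metric_set if m in METRIC_ASPECTS}
    return REQUIRED_ASPECTS - covered


load_only = ["First Contentful Paint", "Largest Contentful Paint"]
balanced = ["Largest Contentful Paint", "Total Blocking Time", "Cumulative Layout Shift"]

print(sorted(missing_aspects(load_only)))  # ['responsiveness', 'visual stability']
print(sorted(missing_aspects(balanced)))   # []
```

A set made only of load metrics looks thorough but says nothing about responsiveness or visual stability, whereas three well-chosen metrics can cover all three aspects.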
Knowing the most common roadblocks in improving speed will make your team more effective at thinking strategically about performance. While establishing a working performance system might be challenging, the benefits of better speed and user experience are well worth the effort.