August 6, 2020
Lighthouse returns several scores on a scale of 0 to 100 that portray how well a site or application is doing within a given category (Performance, Accessibility, SEO, Best Practices and Progressive Web Applications). The Performance Score is not only a high-level portrayal of site speed but also an essential factor in site ranking, since the PageSpeed score is now the same as the Performance Score.
Recently, with the release of Lighthouse 6, the algorithm behind the Performance Score changed significantly. The Performance Score is now calculated differently for mobile and desktop devices, which for some sites results in significant changes to their scoring.
In this article, we explain how the Performance Score is calculated and what metrics to track to avoid regressions in your overall site speed.
In Lighthouse 6, the score is calculated from six metrics: First Meaningful Paint and First CPU Idle have been removed, replaced by Largest Contentful Paint, Total Blocking Time and Cumulative Layout Shift.
Each of the metrics contributing to the overall score has its own weighting, with more emphasis placed on the new generation of measurements focused on user experience:
| Metric name | Weight | Recommended range |
| --- | --- | --- |
| First Contentful Paint | 15% | ≤ 2s |
| Speed Index | 15% | ≤ 4.3s |
| Largest Contentful Paint | 25% | ≤ 2.5s |
| Cumulative Layout Shift | 5% | ≤ 0.1 |
| Time to Interactive | 15% | ≤ 3.8s |
| Total Blocking Time | 25% | ≤ 300ms |
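As a sketch of how these weights combine: assuming each metric has already been scored between 0 and 1 on its own curve, the overall Performance Score is a weighted average, scaled to 100. The dictionary keys below follow Lighthouse's audit ids; the weights come from the table above.

```python
# Lighthouse 6 metric weights, as listed in the table above.
WEIGHTS = {
    "first-contentful-paint": 0.15,
    "speed-index": 0.15,
    "largest-contentful-paint": 0.25,
    "cumulative-layout-shift": 0.05,
    "interactive": 0.15,
    "total-blocking-time": 0.25,
}

def performance_score(metric_scores):
    """Combine per-metric scores (each between 0 and 1) into the
    overall Performance Score on a 0-100 scale."""
    weighted = sum(metric_scores[name] * weight
                   for name, weight in WEIGHTS.items())
    return round(weighted * 100)
```

Note that the weights sum to 1, so a site scoring perfectly on every metric scores exactly 100.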
The Lighthouse 6 scoring algorithm visibly prioritises the recently announced Core Web Vitals: a set of metrics portraying loading, interactivity and visual stability. Core Web Vitals are Largest Contentful Paint, Cumulative Layout Shift and First Input Delay; because First Input Delay requires real user input, lab tools like Lighthouse use Total Blocking Time as its proxy.
While all metrics contributing to the Performance Score are essential to track separately, Core Web Vitals (with Total Blocking Time standing in for First Input Delay) account for 55% of the score weighting, so not tracking them is hard to justify.
The score calculation does not end at collecting the metrics mentioned above. Lighthouse uses HTTP Archive data to determine two control points for each metric and shape a log-normal curve that will be used to determine where the value falls on the distribution chart:
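As a sketch of that curve-fitting step: in Lighthouse 6 the two control points are the median of the distribution (which scores 0.5) and its 10th percentile (which scores 0.9). A minimal Python approximation of how a metric value could be placed on such a log-normal curve follows; the constant 1.28155 is the standard normal 90th-percentile z-score, and the exact calibration Lighthouse uses may differ in detail.

```python
import math

def log_normal_score(value, median, p10):
    """Place a metric value on a log-normal curve anchored at two
    control points: `median` scores 0.5 and `p10` scores 0.9.
    Returns the share of the distribution slower than `value`."""
    if value <= 0:
        return 1.0
    # In log space, the 10th percentile sits 1.28155 standard
    # deviations below the median, which pins down sigma.
    sigma = (math.log(median) - math.log(p10)) / 1.28155
    z = (math.log(value) - math.log(median)) / sigma
    # 1 - normal CDF, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

The curve is steep near the control points, which is why a modest regression in a metric can move its score (and the overall Performance Score) noticeably.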
The Time to Interactive chart above illustrates that there’s a narrow range of values that will be classified as good (namely, below 4s).
Your metric readings are effectively graded based on real website data (or, in other words, compared to how well sites perform worldwide in general).
This approach ensures the metrics are put in necessary context before establishing which category they fall into—poor, needs improvement or good. Before Lighthouse 6, the reference data did not include desktop references, which resulted in desktop Performance Scores being artificially inflated. In Lighthouse 6, mobile and desktop score calculations use different reference data, which will result in disparities in the Performance Score for desktop devices when switching from versions 5 to 6.
To summarise, your Performance Score will depend on six different metrics and how those measurements compare to real, historical website data collected by HTTP Archive.
There is an inherent variability built into performance testing that comes from the characteristics of the web and networking technologies. Often, this variability can be a source of confusion, frustration and lack of trust in performance monitoring tools.
While outlier results can be removed by test verification engines, some variance in monitoring data will persist and cannot be bypassed by any performance tool.
It’s critical to understand that a certain level of variability is to be expected. Even consecutive test runs may yield slightly different metrics, depending on server response time and how quickly resources are fetched, parsed and executed. The Performance Score is especially prone to these fluctuations, since it’s calculated from paint- and runtime-specific performance metrics.
Measurements such as Largest Contentful Paint, Cumulative Layout Shift and First Contentful Paint will be affected by delays in obtaining resources—especially ones that block paints. In consequence, the Performance Score might be different each time due to networking variabilities and how they influence specific metric values.
It’s always worth investigating sustained, significant changes in the Performance Score and its underlying metrics. Most of the time, chasing one-off differences is unhelpful and statistically irrelevant.
At first glance, it might seem plausible that if speed monitoring tools are using Lighthouse, they should be returning identical scores. In reality, performing tests with Lighthouse is only a fraction of the story. Lighthouse tests can be conducted in a variety of ways—for example, using bandwidth and device throttling as the test is being run or applying specific simulations after the metrics are collected, which results in different readings.
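To illustrate, Lighthouse’s Node API accepts a config that controls the throttling method. The sketch below uses Lighthouse’s documented settings with its mobile default throttling values at the time of writing; check the Lighthouse configuration documentation for the exact option names in your version.

```javascript
// custom-config.js — a minimal Lighthouse config sketch.
// 'simulate' estimates throttled metrics after an unthrottled run;
// 'devtools' applies network/CPU throttling while the page loads;
// 'provided' trusts whatever environment the test runs in.
module.exports = {
  extends: 'lighthouse:default',
  settings: {
    throttlingMethod: 'simulate',
    throttling: {
      rttMs: 150,                // simulated round-trip time
      throughputKbps: 1638.4,    // ~1.6 Mbps simulated bandwidth
      cpuSlowdownMultiplier: 4,  // simulated CPU slowdown
    },
  },
};
```

Two services can both “run Lighthouse” and still report different numbers if they differ on any of these settings.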
The conditions the tests are run in will play a significant part in the results, too. The geographical location of where tests originate from, network and emulation speed, as well as CPU and GPU capabilities of the hardware, will produce vast disparities in reporting.
It’s impossible to compare Lighthouse results between different services. Only when recording in a stable environment, under the same conditions, will we be able to draw confident conclusions about our site’s performance.
If you’d like to know more about the differences in Lighthouse tests, read our guide on differences between Calibre and PageSpeed Insights. Expecting the same results from various tools is one of the common mistakes teams should avoid when monitoring performance.
A significant first step in improving your Performance Score is continuously monitoring it in a variety of user scenarios. Once we can quantify the quality of user experience, we will be better equipped to act on this information.
Each Lighthouse test provides actionable recommendations on how to improve your speed (you can find them in the Snapshot → Performance tab in Calibre). Now that you know which metrics contribute to the Performance Score, you can also investigate them individually to make targeted improvements.
Based on the type of metrics involved in the score calculation, it is worth investing time in the following:
Armed with an understanding of the scoring algorithm and the most common areas to focus on, you can make tangible improvements to your speed metrics, including the Performance Score.