CrUX on BigQuery
Learn how CrUX data is structured on BigQuery.
Introduction
The raw data behind the Chrome UX Report (CrUX) is available on BigQuery, a database hosted on the Google Cloud Platform (GCP).
CrUX on BigQuery allows users to directly query the full dataset going back to 2017, for example to analyze trends, compare web technologies and benchmark domains.
The data is organized into monthly release tables, alongside a number of summary tables that provide simpler access to the data. These are documented further below.
The BigQuery data is the basis of the CrUX Dashboard, which allows you to visualize this data without writing SQL queries.
Accessing the dataset in GCP
Using BigQuery requires a GCP project and basic knowledge of SQL. The CrUX dataset on BigQuery is free to access and explore up to the limits of BigQuery's free tier, which renews monthly. Additionally, new GCP users may be eligible for a signup credit to cover expenses beyond the free tier. Note that a credit card must be provided for the GCP project; see Why do I need to provide a credit card? for details.
If this is your first time using BigQuery, follow the steps below to set up a project:
- Navigate to Google Cloud Platform.
- Click Create a Project.
- Give your new project a name like “My Chrome UX Report” and click Create.
- Provide your billing information if prompted.
- Navigate to the CrUX dataset on BigQuery.
Now you’re ready to start querying the dataset.
For example queries see the getting started guide.
Project organization
Each month's CrUX data is released on BigQuery on the second Tuesday of the following month. Each month is released as a new table under `chrome-ux-report.all`. There are also a number of materialized tables which provide summary statistics for each month.
- chrome-ux-report
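The release schedule and table naming described above can be sketched in a few lines of Python. This is an illustrative helper, not part of any CrUX tooling: the function names `release_date` and `monthly_table` are hypothetical, and the computed date only applies the "second Tuesday" rule stated here, so actual publication dates may differ.

```python
import datetime


def release_date(year, month):
    """Second Tuesday of the month after the given data month."""
    # Roll over to January of the next year after December.
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    first_day = datetime.date(next_year, next_month, 1)
    # date.weekday() returns 0 for Monday, so Tuesday is 1.
    days_to_first_tuesday = (1 - first_day.weekday()) % 7
    return first_day + datetime.timedelta(days=days_to_first_tuesday + 7)


def monthly_table(year, month):
    """Fully qualified name of the monthly table in the `all` dataset."""
    return f"chrome-ux-report.all.{year:04d}{month:02d}"


print(monthly_table(2022, 4))  # chrome-ux-report.all.202204
print(release_date(2022, 4))   # 2022-05-10
```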
Detailed table schema
Raw tables
The raw tables for each country and the `all` dataset have the following schema:
- origin
- effective_connection_type
- form_factor
- first_paint
- first_contentful_paint
- largest_contentful_paint
- dom_content_loaded
- onload
- first_input
- delay
- layout_instability
- cumulative_layout_shift
- experimental
- permission
- notifications
- time_to_first_byte
- interaction_to_next_paint
- popularity
- permission
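As a rough sketch of how the raw schema is typically queried: each metric field is assumed to hold a histogram whose bins carry `start`, `end` and `density` values. That detail is not spelled out in the list above and is stated here as an assumption. The query text and the helper `fraction_below` are illustrative, and the sample bins are made up for demonstration, not real CrUX data.

```python
# Illustrative query: fraction of page loads with LCP under 2,500 ms
# for one origin in one monthly table. The histogram.bin layout
# (start, end, density) is an assumption about the raw schema.
FAST_LCP_QUERY = """
SELECT
  SUM(bin.density) AS fast_lcp
FROM
  `chrome-ux-report.all.202204`,
  UNNEST(largest_contentful_paint.histogram.bin) AS bin
WHERE
  origin = 'https://example.com'
  AND bin.start < 2500
"""


def fraction_below(bins, threshold_ms):
    """Sum the density of all bins that start below the threshold."""
    return sum(b["density"] for b in bins if b["start"] < threshold_ms)


# Made-up bins for demonstration only (not real CrUX data).
sample_bins = [
    {"start": 0, "end": 2500, "density": 0.90},
    {"start": 2500, "end": 4000, "density": 0.06},
    {"start": 4000, "end": None, "density": 0.04},
]
print(fraction_below(sample_bins, 2500))
```

The local helper mirrors what the `SUM(bin.density)` expression does in the query, which makes the aggregation easy to check without running anything on BigQuery.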
Materialized table schema
Materialized tables are provided for easy access to summary data by a number of key dimensions. No histograms are provided; instead, performance data is aggregated into fractions by performance assessment, together with the 75th percentile value. Example rows from the `metrics_summary` table are shown below:
yyyymm | origin | fast_lcp | avg_lcp | slow_lcp | p75_lcp |
---|---|---|---|---|---|
202204 | https://example.com | 0.9056 | 0.0635 | 0.0301 | 1600 |
202203 | https://example.com | 0.9209 | 0.052 | 0.0274 | 1400 |
202202 | https://example.com | 0.9169 | 0.0545 | 0.0284 | 1500 |
202201 | https://example.com | 0.9072 | 0.0626 | 0.0298 | 1500 |
This shows that in the 202204 dataset, 90.56% of real-user experiences on https://example.com met the criteria for good LCP, and that the coarse 75th percentile LCP value was 1,600 ms. This is slightly slower than in previous months.
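To make the reading of these rows concrete, the following sketch takes the four example rows from the table and reports the share of good LCP experiences and the month-over-month change in the 75th percentile. The tuple layout and variable names are chosen for illustration only.

```python
# Example rows copied from the table above:
# (yyyymm, fast_lcp, avg_lcp, slow_lcp, p75_lcp in ms)
rows = [
    (202204, 0.9056, 0.0635, 0.0301, 1600),
    (202203, 0.9209, 0.0520, 0.0274, 1400),
    (202202, 0.9169, 0.0545, 0.0284, 1500),
    (202201, 0.9072, 0.0626, 0.0298, 1500),
]

# Sort oldest first so each month can be compared with the previous one.
report = []
previous_p75 = None
for yyyymm, fast, avg, slow, p75 in sorted(rows):
    change = None if previous_p75 is None else p75 - previous_p75
    report.append((yyyymm, round(fast * 100, 2), p75, change))
    previous_p75 = p75

for yyyymm, good_pct, p75, change in report:
    print(yyyymm, f"{good_pct}% good LCP", f"p75={p75} ms", f"change={change}")
```

For 202204 the change is +200 ms relative to 202203, which is the "slightly slower" result described above.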
Four materialized tables are provided:
metrics_summary
- key metrics by month and origin
device_summary
- key metrics by month, origin and device type
country_summary
- key metrics by month, origin, device type and country
origin_summary
- a list of all origins included in the dataset
metrics_summary
The `metrics_summary` table contains summary statistics for each origin and each monthly dataset:
yyyymm
- Month of the data collection period
origin
- URL of the site origin
rank
- Coarse popularity ranking (as of March 2021)
[small|medium|large]_cls
- fraction of traffic by CLS thresholds
[fast|avg|slow]_<metric>
- fraction of traffic by performance thresholds
p75_<metric>
- 75th percentile value of performance metrics (milliseconds)
notification_permission_[accept|deny|ignore|dismiss]
- fraction of notification permission behaviors
[desktop|phone|tablet]Density
- fraction of traffic by form factor
[_4G|_3G|_2G|slow2G|offline]Density
- fraction of traffic by effective connection type
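The bracketed column patterns above expand to concrete column names. The helper below sketches that expansion for a single metric suffix. `summary_columns` is a hypothetical name, the suffix `lcp` is taken from the example table, and other suffixes (including a `p75_cls` column) are assumed to follow the same pattern.

```python
def summary_columns(metric):
    """Expand the documented column patterns for one metric suffix."""
    if metric == "cls":
        # CLS uses size-based labels instead of speed-based ones.
        labels = ["small", "medium", "large"]
    else:
        labels = ["fast", "avg", "slow"]
    # Assumption: every metric also has a matching p75_<metric> column.
    return [f"{label}_{metric}" for label in labels] + [f"p75_{metric}"]


print(summary_columns("lcp"))
print(summary_columns("cls"))
```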
device_summary
The `device_summary` table contains aggregated statistics by month, origin and device. In addition to the `metrics_summary` columns there is:
device
- Device form factor
country_summary
The `country_summary` table contains aggregated statistics by month, origin, country and device. In addition to the `metrics_summary` columns there are:
country_code
- Two-letter country code
device
- Device form factor
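A query against `country_summary` might be assembled as in the sketch below. Several details are assumptions rather than facts from this page: that the table lives in a dataset named `materialized`, that `yyyymm` can be compared with an integer, and that country codes are stored in the same case as the value passed in. The function name `country_summary_query` is hypothetical.

```python
import re


def country_summary_query(origin, country_code, yyyymm):
    """Build a query for one origin, country and month."""
    # Basic validation, since the values are interpolated into SQL text.
    if not re.fullmatch(r"[A-Za-z]{2}", country_code):
        raise ValueError("country_code must be two letters")
    if "'" in origin or "\\" in origin:
        raise ValueError("origin must not contain quotes or backslashes")
    return (
        "SELECT yyyymm, origin, country_code, device, p75_lcp\n"
        # Assumed dataset name for the materialized tables.
        "FROM `chrome-ux-report.materialized.country_summary`\n"
        f"WHERE origin = '{origin}'\n"
        f"  AND country_code = '{country_code}'\n"
        f"  AND yyyymm = {int(yyyymm)}\n"
        "ORDER BY device"
    )


print(country_summary_query("https://example.com", "us", 202204))
```

The returned text can be pasted into the BigQuery console. For anything beyond experimentation, query parameters are a safer choice than string interpolation.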
origin_summary
The `origin_summary` table contains a list of all origins in the CrUX dataset. It is updated monthly with the latest list of origins and has a single column: `origin`.
Experimental dataset
Tables in the experimental dataset are exact copies of the default `YYYYMM` tables, but they make use of newer and more advanced BigQuery features like partitioning and clustering that enable you to write faster, simpler, and cheaper queries.
Country
The `experimental.country` dataset contains aggregated data from the `country_CC` datasets with an additional `yyyymm` column for the dataset date. The schema is identical to the raw tables with the addition of the date and `country_code` columns, so country-level comparisons over time can be run without joining the monthly tables.
Global
The `experimental.global` dataset contains aggregated data from the `all` dataset with an additional `yyyymm` column for the dataset date. The schema is identical to the raw tables with the addition of the date, so comparisons over time can be run without joining the monthly tables.
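The difference between the two approaches can be illustrated as follows. The helper `monthly_tables` (a hypothetical name) lists the separate monthly tables that a trend query would otherwise have to combine, while the query string reads a range of months from `experimental.global` in one pass. The histogram bin layout and the integer comparison on `yyyymm` are assumptions, not details confirmed on this page.

```python
def monthly_tables(start_yyyymm, end_yyyymm):
    """List monthly table names in the `all` dataset, inclusive range."""
    year, month = divmod(start_yyyymm, 100)
    end_year, end_month = divmod(end_yyyymm, 100)
    tables = []
    while (year, month) <= (end_year, end_month):
        tables.append(f"chrome-ux-report.all.{year:04d}{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return tables


# Illustrative single-table alternative using the yyyymm column.
GLOBAL_TREND_QUERY = """
SELECT
  yyyymm,
  SUM(bin.density) AS fast_lcp
FROM
  `chrome-ux-report.experimental.global`,
  UNNEST(largest_contentful_paint.histogram.bin) AS bin
WHERE
  origin = 'https://example.com'
  AND yyyymm BETWEEN 202111 AND 202202
  AND bin.start < 2500
GROUP BY yyyymm
ORDER BY yyyymm
"""

# Four separate tables would be needed without the experimental dataset.
print(monthly_tables(202111, 202202))
```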