CrUX on BigQuery

Learn how CrUX data is structured on BigQuery.

Published on Thursday, June 23, 2022 • Updated on Thursday, November 10, 2022

Introduction

The raw data behind the Chrome UX Report (CrUX) is available on BigQuery, a database hosted on the Google Cloud Platform (GCP).

CrUX on BigQuery allows users to directly query the full dataset going back to 2017, for example to analyze trends, compare web technologies and benchmark domains.

The data is organized into monthly release tables, plus a number of summary tables that make common queries simpler. Both are documented below.

The BigQuery data is the basis of the CrUX Dashboard, which allows you to visualize this data without writing SQL queries.

Accessing the dataset in GCP

Using BigQuery requires a GCP project and basic knowledge of SQL. The CrUX dataset on BigQuery is free to access and explore up to the limits of BigQuery's free tier, which renews monthly. Additionally, new GCP users may be eligible for a signup credit to cover expenses beyond the free tier. Note that a credit card must be provided for the GCP project; see "Why do I need to provide a credit card?".

If this is your first time using BigQuery, follow the steps below to set up a project:

1. Navigate to Google Cloud Platform.
2. Click Create a Project.
3. Give your new project a name like “My Chrome UX Report” and click Create.
4. Provide your billing information if prompted.
5. Navigate to the CrUX dataset on BigQuery.

Now you’re ready to start querying the dataset.

For example queries, see the getting started guide.

Project organization

Each month's CrUX data is released on BigQuery on the second Tuesday of the following month. Each month is released as a new table under chrome-ux-report.all, named after the month in YYYYMM form. There are also a number of materialized tables which provide summary statistics for each month.

chrome-ux-report
- all
  - YYYYMM
- country_CC
  - YYYYMM
- experimental
  - country
  - global
- materialized
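
The naming and release rules above can be sketched in a few lines of Python. This is only an illustration: the helper names `monthly_table` and `expected_release_date` are hypothetical, and the exact spelling of the country code in `country_CC` dataset names (for example lowercase) is an assumption to check against the dataset listing.

```python
import calendar
from datetime import date

PROJECT = "chrome-ux-report"


def monthly_table(yyyymm: int, dataset: str = "all") -> str:
    """Return the fully qualified ID of one monthly table (hypothetical helper)."""
    _, month = divmod(yyyymm, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {yyyymm}")
    # Monthly tables are named YYYYMM inside the chosen dataset.
    return f"{PROJECT}.{dataset}.{yyyymm:06d}"


def expected_release_date(yyyymm: int) -> date:
    """Second Tuesday of the month that follows the collection month."""
    year, month = divmod(yyyymm, 100)
    # Roll over to the following month, wrapping December into January.
    if month == 12:
        year, month = year + 1, 1
    else:
        month += 1
    # itermonthdates also yields padding days from adjacent months, so filter them out.
    tuesdays = [
        d
        for d in calendar.Calendar().itermonthdates(year, month)
        if d.month == month and d.weekday() == calendar.TUESDAY
    ]
    return tuesdays[1]


print(monthly_table(202204))          # chrome-ux-report.all.202204
print(expected_release_date(202204))  # 2022-05-10
```

When a table ID like this is used inside a SQL query, it needs to be wrapped in backticks because the project name contains hyphens.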

Detailed table schema

Raw tables

The raw tables in each country_CC dataset and in the all dataset share the following schema:

origin
effective_connection_type
form_factor
first_paint
first_contentful_paint
largest_contentful_paint
dom_content_loaded
onload
first_input
- delay
layout_instability
- cumulative_layout_shift
experimental
- permission
  - notifications
- time_to_first_byte
- interaction_to_next_paint
- popularity
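
As a rough illustration of how a raw table might be queried, the sketch below builds a SQL string that sums the LCP histogram density below a threshold for one origin. It assumes that each metric field exposes a repeated `histogram.bin` record with `start` and `density` sub-fields, which is not spelled out in the field list above, and the function name `good_lcp_query` is hypothetical. The origin is left as a named query parameter (`@origin`) to be supplied when the query is run.

```python
def good_lcp_query(yyyymm: int, threshold_ms: int = 2500) -> str:
    """Build a query for the fraction of LCP experiences under threshold_ms."""
    table = f"chrome-ux-report.all.{yyyymm}"
    # Assumption: histogram bins live under <metric>.histogram.bin and carry
    # `start` (milliseconds) and `density` (fraction of experiences).
    return f"""
SELECT
  SUM(bin.density) AS good_lcp
FROM
  `{table}`,
  UNNEST(largest_contentful_paint.histogram.bin) AS bin
WHERE
  origin = @origin
  AND bin.start < {int(threshold_ms)}
""".strip()


print(good_lcp_query(202204))
```

The string can be pasted into the BigQuery console with `@origin` replaced by a quoted origin, or passed to a BigQuery client together with a query parameter.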

Materialized table schema

Materialized tables are provided for easy access to summary data by a number of key dimensions. No histograms are provided; instead, performance data is aggregated into fractions by performance assessment, along with the 75th percentile value. Example rows from the metrics_summary table are shown below:

yyyymm	origin	fast_lcp	avg_lcp	slow_lcp	p75_lcp
202204	https://example.com	0.9056	0.0635	0.0301	1600
202203	https://example.com	0.9209	0.052	0.0274	1400
202202	https://example.com	0.9169	0.0545	0.0284	1500
202201	https://example.com	0.9072	0.0626	0.0298	1500

This shows that in the 202204 dataset, 90.56% of real-user experiences on https://example.com met the criteria for good LCP, and that the coarse 75th percentile LCP value was 1,600ms. This is slightly slower than previous months.
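
To make the relationship between histograms and these summary columns concrete, here is a minimal sketch of how fractions and a coarse 75th percentile could be derived from histogram bins. It is not the pipeline that produces the materialized tables. The `Bin` and `Summary` types are hypothetical, the 2,500 ms and 4,000 ms LCP thresholds are the standard Core Web Vitals boundaries, and taking the start of the bin where cumulative density reaches 75% is one plausible reading of "coarse".

```python
from typing import Iterable, NamedTuple, Optional


class Bin(NamedTuple):
    start: int      # lower bound of the bin, in milliseconds
    end: int        # upper bound of the bin, in milliseconds
    density: float  # fraction of experiences that fall in this bin


class Summary(NamedTuple):
    fast: float
    avg: float
    slow: float
    p75: Optional[int]


def summarize_lcp(bins: Iterable[Bin], good_below: int = 2500, poor_from: int = 4000) -> Summary:
    """Aggregate histogram bins into fast/avg/slow fractions and a coarse p75."""
    bins = sorted(bins, key=lambda b: b.start)
    total = sum(b.density for b in bins)
    if total <= 0:
        raise ValueError("histogram has no density")
    # Assumes bin boundaries line up with the thresholds, so each bin
    # falls entirely into one bucket based on its start value.
    fast = sum(b.density for b in bins if b.start < good_below) / total
    avg = sum(b.density for b in bins if good_below <= b.start < poor_from) / total
    slow = sum(b.density for b in bins if b.start >= poor_from) / total
    # Coarse p75: start of the first bin where cumulative density reaches 75%.
    cumulative = 0.0
    p75 = None
    for b in bins:
        cumulative += b.density
        if cumulative >= 0.75 * total:
            p75 = b.start
            break
    return Summary(fast, avg, slow, p75)


example = [Bin(0, 1000, 0.5), Bin(1000, 2500, 0.3), Bin(2500, 4000, 0.15), Bin(4000, 10000, 0.05)]
print(summarize_lcp(example))
```

With real CrUX histograms the bins are much narrower than in this toy input, so the coarse p75 lands closer to the true percentile.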

Four materialized tables are provided:

metrics_summary: key metrics by month and origin
device_summary: key metrics by month, origin and device type
country_summary: key metrics by month, origin, device type and country
origin_summary: a list of all origins included in the dataset

metrics_summary

The metrics_summary table contains summary statistics for each origin and each monthly dataset:

yyyymm: Month of the data collection period
origin: URL of the site origin
rank: Coarse popularity ranking (as of March 2021)
[small|medium|large]_cls: fraction of traffic by CLS thresholds
[fast|avg|slow]_<metric>: fraction of traffic by performance thresholds
p75_<metric>: 75th percentile value of performance metrics (milliseconds)
notification_permission_[accept|deny|ignore|dismiss]: fraction of notification permission behaviors
[desktop|phone|tablet]Density: fraction of traffic by form factor
[_4G|_3G|_2G|slow2G|offline]Density: fraction of traffic by effective connection type
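
A query against these columns might look like the sketch below, which builds SQL for an origin's LCP trend over a range of months. The function name `lcp_trend_query` is hypothetical, and the query assumes that `yyyymm` is stored as an integer such as 202204; adjust the filter if the column turns out to be a string.

```python
def lcp_trend_query(first_yyyymm: int, last_yyyymm: int) -> str:
    """Build a query for monthly LCP summary columns of a single origin."""
    if first_yyyymm > last_yyyymm:
        raise ValueError("first_yyyymm must not be later than last_yyyymm")
    # Assumption: yyyymm is an INT64 column, so a numeric BETWEEN filter works.
    return f"""
SELECT
  yyyymm,
  fast_lcp,
  avg_lcp,
  slow_lcp,
  p75_lcp
FROM
  `chrome-ux-report.materialized.metrics_summary`
WHERE
  origin = @origin
  AND yyyymm BETWEEN {int(first_yyyymm)} AND {int(last_yyyymm)}
ORDER BY
  yyyymm DESC
""".strip()


print(lcp_trend_query(202201, 202204))
```

Run with `@origin` set to https://example.com, a query of this shape would return rows like the example table shown earlier in this section.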

device_summary

The device_summary table contains aggregated statistics by month, origin and device. In addition to the metrics_summary columns there is:

device: Device form factor

country_summary

The country_summary table contains aggregated statistics by month, origin, country and device. In addition to the metrics_summary columns there is:

country_code: Two-letter country code
device: Device form factor

origin_summary

The origin_summary table contains a list of all origins in the CrUX dataset; it is updated monthly with the latest list of origins in the dataset and has a single column: origin.

Experimental dataset

Tables in the experimental dataset are exact copies of the default YYYYMM tables, but they make use of newer and more advanced BigQuery features like partitioning and clustering that enable you to write faster, simpler, and cheaper queries.

Country

The experimental.country dataset contains aggregated data from the country_CC datasets. The schema is identical to the raw tables, with the addition of a yyyymm column for the dataset date and a country_code column. This allows country-level comparisons over time to be queried without joining the monthly tables.

Global

The experimental.global dataset contains aggregated data from the all dataset. The schema is identical to the raw tables, with the addition of a yyyymm column for the dataset date. This allows comparisons over time to be queried without joining the monthly tables.
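
The difference between combining monthly tables and filtering a single experimental table can be sketched as follows. Both functions build SQL that counts distinct origins per month. The function names are hypothetical, and the second query assumes that `yyyymm` in experimental.global is an integer column.

```python
def month_ids(first_yyyymm: int, last_yyyymm: int) -> list:
    """List every YYYYMM value from first to last, inclusive."""
    months = []
    year, month = divmod(first_yyyymm, 100)
    while year * 100 + month <= last_yyyymm:
        months.append(year * 100 + month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def monthly_tables_query(first_yyyymm: int, last_yyyymm: int) -> str:
    """One SELECT per monthly table, stitched together with UNION ALL."""
    parts = [
        f"SELECT {m} AS yyyymm, COUNT(DISTINCT origin) AS origins "
        f"FROM `chrome-ux-report.all.{m}`"
        for m in month_ids(first_yyyymm, last_yyyymm)
    ]
    return "\nUNION ALL\n".join(parts)


def experimental_global_query(first_yyyymm: int, last_yyyymm: int) -> str:
    """A single query over experimental.global, filtered by yyyymm."""
    # Assumption: yyyymm is an INT64 column in the experimental tables.
    return (
        "SELECT yyyymm, COUNT(DISTINCT origin) AS origins\n"
        "FROM `chrome-ux-report.experimental.global`\n"
        f"WHERE yyyymm BETWEEN {int(first_yyyymm)} AND {int(last_yyyymm)}\n"
        "GROUP BY yyyymm"
    )


print(monthly_tables_query(202111, 202202))
print(experimental_global_query(202111, 202202))
```

The first query grows by one SELECT for every month in the range, while the second stays the same size no matter how many months it covers.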
