Request Access to Climatiqs data tool kit.
All posts
July 25, 2025

Data normalization at Climatiq

      Data normalization at Climatiq

      This is the second of two blog posts on emission factor normalization. In this blog, we will explain what the normalization process looks like at Climatiq specifically. To get a broader understanding of what normalization is and why it’s necessary in carbon accounting, check out our first blog post

      There are hundreds of datasets within the carbon accounting space. They come from a range of sources spanning across the globe, such as governmental bodies, NGOs, and academic institutions—each with thousands of emission factors which often relate to a specific geography or industry.

      At Climatiq, we have normalized over 200,000 (and counting!) emission factors from different datasets and for different regions by adjusting or removing variables that could introduce bias or hinder comparability. This allows users to quickly build robust emissions tracking tools on top of our database, avoiding the time and cost involved with building a bespoke dataset.

      The normalization process involves aligning naming (unifying wording or spelling), standardizing units (e.g. by converting weight from pounds (lbs) to kilograms), and clearly labelling the scope of emissions being measured (such as whether they cover a product’s entire lifecycle or just part of it). This allows our users to access data in an automated system without having to worry about comparability and compatibility.

      By normalizing emission factors into our cohesive schema, we provide a trusted, transparent, and accessible resource which makes it easy to use emission factors from various sources and apply them in programmatic calculations.

      How does Climatiq normalize data?

      • Emission factors are classified by sector, category, and region, making it simple to search for and compare data across industries, activities, and geographies. We have developed our own classification system with sectors and categories to organize emissions activities. This is because there is no single, universally adopted taxonomy for emission factors, and any existing systems are too broad, inconsistent, or not detailed enough for our needs. Our taxonomy is based on industry relevance and data granularity, and aligns emission factors with real-world business activities. 
      • Year information is clearly specified, distinguishing between the year of publication and the year for which the data is applicable, which can differ. This is because there is usually a lag to allow for the data to be collected and measured before its publication.
      • Activity units are made consistent (kgCO2e per activity unit) to eliminate confusion and variability stemming from differing units across datasets and allow for programmatic calculations. For example, if one data provider uses kgCO2e per mile travelled, we convert this to kilometers to keep one consistent unit across our database. This avoids simple errors that can easily arise in global reporting.
      • Lifecycle assessment (LCA) stage (the phase in the life of a product, process, or service during which environmental impact is assessed) is reported to define boundaries and applicability. In many cases, LCA stage is not explicitly stated in the emission factor data methodology. Our team carefully analyzes the documentation to clarify what’s included in the emission factor. 
      • We convert all data to kilograms of CO2e for each greenhouse gas (carbon dioxide, methane, and nitrous oxide). This ensures the data accurately reflects the specific impact of each gas in relation to each other and enables precise calculations of CO2e. If no gas breakdown is shown, it means the original source did not provide that information, and the factor is marked with a “partial factor” quality flag.

      Normalization is part of our wider data ingestion process, which also involves analyzing metadata, adding quality flags, and expert manual review by our Science and Data team of PhD-educated scientists and carbon accounting experts.

      All of this is done to allow our customers to:

      • Ensure comparability across 80+ emission factor datasets and 40+ trusted sources
      • Switch and combine datasets from various providers, enhancing coverage without the risk of errors
      • Create automation tools that are compatible with any data provider
      • Implement reliable, up-to-date emission factors that scale with their product and offering

      This harmonization allows users to quickly build reliable, globally applicable emissions tracking capabilities. Rather than spending months sourcing and cleaning data, they can simply integrate Climatiq’s database and trust that emission factors are reliable, current, and maintenance-free, saving time and resources for wider product development.

      A few examples from our normalization process

      BEIS—emission factor calculation method 

      Some sources provide the values for CO2, CH4, and N2O in their actual weight (in kilograms), while others report them in kilograms of CO2e (CO2 equivalent—which reflects their climate impact rather than their actual mass). For example, in the BEIS dataset, the values for CO2, CH4, and N2O are given in kg CO2e. This means the data has already been adjusted using Global Warming Potential (GWP) factors, which show how much each gas contributes to global warming compared to CO2. 

      However, since we collect and report the actual amount of each individual gas, we had to convert the CO2e values back by dividing them by their respective GWP values based on the IPCC method used.

      Electricity data from 3 sources: Green-e, AIB, Electricity Info

      We normalized electricity data from these three different sources to use the same name, activity_id, and lca_activity because they all refer to the same type of activity. This is done to ensure consistency between datasets and allow for comparability.

      The original units:

      • Green-e reports emissions in lbs/MWh
      • AIB and Electricity Info report emissions in g/kWh

      To make the data comparable, all values were converted to a common unit: kg/kWh.

      Fuel naming conventions

      In the original EPA dataset, the term “Coal and Coke” is used. We standardized the name to “Coal - anthracite” to match naming conventions in other datasets and to ensure it's searchable in the Data Explorer along with similar activities. 

      Normalization—the crucial step for comprehensive emission factor data

      Managing huge amounts of emissions data is a question of balancing breadth and consistency. If we only used a single data provider, we'd have great consistency but our users would be far more limited by data coverage and granularity. Many of our users operate globally or build tools to measure emissions across entire supply chains and business activities, so they require data that’s both wide-ranging and compatible with their software.

      Sourcing this data is already a massive task in itself, before even tackling the complex work of normalizing the data to make it comparable and usable. Our database bypasses this, ensuring you have access to the widest range of accurate emission factor data—all in one place—without the consistency issues of a patchwork dataset.

      To find out more about our data and calculations, check out our Methodology Hub.

      LAST EDITED ON

      July 28, 2025

      Tags