Data Quality
Not all data is a perfect fit for the actual activity, which is why the concept of “Data Quality” exists. The Climatiq API follows the PACT methodology for calculating data quality ratings, where each rating is an integer from 1 (best) to 5 (worst). Each component of the PCF is given its own rating. Where it is not possible to determine a rating, Climatiq applies a default value of 5. You should override the rating if you are able to make a better judgment.
Data Quality is always split up into three different indicators:
- Technological Representativeness: How well the activity used in the PCF matches the actual process taking place or the text provided as input.
- Geographical Representativeness: How well the region of the emission factor matches the region in which the process took place.
- Temporal Representativeness: How well the year for which the emission factor is valid matches the year in which the process took place.
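As a minimal sketch of the structure described above, the three indicators can be modelled as one small record whose fields default to 5 when no rating can be determined. The class and field names here are hypothetical and are not the Climatiq API's request or response shape.

```python
from dataclasses import dataclass

# Default applied when a rating cannot be determined (per the text above).
DEFAULT_RATING = 5


@dataclass(frozen=True)
class DataQuality:
    """Hypothetical container: three indicators, each an integer from 1 to 5."""

    technological: int = DEFAULT_RATING
    geographical: int = DEFAULT_RATING
    temporal: int = DEFAULT_RATING

    def __post_init__(self):
        # Reject anything that is not an integer in the range 1..5.
        for name in ("technological", "geographical", "temporal"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(
                    f"{name} rating must be an integer from 1 to 5, got {value!r}"
                )


# Only two indicators are known; the third falls back to the default of 5.
quality = DataQuality(geographical=2, temporal=1)
```

Keeping the default at 5 means an indicator that was never assessed is reported as the weakest rating rather than silently appearing better than it is.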
The tables below, based on the PACT Methodology, describe when each data quality rating is appropriate for each indicator:
Technological representativeness
| Data Quality Rating | Description |
|---|---|
| 1 | The dataset has been created based on data reflecting the exact technology employed (i.e. plant specific process/equipment data for the plant/equipment where the product has been manufactured). Note: this quality score can be achieved only in case of use of primary data |
| 2 | The dataset has been created based on data reflecting the company-specific and same technology to the one employed for the actual manufacturing (i.e. same technology, the company/site specific but not necessarily plant specific – it could be an average if several company/site specific data are available). Note: this quality score can be achieved only in case of use of primary data |
| 3 | The dataset has been created based on data reflecting an average for an equivalent technology to the one employed for the actual manufacturing (i.e. same technology, but not company specific). Note: this is the maximum score achievable with secondary data |
| 4 | The dataset has been created based on data reflecting a technological proxy (i.e. similar but not same technology, irrespectively if based on averages or supplier specific data) |
| 5 | The dataset has been created based on different or unknown technology vs technology actually employed |
Geographical representativeness
| Data Quality Rating | Description |
|---|---|
| 1 | The dataset has been created based on data reflecting the country subdivision (if applicable) or country in which the product has been manufactured. Country subdivision list: States in the USA, Provinces in Canada, Federative units in Brazil, Provinces in Argentina, States in Mexico, Republics in Russia, States in India, Provinces in China, States in Australia |
| 2 | The dataset has been created based on data pertaining to the country, in which the product has been manufactured. The area where the dataset is generated is valid for the geographical area where the site is located. Example: The site is in California and the dataset is a US average |
| 3 | The dataset has been created based on data pertaining to the geographical region (e.g., Europe, Asia, North America), in which the product has been manufactured. The area where the dataset is generated is valid for the geographical area where the site is located. Example: The site is in Spain and the dataset is a European average |
| 4 | The dataset has been created based on global averages. Example: The site is in Japan and the dataset is a global average |
| 5 | The dataset has been created based on data with a geographical scope which is either unknown or pertaining a country, or region not including the site in which the product has been manufactured. Example: In absence of a global average, the dataset geographical applicability is unknown. |
Temporal / Time representativeness
| Data Quality Rating | Description |
|---|---|
| 1 | The difference between the year of the dataset and estimate of the PCF is ≤1 year |
| 2 | The difference between the year of the dataset and estimate of the PCF is >1 year and ≤2 years |
| 3 | The difference between the year of the dataset and estimate of the PCF is >2 years and ≤3 years |
| 4 | The difference between the year of the dataset and estimate of the PCF is >3 years and ≤4 years |
| 5 | The difference between the year of the dataset and estimate of the PCF is >4 years or unknown |
How Climatiq assigns data quality ratings
Each part of the PCF can consist of one or more activity types. Each type has its own set of three data quality ratings. Depending on the indicator and the type of activity, Climatiq applies different heuristics to determine a data quality rating. In all cases, the available data points, such as the manufacturing location and year, are compared with the matched emission factors. Temporal and geographical representativeness are determined in the same way for all activity types. Technological representativeness depends on the activity type, e.g. electricity consumption, combustion, emission factor, etc.
Technological representativeness
Determining the technological representativeness depends on the activity type in question, e.g. electricity, emission factor, etc. The following sections explain in detail how technological representativeness is rated for each type.
Autopilot matches
If the emission factor was found via Autopilot, for example for simple components for which no emission factor selector was specified by the user, we assign a technological representativeness rating of five.
Electricity
Electricity emissions usually consist of multiple components: generation, generation well-to-tank (WTT), transmission & distribution (T&D), and transmission & distribution well-to-tank (T&D WTT). Each of these component emission factors is assigned a data quality rating of its own.
For generation components, if you specify the emissions intensity (CO2e/kWh), we apply the best rating of one for geographical, temporal and technological representativeness.
Otherwise, geographical and temporal data quality ratings are assigned as described in the Geographical representativeness and Temporal representativeness sections, and technological representativeness is assigned as follows: four for well-to-tank emission factors and for direct connections, and two for regular grid connections.
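The electricity rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not Climatiq's implementation: the function name and its boolean parameters are hypothetical, and because the text does not say whether "generation components" includes generation well-to-tank, the caller decides that through `is_generation`.

```python
def electricity_technological_rating(
    is_generation: bool,
    is_well_to_tank: bool,
    direct_connection: bool,
    intensity_specified: bool,
) -> int:
    """Hypothetical sketch of the technological rating for one electricity component."""
    # A user-supplied emissions intensity on a generation component gets a one.
    if is_generation and intensity_specified:
        return 1
    # Well-to-tank emission factors and direct connections get a four.
    if is_well_to_tank or direct_connection:
        return 4
    # Remaining case: a regular grid connection gets a two.
    return 2


# A T&D well-to-tank component on a regular grid connection.
rating = electricity_technological_rating(
    is_generation=False,
    is_well_to_tank=True,
    direct_connection=False,
    intensity_specified=False,
)
```

The user-supplied intensity is checked first because, per the text, it overrides the connection-based heuristics for generation components.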
Fuel combustion
Emissions from fuel combustion tend to be very similar between technologies and regions for a given fuel; they vary mainly with the carbon and energy content of the fuel and with how completely it is combusted. By default, we assign a technological representativeness rating of three.
Heat & steam
Like electricity, heat & steam emissions usually consist of multiple components: generation, generation well-to-tank (WTT), transmission & distribution (T&D), and transmission & distribution well-to-tank (T&D WTT). Each of these component emission factors is assigned a data quality rating of its own.
For generation components, if you specify the emissions intensity (CO2e/kWh), we apply the best rating of one for geographical, temporal and technological representativeness.
If neither the energy source nor the CO2e/kWh value has been given, we assign a four or a five, depending on whether the emission factor used matches the region of the manufacturing site.
If an energy source but no CO2e/kWh value has been specified, we assign a four. Otherwise, with CO2e/kWh specified, we assign a two.
Emission factor selector
If you specified an emission factor by selector, we rate the geographical and temporal representativeness as described in the Geographical representativeness and Temporal representativeness sections. We have no way to gauge the technological representativeness, however, and thus assign a five. You are encouraged to specify the technological representativeness yourself according to PACT, using the method described in the section on manually specifying data quality.
Geographical representativeness
PACT only allows a data quality rating of one where the data matches the activity at a sub-country level. The PCF endpoint currently supports only countries, so the best achievable rating is two.
If the country code of the manufacturing location equals that of the emission factor, the data quality rating is two.
If the emission factor is valid globally, the data quality rating is four.
Otherwise, we assign a data quality rating of five.
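The three rules above can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that the manufacturing location and the emission factor region are both available as country codes, with global factors flagged separately.

```python
from typing import Optional


def geographical_rating(
    site_country: Optional[str],
    factor_country: Optional[str],
    factor_is_global: bool = False,
) -> int:
    """Hypothetical sketch of the country-level geographical rating."""
    # Same country code for the manufacturing site and the emission factor.
    if site_country and factor_country and site_country.upper() == factor_country.upper():
        return 2
    # The emission factor is valid globally.
    if factor_is_global:
        return 4
    # Unknown region, or a region that does not cover the site.
    return 5


# A site in Germany matched with a German emission factor.
rating = geographical_rating("DE", "de")
```

Comparing codes case-insensitively is a choice made for this sketch; a rating of one never appears because sub-country matching is not supported.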
Temporal representativeness
For temporal representativeness, we compare the manufacturing year of the product to the year of the emission factor. The difference between the two determines the data quality rating, as represented in the following table.
| Year difference | Data quality rating |
|---|---|
| 0 - 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5+ | 5 |
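The table above can be expressed as a short function. This is a sketch with hypothetical names; it assumes whole-number years, treats the difference as absolute (the text does not say whether direction matters), and returns the default of 5 when either year is unknown.

```python
from typing import Optional


def temporal_rating(manufacturing_year: Optional[int], factor_year: Optional[int]) -> int:
    """Hypothetical sketch mapping a year difference to a temporal rating."""
    # An unknown year cannot be compared, so fall back to the default of 5.
    if manufacturing_year is None or factor_year is None:
        return 5
    difference = abs(manufacturing_year - factor_year)
    # Differences of 0 or 1 year share the best rating.
    if difference <= 1:
        return 1
    # 2, 3 and 4 years map to the same rating; 5 or more is capped at 5.
    return min(difference, 5)


# Product manufactured in 2024, emission factor valid for 2021: difference of 3.
rating = temporal_rating(2024, 2021)
```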