Data Versioning: How Climatiq Handles Updates to Emission Factors

Emission Factor Data Versioning

Climatiq implements two versioning systems: API Versioning and Data Versioning. API Versioning covers which parameters the API accepts and returns, and how these may change over time. Data Versioning, which we'll focus on here, deals with the versioning of the underlying emission factor data. This is particularly relevant for those using endpoints nested under the /data URL, or providing selector overrides for other endpoints.

Calculation Endpoints

For endpoints that do not explicitly use selectors, such as “cloud”, “intermodal”, etc., data versioning is not relevant. These endpoints will be updated as described in the API Versioning page.

There is currently no way to lock these endpoints into a specific set of emission factors. This means that you are not guaranteed a calculation will use the same emission factor month over month.

This does not mean you will not be able to reproduce specific calculations if required. To enable the reproduction of calculations, Climatiq can make available the ID of the emission factor used, the final activity value (e.g. of energy or distance) which was used to calculate the emissions reported, and any transformations applied to the factor during final calculation (e.g. applying a Radiative Forcing Index to a flight leg in our intermodal endpoint). This allows manual recalculation of the estimation for audit or other retrospective purposes.
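A manual recalculation along these lines can be sketched as follows. This is only an illustration under stated assumptions: it assumes the stored transformations are simple multipliers applied to the factor, and the function name and all numbers are hypothetical rather than taken from Climatiq.

```python
from math import prod

def recalculate_co2e(factor_value, activity_value, multipliers=()):
    """Recompute an estimate from the stored factor, activity value and multipliers.

    Assumption: every transformation is a plain multiplier on the factor
    (for example a Radiative Forcing Index on a flight leg).
    """
    return factor_value * activity_value * prod(multipliers)

# Hypothetical audit record: factor in kgCO2e per km, distance in km, one multiplier.
flight_leg_co2e = recalculate_co2e(0.25, 1000.0, multipliers=(2.0,))
no_transform_co2e = recalculate_co2e(0.25, 1000.0)
print(flight_leg_co2e, no_transform_co2e)
```

With an empty multipliers tuple the product is 1, so the result reduces to factor times activity value.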

New preview endpoints

Some preview features do not yet return the information described above. The calculations performed by our new emissions calculators are too complex to explain adequately using the existing model, and we are working on a new method to explain them. If you have requirements for calculation transparency, please get in contact.

Now that we've talked about what won't be affected, let's see what will be:

Climatiq's Data Changes

The database underpinning Climatiq is often updated. Emission factors are changed for many reasons, such as when a source publishes errata, new data quality flags are added that apply to existing emission factors, or when a change is made to a factor's metadata such as activity_id or source_lca_activity value.

To illustrate the kind of data change that makes a data version necessary, suppose your application calls the Climatiq estimation endpoint with an activity_id, a source and a year, and specifies that it will not accept factors with a data quality issue of any kind (allowed_data_quality_flags: []). If Climatiq later notices a data quality issue with the factor that these criteria select, we may add a suspicious_homogeneity flag to that emission factor. Your estimations would then start failing, as no emission factor would meet your search criteria any longer.
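The failure mode in this scenario can be sketched in a few lines. The field names activity_id, source, year and allowed_data_quality_flags come from this guide; the matching rule (a factor matches only if all of its flags are allowed), the helper name and the example values are assumptions for illustration, not Climatiq's actual selection logic.

```python
def matches(factor: dict, selector: dict) -> bool:
    """Assumed rule: a factor matches if its keys agree and all its flags are allowed."""
    same_keys = all(factor[k] == selector[k] for k in ("activity_id", "source", "year"))
    flags_allowed = set(factor["data_quality_flags"]) <= set(selector["allowed_data_quality_flags"])
    return same_keys and flags_allowed

# Hypothetical selector that accepts no data quality flags at all.
selector = {
    "activity_id": "example-activity",
    "source": "EXAMPLE",
    "year": 2021,
    "allowed_data_quality_flags": [],
}

# The same hypothetical factor before and after a flag is added to it.
factor_before = {"activity_id": "example-activity", "source": "EXAMPLE", "year": 2021,
                 "data_quality_flags": []}
factor_after = {**factor_before, "data_quality_flags": ["suspicious_homogeneity"]}

matched_before = matches(factor_before, selector)
matched_after = matches(factor_after, selector)
print(matched_before, matched_after)
```

Once the flag is added, the unchanged selector no longer finds any factor, which is the situation a data version lets you opt in to deliberately.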

Updates like these often require you to make changes in your application. We are introducing the concept of a data version, to ensure that you can choose when to opt-in to changes like the above.

The ways data changes

Data in the Climatiq database can change in three ways.

  • New emission factors can be added
  • Existing emission factors can be modified. This could happen, for example, if the source provides errata, or if Climatiq introduces a new data quality flag that applies to an existing factor.
  • (Rarely) an emission factor is deemed to be of such poor quality that we need to mark it as deprecated.

When a factor is modified, it is not deleted - rather it is replaced with an emission factor that is identical, apart from the changes. This new replacement factor also has a new ID. So whenever an emission factor is changed in any way, the ID is also changed.

Conceptually, you can think of the modification of an emission factor as two discrete steps:

  • The addition of a new (almost identical) emission factor, with a new ID
  • The removal of the old emission factor.
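The two steps can be sketched on a small in-memory dataset. The dictionary layout and the function name below are hypothetical and only meant to illustrate the idea; they say nothing about how Climatiq stores its data.

```python
def modify_factor(factors: dict, old_id: str, new_id: str, changes: dict) -> dict:
    """Return a copy of an id -> factor mapping with old_id replaced by a changed copy."""
    updated = dict(factors)
    # Step 1: add an almost identical factor under a new id.
    updated[new_id] = {**factors[old_id], **changes, "id": new_id}
    # Step 2: remove the old factor.
    del updated[old_id]
    return updated

dataset = {"1234": {"id": "1234", "activity_id": "power", "data_quality_flags": []}}
modified = modify_factor(dataset, "1234", "9876", {"data_quality_flags": ["flag"]})
print(modified)
```

The input mapping is left untouched, so the old and new views of the data can be compared side by side.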

Data Versions

A data version consists of two numbers, such as 3.3 or 4.6. This scheme mirrors the major.minor versioning scheme that software libraries often use. We will refer to the leftmost number as the major point from now on, and the rightmost number as the minor point.

Diagram showing the axes of data version growth

Climatiq periodically makes data releases. These generally contain both modifications and additions.

When Climatiq releases new data, we will:

  • Create a new minor release for every existing major release. This minor release will include all emission factor additions and modifications. A modification keeps the old emission factor and adds a new, corrected emission factor alongside it.
  • Create a new major release. This will include all emission factor additions and modifications. It will also remove older versions of modified emission factors, so only the most up-to-date emission factors are available.

This means that minor versions only ever get additions and corrections, while removals only happen in major versions.

An example

When emission factor {activity_id: "power", id: "1234", data_quality_flags: []} has been modified, e.g. to add a data quality flag, the next minor version will contain two emission factors:

  • {activity_id: "power", id: "1234", data_quality_flags: []} (the original)
  • {activity_id: "power", id: "9876", data_quality_flags: ["flag"]} (the new addition)

If you upgrade to a newer minor version, and your query matches both emission factors, Climatiq will pick the one from the newest data version. If your query does not match the new addition, you will continue to use the old emission factor.

The major version will only contain the newly added emission factor, and not the original:

  • {activity_id: "power", id: "9876", data_quality_flags: ["flag"]}

This means that when upgrading minor versions, you will always be able to find an emission factor that you previously found - but you might be upgraded to a newer version of the emission factor if that also matches your query.
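The difference between the two release types can be sketched with the example factors above. Everything here is a simplified model under stated assumptions: list order stands in for "newest data version", the flag-matching rule is assumed, and the function names are hypothetical.

```python
def minor_release(previous: list, added: list) -> list:
    """Minor release: keep every existing factor and append additions and replacements."""
    return previous + added

def major_release(previous: list, added: list, superseded_ids: set) -> list:
    """Major release: same additions, but superseded factors are dropped."""
    return [f for f in previous + added if f["id"] not in superseded_ids]

def select(factors: list, activity_id: str, allowed_flags: list):
    """Pick the most recently added match; return None when nothing matches."""
    matching = [f for f in factors
                if f["activity_id"] == activity_id
                and set(f["data_quality_flags"]) <= set(allowed_flags)]
    return matching[-1] if matching else None

original = {"activity_id": "power", "id": "1234", "data_quality_flags": []}
replacement = {"activity_id": "power", "id": "9876", "data_quality_flags": ["flag"]}

next_minor = minor_release([original], [replacement])
next_major = major_release([original], [replacement], superseded_ids={"1234"})

strict_minor = select(next_minor, "power", allowed_flags=[])
lenient_minor = select(next_minor, "power", allowed_flags=["flag"])
strict_major = select(next_major, "power", allowed_flags=[])
print(strict_minor, lenient_minor, strict_major)
```

In this model a strict query keeps finding the original factor in the minor release, a lenient query is moved to the replacement, and the strict query finds nothing in the major release.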

It also means that if you want to be certain that you're not using any emission factors that are not up-to-date, you should update your major version (and application code) occasionally.

Alright, now let's see how you can use this concept of data versions to decide how you want your app to behave on data changes.

Selectors

In some situations we need to be able to tell which data version you are using. This is when:

  • You are using the /search endpoint
  • You are using the /estimate endpoint, or using a Selector to override emission factor selection inside a calculation endpoint, such as the cloud endpoints. In these cases, you must specify either an id or a data_version parameter.

A data_version must be provided as a string value like "1.1", or "^3" - we'll talk about what they mean right after this.

If you provide neither an id nor a data_version, you will get an error that looks like this:

{
  "error": "bad_request",
  "error_code": "invalid_input",
  "message": "Selector should either provide an 'id', OR a 'data_version' and an 'activity_id'. It must not provide both. The latest 'data_version' is '3.3'"
}

You can specify the data version in one of two ways:

Specifying a full version

If a data_version contains both major and minor (e.g. "8.12"), you are always looking at the same immutable view of the underlying data. This means if you make the same request (to the same version of the API) you will get the same result, private factors notwithstanding.

Using this form is particularly important when producing an accounting report, where the first calculations you make need to use the same underlying data as the last.

Specifying a major-version compatible version

If a data_version is preceded by a caret (e.g. "^8"), Climatiq's estimate API will provide a “version 8 compatible” set of data. This is the same as the latest minor version that belongs to the major version 8. Specifying a major-compatible version means that you will continue to receive new emission factor updates, but no emission factors will be removed until you manually upgrade your major version.

This means that:

  • Depending on your query, the emission factors selected for your estimates might change, if a newer more accurate factor is released
  • Your queries or estimates will never stop working with the same input, as no emission factor is ever removed within a major version.

Use this form when the app should have the most up-to-date and correct data, and you are okay with this result changing over time.
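The two forms of data_version can be sketched as a small resolver. The list of available versions and the function name are hypothetical, and the actual resolution happens on Climatiq's side; this only illustrates that a full version maps to itself while a caret version maps to the latest minor of that major.

```python
import re

def resolve_data_version(spec: str, available: list) -> tuple:
    """Resolve a spec such as '8.11' or '^8' to a (major, minor) pair."""
    full = re.fullmatch(r"(\d+)\.(\d+)", spec)
    if full:
        version = (int(full.group(1)), int(full.group(2)))
        if version not in available:
            raise ValueError(f"unknown data version: {spec}")
        return version  # full version: always the same immutable view
    caret = re.fullmatch(r"\^(\d+)", spec)
    if caret:
        major = int(caret.group(1))
        minors = [minor for (maj, minor) in available if maj == major]
        if not minors:
            raise ValueError(f"unknown major version: {spec}")
        return (major, max(minors))  # caret: latest minor within that major
    raise ValueError(f"malformed data version: {spec}")

# Hypothetical set of released data versions.
released = [(7, 3), (8, 11), (8, 12), (9, 0)]
pinned = resolve_data_version("8.11", released)
compatible = resolve_data_version("^8", released)
print(pinned, compatible)
```

When a new minor release such as (8, 13) is added to the list, the pinned result stays the same while the caret result moves forward.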

Upgrading minor versions

If you are specifying a full version, upgrading minor versions is a fairly low-risk thing to do.

  • You will get access to newer emission factors
  • No queries that used to work will stop working
  • Upgrading only minor versions carries a risk of continuing to use deprecated and less precise emission factors that have been removed in a later major version.

When you upgrade, we recommend upgrading to the latest minor version.

Every time we release a major version, any backwards-compatible changes are applied as minor releases to every previous major version, so even without upgrading major versions your app will still receive most of the data updates.

If you don't require reproducible calculations, we recommend you use a major-compatible version (e.g. "^3") and automatically receive minor updates.

For more concrete guidance on how to upgrade major versions, see here.

Uniquely identifying emission factors

Each emission factor in the Climatiq database has a unique identifier (id). Whenever an emission factor is changed in any way, this id is changed.

For emission factors that have not changed, the id will not change between data sets. This means that across five data versions

  • Some emission factors (say for example the UK BEIS emission factor for electric cars for 2019) might be updated three times (perhaps due to methodology updates from the source) and will have three different ids
  • Many emission factors will not have needed changes, and thus have the same id in each data version

Just like today, for each estimate Climatiq performs, we will return an id, allowing you to uniquely identify the emission factor used.

When selecting an emission factor, you must use either an id or a data_version with an activity_id. When using data_version you either select from the exact same set of emission factors (full version), or from a set including additions but without removals (major-compatible version). When using id you will always get the same emission factor, even if newer factors are available or the factor has been deprecated.
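The selector rule can be checked on the client before sending a request. The sketch below is a hypothetical helper, not part of any Climatiq SDK; it only encodes the rule stated here (an id, or a data_version with an activity_id, never both), and the example values are made up.

```python
def selector_mode(selector: dict) -> str:
    """Return 'by_id' or 'by_data_version', or raise ValueError for an invalid selector."""
    has_id = "id" in selector
    has_version = "data_version" in selector
    if has_id and has_version:
        raise ValueError("provide an id or a data_version, not both")
    if has_id:
        return "by_id"  # always resolves to the same emission factor
    if has_version and "activity_id" in selector:
        return "by_data_version"  # resolved against the chosen data version
    raise ValueError("provide an id, or a data_version together with an activity_id")

# Hypothetical selectors for illustration.
mode_id = selector_mode({"id": "9876"})
mode_version = selector_mode({"activity_id": "power", "data_version": "^3"})
print(mode_id, mode_version)
```

Failing early like this avoids a round trip that would otherwise end in the invalid_input error shown earlier.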

Data change logs

When releasing a new data version, we will provide a changelog listing which factors were changed, and whether they were deprecated and replaced, or simply added to or removed from the data_version in question. You can then see if you are using any of these emission factors, and what needs to change for you to migrate to the latest data version. You can see the data version changelog here.