Digital data has a carbon footprint, but a carbon cost estimate is difficult to track down because of the many complexities of the data life cycle, such as data being exchanged between two separate parties. Without cost estimates, it is hard to know how best to develop and evaluate methods for reducing carbon emissions throughout that life cycle. In our HotCarbon 2023 paper, we outline a technical framework for performing carbon accounting on data and define new opportunities for carbon reduction that exploit approximation (i.e., "good enough" data) in certain use cases. In doing so, we specify two categories of carbon for data: embodied and operational. Embodied carbon corresponds to collecting, transferring, and storing data. Operational carbon corresponds to using data, such as in an AI model or a query. We don't want to waste resources on data that is never used (i.e., data with high embodied costs compared to its operational costs)!
This repository serves as a gentle introduction to how we can account for the carbon footprint of data. We focus on two example components of the data life cycle and show how to produce cost estimates:
- Data collection and compression (`device_power/CAISO_MISO_webcam_experiments.py`)
- Data transfer over the internet (`network/route_estimator.py`)
- Python 3
- We use the Intel Power Gadget to obtain the power draw of hardware components. The Intel Power Gadget is compatible with 2nd Generation up to 10th Generation Intel Core processors and has native support for Mac and Windows. There are existing ports for Linux, but we cannot verify their accuracy.
- traceroute
- Clone, fork, or download the repo
- Install Python requirements via `pip install -r requirements.txt`
- Install the Intel Power Gadget
We collected publicly available generation source data from both the Midwest Independent System Operator (MISO) and the California Independent System Operator (CAISO). These raw data can be found in `carbon_intensity_pricing/`. We display the data sources in the table below for each applicable figure:
The carbon intensity (CI) of each generation source, in kg CO2e/MWh (equivalently, g CO2e/kWh), is given in `MISO_carbon_intensity.json` for the Midwest Independent System Operator and in `CAISO_carbon_intensity.json` for the California Independent System Operator. Each JSON object contains the links to the references.
Low-carbon sources are given life cycle carbon intensities. This metric includes emissions from manufacturing renewable components, ongoing operations, and disposing of the materials at the end of the component lifetime. We rely on metrics from the National Renewable Energy Lab that calibrate a median life cycle value across multiple published life cycle assessments. For CAISO, the biogas CI is derived from a study and is the average of the 10 plant CIs given in Figure 3 of that study's paper.
The fossil fuels are given a grid-specific combustion CI derived from historical 2020 data to improve the geographical accuracy of the grid carbon intensities. Combustion CI does not include the rest of the power plant life cycle. You can visualize the data here by selecting output emission rates (lb/MWh) for CO2 equivalent for all fuels at the balancing authority level for 2020. From there, you can click on the MISO and CAISO grids by navigating to the Midwest or California on the map.
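As a rough illustration of how per-source intensities could be combined into a grid-level value, the sketch below computes a generation-weighted average and converts lb/MWh to g/kWh. It is not taken from this repository: the function names, the flat source-to-value dictionary layout, and the sample numbers are assumptions made for illustration, and the actual JSON files may be structured differently.

```python
# Illustrative sketch only; names, layout, and numbers are hypothetical.

LB_TO_KG = 0.45359237  # exact definition of the international pound


def lb_per_mwh_to_g_per_kwh(rate_lb_per_mwh: float) -> float:
    """Convert an emission rate from lb/MWh to g/kWh.

    lb/MWh -> kg/MWh uses the pound-to-kilogram factor, and
    kg/MWh is numerically equal to g/kWh.
    """
    return rate_lb_per_mwh * LB_TO_KG


def grid_carbon_intensity(generation_mw: dict, source_ci: dict) -> float:
    """Generation-weighted average CI (g CO2e/kWh) of a grid.

    generation_mw: power generated per source (MW), assumed layout.
    source_ci:     CI per source (g CO2e/kWh), assumed layout.
    """
    total = sum(generation_mw.values())
    if total <= 0:
        raise ValueError("total generation must be positive")
    weighted = sum(mw * source_ci[src] for src, mw in generation_mw.items())
    return weighted / total


if __name__ == "__main__":
    # Placeholder values, not real grid data.
    mix = {"wind": 1.0, "natural_gas": 3.0}
    ci = {"wind": 100.0, "natural_gas": 500.0}
    print(grid_carbon_intensity(mix, ci))
```

Weighting by generation rather than averaging the sources equally means that a source contributing most of the power also dominates the grid-level intensity.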
Let
There are two plausible ways to compute the energy estimate in the context of the Intel Power Gadget. The first is using the "empirical" log data, which may contain a small I/O delay. To avoid this delay, we chose the "theoretical" energy, assuming that the power was logged at each discrete sampling time defined by the first log time and multiples of the sampling rate. Specifically, let
We can then convert the value to kWh by taking
We compute the carbon emissions via:
where
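The energy and emissions calculation described in this section can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the "theoretical" energy is the sum of the logged power samples multiplied by the sampling interval, and the function names and sample values are hypothetical.

```python
# Illustrative sketch only; function names and sample values are hypothetical.

JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def theoretical_energy_joules(power_watts, sampling_ms: float) -> float:
    """Energy in joules, assuming one power sample per sampling interval.

    Each sample is treated as constant over one interval, so the result
    does not depend on the (possibly delayed) logged timestamps.
    """
    interval_s = sampling_ms / 1000.0
    return sum(power_watts) * interval_s


def joules_to_kwh(energy_joules: float) -> float:
    """Convert joules to kilowatt-hours."""
    return energy_joules / JOULES_PER_KWH


def emissions_g_co2e(energy_kwh: float, ci_g_per_kwh: float) -> float:
    """Carbon emissions in g CO2e: energy times grid carbon intensity."""
    return energy_kwh * ci_g_per_kwh


if __name__ == "__main__":
    samples = [10.0, 20.0, 30.0]  # watts, placeholder values
    energy_j = theoretical_energy_joules(samples, sampling_ms=100)
    energy_kwh = joules_to_kwh(energy_j)
    print(energy_j, energy_kwh, emissions_g_co2e(energy_kwh, 400.0))
```

Multiplying by a carbon intensity in g CO2e/kWh gives grams of CO2e; using kg CO2e/MWh produces the same number, since the two units are numerically equal.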
The table below maps each figure in the paper to the code file that reproduces it.
Figure | Code File |
---|---|
2 | device_power/CAISO_MISO_webcam_experiments.py |
3 | network/route_estimator.py |
Before running `device_power/CAISO_MISO_webcam_experiments.py`, you must first start the power logging, which saves the power draw of your hardware to a file at a given sampling rate. You will need to configure the sampling rate in milliseconds and the path that your logger will use, as in the image below. Note that the GUI sampling rate is distinct from the logging sampling rate; the logging sampling rate is the one that needs to be configured.
`device_power/CAISO_MISO_webcam_experiments.py` will prompt you for this value, so please make note of it.
Before running `network/route_estimator.py`, you need to get an API key from CO2 Signal. The script will prompt you for the key, in the same way that the power experiment script prompts for the sampling rate.
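To give a feel for the kind of input a route-based estimate starts from, the sketch below extracts hop numbers and IPv4 addresses from `traceroute` text output. It is an assumption-laden illustration and not how `network/route_estimator.py` is known to work: the function name, the regular expression, and the sample output are hypothetical, and only the common `hostname (a.b.c.d)` line format is handled.

```python
# Illustrative sketch only; not taken from network/route_estimator.py.
import re

# Matches lines such as " 3  host.example (10.0.0.1)  8.1 ms ..." (IPv4 only).
HOP_RE = re.compile(r"^\s*(\d+)\s+.*?\((\d{1,3}(?:\.\d{1,3}){3})\)")

# Hypothetical sample output using private and documentation addresses.
SAMPLE = """traceroute to example.com (192.0.2.1), 64 hops max, 52 byte packets
 1  router.local (192.168.1.1)  1.2 ms  1.0 ms  0.9 ms
 2  * * *
 3  10.0.0.1 (10.0.0.1)  8.1 ms  7.9 ms  8.3 ms
"""


def parse_traceroute(output: str) -> list:
    """Return (hop_number, ipv4_address) pairs for hops that responded.

    The header line and hops that timed out ("* * *") are skipped.
    """
    hops = []
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2)))
    return hops


if __name__ == "__main__":
    for hop, ip in parse_traceroute(SAMPLE):
        print(hop, ip)
```

Each responding hop could then be mapped to a location and a grid carbon intensity, which is the step where an API key such as the one from CO2 Signal would come into play.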
The stored results from the 24 short video experiment run on the 2019 MacBook Pro are in `experiment_results/results.csv`. All of the plots are in the `figs/` directory. The emission intensity estimate for internet data transfer at the time of the experiment is in `network/result.json`.