From 58fdef4e288112af95d2640ecce793296a74e0bb Mon Sep 17 00:00:00 2001
From: dagibbs22
Date: Tue, 22 Mar 2022 11:11:18 -0400
Subject: [PATCH] Version 1 2 2 2021 tcl update (#22)

* Updated model constants (primarily loss years/model years) for the annual update. Also tentatively updated input and output folder dates on s3 from 2021 to 2022. Finally, refactored the "gain" folder to a "removals" folder (don't know if the refactor worked entirely correctly, but it did result in "gain" being changed to "removals" in comments and variables throughout the model).
* Updated burn area scripts for the 2022 model run.
* Ran model successfully on 00N_000E locally in Docker, with a combination of constants for v1.2.1 and v1.2.2 in order to make it run to completion (but not necessarily produce reasonable results). Now going to run on a spot machine.
* Updating MODIS burned area for 2021 tree cover loss. Only running steps 1-3 for now because step 4 needs the TCL update.
* Working on burn year update for 2021.
* Working on burn year update for 2021. On step 3 now.
* Did steps 1-3 of burn year update. Waiting on 2021 TCL for step 4.
* Experimenting with tracking memory during model run. I can't figure out how to have a memory profiler that runs concurrently with the model processing (tracemalloc doesn't seem to do what I want), so I'm just checking memory at specific points and hoping that's representative of peak memory usage.
* Experimenting with tracking memory during model run. Using my new psutil-based memory tracker in uu.end_of_fx_summary seems to capture peak memory usage. I checked that in a local test and in an r5d.4xlarge instance running 3 tiles, both using the mp_model_extent stage. Haven't tried memory tracking in other stages yet.
* Added memory tracking to all model steps used in run_full_model.py. I've only tested it in mp_model_extent.py, but it should work similarly elsewhere. Used the htop memory graph to validate the psutil memory value.
* Changed postgres steps for plantation preprocessing to work in an r5d instance. The Dockerfile now sets up the correct environment and permissions, but the actual database creation and postgis extension addition have to be done manually in the container. I can't figure out how to get the container to launch with postgres running, the database created, or the postgis extension added. Anyhow, this is still workable. I tested adding plantation data to the postgres database and it seemed fine, although I didn't then try running the plantation script to rasterize it. So there could easily be issues with the postgres database; I just don't know.
* Test commit: changed plantation preparation.
* Trying full model run with 2021 TCL. Theoretically it will run all the way through, but using 2020 drivers and burned area. Will update burned area through 2021 later on.
* Changing memory warning threshold to be higher.
* Changing processor count for model extent.
* Issue with full model run-- not finding the s3 folder correctly. Trying with two test tiles first.
* Experimenting with full model run again.
* The 96% memory warning threshold led to unnecessary program termination. Increasing to 99%. That's because the percentage reported by psutil.virtual_memory() doesn't exactly equal what's shown in htop.
* Reducing processor count for deadwood/litter creation.
* Finishing burn year update for 2021 now that TCL 2021 is available.
* Changed directory and names for updated drivers.
* Trying to rewindow the tcd, gain, etc. tiles again.
* Trying to run aggregation step again.
* Issue with 20N_030W in the emissions aggregation step. It never rewindowed and I can't tell why.
* Issue with 10N_150E net flux rewindowing. It randomly didn't happen. Trying net flux tile aggregation again.
* Having rewindowed area tiles from the aggregation step on the spot machine made the spot machine think it already had the area tiles, even though it only had the rewindowed version. So I'm adding the deletion of all rewindowed tiles after the aggregate maps are made.
* Wrote a script that downloads the tiles using a list of directories and patterns for a list of tile_ids. It then builds overviews for them for easy viewing in ArcMap. This is just to simplify the process of downloading all the intermediate and final outputs of the model for QCing (so they don't have to be downloaded individually using Cloudberry).
* Running full model, hopefully from start to finish. Using 2021 TCL with drivers and burn year updated.
* Running full model, from gross emissions aggregation onwards. Emissions aggregation skipped 20N_030W again, so the model stopped. This happened with the same tile before. Still don't know what the deal is.
* Running full model, from net flux aggregation onwards. Net flux aggregation skipped another tile again (I think it was 10N_150E but I actually didn't record it-- oops). This happened with net flux before. Still don't know what the deal is.
* Revised download_tile_set.py to create smaller overviews (.ovr) for the tile set (DEFLATE compression).
* Rerunning drivers preparation and emissions onwards because of the corrected drivers map.
* Rerunning emissions aggregation onwards because of the corrected drivers map.
* Rerunning net flux aggregation onwards because of the corrected drivers map.
* Creating soil C emissions for the 2021 update (v1.2.2).
* Updated readme for model update with 2021 TCL (v1.2.2). This is the model version as I ran it for the 2021 update.
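Below is a minimal sketch (not part of this patch) of the psutil-based memory check described in the commits above. The real implementation is the uu.check_memory() call added throughout this diff, whose body lives in universal_util.py and isn't shown in this section, so the threshold, message, and termination behavior here are assumptions:

    import psutil

    # The commits describe raising the warning threshold from 96% to 99% because
    # psutil.virtual_memory().percent doesn't exactly match the usage shown in htop.
    WARNING_THRESHOLD_PCT = 99

    def check_memory():
        # Reports system-wide memory usage at a specific point in the model run;
        # sampling at enough points approximates peak usage.
        used_pct = psutil.virtual_memory().percent
        print('Memory usage: {}%'.format(used_pct))
        if used_pct > WARNING_THRESHOLD_PCT:
            raise SystemExit('Memory usage above {}%; terminating run'.format(WARNING_THRESHOLD_PCT))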
--- Dockerfile | 29 ++- analyses/aggregate_results_to_4_km.py | 12 +- analyses/create_supplementary_outputs.py | 1 + analyses/download_tile_set.py | 123 +++++++++++ analyses/mp_aggregate_results_to_4_km.py | 18 +- analyses/mp_tile_statistics.py | 2 +- analyses/net_flux.py | 6 +- burn_date/mp_burn_year.py | 203 ++++++++++-------- burn_date/stack_ba_hv.py | 6 +- carbon_pools/create_carbon_pools.py | 18 +- carbon_pools/create_inputs_for_C_pools.py | 2 +- carbon_pools/create_soil_C.py | 2 +- carbon_pools/mp_create_carbon_pools.py | 15 +- carbon_pools/mp_create_inputs_for_C_pools.py | 2 +- constants_and_names.py | 132 ++++++------ data_prep/model_extent.py | 3 + data_prep/mp_model_extent.py | 5 +- data_prep/mp_plantation_preparation.py | 142 ++++++++---- data_prep/mp_prep_other_inputs.py | 8 +- ec2_launch_template_startup_instructions.TXT | 51 ++++- emissions/calculate_gross_emissions.py | 2 + .../cpp_util/calc_gross_emissions_generic.cpp | 2 +- emissions/cpp_util/constants.h | 4 +- emissions/cpp_util/equations.cpp | 2 +- emissions/mp_calculate_gross_emissions.py | 1 - pg_hba.conf | 99 +++++++++ readme.md | 141 +++++++----- {gain => removals}/.gitignore | 0 {gain => removals}/US_removal_rates.py | 20 +- {gain => removals}/__init__.py | 0 ...nual_gain_rate_AGC_BGC_all_forest_types.py | 2 + .../annual_gain_rate_IPCC_defaults.py | 30 +-- .../annual_gain_rate_mangrove.py | 30 +-- {gain => removals}/continent_ecozone_tiles.py | 8 +- .../forest_age_category_IPCC.py | 4 +- .../gain_year_count_all_forest_types.py | 63 ++++-- .../gross_removals_all_forest_types.py | 4 +- {gain => removals}/mp_US_removal_rates.py | 19 +- ...nual_gain_rate_AGC_BGC_all_forest_types.py | 2 +- .../mp_annual_gain_rate_IPCC_defaults.py | 27 +-- .../mp_annual_gain_rate_mangrove.py | 27 +-- .../mp_continent_ecozone_tiles.py | 6 +- .../mp_forest_age_category_IPCC.py | 13 +- .../mp_gain_year_count_all_forest_types.py | 4 +- .../mp_gross_removals_all_forest_types.py | 6 +- requirements.txt | 2 + run_full_model.py | 136 ++++++------ sensitivity_analysis/US_removal_rates.py | 16 +- sensitivity_analysis/mp_US_removal_rates.py | 13 +- sensitivity_analysis/mp_legal_AMZ_loss.py | 22 +- universal_util.py | 92 ++++++-- 51 files changed, 1039 insertions(+), 538 deletions(-) create mode 100644 analyses/download_tile_set.py create mode 100644 pg_hba.conf rename {gain => removals}/.gitignore (100%) rename {gain => removals}/US_removal_rates.py (90%) rename {gain => removals}/__init__.py (100%) rename {gain => removals}/annual_gain_rate_AGC_BGC_all_forest_types.py (99%) rename {gain => removals}/annual_gain_rate_IPCC_defaults.py (83%) rename {gain => removals}/annual_gain_rate_mangrove.py (83%) rename {gain => removals}/continent_ecozone_tiles.py (97%) rename {gain => removals}/forest_age_category_IPCC.py (99%) rename {gain => removals}/gain_year_count_all_forest_types.py (87%) rename {gain => removals}/gross_removals_all_forest_types.py (98%) rename {gain => removals}/mp_US_removal_rates.py (93%) rename {gain => removals}/mp_annual_gain_rate_AGC_BGC_all_forest_types.py (99%) rename {gain => removals}/mp_annual_gain_rate_IPCC_defaults.py (93%) rename {gain => removals}/mp_annual_gain_rate_mangrove.py (89%) rename {gain => removals}/mp_continent_ecozone_tiles.py (97%) rename {gain => removals}/mp_forest_age_category_IPCC.py (94%) rename {gain => removals}/mp_gain_year_count_all_forest_types.py (99%) rename {gain => removals}/mp_gross_removals_all_forest_types.py (97%) diff --git a/Dockerfile b/Dockerfile index 9ad8cd6d..3c6542bb 100644 --- 
a/Dockerfile +++ b/Dockerfile @@ -1,6 +1,8 @@ # Use osgeo GDAL image. It builds off Ubuntu 18.04 and uses GDAL 3.0.4 FROM osgeo/gdal:ubuntu-small-3.0.4 -#FROM osgeo/gdal:ubuntu-full-3.0.4 # Use this if downloading hdf files for burn year analysis + +# # Use this if downloading hdf files for burn year analysis +# FROM osgeo/gdal:ubuntu-full-3.0.4 ENV DIR=/usr/local/app ENV TMP=/usr/local/tmp @@ -39,11 +41,28 @@ RUN mkdir -p ${TILES} WORKDIR ${DIR} COPY . . -# set environment variables + +# Set environment variables ENV AWS_SHARED_CREDENTIALS_FILE $SECRETS_PATH/.aws/credentials ENV AWS_CONFIG_FILE $SECRETS_PATH/.aws/config +# https://www.postgresql.org/docs/current/libpq-envars.html +ENV PGUSER postgres +ENV PGDATABASE=ubuntu + + +####################################### +# Activate postgres and enable connection to it +# Copies config file that allows user postgres to enter psql shell, +# as shown here: https://stackoverflow.com/a/26735105 (change peer to trust). +# Commented out the start/restart commands because even with running them, postgres isn't running when the container is created. +# So there's no point in starting postgres here if it's not active when the instance opens. +####################################### +RUN cp pg_hba.conf /etc/postgresql/10/main/ +# RUN pg_ctlcluster 10 main start +# RUN service postgresql restart + -# Install missing python dependencies +# Install missing Python dependencies RUN pip3 install -r requirements.txt # Link gdal libraries @@ -57,10 +76,10 @@ RUN ln -s /usr/bin/python3 /usr/bin/python RUN git config --global user.email dagibbs22@gmail.com ## Check out the branch that I'm currently using for model development -#RUN git checkout model_v_1.2.1 +#RUN git checkout model_v_1.2.2 # ## Makes sure the latest version of the current branch is downloaded -#RUN git pull origin model_v_1.2.1 +#RUN git pull origin model_v_1.2.2 ## Compile C++ scripts #RUN g++ /usr/local/app/emissions/cpp_util/calc_gross_emissions_generic.cpp -o /usr/local/app/emissions/cpp_util/calc_gross_emissions_generic.exe -lgdal && \ diff --git a/analyses/aggregate_results_to_4_km.py b/analyses/aggregate_results_to_4_km.py index 71a3892e..a97a4db6 100644 --- a/analyses/aggregate_results_to_4_km.py +++ b/analyses/aggregate_results_to_4_km.py @@ -13,7 +13,7 @@ For sensitivity analysis runs, it only processes outputs which actually have a sensitivity analysis version. The user has to supply a tcd threshold for which forest pixels to include in the results. Defaults to cn.canopy_threshold. For sensitivity analysis, the s3 folder with the aggregations for the standard model must be specified.
-sample command: python mp_aggregate_results_to_4_km.py -tcd 30 -t no_shifting_ag -sagg s3://gfw2-data/climate/carbon_model/0_4deg_output_aggregation/biomass_soil/standard/20200901/net_flux_Mt_CO2e_biomass_soil_per_year_tcd30_0_4deg_modelv1_2_0_std_20200901.tif +sample command: python mp_aggregate_results_to_4_km.py -tcd 30 -t no_shifting_ag -sagg s3://gfw2-data/climate/carbon_model/0_04deg_output_aggregation/biomass_soil/standard/20200901/net_flux_Mt_CO2e_biomass_soil_per_year_tcd30_0_4deg_modelv1_2_0_std_20200901.tif ''' @@ -71,7 +71,9 @@ def aggregate(tile, thresh, sensit_type, no_upload): #2D array in which the 0.04x0.04 deg aggregated sums will be stored sum_array = np.zeros([250,250], 'float32') - out_raster = "{0}_{1}_0_4deg.tif".format(tile_id, tile_type) + out_raster = "{0}_{1}_0_04deg.tif".format(tile_id, tile_type) + + uu.check_memory() # Iterates across the windows (160x160 30m pixels) of the input tile for idx, window in windows: @@ -103,11 +105,11 @@ def aggregate(tile, thresh, sensit_type, no_upload): sum_array[idx[0], idx[1]] = non_zero_pixel_sum - # Converts the annual carbon gain values annual gain in megatonnes and makes negative (because removals are negative) + # Converts the annual carbon removals values to annual removals in megatonnes and makes negative (because removals are negative) if cn.pattern_annual_gain_AGC_all_types in tile_type: sum_array = sum_array / cn.tonnes_to_megatonnes * -1 - # Converts the cumulative CO2 gain values to annualized CO2 in megatonnes and makes negative (because removals are negative) + # Converts the cumulative CO2 removals values to annualized CO2 in megatonnes and makes negative (because removals are negative) if cn.pattern_cumul_gain_AGCO2_BGCO2_all_types in tile_type: sum_array = sum_array / cn.loss_years / cn.tonnes_to_megatonnes * -1 @@ -183,7 +185,7 @@ def aggregate(tile, thresh, sensit_type, no_upload): # aggregated.close() # Prints information about the tile that was just processed - uu.end_of_fx_summary(start, tile_id, '{}_0_4deg'.format(tile_type), no_upload) + uu.end_of_fx_summary(start, tile_id, '{}_0_04deg'.format(tile_type), no_upload) # Calculates the percent difference between the standard model's net flux output diff --git a/analyses/create_supplementary_outputs.py b/analyses/create_supplementary_outputs.py index 80983fca..244cec63 100644 --- a/analyses/create_supplementary_outputs.py +++ b/analyses/create_supplementary_outputs.py @@ -112,6 +112,7 @@ def create_supplementary_outputs(tile_id, input_pattern, output_patterns, sensit per_pixel_forest_extent_dst.update_tags( scale='Negative values are net sinks. Positive values are net sources.') + uu.check_memory() # Iterates across the windows of the input tiles for idx, window in windows: diff --git a/analyses/download_tile_set.py b/analyses/download_tile_set.py new file mode 100644 index 00000000..9d174d37 --- /dev/null +++ b/analyses/download_tile_set.py @@ -0,0 +1,123 @@ +''' +This script downloads the listed tiles and creates overviews for them for easy viewing in ArcMap. +It must be run in the Docker container, and so tiles are downloaded to and overviewed in the folder of the Docker container where +all other tiles are downloaded.
+''' + +import multiprocessing +from functools import partial +from osgeo import gdal +import pandas as pd +import datetime +import argparse +import glob +from subprocess import Popen, PIPE, STDOUT, check_call +import os +import sys +sys.path.append('../') +import constants_and_names as cn +import universal_util as uu + +def download_tile_set(sensit_type, tile_id_list): + + uu.print_log("Downloading all tiles for: ", tile_id_list) + + wd = os.path.join(cn.docker_base_dir,"spot_download") + + os.chdir(wd) + + download_dict = { + cn.model_extent_dir: [cn.pattern_model_extent], + cn.age_cat_IPCC_dir: [cn.pattern_age_cat_IPCC], + cn.annual_gain_AGB_IPCC_defaults_dir: [cn.pattern_annual_gain_AGB_IPCC_defaults], + cn.annual_gain_BGB_IPCC_defaults_dir: [cn.pattern_annual_gain_BGB_IPCC_defaults], + cn.stdev_annual_gain_AGB_IPCC_defaults_dir: [cn.pattern_stdev_annual_gain_AGB_IPCC_defaults], + cn.removal_forest_type_dir: [cn.pattern_removal_forest_type], + cn.annual_gain_AGC_all_types_dir: [cn.pattern_annual_gain_AGC_all_types], + cn.annual_gain_BGC_all_types_dir: [cn.pattern_annual_gain_BGC_all_types], + cn.annual_gain_AGC_BGC_all_types_dir: [cn.pattern_annual_gain_AGC_BGC_all_types], + cn.stdev_annual_gain_AGC_all_types_dir: [cn.pattern_stdev_annual_gain_AGC_all_types], + cn.gain_year_count_dir: [cn.pattern_gain_year_count], + cn.cumul_gain_AGCO2_all_types_dir: [cn.pattern_cumul_gain_AGCO2_all_types], + cn.cumul_gain_BGCO2_all_types_dir: [cn.pattern_cumul_gain_BGCO2_all_types], + cn.cumul_gain_AGCO2_BGCO2_all_types_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types], + cn.AGC_emis_year_dir: [cn.pattern_AGC_emis_year], + cn.BGC_emis_year_dir: [cn.pattern_BGC_emis_year], + cn.deadwood_emis_year_2000_dir: [cn.pattern_deadwood_emis_year_2000], + cn.litter_emis_year_2000_dir: [cn.pattern_litter_emis_year_2000], + cn.soil_C_emis_year_2000_dir: [cn.pattern_soil_C_emis_year_2000], + cn.total_C_emis_year_dir: [cn.pattern_total_C_emis_year], + + # cn.gross_emis_commod_biomass_soil_dir: [cn.pattern_gross_emis_commod_biomass_soil], + # cn.gross_emis_shifting_ag_biomass_soil_dir: [cn.pattern_gross_emis_shifting_ag_biomass_soil], + # cn.gross_emis_forestry_biomass_soil_dir: [cn.pattern_gross_emis_forestry_biomass_soil], + # cn.gross_emis_wildfire_biomass_soil_dir: [cn.pattern_gross_emis_wildfire_biomass_soil], + # cn.gross_emis_urban_biomass_soil_dir: [cn.pattern_gross_emis_urban_biomass_soil], + # cn.gross_emis_no_driver_biomass_soil_dir: [cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil], + + cn.gross_emis_all_gases_all_drivers_biomass_soil_dir: [cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil], + cn.gross_emis_co2_only_all_drivers_biomass_soil_dir: [cn.pattern_gross_emis_co2_only_all_drivers_biomass_soil], + cn.gross_emis_non_co2_all_drivers_biomass_soil_dir: [cn.pattern_gross_emis_non_co2_all_drivers_biomass_soil], + cn.gross_emis_nodes_biomass_soil_dir: [cn.pattern_gross_emis_nodes_biomass_soil], + cn.net_flux_dir: [cn.pattern_net_flux], + cn.cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent], + cn.cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types_forest_extent], + cn.cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent], + cn.gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent_dir: [cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent], + 
cn.gross_emis_all_gases_all_drivers_biomass_soil_forest_extent_dir: [cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil_forest_extent], + cn.gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent_dir: [cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent], + cn.net_flux_per_pixel_full_extent_dir: [cn.pattern_net_flux_per_pixel_full_extent], + cn.net_flux_forest_extent_dir: [cn.pattern_net_flux_forest_extent], + cn.net_flux_per_pixel_forest_extent_dir: [cn.pattern_net_flux_per_pixel_forest_extent] + } + + # Downloads input files or entire directories, depending on how many tiles are in the tile_id_list + for key, values in download_dict.items(): + dir = key + pattern = values[0] + uu.s3_flexible_download(dir, pattern, wd, sensit_type, tile_id_list) + + cmd = ['aws', 's3', 'cp', cn.output_aggreg_dir, wd, '--recursive'] + uu.log_subprocess_output_full(cmd) + + tile_list = glob.glob('*tif') + uu.print_log("Tiles for pyramiding: ", tile_list) + + # https://gis.stackexchange.com/questions/160459/comparing-use-of-gdal-to-build-raster-pyramids-or-overviews-versus-arcmap + # Example 3 from https://gdal.org/programs/gdaladdo.html + # https://stackoverflow.com/questions/33158526/how-to-correctly-use-gdaladdo-in-a-python-program + for tile in tile_list: + uu.print_log("Pyramiding ", tile) + Image = gdal.Open(tile, 0) # 0 = read-only, 1 = read-write. + gdal.SetConfigOption('COMPRESS_OVERVIEW', 'DEFLATE') + Image.BuildOverviews('NEAREST', [2, 4, 8, 16, 32], gdal.TermProgress_nocb) + del Image # close the dataset (Python object and pointers) + + uu.print_log("Pyramiding done") + + +if __name__ == '__main__': + + # The arguments for what kind of model run is being run (standard conditions or a sensitivity analysis) and + # the tiles to include + parser = argparse.ArgumentParser( + description='Download model outputs for specific tile') + parser.add_argument('--model-type', '-t', required=True, + help='{}'.format(cn.model_type_arg_help)) + parser.add_argument('--tile_id_list', '-l', required=True, + help='List of tile ids to use in the model. Should be of form 00N_110E or 00N_110E,00N_120E or all.') + parser.add_argument('--run-date', '-d', required=False, + help='Date of run. Must be format YYYYMMDD.') + args = parser.parse_args() + sensit_type = args.model_type + tile_id_list = args.tile_id_list + run_date = args.run_date + + # Create the output log + uu.initiate_log(tile_id_list=tile_id_list, sensit_type=sensit_type, run_date=run_date) + + # Checks whether the sensitivity analysis and tile_id_list arguments are valid + uu.check_sensit_type(sensit_type) + tile_id_list = uu.tile_id_list_check(tile_id_list) + + download_tile_set(sensit_type=sensit_type, tile_id_list=tile_id_list) \ No newline at end of file diff --git a/analyses/mp_aggregate_results_to_4_km.py b/analyses/mp_aggregate_results_to_4_km.py index 1a41607b..e8713e1b 100644 --- a/analyses/mp_aggregate_results_to_4_km.py +++ b/analyses/mp_aggregate_results_to_4_km.py @@ -13,7 +13,7 @@ For sensitivity analysis runs, it only processes outputs which actually have a sensitivity analysis version. The user has to supply a tcd threshold for which forest pixels to include in the results. Defaults to cn.canopy_threshold. For sensitivity analysis, the s3 folder with the aggregations for the standard model must be specified. 
-sample command: python mp_aggregate_results_to_4_km.py -tcd 30 -t no_shifting_ag -sagg s3://gfw2-data/climate/carbon_model/0_4deg_output_aggregation/biomass_soil/standard/20200901/net_flux_Mt_CO2e_biomass_soil_per_year_tcd30_0_4deg_modelv1_2_0_std_20200901.tif +sample command: python mp_aggregate_results_to_4_km.py -tcd 30 -t no_shifting_ag -sagg s3://gfw2-data/climate/carbon_model/0_04deg_output_aggregation/biomass_soil/standard/20200901/net_flux_Mt_CO2e_biomass_soil_per_year_tcd30_0_4deg_modelv1_2_0_std_20200901.tif ''' @@ -81,7 +81,7 @@ def mp_aggregate_results_to_4_km(sensit_type, thresh, tile_id_list, std_net_flux # Downloads input files or entire directories, depending on how many tiles are in the tile_id_list uu.s3_flexible_download(dir, download_pattern_name, cn.docker_base_dir, sensit_type, tile_id_list) - + if tile_id_list == 'all': # List of tiles to run in the model @@ -110,7 +110,7 @@ def mp_aggregate_results_to_4_km(sensit_type, thresh, tile_id_list, std_net_flux # from https://stackoverflow.com/questions/12666897/removing-an-item-from-list-matching-a-substring tile_list = [i for i in tile_list if not ('hanson_2013' in i)] tile_list = [i for i in tile_list if not ('rewindow' in i)] - tile_list = [i for i in tile_list if not ('0_4deg' in i)] + tile_list = [i for i in tile_list if not ('0_04deg' in i)] tile_list = [i for i in tile_list if not ('.ovr' in i)] # tile_list = ['00N_070W_cumul_gain_AGCO2_BGCO2_t_ha_all_forest_types_2001_15_biomass_swap.tif'] # test tiles @@ -169,8 +169,8 @@ def mp_aggregate_results_to_4_km(sensit_type, thresh, tile_id_list, std_net_flux # aggregate_results_to_4_km.aggregate(tile, thresh, sensit_type, no_upload) # Makes a vrt of all the output 10x10 tiles (10 km resolution) - out_vrt = "{}_0_4deg.vrt".format(pattern) - os.system('gdalbuildvrt -tr 0.04 0.04 {0} *{1}_0_4deg*.tif'.format(out_vrt, pattern)) + out_vrt = "{}_0_04deg.vrt".format(pattern) + os.system('gdalbuildvrt -tr 0.04 0.04 {0} *{1}_0_04deg*.tif'.format(out_vrt, pattern)) # Creates the output name for the 10km map out_pattern = uu.name_aggregated_output(download_pattern_name, thresh, sensit_type) @@ -221,7 +221,13 @@ def mp_aggregate_results_to_4_km(sensit_type, thresh, tile_id_list, std_net_flux for tile_name in tile_list: tile_id = uu.get_tile_id(tile_name) os.remove('{0}_{1}_rewindow.tif'.format(tile_id, pattern)) - os.remove('{0}_{1}_0_4deg.tif'.format(tile_id, pattern)) + os.remove('{0}_{1}_0_04deg.tif'.format(tile_id, pattern)) + + # Need to delete rewindowed tiles so they aren't confused with the normal tiles for creation of supplementary outputs + rewindow_list = glob.glob('*rewindow*tif') + for rewindow_tile in rewindow_list: + os.remove(rewindow_tile) + uu.print_log("Deleted all rewindowed tiles") # Compares the net flux from the standard model and the sensitivity analysis in two ways. 
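For context on the 0_4deg to 0_04deg renaming in the hunks above: each 10x10 degree tile of 0.00025 degree pixels is 40000 x 40000 pixels, so aggregating to 0.04 degree (roughly 4 km) cells means summing 160x160-pixel windows into the 250x250 sum_array seen in aggregate_results_to_4_km.py. A minimal sketch of that arithmetic with placeholder data (the real code iterates rasterio windows rather than holding the ~6.4 GB float32 tile in memory):

    import numpy as np

    pixels_per_cell = round(0.04 / 0.00025)  # 160 pixels per 0.04 degree cell; round() avoids float truncation
    cells_per_tile = round(10 / 0.04)        # 250 cells per 10 degree tile

    sum_array = np.zeros((cells_per_tile, cells_per_tile), dtype='float32')
    window = np.ones((pixels_per_cell, pixels_per_cell), dtype='float32')  # placeholder pixel values

    # Mirrors sum_array[idx[0], idx[1]] = non_zero_pixel_sum in the script
    sum_array[0, 0] = window[window != 0].sum()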
diff --git a/analyses/mp_tile_statistics.py b/analyses/mp_tile_statistics.py index 79df27f8..82ac336e 100644 --- a/analyses/mp_tile_statistics.py +++ b/analyses/mp_tile_statistics.py @@ -195,7 +195,7 @@ if __name__ == '__main__': parser = argparse.ArgumentParser( - description='Create tiles of the annual AGB and BGB gain rates for mangrove forests') + description='Create tiles of the annual AGB and BGB removals rates for mangrove forests') parser.add_argument('--model-type', '-t', required=True, help='{}'.format(cn.model_type_arg_help)) parser.add_argument('--tile_id_list', '-l', required=True, diff --git a/analyses/net_flux.py b/analyses/net_flux.py index e3c3e393..409c1f74 100644 --- a/analyses/net_flux.py +++ b/analyses/net_flux.py @@ -16,7 +16,7 @@ def net_calc(tile_id, pattern, sensit_type, no_upload): # Start time start = datetime.datetime.now() - # Names of the gain and emissions tiles + # Names of the removals and emissions tiles removals_in = uu.sensit_tile_rename(sensit_type, tile_id, cn.pattern_cumul_gain_AGCO2_BGCO2_all_types) emissions_in = uu.sensit_tile_rename(sensit_type, tile_id, cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil) @@ -73,6 +73,8 @@ def net_calc(tile_id, pattern, sensit_type, no_upload): net_flux_dst.update_tags( scale='Negative values are net sinks. Positive values are net sources.') + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tile for idx, window in windows: @@ -86,7 +88,7 @@ def net_calc(tile_id, pattern, sensit_type, no_upload): except: emissions_window = np.zeros((window.height, window.width)).astype('float32') - # Subtracts gain that from loss + # Subtracts removals from emissions to calculate net flux (negative is net sink, positive is net source) dst_data = emissions_window - removals_window net_flux_dst.write_band(1, dst_data, window=window) diff --git a/burn_date/mp_burn_year.py b/burn_date/mp_burn_year.py index a5db6db3..6149bceb 100644 --- a/burn_date/mp_burn_year.py +++ b/burn_date/mp_burn_year.py @@ -1,14 +1,15 @@ ''' Creates tiles of when tree cover loss coincides with burning or preceded burning by one year. There are four steps to this: 1) acquire raw hdfs from MODIS burned area sftp; 2) make tifs of burned area for -each year in each MODUS h-v tile; 3) make annual Hansen-style (extent, res, etc.) tiles of burned area; +each year in each MODIS h-v tile; 3) make annual Hansen-style (extent, res, etc.) tiles of burned area; 4) make tiles of where TCL and burning coincided (same year or with 1 year lag). To update this, steps 1-3 can be run on only the latest year of MODIS burned area product. Only step 4 needs to be run on the entire time series. That is, steps 1-3 operate on burned area products separately for each year, so adding another year of data won't change steps 1-3 for preceding years. NOTE: The step in which hdf files are opened and converted to tifs (step 2) requires -osgeo/gdal:ubuntu-full-X.X.X Docker image. The "small' Docker image doesn't have an hdf driver in gdal, so it can't read +osgeo/gdal:ubuntu-full-X.X.X Docker image (change in Dockerfile). +The 'small' Docker image doesn't have an hdf driver in gdal, so it can't read the hdf files on the ftp site. The rest of the burned area analysis can be done with a 'small' version of the Docker image (though that would require terminating the Docker container and restarting it, which would only make sense if the analysis was being continued later).
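The Docker-image requirement in the docstring above lends itself to a quick sanity check (not part of this patch). The MCD64A1 .hdf files are HDF4, which only the ubuntu-full GDAL image ships a driver for, so a run of step 2 could fail fast like this:

    from osgeo import gdal

    # The 'small' osgeo/gdal image has no HDF4 driver, so the MODIS burned area
    # hdf files can't be opened with it; check before starting the hdf-to-tif step.
    if gdal.GetDriverByName('HDF4') is None:
        raise RuntimeError('GDAL lacks the HDF4 driver; use the osgeo/gdal:ubuntu-full image for step 2')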
@@ -99,115 +100,127 @@ def mp_burn_year(tile_id_list, run_date = None, no_upload = None): ''' Downloading the hdf files from the sftp burned area site is done outside the script in the sftp shell on the command line. - This will download all the 2020 hdfs to the spot machine. It will take a few minutes before the first - hdf is downloaded but then it should go quickly. - Change 2020 to other year for future years of downloads. + This will download all the 2021 hdfs to the spot machine. There will be a pause of a few minutes before the first + hdf is downloaded but then it should go quickly (5 minutes for 2021 data). + Change 2021 to other year for future years of downloads. https://modis-fire.umd.edu/files/MODIS_C6_BA_User_Guide_1.3.pdf, page 24, section 4.1.3 + Change directory to /app/burn_date/ and download hdfs into burn_date folder: + sftp fire@fuoco.geog.umd.edu [For password] burnt cd data/MODIS/C6/MCD64A1/HDF - ls [to check that it's the folder with all the tile folders] - get h??v??/MCD64A1.A2020* + ls [to check that it's the folder with all the h-v tile folders] + get h??v??/MCD64A1.A2021* bye //exits the sftp shell + + Before moving to the next step, confirm that all months of burned area data were downloaded. + The last month will have the format MCD64A1.A20**336.h... or so. ''' - # Uploads the latest year of raw burn area hdfs to s3. - # All hdfs go in this folder - cmd = ['aws', 's3', 'cp', '{0}/burn_date/'.format(cn.docker_app), cn.burn_year_hdf_raw_dir, '--recursive', '--exclude', '*', '--include', '*hdf'] - uu.log_subprocess_output_full(cmd) - - - # Step 2: - # Makes burned area rasters for each year for each MODIS horizontal-vertical tile. - # This only needs to be done for the most recent year of data (set in stach_ba_hv). - uu.print_log("Stacking hdf into MODIS burned area tifs by year and MODIS hv tile...") - - count = multiprocessing.cpu_count() - pool = multiprocessing.Pool(processes=count - 10) - pool.map(stack_ba_hv.stack_ba_hv, global_grid_hv) - pool.close() - pool.join() - - # # For single processor use - # for hv_tile in global_grid_hv: - # stack_ba_hv.stack_ba_hv(hv_tile) - - - # Step 3: - # Creates a 10x10 degree wgs 84 tile of .00025 res burned year. - # Downloads all MODIS hv tiles from s3, - # makes a mosaic for each year, and warps to Hansen extent. - # Range is inclusive at lower end and exclusive at upper end (e.g., 2001, 2021 goes from 2001 to 2020). - # This only needs to be done for the most recent year of data. - # NOTE: The first time I ran this for the 2020 TCL update, I got an error about uploading the log to s3 - # after most of the tiles were processed. I didn't know why it happened, so I reran the step and it went fine. - for year in range(2020, 2021): - - uu.print_log("Processing", year) - - # Downloads all hv tifs for this year - include = '{0}_*.tif'.format(year) - year_tifs_folder = "{}_year_tifs".format(year) - utilities.makedir(year_tifs_folder) - - uu.print_log("Downloading MODIS burn date files from s3...") - - cmd = ['aws', 's3', 'cp', cn.burn_year_stacked_hv_tif_dir, year_tifs_folder] - cmd += ['--recursive', '--exclude', "*", '--include', include] - uu.log_subprocess_output_full(cmd) - - uu.print_log("Creating vrt of MODIS files...") - - vrt_name = "global_vrt_{}.vrt".format(year) - - # Builds list of vrt files - with open('vrt_files.txt', 'w') as vrt_files: - vrt_tifs = glob.glob(year_tifs_folder + "/*.tif") - for tif in vrt_tifs: - vrt_files.write(tif + "\n") - - # Creates vrt with wgs84 MODIS tiles.
- cmd = ['gdalbuildvrt', '-input_file_list', 'vrt_files.txt', vrt_name] - uu.log_subprocess_output_full(cmd) - - uu.print_log("Reprojecting vrt...") - - # Builds new vrt and virtually project it - # This reprojection could be done as part of the clip_year_tiles function but Sam had it out here like this and - # so I'm leaving it like that. - vrt_wgs84 = 'global_vrt_{}_wgs84.vrt'.format(year) - cmd = ['gdalwarp', '-of', 'VRT', '-t_srs', "EPSG:4326", '-tap', '-tr', '.00025', '.00025', '-overwrite', - vrt_name, vrt_wgs84] - uu.log_subprocess_output_full(cmd) - - # Creates a list of lists, with year and tile id to send to multi processor - tile_year_list = [] - for tile_id in tile_id_list: - tile_year_list.append([tile_id, year]) - - # Given a list of tiles and years ['00N_000E', 2017] and a VRT of burn data, - # the global vrt has pixels representing burned or not. This process clips the global VRT - # and changes the pixel value to represent the year the pixel was burned. Each tile has value of - # year burned and NoData. - count = multiprocessing.cpu_count() - pool = multiprocessing.Pool(processes=count-5) - pool.map(partial(clip_year_tiles.clip_year_tiles, no_upload=no_upload), tile_year_list) - pool.close() - pool.join() - # # For single processor use - # for tile_year in tile_year_list: - # clip_year_tiles.clip_year_tiles(tile_year, no_upload) + # # Uploads the latest year of raw burn area hdfs to s3. + # # All hdfs go in this folder + # cmd = ['aws', 's3', 'cp', '{0}/burn_date/'.format(cn.docker_app), cn.burn_year_hdf_raw_dir, '--recursive', '--exclude', '*', '--include', '*hdf'] + # uu.log_subprocess_output_full(cmd) + # + # + # # Step 2: + # # Makes burned area rasters for each year for each MODIS horizontal-vertical tile. + # # This only needs to be done for the most recent year of data (set in stack_ba_hv). + # uu.print_log("Stacking hdf into MODIS burned area tifs by year and MODIS hv tile...") + # + # count = multiprocessing.cpu_count() + # pool = multiprocessing.Pool(processes=count - 10) + # pool.map(stack_ba_hv.stack_ba_hv, global_grid_hv) + # pool.close() + # pool.join() + # + # # # For single processor use + # # for hv_tile in global_grid_hv: + # # stack_ba_hv.stack_ba_hv(hv_tile) + # + # + # # Step 3: + # # Creates a 10x10 degree wgs 84 tile of .00025 res burned year. + # # Downloads all MODIS hv tiles from s3, + # # makes a mosaic for each year, and warps to Hansen extent. + # # Range is inclusive at lower end and exclusive at upper end (e.g., 2001, 2022 goes from 2001 to 2021). + # # This only needs to be done for the most recent year of data. + # # NOTE: The first time I ran this for the 2020 TCL update, I got an error about uploading the log to s3 + # # after most of the tiles were processed. I didn't know why it happened, so I reran the step and it went fine.
+ # + # start_year = 2000 + cn.loss_years + # end_year = 2000 + cn.loss_years + 1 + # + # # Assumes that only the last year of fires are being processed + # for year in range(start_year, end_year): + # + # uu.print_log("Processing", year) + # + # # Downloads all hv tifs for this year + # include = '{0}_*.tif'.format(year) + # year_tifs_folder = "{}_year_tifs".format(year) + # utilities.makedir(year_tifs_folder) + # + # uu.print_log("Downloading MODIS burn date files from s3...") + # + # cmd = ['aws', 's3', 'cp', cn.burn_year_stacked_hv_tif_dir, year_tifs_folder] + # cmd += ['--recursive', '--exclude', "*", '--include', include] + # uu.log_subprocess_output_full(cmd) + # + # uu.print_log("Creating vrt of MODIS files...") + # + # vrt_name = "global_vrt_{}.vrt".format(year) + # + # # Builds list of vrt files + # with open('vrt_files.txt', 'w') as vrt_files: + # vrt_tifs = glob.glob(year_tifs_folder + "/*.tif") + # for tif in vrt_tifs: + # vrt_files.write(tif + "\n") + # + # # Creates vrt with wgs84 MODIS tiles. + # cmd = ['gdalbuildvrt', '-input_file_list', 'vrt_files.txt', vrt_name] + # uu.log_subprocess_output_full(cmd) + # + # uu.print_log("Reprojecting vrt...") + # + # # Builds new vrt and virtually project it + # # This reprojection could be done as part of the clip_year_tiles function but Sam had it out here like this and + # # so I'm leaving it like that. + # vrt_wgs84 = 'global_vrt_{}_wgs84.vrt'.format(year) + # cmd = ['gdalwarp', '-of', 'VRT', '-t_srs', "EPSG:4326", '-tap', '-tr', str(cn.Hansen_res), str(cn.Hansen_res), + # '-overwrite', vrt_name, vrt_wgs84] + # uu.log_subprocess_output_full(cmd) + # + # # Creates a list of lists, with year and tile id to send to multi processor + # tile_year_list = [] + # for tile_id in tile_id_list: + # tile_year_list.append([tile_id, year]) + # + # # Given a list of tiles and years ['00N_000E', 2017] and a VRT of burn data, + # # the global vrt has pixels representing burned or not. This process clips the global VRT + # # and changes the pixel value to represent the year the pixel was burned. Each tile has value of + # # year burned and NoData. + # count = multiprocessing.cpu_count() + # pool = multiprocessing.Pool(processes=count-5) + # pool.map(partial(clip_year_tiles.clip_year_tiles, no_upload=no_upload), tile_year_list) + # pool.close() + # pool.join() + # + # # # For single processor use + # # for tile_year in tile_year_list: + # # clip_year_tiles.clip_year_tiles(tile_year, no_upload) + # + # uu.print_log("Processing for {} done. Moving to next year.".format(year)) - uu.print_log("Processing for {} done. Moving to next year.".format(year)) # Step 4: # Creates a single Hansen tile covering all years that represents where burning coincided with tree cover loss # or preceded TCL by one year. # This needs to be done on all years each time burned area is updated. - # Downloads the loss tiles + # Downloads the loss tiles. 
The step 3 burn year tiles are downloaded within hansen_burnyear uu.s3_folder_download(cn.loss_dir, '.', 'std', cn.pattern_loss) uu.print_log("Extracting burn year data that coincides with tree cover loss...") diff --git a/burn_date/stack_ba_hv.py b/burn_date/stack_ba_hv.py index b06798b0..a49358b0 100644 --- a/burn_date/stack_ba_hv.py +++ b/burn_date/stack_ba_hv.py @@ -11,7 +11,11 @@ def stack_ba_hv(hv_tile): - for year in range(2020, 2021): # End year is not included in burn year product + start_year = 2000 + cn.loss_years + end_year = 2000 + cn.loss_years + 1 + + # Assumes that only the last year of fires are being processed + for year in range(start_year, end_year): # End year is not included in burn year product # Download hdf files from s3 into folders by h and v output_dir = utilities.makedir('{0}/{1}/raw/'.format(hv_tile, year)) diff --git a/carbon_pools/create_carbon_pools.py b/carbon_pools/create_carbon_pools.py index 2ffd6f54..bd2435c9 100644 --- a/carbon_pools/create_carbon_pools.py +++ b/carbon_pools/create_carbon_pools.py @@ -13,7 +13,7 @@ def mangrove_pool_ratio_dict(gain_table_simplified, tropical_dry, tropical_wet, subtropical): # Creates x_pool:aboveground biomass ratio dictionary for the three mangrove types, where the keys correspond to - # the "mangType" field in the gain rate spreadsheet. + # the "mangType" field in the removals rate spreadsheet. # If the assignment of mangTypes to ecozones changes, that column in the spreadsheet may need to change and the # keys in this dictionary would need to change accordingly. # Key 4 is water, so there shouldn't be any mangrove values there. @@ -21,7 +21,7 @@ def mangrove_pool_ratio_dict(gain_table_simplified, tropical_dry, tropical_wet, '3': subtropical, '4': '100'} type_ratio_dict_final = {int(k): float(v) for k, v in list(type_ratio_dict.items())} - # Applies the x_pool:aboveground biomass ratios for the three mangrove types to the annual aboveground gain rates to + # Applies the x_pool:aboveground biomass ratios for the three mangrove types to the annual aboveground removals rates to # create a column of x_pool:AGB gain_table_simplified['x_pool_AGB_ratio'] = gain_table_simplified['mangType'].map(type_ratio_dict_final) @@ -162,6 +162,8 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent, no_upload): uu.print_log(" Creating aboveground carbon density for {0} using carbon_pool_extent '{1}'...".format(tile_id, carbon_pool_extent)) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tiles for idx, window in windows: @@ -213,7 +215,7 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent, no_upload): # print(agc_2000_model_extent_window[0][0:5]) # Creates a mask based on whether the pixels had loss and gain in them. Loss&gain pixels are 1, all else are 0. - # This is used to determine how much post-2000 carbon gain to add to AGC2000 pixels. + # This is used to determine how much post-2000 carbon removals to add to AGC2000 pixels. loss_gain_mask = np.ma.masked_where(loss_year_window == 0, gain_window).filled(0) # Loss pixels that also have gain pixels are treated differently from loss-only pixels. @@ -226,7 +228,7 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent, no_upload): # Calculates AGC in emission year for pixels that had loss & gain (excludes loss_gain_mask = 0). - # To do this, it adds only the portion of the gain that occurred before the loss year to the carbon in 2000. 
+ # To do this, it adds only the portion of the removals that occurred before the loss year to the carbon in 2000. gain_before_loss = annual_gain_AGC_window * (loss_year_window - 1) AGC_emis_year_loss_and_gain = agc_2000_model_extent_window + gain_before_loss AGC_emis_year_loss_and_gain_masked = np.ma.masked_where(loss_gain_mask == 0, AGC_emis_year_loss_and_gain).filled(0) @@ -326,6 +328,8 @@ def create_BGC(tile_id, mang_BGB_AGB_ratio, carbon_pool_extent, sensit_type, no_ uu.print_log(" Creating belowground carbon density for {0} using carbon_pool_extent '{1}'...".format(tile_id, carbon_pool_extent)) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tiles for idx, window in windows: @@ -500,6 +504,8 @@ def create_deadwood_litter(tile_id, mang_deadwood_AGB_ratio, mang_litter_AGB_rat uu.print_log(" Creating deadwood and litter carbon density for {0} using carbon_pool_extent '{1}'...".format(tile_id, carbon_pool_extent)) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tiles for idx, window in windows: @@ -732,6 +738,8 @@ def create_soil_emis_extent(tile_id, pattern, sensit_type, no_upload): uu.print_log(" Creating soil carbon density for loss pixels in {}...".format(tile_id)) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tiles for idx, window in windows: @@ -833,6 +841,8 @@ def create_total_C(tile_id, carbon_pool_extent, sensit_type, no_upload): uu.print_log(" Creating total carbon density for {0} using carbon_pool_extent '{1}'...".format(tile_id, carbon_pool_extent)) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tiles for idx, window in windows: diff --git a/carbon_pools/create_inputs_for_C_pools.py b/carbon_pools/create_inputs_for_C_pools.py index d7192399..c99f9eef 100644 --- a/carbon_pools/create_inputs_for_C_pools.py +++ b/carbon_pools/create_inputs_for_C_pools.py @@ -48,7 +48,7 @@ def create_input_files(tile_id, no_upload): windows = bor_tem_trop_src.block_windows(1) # Updates kwargs for the output dataset. - # Need to update data type to float 32 so that it can handle fractional gain rates + # Need to update data type to float 32 so that it can handle fractional removal rates kwargs.update( driver='GTiff', count=1, diff --git a/carbon_pools/create_soil_C.py b/carbon_pools/create_soil_C.py index e01d8ad0..b022b800 100644 --- a/carbon_pools/create_soil_C.py +++ b/carbon_pools/create_soil_C.py @@ -87,7 +87,7 @@ def create_combined_soil_C(tile_id, no_upload): mineral_soil_src = rasterio.open(mineral_soil) # Updates kwargs for the output dataset. - # Need to update data type to float 32 so that it can handle fractional gain rates + # Need to update data type to float 32 so that it can handle fractional removal rates kwargs.update( driver='GTiff', count=1, diff --git a/carbon_pools/mp_create_carbon_pools.py b/carbon_pools/mp_create_carbon_pools.py index fcae01ac..e45d61c8 100644 --- a/carbon_pools/mp_create_carbon_pools.py +++ b/carbon_pools/mp_create_carbon_pools.py @@ -6,8 +6,8 @@ It also creates carbon emitted_pools for the year of loss/emissions-- only for pixels that had loss that are within the model. To do this, it adds CO2 (carbon) accumulated since 2000 to the C (biomass) 2000 stock, so that the CO2 (carbon) emitted is 2000 + gains -until loss. (For Hansen loss+gain pixels, only the portion of C that is accumulated before loss is included in the -lost carbon (lossyr-1), not the entire carbon gain of the pixel.) 
Because the emissions year carbon emitted_pools depend on +until loss. (For Hansen loss+removals pixels, only the portion of C that is accumulated before loss is included in the +lost carbon (lossyr-1), not the entire carbon removals of the pixel.) Because the emissions year carbon emitted_pools depend on carbon removals, any time the removals model changes, the emissions year carbon emitted_pools need to be regenerated. The carbon emitted_pools in 2000 are not used for the flux model at all; they are purely for illustrative purposes. Only the @@ -164,13 +164,14 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da if run_date is not None and no_upload is not None: output_dir_list = uu.replace_output_dir_date(output_dir_list, run_date) - # Table with IPCC Wetland Supplement Table 4.4 default mangrove gain rates - cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + # Table with IPCC Wetland Supplement Table 4.4 default mangrove removals rates + # cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir] uu.log_subprocess_output_full(cmd) pd.options.mode.chained_assignment = None - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals rates gain_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name="mangrove gain, for model") @@ -303,7 +304,9 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da if sensit_type == 'biomass_swap': processes = 10 # 10 processors = XXX GB peak else: - processes = 15 # 32 processors = >750 GB peak; 24 > 750 GB peak; 14 = 685 GB peak (stops around 600, then increases very very slowly); 15 = 700 GB peak + # 32 processors = >750 GB peak; 24 > 750 GB peak; 14 = 685 GB peak (stops around 600, then increases very very slowly); + # 15 = 700 GB peak once but also too much memory another time, so back to 14 + processes = 14 else: # For 2000, or loss & 2000 ### Note: deleted precip, elevation, and WHRC AGB tiles at equatorial latitudes as deadwood and litter were produced. ### There wouldn't have been enough room for all deadwood and litter otherwise. diff --git a/carbon_pools/mp_create_inputs_for_C_pools.py b/carbon_pools/mp_create_inputs_for_C_pools.py index 61b3f761..72596b67 100644 --- a/carbon_pools/mp_create_inputs_for_C_pools.py +++ b/carbon_pools/mp_create_inputs_for_C_pools.py @@ -76,7 +76,7 @@ def mp_create_inputs_for_C_pools(tile_id_list, run_date = None, no_upload = None if __name__ == '__main__': parser = argparse.ArgumentParser( - description='Create tiles of the annual AGB and BGB gain rates for mangrove forests') + description='Create tiles of the annual AGB and BGB removals rates for mangrove forests') parser.add_argument('--tile_id_list', '-l', required=True, help='List of tile ids to use in the model. Should be of form 00N_110E or 00N_110E,00N_120E or all.') parser.add_argument('--run-date', '-d', required=False, diff --git a/constants_and_names.py b/constants_and_names.py index 72cf405c..8b1fda7d 100644 --- a/constants_and_names.py +++ b/constants_and_names.py @@ -8,14 +8,14 @@ ######## ######## # Model version -version = '1.2.1' +version = '1.2.2' version_filename = version.replace('.', '_') # Number of years of tree cover loss. 
If input loss raster is changed, this must be changed, too. -loss_years = 20 +loss_years = 21 -# Number of years in tree cover gain. If input gain raster is changed, this must be changed, too. +# Number of years in tree cover gain. If input cover gain raster is changed, this must be changed, too. gain_years = 12 # Biomass to carbon ratio for aboveground, belowground, and deadwood in non-mangrove forests (planted and non-planted) @@ -64,7 +64,7 @@ tile_width = 10 / Hansen_res tile_height = 10 / Hansen_res -# Pixel window sizes for aggregated rasters +# Pixel window sizes for rewindowed input rasters agg_pixel_window = int(tile_width * 0.004) # m2 per hectare @@ -112,7 +112,7 @@ ### Model extent ###### pattern_model_extent = 'model_extent' -model_extent_dir = os.path.join(s3_base_dir, 'model_extent/standard/20210223/') +model_extent_dir = os.path.join(s3_base_dir, 'model_extent/standard/20220309/') ###### ### Biomass tiles @@ -150,15 +150,15 @@ pixel_area_rewindow_dir = os.path.join(s3_base_dir, 'rewindow/pixel_area/20210621/') -# Spreadsheet with annual gain rates +# Spreadsheet with annual removals rates gain_spreadsheet = 'gain_rate_continent_ecozone_age_20200820.xlsx' gain_spreadsheet_dir = os.path.join(s3_base_dir, 'removal_rate_tables/') -# Annual Hansen loss tiles (2001-2020) -pattern_loss = 'GFW2020' -loss_dir = 's3://gfw2-data/forest_change/hansen_2020/' +# Annual Hansen loss tiles (2001-2021) +pattern_loss = 'GFW2021' +loss_dir = 's3://gfw2-data/forest_change/hansen_2021/' -# Hansen gain tiles (2001-2012) +# Hansen removals tiles (2001-2012) pattern_gain = 'Hansen_GFC2015_gain' gain_dir = 's3://gfw2-data/forest_change/tree_cover_gain/gaindata_2012/' pattern_gain_rewindow = 'Hansen_GFC2015_gain_rewindow' @@ -222,9 +222,9 @@ # Drivers of tree cover loss drivers_raw_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/tree_cover_loss_drivers/raw/') -pattern_drivers_raw = 'Final_Classification_2020__reproj_nearest_0-005_0-005_deg__20210323.tif' +pattern_drivers_raw = 'Final_Classification_2021__reproj_nearest_0-005_0-005_deg__20220316.tif' pattern_drivers = 'tree_cover_loss_driver_processed' -drivers_processed_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/tree_cover_loss_drivers/processed/drivers_2020/20210323/') +drivers_processed_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/tree_cover_loss_drivers/processed/drivers_2021/20220316/') # Burn year burn_area_raw_ftp = 'sftp://fuoco.geog.umd.edu/data/MODIS/C6/MCD64A1/HDF/' # per https://modis-fire.umd.edu/files/MODIS_C6_BA_User_Guide_1.3.pdf @@ -232,7 +232,7 @@ burn_year_stacked_hv_tif_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/stacked_hv_tifs/') burn_year_warped_to_Hansen_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/burn_year_10x10_clip/') pattern_burn_year = "burnyear_with_Hansen_loss" -burn_year_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/burn_year_with_Hansen_loss/20210218/') +burn_year_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/burn_year_with_Hansen_loss/20220308/') ###### ### Plantation processing @@ -274,7 +274,7 @@ # Age categories over entire model extent, as a precursor to assigning IPCC default removal rates pattern_age_cat_IPCC = 'forest_age_category_IPCC__1_young_2_mid_3_old' -age_cat_IPCC_dir = os.path.join(s3_base_dir, 'forest_age_category_IPCC/standard/20210223/') +age_cat_IPCC_dir = os.path.join(s3_base_dir, 'forest_age_category_IPCC/standard/20220309/') ### US-specific removal precursors @@ -296,70 +296,68 @@ 
-### Annual carbon gain rates that are precursors for composite annual removal factor +### Annual carbon and biomass removals rates for specific forest types that are precursors for composite annual removal factor -# Annual aboveground and belowground carbon gain rate for planted forests, with gain rates everywhere inside the plantation boundaries (includes mangrove pixels) +# Annual aboveground and belowground carbon removals rate for planted forests, with removals rates everywhere inside the plantation boundaries (includes mangrove pixels) pattern_annual_gain_AGC_BGC_planted_forest_unmasked = 'annual_gain_rate_AGC_BGC_t_ha_planted_forest_unmasked' annual_gain_AGC_BGC_planted_forest_unmasked_dir = os.path.join(s3_base_dir, 'annual_gain_rate_AGC_BGC_planted_forest_unmasked/standard/20200730/') -# Annual aboveground carbon gain rate for <20 year secondary, non-mangrove, non-planted natural forests (raw) +# Annual aboveground carbon removals rate for <20 year secondary, non-mangrove, non-planted natural forests (raw) name_annual_gain_AGC_natrl_forest_young_raw = 'sequestration_rate__mean__aboveground__full_extent__Mg_C_ha_yr.tif' annual_gain_AGC_natrl_forest_young_raw_URL = 'http://gfw2-data.s3.amazonaws.com/climate/carbon_seqr_AI4E/Nature_publication_final_202007/full_extent/sequestration_rate__mean__aboveground__full_extent__Mg_C_ha_yr.tif' -# Annual aboveground carbon gain rate for young (<20 year secondary), non-mangrove, non-planted natural forests +# Annual aboveground carbon removals rate for young (<20 year secondary), non-mangrove, non-planted natural forests (processed) pattern_annual_gain_AGC_natrl_forest_young = 'annual_gain_rate_AGC_t_ha_natural_forest_young_secondary' annual_gain_AGC_natrl_forest_young_dir = os.path.join(s3_base_dir, 'annual_gain_rate_AGC_natural_forest_young_secondary/standard/20200728/') -# Annual aboveground+belowground carbon gain rate for natural European forests (raw) +# Annual aboveground+belowground carbon removals rate for natural European forests (raw) name_annual_gain_AGC_BGC_natrl_forest_Europe_raw = 'annual_gain_rate_AGC_BGC_t_ha_natural_forest_raw_Europe.tif' annual_gain_AGC_BGC_natrl_forest_Europe_raw_dir = os.path.join(s3_base_dir, 'annual_gain_rate_AGC_BGC_natural_forest_Europe/raw/standard/20200722/') -# Annual aboveground+belowground carbon gain rate for natural European forests (processed tiles) +# Annual aboveground+belowground carbon removals rate for natural European forests (processed tiles) # https://www.efi.int/knowledge/maps/treespecies pattern_annual_gain_AGC_BGC_natrl_forest_Europe = 'annual_gain_rate_AGC_BGC_t_ha_natural_forest_Europe' annual_gain_AGC_BGC_natrl_forest_Europe_dir = os.path.join(s3_base_dir, 'annual_gain_rate_AGC_BGC_natural_forest_Europe/processed/standard/20200724/') -# Annual aboveground+belowground carbon gain rate for natural US forests (processed tiles) +# Annual aboveground+belowground carbon removals rate for natural US forests (processed tiles) pattern_annual_gain_AGC_BGC_natrl_forest_US = 'annual_removal_factor_AGC_BGC_Mg_ha_natural_forest_US' annual_gain_AGC_BGC_natrl_forest_US_dir = os.path.join(s3_base_dir, 'annual_gain_rate_AGC_BGC_natural_forest_US/processed/standard/20200831/') - - -### Annual biomass gain rates - -# Annual aboveground biomass gain rate for mangroves +# Annual aboveground biomass removals rate for mangroves pattern_annual_gain_AGB_mangrove = 'annual_removal_factor_AGB_Mg_ha_mangrove' annual_gain_AGB_mangrove_dir = os.path.join(s3_base_dir, 
'annual_removal_factor_AGB_mangrove/standard/20200824/') -# Annual belowground biomass gain rate for mangroves +# Annual belowground biomass removals rate for mangroves pattern_annual_gain_BGB_mangrove = 'annual_removal_factor_BGB_Mg_ha_mangrove' annual_gain_BGB_mangrove_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGB_mangrove/standard/20200824/') -# Annual aboveground biomass gain rate using IPCC default removal rates +# Annual aboveground biomass removals rate using IPCC default removal rates pattern_annual_gain_AGB_IPCC_defaults = 'annual_removal_factor_AGB_Mg_ha_IPCC_defaults_all_ages' -annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20210223/') +annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20220309/') -# Annual aboveground biomass gain rate using IPCC default removal rates +# Annual aboveground biomass removals rate using IPCC default removal rates pattern_annual_gain_BGB_IPCC_defaults = 'annual_removal_factor_BGB_Mg_ha_IPCC_defaults_all_ages' -annual_gain_BGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGB_IPCC_defaults_all_ages/standard/20210223/') +annual_gain_BGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGB_IPCC_defaults_all_ages/standard/20220309/') + +### Annual composite removal factor -# Annual aboveground gain rate for all forest types +# Annual aboveground removals rate for all forest types pattern_annual_gain_AGC_all_types = 'annual_removal_factor_AGC_Mg_ha_all_forest_types' -annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_all_forest_types/standard/20210223/') +annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_all_forest_types/standard/20220309/') -# Annual belowground gain rate for all forest types +# Annual belowground removals rate for all forest types pattern_annual_gain_BGC_all_types = 'annual_removal_factor_BGC_Mg_ha_all_forest_types' -annual_gain_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGC_all_forest_types/standard/20210223/') +annual_gain_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGC_all_forest_types/standard/20220309/') -# Annual aboveground+belowground gain rate for all forest types +# Annual aboveground+belowground removals rate for all forest types pattern_annual_gain_AGC_BGC_all_types = 'annual_removal_factor_AGC_BGC_Mg_ha_all_forest_types' -annual_gain_AGC_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_BGC_all_forest_types/standard/20210223/') +annual_gain_AGC_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_BGC_all_forest_types/standard/20220309/') ### Removal forest types (sources) # Forest type used in removals model pattern_removal_forest_type = 'removal_forest_type' -removal_forest_type_dir = os.path.join(s3_base_dir, 'removal_forest_type/standard/20210223/') +removal_forest_type_dir = os.path.join(s3_base_dir, 'removal_forest_type/standard/20220309/') # Removal model forest type codes mangrove_rank = 6 @@ -370,30 +368,30 @@ old_natural_rank = 1 -### Number of years of carbon removal (gain year count) +### Number of years of carbon removal (removals year count) -# Number of gain years for all forest types +# Number of removals years for all forest types pattern_gain_year_count = 'gain_year_count_all_forest_types' -gain_year_count_dir = os.path.join(s3_base_dir, 
'gain_year_count_all_forest_types/standard/20210224/') +gain_year_count_dir = os.path.join(s3_base_dir, 'gain_year_count_all_forest_types/standard/20220309/') -### Cumulative carbon dioxide removals +### Cumulative gross carbon dioxide removals # Gross aboveground removals for all forest types pattern_cumul_gain_AGCO2_all_types = 'gross_removals_AGCO2_Mg_ha_all_forest_types_2001_{}'.format(loss_years) -cumul_gain_AGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_all_forest_types/standard/per_hectare/20210224/') +cumul_gain_AGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_all_forest_types/standard/per_hectare/20220309/') # Gross belowground removals for all forest types pattern_cumul_gain_BGCO2_all_types = 'gross_removals_BGCO2_Mg_ha_all_forest_types_2001_{}'.format(loss_years) -cumul_gain_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_BGCO2_all_forest_types/standard/per_hectare/20210224/') +cumul_gain_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_BGCO2_all_forest_types/standard/per_hectare/20220309/') # Gross aboveground and belowground removals for all forest types in all pixels pattern_cumul_gain_AGCO2_BGCO2_all_types = 'gross_removals_AGCO2_BGCO2_Mg_ha_all_forest_types_2001_{}'.format(loss_years) -cumul_gain_AGCO2_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_hectare/20210224/') +cumul_gain_AGCO2_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_hectare/20220309/') # Gross aboveground and belowground removals for all forest types in pixels within forest extent pattern_cumul_gain_AGCO2_BGCO2_all_types_forest_extent = 'gross_removals_AGCO2_BGCO2_Mg_ha_all_forest_types_forest_extent_2001_{}'.format(loss_years) -cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_hectare/20210225/') +cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_hectare/20220309/') ###### @@ -429,7 +427,7 @@ ## Carbon emitted_pools in loss year # Date to include in the output directory for all emissions year carbon emitted_pools -emis_pool_run_date = '20210224' +emis_pool_run_date = '20220309' # Aboveground carbon in the year of emission for all forest types in loss pixels pattern_AGC_emis_year = "Mg_AGC_ha_emis_year" @@ -486,7 +484,7 @@ # Soil C full extent but just from SoilGrids250 (mangrove soil C layer not added in) # Not used in model. 
pattern_soil_C_full_extent_2000_non_mang = 'soil_C_ha_full_extent_2000_non_mangrove_Mg_ha' -soil_C_full_extent_2000_non_mang_dir = os.path.join(base_carbon_pool_dir, 'soil_carbon/intermediate_full_extent/no_mangrove/20210414/') +soil_C_full_extent_2000_non_mang_dir = os.path.join(base_carbon_pool_dir, 'soil_carbon/intermediate_full_extent/no_mangrove/20220414/') # Soil C full extent (all soil pixels, with mangrove soil C in Giri mangrove extent getting priority over mineral soil C) # Non-mangrove C is 0-30 cm, mangrove C is 0-100 cm @@ -505,7 +503,7 @@ ### Emissions from biomass and soil (all carbon emitted_pools) # Date to include in the output directory -emis_run_date_biomass_soil = '20210323' +emis_run_date_biomass_soil = '20220316' # pattern_gross_emis_commod_biomass_soil = 'gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_{}'.format(loss_years) pattern_gross_emis_commod_biomass_soil = 'gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_{}'.format(loss_years) @@ -544,7 +542,7 @@ ### Emissions from soil only # Date to include in the output directory -emis_run_date_soil_only = '20210324' +emis_run_date_soil_only = '20220318' pattern_gross_emis_commod_soil_only = 'gross_emis_commodity_Mg_CO2e_ha_soil_only_2001_{}'.format(loss_years) gross_emis_commod_soil_only_dir = '{0}gross_emissions/commodities/soil_only/standard/{1}/'.format(s3_base_dir, emis_run_date_soil_only) @@ -582,11 +580,11 @@ # Net emissions for all forest types and all carbon emitted_pools in all pixels pattern_net_flux = 'net_flux_Mg_CO2e_ha_biomass_soil_2001_{}'.format(loss_years) -net_flux_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_hectare/20210323/') +net_flux_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_hectare/20220316/') # Net emissions for all forest types and all carbon emitted_pools in forest extent pattern_net_flux_forest_extent = 'net_flux_Mg_CO2e_ha_biomass_soil_forest_extent_2001_{}'.format(loss_years) -net_flux_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_hectare/20210323/') +net_flux_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_hectare/20220316/') ### Per pixel model outputs @@ -594,37 +592,37 @@ # Gross removals per pixel in all pixels pattern_cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent = 'gross_removals_AGCO2_BGCO2_Mg_pixel_all_forest_types_full_extent_2001_{}'.format(loss_years) -cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_pixel/20210225/') +cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_pixel/20220309/') # Gross removals per pixel in forest extent pattern_cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent = 'gross_removals_AGCO2_BGCO2_Mg_pixel_all_forest_types_forest_extent_2001_{}'.format(loss_years) -cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_pixel/20210225/') +cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_pixel/20220309/') # Gross emissions per pixel in 
all pixels pattern_gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent = 'gross_emis_all_gases_all_drivers_Mg_CO2e_pixel_biomass_soil_full_extent_2001_{}'.format(loss_years) -gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/full_extent/per_pixel/20210323/') +gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/full_extent/per_pixel/20220316/') # Gross emissions per pixel in forest extent pattern_gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent = 'gross_emis_all_gases_all_drivers_Mg_CO2e_pixel_biomass_soil_forest_extent_2001_{}'.format(loss_years) -gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/forest_extent/per_pixel/20210323/') +gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/forest_extent/per_pixel/20220316/') # Net flux per pixel in all pixels pattern_net_flux_per_pixel_full_extent = 'net_flux_Mg_CO2e_pixel_biomass_soil_full_extent_2001_{}'.format(loss_years) -net_flux_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_pixel/20210323/') +net_flux_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_pixel/20220316/') # Net flux per pixel in forest extent pattern_net_flux_per_pixel_forest_extent = 'net_flux_Mg_CO2e_pixel_biomass_soil_forest_extent_2001_{}'.format(loss_years) -net_flux_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_pixel/20210323/') +net_flux_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_pixel/20220316/') ### 4x4 km aggregation tiles for mapping ###### -pattern_aggreg = '0_4deg_modelv{}'.format(version_filename) -pattern_aggreg_sensit_perc_diff = 'net_flux_0_4deg_modelv{}_perc_diff_std'.format(version_filename) -pattern_aggreg_sensit_sign_change = 'net_flux_0_4deg_modelv{}_sign_change_std'.format(version_filename) +pattern_aggreg = '0_04deg_modelv{}'.format(version_filename) +pattern_aggreg_sensit_perc_diff = 'net_flux_0_04deg_modelv{}_perc_diff_std'.format(version_filename) +pattern_aggreg_sensit_sign_change = 'net_flux_0_04deg_modelv{}_sign_change_std'.format(version_filename) -output_aggreg_dir = os.path.join(s3_base_dir, '0_4deg_output_aggregation/biomass_soil/standard/20210323/') +output_aggreg_dir = os.path.join(s3_base_dir, '0_04deg_output_aggregation/biomass_soil/standard/20220316/') @@ -648,7 +646,7 @@ pattern_stdev_annual_gain_AGC_BGC_planted_forest_unmasked = 'annual_gain_rate_stdev_AGC_BGC_t_ha_planted_forest_unmasked' stdev_annual_gain_AGC_BGC_planted_forest_unmasked_dir = 's3://gfw2-data/climate/carbon_model/stdev_annual_removal_factor_AGC_BGC_planted_forest_unmasked/standard/20200801/' -# Standard deviation for annual aboveground+belowground carbon gain rate for natural US forests +# Standard deviation for annual aboveground+belowground carbon removals rate for natural US forests pattern_stdev_annual_gain_AGC_BGC_natrl_forest_US = 
'annual_removal_factor_stdev_AGC_BGC_Mg_ha_natural_forest_US' stdev_annual_gain_AGC_BGC_natrl_forest_US_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGC_BGC_natural_forest_US/processed/standard/20200831/') @@ -662,11 +660,11 @@ # Standard deviation for annual aboveground biomass removal factors using IPCC default removal rates pattern_stdev_annual_gain_AGB_IPCC_defaults = 'annual_removal_factor_stdev_AGB_Mg_ha_IPCC_defaults_all_ages' -stdev_annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20210223/') +stdev_annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20220309/') # Standard deviation for aboveground and belowground removal factors for all forest types pattern_stdev_annual_gain_AGC_all_types = 'annual_removal_factor_stdev_AGC_Mg_ha_all_forest_types' -stdev_annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGC_all_forest_types/standard/20210223/') +stdev_annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGC_all_forest_types/standard/20220309/') # Raw mineral soil C file site @@ -720,11 +718,11 @@ # table_US_removal_rate = 'US_removal_rates_flux_model_20200623.xlsx' # US_removal_rate_table_dir = os.path.join(s3_base_dir, 'removal_rate_tables/') # -# # Annual aboveground biomass gain rate for non-mangrove, non-planted natural forests +# # Annual aboveground biomass removals rate for non-mangrove, non-planted natural forests # pattern_US_annual_gain_AGB_natrl_forest = 'annual_gain_rate_AGB_t_ha_natural_forest_non_mangrove_non_planted_US_removals' # US_annual_gain_AGB_natrl_forest_dir = os.path.join(s3_base_dir, 'annual_gain_rate_AGB_natural_forest/US_removals/20200107/') # -# # Annual belowground biomass gain rate for non-mangrove, non-planted natural forests using US-specific removal rates +# # Annual belowground biomass removals rate for non-mangrove, non-planted natural forests using US-specific removal rates # pattern_US_annual_gain_BGB_natrl_forest = 'annual_gain_rate_BGB_t_ha_natural_forest_non_mangrove_non_planted_US_removals' # US_annual_gain_BGB_natrl_forest_dir = os.path.join(s3_base_dir, 'annual_gain_rate_BGB_natural_forest/US_removals/20200107/') diff --git a/data_prep/model_extent.py b/data_prep/model_extent.py index a50c4c55..c32709f4 100644 --- a/data_prep/model_extent.py +++ b/data_prep/model_extent.py @@ -8,6 +8,7 @@ import constants_and_names as cn import universal_util as uu +# @uu.counter def model_extent(tile_id, pattern, sensit_type, no_upload): # I don't know why, but this needs to be here and not just in mp_model_extent @@ -104,6 +105,8 @@ def model_extent(tile_id, pattern, sensit_type, no_upload): uu.print_log(" Creating model extent for {}".format(tile_id)) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tile for idx, window in windows: diff --git a/data_prep/mp_model_extent.py b/data_prep/mp_model_extent.py index 150da9e8..67a05680 100644 --- a/data_prep/mp_model_extent.py +++ b/data_prep/mp_model_extent.py @@ -22,6 +22,7 @@ sys.path.append(os.path.join(cn.docker_app,'data_prep')) import model_extent + def mp_model_extent(sensit_type, tile_id_list, run_date = None, no_upload = None): os.chdir(cn.docker_base_dir) @@ -84,9 +85,9 @@ def mp_model_extent(sensit_type, tile_id_list, run_date = None, no_upload = None if run_date is not None and no_upload is not None: output_dir_list = 
uu.replace_output_dir_date(output_dir_list, run_date) - # Creates a single filename pattern to pass to the multiprocessor call pattern = output_pattern_list[0] + # This configuration of the multiprocessing call is necessary for passing multiple arguments to the main function # It is based on the example here: http://spencerimp.blogspot.com/2015/12/python-multiprocess-with-multiple.html if cn.count == 96: @@ -94,7 +95,7 @@ def mp_model_extent(sensit_type, tile_id_list, run_date = None, no_upload = None processes = 38 else: processes = 45 # 30 processors = 480 GB peak (sporadic decreases followed by sustained increases); - # 36 = 550 GB peak; 40 = 590 GB peak; 42 = 631 GB peak; 45 = XXX GB peak + # 36 = 550 GB peak; 40 = 590 GB peak; 42 = 631 GB peak; 43 = 690 GB peak; 45 = too high else: processes = 3 uu.print_log('Model extent processors=', processes) diff --git a/data_prep/mp_plantation_preparation.py b/data_prep/mp_plantation_preparation.py index a86ef009..54d6f47f 100644 --- a/data_prep/mp_plantation_preparation.py +++ b/data_prep/mp_plantation_preparation.py @@ -46,13 +46,16 @@ ''' """ -### Before running this script, the plantation gdb must be converted into a PostGIS table. That's more easily done as a series -### of commands than as a script. Below are the instructions for creating a single PostGIS table of all plantations. -### This assumes that the plantation gdb has one feature class for each country with plantations and that -### each country's feature class's attribute table has a growth rate column named "growth". +Before running this script, the plantation gdb must be converted into a PostGIS table. That's more easily done as a series +of commands than as a script. Below are the instructions for creating a single PostGIS table of all plantations. +This assumes that the plantation gdb has one feature class for each country with plantations and that +each country's feature class's attribute table has a growth rate column named "growth". -# Start a r.16xlarge spot machine -spotutil new r4.16xlarge dgibbs_wri +# Start a r5d.24xlarge spot machine +spotutil new r5d.24xlarge dgibbs_wri + +# Change directory to where the data are kept in the docker +cd /usr/local/tiles/ # Copy zipped plantation gdb with growth rate field in tables aws s3 cp s3://gfw-files/plantations/final/global/plantations_v3_1.gdb.zip . @@ -60,6 +63,27 @@ # Unzip the zipped plantation gdb. This can take several minutes. unzip plantations_v3_1.gdb.zip +# Start the postgres service and check that it has actually started +service postgresql restart +pg_lsclusters + +# Create a postgres database called ubuntu. I tried adding RUN createdb ubuntu to the Dockerfile after RUN service postgresql restart +# but got the error: +# createdb: could not connect to database template1: could not connect to server: No such file or directory. +# Is the server running locally and accepting +# So I'm adding that step here. +createdb ubuntu + +# Enter the postgres database called ubuntu and add the postgis extension to it +psql +CREATE EXTENSION postgis; +\q + + +##### NOTE: I HAVEN'T ACTUALLY CHECKED THAT THE BELOW PROCEDURES REALLY RESULT IN A TABLE THAT CAN BE RASTERIZED CORRECTLY. +##### I JUST CHECKED THAT ROWS WERE BEING ADDED TO THE TABLE + + # Add the feature class of one country's plantations to PostGIS. This creates the "all_plant" table for other countries to be appended to. # Using ogr2ogr requires the PG connection info but entering the PostGIS shell (psql) doesn't.
ogr2ogr -f Postgresql PG:"dbname=ubuntu" plantations_v3_1.gdb -progress -nln all_plant -sql "SELECT growth, species_simp, SD_error FROM cmr_plant" @@ -67,6 +91,8 @@ # Enter PostGIS and check that the table is there and that it has only the growth field. psql \d+ all_plant; +SELECT * FROM all_plant LIMIT 2; # To see what the first two rows look like +SELECT COUNT (*) FROM all_plant; # Should be 697 for CMR in plantations v3.1 # Delete all rows from the table so that it is now empty DELETE FROM all_plant; @@ -81,12 +107,14 @@ more out.txt q -# Run a loop in bash that iterates through all the gdb feature classes and imports them to the all_plant PostGIS table +# Run a loop in bash that iterates through all the gdb feature classes and imports them to the all_plant PostGIS table. +# I think it's okay that there's a message "Warning 1: organizePolygons() received a polygon with more than 100 parts. The processing may be really slow. You can skip the processing by setting METHOD=SKIP, or only make it analyze counter-clock wise parts by setting METHOD=ONLY_CCW if you can assume that the outline of holes is counter-clock wise defined" +# It just seems to mean that the processing is slow, but the processing methods haven't changed. while read p; do echo $p; ogr2ogr -f Postgresql PG:"dbname=ubuntu" plantations_v3_1.gdb -nln all_plant -progress -append -sql "SELECT growth, species_simp, SD_error FROM $p"; done < out.txt # Create a spatial index of the plantation table to speed up the intersections with 1x1 degree tiles psql -CREATE INDEX IF NOT EXISTS all_plant_index ON all_plant using gist(wkb_geometry); +CREATE INDEX IF NOT EXISTS all_plant_index ON all_plant using gist(shape); # Adds a new column to the table and stores the plantation type reclassified as 1 (palm), 2 (wood fiber), or 3 (other) ALTER TABLE all_plant ADD COLUMN type_reclass SMALLINT; @@ -95,10 +123,6 @@ # Exit Postgres shell \q - -# Install a Python package that is needed for certain processing routes below -sudo pip install simpledbf - """ import plantation_preparation @@ -116,7 +140,7 @@ import universal_util as uu -def mp_plantation_preparation(gadm_index_shp, planted_index_shp): +def mp_plantation_preparation(gadm_index_shp, planted_index_shp, tile_id_list, run_date = None, no_upload = None): os.chdir(cn.docker_base_dir) @@ -133,24 +157,33 @@ def mp_plantation_preparation(gadm_index_shp, planted_index_shp): if (gadm_index_path not in cn.gadm_plant_1x1_index_dir or planted_index_path not in cn.gadm_plant_1x1_index_dir): uu.exception_log('Invalid inputs. Please provide None or s3 shapefile locations for both arguments.') - # List of all possible 10x10 Hansen tiles except for those at very extreme latitudes (not just WHRC biomass tiles) - total_tile_list = uu.tile_list_s3(cn.pixel_area_dir) - uu.print_log("Number of possible 10x10 tiles to evaluate:", len(total_tile_list)) - # Removes the latitude bands that don't have any planted forests in them according to Liz Goldman. - # i.e., Liz Goldman said by Slack on 1/2/19 that the nothernmost planted forest is 69.5146 and the southernmost is -46.938968. - # This creates a more focused list of 10x10 tiles to iterate through (removes ones that definitely don't have planted forest). - # NOTE: If the planted forest gdb is updated, the list of latitudes to exclude below may need to be changed to not exclude certain latitude bands. 
- planted_lat_tile_list = [tile for tile in total_tile_list if '90N' not in tile] - planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '80N' not in tile] - planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '50S' not in tile] - planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '60S' not in tile] - planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '70S' not in tile] - planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '80S' not in tile] - # planted_lat_tile_list = ['10N_080W'] + # If a full model run is specified, the correct set of tiles for the particular script is listed + if tile_id_list == 'all': + # List of all possible 10x10 Hansen tiles except for those at very extreme latitudes (not just WHRC biomass tiles) + total_tile_list = uu.tile_list_s3(cn.pixel_area_dir) + + # Removes the latitude bands that don't have any planted forests in them according to Liz Goldman. + # i.e., Liz Goldman said by Slack on 1/2/19 that the northernmost planted forest is 69.5146 and the southernmost is -46.938968. + # This creates a more focused list of 10x10 tiles to iterate through (removes ones that definitely don't have planted forest). + # NOTE: If the planted forest gdb is updated, the list of latitudes to exclude below may need to be changed to not exclude certain latitude bands. + planted_lat_tile_list = [tile for tile in total_tile_list if '90N' not in tile] + planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '80N' not in tile] + planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '50S' not in tile] + planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '60S' not in tile] + planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '70S' not in tile] + planted_lat_tile_list = [tile for tile in planted_lat_tile_list if '80S' not in tile] + uu.print_log(planted_lat_tile_list) + uu.print_log("There are {} tiles to process after extreme latitudes have been removed".format(str(len(planted_lat_tile_list))) + "\n") + else: + planted_lat_tile_list = tile_id_list + uu.print_log("There are {} tiles to process".format(str(len(planted_lat_tile_list))) + "\n") + + + ################# Stopped updating plantation processing script to use tile_id_list argument here. + ################# But I didn't check the earlier work, either.
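(Editor's aside: the latitude-band filter above operates purely on the tile ID strings. Below is a minimal standalone sketch of the same logic, with hard-coded tile IDs and a hypothetical helper name instead of the uu.tile_list_s3() call the script actually uses.)

# Hypothetical standalone version of the latitude-band filter, for illustration only.
# Tile IDs are assumed to follow the 00N_000E convention used throughout the model,
# where the part before the underscore is the latitude band of the tile's top edge.
EXCLUDED_BANDS = ('90N', '80N', '50S', '60S', '70S', '80S')

def remove_extreme_latitudes(tile_ids):
    # Keeps only tiles whose latitude band is not in the excluded list
    return [t for t in tile_ids if t.split('_')[0] not in EXCLUDED_BANDS]

print(remove_extreme_latitudes(['90N_000E', '10N_080W', '80S_120E']))  # ['10N_080W']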
+ - - uu.print_log(planted_lat_tile_list) - uu.print_log("Number of 10x10 tiles to evaluate after extreme latitudes have been removed:", len(planted_lat_tile_list)) # If a planted forest extent 1x1 tile index shapefile isn't supplied @@ -357,27 +390,27 @@ def mp_plantation_preparation(gadm_index_shp, planted_index_shp): pool.join() - ### All script entry points meet here: creation of 10x10 degree planted forest gain rate and rtpe tiles - ### from 1x1 degree planted forest gain rate and type tiles + ### All script entry points meet here: creation of 10x10 degree planted forest removals rate and type tiles + ### from 1x1 degree planted forest removals rate and type tiles - # Name of the vrt of 1x1 planted forest gain rate tiles + # Name of the vrt of 1x1 planted forest removals rate tiles plant_gain_1x1_vrt = 'plant_gain_1x1.vrt' - # Creates a mosaic of all the 1x1 plantation gain rate tiles - uu.print_log("Creating vrt of 1x1 plantation gain rate tiles") + # Creates a mosaic of all the 1x1 plantation removals rate tiles + uu.print_log("Creating vrt of 1x1 plantation removals rate tiles") os.system('gdalbuildvrt {} plant_gain_*.tif'.format(plant_gain_1x1_vrt)) - # Creates 10x10 degree tiles of plantation gain rate by iterating over the set of pixel area tiles supplied + # Creates 10x10 degree tiles of plantation removals rate by iterating over the set of pixel area tiles supplied # at the start of the script that are in latitudes with planted forests. # For multiprocessor use processes = 20 - uu.print_log('Create 10x10 plantation gain rate max processors=', processes) + uu.print_log('Create 10x10 plantation removals rate max processors=', processes) pool = Pool(processes) pool.map(partial(plantation_preparation.create_10x10_plantation_gain, plant_gain_1x1_vrt=plant_gain_1x1_vrt), planted_lat_tile_list) pool.close() pool.join() - # Creates 10x10 degree tiles of plantation gain rate by iterating over the set of pixel area tiles supplied + # Creates 10x10 degree tiles of plantation removals rate by iterating over the set of pixel area tiles supplied #at the start of the script that are in latitudes with planted forests. # For single processor use #for tile in planted_lat_tile_list: @@ -410,14 +443,14 @@ def mp_plantation_preparation(gadm_index_shp, planted_index_shp): - # Name of the vrt of 1x1 planted forest gain rate standard deviation tiles + # Name of the vrt of 1x1 planted forest removals rate standard deviation tiles plant_stdev_1x1_vrt = 'plant_stdev_1x1.vrt' - # Creates a mosaic of all the 1x1 plantation gain rate standard deviation tiles - uu.print_log("Creating vrt of 1x1 plantation gain rate standard deviation tiles") + # Creates a mosaic of all the 1x1 plantation removals rate standard deviation tiles + uu.print_log("Creating vrt of 1x1 plantation removals rate standard deviation tiles") os.system('gdalbuildvrt {} plant_stdev_*.tif'.format(plant_stdev_1x1_vrt)) - # Creates 10x10 degree tiles of plantation gain rate standard deviation by iterating over the set of pixel area tiles supplied + # Creates 10x10 degree tiles of plantation removals rate standard deviation by iterating over the set of pixel area tiles supplied # at the start of the script that are in latitudes with planted forests.
# For multiprocessor use num_of_processes = 26 @@ -429,16 +462,22 @@ def mp_plantation_preparation(gadm_index_shp, planted_index_shp): if __name__ == '__main__': - parser = argparse.ArgumentParser(description='Create planted forest carbon gain rate tiles') + parser = argparse.ArgumentParser(description='Create planted forest carbon removals rate tiles') parser.add_argument('--gadm-tile-index', '-gi', required=True, help='Shapefile of 1x1 degree tiles of countries that contain planted forests (i.e. countries with planted forests rasterized to 1x1 deg). If no shapefile, write None.') parser.add_argument('--planted-tile-index', '-pi', required=True, help='Shapefile of 1x1 degree tiles of that contain planted forests (i.e. planted forest extent rasterized to 1x1 deg). If no shapefile, write None.') - # # This is the beginning of adding a way to have the model run on a selected area, rather than globally. I didn't finish implementing it, though. - # parser.add_argument('--bounding-box', '-bb', required=False, type=int, nargs='+', - # help='The bounding box of the tiles to be update, supplied in the order min-x, max-x, min-y, max-y. They must be at 10 degree increments.') + parser.add_argument('--tile_id_list', '-l', required=True, + help='List of tile ids to use in the model. Should be of form 00N_110E or 00N_110E,00N_120E or all.') + parser.add_argument('--run-date', '-d', required=False, + help='Date of run. Must be format YYYYMMDD.') + parser.add_argument('--no-upload', '-nu', action='store_true', + help='Disables uploading of outputs to s3') args = parser.parse_args() + tile_id_list = args.tile_id_list + run_date = args.run_date + no_upload = args.no_upload # Creates the directory and shapefile names for the two possible arguments (index shapefiles) gadm_index = os.path.split(args.gadm_tile_index) @@ -450,7 +489,16 @@ def mp_plantation_preparation(gadm_index_shp, planted_index_shp): planted_index_shp = planted_index[1] planted_index_shp = planted_index_shp[:-4] + # Disables upload to s3 if no AWS credentials are found in environment + if not uu.check_aws_creds(): + no_upload = True + # Create the output log - uu.initiate_log() + uu.initiate_log(tile_id_list=tile_id_list, sensit_type=sensit_type, run_date=run_date, no_upload=no_upload) + + # Checks whether the sensitivity analysis and tile_id_list arguments are valid + uu.check_sensit_type(sensit_type) + tile_id_list = uu.tile_id_list_check(tile_id_list) - mp_plantation_preparation(gadm_index_shp=gadm_index_shp, planted_index_shp=planted_index_shp) + mp_plantation_preparation(gadm_index_shp=gadm_index_shp, planted_index_shp=planted_index_shp, + tile_id_list=tile_id_list, run_date=run_date, no_upload=no_upload) diff --git a/data_prep/mp_prep_other_inputs.py b/data_prep/mp_prep_other_inputs.py index 52208625..de38b6f2 100644 --- a/data_prep/mp_prep_other_inputs.py +++ b/data_prep/mp_prep_other_inputs.py @@ -144,7 +144,7 @@ def mp_prep_other_inputs(tile_id_list, run_date, no_upload = None): # processes = 80 # 32 processors = 210 GB peak; 60 = 370 GB peak; 80 = XXX GB peak # else: # processes = int(cn.count/2) - # uu.print_log("Creating young natural forest gain rate tiles with {} processors...".format(processes)) + # uu.print_log("Creating young natural forest removals rate tiles with {} processors...".format(processes)) # pool = multiprocessing.Pool(processes) # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt, no_upload=no_upload), tile_id_list) # pool.close() @@ -196,7 +196,7 @@ def 
mp_prep_other_inputs(tile_id_list, run_date, no_upload = None): # processes = 60 # 32 processors = 60 GB peak; 60 = XXX GB peak # else: # processes = int(cn.count/2) - # uu.print_log("Creating European natural forest gain rate tiles with {} processors...".format(processes)) + # uu.print_log("Creating European natural forest removals rate tiles with {} processors...".format(processes)) # pool = multiprocessing.Pool(processes) # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt, no_upload=no_upload), tile_id_list) # pool.close() @@ -210,7 +210,7 @@ def mp_prep_other_inputs(tile_id_list, run_date, no_upload = None): # processes = 32 # 32 processors = 60 GB peak; 60 = XXX GB peak # else: # processes = int(cn.count/2) - # uu.print_log("Creating standard deviation for European natural forest gain rate tiles with {} processors...".format(processes)) + # uu.print_log("Creating standard deviation for European natural forest removals rate tiles with {} processors...".format(processes)) # pool = multiprocessing.Pool(processes) # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt, no_upload=no_upload), tile_id_list) # pool.close() @@ -342,7 +342,7 @@ def mp_prep_other_inputs(tile_id_list, run_date, no_upload = None): if __name__ == '__main__': parser = argparse.ArgumentParser( - description='Create tiles of the annual AGB and BGB gain rates for mangrove forests') + description='Create tiles of the annual AGB and BGB removals rates for mangrove forests') parser.add_argument('--tile_id_list', '-l', required=True, help='List of tile ids to use in the model. Should be of form 00N_110E or 00N_110E,00N_120E or all.') parser.add_argument('--run-date', '-d', required=False, diff --git a/ec2_launch_template_startup_instructions.TXT b/ec2_launch_template_startup_instructions.TXT index a7427b46..136f5d18 100644 --- a/ec2_launch_template_startup_instructions.TXT +++ b/ec2_launch_template_startup_instructions.TXT @@ -2,16 +2,55 @@ yum install -y rsync git nano htop tmux ####################################### -# Mount the ephemeral storage +# Mount the ephemeral SSD storage. +# If four volumes exist, merge them and mount jointly. +# If only two volumes exist, merge them and mount jointly. +# If only one volume exists, just mount that. +# Regardless of how many volumes there are, the name of the folder they are mounted to is the same. 
####################################### -mkfs.ext4 /dev/nvme1n1 -mkdir -p /mnt/ext -mount -t ext4 /dev/nvme1n1 /mnt/ext +# Name of the second volume (if it exists) +SSD2=nvme2n1 + +# Name of the fourth volume (if it exists) +SSD4=nvme4n1 + +# Checks if the second volume exists +CHECK2=$(lsblk -l | grep $SSD2) + +# Checks if the fourth volume exists +CHECK4=$(lsblk -l | grep $SSD4) + +# Checks for four volumes first, then for two volumes +if [[ $CHECK4 ]] +then + # Fourth SSD volume found + # Follows https://objectivefs.com/howto/how-to-raid-ec2-instance-stores + sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --chunk=64 --raid-devices=4 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 #requires superuser status to use sudo, and doesn't work without sudo + sudo mkfs.ext4 -L MY_RAID /dev/md0 #requires sudo to determine file system size + sudo mkdir -p /mnt/ext # doesn’t need sudo but adding for consistency + sudo mount -t ext4 /dev/md0 /mnt/ext # needs sudo because only root can use --types option + +elif [[ $CHECK2 ]] +then + # Second SSD volume found + # Follows https://objectivefs.com/howto/how-to-raid-ec2-instance-stores + sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --chunk=64 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1 #requires superuser status to use sudo, and doesn't work without sudo + sudo mkfs.ext4 -L MY_RAID /dev/md0 #requires sudo to determine file system size + sudo mkdir -p /mnt/ext # doesn’t need sudo but adding for consistency + sudo mount -t ext4 /dev/md0 /mnt/ext # needs sudo because only root can use --types option + +else + # Only one SSD volume + mkfs.ext4 /dev/nvme1n1 + mkdir -p /mnt/ext + mount -t ext4 /dev/nvme1n1 /mnt/ext +fi + # make temp directory for containers usage # should be used in the Batch job definition (MountPoints) mkdir /mnt/ext/tmp -rsync -avPHSX /tmp/ /mnt/ext/tmp/ +rsync -avPHSX /tmp/ /mnt/ext/tmp/ mkdir -p /var/lib/docker mkdir -p /mnt/ext/docker @@ -50,7 +89,7 @@ cd /home/ec2-user/carbon-budget/ sudo service docker start ###################################### -# Gives the user (ec2-user) various permissions, such as ability to git pull. +# Gives the user (ec2-user) various permissions, such as ability to git pull and enter the docker container. #Based on https://techoverflow.net/2019/05/07/how-to-fix-git-error-cannot-open-git-fetch_head-permission-denied/ ###################################### cd / diff --git a/emissions/calculate_gross_emissions.py b/emissions/calculate_gross_emissions.py index 91992ce6..f4fa95c8 100644 --- a/emissions/calculate_gross_emissions.py +++ b/emissions/calculate_gross_emissions.py @@ -15,6 +15,8 @@ def calc_emissions(tile_id, emitted_pools, sensit_type, folder, no_upload): start = datetime.datetime.now() + uu.check_memory() + # Runs the correct c++ script given the emitted_pools (biomass+soil or soil_only) and model type selected. # soil_only, no_shiftin_ag, and convert_to_grassland have special gross emissions C++ scripts. # The other sensitivity analyses and the standard model all use the same gross emissions C++ script. 
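(Editor's aside: the `uu.check_memory()` calls added in this patch sample system memory at the start of each tile's processing. Below is a rough sketch of what a psutil-based checker could look like; this is an assumption for illustration, not the actual universal_util.py implementation, and the 99% threshold is only an example.)

import psutil

def check_memory(threshold_pct=99.0):
    # Samples system-wide memory usage at a known hot spot in the processing
    mem = psutil.virtual_memory()
    print("Memory usage: {:.1f}% of {:.1f} GB total".format(mem.percent, mem.total / 1024 ** 3))
    # Stopping here is preferable to letting the OS kill worker processes unpredictably
    if mem.percent > threshold_pct:
        raise MemoryError("Memory usage {:.1f}% exceeded the {}% threshold".format(mem.percent, threshold_pct))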
diff --git a/emissions/cpp_util/calc_gross_emissions_generic.cpp b/emissions/cpp_util/calc_gross_emissions_generic.cpp index eae2a292..29fc4599 100644 --- a/emissions/cpp_util/calc_gross_emissions_generic.cpp +++ b/emissions/cpp_util/calc_gross_emissions_generic.cpp @@ -848,7 +848,7 @@ for(x=0; x 0) // Forestry, not peat, burned { diff --git a/emissions/cpp_util/constants.h b/emissions/cpp_util/constants.h index 0b4e2dc1..d74e76a4 100644 --- a/emissions/cpp_util/constants.h +++ b/emissions/cpp_util/constants.h @@ -5,7 +5,7 @@ namespace constants { // Emissions constants // per https://www.learncpp.com/cpp-tutorial/global-constants-and-inline-variables/ - constexpr int model_years {20}; // How many loss years are in the model. Must also be updated in equations.cpp! + constexpr int model_years {21}; // How many loss years are in the model. Must also be updated in equations.cpp! constexpr int CH4_equiv {28}; // The CO2 equivalency (global warming potential) of CH4 @@ -34,7 +34,7 @@ namespace constants constexpr char legal_Amazon_loss[] = "_legal_Amazon_annual_loss_2001_2019.tif"; - constexpr char lossyear[] = "GFW2020_"; + constexpr char lossyear[] = "GFW2021_"; constexpr char burnyear[] = "_burnyear_with_Hansen_loss.tif"; constexpr char fao_ecozones[] = "_fao_ecozones_bor_tem_tro_processed.tif"; constexpr char climate_zones[] = "_climate_zone_processed.tif"; diff --git a/emissions/cpp_util/equations.cpp b/emissions/cpp_util/equations.cpp index 6f93b47d..a1efc159 100644 --- a/emissions/cpp_util/equations.cpp +++ b/emissions/cpp_util/equations.cpp @@ -17,7 +17,7 @@ void def_variables(float *q, int ecozone, int forestmodel_data, int ifl, int cli { int model_years; // How many loss years are in the model - model_years = 20; + model_years = 21; int tropical; // The ecozone code for the tropics tropical = 1; diff --git a/emissions/mp_calculate_gross_emissions.py b/emissions/mp_calculate_gross_emissions.py index 55b5caf1..a03124d2 100644 --- a/emissions/mp_calculate_gross_emissions.py +++ b/emissions/mp_calculate_gross_emissions.py @@ -173,7 +173,6 @@ def mp_calculate_gross_emissions(sensit_type, tile_id_list, emitted_pools, run_d if run_date is not None and no_upload is not None: output_dir_list = uu.replace_output_dir_date(output_dir_list, run_date) - uu.print_log(output_dir_list) uu.print_log(output_pattern_list) diff --git a/pg_hba.conf b/pg_hba.conf new file mode 100644 index 00000000..72ce3f5c --- /dev/null +++ b/pg_hba.conf @@ -0,0 +1,99 @@ +# PostgreSQL Client Authentication Configuration File +# =================================================== +# +# Refer to the "Client Authentication" section in the PostgreSQL +# documentation for a complete description of this file. A short +# synopsis follows. +# +# This file controls: which hosts are allowed to connect, how clients +# are authenticated, which PostgreSQL user names they can use, which +# databases they can access. Records take one of these forms: +# +# local DATABASE USER METHOD [OPTIONS] +# host DATABASE USER ADDRESS METHOD [OPTIONS] +# hostssl DATABASE USER ADDRESS METHOD [OPTIONS] +# hostnossl DATABASE USER ADDRESS METHOD [OPTIONS] +# +# (The uppercase items must be replaced by actual values.) +# +# The first field is the connection type: "local" is a Unix-domain +# socket, "host" is either a plain or SSL-encrypted TCP/IP socket, +# "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a +# plain TCP/IP socket. 
+# +# DATABASE can be "all", "sameuser", "samerole", "replication", a +# database name, or a comma-separated list thereof. The "all" +# keyword does not match "replication". Access to replication +# must be enabled in a separate record (see example below). +# +# USER can be "all", a user name, a group name prefixed with "+", or a +# comma-separated list thereof. In both the DATABASE and USER fields +# you can also write a file name prefixed with "@" to include names +# from a separate file. +# +# ADDRESS specifies the set of hosts the record matches. It can be a +# host name, or it is made up of an IP address and a CIDR mask that is +# an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that +# specifies the number of significant bits in the mask. A host name +# that starts with a dot (.) matches a suffix of the actual host name. +# Alternatively, you can write an IP address and netmask in separate +# columns to specify the set of hosts. Instead of a CIDR-address, you +# can write "samehost" to match any of the server's own IP addresses, +# or "samenet" to match any address in any subnet that the server is +# directly connected to. +# +# METHOD can be "trust", "reject", "md5", "password", "scram-sha-256", +# "gss", "sspi", "ident", "peer", "pam", "ldap", "radius" or "cert". +# Note that "password" sends passwords in clear text; "md5" or +# "scram-sha-256" are preferred since they send encrypted passwords. +# +# OPTIONS are a set of options for the authentication in the format +# NAME=VALUE. The available options depend on the different +# authentication methods -- refer to the "Client Authentication" +# section in the documentation for a list of which options are +# available for which authentication methods. +# +# Database and user names containing spaces, commas, quotes and other +# special characters must be quoted. Quoting one of the keywords +# "all", "sameuser", "samerole" or "replication" makes the name lose +# its special character, and just match a database or username with +# that name. +# +# This file is read on server startup and when the server receives a +# SIGHUP signal. If you edit the file on a running system, you have to +# SIGHUP the server for the changes to take effect, run "pg_ctl reload", +# or execute "SELECT pg_reload_conf()". +# +# Put your actual configuration here +# ---------------------------------- +# +# If you want to allow non-local connections, you need to add more +# "host" records. In that case you will also need to make PostgreSQL +# listen on a non-local interface via the listen_addresses +# configuration parameter, or via the -i or -h command line switches. + + + + +# DO NOT DISABLE! +# If you change this first entry you will need to make sure that the +# database superuser can access the database using some other method. +# Noninteractive access to all databases is required during automatic +# maintenance (custom daily cronjobs, replication, and similar tasks). +# +# Database administrative login by Unix domain socket +local all postgres trust + +# TYPE DATABASE USER ADDRESS METHOD + +# "local" is for Unix domain socket connections only +local all all peer +# IPv4 local connections: +host all all 127.0.0.1/32 md5 +# IPv6 local connections: +host all all ::1/128 md5 +# Allow replication connections from localhost, by a user with the +# replication privilege. 
+local replication all peer +host replication all 127.0.0.1/32 md5 +host replication all ::1/128 md5 diff --git a/readme.md b/readme.md index edf9dc65..31acbc70 100644 --- a/readme.md +++ b/readme.md @@ -3,8 +3,9 @@ ### Purpose and scope This model maps gross annual greenhouse gas emissions from forests, gross carbon removals (sequestration) by forests, and the difference between them -(net flux), all between 2001 and 2020. -Gross emissions includes CO2, NH4, and N20 and all carbon pools, and gross removals includes removals into aboveground and belowground biomass carbon. +(net flux), all between 2001 and 2021. +Gross emissions include CO2, CH4, and N2O and all carbon pools (aboveground biomass, belowground biomass, +dead wood, litter, and soil), and gross removals include removals into aboveground and belowground biomass carbon. Although the model is run for all tree canopy densities (per Hansen et al. 2013), it is most relevant to pixels with canopy density >30% in 2000 or pixels which subsequently had tree cover gain (per Hansen et al. 2013). It covers planted forests in most of the world, mangroves, and non-mangrove natural forests, and excludes palm oil plantations that existed more than 20 years ago. @@ -12,12 +13,12 @@ It essentially spatially applies IPCC national greenhouse gas inventory rules (2 It covers only forests converting to non-forests, non-forests converted to forests and forests remaining forests (no other land use transitions). The model is described and published in Harris et al. (2021) Nature Climate Change "Global maps of twenty-first century forest carbon fluxes" (https://www.nature.com/articles/s41558-020-00976-6). -Although the published model covered 2001-2019, the same methods were used to update the model to include 2020. +Although the published model covered 2001-2019, the same methods were used to update the model to include 2021. ### Inputs Well over twenty inputs are needed to run this model. Most are spatial, but some are tabular. -All spatial data are converted to 10 x 10 degree raster tiles at 0.00025 x 0.00025 degree resolution -(approximately 30 x 30 m at the equator) before inclusion in the model. The tabular data are generally annual biomass removal (i.e. +All spatial data are converted to 10x10 degree raster tiles at 0.00025x0.00025 degree resolution +(approximately 30x30 m at the equator) before inclusion in the model. The tabular data are generally annual biomass removal (i.e. sequestration) factors (e.g., mangroves, planted forests, natural forests), which are then applied to spatial data. Spatial data include annual tree cover loss, biomass densities in 2000, drivers of tree cover loss, ecozones, tree cover extent in 2000, elevation, etc. Different inputs are needed for different @@ -26,16 +27,17 @@ Many inputs can be processed the same way (e.g., many rasters can be processed u The input processing scripts are scattered among almost all the folders, unfortunately, a historical legacy of how I built this out which I haven't fixed. The data prep scripts are generally in the folder for which their outputs are most relevant. -Inputs can either be downloaded from AWS s3 storage or used if found locally in the folder `/usr/local/tiles/` in the Docker container. +Inputs can either be downloaded from AWS s3 storage or used if found locally in the folder `/usr/local/tiles/` in the Docker container +(see below for more on the Docker container). The model looks for files locally before downloading them.
The model can still be run without AWS credentials; inputs will be downloaded from s3 but outputs will not be uploaded to s3. In that case, outputs will only be stored locally. ### Outputs -There are three key outputs produced: gross GHG emissions, gross removals, and net flux, all totaled for 2001-2020. -These are produced at two resolutions: 0.00025 x 0.00025 degrees -(approximately 30 x 30 m at the equator) in 10 x 10 degree rasters (to make outputs a -manageable size), and 0.04 x 0.04 degrees (approximately 4 x 4 km at the equator) as global rasters. +There are three key outputs produced: gross GHG emissions, gross removals, and net flux, all totaled for 2001-2021. +These are produced at two resolutions: 0.00025x0.00025 degrees +(approximately 30x30 m at the equator) in 10x10 degree rasters (to make outputs a +manageable size), and 0.04x0.04 degrees (approximately 4x4 km at the equator) as global rasters for static maps. Model runs also automatically generate a txt log. This log includes nearly everything that is output in the console. This log is useful for documenting model runs and checking for mistakes/errors in retrospect, although it does not capture errors that terminate the model. @@ -47,13 +49,13 @@ When either of these happens, neither raster outputs nor logs are uploaded to s3 of the model that are independent of s3 (that is, inputs are stored locally and not on s3, and the user does not have a connection to s3 storage or s3 credentials). -#### 30-m outputs +#### 30-m output rasters The 30-m outputs are used for zonal statistics analyses (i.e. emissions, removals, or net in polygons of interest) and mapping on the Global Forest Watch web platform or at small scales (where 30-m pixels can be distinguished). Individual emissions can be assigned years based on Hansen loss during further analyses but removals and net flux are cumulative over the entire model run and cannot be assigned specific years. -This 30-m output is in megagrams (Mg) CO2e/ha 2001-2020 (i.e. densities) and includes all tree cover densities ("full extent"): +This 30-m output is in megagrams (Mg) CO2e/ha 2001-2021 (i.e. densities) and includes all tree cover densities ("full extent"): `(((TCD2000>0 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)`. However, the model is designed to be used specifically for forests, so the model creates three derivative 30-m outputs for each key output (gross emissions, gross removals, net flux) as well @@ -61,30 +63,31 @@ outputs for each key output (gross emissions, gross removals, net flux) as well 1) Per pixel values for the full model extent (all tree cover densities): `(((TCD2000>0 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)` -2) Per hectare values for forest pixels only: +2) Per hectare values for forest pixels only (colloquially, TCD>30 or Hansen gain pixels): `(((TCD2000>30 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)` -3) Per pixel values for forest pixels only: +3) Per pixel values for forest pixels only (colloquially, TCD>30 or Hansen gain pixels): `(((TCD2000>30 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)` The per hectare outputs are used for making pixel-level maps (essentially showing emission and removal factors), while the per pixel outputs are used for getting total values within areas because the values of those pixels can be summed within areas of interest.
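(Editor's aside: the "full extent" and "forest extent" criteria quoted above are boolean masks over the input rasters. Below is a minimal numpy sketch of the forest-extent mask; the array names are illustrative, not the model's actual variables.)

import numpy as np

# Toy 2x2 arrays standing in for the real 10x10 degree tile rasters
tcd_2000 = np.array([[0, 45], [80, 10]])               # tree cover density in 2000 (%)
whrc_agb_2000 = np.array([[0.0, 120.0], [60.0, 0.0]])  # WHRC aboveground biomass in 2000 (Mg/ha)
hansen_gain = np.array([[1, 0], [0, 0]])               # tree cover gain flag
mangrove_agb_2000 = np.zeros((2, 2))                   # mangrove aboveground biomass in 2000 (Mg/ha)
pre_2000_plantation = np.array([[False, False], [True, False]])

# (((TCD2000>30 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)
forest_extent = (((tcd_2000 > 30) & (whrc_agb_2000 > 0))
                 | (hansen_gain == 1)
                 | (mangrove_agb_2000 > 0)) & ~pre_2000_plantation
print(forest_extent)  # [[ True  True] [False False]]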
The per pixel maps are `per hectare * pixel area/10000`. (The pixels of the per hectare outputs should not be summed but they can be averaged in areas of interest.) -Statistics from this model are always based on the "forest extent" rasters, not the "full extent" rasters. +Statistics from this model should always be based on the "forest extent" rasters, not the "full extent" rasters. The full model extent outputs should generally not be used but are created by the model in case they are needed. In addition to these three key outputs, there are many intermediate output rasters from the model, -some of which may be useful for QC, analyses by area of interest, or something else. -All of these are at 0.00025 x 0.00025 degree resolution and reported as per hectare values (as opposed to per pixel values), if applicable. +some of which may be useful for QC, analyses by area of interest, or other purposes. +All of these are at 0.00025x0.00025 degree resolution and reported as per hectare values (as opposed to per pixel values), if applicable. Intermediate outputs include the annual aboveground and belowground biomass removal rates for all kinds of forests, the type of removal factor applied to each pixel, the carbon pool densities in 2000, carbon pool densities in the year of tree cover loss, and the number of years in which removals occurred. Almost all model output have metadata associated with them, viewable using the `gdalinfo` command line utility (https://gdal.org/programs/gdalinfo.html). -Metadata includes units, date created, model version, geographic extent, and more. Unfortunately, the metadata are not viewable in ArcMap +Metadata includes units, date created, model version, geographic extent, and more. Unfortunately, the metadata are not viewable +when looking at file properties in ArcMap or in the versions of these files downloadable from the Global Forest Watch Open Data Portal (https://data.globalforestwatch.org/). -#### 4-km outputs +#### 4-km output rasters The 4-km outputs are used for static large-scale maps, like in publications and presentations. The units are Mt CO2e/pixel/year (in order to show absolute values). They are created using the "forest extent" @@ -94,13 +97,14 @@ per pixel 30-m rasters, not the "full extent" 30-m rasters. They should not be u Although gross emissions are traditionally given positive (+) values and gross removals are traditionally given negative (-) values, the 30-m gross removals rasters are positive, while the 4-km gross removals rasters are negative. -Net flux at both scales can be positive or negative depending on the balance of emissions and removals in the area of interest. +Net flux at both scales can be positive or negative depending on the balance of emissions and removals in the area of interest +(negative for net sink, positive for net source). ### Running the model The model runs from the command line inside a Linux Docker container. Once you have Docker configured on your system, have cloned this repository, -and have configured access to AWS (if desired, or have the input files stored locally), +and have configured access to AWS (if desired, or have the input files stored in the correct local folder), you will be able to run the model. There are two ways to run the model: as a series of individual scripts, or from a master script, which runs the individual scripts sequentially. @@ -108,7 +112,7 @@ Which one to use depends on what you are trying to do. 
Generally, the individual more appropriate for development and testing, while the master script is better for running the main part of the model from start to finish in one go. In either case, the code must be cloned from this repository (on the command line in the folder you want to clone to, `git clone https://github.com/wri/carbon-budget`). -Run globally, both options iterate through a list of ~275 10x10 degree tiles. (Different model stages have different numbers of tiles.) +Run globally, both options iterate through a list of ~275 10 x 10 degree tiles. (Different model stages have different numbers of tiles.) Run all tiles in the model extent fully through one model stage before starting on the next stage. (The master script does this automatically.) If a user wants to run the model on just one or a few tiles, that can be done through a command line argument (`--tile-id-list` or `-l`). @@ -120,27 +124,29 @@ command line argument would be `-l 10S_040E,10S_050E,20S_040E`. You can do the following on the command line in the same folder as the repository on your system. This will enter the command line in the Docker container -For runs on a local computer, use `docker-compose` so that the Docker is mapped to my computer's drives. -I do this for development and testing. If running on another computer, you will need to change the local +For runs on a local computer, use `docker-compose` so that the Docker is mapped to your computer's drives. +In my setup, `C:/GIS/Carbon_model/test_tiles/docker_output/` on my computer is mapped to `/usr/local/tiles` in +the Docker container in `docker-compose.yaml`. If running on another computer, you will need to change the local folder being mapped in `docker-compose.yaml` to match your computer's directory structure. +I do this for development and testing. If you want the model to be able to download from and upload to s3, you will also need to provide -your own AWS secret key and access key as environment variables (`-e`) in the `docker-compose run` command. +your own AWS secret key and access key as environment variables (`-e`) in the `docker-compose run` command: `docker-compose build` `docker-compose run --rm -e AWS_SECRET_ACCESS_KEY=... -e AWS_ACCESS_KEY_ID=... carbon-budget` -If you don't have AWS credentials, you can still run the model in the docker container but downloads and uploads will +If you don't have AWS credentials, you can still run the model in the docker container but uploads will not occur. In this situation, you need all the basic input files for all tiles in the docker folder `/usr/local/tiles/` -on your computer. +on your computer: `docker-compose build` `docker-compose run --rm carbon-budget` For runs on an AWS r5d spot machine (for full model runs), use `docker build`. -You need to supply AWS credentials for the model to work because otherwise you won't be able to get input tiles onto -the spot machine or output tiles out of the spot machine. +You need to supply AWS credentials for the model to work because otherwise you won't be able to get +output tiles off of the spot machine. `docker build . -t gfw/carbon-budget` @@ -168,8 +174,8 @@ Users can track memory usage in realtime using the `htop` command line utility i The flux model is comprised of many separate scripts (or stages), each of which can be run separately and has its own inputs and output(s). Combined, these comprise the flux model. 
There are several data preparation scripts, several for the removals (sequestration/gain) model, a few to generate carbon pools, one for calculating -gross emissions, one for calculating net flux, and one for aggregating key results into coarser -resolution rasters for mapping. +gross emissions, one for calculating net flux, one for aggregating key results into coarser +resolution rasters for mapping, and one for creating per-pixel and forest-extent outputs (supplementary outputs). Each script really has two parts: its `mp_` (multiprocessing) part and the part that actually does the calculations on each 10x10 degree tile. The `mp_` scripts (e.g., `mp_model_extent.py`) are the ones that are run. They download input files, @@ -180,23 +186,19 @@ the outputs of other scripts. Looking at the files that must be downloaded for that script to run will show what files must already be created and therefore what scripts must have already been run. Alternatively, you can look at the top of `run_full_model.py` to see the order in which model stages are run. The date component of the output directory on s3 generally must be changed in `constants_and_names.py` -for each output file unless a date argument is provided on the command line. +for each output file. -Each script can be run either using multiple processors or one processor. The former is for full model runs, -while the latter is for model development. The user can switch between these two versions by commenting out -the appropriate code chunks. #### Master script The master script runs through all of the non-preparatory scripts in the model: some removal factor creation, gross removals, carbon -pool generation, gross emissions, net flux, and aggregation. It includes all the arguments needed to run +pool generation, gross emissions, net flux, aggregation, and supplementary output creation. +It includes all the arguments needed to run every script. Thus, the table below also explains the potential arguments for the individual model stages. The user can control what model components are run to some extent and set the date part of the output directories. The emissions C++ code has to be compiled before running the master script (see below). Preparatory scripts like creating soil carbon tiles or mangrove tiles are not included in the master script because they are run very infrequently. -`python run_full_model.py -t std -s all -r -d 20219999 -l all -ce loss -p biomass_soil -tcd 30 -ma -us -si -ln "This will run the entire standard model, including creating mangrove and US removal factor tiles, on all tiles and output everything in s3 folders with the date 20219999. It will save intermediate files."` - | Argument | Short argument | Required/Optional | Relevant stage | Description | | -------- | ----- | ----------- | ------- | ------ | | `model-type` | `-t` | Required | All | Standard model (`std`) or a sensitivity analysis. Refer to `constants_and_names.py` for valid list of sensitivity analyses. | @@ -205,6 +207,7 @@ they are run very infrequently. | `run-date` | `-d` | Required | All | Date of run. Must be format YYYYMMDD. This sets the output folder in s3. | | `tile-id-list` | `-l` | Required | All | List of tile ids to use in the model. Should be of form 00N_110E or 00N_110E,00N_120E or all | | `no-upload` | `-nu` | Optional | All | No files are uploaded to s3 during or after model run (including logs and model outputs). Use for testing to save time.
+
+These are some sample commands for running the flux model in various configurations. You wouldn't necessarily want to use all of these;
+they simply illustrate different configurations for the command line arguments.
+
+Run 00N_000E in standard model; save intermediate outputs; upload outputs to s3; run all model stages;
+starting from the beginning; get carbon pools at time of loss; emissions from biomass and soil:
+
+`python run_full_model.py -si -t std -s all -r -d 20229999 -l 00N_000E -ce loss -p biomass_soil -tcd 30 -ln "00N_000E test"`
+
+Run 00N_110E in standard model; save intermediate outputs; don't upload outputs to s3;
+start at forest_age_category_IPCC step; run all stages after that; get carbon pools at time of loss; emissions from biomass and soil:
+
+`python run_full_model.py -si -nu -t std -s forest_age_category_IPCC -r -d 20229999 -l 00N_000E -ce loss -p biomass_soil -tcd 30 -ln "00N_000E test"`
+
+Run 00N_000E and 00N_110E in standard model; don't save intermediate outputs; do upload outputs to s3;
+run model_extent step; don't run subsequent steps (no `-r` flag); run mangrove step beforehand:
+
+`python run_full_model.py -t std -s model_extent -d 20229999 -l 00N_000E,00N_110E -ma -ln "Two tile test"`
+
+Run 00N_000E, 00N_110E, and 30N_090W in standard model; save intermediate outputs; do upload outputs to s3;
+start at gross_emissions step; run all stages after that; emissions from soil only:
+
+`python run_full_model.py -si -t std -s gross_emissions -r -d 20229999 -l 00N_000E,00N_110E,30N_090W -p soil_only -tcd 30 -ln "Three tile test"`
+
+FULL STANDARD MODEL RUN: Run all tiles in standard model; save intermediate outputs; do upload outputs to s3;
+run all model stages; starting from the beginning; get carbon pools at time of loss; emissions from biomass and soil:
+
+`python run_full_model.py -si -t std -s all -r -l all -ce loss -p biomass_soil -tcd 30 -ln "Run all tiles"`
+
+Run three tiles in biomass_swap sensitivity analysis; don't upload intermediates (forces saving of intermediate outputs);
+run model_extent stage; don't continue after that stage (no run-through); get carbon pools at time of loss; emissions from biomass and soil;
+compare aggregated outputs to specified file (although not used in this specific launch because only the first step runs):
+
+`python run_full_model.py -nu -t biomass_swap -s model_extent -r false -d 20229999 -l 00N_000E,00N_110E,40N_090W -ce loss -p biomass_soil -tcd 30 -sagg s3://gfw2-data/climate/carbon_model/0_04deg_output_aggregation/biomass_soil/standard/20200914/net_flux_Mt_CO2e_biomass_soil_per_year_tcd30_0_4deg_modelv1_2_0_std_20200914.tif -ln "Multi-tile test"`
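The automatic disabling of uploads mentioned for `no-upload` above presumably reduces to a credentials check along these lines (a sketch assuming the environment-variable credentials shown in the Docker section; the model's actual logic may differ):

```python
import os

def upload_allowed(no_upload_flag):
    # Upload to s3 only if --no-upload was not passed and AWS credentials are
    # present as environment variables (as in the docker-compose run command above).
    has_creds = ('AWS_ACCESS_KEY_ID' in os.environ) and ('AWS_SECRET_ACCESS_KEY' in os.environ)
    return (not no_upload_flag) and has_creds
```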
+
##### Running the emissions model

The gross emissions script is the only part of the model that uses C++. Thus, it must be manually compiled before running.
@@ -252,34 +289,38 @@ model with a new year of tree cover loss data. In the order in which the changes

1) Update the model version variable `version` in `constants_and_names.py`.

-2) Change the tree cover loss tile source to the new tree cover loss tiles in `constants_and_names.py`. If the tree cover
-loss tile pattern is different from the previous year, that will need to be changed in several places in various scripts, unfortunately.
+2) Change the tree cover loss tile source to the new tree cover loss tiles in `constants_and_names.py`.
+Change the tree cover loss tile pattern in `constants_and_names.py`.

3) Change the number of loss years variable `loss_years` in `constants_and_names.py`.

-4) Make sure that changes in forest age category produced by `mp_forest_age_category_IPCC.py`
- and the number of gain years produced by `mp_gain_year_count_all_forest_types.py` still make sense.
+4) In `constants.h` (emissions/cpp_util/), change the number of model years (`int model_years`) and the loss tile pattern (`char lossyear[]`).

-5) Obtain and pre-process the updated drivers of tree cover loss model in `mp_prep_other_inputs.py`.
+5) In `equations.cpp` (emissions/cpp_util/), change the number of model years (`int model_years`).

-6) Create a new year of burned area data using `mp_burn_year.py` (multiple changes to script needed, and potentially
- some reworking if the burned area ftp site has changed its structure or download protocol).
+6) Make sure that changes in forest age category produced by `mp_forest_age_category_IPCC.py`
+ and the number of gain years produced by `mp_gain_year_count_all_forest_types.py` still make sense.

-7) In `constants.h`, change the number of model years.
+7) Obtain and pre-process the updated drivers of tree cover loss model in `mp_prep_other_inputs.py`
+ (comment out everything except the drivers lines). Note that the drivers map probably needs to be reprojected to WGS84
+ and resampled (0.005x0.005 deg) in ArcMap or similar before processing into 0.00025x0.00025 deg 10x10 tiles using this script.

-8) In `equations.cpp`, change the number of model years.
+8) Create a new year of burned area data using `mp_burn_year.py` (multiple changes to script needed, and potentially
+ some reworking if the burned area ftp site has changed its structure or download protocol).
+ Further instructions are at the top of `burn_date/mp_burn_year.py`.
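To illustrate the `constants_and_names.py` side of steps 1 and 3, the edits amount to bumping two values (the values shown here are assumptions for a 2021 tree cover loss update, not copied from the repository):

```python
# constants_and_names.py (sketch; illustrative values only)
version = '1.2.2'   # step 1: model version for the update
loss_years = 21     # step 3: number of tree cover loss years (2001-2021)
```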
Strictly speaking, if only the drivers, burn year, and tree cover loss are being updated, the model only needs to be run
from forest_age_category_IPCC onwards (loss affects IPCC age category but model extent isn't affected by any of these inputs).
However, for completeness, I suggest running all stages of the model from model_extent onwards for an update
so that model outputs from all stages have the same version in their metadata and the same dates of output as the model stages
-that are actually being changed.
+that are actually being changed. A full model run (all tiles, all stages) takes about 18 hours on an r5d.24xlarge
+EC2 instance with 3.7 TB of storage and 96 processors.

### Other modifications to the model

It is recommended that any changes to the model be tested in a local Docker instance before running on an EC2 instance.
-I like to output files to test folders on s3 with dates 20219999 because that is clearly not a real run date.
+I like to output files to test folders on s3 with dates 20229999 because that is clearly not a real run date.

A standard development route is:
1) Make changes to a single model script and run using the single processor option on a single tile (easiest for debugging) in local Docker.

diff --git a/gain/.gitignore b/removals/.gitignore similarity index 100% rename from gain/.gitignore rename to removals/.gitignore diff --git a/gain/US_removal_rates.py b/removals/US_removal_rates.py similarity index 90% rename from gain/US_removal_rates.py rename to removals/US_removal_rates.py index 656cab93..116f2bb5 100644 --- a/gain/US_removal_rates.py +++ b/removals/US_removal_rates.py @@ -79,7 +79,7 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g ### For removal factors - # Creates empty windows (arrays) that will store gain rates. There are separate arrays for + # Creates empty windows (arrays) that will store removals rates. There are separate arrays for # no Hansen gain pixels and for Hansen gain pixels. These are later combined. # Pixels without and with Hansen gain are treated separately because gain pixels automatically get the youngest # removal rate, regardless of their age category.
@@ -98,8 +98,8 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g group_region_age_combined_window = np.ma.masked_where(US_region_window == 0, group_region_age_combined_window).filled(0) group_region_age_combined_window = np.ma.masked_where(gain_window != 0, group_region_age_combined_window).filled(0) - # Applies the dictionary of group-region-age gain rates to the group-region-age numpy array to - # get annual gain rates (Mg AGC+BGC/ha/yr) for each non-Hansen gain pixel + # Applies the dictionary of group-region-age removals rates to the group-region-age numpy array to + # get annual removals rates (Mg AGC+BGC/ha/yr) for each non-Hansen gain pixel for key, value in gain_table_group_region_age_dict.items(): agc_bgc_without_gain_pixel_window[group_region_age_combined_window == key] = value @@ -118,8 +118,8 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g group_region_combined_window = np.ma.masked_where(US_region_window == 0, group_region_combined_window).filled(0) group_region_combined_window = np.ma.masked_where(gain_window == 0, group_region_combined_window).filled(0) - # Applies the dictionary of group-region gain rates to the group-region numpy array to - # get annual gain rates (Mg AGC+BGC/ha/yr) for each pixel that doesn't have Hansen gain + # Applies the dictionary of group-region removals rates to the group-region numpy array to + # get annual removals rates (Mg AGC+BGC/ha/yr) for each pixel that doesn't have Hansen gain for key, value in gain_table_group_region_dict.items(): agc_bgc_with_gain_pixel_window[group_region_combined_window == key] = value @@ -135,18 +135,18 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g # Creates empty windows (arrays) that will store stdev. There are separate arrays for # no Hansen gain pixels and for Hansen gain pixels. These are later combined. - # Pixels without and with Hansen gain are treated separately because gain pixels automatically get the youngest + # Pixels without and with Hansen gain are treated separately because removals pixels automatically get the youngest # removal rate stdev, regardless of their age category. 
stdev_agc_bgc_without_gain_pixel_window = np.zeros((window.height, window.width), dtype='float32') stdev_agc_bgc_with_gain_pixel_window = np.zeros((window.height, window.width), dtype='float32') - # Applies the dictionary of group-region-age gain rates to the group-region-age numpy array to - # get annual gain rates (Mg AGC+BGC/ha/yr) for each non-Hansen gain pixel + # Applies the dictionary of group-region-age removals rates to the group-region-age numpy array to + # get annual removals rates (Mg AGC+BGC/ha/yr) for each non-Hansen gain pixel for key, value in stdev_table_group_region_age_dict.items(): stdev_agc_bgc_without_gain_pixel_window[group_region_age_combined_window == key] = value - # Applies the dictionary of group-region gain rates to the group-region numpy array to - # get annual gain rates (Mg AGC+BGC/ha/yr) for each pixel that doesn't have Hansen gain + # Applies the dictionary of group-region removals rates to the group-region numpy array to + # get annual removals rates (Mg AGC+BGC/ha/yr) for each pixel that doesn't have Hansen gain for key, value in stdev_table_group_region_dict.items(): stdev_agc_bgc_with_gain_pixel_window[group_region_combined_window == key] = value diff --git a/gain/__init__.py b/removals/__init__.py similarity index 100% rename from gain/__init__.py rename to removals/__init__.py diff --git a/gain/annual_gain_rate_AGC_BGC_all_forest_types.py b/removals/annual_gain_rate_AGC_BGC_all_forest_types.py similarity index 99% rename from gain/annual_gain_rate_AGC_BGC_all_forest_types.py rename to removals/annual_gain_rate_AGC_BGC_all_forest_types.py index 11ceb04e..88702be4 100644 --- a/gain/annual_gain_rate_AGC_BGC_all_forest_types.py +++ b/removals/annual_gain_rate_AGC_BGC_all_forest_types.py @@ -167,6 +167,8 @@ def annual_gain_rate_AGC_BGC_all_forest_types(tile_id, output_pattern_list, sens uu.print_log(" Creating removal model forest type tile, AGC removal factor tile, BGC removal factor tile, and AGC removal factor standard deviation tile for {}".format(tile_id)) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tile for idx, window in windows: diff --git a/gain/annual_gain_rate_IPCC_defaults.py b/removals/annual_gain_rate_IPCC_defaults.py similarity index 83% rename from gain/annual_gain_rate_IPCC_defaults.py rename to removals/annual_gain_rate_IPCC_defaults.py index eceaced6..58676f67 100644 --- a/gain/annual_gain_rate_IPCC_defaults.py +++ b/removals/annual_gain_rate_IPCC_defaults.py @@ -19,7 +19,7 @@ def annual_gain_rate(tile_id, sensit_type, gain_table_dict, stdev_table_dict, ou # The key in the dictionary is the forest age category decision tree endpoints. 
age_dict = {0: 0, 1: 10000, 2: 20000, 3: 30000} - uu.print_log("Processing:", tile_id) + uu.print_log("Creating IPCC default biomass removals rates and standard deviation for {}".format(tile_id)) # Start time start = datetime.datetime.now() @@ -28,25 +28,23 @@ def annual_gain_rate(tile_id, sensit_type, gain_table_dict, stdev_table_dict, ou age_cat = uu.sensit_tile_rename(sensit_type, tile_id, cn.pattern_age_cat_IPCC) cont_eco = uu.sensit_tile_rename(sensit_type, tile_id, cn.pattern_cont_eco_processed) - # Names of the output natural forest gain rate tiles (above and belowground) + # Names of the output natural forest removals rate tiles (above and belowground) AGB_IPCC_default_gain_rate = '{0}_{1}.tif'.format(tile_id, output_pattern_list[0]) BGB_IPCC_default_gain_rate = '{0}_{1}.tif'.format(tile_id, output_pattern_list[1]) AGB_IPCC_default_gain_stdev = '{0}_{1}.tif'.format(tile_id, output_pattern_list[2]) - uu.print_log(" Creating IPCC default biomass gain rates and standard deviation for {}".format(tile_id)) - # Opens the input tiles if they exist. Skips tile if either input doesn't exist. try: age_cat_src = rasterio.open(age_cat) - uu.print_log(" Age category tile found for {}".format(tile_id)) + uu.print_log(" Age category tile found for {}".format(tile_id)) except: - return uu.print_log(" No age category tile found for {}. Skipping tile.".format(tile_id)) + return uu.print_log(" No age category tile found for {}. Skipping tile.".format(tile_id)) try: cont_eco_src = rasterio.open(cont_eco) - uu.print_log(" Continent-ecozone tile found for {}".format(tile_id)) + uu.print_log(" Continent-ecozone tile found for {}".format(tile_id)) except: - return uu.print_log(" No continent-ecozone tile found for {}. Skipping tile.".format(tile_id)) + return uu.print_log(" No continent-ecozone tile found for {}. Skipping tile.".format(tile_id)) # Grabs metadata about the continent ecozone tile, like its location/projection/cellsize kwargs = cont_eco_src.meta @@ -55,7 +53,7 @@ def annual_gain_rate(tile_id, sensit_type, gain_table_dict, stdev_table_dict, ou windows = cont_eco_src.block_windows(1) # Updates kwargs for the output dataset.
- # Need to update data type to float 32 so that it can handle fractional gain rates + # Need to update data type to float 32 so that it can handle fractional removals rates kwargs.update( driver='GTiff', count=1, @@ -64,7 +62,7 @@ def annual_gain_rate(tile_id, sensit_type, gain_table_dict, stdev_table_dict, ou dtype='float32' ) - # The output files, aboveground and belowground biomass gain rates + # The output files, aboveground and belowground biomass removals rates dst_above = rasterio.open(AGB_IPCC_default_gain_rate, 'w', **kwargs) # Adds metadata tags to the output raster uu.add_rasterio_tags(dst_above, sensit_type) @@ -95,6 +93,8 @@ def annual_gain_rate(tile_id, sensit_type, gain_table_dict, stdev_table_dict, ou dst_stdev_above.update_tags( extent='Full model extent, even though these standard deviations will not be used over the full model extent') + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tiles for idx, window in windows: @@ -116,11 +116,11 @@ def annual_gain_rate(tile_id, sensit_type, gain_table_dict, stdev_table_dict, ou cont_eco_age = cont_eco_window + age_recode ## Aboveground removal factors - # Converts the continent-ecozone array to float so that the values can be replaced with fractional gain rates + # Converts the continent-ecozone array to float so that the values can be replaced with fractional removals rates gain_rate_AGB = cont_eco_age.astype('float32') - # Applies the dictionary of continent-ecozone-age gain rates to the continent-ecozone-age array to - # get annual gain rates (metric tons aboveground biomass/yr) for each pixel + # Applies the dictionary of continent-ecozone-age removals rates to the continent-ecozone-age array to + # get annual removals rates (metric tons aboveground biomass/yr) for each pixel for key, value in gain_table_dict.items(): gain_rate_AGB[gain_rate_AGB == key] = value @@ -138,8 +138,8 @@ def annual_gain_rate(tile_id, sensit_type, gain_table_dict, stdev_table_dict, ou # Converts the continent-ecozone array to float so that the values can be replaced with fractional standard deviations gain_stdev_AGB = cont_eco_age.astype('float32') - # Applies the dictionary of continent-ecozone-age gain rate standard deviations to the continent-ecozone-age array to - # get annual gain rate standard deviations (metric tons aboveground biomass/yr) for each pixel + # Applies the dictionary of continent-ecozone-age removals rate standard deviations to the continent-ecozone-age array to + # get annual removals rate standard deviations (metric tons aboveground biomass/yr) for each pixel for key, value in stdev_table_dict.items(): gain_stdev_AGB[gain_stdev_AGB == key] = value diff --git a/gain/annual_gain_rate_mangrove.py b/removals/annual_gain_rate_mangrove.py similarity index 83% rename from gain/annual_gain_rate_mangrove.py rename to removals/annual_gain_rate_mangrove.py index ada1ab0a..306ba6e4 100644 --- a/gain/annual_gain_rate_mangrove.py +++ b/removals/annual_gain_rate_mangrove.py @@ -1,6 +1,6 @@ -### Creates tiles of annual aboveground and belowground biomass gain rates for mangroves using IPCC Wetlands Supplement Table 4.4 rates. +### Creates tiles of annual aboveground and belowground biomass removals rates for mangroves using IPCC Wetlands Supplement Table 4.4 rates. ### Its inputs are the continent-ecozone tiles, mangrove biomass tiles (for locations of mangroves), and the IPCC -### gain rate table. +### removals rate table. 
import datetime import numpy as np @@ -32,12 +32,12 @@ def annual_gain_rate(tile_id, sensit_type, output_pattern_list, gain_above_dict, mangrove_biomass = uu.sensit_tile_rename(sensit_type, tile_id, cn.pattern_mangrove_biomass_2000) cont_eco = uu.sensit_tile_rename(sensit_type, tile_id, cn.pattern_cont_eco_processed) - # Names of the output aboveground and belowground mangrove gain rate tiles + # Names of the output aboveground and belowground mangrove removals rate tiles AGB_gain_rate = '{0}_{1}.tif'.format(tile_id, output_pattern_list[0]) BGB_gain_rate = '{0}_{1}.tif'.format(tile_id, output_pattern_list[1]) AGB_gain_stdev = '{0}_{1}.tif'.format(tile_id, output_pattern_list[2]) - uu.print_log(" Reading input files and creating aboveground and belowground biomass gain rates for {}".format(tile_id)) + uu.print_log(" Reading input files and creating aboveground and belowground biomass removals rates for {}".format(tile_id)) cont_eco_src = rasterio.open(cont_eco) mangrove_AGB_src = rasterio.open(mangrove_biomass) @@ -49,7 +49,7 @@ def annual_gain_rate(tile_id, sensit_type, output_pattern_list, gain_above_dict, windows = cont_eco_src.block_windows(1) # Updates kwargs for the output dataset. - # Need to update data type to float 32 so that it can handle fractional gain rates + # Need to update data type to float 32 so that it can handle fractional removals rates kwargs.update( driver='GTiff', count=1, @@ -95,9 +95,9 @@ def annual_gain_rate(tile_id, sensit_type, output_pattern_list, gain_above_dict, cont_eco = cont_eco_src.read(1, window=window) mangrove_AGB = mangrove_AGB_src.read(1, window=window) - # Converts the continent-ecozone array to float so that the values can be replaced with fractional gain rates. - # Creates two copies: one for aboveground gain and one for belowground gain. - # Creating only one copy of the cont_eco raster made it so that belowground gain rates weren't being + # Converts the continent-ecozone array to float so that the values can be replaced with fractional removals rates. + # Creates two copies: one for aboveground removals and one for belowground removals. + # Creating only one copy of the cont_eco raster made it so that belowground removals rates weren't being # written correctly for some reason. 
cont_eco_above = cont_eco.astype('float32') cont_eco_below = cont_eco.astype('float32') @@ -108,19 +108,19 @@ def annual_gain_rate(tile_id, sensit_type, output_pattern_list, gain_above_dict, mangrove_AGB[mangrove_AGB > 0] = 1 - # Applies the dictionary of continent-ecozone aboveground gain rates to the continent-ecozone array to - # get annual aboveground gain rates (metric tons aboveground biomass/yr) for each pixel + # Applies the dictionary of continent-ecozone aboveground removals rates to the continent-ecozone array to + # get annual aboveground removals rates (metric tons aboveground biomass/yr) for each pixel for key, value in gain_above_dict.items(): cont_eco_above[cont_eco_above == key] = value - # Masks out pixels without mangroves, leaving gain rates in only pixels with mangroves + # Masks out pixels without mangroves, leaving removals rates in only pixels with mangroves dst_above_data = cont_eco_above * mangrove_AGB # Writes the output window to the output dst_above.write_band(1, dst_above_data, window=window) - # Same as above but for belowground gain rates + # Same as above but for belowground removals rates for key, value in gain_below_dict.items(): cont_eco_below[cont_eco_below == key] = value @@ -129,12 +129,12 @@ def annual_gain_rate(tile_id, sensit_type, output_pattern_list, gain_above_dict, dst_below.write_band(1, dst_below_data, window=window) - # Applies the dictionary of continent-ecozone aboveground gain rate standard deviations to the continent-ecozone array to - # get annual aboveground gain rate standard deviations (metric tons aboveground biomass/yr) for each pixel + # Applies the dictionary of continent-ecozone aboveground removals rate standard deviations to the continent-ecozone array to + # get annual aboveground removals rate standard deviations (metric tons aboveground biomass/yr) for each pixel for key, value in stdev_dict.items(): cont_eco_stdev[cont_eco_stdev == key] = value - # Masks out pixels without mangroves, leaving gain rates in only pixels with mangroves + # Masks out pixels without mangroves, leaving removals rates in only pixels with mangroves dst_stdev = cont_eco_stdev * mangrove_AGB # Writes the output window to the output diff --git a/gain/continent_ecozone_tiles.py b/removals/continent_ecozone_tiles.py similarity index 97% rename from gain/continent_ecozone_tiles.py rename to removals/continent_ecozone_tiles.py index c896ccd0..882498e8 100644 --- a/gain/continent_ecozone_tiles.py +++ b/removals/continent_ecozone_tiles.py @@ -6,14 +6,14 @@ ### continents assigned to it. The creation of the continent-ecozone shapefile was done in ArcMap. ### In the resulting ecozone-continent shapefile, the final field has continent and ecozone concatenated. ### That ecozone-continent field can be parsed to get the ecozone and continent for every pixel, -### which are necessary for assigning gain rates to pixels. +### which are necessary for assigning removals rates to pixels. ### This script also breaks the input tiles into windows that are 1024 pixels on each side and assigns all pixels that ### don't have a continent-ecozone code to the most common code in that window. ### This is done to expand the extent of the continent-ecozone tiles to include pixels that don't have a continent-ecozone ### code because they are just outside the original shapefile. 
### It is necessary to expand the continent-ecozone codes into those nearby areas because otherwise some forest age category -### pixels are outside the continent-ecozone pixels and can't have gain rates assigned to them. -### This maneuver provides the necessary continent-ecozone information to assign gain rates. +### pixels are outside the continent-ecozone pixels and can't have removals rates assigned to them. +### This maneuver provides the necessary continent-ecozone information to assign removals rates. import rasterio import numpy as np @@ -58,7 +58,7 @@ def create_continent_ecozone_tiles(tile_id): windows = cont_eco_raw_src.block_windows(1) # Updates kwargs for the output dataset. - # Need to update data type to float 32 so that it can handle fractional gain rates + # Need to update data type to float 32 so that it can handle fractional removals rates kwargs.update( driver='GTiff', count=1, diff --git a/gain/forest_age_category_IPCC.py b/removals/forest_age_category_IPCC.py similarity index 99% rename from gain/forest_age_category_IPCC.py rename to removals/forest_age_category_IPCC.py index f632cfc4..df4a40e0 100644 --- a/gain/forest_age_category_IPCC.py +++ b/removals/forest_age_category_IPCC.py @@ -51,8 +51,6 @@ def forest_age_category(tile_id, gain_table_dict, pattern, sensit_type, no_uploa loss = '{0}_{1}.tif'.format(cn.pattern_loss, tile_id) uu.print_log("Using Hansen loss tile {0} for {1} model run".format(tile_id, sensit_type)) - uu.print_log(" Assigning age categories") - # Opens biomass tile with rasterio.open(model_extent) as model_extent_src: @@ -116,6 +114,8 @@ def forest_age_category(tile_id, gain_table_dict, pattern, sensit_type, no_uploa uu.print_log(" Assigning IPCC age categories for", tile_id) + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tile for idx, window in windows: diff --git a/gain/gain_year_count_all_forest_types.py b/removals/gain_year_count_all_forest_types.py similarity index 87% rename from gain/gain_year_count_all_forest_types.py rename to removals/gain_year_count_all_forest_types.py index 1f11f64e..847cbf4d 100644 --- a/gain/gain_year_count_all_forest_types.py +++ b/removals/gain_year_count_all_forest_types.py @@ -27,14 +27,16 @@ def create_gain_year_count_loss_only(tile_id, sensit_type, no_upload): uu.print_log("Gain year count for loss only pixels:", tile_id) + # Names of the loss, gain and tree cover density tiles + loss, gain, model_extent = tile_names(tile_id, sensit_type) + # start time start = datetime.datetime.now() - # Names of the loss, gain and tree cover density tiles - loss, gain, model_extent = tile_names(tile_id, sensit_type) + uu.check_memory() if os.path.exists(loss): - uu.print_log("Loss tile found for {}. Using it in loss only pixel gain year count.".format(tile_id)) + uu.print_log(" Loss tile found for {}. 
Using it in loss only pixel gain year count.".format(tile_id)) loss_calc = '--calc=(A>0)*(B==0)*(C>0)*(A-1)' loss_outfilename = '{}_growth_years_loss_only.tif'.format(tile_id) loss_outfilearg = '--outfile={}'.format(loss_outfilename) @@ -53,14 +55,17 @@ def create_gain_year_count_gain_only_standard(tile_id, sensit_type, no_upload): uu.print_log("Gain year count for gain only pixels using standard function:", tile_id) + # Names of the loss, gain and tree cover density tiles + loss, gain, model_extent = tile_names(tile_id, sensit_type) + # start time start = datetime.datetime.now() - # Names of the loss, gain and tree cover density tiles - loss, gain, model_extent = tile_names(tile_id, sensit_type) + uu.check_memory() + # Need to check if loss tile exists because the calc string depends on the presence/absence of the loss tile if os.path.exists(loss): - uu.print_log("Loss tile found for {}. Using it in gain only pixel gain year count.".format(tile_id)) + uu.print_log(" Loss tile found for {}. Using it in gain only pixel gain year count.".format(tile_id)) gain_calc = '--calc=(A==0)*(B==1)*(C>0)*({}/2)'.format(cn.gain_years) gain_outfilename = '{}_growth_years_gain_only.tif'.format(tile_id) gain_outfilearg = '--outfile={}'.format(gain_outfilename) @@ -68,7 +73,7 @@ def create_gain_year_count_gain_only_standard(tile_id, sensit_type, no_upload): '--NoDataValue=0', '--overwrite', '--co', 'COMPRESS=DEFLATE', '--type', 'Byte', '--quiet'] uu.log_subprocess_output_full(cmd) else: - uu.print_log("No loss tile found for {}. Not using it for gain only pixel gain year count.".format(tile_id)) + uu.print_log(" No loss tile found for {}. Not using it for gain only pixel gain year count.".format(tile_id)) gain_calc = '--calc=(A==1)*(B>0)*({}/2)'.format(cn.gain_years) gain_outfilename = '{}_growth_years_gain_only.tif'.format(tile_id) gain_outfilearg = '--outfile={}'.format(gain_outfilename) @@ -91,8 +96,10 @@ def create_gain_year_count_gain_only_maxgain(tile_id, sensit_type, no_upload): # start time start = datetime.datetime.now() + uu.check_memory() + if os.path.exists(loss): - uu.print_log("Loss tile found for {}. Using it in gain only pixel gain year count.".format(tile_id)) + uu.print_log(" Loss tile found for {}. Using it in gain only pixel gain year count.".format(tile_id)) gain_calc = '--calc=(A==0)*(B==1)*(C>0)*({})'.format(cn.loss_years) gain_outfilename = '{}_growth_years_gain_only.tif'.format(tile_id) gain_outfilearg = '--outfile={}'.format(gain_outfilename) @@ -100,7 +107,7 @@ def create_gain_year_count_gain_only_maxgain(tile_id, sensit_type, no_upload): '--NoDataValue=0', '--overwrite', '--co', 'COMPRESS=DEFLATE', '--type', 'Byte', '--quiet'] uu.log_subprocess_output_full(cmd) else: - uu.print_log("No loss tile found for {}. Not using loss for gain only pixel gain year count.".format(tile_id)) + uu.print_log(" No loss tile found for {}. Not using loss for gain only pixel gain year count.".format(tile_id)) gain_calc = '--calc=(A==1)*(B>0)*({})'.format(cn.loss_years) gain_outfilename = '{}_growth_years_gain_only.tif'.format(tile_id) gain_outfilearg = '--outfile={}'.format(gain_outfilename) @@ -124,8 +131,10 @@ def create_gain_year_count_no_change_standard(tile_id, sensit_type, no_upload): # start time start = datetime.datetime.now() + uu.check_memory() + if os.path.exists(loss): - uu.print_log("Loss tile found for {}. Using it in no change pixel gain year count.".format(tile_id)) + uu.print_log(" Loss tile found for {}. 
Using it in no change pixel gain year count.".format(tile_id)) no_change_calc = '--calc=(A==0)*(B==0)*(C>0)*{}'.format(cn.loss_years) no_change_outfilename = '{}_growth_years_no_change.tif'.format(tile_id) no_change_outfilearg = '--outfile={}'.format(no_change_outfilename) @@ -133,7 +142,7 @@ def create_gain_year_count_no_change_standard(tile_id, sensit_type, no_upload): no_change_outfilearg, '--NoDataValue=0', '--overwrite', '--co', 'COMPRESS=DEFLATE', '--type', 'Byte', '--quiet'] uu.log_subprocess_output_full(cmd) else: - uu.print_log("No loss tile found for {}. Not using it for no change pixel gain year count.".format(tile_id)) + uu.print_log(" No loss tile found for {}. Not using it for no change pixel gain year count.".format(tile_id)) no_change_calc = '--calc=(A==0)*(B>0)*{}'.format(cn.loss_years) no_change_outfilename = '{}_growth_years_no_change.tif'.format(tile_id) no_change_outfilearg = '--outfile={}'.format(no_change_outfilename) @@ -157,6 +166,8 @@ def create_gain_year_count_no_change_legal_Amazon_loss(tile_id, sensit_type, no_ # start time start = datetime.datetime.now() + uu.check_memory() + # For unclear reasons, gdal_calc doesn't register the 0 (NoData) pixels in the loss tile, so I have to convert it # to a vrt so that the 0 pixels are recognized. # This was the case with PRODES loss in model v.1.1.2. @@ -187,8 +198,10 @@ def create_gain_year_count_loss_and_gain_standard(tile_id, sensit_type, no_uploa # start time start = datetime.datetime.now() + uu.check_memory() + if os.path.exists(loss): - uu.print_log("Loss tile found for {}. Using it in loss and gain pixel gain year count.".format(tile_id)) + uu.print_log(" Loss tile found for {}. Using it in loss and gain pixel gain year count.".format(tile_id)) loss_and_gain_calc = '--calc=((A>0)*(B==1)*(C>0)*((A-1)+floor(({}+1-A)/2)))'.format(cn.loss_years) loss_and_gain_outfilename = '{}_growth_years_loss_and_gain.tif'.format(tile_id) loss_and_gain_outfilearg = '--outfile={}'.format(loss_and_gain_outfilename) @@ -196,7 +209,7 @@ def create_gain_year_count_loss_and_gain_standard(tile_id, sensit_type, no_uploa loss_and_gain_outfilearg, '--NoDataValue=0', '--overwrite', '--co', 'COMPRESS=DEFLATE', '--type', 'Byte', '--quiet'] uu.log_subprocess_output_full(cmd) else: - uu.print_log("No loss tile found for {}. Skipping loss and gain pixel gain year count.".format(tile_id)) + uu.print_log(" No loss tile found for {}. Skipping loss and gain pixel gain year count.".format(tile_id)) # Prints information about the tile that was just processed @@ -214,8 +227,10 @@ def create_gain_year_count_loss_and_gain_maxgain(tile_id, sensit_type, no_upload # start time start = datetime.datetime.now() + uu.check_memory() + if os.path.exists(loss): - uu.print_log("Loss tile found for {}. Using it in loss and gain pixel gain year count".format(tile_id)) + uu.print_log(" Loss tile found for {}. Using it in loss and gain pixel gain year count".format(tile_id)) loss_and_gain_calc = '--calc=((A>0)*(B==1)*(C>0)*({}-1))'.format(cn.loss_years) loss_and_gain_outfilename = '{}_growth_years_loss_and_gain.tif'.format(tile_id) loss_and_gain_outfilearg = '--outfile={}'.format(loss_and_gain_outfilename) @@ -223,7 +238,7 @@ def create_gain_year_count_loss_and_gain_maxgain(tile_id, sensit_type, no_upload loss_and_gain_outfilearg, '--NoDataValue=0', '--overwrite', '--co', 'COMPRESS=DEFLATE', '--type', 'Byte', '--quiet'] uu.log_subprocess_output_full(cmd) else: - uu.print_log("No loss tile found for {}. 
Skipping loss and gain pixel gain year count.".format(tile_id)) + uu.print_log(" No loss tile found for {}. Skipping loss and gain pixel gain year count.".format(tile_id)) # Prints information about the tile that was just processed uu.end_of_fx_summary(start, tile_id, 'growth_years_loss_and_gain', no_upload) @@ -232,7 +247,7 @@ def create_gain_year_count_loss_and_gain_maxgain(tile_id, sensit_type, no_upload # Merges the four gain year count tiles above to create a single gain year count tile def create_gain_year_count_merge(tile_id, pattern, sensit_type, no_upload): - uu.print_log("Merging loss, gain, no change, and loss/gain pixels into single raster for {}".format(tile_id)) + uu.print_log("Merging loss, gain, no change, and loss/gain pixels into single gain year count raster for {}".format(tile_id)) # start time start = datetime.datetime.now() @@ -263,26 +278,26 @@ def create_gain_year_count_merge(tile_id, pattern, sensit_type, no_upload): nodata=0 ) - uu.print_log(" No change tile exists for {} by default".format(tile_id)) + uu.print_log(" No change tile exists for {} by default".format(tile_id)) # Opens the other gain year count tiles. They may not exist for all other tiles. try: loss_only_gain_years_src = rasterio.open(loss_only_gain_years) - uu.print_log(" Loss only tile found for {}".format(tile_id)) + uu.print_log(" Loss only tile found for {}".format(tile_id)) except: - uu.print_log(" No loss only tile found for {}".format(tile_id)) + uu.print_log(" No loss only tile found for {}".format(tile_id)) try: gain_only_gain_years_src = rasterio.open(gain_only_gain_years) - uu.print_log(" Gain only tile found for {}".format(tile_id)) + uu.print_log(" Gain only tile found for {}".format(tile_id)) except: - uu.print_log(" No gain only tile found for {}".format(tile_id)) + uu.print_log(" No gain only tile found for {}".format(tile_id)) try: loss_and_gain_gain_years_src = rasterio.open(loss_and_gain_gain_years) - uu.print_log(" Loss and gain tile found for {}".format(tile_id)) + uu.print_log(" Loss and gain tile found for {}".format(tile_id)) except: - uu.print_log(" No loss and gain tile found for {}".format(tile_id)) + uu.print_log(" No loss and gain tile found for {}".format(tile_id)) # Opens the output tile, giving it the arguments of the input tiles gain_year_count_merged_dst = rasterio.open(gain_year_count_merged, 'w', **kwargs) @@ -300,6 +315,8 @@ def create_gain_year_count_merge(tile_id, pattern, sensit_type, no_upload): gain_year_count_merged_dst.update_tags( extent='Full model extent') + uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tile for idx, window in windows: diff --git a/gain/gross_removals_all_forest_types.py b/removals/gross_removals_all_forest_types.py similarity index 98% rename from gain/gross_removals_all_forest_types.py rename to removals/gross_removals_all_forest_types.py index 4a3fecb9..2c0b3eff 100644 --- a/gain/gross_removals_all_forest_types.py +++ b/removals/gross_removals_all_forest_types.py @@ -6,7 +6,7 @@ import constants_and_names as cn import universal_util as uu -# Calculates cumulative aboveground carbon dioxide gain in mangroves +# Calculates cumulative aboveground carbon dioxide removals in mangroves def gross_removals_all_forest_types(tile_id, output_pattern_list, sensit_type, no_upload): uu.print_log("Calculating cumulative CO2 removals:", tile_id) @@ -86,6 +86,8 @@ def gross_removals_all_forest_types(tile_id, output_pattern_list, sensit_type, n cumulative_gain_AGCO2_BGCO2_dst.update_tags( extent='Full model extent') + 
uu.check_memory() + # Iterates across the windows (1 pixel strips) of the input tiles for idx, window in windows: diff --git a/gain/mp_US_removal_rates.py b/removals/mp_US_removal_rates.py similarity index 93% rename from gain/mp_US_removal_rates.py rename to removals/mp_US_removal_rates.py index 5c37cffd..4c445da0 100644 --- a/gain/mp_US_removal_rates.py +++ b/removals/mp_US_removal_rates.py @@ -93,7 +93,8 @@ def mp_US_removal_rates(sensit_type, tile_id_list, run_date): # Table with US-specific removal rates - cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.table_US_removal_rate), cn.docker_base_dir, '--no-sign-request'] + # cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.table_US_removal_rate), cn.docker_base_dir, '--no-sign-request'] + cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.table_US_removal_rate), cn.docker_base_dir] uu.log_subprocess_output_full(cmd) @@ -103,7 +104,7 @@ def mp_US_removal_rates(sensit_type, tile_id_list, run_date): gain_table = pd.read_excel("{}".format(cn.table_US_removal_rate), sheet_name="US_rates_AGC+BGC") - # Converts gain table from wide to long, so each region-group-age category has its own row + # Converts removals table from wide to long, so each region-group-age category has its own row gain_table_group_region_by_age = pd.melt(gain_table, id_vars=['FIA_region_code', 'forest_group_code'], value_vars=['growth_young', 'growth_middle', 'growth_old']) @@ -113,14 +114,14 @@ def mp_US_removal_rates(sensit_type, tile_id_list, run_date): age_dict = {'growth_young': 1000, 'growth_middle': 2000, 'growth_old': 3000} # Creates a unique value for each forest group-region-age category in the table. - # Although these rates are applied to all standard gain model pixels at first, they are not ultimately used for + # Although these rates are applied to all standard removals model pixels at first, they are not ultimately used for # pixels that have Hansen gain (see below). gain_table_group_region_age = gain_table_group_region_by_age.replace({"variable": age_dict}) gain_table_group_region_age['age_cat'] = gain_table_group_region_age['variable']*10 gain_table_group_region_age['group_region_age_combined'] = gain_table_group_region_age['age_cat'] + \ gain_table_group_region_age['forest_group_code']*100 + \ gain_table_group_region_age['FIA_region_code'] - # Converts the forest group-region-age codes and corresponding gain rates to a dictionary, + # Converts the forest group-region-age codes and corresponding removals rates to a dictionary, # where the key is the unique group-region-age code and the value is the AGB removal rate. gain_table_group_region_age_dict = pd.Series(gain_table_group_region_age.value.values, index=gain_table_group_region_age.group_region_age_combined).to_dict() uu.print_log(gain_table_group_region_age_dict) @@ -132,7 +133,7 @@ def mp_US_removal_rates(sensit_type, tile_id_list, run_date): gain_table_group_region = gain_table_group_region_age.drop(gain_table_group_region_age[gain_table_group_region_age.age_cat != 10000].index) gain_table_group_region['group_region_combined'] = gain_table_group_region['forest_group_code']*100 + \ gain_table_group_region['FIA_region_code'] - # Converts the forest group-region codes and corresponding gain rates to a dictionary, + # Converts the forest group-region codes and corresponding removals rates to a dictionary, # where the key is the unique group-region code (youngest age category) and the value is the AGB removal rate. 
gain_table_group_region_dict = pd.Series(gain_table_group_region.value.values, index=gain_table_group_region.group_region_combined).to_dict() uu.print_log(gain_table_group_region_dict) @@ -140,7 +141,7 @@ def mp_US_removal_rates(sensit_type, tile_id_list, run_date): ### To make the removal factor standard deviation dictionaries - # Converts gain table from wide to long, so each region-group-age category has its own row + # Converts removals table from wide to long, so each region-group-age category has its own row stdev_table_group_region_by_age = pd.melt(gain_table, id_vars=['FIA_region_code', 'forest_group_code'], value_vars=['SD_young', 'SD_middle', 'SD_old']) @@ -150,14 +151,14 @@ def mp_US_removal_rates(sensit_type, tile_id_list, run_date): stdev_dict = {'SD_young': 1000, 'SD_middle': 2000, 'SD_old': 3000} # Creates a unique value for each forest group-region-age category in the table. - # Although these rates are applied to all standard gain model pixels at first, they are not ultimately used for + # Although these rates are applied to all standard removals model pixels at first, they are not ultimately used for # pixels that have Hansen gain (see below). stdev_table_group_region_age = stdev_table_group_region_by_age.replace({"variable": stdev_dict}) stdev_table_group_region_age['age_cat'] = stdev_table_group_region_age['variable'] * 10 stdev_table_group_region_age['group_region_age_combined'] = stdev_table_group_region_age['age_cat'] + \ stdev_table_group_region_age['forest_group_code'] * 100 + \ stdev_table_group_region_age['FIA_region_code'] - # Converts the forest group-region-age codes and corresponding gain rates to a dictionary, + # Converts the forest group-region-age codes and corresponding removals rates to a dictionary, # where the key is the unique group-region-age code and the value is the AGB removal rate. stdev_table_group_region_age_dict = pd.Series(stdev_table_group_region_age.value.values, index=stdev_table_group_region_age.group_region_age_combined).to_dict() @@ -170,7 +171,7 @@ def mp_US_removal_rates(sensit_type, tile_id_list, run_date): stdev_table_group_region_age[stdev_table_group_region_age.age_cat != 10000].index) stdev_table_group_region['group_region_combined'] = stdev_table_group_region['forest_group_code'] * 100 + \ stdev_table_group_region['FIA_region_code'] - # Converts the forest group-region codes and corresponding gain rates to a dictionary, + # Converts the forest group-region codes and corresponding removals rates to a dictionary, # where the key is the unique group-region code (youngest age category) and the value is the AGB removal rate. 
stdev_table_group_region_dict = pd.Series(stdev_table_group_region.value.values, index=stdev_table_group_region.group_region_combined).to_dict() diff --git a/gain/mp_annual_gain_rate_AGC_BGC_all_forest_types.py b/removals/mp_annual_gain_rate_AGC_BGC_all_forest_types.py similarity index 99% rename from gain/mp_annual_gain_rate_AGC_BGC_all_forest_types.py rename to removals/mp_annual_gain_rate_AGC_BGC_all_forest_types.py index a99d3dd8..55378085 100644 --- a/gain/mp_annual_gain_rate_AGC_BGC_all_forest_types.py +++ b/removals/mp_annual_gain_rate_AGC_BGC_all_forest_types.py @@ -21,7 +21,7 @@ sys.path.append('../') import constants_and_names as cn import universal_util as uu -sys.path.append(os.path.join(cn.docker_app,'gain')) +sys.path.append(os.path.join(cn.docker_app,'removals')) import annual_gain_rate_AGC_BGC_all_forest_types def mp_annual_gain_rate_AGC_BGC_all_forest_types(sensit_type, tile_id_list, run_date = None, no_upload = None): diff --git a/gain/mp_annual_gain_rate_IPCC_defaults.py b/removals/mp_annual_gain_rate_IPCC_defaults.py similarity index 93% rename from gain/mp_annual_gain_rate_IPCC_defaults.py rename to removals/mp_annual_gain_rate_IPCC_defaults.py index 04290da8..dc33b6f6 100644 --- a/gain/mp_annual_gain_rate_IPCC_defaults.py +++ b/removals/mp_annual_gain_rate_IPCC_defaults.py @@ -5,7 +5,7 @@ (in the units of IPCC Table 4.9 (currently tonnes biomass/ha/yr)) to the entire model extent. The standard deviation tiles are used in the uncertainty analysis. It requires IPCC Table 4.9, formatted for easy ingestion by pandas. -Essentially, this does some processing of the IPCC gain rate table, then uses it as a dictionary that it applies +Essentially, this does some processing of the IPCC removals rate table, then uses it as a dictionary that it applies to every pixel in every tile. Each continent-ecozone-forest age category combination gets its own code, which matches the codes in the processed IPCC table. 
@@ -26,7 +26,7 @@ sys.path.append('../') import constants_and_names as cn import universal_util as uu -sys.path.append(os.path.join(cn.docker_app,'gain')) +sys.path.append(os.path.join(cn.docker_app,'removals')) import annual_gain_rate_IPCC_defaults os.chdir(cn.docker_base_dir) @@ -77,8 +77,9 @@ def mp_annual_gain_rate_IPCC_defaults(sensit_type, tile_id_list, run_date = None pattern = values[0] uu.s3_flexible_download(dir, pattern, cn.docker_base_dir, sensit_type, tile_id_list) - # Table with IPCC Table 4.9 default gain rates - cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + # Table with IPCC Table 4.9 default removals rates + # cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir] uu.log_subprocess_output_full(cmd) @@ -86,21 +87,21 @@ def mp_annual_gain_rate_IPCC_defaults(sensit_type, tile_id_list, run_date = None # Special removal rate table for no_primary_gain sensitivity analysis: primary forests and IFLs have removal rate of 0 if sensit_type == 'no_primary_gain': - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals rates gain_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name = "natrl fores gain, no_prim_gain") uu.print_log("Using no_primary_gain IPCC default rates for tile creation") # All other analyses use the standard removal rates else: - # Imports the table with the ecozone-continent codes and the biomass gain rates + # Imports the table with the ecozone-continent codes and the biomass removals rates gain_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name = "natrl fores gain, for std model") # Removes rows with duplicate codes (N. and S. 
America for the same ecozone) gain_table_simplified = gain_table.drop_duplicates(subset='gainEcoCon', keep='first') - # Converts gain table from wide to long, so each continent-ecozone-age category has its own row + # Converts removals table from wide to long, so each continent-ecozone-age category has its own row gain_table_cont_eco_age = pd.melt(gain_table_simplified, id_vars = ['gainEcoCon'], value_vars = ['growth_primary', 'growth_secondary_greater_20', 'growth_secondary_less_20']) gain_table_cont_eco_age = gain_table_cont_eco_age.dropna() @@ -122,7 +123,7 @@ def mp_annual_gain_rate_IPCC_defaults(sensit_type, tile_id_list, run_date = None # Merges the table of just continent-ecozone codes and the table of continent-ecozone-age codes gain_table_all_combos = pd.concat([gain_table_con_eco_only, gain_table_cont_eco_age]) - # Converts the continent-ecozone-age codes and corresponding gain rates to a dictionary + # Converts the continent-ecozone-age codes and corresponding removals rates to a dictionary gain_table_dict = pd.Series(gain_table_all_combos.value.values,index=gain_table_all_combos.cont_eco_age).to_dict() # Adds a dictionary entry for where the ecozone-continent-age code is 0 (not in a continent) @@ -141,21 +142,21 @@ def mp_annual_gain_rate_IPCC_defaults(sensit_type, tile_id_list, run_date = None # Special removal rate table for no_primary_gain sensitivity analysis: primary forests and IFLs have removal rate of 0 if sensit_type == 'no_primary_gain': - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals rates stdev_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name="natrl fores stdv, no_prim_gain") uu.print_log("Using no_primary_gain IPCC default standard deviations for tile creation") # All other analyses use the standard removal rates else: - # Imports the table with the ecozone-continent codes and the biomass gain rate standard deviations + # Imports the table with the ecozone-continent codes and the biomass removals rate standard deviations stdev_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name="natrl fores stdv, for std model") # Removes rows with duplicate codes (N. and S. 
America for the same ecozone) stdev_table_simplified = stdev_table.drop_duplicates(subset='gainEcoCon', keep='first') - # Converts gain table from wide to long, so each continent-ecozone-age category has its own row + # Converts removals table from wide to long, so each continent-ecozone-age category has its own row stdev_table_cont_eco_age = pd.melt(stdev_table_simplified, id_vars = ['gainEcoCon'], value_vars = ['stdev_primary', 'stdev_secondary_greater_20', 'stdev_secondary_less_20']) stdev_table_cont_eco_age = stdev_table_cont_eco_age.dropna() @@ -178,7 +179,7 @@ def mp_annual_gain_rate_IPCC_defaults(sensit_type, tile_id_list, run_date = None # Merges the table of just continent-ecozone codes and the table of continent-ecozone-age codes stdev_table_all_combos = pd.concat([stdev_table_con_eco_only, stdev_table_cont_eco_age]) - # Converts the continent-ecozone-age codes and corresponding gain rates to a dictionary + # Converts the continent-ecozone-age codes and corresponding removals rates to a dictionary stdev_table_dict = pd.Series(stdev_table_all_combos.value.values,index=stdev_table_all_combos.cont_eco_age).to_dict() # Adds a dictionary entry for where the ecozone-continent-age code is 0 (not in a continent) @@ -202,7 +203,7 @@ def mp_annual_gain_rate_IPCC_defaults(sensit_type, tile_id_list, run_date = None processes = 30 # 30 processors = 725 GB peak else: processes = 2 - uu.print_log('Annual gain rate natural forest max processors=', processes) + uu.print_log('Annual removals rate natural forest max processors=', processes) pool = multiprocessing.Pool(processes) pool.map(partial(annual_gain_rate_IPCC_defaults.annual_gain_rate, sensit_type=sensit_type, gain_table_dict=gain_table_dict, stdev_table_dict=stdev_table_dict, diff --git a/gain/mp_annual_gain_rate_mangrove.py b/removals/mp_annual_gain_rate_mangrove.py similarity index 89% rename from gain/mp_annual_gain_rate_mangrove.py rename to removals/mp_annual_gain_rate_mangrove.py index 936e345c..035cbbab 100644 --- a/gain/mp_annual_gain_rate_mangrove.py +++ b/removals/mp_annual_gain_rate_mangrove.py @@ -1,7 +1,7 @@ ''' Creates tiles of annual aboveground and belowground biomass removal rates for mangroves using IPCC Wetlands Supplement Table 4.4 rates. Its inputs are the continent-ecozone tiles, mangrove biomass tiles (for locations of mangroves), and the IPCC mangrove -gain rate table. +removals rate table. Also creates tiles of standard deviation in mangrove aboveground biomass removal rates based on the 95% CI in IPCC Wetlands Supplement Table 4.4. 
''' @@ -16,7 +16,7 @@ sys.path.append('../') import constants_and_names as cn import universal_util as uu -sys.path.append(os.path.join(cn.docker_app,'gain')) +sys.path.append(os.path.join(cn.docker_app,'removals')) import annual_gain_rate_mangrove def mp_annual_gain_rate_mangrove(sensit_type, tile_id_list, run_date = None): @@ -28,7 +28,7 @@ def mp_annual_gain_rate_mangrove(sensit_type, tile_id_list, run_date = None): # If a full model run is specified, the correct set of tiles for the particular script is listed if tile_id_list == 'all': # Lists the tiles that have both mangrove biomass and FAO ecozone information because both of these are necessary for - # calculating mangrove gain + # calculating mangrove removals mangrove_biomass_tile_list = uu.tile_list_s3(cn.mangrove_biomass_2000_dir) ecozone_tile_list = uu.tile_list_s3(cn.cont_eco_dir) tile_id_list = list(set(mangrove_biomass_tile_list).intersection(ecozone_tile_list)) @@ -60,14 +60,15 @@ def mp_annual_gain_rate_mangrove(sensit_type, tile_id_list, run_date = None): uu.s3_flexible_download(dir, pattern, cn.docker_base_dir, sensit_type, tile_id_list) - # Table with IPCC Wetland Supplement Table 4.4 default mangrove gain rates - cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + # Table with IPCC Wetland Supplement Table 4.4 default mangrove removals rates + # cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir] uu.log_subprocess_output_full(cmd) ### To make the removal factor dictionaries - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals rates gain_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name = "mangrove gain, for model") @@ -75,18 +76,18 @@ def mp_annual_gain_rate_mangrove(sensit_type, tile_id_list, run_date = None): gain_table_simplified = gain_table.drop_duplicates(subset='gainEcoCon', keep='first') # Creates belowground:aboveground biomass ratio dictionary for the three mangrove types, where the keys correspond to - # the "mangType" field in the gain rate spreadsheet. + # the "mangType" field in the removals rate spreadsheet. # If the assignment of mangTypes to ecozones changes, that column in the spreadsheet may need to change and the # keys in this dictionary would need to change accordingly. 
type_ratio_dict = {'1': cn.below_to_above_trop_dry_mang, '2' :cn.below_to_above_trop_wet_mang, '3': cn.below_to_above_subtrop_mang} type_ratio_dict_final = {int(k):float(v) for k,v in list(type_ratio_dict.items())} - # Applies the belowground:aboveground biomass ratios for the three mangrove types to the annual aboveground gain rates to get - # a column of belowground annual gain rates by mangrove type + # Applies the belowground:aboveground biomass ratios for the three mangrove types to the annual aboveground removals rates to get + # a column of belowground annual removals rates by mangrove type gain_table_simplified['BGB_AGB_ratio'] = gain_table_simplified['mangType'].map(type_ratio_dict_final) gain_table_simplified['BGB_annual_rate'] = gain_table_simplified.AGB_gain_tons_ha_yr * gain_table_simplified.BGB_AGB_ratio - # Converts the continent-ecozone codes and corresponding gain rates to dictionaries for aboveground and belowground gain rates + # Converts the continent-ecozone codes and corresponding removals rates to dictionaries for aboveground and belowground removals rates gain_above_dict = pd.Series(gain_table_simplified.AGB_gain_tons_ha_yr.values,index=gain_table_simplified.gainEcoCon).to_dict() gain_below_dict = pd.Series(gain_table_simplified.BGB_annual_rate.values,index=gain_table_simplified.gainEcoCon).to_dict() @@ -101,14 +102,14 @@ def mp_annual_gain_rate_mangrove(sensit_type, tile_id_list, run_date = None): ### To make the removal factor standard deviation dictionary - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals rates stdev_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name = "mangrove stdev, for model") # Removes rows with duplicate codes (N. and S. America for the same ecozone) stdev_table_simplified = stdev_table.drop_duplicates(subset='gainEcoCon', keep='first') - # Converts the continent-ecozone codes and corresponding gain rate standard deviations to dictionaries for aboveground and belowground gain rate stdevs + # Converts the continent-ecozone codes and corresponding removals rate standard deviations to dictionaries for aboveground and belowground removals rate stdevs stdev_dict = pd.Series(stdev_table_simplified.AGB_gain_stdev_tons_ha_yr.values,index=stdev_table_simplified.gainEcoCon).to_dict() # Adds a dictionary entry for where the ecozone-continent code is 0 (not in a continent) @@ -125,7 +126,7 @@ def mp_annual_gain_rate_mangrove(sensit_type, tile_id_list, run_date = None): processes = 20 #26 processors = >740 GB peak; 18 = 550 GB peak; 20 = 610 GB peak; 23 = 700 GB peak; 24 > 750 GB peak else: processes = 4 - uu.print_log('Mangrove annual gain rate max processors=', processes) + uu.print_log('Mangrove annual removals rate max processors=', processes) pool = multiprocessing.Pool(processes) pool.map(partial(annual_gain_rate_mangrove.annual_gain_rate, sensit_type=sensit_type, output_pattern_list=output_pattern_list, gain_above_dict=gain_above_dict, gain_below_dict=gain_below_dict, stdev_dict=stdev_dict), tile_id_list) diff --git a/gain/mp_continent_ecozone_tiles.py b/removals/mp_continent_ecozone_tiles.py similarity index 97% rename from gain/mp_continent_ecozone_tiles.py rename to removals/mp_continent_ecozone_tiles.py index b59e1987..b513deb9 100644 --- a/gain/mp_continent_ecozone_tiles.py +++ b/removals/mp_continent_ecozone_tiles.py @@ -6,14 +6,14 @@ ### continents assigned to it. 
The creation of the continent-ecozone shapefile was done in ArcMap. ### In the resulting ecozone-continent shapefile, the final field has continent and ecozone concatenated. ### That ecozone-continent field can be parsed to get the ecozone and continent for every pixel, -### which are necessary for assigning gain rates to pixels. +### which are necessary for assigning removals rates to pixels. ### This script also breaks the input tiles into windows that are 1024 pixels on each side and assigns all pixels that ### don't have a continent-ecozone code to the most common code in that window. ### This is done to expand the extent of the continent-ecozone tiles to include pixels that don't have a continent-ecozone ### code because they are just outside the original shapefile. ### It is necessary to expand the continent-ecozone codes into those nearby areas because otherwise some forest age category -### pixels are outside the continent-ecozone pixels and can't have removals rates assigned to them. -### This maneuver provides the necessary continent-ecozone information to assign gain rates. +### pixels are outside the continent-ecozone pixels and can't have removals rates assigned to them. +### This maneuver provides the necessary continent-ecozone information to assign removals rates. import multiprocessing diff --git a/gain/mp_forest_age_category_IPCC.py b/removals/mp_forest_age_category_IPCC.py similarity index 94% rename from gain/mp_forest_age_category_IPCC.py rename to removals/mp_forest_age_category_IPCC.py index 3040e022..c90203d9 100644 --- a/gain/mp_forest_age_category_IPCC.py +++ b/removals/mp_forest_age_category_IPCC.py @@ -3,7 +3,7 @@ The age categories are: <= 20 year old secondary forest (1), >20 year old secondary forest (2), and primary forest (3). The decision tree is implemented as a series of numpy array statements rather than as nested if statements or gdal_calc operations. The output tiles have 3 possible values, each value representing an end of the decision tree. -The forest age category tiles are inputs for assigning gain rates to +The forest age category tiles are inputs for assigning removals rates to non-mangrove, non-planted, non-European, non-US, older secondary and primary forest pixels. The extent of this layer is greater than the extent of the rates which are based on this, though.
This assigns forest age category to all pixels within the model but they are ultimately only used for @@ -23,7 +23,7 @@ sys.path.append('../') import constants_and_names as cn import universal_util as uu -sys.path.append(os.path.join(cn.docker_app,'gain')) +sys.path.append(os.path.join(cn.docker_app,'removals')) import forest_age_category_IPCC def mp_forest_age_category_IPCC(sensit_type, tile_id_list, run_date = None, no_upload = None): @@ -87,19 +87,20 @@ def mp_forest_age_category_IPCC(sensit_type, tile_id_list, run_date = None, no_u output_dir_list = uu.replace_output_dir_date(output_dir_list, run_date) - # Table with IPCC Table 4.9 default gain rates - cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + # Table with IPCC Table 4.9 default removals rates + # cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir, '--no-sign-request'] + cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir] uu.log_subprocess_output_full(cmd) - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals rates gain_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name = "natrl fores gain, for std model") # Removes rows with duplicate codes (N. and S. America for the same ecozone) gain_table_simplified = gain_table.drop_duplicates(subset='gainEcoCon', keep='first') - # Converts the continent-ecozone codes and young forest gain rates to a dictionary + # Converts the continent-ecozone codes and young forest removals rates to a dictionary gain_table_dict = pd.Series(gain_table_simplified.growth_secondary_less_20.values,index=gain_table_simplified.gainEcoCon).to_dict() # Adds a dictionary entry for where the ecozone-continent code is 0 (not in a continent) diff --git a/gain/mp_gain_year_count_all_forest_types.py b/removals/mp_gain_year_count_all_forest_types.py similarity index 99% rename from gain/mp_gain_year_count_all_forest_types.py rename to removals/mp_gain_year_count_all_forest_types.py index 93301ea7..6638be58 100644 --- a/gain/mp_gain_year_count_all_forest_types.py +++ b/removals/mp_gain_year_count_all_forest_types.py @@ -21,7 +21,7 @@ import constants_and_names as cn import universal_util as uu -def mp_gain_year_count_all_forest_types(sensit_type, tile_id_list, run_date = None, no_upload = True): +def mp_gain_year_count_all_forest_types(sensit_type, tile_id_list, run_date = None, no_upload = None): os.chdir(cn.docker_base_dir) @@ -173,6 +173,8 @@ def mp_gain_year_count_all_forest_types(sensit_type, tile_id_list, run_date = No # If no_upload flag is not activated (by choice or by lack of AWS credentials), output is uploaded if not no_upload: + print("in upload area") + # Intermediate output tiles for checking outputs uu.upload_final_set(output_dir_list[0], "growth_years_loss_only") uu.upload_final_set(output_dir_list[0], "growth_years_gain_only") diff --git a/gain/mp_gross_removals_all_forest_types.py b/removals/mp_gross_removals_all_forest_types.py similarity index 97% rename from gain/mp_gross_removals_all_forest_types.py rename to removals/mp_gross_removals_all_forest_types.py index 1e8c04e3..ceb89545 100644 --- a/gain/mp_gross_removals_all_forest_types.py +++ b/removals/mp_gross_removals_all_forest_types.py @@ -1,7 +1,7 @@ ''' -This script calculates the cumulative above and belowground carbon dioxide gain 
(removals) for all forest types +This script calculates the cumulative above and belowground carbon dioxide removals for all forest types for the duration of the model. -It multiplies the annual aboveground and belowground carbon removal factors by the number of years of gain and the C to CO2 conversion. +It multiplies the annual aboveground and belowground carbon removal factors by the number of years of removals and the C to CO2 conversion. It then sums the aboveground and belowground gross removals to get gross removals for all forest types in both emitted_pools. That is the final gross removals for the entire model. Note that gross removals from this script are reported as positive values. @@ -16,7 +16,7 @@ sys.path.append('../') import constants_and_names as cn import universal_util as uu -sys.path.append(os.path.join(cn.docker_app,'gain')) +sys.path.append(os.path.join(cn.docker_app,'removals')) import gross_removals_all_forest_types def mp_gross_removals_all_forest_types(sensit_type, tile_id_list, run_date = None, no_upload = True): diff --git a/requirements.txt b/requirements.txt index 4705d387..d1baa6e6 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,4 @@ +cftime~=1.4.1 awscli~=1.16.50 boto3~=1.9.40 botocore~=1.12.40 @@ -10,3 +11,4 @@ scipy~=1.1.0 simpledbf~=0.2.6 virtualenv~=16.0.0 xlrd~=1.1.0 +psutil diff --git a/run_full_model.py b/run_full_model.py index 90e32a35..f10b4099 100644 --- a/run_full_model.py +++ b/run_full_model.py @@ -1,13 +1,20 @@ ''' +Clone repository: git clone https://github.com/wri/carbon-budget -spotutil new r4.16xlarge dgibbs_wri --disk_size 1024 -c++ /home/dgibbs/carbon-budget/emissions/cpp_util/calc_gross_emissions_generic.cpp -o /home/dgibbs/carbon-budget/emissions/cpp_util/calc_gross_emissions_generic.exe -lgdal -python run_full_model.py -si -nu -t std -s forest_age_category_natrl_forest -r false -d 20219999 -l 00N_000E -ce loss -p biomass_soil -tcd 30 -ln "This is a log note" -python run_full_model.py -si -nu -t std -s all -r -d 20200327 -l all -ce loss -p biomass_soil -tcd 30 -ma true -pl true -ln "This is a log note" -python run_full_model.py -si -nu -t std -s all -r -d 20200822 -l all -ce loss,2000 -p biomass_soil -tcd 30 -ma true -ln "First attempt at running full standard model on all tiles for model v1.2.0. Hopefully this will be the sole, definitive run of standard model v1.2.0."
+Create spot machine using spotutil: +spotutil new r5d.24xlarge dgibbs_wri -python run_full_model.py -si -nu -t biomass_swap -s all -r -d 20200919 -l all -ce loss -p biomass_soil -tcd 30 -sagg s3://gfw2-data/climate/carbon_model/0_4deg_output_aggregation/biomass_soil/standard/20200914/net_flux_Mt_CO2e_biomass_soil_per_year_tcd30_0_4deg_modelv1_2_0_std_20200914.tif -ln "Running sensitivity analysis for model v1.2.0" +Compile C++ emissions module (for standard model and sensitivity analyses that use the standard emissions model) +c++ /usr/local/app/emissions/cpp_util/calc_gross_emissions_generic.cpp -o /usr/local/app/emissions/cpp_util/calc_gross_emissions_generic.exe -lgdal + +Run 00N_000E in standard model; save intermediate outputs; do upload outputs to s3; run all model stages; +starting from the beginning; get carbon pools at time of loss; emissions from biomass and soil +python run_full_model.py -si -t std -s all -r -d 20229999 -l 00N_000E -ce loss -p biomass_soil -tcd 30 -ln "00N_000E test" + +FULL STANDARD MODEL RUN: Run all tiles in standard model; save intermediate outputs; do upload outputs to s3; +run all model stages; starting from the beginning; get carbon pools at time of loss; emissions from biomass and soil +python run_full_model.py -si -t std -s all -r -l all -ce loss -p biomass_soil -tcd 30 -ln "Running all tiles" ''' @@ -19,13 +26,13 @@ import constants_and_names as cn import universal_util as uu from data_prep.mp_model_extent import mp_model_extent -from gain.mp_annual_gain_rate_mangrove import mp_annual_gain_rate_mangrove -from gain.mp_US_removal_rates import mp_US_removal_rates -from gain.mp_forest_age_category_IPCC import mp_forest_age_category_IPCC -from gain.mp_annual_gain_rate_IPCC_defaults import mp_annual_gain_rate_IPCC_defaults -from gain.mp_annual_gain_rate_AGC_BGC_all_forest_types import mp_annual_gain_rate_AGC_BGC_all_forest_types -from gain.mp_gain_year_count_all_forest_types import mp_gain_year_count_all_forest_types -from gain.mp_gross_removals_all_forest_types import mp_gross_removals_all_forest_types +from removals.mp_annual_gain_rate_mangrove import mp_annual_gain_rate_mangrove +from removals.mp_US_removal_rates import mp_US_removal_rates +from removals.mp_forest_age_category_IPCC import mp_forest_age_category_IPCC +from removals.mp_annual_gain_rate_IPCC_defaults import mp_annual_gain_rate_IPCC_defaults +from removals.mp_annual_gain_rate_AGC_BGC_all_forest_types import mp_annual_gain_rate_AGC_BGC_all_forest_types +from removals.mp_gain_year_count_all_forest_types import mp_gain_year_count_all_forest_types +from removals.mp_gross_removals_all_forest_types import mp_gross_removals_all_forest_types from carbon_pools.mp_create_carbon_pools import mp_create_carbon_pools from emissions.mp_calculate_gross_emissions import mp_calculate_gross_emissions from analyses.mp_net_flux import mp_net_flux @@ -220,47 +227,46 @@ def main (): cn.soil_C_full_extent_2000_dir, cn.total_C_2000_dir] # Adds the biomass_soil output directories or the soil_only output directories depending on the model run - if 'gross_emissions' in actual_stages: - if emitted_pools == 'biomass_soil': - output_dir_list = output_dir_list + [cn.gross_emis_commod_biomass_soil_dir, - cn.gross_emis_shifting_ag_biomass_soil_dir, - cn.gross_emis_forestry_biomass_soil_dir, - cn.gross_emis_wildfire_biomass_soil_dir, - cn.gross_emis_urban_biomass_soil_dir, - cn.gross_emis_no_driver_biomass_soil_dir, - cn.gross_emis_all_gases_all_drivers_biomass_soil_dir, - cn.gross_emis_co2_only_all_drivers_biomass_soil_dir, - 
cn.gross_emis_non_co2_all_drivers_biomass_soil_dir, - cn.gross_emis_nodes_biomass_soil_dir] + if emitted_pools == 'biomass_soil': + output_dir_list = output_dir_list + [cn.gross_emis_commod_biomass_soil_dir, + cn.gross_emis_shifting_ag_biomass_soil_dir, + cn.gross_emis_forestry_biomass_soil_dir, + cn.gross_emis_wildfire_biomass_soil_dir, + cn.gross_emis_urban_biomass_soil_dir, + cn.gross_emis_no_driver_biomass_soil_dir, + cn.gross_emis_all_gases_all_drivers_biomass_soil_dir, + cn.gross_emis_co2_only_all_drivers_biomass_soil_dir, + cn.gross_emis_non_co2_all_drivers_biomass_soil_dir, + cn.gross_emis_nodes_biomass_soil_dir] - else: - output_dir_list = output_dir_list + [cn.gross_emis_commod_soil_only_dir, - cn.gross_emis_shifting_ag_soil_only_dir, - cn.gross_emis_forestry_soil_only_dir, - cn.gross_emis_wildfire_soil_only_dir, - cn.gross_emis_urban_soil_only_dir, - cn.gross_emis_no_driver_soil_only_dir, - cn.gross_emis_all_gases_all_drivers_soil_only_dir, - cn.gross_emis_co2_only_all_drivers_soil_only_dir, - cn.gross_emis_non_co2_all_drivers_soil_only_dir, - cn.gross_emis_nodes_soil_only_dir] + else: + output_dir_list = output_dir_list + [cn.gross_emis_commod_soil_only_dir, + cn.gross_emis_shifting_ag_soil_only_dir, + cn.gross_emis_forestry_soil_only_dir, + cn.gross_emis_wildfire_soil_only_dir, + cn.gross_emis_urban_soil_only_dir, + cn.gross_emis_no_driver_soil_only_dir, + cn.gross_emis_all_gases_all_drivers_soil_only_dir, + cn.gross_emis_co2_only_all_drivers_soil_only_dir, + cn.gross_emis_non_co2_all_drivers_soil_only_dir, + cn.gross_emis_nodes_soil_only_dir] output_dir_list = output_dir_list + [cn.net_flux_dir] - if 'create_supplementary_outputs' in actual_stages: - output_dir_list = output_dir_list + \ - [cn.cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir, - cn.cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir, - cn.cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent_dir, - cn.gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent_dir, - cn.gross_emis_all_gases_all_drivers_biomass_soil_forest_extent_dir, - cn.gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent_dir, - cn.net_flux_per_pixel_full_extent_dir, - cn.net_flux_forest_extent_dir, - cn.net_flux_per_pixel_forest_extent_dir] - - - # Creates tiles of annual AGB and BGB gain rate and AGB stdev for mangroves using the standard model + # Supplementary outputs + output_dir_list = output_dir_list + \ + [cn.cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir, + cn.cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir, + cn.cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent_dir, + cn.gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent_dir, + cn.gross_emis_all_gases_all_drivers_biomass_soil_forest_extent_dir, + cn.gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent_dir, + cn.net_flux_per_pixel_full_extent_dir, + cn.net_flux_forest_extent_dir, + cn.net_flux_per_pixel_forest_extent_dir] + + + # Creates tiles of annual AGB and BGB removals rate and AGB stdev for mangroves using the standard model # removal function if 'annual_removals_mangrove' in actual_stages: @@ -275,7 +281,7 @@ def main (): uu.print_log(":::::Processing time for annual_gain_rate_mangrove:", elapsed_time, "\n") - # Creates tiles of annual AGC+BGC gain rate and AGC stdev for US-specific removals using the standard model + # Creates tiles of annual AGC+BGC removals rate and AGC stdev for US-specific removals using the standard model # removal function if 'annual_removals_us' in 
actual_stages: @@ -318,7 +324,7 @@ def main (): uu.print_log(":::::Processing time for forest_age_category_IPCC:", elapsed_time, "\n", "\n") - # Creates tiles of annual AGB and BGB gain rates using IPCC Table 4.9 defaults + # Creates tiles of annual AGB and BGB removals rates using IPCC Table 4.9 defaults if 'annual_removals_IPCC' in actual_stages: uu.print_log(":::::Creating tiles of annual aboveground and belowground removal rates using IPCC defaults") @@ -353,20 +359,20 @@ def main (): uu.print_log(":::::Freeing up memory for gain year count creation by deleting unneeded tiles") tiles_to_delete = [] # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_mangrove_biomass_2000))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_WHRC_biomass_2000_unmasked))) + # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_WHRC_biomass_2000_unmasked))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGB_mangrove))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_BGB_mangrove))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGC_BGC_natrl_forest_Europe))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGC_BGC_planted_forest_unmasked))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGC_BGC_natrl_forest_US))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGC_natrl_forest_young))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_age_cat_IPCC))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGB_IPCC_defaults))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_BGB_IPCC_defaults))) + # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_age_cat_IPCC))) + # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGB_IPCC_defaults))) + # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_BGB_IPCC_defaults))) # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGC_BGC_all_types))) # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_ifl_primary))) # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_planted_forest_type_unmasked))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_plant_pre_2000))) + # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_plant_pre_2000))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGB_mangrove))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGC_BGC_natrl_forest_Europe))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGC_BGC_planted_forest_unmasked))) @@ -385,7 +391,7 @@ def main (): uu.print_log(":::::Creating tiles of gain year count for all removal pixels") start = datetime.datetime.now() - mp_gain_year_count_all_forest_types(sensit_type, tile_id_list, run_date = run_date) + mp_gain_year_count_all_forest_types(sensit_type, tile_id_list, run_date = run_date, no_upload=no_upload) end = datetime.datetime.now() elapsed_time = end - start @@ -414,8 +420,8 @@ def main (): uu.print_log(":::::Freeing up memory for carbon pool creation by deleting unneeded tiles") tiles_to_delete = [] - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_model_extent))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_age_cat_IPCC))) + # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_model_extent))) + 
# tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_age_cat_IPCC))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGB_IPCC_defaults))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_BGB_IPCC_defaults))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_BGC_all_types))) @@ -426,14 +432,6 @@ def main (): # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_cumul_gain_AGCO2_BGCO2_all_types))) # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_ifl_primary))) # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_planted_forest_type_unmasked))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_plant_pre_2000))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGB_mangrove))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGC_BGC_natrl_forest_Europe))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGC_BGC_planted_forest_unmasked))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGC_BGC_natrl_forest_US))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGC_natrl_forest_young))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGB_IPCC_defaults))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_stdev_annual_gain_AGC_all_types))) uu.print_log(" Deleting", len(tiles_to_delete), "tiles...") for tile_to_delete in tiles_to_delete: @@ -472,7 +470,7 @@ def main (): tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_annual_gain_AGC_all_types))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_cumul_gain_AGCO2_all_types))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_cont_eco_processed))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_WHRC_biomass_2000_unmasked))) + # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_WHRC_biomass_2000_unmasked))) # tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_mangrove_biomass_2000))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_removal_forest_type))) uu.print_log(" Deleting", len(tiles_to_delete), "tiles...") diff --git a/sensitivity_analysis/US_removal_rates.py b/sensitivity_analysis/US_removal_rates.py index cd90fbd4..2b004476 100644 --- a/sensitivity_analysis/US_removal_rates.py +++ b/sensitivity_analysis/US_removal_rates.py @@ -59,7 +59,7 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g US_forest_group = '{0}_{1}.tif'.format(tile_id, cn.pattern_FIA_forest_group_processed) US_region = '{0}_{1}.tif'.format(tile_id, cn.pattern_FIA_regions_processed) - # Opens standard model gain rate tile + # Opens standard model removals rate tile with rasterio.open(annual_gain_standard) as annual_gain_standard_src: # Grabs metadata about the tif, like its location/projection/cell size @@ -96,7 +96,7 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g US_forest_group_window = US_forest_group_src.read(1, window=window) US_region_window = US_region_src.read(1, window=window) - # Masks the three input tiles (age category, forest group, FIA region) to the pixels to the standard gain model extent + # Masks the three input tiles (age category, forest group, FIA region) to the pixels in the standard removals model extent age_cat_masked_window = np.ma.masked_where(annual_gain_standard_window ==
0, US_age_cat_window).filled(0).astype('uint16') US_forest_group_masked_window = np.ma.masked_where(annual_gain_standard_window == 0, US_forest_group_window).filled(0).astype('uint16') US_region_masked_window = np.ma.masked_where(annual_gain_standard_window == 0, US_region_window).filled(0).astype('uint16') @@ -105,12 +105,12 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g # make the codes (dictionary key) match. Then, combines the three rasters. These values now match the key values in the spreadsheet. group_region_age_combined_window = (age_cat_masked_window * 10 + US_forest_group_masked_window * 100 + US_region_masked_window).astype('float32') - # Applies the dictionary of group-region-age gain rates to the group-region-age numpy array to - # get annual gain rates (metric tons aboveground biomass/yr) for each pixel that has gain in the standard model + # Applies the dictionary of group-region-age removals rates to the group-region-age numpy array to + # get annual removals rates (metric tons aboveground biomass/yr) for each pixel that has removals in the standard model for key, value in gain_table_group_region_age_dict.items(): annual_gain_standard_window[group_region_age_combined_window == key] = value - # Replaces all values that have Hansen gain pixels with 0 so that they can be filled with Hansen gain pixel-specific + # Replaces all values that have Hansen gain pixels with 0 so that they can be filled with Hansen removals pixel-specific # values (rates for youngest forest age category) agb_without_gain_pixel_window = np.ma.masked_where(gain_window != 0, annual_gain_standard_window).filled(0) @@ -119,8 +119,8 @@ def US_removal_rate_calc(tile_id, gain_table_group_region_age_dict, gain_table_g agb_with_gain_pixel_window = (US_forest_group_masked_window * 100 + US_region_masked_window).astype('float32') agb_with_gain_pixel_window = np.ma.masked_where((gain_window == 0) & (annual_gain_standard_window != 0), agb_with_gain_pixel_window).filled(0) - # Applies the dictionary of region-age-group gain rates to the region-age-group array to - # get annual gain rates (metric tons aboveground biomass/yr) for each pixel that has gain in the standard model + # Applies the dictionary of region-age-group removals rates to the region-age-group array to + # get annual removals rates (metric tons aboveground biomass/yr) for each pixel that has removals in the standard model for key, value in gain_table_group_region_dict.items(): agb_with_gain_pixel_window[agb_with_gain_pixel_window == key] = value @@ -128,7 +128,7 @@ # removal rates that has values only where there are Hansen gain pixels agb_dst_window = agb_without_gain_pixel_window + agb_with_gain_pixel_window - # The maximum rate in the US-specific gain rate dictionary + # The maximum rate in the US-specific removals rate dictionary max_rate = max(gain_table_group_region_age_dict.values()) # Any pixel that has a rate 1.2x as large as the largest rate in the dictionary gets assigned the standard model's rate diff --git a/sensitivity_analysis/mp_US_removal_rates.py b/sensitivity_analysis/mp_US_removal_rates.py index d1bc355d..6a547a0c 100644 --- a/sensitivity_analysis/mp_US_removal_rates.py +++ b/sensitivity_analysis/mp_US_removal_rates.py @@ -17,7 +17,7 @@ as already incorporated that, so the youngest category (1000) means 0-20 years for non-south and 0-10 years for south, etc.
After Hansenizing region, group, and age category, this script creates two dictionaries: one with removal rates by region-group-age combinations and another with the youngest rate for each region-group combination. -The first dictionary is applied to all standard gain model pixels according to their region-group-age combination +The first dictionary is applied to all standard removals model pixels according to their region-group-age combination but then is overwritten for any Hansen gain pixel with the youngest rate for that region-group combination applied (using the second dictionary). That is because we can assume that any Hansen gain pixel is in the youngest age category, i.e. it is more specific information than the Pan et al. forest age category raster, so we give that info priority. @@ -153,7 +153,8 @@ def main (): # Table with US-specific removal rates - cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.table_US_removal_rate), cn.docker_base_dir, '--no-sign-request'] + # cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.table_US_removal_rate), cn.docker_base_dir, '--no-sign-request'] + cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.table_US_removal_rate), cn.docker_base_dir] # Solution for adding subprocess output to log is from https://stackoverflow.com/questions/21953835/run-subprocess-and-print-output-to-logging process = Popen(cmd, stdout=PIPE, stderr=STDOUT) @@ -164,7 +165,7 @@ def main (): gain_table = pd.read_excel("{}".format(cn.table_US_removal_rate), sheet_name="US_rates_for_model") - # Converts gain table from wide to long, so each region-group-age category has its own row + # Converts removals table from wide to long, so each region-group-age category has its own row gain_table_group_region_by_age = pd.melt(gain_table, id_vars=['FIA_region_code', 'forest_group_code'], value_vars=['growth_young', 'growth_middle', 'growth_old']) @@ -174,14 +175,14 @@ def main (): age_dict = {'growth_young': 1000, 'growth_middle': 2000, 'growth_old': 3000} # Creates a unique value for each forest group-region-age category in the table. - # Although these rates are applied to all standard gain model pixels at first, they are not ultimately used for + # Although these rates are applied to all standard removals model pixels at first, they are not ultimately used for # pixels that have Hansen gain (see below). gain_table_group_region_age = gain_table_group_region_by_age.replace({"variable": age_dict}) gain_table_group_region_age['age_cat'] = gain_table_group_region_age['variable']*10 gain_table_group_region_age['group_region_age_combined'] = gain_table_group_region_age['age_cat'] + \ gain_table_group_region_age['forest_group_code']*100 + \ gain_table_group_region_age['FIA_region_code'] - # Converts the forest group-region-age codes and corresponding gain rates to a dictionary, + # Converts the forest group-region-age codes and corresponding removals rates to a dictionary, # where the key is the unique group-region-age code and the value is the AGB removal rate. 
gain_table_group_region_age_dict = pd.Series(gain_table_group_region_age.value.values, index=gain_table_group_region_age.group_region_age_combined).to_dict() uu.print_log(gain_table_group_region_age_dict) @@ -193,7 +194,7 @@ def main (): gain_table_group_region = gain_table_group_region_age.drop(gain_table_group_region_age[gain_table_group_region_age.age_cat != 10000].index) gain_table_group_region['group_region_combined'] = gain_table_group_region['forest_group_code']*100 + \ gain_table_group_region['FIA_region_code'] - # Converts the forest group-region codes and corresponding gain rates to a dictionary, + # Converts the forest group-region codes and corresponding removals rates to a dictionary, # where the key is the unique group-region code (youngest age category) and the value is the AGB removal rate. gain_table_group_region_dict = pd.Series(gain_table_group_region.value.values, index=gain_table_group_region.group_region_combined).to_dict() uu.print_log(gain_table_group_region_dict) diff --git a/sensitivity_analysis/mp_legal_AMZ_loss.py b/sensitivity_analysis/mp_legal_AMZ_loss.py index c9d34393..fc7a43de 100644 --- a/sensitivity_analysis/mp_legal_AMZ_loss.py +++ b/sensitivity_analysis/mp_legal_AMZ_loss.py @@ -305,7 +305,7 @@ def main (): uu.upload_final_set(stage_output_dir_list[3], stage_output_pattern_list[3]) - # Creates tiles of annual AGB and BGB gain rate for non-mangrove, non-planted forest using the standard model + # Creates tiles of annual AGB and BGB removals rate for non-mangrove, non-planted forest using the standard model # removal function if 'annual_removals' in actual_stages: @@ -340,7 +340,7 @@ def main (): uu.s3_flexible_download(dir, pattern, cn.docker_base_dir, sensit_type, tile_id_list) - # Table with IPCC Table 4.9 default gain rates + # Table with IPCC Table 4.9 default removals rates cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir] # Solution for adding subprocess output to log is from https://stackoverflow.com/questions/21953835/run-subprocess-and-print-output-to-logging @@ -350,14 +350,14 @@ def main (): pd.options.mode.chained_assignment = None - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals rates gain_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name="natrl fores gain, for std model") # Removes rows with duplicate codes (N. and S. 
America for the same ecozone) gain_table_simplified = gain_table.drop_duplicates(subset='gainEcoCon', keep='first') - # Converts gain table from wide to long, so each continent-ecozone-age category has its own row + # Converts removals table from wide to long, so each continent-ecozone-age category has its own row gain_table_cont_eco_age = pd.melt(gain_table_simplified, id_vars=['gainEcoCon'], value_vars=['growth_primary', 'growth_secondary_greater_20', 'growth_secondary_less_20']) @@ -382,7 +382,7 @@ def main (): # Merges the table of just continent-ecozone codes and the table of continent-ecozone-age codes gain_table_all_combos = pd.concat([gain_table_con_eco_only, gain_table_cont_eco_age]) - # Converts the continent-ecozone-age codes and corresponding gain rates to a dictionary + # Converts the continent-ecozone-age codes and corresponding removals rates to a dictionary gain_table_dict = pd.Series(gain_table_all_combos.value.values, index=gain_table_all_combos.cont_eco_age).to_dict() @@ -421,7 +421,7 @@ def main (): uu.upload_final_set(stage_output_dir_list[i], stage_output_pattern_list[i]) - # Creates tiles of cumulative AGCO2 and BGCO2 gain rate for non-mangrove, non-planted forest using the standard model + # Creates tiles of cumulative AGCO2 and BGCO2 removals rate for non-mangrove, non-planted forest using the standard model # removal function if 'cumulative_removals' in actual_stages: @@ -456,13 +456,13 @@ def main (): uu.s3_flexible_download(dir, pattern, cn.docker_base_dir, sensit_type, tile_id_list) - # Calculates cumulative aboveground carbon gain in non-mangrove planted forests + # Calculates cumulative aboveground carbon removals in non-mangrove planted forests output_pattern_list = stage_output_pattern_list pool = multiprocessing.Pool(int(cn.count/3)) pool.map(partial(cumulative_gain_natrl_forest.cumulative_gain_AGCO2, output_pattern_list=output_pattern_list, sensit_type=sensit_type), tile_id_list) - # Calculates cumulative belowground carbon gain in non-mangrove planted forests + # Calculates cumulative belowground carbon removals in non-mangrove planted forests pool = multiprocessing.Pool(int(cn.count/3)) pool.map(partial(cumulative_gain_natrl_forest.cumulative_gain_BGCO2, output_pattern_list=output_pattern_list, sensit_type=sensit_type), tile_id_list) @@ -482,7 +482,7 @@ def main (): uu.upload_final_set(stage_output_dir_list[i], stage_output_pattern_list[i]) - # Creates tiles of annual gain rate and cumulative removals for all forest types (above + belowground) + # Creates tiles of annual removals rate and cumulative removals for all forest types (above + belowground) if 'removals_merged' in actual_stages: uu.print_log('Creating annual and cumulative removals for all forest types combined (above + belowground)') @@ -602,7 +602,7 @@ def main (): stage_output_pattern_list = uu.alter_patterns(sensit_type, master_output_pattern_list[10:16]) - # Table with IPCC Wetland Supplement Table 4.4 default mangrove gain rates + # Table with IPCC Wetland Supplement Table 4.4 default mangrove removals rates cmd = ['aws', 's3', 'cp', os.path.join(cn.gain_spreadsheet_dir, cn.gain_spreadsheet), cn.docker_base_dir] # Solution for adding subprocess output to log is from https://stackoverflow.com/questions/21953835/run-subprocess-and-print-output-to-logging @@ -613,7 +613,7 @@ def main (): pd.options.mode.chained_assignment = None - # Imports the table with the ecozone-continent codes and the carbon gain rates + # Imports the table with the ecozone-continent codes and the carbon removals 
rates gain_table = pd.read_excel("{}".format(cn.gain_spreadsheet), sheet_name="mangrove gain, for model") diff --git a/universal_util.py b/universal_util.py index 1faa6024..3585dcf0 100644 --- a/universal_util.py +++ b/universal_util.py @@ -7,11 +7,11 @@ import rasterio import logging import csv +import psutil from shutil import copyfile import os import multiprocessing from multiprocessing.pool import Pool -from functools import partial from shutil import copy import re import pandas as pd @@ -240,6 +240,41 @@ def check_storage(): "; Percent storage used:", percent_storage_used) +# Obtains the absolute number of RAM gigabytes currently in use by the entire system (all processors). +# https://www.pragmaticlinux.com/2020/12/monitor-cpu-and-ram-usage-in-python-with-psutil/ +# The outputs from this don't exactly match the memory shown in htop but I think it's close enough to be useful. +# It seems to slightly over-estimate memory usage (by ~1-2 GB). +def check_memory(): + + used_memory = (psutil.virtual_memory().total - psutil.virtual_memory().available)/1024/1024/1000 + total_memory = psutil.virtual_memory().total/1024/1024/1000 + percent_memory = used_memory/total_memory*100 + print_log(f"Memory usage is: {round(used_memory,2)} GB out of {round(total_memory,2)} = {round(percent_memory,1)}% usage") + + if percent_memory > 99: + print_log("WARNING: MEMORY USAGE DANGEROUSLY HIGH! TERMINATING PROGRAM.") # Not sure if this is necessary + exception_log("EXCEPTION: MEMORY USAGE DANGEROUSLY HIGH! TERMINATING PROGRAM.") + + +# Not currently using because it shows 1 when using with multiprocessing +# (although it seems to work fine when not using multiprocessing) +def counter(func): + """ + A decorator that counts and prints the number of times a function has been executed + https://stackoverflow.com/a/1594484 way down at the bottom of the post in the examples section + """ + + @functools.wraps(func) + def wrapper_count(*args, **kwargs): + wrapper_count.count = wrapper_count.count + 1 + print("Number of times {0} has been used: {1}".format(func.__name__, wrapper_count.count)) + res = func(*args, **kwargs) + return res + + wrapper_count.count = 0 + return wrapper_count + + # Gets the tile id from the full tile name using a regular expression def get_tile_id(tile_name): @@ -289,7 +324,8 @@ def tile_list_s3(source, sensit_type='std'): ## For an s3 folder in a bucket using AWSCLI # Captures the list of the files in the folder - out = Popen(['aws', 's3', 'ls', new_source, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', new_source, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', new_source], stdout=PIPE, stderr=STDOUT) stdout, stderr = out.communicate() # Writes the output string to a text file for easier interpretation @@ -322,7 +358,8 @@ def tile_list_s3(source, sensit_type='std'): ## For an s3 folder in a bucket using AWSCLI # Captures the list of the files in the folder - out = Popen(['aws', 's3', 'ls', source, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', source, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', source], stdout=PIPE, stderr=STDOUT) stdout, stderr = out.communicate() # Writes the output string to a text file for easier interpretation @@ -403,14 +440,16 @@ def create_combined_tile_list(set1, set2, set3=None, sensit_type='std'): set2 = set2.replace('standard', sensit_type) - out = Popen(['aws', 's3', 'ls', set1, '--no-sign-request'], 
stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', set1, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', set1], stdout=PIPE, stderr=STDOUT) stdout, stderr = out.communicate() # Writes the output string to a text file for easier interpretation set1_tiles = open("set1.txt", "wb") set1_tiles.write(stdout) set1_tiles.close() - out = Popen(['aws', 's3', 'ls', set2, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', set2, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', set2], stdout=PIPE, stderr=STDOUT) stdout2, stderr2 = out.communicate() # Writes the output string to a text file for easier interpretation set2_tiles = open("set2.txt", "wb") @@ -454,7 +493,8 @@ def create_combined_tile_list(set1, set2, set3=None, sensit_type='std'): set1 = set1.replace(sensit_type, 'standard') print_log(" Looking for alternative tile set in {}".format(set1)) - out = Popen(['aws', 's3', 'ls', set1, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', set1, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', set1], stdout=PIPE, stderr=STDOUT) stdout, stderr = out.communicate() # Writes the output string to a text file for easier interpretation set1_tiles = open("set1.txt", "wb") @@ -485,7 +525,8 @@ def create_combined_tile_list(set1, set2, set3=None, sensit_type='std'): set2 = set2.replace(sensit_type, 'standard') print_log(" Looking for alternative tile set in {}".format(set2)) - out = Popen(['aws', 's3', 'ls', set2, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', set2, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', set2], stdout=PIPE, stderr=STDOUT) stdout2, stderr2 = out.communicate() # Writes the output string to a text file for easier interpretation set2_tiles = open("set2.txt", "wb") @@ -518,7 +559,8 @@ def create_combined_tile_list(set1, set2, set3=None, sensit_type='std'): else: set3 = set3.replace('standard', sensit_type) - out = Popen(['aws', 's3', 'ls', set3, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', set3, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', set3], stdout=PIPE, stderr=STDOUT) stdout3, stderr3 = out.communicate() # Writes the output string to a text file for easier interpretation set3_tiles = open("set3.txt", "wb") @@ -572,7 +614,8 @@ def count_tiles_s3(source, pattern=None): ## For an s3 folder in a bucket using AWSCLI # Captures the list of the files in the folder - out = Popen(['aws', 's3', 'ls', source, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + # out = Popen(['aws', 's3', 'ls', source, '--no-sign-request'], stdout=PIPE, stderr=STDOUT) + out = Popen(['aws', 's3', 'ls', source], stdout=PIPE, stderr=STDOUT) stdout, stderr = out.communicate() # Writes the output string to a text file for easier interpretation @@ -589,8 +632,10 @@ def count_tiles_s3(source, pattern=None): num = len(line.strip('\n').split(" ")) tile_name = line.strip('\n').split(" ")[num - 1] - # For gain, tcd, and pixel area tiles, which have the tile_id after the the pattern - if pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_pixel_area, cn.pattern_loss]: + # For gain, tcd, pixel area, and loss tiles (and their rewindowed versions), + # which have the tile_id after the pattern + if pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_pixel_area, cn.pattern_loss,
cn.pattern_gain_rewindow, cn.pattern_tcd_rewindow, cn.pattern_pixel_area_rewindow]: if tile_name.endswith('.tif'): tile_id = get_tile_id(tile_name) file_list.append(tile_id) @@ -641,7 +686,7 @@ def s3_flexible_download(source_dir, pattern, dest, sensit_type, tile_id_list): if tile_id_list == 'all': s3_folder_download(source_dir, dest, sensit_type, pattern) - # For downloading test tiles (twenty or fewer). Chose 10 because the US removals sensitivity analysis uses 16 tiles. + # For downloading test tiles (twenty or fewer). elif len(tile_id_list) <= 20: # Creates a full download name (path and file) @@ -716,6 +761,8 @@ def s3_folder_download(source, dest, sensit_type, pattern = None): cmd = ['aws', 's3', 'cp', source_final, dest, '--no-sign-request', '--recursive', '--exclude', '*tiled/*', '--exclude', '*geojason', '--exclude', '*vrt', '--exclude', '*csv', '--no-progress'] + # cmd = ['aws', 's3', 'cp', source_final, dest, '--no-sign-request', '--recursive', '--exclude', '*tiled/*', + # '--exclude', '*geojason', '--exclude', '*vrt', '--exclude', '*csv'] log_subprocess_output_full(cmd) print_log('\n') @@ -730,6 +777,8 @@ def s3_folder_download(source, dest, sensit_type, pattern = None): cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--recursive', '--exclude', '*tiled/*', '--exclude', '*geojason', '--exclude', '*vrt', '--exclude', '*csv', '--no-progress'] + # cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--recursive', '--exclude', '*tiled/*', + # '--exclude', '*geojason', '--exclude', '*vrt', '--exclude', '*csv'] log_subprocess_output_full(cmd) print_log('\n') @@ -750,6 +799,8 @@ def s3_folder_download(source, dest, sensit_type, pattern = None): cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--recursive', '--exclude', '*tiled/*', '--exclude', '*geojason', '--exclude', '*vrt', '--exclude', '*csv', '--no-progress'] + # cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--recursive', '--exclude', '*tiled/*', + # '--exclude', '*geojason', '--exclude', '*vrt', '--exclude', '*csv'] log_subprocess_output_full(cmd) @@ -790,7 +841,8 @@ def s3_file_download(source, dest, sensit_type): print_log("Option 2: Checking for sensitivity analysis tile {0}/{1} on s3...".format(dir_sens[15:], file_name_sens)) # If not already downloaded, first tries to download the sensitivity analysis version - cmd = ['aws', 's3', 'cp', '{0}/{1}'.format(dir_sens, file_name_sens), dest, '--no-sign-request', '--only-show-errors'] + # cmd = ['aws', 's3', 'cp', '{0}/{1}'.format(dir_sens, file_name_sens), dest, '--no-sign-request', '--only-show-errors'] + cmd = ['aws', 's3', 'cp', '{0}/{1}'.format(dir_sens, file_name_sens), dest, '--only-show-errors'] log_subprocess_output_full(cmd) if os.path.exists(file_name_sens): @@ -813,7 +865,8 @@ def s3_file_download(source, dest, sensit_type): # If not already downloaded, final option is to try to download the standard version of the tile. # If this doesn't work, the script throws a fatal error because no variant of this tile was found. 
- cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--only-show-errors'] + # cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--only-show-errors'] + cmd = ['aws', 's3', 'cp', source, dest, '--only-show-errors'] log_subprocess_output_full(cmd) if os.path.exists(file_name): @@ -836,7 +889,8 @@ def s3_file_download(source, dest, sensit_type): # If the tile isn't already downloaded, download is attempted source = os.path.join(dir, file_name) - cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--only-show-errors'] + # cmd = ['aws', 's3', 'cp', source, dest, '--no-sign-request', '--only-show-errors'] + cmd = ['aws', 's3', 'cp', source, dest, '--only-show-errors'] log_subprocess_output_full(cmd) if os.path.exists(os.path.join(dest, file_name)): print_log(" Option 2 success: Tile {} found on s3 and downloaded".format(source), "\n") @@ -867,7 +921,8 @@ def upload_final(upload_dir, tile_id, pattern): file = '{}_{}.tif'.format(tile_id, pattern) print_log("Uploading {}".format(file)) - cmd = ['aws', 's3', 'cp', file, upload_dir, '--no-sign-request', '--no-progress'] + # cmd = ['aws', 's3', 'cp', file, upload_dir, '--no-sign-request', '--no-progress'] + cmd = ['aws', 's3', 'cp', file, upload_dir, '--no-progress'] try: log_subprocess_output_full(cmd) @@ -975,9 +1030,13 @@ def get_raster_nodata_value(tile): # Prints information about the tile that was just processed: how long it took and how many tiles have been completed def end_of_fx_summary(start, tile_id, pattern, no_upload): + # Checking memory at this point (end of the function) seems to record memory usage when it is at its peak + check_memory() + end = datetime.datetime.now() elapsed_time = end-start print_log("Processing time for tile", tile_id, ":", elapsed_time) + count_completed_tiles(pattern) # If no_upload flag is not activated, log is uploaded @@ -1374,6 +1433,7 @@ def rewindow(tile_id, download_pattern_name, no_upload): in_tile = "{0}_{1}.tif".format(tile_id, download_pattern_name) out_tile = "{0}_{1}_rewindow.tif".format(tile_id, download_pattern_name) + check_memory() # Only rewindows if the tile exists if os.path.exists(in_tile):
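For reference, the psutil-based memory check that this patch adds to universal_util.py can be exercised on its own, outside the model. The following is a minimal standalone sketch, not the model's actual helper: it assumes only that psutil is installed, reports plain GiB instead of the mixed 1024/1000 divisors used above, and substitutes print and MemoryError for uu.print_log and uu.exception_log.

```python
import psutil

# Same warning threshold the patch settled on in check_memory(); 96% caused
# unnecessary terminations because psutil's numbers don't exactly match htop.
MEMORY_WARNING_PERCENT = 99

def check_memory_sketch():
    """Report system-wide memory use, in the style of universal_util.check_memory()."""
    vm = psutil.virtual_memory()
    # Used memory is total minus available, converted to GiB
    used_gb = (vm.total - vm.available) / 1024 ** 3
    total_gb = vm.total / 1024 ** 3
    percent = used_gb / total_gb * 100
    print(f"Memory usage: {used_gb:.2f} GB of {total_gb:.2f} GB ({percent:.1f}%)")

    # The model treats near-exhaustion as fatal rather than letting the spot
    # machine thrash; a plain exception stands in for uu.exception_log() here.
    if percent > MEMORY_WARNING_PERCENT:
        raise MemoryError("Memory usage dangerously high; terminating.")

if __name__ == '__main__':
    check_memory_sketch()
```

Calling a check like this at the end of each tile-processing function, as end_of_fx_summary() does above, is what lets the log capture usage near its peak.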