ET_agriculture

This repository contains an analysis of the effect of ET (evapotranspiration) on agriculture, as witnessed by differences in ECOSTRESS ET estimates across land cover and crop types. See the report from the JPL 2021 internship: https://docs.google.com/document/d/1xOjTGxKnkD_1tZrXiKZeYXzrA0Pmw3cboQdNA6kfm3U/edit?usp=sharing

Structure and Contents

This repository is organized into two folders. The data folder contains raw and intermediate datasets, as well as the final datasets used in the analysis. The code folder contains code to (1) download data, (2) process the data into the intermediate and final datasets, and (3) conduct analyses on the final data. Folders and files in the code folder are numbered in the order in which they should be run, and the numbers listed next to the data folders and data files correspond to the number of the code file used to create them.
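As a rough illustration of the data layout described above, the following minimal Python sketch creates the three data subfolders named in this README (raw, intermediate, for_analysis). It is not the repository's own 0_repo_structure.R script; the function name `create_data_structure` and the constant `DATA_SUBFOLDERS` are hypothetical and introduced only for this example.

```python
from pathlib import Path

# Subfolders of data/ that this README mentions. The real repository
# may contain more; this tuple is an assumption made for illustration.
DATA_SUBFOLDERS = ("raw", "intermediate", "for_analysis")


def create_data_structure(repo_root):
    """Create data/raw, data/intermediate, and data/for_analysis under repo_root.

    Returns the directories as a list of Path objects, in the order above.
    """
    created = []
    for name in DATA_SUBFOLDERS:
        folder = Path(repo_root) / "data" / name
        # exist_ok=True makes the function safe to re-run on an existing checkout
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_data_structure(".")` from the repository root would produce the empty folders that the download and build scripts are expected to fill. Since everything under data is listed in the .gitignore, creating these folders locally does not affect what is tracked on GitHub.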

  • data (1,0): all data is listed in the .gitignore and is thus not stored on GitHub. To retrieve the data in its final form without using the given code to fetch the raw data from its original source, please refer to this drive folder

  • code: all code for data download, processing, and analysis.

    • 1_download_data: download the data found in data/raw/

      • 0_repo_structure.R: create the basic repository structure
      • 1_download_shapefiles.R: Download county shapefile and create the required individual county shapefiles
      • 2_ECOSTRESS
        • 0_repo_structure.R: create the basic repository structure of data/raw/ECOSTRESS
        • 1_APPEEARS_requests: documentation of the AppEEARS (https://lpdaacsvc.cr.usgs.gov/appeears/task/area) requests, which can be copied to make the requests again
        • 2_download_links: links provided by AppEEARS that need to be downloaded
        • 3_download_scripts:
          • generic-download.sh: template for downloading the links listed in 2_download_links. Replace all instances of "https://..." with the correct links. Run these scripts from the correct download folder (e.g. cd data/raw/ECOSTRESS), calling something like "bash ../../../code/1_download_data/2_ECOSTRESS/3_download_scripts/by_request/California-inst-PT-JPL-2-19-5-19.sh". An AppEEARS account and password are needed.
          • by_request: a folder of scripts to download for each request
      • 3_CDL.R: Download years 2018-2020 of the Cropland Data Layer
      • 4_USGS_waterdata.R: Intended to download the county-level California irrigation data from the USGS. SCRIPT NOT FUNCTIONAL: DOWNLOAD MANUALLY and then split the file into metadata (the first few lines) and data
      • 5_DWR_crop.R: Download the 2018 crop map shapefile of California from the Department of Water Resources
    • 2_build_dataset: create data in data/intermediate and data/for_analysis

      • 1_intermediate: create data in data/intermediate
        • x_CDL_code_dictionary.R: create the code dictionary for crop types
        • 1_consistent_grid.R: create one consistent 70m grid for all data to be resampled to.
        • 2_agriculture.R: create a shapefile and raster of agriculture based on DWR data
        • 3_vegetation.R: create a shapefile and raster of the natural vegetation counterfactual
        • 4_elevation_aspect_slope.R: create rasters of elevation, aspect, and slope from NED sampled to the consistent grid.
        • 5_soils.R: create a raster of the Storie index resampled to the consistent grid
        • 6_PET.R: create a GeoTIFF rasterbrick of all the available PET data
        • 6.5_PET.py: take the output of 6_PET.R, aggregate it temporally to the necessary timesteps, and resample it to the common CA grid from 3_consistent_grid.R
        • 6.75_PET_grouped_avg.ipynb: average the output of 6.5_PET_yeargrouped.py across years such that you end up with a stack of only 6 images
        • 7_ECOSTRESS_resample.R: resample each ET tif and its corresponding uncertainties to the CA_grid. Remove all data that don't have uncertainties.
        • 7.5_ECOSTRESS_subyears.R: similar to 6.5_PET_yeargrouped_average.py, this takes the average of all tifs in a given time period.
        • 7_ECOSTRESS_scratch: I ended up trying to process the ECOSTRESS data in so many different ways that I created a scratch folder for the ways that didn't work out.
          • 7_ECOSTRESS.R: This file takes the ECOSTRESS data, resamples it to the consistent CA grid, and stacks it. It also creates an accompanying brick of uncertainties. If the uncertainties are missing, then that layer is simply NA. Note that this file can error out because it uses a lot of compute and memory, so I ended up running it in pieces as shown in the 7_ECOSTRESS folder.
          • 7.5_ECOSTRESS.py: take the output of 7_ECOSTRESS.R, aggregate it temporally to the necessary timesteps, and resample it to the common CA grid from 3_consistent_grid.R
          • 7_ECOSTRESS_OGmethod.R: This file generates a csv from all the ECOSTRESS data
      • 2_for_analysis: create data in data/for_analysis
        • 1_combine_intermediate.py: This script combines all the datasets created in 2,1 to make a dataset of the full grid for all time-invariant data (not ET and PET)
        • 1.25_combine_intermediate_PET.py: This adds the time-varying PET column
        • 1.5_combine_intermediate_ET.py: This adds the time-varying ET column
        • 2_random_forest.py: This script uses an sklearn random forest with 100 trees to predict ET for each of our timesteps. We validate the model both with a simple 20% test set and with spatial cross-validation on 1x1 degree coordinate cells. We then apply the model to generate agriculture_sklearn_RF.csv.
    • 3_analysis: conduct analyses on final dataset(s)

      • 0_scratch: figures and tests done along the way that do not make it into the final results
        • 1_USGS_county_irrigation.Rmd: make some maps of the USGS county irrigation data
        • 2_CDL_DWR_USGS_crop_compare.Rmd: compare different land use data
      • 1_final: final results for the resulting paper. Written in Rmd; datasets used listed,