đź‘Ą Ryan McClure, Quinn Thomas, Mary Lofton, Whitney Woelmer, and Cayelan Carey
đź‘Ą Special thanks to: Ashley Mickens, Bethany Bookout, Alexandria Hounshell, Abby Lewis, Heather Wander, Rachel Corrigan, James Maze, Dexter Howard, Jacob Wynne, and Nick Hammond, Paul Hanson, Erin Hotchkiss, Madeline Schreiber, and Bobbie Niederlehner
Questions? 📧 [email protected] and [email protected]
Thank you for checking out CH4castR. Near-term, ecological forecasting with iterative model refitting and uncertainty partitioning has great promise for improving our understanding of ecological processes and the predictive skill of ecological models, but to date has been infrequently applied to predict biogeochemical fluxes. Bubble fluxes of methane (CH4) from aquatic sediments to the atmosphere (ebullition) dominate freshwater greenhouse gas emissions, but it remains unknown how best to make robust near-term CH4 ebullition predictions using models. Near-term forecasting workflows have the potential to address several current challenges in predicting CH4 ebullition rates, including: development of models that can be applied across time horizons and ecosystems, identification of the timescales for which predictions can provide useful information, and quantification of uncertainty in predictions.
We developed and tested a near-term, iterative forecasting workflow of CH4 ebullition rates in a small eutrophic reservoir throughout one open-water period. The workflow included the repeated updating of a CH4 ebullition forecast model over time with newly-collected data via iterative model refitting. We compared the CH4 forecasts from our workflow to both alternative forecasts generated without iterative model refitting and a persistence null model. This code provided is our workflow to generate forecasts of CH4 ebullition rates in a eutrophic reservoir in Virginia, USA.
CH4castR has been tested across Windows and Mac OS. NOTE --> CH4castR does not currently work with Apple's new M1 arm64 processor. Jags has not yet been prepped for this new processor. However, workarounds to get it working can be found at SourceForge
Jags download for Mac users can be found here: https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Mac%20OS%20X/ If you have a M1 arm 64 mac and are running the arm64 version of R 4.1.0, this code will not work as JAGS is built on x86-64 (I learned this the hard way!). Jags download for PC users can be found here: https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Windows/ Jags download for Linux users can be found here: https://packages.qa.debian.org/j/jags.html
- Go to the CH4castR repository and copy the repo URL.
- Open R
- Start a new project: File > New Project
- Select: Version Control > Git
- Paste the repo's URL into "Repository URL:", keep the project directory name as the default, select "open in new session", and click New Project
- 01_Data_Compile.R
- 02_model_training.R
- 03_Generate_Forecasts.R
- 04_Forecast_Evaluations.R
- You can either source the 01_Data_Compile.R script or run through it incrementally.
Getting default packages
# Download packages
if (!"pacman" %in% installed.packages()) install.packages("pacman")
pacman::p_load(tidyverse, MCMCvis, lubridate, tidybayes,
ncdf4, reshape2, zoo, patchwork, hydroGOF, viridis,
imputeTS, devtools, scales, forecast, coda, rjags, R2jags)
Pull all of the available data from Environmental Data Initiative
### Pull together all of the observations ###
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
### Catwalk data from dam site (See Figure 1 in MS)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# for more information on the catwalk data in EDI refer to this link:
# https://portal.edirepository.org/nis/mapbrowse?packageid=edi.271.5
inUrl1 <- "https://pasta.lternet.edu/package/data/eml/edi/271/5/c1b1f16b8e3edbbff15444824b65fe8f"
infile1 <- tempfile()
try(download.file(inUrl1,infile1,method="curl"))
if (is.na(file.size(infile1))) download.file(inUrl1,infile1,method="auto")
catwalk <-read.csv(infile1,header=F, skip=1, sep=",",
col.names=c("Reservoir",
"Site",
"DateTime",
"ThermistorTemp_C_surface",
"ThermistorTemp_C_1",
"ThermistorTemp_C_2",
"ThermistorTemp_C_3",
"ThermistorTemp_C_4",
"ThermistorTemp_C_5",
"ThermistorTemp_C_6",
"ThermistorTemp_C_7",
"ThermistorTemp_C_8",
"ThermistorTemp_C_9",
"RDO_mgL_5",
"RDOsat_percent_5",
"RDO_mgL_5_adjusted",
"RDOsat_percent_5_adjusted",
"RDOTemp_C_5",
"RDO_mgL_9",
"RDOsat_percent_9",
"RDO_mgL_9_adjusted",
"RDOsat_percent_9_adjusted",
"RDOTemp_C_9",
"EXOTemp_C_1",
"EXOCond_uScm_1",
"EXOSpCond_uScm_1",
"EXOTDS_mgL_1",
"EXODOsat_percent_1",
"EXODO_mgL_1",
"EXOChla_RFU_1",
"EXOChla_ugL_1",
"EXOBGAPC_RFU_1",
"EXOBGAPC_ugL_1",
"EXOfDOM_RFU_1",
"EXOfDOM_QSU_1",
"EXO_pressure",
"EXO_depth",
"EXO_battery",
"EXO_cablepower",
"EXO_wiper",
"Lvl_psi_9",
"LvlTemp_C_9",
"RECORD",
"CR6_Batt_V",
"CR6Panel_Temp_C",
"Flag_All",
"Flag_DO_1",
"Flag_DO_5",
"Flag_DO_9",
"Flag_Chla",
"Flag_Phyco",
"Flag_TDS",
"Flag_fDOM",
"Flag_Temp_Surf",
"Flag_Temp_1",
"Flag_Temp_2",
"Flag_Temp_3",
"Flag_Temp_4",
"Flag_Temp_5",
"Flag_Temp_6",
"Flag_Temp_7",
"Flag_Temp_8",
"Flag_Temp_9"),
check.names=TRUE)
unlink(infile1)
### Aggregate to daily temps between 2 and 3 meters
cat_sum <- catwalk %>%
select(DateTime, ThermistorTemp_C_2, ThermistorTemp_C_3) %>%
rename(time = DateTime)%>%
mutate(time = as_date(time))%>%
melt(., id = 'time')%>%
mutate(value = as.numeric(value))%>%
group_by(time)%>%
summarise(mean = mean(value),
mean_se = se(value))
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
### CTD data from dam site (See Figure 1 in MS)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# for more information on the CTD data in EDI refer to this link:
# https://portal.edirepository.org/nis/mapbrowse?packageid=edi.200.11
inUrl1 <- "https://pasta.lternet.edu/package/data/eml/edi/200/10/2461524a7da8f1906bfc3806d594f94c"
infile1 <- tempfile()
try(download.file(inUrl1,infile1,method="curl"))
if (is.na(file.size(infile1))) download.file(inUrl1,infile1,method="auto")
ctd <-read.csv(infile1, header=F, skip=1, sep=",",
col.names=c("Reservoir",
"Site",
"Date",
"Depth_m",
"Temp_C",
"DO_mgL",
"DO_pSat",
"Cond_uScm",
"Spec_Cond_uScm",
"Chla_ugL",
"Turb_NTU",
"pH",
"ORP_mV",
"PAR_umolm2s",
"Desc_rate",
"Flag_Temp",
"Flag_DO",
"Flag_Cond",
"Flag_SpecCond",
"Flag_Chla",
"Flag_Turb",
"Flag_pH",
"Flag_ORP",
"Flag_PAR",
"Flag_DescRate"),
check.names=TRUE)
unlink(infile1)
ctd_sum <- ctd %>% select(Reservoir, Date, Site, Depth_m, Temp_C)%>%
filter(Reservoir == "FCR" & Site == 50)%>%
filter(Date < "2018-07-05" & Date >= "2017-05-07")%>%
filter(Depth_m > 2 & Depth_m <= 3)%>%
rename(time = Date)%>%
mutate(time = as_date(time))%>%
select(time, Depth_m, Temp_C)%>%
mutate(temp = as.numeric(Temp_C))%>%
group_by(time)%>%
summarise(mean = mean(temp),
mean_se = se(temp))%>%
filter(time != "2017-05-27" & ### Extract CTD casts from days that FCR with Mixed using an epilimnetic Mixer.
time != "2017-07-07" & ### Refer to Lofton et al., 2018 and Chen et al., 2018 for details on the mixer
time != "2017-07-08" &
time != "2017-07-09" &
time != "2017-07-10" &
time != "2017-07-11" &
time != "2017-07-12" &
time != "2017-10-08")
# make an observed damn site water temp column.
water_temp <- rbind(ctd_sum, cat_sum)%>%
filter(time < "2019-11-08")
# Pull in hobo data
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hobo <- readRDS("./observed/EDI_DATA_HOBO_TEMPS_2017_2019.rds")%>%
dplyr::rename(time = DateTime)%>%
dplyr::mutate(time = as_date(time))%>%
dplyr::group_by(time) %>%
dplyr::summarize(temperature_se = se(Sed_temp, na.rm = T),
temperature = mean(Sed_temp, na.rm = T))%>%
select(time, temperature, temperature_se)%>%
filter(time > "2017-05-06")
# Read in the observed ebullition throughout the entire ebullition sampling period --> 2017 to 2019
# importantly, we are specifically selecting traps JUST FROM Transect 1
# Refer to Mcclure et al., 2020 for more information in regard to the other Transects in the reservoir
ebu <- read_csv("./observed/EDI_DATA_EBU_DIFF_DEPTH_2015_2019.csv") %>%
rename(time = DateTime) %>%
filter(Transect == "T1")%>%
select(time, Site, Ebu_rate)%>%
mutate(ebu_rate = ifelse(is.infinite(Ebu_rate),NA,Ebu_rate))%>%
group_by(time) %>%
summarize(ebu_rate_se = se(ebu_rate, na.rm = T),
ebu_rate = mean(ebu_rate, na.rm = T))%>%
select(time, ebu_rate, ebu_rate_se)
ebu$ebu_rate[is.nan(as.numeric(ebu$ebu_rate))] <- NA
time <- as.data.frame(seq(from = as.Date("2017-05-07"), to = as.Date("2019-11-07"), by = "day"))%>%
rename(time = `seq(from = as.Date(\"2017-05-07\"), to = as.Date(\"2019-11-07\"), by = \"day\")`)
Join all of the data products that will feed into the JAGS model-fitting
# Join with all previous data and make the object that will feed directly into the Jags model
full_ebullition_model <- left_join(time, hobo, by = "time") %>%
left_join(., water_temp, by = "time")%>%
left_join(., ebu, by = "time")%>%
mutate(year = year(time))%>%
filter(year != "2018")
full_ebullition_model_17 <- full_ebullition_model%>%
filter(time >= "2017-05-07") %>%
filter(time <= "2017-10-30") %>%
rename(water_temp_dam = mean) %>%
rename(water_temp_dam_sd = mean_se)
full_ebullition_model_18 <- full_ebullition_model%>%
filter(time >= "2018-05-07") %>%
filter(time <= "2018-10-29") %>%
rename(water_temp_dam = mean) %>%
rename(water_temp_dam_sd = mean_se)
full_ebullition_model_19 <- full_ebullition_model%>%
filter(time >= "2019-05-27") %>%
filter(time <= "2019-11-07") %>%
rename(water_temp_dam = mean) %>%
rename(water_temp_dam_sd = mean_se)
full_ebullition_model_alltrap <- bind_rows(full_ebullition_model_17, full_ebullition_model_18, full_ebullition_model_19)%>%
rename(hobo_temp = temperature, hobo_temp_se = temperature_se)
full_ebullition_model_alltrap$water_temp_dam[is.nan(as.numeric(full_ebullition_model_alltrap$water_temp_dam))] <- NA
full_ebullition_model_alltrap$water_temp_dam_sd[is.nan(as.numeric(full_ebullition_model_alltrap$water_temp_dam_sd))] <- NA
full_ebullition_model_alltrap$ebu_rate[is.nan(as.numeric(full_ebullition_model_alltrap$ebu_rate))] <- NA
full_ebullition_model_alltrap$ebu_rate_se[is.nan(as.numeric(full_ebullition_model_alltrap$ebu_rate_se))] <- NA
full_ebullition_model_alltrap$hobo_temp[is.nan(as.numeric(full_ebullition_model_alltrap$hobo_temp))] <- NA
full_ebullition_model_alltrap$hobo_temp_se[is.nan(as.numeric(full_ebullition_model_alltrap$hobo_temp_se))] <- NA
- When script has finished running click on the 01_model_training.R script.
- You can either source the 02_model_training.R script or run through it incrementally.
# CH4cast
# Ryan McClure
# TRAINING MODELS WITH 2017 DATA
# 2017 MODEL TRAINING ----
#* TEMPERATURE SCALING MODEL ----
model.lm17 = ("lm17.txt")
jagsscript = cat("
model {
mu ~ dnorm(0,1e-6) # intercet
beta ~ dnorm(0,1e-6) # cat temp paramter
sd.pro ~ dunif(0, 1000)
tau.pro <- pow(sd.pro, -2)
for(i in 2:N) {
predX[i] <- mu + C_mean[i]*beta
Y[i] ~ dnorm(predX[i],tau.pro) # model error
}
}
",
file = model.lm17)
#* RUNJAGS FOR 2017 TEMP SCALE ----
full_ebullition_model_alltrap_jags <- full_ebullition_model_alltrap%>%
select(time, ebu_rate, ebu_rate_se, hobo_temp, water_temp_dam)%>%
filter(time <= "2017-10-30")
for (i in colnames(full_ebullition_model_alltrap_jags[,c(3:5)])) {
full_ebullition_model_alltrap_jags[,i] <- imputeTS::na_interpolation(full_ebullition_model_alltrap_jags[,i],option = "linear")
}
full_ebullition_model_alltrap_jags <- na.omit(full_ebullition_model_alltrap_jags)%>%
mutate(ndays = difftime(time,lag(time)))
jags.data.lm = list(Y= full_ebullition_model_alltrap_jags$hobo_temp,
N = nrow(full_ebullition_model_alltrap_jags),
C_mean = full_ebullition_model_alltrap_jags$water_temp_dam)
jags.params.lm.eval = c("sd.pro", "mu", "beta", "tau.pro")
j.lm.model <- jags.model(file = model.lm17,
data = jags.data.lm,
n.chains = 3)
eval_temp <- coda.samples(model = j.lm.model,
variable.names = jags.params.lm.eval,
n.iter = 10000, n.burnin = 1000)
#plot(eval_temp)
print("TEMP SCALE MODEL DIAGNOSTICS")
print(gelman.diag(eval_temp))
#* TEMPERATURE SCALING PARAMETERS ----
temp_out_parms <- eval_temp %>%
spread_draws(sd.pro, mu, beta) %>%
filter(.chain == 1)
print(temp_out_parms)
#* AR EBULLITION PREDICTION MODEL ----
model.ar17 = ("ar17_model.txt")
jagsscript = cat("
model {
#priors===================================================
mu2 ~ dnorm(0,1e-6)
sd.pro ~ dunif(0, 1000)
phi ~ dnorm(0,1e-6)
omega ~ dnorm(0,1e-6)
#Informative priors on initial conditions based on first observation
X[1] ~ dnorm(x_init, tau.obs[1])
#end priors===============================================
for(i in 2:N) {
#process model=============================================
tau.pro[i] <- 1/((sd.pro*ndays[i])*(sd.pro*ndays[i]))
predX[i] <- mu2 + phi*X[i-1] + omega*D[i]
X[i] ~ dnorm(predX[i],tau.pro[i])
#end of process model======================================
#data model================================================
Y[i] ~ dnorm(X[i], tau.obs[i]) # Observation variation
#end of data model=========================================
}
}", file = model.ar17)
#* RUNJAGS FOR 2017 EBULLITION MODEL ----
jags.data.ar = list(x_init = full_ebullition_model_alltrap_jags$ebu_rate[1],
Y = full_ebullition_model_alltrap_jags$ebu_rate,
tau.obs = 1/((full_ebullition_model_alltrap_jags$ebu_rate_se)/sqrt(4)) ^ 2,
N = nrow(full_ebullition_model_alltrap_jags),
D = full_ebullition_model_alltrap_jags$hobo_temp,
ndays = full_ebullition_model_alltrap_jags$ndays)
nchain = 3
chain_seeds <- c(200,800,1400)
init <- list()
for(i in 1:nchain){
init[[i]] <- list(sd.pro = 0.5,
mu2 = -5,
omega = 0.5,
phi = 0.2,
.RNG.name = "base::Wichmann-Hill",
.RNG.seed = chain_seeds[i])
}
j.model <- jags.model(file = model.ar17,
data = jags.data.ar,
inits = init,
n.chains = 3)
eval_ebu <- coda.samples(model = j.model,
variable.names = c("sd.pro", "mu2", "phi", "omega"),
n.iter = 70000, n.burnin = 10000)
#plot(eval_ebu)
print("SS EBULLITION MODEL DIAGNOSTICS")
print(gelman.diag(eval_ebu))
#* EBULLITION MODEL PARAMETERS ----
ebu_out_parms <- eval_ebu %>%
spread_draws(sd.pro, mu2, phi, omega) %>%
filter(.chain == 1)
print(ebu_out_parms)
############### HAS NOT BEEN UPDATED ######################
null = ("null_model.txt")
jagsscript = cat("
model {
#priors===================================================
sd.pro ~ dunif(0, 1000)
#Informative priors on initial conditions based on first observation
X[1] ~ dnorm(x_init, tau.obs[1])
#end priors===============================================
for(i in 2:N) {
#process model=============================================
tau.pro[i] <- 1/((sd.pro*ndays[i])*(sd.pro*ndays[i]))
predX[i] <- X[i-1]
X[i] ~ dnorm(predX[i],tau.pro[i])
#end of process model======================================
#data model================================================
Y[i] ~ dnorm(X[i], tau.obs[i]) # Observation variation
#end of data model=========================================
}
}", file = null)
#* RUNJAGS FOR 2017 EBULLITION MODEL ----
jags.data.null = list(x_init = full_ebullition_model_alltrap_jags$ebu_rate[1],
Y = full_ebullition_model_alltrap_jags$ebu_rate,
tau.obs = 1/((full_ebullition_model_alltrap_jags$ebu_rate_se)/sqrt(4)) ^ 2,
N = nrow(full_ebullition_model_alltrap_jags),
ndays = full_ebullition_model_alltrap_jags$ndays)
nchain = 3
chain_seeds <- c(200,800,1400)
init <- list()
for(i in 1:nchain){
init[[i]] <- list(sd.pro = 0.5,
.RNG.name = "base::Wichmann-Hill",
.RNG.seed = chain_seeds[i])
}
j.model <- jags.model(file = null,
data = jags.data.null,
inits = init,
n.chains = 3)
eval_ebu_null <- coda.samples(model = j.model,
variable.names = c("sd.pro"),
n.iter = 70000, n.burnin = 10000)
plot(eval_ebu_null)
print("SS EBULLITION MODEL DIAGNOSTICS")
print(gelman.diag(eval_ebu_null))
#* PERSISTENCE NULL MODEL ----
#* PERSISTENCE NULL MODEL PARAMETERS ----
null_out_parms <- eval_ebu_null %>%
spread_draws(sd.pro) %>%
filter(.chain == 1)
print(null_out_parms)
- When script has finished running click on the 03_Generate_Forecasts.R script.
- If everything has been properly aligned, you should also be able to click source and the script will run through.
- NOTE --> You might get an error at the end when sourcing. This is OK. It is still developing all the forecasts needed for the output.
- This script includes the bulk for CH4castR. Everything else so far has been just staging to prep for the foreacsts. The coding block is almost the same as the model training script above.
- The forecasts are all saved in the forecast_output folder as .rds files.
- If everything has been properly aligned, you should also be able to click source and the script will run through.
- This will generate plots in the figures folder. This includes Figure 3, 4, 5, and 6 in the MS. Figures 1 and 2 have been manually added to the figures folder for transparency and access.