---
title: "Reach Scale Inputs"
output: html_notebook
---
# Inputs
## Load packages
```{r pkgLoad}
suppressMessages(library(tidyverse))
library(readxl)
library(lubridate)
```
## Read in
```{r}
r0 <- read_csv("./origData/Reach1.csv",
               na = c("", "-9999.0"))
# Pass the data frame explicitly rather than relying on the last value
problems(r0)
```
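As a small illustration of what the `na` argument is doing, the sketch below reads a made-up three-row file and counts the values that come back missing. The temporary file and the column names `Site` and `Amm` are hypothetical stand-ins, not the contents of `Reach1.csv`.

```r
library(readr)

# Hypothetical toy file standing in for Reach1.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("Site,Amm",
             "A,12.5",
             "B,-9999.0",
             "C,"), tmp)

# Treat empty fields and the -9999.0 sentinel as missing
toy <- read_csv(tmp, na = c("", "-9999.0"), show_col_types = FALSE)

# Number of missing values per column
n_missing <- colSums(is.na(toy))
n_missing

# Parsing problems, if any, are attached to the data frame itself
probs <- problems(toy)
nrow(probs)
```

Both the empty field and the sentinel should end up as `NA` in `Amm`, which is what the real read relies on.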
Really only need to parse out the date from the `Date` field.
```{r}
r0
```
```{r}
r0 %>%
  distinct(Comments)
```
Ah, so we have BQL issues flagged in the comments.
```{r}
r0 %>%
  distinct(`...15`)
```
So column 15 is blank?
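One way to answer that directly is to test every column for being entirely missing. This is a minimal sketch on a hypothetical toy data frame, where `...3` mimics the kind of name readr assigns to an unnamed column; the same `vapply()` call could be pointed at `r0`.

```r
# Hypothetical toy data standing in for r0
toy <- data.frame(Biome = c("AUS", "AUS"),
                  Amm = c(1.2, NA),
                  X = c(NA, NA))
names(toy)[3] <- "...3"

# TRUE for each column that holds nothing but missing values
all_na <- vapply(toy, function(x) all(is.na(x)), logical(1))

# Names of the completely blank columns
blank_cols <- names(all_na)[all_na]
blank_cols
```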
```{r}
glimpse(r0)
```
So `Sampling Time (hh:mm)` is already OK as a time variable. Can we parse the date out of `Date`?
How many formats are there?
```{r}
r0 %>%
  distinct(Date) %>%
  print(n = 42)
```
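These all appear to be mdy hms (the format that `mdy()` is applied to further down), but better check the AUS dates, since Australian dates are usually written day first.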
```{r}
r0 %>%
  filter(Biome == "AUS") %>%
  distinct(Date)
```
That looks OK. If we push this file through Excel we get many problems, with multiple date formats ending up in one column. Even fancy stuff [like this](https://stackoverflow.com/questions/13764514/how-to-change-multiple-date-formats-in-same-column) does not get us out of that bother. So maybe the parent database exports its csv with some set of attributes that Excel tries to parse, but does so inconsistently.
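For reference, the mixed-format situation described above can be sketched with `lubridate::parse_date_time()`, which accepts several candidate orders. The strings below are hypothetical: the first two mimic the raw export, and the third mimics an ISO-style value of the sort a spreadsheet round trip might leave behind. This is only an illustration, not what the notebook does with the real data.

```r
library(lubridate)

# Hypothetical date-time strings in two different layouts
x <- c("3/25/2010 12:00:00 AM",
       "11/2/2011 1:30:00 PM",
       "2010-03-25 00:00:00")

# Supply both candidate orders; the result is POSIXct in UTC
parsed <- parse_date_time(x, orders = c("mdy IMS p", "ymd HMS"))

# Keep only the calendar date
parsed_dates <- as_date(parsed)
parsed_dates
```

No parser can resolve a genuinely ambiguous string such as `3/4/2010`, which is valid as both month-first and day-first, so reading the untouched csv remains the safer route.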
## Parsing out sample dates
```{r}
r1 <- r0 %>%
  # filter(Biome == "AUS") %>%
  # distinct(Date) %>%
  # Split "date time AM/PM" into three pieces and keep only the date part
  separate(Date, into = c("dt", "tm", "am"), sep = " ") %>%
  mutate(SampDate = mdy(dt)) %>%
  # Keep the identifying columns and give the measurements short names
  select(Biome:ReachType, SampDate,
         SampNum = `Sample number`,
         SampTime = `Sampling Time (hh:mm)`,
         Amm = `Ammonium (ug N/L)`,
         Cl = `Chloride tracer (mg/L)`,
         Br = `Bromide tracer (mg/L)`,
         Comm = Comments) %>%
  # Grouping variables are categorical
  mutate(Biome = factor(Biome),
         SiteNumber = factor(SiteNumber),
         ReachType = factor(ReachType))
```
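The split-then-parse step can be checked in isolation on a couple of made-up strings. The sketch below assumes values laid out as "month/day/year time AM", which is what the `separate()` call implies; the toy strings are hypothetical, and the final count is a quick way to spot dates that failed to parse.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical strings in the assumed "m/d/Y H:M:S AM" layout
toy <- tibble(Date = c("3/25/2010 12:00:00 AM", "11/2/2011 12:00:00 AM"))

toy_parsed <- toy %>%
  # remove = FALSE keeps the raw string next to the parsed date for checking
  separate(Date, into = c("dt", "tm", "am"), sep = " ", remove = FALSE) %>%
  mutate(SampDate = mdy(dt))

# Number of rows whose date did not parse
n_failed <- sum(is.na(toy_parsed$SampDate))
n_failed
```

Running the same `sum(is.na(...))` check on `r1$SampDate` would show whether any real rows fall outside the month-first format.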
## Data layout
All `SiteType` values are "E".
```{r}
r1 %>% distinct(SiteType)
```
```{r}
xtabs(~ Biome + SiteNumber + ReachType, r1)
```
# Prelim plot
```{r}
ggplot(r1, aes(SampTime, Amm, colour = SiteNumber)) +
  facet_grid(Biome ~ ReachType, scales = "free_y") +
  geom_line()
```