---
title: "folders"
output:
  html_document:
    keep_md: yes
---
# folders

This R package supports the use of standardized folder names in R projects. It
provides functions that help you avoid hardcoded paths and `setwd()` in your R
scripts. Instead, you can use variables like `folders$data` to refer to folder
paths. These paths can be standardized between projects, and the folders can be
created for you under the parent folder of your R project.

Using the defaults, or some other standardized list of folder names, all of
your projects can share the same general folder structure. This can help you
write cleaner, more portable, and more reproducible code.

## What does this package do?

Without this package, you could include some code like this in your scripts:
```{r, eval=FALSE}
# Create some standard folders in my project
library(here)
folders <- list(data = "data", figures = "figures", results = "results")
folders <- lapply(folders, here)
result <- lapply(folders, dir.create, showWarnings = FALSE, recursive = TRUE)
# Save some data to a file in the "data" folder
write.csv(iris, file = here(folders$data, "iris.csv"))
# Cleanup unused folders
result <- lapply(folders, function(x) {
  if (length(dir(x)) == 0) unlink(x, recursive = TRUE)
})
# Confirm that the data file still exists
file.exists(here(folders$data, "iris.csv"))
```
However, the standard folder names are not persisted anywhere. If you needed an
alternate folder path, you would have to modify this script and every other
script that used that path.

With *folders*, you can instead specify a project-wide configuration file that
stores these standard folder names. Any customization is made in that file, and
the other scripts in your project can use it with no changes needed to them.
Collaborators can have their own custom folder paths without modifying the
shared scripts or affecting other collaborators. Each script also needs less
code to use these standard or customized folder paths, because the *folders*
package takes care of the details.

## Default Folders

The package defaults provide "code", "conf", "data", "doc", "figures", and
"results" folders. You can specify alternatives in a YAML configuration file,
which this package will read and use instead. See "Configuration file" below
for more details.

Note that there is a "code" folder. If your scripts are in the "code" folder,
they will still be able to find the other folders, because the *here* package
builds paths from the project root rather than from the current working
directory.

## RStudio Projects

This package is intended to be used with RStudio Projects.

One benefit of RStudio Projects is that, once you open a project in RStudio,
you will be placed in the parent folder of your project (aka "project root").
All of your work in the project will be relative to that location, especially
if your project only uses files and subfolders within that parent folder. This
is the most portable way to work. Further, if you are working with a git
repository, you will most likely want to clone that repository into an RStudio
Project.

## Other Supported Environments

This package will also work outside of RStudio Projects. For example, if you
are working in a folder tracked by git, then the top level of the git
repository will be identified as the "project root" folder. This behavior is
determined by the *here* package.

If you are not working in an RStudio Project, a folder tracked by a version
control system (git or Subversion), or an R package development folder, then
the current working directory at the time the *here* package was loaded will
be treated as the "project root" folder.

Alternatively, you can force a folder to be the "project root" by placing a
`.here` file in it, which you can create with the `here::set_here()` function.
See the *here* package documentation for more information. However, if your
goal is to write more reproducible code and follow best practices, you should
really ask yourself why
you are not using RStudio Projects or version control.
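
To make the root-finding rules above more concrete, here is a rough base-R
sketch of the idea: start in a folder, then check it and each parent in turn
for a marker such as an RStudio `.Rproj` file, a `.here` file, a `DESCRIPTION`
file, or a `.git`/`.svn` folder. The helper `find_root_sketch()` is
hypothetical and simplified. It is not how *here* (or *rprojroot*, which *here*
builds on) is implemented, and the real packages apply extra checks, such as
requiring that a `DESCRIPTION` file actually declares a package.

```{r, eval=FALSE}
# Hypothetical, simplified project-root search (not the here/rprojroot code).
# Starting at `start`, check each folder and then its parent for a marker:
# a `.here`, `DESCRIPTION`, or `*.Rproj` file, or a `.git`/`.svn` folder.
# If no marker is found, fall back to `start` itself.
find_root_sketch <- function(start = getwd()) {
  dir <- normalizePath(start, mustWork = TRUE)
  markers <- c(".here", "DESCRIPTION", ".git", ".svn")
  repeat {
    has_marker <- any(file.exists(file.path(dir, markers))) ||
      length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (has_marker) return(normalizePath(dir))
    parent <- dirname(dir)
    # dirname() of a filesystem root is the root itself, so stop there
    if (identical(parent, dir)) return(normalizePath(start))
    dir <- parent
  }
}

# Example: a throwaway project with an .Rproj file and a "code" subfolder
proj <- file.path(tempdir(), "demo_project")
dir.create(file.path(proj, "code"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(proj, "demo_project.Rproj")))
find_root_sketch(file.path(proj, "code"))  # the "demo_project" folder
```

Started from the "code" folder, the search stops at the project folder, which
is why scripts kept in "code" can still locate "data", "results", and the other
standard folders.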

## Installation

You can install the stable version from *CRAN* with:
```{r, eval=FALSE}
install.packages("folders")
```
You can install the development version from *GitHub* with:
```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("deohs/folders")
```
Or, if you prefer using *pacman*:
```{r, eval=FALSE, message=FALSE}
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_install_gh("deohs/folders")
```

## Dependencies

When you install this package, the following dependencies should be installed
for you: [config](https://CRAN.R-project.org/package=config),
[here](https://CRAN.R-project.org/package=here), and
[yaml](https://CRAN.R-project.org/package=yaml).

You will need to load the *here* package in your scripts to make the most use
of the *folders* package, as seen in the Basic Usage examples below.
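
If you want to confirm that these dependencies are available before running
your scripts, a small base-R check (not part of the *folders* package) could
look like this:

```{r, eval=FALSE}
# Check whether each dependency can be loaded; returns a named logical vector
deps <- c("config", "here", "yaml")
deps_ok <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
deps_ok
# Any package reported as FALSE could then be installed, e.g.:
# install.packages(deps[!deps_ok])
```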

## Basic Usage

The following code chunk can be used at the beginning of your scripts to make
use of standardized folders in your projects.
```{r}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)
# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)
```
Then, later in your scripts, you can refer to folders like this:
```{r}
dir.exists(here(folders$data))
```
Or you can add to the standard folder paths like this:
```{r}
file_path <- here(folders$data, "data.csv")
```
## Basic Usage Scenario
Here is an example of a script which will initialize the folders and then write
a data file to the `folders$data` folder. Aside from setting the path to the
configuration file, there are no hardcoded paths and there is no `setwd()`.
```{r, message=FALSE}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)
# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)
# Check to see that the data folder has been created
dir.exists(here(folders$data))
# Create a dataset to use for writing a CSV file to the data folder
df <- data.frame(x = letters[1:3], y = 1:3)
# Confirm that the CSV file does not yet exist
file_path <- here(folders$data, "data.csv")
file.exists(file_path)
# Write the CSV file
write.csv(df, file_path, row.names = FALSE)
# Verify that the file was written
file.exists(file_path)
# Cleanup unused (empty) folders (Optional, as you may prefer to keep them)
result <- cleanup_folders(folders)
# Verify that the data folder and CSV file still exist after cleanup
file.exists(file_path)
# Verify that the configuration file still exists after cleanup
file.exists(conf_file)
```
## Working with subfolders
You can refer to subfolders relative to the paths in your `folders` list using
`here()`. For example, if you had a folder called "raw" under your data folder,
just refer to that folder with `here(folders$data, "raw")`:
```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
raw_file <- here(folders$data, "raw", "file.csv")
```
If you want to create a subfolder hierarchy under all of your main folders,
you can use `lapply()` or `purrr::map()` to create that hierarchy. For example,
we can create a "phase" folder under each folder in `folders` and then a "01"
folder under each "phase" folder:
```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- lapply(get_folders(conf_file), here, "phase", "01")
res <- create_folders(folders)
```
You can place that code near the top of each of your scripts, adjusting the
phase to match the one the script is used for. You can then use `folders$data`
to refer to a path like `data/phase/01` within the parent folder. This way,
your scripts can always refer to the appropriate data, results, etc., folder
for that project phase using the same variables, e.g., `folders$data`,
`folders$results`, etc.
```{r, eval=FALSE}
df <- read.csv(here(folders$data, "data.csv"))
```
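
If you switch phases often, the `lapply()` pattern above could be wrapped in a
small function. The helper `phase_folders()` below is hypothetical (it is not
part of *folders*) and uses only base R: it appends `phase/<phase>` to each
folder path with `file.path()` and creates any missing folders under a given
root.

```{r, eval=FALSE}
# Hypothetical helper (not part of the folders package): append a
# "phase/<phase>" subfolder to every standard folder under `root` and,
# optionally, create any of those folders that are missing.
phase_folders <- function(folders, phase, root = getwd(), create = TRUE) {
  paths <- lapply(folders, function(f) file.path(root, f, "phase", phase))
  if (create) {
    lapply(paths, dir.create, showWarnings = FALSE, recursive = TRUE)
  }
  paths
}

# Example in a temporary folder; in a project you might use root = here::here()
demo_root <- file.path(tempdir(), "phase_demo")
phase_01 <- phase_folders(list(data = "data", results = "results"), "01",
                          root = demo_root)
phase_01$data  # a path ending in "data/phase/01"
```

Because the list names are preserved, scripts can keep using `$data`,
`$results`, and so on, whichever phase they were built for.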
If you want to remove empty folders recursively, you can include this code at
the end of your script:
```{r, eval=FALSE}
# Cleanup empty folders recursively
dir_lst <- sort(list.dirs(unlist(lapply(folders, here))), decreasing = TRUE)
result <- sapply(dir_lst, cleanup_folders, conf_file = conf_file)
```
Or, if you are using the latest development version of the package from GitHub,
then it's as simple as:
```{r, eval=FALSE}
# Cleanup empty folders recursively (using development version of package)
result <- cleanup_folders(folders, recursive = TRUE)
```
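
For reference, recursive cleanup can be approximated in base R: list every
folder under the standard folders, visit the deepest folders first, and delete
each one that is empty, so a parent left empty by removing its children is
deleted as well. The helper `remove_empty_dirs()` is hypothetical and is not
the *folders* implementation; because it deletes folders, try it on a scratch
copy first.

```{r, eval=FALSE}
# Hypothetical helper (not part of the folders package): delete empty
# folders under `paths`, deepest first, and return the removed paths.
remove_empty_dirs <- function(paths) {
  dirs <- unique(list.dirs(paths, full.names = TRUE, recursive = TRUE))
  # A child path is always longer than its parent, so longest-first
  # ordering visits children before their parents
  dirs <- dirs[order(nchar(dirs), decreasing = TRUE)]
  removed <- character(0)
  for (d in dirs) {
    is_empty <- length(list.files(d, all.files = TRUE, no.. = TRUE)) == 0
    if (dir.exists(d) && is_empty) {
      # unlink() only deletes directories when recursive = TRUE
      unlink(d, recursive = TRUE)
      removed <- c(removed, d)
    }
  }
  removed
}

# Example in a temporary folder: "data/raw" is empty, "results" has a file
demo <- file.path(tempdir(), "cleanup_demo")
dir.create(file.path(demo, "data", "raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "results"), recursive = TRUE, showWarnings = FALSE)
writeLines("x", file.path(demo, "results", "out.txt"))
removed <- remove_empty_dirs(file.path(demo, c("data", "results")))
dir.exists(file.path(demo, "data"))     # FALSE: "raw", then "data", removed
dir.exists(file.path(demo, "results"))  # TRUE: still contains out.txt
```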
## Configuration file
If the configuration file is not already present, `get_folders()` will write it
as a YAML file with the path and filename that you provide. Usually this file
would be named something like `folders.yml`, as in the examples above, and you
will usually want to store it in either the parent folder or the "conf"
subfolder of your R project. On subsequent calls to `get_folders()`, this file
will be read with `config::get()`. This behavior can be modified with the
function's arguments.
The default configuration file looks like:
```
default:
  code: code
  conf: conf
  data: data
  doc: doc
  figures: figures
  results: results
```
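
The write-if-missing step described above can be sketched in base R by writing
that default YAML text only when the file does not exist yet. The helper
`write_default_conf()` is hypothetical and for illustration only; the package's
own implementation may differ, and reading the file back is left to
`config::get()` or `yaml::read_yaml()`.

```{r, eval=FALSE}
# Hypothetical helper (not part of the folders package): write the default
# folder configuration as YAML, but only if `conf_file` does not exist yet.
write_default_conf <- function(conf_file) {
  if (!file.exists(conf_file)) {
    dir.create(dirname(conf_file), recursive = TRUE, showWarnings = FALSE)
    folder_names <- c("code", "conf", "data", "doc", "figures", "results")
    # Each "name: path" entry is indented under the "default" configuration
    yaml_lines <- c("default:", sprintf("  %s: %s", folder_names, folder_names))
    writeLines(yaml_lines, conf_file)
  }
  invisible(conf_file)
}

# Example using a temporary folder instead of your project's "conf" folder
demo_conf <- file.path(tempdir(), "conf", "folders.yml")
write_default_conf(demo_conf)
cat(readLines(demo_conf), sep = "\n")
```

Because an existing file is never overwritten, any edits you make to it are
kept on later runs.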
Once this file has been created, you can edit it to modify the default
folder paths. However, we advise you to stick to the defaults to maintain
maximum consistency between your projects. If you do wish to edit it, you can
do so with any text editor or with R as shown below.
```{r, eval=FALSE}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, yaml, folders)
# Get the list of standard folders, creating the configuration file if missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
# Replace a default with a custom folder path
folders$data <- "data_folder"
# Edit the default configuration file to save the modification
write_yaml(list(default = folders), file = conf_file)
```
### Handling platform-dependent paths
You can add other configuration names besides "default". For example, if you
had a path that would be operating system dependent, you could edit your
configuration file like this (abbreviated here to show only the data path):
```
default:
  data: data
Windows:
  data: //server/path/to/data
Linux:
  data: /path/to/data
Darwin:
  data: /Volumes/path/to/data
```
And then you can read in the appropriate paths for the system you are using:
```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file, conf_name = Sys.info()[['sysname']])
data_folder <- folders$data
```
... so that your script can still be platform independent even though the data
path is not. If your specified `conf_name` does not exist in the configuration
file, then the defaults are used instead.
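
This fallback comes from the *config* package, where a named configuration
inherits any values it does not override from `default`. A rough base-R
analogue, using `modifyList()` on plain lists rather than a YAML file, is shown
below; the `conf` list and the `select_conf()` helper are hypothetical
stand-ins, not part of *folders* or *config*.

```{r, eval=FALSE}
# Hypothetical stand-in for a parsed configuration file
conf <- list(
  default = list(code = "code", data = "data", results = "results"),
  Windows = list(data = "//server/path/to/data"),
  Linux   = list(data = "/path/to/data"),
  Darwin  = list(data = "/Volumes/path/to/data")
)

# Start from "default" and overlay the named configuration, if it exists;
# an unknown name leaves the defaults unchanged
select_conf <- function(conf, conf_name = "default") {
  overrides <- conf[[conf_name]]
  if (is.null(overrides)) overrides <- list()
  modifyList(conf$default, overrides)
}

sys_folders <- select_conf(conf, Sys.info()[["sysname"]])
sys_folders$data
```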
```{r cleanup, include=FALSE}
data_file <- here(folders$data, "data.csv")
if (file.exists(data_file)) unlink(data_file, recursive = TRUE)
result <- cleanup_folders(folders, conf_file, keep_conf = FALSE)
```
```{r build_notes, include=FALSE, eval=FALSE}
# 1. After rendering this Rmd to HTML, remove the html file. Only the md output
# from this is needed.
# 2. Make sure you have "R_BUILD_TAR=tar" in your ~/.Renviron.
# 3. Make sure you have "tidy", "pandoc", and a LaTeX environment installed.
# 4. In Terminal, go to the parent folder of "folders" and run these commands:
# R CMD build folders
# R CMD check --as-cran folders_X.Y.Z.tar.gz
# ... where X.Y.Z is the version number in DESCRIPTION.
```