This repository has been archived by the owner on Oct 6, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathr.html
1073 lines (947 loc) · 42.5 KB
/
r.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<title>R</title>
<script src="site_libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="site_libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<style>h1 {font-size: 34px;}
h1.title {font-size: 38px;}
h2 {font-size: 30px;}
h3 {font-size: 24px;}
h4 {font-size: 18px;}
h5 {font-size: 16px;}
h6 {font-size: 12px;}
code {color: inherit; background-color: rgba(0, 0, 0, 0.04);}
pre:not([class]) { background-color: white }</style>
<script src="site_libs/jqueryui-1.11.4/jquery-ui.min.js"></script>
<link href="site_libs/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" />
<script src="site_libs/tocify-1.9.1/jquery.tocify.js"></script>
<script src="site_libs/navigation-1.1/tabsets.js"></script>
<link href="site_libs/highlightjs-9.12.0/default.css" rel="stylesheet" />
<script src="site_libs/highlightjs-9.12.0/highlight.js"></script>
<link href="site_libs/font-awesome-5.1.0/css/all.css" rel="stylesheet" />
<link href="site_libs/font-awesome-5.1.0/css/v4-shims.css" rel="stylesheet" />
<link rel="shortcut icon" href="images/duck_guidance.png"/>
<script>
// Define dataLayer and the gtag function.
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
// Default ad_storage to 'denied' as a placeholder
// Determine actual values based on your own requirements
gtag('consent', 'default', {
'ad_storage': 'denied',
'analytics_storage': 'granted'
});
</script>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-KBJCBD114T"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
/*
The tracking number below MUST be replaced with a unique number, contact the statistics development team to set this up.
*/
gtag('config', 'G-KBJCBD114T');
</script>
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>
<style type="text/css">code{white-space: pre;}</style>
<script type="text/javascript">
if (window.hljs) {
hljs.configure({languages: []});
hljs.initHighlightingOnLoad();
if (document.readyState && document.readyState === "complete") {
window.setTimeout(function() { hljs.initHighlighting(); }, 0);
}
}
</script>
<link rel="stylesheet" href="images/acalat_theme.css" type="text/css" />
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
img {
max-width:100%;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
details > summary > p:only-child {
display: inline;
}
pre code {
padding: 0;
}
</style>
<style type="text/css">
.dropdown-submenu {
position: relative;
}
.dropdown-submenu>.dropdown-menu {
top: 0;
left: 100%;
margin-top: -6px;
margin-left: -1px;
border-radius: 0 6px 6px 6px;
}
.dropdown-submenu:hover>.dropdown-menu {
display: block;
}
.dropdown-submenu>a:after {
display: block;
content: " ";
float: right;
width: 0;
height: 0;
border-color: transparent;
border-style: solid;
border-width: 5px 0 5px 5px;
border-left-color: #cccccc;
margin-top: 5px;
margin-right: -10px;
}
.dropdown-submenu:hover>a:after {
border-left-color: #adb5bd;
}
.dropdown-submenu.pull-left {
float: none;
}
.dropdown-submenu.pull-left>.dropdown-menu {
left: -100%;
margin-left: 10px;
border-radius: 6px 0 6px 6px;
}
</style>
<script type="text/javascript">
// manage active state of menu based on current page
$(document).ready(function () {
// active menu anchor
href = window.location.pathname
href = href.substr(href.lastIndexOf('/') + 1)
if (href === "")
href = "index.html";
var menuAnchor = $('a[href="' + href + '"]');
// mark the anchor link active (and if it's in a dropdown, also mark that active)
var dropdown = menuAnchor.closest('li.dropdown');
if (window.bootstrap) { // Bootstrap 4+
menuAnchor.addClass('active');
dropdown.find('> .dropdown-toggle').addClass('active');
} else { // Bootstrap 3
menuAnchor.parent().addClass('active');
dropdown.addClass('active');
}
// Navbar adjustments
var navHeight = $(".navbar").first().height() + 15;
var style = document.createElement('style');
var pt = "padding-top: " + navHeight + "px; ";
var mt = "margin-top: -" + navHeight + "px; ";
var css = "";
// offset scroll position for anchor links (for fixed navbar)
for (var i = 1; i <= 6; i++) {
css += ".section h" + i + "{ " + pt + mt + "}\n";
}
style.innerHTML = "body {" + pt + "padding-bottom: 40px; }\n" + css;
document.head.appendChild(style);
});
</script>
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "";
border: none;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
background-color: transparent;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<!-- code folding -->
<style type="text/css">
#TOC {
margin: 25px 0px 20px 0px;
}
@media (max-width: 768px) {
#TOC {
position: relative;
width: 100%;
}
}
@media print {
.toc-content {
/* see https://github.com/w3c/csswg-drafts/issues/4434 */
float: right;
}
}
.toc-content {
padding-left: 30px;
padding-right: 40px;
}
div.main-container {
max-width: 1200px;
}
div.tocify {
width: 20%;
max-width: 260px;
max-height: 85%;
}
@media (min-width: 768px) and (max-width: 991px) {
div.tocify {
width: 25%;
}
}
@media (max-width: 767px) {
div.tocify {
width: 100%;
max-width: none;
}
}
.tocify ul, .tocify li {
line-height: 20px;
}
.tocify-subheader .tocify-item {
font-size: 0.90em;
}
.tocify .list-group-item {
border-radius: 0px;
}
</style>
</head>
<body>
<div class="container-fluid main-container">
<!-- setup 3col/9col grid for toc_float and main content -->
<div class="row">
<div class="col-xs-12 col-sm-4 col-md-3">
<div id="TOC" class="tocify">
</div>
</div>
<div class="toc-content col-xs-12 col-sm-8 col-md-9">
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-bs-toggle="collapse" data-target="#navbar" data-bs-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">Statistics Production Guidance</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
Learning and development
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="l+d.html">General resources</a>
</li>
<li>
<a href="sql.html">SQL</a>
</li>
<li>
<a href="r.html">R</a>
</li>
<li>
<a href="git.html">Git</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
Creating statistics
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="rap.html">Processes and RAP</a>
</li>
<li>
<a href="rap_managers.html">RAP for managers</a>
</li>
<li>
<a href="ud.html">Open data standards</a>
</li>
<li>
<a href="cd.html">Writing and visualising</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
Publishing statistics
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="pub.html">How to publish</a>
</li>
<li>
<a href="ees.html">Using EES</a>
</li>
<li>
<a href="examples.html">EES examples</a>
</li>
<li>
<a href="dashboards.html">Dashboards</a>
</li>
<li>
<a href="embedded-charts.html">Embedded R-Shiny charts</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
Understanding users
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="user_eng.html">User engagement</a>
</li>
<li>
<a href="user_an.html">EES analytics</a>
</li>
</ul>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a href="mailto:[email protected]">
<span class="fa fa-envelope"></span>
</a>
</li>
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
<div id="header">
<h1 class="title toc-ignore">R</h1>
</div>
<p class="text-muted">
Guidance and tips for using the programming language R
</p>
<hr />
<div id="what-is-it" class="section level2">
<h2>What is it</h2>
<hr />
<p>R is an open-source programming language specifically aimed at statisticians and data analysts.</p>
<hr />
</div>
<div id="what-is-it-for" class="section level2">
<h2>What is it for</h2>
<hr />
<p>R can be used for almost anything you can think of, notably data analysis, data visualisation, and creating reports and dashboards. It can also be used to extract data from SQL databases and run SQL queries.</p>
<hr />
</div>
<div id="how-to-get-it" class="section level2">
<h2>How to get it</h2>
<hr />
<p>Download R (language) and RStudio (IDE) from the DfE software center. We also recommend that you download RTools (a helpful R extension) at the same time.</p>
<p>There are usually a couple of different versions available for software on the software center, we’d recommend you always go for the latest (newest) version possible.</p>
<!-- gif getting this from the software center -->
<hr />
</div>
<div id="best-places-to-start" class="section level2">
<h2>Best places to start</h2>
<hr />
<ul>
<li><p>The DfE Analytics Academy host an <a href="https://trello.com/invite/b/QdDx3VmA/96f273b3438db2bb8ee5feae3943c3d4/analytics-academy-an-r-training-course" target="_blank" rel="noopener noreferrer">online R training course</a>. This is a great resource full of reproducible examples using DfE data. The course takes you through initially getting R downloaded, all the way through to developing apps in RShiny.</p></li>
<li><p>There is also the <a href="https://dfe-analytical-services.github.io/r-training-course/" target="_blank" rel="noopener noreferrer">DfE R training guide</a>, which is a great starting point and reference to guide you through how to get started using R and RStudio.</p></li>
<li><p>As an alternative, with a number of options for beginners to R, <a href="https://education.rstudio.com/learn/beginner/" target="_blank" rel="noopener noreferrer">RStudio Education</a> provide a variety of materials to suit different learning styles.</p></li>
</ul>
<hr />
</div>
<div id="best-practice" class="section level2">
<h2>Best practice</h2>
<hr />
<p>Tips for reaching best practice in R can be found on our <a href="rap.html#Clean_final_code">RAP page</a>, with guidance on meeting best practice in RAP for clean final code. This makes it easier to read and pick up if another person is running your code.</p>
<hr />
</div>
<div id="how-to-work-with-r" class="section level2">
<h2>How to work with R</h2>
<hr />
<div id="r-projects" class="section level3">
<h3>R Projects</h3>
<hr />
<p>Whenever you are using R, you should work in an RProject. This just makes sure you are set up in the correct working directory, so your code is pointing at the right folders and files.</p>
<p><a href="https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects" target="_blank" rel="noopener noreferrer">This guide for using projects in R</a> is a really useful article to help you set up a project.</p>
<p>You can check which project you are working in by looking in the top right hand corner of RStudio:</p>
<p><img src="images/Rproj.png" /><!-- --></p>
<hr />
</div>
<div id="outlines" class="section level3">
<h3>Outlines</h3>
<hr />
<p>In RStudio you can greatly increase the navigability of your code by taking advantage of outlines. <a href="https://support.rstudio.com/hc/en-us/articles/200484568-Code-Folding-and-Sections" target="_blank" rel="noopener noreferrer">More information on folding and navigating outlines in RStudio</a> can be found online, though when using rmarkdown reports, remember to use names first, such as <code>## Rows that aren't matching: r nrow(joined %>% filter(matching == FALSE))</code>, rather than having the R code first, so that they are easy to discern in the outline.</p>
<hr />
</div>
<div id="renv" class="section level3">
<h3>renv</h3>
<hr />
<p>You should use the <a href="https://rstudio.github.io/renv/articles/renv.html" target="_blank" rel="noopener noreferrer">renv package</a> for package and version control in R.</p>
<p>Packages and versions of R regularly update. Over time, this can cause code to break - e.g. if different dependencies are required for later versions of packages to work. Using renv creates a “snapshot” of your code and packages at the time you created it, which anyone can then recreate when they come to use your code.</p>
<p>This is really important for reproducibility, and will help you meet elements of great practice with <a href="rap.html#recyclable-code-for-future-use">recyclable code for future use</a>.</p>
<hr />
<div id="renvrestore" class="section level4">
<h4>renv::restore()</h4>
<hr />
<p>Sometimes renv::restore() can fail, and when in specific renv-controlled projects install.packages() will fail saying that packages aren’t available even when they clearly are. There are a couple of workarounds we have found that get around this failure.</p>
<ol style="list-style-type: decimal">
<li>Configuring the proxy settings by running the below in R - this also helps if you are getting timeout issues when trying to webscrape with R:</li>
</ol>
<pre><code>Sys.setenv(no_proxy="*")
</code></pre>
<ol start="2" style="list-style-type: decimal">
<li>Specifying the renv library as the install location. It’s a bit of a fudge, though these lines are helpful to get the packages from the renv lockfile installed and you running the project when needed:</li>
</ol>
<pre><code>myPath <- .libPaths()[1]
forceInstall <- function(pkg, path) {
missing <- suppressWarnings(eval(parse(text= paste0("!require(",pkg,")"))))
if(missing == FALSE){
message(pkg, " is already installed.")
} else{
install.packages(pkg, lib = path)
}
}
forceInstall("jsonlite", myPath)
renvPackages <- names(jsonlite::fromJSON("renv.lock", flatten = TRUE)$Packages)
invisible(lapply(renvPackages, forceInstall, path = myPath))</code></pre>
<p>More manual equivalent to use for specific packages:</p>
<pre><code>.libPaths() # note down output 1, and reuse in the lib argument of install.packages() as below
install.packages("rmarkdown", lib = "C:/Users/swong/OneDrive - Department for Education/Documents/stats-production-guidance/renv/library/R-4.0/x86_64-w64-mingw32")
</code></pre>
<hr />
</div>
<div id="updating-packages-in-renv" class="section level4">
<h4>Updating packages in renv</h4>
<hr />
<p>To update a single package run:</p>
<p><code>renv::update("dplyr")</code></p>
<p>To update all packages run:</p>
<p><code>renv::update()</code></p>
<hr />
</div>
<div id="installing-old-package-versions-in-renv" class="section level4">
<h4>Installing old package versions in renv</h4>
<hr />
<p>This is surprisingly neat to do. Let’s say you wanted to roll back to version 1.0.2 of dplyr, you would run the following:</p>
<p><code>renv::install("[email protected]")</code></p>
<hr />
</div>
</div>
</div>
<div id="quick-reference-lookup" class="section level2">
<h2>Quick reference lookup</h2>
<hr />
<ul>
<li>If you want a useful guide for R syntax or functions, then look no further than the <a href="https://rstudio.com/resources/cheatsheets/" target="_blank" rel="noopener noreferrer">R cheat sheets</a>, these can be an invaluable point of reference. Below we’ve included a few particularly relevant ones:
<ul>
<li>Introduction to the <a href="https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf" target="_blank" rel="noopener noreferrer">RStudio environment</a></li>
<li><a href="http://github.com/rstudio/cheatsheets/raw/master/base-r.pdf" target="_blank" rel="noopener noreferrer">Base R</a></li>
<li><a href="https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf" target="_blank" rel="noopener noreferrer">Importing data into R</a></li>
<li><a href="https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf" target="_blank" rel="noopener noreferrer">dplyr</a> for data manipulation</li>
<li><a href="https://github.com/rstudio/cheatsheets/raw/master/strings.pdf" target="_blank" rel="noopener noreferrer">stringr</a> for string manipulation</li>
<li><a href="https://github.com/rstudio/cheatsheets/raw/master/regex.pdf" target="_blank" rel="noopener noreferrer">Regex</a></li>
<li><a href="https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf" target="_blank" rel="noopener noreferrer">RMarkdown</a></li>
<li><a href="https://github.com/rstudio/cheatsheets/raw/master/shiny.pdf" target="_blank" rel="noopener noreferrer">RShiny</a></li>
<li><a href="https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf" target="_blank" rel="noopener noreferrer">ggplot2</a> for data visualisations</li>
<li><a href="https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf" target="_blank" rel="noopener noreferrer">purrr</a> for applying functions</li>
</ul></li>
</ul>
<hr />
</div>
<div id="other-resources" class="section level2">
<h2>Other resources</h2>
<hr />
<ul>
<li><p>Here is another free introduction to R course by <a href="https://www.quantargo.com/courses/course-r-introduction/" target="_blank" rel="noopener noreferrer">Quantargo</a>.</p></li>
<li><p><a href="https://bookdown.org/yihui/rmarkdown/" target="_blank" rel="noopener noreferrer">R Markdown: The Definitive Guide</a>, hopefully this one should be relatively self-explanatory!</p></li>
<li><p><a href="https://datascienceineducation.com/" target="_blank" rel="noopener noreferrer">Data science in education</a> provides a heavily detailed guide for beginners in R learning to process data, with some well written out sections that may be of interest.</p></li>
<li><p>Handy guide to <a href="https://support.rstudio.com/hc/en-us/articles/200484568-Code-Folding-and-Sections" target="_blank" rel="noopener noreferrer">collapsing and sectioning R code</a> for easy navigation in RStudio.</p></li>
<li><p>Here are <a href="https://towardsdatascience.com/five-tidyverse-tricks-you-may-not-know-about-c5026d5a19da" target="_blank" rel="noopener noreferrer">5 handy tidyverse functions</a> that you should know if you’re using R to process data. Number two is especially useful for those processing wide data into a tidy format!</p></li>
<li><p>MoJ have produced <a href="https://github.com/moj-analytical-services/writing_functions_in_r" target="_blank" rel="noopener noreferrer">guidance on writing functions in R</a></p></li>
<li><p>If you’re wondering how best to make the jump to R from Excel and SQL, take a look at this <a href="https://educationgovuk.sharepoint.com/sites/sarpi/g/WorkplaceDocuments/Forms/AllItems.aspx?FolderCTID=0x012000C61C1076C17C5547A6D6D8C2A27B5D97&View=%7B2B35083D%2D7626%2D48E2%2D9615%2D451544742692%7D&id=%2Fsites%2Fsarpi%2Fg%2FWorkplaceDocuments%2FInducation%20learning%20and%20career%20development%2FCoffee%20and%20Coding%2F181121%5FDavd%5FExceltoR%2FSQL%5FEXCEL%5Fto%5FR%5FHow%5Fto%5FMake%5Fthe%5FJump%2Ehtml&parent=%2Fsites%2Fsarpi%2Fg%2FWorkplaceDocuments%2FInducation%20learning%20and%20career%20development%2FCoffee%20and%20Coding%2F181121%5FDavd%5FExceltoR" target="_blank" rel="noopener noreferrer">coffee and coding presention</a> by David Sands.</p></li>
<li><p>Malcolm Barrett has done some slides on <a href="https://malco.io/slides/hs_dplyr/#1" target="_blank" rel="noopener noreferrer">dplyr</a>, <a href="https://malco.io/slides/hs_ggplot2/#1" target="_blank" rel="noopener noreferrer">ggplot2</a>, and using <a href="https://lar-purrr.netlify.app/#1" target="_blank" rel="noopener noreferrer">purrr</a> which may be useful if you’re looking at learning more about any of those packages.</p></li>
<li><p>Also check out the <a href="https://garthtarr.github.io/meatR/janitor.html" target="_blank" rel="noopener noreferrer">janitor</a> package, it has some particularly powerful functions that are worth a look for tidying and QA’ing data.</p></li>
</ul>
<hr />
</div>
<div id="excel-functions-in-r" class="section level2">
<h2>Excel functions in R</h2>
<hr />
<p>R can do everything you do in excel, but takes out the human error. The reference table below shows how you would carry out popular Excel commands in R.</p>
<p>R comes in with a built-in dataset called “iris”. We’ll use this for all examples so you can recreate them in your local area.</p>
<p><strong>REMEMBER:</strong> R is case sensitive, so all references to column names/entries need to be as-is in the dataset you are looking at. <a href="https://www.rdocumentation.org/packages/janitor/versions/1.2.0/topics/clean_names">Functions exist</a> that can translate all your columns to lower or snake case for ease!</p>
<table>
<colgroup>
<col width="18%" />
<col width="63%" />
<col width="18%" />
</colgroup>
<thead>
<tr class="header">
<th><strong>Common Excel Task</strong></th>
<th><strong>Example with iris </strong></th>
<th><strong>How to do in R with dplyr</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Select specific columns</strong></td>
<td>Select only species and petal length</td>
<td><code>iris %>% select(Species, Petal.Length)</code></td>
</tr>
<tr class="even">
<td><strong>List unique entries in field (column)</strong></td>
<td>Find the unique entries for the “Species” column in iris</td>
<td><code>iris %>% select(Species) %>% distinct()</code></td>
</tr>
<tr class="odd">
<td><strong>Filter/select based on criteria</strong></td>
<td>Filter for sepal length >4 and sepal width <2.5, but NOT “versicolor” species</td>
<td><code>iris %>% filter(Sepal.Length > 4 &</code> <br /> <code>Sepal.Width <2.5 & Species != "versicolor")</code></td>
</tr>
<tr class="even">
<td><strong>Filter for multiple criteria in same column</strong></td>
<td>Filter for all “setosa” and “versicolor” species</td>
<td><code>iris %>% filter(Species %in% c("setosa", "versicolor")</code></td>
</tr>
<tr class="odd">
<td><strong>If else with OR</strong></td>
<td>Create new column called “size_group” based on length or width of petal</td>
<td><code>iris %>% mutate(size_group =</code><br /><code>if_else( Petal.Length > 4 | Petal.Width >1.5, "Large", "Small"))</code></td>
</tr>
<tr class="even">
<td><strong>Multiple if else</strong></td>
<td>Create new column called “flower_price” based on species and petal length</td>
<td><code>iris %>% mutate(flower_price = case_when(</code> <br /> <code>Species == "setosa" & Petal.Length > 1.5 ~"top band",</code><br /><code>Species == "versicolor" & Petal.Length < 4 ~"low_band",</code><br /> <code>TRUE ~ "mid_band"))</code></td>
</tr>
<tr class="odd">
<td><strong>COUNTIF</strong></td>
<td>Count the number of species if they have a petal length >1.5</td>
<td><code>iris %>% filter(Petal.Length > 1.5 ) %>%</code><br /><code>group_by(Species) %>% count()</code></td>
</tr>
<tr class="even">
<td><strong>SUMIF</strong></td>
<td>Sum petal width of species if sepal width <3</td>
<td><code>iris %>% filter(Sepal.Width <3) %>%</code><br /><code>group_by(Species) %>%</code><br /><code>summarise(Petal.Width = sum(Petal.Width))</code></td>
</tr>
<tr class="odd">
<td><strong>VLOOKUP</strong></td>
<td>Lookup to a table called “lookup”</td>
<td><code>iris %>% left_join(lookup, by.x="Species", by.y ="plant_species")</code></td>
</tr>
<tr class="even">
<td><strong>Order by</strong></td>
<td>Order dataset by descending petal width</td>
<td><code>iris %>% arrange(desc(Petal.Width))</code></td>
</tr>
</tbody>
</table>
<p>More tips for moving from using Excel to using R can be found in the <a href="https://github.com/dfe-analytical-services/excel-to-R/wiki/Excel-to-R---how-do-I...">excel-to-R wiki</a>.</p>
<hr />
</div>
<div id="sql-functions-in-r" class="section level2">
<h2>SQL functions in R</h2>
<hr />
<p>R can do a lot of the things that are possible in SQL. The reference table below shows how you would carry out some popular SQL commands in R.</p>
<p><strong>REMEMBER:</strong> R is case sensitive, so all references to column names/entries need to be as-is in the dataset you are looking at. <a href="https://www.rdocumentation.org/packages/janitor/versions/1.2.0/topics/clean_names">Functions exist</a> that can translate all your columns to lower or snake case for ease!</p>
<table>
<colgroup>
<col width="47%" />
<col width="52%" />
</colgroup>
<thead>
<tr class="header">
<th>Common SQL Task</th>
<th>How to do in R (with dplyr)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>SELECT * FROM TABLE</strong></td>
<td><code>table %>% select()</code></td>
</tr>
<tr class="even">
<td><strong>SELECT ColA, ColB, ColC FROM TABLE</strong></td>
<td><code>table %>% select(ColA, ColB, ColC)</code></td>
</tr>
<tr class="odd">
<td><strong>SELECT DISTINCT ColA FROM TABLE</strong></td>
<td><code>table %>% select(ColA) %>% distinct()</code></td>
</tr>
<tr class="even">
<td><strong>TABLE A LEFT JOIN (TABLE B) ON TABLEA.x = TABLEB.y</strong></td>
<td><code>tableA %>% left_join(TableB, by = c(x = y))</code></td>
</tr>
<tr class="odd">
<td><strong>CASE WHEN x = 1 THEN 1, WHEN x =2 THEN 2, ELSE 0 END AS New_column_name</strong></td>
<td><code>%>% mutate (New_column_name = case_when (x == 1 ~ 1, x == 2 ~ 2, TRUE ~ 0))</code></td>
</tr>
<tr class="even">
<td><strong>CONCAT(LEA, ESTAB) AS LAESTAB</strong></td>
<td><code>%>% mutate(LAESTAB = paste0(LEA, ESTAB))</code></td>
</tr>
<tr class="odd">
<td><strong>SELECT COUNT(*) FROM TABLE</strong></td>
<td><code>table %>% nrow()</code></td>
</tr>
<tr class="even">
<td><strong>SELECT COUNT(ColA) FROM TABLE</strong></td>
<td><code>table %>% count(colA)</code></td>
</tr>
<tr class="odd">
<td><strong>SELECT Date_column = CONVERT(DATE, Date_column) FROM TABLE</strong></td>
<td><code>table %>% mutate(Date_column = as.Date(Date_column))</code></td>
</tr>
<tr class="even">
<td><strong>SELECT Number_column = CONVERT(INT, Number_column ) FROM TABLE</strong></td>
<td><code>table %>% mutate(Number_column = as.numeric(Number_column))</code></td>
</tr>
<tr class="odd">
<td><strong>SELECT String_column = CONVERT(VARCHAR, String_column ) FROM TABLE</strong></td>
<td><code>table %>% mutate(String_column = as.character(String_column))</code></td>
</tr>
<tr class="even">
<td><strong>DROP TableA</strong></td>
<td><code>rm(TableA)</code></td>
</tr>
</tbody>
</table>
<p>More tips for moving from using SQL to using R can be found in the <a href="https://github.com/dfe-analytical-services/excel-to-R/wiki/SQL-to-R-how-do-I...">SQL-to-R wiki</a>.</p>
<hr />
</div>
<div id="tips-for-using-r" class="section level2">
<h2>Tips for using R</h2>
<hr />
<p>A selection of handy bits of code and workarounds for common issues. More useful code snippets can also be found in our <a href="https://github.com/dfe-analytical-services/automated-data-qa" target="_blank" rel="noopener noreferrer">github repo</a></p>
<hr />
<div id="specifying-a-version-of-r-to-use" class="section level3">
<h3>Specifying a version of R to use</h3>
<hr />
<p>This can be done most easily by navigating in RStudio through <code>Tools > Global options > General > Basic > R version (change)</code>. It’s likely you’ll need to restart RStudio for the changes to take affect.</p>
<!-- We cover this in the excel to R section -->
<!-- ### Mutating columns -->
<!-- Where you have a simple if else scenario use `mutate(col = if_else(expression, TRUE, FALSE)`, rather than `mutate(col = case_when(expression ~ TRUE, TRUE ~FALSE)` it's both quicker and easier on the eye! -->
<hr />
<!-- This is going into DfER- remove once it's in? -->
</div>
<div id="rounding" class="section level3">
<h3>Rounding</h3>
<hr />
<p>The base R function of round() rounds 5’s downwards. To round them upwards you can create a custom function like the one below:</p>
<pre><code>roundFiveUp <- function(x, n){
z = abs(x)*10^n
z = z + 0.5 + sqrt(.Machine$double.eps)
z = trunc(z)
z = z/10^n
positiveNegative = sign(x)
return(z * positiveNegative)
}</code></pre>
<!-- covered in SQL to R section -->
<!-- ### Grouping sets in R -->
<!-- The data.table package has a really neat function that can produce quick pivots and subtotals as you would using grouping sets in SQL. More information can be found at [this R bloggers post](https://www.r-bloggers.com/2019/03/creating-blazing-fast-pivot-tables-from-r-with-data-table-now-with-subtotals-using-grouping-sets/){target="_blank" rel="noopener noreferrer"}, with handy written examples to walk you through. -->
<hr />
</div>
<div id="passing-variables-as-arguments" class="section level3">
<h3>Passing variables as arguments</h3>
<hr />
<p>This can be worked around by using a combination of <code>eval()</code> and <code>parse()</code>, as shown in the below function:</p>
<pre><code>showFilterLevels <- function(data, meta) {
filters <- meta %>%
filter(col_type == "Filter") %>%
pull(col_name)
levelsTable <- function(filter) {
return(eval(parse(text = paste0("data %>% select(", filter, ") %>% distinct()"))))
}
output <- lapply(filters, levelsTable)
return(output)
}</code></pre>
<hr />
</div>
<div id="reverse-additive-filters" class="section level3">
<h3>Reverse additive filters</h3>
<hr />
<p>You might want to filter your dataset based on multiple negative conditions. Normally to filter on multiple conditions, you would use <code>filter(condition1 & condition2)</code>. The “filter” function does not work well with negative conditions (i.e. filtering for cases where condition 1 and condition 2 are not met). Instead, you can use <code>subset(!(condition1 & condition2)</code>.</p>
<hr />
</div>
<div id="file-locations" class="section level3">
<h3>File locations</h3>
<hr />
<p>Struggling to get files to talk to one another, or get code to find and use another R script? Use <code>here::here()</code> and marvel at it’s wondrous ability to magic away issues.</p>
<hr />
</div>
<div id="interweaving-vectors" class="section level3">
<h3>Interweaving vectors</h3>
<hr />
<p>There’s an easy way to interweave multiple vectors into one single vector using <code>c(rbind())</code>. The example below shows two vectors, but you can have even more if you need.</p>
<pre><code>#Two vectors, x and y
x <- 1:3
y <- 4:6
#Run code to interweave
c(rbind(x, y))
#Output below
# [1] 1 4 2 5 3 6</code></pre>
<hr />
</div>
<div id="making-charts-interactive" class="section level3">
<h3>Making charts interactive</h3>
<hr />
<p>When pulling ggplot charts into RMarkdown reports, you can consider making them even more user-friendly and interactive with plotly. <a href="https://plotly.com/ggplot2" target="_blank" rel="noopener noreferrer">Further information on how to make your charts interactive with plotly</a> can be found online.</p>
<pre><code>#Simple ggplot chart called "p"
p <- ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1) # Use hollow circles
#Apply ggplotly() to it to make it interactive
fig <- ggplotly(p)
</code></pre>
<hr />
</div>
<div id="replace-all-values-with-another" class="section level3">
<h3>Replace all values with another</h3>
<hr />
<p>Have you ever needed to replace every value in your data with another? This can come in handy when you are looking at suppression, e.g. converting all NAs to “z” or all values under a certain threshold to “c”.</p>
<pre><code>data %>% mutate_all(~ replace(., . == "Value to replace", "Replacement"))
</code></pre>
<hr />
</div>
<div id="temporary-groups" class="section level3">
<h3>Temporary groups</h3>
<hr />
<p>The group_by() function in dplyr is really useful, but can be fiddly if you only want to use it for one operation in a chunk of code. The with_groups() function from <a href="https://dplyr.tidyverse.org/reference/with_groups.html" target="_blank" rel="noopener noreferrer">dplyr</a> lets you do this, saving you having to group and ungroup data each time.</p>
<pre><code>data %>% mutate(annual_average = with_groups(time_period, mean))
</code></pre>
<hr />
</div>
<div id="finding-package-dependencies" class="section level3">
<h3>Finding package dependencies</h3>
<hr />
<p>Often we’ll take chunks of code and reuse them for new projects. This can lead to building up a long list of packages to install, not all of which end up being used in your new code. The <strong>NCmisc</strong> package is a really handy way to check which packages and functions are used in your code.</p>
<p>Firstly, load up all the packages the code has <code>library()</code> commands for, then run the following:</p>
<pre><code>list.functions.in.file('your-filename-here.R', alphabetic = TRUE)</code></pre>
<hr />
</div>
<div id="visualise-dependencies" class="section level3">
<h3>Visualise dependencies</h3>
<hr />
<p>The <a href="https://github.com/crsh/depgraph" target="_blank" rel="noopener noreferrer">depgraph package</a> allows you to plot a graph of all the dependencies in your R project, which can be a useful tool to help you cut down on the number of package dependencies. Briefly, in these graphs you can look for “hot spots” in the network (big bright dots), which represent packages that have many upstream dependencies but are potentially easy to remove because they have few downstream dependencies (that is, only your package depends on them).</p>
<pre><code>plot_dependency_graph(
pkg = multibridge_pkg
, suggests = FALSE
, option = "cividis"
)
</code></pre>
<hr />
</div>
<div id="reproducible-random-numbers" class="section level3">
<h3>Reproducible random numbers</h3>
<hr />
<p>The <a href="https://www.rdocumentation.org/packages/simEd/versions/2.0.0/topics/set.seed" target="_blank" rel="noopener noreferrer">set.seed()</a> function generates a sequence of random numbers, starting from the value you define in the brackets. This ensures you get the same sequence of random numbers each time you run set.seed() with the same value, which is helpful to test that your results are reproducible.</p>
<pre><code># random sampling
> sample(LETTERS, 5)
[1] "K" "N" "R" "Z" "G"
> sample(LETTERS, 5)
[1] "L" "P" "J" "E" "D"
# reproducible random sampling
> set.seed(42); sample(LETTERS, 5)
[1] "Q" "E" "A" "J" "D"
> set.seed(42); sample(LETTERS, 5)
[1] "Q" "E" "A" "J" "D"
</code></pre>
<hr />
</div>
<div id="automatic-logging" class="section level3">
<h3>Automatic logging</h3>
<hr />
<p>The <a href="https://cran.r-project.org/web/packages/tidylog/readme/README.html" target="_blank" rel="noopener noreferrer">tidylog package</a> is a really useful tool for providing automated feedback on dplyr and tidyr operations.</p>
<pre><code>library(tidylog)
filtered <- filter(mtcars, cyl == 4)
#> filter: removed 21 rows (66%), 11 rows remaining
mutated <- mutate(mtcars, new_var = wt ** 2)
#> mutate: new variable 'new_var' (double) with 29 unique values and 0% NA</code></pre>
<hr />
</div>
<div id="running-sql-scripts-from-r" class="section level3">
<h3>Running SQL scripts from R</h3>
<hr />
<p>R can be used to execute SQL scripts to extract data from a database as well as querying the database directly via R. For using R to execute a SQL script you’ll need the SQL script/s to be in your R Project and to make a connection via R to the database.</p>
<pre><code># Library calls ====
library(odbc)
library(DBI)
# DB connection ====
con <- DBI::dbConnect(odbc::odbc(),
Driver = "ODBC Driver 17 for SQL Server",
Server = "server_name",
Database = "database_name",
UID = "",
PWD = "",
Trusted_Connection = "Yes"
)
# Function to read in sql scripts ====
getSQL <- function(filepath){
con = file(filepath, "r")
sql.string <- ""
while (TRUE){
line <- readLines(con, n = 1)
if ( length(line) == 0 ){
break
}
line <- gsub("\\t", " ", line)
if(grepl("--",line) == TRUE){
line <- paste(sub("--","/*",line),"*/")
}
sql.string <- paste(sql.string, line)
}
close(con)
return(sql.string)
}
# Execute SQL query and pull into R ====
my_table <- dbGetQuery(con, getSQL("my_query_script.sql"))
</code></pre>
<p>If you’re struggling to get your SQL scripts to execute from R, try adding the following two lines to the top of your SQL script to force the formatting required for it to work:</p>
<pre><code>SET ANSI_PADDING OFF
SET NOCOUNT ON;</code></pre>
<hr />
</div>
<div id="cant-find-make-error" class="section level3">
<h3>Can’t find make error</h3>
<hr />
<!-- [Add later - example screenshot of error] -->
<p>This error is usually due to Rtools not being properly installed, this has become fairly common since it has been dropped from the software centre. There have been several requests to read it, though as of yet none have succeeded. For now there’s a workaround to this.</p>
<p>You can <a href="https://cran.r-project.org/bin/windows/Rtools/">install Rtools as a direct download from CRAN</a>. On there, pick the right version of RTools for your version of R, download the file and install.</p>
<div class="alert alert-dismissible alert-info">
<p>When downloading RTools you may need to confirm that you understand the risks of downloading an .exe file from an unknown source, this is common, as this is a trusted source, you can click accept.</p>
<p>When running the .exe file, Windows Defender may step in and act like it’s not going to let you progress any further. If this happens you’ll need to click “More info” and only then can you select something like “install anyway”.</p>
</div>
<p>Set the install location to <code>C:\rtools40</code> (it is often this by default)</p>
<p>Next you need to add it to the PATH variable. We can’t edit this without an admin password, though you can temporarily set it to allow you to restore from renv, by doing it from within the R console.</p>
<div class="alert alert-dismissible alert-info">
<p>This only sets it temporarily for as long as the R session is running, you’ll need to rerun each time you’re in an R session and need it.</p>
</div>
<blockquote>
<p>old_path <- Sys.getenv(“PATH”) Sys.setenv(PATH = paste(old_path, “C:\rtools40\usr\bin”, sep = “;”)) Sys.getenv(“PATH”)</p>
</blockquote>
<hr />
</div>
</div>
</div>
</div>