\documentclass[12pt]{article}
\usepackage{fullpage,hyperref}\setlength{\parskip}{3mm}\setlength{\parindent}{0mm}
\begin{document}
\begin{center}\bf
Homework 11. Due by 5pm on Thursday 11/11.

Parallel statistical computing.
\end{center}

All modern computers, from a basic laptop to a node on a computing cluster, have multiple cores. Parallelizing your code (i.e., taking advantage of multiple cores) can make feasible computations that would take too long on a single core. Write brief answers to the following questions by editing the tex file available at \url{https://github.com/ionides/810f21}, and submit the resulting pdf file via Canvas.

\begin{enumerate}
\item Trends in statistical computing are driven by trends in hardware. Why is this leading to a growing role for parallel computing?

\url{https://en.wikipedia.org/wiki/Parallel_computing}

YOUR ANSWER HERE.

\item Some key terms for parallel computing are: process, thread, core, node. Briefly define these in your own words.

\url{https://en.wikipedia.org/wiki/Parallel_computing}

YOUR ANSWER HERE.

\item What common statistical computing tasks are embarrassingly parallel?

\url{https://en.wikipedia.org/wiki/Embarrassingly_parallel}

YOUR ANSWER HERE.

\item A basic tool for embarrassingly parallel computing in R is \texttt{foreach}. It is loaded by the \texttt{doParallel} package, which provides a parallel backend for \texttt{foreach} built on the \texttt{parallel} package included in base R. Run the following R code, which generates $10^8$ standard normal random variables in several ways, on your laptop or some other machine. Explain the relative speeds. The ``elapsed'' component of the run time is the total wall-clock time, in seconds, and is the primary outcome of interest. If you like, you can read more about \texttt{foreach} at
\url{https://cran.r-project.org/web/packages/foreach/vignettes/foreach.html}

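\begin{verbatim}
library(doParallel)   # also attaches foreach
registerDoParallel()  # register the default parallel backend

# Baseline: all 10^8 draws in a single call
system.time(
  rnorm(10^8)
) -> time0

# The same 10^8 draws, split into more and smaller tasks
system.time(
  foreach(i=1:10) %dopar% rnorm(10^7)
) -> time1
system.time(
  foreach(i=1:10^2) %dopar% rnorm(10^6)
) -> time2
system.time(
  foreach(i=1:10^3) %dopar% rnorm(10^5)
) -> time3
system.time(
  foreach(i=1:10^4) %dopar% rnorm(10^4)
) -> time4

# One row of timings per experiment
rbind(time0,time1,time2,time3,time4)
\end{verbatim}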
YOUR ANSWER HERE.

\item What common statistical computing tasks could benefit greatly from using simple parallelization such as \texttt{foreach}?

YOUR ANSWER HERE.

\item Once you are using multicore computing on your laptop or desktop, the next step for additional computing resources is the Great Lakes cluster (\url{https://arc-ts.umich.edu/greatlakes/}), which we will use next week. Previous experience with cluster computing in this group ranges from novice to expert: briefly describe any previous experience you have had with computing on a cluster.

YOUR ANSWER HERE.

\item A popular parallel computing approach in data science is Hadoop with MapReduce\\
(\url{https://en.wikipedia.org/wiki/Apache_Hadoop}). Which parallel statistical computing tasks do you think are more appropriate for Hadoop than for \texttt{foreach}?

YOUR ANSWER HERE.

\end{enumerate}
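
A note that may help with question 4: the timings depend on how many workers \texttt{registerDoParallel()} actually registered, which varies with the machine and operating system. The following is a minimal sketch for checking this, assuming the \texttt{doParallel} package is installed. The variable names \texttt{n\_cores} and \texttt{n\_workers} are illustrative and are not part of the assignment.

\begin{verbatim}
library(doParallel)   # also attaches foreach and parallel
registerDoParallel()  # default backend; worker count is platform dependent

n_cores   <- detectCores()      # logical cores reported by the system
n_workers <- getDoParWorkers()  # workers that %dopar% will use

c(cores = n_cores, workers = n_workers)
\end{verbatim}

Reporting these two numbers alongside your timings makes the results easier to compare across machines.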
\end{document}