Commit aea3bc4

committed
Newer examples
0 parents  commit aea3bc4

148 files changed

+225635
-0
lines changed

.gitignore

+51
@@ -0,0 +1,51 @@
*.lock.db
.DS_Store
*~
.#*
#*#
.RHistory
.Rhistory
c:\\sw\\text.txt
*.temp.xml
temp.xml
.Rdata
.RData
*_external_links.xml
generated

# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# Example code in package build process
*-Ex.R

# Output files from R CMD build
/*.tar.gz

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth

# knitr and R markdown default cache directories
/*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md
.Rproj.user

Bookdata/README.html

+26
@@ -0,0 +1,26 @@
<h1 id="example-code-and-data-for-practical-data-science-with-r-by-nina-zumel-and-john-mount-manning-2014.">Example code and data for &quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014.</h1>
<ul>
<li>The book: <a href="http://www.manning.com/zumel/">&quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014</a> (book copyright Manning Publications Co., all rights reserved)</li>
<li>The support site: <a href="https://github.com/WinVector/zmPDSwR">GitHub WinVector/zmPDSwR</a></li>
</ul>
<h2 id="the-code-and-data-in-this-directory-supports-examples-from">The code and data in this directory support examples from:</h2>
<ul>
<li>Chapter 8: Using Unsupervised Methods</li>
</ul>
<h2 id="original-data">Original data:</h2>
<p>Book-Crossing dataset mined by Cai-Nicolas Ziegler, DBIS Freiburg original link http://www.informatik.uni-freiburg.de/~cziegler/BX/</p>
<p>Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.</p>
<p>Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):</p>
<p>Improving Recommendation Lists Through Topic Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.</p>
<p>http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf</p>
<h2 id="derived-works-no-claim-of-license-on-these">Derived works (no claim of license on these):</h2>
<ul>
<li>bxBooks.RData : R-binary version of Book-Crossing dataset.</li>
<li>bookdata.tsv.gz : gzipped tab-separated file containing customer book ratings by title and numerical rating</li>
</ul>
<h2 id="our-additional-documentation-notes-code-and-example-data">Our additional documentation, notes, code, and example data:</h2>
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</p>
<ul>
<li>read_bookcrossing.R : script to read in original data files and create bxBooks.RData</li>
<li>create_bookdata.R : script to create the data file bookdata.tsv</li>
</ul>

Bookdata/README.md

+49
@@ -0,0 +1,49 @@

# Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.

* The book: ["Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014](http://www.manning.com/zumel/) (book copyright Manning Publications Co., all rights reserved)
* The support site: [GitHub WinVector/zmPDSwR](https://github.com/WinVector/zmPDSwR)

## The code and data in this directory support examples from:
* Chapter 8: Using Unsupervised Methods

## Original data:
Book-Crossing dataset mined by Cai-Nicolas Ziegler, DBIS Freiburg
original link http://www.informatik.uni-freiburg.de/~cziegler/BX/

Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September
2004) from the Book-Crossing community with kind permission from Ron
Hornbaker, CTO of Humankind Systems. Contains 278,858 users
(anonymized but with demographic information) providing 1,149,780
ratings (explicit / implicit) about 271,379 books.

Freely available for research use when acknowledged with the
following reference (further details on the dataset are given in this
publication):

Improving Recommendation Lists Through Topic
Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph
A. Konstan, Georg Lausen; Proceedings of the 14th International World
Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To
appear.

http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf

## Derived works (no claim of license on these):

* bxBooks.RData : R-binary version of Book-Crossing dataset.
* bookdata.tsv.gz : gzipped tab-separated file containing customer book ratings by title and numerical rating

## Our additional documentation, notes, code, and example data:

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

* read_bookcrossing.R : script to read in original data files and create bxBooks.RData
* create_bookdata.R : script to create the data file bookdata.tsv

Bookdata/bookdata.tsv.gz

10.7 MB
Binary file not shown.
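
A minimal sketch of how a gzipped tab-separated file such as bookdata.tsv.gz could be loaded back into R. The column layout (`token`, `userid`, `rating`, `title`) is an assumption inferred from create_bookdata.R, and the two sample rows are hypothetical; the sketch writes them to a temporary file so that it runs without the real data.

```r
# Hypothetical rows in the layout that create_bookdata.R appears to write
sample_rows <- data.frame(token  = c("dune", "emma"),
                          userid = c(1L, 2L),
                          rating = c(8L, 0L),
                          title  = c("Dune", "Emma"),
                          stringsAsFactors = FALSE)

# Write a small gzipped TSV to a temporary path (stand-in for bookdata.tsv.gz)
path <- tempfile(fileext = ".tsv.gz")
con <- gzfile(path, "w")
write.table(sample_rows, file = con, sep = "\t",
            row.names = FALSE, col.names = TRUE)
close(con)

# read.table() opens the gzfile connection, decompresses while reading,
# and closes the connection when it is done
bookdata <- read.table(gzfile(path), header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
str(bookdata)
```

For the real file, replacing `path` with `"bookdata.tsv.gz"` should be enough, provided the file sits in the working directory.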

Bookdata/bxBooks.RData

23 MB
Binary file not shown.

Bookdata/create_bookdata.R

+44
@@ -0,0 +1,44 @@
load("bxBooks.RData")
colnames(bxBooks) <- gsub(".", "_", colnames(bxBooks), fixed=TRUE)
colnames(bxBookRatings) <- gsub(".", "_", colnames(bxBookRatings), fixed=TRUE)
colnames(bxUsers) <- gsub(".", "_", colnames(bxUsers), fixed=TRUE)

Sys.setlocale('LC_ALL','C') # to deal with the non-US characters
# remove parentheticals, which are usually
# at the end of the title. First get rid of the open paren
booktokens <- gsub("(", "#", bxBooks$Book_Title, fixed=TRUE)
booktokens <- gsub("^#", "(", booktokens)
booktokens <- gsub("#.*$", "", booktokens) # leaves a trailing white space
cleantitles <- sub("[[:space:]]+$","",booktokens) # save these

booktokens <- tolower(cleantitles)
Books <- data.frame(ISBN=bxBooks$ISBN, token=booktokens, title=cleantitles)

library(sqldf)
# picks a unique isbn for every token -- this is the number of unique tokens
bookmap <- sqldf('SELECT min(ISBN) as misbn,
                         token
                  FROM Books
                  GROUP BY token')

# displaymap has a title for every unique token
displaymap <- sqldf('SELECT Books.title as title,
                            bookmap.token as token
                     FROM Books,
                          bookmap
                     WHERE Books.ISBN=bookmap.misbn')

# bookdata1 is shorter than bxBookRatings because
# some of the rated books are not in the bxBooks data
bookdata1 <- sqldf('SELECT ratings.User_ID as userid,
                           Books.token as token,
                           ratings.Book_Rating as rating
                    FROM Books,
                         bxBookRatings as ratings
                    WHERE ratings.ISBN=Books.ISBN')

# add the displayname
bookdata <- merge(bookdata1, displaymap, by="token")

write.table(bookdata, file="bookdata.tsv",
            sep="\t", row.names=FALSE, col.names=TRUE)
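
The title-cleaning steps at the top of create_bookdata.R can be traced on a toy vector. This is a sketch in base R only; the three titles are hypothetical and not taken from the Book-Crossing data.

```r
# Hypothetical titles, chosen to exercise each step
titles <- c("The Hobbit (Collector's Edition)",
            "(Un)common Sense  ",
            "Dune")

# Step 1: mark every open paren with '#'
tokens <- gsub("(", "#", titles, fixed = TRUE)
# Step 2: restore a paren that starts the title, so a leading
#         parenthetical is not mistaken for a trailing one
tokens <- gsub("^#", "(", tokens)
# Step 3: drop everything from the first remaining '#' to the end
tokens <- gsub("#.*$", "", tokens)
# Step 4: trim the trailing whitespace left behind by step 3
clean_titles <- sub("[[:space:]]+$", "", tokens)

# Lower-cased titles serve as the grouping token
title_tokens <- tolower(clean_titles)
print(data.frame(clean_titles, title_tokens))
```

One side effect worth noting: because `#` is used as the marker, a title that already contains a literal `#` is also cut at that character by step 3.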

Bookdata/read_bookcrossing.R

+7
@@ -0,0 +1,7 @@

# before reading (done outside this script): replace \" with ' in BX-Users.csv
bxUsers <- read.table('BX-Users.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE)
# before reading (done outside this script): replace \" with blank in BX-Book-Ratings.csv
bxBookRatings <- read.table('BX-Book-Ratings.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE)
# before reading (done outside this script): replace \" with ' in BX-Books.csv
bxBooks <- read.table('BX-Books.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE)
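
The comments in read_bookcrossing.R ask for `\"` to be replaced before the files are read, but the script itself does not perform that step. A minimal sketch of one way to do it in R follows, using a hypothetical three-line sample rather than the real BX-Books.csv. Two things here are assumptions and not part of the original script: the substitution is done in memory with `gsub`, and `quote` is restricted to the double quote so the substituted apostrophes cannot be read as quoting characters.

```r
# Hypothetical sample in the semicolon-separated, double-quoted layout;
# the second line contains backslash-escaped double quotes
raw <- c('"ISBN";"Book-Title"',
         '"000000000X";"A \\"Quoted\\" Title"',
         '"000000001X";"Plain Title"')

# Replace the two-character sequence \" with a single quote.
# fixed = TRUE makes the pattern a literal string, not a regular expression.
fixed_lines <- gsub('\\"', "'", raw, fixed = TRUE)

# Parse the cleaned lines; quote = "\"" is an assumption of this sketch
books <- read.table(text = fixed_lines, header = TRUE, sep = ";",
                    quote = "\"", comment.char = "",
                    stringsAsFactors = FALSE)
print(books)
```

With the default `check.names = TRUE`, the header `Book-Title` becomes the column name `Book.Title`, which is consistent with create_bookdata.R later replacing `.` with `_` in the column names.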

Buzz/.gitignore

+4
@@ -0,0 +1,4 @@
buzz.aux
buzz.log
buzz.out
cache

Buzz/BuzzDataSetDoc.pdf

114 KB
Binary file not shown.

Buzz/PeerPresentation.pdf

99 KB
Binary file not shown.

Buzz/ProjectSponsorPresentation.pdf

124 KB
Binary file not shown.

Buzz/README.html

+21
@@ -0,0 +1,21 @@
<h1 id="example-code-and-data-for-practical-data-science-with-r-by-nina-zumel-and-john-mount-manning-2014.">Example code and data for &quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014.</h1>
<ul>
<li>The book: <a href="http://www.manning.com/zumel/">&quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014</a> (book copyright Manning Publications Co., all rights reserved)</li>
<li>The support site: <a href="https://github.com/WinVector/zmPDSwR">GitHub WinVector/zmPDSwR</a></li>
</ul>
<h2 id="the-code-and-data-in-this-directory-supports-examples-from">The code and data in this directory support examples from:</h2>
<ul>
<li>Chapter 10: Documentation and Deployment</li>
<li>Chapter 11: Producing Effective Presentations</li>
</ul>
<h2 id="original-data">Original data:</h2>
<p>10-13-2013 Data from: http://ama.liglab.fr/datasets/buzz/ Using: http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.data</p>
<p>(described in http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.names )</p>
<p>Crypto hashes: $ shasum TomsHardware-*.txt 5a1cc7863a9da8d6e8380e1446f25eec2032bd91 TomsHardware-Absolute-Sigma-500.data.txt 86f2c0f4fba4fb42fe4ee45b48078ab51dba227e TomsHardware-Absolute-Sigma-500.names.txt c239182c786baf678b55f559b3d0223da91e869c TomsHardware-Relative-Sigma-500.data.txt ec890723f91ae1dc87371e32943517bcfcd9e16a TomsHardware-Relative-Sigma-500.names.txt</p>
<p>R objects produced by commands in rsteps.R saved in thRS500.Rdata</p>
<p>11-6-2013</p>
<p>Adding latex ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzz.pdf ) and markdown ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzzm.md ) versions of the documentation.</p>
<h2 id="license-for-additional-documentation-notes-code-and-example-data">License for additional documentation, notes, code, and example data:</h2>
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</p>
<p>No guarantee, indemnification or claim of fitness is made regarding any of these items.</p>
<p>No claim of license on works of others or derived data.</p>

Buzz/README.md

+47
@@ -0,0 +1,47 @@

# Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.

* The book: ["Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014](http://www.manning.com/zumel/) (book copyright Manning Publications Co., all rights reserved)
* The support site: [GitHub WinVector/zmPDSwR](https://github.com/WinVector/zmPDSwR)

## The code and data in this directory support examples from:
* Chapter 10: Documentation and Deployment
* Chapter 11: Producing Effective Presentations

## Original data:

10-13-2013
Data from: http://ama.liglab.fr/datasets/buzz/
Using:
http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.data

(described in http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.names )

Crypto hashes:
$ shasum TomsHardware-*.txt
5a1cc7863a9da8d6e8380e1446f25eec2032bd91 TomsHardware-Absolute-Sigma-500.data.txt
86f2c0f4fba4fb42fe4ee45b48078ab51dba227e TomsHardware-Absolute-Sigma-500.names.txt
c239182c786baf678b55f559b3d0223da91e869c TomsHardware-Relative-Sigma-500.data.txt
ec890723f91ae1dc87371e32943517bcfcd9e16a TomsHardware-Relative-Sigma-500.names.txt

R objects produced by commands in rsteps.R saved in thRS500.Rdata

11-6-2013

Adding latex ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzz.pdf ) and markdown ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzzm.md ) versions of the documentation.

## License for additional documentation, notes, code, and example data:

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

No guarantee, indemnification or claim of fitness is made regarding any of these items.

No claim of license on works of others or derived data.
