-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathdocumentation.html
211 lines (187 loc) · 12.2 KB
/
documentation.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
<!doctype html>
<html>
<head>
<title>Fave100 Documentation</title>
<style type="text/css">
body {
background-color: #FAFAFA;
color: #555;
font-size: 1.1em;
font-family: "calibri", sans-serif;
}
h2 {
text-decoration: underline;
}
#wrap {
display: block;
margin: 0 auto;
width: 960px;
}
</style>
</head>
<body>
<div id="wrap">
<h1>Fave100 Documentation</h1>
<a href="#devEnvironment">Setting Up Development Environment</a><br>
<a href="#musicbrainzVM">Setting Up MusicBrainz VM</a><br>
<a href="#updatingSongs">Updating Song Database</a><br>
<a href="#dedupeSongs">Removing duplicate songs</a><br>
<a href="#basicAppStructure">Basic app structure</a><br>
<a href="#codeRepos">Code Repositories</a><br>
<a href="#upgradeAE-GWT">Upgrading AppEngine and GWT version</a><br>
<a href="#testingAvatarUpload">Testing Avatar Upload Locally</a><br>
<a href="#refactoringGwtSwagger">Refactoring Generated GWT Swagger Client</a><br>
<a href="#remoteServerAdmin">Remote Server Admin</a>
<a name="devEnvironment"/>
<h2>Setting Up Development Environment</h2>
<p>The following instructions assume you are on a Windows setup.</p>
<ol>
<li>Follow instructions to download and install <a href="http://musicbrainz.org/doc/MusicBrainz_Server/Setup#MusicBrainz_Server_virtual_machine">MusicBrainz VM</a></li>
<li>Download latest version of Eclipse IDE for Java Developers (not EE version)</li>
<li>Install <a href="https://developers.google.com/eclipse/docs/getting_started">Google Plugin for Eclipse</a></li>
<li>Install <a href="https://github.com/ArcBees/gwtp-eclipse-plugin">GWTP Eclipse Plugin</a></li>
<li>Install <a href="http://msysgit.github.io">Git</a></li>
<li>Install <a href="https://code.google.com/p/gitextensions">Git Extensions</a></li>
<li>Clone Fave100 project: git clone https://github.com/yissachar/fave100</li>
<li>
<p>Remove JDO/JPA: Project->Properties->Google->App Engine</p>
<ul>
<li>Check enable local HRD</li>
<li>Uncheck use Datanucleus JDO/JPA</li>
</ul>
</li>
</ol>
<a name="musicbrainzVM"/>
<h2>Musicbrainz VM</h2>
<p>All the song data is obtained from Musicbrainz. A VM containing the Musicbrainz database (PostgreSQL) must be set up and periodically synced with the Musicbrainz data dumps (see <a href="#updatingSongs">Updating Song Database</a>).</p>
<p>In addition to the standard Musicbrainz tables, there are 4 custom Fave100 tables:</p>
<ul>
<li>autocomplete_search</li>
<li>autocomplete_search_staging</li>
<li>autocomplete_search_bad_parens</li>
<li>autocomplete_search_dupes</li>
</ul>
<h4>autocomplete_search</h4>
<p>autocomplete_search represents the entirety of the Fave100 song collection. It includes a unique integer ID for every song stored.</p>
<h4>autocomplete_search_staging</h4>
<p>autocomplete_search_staging is a temporary table that is rebuilt from scratch every time we wish to update the Fave100 song collection. It runs the entire Musicbrainz data set through a variety of filters and checks (for instance, de-duplicating the data) to give us the best results. New results that are obtained through this build (e.g. songs added to the Musicbrainz database since the last time that we updated autocomplete_search_staging) are then added to to autocomplete_search.</p>
<h4>autocomplete_search_bad_parens</h4>
<p>autocomplete_search_bad_parens contains a variety of entries that Fave100 considers to be duplicate with a non-parenthesis version when surrounded by parenthesis. For instance, "Stairway to Heaven" and "Stairway to Heaven (acoustic)" are considered by Fave100 to be the same song. autocomplete_search_bad_parens allows for entries to easily be added when they are noted</p>
<h4>autocomplete_search_dupes</h4>
<p>autocomplete_search_dupes contains manually added dupes and their links to the master song. When a song is flagged as a dupe it is inserted into this table. The AppEngine deduper uses this table to determine what songs to dedupe. When autocomplete_search is updated, autocomplete_search_dupes is consulted to determine that we do not add any songs that are flagged as dupes.</p>
<h3>Setting up MusicBrainz VM</h3>
<ol>
<li>Download and install VirtualBox</li>
<li>Setup a Ubunutu VM with ideally 8GB RAM with user musicbrainz</li>
<li>Start the VM</li>
<li>Follow instructions from https://github.com/metabrainz/musicbrainz-docker to setup musicbrainz</li>
<li>Make sure to publish service ports: https://github.com/metabrainz/musicbrainz-docker#publish-ports-of-all-services</li>
<li>Run: <code>cd /home/muzicbrainz</code></li>
<li>Run: <code>mkdir dumps</code></li>
<li>Run: <code>mkdir sql-backups</code></li>
<li>Run: <code>git clone https://[email protected]/yissachar/fave100-autocomplete.git</code></li>
<li>Run: <code>cd fave100-autocomplete</code></li>
<li>Run: <code>touch MUSICBRAINZ_LATEST</code></li>
<li>Run: <code>sudo apt-get update</code></li>
<li>Run: <code>sudo apt-get intall -y postgresql-client</code></li>
<li>Run: <code>psql -h localhost -U musicbrainz -d musicbrainz_db -f CreateTables.sql</code>. The password is "musicbrainz".</li>
<li>See <a href="#updatingSongs">Updating Song Database</a>. You can kick off an immediate update by running <code>python updateMusicbrainz.py</code></li>
</ol>
<a name="updatingSongs"/>
<h2>Updating Song Database</h2>
<ol>
<li>Verify that the latest Musicbrainz dump exists in /home/vagrant/musicbrainz/dumps/. The <a href="http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport">Musicbrainz server</a> is scanned daily with the <a href="https://bitbucket.org/yissachar/fave100-autocomplete/src/5749173c03b6be3c8c15300c03086e315385385e/updateMusicbrainz.py?at=master">updateMusicbrainz.py script</a> and new dumps are downloaded when found. Once a new dump is downloaded, it is loaded into the SQL database automatically.</li>
<li>Backup up autocomplete_search.* tables to shared storage (e.g. \\broadview\web-dev\) through DBeaver GUI</li>
<li>Run the <a href="https://github.com/hellofornow/fave100-tomcat/blob/master/src/main/java/com/caseware/fave100/Index.java">Lucene Indexer</a> to build a Lucene index from the PostgreSQL autocomplete_search song database.</li>
<li>From <a href="https://github.com/hellofornow/fave100-tomcat">Fave100 Tomcat</a> build the search WAR: "clean install" and then deploy it locally and run the unit tests "clean tomcat:deploy".</li>
<li>Make sure it passes the test before continuing</li>
<li>Delete the existing index from the S3 bucket and upload the new index</li>
<li>Restart the search server</li>
</ol>
<a name="dedupeSongs"/>
<h2>Removing Duplicate Songs</h2>
<p>At times we will come across songs in the database that we know to be duplicates but that cannot be caught by the database builder. When this happens we can manually tag the songs as duplicates using the following procedure:</p>
<ol>
<li>Run the <a href="https://bitbucket.org/yissachar/fave100-dedupe/src/36fa90d25c7b84fbb3fa2d7a5764b28f5cd8d9e4/src/main/java/com/fave100/dedupe/App.java?at=master">database deduper</a> with the following arguments: [ID_OF_DUPE] [ID_OF_MASTER]. This will delete the dupe from the actual song database and create a new entry in the dupe database indicating that the song is a dupe of the master song. When the song database is rebuilt, all songs in the dupe database will be ignored.</li>
<li>Run the <a href="https://bitbucket.org/yissachar/fave100-appengine-dedupe/src/8a2a90caaf8a5782906ea38bc33d006b9d8c6b51/src/main/java/com/fave100/appengine_dedupe/App.java?at=master">AppEngine deduper</a>. This will go through all songs in the dupe database that haven't yet been purged from AppEngine and replace them with the master song in each user's favelist. This does not need to be run after each dedupe, but it should be run before too many unpurged dupes accumulate. If we find ourselved constantly deduping songs, consider running the AppEngine deduper as a cron job.</li>
</ol>
<a name="basicAppStructure"/>
<h2>Basic App Structure</h2>
<p>
The app consists of 4 basic components:
</p>
<ul>
<li>GWTP Client</li>
<li>GWTP API Client Generator</li>
<li>AppEngine Server</li>
<li>Lucene Song Server</li>
</ul>
<h4>GWTP Client</h4>
<p>Manages all client-side actions and display. Interfaces with AppEngine through a REST API and with Lucene Song Server through JSONP</p>
<h4>GWTP API Client Generator</h4>
<p>Generates the GWTP API client from Swagger descriptor files. Must be re-run any time the API is changed.</p>
<h4>AppEngine Server</h4>
<p>AppEngine manages storage for all entities aside from Songs. For example, AppUser, Favelist, FaveItem, etc. are all AppEngine Datastore entities. Song entities are managed through references to their unique integer ID and retrieved through Lucene.</p>
<h4>Lucene Song Storage</h4>
<p>Lucene (Tomcat server hosted on Jelastic) manages storage of all Song entities. Lucene server provides search (autocomplete) and lookup (by unique integer ID) APIs.</p>
<a name="codeRepos"/>
<h2>Code Repositories</h2>
<p>
Fave100 Source consists of multiple repositories stored on different services.
</p>
<ul>
<li>
<b>Client-Server</b>
<a href="https://github.com/yissachar/fave100">https://github.com/yissachar/fave100</a>
<p>For the GWT client and the AppEngine server.</p>
</li>
<li>
<b>Lucene Search</b>
<a href="https://github.com/hellofornow/fave100-tomcat">https://github.com/hellofornow/fave100-tomcat</a>
<p>For building a Lucene index from the song database, as well as a Tomcat server for running search and lookup operations.</p>
</li>
<li>
<b>Song Database Building</b>
<a href="https://bitbucket.org/yissachar/fave100-autocomplete/src/">https://bitbucket.org/yissachar/fave100-autocomplete/src/</a>
<p>For building and updating the PostgreSQL song database.</p>
</li>
<li>
<b>Song Database Deduper</b>
<a href="https://bitbucket.org/yissachar/fave100-dedupe/src">https://bitbucket.org/yissachar/fave100-dedupe/src</a>
<p>For going through the PostgreSQL song database and removing any songs that have been marked as dupes</p>
</li>
<li>
<b>AppEngine Deduper</b>
<a href="https://bitbucket.org/yissachar/fave100-appengine-dedupe/src">https://bitbucket.org/yissachar/fave100-appengine-dedupe/src</a>
<p>For iterating through all AppEngine entities and de-duping any songs that point to a duplicate.</p>
</li>
</ul>
<a name="upgradeAE-GWT"/>
<h2>Upgrading AppEngine and GWT version</h2>
<ol>
<li>
<p>Find the latest SDKs from their download sites:</p>
<ul>
<li><a href="https://developers.google.com/appengine/downloads">AppEngine</a></li>
<li><a href="http://www.gwtproject.org/download.html">GWT</a></li>
</ul>
</li>
<li>In Eclipse set the SDKs to the new version</li>
<li>In Maven pom.xml update gwt.version and appengine.version properties</li>
</ol>
<a name="testingAvatarUpload"/>
<h2>Testing Avatar Upload Locally</h2>
<p>The local blobstore service has a few idiosyncrasies that must be followed in order for it to work correctly:</p>
<ol>
<li>The URL must be the name of your computer and not 127.0.0.1. Additionally, ensure that the URL host of the dev server matches the URL that the avatar upload is being POSTed to, otherwise they will not share sessions and the avatar upload will fail.</li>
<li>In dev mode, you cannot change an avatar locally once it has been uploaded, due to a bug with the dev server, which locks images. Instead use hosted mode to perform this operation.</li>
</ol>
<a name="refactoringGwtSwagger"/>
<h2>Refactoring Generated GWT Swagger Client</h2>
<p>Any changed you make to the generated GWT Swagger client will be lost the next time the client is generated. To get around this, you must first refactor the client to the state you want, and then change the client generator so that it produces the equivalent output.</p>
<a name="remoteServerAdmin"/>
<h2>Remote Server Admin</h2>
<p>Remote Server Administration can be accomplished through the use of the RemoteApi program (src/main/RemoteApi.java). Once valid credentials are provided, you will be able to execute code remotely on the connected server. Some helpful methods are provided for performing common tasks, but you are not restricted to using only those methods.</p>
</div>
</body>
</html>