This is the data donation project currently led by Abdul Karim Obeid (@askoj) and supervised by Professor Axel Bruns and Associate Professor Daniel Angus of the ADM+S, in collaboration with AlgorithmWatch.
Users install the Node.js-based plugin from their respective web extension store. Its source code is found at the top level of this repository.
The infrastructure of this project is compiled as a cross-browser search plugin that operates on Google Chrome, Microsoft Edge, Mozilla Firefox, and Blink Opera. The plugin uses a boilerplate template by Bharani (see https://github.com/EmailThis/extension-boilerplate). It runs exactly as originally conceived by AlgorithmWatch (https://github.com/algorithmwatch/australianSearchExperience), with the addition of some extended functionality. The plugin periodically scrapes data from a simulated search engine session and then sends that data up to our server.
If you would like to compile the unpacked extension, you will need a current installation of npm. Navigate to the cloned folder and run the command npm install, which installs the necessary modules for the extension. Then run the command npm run buildmv2 (for Mozilla Firefox and Blink Opera) or npm run buildmv3 (for Google Chrome or Microsoft Edge), depending on which browser you will be using.
Once installed, the user is redirected to a registration page implemented in HTML / CSS / JavaScript (backend\acquisition-form). When the user submits the form, it calls an API endpoint of the aw-datenspende-api AWS Lambda function (backend\lambdas\aw-datenspende-api) to generate the user's de-identified demographic profile (stored in the aw-datenspende-users AWS DynamoDB table) for use with the plugin. This process is vetted against the entries of the aw-datenspende-ip-cache AWS DynamoDB table, which throttle excessive registrations.
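As an illustration of this registration flow, the following is a minimal sketch, not the project's actual handler, of how the aw-datenspende-api Lambda function might create a de-identified profile and throttle by IP. The table names come from the description above; the field names, the throttling threshold, and the handler shape are assumptions made for illustration only.

```python
import json
import time
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
users_table = dynamodb.Table("aw-datenspende-users")        # de-identified profiles
ip_cache_table = dynamodb.Table("aw-datenspende-ip-cache")  # throttling records

MAX_REGISTRATIONS_PER_HOUR = 5  # assumed threshold, for illustration only


def handler(event, context):
    """Hypothetical registration endpoint: throttle by IP, then store a profile."""
    body = json.loads(event["body"])
    source_ip = event["requestContext"]["identity"]["sourceIp"]

    # Vet the request against the IP cache: reject if this IP has registered
    # too many times within the last hour (field names are assumptions).
    cached = ip_cache_table.get_item(Key={"uuid": source_ip}).get("Item", {})
    recent = [t for t in cached.get("timestamps", []) if t > time.time() - 3600]
    if len(recent) >= MAX_REGISTRATIONS_PER_HOUR:
        return {"statusCode": 429, "body": json.dumps({"error": "too many registrations"})}

    # Record this registration attempt in the IP cache.
    ip_cache_table.put_item(Item={"uuid": source_ip, "timestamps": recent + [int(time.time())]})

    # Store the de-identified demographic profile under a random uuid,
    # which the plugin then uses to identify subsequent data donations.
    user_id = str(uuid.uuid4())
    users_table.put_item(Item={
        "uuid": user_id,
        "age_bracket": body.get("age_bracket"),  # assumed form fields
        "gender": body.get("gender"),
        "postcode": body.get("postcode"),
    })
    return {"statusCode": 200, "body": json.dumps({"uuid": user_id})}
```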
When the plugin periodically runs, data is sent from the user's local machine to the API endpoint of the aw-datenspende-api AWS Lambda function (backend\lambdas\aw-datenspende-api), which records the metadata of the data donation in the aw-datenspende AWS DynamoDB table, and the data of the donation itself within the aw-datenspende-bucket AWS S3 bucket. This process is vetted against the entries of the aw-datenspende-ip-cache AWS DynamoDB table, which throttle excessive data donations.
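A comparable sketch of the donation path, again with assumed field names and payload layout, shows the two writes described above: donation metadata into the aw-datenspende DynamoDB table and the donated payload into the aw-datenspende-bucket S3 bucket. The IP-cache throttling step is the same in principle as in the registration sketch and is omitted here for brevity.

```python
import json
import time
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
donations_table = dynamodb.Table("aw-datenspende")  # donation metadata only
s3 = boto3.client("s3")
DONATION_BUCKET = "aw-datenspende-bucket"


def handler(event, context):
    """Hypothetical donation endpoint: metadata to DynamoDB, payload to S3."""
    body = json.loads(event["body"])
    donation_id = str(uuid.uuid4())

    # The donated payload (the scraped search results) is written to S3 as-is.
    # The "results" field and the key layout are assumptions.
    s3.put_object(
        Bucket=DONATION_BUCKET,
        Key=f"donations/{donation_id}.json",
        Body=json.dumps(body["results"]).encode("utf-8"),
        ContentType="application/json",
    )

    # Only the metadata of the donation is recorded in DynamoDB.
    donations_table.put_item(Item={
        "uuid": donation_id,
        "user_uuid": body.get("uuid"),     # de-identified user profile
        "platform": body.get("platform"),  # e.g. google_search, youtube
        "timestamp": int(time.time()),
    })
    return {"statusCode": 200, "body": json.dumps({"uuid": donation_id})}
```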
On an hourly basis, the AWDatenspendeBQHourlyGoogleNews, AWDatenspendeBQHourlyGoogleSearch, AWDatenspendeBQHourlyGoogleVideos, AWDatenspendeBQHourlyUserAppend, and AWDatenspendeBQHourlyYoutube AWS EventBridge rules are automatically executed. Each rule invokes an instance of the aw-datenspende-bq-api AWS Lambda function (backend\lambdas\aw-datenspende-bq-api) by means of the following JSONs, where each respective line is given as that rule's configured target input:
{ "platform" : "google_news" } /* AWDatenspendeBQHourlyGoogleNews */
{ "platform" : "google_search" } /* AWDatenspendeBQHourlyGoogleSearch */
{ "platform" : "google_videos" } /* AWDatenspendeBQHourlyGoogleVideos */
{ "platform" : "users" } /* AWDatenspendeBQHourlyUserAppend */
{ "platform" : "youtube" } /* AWDatenspendeBQHourlyYoutube */
Each instance of the aw-datenspende-bq-api AWS Lambda function queries an entry within the aw-datenspende-bq-ticker AWS DynamoDB table for the current index (a time window) of data donations to push. To retrieve the donations of that time window, the aw-datenspende-bq-api AWS Lambda function calls the aw-datenspende-pull AWS Lambda function (backend\lambdas\aw-datenspende-pull), which returns the necessary data to it. The data is then sanitised and mapped to the necessary schema before being stored in the aw-datenspende-bq-bucket AWS S3 bucket. Lastly, it is pushed to the Google BigQuery infrastructure; the creation scripts of all necessary tables and their associated schemas can be found at backend\lambdas\aw-datenspende-bq-api\run.py and backend\lambdas\aw-datenspende-bq-api\schemas\* respectively.
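The following sketch traces that hourly pipeline end to end under stated assumptions: the ticker field names, the pull function's payload and return shape, the sanitisation step, and the BigQuery dataset and table names are all placeholders; only the resource names come from the description above, and the authoritative schemas live in backend\lambdas\aw-datenspende-bq-api\run.py and backend\lambdas\aw-datenspende-bq-api\schemas\*.

```python
import json

import boto3
from google.cloud import bigquery

dynamodb = boto3.resource("dynamodb")
ticker_table = dynamodb.Table("aw-datenspende-bq-ticker")  # current time-window index per platform
lambda_client = boto3.client("lambda")
s3 = boto3.client("s3")
bq_client = bigquery.Client()

BQ_BUCKET = "aw-datenspende-bq-bucket"


def handler(event, context):
    """Hypothetical hourly push: pull one time window of donations, stage it, load it into BigQuery."""
    platform = event["platform"]  # e.g. "google_search", set by the EventBridge rule

    # 1. Look up the current time-window index for this platform (field name assumed).
    ticker = ticker_table.get_item(Key={"uuid": platform})["Item"]
    window_index = int(ticker["window_index"])

    # 2. Ask the aw-datenspende-pull Lambda function for the donations in that window.
    response = lambda_client.invoke(
        FunctionName="aw-datenspende-pull",
        Payload=json.dumps({"platform": platform, "window_index": window_index}).encode("utf-8"),
    )
    rows = json.loads(response["Payload"].read())

    # 3. Sanitise and map the rows to the target schema (placeholder logic).
    sanitised = [{k: v for k, v in row.items() if v is not None} for row in rows]

    # 4. Stage the batch in S3 as newline-delimited JSON ...
    key = f"{platform}/{window_index}.ndjson"
    ndjson = "\n".join(json.dumps(row) for row in sanitised)
    s3.put_object(Bucket=BQ_BUCKET, Key=key, Body=ndjson.encode("utf-8"))

    # 5. ... and push the same rows to BigQuery (dataset/table names are assumptions).
    errors = bq_client.insert_rows_json(f"australian_search_experience.{platform}", sanitised)

    # 6. Advance the ticker so the next hourly run picks up the following window.
    ticker_table.update_item(
        Key={"uuid": platform},
        UpdateExpression="SET window_index = :w",
        ExpressionAttributeValues={":w": window_index + 1},
    )
    return {"pushed": len(sanitised), "errors": errors}
```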
On a daily basis, statistics about the number of current user registrations and data donations are compiled. This is achieved by the AWDatenspendeRunDaily AWS EventBridge rule, which calls the aw-datenspende-quickstats AWS Lambda function (backend\lambdas\aw-datenspende-quickstats) by means of an empty JSON input. The resulting statistics are stored in the file user_stats.json, within the aw-datenspende-bucket AWS S3 bucket, for public access.
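A minimal sketch of what such a daily statistics pass could look like follows; the counting strategy and the JSON layout of user_stats.json are assumptions, and only the table, bucket, and file names come from the description above.

```python
import json
import time

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")


def handler(event, context):
    """Hypothetical daily statistics pass: count registrations and donations, publish to S3."""
    # Item counts here use simple table scans (pagination beyond the first 1 MB
    # page is ignored for brevity); the real function may compute these differently.
    stats = {
        "generated_at": int(time.time()),
        "registrations": dynamodb.Table("aw-datenspende-users").scan(Select="COUNT")["Count"],
        "donations": dynamodb.Table("aw-datenspende").scan(Select="COUNT")["Count"],
    }

    # Publish the statistics as user_stats.json for public access.
    s3.put_object(
        Bucket="aw-datenspende-bucket",
        Key="user_stats.json",
        Body=json.dumps(stats).encode("utf-8"),
        ContentType="application/json",
    )
    return stats
```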
Within the entire Australian Search Experience infrastructure, there is no specification for the setup of the DynamoDB tables. This is because DynamoDB tables are always schemaless; all tables included in this project share the same configuration of a uuid primary key. Beyond this, all configuration is handled within the source code of the supporting infrastructures.
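For completeness, recreating any of these tables therefore only requires the shared uuid partition key described above, for example as sketched below (the billing mode is an assumption).

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Recreate one of the project's tables with the shared uuid partition key.
# Substitute any of the table names listed above; PAY_PER_REQUEST billing is assumed.
dynamodb.create_table(
    TableName="aw-datenspende-users",
    AttributeDefinitions=[{"AttributeName": "uuid", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "uuid", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```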
Abdul Obeid - @aobeid_1 - [email protected]
Project Link: https://github.com/ADMSCentre/australian-search-experience