Data processing and visualization are done via Python scripts. For everything to work, you must have the following dependencies installed:
- bokeh: visualization library
- pandas: processing and querying csv data
- tqdm: displaying a progress bar
You can install all these packages using conda:
conda install bokeh pandas tqdm
To visualize the data, first process it. This can be done using the
process_data.py script. It will parse the legs.csv folder, extract all the
user ids, then ask which user you would like to process. It will then ask which
journey of the selected user you would like to process. You can of course
answer all at both prompts.
It will then output the sensor data as one csv file per sensor for each user
and journey, in the processed_data folder (which will be created if necessary).
This folder has the following hierarchy:
./processed_data
    user_id
        leg_id
            acc_readings.csv
            bluetooth_scans.csv
            gyro_readings.csv
            locations_scans.csv
            magn_readings.csv
            wifi_scans.csv
To start processing the data, call:
python process_data.py
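Once processing has finished, the output can be read back by walking the hierarchy described above. The following is a minimal sketch using only the standard library, not code taken from the project: the function name load_processed_data is hypothetical, and it assumes every csv file starts with a header row. pandas.read_csv could be used in place of csv.DictReader.

```python
import csv
from pathlib import Path


def load_processed_data(root="processed_data"):
    """Return {(user_id, leg_id, sensor_name): list of row dicts}.

    Hypothetical helper; assumes the layout root/user_id/leg_id/<sensor>.csv
    and that each csv file has a header row.
    """
    data = {}
    for csv_path in sorted(Path(root).glob("*/*/*.csv")):
        user_id = csv_path.parts[-3]  # first level below root
        leg_id = csv_path.parts[-2]   # second level below root
        with csv_path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        # csv_path.stem is the file name without ".csv", e.g. "acc_readings"
        data[(user_id, leg_id, csv_path.stem)] = rows
    return data
```

All values are returned as strings, so numeric columns need to be converted before use.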
Once the data has been processed, it can be visualized with the
visualization.py script. The script will first ask which user and which of
that user’s journeys you want to see. It will then plot the data from
each sensor in a separate figure. You can view the plots in a column layout,
a grid layout, or both (default). The layout is specified with command line
arguments.
To visualize the data, call:
python visualization.py
-c or -column to view column layout: python visualization.py -c
-g or -grid to view grid layout: python visualization.py -g
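The layout selection described above could be handled with argparse roughly as follows. This is a sketch under assumptions, not the actual contents of visualization.py: the function name parse_layout and the dest names are hypothetical, and the flag spellings are copied from the text above.

```python
import argparse


def parse_layout(argv=None):
    """Return the layouts to render: ["column"], ["grid"], or both."""
    parser = argparse.ArgumentParser(description="Plot processed sensor data.")
    # Flag spellings mirror the README text; dest names are assumptions.
    parser.add_argument("-c", "-column", dest="column", action="store_true",
                        help="show the column layout")
    parser.add_argument("-g", "-grid", dest="grid", action="store_true",
                        help="show the grid layout")
    args = parser.parse_args(argv)

    # With no flag given, fall back to the documented default: both layouts.
    if not (args.column or args.grid):
        return ["column", "grid"]

    layouts = []
    if args.column:
        layouts.append("column")
    if args.grid:
        layouts.append("grid")
    return layouts
```

Calling parse_layout() without arguments reads sys.argv, so the function can be used directly at the top of a script.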
The script will open the generated plots in a new browser tab. In grid mode you can pan and zoom the graphs; these tools are not yet available in the column view.
Additionally, the script visualization_mode.py generates graphs that compare the features aggregated in features.csv. Each graph compares a given feature across all transport modes.
To aggregate the data into windows of a fixed size, call the script windowing.py (with the optional parameter “window_size”, which is the window length in milliseconds). This reads the folder “processed_data”, so make sure all data has been processed first. The resulting windows are saved in the file “features.csv”.
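To illustrate what fixed-size windowing means here, the sketch below groups timestamped three-axis readings into windows and computes the mean magnitude per window, in the spirit of a feature such as acc_mean. It is not the code from windowing.py: the function name window_means, the (timestamp_ms, x, y, z) tuple layout, the default of 5000 ms, and the alignment of windows to multiples of window_size are all assumptions.

```python
import math
from collections import defaultdict


def window_means(readings, window_size=5000):
    """Return {window_start_ms: mean magnitude} for fixed-size windows.

    readings: iterable of (timestamp_ms, x, y, z) tuples (assumed layout).
    window_size: window length in milliseconds (default is an assumption).
    """
    buckets = defaultdict(list)
    for timestamp_ms, x, y, z in readings:
        # Integer division maps every timestamp to the start of its window.
        start = (timestamp_ms // window_size) * window_size
        buckets[start].append(math.sqrt(x * x + y * y + z * z))
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}
```

Windows that contain no readings simply do not appear in the result.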
To test the accuracy of the features extracted from the data and saved in “features.csv”, invoke machine.py, which splits the data into training and test sets, runs the machine learning step, and reports the resulting accuracy.
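The split-and-score procedure can be sketched with the standard library alone. This is not what machine.py does internally: the model is replaced here by a trivial majority-class baseline, and the function names, the 25% test fraction, and the label column name "mode" are all hypothetical.

```python
import random
from collections import Counter


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline_accuracy(train, test, label_key="mode"):
    """Predict the most frequent training label for every test row.

    label_key is an assumed column name for the transport mode.
    """
    if not train or not test:
        raise ValueError("both train and test must be non-empty")
    most_common = Counter(row[label_key] for row in train).most_common(1)[0][0]
    hits = sum(1 for row in test if row[label_key] == most_common)
    return hits / len(test)
```

A baseline like this gives a floor to compare against: a real classifier trained on the features should score noticeably higher.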
List of all features used (in order):
- acc_mean (mean of the accelerometer’s magnitude over all axes)
- avg_con_bt (average number of connected bluetooth devices over all scans that were done within this window)
- gyro_mean (see acc_mean)
- max_speed
- avg_speed
- distance_travelled
- mag_mean (see acc_mean)
- acc_mixed_0 (FFT-based features; there are 10 features per axis, so 30 features total per sensor)
- …
- acc_mixed_29
- gyro_mixed_0
- …
- gyro_mixed_29