Quick-Start
Application for browsing the data in the browser.
- npm 6.4.1
- modern browser
git clone https://github.com/siret/similant.git
cd similant
npm install
- Initialize the browser with an empty data folder in the /public folder:
cp empty-data data
npm start
- The browser should open http://localhost:3000/ in a new tab.
All configuration files are stored in the /public/data folder. An example configuration is in the /public/example folder.
The configuration file descriptors.json contains the list of similarity models, represented as a JSON array of JSON objects.
[
{
"id": "<model id>",
"name": "<model name>",
"url": "<url to model configuration>"
},
{
"id": "<model id>",
"name": "<model name>",
"url": "<url to model configuration>"
},
...
]
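As a minimal sketch, the index above could be generated with a few lines of Python. The model ids, names and URLs below are hypothetical placeholders, and the helper name `build_model_index` is an assumption made for illustration; only the three keys `id`, `name` and `url` come from the template.

```python
import json


def build_model_index(models):
    """Return the content of descriptors.json as a list of model entries."""
    index = []
    for model_id, name, url in models:
        index.append({"id": model_id, "name": name, "url": url})
    return index


# Hypothetical models; replace the ids, names and URLs with real ones.
index = build_model_index([
    ("ts", "Time series", "data/descriptors/ts.json"),
    ("tokens", "Token sets", "data/descriptors/tokens.json"),
])

# Serialized text that could be saved as descriptors.json.
descriptors_json = json.dumps(index, indent=2)
print(descriptors_json)
```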
The configuration file descriptors/<model>.json contains information about the model and the related clustering sizes.
{
"id": "<model id>",
"title": "<model name>",
"type": "<model type>",
"clusters": [
{
"id": "<clustering id>",
"size": <clustering size>,
"url": "<clustering url>"
},
{
"id": "<clustering id>",
"size": <clustering size>,
"url": "<clustering url>"
},
...
],
"data": {
"<record id>": <descriptor related fields>,
"<record id>": <descriptor related fields>,
...
},
...
<type related fields>
...
}
The supported types are:
- time-series, with the additional field axis (a JSON array containing the labels of the time points).
- set-tokens, with the additional fields labels (a JSON object containing the translation table from token to label) and limit (the number of most frequent tokens shown).
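A quick way to catch an incomplete model file is to check for the fields named above before loading it in the browser. The following is only a sketch: the function name `missing_model_fields` and the example values are hypothetical, and the application itself may validate differently or not at all.

```python
# Type-specific fields, as listed for the supported types above.
TYPE_FIELDS = {
    "time-series": ["axis"],
    "set-tokens": ["labels", "limit"],
}
# Fields shared by every model file, taken from the template above.
COMMON_FIELDS = ["id", "title", "type", "clusters", "data"]


def missing_model_fields(model):
    """Return the names of required fields that are absent from a model dict."""
    required = COMMON_FIELDS + TYPE_FIELDS.get(model.get("type"), [])
    return [field for field in required if field not in model]


# Hypothetical model with made-up values; the "axis" field is left out on purpose.
example_model = {
    "id": "ts",
    "title": "Time series",
    "type": "time-series",
    "clusters": [],
    "data": {},
}
print(missing_model_fields(example_model))  # prints ['axis']
```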
Clustering files are usually placed in descriptors/<model>/<size>.json and contain information about groups of records (clusters).
{
"<cluster id>": {
"id": "<cluster id>",
"pos": [ <x position>, <y position> ],
"radius": <cluster radius>,
"items": [
"<record id>",
"<record id>",
...
]
},
...
}
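The structure above can be assembled from a record-to-cluster assignment, as the following sketch shows. It assumes the cluster positions and radii are already known; the `layout` argument, the function name `build_clustering` and all sample values are hypothetical, and how positions are actually computed is not covered here.

```python
import json
from collections import defaultdict


def build_clustering(assignments, layout):
    """Group record ids by cluster and attach each cluster's position and radius.

    assignments: dict mapping record id -> cluster id
    layout: dict mapping cluster id -> (x, y, radius)
    """
    items = defaultdict(list)
    for record_id, cluster_id in assignments.items():
        items[cluster_id].append(record_id)

    clustering = {}
    for cluster_id, record_ids in items.items():
        x, y, radius = layout[cluster_id]
        clustering[cluster_id] = {
            "id": cluster_id,
            "pos": [x, y],
            "radius": radius,
            "items": record_ids,
        }
    return clustering


# Hypothetical assignment of three records to two clusters.
clustering = build_clustering(
    {"r1": "c0", "r2": "c1", "r3": "c0"},
    {"c0": (0.2, 0.3, 0.1), "c1": (0.7, 0.6, 0.05)},
)
print(json.dumps(clustering, indent=2))
```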
The database records configuration file is individuals.json. It contains "all" information about every record in the database in the form of a JSON object. This information is shown in the left panel.
{
"<record_id>": {
"id": "<record_id>",
"data": {
"<key>": "<value>",
"<key>": "<value>",
...
}
},
"<record_id>": {
"id": "<record_id>",
"data": {
"<key>": "<value>",
"<key>": "<value>",
...
}
},
...
}
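For illustration, the records object could be built from a list of rows as in the sketch below. The field names `name` and `age` and the helper `build_individuals` are hypothetical; the template only fixes the outer shape (`id` plus a `data` object of key/value pairs).

```python
import json


def build_individuals(rows):
    """Build the individuals.json object from (record id, data dict) pairs."""
    individuals = {}
    for record_id, data in rows:
        # Values are stored as strings, following the "<value>" form in the template.
        individuals[record_id] = {
            "id": record_id,
            "data": {key: str(value) for key, value in data.items()},
        }
    return individuals


# Hypothetical records with made-up keys and values.
individuals = build_individuals([
    ("r1", {"name": "first record", "age": 42}),
    ("r2", {"name": "second record", "age": 17}),
])
print(json.dumps(individuals, indent=2))
```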
The targets configuration file is targets.json; it lists all available targets. Targets are shown in the right panel.
[
{
"id": "<target id>",
"name": "<target name>",
"url": "<target configuration URL>"
},
{
"id": "<target id>",
"name": "<target name>",
"url": "<target configuration URL>"
},
...
]
A target configuration file contains information about a single target and its value for every record. It is usually located in the /targets folder.
{
"name": "<target name>",
"type": "<target type>",
"data": {
"<record id>": "<value>",
"<record id>": "<value>",
...
},
...
<type related fields>
...
}
The supported types are:
- ordinal, with the additional field axis (a JSON array containing the expected values),
- histogram, with the additional field bins (a JSON array containing the limits of the histogram bins).
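As a sketch of the ordinal type, the following builds a target file from per-record values and an ordered list of expected values. Rejecting values that are missing from `axis` is an assumption made here for safety, not documented behaviour; the function name `build_ordinal_target` and the sample data are hypothetical.

```python
import json


def build_ordinal_target(name, values, axis):
    """Build an ordinal target dict from record values and the expected values."""
    # Assumption: every record value should appear in the axis.
    unknown = sorted(set(values.values()) - set(axis))
    if unknown:
        raise ValueError(f"values not listed in axis: {unknown}")
    return {
        "name": name,
        "type": "ordinal",
        "data": dict(values),
        "axis": list(axis),
    }


# Hypothetical target with made-up record ids and values.
target = build_ordinal_target(
    "severity",
    {"r1": "low", "r2": "high", "r3": "low"},
    ["low", "medium", "high"],
)
print(json.dumps(target, indent=2))
```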
The script.py for quick model generation is located in the python folder.
- python 3.6.8
- numpy 1.16.1
- scikit-learn 0.20.2
- scipy 1.2.0
- pandas 0.24.0
The script expects a CSV file (UTF-8 encoding) in the following format.
id,data
<record id 1>,<descriptor value 1>,<descriptor value 2>,...,<descriptor value n_1>
<record id 2>,<descriptor value 1>,<descriptor value 2>,...,<descriptor value n_2>
...
<record id m>,<descriptor value 1>,<descriptor value 2>,...,<descriptor value n_m>
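An input file in this format can be produced with the standard csv module, as sketched below. Rows may have different lengths, matching the n_1 ... n_m in the format above. The helper name `write_model_csv` and the sample records are hypothetical; to create a real file, write the returned text with `open(path, "w", encoding="utf-8", newline="")`.

```python
import csv
import io


def write_model_csv(records):
    """Return CSV text with an 'id,data' header and one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "data"])
    for record_id, values in records.items():
        # The record id comes first, followed by its descriptor values.
        writer.writerow([record_id, *values])
    return buffer.getvalue()


# Hypothetical records with descriptor vectors of different lengths.
csv_text = write_model_csv({"r1": [0.1, 0.5, 0.9], "r2": [1, 2]})
print(csv_text)
```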
New models can be added using the Python script:
python add_model.py -i <path to CSV file> --add
All options can be listed using the following command:
python add_model.py -h