Commit 7645461 (parent cf2fcc9): 7 changed files with 215 additions and 1 deletion.

# Introducing SWISH DataLab

The SWISH DataLab addresses one of the main bottlenecks of data science:
bringing data from different sources together, cleaning it and selecting
the relevant parts. Most pipelines use a general-purpose programming
language such as Python to clean the data and ingest it into a linked
data store or RDBMS, after which the relevant data is selected and
machine learning is applied. In contrast, SWISH data management is based
on Prolog, a _relational_ and _logic_-based language. External data
sources such as RDBMSs, Linked Data, CSV files, XML files and JSON are
made available through a mixture of _adaptors_, which expose the data in
Prolog's relational model without transferring it, and _ingestion_,
which loads the data into Prolog.

Subsequently, declarative rules are stated that define a clean and
coherent view on the data, targeted towards its analysis. Because Prolog
is based on logic, this view is modular, concise and declarative, which
makes it easy to maintain. SWI-Prolog's _tabling_ extension provides the
same termination properties as DataLog, as well as the same independence
of rule order within the subset Prolog shares with DataLog. Tabling also
provides _caching_ of results. At the same time, users have access to
the more general Prolog language to code transformations that are not
supported by DataLog.

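To make the termination claim concrete, here is a small Python sketch. It is not how SWI-Prolog implements tabling; it computes the same answers bottom-up by semi-naive fixpoint iteration, the classic DataLog evaluation strategy, for a left-recursive `path/2` rule over a cyclic `edge/2` relation. Without tabling, Prolog would loop on the left-recursive clause for such a graph; here evaluation stops as soon as a round derives no new answers, which must happen because only finitely many pairs of constants exist. The function name `tabled_path` is illustrative.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def tabled_path(edges: Set[Edge]) -> Set[Edge]:
    """Least fixpoint of the DataLog program
         path(X, Y) :- edge(X, Y).
         path(X, Y) :- path(X, Z), edge(Z, Y).
    computed by semi-naive iteration: each round joins only the answers
    that were new in the previous round."""
    table: Set[Edge] = set(edges)  # all path/2 answers found so far
    delta: Set[Edge] = set(edges)  # answers first derived in the last round
    while delta:
        derived = {(x, y) for (x, z) in delta for (z2, y) in edges if z == z2}
        delta = derived - table  # keep only genuinely new answers
        table |= delta
    return table


if __name__ == "__main__":
    # A cycle a -> b -> c -> a: untabled left-recursive Prolog would not terminate.
    print(sorted(tabled_path({("a", "b"), ("b", "c"), ("c", "a")})))
```

The result is a set computed to a fixpoint, so reordering the two rules cannot change it, which is the order independence mentioned above.
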
SWISH combines [SWI-Prolog](https://www.swi-prolog.org) and
[R](https://www.r-project.org/) behind a web-based IDE that resembles
[Jupyter](https://jupyter.org/) notebooks. The platform can be deployed
on your laptop as well as on a server. It allows multiple data
scientists to work on the same data simultaneously, while rule sets can
be reused and shared between users. Notably, this allows technical
users to provide more complicated data transformation steps to domain
experts. The platform can be configured to allow both authenticated
users and anonymous users with limited access rights. Notebooks and
programs are stored in a Git-like repository and are fully versioned.
It is possible to create a snapshot of a query and all relevant programs
for reliable reproduction of results. Data views defined in SWISH may be
downloaded as CSV and can be accessed through a web-based API.

Using Prolog for data integration, cleaning and modelling started life
as a valorisation project within [COMMIT/](https://www.commit-nl.nl/). A
web-enabled version of SWI-Prolog was pioneered by
[Torbjörn Lager](https://www.gu.se/english/about_the_university/staff/?languageId=100001&userId=xlagto).
The combination of Prolog and R was pioneered by Nicos Angelopoulos at
the NKI (Dutch Cancer Institute) in the life sciences domain. SWISH is
in use at CWI to analyse user behaviour based on HTTP log data from the
Dutch national library (Koninklijke Bibliotheek). Samer Abdallah
(University College London) uses SWISH for analysing music. The core of
SWISH is under active development and is heavily tested through its use
as a shared Prolog teaching environment.

Useful links:

- [Download SWISH from GitHub](https://github.com/SWI-Prolog/swish)
- [SWISH and R for Docker](https://hub.docker.com/u/swipl)
- [SWISH for Prolog teaching](https://swish.swi-prolog.org)
- [SWISH DataLab: A Web Interface for Data Exploration and Analysis,
  BNAIC 2016](https://www.springerprofessional.de/en/swish-datalab-a-web-interface-for-data-exploration-and-analysis/15059986)

# Clustered SWISH

## Syncing the gitty store

The gitty store is a directed graph of commits. Each commit is linked to
a _data object_. Both commits and data objects are identified by a hash
of their content and are read-only, which makes them easy to replicate
over the network. Replication takes two forms:

- A node may _announce_ an object by sending the object's content as
  a series of chunks.
- A node may _request_ an entire object or a missing chunk of an
  object. Receiving nodes that have the object broadcast the missing
  data.

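Because an object's name is the hash of its content, a receiver can verify anything it is sent simply by rehashing it. The Python sketch below illustrates announcing an object as chunks and requesting missing chunks. It is not gitty's implementation or wire format: SHA-1 as the content hash, the tiny `CHUNK_SIZE`, and the names `announce` and `Receiver` are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

CHUNK_SIZE = 4  # hypothetical; kept tiny so the example yields several chunks


def object_hash(data: bytes) -> str:
    """Name an object by the hash of its content (SHA-1 assumed here)."""
    return hashlib.sha1(data).hexdigest()


def announce(data: bytes) -> Tuple[str, List[bytes]]:
    """Return the object's name and the chunks a node would broadcast."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return object_hash(data), chunks


class Receiver:
    """Collects announced chunks and stores complete, verified objects."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}               # hash -> complete object
        self.partial: Dict[str, Dict[int, bytes]] = {}  # hash -> chunks so far

    def receive_chunk(self, h: str, index: int, total: int,
                      chunk: bytes) -> Optional[str]:
        """Record one chunk; return the hash once the object is complete."""
        parts = self.partial.setdefault(h, {})
        parts[index] = chunk
        if len(parts) < total:
            return None
        data = b"".join(parts[i] for i in range(total))
        del self.partial[h]
        # Content addressing makes verification trivial: rehash and compare.
        if object_hash(data) != h:
            raise ValueError(f"corrupt object {h}")
        self.store[h] = data
        return h

    def missing_chunks(self, h: str, total: int) -> List[int]:
        """Chunk indices this node would _request_ from peers holding h."""
        if h in self.store:
            return []
        parts = self.partial.get(h, {})
        return [i for i in range(total) if i not in parts]
```

A node that has seen none of an object's chunks asks for all indices, which corresponds to requesting the entire object.
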
The real problem is updating the _head pointer_: a central database that
defines the latest version of a file with a given name. This notion must
be synchronised across the cluster. This is implemented as follows:

- A node asks the other nodes in the cluster for their current head.
- If all nodes agree on the current head, we are done, although some
  nodes may not yet have the indicated file.
- If some nodes have no head, _announce_ the head.
- Else:
  - Ask all nodes to produce a backward path of commits that
    includes all heads reported by the other nodes.
  - Work out the last common hash, possibly by majority vote.
  - Work out the changes since this common hash:
    - If nodes agree or have no information, there is no conflict.
    - If nodes disagree, go with the majority.
  - Propose the new head to all nodes that agreed on the majority
    path. These nodes _accept_ if nothing has changed since their
    report, and then block further changes for a specified time.
  - If all accept, broadcast the new head. Otherwise, restart from the
    beginning.

The above deals with a live cluster. Nodes that have missed a
conversation or joined the network later may lack a file or the latest
version of a file.

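The voting part of the head negotiation can be sketched in Python. The sketch assumes each node reports the commits of one file as a list of hashes, oldest first, starting from the file's first commit (the backward path from the text, reversed). It computes the last common hash, the majority head and the nodes to ask to accept; the accept/lock round and the announcement to nodes without a head are not modelled. `negotiate_head` and its tie-breaking rule are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def common_prefix(paths: List[List[str]]) -> List[str]:
    """Longest run of commits, from the start, that all paths share."""
    prefix: List[str] = []
    for column in zip(*paths):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def negotiate_head(
    paths: Dict[str, List[str]]
) -> Tuple[Optional[str], Optional[str], Set[str]]:
    """paths maps node -> commit hashes of one file, oldest first.
    Returns (last common hash, proposed head, nodes to ask to accept)."""
    # Nodes without a head carry no information for the vote.
    reporting = {node: p for node, p in paths.items() if p}
    if not reporting:
        return None, None, set()
    common = common_prefix(list(reporting.values()))
    agreeing = set(reporting)
    head = common[-1] if common else None
    i = len(common)
    while True:
        # Commits reported at position i by nodes still on the majority path;
        # nodes whose history ends earlier have no info and are not counted.
        votes = Counter(reporting[n][i] for n in agreeing if len(reporting[n]) > i)
        if not votes:
            break
        # Majority vote; ties go to the smallest hash so all nodes decide alike.
        winner = min(votes, key=lambda h: (-votes[h], h))
        # Nodes that reported a different commit here drop off the path.
        agreeing = {n for n in agreeing
                    if len(reporting[n]) <= i or reporting[n][i] == winner}
        head = winner
        i += 1
    return (common[-1] if common else None), head, agreeing
```

For instance, with `n1` at `a,b,c`, `n2` at `a,b,c,d` and `n3` at `a,b,x`, the last common hash is `b`, the proposed head is `d`, and only `n1` and `n2` are asked to accept.
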
## Remote syncing

Remote syncing is necessary both for new cluster members and for members
that have been offline for some time.

- Find the node with the most changes using a request.
- Ask this node to start the process.
- Each cluster member checks whether it has each change. If not, it
  starts a negotiation using `gitty_remote_head/2`.

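The first and last steps of remote syncing reduce to a selection and a set difference. In this hypothetical Python sketch, each node's number of changes is an integer it reports on request and each change is identified by a commit hash; the function names are illustrative. In SWISH, every missing change would then start a negotiation via `gitty_remote_head/2`, which the sketch does not model.

```python
from typing import Dict, List, Set


def pick_source(change_counts: Dict[str, int]) -> str:
    """Node that reports the most changes; ties go to the smallest node name
    so that every member picks the same node."""
    return min(change_counts, key=lambda node: (-change_counts[node], node))


def changes_to_fetch(source_changes: List[str],
                     local_changes: Set[str]) -> List[str]:
    """Changes offered by the source node that this member does not hold.
    Each one would start a gitty_remote_head/2 negotiation in SWISH."""
    return [change for change in source_changes if change not in local_changes]
```
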
## Profile management and login

FIXME

Remote sync of `library(persistency)`?

- Realise a distributed ledger of changes.
- Apply these.

- Add a serial to each event.
- Broadcast them.
- Adding an event
- Propose

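One possible reading of these notes, sketched in Python: every event carries a serial number, and each node applies broadcast events strictly in serial order, buffering early arrivals and reporting the gaps it should re-request. The notes do not say how serials are assigned (the unfinished proposal step), so the sketch assumes events arrive with serials already assigned, starting at 1; `EventLedger` is a hypothetical name.

```python
from typing import Callable, Dict, List


class EventLedger:
    """Applies serial-numbered events strictly in order (hypothetical design)."""

    def __init__(self, apply: Callable[[dict], None]) -> None:
        self.apply = apply                  # callback that performs one change
        self.next_serial = 1                # serial of the next event to apply
        self.pending: Dict[int, dict] = {}  # events that arrived too early

    def receive(self, serial: int, event: dict) -> None:
        """Handle one broadcast event; duplicates of applied events are ignored."""
        if serial < self.next_serial:
            return
        self.pending[serial] = event
        # Apply every event that is now contiguous with what was applied before.
        while self.next_serial in self.pending:
            self.apply(self.pending.pop(self.next_serial))
            self.next_serial += 1

    def gaps(self) -> List[int]:
        """Serials missing below the highest buffered one, to be re-requested."""
        if not self.pending:
            return []
        return [s for s in range(self.next_serial, max(self.pending))
                if s not in self.pending]
```
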
## Email notifications

FIXME

## Chat subsystem

### Maintain a global overview of visitor count

Visitor change messages carry a `local_visitors` and a `visitors` field
and are relayed. Nodes receiving such a message use the `local_visitors`
field to update their count of visitors on the sending node. Nodes
composing such a message count their local visitors and add the known
totals from the other nodes.

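A minimal sketch of this counting rule, assuming each message also identifies its sender through a hypothetical `node` field (the text names only `local_visitors` and `visitors`). A node keeps the last `local_visitors` value reported by every other node and adds these to its own count when composing a message; relaying is not modelled, and `VisitorCounter` is an illustrative name.

```python
from typing import Dict


class VisitorCounter:
    def __init__(self, node: str) -> None:
        self.node = node
        self.local_visitors = 0           # visitors connected to this node
        self.remote: Dict[str, int] = {}  # last local_visitors per other node

    def compose(self) -> dict:
        """Build a visitor change message (the `node` field is an assumption)."""
        return {
            "node": self.node,
            "local_visitors": self.local_visitors,
            "visitors": self.local_visitors + sum(self.remote.values()),
        }

    def receive(self, msg: dict) -> None:
        """Update the count for the sending node from its local_visitors."""
        if msg["node"] != self.node:
            self.remote[msg["node"]] = msg["local_visitors"]
```
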
### Subscribed files

A WSID joining a file, leaving a file or logging out is broadcast, and
each node maintains a view of the remote users by WSID.

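A sketch of the per-node view of remote users, using hypothetical event fields `type`, `wsid` and `file` and an illustrative class name `RemoteUsers`. The `watchers` helper shows how a node could look up the WSIDs watching a given file, which is the kind of lookup needed when forwarding changes to interested browsers.

```python
from typing import Dict, Set


class RemoteUsers:
    """One node's view of which files each remote WSID has joined."""

    def __init__(self) -> None:
        self.files: Dict[str, Set[str]] = {}  # WSID -> joined files

    def on_event(self, event: dict) -> None:
        """Apply a broadcast join, leave or logout event."""
        wsid = event["wsid"]
        if event["type"] == "join":
            self.files.setdefault(wsid, set()).add(event["file"])
        elif event["type"] == "leave":
            self.files.get(wsid, set()).discard(event["file"])
        elif event["type"] == "logout":
            self.files.pop(wsid, None)  # logging out leaves every file

    def watchers(self, file: str) -> Set[str]:
        """WSIDs currently watching the given file."""
        return {wsid for wsid, files in self.files.items() if file in files}
```
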
FIXME: need to deal with joining nodes and missed updates.

### Profile changes

Profile changes, logins and logouts are sent to all nodes, and each node
sends them to the browsers that have the WSID watching some file.

### Chat syncing

- Find the last message of all nodes for DocID.
- If the Serial-ID pairs match, we are done.
- Else:
  - Ask each node for its history as `chat(Serial,ID,Time)` triples.
  - Assess agreement (agreement means no information or the same
    message).
  - If all agree, send a sync request for the serial range that
    is not known everywhere.
  - Else, send an agreement _serial_ and a list of Serial-ID
    pairs constructed from a chronologically ordered list of the
    chat messages about which there is no agreement.

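The decision logic of chat syncing can be sketched in Python. Assumptions: serials are consecutive integers starting at 1, `Time` is a number, and each node's full history is already at hand (the procedure above first compares only the last message). Renumbering the disputed messages in chronological order is one interpretation of the last step; `chat_sync_plan` and its return values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Chat = Tuple[int, str, float]  # chat(Serial, ID, Time)


def chat_sync_plan(histories: Dict[str, List[Chat]]) -> tuple:
    """histories maps node -> its chat messages for one DocID, in serial order."""
    # Step 1: compare the last Serial-ID pair of every node.
    lasts = {(h[-1][0], h[-1][1]) if h else None for h in histories.values()}
    if len(lasts) == 1:
        return ("done",)
    # Step 2: collect, per serial, the message IDs the nodes report.
    ids_at: Dict[int, Set[str]] = {}
    for hist in histories.values():
        for serial, msg_id, _stamp in hist:
            ids_at.setdefault(serial, set()).add(msg_id)
    # A serial is agreed if every node that knows it reports the same ID.
    disputed = [s for s, ids in ids_at.items() if len(ids) > 1]
    if not disputed:
        known = [{serial for serial, _, _ in hist} for hist in histories.values()]
        missing = sorted(s for s in ids_at if not all(s in k for k in known))
        return ("sync", missing)
    # Step 3: messages after the last agreed serial are renumbered in
    # chronological order (ties broken by ID so all nodes get the same list).
    agreed = min(disputed) - 1
    stamps: Dict[str, float] = {}
    for hist in histories.values():
        for serial, msg_id, stamp in hist:
            if serial > agreed:
                stamps[msg_id] = stamp
    ordered = sorted(stamps, key=lambda m: (stamps[m], m))
    return ("renumber", agreed,
            [(agreed + k, m) for k, m in enumerate(ordered, start=1)])
```

For example, if two nodes both hold message `m1` at serial 1 but disagree on serial 2 (`m2` at time 20 versus `m3` at time 15), the plan keeps serial 1 as the agreement serial and renumbers `m3` as 2 and `m2` as 3.
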
# Running SWISH using Redis

## Background

- https://docs.gitlab.com/ee/administration/redis/replication_and_failover_external.html

- Usage for swish.swi-prolog.org
  - Period: Oct 29 2017 - Nov 26 2017
  - Visitors: 41433
  - Unique visitors: 15375 (based on IP)
  - Queries: 738498
- Community:
  - Google (Feb 7, 2018)
    - "link:swish.swi-prolog.org": 9010 results
    - SWISH Prolog: 26800 results
  - GitHub: 6 contributors, 226 stars, 55 forks
  - Docker:
    - swipl/swish: 121 pulls
    - swipl/rserve: 43 pulls (R Docker image for use with SWISH)
- Commercial use
  - Simularity (http://simularity.com/, satellite image analysis)
- Public sites running SWISH with extended versions of Prolog
  - http://cplint.ml.unife.it/
    Machine learning and R support
  - http://lpsdemo.interprolog.com/
    "LPS is a logic and computer language for representing the thoughts
    and for controlling the behaviour of an intelligent machine situated
    in a changing world."
- Publications
  - Torbjörn Lager, Jan Wielemaker:
    Pengines: Web Logic Programming Made Easy. TPLP 14.
  - Jan Wielemaker, Torbjörn Lager, Fabrizio Riguzzi:
    SWISH: SWI-Prolog for Sharing. IULP 2015. Extended version submitted
    to TPLP (Theory and Practice of Logic Programming journal).
  - Veruska Zamborlini, Jan Wielemaker, Marcos Da Silveira, Cédric Pruski,
    Annette ten Teije, Frank van Harmelen:
    SWISH for Prototyping Clinical Guideline Interactions Theory. SWAT4LS 2016.
  - Wouter Beek, Jan Wielemaker:
    SWISH: An Integrated Semantic Web Notebook. International Semantic Web
    Conference (Posters & Demos) 2016.
  - Tessel Bogaard, Jan Wielemaker, Laura Hollink, Jacco van Ossenbruggen:
    SWISH DataLab: A Web Interface for Data Exploration and Analysis. BNAIC 2016.
  - Marco Alberti, Elena Bellodi, Giuseppe Cota, Fabrizio Riguzzi,
    Riccardo Zese:
    cplint on SWISH: Probabilistic Logical Inference with a Web Browser.
    Intelligenza Artificiale.