Merge pull request #31 from tetherless-world/master

Whyis 1.0 release
jpmccu authored Apr 7, 2018
2 parents df09e97 + c6adcc9 commit 03b48d3
Showing 41 changed files with 7,488 additions and 183 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,3 +1,6 @@
+# Emacs backup files
+*~
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -15,7 +18,7 @@ develop-eggs/
 downloads/
 eggs/
 .eggs/
-lib/
+#lib/
 lib64/
 parts/
 sdist/
10 changes: 5 additions & 5 deletions README.md
@@ -1,11 +1,11 @@
-# Satoru
+# Whyis
 
-Satoru is a nano-scale knowledge graph publishing, management, and analysis framework.
-Satoru aims to support domain-aware management and curation of knowledge from many different sources. Its primary goal is to enable creation of useful domain- and data-driven knowledge graphs. Knowledge can be contributed and managed through direct user interaction, statistical analysis, or data ingestion from many different kinds of data sources. Every contribution to the knowledge graph is managed as a separate entity so that its provenance (publication status, attribution, and justification) is transparent and can be managed and used.
+Whyis is a nano-scale knowledge graph publishing, management, and analysis framework.
+Whyis aims to support domain-aware management and curation of knowledge from many different sources. Its primary goal is to enable creation of useful domain- and data-driven knowledge graphs. Knowledge can be contributed and managed through direct user interaction, statistical analysis, or data ingestion from many different kinds of data sources. Every contribution to the knowledge graph is managed as a separate entity so that its provenance (publication status, attribution, and justification) is transparent and can be managed and used.
 
-Satoru manages its fragments of knowledge as nanopublications, which can be viewed as the smallest publishable unit. They are fragments of knowledge graphs that have secondary graphs associated with them to contain provenance and publication information. Knowledge graph systems need to manage the provenance of its contents. By using existing recommended standards like RDF, OWL, and SPARQL, nanopublications are able to provide flexible expansion and integration options without the limitations of custom database management tools. They also have the flexibility to capture any level of granularity of information required by the application.
+Whyis manages its fragments of knowledge as nanopublications, which can be viewed as the smallest publishable unit. They are fragments of knowledge graphs that have secondary graphs associated with them to contain provenance and publication information. Knowledge graph systems need to manage the provenance of its contents. By using existing recommended standards like RDF, OWL, and SPARQL, nanopublications are able to provide flexible expansion and integration options without the limitations of custom database management tools. They also have the flexibility to capture any level of granularity of information required by the application.
 
-Every entity in the resource is visible through its own Uniform Resource Identifier (URI), and is available as machine-readable linked data. When a user accesses the URI, all the nanopublications about it are aggregated together into a single graph. This approach gives users the ability to control access to this knowledge. It also provides the ability to control the publishing workflow. Rather than publishing everything immediately, nanopublications can be contributed, curated and approved, and then finally published either individually or in collections. Knowledge graph developers can flexibly control the ways in which the entities are shown to users by their type or other constraints. We provide default views for knowledge graph authoring, including for ontology development and also allow developers to provide customized views. Our plan is to base our new enhanced Nanomine on the Satoru infrastructure to enable more flexibility and extensibility.
+Every entity in the resource is visible through its own Uniform Resource Identifier (URI), and is available as machine-readable linked data. When a user accesses the URI, all the nanopublications about it are aggregated together into a single graph. This approach gives users the ability to control access to this knowledge. It also provides the ability to control the publishing workflow. Rather than publishing everything immediately, nanopublications can be contributed, curated and approved, and then finally published either individually or in collections. Knowledge graph developers can flexibly control the ways in which the entities are shown to users by their type or other constraints. We provide default views for knowledge graph authoring, including for ontology development and also allow developers to provide customized views. Our plan is to base our new enhanced Nanomine on the Whyis infrastructure to enable more flexibility and extensibility.
 
 # Nano-scale?
 
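The nanopublication layout the README describes is easy to see in miniature: an assertion graph plus companion provenance and publication-info graphs, tied together by a head graph. A minimal rdflib sketch, not part of this commit — the example.org URIs are placeholders, and only the nanopub/PROV namespaces and the rdflib Dataset API are real:

# Minimal sketch of the nanopublication layout described above; not part of
# this commit. The example.org names are illustrative placeholders.
from rdflib import Dataset, Namespace, Literal, RDF

NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")

ds = Dataset()
head = ds.graph(EX.nanopub1)                   # ties the parts together
assertion = ds.graph(EX.nanopub1_assertion)    # the smallest publishable unit
provenance = ds.graph(EX.nanopub1_provenance)  # attribution and justification
pubinfo = ds.graph(EX.nanopub1_pubinfo)        # publication status

head.add((EX.nanopub1, RDF.type, NP.Nanopublication))
head.add((EX.nanopub1, NP.hasAssertion, EX.nanopub1_assertion))
head.add((EX.nanopub1, NP.hasProvenance, EX.nanopub1_provenance))
head.add((EX.nanopub1, NP.hasPublicationInfo, EX.nanopub1_pubinfo))

assertion.add((EX.Whyis, RDF.type, EX.KnowledgeGraphFramework))
provenance.add((EX.nanopub1_assertion, PROV.wasAttributedTo, EX.jpmccu))
pubinfo.add((EX.nanopub1_assertion, EX.status, Literal("draft")))

print(ds.serialize(format="trig"))

Because each part is its own named graph, the pubinfo graph can change (draft to published) without touching the assertion, which is what lets the autonomic.py and commands.py changes below retire and revise nanopublications independently.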
9 changes: 7 additions & 2 deletions Vagrantfile
@@ -30,7 +30,7 @@ Vagrant.configure(2) do |config|
     vb.name = "whyis-dev"
     # VM HARDWARE SPECS
 
-    vb.customize ["modifyvm", :id, "--memory", "4096"]
+    vb.customize ["modifyvm", :id, "--memory", "6144"]
     vb.customize ["modifyvm", :id, "--cpus", "2"]
     vb.customize ["modifyvm", :id, "--clipboard", "bidirectional"]
     vb.customize ["modifyvm", :id, "--cpuexecutioncap", "80"]
@@ -39,6 +39,11 @@ Vagrant.configure(2) do |config|
   config.vm.network "private_network", ip: "192.168.33.36"
   config.ssh.forward_agent = true
 
+  # Needed in order to run screen
+  # https://www.vagrantup.com/docs/vagrantfile/ssh_settings.html
+  # http://stackoverflow.com/questions/27545745/start-screen-detached-in-a-vagrant-box-with-ssh-how
+  config.ssh.pty = true
+
   # Create a public network, which generally matched to bridged network.
   # Bridged networks make the machine appear as another physical device on
   # your network.
@@ -59,7 +64,7 @@ Vagrant.configure(2) do |config|
   #   vb.gui = true
   #
   #   # Customize the amount of memory on the VM:
-    vb.memory = "2048"
+    vb.memory = "6144"
   end
   #
   # View the documentation for the provider you are using for more
5 changes: 2 additions & 3 deletions agents/nlp.py
@@ -66,9 +66,8 @@ def process(self, i, o):
         if assertion is not None:
             npub.pubinfo.add((npub.assertion.identifier, prov.wasRevisionOf, assertion))
         npub.assertion.add((concept, sio.InverseDocumentFrequency, Literal(idf)))
-
-
-
+
+
 class EntityResolver(autonomic.UpdateChangeService):
     activity_class = whyis.EntityResolution
 
28 changes: 22 additions & 6 deletions autonomic.py
@@ -10,6 +10,8 @@
 
 import database
 
+import tempfile
+
 whyis = rdflib.Namespace('http://vocab.rpi.edu/whyis/')
 whyis = rdflib.Namespace('http://vocab.rpi.edu/whyis/')
 np = rdflib.Namespace("http://www.nanopub.org/nschema#")
@@ -49,7 +51,10 @@ def explain(self, nanopub, i, o):
     def getInstances(self, graph):
         if hasattr(graph.store, "nsBindings"):
             graph.store.nsBindings = {}
-        return [graph.resource(i) for i, in graph.query(self.get_query(), initNs=self.app.NS.prefixes)]
+        prefixes = self.app.NS.prefixes
+        if hasattr(self, 'prefixes'):
+            prefixes = self.prefixes
+        return [graph.resource(i) for i, in graph.query(self.get_query(), initNs=prefixes)]
 
     def process_graph(self, inputGraph):
         instances = self.getInstances(inputGraph)
@@ -386,11 +391,13 @@ def explain(self, nanopub, i, o):
         nanopub.pubinfo.add((nanopub.assertion.identifier, prov.wasAttributedTo, i.identifier))
 
     def process(self, i, o):
+
         query_store = database.create_query_store(self.app.db.store)
         db_graph = rdflib.ConjunctiveGraph(store=query_store)
         db_graph.NS = self.app.NS
         setlr.actions[whyis.sparql] = db_graph
         setl_graph = i.graph
+        #setlr.run_samples = True
         resources = setlr._setl(setl_graph)
         # retire old copies
         old_np_map = {}
@@ -407,11 +414,16 @@ def process(self, i, o):
             out = resources[output_graph]
             out_conjunctive = rdflib.ConjunctiveGraph(store=out.store, identifier=output_graph)
             #print "Generated graph", out.identifier, len(out), len(out_conjunctive)
+            nanopub_prepare_graph = rdflib.ConjunctiveGraph(store="Sleepycat")
+            nanopub_prepare_graph_tempdir = tempfile.mkdtemp()
+            nanopub_prepare_graph.store.open(nanopub_prepare_graph_tempdir, True)
+
             mappings = {}
 
-            for new_np in self.app.nanopub_manager.prepare(out_conjunctive, mappings=mappings):
+            to_publish = []
+            triples = 0
+            for new_np in self.app.nanopub_manager.prepare(out_conjunctive, mappings=mappings, store=nanopub_prepare_graph.store):
                 self.explain(new_np, i, o)
-                print "Publishing", new_np.identifier
                 orig = [orig for orig, new in mappings.items() if new == new_np.assertion.identifier]
                 if len(orig) == 0:
                     continue
@@ -421,9 +433,13 @@ def process(self, i, o):
                 new_np.pubinfo.add((new_np.assertion.identifier, prov.wasQuotedFrom, orig))
                 if orig in old_np_map:
                     new_np.pubinfo.add((new_np.assertion.identifier, prov.wasRevisionOf, old_np_map[orig]))
-                print "Nanopub assertion has", len(new_np.assertion), "statements."
-                self.app.nanopub_manager.publish(new_np)
-                print 'Published'
+                print "Publishing %s with %s assertions." % (new_np.identifier, len(new_np.assertion))
+                to_publish.append(new_np)
+
+                #triples += len(new_np)
+                #if triples > 10000:
+            self.app.nanopub_manager.publish(*to_publish)
+            print 'Published'
 
 class Deductor(UpdateChangeService):
     def __init__(self, where, construct, explanation, resource="?resource", prefixes=None): # prefixes should be
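The shape of the autonomic.py change: prepared nanopublications are staged in a throwaway disk-backed Sleepycat (Berkeley DB) store and pushed with one publish call instead of one call each. A condensed sketch of the pattern, where manager stands in for self.app.nanopub_manager and is assumed rather than shown; prepare/publish are used with the signatures visible in the diff, and the Sleepycat store needs the bsddb3 package:

# Condensed sketch of the staging-and-batching pattern above; `manager`
# is a stand-in for self.app.nanopub_manager, which this diff does not show.
import tempfile
import rdflib

def publish_prepared(manager, out_conjunctive, mappings):
    # A disk-backed staging graph keeps large prepared outputs out of RAM.
    staging = rdflib.ConjunctiveGraph(store="Sleepycat")
    staging.store.open(tempfile.mkdtemp(), True)

    to_publish = []
    for new_np in manager.prepare(out_conjunctive, mappings=mappings,
                                  store=staging.store):
        to_publish.append(new_np)

    # One publish round-trip for the whole batch, matching the new
    # varargs publish(data, *graphs) signature in database.py below.
    manager.publish(*to_publish)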
18 changes: 14 additions & 4 deletions commands.py
@@ -13,6 +13,7 @@
 import rdflib
 from nanopub import Nanopublication
 from cookiecutter.main import cookiecutter
+import tempfile
 
 np = rdflib.Namespace("http://www.nanopub.org/nschema#")
 
@@ -51,22 +52,31 @@ def run(self, input_file, file_format="trig", was_revision_of=None):
                 print "Could not find active nanopublication to revise:", was_revision_of
                 return
             was_revision_of = wasRevisionOf
-        g = rdflib.ConjunctiveGraph(identifier=rdflib.BNode().skolemize())
+        g = rdflib.ConjunctiveGraph(identifier=rdflib.BNode().skolemize(), store="Sleepycat")
+        graph_tempdir = tempfile.mkdtemp()
+        g.store.open(graph_tempdir, True)
+        #g = rdflib.ConjunctiveGraph(identifier=rdflib.BNode().skolemize())
 
         g1 = g.parse(location=input_file, format=file_format, publicID=flask.current_app.NS.local)
-        if len(list(g1.subjects(rdflib.RDF.type, np.Nanopublication))) == 0:
+        if len(list(g.subjects(rdflib.RDF.type, np.Nanopublication))) == 0:
             print "Could not find existing nanopublications.", len(g1), len(g)
             new_np = Nanopublication(store=g1.store)
             new_np.add((new_np.identifier, rdflib.RDF.type, np.Nanopublication))
             new_np.add((new_np.identifier, np.hasAssertion, g1.identifier))
             new_np.add((g1.identifier, rdflib.RDF.type, np.Assertion))
 
-        for npub in flask.current_app.nanopub_manager.prepare(g):
+        nanopub_prepare_graph = rdflib.ConjunctiveGraph(store="Sleepycat")
+        nanopub_prepare_graph_tempdir = tempfile.mkdtemp()
+        nanopub_prepare_graph.store.open(nanopub_prepare_graph_tempdir, True)
+        nanopubs = []
+        for npub in flask.current_app.nanopub_manager.prepare(g, store=nanopub_prepare_graph.store):
             if was_revision_of is not None:
                 for r in was_revision_of:
                     print "Marking as revision of", r
                     npub.pubinfo.add((npub.assertion.identifier, flask.current_app.NS.prov.wasRevisionOf, r))
-            flask.current_app.nanopub_manager.publish(npub)
+            print 'Prepared', npub.identifier
+            nanopubs.append(npub)
+        flask.current_app.nanopub_manager.publish(*nanopubs)
         print "Published", npub.identifier
 
 class RetireNanopub(Command):
8 changes: 7 additions & 1 deletion config-template/{{cookiecutter.location}}/config.py
@@ -27,6 +27,10 @@
 
     site_name = "{{cookiecutter.project_name}}",
 
+    site_header_image = '/static/images/random_network.png',
+
+    site_description = '',
+
     root_path = '/apps/whyis',
 
     # use TESTING mode?
@@ -150,7 +154,9 @@
 #        service=autonomic.Crawler(predicates=[skos.broader, skos.narrower, skos.related]),
 #        schedule=dict(hour="1")
 #    )
-    ]
+    ],
+
+    base_rate_probability = 0.5
 )
 
 
12 changes: 8 additions & 4 deletions config_defaults.py
@@ -4,7 +4,7 @@
 import logging
 from datetime import timedelta
 
-project_name = "satoru"
+project_name = "whyis"
 import importer
 
 import autonomic
@@ -23,7 +23,7 @@
     # use DEBUG mode?
     DEBUG = False,
 
-    site_name = "Satoru Knowledge Graph",
+    site_name = "Whyis Knowledge Graph",
 
     # use TESTING mode?
     TESTING = False,
@@ -43,9 +43,13 @@
         'depot.storage_path' : '/data/files'
     },
     vocab_file = "vocab.ttl",
-    SATORU_TEMPLATE_DIR = None,
-    SATORU_CDN_DIR = None,
+    WHYIS_TEMPLATE_DIR = None,
+    WHYIS_CDN_DIR = None,
+
+    DEFAULT_ANONYMOUS_READ = False,
+
+    site_header_image = '/static/images/random_network.png',
 
     # LOGGING
     LOGGER_NAME = "%s_log" % project_name,
     LOG_FILENAME = "/var/log/%s/output-%s.log" % (project_name,str(datetime.now()).replace(' ','_')),
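Read together, config_defaults.py and the cookiecutter template show the intended override pattern: a generated project ships a config.py whose dict entries shadow these defaults. A hypothetical excerpt of such a project config — only the key names come from the diffs above; everything else, including the Config name, is illustrative:

# Hypothetical project config.py excerpt; key names are from the diffs
# above, the surrounding structure is assumed.
Config = dict(
    site_name = "My Knowledge Graph",
    site_header_image = '/static/images/random_network.png',
    site_description = 'A Whyis-backed knowledge graph.',
    root_path = '/apps/whyis',

    DEFAULT_ANONYMOUS_READ = False,   # new default above
    WHYIS_TEMPLATE_DIR = None,        # renamed from SATORU_TEMPLATE_DIR
    WHYIS_CDN_DIR = None,             # renamed from SATORU_CDN_DIR

    base_rate_probability = 0.5,      # new knob in the template
)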
7 changes: 4 additions & 3 deletions database.py
@@ -39,7 +39,7 @@ def engine_from_config(config, prefix):
                                default_query_method=POST,
                                returnFormat=JSON,
                                node_to_sparql=node_to_sparql)
-        def publish(data, graphs):
+        def publish(data, *graphs):
             s = requests.session()
             s.keep_alive = False
             result = s.post(store.endpoint,
@@ -59,8 +59,9 @@ def publish(data, graphs):
             graph.store.open(config[prefix+"store"], create=True)
     else:
         graph = ConjunctiveGraph(identifier=defaultgraph)
-        def publish(data, graphs):
-            graph.addN(nanopub.quads())
+        def publish(data, *graphs):
+            for nanopub in graphs:
+                graph.addN(nanopub.quads())
     graph.store.publish = publish
 
     return graph
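The signature change here is more than cosmetic: the old fallback body referenced a nanopub name that was never bound, so publishing through the non-SPARQL branch would have raised a NameError. The new varargs form fixes that and lets callers hand over a whole batch at once. A self-contained sketch of the closure, mirroring the fallback branch above outside engine_from_config; rdflib is real, the rest is illustrative:

# Self-contained sketch of the varargs publish closure above.
from rdflib import ConjunctiveGraph

def make_publish(graph):
    def publish(data, *graphs):
        # Each argument is a prepared nanopublication graph; addN()
        # ingests its quads into the backing store in one pass.
        for nanopub in graphs:
            graph.addN(nanopub.quads())
    return publish

graph = ConjunctiveGraph()
graph.store.publish = make_publish(graph)  # same monkey-patch as the diff
# callers can then batch: graph.store.publish(None, np1, np2, np3)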
47 changes: 46 additions & 1 deletion default_vocab.ttl
@@ -358,6 +358,9 @@ whyis:hasLabel dc:identifier "label";
 
 rdfs:Resource whyis:hasView "resource_view.html";
     whyis:hasRelated "related_nodes.json";
+    whyis:hasOutgoing "outgoing_resource.json";
+    whyis:hasIncoming "incoming_resource.json";
+    whyis:hasExplore "explore.html";
     whyis:hasDescribe "describe.json";
     whyis:hasNanopublications "nanopublications.json";
     whyis:hasLabel "label_view.html".
@@ -374,10 +377,52 @@ owl:Ontology rdfs:label "Ontology";
 np:Nanopublication a owl:Class;
     whyis:hasView "nanopublication_view.html".
 
+# <search> a whyis:searchView.
+
+# whyis:searchView whyis:hasView "search.html".
+
+# <searchView> a whyis:searchView.
+
+# whyis:searchView whyis:hasView "search-view.html".
+
+<searchApi> a whyis:searchApi .
+
+whyis:searchApi whyis:hasView "search-api.json".
+
+<search> a whyis:search .
+
+whyis:search whyis:hasView "search-view.html".
+
+# whyis:search whyis:hasView "search-view.html";
+#     whyis:searchApi "search-api.json".
+
+# whyis:searchApi rdfs:subPropertyOf whyis:hasView;
+#     dc:identifier "searchApi".
 
 rdfs:label flaskld:fieldName "name" .
 
 <Home> a whyis:HomePage.
 
 whyis:HomePage a owl:Class;
-    whyis:hasView "home_view.html".
+    whyis:hasView "home_view.html";
+    whyis:latestView "latest.json";
+    whyis:resolveView "resolve.json".
+
+whyis:latestView rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "latest".
+
+<https://www.iana.org/assignments/media-types/text/csv>
+    whyis:bipartiteView "bipartite_graph.svg".
+
+whyis:bipartiteView rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "bipartite".
+
+whyis:resolveView rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "resolve".
+
+whyis:hasExplore rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "explore".
+whyis:hasOutgoing rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "outgoing".
+whyis:hasIncoming rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "incoming".
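The view wiring above hangs everything off whyis:hasView: subproperties such as whyis:latestView or whyis:hasIncoming register alternate renderings of a class, each tagged with a dc:identifier the server can select on. An illustrative lookup of every template registered for rdfs:Resource — a sketch of the idea the subproperty hierarchy enables, not Whyis's actual dispatch code, and it assumes default_vocab.ttl is on disk:

# Illustrative view lookup over the vocabulary above; not Whyis's
# real dispatch code.
import rdflib

vocab = rdflib.Graph().parse("default_vocab.ttl", format="turtle")

QUERY = """
PREFIX whyis: <http://vocab.rpi.edu/whyis/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?viewProperty ?template WHERE {
    ?viewProperty rdfs:subPropertyOf* whyis:hasView .
    ?class ?viewProperty ?template .
}
"""

for row in vocab.query(QUERY, initBindings={'class': rdflib.RDFS.Resource}):
    print(row.viewProperty, row.template)  # e.g. whyis:hasExplore explore.html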