Merge pull request #31 from tetherless-world/master

Whyis 1.0 release
jpmccu authored Apr 7, 2018
2 parents df09e97 + c6adcc9 commit 03b48d3
Showing 41 changed files with 7,488 additions and 183 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,3 +1,6 @@
+# Emacs backup files
+*~
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -15,7 +18,7 @@ develop-eggs/
 downloads/
 eggs/
 .eggs/
-lib/
+#lib/
 lib64/
 parts/
 sdist/
10 changes: 5 additions & 5 deletions README.md
@@ -1,11 +1,11 @@
-# Satoru
+# Whyis
 
-Satoru is a nano-scale knowledge graph publishing, management, and analysis framework.
-Satoru aims to support domain-aware management and curation of knowledge from many different sources. Its primary goal is to enable creation of useful domain- and data-driven knowledge graphs. Knowledge can be contributed and managed through direct user interaction, statistical analysis, or data ingestion from many different kinds of data sources. Every contribution to the knowledge graph is managed as a separate entity so that its provenance (publication status, attribution, and justification) is transparent and can be managed and used.
+Whyis is a nano-scale knowledge graph publishing, management, and analysis framework.
+Whyis aims to support domain-aware management and curation of knowledge from many different sources. Its primary goal is to enable creation of useful domain- and data-driven knowledge graphs. Knowledge can be contributed and managed through direct user interaction, statistical analysis, or data ingestion from many different kinds of data sources. Every contribution to the knowledge graph is managed as a separate entity so that its provenance (publication status, attribution, and justification) is transparent and can be managed and used.
 
-Satoru manages its fragments of knowledge as nanopublications, which can be viewed as the smallest publishable unit. They are fragments of knowledge graphs that have secondary graphs associated with them to contain provenance and publication information. Knowledge graph systems need to manage the provenance of its contents. By using existing recommended standards like RDF, OWL, and SPARQL, nanopublications are able to provide flexible expansion and integration options without the limitations of custom database management tools. They also have the flexibility to capture any level of granularity of information required by the application.
+Whyis manages its fragments of knowledge as nanopublications, which can be viewed as the smallest publishable unit. They are fragments of knowledge graphs that have secondary graphs associated with them to contain provenance and publication information. Knowledge graph systems need to manage the provenance of its contents. By using existing recommended standards like RDF, OWL, and SPARQL, nanopublications are able to provide flexible expansion and integration options without the limitations of custom database management tools. They also have the flexibility to capture any level of granularity of information required by the application.
 
-Every entity in the resource is visible through its own Uniform Resource Identifier (URI), and is available as machine-readable linked data. When a user accesses the URI, all the nanopublications about it are aggregated together into a single graph. This approach gives users the ability to control access to this knowledge. It also provides the ability to control the publishing workflow. Rather than publishing everything immediately, nanopublications can be contributed, curated and approved, and then finally published either individually or in collections. Knowledge graph developers can flexibly control the ways in which the entities are shown to users by their type or other constraints. We provide default views for knowledge graph authoring, including for ontology development and also allow developers to provide customized views. Our plan is to base our new enhanced Nanomine on the Satoru infrastructure to enable more flexibility and extensibility.
+Every entity in the resource is visible through its own Uniform Resource Identifier (URI), and is available as machine-readable linked data. When a user accesses the URI, all the nanopublications about it are aggregated together into a single graph. This approach gives users the ability to control access to this knowledge. It also provides the ability to control the publishing workflow. Rather than publishing everything immediately, nanopublications can be contributed, curated and approved, and then finally published either individually or in collections. Knowledge graph developers can flexibly control the ways in which the entities are shown to users by their type or other constraints. We provide default views for knowledge graph authoring, including for ontology development and also allow developers to provide customized views. Our plan is to base our new enhanced Nanomine on the Whyis infrastructure to enable more flexibility and extensibility.
 
 # Nano-scale?
 
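The nanopublication layout the README describes is easy to see in miniature: an assertion graph plus companion provenance and publication-info graphs, tied together by a head graph. A minimal rdflib sketch, not part of this commit — the example.org URIs are placeholders, and only the nanopub/PROV namespaces and the rdflib Dataset API are real:

# Minimal sketch of the nanopublication layout described above; not part of
# this commit. The example.org names are illustrative placeholders.
from rdflib import Dataset, Namespace, Literal, RDF

NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")

ds = Dataset()
head = ds.graph(EX.nanopub1)                   # ties the parts together
assertion = ds.graph(EX.nanopub1_assertion)    # the smallest publishable unit
provenance = ds.graph(EX.nanopub1_provenance)  # attribution and justification
pubinfo = ds.graph(EX.nanopub1_pubinfo)        # publication status

head.add((EX.nanopub1, RDF.type, NP.Nanopublication))
head.add((EX.nanopub1, NP.hasAssertion, EX.nanopub1_assertion))
head.add((EX.nanopub1, NP.hasProvenance, EX.nanopub1_provenance))
head.add((EX.nanopub1, NP.hasPublicationInfo, EX.nanopub1_pubinfo))

assertion.add((EX.Whyis, RDF.type, EX.KnowledgeGraphFramework))
provenance.add((EX.nanopub1_assertion, PROV.wasAttributedTo, EX.jpmccu))
pubinfo.add((EX.nanopub1_assertion, EX.status, Literal("draft")))

print(ds.serialize(format="trig"))

Because each part is its own named graph, the pubinfo graph can change (draft to published) without touching the assertion, which is what lets the autonomic.py and commands.py changes below retire and revise nanopublications independently.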
9 changes: 7 additions & 2 deletions Vagrantfile
@@ -30,7 +30,7 @@ Vagrant.configure(2) do |config|
     vb.name = "whyis-dev"
     # VM HARDWARE SPECS
 
-    vb.customize ["modifyvm", :id, "--memory", "4096"]
+    vb.customize ["modifyvm", :id, "--memory", "6144"]
     vb.customize ["modifyvm", :id, "--cpus", "2"]
     vb.customize ["modifyvm", :id, "--clipboard", "bidirectional"]
     vb.customize ["modifyvm", :id, "--cpuexecutioncap", "80"]
@@ -39,6 +39,11 @@ Vagrant.configure(2) do |config|
   config.vm.network "private_network", ip: "192.168.33.36"
   config.ssh.forward_agent = true
 
+  # Needed in order to run screen
+  # https://www.vagrantup.com/docs/vagrantfile/ssh_settings.html
+  # http://stackoverflow.com/questions/27545745/start-screen-detached-in-a-vagrant-box-with-ssh-how
+  config.ssh.pty = true
+
   # Create a public network, which generally matched to bridged network.
   # Bridged networks make the machine appear as another physical device on
   # your network.
@@ -59,7 +64,7 @@ Vagrant.configure(2) do |config|
   #   vb.gui = true
   #
   #   # Customize the amount of memory on the VM:
-    vb.memory = "2048"
+    vb.memory = "6144"
   end
   #
   # View the documentation for the provider you are using for more
5 changes: 2 additions & 3 deletions agents/nlp.py
@@ -66,9 +66,8 @@ def process(self, i, o):
         if assertion is not None:
             npub.pubinfo.add((npub.assertion.identifier, prov.wasRevisionOf, assertion))
         npub.assertion.add((concept, sio.InverseDocumentFrequency, Literal(idf)))
-
-
-
+
+
 class EntityResolver(autonomic.UpdateChangeService):
     activity_class = whyis.EntityResolution
 
28 changes: 22 additions & 6 deletions autonomic.py
@@ -10,6 +10,8 @@
 
 import database
 
+import tempfile
+
 whyis = rdflib.Namespace('http://vocab.rpi.edu/whyis/')
 whyis = rdflib.Namespace('http://vocab.rpi.edu/whyis/')
 np = rdflib.Namespace("http://www.nanopub.org/nschema#")
@@ -49,7 +51,10 @@ def explain(self, nanopub, i, o):
     def getInstances(self, graph):
         if hasattr(graph.store, "nsBindings"):
             graph.store.nsBindings = {}
-        return [graph.resource(i) for i, in graph.query(self.get_query(), initNs=self.app.NS.prefixes)]
+        prefixes = self.app.NS.prefixes
+        if hasattr(self, 'prefixes'):
+            prefixes = self.prefixes
+        return [graph.resource(i) for i, in graph.query(self.get_query(), initNs=prefixes)]
 
     def process_graph(self, inputGraph):
         instances = self.getInstances(inputGraph)
@@ -386,11 +391,13 @@ def explain(self, nanopub, i, o):
         nanopub.pubinfo.add((nanopub.assertion.identifier, prov.wasAttributedTo, i.identifier))
 
     def process(self, i, o):
+
         query_store = database.create_query_store(self.app.db.store)
         db_graph = rdflib.ConjunctiveGraph(store=query_store)
         db_graph.NS = self.app.NS
         setlr.actions[whyis.sparql] = db_graph
         setl_graph = i.graph
+        #setlr.run_samples = True
         resources = setlr._setl(setl_graph)
         # retire old copies
         old_np_map = {}
@@ -407,11 +414,16 @@ def process(self, i, o):
             out = resources[output_graph]
             out_conjunctive = rdflib.ConjunctiveGraph(store=out.store, identifier=output_graph)
             #print "Generated graph", out.identifier, len(out), len(out_conjunctive)
+            nanopub_prepare_graph = rdflib.ConjunctiveGraph(store="Sleepycat")
+            nanopub_prepare_graph_tempdir = tempfile.mkdtemp()
+            nanopub_prepare_graph.store.open(nanopub_prepare_graph_tempdir, True)
+
             mappings = {}
 
-            for new_np in self.app.nanopub_manager.prepare(out_conjunctive, mappings=mappings):
+            to_publish = []
+            triples = 0
+            for new_np in self.app.nanopub_manager.prepare(out_conjunctive, mappings=mappings, store=nanopub_prepare_graph.store):
                 self.explain(new_np, i, o)
-                print "Publishing", new_np.identifier
                 orig = [orig for orig, new in mappings.items() if new == new_np.assertion.identifier]
                 if len(orig) == 0:
                     continue
@@ -421,9 +433,13 @@ def process(self, i, o):
                 new_np.pubinfo.add((new_np.assertion.identifier, prov.wasQuotedFrom, orig))
                 if orig in old_np_map:
                     new_np.pubinfo.add((new_np.assertion.identifier, prov.wasRevisionOf, old_np_map[orig]))
-                print "Nanopub assertion has", len(new_np.assertion), "statements."
-                self.app.nanopub_manager.publish(new_np)
-                print 'Published'
+                print "Publishing %s with %s assertions." % (new_np.identifier, len(new_np.assertion))
+                to_publish.append(new_np)
+
+                #triples += len(new_np)
+                #if triples > 10000:
+            self.app.nanopub_manager.publish(*to_publish)
+            print 'Published'
 
 class Deductor(UpdateChangeService):
     def __init__(self, where, construct, explanation, resource="?resource", prefixes=None): # prefixes should be
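The shape of the autonomic.py change: prepared nanopublications are staged in a throwaway disk-backed Sleepycat (Berkeley DB) store and pushed with one publish call instead of one call each. A condensed sketch of the pattern, where manager stands in for self.app.nanopub_manager and is assumed rather than shown; prepare/publish are used with the signatures visible in the diff, and the Sleepycat store needs the bsddb3 package:

# Condensed sketch of the staging-and-batching pattern above; `manager`
# is a stand-in for self.app.nanopub_manager, which this diff does not show.
import tempfile
import rdflib

def publish_prepared(manager, out_conjunctive, mappings):
    # A disk-backed staging graph keeps large prepared outputs out of RAM.
    staging = rdflib.ConjunctiveGraph(store="Sleepycat")
    staging.store.open(tempfile.mkdtemp(), True)

    to_publish = []
    for new_np in manager.prepare(out_conjunctive, mappings=mappings,
                                  store=staging.store):
        to_publish.append(new_np)

    # One publish round-trip for the whole batch, matching the new
    # varargs publish(data, *graphs) signature in database.py below.
    manager.publish(*to_publish)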
18 changes: 14 additions & 4 deletions commands.py
@@ -13,6 +13,7 @@
 import rdflib
 from nanopub import Nanopublication
 from cookiecutter.main import cookiecutter
+import tempfile
 
 np = rdflib.Namespace("http://www.nanopub.org/nschema#")
 
@@ -51,22 +52,31 @@ def run(self, input_file, file_format="trig", was_revision_of=None):
                 print "Could not find active nanopublication to revise:", was_revision_of
                 return
             was_revision_of = wasRevisionOf
-        g = rdflib.ConjunctiveGraph(identifier=rdflib.BNode().skolemize())
+        g = rdflib.ConjunctiveGraph(identifier=rdflib.BNode().skolemize(), store="Sleepycat")
+        graph_tempdir = tempfile.mkdtemp()
+        g.store.open(graph_tempdir, True)
+        #g = rdflib.ConjunctiveGraph(identifier=rdflib.BNode().skolemize())
 
         g1 = g.parse(location=input_file, format=file_format, publicID=flask.current_app.NS.local)
-        if len(list(g1.subjects(rdflib.RDF.type, np.Nanopublication))) == 0:
+        if len(list(g.subjects(rdflib.RDF.type, np.Nanopublication))) == 0:
             print "Could not find existing nanopublications.", len(g1), len(g)
             new_np = Nanopublication(store=g1.store)
             new_np.add((new_np.identifier, rdflib.RDF.type, np.Nanopublication))
             new_np.add((new_np.identifier, np.hasAssertion, g1.identifier))
             new_np.add((g1.identifier, rdflib.RDF.type, np.Assertion))
 
-        for npub in flask.current_app.nanopub_manager.prepare(g):
+        nanopub_prepare_graph = rdflib.ConjunctiveGraph(store="Sleepycat")
+        nanopub_prepare_graph_tempdir = tempfile.mkdtemp()
+        nanopub_prepare_graph.store.open(nanopub_prepare_graph_tempdir, True)
+        nanopubs = []
+        for npub in flask.current_app.nanopub_manager.prepare(g, store=nanopub_prepare_graph.store):
             if was_revision_of is not None:
                 for r in was_revision_of:
                     print "Marking as revision of", r
                     npub.pubinfo.add((npub.assertion.identifier, flask.current_app.NS.prov.wasRevisionOf, r))
-            flask.current_app.nanopub_manager.publish(npub)
+            print 'Prepared', npub.identifier
+            nanopubs.append(npub)
+        flask.current_app.nanopub_manager.publish(*nanopubs)
         print "Published", npub.identifier
 
 class RetireNanopub(Command):
8 changes: 7 additions & 1 deletion config-template/{{cookiecutter.location}}/config.py
@@ -27,6 +27,10 @@
 
     site_name = "{{cookiecutter.project_name}}",
 
+    site_header_image = '/static/images/random_network.png',
+
+    site_description = '',
+
     root_path = '/apps/whyis',
 
     # use TESTING mode?
@@ -150,7 +154,9 @@
 #        service=autonomic.Crawler(predicates=[skos.broader, skos.narrower, skos.related]),
 #        schedule=dict(hour="1")
 #    )
-    ]
+    ],
+
+    base_rate_probability = 0.5
 )
 
 
12 changes: 8 additions & 4 deletions config_defaults.py
@@ -4,7 +4,7 @@
 import logging
 from datetime import timedelta
 
-project_name = "satoru"
+project_name = "whyis"
 import importer
 
 import autonomic
@@ -23,7 +23,7 @@
     # use DEBUG mode?
     DEBUG = False,
 
-    site_name = "Satoru Knowledge Graph",
+    site_name = "Whyis Knowledge Graph",
 
     # use TESTING mode?
     TESTING = False,
@@ -43,9 +43,13 @@
         'depot.storage_path' : '/data/files'
     },
     vocab_file = "vocab.ttl",
-    SATORU_TEMPLATE_DIR = None,
-    SATORU_CDN_DIR = None,
+    WHYIS_TEMPLATE_DIR = None,
+    WHYIS_CDN_DIR = None,
+
+    DEFAULT_ANONYMOUS_READ = False,
+
+    site_header_image = '/static/images/random_network.png',
 
     # LOGGING
     LOGGER_NAME = "%s_log" % project_name,
     LOG_FILENAME = "/var/log/%s/output-%s.log" % (project_name,str(datetime.now()).replace(' ','_')),
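Read together, config_defaults.py and the cookiecutter template show the intended override pattern: a generated project ships a config.py whose dict entries shadow these defaults. A hypothetical excerpt of such a project config — only the key names come from the diffs above; everything else, including the Config name, is illustrative:

# Hypothetical project config.py excerpt; key names are from the diffs
# above, the surrounding structure is assumed.
Config = dict(
    site_name = "My Knowledge Graph",
    site_header_image = '/static/images/random_network.png',
    site_description = 'A Whyis-backed knowledge graph.',
    root_path = '/apps/whyis',

    DEFAULT_ANONYMOUS_READ = False,   # new default above
    WHYIS_TEMPLATE_DIR = None,        # renamed from SATORU_TEMPLATE_DIR
    WHYIS_CDN_DIR = None,             # renamed from SATORU_CDN_DIR

    base_rate_probability = 0.5,      # new knob in the template
)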
7 changes: 4 additions & 3 deletions database.py
@@ -39,7 +39,7 @@ def engine_from_config(config, prefix):
                                default_query_method=POST,
                                returnFormat=JSON,
                                node_to_sparql=node_to_sparql)
-        def publish(data, graphs):
+        def publish(data, *graphs):
             s = requests.session()
             s.keep_alive = False
             result = s.post(store.endpoint,
@@ -59,8 +59,9 @@ def publish(data, graphs):
             graph.store.open(config[prefix+"store"], create=True)
     else:
         graph = ConjunctiveGraph(identifier=defaultgraph)
-        def publish(data, graphs):
-            graph.addN(nanopub.quads())
+        def publish(data, *graphs):
+            for nanopub in graphs:
+                graph.addN(nanopub.quads())
     graph.store.publish = publish
 
     return graph
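The signature change here is more than cosmetic: the old fallback body referenced a nanopub name that was never bound, so publishing through the non-SPARQL branch would have raised a NameError. The new varargs form fixes that and lets callers hand over a whole batch at once. A self-contained sketch of the closure, mirroring the fallback branch above outside engine_from_config; rdflib is real, the rest is illustrative:

# Self-contained sketch of the varargs publish closure above.
from rdflib import ConjunctiveGraph

def make_publish(graph):
    def publish(data, *graphs):
        # Each argument is a prepared nanopublication graph; addN()
        # ingests its quads into the backing store in one pass.
        for nanopub in graphs:
            graph.addN(nanopub.quads())
    return publish

graph = ConjunctiveGraph()
graph.store.publish = make_publish(graph)  # same monkey-patch as the diff
# callers can then batch: graph.store.publish(None, np1, np2, np3)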
47 changes: 46 additions & 1 deletion default_vocab.ttl
@@ -358,6 +358,9 @@ whyis:hasLabel dc:identifier "label";
 
 rdfs:Resource whyis:hasView "resource_view.html";
     whyis:hasRelated "related_nodes.json";
+    whyis:hasOutgoing "outgoing_resource.json";
+    whyis:hasIncoming "incoming_resource.json";
+    whyis:hasExplore "explore.html";
     whyis:hasDescribe "describe.json";
     whyis:hasNanopublications "nanopublications.json";
     whyis:hasLabel "label_view.html".
@@ -374,10 +377,52 @@ owl:Ontology rdfs:label "Ontology";
 np:Nanopublication a owl:Class;
     whyis:hasView "nanopublication_view.html".
 
+# <search> a whyis:searchView.
+
+# whyis:searchView whyis:hasView "search.html".
+
+# <searchView> a whyis:searchView.
+
+# whyis:searchView whyis:hasView "search-view.html".
+
+<searchApi> a whyis:searchApi .
+
+whyis:searchApi whyis:hasView "search-api.json".
+
+<search> a whyis:search .
+
+whyis:search whyis:hasView "search-view.html".
+
+# whyis:search whyis:hasView "search-view.html";
+#     whyis:searchApi "search-api.json".
+
+# whyis:searchApi rdfs:subPropertyOf whyis:hasView;
+#     dc:identifier "searchApi".
 
 rdfs:label flaskld:fieldName "name" .
 
 <Home> a whyis:HomePage.
 
 whyis:HomePage a owl:Class;
-    whyis:hasView "home_view.html".
+    whyis:hasView "home_view.html";
+    whyis:latestView "latest.json";
+    whyis:resolveView "resolve.json".
+
+whyis:latestView rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "latest".
+
+<https://www.iana.org/assignments/media-types/text/csv>
+    whyis:bipartiteView "bipartite_graph.svg".
+
+whyis:bipartiteView rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "bipartite".
+
+whyis:resolveView rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "resolve".
+
+whyis:hasExplore rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "explore".
+whyis:hasOutgoing rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "outgoing".
+whyis:hasIncoming rdfs:subPropertyOf whyis:hasView;
+    dc:identifier "incoming".
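The view wiring above hangs everything off whyis:hasView: subproperties such as whyis:latestView or whyis:hasIncoming register alternate renderings of a class, each tagged with a dc:identifier the server can select on. An illustrative lookup of every template registered for rdfs:Resource — a sketch of the idea the subproperty hierarchy enables, not Whyis's actual dispatch code, and it assumes default_vocab.ttl is on disk:

# Illustrative view lookup over the vocabulary above; not Whyis's
# real dispatch code.
import rdflib

vocab = rdflib.Graph().parse("default_vocab.ttl", format="turtle")

QUERY = """
PREFIX whyis: <http://vocab.rpi.edu/whyis/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?viewProperty ?template WHERE {
    ?viewProperty rdfs:subPropertyOf* whyis:hasView .
    ?class ?viewProperty ?template .
}
"""

for row in vocab.query(QUERY, initBindings={'class': rdflib.RDFS.Resource}):
    print(row.viewProperty, row.template)  # e.g. whyis:hasExplore explore.html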