Upload initial coal-sds project
lewismc committed Mar 9, 2017
1 parent b4d6381 commit e5d4828
Showing 204 changed files with 31,220 additions and 1 deletion.
46 changes: 45 additions & 1 deletion README.md
@@ -1,2 +1,46 @@
# coal-sds
An Apache OODT-powered Science Data System for COAL

An [Apache OODT](http://oodt.apache.org)-powered Science Data System (SDS) for [COAL](https://github.com/capstone-coal).

# Introduction
coal-sds is an end-to-end SDS capable of managing the data lifecycle
(acquisition, cataloging, archival, retrieval, processing, etc.) required for COAL.
The [Apache OODT](http://oodt.apache.org)-powered SDS consists of
several components which, when run as services, allow users to explore COAL in its entirety.

# Requirements
* Java Development Kit (JDK) 1.8+
* JAVA_HOME set
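
Both requirements can be checked before building with a small shell function. This is a minimal sketch and not part of coal-sds: the name `check_java_home` is hypothetical, and it only confirms that `JAVA_HOME` points at a directory containing an executable `bin/java`. It does not inspect the JDK version.

```shell
# Hypothetical helper (not shipped with coal-sds): verify that JAVA_HOME
# is set and points at a directory containing an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable found at $JAVA_HOME/bin/java" >&2
    return 1
  fi
  echo "JAVA_HOME looks usable: $JAVA_HOME"
}
```

Running `check_java_home && "$JAVA_HOME/bin/java" -version` afterwards prints the JDK version, which should be 1.8 or later.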

# Installation

## Build OODT
```
$ mvn clean package <OPTIONAL PROFILES> # see optional build profiles below
```
Efficient cataloging is typically achieved by passing the `-Pfm-solr-catalog` option,
which persists all data flowing into the SDS to [Apache Solr](http://lucene.apache.org/solr).
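
To make the substitution for `<OPTIONAL PROFILES>` concrete, the sketch below prints the Maven command for a given profile. The helper name `oodt_build_cmd` is hypothetical and exists only for illustration; the only profile it is shown with is `fm-solr-catalog`, the one named in this README.

```shell
# Hypothetical helper: print the Maven build command, appending -P<profile>
# when a profile name is passed as the first argument.
oodt_build_cmd() {
  if [ -n "${1:-}" ]; then
    echo "mvn clean package -P$1"
  else
    echo "mvn clean package"
  fi
}

oodt_build_cmd                   # default build (Lucene-backed filemgr)
oodt_build_cmd fm-solr-catalog   # Solr-backed catalog
```

The printed command is what you would run from the project root in place of the placeholder form above.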

## Deploy OODT
```
$ tar zxf distribution/target/${PROJECT_ARTIFACT_ID}-distribution-*-bin.tar.gz -C /my/deployment/directory/oodt
```
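
Note that `tar -C` changes into an existing directory and does not create it, so the deployment directory has to exist before extraction. A minimal sketch of a wrapper that handles this is shown below; the function name `deploy_oodt` is hypothetical, and the tarball and target paths are whatever you pass in, for example `deploy_oodt distribution/target/<tarball> /my/deployment/directory/oodt`.

```shell
# Hypothetical helper: create the deployment directory if needed, then
# extract the distribution tarball into it.
#   $1 = path to the *-bin.tar.gz distribution
#   $2 = deployment directory
deploy_oodt() {
  tarball=$1
  dest=$2
  if [ ! -f "$tarball" ]; then
    echo "distribution not found: $tarball" >&2
    return 1
  fi
  mkdir -p "$dest" && tar zxf "$tarball" -C "$dest"
}
```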
---
NOTE: For other build configurations, add the following arguments:

* (default): bin, crawler, data, extensions, filemgr (Lucene), logs, pcs, resmgr, tomcat, workflow, pge
* `-Pfm-solr-catalog`: default components, filemgr (Solr), solr, tomcat/webapps/solr

# Run
```
$ cd /my/deployment/directory/oodt
$ cd bin
$ ./oodt start
```

# License
coal-sds is licensed permissively under the [Apache License v2.0](https://www.apache.org/licenses/LICENSE-2.0),
a copy of which ships with this code.
83 changes: 83 additions & 0 deletions crawler/pom.xml
@@ -0,0 +1,83 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.github.capstone-coal</groupId>
<artifactId>coal-sds</artifactId>
<version>0.1</version>
<relativePath>../pom.xml</relativePath>
</parent>
<name>Crawler (Apache OODT)</name>
<artifactId>coal-sds-crawler</artifactId>
<packaging>jar</packaging>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.2-beta-2</version>
<configuration>
<descriptors>
<descriptor>src/main/assembly/assembly.xml</descriptor>
</descriptors>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>

<dependencies>
<dependency>
<groupId>com.github.capstone-coal</groupId>
<artifactId>coal-sds-extensions</artifactId>
<version>${project.parent.version}</version>
<type>jar</type>
<scope>runtime</scope>
<exclusions>
<exclusion>
<groupId>org.apache.oodt</groupId>
<artifactId>cas-filemgr</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.oodt</groupId>
<artifactId>cas-crawler</artifactId>
<version>${oodt.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.2</version>
<scope>test</scope>
</dependency>
</dependencies>

</project>
75 changes: 75 additions & 0 deletions crawler/src/main/assembly/assembly.xml
@@ -0,0 +1,75 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<assembly>
<id>bin</id>
<formats>
<format>tar.gz</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<baseDirectory>crawler</baseDirectory>
<includeSiteDirectory>false</includeSiteDirectory>
<fileSets>
<fileSet>
<directory>${basedir}</directory>
<outputDirectory>.</outputDirectory>
<includes>
<include>LICENSE.txt</include>
<include>CHANGES.txt</include>
</includes>
</fileSet>
<fileSet>
<directory>${basedir}/src/main/resources/bin</directory>
<outputDirectory>crawler/bin</outputDirectory>
<includes/>
<fileMode>755</fileMode>
</fileSet>
<fileSet>
<directory>${basedir}/src/main/resources/logs</directory>
<outputDirectory>crawler/logs</outputDirectory>
<includes>
<include>REMOVE.log</include>
</includes>
</fileSet>
<fileSet>
<directory>${basedir}/src/main/resources/etc</directory>
<outputDirectory>crawler/etc</outputDirectory>
<includes>
<include>**.properties</include>
</includes>
</fileSet>
<fileSet>
<directory>${basedir}/src/main/resources/policy</directory>
<outputDirectory>crawler/policy</outputDirectory>
<includes/>
</fileSet>
<fileSet>
<directory>target/site/apidocs</directory>
<filtered>false</filtered>
<outputDirectory>doc</outputDirectory>
<excludes/>
</fileSet>
</fileSets>
<dependencySets>
<dependencySet>
<outputDirectory>crawler/lib</outputDirectory>
<unpack>false</unpack>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveDependencies>true</useTransitiveDependencies>
<unpackOptions/>
</dependencySet>
</dependencySets>
</assembly>
18 changes: 18 additions & 0 deletions crawler/src/main/resources/bin/crawlctl
@@ -0,0 +1,18 @@
#!/bin/sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE.txt file distributed with
# this work for additional information regarding copyright ownership. The ASF
# licenses this file to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
##########################################################################

java -Djava.util.logging.config.file=../etc/logging.properties -Djava.ext.dirs=../lib org.apache.oodt.cas.crawl.daemon.CrawlDaemonController "$@"
76 changes: 76 additions & 0 deletions crawler/src/main/resources/bin/crawler_launcher
@@ -0,0 +1,76 @@
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# OS specific support. $var _must_ be set to either true or false.
cygwin=false
os400=false
darwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
OS400*) os400=true;;
Darwin*) darwin=true;;
esac

# resolve links - $0 may be a softlink
PRG="$0"

while [ -h "$PRG" ]; do
ls=`ls -ld "$PRG"`
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
PRG=`dirname "$PRG"`/"$link"
fi
done

# Get standard environment variables
PRGDIR=`dirname "$PRG"`

# Only set OODT_HOME if not already set
[ -z "$OODT_HOME" ] && OODT_HOME=`cd "$PRGDIR/../.." ; pwd`

# Get OODT environment set up
if [ -r "$OODT_HOME"/bin/env.sh ]; then
. "$OODT_HOME"/bin/env.sh
fi

# Only set CRAWLER_HOME if not already set
if [ -z "$CRAWLER_HOME" ]; then
CRAWLER_HOME="$OODT_HOME"/crawler
export CRAWLER_HOME
fi

# For Cygwin, ensure paths are in UNIX format before anything is touched
if $cygwin; then
[ -n "$JAVA_HOME" ] && JAVA_HOME=`cygpath --unix "$JAVA_HOME"`
[ -n "$JRE_HOME" ] && JRE_HOME=`cygpath --unix "$JRE_HOME"`
[ -n "$OODT_HOME" ] && OODT_HOME=`cygpath --unix "$OODT_HOME"`
[ -n "$CRAWLER_HOME" ] && CRAWLER_HOME=`cygpath --unix "$CRAWLER_HOME"`
[ -n "$CLASSPATH" ] && CLASSPATH=`cygpath --path --unix "$CLASSPATH"`
fi

# In case this script was run from somewhere else cd to this directory
cd "$CRAWLER_HOME"/bin

"$_RUNJAVA" $JAVA_OPTS $OODT_OPTS \
-Djava.ext.dirs="$CRAWLER_HOME"/lib \
-Djava.util.logging.config.file="$CRAWLER_HOME"/etc/logging.properties \
-Dorg.apache.oodt.cas.crawl.bean.repo=file:"$CRAWLER_HOME"/policy/crawler-config.xml \
-Dorg.apache.oodt.cas.cli.action.spring.config=file:"$CRAWLER_HOME"/policy/cmd-line-actions.xml \
-Dorg.apache.oodt.cas.cli.option.spring.config=file:"$CRAWLER_HOME"/policy/cmd-line-options.xml \
org.apache.oodt.cas.crawl.CrawlerLauncher "$@"
63 changes: 63 additions & 0 deletions crawler/src/main/resources/etc/logging.properties
@@ -0,0 +1,63 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE.txt file distributed with
# this work for additional information regarding copyright ownership. The ASF
# licenses this file to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


# Specify the handlers to create in the root logger
# (all loggers are children of the root logger)
# The following creates two handlers
handlers = java.util.logging.ConsoleHandler, java.util.logging.FileHandler

# Set the default logging level for the root logger
.level = ALL

# Set the default logging level for new ConsoleHandler and FileHandler instances
java.util.logging.ConsoleHandler.level = ALL
java.util.logging.FileHandler.level = ALL

# Set the default formatter for new ConsoleHandler instances
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# Log files are written to ../logs, relative to the JVM's working directory.
java.util.logging.FileHandler.pattern = ../logs/cas_crawler%g.log
java.util.logging.FileHandler.limit = 50000
java.util.logging.FileHandler.count = 5
java.util.logging.FileHandler.append = true
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

# Set the default logging level for the subsystems

org.apache.oodt.cas.crawl.level = ALL

org.apache.oodt.cas.crawl.action.level = ALL

org.apache.oodt.cas.crawl.typedetection.level = ALL

org.apache.oodt.cas.crawl.util.level = ALL

org.apache.oodt.cas.crawl.config.level = ALL

# control the underlying commons-httpclient transport layer for xmlrpc
org.apache.commons.httpclient.level = INFO
httpclient.wire.header.level = INFO
httpclient.wire.level = INFO

org.springframework.beans.level = SEVERE
org.springframework.core.level = SEVERE
org.springframework.level = SEVERE
org.springframework.beans.factory.level = SEVERE
org.springframework.beans.factory.config.level = SEVERE
org.springframework.beans.factory.config.PropertyPlaceholderConfigurer.level = SEVERE
org.apache.oodt.cas.crawl.util.CasPropertyPlaceholderConfigurer.level = SEVERE
sun.net.level = SEVERE
18 changes: 18 additions & 0 deletions crawler/src/main/resources/logs/REMOVE.log
@@ -0,0 +1,18 @@
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


You can remove this file. It was only included to ensure that the log directory for this
distribution was created on assembly.