- Overview - What is the Dataverse module?
- [License](#license)
- Module Description - What does the module do?
- Versioning
- Setup - The basics of getting started with Dataverse module
- Before you begin - Pre setup conditions
- Configuring the infrastructure - Installing dataverse
- Hieradata - Using hieradata
- Known issues - Known issues
- To do - Or not to do
- [Contributors](#contributors)
##Overview
The Dataverse module allows you to install Dataverse with Puppet.
##License
GPLv3 - Copyright (C) 2015 International Institute of Social History <socialhistory.org>.
##Module Description
Dataverse is an open source (code is available on GitHub) web application to share, preserve, cite, explore and analyze research data. It facilitates making data available to others, and allows you to replicate others' work. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive appropriate credit via a data citation with a persistent identifier (e.g., DOI, or Handle). The module offers support for basic management of common security settings.
This module will install Dataverse with default settings and allow customisation of those settings.
##Versioning
This module lives in a versioned repository. Each module version matches the release of the IQSS/Dataverse repository on GitHub that was current at the time. That does not mean this module is incompatible with more recent Dataverse releases. It only means that, after the Puppet installation, you may have to apply configuration changes or patches to keep up with the newer Dataverse version.
##Setup
What this Dataverse setup affects:
- package/service/configuration files for Dataverse, R and R packages, PostgreSQL, Solr and TwoRavens.
What this Dataverse setup does not affect:
- Populating your Dataverse instance: you need to run the final database and API scripts yourself. For example, if you deploy on a development environment with the default settings, run:

        sudo -u dvnApp psql dvndb -f /opt/dataverse/scripts/database/reference_data.sql
        /opt/dataverse/scripts/api/setup-all.sh

- Optimizations such as the best settings for Java and PostgreSQL.
- Shibboleth configurations (as they have an experimental status), although the packages are installed.
###Before you begin
####Introductory Questions
Before getting started, you will want to consider:
- Do you need a development environment?
- Are you installing an acceptance environment or a production environment?
Your answers to these questions will determine which classes you install on which nodes, and with which parameters.
These are the dependencies the Puppet module needs:
- a first-time update of each host's package repository (e.g. using 'apt-get update' or 'yum update')
- the unzip utility
This module depends on a modified version of the fatmcgav-glassfish module, which is not available on the Puppet Forge. There is a [PR:fatmcgav/fatmcgav-glassfish#51] to incorporate the changes upstream. Until it is merged, use the fork and install it on your Puppet master or masterless client environment:

    $ wget -O fatmcgav-glassfish-0.6.0.tar.gz https://github.com/IISH/fatmcgav-glassfish/archive/dataverse.tar.gz
    $ puppet module install fatmcgav-glassfish-0.6.0.tar.gz

Or use Puppet Library to acquire the module.
###Configuring the infrastructure
To install Dataverse and TwoRavens with all the out-of-the-box settings, use:
class {
require => Package['unzip'],
ensure => present;
}->class {
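As a minimal sketch of a single-node setup, the declaration could look like the following. It assumes the class names used in the per-server examples in this README (`iqss::globals`, `iqss::database`, `iqss::solr`, `iqss::rserve`, `iqss::dataverse`, `iqss::tworavens`). The `unzip` package resource and the ordering are assumptions, not confirmed module behaviour.

```puppet
# Sketch only. The unzip package resource and the ordering are assumptions.
package { 'unzip':
  ensure => present,
}

# Globals first, so the shared settings are available to the other classes.
class { 'iqss::globals':
  require => Package['unzip'],
}
-> class { [
  'iqss::database',
  'iqss::solr',
  'iqss::rserve',
  'iqss::dataverse',
  'iqss::tworavens',
]: }
```

All parameters are left at their defaults here; override them through `iqss::globals`, the individual classes, or hieradata.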
To install on different machines, you can deploy each component on its own server. For example:
Server A-1: class { 'iqss::globals':}->class { 'iqss::dataverse': }
Server A-2: class { 'iqss::globals':}->class { 'iqss::dataverse': }
Server A-3: class { 'iqss::globals':}->class { 'iqss::dataverse': }
Server B-1: class { 'iqss::globals':}->class { 'iqss::database': }
Server C-1: class { 'iqss::globals':}->class { 'iqss::solr': }
Server D-1: class { 'iqss::globals':}->class { 'iqss::tworavens': }
Server E-1: class { 'iqss::globals':}->class { 'iqss::rserve': }
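The per-server layout above could be expressed as node definitions in a site manifest. This is a sketch under assumptions: the host names are placeholders invented for illustration, and only three of the servers are shown.

```puppet
# Sketch only. Host names are hypothetical placeholders.
node 'dataverse-a1.example.org' {
  class { 'iqss::globals': } -> class { 'iqss::dataverse': }
}

node 'db-b1.example.org' {
  class { 'iqss::globals': } -> class { 'iqss::database': }
}

node 'solr-c1.example.org' {
  class { 'iqss::globals': } -> class { 'iqss::solr': }
}
```

Each node declares `iqss::globals` first, following the pattern of the examples above, so that shared settings such as the database credentials are the same everywhere.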
###Classes and Defined Types
This module modifies configuration files and directories.
####Class: Iqss::Globals
This class holds the global configuration: settings shared amongst classes, most notably the database settings. Example:

    class {
      'iqss::globals':
        ensure            => present,
        dataverse_fqdn    => 'mysite.org',
        database_name     => 'dataverse',
        database_user     => 'dataverse',
        database_password => 'Cárammë',
    }

It also contains settings for:
- `apache2_purge_configs`: Removes all other Apache configs and vhosts. Setting this to 'false' is a stopgap measure to allow the apache module to coexist with existing or otherwise-managed configuration. Defaults to 'true'.
- `dataverse_fqdn`: If the Dataverse server has multiple DNS names, this option specifies the one to be used as the “official” host name. For example, you may want to have dataverse.foobar.edu, and not the less appealing server-123.socsci.foobar.edu, appear exclusively in all the registered global identifiers, Data Deposit API records, etc. Defaults to 'localhost'.
  Do note that whenever the system needs to form a service URL, by default, it will be formed with https:// and port 443, i.e. https://{dataverse.fqdn}/. If that does not suit your setup, use the Iqss::Dataverse::dataverse_site_url setting.
- `database_host`: The domain of the database. Defaults to Iqss::Globals::dataverse_fqdn.
- `database_port`: The port of the database. Defaults to '5432'.
- `database_name`: The name of the database. Defaults to 'dvndb'.
- `database_user`: The name of the database owner. Defaults to 'dvnApp'.
- `database_password`: The password for the database user. Defaults to 'dvnAppPass'.
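A sketch of how these shared settings might be combined when the database runs on a separate host. The parameter names are the `iqss::globals::*` keys that appear in the hieradata example in this README; the host names and the password are placeholders.

```puppet
# Sketch only. Host names and password are hypothetical placeholders.
class { 'iqss::globals':
  dataverse_fqdn    => 'dataverse.foobar.edu',
  database_host     => 'db.foobar.edu',
  database_port     => 5432,
  database_name     => 'dvndb',
  database_user     => 'dvnApp',
  database_password => 'change-me',
}
```

Because `database_host` defaults to the value of `dataverse_fqdn`, it only needs to be set when the database does not live on the Dataverse host.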
####Class: Iqss::Dataverse
This class installs Glassfish and the domain settings and, depending on the configuration, either builds a WAR or pulls a WAR distribution from a repository. Example:

    class {
      'iqss::dataverse':
        repository => 'git',
    }

This will create three services:
- The glassfish service: $ service dataverse start|stop|status
- An R-daemon: $ service rserve start|stop|status
- The Apache web server
It also contains settings for:
- `dataverse_auth_password_reset_timeout_in_minutes`: A JVM option: the timeout in minutes for a password reset. Defaults to '60'.
- `dataverse_files_directory`: The location of the uploaded files and their tabular derivatives. Defaults to '/home/glassfish/dataverse/files'.
- `dataverse_rserve_host`: The Rserve service hostname. Defaults to Iqss::Globals::dataverse_fqdn.
- `dataverse_rserve_password`: The password needed to access the Rserve service. Defaults to 'rserve'.
- `dataverse_rserve_port`: The Rserve service port. Defaults to '6311'.
- `dataverse_rserve_user`: The Rserve service user. Defaults to 'rserve'.
- `dataverse_site_url`: Set this to override the default URL construction behaviour of the Iqss::Globals::dataverse_fqdn setting with a custom value.
- `doi_baseurlstring`: The DOI endpoint for the EZID Service. Defaults to 'https://ezid.cdlib.org'.
- `doi_username`: The username to connect to the EZID Service. Defaults to 'apitest'.
- `doi_password`: The password to connect to the EZID Service. Defaults to 'apitest'.
- `glassfish_parent_dir`: The Glassfish parent directory. Defaults to '/home/glassfish'.
- `glassfish_domain_name`: The domain name. Defaults to 'domain1'.
- `glassfish_fromaddress`: The e-mail From field in the mail header. Defaults to 'do-not-reply@localhost'.
- `glassfish_jvmoption`: An array of JVM options. Defaults to ["-Xmx1024m", "-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"].
- `glassfish_mailhost`: The mail relay hostname. Defaults to Iqss::Globals::dataverse_fqdn.
- `glassfish_mailuser`: The user name that the mail relay allows to send mail. Defaults to 'dataversenotify'.
- `glassfish_mailproperties`: Key-value pairs sent to the mail relay, such as credentials. Defaults to the dummy values 'username=a_username:password=a_password'.
- `glassfish_service_name`: The service name used to submit start, stop and status commands, e.g. service dataverse start. Defaults to 'dataverse'.
- `glassfish_tmp_dir`: The download path of the Glassfish package. Defaults to '/opt/glassfish'.
- `glassfish_user`: The user running the Glassfish domain. Defaults to 'glassfish'.
- `glassfish_version`: The Glassfish J2EE application server version. Defaults to '4.1'.
- `repository`: This indicates where the package comes from. It can be 'git' to build a WAR from the IQSS repository, or the repository URL of a Dataverse WAR file. Defaults to 'https://github.com/IQSS/dataverse/releases/download/v4.0.1/dataverse-4.0.1.war'.
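A sketch of a Dataverse declaration that pulls a released WAR file, sets a custom site URL and raises the JVM heap. The parameter names are the `iqss::dataverse::*` keys from the hieradata example in this README. The host name, port and heap size are placeholders, and it is an assumption that passing `glassfish_jvmoption` replaces the default array rather than extending it, which is why the default SAXParserFactory option is repeated.

```puppet
# Sketch only. Host name, port 8443 and heap size are hypothetical values.
class { 'iqss::globals':
  dataverse_fqdn => 'dataverse.foobar.edu',
}
-> class { 'iqss::dataverse':
  repository          => 'https://github.com/IQSS/dataverse/releases/download/v4.0.1/dataverse-4.0.1.war',
  # Overrides the default https://{dataverse_fqdn}/ URL construction.
  dataverse_site_url  => 'https://dataverse.foobar.edu:8443',
  # Assumed to replace the default array, so the default parser option is kept.
  glassfish_jvmoption => [
    '-Xmx2048m',
    '-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl',
  ],
}
```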
####Class: Iqss::Database
Installs PostgreSQL, the database user and the database. For example:

    class {
      'iqss::database':
        name     => 'dataverse',
        user     => 'dataverse',
        password => 'secret',
    }

Use the Iqss::Globals class to override settings. This will create a running PostgreSQL server with the database, users and access policies.
It also contains settings for:
- `createdb`: When 'true' the user can create databases. Defaults to 'false'.
- `createrole`: When 'true' the user can create roles. Defaults to 'false'.
- `hba_rule`: The access rules that determine who can connect to what database from where. Defaults to:

        'IPv4 local connections' => {
          description => 'Open up a IP4 connection from localhost',
          type        => 'host',
          database    => 'dvndb',
          user        => 'dvnApp',
          address     => '',
          auth_method => 'md5'
        },
        'IPv6 local connections' => {
          description => 'Open up a IP6 connection from localhost',
          type        => 'host',
          database    => 'dvndb',
          user        => 'dvnApp',
          address     => '::1/128',
          auth_method => 'md5'
        }

- The URL connection string. Defaults to 'localhost'.
- `login`: Whether the user can log in. Defaults to 'true'.
- `name`: Name of the database. Inherited from iqss::globals::database_name. Defaults to 'dvndb'.
- `password`: The user password. Defaults to 'dvnAppPass'.
- The connection port to the database. Defaults to '5432'.
- When 'true' this role can replicate. Defaults to 'false'.
- `superuser`: When 'true' this role is a superuser. Defaults to 'false'.
- `user`: The user name. Defaults to 'dvnApp'.
- `version`: The version of PostgreSQL. Defaults to '9.3'.
- `manage_package_repo`: If true, this will set up the official PostgreSQL repositories on your host. Defaults to true.
- `encoding`: This will set the default encoding for all databases created with this module. On certain operating systems this will be used during the template1 initialization as well, so it becomes a default outside of the module too. Defaults to 'UTF-8'.
- `locale`: This will set the default database locale for all databases created with this module. Defaults to 'en_US.UTF-8'.
- `listen_addresses`: This value defaults to localhost, meaning the postgres server will only accept connections from localhost. If you'd like to be able to connect to postgres from remote machines, you can override this setting. A value of * will tell postgres to accept connections from any remote machine. Alternatively, you can specify a comma-separated list of hostnames or IP addresses. (For more info, have a look at the postgresql.conf file from your system's postgres package.)
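A sketch of a database declaration that accepts connections from a remote Dataverse host. The `listen_addresses` and `hba_rule` parameter names and the rule structure follow the `iqss::database::*` keys in the hieradata example in this README. The rule title and the client address (taken from the 192.0.2.0/24 documentation range) are placeholders, and it is an assumption that a supplied `hba_rule` hash replaces the default rules.

```puppet
# Sketch only. Rule title and client address are hypothetical.
class { 'iqss::database':
  # Accept connections on all interfaces instead of localhost only.
  listen_addresses => '*',
  hba_rule         => {
    'Dataverse application server' => {
      description => 'Allow the Dataverse host to connect',
      type        => 'host',
      database    => 'dvndb',
      user        => 'dvnApp',
      address     => '192.0.2.10/32',
      auth_method => 'md5',
    },
  },
}
```

Restricting the rule to a single /32 address keeps remote access limited to the application server even though PostgreSQL listens on every interface.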
####Class: Iqss::Solr
Installs Solr. Example:

    class { 'iqss::solr':
      version => '4.7.1',
    }

This will create a Jetty server with a running Solr instance: $ service solr stop|status|start
It also contains settings for:
- `core`: The Solr core. Defaults to 'collection1'.
- `jetty_home`: The Jetty home directory which contains start.jar. Defaults to '/home/solr-4.6.0/example'.
- `jetty_host`: Use as host to accept all connections. Defaults to Iqss::Globals::dataverse_fqdn.
- `jetty_java_options`: JVM options for Jetty. Defaults to '-Xmx512m'.
- `jetty_port`: The port Jetty will bind to. Defaults to '8983'.
- `jetty_user`: The user running the Jetty Solr instance. Defaults to 'solr'.
- `solr_home`: The Solr home used for the JVM setting -Dsolr.solr.home. Defaults to '/home/solr-4.6.0/example/solr'.
- `solr_parent_dir`: The home directory of Solr. Defaults to '/home/solr-4.6.0'.
- `url`: The download URL for Solr. Preferably a mirror. Defaults to 'http://archive.apache.org/dist/lucene/solr'.
- `version`: The Apache Solr version. Defaults to '4.6.0'.
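The default paths listed above contain the default version number 4.6.0. This README does not say whether the module derives those paths from `version` automatically, so the sketch below sets them explicitly when installing a different version. The parameter names are the `iqss::solr::*` keys from the hieradata example in this README; the two local variables are introduced here for illustration only.

```puppet
# Sketch only. $solr_version and $solr_parent are illustrative local variables.
$solr_version = '4.7.1'
$solr_parent  = "/home/solr-${solr_version}"

class { 'iqss::solr':
  version         => $solr_version,
  solr_parent_dir => $solr_parent,
  jetty_home      => "${solr_parent}/example",
  solr_home       => "${solr_parent}/example/solr",
}
```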
####Class: Iqss::Tworavens
This class installs the rApache handler for Apache and the TwoRavens web application. For example:

    class {
      'iqss::tworavens':
        tworavens_package => 'https://github.com/IQSS/TwoRavens/archive/master.zip',
        parent_dir        => '/var/www/html',
    }

It also contains settings for:
- `domain`: The public domain name of the TwoRavens web application. Defaults to 'localhost'.
- `package`: The download URL of TwoRavens. Defaults to 'https://github.com/IQSS/TwoRavens/archive/v0.1.zip'.
- `parent_dir`: The installation directory of the TwoRavens web application. Defaults to '/var/www/html'.
- `port`: The port TwoRavens can be accessed on. Defaults to '9999'.
- `protocol`: The protocol TwoRavens can be accessed on. Defaults to 'https'.
- `rapache_version`: The rApache version to be installed. Defaults to '1.2.6'.
- The domain name of the Dataverse web application this TwoRavens web application will connect to. Defaults to 'localhost'.
- `tworavens_dataverse_port`: The port of the Dataverse web application. Defaults to '9999'.
- The URL to a Dataverse web application. Defaults to 'https://dataverse_fqdn
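A sketch of a TwoRavens declaration on its own public host name. The parameter names are the `iqss::tworavens::*` keys from the hieradata example in this README; the host name and the Dataverse port value are placeholders.

```puppet
# Sketch only. Host name and port values are hypothetical.
class { 'iqss::tworavens':
  domain                   => 'tworavens.foobar.edu',
  protocol                 => 'https',
  port                     => '9999',
  parent_dir               => '/var/www/html',
  # Port on which the Dataverse web application is reachable.
  tworavens_dataverse_port => '443',
}
```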
##Hieradata
This example shows how all default settings can be set with a hieradata document. Note that you can also inject values like the R packages or package_repo:

    {
      "iqss::database::createdb": false,
      "iqss::database::createrole": false,
      "iqss::database::encoding": "UTF-8",
      "iqss::database::listen_addresses": "*",
      "iqss::database::locale": "en_US.UTF-8",
      "iqss::database::login": true,
      "iqss::database::manage_package_repo": true,
      "iqss::database::hba_rule": {
        "IPv4 local connections": {
          "description": "Open up a IP4 connection from localhost",
          "type": "host",
          "database": "dvndb",
          "user": "dvnApp",
          "address": "",
          "auth_method": "md5"
        },
        "IPv6 local connections": {
          "description": "Open up a IP6 connection from localhost",
          "type": "host",
          "database": "dvndb",
          "user": "dvnApp",
          "address": "::1/128",
          "auth_method": "md5"
        }
      },
      "iqss::database::superuser": false,
      "iqss::database::version": "9.3",
      "iqss::dataverse::glassfish_domain_name": "domain1",
      "iqss::dataverse::glassfish_parent_dir": "/home/glassfish",
      "iqss::dataverse::glassfish_service_name": "dataverse",
      "iqss::dataverse::glassfish_tmp_dir": "/opt/glassfish",
      "iqss::dataverse::glassfish_user": "glassfish",
      "iqss::dataverse::glassfish_version": "4.1",
      "iqss::dataverse::repository": "https://github.com/IQSS/dataverse/releases/download/v4.0.1/dataverse-4.0.1.war",
      "iqss::dataverse::dataverse_site_url": "https://localhost:9999",
      "iqss::dataverse::dataverse_files_directory": "/home/glassfish/dataverse/files",
      "iqss::dataverse::dataverse_rserve_host": "localhost",
      "iqss::dataverse::dataverse_rserve_port": "6311",
      "iqss::dataverse::dataverse_rserve_user": "rserve",
      "iqss::dataverse::dataverse_rserve_password": "rserve",
      "iqss::dataverse::dataverse_auth_password_reset_timeout_in_minutes": "60",
      "iqss::dataverse::doi_username": "apitest",
      "iqss::dataverse::doi_password": "apitest",
      "iqss::dataverse::doi_baseurlstring": "https://ezid.cdlib.org",
      "iqss::dataverse::glassfish_fromaddress": "do-not-reply@localhost",
      "iqss::dataverse::glassfish_jvmoption": [
        "-Xmx1024m",
        "-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"
      ],
      "iqss::dataverse::glassfish_mailhost": "localhost",
      "iqss::dataverse::glassfish_mailuser": "dataversenotify",
      "iqss::dataverse::glassfish_mailproperties": "username=a_username:password=a_password",
      "iqss::globals::apache2_purge_configs": true,
      "iqss::globals::database_host": "localhost",
      "iqss::globals::dataverse_fqdn": "localhost",
      "iqss::globals::dataverse_port": "9999",
      "iqss::globals::database_port": 5432,
      "iqss::globals::database_name": "dvndb",
      "iqss::globals::database_user": "dvnApp",
      "iqss::globals::database_password": "dvnAppPass",
      "iqss::rserve::packages_r": [ "devtools", "DescTools", "R2HTML", "Rserve", "VGAM", "AER", "dplyr", "quantreg", "geepack", "maxLik", "Amelia", "Rook", "jsonlite", "rjson"],
      "iqss::rserve::packages_zelig": "https://github.com/IQSS/Zelig/archive/master.zip",
      "iqss::rserve::user": "rserve",
      "iqss::rserve::package_repo": "http://cran.r-project.org",
      "iqss::solr::url": "http://archive.apache.org/dist/lucene/solr",
      "iqss::solr::version": "4.6.0",
      "iqss::solr::solr_parent_dir": "/home/solr-4.6.0",
      "iqss::solr::jetty_user": "solr",
      "iqss::solr::jetty_host": "localhost",
      "iqss::solr::jetty_port": "8983",
      "iqss::solr::jetty_java_options": "-Xmx512m",
      "iqss::solr::jetty_home": "/home/solr-4.6.0/example",
      "iqss::solr::solr_home": "/home/solr-4.6.0/example/solr",
      "iqss::solr::core": "collection1",
      "iqss::tworavens::rapache_version": "1.2.6",
      "iqss::tworavens::package": "https://github.com/IQSS/TwoRavens/archive/v0.1.zip",
      "iqss::tworavens::parent_dir": "/var/www/html",
      "iqss::tworavens::domain": "localhost",
      "iqss::tworavens::tworavens_dataverse_port": "9999",
      "iqss::tworavens::port": "9999",
      "iqss::tworavens::protocol": "https"
    }

##Known issues
- The Rserve service does not automatically start when it is installed for the first time. The cause is not yet known.
- When installing TwoRavens for the first time, Apache does not restart and read in the configuration.
##To do
- The Rserve config uses a static file in files/dataverse/conf/R/, but it should be a template.
- Test with CentOS 7?
- Shibboleth
##Contributors
- Lucien van Wouw, IISH