diff --git a/NEWS b/NEWS
index f23fe1172..37a94b3e1 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,7 @@ Magpie 3.0 includes some new software support, but
 primarily removes a lot of older support.
 
 Legacy support Removed
+- Mahout
 
 Backwards Compatibility Notes
diff --git a/README.md b/README.md
index 303b02347..7069952d8 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,11 @@ Magpie
 ------
 
 Magpie contains a number of scripts for running Big Data software in
-HPC environments. Thus far, Hadoop, Spark, Hbase, Storm, Pig, Mahout,
-Phoenix, Kafka, Zeppelin, Zookeeper, and Alluxio are supported. It currently
-supports running over the parallel file system Lustre and running over
-any generic network filesytem. There is scheduler/resource manager
-support for Slurm, Moab, Torque, and LSF.
+HPC environments. Thus far, Hadoop, Spark, Hbase, Storm, Pig,
+Phoenix, Kafka, Zeppelin, Zookeeper, and Alluxio are supported. It
+currently supports running over the parallel file system Lustre and
+running over any generic network filesytem. There is
+scheduler/resource manager support for Slurm, Moab, Torque, and LSF.
 
 Some of the features presently supported:
@@ -80,8 +80,6 @@ Zookeeper - 3.4.X
 
 Storm - 0.9.X, 0.10.X, 1.0.X, 1.1.X, 1.2.X
 
-Mahout - 0.11.X, 0.12.X, 0.13.0
-
 Phoenix - 4.5.X, 4.6.0, 4.7.0, 4.8.X, 4.9.0, 4.10.1, 4.11.0, 4.12.0,
           4.13.X, 4.14.0
@@ -117,6 +115,8 @@ Removed in Magpie 2.0
 
 Removed in Magpie 3.0
 
+ - Mahout
+
 Documentation
 -------------
diff --git a/doc/README b/doc/README
index 993dbb012..b7d3c036f 100644
--- a/doc/README
+++ b/doc/README
@@ -3,7 +3,7 @@ Magpie
 
 Magpie contains a number of scripts for running Big Data software in
 HPC environments. Thus far, Hadoop, Spark, Hbase, Hive, Storm, Pig,
-Mahout, Phoenix, Kafka, Zeppelin, and Zookeeper are supported. It
+Phoenix, Kafka, Zeppelin, and Zookeeper are supported. It
 currently supports running over the parallel file system Lustre and
 running over any generic network filesytem. There is
 scheduler/resource manager support for Slurm, Moab, Torque, and LSF.
@@ -252,8 +252,6 @@ Storm - 0.9.3, 0.9.4, 0.9.5, 0.9.6, 0.9.7, 0.10.0, 0.10.1, 0.10.2,
         1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.1.0, 1.1.1, 1.1.2, 1.2.0,
         1.2.1, 1.2.2, 1.2.3
 
-Mahout - 0.11.0+, 0.11.1+, 0.11.2+, 0.12.0+, 0.12.1+, 0.12.2, 0.13.0
-
 Phoenix - 4.5.0-Hbase-1.0+, 4.5.0-Hbase-1.1+, 4.5.1-Hbase-1.0+,
           4.5.1-Hbase-1.1+, 4.5.2-HBase-1.0+, 4.5.2-HBase-1.1+,
           4.6.0-Hbase-1.0+, 4.6.0-Hbase-1.1, 4.7.0-Hbase-1.0+,
@@ -288,9 +286,6 @@ to be a good starting point to use in running jobs.
 Pig 0.13.X, 0.14.X w/ Hadoop 2.6.X
 Pig 0.15.X -> 0.17.X w/ Hadoop 2.7.X
 
-Mahout 0.11.X w/ Hadoop 2.7.X
-Mahout 0.12.X w/ Hadoop 2.7.X
-
 Hbase 0.98.X w/ Hadoop 2.2.X, Zookeeper 3.4.X
 Hbase 0.99.X -> 1.6.X w/ Hadoop 2.7.X, Zookeeper 3.4.X
@@ -376,7 +371,7 @@ considered on the side of experimental.
 
 - Experimental
 
-  Packages: Kafka, Zeppelin, Mahout, Hive, TensorFlow w/ & w/o
+  Packages: Kafka, Zeppelin, Hive, TensorFlow w/ & w/o
   Horovod, Ray
 
 Documentation
@@ -388,7 +383,6 @@ files.
 
 Hadoop - See README.hadoop
 Pig - See README.pig
-Mahout - See README.mahout
 Hbase - See README.hbase
 Hive - See README.hive
 Spark - See README.spark
diff --git a/doc/README.mahout b/doc/README.mahout
deleted file mode 100644
index d43233a42..000000000
--- a/doc/README.mahout
+++ /dev/null
@@ -1,126 +0,0 @@
-Instructions For Using Mahout
------------------------------
-
-0) If necessary, download your favorite version of Mahout from the
-   Apache download site and install it into a location where it's
-   accessible on all cluster nodes. Usually this is on a NFS home
-   directory.
-
-   Make sure that the version of Mahout you install is compatible with
-   the version of Hadoop you are using.
-
-   See below in 'Mahout Patching' about patches that may be necessary
-   for Mahout depending on your environment and Mahout version.
-
-   See 'Convenience Scripts' in README about
-   misc/magpie-download-and-setup.sh, which may make the
-   downloading and patching easier.
-
-1) Select an appropriate submission script for running your job. You
-   can find them in the directory submission-scripts/, with Slurm
-   Sbatch scripts using srun in script-sbatch-srun, Moab Msub+Slurm
-   scripts using srun in script-msub-slurm-srun, Moab Msub+Torque
-   scripts using pdsh in script-msub-torque-pdsh, and LSF scripts
-   using mpirun in script-lsf-mpirun.
-
-   You'll likely want to start with the base hadoop+mahout script
-   (e.g. magpie.sbatch-srun-hadoop-and-mahout) for your
-   scheduler/resource manager. If you wish to configure more, you can
-   choose to start with the base script (e.g. magpie.sbatch-srun)
-   which contains all configuration options.
-
-2) Setup your job essentials at the top of the submission script. As
-   an example, the following are the essentials for running with Moab.
-
-   #MSUB -l nodes : Set how many nodes you want in your job
-
-   #MSUB -l walltime : Set the time for this job to run
-
-   #MSUB -l partition : Set the job partition
-
-   #MSUB -q : Set to batch queue
-
-   MOAB_JOBNAME : Set your job name.
-
-   MAGPIE_SCRIPTS_HOME : Set where your scripts are
-
-   MAGPIE_LOCAL_DIR : For scratch space files
-
-   MAGPIE_JOB_TYPE : This should be set to 'mahout' initially
-
-   JAVA_HOME : B/c you need to ...
-
-3) Setup the essentials for Mahout.
-
-   MAHOUT_SETUP : Set to yes
-
-   MAHOUT_VERSION : Set appropriately.
-
-   MAHOUT_HOME : Where your mahout code is. Typically in an NFS mount.
-
-   MAHOUT_LOCAL_DIR : A small place for conf files and log files local to
-   each node. Typically /tmp directory.
-
-4) Select how your job will run by setting MAGPIE_JOB_TYPE and/or
-   MAHOUT_JOB. Initially, you'll likely want to set MAGPIE_JOB_TYPE
-   to 'mahout' and setting MAHOUT_JOB to 'clustersyntheticcontrol'.
-   This will allow you to run a pre-written job to make sure things
-   are setup correctly.
-
-   After this, you may want to run with MAGPIE_JOB_TYPE set to
-   'interactive' to play around and figure things out. See
-   instructions under README.hadoop for 'interactive' mode.
-
-   Once in your session, you can do as you please. For example, you
-   can launch a mahout job (bin/mahout ...). There will also be
-   instructions in your job output on how to tear the session down
-   cleanly if you wish to end your job early.
-
-   Once you have figured out how you wish to run your job, you will
-   likely want to run with MAGPIE_JOB_TYPE set to 'script' mode.
-   Create a script that will run your job/calculation automatically,
-   set it in MAGPIE_JOB_SCRIPT, and then run your job.
-
-   See "Exported Environment Variables" in README for information on
-   common exported environment variables that may be useful in
-   scripts.
-
-   See below in "Mahout Exported Environment Variables", for information
-   on Mahout specific exported environment variables that may be useful
-   in scripts.
-
-5) Mahout requires Hadoop, so ensure the Hadoop is configured and also in
-   your submission script. See README.hadoop for Hadoop setup instructions.
-
-6) Submit your job into the cluster by running "sbatch -k
-   ./magpie.sbatchfile" for Slurm, "msub ./magpie.msubfile" for
-   Moab, or "bsub < ./magpie.lsffile" for LSF.
-   Add any other options you see fit.
-
-7) Look at your job output file to see your output. There will also
-   be some notes/instructions/tips in the output file for viewing the
-   status of your job in a web browser, environment variables you wish
-   to set if interacting with it, etc.
-
-   See "General Advanced Usage" in README for additional tips.
-
-Mahout Exported Environment Variables
--------------------------------------
-
-There are presently no Mahout specific environment variables.
-
-See "Hadoop Exported Environment Variables" in README.hadoop, for
-Hadoop environment variables that may be useful.
-
-Mahout Patching
----------------
-- Patch to support alternate scratch space directories in Mahout
-  examples is needed.
-
-  The Mahout examples assume the user will always use /tmp for scratch
-  space. A patch to support alternate scratch space (such as the one
-  Magpie defines) is needed if alternate /tmp or scratch space
-  directories are used.
-
-  Patches for all of these issues can be found in the patches/mahout/
-  directory.
diff --git a/magpie-check-inputs b/magpie-check-inputs
index a0770f2d2..af21d0b9c 100755
--- a/magpie-check-inputs
+++ b/magpie-check-inputs
@@ -189,7 +189,7 @@ __Magpie_check_deprecated_configs () {
 
 # Flag deprecated settings for user
 #
-oldmodes="HADOOP_MODE PIG_MODE MAHOUT_MODE HBASE_MODE PHOENIX_MODE SPARK_MODE KAFKA_MODE ZEPPELIN_MODE STORM_MODE ZOOKEEPER_MODE"
+oldmodes="HADOOP_MODE PIG_MODE HBASE_MODE PHOENIX_MODE SPARK_MODE KAFKA_MODE ZEPPELIN_MODE STORM_MODE ZOOKEEPER_MODE"
 oldprojects="HADOOP_UDA_SETUP TACHYON_SETUP"
 oldfeatures="HDFS_FEDERATION_NAMENODE_COUNT HADOOP_PER_JOB_HDFS_PATH ZOOKEEPER_PER_JOB_DATA_DIR HADOOP_RAWNETWORKFS_BLOCKSIZE"
 oldvars="SPARK_USE_YARN MAGPIE_SCRIPT_PATH MAGPIE_SCRIPT_ARGS"
@@ -257,7 +257,6 @@ if [ "${MAGPIE_JOB_TYPE}" != "hadoop" ] \
     && [ "${MAGPIE_JOB_TYPE}" != "hbase" ] \
     && [ "${MAGPIE_JOB_TYPE}" != "phoenix" ] \
     && [ "${MAGPIE_JOB_TYPE}" != "pig" ] \
-    && [ "${MAGPIE_JOB_TYPE}" != "mahout" ] \
     && [ "${MAGPIE_JOB_TYPE}" != "spark" ] \
     && [ "${MAGPIE_JOB_TYPE}" != "kafka" ] \
     && [ "${MAGPIE_JOB_TYPE}" != "zeppelin" ] \
@@ -273,7 +272,7 @@
     && [ "${MAGPIE_JOB_TYPE}" != "interactive" ] \
     && [ "${MAGPIE_JOB_TYPE}" != "setuponly" ]
 then
-    echo "MAGPIE_JOB_TYPE must be set to hadoop, hbase, pig, mahout, phoenix, spark, \
+    echo "MAGPIE_JOB_TYPE must be set to hadoop, hbase, pig, phoenix, spark, \
 kafka, zeppelin, storm, hive, zookeeper, tensorflow, tensorflow-horovod, ray, \
 alluxio, testall, script, interactive, or setuponly"
     exit 1
@@ -368,7 +367,7 @@ nodecount=${MAGPIE_NODE_COUNT}
 # nodecountmaster is a counter to count the master only once
 nodecountmaster=1
 
-magpieprojects="HADOOP PIG MAHOUT HBASE PHOENIX SPARK KAFKA ZEPPELIN STORM HIVE ZOOKEEPER TENSORFLOW TENSORFLOW_HOROVOD RAY ALLUXIO"
+magpieprojects="HADOOP PIG HBASE PHOENIX SPARK KAFKA ZEPPELIN STORM HIVE ZOOKEEPER TENSORFLOW TENSORFLOW_HOROVOD RAY ALLUXIO"
 
 for project in ${magpieprojects}
 do
@@ -379,7 +378,6 @@ done
 # Did user turn on SOMETHING to run
 #
 # Pig is not "something", b/c it runs on top of hadoop
-# Mahout is not "something", b/c it runs on top of hadoop
 
 if [ "${HADOOP_SETUP}" != "yes" ] \
     && [ "${HBASE_SETUP}" != "yes" ] \
@@ -402,7 +400,7 @@ fi
 
 # If java required, was it set to something reasonable
 
-magpieprojects_java="HADOOP PIG MAHOUT HBASE PHOENIX SPARK KAFKA ZEPPELIN STORM HIVE ZOOKEEPER ALLUXIO"
+magpieprojects_java="HADOOP PIG HBASE PHOENIX SPARK KAFKA ZEPPELIN STORM HIVE ZOOKEEPER ALLUXIO"
 for project in ${magpieprojects_java}
 do
     setupvar="${project}_SETUP"
@@ -415,7 +413,7 @@ done
 
 # Did user turn on something matching job run type
 
-magpieprojects="HADOOP PIG MAHOUT HBASE PHOENIX SPARK KAFKA ZEPPELIN STORM HIVE ZOOKEEPER TENSORFLOW TENSORFLOW_HOROVOD RAY ALLUXIO"
+magpieprojects="HADOOP PIG HBASE PHOENIX SPARK KAFKA ZEPPELIN STORM HIVE ZOOKEEPER TENSORFLOW TENSORFLOW_HOROVOD RAY ALLUXIO"
 
 for project in ${magpieprojects}
 do
@@ -433,7 +431,6 @@ done
 if [ "${MAGPIE_JOB_TYPE}" == "testall" ] \
     && [ "${HADOOP_SETUP}" != "yes" ] \
     && [ "${PIG_SETUP}" != "yes" ] \
-    && [ "${MAHOUT_SETUP}" != "yes" ] \
     && [ "${HBASE_SETUP}" != "yes" ] \
     && [ "${SPARK_SETUP}" != "yes" ] \
     && [ "${KAFKA_SETUP}" != "yes" ] \
@@ -681,41 +678,6 @@ then
     fi
 fi
 
-#
-# Check Mahout Inputs
-#
-
-if [ "${MAHOUT_SETUP}" == "yes" ]
-then
-    __Magpie_check_must_be_set "JAVA_HOME" "for Mahout"
-
-    __Magpie_check_must_be_set "MAHOUT_VERSION" "to run Mahout"
-
-    __Magpie_check_if_version_format_correct "MAHOUT_VERSION"
-
-    __Magpie_check_must_be_set_and_is_directory "MAHOUT_HOME" "to run Mahout"
-
-    __Magpie_check_must_be_set "MAHOUT_LOCAL_DIR" "to run Mahout"
-
-    __Magpie_check_is_enabled "Hadoop" "Mahout"
-
-    if [ "${MAHOUT_JOB}" != "clustersyntheticcontrol" ]
-    then
-        echo "MAHOUT_JOB must be set to clustersyntheticcontrol"
-        exit 1
-    fi
-
-    if (! Magpie_hadoop_setup_type_enables_yarn \
-        || ! Magpie_hadoop_setup_type_enables_hdfs \
-        || ! Magpie_hadoop_filesystem_mode_is_hdfs_type) \
-        && [ "${MAGPIE_JOB_TYPE}" == "mahout" ] \
-        && [ "${MAHOUT_JOB}" == "clustersyntheticcontrol" ]
-    then
-        echo "HADOOP_SETUP_TYPE must be set to MR for MAHOUT_JOB=${MAHOUT_JOB}"
-        exit 1
-    fi
-fi
-
 #
 # Check Hbase Inputs
 #
diff --git a/magpie-run b/magpie-run
index fa9119db3..bdcb69108 100755
--- a/magpie-run
+++ b/magpie-run
@@ -36,7 +36,6 @@ source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-alluxio
 source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-hadoop
 source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-hbase
 source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-kafka
-source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-mahout
 source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-pig
 source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-phoenix
 source ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-project-hive
@@ -123,8 +122,6 @@ Magpie_run_start_alluxio
 # After Hadoop setup, requires Hadoop
 Magpie_run_start_pig
 
-# After Hadoop setup, requires Hadoop
-Magpie_run_start_mahout
 
 # After Zookeeper setup, requires Zookeeper
 # Will set magpie_run_hbase_should_be_torndown & magpie_run_hbase_setup_successful appropriately
@@ -200,9 +197,6 @@ then
     elif [ "${MAGPIE_JOB_TYPE}" == "pig" ]
     then
         Magpie_run_pig
-    elif [ "${MAGPIE_JOB_TYPE}" == "mahout" ]
-    then
-        Magpie_run_mahout
     elif [ "${MAGPIE_JOB_TYPE}" == "hadoop" ]
     then
         Magpie_run_hadoop
diff --git a/magpie-setup-core b/magpie-setup-core
index 726329949..dd7809c65 100755
--- a/magpie-setup-core
+++ b/magpie-setup-core
@@ -66,7 +66,7 @@ fi
 # Setup primary conf, log, and local scratchspace directories for projects
 #
 
-magpieprojects="HADOOP PIG MAHOUT HBASE HIVE PHOENIX SPARK KAFKA ZEPPELIN STORM ZOOKEEPER RAY ALLUXIO"
+magpieprojects="HADOOP PIG HBASE HIVE PHOENIX SPARK KAFKA ZEPPELIN STORM ZOOKEEPER RAY ALLUXIO"
 
 for project in ${magpieprojects}
 do
diff --git a/magpie-setup-projects b/magpie-setup-projects
index 6ddafa9a4..34bc15898 100755
--- a/magpie-setup-projects
+++ b/magpie-setup-projects
@@ -42,14 +42,6 @@ then
     fi
 fi
 
-if [ "${MAHOUT_SETUP}" == "yes" ]
-then
-    (${MAGPIE_SCRIPTS_HOME}/magpie/setup/magpie-setup-project-mahout)
-    if [ $? -ne 0 ] ; then
-        exit 1
-    fi
-fi
-
 if [ "${HBASE_SETUP}" == "yes" ]
 then
     (${MAGPIE_SCRIPTS_HOME}/magpie/setup/magpie-setup-project-hbase)
diff --git a/magpie/exports/magpie-exports-dirs b/magpie/exports/magpie-exports-dirs
index 9223d8f47..670655a23 100755
--- a/magpie/exports/magpie-exports-dirs
+++ b/magpie/exports/magpie-exports-dirs
@@ -146,38 +146,6 @@ Magpie_make_pig_local_dirs_node_specific () {
     __Magpie_make_pig_local_dirs "specific"
 }
 
-Magpie_make_mahout_local_dirs () {
-    local which=$1
-
-    if [ "${MAHOUT_SETUP}" == "yes" ]
-    then
-        if [ "${MAHOUT_LOCAL_JOB_DIR}X" == "X" ]
-        then
-            Magpie_output_internal_error "Magpie_make_mahout_local_dirs called without MAHOUT_LOCAL_JOB_DIR set"
-            exit 1
-        fi
-
-        export MAHOUT_CONF_DIR=${MAHOUT_LOCAL_JOB_DIR}/conf
-        export MAHOUT_LOCAL_SCRATCHSPACE_DIR=${MAHOUT_LOCAL_JOB_DIR}/scratch
-
-        if [ "${which}" == "specific" ]
-        then
-            Magpie_get_magpie_hostname
-            myhostname=${magpie_hostname}
-            export MAHOUT_CONF_DIR=`echo $MAHOUT_CONF_DIR | sed "s/MAGPIEHOSTNAMESUBSTITUTION/${myhostname}/g"`
-            export MAHOUT_LOCAL_SCRATCHSPACE_DIR=`echo $MAHOUT_LOCAL_SCRATCHSPACE_DIR | sed "s/MAGPIEHOSTNAMESUBSTITUTION/${myhostname}/g"`
-        fi
-    fi
-}
-
-Magpie_make_mahout_local_dirs_unspecified () {
-    Magpie_make_mahout_local_dirs "generic"
-}
-
-Magpie_make_mahout_local_dirs_node_specific () {
-    Magpie_make_mahout_local_dirs "specific"
-}
-
 __Magpie_make_hbase_local_dirs () {
     local which=$1
@@ -588,18 +556,6 @@ then
     Magpie_make_pig_local_dirs_node_specific
 fi
 
-if [ "${MAHOUT_SETUP}" == "yes" ]
-then
-    if [ "${MAGPIE_NO_LOCAL_DIR}" == "yes" ]
-    then
-        export MAHOUT_LOCAL_JOB_DIR=${MAHOUT_LOCAL_DIR}/MAGPIEHOSTNAMESUBSTITUTION/${MAGPIE_JOB_NAME}/${MAGPIE_JOB_ID}/mahout
-    else
-        export MAHOUT_LOCAL_JOB_DIR=${MAHOUT_LOCAL_DIR}/${MAGPIE_JOB_NAME}/${MAGPIE_JOB_ID}/mahout
-    fi
-    # MAHOUT_LOCAL_SCRATCHSPACE_DIR & MAHOUT_CONF_DIR set
-    Magpie_make_mahout_local_dirs_node_specific
-fi
-
 if [ "${HBASE_SETUP}" == "yes" ]
 then
     if [ "${MAGPIE_NO_LOCAL_DIR}" == "yes" ]
diff --git a/magpie/job/magpie-job-magpie-testall b/magpie/job/magpie-job-magpie-testall
index a6125dfa9..88a16de71 100755
--- a/magpie/job/magpie-job-magpie-testall
+++ b/magpie/job/magpie-job-magpie-testall
@@ -45,14 +45,6 @@ then
     (${MAGPIE_SCRIPTS_HOME}/magpie/job/magpie-job-pig-testpig)
 fi
 
-if [ "${MAHOUT_SETUP}" == "yes" ]
-then
-    echo "*******************************************************"
-    echo "* Running Mahout Clustersyntheticcontrol"
-    echo "*******************************************************"
-    (${MAGPIE_SCRIPTS_HOME}/magpie/job/magpie-job-mahout-clustersyntheticcontrol)
-fi
-
 if [ "${HBASE_SETUP}" == "yes" ]
 then
     echo "*******************************************************"
diff --git a/magpie/job/magpie-job-mahout-clustersyntheticcontrol b/magpie/job/magpie-job-mahout-clustersyntheticcontrol
deleted file mode 100755
index 246b1e326..000000000
--- a/magpie/job/magpie-job-mahout-clustersyntheticcontrol
+++ /dev/null
@@ -1,48 +0,0 @@
-#!/bin/bash
-#############################################################################
-# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC.
-# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
-# Written by Albert Chu
-# LLNL-CODE-644248
-#
-# This file is part of Magpie, scripts for running Hadoop on
-# traditional HPC systems. For details, see https://github.com/llnl/magpie.
-#
-# Magpie is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# Magpie is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with Magpie. If not, see <http://www.gnu.org/licenses/>.
-#############################################################################
-
-# This script is for running the mahout clustersyntheticcontrol sanity
-# test. For the most part, it shouldn't be editted. See job
-# submission files for configuration details.
-
-source ${MAGPIE_SCRIPTS_HOME}/magpie/lib/magpie-lib-paths
-
-# This is a job, no loading export files or libs except for minimal convenience ones
-
-export MAHOUT_WORK_DIR="${MAHOUT_LOCAL_SCRATCHSPACE_DIR}/mahout"
-
-if [ ! -d "${MAHOUT_WORK_DIR}" ]
-then
-    mkdir -p ${MAHOUT_WORK_DIR}
-    if [ $? -ne 0 ] ; then
-        echo "mkdir failed making ${MAHOUT_WORK_DIR}"
-        exit 1
-    fi
-fi
-
-command="${MAHOUT_HOME}/${mahoutexamplesprefix}/${mahoutcmdprefix}/cluster-syntheticcontrol.sh"
-echo "Running $command" >&2
-echo "1" | $command
-
-exit 0
diff --git a/magpie/lib/magpie-lib-defaults b/magpie/lib/magpie-lib-defaults
index e6aff98fc..8b1bfb0a5 100755
--- a/magpie/lib/magpie-lib-defaults
+++ b/magpie/lib/magpie-lib-defaults
@@ -60,11 +60,6 @@ then
     :
 fi
 
-if [ "${MAHOUT_SETUP}" == "yes" ]
-then
-    :
-fi
-
 if [ "${HBASE_SETUP}" == "yes" ]
 then
     default_hbase_master_port="60000"
diff --git a/magpie/lib/magpie-lib-paths b/magpie/lib/magpie-lib-paths
index 007298444..623d4decc 100755
--- a/magpie/lib/magpie-lib-paths
+++ b/magpie/lib/magpie-lib-paths
@@ -38,12 +38,6 @@ then
     pigcmdprefix="bin"
 fi
 
-if [ "${MAHOUT_SETUP}" == "yes" ]
-then
-    mahoutexamplesprefix="examples"
-    mahoutcmdprefix="bin"
-fi
-
 if [ "${HBASE_SETUP}" == "yes" ]
 then
     hbasesetupscriptprefix="bin"
diff --git a/magpie/lib/magpie-lib-setup b/magpie/lib/magpie-lib-setup
index 2c9bde3a9..470361b55 100755
--- a/magpie/lib/magpie-lib-setup
+++ b/magpie/lib/magpie-lib-setup
@@ -210,7 +210,7 @@ Magpie_calculate_stop_timeouts () {
 }
 
 # Count how many big data systems we're using that can run jobs
-# Pig and Mahout are wrappers around Hadoop, so it doesn't count
+# Pig wraps around Hadoop, so it doesn't count
 __Magpie_calculate_canrunjobscount () {
     __canrunjobscount=0
diff --git a/magpie/run/magpie-run-project-mahout b/magpie/run/magpie-run-project-mahout
deleted file mode 100755
index 9c46fd1fd..000000000
--- a/magpie/run/magpie-run-project-mahout
+++ /dev/null
@@ -1,121 +0,0 @@
-#!/bin/bash
-#############################################################################
-# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC.
-# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
-# Written by Albert Chu
-# LLNL-CODE-644248
-#
-# This file is part of Magpie, scripts for running Hadoop on
-# traditional HPC systems. For details, see https://github.com/llnl/magpie.
-#
-# Magpie is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# Magpie is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with Magpie. If not, see <http://www.gnu.org/licenses/>.
-#############################################################################
-
-# These are functions to be called by magpie-run
-
-source ${MAGPIE_SCRIPTS_HOME}/magpie/exports/magpie-exports-submission-type
-source ${MAGPIE_SCRIPTS_HOME}/magpie/exports/magpie-exports-dirs
-source ${MAGPIE_SCRIPTS_HOME}/magpie/exports/magpie-exports-user
-source ${MAGPIE_SCRIPTS_HOME}/magpie/lib/magpie-lib-log
-source ${MAGPIE_SCRIPTS_HOME}/magpie/lib/magpie-lib-paths
-
-Magpie_run_start_mahout () {
-    if [ "${MAHOUT_SETUP}" == "yes" ] && [ "${magpie_run_prior_startup_successful}" == "true" ]
-    then
-        if [ "${magpie_run_hadoop_setup_successful}" == "0" ]
-        then
-            Magpie_output_internal_warning "Attempt to setup Mahout without Hadoop being setup"
-        fi
-
-        if [ "${MAHOUT_OPTS}X" != "X" ]
-        then
-            if ! echo ${MAHOUT_OPTS} | grep -q -E "java.io.tmpdir"
-            then
-                export MAHOUT_OPTS="${MAHOUT_OPTS} -Djava.io.tmpdir=${MAHOUT_LOCAL_SCRATCHSPACE_DIR}/tmp"
-            fi
-        else
-            export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_SCRATCHSPACE_DIR}/tmp"
-        fi
-
-        echo "*******************************************************"
-        echo "*"
-        echo "* Mahout Information"
-        echo "*"
-        echo "* To run Mahout directly, follow above instructions for"
-        echo "* to your allocation then:"
-        echo "*"
-        if echo $MAGPIE_SHELL | grep -q csh
-        then
-            echo "* setenv MAHOUT_HOME \"${MAHOUT_HOME}\""
-            if [ "${MAHOUT_OPTS}X" != "X" ]
-            then
-                echo "* setenv MAHOUT_OPTS \"${MAHOUT_OPTS}\""
-            fi
-        else
-            echo "* export MAHOUT_HOME=\"${MAHOUT_HOME}\""
-            if [ "${MAHOUT_OPTS}X" != "X" ]
-            then
-                echo "* export MAHOUT_OPTS=\"${MAHOUT_OPTS}\""
-            fi
-        fi
-        echo "*"
-        echo "* Then you can do as you please. For example to run a job:"
-        echo "*"
-        echo "* \$MAHOUT_HOME/${mahoutcmdprefix}/mahout ..."
-        echo "*"
-        if [ "${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}X" != "X" ]
-        then
-            echo "* If running interactively, sourcing"
-            echo "*"
-            echo "* ${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}"
-            echo "*"
-            echo "* will set most common environment variables for your job."
-            echo "*"
-        fi
-        echo "*******************************************************"
-
-        if [ "${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}X" != "X" ]
-        then
-            if echo $MAGPIE_SHELL | grep -q csh
-            then
-                echo "setenv MAHOUT_HOME \"${MAHOUT_HOME}\"" >> ${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}
-                if [ "${MAHOUT_OPTS}X" != "X" ]
-                then
-                    echo "setenv MAHOUT_OPTS \"${MAHOUT_OPTS}\"" >> ${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}
-                fi
-            else
-                echo "export MAHOUT_HOME=\"${MAHOUT_HOME}\"" >> ${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}
-                if [ "${MAHOUT_OPTS}X" != "X" ]
-                then
-                    echo "export MAHOUT_OPTS=\"${MAHOUT_OPTS}\"" >> ${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}
-                fi
-            fi
-            echo "" >> ${MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT}
-        fi
-    fi
-}
-
-Magpie_run_mahout () {
-    if [ "${MAHOUT_JOB}" == "clustersyntheticcontrol" ]
-    then
-        echo "*******************************************************"
-        echo "* Running Clustersyntheticcontrol"
-        echo "*******************************************************"
-        ${MAGPIE_SCRIPTS_HOME}/magpie/run/magpie-run-execute script ${MAGPIE_SCRIPTS_HOME}/magpie/job/magpie-job-mahout-clustersyntheticcontrol &
-        local scriptpid=$!
-        Magpie_wait_script_sigusr2_on_job_timeout ${scriptpid}
-    else
-        Magpie_output_internal_error "MAHOUT_JOB = ${MAHOUT_JOB} not handled"
-    fi
-}
diff --git a/magpie/setup/magpie-setup-project-mahout b/magpie/setup/magpie-setup-project-mahout
deleted file mode 100755
index b49d9d5b3..000000000
--- a/magpie/setup/magpie-setup-project-mahout
+++ /dev/null
@@ -1,36 +0,0 @@
-#!/bin/bash
-#############################################################################
-# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC.
-# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
-# Written by Albert Chu
-# LLNL-CODE-644248
-#
-# This file is part of Magpie, scripts for running Hadoop on
-# traditional HPC systems. For details, see https://github.com/llnl/magpie.
-#
-# Magpie is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# Magpie is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with Magpie. If not, see <http://www.gnu.org/licenses/>.
-#############################################################################
-
-# This script sets up configuration files for jobs. For the most
-# part, it shouldn't be editted. See job submission files for
-# configuration details.
- -if [ "${MAHOUT_SETUP}" != "yes" ] -then - exit 0 -fi - -# achu: nothing to config or setup right now, maybe later - -exit 0 diff --git a/misc/magpie-download-and-setup.sh b/misc/magpie-download-and-setup.sh index dacdbd80d..c51bcd87c 100755 --- a/misc/magpie-download-and-setup.sh +++ b/misc/magpie-download-and-setup.sh @@ -18,7 +18,6 @@ HADOOP_DOWNLOAD="N" HBASE_DOWNLOAD="N" HIVE_DOWNLOAD="N" PIG_DOWNLOAD="N" -MAHOUT_DOWNLOAD="N" ZOOKEEPER_DOWNLOAD="N" SPARK_DOWNLOAD="N" STORM_DOWNLOAD="N" @@ -64,7 +63,6 @@ HADOOP_PACKAGE="hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz" HBASE_PACKAGE="hbase/1.6.0/hbase-1.6.0-bin.tar.gz" HIVE_PACKAGE="hive/2.3.0/apache-hive-2.3.0.tar.gz" PIG_PACKAGE="pig/pig-0.17.0/pig-0.17.0.tar.gz" -MAHOUT_PACKAGE="mahout/0.13.0/apache-mahout-distribution-0.13.0.tar.gz" ZOOKEEPER_PACKAGE="zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz" SPARK_PACKAGE="spark/spark-3.0.3/spark-3.0.3-bin-hadoop3.2.tgz" SPARK_HADOOP_PACKAGE="hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz" @@ -191,15 +189,6 @@ then # No pig patches at the moment fi -if [ "${MAHOUT_DOWNLOAD}" == "Y" ] -then - __download_apache_package "${MAHOUT_PACKAGE}" - - MAHOUT_PACKAGE_BASEDIR=$(echo `basename ${MAHOUT_PACKAGE}` | sed 's/\(.*\)\.\(.*\)\.\(.*\)/\1/g') - __apply_patches_if_exist ${MAHOUT_PACKAGE_BASEDIR} \ - ${MAGPIE_SCRIPTS_HOME}/patches/mahout/${MAHOUT_PACKAGE_BASEDIR}.patch -fi - if [ "${ZOOKEEPER_DOWNLOAD}" == "Y" ] then __download_apache_package "${ZOOKEEPER_PACKAGE}" diff --git a/patches/mahout/apache-mahout-distribution-0.11.0.patch b/patches/mahout/apache-mahout-distribution-0.11.0.patch deleted file mode 100644 index 482efcd4f..000000000 --- a/patches/mahout/apache-mahout-distribution-0.11.0.patch +++ /dev/null @@ -1,140 +0,0 @@ -diff -pruN apache-mahout-distribution-0.11.0-orig/examples/bin/classify-20newsgroups.sh apache-mahout-distribution-0.11.0/examples/bin/classify-20newsgroups.sh ---- apache-mahout-distribution-0.11.0-orig/examples/bin/classify-20newsgroups.sh 2015-08-05 21:32:16.000000000 -0700 -+++ apache-mahout-distribution-0.11.0/examples/bin/classify-20newsgroups.sh 2015-11-19 13:47:23.357731000 -0800 -@@ -36,7 +36,11 @@ START_PATH=`pwd` - # Set commands for dfs - source ${START_PATH}/set-dfs-commands.sh - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - algorithm=( cnaivebayes-MapReduce naivebayes-MapReduce cnaivebayes-Spark naivebayes-Spark sgd clean) - if [ -n "$1" ]; then - choice=$1 -@@ -105,7 +109,7 @@ if ( [ "x$alg" == "xnaivebayes-MapReduc - echo "Copying 20newsgroups data to HDFS" - set +e - $DFSRM ${WORK_DIR}/20news-all -- $DFS -mkdir ${WORK_DIR} -+ $DFS -mkdir -p ${WORK_DIR} - $DFS -mkdir ${WORK_DIR}/20news-all - set -e - if [ $HVERSION -eq "1" ] ; then -diff -pruN apache-mahout-distribution-0.11.0-orig/examples/bin/classify-wikipedia.sh apache-mahout-distribution-0.11.0/examples/bin/classify-wikipedia.sh ---- apache-mahout-distribution-0.11.0-orig/examples/bin/classify-wikipedia.sh 2015-08-05 21:32:16.000000000 -0700 -+++ apache-mahout-distribution-0.11.0/examples/bin/classify-wikipedia.sh 2015-11-19 13:47:23.434652000 -0800 -@@ -42,7 +42,11 @@ START_PATH=`pwd` - # Set commands for dfs - source ${START_PATH}/set-dfs-commands.sh - --WORK_DIR=/tmp/mahout-work-wiki -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-wiki -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - algorithm=( CBayes BinaryCBayes clean) - if [ -n "$1" ]; then - choice=$1 -@@ -107,7 +111,7 
@@ if [ "x$alg" == "xCBayes" ] || [ "x$alg" - echo "Copying wikipedia data to HDFS" - set +e - $DFSRM ${WORK_DIR}/wikixml -- $DFS -mkdir ${WORK_DIR} -+ $DFS -mkdir -p ${WORK_DIR} - set -e - $DFS -put ${WORK_DIR}/wikixml ${WORK_DIR}/wikixml - fi -diff -pruN apache-mahout-distribution-0.11.0-orig/examples/bin/cluster-reuters.sh apache-mahout-distribution-0.11.0/examples/bin/cluster-reuters.sh ---- apache-mahout-distribution-0.11.0-orig/examples/bin/cluster-reuters.sh 2015-08-05 21:32:16.000000000 -0700 -+++ apache-mahout-distribution-0.11.0/examples/bin/cluster-reuters.sh 2015-11-19 13:47:23.455636000 -0800 -@@ -43,6 +43,12 @@ if [ ! -e $MAHOUT ]; then - exit 1 - fi - -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi -+ - algorithm=( kmeans fuzzykmeans lda streamingkmeans clean) - if [ -n "$1" ]; then - choice=$1 -@@ -59,8 +65,6 @@ fi - echo "ok. You chose $choice and we'll use ${algorithm[$choice-1]} Clustering" - clustertype=${algorithm[$choice-1]} - --WORK_DIR=/tmp/mahout-work-${USER} -- - if [ "x$clustertype" == "xclean" ]; then - rm -rf $WORK_DIR - $DFSRM $WORK_DIR -@@ -98,7 +102,7 @@ if [ ! -e ${WORK_DIR}/reuters-out-seqdir - set +e - $DFSRM ${WORK_DIR}/reuters-sgm - $DFSRM ${WORK_DIR}/reuters-out -- $DFS -mkdir ${WORK_DIR}/ -+ $DFS -mkdir -p ${WORK_DIR}/ - $DFS -mkdir ${WORK_DIR}/reuters-sgm - $DFS -mkdir ${WORK_DIR}/reuters-out - $DFS -put ${WORK_DIR}/reuters-sgm ${WORK_DIR}/reuters-sgm -diff -pruN apache-mahout-distribution-0.11.0-orig/examples/bin/cluster-syntheticcontrol.sh apache-mahout-distribution-0.11.0/examples/bin/cluster-syntheticcontrol.sh ---- apache-mahout-distribution-0.11.0-orig/examples/bin/cluster-syntheticcontrol.sh 2015-08-05 21:32:16.000000000 -0700 -+++ apache-mahout-distribution-0.11.0/examples/bin/cluster-syntheticcontrol.sh 2015-11-17 14:45:32.296259000 -0800 -@@ -48,7 +48,11 @@ START_PATH=`pwd` - # Set commands for dfs - source ${START_PATH}/set-dfs-commands.sh - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - - echo "creating work directory at ${WORK_DIR}" - mkdir -p ${WORK_DIR} -diff -pruN apache-mahout-distribution-0.11.0-orig/examples/bin/factorize-movielens-1M.sh apache-mahout-distribution-0.11.0/examples/bin/factorize-movielens-1M.sh ---- apache-mahout-distribution-0.11.0-orig/examples/bin/factorize-movielens-1M.sh 2015-08-05 21:32:16.000000000 -0700 -+++ apache-mahout-distribution-0.11.0/examples/bin/factorize-movielens-1M.sh 2015-11-17 14:45:32.298261000 -0800 -@@ -43,7 +43,12 @@ fi - export MAHOUT_LOCAL=true - MAHOUT="$MAHOUT_HOME/bin/mahout" - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi -+ - echo "creating work directory at ${WORK_DIR}" - mkdir -p ${WORK_DIR}/movielens - -@@ -77,4 +82,4 @@ shuf ${WORK_DIR}/recommendations/part-m- - echo -e "\n\n" - - echo "removing work directory" --rm -rf ${WORK_DIR} -\ No newline at end of file -+rm -rf ${WORK_DIR} -diff -pruN apache-mahout-distribution-0.11.0-orig/examples/bin/factorize-netflix.sh apache-mahout-distribution-0.11.0/examples/bin/factorize-netflix.sh ---- apache-mahout-distribution-0.11.0-orig/examples/bin/factorize-netflix.sh 2015-08-05 21:32:16.000000000 -0700 -+++ apache-mahout-distribution-0.11.0/examples/bin/factorize-netflix.sh 2015-11-17 14:45:32.301252000 -0800 -@@ -45,7 +45,11 @@ fi - - MAHOUT="../../bin/mahout" 
- --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - - START_PATH=`pwd` - diff --git a/patches/mahout/apache-mahout-distribution-0.11.1.patch b/patches/mahout/apache-mahout-distribution-0.11.1.patch deleted file mode 100644 index 52cb2aab2..000000000 --- a/patches/mahout/apache-mahout-distribution-0.11.1.patch +++ /dev/null @@ -1,140 +0,0 @@ -diff -pruN apache-mahout-distribution-0.11.1-orig/examples/bin/classify-20newsgroups.sh apache-mahout-distribution-0.11.1/examples/bin/classify-20newsgroups.sh ---- apache-mahout-distribution-0.11.1-orig/examples/bin/classify-20newsgroups.sh 2015-11-06 11:14:37.000000000 -0800 -+++ apache-mahout-distribution-0.11.1/examples/bin/classify-20newsgroups.sh 2015-11-19 13:47:17.293792000 -0800 -@@ -36,7 +36,11 @@ START_PATH=`pwd` - # Set commands for dfs - source ${START_PATH}/set-dfs-commands.sh - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - algorithm=( cnaivebayes-MapReduce naivebayes-MapReduce cnaivebayes-Spark naivebayes-Spark sgd clean) - if [ -n "$1" ]; then - choice=$1 -@@ -105,7 +109,7 @@ if ( [ "x$alg" == "xnaivebayes-MapReduc - echo "Copying 20newsgroups data to HDFS" - set +e - $DFSRM ${WORK_DIR}/20news-all -- $DFS -mkdir ${WORK_DIR} -+ $DFS -mkdir -p ${WORK_DIR} - $DFS -mkdir ${WORK_DIR}/20news-all - set -e - if [ $HVERSION -eq "1" ] ; then -diff -pruN apache-mahout-distribution-0.11.1-orig/examples/bin/classify-wikipedia.sh apache-mahout-distribution-0.11.1/examples/bin/classify-wikipedia.sh ---- apache-mahout-distribution-0.11.1-orig/examples/bin/classify-wikipedia.sh 2015-11-06 11:14:37.000000000 -0800 -+++ apache-mahout-distribution-0.11.1/examples/bin/classify-wikipedia.sh 2015-11-19 13:47:17.295790000 -0800 -@@ -42,7 +42,11 @@ START_PATH=`pwd` - # Set commands for dfs - source ${START_PATH}/set-dfs-commands.sh - --WORK_DIR=/tmp/mahout-work-wiki -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-wiki -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - algorithm=( CBayes BinaryCBayes clean) - if [ -n "$1" ]; then - choice=$1 -@@ -110,7 +114,7 @@ if [ "x$alg" == "xCBayes" ] || [ "x$alg" - echo "Copying wikipedia data to HDFS" - set +e - $DFSRM ${WORK_DIR}/wikixml -- $DFS -mkdir ${WORK_DIR} -+ $DFS -mkdir -p ${WORK_DIR} - set -e - $DFS -put ${WORK_DIR}/wikixml ${WORK_DIR}/wikixml - fi -diff -pruN apache-mahout-distribution-0.11.1-orig/examples/bin/cluster-reuters.sh apache-mahout-distribution-0.11.1/examples/bin/cluster-reuters.sh ---- apache-mahout-distribution-0.11.1-orig/examples/bin/cluster-reuters.sh 2015-11-06 11:14:37.000000000 -0800 -+++ apache-mahout-distribution-0.11.1/examples/bin/cluster-reuters.sh 2015-11-19 13:47:17.298785000 -0800 -@@ -43,6 +43,12 @@ if [ ! -e $MAHOUT ]; then - exit 1 - fi - -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi -+ - algorithm=( kmeans fuzzykmeans lda streamingkmeans clean) - if [ -n "$1" ]; then - choice=$1 -@@ -59,8 +65,6 @@ fi - echo "ok. You chose $choice and we'll use ${algorithm[$choice-1]} Clustering" - clustertype=${algorithm[$choice-1]} - --WORK_DIR=/tmp/mahout-work-${USER} -- - if [ "x$clustertype" == "xclean" ]; then - rm -rf $WORK_DIR - $DFSRM $WORK_DIR -@@ -98,7 +102,7 @@ if [ ! 
-e ${WORK_DIR}/reuters-out-seqdir - set +e - $DFSRM ${WORK_DIR}/reuters-sgm - $DFSRM ${WORK_DIR}/reuters-out -- $DFS -mkdir ${WORK_DIR}/ -+ $DFS -mkdir -p ${WORK_DIR}/ - $DFS -mkdir ${WORK_DIR}/reuters-sgm - $DFS -mkdir ${WORK_DIR}/reuters-out - $DFS -put ${WORK_DIR}/reuters-sgm ${WORK_DIR}/reuters-sgm -diff -pruN apache-mahout-distribution-0.11.1-orig/examples/bin/cluster-syntheticcontrol.sh apache-mahout-distribution-0.11.1/examples/bin/cluster-syntheticcontrol.sh ---- apache-mahout-distribution-0.11.1-orig/examples/bin/cluster-syntheticcontrol.sh 2015-11-06 11:14:37.000000000 -0800 -+++ apache-mahout-distribution-0.11.1/examples/bin/cluster-syntheticcontrol.sh 2015-11-18 14:22:51.460789000 -0800 -@@ -48,7 +48,11 @@ START_PATH=`pwd` - # Set commands for dfs - source ${START_PATH}/set-dfs-commands.sh - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - - echo "creating work directory at ${WORK_DIR}" - mkdir -p ${WORK_DIR} -diff -pruN apache-mahout-distribution-0.11.1-orig/examples/bin/factorize-movielens-1M.sh apache-mahout-distribution-0.11.1/examples/bin/factorize-movielens-1M.sh ---- apache-mahout-distribution-0.11.1-orig/examples/bin/factorize-movielens-1M.sh 2015-11-06 11:14:37.000000000 -0800 -+++ apache-mahout-distribution-0.11.1/examples/bin/factorize-movielens-1M.sh 2015-11-18 14:22:51.483766000 -0800 -@@ -43,7 +43,12 @@ fi - export MAHOUT_LOCAL=true - MAHOUT="$MAHOUT_HOME/bin/mahout" - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi -+ - echo "creating work directory at ${WORK_DIR}" - mkdir -p ${WORK_DIR}/movielens - -@@ -77,4 +82,4 @@ shuf ${WORK_DIR}/recommendations/part-m- - echo -e "\n\n" - - echo "removing work directory" --rm -rf ${WORK_DIR} -\ No newline at end of file -+rm -rf ${WORK_DIR} -diff -pruN apache-mahout-distribution-0.11.1-orig/examples/bin/factorize-netflix.sh apache-mahout-distribution-0.11.1/examples/bin/factorize-netflix.sh ---- apache-mahout-distribution-0.11.1-orig/examples/bin/factorize-netflix.sh 2015-11-06 11:14:37.000000000 -0800 -+++ apache-mahout-distribution-0.11.1/examples/bin/factorize-netflix.sh 2015-11-18 14:22:51.502751000 -0800 -@@ -45,7 +45,11 @@ fi - - MAHOUT="../../bin/mahout" - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - - START_PATH=`pwd` - diff --git a/patches/mahout/apache-mahout-distribution-0.11.2.patch b/patches/mahout/apache-mahout-distribution-0.11.2.patch deleted file mode 100644 index 8dde01e3e..000000000 --- a/patches/mahout/apache-mahout-distribution-0.11.2.patch +++ /dev/null @@ -1,131 +0,0 @@ -diff -pruN apache-mahout-distribution-0.11.2-orig/examples/bin/classify-20newsgroups.sh apache-mahout-distribution-0.11.2/examples/bin/classify-20newsgroups.sh ---- apache-mahout-distribution-0.11.2-orig/examples/bin/classify-20newsgroups.sh 2016-03-11 13:45:54.000000000 -0800 -+++ apache-mahout-distribution-0.11.2/examples/bin/classify-20newsgroups.sh 2016-04-04 16:50:04.158327000 -0700 -@@ -36,7 +36,11 @@ START_PATH=`pwd` - # Set commands for dfs - source ${START_PATH}/set-dfs-commands.sh - --WORK_DIR=/tmp/mahout-work-${USER} -+if [[ -z "$MAHOUT_WORK_DIR" ]]; then -+ WORK_DIR=/tmp/mahout-work-${USER} -+else -+ WORK_DIR=$MAHOUT_WORK_DIR -+fi - algorithm=( cnaivebayes-MapReduce naivebayes-MapReduce cnaivebayes-Spark 
- algorithm=( cnaivebayes-MapReduce naivebayes-MapReduce cnaivebayes-Spark naivebayes-Spark sgd clean)
- if [ -n "$1" ]; then
-   choice=$1
-@@ -105,7 +109,7 @@ if ( [ "x$alg" == "xnaivebayes-MapReduc
-   echo "Copying 20newsgroups data to HDFS"
-   set +e
-   $DFSRM ${WORK_DIR}/20news-all
--  $DFS -mkdir ${WORK_DIR}
-+  $DFS -mkdir -p ${WORK_DIR}
-   $DFS -mkdir ${WORK_DIR}/20news-all
-   set -e
-   if [ $HVERSION -eq "1" ] ; then
-diff -pruN apache-mahout-distribution-0.11.2-orig/examples/bin/classify-wikipedia.sh apache-mahout-distribution-0.11.2/examples/bin/classify-wikipedia.sh
---- apache-mahout-distribution-0.11.2-orig/examples/bin/classify-wikipedia.sh 2016-03-11 13:45:54.000000000 -0800
-+++ apache-mahout-distribution-0.11.2/examples/bin/classify-wikipedia.sh 2016-04-04 16:50:04.177304000 -0700
-@@ -42,7 +42,11 @@ START_PATH=`pwd`
- 
- # Set commands for dfs
- source ${START_PATH}/set-dfs-commands.sh
- 
---WORK_DIR=/tmp/mahout-work-wiki
-+if [[ -z "$MAHOUT_WORK_DIR" ]]; then
-+  WORK_DIR=/tmp/mahout-work-wiki
-+else
-+  WORK_DIR=$MAHOUT_WORK_DIR
-+fi
- algorithm=( CBayes BinaryCBayes clean)
- if [ -n "$1" ]; then
-   choice=$1
-@@ -110,7 +114,7 @@ if [ "x$alg" == "xCBayes" ] || [ "x$alg"
-   echo "Copying wikipedia data to HDFS"
-   set +e
-   $DFSRM ${WORK_DIR}/wikixml
--  $DFS -mkdir ${WORK_DIR}
-+  $DFS -mkdir -p ${WORK_DIR}
-   set -e
-   $DFS -put ${WORK_DIR}/wikixml ${WORK_DIR}/wikixml
- fi
-diff -pruN apache-mahout-distribution-0.11.2-orig/examples/bin/cluster-reuters.sh apache-mahout-distribution-0.11.2/examples/bin/cluster-reuters.sh
---- apache-mahout-distribution-0.11.2-orig/examples/bin/cluster-reuters.sh 2016-03-11 13:45:54.000000000 -0800
-+++ apache-mahout-distribution-0.11.2/examples/bin/cluster-reuters.sh 2016-04-04 16:53:22.193090000 -0700
-@@ -43,7 +43,11 @@ if [ ! -e $MAHOUT ]; then
-   exit 1
- fi
- 
---WORK_DIR=/tmp/mahout-work-${USER}
-+if [[ -z "$MAHOUT_WORK_DIR" ]]; then
-+  WORK_DIR=/tmp/mahout-work-${USER}
-+else
-+  WORK_DIR=$MAHOUT_WORK_DIR
-+fi
- 
- algorithm=( kmeans fuzzykmeans lda streamingkmeans clean)
- if [ -n "$1" ]; then
-@@ -98,7 +102,7 @@ if [ ! -e ${WORK_DIR}/reuters-out-seqdir
-   set +e
-   $DFSRM ${WORK_DIR}/reuters-sgm
-   $DFSRM ${WORK_DIR}/reuters-out
--  $DFS -mkdir ${WORK_DIR}/
-+  $DFS -mkdir -p ${WORK_DIR}/
-   $DFS -mkdir ${WORK_DIR}/reuters-sgm
-   $DFS -mkdir ${WORK_DIR}/reuters-out
-   $DFS -put ${WORK_DIR}/reuters-sgm ${WORK_DIR}/reuters-sgm
-diff -pruN apache-mahout-distribution-0.11.2-orig/examples/bin/cluster-syntheticcontrol.sh apache-mahout-distribution-0.11.2/examples/bin/cluster-syntheticcontrol.sh
---- apache-mahout-distribution-0.11.2-orig/examples/bin/cluster-syntheticcontrol.sh 2016-03-11 13:45:54.000000000 -0800
-+++ apache-mahout-distribution-0.11.2/examples/bin/cluster-syntheticcontrol.sh 2016-04-04 16:50:04.201290000 -0700
-@@ -48,7 +48,11 @@ START_PATH=`pwd`
- 
- # Set commands for dfs
- source ${START_PATH}/set-dfs-commands.sh
- 
---WORK_DIR=/tmp/mahout-work-${USER}
-+if [[ -z "$MAHOUT_WORK_DIR" ]]; then
-+  WORK_DIR=/tmp/mahout-work-${USER}
-+else
-+  WORK_DIR=$MAHOUT_WORK_DIR
-+fi
- 
- echo "creating work directory at ${WORK_DIR}"
- mkdir -p ${WORK_DIR}
-diff -pruN apache-mahout-distribution-0.11.2-orig/examples/bin/factorize-movielens-1M.sh apache-mahout-distribution-0.11.2/examples/bin/factorize-movielens-1M.sh
---- apache-mahout-distribution-0.11.2-orig/examples/bin/factorize-movielens-1M.sh 2016-03-11 13:45:54.000000000 -0800
-+++ apache-mahout-distribution-0.11.2/examples/bin/factorize-movielens-1M.sh 2016-04-04 16:50:04.205282000 -0700
-@@ -43,7 +43,12 @@ fi
- export MAHOUT_LOCAL=true
- MAHOUT="$MAHOUT_HOME/bin/mahout"
- 
---WORK_DIR=/tmp/mahout-work-${USER}
-+if [[ -z "$MAHOUT_WORK_DIR" ]]; then
-+  WORK_DIR=/tmp/mahout-work-${USER}
-+else
-+  WORK_DIR=$MAHOUT_WORK_DIR
-+fi
-+
- echo "creating work directory at ${WORK_DIR}"
- mkdir -p ${WORK_DIR}/movielens
- 
-@@ -77,4 +82,4 @@ shuf ${WORK_DIR}/recommendations/part-m-
- echo -e "\n\n"
- 
- echo "removing work directory"
---rm -rf ${WORK_DIR}
-\ No newline at end of file
-+rm -rf ${WORK_DIR}
-diff -pruN apache-mahout-distribution-0.11.2-orig/examples/bin/factorize-netflix.sh apache-mahout-distribution-0.11.2/examples/bin/factorize-netflix.sh
---- apache-mahout-distribution-0.11.2-orig/examples/bin/factorize-netflix.sh 2016-03-11 13:45:54.000000000 -0800
-+++ apache-mahout-distribution-0.11.2/examples/bin/factorize-netflix.sh 2016-04-04 16:50:04.214270000 -0700
-@@ -45,7 +45,11 @@ fi
- 
- MAHOUT="../../bin/mahout"
- 
---WORK_DIR=/tmp/mahout-work-${USER}
-+if [[ -z "$MAHOUT_WORK_DIR" ]]; then
-+  WORK_DIR=/tmp/mahout-work-${USER}
-+else
-+  WORK_DIR=$MAHOUT_WORK_DIR
-+fi
- 
- START_PATH=`pwd`
- 
diff --git a/patches/mahout/apache-mahout-distribution-0.12.0.patch b/patches/mahout/apache-mahout-distribution-0.12.0.patch
deleted file mode 100644
index eeb06cfe8..000000000
--- a/patches/mahout/apache-mahout-distribution-0.12.0.patch
+++ /dev/null
@@ -1,12 +0,0 @@
-diff -pruN apache-mahout-distribution-0.12.0-orig/examples/bin/cluster-syntheticcontrol.sh apache-mahout-distribution-0.12.0/examples/bin/cluster-syntheticcontrol.sh
---- apache-mahout-distribution-0.12.0-orig/examples/bin/cluster-syntheticcontrol.sh 2016-04-11 05:21:52.000000000 -0700
-+++ apache-mahout-distribution-0.12.0/examples/bin/cluster-syntheticcontrol.sh 2016-05-25 15:07:18.310822000 -0700
-@@ -76,7 +76,7 @@ if [ "$HADOOP_HOME" != "" ] && [ "$MAHOU
-   echo "Uploading Synthetic control data to HDFS"
-   $DFSRM ${WORK_DIR}/testdata
-   $DFS -mkdir ${WORK_DIR}/testdata
--  $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
-+  $DFS -put ${WORK_DIR}/synthetic_control.data testdata
-   echo "Successfully Uploaded Synthetic control data to HDFS "
- 
-   ../../bin/mahout org.apache.mahout.clustering.syntheticcontrol."${clustertype}".Job
diff --git a/patches/mahout/apache-mahout-distribution-0.12.1.patch b/patches/mahout/apache-mahout-distribution-0.12.1.patch
deleted file mode 100644
index 7c63a6f39..000000000
--- a/patches/mahout/apache-mahout-distribution-0.12.1.patch
+++ /dev/null
@@ -1,12 +0,0 @@
-diff -pruN apache-mahout-distribution-0.12.1-orig/examples/bin/cluster-syntheticcontrol.sh apache-mahout-distribution-0.12.1/examples/bin/cluster-syntheticcontrol.sh
---- apache-mahout-distribution-0.12.1-orig/examples/bin/cluster-syntheticcontrol.sh 2016-05-18 14:45:00.000000000 -0700
-+++ apache-mahout-distribution-0.12.1/examples/bin/cluster-syntheticcontrol.sh 2016-05-25 15:07:44.949157000 -0700
-@@ -76,7 +76,7 @@ if [ "$HADOOP_HOME" != "" ] && [ "$MAHOU
-   echo "Uploading Synthetic control data to HDFS"
-   $DFSRM ${WORK_DIR}/testdata
-   $DFS -mkdir ${WORK_DIR}/testdata
--  $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
-+  $DFS -put ${WORK_DIR}/synthetic_control.data testdata
-   echo "Successfully Uploaded Synthetic control data to HDFS "
- 
-   ../../bin/mahout org.apache.mahout.clustering.syntheticcontrol."${clustertype}".Job
diff --git a/submission-scripts/script-lsf-mpirun/magpie.lsf-mpirun b/submission-scripts/script-lsf-mpirun/magpie.lsf-mpirun
index 8ecf38706..8cdd10486 100644
--- a/submission-scripts/script-lsf-mpirun/magpie.lsf-mpirun
+++ b/submission-scripts/script-lsf-mpirun/magpie.lsf-mpirun
@@ -88,8 +88,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"
 #
 # "pig" - Run a job according to the settings of PIG_JOB.
 #
-# "mahout" - Run a job according to the settings of MAHOUT_JOB.
-#
 # "spark" - Run a job according to the settings of SPARK_JOB.
 #
 # "kafka" - Run a job according to the settings of KAFKA_JOB.
@@ -111,7 +109,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"
 # For Hbase, testall will run performanceeval
 # For Phoenix, testall will run performanceeval
 # For Pig, testall will run testpig
-# For Mahout, testall will run clustersyntheticcontrol
 # For Spark, testall will run sparkpi
 # For Kafka, testall will run performance
 # For Zeppelin, testall will run checkzeppelinup
@@ -944,60 +941,6 @@ export PIG_JOB="testpig"
 #
 # export PIG_OPTS="-Djava.io.tmpdir=${PIG_LOCAL_JOB_DIR}/tmp"
 
-############################################################################
-# Mahout Configurations
-############################################################################
-
-# Should Mahout be setup
-#
-# Specify yes or no. Defaults to no.
-#
-# Note that unlike Hadoop or Zookeeper, Mahout does not need to be
-# enabled/disabled to be run with Hadoop. For example, no daemons are setup.
-#
-# If MAHOUT_SETUP is enabled, this will inform Magpie to setup
-# environment variables that will hopefully make it easier to run
-# Mahout w/ Hadoop. You could leave this disabled and setup/config
-# Mahout as you need.
-#
-export MAHOUT_SETUP=no
-
-# Mahout Version
-#
-export MAHOUT_VERSION="0.13.0"
-
-# Path to your Mahout build/binaries
-#
-# This should be accessible on all nodes in your allocation. Typically
-# this is in an NFS mount.
-#
-# Ensure the build matches the Hadoop version this will run against.
-#
-export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}"
-
-# Path to store data local to each cluster node, typically something
-# in /tmp. This will store local conf files and log files for your
-# job. If local scratch space is not available, consider using the
-# MAGPIE_NO_LOCAL_DIR option. See README for more details.
-#
-export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout"
-
-# Set how Mahout should run
-#
-# "clustersyntheticcontrol" - Run the Mahout
-#                             cluster-syntheticcontrol.sh example. An
-#                             internet connection outside of your
-#                             cluster is required, as data will be
-#                             downloaded for the test.
-#
-export MAHOUT_JOB="clustersyntheticcontrol"
-
-# Mahout Opts
-#
-# Extra Java runtime options
-#
-# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp"
-
 ############################################################################
 # Hbase Core Configurations
 ############################################################################
diff --git a/submission-scripts/script-lsf-mpirun/magpie.lsf-mpirun-hadoop-and-mahout b/submission-scripts/script-lsf-mpirun/magpie.lsf-mpirun-hadoop-and-mahout
deleted file mode 100644
index 89fcd9fcb..000000000
--- a/submission-scripts/script-lsf-mpirun/magpie.lsf-mpirun-hadoop-and-mahout
+++ /dev/null
@@ -1,914 +0,0 @@
-#!/bin/sh
-#############################################################################
-# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC.
-# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
-# Written by Albert Chu
-# LLNL-CODE-644248
-#
-# This file is part of Magpie, scripts for running Hadoop on
-# traditional HPC systems. For details, see https://github.com/llnl/magpie.
-#
-# Magpie is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# Magpie is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with Magpie. If not, see <http://www.gnu.org/licenses/>.
-#############################################################################
-
-############################################################################
-# LSF Customizations
-############################################################################
-
-# Node count. Node count should include one node for the
-# head/management/master node. For example, if you want 8 compute
-# nodes to process data, specify 9 nodes below.
-#
-# If including Zookeeper, include expected Zookeeper nodes. For
-# example, if you want 8 Hadoop compute nodes and 3 Zookeeper nodes,
-# specify 12 nodes (1 master, 8 Hadoop, 3 Zookeeper)
-#
-# Also take into account additional nodes needed for other services.
-#
-# Many of the below can be configured on the command line. If you are
-# more comfortable specifying these on the command line, feel free to
-# delete the customizations below.
-
-#BSUB -n
-#BSUB -o "lsf-%J.out"
-
-# Note defaults of MAGPIE_STARTUP_TIME & MAGPIE_SHUTDOWN_TIME, this
-# timelimit should be a fair amount larger than them combined.
-#BSUB -W
-
-# Job name. This will be used in naming directories for the job.
-#BSUB -J - -# Queue to launch job in -#BSUB -q - -## LSF Values -# Generally speaking, don't touch the following, misc other configuration - -#BSUB -R "span[ptile=1]" -#BSUB -x - -# Need to tell Magpie how you are submitting this job -export MAGPIE_SUBMISSION_TYPE="lsfmpirun" - -############################################################################ -# Magpie Configurations -############################################################################ - -# Directory your launching scripts/files are stored -# -# Normally an NFS mount, someplace magpie can be reached on all nodes. -export MAGPIE_SCRIPTS_HOME="${HOME}/magpie" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" - -# Magpie job type -# -# "hadoop" - Run a job according to the settings of HADOOP_JOB. -# -# "mahout" - Run a job according to the settings of MAHOUT_JOB. -# -# "testall" - Run a job that runs all basic sanity tests for all -# software that is configured to be setup. This is a good -# way to sanity check that everything has been setup -# correctly and the way you like. -# -# For Hadoop, testall will run terasort -# For Mahout, testall will run clustersyntheticcontrol -# -# "script" - Run arbitraty script, as specified by MAGPIE_JOB_SCRIPT. -# You can find example job scripts in examples/. -# -# "interactive" - manually interact with job run to submit jobs, -# peruse data (e.g. HDFS), move data, etc. See job -# output for instructions to access your job -# allocation. -# -# "setuponly" - do not launch any daemons or services, only setup -# configuration files. Useful for debugging or -# development. -# -export MAGPIE_JOB_TYPE="mahout" - -# Specify script and arguments to execute for "script" mode in -# MAGPIE_JOB_TYPE -# -# export MAGPIE_JOB_SCRIPT="${HOME}/my-job-script" - -# Specify script startup / shutdown time window -# -# Specifies the amount of time to give startup / shutdown activities a -# chance to succeed before Magpie will give up (or in the case of -# shutdown, when the resource manager/scheduler may kill the running -# job). Defaults to 30 minutes for startup, 30 minutes for shutdown. -# -# The startup time in particular may need to be increased if you have -# a large amount of data. As an example, HDFS may need to spend a -# significant amount of time determine all of the blocks in HDFS -# before leaving safemode. -# -# The stop time in particular may need to be increased if you have a -# large amount of cleanup to be done. HDFS will save its NameSpace -# before shutting down. Hbase will do a compaction before shutting -# down. -# -# The startup & shutdown window must together be smaller than the -# timelimit specified for the job. -# -# MAGPIE_STARTUP_TIME and MAGPIE_SHUTDOWN_TIME at minimum must be 5 -# minutes. If MAGPIE_POST_JOB_RUN is specified below, -# MAGPIE_SHUTDOWN_TIME must be at minimum 10 minutes. -# -# export MAGPIE_STARTUP_TIME=30 -# export MAGPIE_SHUTDOWN_TIME=30 - -# Magpie One Time Run -# -# Normally, Magpie assumes that when a user runs a job, data created -# and stored within that job will be desired to be accessed again. For -# example, data created and stored within HDFS will be accessed again. -# -# Under a number of scenarios, this may not be desired. For example -# during testing. 
-# -# To improve useability and performance, setting MAGPIE_ONE_TIME_RUN -# below to yes will have two effects on the Magpie job. -# -# 1) A number of data paths (such as for HDFS) will be put into unique -# paths for this job. Therefore, no other job should be able to -# access the data again. This is particularly useful if you wish -# to run performance tests with this job script over and over -# again. -# -# Magpie will not remove data that was written, so be sure to clean up -# your directories later. -# -# 2) In order to improve job throughout, Magpie will take shortcuts by -# not properly tearing down the job. As data corruption should not be -# a concern on job teardown, the job can complete more quickly. -# -# export MAGPIE_ONE_TIME_RUN=yes - -# Convenience Scripts -# -# Specify script to be executed to before / after your job. It is run -# on all nodes. -# -# Typically the pre-job script is used to set something up or get -# debugging info. It can also be used to determine if system -# conditions meet the expectations of your job. The primary job -# running script (magpie-run) will not be executed if the -# MAGPIE_PRE_JOB_RUN exits with a non-zero exit code. -# -# The post-job script is typically used for cleaning up something or -# gathering info (such as logs) for post-debugging/analysis. If it is -# set, MAGPIE_SHUTDOWN_TIME above must be > 5. -# -# See example magpie-example-pre-job-script and -# magpie-example-post-job-script for ideas of what you can do w/ these -# scripts -# -# Multiple scripts can be specified separated by comma. Arguments can -# be passed to scripts as well. -# -# A number of convenient scripts are available in the -# ${MAGPIE_SCRIPTS_HOME}/scripts directory. -# -# export MAGPIE_PRE_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" -# -# Similar to the MAGPIE_PRE_JOB_RUN and MAGPIE_POST_JOB_RUN, scripts can be -# run after the stack is setup but prior to the script or interactive mode -# begins. This enables frontends and other processes that depend on the stack -# to be started up and torn down. In similar fashion the cleanup will be done -# immediately after the script or interactive mode exits before the stack is -# shutdown. -# -# export MAGPIE_PRE_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" - -# Environment Variable Script -# -# When working with Magpie interactively by logging into the master -# node of your job allocation, many environment variables may need to -# be set. For example, environment variables for config file -# directories (e.g. HADOOP_CONF_DIR, HBASE_CONF_DIR, etc.) and home -# directories (e.g. HADOOP_HOME, HBASE_HOME, etc.) and more general -# environment variables (e.g. JAVA_HOME) may need to be set before you -# begin interacting with your big data setup. -# -# The standard job output from Magpie provides instructions on all the -# environment variables typically needed to interact with your job. -# However, this can be tedious if done by hand. -# -# If the environment variable specified below is set, Magpie will -# create the file and put into it every environment variable that -# would be useful when running your job interactively. That way, it -# can be sourced easily if you will be running your job interactively. 
-# It can also be loaded or used by other job scripts.
-#
-# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT="${HOME}/my-job-env"
-
-# Environment Variable Shell Type
-#
-# Magpie outputs environment variables in help output and
-# MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT based on your SHELL environment
-# variable.
-#
-# If you would like to output in a different shell type (perhaps you
-# have programmed scripts in a different shell), specify that shell
-# here.
-#
-# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT_SHELL="/bin/bash"
-
-# Remote Shell
-#
-# Magpie requires a passwordless remote shell command to launch
-# necessary daemons across your job allocation. Magpie defaults to
-# ssh, but it may be an alternate command in some environments. An
-# alternate ssh-equivalent remote command can be specified by setting
-# MAGPIE_REMOTE_CMD below.
-#
-# If using ssh, Magpie requires keys to be setup ahead of time so it
-# can be executed without passwords.
-#
-# Specify options to the remote shell command if necessary.
-#
-# export MAGPIE_REMOTE_CMD="ssh"
-# export MAGPIE_REMOTE_CMD_OPTS=""
-
-############################################################################
-# General Configuration
-############################################################################
-
-# Necessary for most projects
-export JAVA_HOME="/usr/lib/jvm/jre-1.7.0/"
-
-############################################################################
-# Hadoop Core Configurations
-############################################################################
-
-# Should Hadoop be run
-#
-# Specify yes or no. Defaults to no.
-#
-export HADOOP_SETUP=yes
-
-# Set Hadoop Setup Type
-#
-# Will inform scripts on how to setup config files and what daemons to
-# launch/setup.
-#
-# MR - Launch HDFS and Yarn
-# YARN - Enable only Yarn
-# HDFS - Enable only HDFS
-#
-# HDFS only may be useful when you want to use HDFS with other big
-# data software, such as Hbase, and do not care for MapReduce or Yarn.
-# It only works with HDFS based HADOOP_FILESYSTEM_MODE, such as
-# "hdfs", "hdfsoverlustre", or "hdfsovernetworkfs".
-#
-# YARN only may be useful when you need Yarn setup for scheduling, but
-# will not be using HDFS. For example, you may be reading from a
-# networked file system directly. This option requires
-# HADOOP_FILESYSTEM_MODE to be set to 'rawnetworkfs'.
-#
-export HADOOP_SETUP_TYPE="MR"
-
-# Version
-#
-# Make sure the version for Mapreduce version 1 or 2 matches whatever
-# you set in HADOOP_SETUP_TYPE
-#
-export HADOOP_VERSION="2.9.1"
-
-# Path to your Hadoop build/binaries
-#
-# Make sure the build for MapReduce or HDFS version 1 or 2 matches
-# whatever you set in HADOOP_SETUP_TYPE.
-#
-# This should be accessible on all nodes in your allocation. Typically
-# this is in an NFS mount.
-#
-export HADOOP_HOME="${HOME}/hadoop-${HADOOP_VERSION}"
-
-# Path to store data local to each cluster node, typically something
-# in /tmp. This will store local conf files and log files for your
-# job. If local scratch space is not available, consider using the
-# MAGPIE_NO_LOCAL_DIR option. See README for more details.
-#
-# This will not be used for storing intermediate files or
-# distributed cache files. See HADOOP_LOCALSTORE below for that.
-#
-export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop"
-
-# Directory where alternate Hadoop configuration templates are stored
-#
-# If you wish to tweak the configuration files used by Magpie, set
-# HADOOP_CONF_FILES below, copy configuration templates from
-# $MAGPIE_SCRIPTS_HOME/conf/hadoop into HADOOP_CONF_FILES, and modify
-# as you desire. Magpie will still use configuration files in
-# $MAGPIE_SCRIPTS_HOME/conf/hadoop if any of the files it needs are
-# not found in HADOOP_CONF_FILES.
-#
-# export HADOOP_CONF_FILES="${HOME}/myconf"
-
-# Daemon Heap Max
-#
-# Heap maximum for Hadoop daemons (i.e. Resource Manager, NodeManager,
-# DataNode, History Server, etc.), specified in megs. Special case
-# for Namenode, see below.
-#
-# If not specified, defaults to Hadoop default of 1000
-#
-# May need to be increased if you are scaling large, get OutOfMemory
-# errors, or perhaps have a lot of cores on a node.
-#
-# export HADOOP_DAEMON_HEAP_MAX=2000
-
-# Daemon Namenode Heap Max
-#
-# Heap maximum for Hadoop Namenode daemons specified in megs.
-#
-# If not specified, defaults to HADOOP_DAEMON_HEAP_MAX above.
-#
-# Unlike most Hadoop daemons, namenode may need more memory if there
-# are a very large number of files in your HDFS setup. A general rule
-# of thumb is a 1G heap for each 100T of data.
-#
-# export HADOOP_NAMENODE_DAEMON_HEAP_MAX=2000
-
-# Environment Extra
-#
-# Specify extra environment information that should be passed into
-# Hadoop. This file will simply be appended into the hadoop-env.sh
-# and (if appropriate) yarn-env.sh.
-#
-# By default, a reasonable estimate for max user processes and open
-# file descriptors will be calculated and put into hadoop-env.sh and
-# (if appropriate) yarn-env.sh. However, it's always possible they may
-# need to be set differently. Everyone's cluster/situation can be
-# slightly different.
-#
-# See the example file example-environment-extra for examples of
-# what you can/should do with adding extra environment settings.
-#
-# export HADOOP_ENVIRONMENT_EXTRA_PATH="${HOME}/hadoop-my-environment"
-
-############################################################################
-# Hadoop Job/Run Configurations
-############################################################################
-
-# Set hadoop job for MAGPIE_JOB_TYPE = hadoop
-#
-# "terasort" - run terasort. Useful for making sure things are setup
-#              the way you like.
-#
-#              There are additional configuration options for this
-#              listed below.
-#
-# "upgradehdfs" - upgrade your version of HDFS. Most notably this is
-#                 used when you are switching to a newer Hadoop
-#                 version and the HDFS version would be inconsistent
-#                 without upgrading. Only works with HDFS versions >=
-#                 2.2.0.
-#
-#                 Please set your job time to be quite large when
-#                 performing this upgrade. If your job times out and
-#                 this process does not complete fully, it can leave
-#                 HDFS in a bad state.
-#
-#                 Beware, once you upgrade it'll be difficult to rollback.
-#
-# "decommissionhdfsnodes" - decrease your HDFS over Lustre or HDFS
-#                           over NetworkFS node size just as if you
-#                           were on a cluster with local disk. Launch
-#                           your job with the current node size and
-#                           set HADOOP_DECOMMISSION_HDFS_NODE_SIZE to
-#                           the smaller node size to decommission
-#                           into. Only works on Hadoop versions >=
-#                           2.3.0.
-#
-#                           Please set your job time to be quite large
-#                           when performing this update. If your job
-#                           times out and this process does not
-#                           complete fully, it can leave HDFS in a bad
-#                           state.
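As a usage sketch for the "upgradehdfs" mode just described (version number and path illustrative), the relevant settings in a submission script would look something like:

    # Upgrade HDFS metadata in place after switching to a newer Hadoop
    export MAGPIE_JOB_TYPE="hadoop"
    export HADOOP_JOB="upgradehdfs"
    export HADOOP_VERSION="2.9.1"
    export HADOOP_HOME="${HOME}/hadoop-${HADOOP_VERSION}"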
-#
-export HADOOP_JOB="terasort"
-
-# Tasks per Node
-#
-# If not specified, a reasonable estimate will be calculated based on
-# number of CPUs on the system.
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_MAX_TASKS_PER_NODE=8
-
-# Default Map tasks for Job
-#
-# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE * compute
-# nodes.
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_DEFAULT_MAP_TASKS=8
-
-# Default Reduce tasks for Job
-#
-# If not specified, defaults to # compute nodes (i.e. 1 reducer per
-# node)
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_DEFAULT_REDUCE_TASKS=8
-
-# Heap size for JVM
-#
-# Specified in M. If not specified, a reasonable estimate will be
-# calculated based on total memory available and number of CPUs on the
-# system.
-#
-# HADOOP_CHILD_MAP_HEAPSIZE and HADOOP_CHILD_REDUCE_HEAPSIZE are for
-# Yarn
-#
-# If HADOOP_CHILD_MAP_HEAPSIZE is not specified, it is assumed to be
-# HADOOP_CHILD_HEAPSIZE.
-#
-# If HADOOP_CHILD_REDUCE_HEAPSIZE is not specified, it is assumed to
-# be 2X the HADOOP_CHILD_MAP_HEAPSIZE.
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_CHILD_HEAPSIZE=2048
-# export HADOOP_CHILD_MAP_HEAPSIZE=2048
-# export HADOOP_CHILD_REDUCE_HEAPSIZE=4096
-
-# Container Buffer
-#
-# Specify the amount of overhead each Yarn container will have over
-# the heap size. Specified in M. If not specified, a reasonable
-# estimate will be calculated based on total memory available.
-#
-# export HADOOP_CHILD_MAP_CONTAINER_BUFFER=256
-# export HADOOP_CHILD_REDUCE_CONTAINER_BUFFER=512
-
-# Mapreduce Slowstart, indicating percent of maps that should complete
-# before reducers begin.
-#
-# If not specified, defaults to 0.05
-#
-# export HADOOP_MAPREDUCE_SLOWSTART=0.05
-
-# Container Memory
-#
-# Memory on compute nodes for containers. Typically "nice-chunk" less
-# than actual memory on machine, b/c machine needs memory for its own
-# needs (kernel, daemons, etc.). Specified in megs.
-#
-# If not specified, a reasonable estimate will be calculated based on
-# total memory on the system.
-#
-# export YARN_RESOURCE_MEMORY=32768
-
-# Check Memory Limits
-#
-# Should physical and virtual memory limits be enforced for containers.
-# This can be helpful in cases where the OS (Centos/Redhat) is aggressive
-# at allocating virtual memory and causes the vmem-to-pmem ratio to be
-# hit. Defaults to true.
-#
-# export YARN_VMEM_CHECK="false"
-# export YARN_PMEM_CHECK="false"
-
-# Compression
-#
-# Should compression of outputs and intermediate data be enabled.
-# Specify yes or no. Defaults to no.
-#
-# Effectively, is time spent compressing data going to save you time
-# on I/O. Sometimes yes, sometimes no.
-#
-# export HADOOP_COMPRESSION=yes
-
-# IO Sort Factors + MB
-#
-# The number of streams of files to sort while reducing and the memory
-# amount to use while sorting. This is a quite advanced mechanism
-# taking into account many factors. If not specified, some reasonable
-# number will be calculated.
-#
-# export HADOOP_IO_SORT_FACTOR=10
-# export HADOOP_IO_SORT_MB=100
-
-# Parallel Copies
-#
-# The default number of parallel transfers run by reduce during the
-# copy (shuffle) phase. If not specified, some reasonable number will
-# be calculated.
-# export HADOOP_PARALLEL_COPIES=10
-
-############################################################################
-# Hadoop Terasort Configurations
-############################################################################
-
-# Terasort size
-#
-# For "terasort" mode.
-#
-# Specify terasort size in units of 100-byte rows. Specify
-# 10000000000 for a terabyte, for actual benchmarking.
-#
-# Specify something small, for basic sanity tests.
-#
-# Defaults to 50000000.
-#
-# export HADOOP_TERASORT_SIZE=50000000
-
-# Terasort map count
-#
-# For "terasort" mode during the teragen of data.
-#
-# If not specified, will be computed to a reasonable number given
-# HADOOP_TERASORT_SIZE and the block size of the filesystem you are
-# using (e.g. for HDFS the HADOOP_HDFS_BLOCKSIZE)
-#
-# export HADOOP_TERAGEN_MAP_COUNT=4
-
-# Terasort reducer count
-#
-# For "terasort" mode during the actual terasort of data.
-#
-# If not specified, defaults to compute node count * 2.
-#
-# export HADOOP_TERASORT_REDUCER_COUNT=4
-
-# Terasort cache
-#
-# For "real benchmarking" you should flush page cache between a
-# teragen and a terasort. You can disable this for sanity runs/tests
-# to make things go faster. Specify yes or no. Defaults to yes.
-#
-# export HADOOP_TERASORT_CLEAR_CACHE=no
-
-# Terasort output replication count
-#
-# For "terasort" mode during the actual terasort of data
-#
-# In some circumstances, replication of the output from the terasort
-# must be equal to the replication of data for the input. In other
-# cases it can be less. The below can be adjusted to tweak for
-# benchmarking purposes.
-#
-# If not specified, defaults to Terasort default, which is 1 in most
-# versions of Hadoop
-#
-# export HADOOP_TERASORT_OUTPUT_REPLICATION=1
-
-# Terachecksum
-#
-# For "terasort" mode after the teragen of data
-#
-# After executing the teragen, run terachecksum to calculate a checksum of
-# the input.
-#
-# If both this and HADOOP_TERASORT_RUN_TERAVALIDATE are set, the
-# checksums will be compared afterwards for equality.
-#
-# Defaults to no
-#
-# export HADOOP_TERASORT_RUN_TERACHECKSUM=no
-
-# Teravalidate
-#
-# For "terasort" mode after the actual terasort of data
-#
-# After executing the sort, run teravalidate to validate the sorted data.
-#
-# If both this and HADOOP_TERASORT_RUN_TERACHECKSUM are set, the
-# checksums will be compared afterwards for equality.
-#
-# Defaults to no
-#
-# export HADOOP_TERASORT_RUN_TERAVALIDATE=no
-
-############################################################################
-# Hadoop Decommission HDFS Nodes Configurations
-############################################################################
-
-# Specify decommission node size for "decommissionhdfsnodes" mode
-#
-# For example, if your current HDFS node size is 16, your job size is
-# likely 17 nodes (including the master). If you wish to decommission
-# to 8 data nodes (job size of 9 nodes total), set this to 8.
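Tying the "decommissionhdfsnodes" pieces together, a sketch of the settings for shrinking the 16-data-node example above down to 8 (node counts illustrative), using only the options documented here and the commented-out export that follows:

    # Submit with the current 17-node allocation (1 master + 16 data nodes)
    export MAGPIE_JOB_TYPE="hadoop"
    export HADOOP_JOB="decommissionhdfsnodes"
    export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8   # decommission down to 8 data nodes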
-# -# export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8 - -############################################################################ -# Hadoop Filesystem Mode Configurations -############################################################################ - -# Set how the filesystem should be setup -# -# "hdfs" - Normal straight up HDFS if you have local disk in your -# cluster. This option is primarily for benchmarking and -# caching, but probably shouldn't be used in the general case. -# -# Be careful running this in a cluster environment. The next -# time you execute your job, if a different set of nodes are -# allocated to you, the HDFS data you wrote from a previous -# job may not be there. Specifying specific nodes to use in -# your job submission (e.g. --nodelist in sbatch) may be a -# way to alleviate this. -# -# User must set HADOOP_HDFS_PATH below. -# -# "hdfsoverlustre" - HDFS over Lustre. See README for description. -# -# User must set HADOOP_HDFSOVERLUSTRE_PATH below. -# -# "hdfsovernetworkfs" - HDFS over Network FS. Identical to HDFS over -# Lustre, but filesystem agnostic. -# -# User must set HADOOP_HDFSOVERNETWORKFS_PATH below. -# -# "rawnetworkfs" - Use Hadoop RawLocalFileSystem (i.e. file: scheme), -# to use networked file system directly. It could be a -# Lustre mount or NFS mount. Whatever you please. -# -# User must set HADOOP_RAWNETWORKFS_PATH below. -# -export HADOOP_FILESYSTEM_MODE="hdfsoverlustre" - -# Local Filesystem BlockSize -# -# This configuration is the blocksize hadoop will use when doing I/O -# to a local filesystem. It is used by HDFS when reading from the -# underlying filesystem. It is also used with -# HADOOP_FILESYSTEM_MODE="rawnetworkfs". -# -# Commonly 33554432, 67108864, 134217728 (i.e. 32m, 64m, 128m) -# -# If not specified, defaults to 33554432 -# -# export HADOOP_LOCAL_FILESYSTEM_BLOCKSIZE=33554432 - -# HDFS Replication -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# HDFS commonly uses 3. When doing HDFS over Lustre/NetworkFS, higher -# replication can also help with resilience if nodes fail. You may -# wish to set this to < 3 to save space. -# -# If not specified, defaults to 3 -# -# export HADOOP_HDFS_REPLICATION=3 - -# HDFS Block Size -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# Commonly 134217728, 268435456, 536870912 (i.e. 128m, 256m, 512m) -# -# If not specified, defaults to 134217728 -# -# export HADOOP_HDFS_BLOCKSIZE=134217728 - -# Path for HDFS when using local disk -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data and HDFS. The first -# path will also store daemon data, such as namenode or jobtracker -# data. -# -export HADOOP_HDFS_PATH="/ssd/${USER}/hdfs" - -# HDFS cleanup -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# After your job has completed, if HADOOP_HDFS_PATH_CLEAR is set to -# yes, Magpie will do a rm -rf on HADOOP_HDFS_PATH. -# -# This is particularly useful when doing normal HDFS on local storage. -# On your next job run, you may not be able to get the nodes you want -# on your next run. So you may want to clean up your work before the -# next user uses the node. 
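A sketch of the local-disk scenario just described, combining the filesystem mode, path, and cleanup options (the SSD path is an assumption):

    export HADOOP_FILESYSTEM_MODE="hdfs"
    export HADOOP_HDFS_PATH="/ssd/${USER}/hdfs"
    export HADOOP_HDFS_PATH_CLEAR="yes"   # scrub local disks when the job ends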
-# -# export HADOOP_HDFS_PATH_CLEAR="yes" - -# Lustre path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/${USER}/hdfsoverlustre/" - -# HDFS over Lustre ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERLUSTRE_REMOVE_LOCKS=yes - -# Networkfs path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERNETWORKFS_PATH="/networkfs/${USER}/hdfsovernetworkfs/" - -# HDFS over Networkfs ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERNETWORKFS_REMOVE_LOCKS=yes - -# Path for rawnetworkfs -# -# This is used with HADOOP_FILESYSTEM_MODE="rawnetworkfs" -# -export HADOOP_RAWNETWORKFS_PATH="/lustre/${USER}/rawnetworkfs/" - -# If you have a local SSD or NVRAM, performance may be better to store -# intermediate data on it rather than Lustre or some other networked -# filesystem. If the below environment variable is specified, local -# intermediate data will be stored in the specified directory. -# Otherwise it will go to an appropriate directory in Lustre/networked -# FS. -# -# Be wary, local SSDs/NVRAM stores may have less space than HDDs or -# networked file systems. It can be easy to run out of space. -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data. 
-# -# export HADOOP_LOCALSTORE="/ssd/${USER}/localstore/" - -# HADOOP_LOCALSTORE_CLEAR -# -# After your job has completed, if HADOOP_LOCALSTORE_CLEAR is set to -# yes, Magpie will do a rm -rf on all directories in -# HADOOP_LOCALSTORE. This is particularly useful if the localstore -# directory is on local storage and you want to clean up your work -# before the next user uses the node. -# -# export HADOOP_LOCALSTORE_CLEAR="yes" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=yes - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. -# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - -############################################################################ -# Run Job -############################################################################ - -# Set alternate mpirun options here -# MPIRUN_OPTIONS="-genvall -genv MV2_USE_APM 0" - -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-check-inputs -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-setup-core -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-setup-projects -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-setup-post -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-pre-run -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-run -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-cleanup -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-post-run diff --git a/submission-scripts/script-msub-slurm-srun/magpie.msub-slurm-srun b/submission-scripts/script-msub-slurm-srun/magpie.msub-slurm-srun index b303d26fa..614d311e6 100644 --- a/submission-scripts/script-msub-slurm-srun/magpie.msub-slurm-srun +++ b/submission-scripts/script-msub-slurm-srun/magpie.msub-slurm-srun @@ -95,8 +95,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # # "pig" - Run a job according to the settings of PIG_JOB. 
# -# "mahout" - Run a job according to the settings of MAHOUT_JOB. -# # "spark" - Run a job according to the settings of SPARK_JOB. # # "kafka" - Run a job according to the settings of KAFKA_JOB. @@ -118,7 +116,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # For Hbase, testall will run performanceeval # For Phoenix, testall will run performanceeval # For Pig, testall will run testpig -# For Mahout, testall will run clustersyntheticcontrol # For Spark, testall will run sparkpi # For Kafka, testall will run performance # For Zeppelin, testall will run checkzeppelinup @@ -951,60 +948,6 @@ export PIG_JOB="testpig" # # export PIG_OPTS="-Djava.io.tmpdir=${PIG_LOCAL_JOB_DIR}/tmp" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=no - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. -# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - ############################################################################ # Hbase Core Configurations ############################################################################ diff --git a/submission-scripts/script-msub-slurm-srun/magpie.msub-slurm-srun-hadoop-and-mahout b/submission-scripts/script-msub-slurm-srun/magpie.msub-slurm-srun-hadoop-and-mahout deleted file mode 100644 index f26db3fde..000000000 --- a/submission-scripts/script-msub-slurm-srun/magpie.msub-slurm-srun-hadoop-and-mahout +++ /dev/null @@ -1,918 +0,0 @@ -#!/bin/sh -############################################################################# -# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC. -# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). -# Written by Albert Chu -# LLNL-CODE-644248 -# -# This file is part of Magpie, scripts for running Hadoop on -# traditional HPC systems. For details, see https://github.com/llnl/magpie. 
-#
-# Magpie is free software; you can redistribute it and/or modify it
-# under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# Magpie is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with Magpie. If not, see <http://www.gnu.org/licenses/>.
-#############################################################################
-
-############################################################################
-# Moab Customizations
-############################################################################
-
-# Node count. Node count should include one node for the
-# head/management/master node. For example, if you want 8 compute
-# nodes to process data, specify 9 nodes below.
-#
-# If including Zookeeper, include expected Zookeeper nodes. For
-# example, if you want 8 Hadoop compute nodes and 3 Zookeeper nodes,
-# specify 12 nodes (1 master, 8 Hadoop, 3 Zookeeper)
-#
-# Also take into account additional nodes needed for other services.
-#
-# Many of the below can be configured on the command line. If you are
-# more comfortable specifying these on the command line, feel free to
-# delete the customizations below.
-
-#MSUB -l nodes=
-
-#MSUB -o moab-%j.out
-
-#MSUB -l partition=
-
-#MSUB -q
-
-# Note defaults of MAGPIE_STARTUP_TIME & MAGPIE_SHUTDOWN_TIME, the
-# walltime should be a fair amount larger than them combined.
-
-#MSUB -l walltime=
-
-#MSUB -l resfailpolicy=ignore
-
-#MSUB -V
-
-export MOAB_JOBNAME=""
-
-export SLURM_TASKS_PER_NODE=1
-export SBATCH_EXCLUSIVE="yes"
-
-# Need to tell Magpie how you are submitting this job
-#
-# IMPORTANT: This msub file assumes Slurm is the underlying resource
-# manager. If it is not, a new Magpie submission type should be added
-# into Magpie.
-export MAGPIE_SUBMISSION_TYPE="msubslurmsrun"
-
-############################################################################
-# Magpie Configurations
-############################################################################
-
-# Directory your launching scripts/files are stored
-#
-# Normally an NFS mount, someplace magpie can be reached on all nodes.
-export MAGPIE_SCRIPTS_HOME="${HOME}/magpie"
-
-# Path to store data local to each cluster node, typically something
-# in /tmp. This will store local conf files and log files for your
-# job. If local scratch space is not available, consider using the
-# MAGPIE_NO_LOCAL_DIR option. See README for more details.
-#
-export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"
-
-# Magpie job type
-#
-# "hadoop" - Run a job according to the settings of HADOOP_JOB.
-#
-# "mahout" - Run a job according to the settings of MAHOUT_JOB.
-#
-# "testall" - Run a job that runs all basic sanity tests for all
-#             software that is configured to be setup. This is a good
-#             way to sanity check that everything has been setup
-#             correctly and the way you like.
-#
-#             For Hadoop, testall will run terasort
-#             For Mahout, testall will run clustersyntheticcontrol
-#
-# "script" - Run an arbitrary script, as specified by MAGPIE_JOB_SCRIPT.
-#            You can find example job scripts in examples/.
-#
-# "interactive" - manually interact with job run to submit jobs,
-#                 peruse data (e.g. HDFS), move data, etc.
-#                 See job output for instructions to access your job
-#                 allocation.
-#
-# "setuponly" - do not launch any daemons or services, only setup
-#               configuration files. Useful for debugging or
-#               development.
-#
-export MAGPIE_JOB_TYPE="mahout"
-
-# Specify script and arguments to execute for "script" mode in
-# MAGPIE_JOB_TYPE
-#
-# export MAGPIE_JOB_SCRIPT="${HOME}/my-job-script"
-
-# Specify script startup / shutdown time window
-#
-# Specifies the amount of time to give startup / shutdown activities a
-# chance to succeed before Magpie will give up (or in the case of
-# shutdown, when the resource manager/scheduler may kill the running
-# job). Defaults to 30 minutes for startup, 30 minutes for shutdown.
-#
-# The startup time in particular may need to be increased if you have
-# a large amount of data. As an example, HDFS may need to spend a
-# significant amount of time determining all of the blocks in HDFS
-# before leaving safemode.
-#
-# The stop time in particular may need to be increased if you have a
-# large amount of cleanup to be done. HDFS will save its NameSpace
-# before shutting down. Hbase will do a compaction before shutting
-# down.
-#
-# The startup & shutdown window must together be smaller than the
-# timelimit specified for the job.
-#
-# MAGPIE_STARTUP_TIME and MAGPIE_SHUTDOWN_TIME at minimum must be 5
-# minutes. If MAGPIE_POST_JOB_RUN is specified below,
-# MAGPIE_SHUTDOWN_TIME must be at minimum 10 minutes.
-#
-# export MAGPIE_STARTUP_TIME=30
-# export MAGPIE_SHUTDOWN_TIME=30
-
-# Magpie One Time Run
-#
-# Normally, Magpie assumes that when a user runs a job, data created
-# and stored within that job will be desired to be accessed again. For
-# example, data created and stored within HDFS will be accessed again.
-#
-# Under a number of scenarios, this may not be desired. For example
-# during testing.
-#
-# To improve usability and performance, setting MAGPIE_ONE_TIME_RUN
-# below to yes will have two effects on the Magpie job.
-#
-# 1) A number of data paths (such as for HDFS) will be put into unique
-#    paths for this job. Therefore, no other job should be able to
-#    access the data again. This is particularly useful if you wish
-#    to run performance tests with this job script over and over
-#    again.
-#
-#    Magpie will not remove data that was written, so be sure to clean up
-#    your directories later.
-#
-# 2) In order to improve job throughput, Magpie will take shortcuts by
-#    not properly tearing down the job. As data corruption should not be
-#    a concern on job teardown, the job can complete more quickly.
-#
-# export MAGPIE_ONE_TIME_RUN=yes
-
-# Convenience Scripts
-#
-# Specify a script to be executed before / after your job. It is run
-# on all nodes.
-#
-# Typically the pre-job script is used to set something up or get
-# debugging info. It can also be used to determine if system
-# conditions meet the expectations of your job. The primary job
-# running script (magpie-run) will not be executed if the
-# MAGPIE_PRE_JOB_RUN exits with a non-zero exit code.
-#
-# The post-job script is typically used for cleaning up something or
-# gathering info (such as logs) for post-debugging/analysis. If it is
-# set, MAGPIE_SHUTDOWN_TIME above must be > 5.
-#
-# See example magpie-example-pre-job-script and
-# magpie-example-post-job-script for ideas of what you can do w/ these
-# scripts.
-#
-# Multiple scripts can be specified separated by comma. Arguments can
-# be passed to scripts as well.
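A minimal pre-job script sketch for MAGPIE_PRE_JOB_RUN as described above; a non-zero exit keeps magpie-run from executing (the mount point checked here is hypothetical):

    #!/bin/sh
    # Hypothetical pre-job sanity check: abort the job if Lustre is not mounted.
    if ! grep -q ' /lustre ' /proc/mounts; then
        echo "lustre not mounted on $(hostname)" >&2
        exit 1
    fi
    exit 0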
-# -# A number of convenient scripts are available in the -# ${MAGPIE_SCRIPTS_HOME}/scripts directory. -# -# export MAGPIE_PRE_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" -# -# Similar to the MAGPIE_PRE_JOB_RUN and MAGPIE_POST_JOB_RUN, scripts can be -# run after the stack is setup but prior to the script or interactive mode -# begins. This enables frontends and other processes that depend on the stack -# to be started up and torn down. In similar fashion the cleanup will be done -# immediately after the script or interactive mode exits before the stack is -# shutdown. -# -# export MAGPIE_PRE_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" - -# Environment Variable Script -# -# When working with Magpie interactively by logging into the master -# node of your job allocation, many environment variables may need to -# be set. For example, environment variables for config file -# directories (e.g. HADOOP_CONF_DIR, HBASE_CONF_DIR, etc.) and home -# directories (e.g. HADOOP_HOME, HBASE_HOME, etc.) and more general -# environment variables (e.g. JAVA_HOME) may need to be set before you -# begin interacting with your big data setup. -# -# The standard job output from Magpie provides instructions on all the -# environment variables typically needed to interact with your job. -# However, this can be tedious if done by hand. -# -# If the environment variable specified below is set, Magpie will -# create the file and put into it every environment variable that -# would be useful when running your job interactively. That way, it -# can be sourced easily if you will be running your job interactively. -# It can also be loaded or used by other job scripts. -# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT="${HOME}/my-job-env" - -# Environment Variable Shell Type -# -# Magpie outputs environment variables in help output and -# MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT based on your SHELL environment -# variable. -# -# If you would like to output in a different shell type (perhaps you -# have programmed scripts in a different shell), specify that shell -# here. -# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT_SHELL="/bin/bash" - -# Remote Shell -# -# Magpie requires a passwordless remote shell command to launch -# necessary daemons across your job allocation. Magpie defaults to -# ssh, but it may be an alternate command in some environments. An -# alternate ssh-equivalent remote command can be specified by setting -# MAGPIE_REMOTE_CMD below. -# -# If using ssh, Magpie requires keys to be setup ahead of time so it -# can be executed without passwords. -# -# Specify options to the remote shell command if necessary. -# -# export MAGPIE_REMOTE_CMD="ssh" -# export MAGPIE_REMOTE_CMD_OPTS="" - -############################################################################ -# General Configuration -############################################################################ - -# Necessary for most projects -export JAVA_HOME="/usr/lib/jvm/jre-1.7.0/" - -############################################################################ -# Hadoop Core Configurations -############################################################################ - -# Should Hadoop be run -# -# Specify yes or no. Defaults to no. 
-#
-export HADOOP_SETUP=yes
-
-# Set Hadoop Setup Type
-#
-# Will inform scripts on how to setup config files and what daemons to
-# launch/setup.
-#
-# MR - Launch HDFS and Yarn
-# YARN - Enable only Yarn
-# HDFS - Enable only HDFS
-#
-# HDFS only may be useful when you want to use HDFS with other big
-# data software, such as Hbase, and do not care for MapReduce or Yarn.
-# It only works with HDFS based HADOOP_FILESYSTEM_MODE, such as
-# "hdfs", "hdfsoverlustre", or "hdfsovernetworkfs".
-#
-# YARN only may be useful when you need Yarn setup for scheduling, but
-# will not be using HDFS. For example, you may be reading from a
-# networked file system directly. This option requires
-# HADOOP_FILESYSTEM_MODE to be set to 'rawnetworkfs'.
-#
-export HADOOP_SETUP_TYPE="MR"
-
-# Version
-#
-# Make sure the version for Mapreduce version 1 or 2 matches whatever
-# you set in HADOOP_SETUP_TYPE
-#
-export HADOOP_VERSION="2.9.1"
-
-# Path to your Hadoop build/binaries
-#
-# Make sure the build for MapReduce or HDFS version 1 or 2 matches
-# whatever you set in HADOOP_SETUP_TYPE.
-#
-# This should be accessible on all nodes in your allocation. Typically
-# this is in an NFS mount.
-#
-export HADOOP_HOME="${HOME}/hadoop-${HADOOP_VERSION}"
-
-# Path to store data local to each cluster node, typically something
-# in /tmp. This will store local conf files and log files for your
-# job. If local scratch space is not available, consider using the
-# MAGPIE_NO_LOCAL_DIR option. See README for more details.
-#
-# This will not be used for storing intermediate files or
-# distributed cache files. See HADOOP_LOCALSTORE below for that.
-#
-export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop"
-
-# Directory where alternate Hadoop configuration templates are stored
-#
-# If you wish to tweak the configuration files used by Magpie, set
-# HADOOP_CONF_FILES below, copy configuration templates from
-# $MAGPIE_SCRIPTS_HOME/conf/hadoop into HADOOP_CONF_FILES, and modify
-# as you desire. Magpie will still use configuration files in
-# $MAGPIE_SCRIPTS_HOME/conf/hadoop if any of the files it needs are
-# not found in HADOOP_CONF_FILES.
-#
-# export HADOOP_CONF_FILES="${HOME}/myconf"
-
-# Daemon Heap Max
-#
-# Heap maximum for Hadoop daemons (i.e. Resource Manager, NodeManager,
-# DataNode, History Server, etc.), specified in megs. Special case
-# for Namenode, see below.
-#
-# If not specified, defaults to Hadoop default of 1000
-#
-# May need to be increased if you are scaling large, get OutOfMemory
-# errors, or perhaps have a lot of cores on a node.
-#
-# export HADOOP_DAEMON_HEAP_MAX=2000
-
-# Daemon Namenode Heap Max
-#
-# Heap maximum for Hadoop Namenode daemons specified in megs.
-#
-# If not specified, defaults to HADOOP_DAEMON_HEAP_MAX above.
-#
-# Unlike most Hadoop daemons, namenode may need more memory if there
-# are a very large number of files in your HDFS setup. A general rule
-# of thumb is a 1G heap for each 100T of data.
-#
-# export HADOOP_NAMENODE_DAEMON_HEAP_MAX=2000
-
-# Environment Extra
-#
-# Specify extra environment information that should be passed into
-# Hadoop. This file will simply be appended into the hadoop-env.sh
-# and (if appropriate) yarn-env.sh.
-#
-# By default, a reasonable estimate for max user processes and open
-# file descriptors will be calculated and put into hadoop-env.sh and
-# (if appropriate) yarn-env.sh. However, it's always possible they may
-# need to be set differently. Everyone's cluster/situation can be
-# slightly different.
-#
-# See the example file example-environment-extra for examples of
-# what you can/should do with adding extra environment settings.
-#
-# export HADOOP_ENVIRONMENT_EXTRA_PATH="${HOME}/hadoop-my-environment"
-
-############################################################################
-# Hadoop Job/Run Configurations
-############################################################################
-
-# Set hadoop job for MAGPIE_JOB_TYPE = hadoop
-#
-# "terasort" - run terasort. Useful for making sure things are setup
-#              the way you like.
-#
-#              There are additional configuration options for this
-#              listed below.
-#
-# "upgradehdfs" - upgrade your version of HDFS. Most notably this is
-#                 used when you are switching to a newer Hadoop
-#                 version and the HDFS version would be inconsistent
-#                 without upgrading. Only works with HDFS versions >=
-#                 2.2.0.
-#
-#                 Please set your job time to be quite large when
-#                 performing this upgrade. If your job times out and
-#                 this process does not complete fully, it can leave
-#                 HDFS in a bad state.
-#
-#                 Beware, once you upgrade it'll be difficult to rollback.
-#
-# "decommissionhdfsnodes" - decrease your HDFS over Lustre or HDFS
-#                           over NetworkFS node size just as if you
-#                           were on a cluster with local disk. Launch
-#                           your job with the current node size and
-#                           set HADOOP_DECOMMISSION_HDFS_NODE_SIZE to
-#                           the smaller node size to decommission
-#                           into. Only works on Hadoop versions >=
-#                           2.3.0.
-#
-#                           Please set your job time to be quite large
-#                           when performing this update. If your job
-#                           times out and this process does not
-#                           complete fully, it can leave HDFS in a bad
-#                           state.
-#
-export HADOOP_JOB="terasort"
-
-# Tasks per Node
-#
-# If not specified, a reasonable estimate will be calculated based on
-# number of CPUs on the system.
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_MAX_TASKS_PER_NODE=8
-
-# Default Map tasks for Job
-#
-# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE * compute
-# nodes.
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_DEFAULT_MAP_TASKS=8
-
-# Default Reduce tasks for Job
-#
-# If not specified, defaults to # compute nodes (i.e. 1 reducer per
-# node)
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_DEFAULT_REDUCE_TASKS=8
-
-# Heap size for JVM
-#
-# Specified in M. If not specified, a reasonable estimate will be
-# calculated based on total memory available and number of CPUs on the
-# system.
-#
-# HADOOP_CHILD_MAP_HEAPSIZE and HADOOP_CHILD_REDUCE_HEAPSIZE are for
-# Yarn
-#
-# If HADOOP_CHILD_MAP_HEAPSIZE is not specified, it is assumed to be
-# HADOOP_CHILD_HEAPSIZE.
-#
-# If HADOOP_CHILD_REDUCE_HEAPSIZE is not specified, it is assumed to
-# be 2X the HADOOP_CHILD_MAP_HEAPSIZE.
-#
-# If running Hbase (or other Big Data software) with Hadoop MapReduce,
-# be aware of the number of tasks and the amount of memory that may be
-# needed by other software.
-#
-# export HADOOP_CHILD_HEAPSIZE=2048
-# export HADOOP_CHILD_MAP_HEAPSIZE=2048
-# export HADOOP_CHILD_REDUCE_HEAPSIZE=4096
-
-# Container Buffer
-#
-# Specify the amount of overhead each Yarn container will have over
-# the heap size. Specified in M. If not specified, a reasonable
-# estimate will be calculated based on total memory available.
-#
-# export HADOOP_CHILD_MAP_CONTAINER_BUFFER=256
-# export HADOOP_CHILD_REDUCE_CONTAINER_BUFFER=512
-
-# Mapreduce Slowstart, indicating percent of maps that should complete
-# before reducers begin.
-#
-# If not specified, defaults to 0.05
-#
-# export HADOOP_MAPREDUCE_SLOWSTART=0.05
-
-# Container Memory
-#
-# Memory on compute nodes for containers. Typically "nice-chunk" less
-# than actual memory on machine, b/c machine needs memory for its own
-# needs (kernel, daemons, etc.). Specified in megs.
-#
-# If not specified, a reasonable estimate will be calculated based on
-# total memory on the system.
-#
-# export YARN_RESOURCE_MEMORY=32768
-
-# Check Memory Limits
-#
-# Should physical and virtual memory limits be enforced for containers.
-# This can be helpful in cases where the OS (Centos/Redhat) is aggressive
-# at allocating virtual memory and causes the vmem-to-pmem ratio to be
-# hit. Defaults to true.
-#
-# export YARN_VMEM_CHECK="false"
-# export YARN_PMEM_CHECK="false"
-
-# Compression
-#
-# Should compression of outputs and intermediate data be enabled.
-# Specify yes or no. Defaults to no.
-#
-# Effectively, is time spent compressing data going to save you time
-# on I/O. Sometimes yes, sometimes no.
-#
-# export HADOOP_COMPRESSION=yes
-
-# IO Sort Factors + MB
-#
-# The number of streams of files to sort while reducing and the memory
-# amount to use while sorting. This is a quite advanced mechanism
-# taking into account many factors. If not specified, some reasonable
-# number will be calculated.
-#
-# export HADOOP_IO_SORT_FACTOR=10
-# export HADOOP_IO_SORT_MB=100
-
-# Parallel Copies
-#
-# The default number of parallel transfers run by reduce during the
-# copy (shuffle) phase. If not specified, some reasonable number will
-# be calculated.
-# export HADOOP_PARALLEL_COPIES=10
-
-############################################################################
-# Hadoop Terasort Configurations
-############################################################################
-
-# Terasort size
-#
-# For "terasort" mode.
-#
-# Specify terasort size in units of 100-byte rows. Specify
-# 10000000000 for a terabyte, for actual benchmarking.
-#
-# Specify something small, for basic sanity tests.
-#
-# Defaults to 50000000.
-#
-# export HADOOP_TERASORT_SIZE=50000000
-
-# Terasort map count
-#
-# For "terasort" mode during the teragen of data.
-#
-# If not specified, will be computed to a reasonable number given
-# HADOOP_TERASORT_SIZE and the block size of the filesystem you are
-# using (e.g. for HDFS the HADOOP_HDFS_BLOCKSIZE)
-#
-# export HADOOP_TERAGEN_MAP_COUNT=4
-
-# Terasort reducer count
-#
-# For "terasort" mode during the actual terasort of data.
-#
-# If not specified, defaults to compute node count * 2.
-#
-# export HADOOP_TERASORT_REDUCER_COUNT=4
-
-# Terasort cache
-#
-# For "real benchmarking" you should flush page cache between a
-# teragen and a terasort. You can disable this for sanity runs/tests
-# to make things go faster. Specify yes or no. Defaults to yes.
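For an actual benchmarking run rather than a sanity test, the terasort knobs above might be combined as follows (values illustrative; 10000000000 rows of 100 bytes is roughly 1TB):

    export MAGPIE_JOB_TYPE="hadoop"
    export HADOOP_JOB="terasort"
    export HADOOP_TERASORT_SIZE=10000000000
    export HADOOP_TERASORT_CLEAR_CACHE=yes        # flush page cache between teragen and terasort
    export HADOOP_TERASORT_RUN_TERAVALIDATE=yes   # validate the sorted output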
-# -# export HADOOP_TERASORT_CLEAR_CACHE=no - -# Terasort output replication count -# -# For "terasort" mode during the actual terasort of data -# -# In some circumstances, replication of the output from the terasort -# must be equal to the replication of data for the input. In other -# cases it can be less. The below can be adjusted to tweak for -# benchmarking purposes. -# -# If not specified, defaults to Terasort default, which is 1 in most -# versions of Hadoop -# -# export HADOOP_TERASORT_OUTPUT_REPLICATION=1 - -# Terachecksum -# -# For "terasort" mode after the teragen of data -# -# After executing the teragen, run terachecksum to calculate a checksum of -# the input. -# -# If both this and HADOOP_TERASORT_RUN_TERAVALIDATE are set, the -# checksums will be compared afterwards for equality. -# -# Defaults to no -# -# export HADOOP_TERASORT_RUN_TERACHECKSUM=no - -# Teravalidate -# -# For "terasort" mode after the actual terasort of data -# -# After executing the sort, run teravalidate to validate the sorted data. -# -# If both this and HADOOP_TERASORT_RUN_TERACHECKSUM are set, the -# checksums will be compared afterwards for equality. -# -# Defaults to no -# -# export HADOOP_TERASORT_RUN_TERAVALIDATE=no - -############################################################################ -# Hadoop Decommission HDFS Nodes Configurations -############################################################################ - -# Specify decommission node size for "decommissionhdfsnodes" mode -# -# For example, if your current HDFS node size is 16, your job size is -# likely 17 nodes (including the master). If you wish to decommission -# to 8 data nodes (job size of 9 nodes total), set this to 8. -# -# export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8 - -############################################################################ -# Hadoop Filesystem Mode Configurations -############################################################################ - -# Set how the filesystem should be setup -# -# "hdfs" - Normal straight up HDFS if you have local disk in your -# cluster. This option is primarily for benchmarking and -# caching, but probably shouldn't be used in the general case. -# -# Be careful running this in a cluster environment. The next -# time you execute your job, if a different set of nodes are -# allocated to you, the HDFS data you wrote from a previous -# job may not be there. Specifying specific nodes to use in -# your job submission (e.g. --nodelist in sbatch) may be a -# way to alleviate this. -# -# User must set HADOOP_HDFS_PATH below. -# -# "hdfsoverlustre" - HDFS over Lustre. See README for description. -# -# User must set HADOOP_HDFSOVERLUSTRE_PATH below. -# -# "hdfsovernetworkfs" - HDFS over Network FS. Identical to HDFS over -# Lustre, but filesystem agnostic. -# -# User must set HADOOP_HDFSOVERNETWORKFS_PATH below. -# -# "rawnetworkfs" - Use Hadoop RawLocalFileSystem (i.e. file: scheme), -# to use networked file system directly. It could be a -# Lustre mount or NFS mount. Whatever you please. -# -# User must set HADOOP_RAWNETWORKFS_PATH below. -# -export HADOOP_FILESYSTEM_MODE="hdfsoverlustre" - -# Local Filesystem BlockSize -# -# This configuration is the blocksize hadoop will use when doing I/O -# to a local filesystem. It is used by HDFS when reading from the -# underlying filesystem. It is also used with -# HADOOP_FILESYSTEM_MODE="rawnetworkfs". -# -# Commonly 33554432, 67108864, 134217728 (i.e. 
32m, 64m, 128m) -# -# If not specified, defaults to 33554432 -# -# export HADOOP_LOCAL_FILESYSTEM_BLOCKSIZE=33554432 - -# HDFS Replication -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# HDFS commonly uses 3. When doing HDFS over Lustre/NetworkFS, higher -# replication can also help with resilience if nodes fail. You may -# wish to set this to < 3 to save space. -# -# If not specified, defaults to 3 -# -# export HADOOP_HDFS_REPLICATION=3 - -# HDFS Block Size -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# Commonly 134217728, 268435456, 536870912 (i.e. 128m, 256m, 512m) -# -# If not specified, defaults to 134217728 -# -# export HADOOP_HDFS_BLOCKSIZE=134217728 - -# Path for HDFS when using local disk -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data and HDFS. The first -# path will also store daemon data, such as namenode or jobtracker -# data. -# -export HADOOP_HDFS_PATH="/ssd/${USER}/hdfs" - -# HDFS cleanup -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# After your job has completed, if HADOOP_HDFS_PATH_CLEAR is set to -# yes, Magpie will do a rm -rf on HADOOP_HDFS_PATH. -# -# This is particularly useful when doing normal HDFS on local storage. -# On your next job run, you may not be able to get the nodes you want -# on your next run. So you may want to clean up your work before the -# next user uses the node. -# -# export HADOOP_HDFS_PATH_CLEAR="yes" - -# Lustre path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/${USER}/hdfsoverlustre/" - -# HDFS over Lustre ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERLUSTRE_REMOVE_LOCKS=yes - -# Networkfs path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. 
-# -export HADOOP_HDFSOVERNETWORKFS_PATH="/networkfs/${USER}/hdfsovernetworkfs/" - -# HDFS over Networkfs ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERNETWORKFS_REMOVE_LOCKS=yes - -# Path for rawnetworkfs -# -# This is used with HADOOP_FILESYSTEM_MODE="rawnetworkfs" -# -export HADOOP_RAWNETWORKFS_PATH="/lustre/${USER}/rawnetworkfs/" - -# If you have a local SSD or NVRAM, performance may be better to store -# intermediate data on it rather than Lustre or some other networked -# filesystem. If the below environment variable is specified, local -# intermediate data will be stored in the specified directory. -# Otherwise it will go to an appropriate directory in Lustre/networked -# FS. -# -# Be wary, local SSDs/NVRAM stores may have less space than HDDs or -# networked file systems. It can be easy to run out of space. -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data. -# -# export HADOOP_LOCALSTORE="/ssd/${USER}/localstore/" - -# HADOOP_LOCALSTORE_CLEAR -# -# After your job has completed, if HADOOP_LOCALSTORE_CLEAR is set to -# yes, Magpie will do a rm -rf on all directories in -# HADOOP_LOCALSTORE. This is particularly useful if the localstore -# directory is on local storage and you want to clean up your work -# before the next user uses the node. -# -# export HADOOP_LOCALSTORE_CLEAR="yes" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=yes - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. 
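Once MAHOUT_SETUP (above) has exported the Mahout environment, an "interactive" or "script" mode job can invoke Mahout directly. A hedged sketch; the algorithm and HDFS paths are illustrative only:

    # With MAHOUT_HOME exported by Magpie, any Mahout driver can be run, e.g.
    ${MAHOUT_HOME}/bin/mahout seqdirectory \
        -i /user/${USER}/input -o /user/${USER}/seq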
-# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. -# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - -############################################################################ -# Run Job -############################################################################ - -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-check-inputs -if [ $? -ne 0 ] -then - exit 1 -fi -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-core -if [ $? -ne 0 ] -then - exit 1 -fi -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-projects -if [ $? -ne 0 ] -then - exit 1 -fi -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-post -if [ $? -ne 0 ] -then - exit 1 -fi -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-pre-run -if [ $? -ne 0 ] -then - exit 1 -fi -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-run -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-cleanup -srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-post-run diff --git a/submission-scripts/script-msub-torque-pdsh/magpie.msub-torque-pdsh b/submission-scripts/script-msub-torque-pdsh/magpie.msub-torque-pdsh index cc43a4d8f..80f3668ed 100644 --- a/submission-scripts/script-msub-torque-pdsh/magpie.msub-torque-pdsh +++ b/submission-scripts/script-msub-torque-pdsh/magpie.msub-torque-pdsh @@ -91,8 +91,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # # "pig" - Run a job according to the settings of PIG_JOB. # -# "mahout" - Run a job according to the settings of MAHOUT_JOB. -# # "spark" - Run a job according to the settings of SPARK_JOB. # # "kafka" - Run a job according to the settings of KAFKA_JOB. @@ -114,7 +112,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # For Hbase, testall will run performanceeval # For Phoenix, testall will run performanceeval # For Pig, testall will run testpig -# For Mahout, testall will run clustersyntheticcontrol # For Spark, testall will run sparkpi # For Kafka, testall will run performance # For Zeppelin, testall will run checkzeppelinup @@ -947,60 +944,6 @@ export PIG_JOB="testpig" # # export PIG_OPTS="-Djava.io.tmpdir=${PIG_LOCAL_JOB_DIR}/tmp" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=no - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. 
This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. -# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - ############################################################################ # Hbase Core Configurations ############################################################################ diff --git a/submission-scripts/script-msub-torque-pdsh/magpie.msub-torque-pdsh-hadoop-and-mahout b/submission-scripts/script-msub-torque-pdsh/magpie.msub-torque-pdsh-hadoop-and-mahout deleted file mode 100644 index baf09f983..000000000 --- a/submission-scripts/script-msub-torque-pdsh/magpie.msub-torque-pdsh-hadoop-and-mahout +++ /dev/null @@ -1,899 +0,0 @@ -#!/bin/sh -############################################################################# -# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC. -# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). -# Written by Albert Chu -# LLNL-CODE-644248 -# -# This file is part of Magpie, scripts for running Hadoop on -# traditional HPC systems. For details, see https://github.com/llnl/magpie. -# -# Magpie is free software; you can redistribute it and/or modify it -# under the terms of the GNU General Public License as published by -# the Free Software Foundation; either version 2 of the License, or -# (at your option) any later version. -# -# Magpie is distributed in the hope that it will be useful, but -# WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -# General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with Magpie. If not, see <http://www.gnu.org/licenses/>. -############################################################################# - -############################################################################ -# Moab Customizations -############################################################################ - -# Node count. Node count should include one node for the -# head/management/master node. For example, if you want 8 compute -# nodes to process data, specify 9 nodes below. -# -# If including Zookeeper, include expected Zookeeper nodes. For -# example, if you want 8 Hadoop compute nodes and 3 Zookeeper nodes, -# specify 12 nodes (1 master, 8 Hadoop, 3 Zookeeper) -# -# Also take into account additional nodes needed for other services. -# -# Many of the below can be configured on the command line. If you are -# more comfortable specifying these on the command line, feel free to -# delete the customizations below. - -#PBS -N - -#PBS -A - -#PBS -l nodes= - -#PBS -o moab-%j.out - -#PBS -l partition= - -#PBS -q - -# Note defaults of MAGPIE_STARTUP_TIME & MAGPIE_SHUTDOWN_TIME, the -# walltime should be a fair amount larger than them combined. - -#PBS -l walltime= - -#PBS -l resfailpolicy=ignore - -# Need to tell Magpie how you are submitting this job -# -# IMPORTANT: This submit file assumes torque is the underlying resource -# manager.
If it is not, a new Magpie submission type should be added -# into Magpie. -export MAGPIE_SUBMISSION_TYPE="msubtorquepdsh" -############################################################################ -# Magpie Configurations -############################################################################ - -# Directory your launching scripts/files are stored -# -# Normally an NFS mount, someplace magpie can be reached on all nodes. -export MAGPIE_SCRIPTS_HOME="${HOME}/magpie" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" - -# Magpie job type -# -# "hadoop" - Run a job according to the settings of HADOOP_JOB. -# -# "mahout" - Run a job according to the settings of MAHOUT_JOB. -# -# "testall" - Run a job that runs all basic sanity tests for all -# software that is configured to be setup. This is a good -# way to sanity check that everything has been setup -# correctly and the way you like. -# -# For Hadoop, testall will run terasort -# For Mahout, testall will run clustersyntheticcontrol -# -# "script" - Run an arbitrary script, as specified by MAGPIE_JOB_SCRIPT. -# You can find example job scripts in examples/. -# -# "interactive" - manually interact with job run to submit jobs, -# peruse data (e.g. HDFS), move data, etc. See job -# output for instructions to access your job -# allocation. -# -# "setuponly" - do not launch any daemons or services, only setup -# configuration files. Useful for debugging or -# development. -# -export MAGPIE_JOB_TYPE="mahout" - -# Specify script and arguments to execute for "script" mode in -# MAGPIE_JOB_TYPE -# -# export MAGPIE_JOB_SCRIPT="${HOME}/my-job-script" - -# Specify script startup / shutdown time window -# -# Specifies the amount of time to give startup / shutdown activities a -# chance to succeed before Magpie will give up (or in the case of -# shutdown, when the resource manager/scheduler may kill the running -# job). Defaults to 30 minutes for startup, 30 minutes for shutdown. -# -# The startup time in particular may need to be increased if you have -# a large amount of data. As an example, HDFS may need to spend a -# significant amount of time determining all of the blocks in HDFS -# before leaving safemode. -# -# The stop time in particular may need to be increased if you have a -# large amount of cleanup to be done. HDFS will save its NameSpace -# before shutting down. Hbase will do a compaction before shutting -# down. -# -# The startup & shutdown window must together be smaller than the -# timelimit specified for the job. -# -# MAGPIE_STARTUP_TIME and MAGPIE_SHUTDOWN_TIME at minimum must be 5 -# minutes. If MAGPIE_POST_JOB_RUN is specified below, -# MAGPIE_SHUTDOWN_TIME must be at minimum 10 minutes. -# -# export MAGPIE_STARTUP_TIME=30 -# export MAGPIE_SHUTDOWN_TIME=30 - -# Magpie One Time Run -# -# Normally, Magpie assumes that when a user runs a job, data created -# and stored within that job will be desired to be accessed again. For -# example, data created and stored within HDFS will be accessed again. -# -# Under a number of scenarios, this may not be desired. For example -# during testing. -# -# To improve usability and performance, setting MAGPIE_ONE_TIME_RUN -# below to yes will have two effects on the Magpie job.
-# -# 1) A number of data paths (such as for HDFS) will be put into unique -# paths for this job. Therefore, no other job should be able to -# access the data again. This is particularly useful if you wish -# to run performance tests with this job script over and over -# again. -# -# Magpie will not remove data that was written, so be sure to clean up -# your directories later. -# -# 2) In order to improve job throughput, Magpie will take shortcuts by -# not properly tearing down the job. As data corruption should not be -# a concern on job teardown, the job can complete more quickly. -# -# export MAGPIE_ONE_TIME_RUN=yes - -# Convenience Scripts -# -# Specify script to be executed before / after your job. It is run -# on all nodes. -# -# Typically the pre-job script is used to set something up or get -# debugging info. It can also be used to determine if system -# conditions meet the expectations of your job. The primary job -# running script (magpie-run) will not be executed if the -# MAGPIE_PRE_JOB_RUN exits with a non-zero exit code. -# -# The post-job script is typically used for cleaning up something or -# gathering info (such as logs) for post-debugging/analysis. If it is -# set, MAGPIE_SHUTDOWN_TIME above must be > 5. -# -# See example magpie-example-pre-job-script and -# magpie-example-post-job-script for ideas of what you can do w/ these -# scripts -# -# Multiple scripts can be specified separated by comma. Arguments can -# be passed to scripts as well. -# -# A number of convenient scripts are available in the -# ${MAGPIE_SCRIPTS_HOME}/scripts directory. -# -# export MAGPIE_PRE_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" -# -# Similar to the MAGPIE_PRE_JOB_RUN and MAGPIE_POST_JOB_RUN, scripts can be -# run after the stack is setup but prior to the script or interactive mode -# begins. This enables frontends and other processes that depend on the stack -# to be started up and torn down. In similar fashion the cleanup will be done -# immediately after the script or interactive mode exits before the stack is -# shutdown. -# -# export MAGPIE_PRE_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" - -# Environment Variable Script -# -# When working with Magpie interactively by logging into the master -# node of your job allocation, many environment variables may need to -# be set. For example, environment variables for config file -# directories (e.g. HADOOP_CONF_DIR, HBASE_CONF_DIR, etc.) and home -# directories (e.g. HADOOP_HOME, HBASE_HOME, etc.) and more general -# environment variables (e.g. JAVA_HOME) may need to be set before you -# begin interacting with your big data setup. -# -# The standard job output from Magpie provides instructions on all the -# environment variables typically needed to interact with your job. -# However, this can be tedious if done by hand. -# -# If the environment variable specified below is set, Magpie will -# create the file and put into it every environment variable that -# would be useful when running your job interactively. That way, it -# can be sourced easily if you will be running your job interactively. -# It can also be loaded or used by other job scripts.
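As a usage sketch (the file name is an assumption): with the variable set in the submission script, the generated file can simply be sourced from a login shell on the master node once the job is up:

    # In the submission script:
    #   export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT="${HOME}/my-job-env"
    # Later, from an interactive shell on the master node:
    . "${HOME}/my-job-env"
    # HADOOP_CONF_DIR, HADOOP_HOME, JAVA_HOME, etc. should now be set:
    ${HADOOP_HOME}/bin/hadoop fs -ls /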
-# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT="${HOME}/my-job-env" - -# Environment Variable Shell Type -# -# Magpie outputs environment variables in help output and -# MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT based on your SHELL environment -# variable. -# -# If you would like to output in a different shell type (perhaps you -# have programmed scripts in a different shell), specify that shell -# here. -# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT_SHELL="/bin/bash" - -# Remote Shell -# -# Magpie requires a passwordless remote shell command to launch -# necessary daemons across your job allocation. Magpie defaults to -# ssh, but it may be an alternate command in some environments. An -# alternate ssh-equivalent remote command can be specified by setting -# MAGPIE_REMOTE_CMD below. -# -# If using ssh, Magpie requires keys to be setup ahead of time so it -# can be executed without passwords. -# -# Specify options to the remote shell command if necessary. -# -# export MAGPIE_REMOTE_CMD="ssh" -# export MAGPIE_REMOTE_CMD_OPTS="" - -############################################################################ -# General Configuration -############################################################################ - -# Necessary for most projects -export JAVA_HOME="/usr/lib/jvm/jre-1.7.0/" - -############################################################################ -# Hadoop Core Configurations -############################################################################ - -# Should Hadoop be run -# -# Specify yes or no. Defaults to no. -# -export HADOOP_SETUP=yes - -# Set Hadoop Setup Type -# -# Will inform scripts on how to setup config files and what daemons to -# launch/setup. -# -# MR - Launch HDFS and Yarn -# YARN - Enable only Yarn -# HDFS - Enable only HDFS -# -# HDFS only may be useful when you want to use HDFS with other big -# data software, such as Hbase, and do not care for MapReduce or Yarn. -# It only works with HDFS based HADOOP_FILESYSTEM_MODE, such as -# "hdfs", "hdfsoverlustre", or "hdfsovernetworkfs". -# -# YARN only may be useful when you need Yarn setup for scheduling, but -# will not be using HDFS. For example, you may be reading from a -# networked file system directly. This option requires -# HADOOP_FILESYSTEM_MODE to be set to 'rawnetworkfs'. -# -export HADOOP_SETUP_TYPE="MR" - -# Version -# -# Make sure the version for Mapreduce version 1 or 2 matches whatever -# you set in HADOOP_SETUP_TYPE -# -export HADOOP_VERSION="2.9.1" - -# Path to your Hadoop build/binaries -# -# Make sure the build for MapReduce or HDFS version 1 or 2 matches -# whatever you set in HADOOP_SETUP_TYPE. -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -export HADOOP_HOME="${HOME}/hadoop-${HADOOP_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -# This will not be used for storing intermediate files or -# distributed cache files. See HADOOP_LOCALSTORE below for that. -# -export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop" - -# Directory where alternate Hadoop configuration templates are stored -# -# If you wish to tweak the configuration files used by Magpie, set -# HADOOP_CONF_FILES below, copy configuration templates from -# $MAGPIE_SCRIPTS_HOME/conf/hadoop into HADOOP_CONF_FILES, and modify -# as you desire.
Magpie will still use configuration files in -# $MAGPIE_SCRIPTS_HOME/conf/hadoop if any of the files it needs are -# not found in HADOOP_CONF_FILES. -# -# export HADOOP_CONF_FILES="${HOME}/myconf" - -# Daemon Heap Max -# -# Heap maximum for Hadoop daemons (i.e. Resource Manager, NodeManager, -# DataNode, History Server, etc.), specified in megs. Special case -# for Namenode, see below. -# -# If not specified, defaults to Hadoop default of 1000 -# -# May need to be increased if you are scaling large, get OutOfMemory -# errors, or perhaps have a lot of cores on a node. -# -# export HADOOP_DAEMON_HEAP_MAX=2000 - -# Daemon Namenode Heap Max -# -# Heap maximum for Hadoop Namenode daemons specified in megs. -# -# If not specified, defaults to HADOOP_DAEMON_HEAP_MAX above. -# -# Unlike most Hadoop daemons, namenode may need more memory if there -# are a very large number of files in your HDFS setup. A general rule -# of thumb is a 1G heap for each 100T of data. -# -# export HADOOP_NAMENODE_DAEMON_HEAP_MAX=2000 - -# Environment Extra -# -# Specify extra environment information that should be passed into -# Hadoop. This file will simply be appended into the hadoop-env.sh -# and (if appropriate) yarn-env.sh. -# -# By default, a reasonable estimate for max user processes and open -# file descriptors will be calculated and put into hadoop-env.sh and -# (if appropriate) yarn-env.sh. However, it's always possible they may -# need to be set differently. Everyone's cluster/situation can be -# slightly different. -# -# See the example example-environment-extra for examples on -# what you can/should do with adding extra environment settings. -# -# export HADOOP_ENVIRONMENT_EXTRA_PATH="${HOME}/hadoop-my-environment" - -############################################################################ -# Hadoop Job/Run Configurations -############################################################################ - -# Set hadoop job for MAGPIE_JOB_TYPE = hadoop -# -# "terasort" - run terasort. Useful for making sure things are setup -# the way you like. -# -# There are additional configuration options for this -# listed below. -# -# "upgradehdfs" - upgrade your version of HDFS. Most notably this is -# used when you are switching to a newer Hadoop -# version and the HDFS version would be inconsistent -# without upgrading. Only works with HDFS versions >= -# 2.2.0. -# -# Please set your job time to be quite large when -# performing this upgrade. If your job times out and -# this process does not complete fully, it can leave -# HDFS in a bad state. -# -# Beware, once you upgrade it'll be difficult to roll back. -# -# "decommissionhdfsnodes" - decrease your HDFS over Lustre or HDFS -# over NetworkFS node size just as if you -# were on a cluster with local disk. Launch -# your job with the current node size and -# set HADOOP_DECOMMISSION_HDFS_NODE_SIZE to -# the smaller node size to decommission into. -# Only works on Hadoop versions >= 2.3.0. -# -# Please set your job time to be quite large -# when performing this update. If your job -# times out and this process does not -# complete fully, it can leave HDFS in a bad -# state. -# -export HADOOP_JOB="terasort" - -# Tasks per Node -# -# If not specified, a reasonable estimate will be calculated based on -# number of CPUs on the system. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software.
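As a worked sizing sketch (the numbers are assumptions, not recommendations): on 16-core compute nodes that also host HBase daemons, you might hold a few cores back from MapReduce:

    # 16 cores per node; leave ~4 for HBase and system daemons.
    export HADOOP_MAX_TASKS_PER_NODE=12
    # Across 8 compute nodes this caps concurrent tasks at 96.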
-# -# export HADOOP_MAX_TASKS_PER_NODE=8 - -# Default Map tasks for Job -# -# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE * compute -# nodes. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_DEFAULT_MAP_TASKS=8 - -# Default Reduce tasks for Job -# -# If not specified, defaults to # compute nodes (i.e. 1 reducer per -# node) -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_DEFAULT_REDUCE_TASKS=8 - -# Heap size for JVM -# -# Specified in M. If not specified, a reasonable estimate will be -# calculated based on total memory available and number of CPUs on the -# system. -# -# HADOOP_CHILD_MAP_HEAPSIZE and HADOOP_CHILD_REDUCE_HEAPSIZE are for -# Yarn -# -# If HADOOP_CHILD_MAP_HEAPSIZE is not specified, it is assumed to be -# HADOOP_CHILD_HEAPSIZE. -# -# If HADOOP_CHILD_REDUCE_HEAPSIZE is not specified, it is assumed to -# be 2X the HADOOP_CHILD_MAP_HEAPSIZE. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_CHILD_HEAPSIZE=2048 -# export HADOOP_CHILD_MAP_HEAPSIZE=2048 -# export HADOOP_CHILD_REDUCE_HEAPSIZE=4096 - -# Container Buffer -# -# Specify the amount of overhead each Yarn container will have over -# the heap size. Specified in M. If not specified, a reasonable -# estimate will be calculated based on total memory available. A -# worked sketch of this arithmetic follows at the end of this section. -# -# export HADOOP_CHILD_MAP_CONTAINER_BUFFER=256 -# export HADOOP_CHILD_REDUCE_CONTAINER_BUFFER=512 - -# Mapreduce Slowstart, indicating percent of maps that should complete -# before reducers begin. -# -# If not specified, defaults to 0.05 -# -# export HADOOP_MAPREDUCE_SLOWSTART=0.05 - -# Container Memory -# -# Memory on compute nodes for containers. Typically "nice-chunk" less -# than actual memory on machine, b/c the machine needs memory for its -# own purposes (kernel, daemons, etc.). Specified in megs. -# -# If not specified, a reasonable estimate will be calculated based on -# total memory on the system. -# -# export YARN_RESOURCE_MEMORY=32768 - -# Check Memory Limits -# -# Should physical and virtual memory limits be enforced for containers. -# This can be helpful in cases where the OS (Centos/Redhat) is aggressive -# at allocating virtual memory and causes the vmem-to-pmem ratio to be -# hit. Defaults to true -# -# export YARN_VMEM_CHECK="false" -# export YARN_PMEM_CHECK="false" - -# Compression -# -# Should compression of outputs and intermediate data be enabled. -# Specify yes or no. Defaults to no. -# -# Effectively, is time spent compressing data going to save you time -# on I/O. Sometimes yes, sometimes no. -# -# export HADOOP_COMPRESSION=yes - -# IO Sort Factors + MB -# -# The number of streams of files to sort while reducing and the memory -# amount to use while sorting. This is a quite advanced mechanism -# taking into account many factors. If not specified, some reasonable -# number will be calculated. -# -# export HADOOP_IO_SORT_FACTOR=10 -# export HADOOP_IO_SORT_MB=100 - -# Parallel Copies -# -# The default number of parallel transfers run by reduce during the -# copy (shuffle) phase. If not specified, some reasonable number will -# be calculated.
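To make the memory arithmetic concrete, here is a hedged sketch (all values illustrative) of how the heap, container buffer, and Yarn resource memory settings above relate:

    # Each Yarn container is roughly heap + buffer:
    #   map:    2048M heap + 256M buffer = 2304M per container
    #   reduce: 4096M heap + 512M buffer = 4608M per container
    export HADOOP_CHILD_MAP_HEAPSIZE=2048
    export HADOOP_CHILD_MAP_CONTAINER_BUFFER=256
    export HADOOP_CHILD_REDUCE_HEAPSIZE=4096
    export HADOOP_CHILD_REDUCE_CONTAINER_BUFFER=512
    # Offering 32768M to Yarn then allows about 14 map containers
    # (32768 / 2304) per node.
    export YARN_RESOURCE_MEMORY=32768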
-# export HADOOP_PARALLEL_COPIES=10 - -############################################################################ -# Hadoop Terasort Configurations -############################################################################ - -# Terasort size -# -# For "terasort" mode. -# -# Specify terasort size in units of 100-byte rows. Specify -# 10000000000 for a terabyte, for actual benchmarking -# -# Specify something small, for basic sanity tests. -# -# Defaults to 50000000. -# -# export HADOOP_TERASORT_SIZE=50000000 - -# Terasort map count -# -# For "terasort" mode during the teragen of data. -# -# If not specified, will be computed to a reasonable number given -# HADOOP_TERASORT_SIZE and the block size of the filesystem you are -# using (e.g. for HDFS the HADOOP_HDFS_BLOCKSIZE) -# -# export HADOOP_TERAGEN_MAP_COUNT=4 - -# Terasort reducer count -# -# For "terasort" mode during the actual terasort of data. -# -# If not specified, will be compute node count * 2. -# -# export HADOOP_TERASORT_REDUCER_COUNT=4 - -# Terasort cache -# -# For "real benchmarking" you should flush page cache between a -# teragen and a terasort. You can disable this for sanity runs/tests -# to make things go faster. Specify yes or no. Defaults to yes. -# -# export HADOOP_TERASORT_CLEAR_CACHE=no - -# Terasort output replication count -# -# For "terasort" mode during the actual terasort of data -# -# In some circumstances, replication of the output from the terasort -# must be equal to the replication of data for the input. In other -# cases it can be less. The below can be adjusted to tweak for -# benchmarking purposes. -# -# If not specified, defaults to Terasort default, which is 1 in most -# versions of Hadoop -# -# export HADOOP_TERASORT_OUTPUT_REPLICATION=1 - -# Terachecksum -# -# For "terasort" mode after the teragen of data -# -# After executing the teragen, run terachecksum to calculate a checksum of -# the input. -# -# If both this and HADOOP_TERASORT_RUN_TERAVALIDATE are set, the -# checksums will be compared afterwards for equality. -# -# Defaults to no -# -# export HADOOP_TERASORT_RUN_TERACHECKSUM=no - -# Teravalidate -# -# For "terasort" mode after the actual terasort of data -# -# After executing the sort, run teravalidate to validate the sorted data. -# -# If both this and HADOOP_TERASORT_RUN_TERACHECKSUM are set, the -# checksums will be compared afterwards for equality. -# -# Defaults to no -# -# export HADOOP_TERASORT_RUN_TERAVALIDATE=no - -############################################################################ -# Hadoop Decommission HDFS Nodes Configurations -############################################################################ - -# Specify decommission node size for "decommissionhdfsnodes" mode -# -# For example, if your current HDFS node size is 16, your job size is -# likely 17 nodes (including the master). If you wish to decommission -# to 8 data nodes (job size of 9 nodes total), set this to 8. -# -# export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8 - -############################################################################ -# Hadoop Filesystem Mode Configurations -############################################################################ - -# Set how the filesystem should be setup -# -# "hdfs" - Normal straight up HDFS if you have local disk in your -# cluster. This option is primarily for benchmarking and -# caching, but probably shouldn't be used in the general case. -# -# Be careful running this in a cluster environment.
The next -# time you execute your job, if a different set of nodes are -# allocated to you, the HDFS data you wrote from a previous -# job may not be there. Specifying specific nodes to use in -# your job submission (e.g. --nodelist in sbatch) may be a -# way to alleviate this. -# -# User must set HADOOP_HDFS_PATH below. -# -# "hdfsoverlustre" - HDFS over Lustre. See README for description. -# -# User must set HADOOP_HDFSOVERLUSTRE_PATH below. -# -# "hdfsovernetworkfs" - HDFS over Network FS. Identical to HDFS over -# Lustre, but filesystem agnostic. -# -# User must set HADOOP_HDFSOVERNETWORKFS_PATH below. -# -# "rawnetworkfs" - Use Hadoop RawLocalFileSystem (i.e. file: scheme), -# to use networked file system directly. It could be a -# Lustre mount or NFS mount. Whatever you please. -# -# User must set HADOOP_RAWNETWORKFS_PATH below. -# -export HADOOP_FILESYSTEM_MODE="hdfsoverlustre" - -# Local Filesystem BlockSize -# -# This configuration is the blocksize hadoop will use when doing I/O -# to a local filesystem. It is used by HDFS when reading from the -# underlying filesystem. It is also used with -# HADOOP_FILESYSTEM_MODE="rawnetworkfs". -# -# Commonly 33554432, 67108864, 134217728 (i.e. 32m, 64m, 128m) -# -# If not specified, defaults to 33554432 -# -# export HADOOP_LOCAL_FILESYSTEM_BLOCKSIZE=33554432 - -# HDFS Replication -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# HDFS commonly uses 3. When doing HDFS over Lustre/NetworkFS, higher -# replication can also help with resilience if nodes fail. You may -# wish to set this to < 3 to save space. -# -# If not specified, defaults to 3 -# -# export HADOOP_HDFS_REPLICATION=3 - -# HDFS Block Size -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# Commonly 134217728, 268435456, 536870912 (i.e. 128m, 256m, 512m) -# -# If not specified, defaults to 134217728 -# -# export HADOOP_HDFS_BLOCKSIZE=134217728 - -# Path for HDFS when using local disk -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data and HDFS. The first -# path will also store daemon data, such as namenode or jobtracker -# data. -# -export HADOOP_HDFS_PATH="/ssd/${USER}/hdfs" - -# HDFS cleanup -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# After your job has completed, if HADOOP_HDFS_PATH_CLEAR is set to -# yes, Magpie will do a rm -rf on HADOOP_HDFS_PATH. -# -# This is particularly useful when doing normal HDFS on local storage. -# On your next job run, you may not be able to get the nodes you want -# on your next run. So you may want to clean up your work before the -# next user uses the node. -# -# export HADOOP_HDFS_PATH_CLEAR="yes" - -# Lustre path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. 
-# -export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/${USER}/hdfsoverlustre/" - -# HDFS over Lustre ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERLUSTRE_REMOVE_LOCKS=yes - -# Networkfs path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERNETWORKFS_PATH="/networkfs/${USER}/hdfsovernetworkfs/" - -# HDFS over Networkfs ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERNETWORKFS_REMOVE_LOCKS=yes - -# Path for rawnetworkfs -# -# This is used with HADOOP_FILESYSTEM_MODE="rawnetworkfs" -# -export HADOOP_RAWNETWORKFS_PATH="/lustre/${USER}/rawnetworkfs/" - -# If you have a local SSD or NVRAM, performance may be better to store -# intermediate data on it rather than Lustre or some other networked -# filesystem. If the below environment variable is specified, local -# intermediate data will be stored in the specified directory. -# Otherwise it will go to an appropriate directory in Lustre/networked -# FS. -# -# Be wary, local SSDs/NVRAM stores may have less space than HDDs or -# networked file systems. It can be easy to run out of space. -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data. -# -# export HADOOP_LOCALSTORE="/ssd/${USER}/localstore/" - -# HADOOP_LOCALSTORE_CLEAR -# -# After your job has completed, if HADOOP_LOCALSTORE_CLEAR is set to -# yes, Magpie will do a rm -rf on all directories in -# HADOOP_LOCALSTORE. This is particularly useful if the localstore -# directory is on local storage and you want to clean up your work -# before the next user uses the node. 
-# -# export HADOOP_LOCALSTORE_CLEAR="yes" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=yes - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. -# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - -############################################################################ -# Run Job -############################################################################ - -ENV=$(env | grep -E '^MAGPIE|^HADOOP|^PIG|^ZOOKEEPER|^KAFKA|^ZEPPELIN|^PHOENIX|^HBASE|^SPARK|^STORM|^JAVA|^LD_LIBRARY_PATH|^MOAB|^PATH|^PBS|RAMDISK'\ - | sed 's/^/export /;s/=/="/;s/$/"/') - -pdsh "$ENV; - $MAGPIE_SCRIPTS_HOME/magpie-check-inputs && - $MAGPIE_SCRIPTS_HOME/magpie-setup-core && - $MAGPIE_SCRIPTS_HOME/magpie-setup-projects && - $MAGPIE_SCRIPTS_HOME/magpie-setup-post && - $MAGPIE_SCRIPTS_HOME/magpie-pre-run && - $MAGPIE_SCRIPTS_HOME/magpie-run && - $MAGPIE_SCRIPTS_HOME/magpie-cleanup && - $MAGPIE_SCRIPTS_HOME/magpie-post-run - " diff --git a/submission-scripts/script-sbatch-mpirun/magpie.sbatch-mpirun b/submission-scripts/script-sbatch-mpirun/magpie.sbatch-mpirun index 4df58150d..8fed2ddff 100644 --- a/submission-scripts/script-sbatch-mpirun/magpie.sbatch-mpirun +++ b/submission-scripts/script-sbatch-mpirun/magpie.sbatch-mpirun @@ -90,8 +90,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # # "pig" - Run a job according to the settings of PIG_JOB. # -# "mahout" - Run a job according to the settings of MAHOUT_JOB. -# # "spark" - Run a job according to the settings of SPARK_JOB. # # "kafka" - Run a job according to the settings of KAFKA_JOB. 
@@ -113,7 +111,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # For Hbase, testall will run performanceeval # For Phoenix, testall will run performanceeval # For Pig, testall will run testpig -# For Mahout, testall will run clustersyntheticcontrol # For Spark, testall will run sparkpi # For Kafka, testall will run performance # For Zeppelin, testall will run checkzeppelinup @@ -946,60 +943,6 @@ export PIG_JOB="testpig" # # export PIG_OPTS="-Djava.io.tmpdir=${PIG_LOCAL_JOB_DIR}/tmp" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=no - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. -# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - ############################################################################ # Hbase Core Configurations ############################################################################ diff --git a/submission-scripts/script-sbatch-mpirun/magpie.sbatch-mpirun-hadoop-and-mahout b/submission-scripts/script-sbatch-mpirun/magpie.sbatch-mpirun-hadoop-and-mahout deleted file mode 100644 index 41cb33dd6..000000000 --- a/submission-scripts/script-sbatch-mpirun/magpie.sbatch-mpirun-hadoop-and-mahout +++ /dev/null @@ -1,916 +0,0 @@ -#!/bin/sh -############################################################################# -# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC. -# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). -# Written by Albert Chu -# LLNL-CODE-644248 -# -# This file is part of Magpie, scripts for running Hadoop on -# traditional HPC systems. For details, see https://github.com/llnl/magpie. -# -# Magpie is free software; you can redistribute it and/or modify it -# under the terms of the GNU General Public License as published by -# the Free Software Foundation; either version 2 of the License, or -# (at your option) any later version. 
-# -# Magpie is distributed in the hope that it will be useful, but -# WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -# General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with Magpie. If not, see <http://www.gnu.org/licenses/>. -############################################################################# - -############################################################################ -# SLURM Customizations -############################################################################ - -# Node count. Node count should include one node for the -# head/management/master node. For example, if you want 8 compute -# nodes to process data, specify 9 nodes below. -# -# If including Zookeeper, include expected Zookeeper nodes. For -# example, if you want 8 Hadoop compute nodes and 3 Zookeeper nodes, -# specify 12 nodes (1 master, 8 Hadoop, 3 Zookeeper) -# -# Also take into account additional nodes needed for other services. -# -# Many of the below can be configured on the command line. If you are -# more comfortable specifying these on the command line, feel free to -# delete the customizations below. - -#SBATCH --nodes= -#SBATCH --output="slurm-%j.out" - -# Note defaults of MAGPIE_STARTUP_TIME & MAGPIE_SHUTDOWN_TIME, this -# timelimit should be a fair amount larger than them combined. -#SBATCH --time= - -# Job name. This will be used in naming directories for the job. -#SBATCH --job-name= - -# Partition to launch job in -#SBATCH --partition= - -## SLURM Values -# Generally speaking, don't touch the following, misc other configuration - -#SBATCH --ntasks-per-node=1 -#SBATCH --exclusive -#SBATCH --no-kill - -# Need to tell Magpie how you are submitting this job -export MAGPIE_SUBMISSION_TYPE="sbatchmpirun" - - -############################################################################ -# Magpie Configurations -############################################################################ - -# Directory your launching scripts/files are stored -# -# Normally an NFS mount, someplace magpie can be reached on all nodes. -export MAGPIE_SCRIPTS_HOME="${HOME}/magpie" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" - -# Magpie job type -# -# "hadoop" - Run a job according to the settings of HADOOP_JOB. -# -# "mahout" - Run a job according to the settings of MAHOUT_JOB. -# -# "testall" - Run a job that runs all basic sanity tests for all -# software that is configured to be setup. This is a good -# way to sanity check that everything has been setup -# correctly and the way you like. -# -# For Hadoop, testall will run terasort -# For Mahout, testall will run clustersyntheticcontrol -# -# "script" - Run an arbitrary script, as specified by MAGPIE_JOB_SCRIPT. -# You can find example job scripts in examples/. A minimal -# sketch also follows this list. -# -# "interactive" - manually interact with job run to submit jobs, -# peruse data (e.g. HDFS), move data, etc. See job -# output for instructions to access your job -# allocation. -# -# "setuponly" - do not launch any daemons or services, only setup -# configuration files. Useful for debugging or -# development.
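For "script" mode, MAGPIE_JOB_SCRIPT points at an executable that Magpie runs on the master node once the stack is up. A minimal sketch (the path and the example job are assumptions):

    #!/bin/sh
    # ${HOME}/my-job-script - run by Magpie after all daemons start;
    # HADOOP_HOME, HADOOP_CONF_DIR, etc. are already exported here.
    ${HADOOP_HOME}/bin/hadoop jar \
        ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        pi 8 1000

Set MAGPIE_JOB_TYPE="script" and point MAGPIE_JOB_SCRIPT at the file to use it.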
-# -export MAGPIE_JOB_TYPE="mahout" - -# Specify script and arguments to execute for "script" mode in -# MAGPIE_JOB_TYPE -# -# export MAGPIE_JOB_SCRIPT="${HOME}/my-job-script" - -# Specify script startup / shutdown time window -# -# Specifies the amount of time to give startup / shutdown activities a -# chance to succeed before Magpie will give up (or in the case of -# shutdown, when the resource manager/scheduler may kill the running -# job). Defaults to 30 minutes for startup, 30 minutes for shutdown. -# -# The startup time in particular may need to be increased if you have -# a large amount of data. As an example, HDFS may need to spend a -# significant amount of time determining all of the blocks in HDFS -# before leaving safemode. -# -# The stop time in particular may need to be increased if you have a -# large amount of cleanup to be done. HDFS will save its NameSpace -# before shutting down. Hbase will do a compaction before shutting -# down. -# -# The startup & shutdown window must together be smaller than the -# timelimit specified for the job. -# -# MAGPIE_STARTUP_TIME and MAGPIE_SHUTDOWN_TIME at minimum must be 5 -# minutes. If MAGPIE_POST_JOB_RUN is specified below, -# MAGPIE_SHUTDOWN_TIME must be at minimum 10 minutes. -# -# export MAGPIE_STARTUP_TIME=30 -# export MAGPIE_SHUTDOWN_TIME=30 - -# Magpie One Time Run -# -# Normally, Magpie assumes that when a user runs a job, data created -# and stored within that job will be desired to be accessed again. For -# example, data created and stored within HDFS will be accessed again. -# -# Under a number of scenarios, this may not be desired. For example -# during testing. -# -# To improve usability and performance, setting MAGPIE_ONE_TIME_RUN -# below to yes will have two effects on the Magpie job. -# -# 1) A number of data paths (such as for HDFS) will be put into unique -# paths for this job. Therefore, no other job should be able to -# access the data again. This is particularly useful if you wish -# to run performance tests with this job script over and over -# again. -# -# Magpie will not remove data that was written, so be sure to clean up -# your directories later. -# -# 2) In order to improve job throughput, Magpie will take shortcuts by -# not properly tearing down the job. As data corruption should not be -# a concern on job teardown, the job can complete more quickly. -# -# export MAGPIE_ONE_TIME_RUN=yes - -# Convenience Scripts -# -# Specify script to be executed before / after your job. It is run -# on all nodes. -# -# Typically the pre-job script is used to set something up or get -# debugging info. It can also be used to determine if system -# conditions meet the expectations of your job. The primary job -# running script (magpie-run) will not be executed if the -# MAGPIE_PRE_JOB_RUN exits with a non-zero exit code. -# -# The post-job script is typically used for cleaning up something or -# gathering info (such as logs) for post-debugging/analysis. If it is -# set, MAGPIE_SHUTDOWN_TIME above must be > 5. -# -# See example magpie-example-pre-job-script and -# magpie-example-post-job-script for ideas of what you can do w/ these -# scripts -# -# Multiple scripts can be specified separated by comma. Arguments can -# be passed to scripts as well. -# -# A number of convenient scripts are available in the -# ${MAGPIE_SCRIPTS_HOME}/scripts directory.
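Because magpie-run is skipped whenever MAGPIE_PRE_JOB_RUN exits non-zero, a pre-job script can act as a gate on node health. A hedged sketch (the check and the threshold are assumptions):

    #!/bin/sh
    # my-pre-job-script - runs on all nodes before the job proper;
    # abort the run if /tmp has less than ~1G free.
    avail=$(df -Pk /tmp | awk 'NR==2 {print $4}')
    if [ "$avail" -lt 1048576 ]; then
        echo "insufficient /tmp space on $(hostname)" >&2
        exit 1   # non-zero exit prevents magpie-run from executing
    fi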
-# -# export MAGPIE_PRE_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" -# -# Similar to the MAGPIE_PRE_JOB_RUN and MAGPIE_POST_JOB_RUN, scripts can be -# run after the stack is setup but prior to the script or interactive mode -# begins. This enables frontends and other processes that depend on the stack -# to be started up and torn down. In similar fashion the cleanup will be done -# immediately after the script or interactive mode exits before the stack is -# shutdown. -# -# export MAGPIE_PRE_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script" -# export MAGPIE_POST_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script" - -# Environment Variable Script -# -# When working with Magpie interactively by logging into the master -# node of your job allocation, many environment variables may need to -# be set. For example, environment variables for config file -# directories (e.g. HADOOP_CONF_DIR, HBASE_CONF_DIR, etc.) and home -# directories (e.g. HADOOP_HOME, HBASE_HOME, etc.) and more general -# environment variables (e.g. JAVA_HOME) may need to be set before you -# begin interacting with your big data setup. -# -# The standard job output from Magpie provides instructions on all the -# environment variables typically needed to interact with your job. -# However, this can be tedious if done by hand. -# -# If the environment variable specified below is set, Magpie will -# create the file and put into it every environment variable that -# would be useful when running your job interactively. That way, it -# can be sourced easily if you will be running your job interactively. -# It can also be loaded or used by other job scripts. -# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT="${HOME}/my-job-env" - -# Environment Variable Shell Type -# -# Magpie outputs environment variables in help output and -# MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT based on your SHELL environment -# variable. -# -# If you would like to output in a different shell type (perhaps you -# have programmed scripts in a different shell), specify that shell -# here. -# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT_SHELL="/bin/bash" - -# Remote Shell -# -# Magpie requires a passwordless remote shell command to launch -# necessary daemons across your job allocation. Magpie defaults to -# ssh, but it may be an alternate command in some environments. An -# alternate ssh-equivalent remote command can be specified by setting -# MAGPIE_REMOTE_CMD below. -# -# If using ssh, Magpie requires keys to be setup ahead of time so it -# can be executed without passwords. -# -# Specify options to the remote shell command if necessary. -# -# export MAGPIE_REMOTE_CMD="ssh" -# export MAGPIE_REMOTE_CMD_OPTS="" - -############################################################################ -# General Configuration -############################################################################ - -# Necessary for most projects -export JAVA_HOME="/usr/lib/jvm/jre-1.7.0/" - -############################################################################ -# Hadoop Core Configurations -############################################################################ - -# Should Hadoop be run -# -# Specify yes or no. Defaults to no. -# -export HADOOP_SETUP=yes - -# Set Hadoop Setup Type -# -# Will inform scripts on how to setup config files and what daemons to -# launch/setup. 
-# -# MR - Launch HDFS and Yarn -# YARN - Enable only Yarn -# HDFS - Enable only HDFS -# -# HDFS only may be useful when you want to use HDFS with other big -# data software, such as Hbase, and do not care for MapReduce or Yarn. -# It only works with HDFS based HADOOP_FILESYSTEM_MODE, such as -# "hdfs", "hdfsoverlustre", or "hdfsovernetworkfs". -# -# YARN only may be useful when you need Yarn setup for scheduling, but -# will not be using HDFS. For example, you may be reading from a -# networked file system directly. This option requires -# HADOOP_FILESYSTEM_MODE to be set to 'rawnetworkfs'. -# -export HADOOP_SETUP_TYPE="MR" - -# Version -# -# Make sure the version for Mapreduce version 1 or 2 matches whatever -# you set in HADOOP_SETUP_TYPE -# -export HADOOP_VERSION="2.9.1" - -# Path to your Hadoop build/binaries -# -# Make sure the build for MapReduce or HDFS version 1 or 2 matches -# whatever you set in HADOOP_SETUP_TYPE. -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -export HADOOP_HOME="${HOME}/hadoop-${HADOOP_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -# This will not be used for storing intermediate files or -# distributed cache files. See HADOOP_LOCALSTORE below for that. -# -export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop" - -# Directory where alternate Hadoop configuration templates are stored -# -# If you wish to tweak the configuration files used by Magpie, set -# HADOOP_CONF_FILES below, copy configuration templates from -# $MAGPIE_SCRIPTS_HOME/conf/hadoop into HADOOP_CONF_FILES, and modify -# as you desire. Magpie will still use configuration files in -# $MAGPIE_SCRIPTS_HOME/conf/hadoop if any of the files it needs are -# not found in HADOOP_CONF_FILES. -# -# export HADOOP_CONF_FILES="${HOME}/myconf" - -# Daemon Heap Max -# -# Heap maximum for Hadoop daemons (i.e. Resource Manager, NodeManager, -# DataNode, History Server, etc.), specified in megs. Special case -# for Namenode, see below. -# -# If not specified, defaults to Hadoop default of 1000 -# -# May need to be increased if you are scaling large, get OutOfMemory -# errors, or perhaps have a lot of cores on a node. -# -# export HADOOP_DAEMON_HEAP_MAX=2000 - -# Daemon Namenode Heap Max -# -# Heap maximum for Hadoop Namenode daemons specified in megs. -# -# If not specified, defaults to HADOOP_DAEMON_HEAP_MAX above. -# -# Unlike most Hadoop daemons, namenode may need more memory if there -# are a very large number of files in your HDFS setup. A general rule -# of thumb is a 1G heap for each 100T of data. -# -# export HADOOP_NAMENODE_DAEMON_HEAP_MAX=2000 - -# Environment Extra -# -# Specify extra environment information that should be passed into -# Hadoop. This file will simply be appended into the hadoop-env.sh -# and (if appropriate) yarn-env.sh. -# -# By default, a reasonable estimate for max user processes and open -# file descriptors will be calculated and put into hadoop-env.sh and -# (if appropriate) yarn-env.sh. However, it's always possible they may -# need to be set differently. Everyone's cluster/situation can be -# slightly different. -# -# See the example example-environment-extra for examples on -# what you can/should do with adding extra environment settings.
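As a sketch of what such a file might contain (the specific limits are assumptions): since the file is appended verbatim to hadoop-env.sh and yarn-env.sh, it holds ordinary shell:

    # hadoop-my-environment - appended to hadoop-env.sh / yarn-env.sh.
    # Raise per-process limits beyond the calculated defaults.
    ulimit -n 16384   # open file descriptors
    ulimit -u 8192    # max user processes
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"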
-# -# export HADOOP_ENVIRONMENT_EXTRA_PATH="${HOME}/hadoop-my-environment" - -############################################################################ -# Hadoop Job/Run Configurations -############################################################################ - -# Set hadoop job for MAGPIE_JOB_TYPE = hadoop -# -# "terasort" - run terasort. Useful for making sure things are setup -# the way you like. -# -# There are additional configuration options for this -# listed below. -# -# "upgradehdfs" - upgrade your version of HDFS. Most notably this is -# used when you are switching to a newer Hadoop -# version and the HDFS version would be inconsistent -# without upgrading. Only works with HDFS versions >= -# 2.2.0. -# -# Please set your job time to be quite large when -# performing this upgrade. If your job times out and -# this process does not complete fully, it can leave -# HDFS in a bad state. -# -# Beware, once you upgrade it'll be difficult to roll back. -# -# "decommissionhdfsnodes" - decrease your HDFS over Lustre or HDFS -# over NetworkFS node size just as if you -# were on a cluster with local disk. Launch -# your job with the current node size and -# set HADOOP_DECOMMISSION_HDFS_NODE_SIZE to -# the smaller node size to decommission into. -# Only works on Hadoop versions >= 2.3.0. -# -# Please set your job time to be quite large -# when performing this update. If your job -# times out and this process does not -# complete fully, it can leave HDFS in a bad -# state. -# -export HADOOP_JOB="terasort" - -# Tasks per Node -# -# If not specified, a reasonable estimate will be calculated based on -# number of CPUs on the system. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_MAX_TASKS_PER_NODE=8 - -# Default Map tasks for Job -# -# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE * compute -# nodes. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_DEFAULT_MAP_TASKS=8 - -# Default Reduce tasks for Job -# -# If not specified, defaults to # compute nodes (i.e. 1 reducer per -# node) -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_DEFAULT_REDUCE_TASKS=8 - -# Heap size for JVM -# -# Specified in M. If not specified, a reasonable estimate will be -# calculated based on total memory available and number of CPUs on the -# system. -# -# HADOOP_CHILD_MAP_HEAPSIZE and HADOOP_CHILD_REDUCE_HEAPSIZE are for -# Yarn -# -# If HADOOP_CHILD_MAP_HEAPSIZE is not specified, it is assumed to be -# HADOOP_CHILD_HEAPSIZE. -# -# If HADOOP_CHILD_REDUCE_HEAPSIZE is not specified, it is assumed to -# be 2X the HADOOP_CHILD_MAP_HEAPSIZE. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_CHILD_HEAPSIZE=2048 -# export HADOOP_CHILD_MAP_HEAPSIZE=2048 -# export HADOOP_CHILD_REDUCE_HEAPSIZE=4096 - -# Container Buffer -# -# Specify the amount of overhead each Yarn container will have over -# the heap size. Specified in M. If not specified, a reasonable -# estimate will be calculated based on total memory available.
-# -# export HADOOP_CHILD_MAP_CONTAINER_BUFFER=256 -# export HADOOP_CHILD_REDUCE_CONTAINER_BUFFER=512 - -# Mapreduce Slowstart, indicating percent of maps that should complete -# before reducers begin. -# -# If not specified, defaults to 0.05 -# -# export HADOOP_MAPREDUCE_SLOWSTART=0.05 - -# Container Memory -# -# Memory on compute nodes for containers. Typically "nice-chunk" less -# than actual memory on machine, b/c machine needs memory for its own -# needs (kernel, daemons, etc.). Specified in megs. -# -# If not specified, a reasonable estimate will be calculated based on -# total memory on the system. -# -# export YARN_RESOURCE_MEMORY=32768 - -# Check Memory Limits -# -# Should physical and virtual memory limits be enforced for containers. -# This can be helpful in cases where the OS (Centos/Redhat) is aggressive -# at allocating virtual memory and causes the vmem-to-pmem ratio to be -# hit. Defaults to true -# -# export YARN_VMEM_CHECK="false" -# export YARN_PMEM_CHECK="false" - -# Compression -# -# Should compression of outputs and intermediate data be enabled. -# Specify yes or no. Defaults to no. -# -# Effectively, is time spend compressing data going to save you time -# on I/O. Sometimes yes, sometimes no. -# -# export HADOOP_COMPRESSION=yes - -# IO Sort Factors + MB -# -# The number of streams of files to sort while reducing and the memory -# amount to use while sorting. This is a quite advanced mechanism -# taking into account many factors. If not specified, some reasonable -# number will be calculated. -# -# export HADOOP_IO_SORT_FACTOR=10 -# export HADOOP_IO_SORT_MB=100 - -# Parallel Copies -# -# The default number of parallel transfers run by reduce during the -# copy(shuffle) phase. If not specified, some reasonable number will -# be calculated. -# export HADOOP_PARALLEL_COPIES=10 - -############################################################################ -# Hadoop Terasort Configurations -############################################################################ - -# Terasort size -# -# For "terasort" mode. -# -# Specify terasort size in units of 100. Specify 10000000000 for -# terabyte, for actual benchmarking -# -# Specify something small, for basic sanity tests. -# -# Defaults to 50000000. -# -# export HADOOP_TERASORT_SIZE=50000000 - -# Terasort map count -# -# For "terasort" mode during the teragen of data. -# -# If not specified, will be computed to a reasonable number given -# HADOOP_TERASORT_SIZE and the block size of the the filesyste you are -# using (e.g. for HDFS the HADOOP_HDFS_BLOCKSIZE) -# -# export HADOOP_TERAGEN_MAP_COUNT=4 - -# Terasort reducer count -# -# For "terasort" mode during the actual terasort of data. -# -# If not specified, will be compute node count * 2. -# -# export HADOOP_TERASORT_REDUCER_COUNT=4 - -# Terasort cache -# -# For "real benchmarking" you should flush page cache between a -# teragen and a terasort. You can disable this for sanity runs/tests -# to make things go faster. Specify yes or no. Defaults to yes. -# -# export HADOOP_TERASORT_CLEAR_CACHE=no - -# Terasort output replication count -# -# For "terasort" mode during the actual terasort of data -# -# In some circumstances, replication of the output from the terasort -# must be equal to the replication of data for the input. In other -# cases it can be less. The below can be adjusted to tweak for -# benchmarking purposes. 
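Since terasort sizes count 100-byte rows, the sizing arithmetic quoted above works out as follows:

    # 10000000000 rows x 100 bytes/row = 1 TB of teragen input, while the
    # 50000000-row default is ~5 GB, suitable for sanity tests
    export HADOOP_TERASORT_SIZE=10000000000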
-# -# If not specified, defaults to Terasort default, which is 1 in most -# versions of Hadoop -# -# export HADOOP_TERASORT_OUTPUT_REPLICATION=1 - -# Terachecksum -# -# For "terasort" mode after the teragen of data -# -# After executing the teragen, run terachecksum to calculate a checksum of -# the input. -# -# If both this and HADOOP_TERASORT_RUN_TERAVALIDATE are set, the -# checksums will be compared afterwards for equality. -# -# Defaults to no -# -# export HADOOP_TERASORT_RUN_TERACHECKSUM=no - -# Teravalidate -# -# For "terasort" mode after the actual terasort of data -# -# After executing the sort, run teravalidate to validate the sorted data. -# -# If both this and HADOOP_TERASORT_RUN_TERACHECKSUM are set, the -# checksums will be compared afterwards for equality. -# -# Defaults to no -# -# export HADOOP_TERASORT_RUN_TERAVALIDATE=no - -############################################################################ -# Hadoop Decommission HDFS Nodes Configurations -############################################################################ - -# Specify decommission node size for "decommissionhdfsnodes" mode -# -# For example, if your current HDFS node size is 16, your job size is -# likely 17 nodes (including the master). If you wish to decommission -# to 8 data nodes (job size of 9 nodes total), set this to 8. -# -# export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8 - -############################################################################ -# Hadoop Filesystem Mode Configurations -############################################################################ - -# Set how the filesystem should be setup -# -# "hdfs" - Normal straight up HDFS if you have local disk in your -# cluster. This option is primarily for benchmarking and -# caching, but probably shouldn't be used in the general case. -# -# Be careful running this in a cluster environment. The next -# time you execute your job, if a different set of nodes are -# allocated to you, the HDFS data you wrote from a previous -# job may not be there. Specifying specific nodes to use in -# your job submission (e.g. --nodelist in sbatch) may be a -# way to alleviate this. -# -# User must set HADOOP_HDFS_PATH below. -# -# "hdfsoverlustre" - HDFS over Lustre. See README for description. -# -# User must set HADOOP_HDFSOVERLUSTRE_PATH below. -# -# "hdfsovernetworkfs" - HDFS over Network FS. Identical to HDFS over -# Lustre, but filesystem agnostic. -# -# User must set HADOOP_HDFSOVERNETWORKFS_PATH below. -# -# "rawnetworkfs" - Use Hadoop RawLocalFileSystem (i.e. file: scheme), -# to use networked file system directly. It could be a -# Lustre mount or NFS mount. Whatever you please. -# -# User must set HADOOP_RAWNETWORKFS_PATH below. -# -export HADOOP_FILESYSTEM_MODE="hdfsoverlustre" - -# Local Filesystem BlockSize -# -# This configuration is the blocksize hadoop will use when doing I/O -# to a local filesystem. It is used by HDFS when reading from the -# underlying filesystem. It is also used with -# HADOOP_FILESYSTEM_MODE="rawnetworkfs". -# -# Commonly 33554432, 67108864, 134217728 (i.e. 32m, 64m, 128m) -# -# If not specified, defaults to 33554432 -# -# export HADOOP_LOCAL_FILESYSTEM_BLOCKSIZE=33554432 - -# HDFS Replication -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# HDFS commonly uses 3. When doing HDFS over Lustre/NetworkFS, higher -# replication can also help with resilience if nodes fail. You may -# wish to set this to < 3 to save space. 
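The effective replication can be checked or adjusted after the fact with standard HDFS client commands (paths here are placeholders):

    hdfs dfs -stat %r /user/${USER}/output/part-00000   # print a file's replication factor
    hdfs dfs -setrep -w 2 /user/${USER}/output          # re-replicate existing data to 2 copies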
-# -# If not specified, defaults to 3 -# -# export HADOOP_HDFS_REPLICATION=3 - -# HDFS Block Size -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# Commonly 134217728, 268435456, 536870912 (i.e. 128m, 256m, 512m) -# -# If not specified, defaults to 134217728 -# -# export HADOOP_HDFS_BLOCKSIZE=134217728 - -# Path for HDFS when using local disk -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data and HDFS. The first -# path will also store daemon data, such as namenode or jobtracker -# data. -# -export HADOOP_HDFS_PATH="/ssd/${USER}/hdfs" - -# HDFS cleanup -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# After your job has completed, if HADOOP_HDFS_PATH_CLEAR is set to -# yes, Magpie will do a rm -rf on HADOOP_HDFS_PATH. -# -# This is particularly useful when doing normal HDFS on local storage. -# On your next job run, you may not be able to get the nodes you want -# on your next run. So you may want to clean up your work before the -# next user uses the node. -# -# export HADOOP_HDFS_PATH_CLEAR="yes" - -# Lustre path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/${USER}/hdfsoverlustre/" - -# HDFS over Lustre ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERLUSTRE_REMOVE_LOCKS=yes - -# Networkfs path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERNETWORKFS_PATH="/networkfs/${USER}/hdfsovernetworkfs/" - -# HDFS over Networkfs ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). 
-# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERNETWORKFS_REMOVE_LOCKS=yes - -# Path for rawnetworkfs -# -# This is used with HADOOP_FILESYSTEM_MODE="rawnetworkfs" -# -export HADOOP_RAWNETWORKFS_PATH="/lustre/${USER}/rawnetworkfs/" - -# If you have a local SSD or NVRAM, performance may be better to store -# intermediate data on it rather than Lustre or some other networked -# filesystem. If the below environment variable is specified, local -# intermediate data will be stored in the specified directory. -# Otherwise it will go to an appropriate directory in Lustre/networked -# FS. -# -# Be wary, local SSDs/NVRAM stores may have less space than HDDs or -# networked file systems. It can be easy to run out of space. -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data. -# -# export HADOOP_LOCALSTORE="/ssd/${USER}/localstore/" - -# HADOOP_LOCALSTORE_CLEAR -# -# After your job has completed, if HADOOP_LOCALSTORE_CLEAR is set to -# yes, Magpie will do a rm -rf on all directories in -# HADOOP_LOCALSTORE. This is particularly useful if the localstore -# directory is on local storage and you want to clean up your work -# before the next user uses the node. -# -# export HADOOP_LOCALSTORE_CLEAR="yes" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=yes - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. 
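The same example can also be launched by hand from an interactive session; assuming the layout of the apache-mahout-distribution tarball, the script lives under examples/bin:

    # fetches the synthetic_control.data set, then runs the chosen clustering job
    ${MAHOUT_HOME}/examples/bin/cluster-syntheticcontrol.sh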
-# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - -############################################################################ -# Run Job -############################################################################ - -# Set alternate mpirun options here -# MPIRUN_OPTIONS="-genvall -genv MV2_USE_APM 0" - -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-check-inputs -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-setup-core -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-setup-projects -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-setup-post -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-pre-run -if [ $? -ne 0 ] -then - exit 1 -fi -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-run -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-cleanup -mpirun $MPIRUN_OPTIONS $MAGPIE_SCRIPTS_HOME/magpie-post-run diff --git a/submission-scripts/script-sbatch-srun/magpie.sbatch-srun b/submission-scripts/script-sbatch-srun/magpie.sbatch-srun index 45b8a6f36..1b96b5d35 100644 --- a/submission-scripts/script-sbatch-srun/magpie.sbatch-srun +++ b/submission-scripts/script-sbatch-srun/magpie.sbatch-srun @@ -90,8 +90,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # # "pig" - Run a job according to the settings of PIG_JOB. # -# "mahout" - Run a job according to the settings of MAHOUT_JOB. -# # "spark" - Run a job according to the settings of SPARK_JOB. # # "kafka" - Run a job according to the settings of KAFKA_JOB. @@ -113,7 +111,6 @@ export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie" # For Hbase, testall will run performanceeval # For Phoenix, testall will run performanceeval # For Pig, testall will run testpig -# For Mahout, testall will run clustersyntheticcontrol # For Spark, testall will run sparkpi # For Kafka, testall will run performance # For Zeppelin, testall will run checkzeppelinup @@ -946,60 +943,6 @@ export PIG_JOB="testpig" # # export PIG_OPTS="-Djava.io.tmpdir=${PIG_LOCAL_JOB_DIR}/tmp" -############################################################################ -# Mahout Configurations -############################################################################ - -# Should Mahout be setup -# -# Specify yes or no. Defaults to no. -# -# Note that unlike Hadoop or Zookeeper, Mahout does not need to be -# enabled/disabled to be run with Hadoop. For example, no daemons are setup. -# -# If MAHOUT_SETUP is enabled, this will inform Magpie to setup -# environment variables that will hopefully make it easier to run -# Mahout w/ Hadoop. You could leave this disabled and setup/config -# Mahout as you need. -# -export MAHOUT_SETUP=no - -# Mahout Version -# -export MAHOUT_VERSION="0.13.0" - -# Path to your Mahout build/binaries -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -# Ensure the build matches the Hadoop version this will run against. -# -export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. 
-# -export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout" - -# Set how Mahout should run -# -# "clustersyntheticcontrol" - Run the Mahout -# cluster-syntheticcontrol.sh example. An -# internet connection outside of your -# cluster is required, as data will be -# downloaded for the test. -# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - ############################################################################ # Hbase Core Configurations ############################################################################ diff --git a/submission-scripts/script-sbatch-srun/magpie.sbatch-srun-hadoop-and-mahout b/submission-scripts/script-sbatch-srun/magpie.sbatch-srun-hadoop-and-mahout deleted file mode 100644 index 5bcbdc398..000000000 --- a/submission-scripts/script-sbatch-srun/magpie.sbatch-srun-hadoop-and-mahout +++ /dev/null @@ -1,913 +0,0 @@ -#!/bin/sh -############################################################################# -# Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC. -# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). -# Written by Albert Chu -# LLNL-CODE-644248 -# -# This file is part of Magpie, scripts for running Hadoop on -# traditional HPC systems. For details, see https://github.com/llnl/magpie. -# -# Magpie is free software; you can redistribute it and/or modify it -# under the terms of the GNU General Public License as published by -# the Free Software Foundation; either version 2 of the License, or -# (at your option) any later version. -# -# Magpie is distributed in the hope that it will be useful, but -# WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -# General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with Magpie. If not, see . -############################################################################# - -############################################################################ -# SLURM Customizations -############################################################################ - -# Node count. Node count should include one node for the -# head/management/master node. For example, if you want 8 compute -# nodes to process data, specify 9 nodes below. -# -# If including Zookeeper, include expected Zookeeper nodes. For -# example, if you want 8 Hadoop compute nodes and 3 Zookeeper nodes, -# specify 12 nodes (1 master, 8 Hadoop, 3 Zookeeper) -# -# Also take into account additional nodes needed for other services. -# -# Many of the below can be configured on the command line. If you are -# more comfortable specifying these on the command line, feel free to -# delete the customizations below. - -#SBATCH --nodes= -#SBATCH --output="slurm-%j.out" - -# Note defaults of MAGPIE_STARTUP_TIME & MAGPIE_SHUTDOWN_TIME, this -# timelimit should be a fair amount larger than them combined. -#SBATCH --time= - -# Job name. This will be used in naming directories for the job. 
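Filled in, the header for a nine-node (one master plus eight compute), four-hour run might read as follows; all values are placeholders:

    #SBATCH --nodes=9
    #SBATCH --time=04:00:00
    #SBATCH --job-name=hadoop-and-mahout
    #SBATCH --partition=pbatch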
-#SBATCH --job-name=

-# Partition to launch job in
-#SBATCH --partition=

-## SLURM Values
-# Generally speaking, don't touch the following, misc other configuration

-#SBATCH --ntasks-per-node=1
-#SBATCH --exclusive
-#SBATCH --no-kill

-# Need to tell Magpie how you are submitting this job
-export MAGPIE_SUBMISSION_TYPE="sbatchsrun"


-############################################################################
-# Magpie Configurations
-############################################################################

-# Directory your launching scripts/files are stored
-#
-# Normally an NFS mount, someplace magpie can be reached on all nodes.
-export MAGPIE_SCRIPTS_HOME="${HOME}/magpie"

-# Path to store data local to each cluster node, typically something
-# in /tmp.  This will store local conf files and log files for your
-# job.  If local scratch space is not available, consider using the
-# MAGPIE_NO_LOCAL_DIR option.  See README for more details.
-#
-export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"

-# Magpie job type
-#
-# "hadoop" - Run a job according to the settings of HADOOP_JOB.
-#
-# "mahout" - Run a job according to the settings of MAHOUT_JOB.
-#
-# "testall" - Run a job that runs all basic sanity tests for all
-#             software that is configured to be setup.  This is a good
-#             way to sanity check that everything has been setup
-#             correctly and the way you like.
-#
-#             For Hadoop, testall will run terasort
-#             For Mahout, testall will run clustersyntheticcontrol
-#
-# "script" - Run arbitrary script, as specified by MAGPIE_JOB_SCRIPT.
-#            You can find example job scripts in examples/.
-#
-# "interactive" - manually interact with job run to submit jobs,
-#                 peruse data (e.g. HDFS), move data, etc.  See job
-#                 output for instructions to access your job
-#                 allocation.
-#
-# "setuponly" - do not launch any daemons or services, only setup
-#               configuration files.  Useful for debugging or
-#               development.
-#
-export MAGPIE_JOB_TYPE="mahout"

-# Specify script and arguments to execute for "script" mode in
-# MAGPIE_JOB_TYPE
-#
-# export MAGPIE_JOB_SCRIPT="${HOME}/my-job-script"

-# Specify script startup / shutdown time window
-#
-# Specifies the amount of time to give startup / shutdown activities a
-# chance to succeed before Magpie will give up (or in the case of
-# shutdown, when the resource manager/scheduler may kill the running
-# job).  Defaults to 30 minutes for startup, 30 minutes for shutdown.
-#
-# The startup time in particular may need to be increased if you have
-# a large amount of data.  As an example, HDFS may need to spend a
-# significant amount of time determining all of the blocks in HDFS
-# before leaving safemode.
-#
-# The stop time in particular may need to be increased if you have a
-# large amount of cleanup to be done.  HDFS will save its NameSpace
-# before shutting down.  Hbase will do a compaction before shutting
-# down.
-#
-# The startup & shutdown window must together be smaller than the
-# timelimit specified for the job.
-#
-# MAGPIE_STARTUP_TIME and MAGPIE_SHUTDOWN_TIME at minimum must be 5
-# minutes.  If MAGPIE_POST_JOB_RUN is specified below,
-# MAGPIE_SHUTDOWN_TIME must be at minimum 10 minutes.
-#
-# export MAGPIE_STARTUP_TIME=30
-# export MAGPIE_SHUTDOWN_TIME=30

-# Magpie One Time Run
-#
-# Normally, Magpie assumes that when a user runs a job, data created
-# and stored within that job will be desired to be accessed again.  For
-# example, data created and stored within HDFS will be accessed again.
-#
-# Under a number of scenarios, this may not be desired.  For example
-# during testing.
-#
-# To improve usability and performance, setting MAGPIE_ONE_TIME_RUN
-# below to yes will have two effects on the Magpie job.
-#
-# 1) A number of data paths (such as for HDFS) will be put into unique
-#    paths for this job.  Therefore, no other job should be able to
-#    access the data again.  This is particularly useful if you wish
-#    to run performance tests with this job script over and over
-#    again.
-#
-#    Magpie will not remove data that was written, so be sure to clean
-#    up your directories later.
-#
-# 2) In order to improve job throughput, Magpie will take shortcuts by
-#    not properly tearing down the job.  As data corruption should not
-#    be a concern on job teardown, the job can complete more quickly.
-#
-# export MAGPIE_ONE_TIME_RUN=yes

-# Convenience Scripts
-#
-# Specify script to be executed before / after your job.  It is run
-# on all nodes.
-#
-# Typically the pre-job script is used to set something up or get
-# debugging info.  It can also be used to determine if system
-# conditions meet the expectations of your job.  The primary job
-# running script (magpie-run) will not be executed if the
-# MAGPIE_PRE_JOB_RUN exits with a non-zero exit code.
-#
-# The post-job script is typically used for cleaning up something or
-# gathering info (such as logs) for post-debugging/analysis.  If it is
-# set, MAGPIE_SHUTDOWN_TIME above must be > 5.
-#
-# See example magpie-example-pre-job-script and
-# magpie-example-post-job-script for ideas of what you can do w/ these
-# scripts.
-#
-# Multiple scripts can be specified separated by comma.  Arguments can
-# be passed to scripts as well.
-#
-# A number of convenient scripts are available in the
-# ${MAGPIE_SCRIPTS_HOME}/scripts directory.
-#
-# export MAGPIE_PRE_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script"
-# export MAGPIE_POST_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script"
-#
-# Similar to the MAGPIE_PRE_JOB_RUN and MAGPIE_POST_JOB_RUN, scripts can be
-# run after the stack is setup but prior to the script or interactive mode
-# begins.  This enables frontends and other processes that depend on the stack
-# to be started up and torn down.  In similar fashion the cleanup will be done
-# immediately after the script or interactive mode exits before the stack is
-# shut down.
-#
-# export MAGPIE_PRE_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script"
-# export MAGPIE_POST_EXECUTE_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script"

-# Environment Variable Script
-#
-# When working with Magpie interactively by logging into the master
-# node of your job allocation, many environment variables may need to
-# be set.  For example, environment variables for config file
-# directories (e.g. HADOOP_CONF_DIR, HBASE_CONF_DIR, etc.) and home
-# directories (e.g. HADOOP_HOME, HBASE_HOME, etc.) and more general
-# environment variables (e.g. JAVA_HOME) may need to be set before you
-# begin interacting with your big data setup.
-#
-# The standard job output from Magpie provides instructions on all the
-# environment variables typically needed to interact with your job.
-# However, this can be tedious if done by hand.
-#
-# If the environment variable specified below is set, Magpie will
-# create the file and put into it every environment variable that
-# would be useful when running your job interactively.  That way, it
-# can be sourced easily if you will be running your job interactively.
-# It can also be loaded or used by other job scripts. -# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT="${HOME}/my-job-env" - -# Environment Variable Shell Type -# -# Magpie outputs environment variables in help output and -# MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT based on your SHELL environment -# variable. -# -# If you would like to output in a different shell type (perhaps you -# have programmed scripts in a different shell), specify that shell -# here. -# -# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT_SHELL="/bin/bash" - -# Remote Shell -# -# Magpie requires a passwordless remote shell command to launch -# necessary daemons across your job allocation. Magpie defaults to -# ssh, but it may be an alternate command in some environments. An -# alternate ssh-equivalent remote command can be specified by setting -# MAGPIE_REMOTE_CMD below. -# -# If using ssh, Magpie requires keys to be setup ahead of time so it -# can be executed without passwords. -# -# Specify options to the remote shell command if necessary. -# -# export MAGPIE_REMOTE_CMD="ssh" -# export MAGPIE_REMOTE_CMD_OPTS="" - -############################################################################ -# General Configuration -############################################################################ - -# Necessary for most projects -export JAVA_HOME="/usr/lib/jvm/jre-1.7.0/" - -############################################################################ -# Hadoop Core Configurations -############################################################################ - -# Should Hadoop be run -# -# Specify yes or no. Defaults to no. -# -export HADOOP_SETUP=yes - -# Set Hadoop Setup Type -# -# Will inform scripts on how to setup config files and what daemons to -# launch/setup. -# -# MR - Launch HDFS and Yarn -# YARN - Enable only Yarn -# HDFS - Enable only HDFS -# -# HDFS only may be useful when you want to use HDFS with other big -# data software, such as Hbase, and do not care for MapReduce or Yarn. -# It only works with HDFS based HADOOP_FILESYSTEM_MODE, such as -# "hdfs", "hdfsoverlustre", or "hdfsovernetworkfs". -# -# YARN only may be useful when you need Yarn setup for scheduling, but -# will not be using HDFS. For example, you may be reading from a -# networked file system directly. This option requires -# HADOOP_FILESYSTEM_MODE to 'rawnetworkfs'. -# -export HADOOP_SETUP_TYPE="MR" - -# Version -# -# Make sure the version for Mapreduce version 1 or 2 matches whatever -# you set in HADOOP_SETUP_TYPE -# -export HADOOP_VERSION="2.9.1" - -# Path to your Hadoop build/binaries -# -# Make sure the build for MapReduce or HDFS version 1 or 2 matches -# whatever you set in HADOOP_SETUP_TYPE. -# -# This should be accessible on all nodes in your allocation. Typically -# this is in an NFS mount. -# -export HADOOP_HOME="${HOME}/hadoop-${HADOOP_VERSION}" - -# Path to store data local to each cluster node, typically something -# in /tmp. This will store local conf files and log files for your -# job. If local scratch space is not available, consider using the -# MAGPIE_NO_LOCAL_DIR option. See README for more details. -# -# This will not be used for storing intermediate files or -# distributed cache files. See HADOOP_LOCALSTORE above for that. 
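If compute nodes lack usable local scratch, the MAGPIE_NO_LOCAL_DIR route mentioned above replaces these per-node /tmp paths; presumably it is enabled like any other yes/no option here:

    # hypothetical alternative to per-node /tmp directories
    export MAGPIE_NO_LOCAL_DIR=yes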
-# -export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop" - -# Directory where alternate Hadoop configuration templates are stored -# -# If you wish to tweak the configuration files used by Magpie, set -# HADOOP_CONF_FILES below, copy configuration templates from -# $MAGPIE_SCRIPTS_HOME/conf/hadoop into HADOOP_CONF_FILES, and modify -# as you desire. Magpie will still use configuration files in -# $MAGPIE_SCRIPTS_HOME/conf/hadoop if any of the files it needs are -# not found in HADOOP_CONF_FILES. -# -# export HADOOP_CONF_FILES="${HOME}/myconf" - -# Daemon Heap Max -# -# Heap maximum for Hadoop daemons (i.e. Resource Manger, NodeManager, -# DataNode, History Server, etc.), specified in megs. Special case -# for Namenode, see below. -# -# If not specified, defaults to Hadoop default of 1000 -# -# May need to be increased if you are scaling large, get OutofMemory -# errors, or perhaps have a lot of cores on a node. -# -# export HADOOP_DAEMON_HEAP_MAX=2000 - -# Daemon Namenode Heap Max -# -# Heap maximum for Hadoop Namenode daemons specified in megs. -# -# If not specified, defaults to HADOOP_DAEMON_HEAP_MAX above. -# -# Unlike most Hadoop daemons, namenode may need more memory if there -# are a very large number of files in your HDFS setup. A general rule -# of thumb is a 1G heap for each 100T of data. -# -# export HADOOP_NAMENODE_DAEMON_HEAP_MAX=2000 - -# Environment Extra -# -# Specify extra environment information that should be passed into -# Hadoop. This file will simply be appended into the hadoop-env.sh -# and (if appropriate) yarn-env.sh. -# -# By default, a reasonable estimate for max user processes and open -# file descriptors will be calculated and put into hadoop-env.sh and -# (if appropriate) yarn-env.sh. However, it's always possible they may -# need to be set differently. Everyone's cluster/situation can be -# slightly different. -# -# See the example example-environment-extra extra for examples on -# what you can/should do with adding extra environment settings. -# -# export HADOOP_ENVIRONMENT_EXTRA_PATH="${HOME}/hadoop-my-environment" - -############################################################################ -# Hadoop Job/Run Configurations -############################################################################ - -# Set hadoop job for MAGPIE_JOB_TYPE = hadoop -# -# "terasort" - run terasort. Useful for making sure things are setup -# the way you like. -# -# There are additional configuration options for this -# listed below. -# -# "upgradehdfs" - upgrade your version of HDFS. Most notably this is -# used when you are switching to a newer Hadoop -# version and the HDFS version would be inconsistent -# without upgrading. Only works with HDFS versions >= -# 2.2.0. -# -# Please set your job time to be quite large when -# performing this upgrade. If your job times out and -# this process does not complete fully, it can leave -# HDFS in a bad state. -# -# Beware, once you upgrade it'll be difficult to rollback. -# -# "decommissionhdfsnodes" - decrease your HDFS over Lustre or HDFS -# over NetworkFS node size just as if you -# were on a cluster with local disk. Launch -# your job with the current present node -# size and set -# HADOOP_DECOMMISSION_HDFS_NODE_SIZE to the -# smaller node size to decommission into. -# Only works on Hadoop versions >= 2.3.0. -# -# Please set your job time to be quite large -# when performing this update. If your job -# times out and this process does not -# complete fully, it can leave HDFS in a bad -# state. 
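Putting the decommission pieces together, a shrink from 16 to 8 data nodes would be configured roughly as below, using only variables defined in this script:

    export MAGPIE_JOB_TYPE="hadoop"
    export HADOOP_JOB="decommissionhdfsnodes"
    export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8   # job still submitted at the current 17-node size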
-# -export HADOOP_JOB="terasort" - -# Tasks per Node -# -# If not specified, a reasonable estimate will be calculated based on -# number of CPUs on the system. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_MAX_TASKS_PER_NODE=8 - -# Default Map tasks for Job -# -# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE * compute -# nodes. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_DEFAULT_MAP_TASKS=8 - -# Default Reduce tasks for Job -# -# If not specified, defaults to # compute nodes (i.e. 1 reducer per -# node) -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_DEFAULT_REDUCE_TASKS=8 - -# Heap size for JVM -# -# Specified in M. If not specified, a reasonable estimate will be -# calculated based on total memory available and number of CPUs on the -# system. -# -# HADOOP_CHILD_MAP_HEAPSIZE and HADOOP_CHILD_REDUCE_HEAPSIZE are for -# Yarn -# -# If HADOOP_CHILD_MAP_HEAPSIZE is not specified, it is assumed to be -# HADOOP_CHILD_HEAPSIZE. -# -# If HADOOP_CHILD_REDUCE_HEAPSIZE is not specified, it is assumed to -# be 2X the HADOOP_CHILD_MAP_HEAPSIZE. -# -# If running Hbase (or other Big Data software) with Hadoop MapReduce, -# be aware of the number of tasks and the amount of memory that may be -# needed by other software. -# -# export HADOOP_CHILD_HEAPSIZE=2048 -# export HADOOP_CHILD_MAP_HEAPSIZE=2048 -# export HADOOP_CHILD_REDUCE_HEAPSIZE=4096 - -# Container Buffer -# -# Specify the amount of overhead each Yarn container will have over -# the heap size. Specified in M. If not specified, a reasonable -# estimate will be calculated based on total memory available. -# -# export HADOOP_CHILD_MAP_CONTAINER_BUFFER=256 -# export HADOOP_CHILD_REDUCE_CONTAINER_BUFFER=512 - -# Mapreduce Slowstart, indicating percent of maps that should complete -# before reducers begin. -# -# If not specified, defaults to 0.05 -# -# export HADOOP_MAPREDUCE_SLOWSTART=0.05 - -# Container Memory -# -# Memory on compute nodes for containers. Typically "nice-chunk" less -# than actual memory on machine, b/c machine needs memory for its own -# needs (kernel, daemons, etc.). Specified in megs. -# -# If not specified, a reasonable estimate will be calculated based on -# total memory on the system. -# -# export YARN_RESOURCE_MEMORY=32768 - -# Check Memory Limits -# -# Should physical and virtual memory limits be enforced for containers. -# This can be helpful in cases where the OS (Centos/Redhat) is aggressive -# at allocating virtual memory and causes the vmem-to-pmem ratio to be -# hit. Defaults to true -# -# export YARN_VMEM_CHECK="false" -# export YARN_PMEM_CHECK="false" - -# Compression -# -# Should compression of outputs and intermediate data be enabled. -# Specify yes or no. Defaults to no. -# -# Effectively, is time spend compressing data going to save you time -# on I/O. Sometimes yes, sometimes no. -# -# export HADOOP_COMPRESSION=yes - -# IO Sort Factors + MB -# -# The number of streams of files to sort while reducing and the memory -# amount to use while sorting. This is a quite advanced mechanism -# taking into account many factors. 
-# If not specified, some reasonable
-# number will be calculated.
-#
-# export HADOOP_IO_SORT_FACTOR=10
-# export HADOOP_IO_SORT_MB=100

-# Parallel Copies
-#
-# The default number of parallel transfers run by reduce during the
-# copy (shuffle) phase.  If not specified, some reasonable number will
-# be calculated.
-# export HADOOP_PARALLEL_COPIES=10

-############################################################################
-# Hadoop Terasort Configurations
-############################################################################

-# Terasort size
-#
-# For "terasort" mode.
-#
-# Specify terasort size in units of 100-byte rows.  Specify
-# 10000000000 for a terabyte, for actual benchmarking.
-#
-# Specify something small, for basic sanity tests.
-#
-# Defaults to 50000000.
-#
-# export HADOOP_TERASORT_SIZE=50000000

-# Terasort map count
-#
-# For "terasort" mode during the teragen of data.
-#
-# If not specified, will be computed to a reasonable number given
-# HADOOP_TERASORT_SIZE and the block size of the filesystem you are
-# using (e.g. for HDFS the HADOOP_HDFS_BLOCKSIZE)
-#
-# export HADOOP_TERAGEN_MAP_COUNT=4

-# Terasort reducer count
-#
-# For "terasort" mode during the actual terasort of data.
-#
-# If not specified, will be compute node count * 2.
-#
-# export HADOOP_TERASORT_REDUCER_COUNT=4

-# Terasort cache
-#
-# For "real benchmarking" you should flush page cache between a
-# teragen and a terasort.  You can disable this for sanity runs/tests
-# to make things go faster.  Specify yes or no.  Defaults to yes.
-#
-# export HADOOP_TERASORT_CLEAR_CACHE=no

-# Terasort output replication count
-#
-# For "terasort" mode during the actual terasort of data
-#
-# In some circumstances, replication of the output from the terasort
-# must be equal to the replication of data for the input.  In other
-# cases it can be less.  The below can be adjusted to tweak for
-# benchmarking purposes.
-#
-# If not specified, defaults to Terasort default, which is 1 in most
-# versions of Hadoop
-#
-# export HADOOP_TERASORT_OUTPUT_REPLICATION=1

-# Terachecksum
-#
-# For "terasort" mode after the teragen of data
-#
-# After executing the teragen, run terachecksum to calculate a checksum of
-# the input.
-#
-# If both this and HADOOP_TERASORT_RUN_TERAVALIDATE are set, the
-# checksums will be compared afterwards for equality.
-#
-# Defaults to no
-#
-# export HADOOP_TERASORT_RUN_TERACHECKSUM=no

-# Teravalidate
-#
-# For "terasort" mode after the actual terasort of data
-#
-# After executing the sort, run teravalidate to validate the sorted data.
-#
-# If both this and HADOOP_TERASORT_RUN_TERACHECKSUM are set, the
-# checksums will be compared afterwards for equality.
-#
-# Defaults to no
-#
-# export HADOOP_TERASORT_RUN_TERAVALIDATE=no

-############################################################################
-# Hadoop Decommission HDFS Nodes Configurations
-############################################################################

-# Specify decommission node size for "decommissionhdfsnodes" mode
-#
-# For example, if your current HDFS node size is 16, your job size is
-# likely 17 nodes (including the master).  If you wish to decommission
-# to 8 data nodes (job size of 9 nodes total), set this to 8.
-# -# export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8 - -############################################################################ -# Hadoop Filesystem Mode Configurations -############################################################################ - -# Set how the filesystem should be setup -# -# "hdfs" - Normal straight up HDFS if you have local disk in your -# cluster. This option is primarily for benchmarking and -# caching, but probably shouldn't be used in the general case. -# -# Be careful running this in a cluster environment. The next -# time you execute your job, if a different set of nodes are -# allocated to you, the HDFS data you wrote from a previous -# job may not be there. Specifying specific nodes to use in -# your job submission (e.g. --nodelist in sbatch) may be a -# way to alleviate this. -# -# User must set HADOOP_HDFS_PATH below. -# -# "hdfsoverlustre" - HDFS over Lustre. See README for description. -# -# User must set HADOOP_HDFSOVERLUSTRE_PATH below. -# -# "hdfsovernetworkfs" - HDFS over Network FS. Identical to HDFS over -# Lustre, but filesystem agnostic. -# -# User must set HADOOP_HDFSOVERNETWORKFS_PATH below. -# -# "rawnetworkfs" - Use Hadoop RawLocalFileSystem (i.e. file: scheme), -# to use networked file system directly. It could be a -# Lustre mount or NFS mount. Whatever you please. -# -# User must set HADOOP_RAWNETWORKFS_PATH below. -# -export HADOOP_FILESYSTEM_MODE="hdfsoverlustre" - -# Local Filesystem BlockSize -# -# This configuration is the blocksize hadoop will use when doing I/O -# to a local filesystem. It is used by HDFS when reading from the -# underlying filesystem. It is also used with -# HADOOP_FILESYSTEM_MODE="rawnetworkfs". -# -# Commonly 33554432, 67108864, 134217728 (i.e. 32m, 64m, 128m) -# -# If not specified, defaults to 33554432 -# -# export HADOOP_LOCAL_FILESYSTEM_BLOCKSIZE=33554432 - -# HDFS Replication -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# HDFS commonly uses 3. When doing HDFS over Lustre/NetworkFS, higher -# replication can also help with resilience if nodes fail. You may -# wish to set this to < 3 to save space. -# -# If not specified, defaults to 3 -# -# export HADOOP_HDFS_REPLICATION=3 - -# HDFS Block Size -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre", -# and "hdfsovernetworkfs" -# -# Commonly 134217728, 268435456, 536870912 (i.e. 128m, 256m, 512m) -# -# If not specified, defaults to 134217728 -# -# export HADOOP_HDFS_BLOCKSIZE=134217728 - -# Path for HDFS when using local disk -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data and HDFS. The first -# path will also store daemon data, such as namenode or jobtracker -# data. -# -export HADOOP_HDFS_PATH="/ssd/${USER}/hdfs" - -# HDFS cleanup -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfs" -# -# After your job has completed, if HADOOP_HDFS_PATH_CLEAR is set to -# yes, Magpie will do a rm -rf on HADOOP_HDFS_PATH. -# -# This is particularly useful when doing normal HDFS on local storage. -# On your next job run, you may not be able to get the nodes you want -# on your next run. So you may want to clean up your work before the -# next user uses the node. 
-# -# export HADOOP_HDFS_PATH_CLEAR="yes" - -# Lustre path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/${USER}/hdfsoverlustre/" - -# HDFS over Lustre ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERLUSTRE_REMOVE_LOCKS=yes - -# Networkfs path to do Hadoop HDFS out of -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Note that different versions of Hadoop may not be compatible with -# your current HDFS data. If you're going to switch around to -# different versions, perhaps set different paths for different data. -# -export HADOOP_HDFSOVERNETWORKFS_PATH="/networkfs/${USER}/hdfsovernetworkfs/" - -# HDFS over Networkfs ignore lock -# -# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs" -# -# Cleanup in_use.lock files before launching HDFS -# -# On traditional Hadoop clusters, the in_use.lock file protects -# against a second HDFS daemon running on the same node. The lock -# file can similarly protect against a second HDFS daemon running on -# another node of your cluster (which is not desired, as both -# namenodes could change namenode data at the same time). -# -# However, sometimes the lock file may be there due to a prior job -# that failed and locks were not cleaned up on teardown. This may -# prohibit new HDFS daemons from running correctly. -# -# By default, if this option is not set, the lock file will be left in -# place and may cause HDFS daemons to not start. If set to yes, the -# lock files will be removed before starting HDFS. -# -# export HADOOP_HDFSOVERNETWORKFS_REMOVE_LOCKS=yes - -# Path for rawnetworkfs -# -# This is used with HADOOP_FILESYSTEM_MODE="rawnetworkfs" -# -export HADOOP_RAWNETWORKFS_PATH="/lustre/${USER}/rawnetworkfs/" - -# If you have a local SSD or NVRAM, performance may be better to store -# intermediate data on it rather than Lustre or some other networked -# filesystem. If the below environment variable is specified, local -# intermediate data will be stored in the specified directory. -# Otherwise it will go to an appropriate directory in Lustre/networked -# FS. -# -# Be wary, local SSDs/NVRAM stores may have less space than HDDs or -# networked file systems. It can be easy to run out of space. -# -# If you want to specify multiple paths (such as multiple drives), -# make them comma separated (e.g. /dir1,/dir2,/dir3). The multiple -# paths will be used for local intermediate data. 
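Following the comma-separated convention above, a node with two local SSDs might spread intermediate data as (paths are placeholders):

    export HADOOP_LOCALSTORE="/ssd0/${USER}/localstore,/ssd1/${USER}/localstore"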
-#
-# export HADOOP_LOCALSTORE="/ssd/${USER}/localstore/"

-# HADOOP_LOCALSTORE_CLEAR
-#
-# After your job has completed, if HADOOP_LOCALSTORE_CLEAR is set to
-# yes, Magpie will do a rm -rf on all directories in
-# HADOOP_LOCALSTORE.  This is particularly useful if the localstore
-# directory is on local storage and you want to clean up your work
-# before the next user uses the node.
-#
-# export HADOOP_LOCALSTORE_CLEAR="yes"
-############################################################################
-# Mahout Configurations
-############################################################################

-# Should Mahout be setup
-#
-# Specify yes or no.  Defaults to no.
-#
-# Note that unlike Hadoop or Zookeeper, Mahout does not need to be
-# enabled/disabled to be run with Hadoop.  For example, no daemons are setup.
-#
-# If MAHOUT_SETUP is enabled, this will inform Magpie to setup
-# environment variables that will hopefully make it easier to run
-# Mahout w/ Hadoop.  You could leave this disabled and setup/config
-# Mahout as you need.
-#
-export MAHOUT_SETUP=yes

-# Mahout Version
-#
-export MAHOUT_VERSION="0.13.0"

-# Path to your Mahout build/binaries
-#
-# This should be accessible on all nodes in your allocation. Typically
-# this is in an NFS mount.
-#
-# Ensure the build matches the Hadoop version this will run against.
-#
-export MAHOUT_HOME="${HOME}/apache-mahout-distribution-${MAHOUT_VERSION}"

-# Path to store data local to each cluster node, typically something
-# in /tmp.  This will store local conf files and log files for your
-# job.  If local scratch space is not available, consider using the
-# MAGPIE_NO_LOCAL_DIR option.  See README for more details.
-#
-export MAHOUT_LOCAL_DIR="/tmp/${USER}/mahout"

-# Set how Mahout should run
-#
-# "clustersyntheticcontrol" - Run the Mahout
-#                             cluster-syntheticcontrol.sh example.  An
-#                             internet connection outside of your
-#                             cluster is required, as data will be
-#                             downloaded for the test.
-#
-export MAHOUT_JOB="clustersyntheticcontrol"

-# Mahout Opts
-#
-# Extra Java runtime options
-#
-# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp"

-############################################################################
-# Run Job
-############################################################################

-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-check-inputs
-if [ $? -ne 0 ]
-then
-    exit 1
-fi
-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-core
-if [ $? -ne 0 ]
-then
-    exit 1
-fi
-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-projects
-if [ $? -ne 0 ]
-then
-    exit 1
-fi
-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-post
-if [ $? -ne 0 ]
-then
-    exit 1
-fi
-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-pre-run
-if [ $? -ne 0 ]
-then
-    exit 1
-fi
-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-run
-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-cleanup
-srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-post-run
diff --git a/submission-scripts/script-templates/Makefile b/submission-scripts/script-templates/Makefile
index d4e6ec0e4..ff41e8a63 100644
--- a/submission-scripts/script-templates/Makefile
+++ b/submission-scripts/script-templates/Makefile
@@ -10,7 +10,6 @@
 #
 HADOOP_INCLUDE=y
 PIG_INCLUDE=y
-MAHOUT_INCLUDE=y
 HBASE_INCLUDE=y
 HIVE_INCLUDE=y
 PHOENIX_INCLUDE=y
@@ -49,7 +48,6 @@ POSTGRES_DIR_PREFIX=$(PROJECT_DIR_PREFIX)
 TESTBENCH_DIR_PREFIX=${PROJECT_DIR_PREFIX}
 TEZ_DIR_PREFIX=${PROJECT_DIR_PREFIX}
 KAFKA_DIR_PREFIX=$(PROJECT_DIR_PREFIX)
-MAHOUT_DIR_PREFIX=$(PROJECT_DIR_PREFIX)
 PHOENIX_DIR_PREFIX=$(PROJECT_DIR_PREFIX)
 PIG_DIR_PREFIX=$(PROJECT_DIR_PREFIX)
 SPARK_DIR_PREFIX=$(PROJECT_DIR_PREFIX)
@@ -64,7 +62,6 @@ REMOTE_CMD_DEFAULT=ssh
 JAVA_DEFAULT=/usr/lib/jvm/jre-1.8.0/
 HADOOP_JAVA_DEFAULT=/usr/lib/jvm/jre-1.8.0/
 PIG_JAVA_DEFAULT=/usr/lib/jvm/jre-1.7.0/
-MAHOUT_JAVA_DEFAULT=/usr/lib/jvm/jre-1.7.0/
 HBASE_JAVA_DEFAULT=/usr/lib/jvm/jre-1.7.0/
 PHOENIX_JAVA_DEFAULT=/usr/lib/jvm/jre-1.7.0/
 SPARK_JAVA_DEFAULT=/usr/lib/jvm/jre-1.8.0/
@@ -92,8 +89,6 @@ HIVE_DEFAULT_DB_NAME="hive_default_db"
 HADOOP_VERSION_DEFAULT=3.3.1
 PIG_VERSION_DEFAULT=0.17.0
 PIG_HADOOP_VERSION_DEFAULT=2.9.1
-MAHOUT_VERSION_DEFAULT=0.13.0
-MAHOUT_HADOOP_VERSION_DEFAULT=2.9.1
 HBASE_VERSION_DEFAULT=1.6.0
 HBASE_HADOOP_VERSION_DEFAULT=2.9.1
 HBASE_ZOOKEEPER_VERSION_DEFAULT=3.4.14
@@ -189,7 +184,6 @@ define common-substitution
	sed -i -e 's;TEZDIRPREFIX;${TEZ_DIR_PREFIX};g' $(1)
	sed -i -e 's;TESTBENCHDIRPREFIX;$(TESTBENCH_DIR_PREFIX);g' $(1)
	sed -i -e 's;KAFKADIRPREFIX;$(KAFKA_DIR_PREFIX);g' $(1)
-	sed -i -e 's;MAHOUTDIRPREFIX;$(MAHOUT_DIR_PREFIX);g' $(1)
	sed -i -e 's;PHOENIXDIRPREFIX;$(PHOENIX_DIR_PREFIX);g' $(1)
	sed -i -e 's;PIGDIRPREFIX;$(PIG_DIR_PREFIX);g' $(1)
	sed -i -e 's;SPARKDIRPREFIX;$(SPARK_DIR_PREFIX);g' $(1)
@@ -201,7 +195,6 @@ define common-substitution

	sed -i -e 's;HADOOPVERSIONDEFAULT;$(HADOOP_VERSION_DEFAULT);g' $(1)
	sed -i -e 's;PIGVERSIONDEFAULT;$(PIG_VERSION_DEFAULT);g' $(1)
-	sed -i -e 's;MAHOUTVERSIONDEFAULT;$(MAHOUT_VERSION_DEFAULT);g' $(1)
	sed -i -e 's;HBASEVERSIONDEFAULT;$(HBASE_VERSION_DEFAULT);g' $(1)
	sed -i -e 's;HIVEVERSIONDEFAULT;$(HIVE_VERSION_DEFAULT);g' $(1)
	sed -i -e 's;HIVETEZVERSIONDEFAULT;$(HIVE_TEZ_VERSION_DEFAULT);g' $(1)
@@ -244,7 +237,7 @@ endef

 # Perhaps there is a niftier way to do this in sed than what I'm doing, glad
 # to take suggestions.  I couldn't figure out a way to do it in 1 pass.
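The two-pass idea the comment refers to: first append a temporary marker after the @MAGPIE_JOB_TYPES@ tag, then read a per-project snippet in behind the tag. Condensed from the define below, with $(1) and $$project filled in as placeholders:

    sed -i -e "/@MAGPIE_JOB_TYPES@/a @MAGPIE_JOB_TYPES_TEMP@/" magpie.sbatch-srun
    sed -i -e "/@MAGPIE_JOB_TYPES@/{r magpie-magpie-customizations-job-hadoop" -e "}" magpie.sbatch-srun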
 define common-addition
-	for project in hadoop hbase hive phoenix pig mahout spark kafka zeppelin storm zookeeper tensorflow tensorflow-horovod ray alluxio; do \
+	for project in hadoop hbase hive phoenix pig spark kafka zeppelin storm zookeeper tensorflow tensorflow-horovod ray alluxio; do \
		if echo $(2) | grep -q $$project; then \
			sed -i -e "/@MAGPIE_JOB_TYPES@/a @MAGPIE_JOB_TYPES_TEMP@/" $(1); \
			sed -i -e "/@MAGPIE_JOB_TYPES@/{r magpie-magpie-customizations-job-$$project" -e "}" $(1); \
@@ -328,7 +321,6 @@ define create-bigdata-templates
	$(eval MAGPIE_HADOOP := ../script-$(SCHED)/magpie.$(SCHED)-hadoop)
	$(eval MAGPIE_HADOOP_AND_PIG := ../script-$(SCHED)/magpie.$(SCHED)-hadoop-and-pig)
	$(eval MAGPIE_HADOOP_AND_HIVE := ../script-$(SCHED)/magpie.$(SCHED)-hadoop-and-hive)
-	$(eval MAGPIE_HADOOP_AND_MAHOUT := ../script-$(SCHED)/magpie.$(SCHED)-hadoop-and-mahout)
	$(eval MAGPIE_HBASE_WITH_HDFS := ../script-$(SCHED)/magpie.$(SCHED)-hbase-with-hdfs)
	$(eval MAGPIE_HBASE_WITH_HDFS_WITH_PHOENIX := ../script-$(SCHED)/magpie.$(SCHED)-hbase-with-hdfs-with-phoenix)
	$(eval MAGPIE_SPARK := ../script-$(SCHED)/magpie.$(SCHED)-spark)
@@ -344,7 +336,6 @@ define create-bigdata-templates
	$(if $(findstring "${HADOOP_INCLUDE}", "y"), $(call create-hadoop))
	$(if $(findstring "${HADOOP_INCLUDE}${PIG_INCLUDE}", "yy"), $(call create-hadoop-and-pig))
	$(if $(findstring "${HADOOP_INCLUDE}${HIVE_INCLUDE}", "yy"), $(call create-hadoop-and-hive))
-	$(if $(findstring "${HADOOP_INCLUDE}${MAHOUT_INCLUDE}", "yy"), $(call create-hadoop-and-mahout))
	$(if $(findstring "${HADOOP_INCLUDE}${HBASE_INCLUDE}${ZOOKEEPER_INCLUDE}", "yyy"), $(call create-hbase-with-hdfs))
	$(if $(findstring "${HADOOP_INCLUDE}${HBASE_INCLUDE}${PHOENIX_INCLUDE}${ZOOKEEPER_INCLUDE}", "yyyy"), $(call create-hbase-with-hdfs-with-phoenix))
	$(if $(findstring "${SPARK_INCLUDE}", "y"), $(call create-spark))
@@ -470,10 +461,6 @@ define create-all
		cat magpie-pig >> $(MAGPIE); \
	fi

-	if test "${MAHOUT_INCLUDE}" = "y"; then \
-		cat magpie-mahout >> $(MAGPIE); \
-	fi
-
	if test "${HBASE_INCLUDE}" = "y"; then \
		cat magpie-hbase >> $(MAGPIE); \
	fi
@@ -518,7 +505,6 @@
	$(call common-addition, ${MAGPIE}, $(if $(findstring "${HIVE_INCLUDE}", "y"), "hive"))
	$(call common-addition, ${MAGPIE}, $(if $(findstring "${PHOENIX_INCLUDE}", "y"), "phoenix"))
	$(call common-addition, ${MAGPIE}, $(if $(findstring "${PIG_INCLUDE}", "y"), "pig"))
-	$(call common-addition, ${MAGPIE}, $(if $(findstring "${MAHOUT_INCLUDE}", "y"), "mahout"))
	$(call common-addition, ${MAGPIE}, $(if $(findstring "${SPARK_INCLUDE}", "y"), "spark"))
	$(call common-addition, ${MAGPIE}, $(if $(findstring "${KAFKA_INCLUDE}", "y"), "kafka"))
	$(call common-addition, ${MAGPIE}, $(if $(findstring "${ZEPPELIN_INCLUDE}", "y"), "zeppelin"))
@@ -599,30 +585,6 @@ echo "Creating magpie.$(SCHED)-hadoop-and-hive"
	sed -i -e "s/ZOOKEEPER_SETUP=.*/ZOOKEEPER_SETUP=yes/" $(MAGPIE_HADOOP_AND_HIVE)
 endef

-define create-hadoop-and-mahout
-	echo "Creating magpie.$(SCHED)-hadoop-and-mahout"
-	cat magpie-shebang \
-		magpie-header-llnl \
-		magpie-config-$(SCHED)-header \
-		magpie-config-bigdata-instructions \
-		magpie-config-$(SCHED)-master-worker \
-		magpie-magpie-customizations-substitution-bigdata \
-		magpie-general-configuration-header \
-		magpie-general-configuration-java \
-		magpie-hadoop-core \
-		magpie-hadoop-job \
-		magpie-hadoop-filesystem-substitution \
-		magpie-mahout \
-		magpie-run-job-header-substitution \
-		magpie-run-job-$(DIST)-master-worker > $(MAGPIE_HADOOP_AND_MAHOUT)
-	$(call common-substitution, ${MAGPIE_HADOOP_AND_MAHOUT},${MAHOUT_JAVA_DEFAULT})
-	$(call common-additions, ${MAGPIE_HADOOP_AND_MAHOUT}, hadoop mahout)
-	sed -i -e "s/MAGPIE_JOB_TYPE=\"\(.*\)\"/MAGPIE_JOB_TYPE=\"mahout\"/" $(MAGPIE_HADOOP_AND_MAHOUT)
-	sed -i -e "s/HADOOP_SETUP=.*/HADOOP_SETUP=yes/" $(MAGPIE_HADOOP_AND_MAHOUT)
-	sed -i -e "s/HADOOP_VERSION=\"\(.*\)\"/HADOOP_VERSION=\"$(MAHOUT_HADOOP_VERSION_DEFAULT)\"/" $(MAGPIE_HADOOP_AND_MAHOUT)
-	sed -i -e "s/MAHOUT_SETUP=.*/MAHOUT_SETUP=yes/" $(MAGPIE_HADOOP_AND_MAHOUT)
-endef

 define create-hbase-with-hdfs
	echo "Creating magpie.$(SCHED)-hbase-with-hdfs"
	cat magpie-shebang \
diff --git a/submission-scripts/script-templates/magpie-magpie-customizations-job-mahout b/submission-scripts/script-templates/magpie-magpie-customizations-job-mahout
deleted file mode 100644
index 03053dd75..000000000
--- a/submission-scripts/script-templates/magpie-magpie-customizations-job-mahout
+++ /dev/null
@@ -1,2 +0,0 @@
-# "mahout" - Run a job according to the settings of MAHOUT_JOB.
-#
diff --git a/submission-scripts/script-templates/magpie-magpie-customizations-testall-mahout b/submission-scripts/script-templates/magpie-magpie-customizations-testall-mahout
deleted file mode 100644
index 8e6df1ea6..000000000
--- a/submission-scripts/script-templates/magpie-magpie-customizations-testall-mahout
+++ /dev/null
@@ -1 +0,0 @@
-# For Mahout, testall will run clustersyntheticcontrol
diff --git a/submission-scripts/script-templates/magpie-mahout b/submission-scripts/script-templates/magpie-mahout
deleted file mode 100644
index 6cdf551a9..000000000
--- a/submission-scripts/script-templates/magpie-mahout
+++ /dev/null
@@ -1,54 +0,0 @@
-############################################################################
-# Mahout Configurations
-############################################################################

-# Should Mahout be setup
-#
-# Specify yes or no.  Defaults to no.
-#
-# Note that unlike Hadoop or Zookeeper, Mahout does not need to be
-# enabled/disabled to be run with Hadoop.  For example, no daemons are setup.
-#
-# If MAHOUT_SETUP is enabled, this will inform Magpie to setup
-# environment variables that will hopefully make it easier to run
-# Mahout w/ Hadoop.  You could leave this disabled and setup/config
-# Mahout as you need.
-#
-export MAHOUT_SETUP=no

-# Mahout Version
-#
-export MAHOUT_VERSION="MAHOUTVERSIONDEFAULT"

-# Path to your Mahout build/binaries
-#
-# This should be accessible on all nodes in your allocation. Typically
-# this is in an NFS mount.
-#
-# Ensure the build matches the Hadoop version this will run against.
-#
-export MAHOUT_HOME="MAHOUTDIRPREFIX/apache-mahout-distribution-${MAHOUT_VERSION}"

-# Path to store data local to each cluster node, typically something
-# in /tmp.  This will store local conf files and log files for your
-# job.  If local scratch space is not available, consider using the
-# MAGPIE_NO_LOCAL_DIR option.  See README for more details.
-#
-export MAHOUT_LOCAL_DIR="LOCALDIRPREFIX/mahout"

-# Set how Mahout should run
-#
-# "clustersyntheticcontrol" - Run the Mahout
-#                             cluster-syntheticcontrol.sh example.  An
-#                             internet connection outside of your
-#                             cluster is required, as data will be
-#                             downloaded for the test.
-# -export MAHOUT_JOB="clustersyntheticcontrol" - -# Mahout Opts -# -# Extra Java runtime options -# -# export MAHOUT_OPTS="-Djava.io.tmpdir=${MAHOUT_LOCAL_JOB_DIR}/tmp" - diff --git a/testsuite/test-common.sh b/testsuite/test-common.sh index 68e50d88d..6c2c5d72e 100755 --- a/testsuite/test-common.sh +++ b/testsuite/test-common.sh @@ -18,13 +18,6 @@ hadoop3Xjava18versions_javaversion=${java18} hadoop_test_groups="hadoop2Xjava16versions hadoop2Xjava17versions hadoop2Xjava18versions hadoop3Xjava18versions" hadoop_all_versions="${hadoop2Xjava16versions} ${hadoop2Xjava17versions} ${hadoop2Xjava18versions} ${hadoop3Xjava18versions}" -mahouthadoop27java17versions="0.11.0 0.11.1 0.11.2 0.12.0 0.12.1 0.12.2 0.13.0" -mahouthadoop27java17versions_hadoopversion="2.7.0" -mahouthadoop27java17versions_javaversion=${java17} - -mahout_test_groups="mahouthadoop27java17versions" -mahout_all_versions="${mahouthadoop27java17versions}" - pighadoop26java16versions="0.13.0 0.14.0" pighadoop26java16versions_hadoopversion="2.6.0" pighadoop26java16versions_javaversion=${java16} diff --git a/testsuite/test-config.sh b/testsuite/test-config.sh index b6a9256cf..339eb29ed 100644 --- a/testsuite/test-config.sh +++ b/testsuite/test-config.sh @@ -58,7 +58,6 @@ PROJECT_DIR_PATH="/usr/workspace/wsa/achu/bigdata/" # HBASE_DIR_PATH="" # HIVE_DIR_PATH="" # KAFKA_DIR_PATH="" -# MAHOUT_DIR_PATH="" # PHOENIX_DIR_PATH="" # PIG_DIR_PATH="" # SPARK_DIR_PATH="" diff --git a/testsuite/test-download-projects.sh b/testsuite/test-download-projects.sh index d1ae1056b..a7407d682 100755 --- a/testsuite/test-download-projects.sh +++ b/testsuite/test-download-projects.sh @@ -16,7 +16,6 @@ source ../magpie/lib/magpie-lib-helper HADOOP_DOWNLOAD=n PIG_DOWNLOAD=n -MAHOUT_DOWNLOAD=n HBASE_DOWNLOAD=n PHOENIX_DOWNLOAD=n SPARK_DOWNLOAD=n @@ -130,21 +129,6 @@ then done fi -if [ "${MAHOUT_DOWNLOAD}" == "y" ] -then - for mahoutversion in ${mahout_all_versions} - do - MAHOUT_PACKAGE="${mahoutversion}/apache-mahout-distribution-${mahoutversion}.tar.gz" - MAHOUT_DOWNLOAD_URL="${APACHE_ARCHIVE_URL_BASE}/mahout/${MAHOUT_PACKAGE}" - - __download_package ${MAHOUT_PACKAGE} ${MAHOUT_DOWNLOAD_URL} - - MAHOUT_PACKAGE_BASEDIR=$(echo `basename ${MAHOUT_PACKAGE}` | sed 's/\(.*\)\.\(.*\)\.\(.*\)/\1/g') - __apply_patches_if_exist ${MAHOUT_PACKAGE_BASEDIR} \ - ${MAGPIE_SCRIPTS_HOME}/patches/mahout/${MAHOUT_PACKAGE_BASEDIR}.patch - done -fi - if [ "${HBASE_DOWNLOAD}" == "y" ] then for hbaseversion in ${hbase_all_versions} diff --git a/testsuite/test-generate-cornercase.sh b/testsuite/test-generate-cornercase.sh index 6b337292a..e76173851 100755 --- a/testsuite/test-generate-cornercase.sh +++ b/testsuite/test-generate-cornercase.sh @@ -10,11 +10,6 @@ __GenerateCornerCaseTests_CatchProjectDependencies() { sed -i -e 's/export HADOOP_SETUP=\(.*\)/export HADOOP_SETUP=no/' magpie.${submissiontype}-hadoop-and-pig-cornercase-catchprojectdependency-hadoop fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-catchprojectdependency-hadoop - sed -i -e 's/export HADOOP_SETUP=\(.*\)/export HADOOP_SETUP=no/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-catchprojectdependency-hadoop - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-catchprojectdependency-hadoop cp 
../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-catchprojectdependency-zookeeper @@ -89,10 +84,6 @@ __GenerateCornerCaseTests_NoSetJava() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-cornercase-nosetjava fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetjava - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetjava fi @@ -142,10 +133,6 @@ __GenerateCornerCaseTests_BadSetJava() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-cornercase-badsetjava fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badsetjava - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badsetjava fi @@ -237,11 +224,6 @@ __GenerateCornerCaseTests_NoSetVersion() { sed -i -e 's/export PIG_VERSION/# export PIG_VERSION/' magpie.${submissiontype}-hadoop-and-pig-cornercase-nosetversion fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetversion - sed -i -e 's/export MAHOUT_VERSION/# export MAHOUT_VERSION/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetversion - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetversion sed -i -e 's/export HBASE_VERSION/# export HBASE_VERSION/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetversion @@ -339,11 +321,6 @@ __GenerateCornerCaseTests_BadVersion() { sed -i -e 's/export PIG_VERSION="\(.*\)"/export PIG_VERSION="2.2"/' magpie.${submissiontype}-hadoop-and-pig-cornercase-badversion fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badversion - sed -i -e 's/export MAHOUT_VERSION="\(.*\)"/export MAHOUT_VERSION="2.2"/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-badversion - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badversion sed -i -e 's/export HBASE_VERSION="\(.*\)"/export HBASE_VERSION="2.2"/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-badversion @@ -406,11 +383,6 @@ __GenerateCornerCaseTests_NoSetHome() { sed -i -e 's/export PIG_HOME/# export PIG_HOME/' magpie.${submissiontype}-hadoop-and-pig-cornercase-nosethome fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosethome - sed -i -e 's/export MAHOUT_HOME/# export MAHOUT_HOME/' 
magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosethome - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosethome sed -i -e 's/export HBASE_HOME/# export HBASE_HOME/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosethome @@ -478,11 +450,6 @@ __GenerateCornerCaseTests_BadSetHome() { sed -i -e 's/export PIG_HOME="\(.*\)"/export PIG_HOME="\/FOO\/BAR\/BAZ"/' magpie.${submissiontype}-hadoop-and-pig-cornercase-badsethome fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badsethome - sed -i -e 's/export MAHOUT_HOME="\(.*\)"/export MAHOUT_HOME="\/FOO\/BAR\/BAZ"/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-badsethome - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badsethome sed -i -e 's/export HBASE_HOME="\(.*\)"/export HBASE_HOME="\/FOO\/BAR\/BAZ"/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-badsethome @@ -550,11 +517,6 @@ __GenerateCornerCaseTests_NoSetLocalDir() { sed -i -e 's/export PIG_LOCAL_DIR/# export PIG_LOCAL_DIR/' magpie.${submissiontype}-hadoop-and-pig-cornercase-nosetlocaldir fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetlocaldir - sed -i -e 's/export MAHOUT_LOCAL_DIR/# export MAHOUT_LOCAL_DIR/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetlocaldir - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetlocaldir sed -i -e 's/export HBASE_LOCAL_DIR/# export HBASE_LOCAL_DIR/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetlocaldir @@ -622,11 +584,6 @@ __GenerateCornerCaseTests_BadSetLocalDir() { sed -i -e 's/export PIG_LOCAL_DIR="\(.*\)"/export PIG_LOCAL_DIR="\/FOO\/BAR\/BAZ"/' magpie.${submissiontype}-hadoop-and-pig-cornercase-badlocaldir fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badlocaldir - sed -i -e 's/export MAHOUT_LOCAL_DIR="\(.*\)"/export MAHOUT_LOCAL_DIR="\/FOO\/BAR\/BAZ"/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-badlocaldir - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badlocaldir sed -i -e 's/export HBASE_LOCAL_DIR="\(.*\)"/export HBASE_LOCAL_DIR="\/FOO\/BAR\/BAZ"/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-badlocaldir @@ -820,10 +777,6 @@ __GenerateCornerCaseTests_BadJobTime() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-cornercase-badjobtime fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime - fi - if [ "${hbasetests}" == "y" ]; then cp 
../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badjobtime fi @@ -897,14 +850,6 @@ __GenerateCornerCaseTests_BadJobTime() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-cornercase-badjobtime-sbatchsrun-days-hours-minutes-seconds fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-minutes-seconds - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-hours-minutes-seconds - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-days-hours - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-days-hours-minutes - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-days-hours-minutes-seconds - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badjobtime-sbatchsrun-minutes-seconds cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badjobtime-sbatchsrun-hours-minutes-seconds @@ -1026,10 +971,6 @@ __GenerateCornerCaseTests_BadStartupTime() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-cornercase-badstartuptime fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badstartuptime - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badstartuptime fi @@ -1092,10 +1033,6 @@ __GenerateCornerCaseTests_BadShutdownTime() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-cornercase-badshutdowntime fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badshutdowntime - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badshutdowntime fi @@ -1157,10 +1094,6 @@ __GenerateCornerCaseTests_BadNodeCount() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-cornercase-badnodecount-small fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badnodecount-small - fi - if [ "${hbasetests}" == "y" ]; 
then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badnodecount-small cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badnodecount-big @@ -1234,11 +1167,6 @@ __GenerateCornerCaseTests_NoCoreSettings() { sed -i -e 's/export PIG_JOB/# export PIG_JOB/' magpie.${submissiontype}-hadoop-and-pig-cornercase-nocoresettings fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-nocoresettings - sed -i -e 's/export MAHOUT_JOB/# export MAHOUT_JOB/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-nocoresettings - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-nocoresettings sed -i -e 's/export HBASE_JOB/# export HBASE_JOB/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-nocoresettings @@ -1352,11 +1280,6 @@ __GenerateCornerCaseTests_BadCoreSettings() { sed -i -e 's/export PIG_JOB="\(.*\)"/export PIG_JOB="foobar"/' magpie.${submissiontype}-hadoop-and-pig-cornercase-badcoresettings fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-badcoresettings - sed -i -e 's/export MAHOUT_JOB="\(.*\)"/export MAHOUT_JOB="foobar"/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-badcoresettings - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-badcoresettings sed -i -e 's/export HBASE_JOB="\(.*\)"/export HBASE_JOB="foobar"/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-badcoresettings @@ -1453,11 +1376,6 @@ __GenerateCornerCaseTests_RequireHDFS() { sed -i -e 's/export HADOOP_FILESYSTEM_MODE="\(.*\)"/export HADOOP_FILESYSTEM_MODE="rawnetworkfs"/' magpie.${submissiontype}-hadoop-cornercase-requirehdfs-3 fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-requirehdfs - sed -i -e 's/export HADOOP_FILESYSTEM_MODE="\(.*\)"/export HADOOP_FILESYSTEM_MODE="rawnetworkfs"/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-requirehdfs - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-requirehdfs sed -i -e 's/export HADOOP_FILESYSTEM_MODE="\(.*\)"/export HADOOP_FILESYSTEM_MODE="rawnetworkfs"/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-requirehdfs @@ -1538,11 +1456,6 @@ __GenerateCornerCaseTests_RequireYarn() { sed -i -e 's/export HADOOP_SETUP_TYPE="\(.*\)"/export HADOOP_SETUP_TYPE="HDFS"/' magpie.${submissiontype}-hadoop-and-pig-cornercase-requireyarn fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-requireyarn - sed -i -e 's/export HADOOP_SETUP_TYPE="\(.*\)"/export HADOOP_SETUP_TYPE="HDFS"/' magpie.${submissiontype}-hadoop-and-mahout-cornercase-requireyarn 
- fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-requireyarn-1 sed -i -e 's/export HADOOP_SETUP_TYPE="\(.*\)"/export HADOOP_SETUP_TYPE="HDFS"/' magpie.${submissiontype}-hbase-with-hdfs-cornercase-requireyarn-1 @@ -1740,11 +1653,6 @@ __GenerateCornerCaseTests_NoLongerSupported() { sed -i '/# Run Job/a export PIG_MODE="foobar"' magpie.${submissiontype}-hadoop-and-pig-cornercase-nolongersupported fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-cornercase-nolongersupported - sed -i '/# Run Job/a export MAHOUT_MODE="foobar"' magpie.${submissiontype}-hadoop-and-mahout-cornercase-nolongersupported - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-cornercase-nolongersupported sed -i '/# Run Job/a export HBASE_MODE="foobar"' magpie.${submissiontype}-hbase-with-hdfs-cornercase-nolongersupported diff --git a/testsuite/test-generate-default.sh b/testsuite/test-generate-default.sh index d916ffeb9..2393b87cb 100755 --- a/testsuite/test-generate-default.sh +++ b/testsuite/test-generate-default.sh @@ -21,10 +21,6 @@ GenerateDefaultStandardTests() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-default-run-testpig cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-default-run-testpig-no-local-dir fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-default-run-clustersyntheticcontrol - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-default-run-clustersyntheticcontrol-no-local-dir - fi if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-default-run-hbaseperformanceeval cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-default-run-hbaseperformanceeval-no-local-dir diff --git a/testsuite/test-generate-functionality.sh b/testsuite/test-generate-functionality.sh index d73569df9..03a01518c 100755 --- a/testsuite/test-generate-functionality.sh +++ b/testsuite/test-generate-functionality.sh @@ -15,11 +15,6 @@ __GenerateFunctionalityTests_BadJobNames() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-job-name-dollarsign fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-job-name-whitespace - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-job-name-dollarsign - fi - if [ "${hbasetests}" == "y" ]; then cp 
../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-job-name-whitespace cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-job-name-dollarsign @@ -103,14 +98,6 @@ __GenerateFunctionalityTests_AltJobTimes() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-altjobtime-sbatchsrun-days-hours-minutes-seconds fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-minutes-seconds - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-hours-minutes-seconds - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-days-hours - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-days-hours-minutes - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-days-hours-minutes-seconds - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altjobtime-sbatchsrun-minutes-seconds cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altjobtime-sbatchsrun-hours-minutes-seconds @@ -226,11 +213,6 @@ __GenerateFunctionalityTests_AltConfFilesDir() { sed -i -e 's/# export PIG_CONF_FILES="\(.*\)"/export PIG_CONF_FILES="'"${magpiescriptshomesubst}"'\/conf\/"/' magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-altconffilesdir fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altconffilesdir - sed -i -e 's/# export HADOOP_CONF_FILES="\(.*\)"/export HADOOP_CONF_FILES="'"${magpiescriptshomesubst}"'\/conf\/"/' magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altconffilesdir - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altconffilesdir sed -i -e 's/# export HADOOP_CONF_FILES="\(.*\)"/export HADOOP_CONF_FILES="'"${magpiescriptshomesubst}"'\/conf\/"/' magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altconffilesdir @@ -305,10 +287,6 @@ __GenerateFunctionalityTests_TestAll() { cp 
../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-run-hadoopterasort-run-testpig-functionality-testall fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-hadoopterasort-run-clustersyntheticcontrol-functionality-testall - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-run-zookeeperruok-functionality-testall fi @@ -360,10 +338,6 @@ __GenerateFunctionalityTests_InteractiveMode() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-functionality-interactive-mode fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-interactive-mode - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-interactive-mode fi @@ -414,10 +388,6 @@ __GenerateFunctionalityTests_Setuponlymode() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-functionality-setuponly-mode fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-setuponly-mode - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-setuponly-mode fi @@ -472,10 +442,6 @@ __GenerateFunctionalityTests_JobTimeout() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-functionality-jobtimeout fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-jobtimeout - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-jobtimeout fi @@ -535,12 +501,6 @@ __GenerateFunctionalityTests_MagpieExports() { sed -i -e "s/FILENAMESEARCHREPLACEKEY/hdfs-FILENAMESEARCHREPLACEKEY/" magpie.${submissiontype}-hadoop-and-pig-functionality-checkexports sed -i -e "s/FILENAMESEARCHREPLACEKEY/pig-FILENAMESEARCHREPLACEKEY/" magpie.${submissiontype}-hadoop-and-pig-functionality-checkexports fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-checkexports - sed -i -e "s/FILENAMESEARCHREPLACEKEY/hadoop-FILENAMESEARCHREPLACEKEY/" magpie.${submissiontype}-hadoop-and-mahout-functionality-checkexports - sed -i -e "s/FILENAMESEARCHREPLACEKEY/hdfs-FILENAMESEARCHREPLACEKEY/" magpie.${submissiontype}-hadoop-and-mahout-functionality-checkexports - sed -i -e "s/FILENAMESEARCHREPLACEKEY/mahout-FILENAMESEARCHREPLACEKEY/" 
magpie.${submissiontype}-hadoop-and-mahout-functionality-checkexports - fi if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-checkexports sed -i -e "s/FILENAMESEARCHREPLACEKEY/hbase-FILENAMESEARCHREPLACEKEY/" magpie.${submissiontype}-hbase-with-hdfs-functionality-checkexports @@ -616,10 +576,6 @@ __GenerateFunctionalityTests_MagpieScript() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-functionality-magpiescript fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-magpiescript - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-magpiescript fi @@ -690,13 +646,6 @@ __GenerateFunctionalityTests_PrePostRunScripts() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostecho-multi fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostrunscripts-single - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostrunscripts-multi - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostecho-single - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostecho-multi - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostrunscripts-single cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostrunscripts-multi @@ -810,12 +759,6 @@ __GenerateFunctionalityTests_PreRunScriptError() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-functionality-prerunscripterror-multi2 fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-prerunscripterror-single - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-prerunscripterror-multi1 - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-prerunscripterror-multi2 - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs 
magpie.${submissiontype}-hbase-with-hdfs-functionality-prerunscripterror-single cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-prerunscripterror-multi1 @@ -906,11 +849,6 @@ __GenerateFunctionalityTests_PrePostExecuteScripts() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostexecutescripts-multi fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostexecutescripts-single - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostexecutescripts-multi - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostexecutescripts-single cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostexecutescripts-multi @@ -1003,12 +941,6 @@ __GenerateFunctionalityTests_PreExecuteScriptError() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-functionality-preexecutescripterror-multi2 fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-preexecutescripterror-single - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-preexecutescripterror-multi1 - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-functionality-preexecutescripterror-multi2 - fi - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-preexecutescripterror-single cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-preexecutescripterror-multi1 @@ -1096,8 +1028,6 @@ __GenerateFunctionalityTests_ScriptArgs() { # No Pig test, "script" in Pig executes via a pig command - # No Mahout test, "script" in Mahout executes via a mahout command - if [ "${hbasetests}" == "y" ]; then cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-functionality-scriptargs fi @@ -1139,10 +1069,6 @@ __GenerateFunctionalityTests_HostnameMap() { cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-pig magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-hostname-map fi - if [ "${mahouttests}" == "y" ]; then - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-hostname-map - fi - if [ "${hbasetests}" == "y" ]; then cp 
../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hbase-with-hdfs magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-hostname-map fi diff --git a/testsuite/test-generate-mahout.sh b/testsuite/test-generate-mahout.sh deleted file mode 100755 index 88f6a1828..000000000 --- a/testsuite/test-generate-mahout.sh +++ /dev/null @@ -1,118 +0,0 @@ -#!/bin/bash - -source test-generate-common.sh -source test-common.sh -source test-config.sh - -__GenerateMahoutStandardTests_ClusterSyntheticcontrol() { - local mahoutversion=$1 - local hadoopversion=$2 - local javaversion=$3 - - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsoverlustre-run-clustersyntheticcontrol - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsoverlustre-run-clustersyntheticcontrol-no-local-dir - - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsovernetworkfs-run-clustersyntheticcontrol - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsovernetworkfs-run-clustersyntheticcontrol-no-local-dir - - sed -i \ - -e 's/export HADOOP_VERSION="\(.*\)"/export HADOOP_VERSION="'"${hadoopversion}"'"/' \ - -e 's/export MAHOUT_VERSION="\(.*\)"/export MAHOUT_VERSION="'"${mahoutversion}"'"/' \ - magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}* - - SetupHDFSoverLustreStandard `ls \ - magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}*hdfsoverlustre*` - - SetupHDFSoverNetworkFSStandard `ls \ - magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}*hdfsovernetworkfs*` - - JavaCommonSubstitution ${javaversion} `ls magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}*` -} - -GenerateMahoutStandardTests() { - - cd ${MAGPIE_SCRIPTS_HOME}/testsuite/ - - echo "Making Mahout Standard Tests" - - for testfunction in __GenerateMahoutStandardTests_ClusterSyntheticcontrol - do - for testgroup in ${mahout_test_groups} - do - local hadoopversion="${testgroup}_hadoopversion" - local javaversion="${testgroup}_javaversion" - if ! 
CheckForDependency "Mahout" "Hadoop" ${!hadoopversion} - then - continue - fi - for testversion in ${!testgroup} - do - ${testfunction} ${testversion} ${!hadoopversion} ${!javaversion} - done - done - done -} - -__GenerateMahoutDependencyTests_Dependency1() { - local mahoutversion=$1 - local hadoopversion=$2 - local javaversion=$3 - - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsoverlustre-run-clustersyntheticcontrol - cp ../submission-scripts/script-${submissiontype}/magpie.${submissiontype}-hadoop-and-mahout magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsovernetworkfs-run-clustersyntheticcontrol - - sed -i \ - -e 's/export HADOOP_VERSION="\(.*\)"/export HADOOP_VERSION="'"${hadoopversion}"'"/' \ - -e 's/export MAHOUT_VERSION="\(.*\)"/export MAHOUT_VERSION="'"${mahoutversion}"'"/' \ - magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}*run-clustersyntheticcontrol - - SetupHDFSoverLustreDependency "Mahout1A" ${mahoutversion} `ls \ - magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}*hdfsoverlustre*` - - SetupHDFSoverNetworkFSDependency "Mahout1A" ${mahoutversion} `ls \ - magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}*hdfsovernetworkfs*` - - JavaCommonSubstitution ${javaversion} `ls magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}*run-clustersyntheticcontrol` -} - -GenerateMahoutDependencyTests() { - - cd ${MAGPIE_SCRIPTS_HOME}/testsuite/ - - echo "Making Mahout Dependency Tests" - -# Dependency 1 Tests, run after another - - for testfunction in __GenerateMahoutDependencyTests_Dependency1 - do - for testgroup in ${mahout_test_groups} - do - local hadoopversion="${testgroup}_hadoopversion" - local javaversion="${testgroup}_javaversion" - if ! CheckForDependency "Mahout" "Hadoop" ${!hadoopversion} - then - continue - fi - for testversion in ${!testgroup} - do - ${testfunction} ${testversion} ${!hadoopversion} ${!javaversion} - done - done - done -} - -GenerateMahoutPostProcessing () { - files=`find . -maxdepth 1 -name "magpie.${submissiontype}*run-clustersyntheticcontrol*"` - if [ -n "${files}" ] - then - sed -i -e "s/FILENAMESEARCHREPLACEKEY/run-clustersyntheticcontrol-FILENAMESEARCHREPLACEKEY/" ${files} - fi - - files=`find . 
-maxdepth 1 -name "magpie.${submissiontype}-hadoop-and-mahout*"` - if [ -n "${files}" ] - then - # Guarantee 60 minutes for the job that should last awhile - ${functiontogettimeoutput} 60 - sed -i -e "s/${timestringtoreplace}/${timeoutputforjob}/" ${files} - fi -} diff --git a/testsuite/test-generate.sh b/testsuite/test-generate.sh index aa946f978..6b048404c 100755 --- a/testsuite/test-generate.sh +++ b/testsuite/test-generate.sh @@ -9,7 +9,6 @@ source test-generate-hadoop.sh source test-generate-hbase.sh source test-generate-hive.sh source test-generate-kafka.sh -source test-generate-mahout.sh source test-generate-phoenix.sh source test-generate-pig.sh source test-generate-spark.sh @@ -41,7 +40,6 @@ standardtests=n dependencytests=n hadooptests=n pigtests=n -mahouttests=n hbasetests=n hivetests=n phoenixtests=n @@ -63,7 +61,6 @@ cornercasetests=y functionalitytests=y hadoopversiontests=y pigversiontests=y -mahoutversiontests=y hbaseversiontests=y hiveversiontests=y phoenixversiontests=y @@ -139,13 +136,6 @@ pig_0_14_0=y pig_0_15_0=y pig_0_16_0=y pig_0_17_0=y -mahout_0_11_0=y -mahout_0_11_1=y -mahout_0_11_2=y -mahout_0_12_0=y -mahout_0_12_1=y -mahout_0_12_2=y -mahout_0_13_0=y hbase_0_98_0_hadoop2=y hbase_0_98_1_hadoop2=y hbase_0_98_2_hadoop2=y @@ -466,12 +456,6 @@ then sed -i -e "s/KAFKA_DIR_PREFIX=\(.*\)/KAFKA_DIR_PREFIX=${kafkadirpathsubst}/" ${MAGPIE_SCRIPTS_HOME}/submission-scripts/script-templates/Makefile fi -if [ "${MAHOUT_DIR_PATH}X" != "X" ] -then - mahoutdirpathsubst=`echo ${MAHOUT_DIR_PATH} | sed "s/\\//\\\\\\\\\//g"` - sed -i -e "s/MAHOUT_DIR_PREFIX=\(.*\)/MAHOUT_DIR_PREFIX=${mahoutdirpathsubst}/" ${MAGPIE_SCRIPTS_HOME}/submission-scripts/script-templates/Makefile -fi - if [ "${PHOENIX_DIR_PATH}X" != "X" ] then phoenixdirpathsubst=`echo ${PHOENIX_DIR_PATH} | sed "s/\\//\\\\\\\\\//g"` @@ -589,14 +573,6 @@ if [ "${pigtests}" == "y" ] && [ "${pigversiontests}" == "y" ]; then GeneratePigDependencyTests fi fi -if [ "${mahouttests}" == "y" ] && [ "${mahoutversiontests}" == "y" ]; then - if [ "${standardtests}" == "y" ]; then - GenerateMahoutStandardTests - fi - if [ "${dependencytests}" == "y" ]; then - GenerateMahoutDependencyTests - fi -fi if [ "${hbasetests}" == "y" ] && [ "${hbaseversiontests}" == "y" ]; then if [ "${standardtests}" == "y" ]; then GenerateHbaseStandardTests @@ -716,7 +692,7 @@ then rm -f magpie.${submissiontype}*no-local-dir* fi -for project in hadoop pig mahout hbase hive phoenix spark storm kafka zookeeper zeppelin +for project in hadoop pig hbase hive phoenix spark storm kafka zookeeper zeppelin do versionsvariable="${project}_all_versions" for version in ${!versionsvariable} @@ -732,7 +708,6 @@ GenerateFunctionalityPostProcessing # GenerateCornerCasePostProcessing GenerateHadoopPostProcessing GeneratePigPostProcessing -GenerateMahoutPostProcessing GenerateHbasePostProcessing GenerateHivePostProcessing GeneratePhoenixPostProcessing @@ -767,7 +742,6 @@ then sed -i -e 's/export HADOOP_LOCAL_DIR="\(.*\)"/export HADOOP_LOCAL_DIR="'"${nolocaldirpathsubst}"'"/' ${files} sed -i -e 's/export HBASE_LOCAL_DIR="\(.*\)"/export HBASE_LOCAL_DIR="'"${nolocaldirpathsubst}"'"/' ${files} sed -i -e 's/export KAFKA_LOCAL_DIR="\(.*\)"/export KAFKA_LOCAL_DIR="'"${nolocaldirpathsubst}"'"/' ${files} - sed -i -e 's/export MAHOUT_LOCAL_DIR="\(.*\)"/export MAHOUT_LOCAL_DIR="'"${nolocaldirpathsubst}"'"/' ${files} sed -i -e 's/export PHOENIX_LOCAL_DIR="\(.*\)"/export PHOENIX_LOCAL_DIR="'"${nolocaldirpathsubst}"'"/' ${files} sed -i -e 's/export PIG_LOCAL_DIR="\(.*\)"/export 
PIG_LOCAL_DIR="'"${nolocaldirpathsubst}"'"/' ${files} sed -i -e 's/export SPARK_LOCAL_DIR="\(.*\)"/export SPARK_LOCAL_DIR="'"${nolocaldirpathsubst}"'"/' ${files} diff --git a/testsuite/test-submit-cornercase.sh b/testsuite/test-submit-cornercase.sh index ddf7a3a0c..57d3e236c 100755 --- a/testsuite/test-submit-cornercase.sh +++ b/testsuite/test-submit-cornercase.sh @@ -4,7 +4,6 @@ source test-config.sh __SubmitCornerCaseTests_CatchProjectDependencies() { BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-catchprojectdependency-hadoop - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-catchprojectdependency-hadoop BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-catchprojectdependency-hadoop BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-catchprojectdependency-zookeeper BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-catchprojectdependency-hadoop @@ -20,7 +19,6 @@ __SubmitCornerCaseTests_CatchProjectDependencies() { __SubmitCornerCaseTests_NoSetJava() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nosetjava BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-nosetjava - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetjava BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetjava BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-nosetjava BasicJobSubmit magpie.${submissiontype}-spark-cornercase-nosetjava @@ -35,7 +33,6 @@ __SubmitCornerCaseTests_NoSetJava() { __SubmitCornerCaseTests_BadSetJava() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badsetjava BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badsetjava - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badsetjava BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badsetjava BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badsetjava BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badsetjava @@ -60,7 +57,6 @@ __SubmitCornerCaseTests_BadSetPython() { __SubmitCornerCaseTests_NoSetVersion() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nosetversion BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-nosetversion - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetversion BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetversion BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-nosetversion BasicJobSubmit magpie.${submissiontype}-spark-cornercase-nosetversion @@ -77,7 +73,6 @@ __SubmitCornerCaseTests_NoSetVersion() { __SubmitCornerCaseTests_BadVersion() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badversion BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badversion - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badversion BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badversion BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badversion BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badversion @@ -102,7 +97,6 @@ __SubmitCornerCaseTests_BadVersion() { __SubmitCornerCaseTests_NoSetHome() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nosethome BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-nosethome - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosethome 
BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosethome BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-nosethome BasicJobSubmit magpie.${submissiontype}-spark-cornercase-nosethome @@ -119,7 +113,6 @@ __SubmitCornerCaseTests_NoSetHome() { __SubmitCornerCaseTests_BadSetHome() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badsethome BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badsethome - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badsethome BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badsethome BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badsethome BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badsethome @@ -136,7 +129,6 @@ __SubmitCornerCaseTests_BadSetHome() { __SubmitCornerCaseTests_NoSetLocalDir() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nosetlocaldir BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-nosetlocaldir - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-nosetlocaldir BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-nosetlocaldir BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-nosetlocaldir BasicJobSubmit magpie.${submissiontype}-spark-cornercase-nosetlocaldir @@ -153,7 +145,6 @@ __SubmitCornerCaseTests_NoSetLocalDir() { __SubmitCornerCaseTests_BadSetLocalDir() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badlocaldir BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badlocaldir - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badlocaldir BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badlocaldir BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badlocaldir BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badlocaldir @@ -199,7 +190,6 @@ __SubmitCornerCaseTests_BadHostnameMap() { __SubmitCornerCaseTests_BadJobTime() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badjobtime BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badjobtime - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badjobtime BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badjobtime BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badjobtime @@ -223,11 +213,6 @@ __SubmitCornerCaseTests_BadJobTime() { BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badjobtime-sbatchsrun-days-hours BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badjobtime-sbatchsrun-days-hours-minutes BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badjobtime-sbatchsrun-days-hours-minutes-seconds - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-minutes-seconds - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-hours-minutes-seconds - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-days-hours - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-days-hours-minutes - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badjobtime-sbatchsrun-days-hours-minutes-seconds BasicJobSubmit 
magpie.${submissiontype}-hbase-with-hdfs-cornercase-badjobtime-sbatchsrun-minutes-seconds BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badjobtime-sbatchsrun-hours-minutes-seconds BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badjobtime-sbatchsrun-days-hours @@ -283,7 +268,6 @@ __SubmitCornerCaseTests_BadJobTime() { __SubmitCornerCaseTests_BadStartupTime() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badstartuptime BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badstartuptime - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badstartuptime BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badstartuptime BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badstartuptime BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badstartuptime @@ -301,7 +285,6 @@ __SubmitCornerCaseTests_BadStartupTime() { __SubmitCornerCaseTests_BadShutdownTime() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badshutdowntime BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badshutdowntime - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badshutdowntime BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badshutdowntime BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badshutdowntime BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badshutdowntime @@ -319,7 +302,6 @@ __SubmitCornerCaseTests_BadShutdownTime() { __SubmitCornerCaseTests_BadNodeCount() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badnodecount-small BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badnodecount-small - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badnodecount-small BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badnodecount-small BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badnodecount-big BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badnodecount-small @@ -341,7 +323,6 @@ __SubmitCornerCaseTests_NoCoreSettings() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nocoresettings-2 BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nocoresettings-3 BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-nocoresettings - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-nocoresettings BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-nocoresettings BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-nocoresettings BasicJobSubmit magpie.${submissiontype}-spark-cornercase-nocoresettings-1 @@ -371,7 +352,6 @@ __SubmitCornerCaseTests_BadCoreSettings() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badcoresettings-2 BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-badcoresettings-3 BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-badcoresettings - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-badcoresettings BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-badcoresettings BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-badcoresettings BasicJobSubmit magpie.${submissiontype}-spark-cornercase-badcoresettings-1 @@ -406,7 +386,6 @@ __SubmitCornerCaseTests_RequireHDFS() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-requirehdfs-1 BasicJobSubmit 
magpie.${submissiontype}-hadoop-cornercase-requirehdfs-2 BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-requirehdfs-3 - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-requirehdfs BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-requirehdfs BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-requirehdfs BasicJobSubmit magpie.${submissiontype}-spark-with-hdfs-cornercase-requirehdfs @@ -416,7 +395,6 @@ __SubmitCornerCaseTests_RequireHDFS() { __SubmitCornerCaseTests_RequireYarn() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-requireyarn BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-requireyarn - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-requireyarn BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-requireyarn-1 BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-requireyarn-2 BasicJobSubmit magpie.${submissiontype}-spark-with-yarn-cornercase-requireyarn @@ -461,7 +439,6 @@ __SubmitCornerCaseTests_NoLongerSupported() { BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nolongersupported-3 BasicJobSubmit magpie.${submissiontype}-hadoop-cornercase-nolongersupported-4 BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-cornercase-nolongersupported - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-cornercase-nolongersupported BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-cornercase-nolongersupported BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-cornercase-nolongersupported BasicJobSubmit magpie.${submissiontype}-spark-cornercase-nolongersupported-1 diff --git a/testsuite/test-submit-default.sh b/testsuite/test-submit-default.sh index ed8023fb3..60bbb656d 100755 --- a/testsuite/test-submit-default.sh +++ b/testsuite/test-submit-default.sh @@ -7,8 +7,6 @@ SubmitDefaultStandardTests() { BasicJobSubmit magpie.${submissiontype}-hadoop-default-run-hadoopterasort-no-local-dir BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-default-run-testpig BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-default-run-testpig-no-local-dir - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-default-run-clustersyntheticcontrol - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-default-run-clustersyntheticcontrol-no-local-dir BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-default-run-hbaseperformanceeval BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-default-run-hbaseperformanceeval-no-local-dir BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-default-run-phoenixperformanceeval diff --git a/testsuite/test-submit-functionality.sh b/testsuite/test-submit-functionality.sh index f127d4296..60b6811dd 100755 --- a/testsuite/test-submit-functionality.sh +++ b/testsuite/test-submit-functionality.sh @@ -7,8 +7,6 @@ __SubmitFunctionalityTests_BadJobNames () { BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-job-name-dollarsign BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-job-name-whitespace BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-job-name-dollarsign - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-job-name-whitespace - BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-job-name-dollarsign BasicJobSubmit 
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-job-name-dollarsign
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-job-name-whitespace
@@ -38,11 +36,6 @@ __SubmitFunctionalityTests_AltJobTimes () {
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-altjobtime-sbatchsrun-days-hours
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-altjobtime-sbatchsrun-days-hours-minutes
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-altjobtime-sbatchsrun-days-hours-minutes-seconds
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-minutes-seconds
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-hours-minutes-seconds
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-days-hours
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-days-hours-minutes
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altjobtime-sbatchsrun-days-hours-minutes-seconds
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altjobtime-sbatchsrun-minutes-seconds
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altjobtime-sbatchsrun-hours-minutes-seconds
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altjobtime-sbatchsrun-days-hours
@@ -88,7 +81,6 @@ __SubmitFunctionalityTests_AltJobTimes () {
 __SubmitFunctionalityTests_AltConfFilesDir () {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-altconffilesdir
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-altconffilesdir
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-altconffilesdir
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-altconffilesdir
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-altconffilesdir
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-altconffilesdir
@@ -102,7 +94,6 @@ __SubmitFunctionalityTests_TestAll() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-testall
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-hadoopterasort-run-testpig-functionality-testall
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-hadoopterasort-run-clustersyntheticcontrol-functionality-testall
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-run-zookeeperruok-functionality-testall
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-hbaseperformanceeval-run-phoenixperformanceeval-run-zookeeperruok-functionality-testall
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-testall
@@ -117,7 +108,6 @@ __SubmitFunctionalityTests_InteractiveMode () {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-interactive-mode
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-interactive-mode
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-interactive-mode
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-interactive-mode
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-interactive-mode
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-interactive-mode
@@ -131,7 +121,6 @@ __SubmitFunctionalityTests_SetuponlyMode () {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-setuponly-mode
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-setuponly-mode
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-setuponly-mode
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-setuponly-mode
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-setuponly-mode
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-setuponly-mode
@@ -145,7 +134,6 @@ __SubmitFunctionalityTests_JobTimeout () {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-jobtimeout
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-jobtimeout
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-jobtimeout
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-jobtimeout
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-jobtimeout
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-jobtimeout
@@ -160,7 +148,6 @@ __SubmitFunctionalityTests_MagpieExports() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-hdfs-functionality-checkexports
     BasicJobSubmit magpie.${submissiontype}-hadoop-rawnetworkfs-functionality-checkexports
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-checkexports
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-checkexports
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-checkexports
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-checkexports
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-checkexports
@@ -175,7 +162,6 @@ __SubmitFunctionalityTests_MagpieScript() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-magpiescript
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-magpiescript
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-magpiescript
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-magpiescript
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-magpiescript
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-magpiescript
@@ -188,7 +174,6 @@ __SubmitFunctionalityTests_PrePostRunScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostrunscripts-single
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostrunscripts-single
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostrunscripts-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostrunscripts-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostrunscripts-single
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostrunscripts-single
@@ -200,7 +185,6 @@ __SubmitFunctionalityTests_PrePostRunScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostrunscripts-multi
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostrunscripts-multi
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostrunscripts-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostrunscripts-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostrunscripts-multi
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostrunscripts-multi
@@ -212,7 +196,6 @@ __SubmitFunctionalityTests_PrePostRunScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostecho-single
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostecho-single
@@ -224,7 +207,6 @@ __SubmitFunctionalityTests_PrePostRunScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostecho-multi
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostecho-multi
@@ -238,7 +220,6 @@ __SubmitFunctionalityTests_PrePostRunScripts() {
 __SubmitFunctionalityTests_PreRunScriptError() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-prerunscripterror-single
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-prerunscripterror-single
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-prerunscripterror-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-prerunscripterror-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-prerunscripterror-single
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-prerunscripterror-single
@@ -250,7 +231,6 @@ __SubmitFunctionalityTests_PreRunScriptError() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-prerunscripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-prerunscripterror-multi1
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-prerunscripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-prerunscripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-prerunscripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-prerunscripterror-multi1
@@ -262,7 +242,6 @@ __SubmitFunctionalityTests_PreRunScriptError() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-prerunscripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-prerunscripterror-multi2
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-prerunscripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-prerunscripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-prerunscripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-prerunscripterror-multi2
@@ -276,7 +255,6 @@ __SubmitFunctionalityTests_PreRunScriptError() {
 __SubmitFunctionalityTests_PrePostExecuteScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostexecutescripts-single
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostexecutescripts-single
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostexecutescripts-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostexecutescripts-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostexecutescripts-single
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostexecutescripts-single
@@ -288,7 +266,6 @@ __SubmitFunctionalityTests_PrePostExecuteScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostexecutescripts-multi
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostexecutescripts-multi
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostexecutescripts-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostexecutescripts-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostexecutescripts-multi
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostexecutescripts-multi
@@ -300,7 +277,6 @@ __SubmitFunctionalityTests_PrePostExecuteScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostecho-single
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostecho-single
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostecho-single
@@ -312,7 +288,6 @@ __SubmitFunctionalityTests_PrePostExecuteScripts() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-prepostecho-multi
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-prepostecho-multi
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-prepostecho-multi
@@ -326,7 +301,6 @@ __SubmitFunctionalityTests_PrePostExecuteScripts() {
 __SubmitFunctionalityTests_PreExecuteScriptError() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-preexecutescripterror-single
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-preexecutescripterror-single
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-preexecutescripterror-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-preexecutescripterror-single
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-preexecutescripterror-single
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-preexecutescripterror-single
@@ -338,7 +312,6 @@ __SubmitFunctionalityTests_PreExecuteScriptError() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-preexecutescripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-preexecutescripterror-multi1
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-preexecutescripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-preexecutescripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-preexecutescripterror-multi1
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-preexecutescripterror-multi1
@@ -350,7+323,6 @@ __SubmitFunctionalityTests_PreExecuteScriptError() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-functionality-preexecutescripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-functionality-preexecutescripterror-multi2
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-functionality-preexecutescripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-functionality-preexecutescripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-functionality-preexecutescripterror-multi2
     BasicJobSubmit magpie.${submissiontype}-spark-functionality-preexecutescripterror-multi2
@@ -373,7 +345,6 @@ __SubmitFunctionalityTests_ScriptArgs() {
 __SubmitFunctionalityTests_HostnameMap() {
     BasicJobSubmit magpie.${submissiontype}-hadoop-run-hadoopterasort-functionality-hostname-map
     BasicJobSubmit magpie.${submissiontype}-hadoop-and-pig-run-testpig-functionality-hostname-map
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-run-clustersyntheticcontrol-functionality-hostname-map
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-run-hbaseperformanceeval-functionality-hostname-map
     BasicJobSubmit magpie.${submissiontype}-hbase-with-hdfs-with-phoenix-run-phoenixperformanceeval-functionality-hostname-map
     BasicJobSubmit magpie.${submissiontype}-spark-run-sparkpi-functionality-hostname-map
diff --git a/testsuite/test-submit-mahout.sh b/testsuite/test-submit-mahout.sh
deleted file mode 100755
index 53a433209..000000000
--- a/testsuite/test-submit-mahout.sh
+++ /dev/null
@@ -1,54 +0,0 @@
-#!/bin/bash
-
-source test-common.sh
-source test-config.sh
-
-__SubmitMahoutStandardTests_ClusterSyntheticcontrol() {
-    local mahoutversion=$1
-    local hadoopversion=$2
-
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsoverlustre-run-clustersyntheticcontrol
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsoverlustre-run-clustersyntheticcontrol-no-local-dir
-
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsovernetworkfs-run-clustersyntheticcontrol
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsovernetworkfs-run-clustersyntheticcontrol-no-local-dir
-}
-
-SubmitMahoutStandardTests() {
-    for testfunction in __SubmitMahoutStandardTests_ClusterSyntheticcontrol
-    do
-        for testgroup in ${mahout_test_groups}
-        do
-            local hadoopversion="${testgroup}_hadoopversion"
-            for testversion in ${!testgroup}
-            do
-                ${testfunction} ${testversion} ${!hadoopversion}
-            done
-        done
-    done
-}
-
-__SubmitMahoutDependencyTests_Dependency1() {
-    local mahoutversion=$1
-    local hadoopversion=$2
-
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsoverlustre-run-clustersyntheticcontrol
-    DependentJobSubmit magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsoverlustre-run-clustersyntheticcontrol
-
-    BasicJobSubmit magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsovernetworkfs-run-clustersyntheticcontrol
-    DependentJobSubmit magpie.${submissiontype}-hadoop-and-mahout-DependencyMahout1A-hadoop-${hadoopversion}-mahout-${mahoutversion}-hdfsovernetworkfs-run-clustersyntheticcontrol
-}
-
-SubmitMahoutDependencyTests() {
-    for testfunction in __SubmitMahoutDependencyTests_Dependency1
-    do
-        for testgroup in ${mahout_test_groups}
-        do
-            local hadoopversion="${testgroup}_hadoopversion"
-            for testversion in ${!testgroup}
-            do
-                ${testfunction} ${testversion} ${!hadoopversion}
-            done
-        done
-    done
-}
\ No newline at end of file
diff --git a/testsuite/test-submit.sh b/testsuite/test-submit.sh
index 28672e622..546d8af1b 100755
--- a/testsuite/test-submit.sh
+++ b/testsuite/test-submit.sh
@@ -7,7 +7,6 @@ source test-submit-hadoop.sh
 source test-submit-hbase.sh
 source test-submit-hive.sh
 source test-submit-kafka.sh
-source test-submit-mahout.sh
 source test-submit-phoenix.sh
 source test-submit-pig.sh
 source test-submit-spark.sh
@@ -147,9 +146,6 @@ SubmitHadoopDependencyTests
 SubmitPigStandardTests
 SubmitPigDependencyTests
-SubmitMahoutStandardTests
-SubmitMahoutDependencyTests
-
 SubmitHbaseStandardTests
 SubmitHbaseDependencyTests
diff --git a/testsuite/test-validate.sh b/testsuite/test-validate.sh
index 613786f9d..83444c389 100755
--- a/testsuite/test-validate.sh
+++ b/testsuite/test-validate.sh
@@ -807,12 +807,6 @@
     then
        __check_exports_pig ${file}
     fi
-    if echo ${file} | grep -q "mahout"
-    then
-        # None guaranted to user at moment
-        :
-    fi
-
     if echo ${file} | grep -q "hbase"
     then
        __check_exports_hbase ${file}