I'm running Osprey at an HPC facility that uses the PBS Pro queue system. I'm launching the jobs as array jobs, so my understanding is that multiple workers access the database file at the same time, which may be the cause of the issue I'm reporting here:
Here is the Osprey config file, and here is the PBS submission file.
Some of the jobs run without problems, but most (>60%) are giving the following error:
======================================================================
= osprey is a tool for machine learning hyperparameter optimization. =
======================================================================
osprey version: 1.2.0dev
time: January 16, 2018 2:46 PM
hostname: cx1-138-2-3.cx1.hpc.ic.ac.uk
cwd: /tmp/pbs.1108144[7].cx1
pid: 15308
Loading config file: /work/je714/cross-validations/ef-hand/cv_cx1.yaml...
msmbuilder version: 3.7.0
mdtraj version: 1.8.0
Loading dataset...
Dataset contains 145 element(s) with out labels
The elements have shape: [(7250, 263), (7250, 263), (1500, 263), (2500, 263), (7871, 263), (5625, 263), (4500, 263), (4277, 263), (4725, 263), (4568, 263), (8100, 263), (7425, 263), (5690, 263), (1000, 263), (2500, 263), (2500, 263), (2500, 263), (2500, 263), (2500, 263), (2500, 263), ...]
Instantiated estimator:
Pipeline(steps=[('tica', tICA(commute_mapping=False, kinetic_mapping=False, lag_time=1,
n_components=None, shrinkage=None)), ('cluster', MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8,
n_init=3, rando...les=5,
prior_counts=0, reversible_type='mle', sliding_window=True,
verbose=True))])
Hyperparameter search space:
tica__lag_time (int) 1 <= x <= 200
tica__commute_mapping (enum) choices = (True, False)
cluster__n_clusters (int) 50 <= x <= 5000
tica__n_components (int) 1 <= x <= 20
----------------------------------------------------------------------
Beginning iteration 1 / 1
----------------------------------------------------------------------
Loading trials database: sqlite:////work/je714/cross-validations/ef-hand/osprey-trials.db...
History contains: 178 trials
Choosing next hyperparameters with random...
{'tica__lag_time': 125, 'tica__commute_mapping': False, 'cluster__n_clusters': 708, 'tica__n_components': 2}
(random took 0.006 s)
/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sklearn/grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
An unexpected error has occurred with osprey (version 1.2.0dev), please
consider sending the following traceback to the osprey GitHub issue tracker at:
https://github.com/msmbuilder/osprey/issues
Traceback (most recent call last):
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 721, in _commit_impl
self.engine.dialect.do_commit(self.connection)
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 443, in do_commit
dbapi_connection.commit()
sqlite3.OperationalError: disk I/O error
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/je714/.conda/envs/osprey/bin/osprey", line 11, in <module>
load_entry_point('osprey', 'console_scripts', 'osprey')()
File "/export131/home/je714/osprey/osprey/cli/main.py", line 37, in main
args_func(args, p)
File "/export131/home/je714/osprey/osprey/cli/main.py", line 42, in args_func
args.func(args, p)
File "/export131/home/je714/osprey/osprey/cli/parser_worker.py", line 8, in func
execute(args, parser)
File "/export131/home/je714/osprey/osprey/execute_worker.py", line 89, in execute
max_param_suggestion_retries=max_param_suggestion_retries)
File "/export131/home/je714/osprey/osprey/execute_worker.py", line 149, in initialize_trial
session.commit()
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 874, in commit
self.transaction.commit()
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 465, in commit
t[1].commit()
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1623, in commit
self._do_commit()
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1654, in _do_commit
self.connection._commit_impl()
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 723, in _commit_impl
self._handle_dbapi_exception(e, None, None, None, None)
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1393, in _handle_dbapi_exception
exc_info
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
raise value.with_traceback(tb)
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 721, in _commit_impl
self.engine.dialect.do_commit(self.connection)
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 443, in do_commit
dbapi_connection.commit()
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) disk I/O error
Exception during reset or similar
Traceback (most recent call last):
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/pool.py", line 687, in _finalize_fairy
fairy._reset(pool)
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/pool.py", line 829, in _reset
pool._dialect.do_rollback(self)
File "/home/je714/.conda/envs/osprey/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 440, in do_rollback
dbapi_connection.rollback()
sqlite3.OperationalError: cannot rollback - no transaction is active
I've seen issue #6 from a while ago, but I'm not sure it is related. Any idea what is going on here?
Also, I'm using the latest copy of the Osprey code from GitHub.
Thanks for any help!
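To help narrow down whether concurrent writers are the trigger, here is a minimal sketch that is independent of Osprey. It opens several connections to one SQLite file, has each insert a row, and retries when `sqlite3.OperationalError` is raised. The `trials` table and the `insert_with_retry` and `run_concurrent_writers` helpers are hypothetical names made up for this illustration; they are not part of Osprey's code or schema. It uses threads on a single node, so it only approximates array jobs running on different nodes against a shared filesystem.

```python
import os
import sqlite3
import tempfile
import threading
import time


def insert_with_retry(db_path, value, retries=5, delay=0.1):
    """Insert one row, retrying if SQLite raises OperationalError.

    Hypothetical helper for this sketch; returns the number of failed
    attempts before the insert succeeded.
    """
    for attempt in range(retries):
        # timeout: how long SQLite waits for a lock held by another writer
        conn = sqlite3.connect(db_path, timeout=30)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO trials (value) VALUES (?)", (value,))
            return attempt
        except sqlite3.OperationalError:
            # Back off a little longer after each failed attempt
            time.sleep(delay * (attempt + 1))
        finally:
            conn.close()
    raise RuntimeError("gave up after %d attempts" % retries)


def run_concurrent_writers(db_path, n_workers=8):
    """Start n_workers threads that each insert one row; return the row count."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials (id INTEGER PRIMARY KEY, value INTEGER)"
    )
    conn.commit()
    conn.close()

    threads = [
        threading.Thread(target=insert_with_retry, args=(db_path, i))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    conn = sqlite3.connect(db_path)
    count = conn.execute("SELECT COUNT(*) FROM trials").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    # Point db_dir at the filesystem under test (e.g. the shared work area)
    db_dir = tempfile.mkdtemp()
    print(run_concurrent_writers(os.path.join(db_dir, "retry-test.db")))
```

If every insert lands when the file is on node-local disk but rows go missing or errors appear when it sits in the shared work directory, that would point at file locking on the shared filesystem rather than at Osprey itself.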