Error after Performing machine learning classification #32

Open
Aciole-David opened this issue Mar 12, 2024 · 2 comments
@Aciole-David

Hello!
I'm testing vRhyme and it fails right after the 'Performing machine learning classification' step.

Running on a Slurm HPC system
Fresh mamba install
Inputs:
a) Single-end NextSeq reads;
b) VirSorter output sequences from MEGAHIT contigs

Slurm log below:


/home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/vRhyme:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
Command: /home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/vRhyme
-i final-viral-combined.fa
-u putativeVLP_data_106.fastq putativeVLP_data_76.fastq putativeVLP_data_77.fastq putativeVLP_data_78.fastq putativeVLP_data_79.fastq putativeVLP_data_80.fastq putativeVLP_data_81.fastq putativeVLP_data_82.fastq putativeVLP_data_83.fastq putativeVLP_data_85.fastq putativeVLP_data_86.fastq putativeVLP_data_87.fastq putativeVLP_data_88.fastq putativeVLP_data_89.fastq
-t 20
-o vrhyme_out
--method longest
--verbose

Date: 2024-03-12 (y-m-d)
Start: 11:30:34 (h:m:s)
Program: vRhyme v1.1.0

Time (min) | Log

0.0 Initializing and validating vRhyme parameters
0.11 Running 'longest' dereplication: 97% identity and 70% coverage
0.69 No sequences were of sufficient similarity to dereplicate
0.69 Single end read file(s) identified. Running bowtie2 on 14 unpaired file(s)
3.43 Extracting coverage information from BAM files
3.57 Coverage extraction complete. Generating coverage table
3.57 Performing pairwise coverage comparisons
3.58 Running Prodigal on filtered sequences
3.64 Generating codon usage features
3.64 Generating nucleotide features
3.67 Performing pairwise distance calculations
3.67 Performing machine learning classification
Traceback (most recent call last):
File "/home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/vRhyme", line 960, in <module>
net_data = machine_stuff.machine_stuff(distances, presets, model_method, pairs_machine, cohen_machine, iterations, cohen_check)
File "/home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/machine_stuff.py", line 73, in machine_stuff
model_ET = pickle.load(read_model_ET)
File "sklearn/tree/_tree.pyx", line 865, in sklearn.tree._tree.Tree.__setstate__
File "sklearn/tree/_tree.pyx", line 1571, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:

  • expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
  • got : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

Thank you!
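
For anyone landing here with the same traceback: the "expected" and "got" dtypes in the message differ by a single field. Below is a minimal sketch in plain Python that makes the difference explicit. The field names are copied from the error above; `missing_fields` is a hypothetical helper written for this illustration and is not part of vRhyme or scikit-learn.

```python
# Field names copied from the "expected" and "got" parts of the ValueError above.
expected_fields = [
    "left_child", "right_child", "feature", "threshold", "impurity",
    "n_node_samples", "weighted_n_node_samples", "missing_go_to_left",
]
pickled_fields = [
    "left_child", "right_child", "feature", "threshold", "impurity",
    "n_node_samples", "weighted_n_node_samples",
]


def missing_fields(expected, got):
    """Return the fields the installed library expects but the pickle lacks."""
    got_set = set(got)
    return [name for name in expected if name not in got_set]


diff = missing_fields(expected_fields, pickled_fields)
print("Fields absent from the pickled node array:", diff)
```

The only missing field is `missing_go_to_left`, which suggests the bundled model was pickled with an older scikit-learn than the one a fresh install pulls in. I believe that field arrived with the missing-value support for decision trees in scikit-learn 1.3, but please treat that version number as an assumption and check it against the scikit-learn changelog.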

@Aciole-David
Author

Easily solved with #30.
Thanks!

@Vini2

Vini2 commented Sep 2, 2024

Thanks for pointing to the fix @Aciole-David!

@AnantharamanLab It would be great if you could pin the scikit-learn version in setup.py and in the README.
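
Until a pin is in place, a small guard like the one below could flag a mismatched environment before the pickled models are loaded. This is only a sketch under stated assumptions: the upper bound `HYPOTHETICAL_MAX_VERSION` is a placeholder (this thread does not say which scikit-learn version matches the models), and the helper names are made up for illustration.

```python
from importlib import metadata

# ASSUMPTION: placeholder upper bound for illustration only; the version that
# actually matches vRhyme's pickled models is not stated in this thread.
HYPOTHETICAL_MAX_VERSION = (1, 3)


def parse_version(text):
    """Turn '1.2.2' into (1, 2, 2), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible(installed, upper_bound=HYPOTHETICAL_MAX_VERSION):
    """True if the installed version is strictly below the upper bound."""
    return parse_version(installed) < upper_bound


def check_sklearn():
    """Return a human-readable status for scikit-learn in this environment."""
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return "scikit-learn is not installed"
    if is_compatible(installed):
        return f"scikit-learn {installed} is below the assumed bound"
    return f"scikit-learn {installed} may be too new for the pickled models"


print(check_sklearn())
```

The check compares version tuples rather than strings so that, for example, 1.10 sorts after 1.9. A pin in setup.py would make this guard unnecessary, which is why pinning is the cleaner fix.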
