rbharath changed the title from "Potential Porcessing Error in the Original QM8 Dataset on Some Tasks" to "Potential Processing Error in the Original QM8 Dataset on Some Tasks" on Nov 15, 2021.
There is a potential error in the QM8 dataset from the original MoleculeNet paper caused by duplicate columns (possibly due to a pandas data processing error).
deepchem/deepchem#2747
We are still working to verify the error, but in the meantime there is a fix PR under review that you can use:
deepchem/deepchem#2756
Assuming the error is indeed present, the benchmarking numbers for QM8 may need to be rerun. The duplicated columns are for two very similar tasks, though: both tasks predict DFT results on the same molecule, computed with the same functional but different basis sets. I therefore suspect that the qualitative changes will be relatively minimal, since models have in effect been predicting one DFT run twice instead of two slightly different DFT runs.
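For anyone who wants to check their own copy of the data, here is a minimal sketch of how duplicated columns could be detected with pandas. The helper name `find_duplicate_columns` and the toy column names are hypothetical and chosen only for illustration; they are not the real QM8 task names, and this is not necessarily how the fix PR detects the problem. The sketch assumes column names are unique.

```python
import pandas as pd


def find_duplicate_columns(df: pd.DataFrame) -> list[tuple[str, str]]:
    """Return pairs of column names whose values are identical row for row.

    Assumes the column names themselves are unique.
    """
    duplicates = []
    columns = list(df.columns)
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            # Series.equals also treats NaNs in matching positions as equal.
            if df[first].equals(df[second]):
                duplicates.append((first, second))
    return duplicates


# Toy frame with hypothetical task names; this is NOT the real QM8 data.
toy = pd.DataFrame({
    "task_a": [0.1, 0.2, 0.3],
    "task_b": [0.1, 0.2, 0.3],  # accidental copy of task_a
    "task_c": [0.1, 0.2, 0.4],
})

duplicate_pairs = find_duplicate_columns(toy)
print(duplicate_pairs)  # [('task_a', 'task_b')]
```

The same check should work on the QM8 table after loading it with `pd.read_csv`: any pair of task columns it reports would be a candidate for the duplication described above. The pairwise comparison is quadratic in the number of columns, which is fine for a dataset with only a handful of tasks.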