I believe I've found a bug in `pdbtbx`'s structure validation, specifically in the call to `Atom::corresponds()` in `pdbtbx::validate::validate_models()`: this function assumes that "corresponding" atoms have identical serial numbers over all models, but this appears not to be the case in PDBx/mmCIF format. This means that multi-conformer (in my case, NMR) structures in this format cannot currently be parsed.
For example, the following script using PDB entry 3PDZ panics:
```rust
use pdbtbx::{Format, ReadOptions, StrictnessLevel};

fn main() {
    let (pdb, _errors) = ReadOptions::default()
        .set_level(StrictnessLevel::Loose)
        .set_format(Format::Mmcif)
        .read("7QCX.cif")
        .expect("errors in structure");
    println!("{:#?}", pdb);
}
// > errors in structure: [
// > StrictWarning: Atoms in Models not corresponding
// > Atom 1427 in Model 2 does not correspond to the respective Atom in the first model.
// > , StrictWarning: Atoms in Models not corresponding
// > Atom 1428 in Model 2 does not correspond to the respective Atom in the first model.
// > , ...]
```
As a preliminary fix, I believe it would be easiest to just remove the line `self.serial_number == other.serial_number` from `pdbtbx::structs::atom::Atom::corresponds()` entirely - this allows the snippet above to run successfully.
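To illustrate the effect of that change, here is a minimal sketch using a pared-down stand-in for `Atom`. The struct below is hypothetical: the fields other than `serial_number`, and the method names `corresponds_strict` and `corresponds_relaxed`, are assumptions for illustration and not pdbtbx's actual definitions.

```rust
// Hypothetical stand-in for pdbtbx's Atom; the real struct has more fields,
// and the real corresponds() may compare a different set of them.
#[derive(Debug, Clone, PartialEq)]
struct Atom {
    serial_number: usize,
    name: String,
    element: Option<String>,
    hetero: bool,
}

impl Atom {
    /// Behaviour as described in the issue: serial numbers must also match.
    fn corresponds_strict(&self, other: &Atom) -> bool {
        self.serial_number == other.serial_number && self.corresponds_relaxed(other)
    }

    /// Proposed preliminary fix: compare everything except the serial number.
    fn corresponds_relaxed(&self, other: &Atom) -> bool {
        self.name == other.name && self.element == other.element && self.hetero == other.hetero
    }
}

/// The same atom as it might appear in model 1 and model 2 of an mmCIF file,
/// where the serial number keeps counting up across models.
fn example_pair() -> (Atom, Atom) {
    let first = Atom {
        serial_number: 1,
        name: "N".to_string(),
        element: Some("N".to_string()),
        hetero: false,
    };
    let second = Atom {
        serial_number: 1427,
        ..first.clone()
    };
    (first, second)
}

fn main() {
    let (first, second) = example_pair();
    println!("strict:  {}", first.corresponds_strict(&second));
    println!("relaxed: {}", first.corresponds_relaxed(&second));
}
```

With the serial number comparison in place the two atoms are rejected; without it they are accepted, which is the situation the validation warnings above point to.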
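For full compatibility with the format, it may even be better to change the handling of serial numbers entirely though, as the `_atom_site.id` field it's read from in PDBx/mmCIF format can apparently be any unique identifier (i.e. not necessarily numeric). Maybe behaviour like this would make sense?

- `Atom.serial_number: int` is unique within each model, but identical for corresponding atoms in different models (= current behaviour)
- `Atom.global_id: String` is unique for every atom in a structure
- `pdbtbx::read::mmcif::parser` extracts `Atom.global_id` from the file and calculates `Atom.serial_number` itself (ensuring atom correspondence etc.)
- `pdbtbx::read::pdb::parser` extracts `Atom.serial_number` from the file and calculates `Atom.global_id` itself (e.g. as a `String` representation of an incrementing integer)
- `pdbtbx::save::mmcif` uses `Atom.global_id` only
- `pdbtbx::save::pdb` uses `Atom.serial_number` only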
This would keep most of the library's behaviour as it is - the only "outward-facing" change would be the way PDBx/mmCIF files are read and written.
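A rough sketch of how the two identifiers could be filled in by each reader might look like this. Everything here is hypothetical: `global_id`, `atoms_from_mmcif` and `atoms_from_pdb` do not exist in pdbtbx, the input tuples stand in for parsed file records, and deriving the serial number from an atom's position assumes that every model lists its atoms in the same order.

```rust
// Hypothetical sketch of the proposed two-identifier scheme; not pdbtbx code.
#[derive(Debug, Clone, PartialEq)]
struct Atom {
    serial_number: usize, // unique within a model, shared by corresponding atoms
    global_id: String,    // unique across the whole structure
    name: String,
}

/// mmCIF direction: the `_atom_site.id` value is kept as the global ID and
/// the serial number is calculated from the atom's position in its model
/// (assumption: all models list their atoms in the same order).
fn atoms_from_mmcif(models: &[Vec<(&str, &str)>]) -> Vec<Vec<Atom>> {
    let mut result = Vec::new();
    for model in models {
        let mut atoms = Vec::new();
        for (index, (id, name)) in model.iter().enumerate() {
            atoms.push(Atom {
                serial_number: index + 1,
                global_id: id.to_string(),
                name: name.to_string(),
            });
        }
        result.push(atoms);
    }
    result
}

/// PDB direction: the serial number is taken from the file and the global ID
/// is an incrementing counter over the whole structure, stored as a String.
fn atoms_from_pdb(models: &[Vec<(usize, &str)>]) -> Vec<Vec<Atom>> {
    let mut counter = 0usize;
    let mut result = Vec::new();
    for model in models {
        let mut atoms = Vec::new();
        for (serial, name) in model {
            counter += 1;
            atoms.push(Atom {
                serial_number: *serial,
                global_id: counter.to_string(),
                name: name.to_string(),
            });
        }
        result.push(atoms);
    }
    result
}

fn main() {
    // Two models whose mmCIF IDs keep counting up across models.
    let mmcif = vec![
        vec![("1", "N"), ("2", "CA")],
        vec![("3", "N"), ("4", "CA")],
    ];
    for (number, model) in atoms_from_mmcif(&mmcif).iter().enumerate() {
        for atom in model {
            println!("model {}: {:?}", number + 1, atom);
        }
    }
}
```

In this sketch a writer for mmCIF would emit only `global_id` and a writer for PDB only `serial_number`, matching the last two points of the list.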
In any case, it might also be good to add a multi-conformer structure to the test suite (like in #131), if that doesn't end up being too much bloat.
Very sorry for the long message! This is absolutely not urgent for me as I'm not currently using the library, so there's absolutely no rush to get it fixed. If it would help, I'd be glad to try making a PR of how I'd imagine the changes when I get the chance. Many thanks for your work on this!
Thanks so much for the detailed issue. I may indeed have gone mostly by what I saw in mmCIF files rather than by the specification itself. The proposed behaviour seems very sensible. Adding a test case is a good thing, and if you have the time to work on a PR that would be splendid. Personally, my work has moved on from structural proteins, so I do not have a lot of time on my hands to work on pdbtbx, but I would be happy to discuss the issue further or review a PR.
Thanks as well for the quick response! In that case, I'd be glad to work on a PR in this direction (although I'm afraid it may take a while for me as well - anybody please feel free to give me a poke if nothing's happened here yet and if the changes would be useful for you). I'll see how much of the issue I can deal with by myself so you don't need to spend extra time on this, but would also be glad of the opportunity to discuss a bit in case I find something I'm not quite clear on.
I also noticed that #64/#95 are somewhat related to this, so I'll try not to introduce anything that interferes with previous changes from those.