v3.0.0
This version adds several new features as well as general bug fixes and optimizations.
Key Improvements:
- A major Remora datasets update
- Easier dataset composition and manipulation
- Flexible dataset mixing, allowing use of randomers, native, enzymatic, PCR, spike-in, and other dataset types
- Datasets defined by configuration file, which can be generated automatically
- Larger datasets enabled
- Model training has now been demonstrated on over one billion training chunks
- Easier hyper-parameter tuning at training time
- Enhanced signal and metrics plotting and exploration interface
- Improved model inference speed
- Full RNA support, including an m6A model, which is also available for production modified base calling through Dorado
- ChEBI code support
- Allow any modified base with full pipeline support
- Split reads support
- Updated to the latest POD5 release
- Accept a single POD5 file or a directory of POD5 files as input
- Various bug fixes
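These notes do not describe how Remora resolves its POD5 input internally. The following is a minimal Python sketch of one way a tool might accept either a single POD5 file or a directory of POD5 files, using only the standard library. The helper name `collect_pod5_paths` and its `recursive` parameter are hypothetical and are not part of Remora's API.

```python
from pathlib import Path
from typing import List, Union


def collect_pod5_paths(path: Union[str, Path], recursive: bool = False) -> List[Path]:
    """Hypothetical helper (not Remora's API): resolve a single POD5 file
    or a directory of POD5 files into a sorted list of file paths."""
    path = Path(path)
    if path.is_file():
        # A single file was given: accept it only if it has the .pod5 suffix.
        if path.suffix != ".pod5":
            raise ValueError(f"not a POD5 file: {path}")
        return [path]
    if path.is_dir():
        # A directory was given: gather every .pod5 file, optionally descending
        # into subdirectories, and sort for a deterministic order.
        pattern = "**/*.pod5" if recursive else "*.pod5"
        files = sorted(p for p in path.glob(pattern) if p.is_file())
        if not files:
            raise FileNotFoundError(f"no POD5 files found in {path}")
        return files
    raise FileNotFoundError(f"path does not exist: {path}")
```

Sorting the directory listing keeps the read order reproducible between runs. Raising an error on an empty directory surfaces a mistyped path early instead of silently processing zero reads.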