inference of sequencer model needs updating #108

Open
dpark01 opened this issue Aug 3, 2024 · 2 comments

@dpark01
Member

dpark01 commented Aug 3, 2024

As of 2024, illumina_demux fails to infer the sequencer model it emits in its runinfo.json output for recent NextSeq 2000 runs (not sure whether they're XLEAP kits or just normal ones) and instead just emits UNKNOWN. We probably just need to update the heuristics and tables here. Observed at both Broad and ACEGID.

@tomkinsc
Member

tomkinsc commented Aug 4, 2024

It seems that all NextSeq 2000 run directories have a file called RunParameters.xml with various helpful values, including InstrumentType, so we may not need to resort to regex matching to sleuth out the model of newer sequencers. Ex.:

```xml
<InstrumentType>NextSeq 2000</InstrumentType>
```

We can obtain that value directly in Python like this:

```shell
python3 -c "import xml.etree.ElementTree as ET; tree = ET.parse('RunParameters.xml'); root = tree.getroot(); print(root.find('.//InstrumentType').text)"
```

(perhaps falling back to the old regex approach if the RunParameters.xml file does not exist)
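A minimal sketch of the RunParameters.xml lookup with that fallback, assuming the file sits at the top level of the run directory. The function name and the `fallback` hook are hypothetical, and probing a lowercase `runParameters.xml` is an assumption for other instruments, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Optional

# Filenames to probe. "RunParameters.xml" is what NextSeq 2000 runs contain;
# the lowercase variant is an assumption for other/older instruments.
RUN_PARAMETERS_NAMES = ("RunParameters.xml", "runParameters.xml")


def read_instrument_type(
    run_dir: str,
    fallback: Optional[Callable[[Path], Optional[str]]] = None,
) -> Optional[str]:
    """Return InstrumentType from the run's RunParameters.xml, if present.

    If no run-parameters file exists, or it has no non-empty InstrumentType
    element, delegate to `fallback` (e.g. the old regex heuristics) if given.
    """
    run_path = Path(run_dir)
    for name in RUN_PARAMETERS_NAMES:
        xml_path = run_path / name
        if xml_path.is_file():
            root = ET.parse(xml_path).getroot()
            # findtext returns None if the element is absent, "" if it is empty
            value = root.findtext(".//InstrumentType")
            if value and value.strip():
                return value.strip()
            break  # file found but field missing/empty: stop probing names
    return fallback(run_path) if fallback else None
```

Passing the existing regex heuristics as `fallback` (e.g. `read_instrument_type(run_dir, fallback=legacy_regex_model)`, where `legacy_regex_model` is a placeholder name) would keep the old behavior for run directories where the XML route gives no answer.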

Examples of other values that may be interesting to parse out and/or use:

```xml
  <FlowCellLotNumber>20688106</FlowCellLotNumber>
  <FlowCellExpirationDate>2023-09-03</FlowCellExpirationDate>
  <FlowCellVersion>2</FlowCellVersion>
  <FlowCellMode>NextSeq 1000/2000 P2 Flow Cell Cartridge</FlowCellMode>
  <CartridgeSerialNumber>EC1194950-EC11</CartridgeSerialNumber>
  <CartridgePartNumber>20044466</CartridgePartNumber>
  <CartridgeLotNumber>20668878</CartridgeLotNumber>
  <CartridgeExpirationDate>2023-08-28</CartridgeExpirationDate>
  <CartridgeVersion>3</CartridgeVersion>
  <CartridgeMode>NextSeq 1000/2000 P2 Reagent Cartridge (338 Cycles)</CartridgeMode>
```

I'm curious if we can use CartridgeLotNumber to find any lot-related effects in the data, or if we can relate any data quality metrics to CartridgeExpirationDate.
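For that kind of lot or expiration-date comparison, a hedged sketch of pulling the consumable fields out with the standard library might look like this. The field list is copied from the excerpt above; the function name is hypothetical, and treating the first ten characters of each expiration value as a YYYY-MM-DD date is an assumption based only on this one example.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict, Iterable, Optional, Union

# Fields copied from the RunParameters.xml excerpt in this thread.
DEFAULT_FIELDS = (
    "FlowCellLotNumber", "FlowCellExpirationDate", "FlowCellVersion",
    "FlowCellMode", "CartridgeSerialNumber", "CartridgePartNumber",
    "CartridgeLotNumber", "CartridgeExpirationDate", "CartridgeVersion",
    "CartridgeMode",
)


def parse_consumables(
    xml_text: Union[str, bytes], fields: Iterable[str] = DEFAULT_FIELDS
) -> Dict[str, Optional[Union[str, date]]]:
    """Extract consumable metadata from RunParameters.xml content.

    Missing or empty fields map to None. *ExpirationDate fields become
    datetime.date; everything else (including lot and part numbers) stays
    a string so no leading zeros or suffixes are lost.
    """
    root = ET.fromstring(xml_text)
    result: Dict[str, Optional[Union[str, date]]] = {}
    for field in fields:
        text = root.findtext(f".//{field}")
        if text is None or not text.strip():
            result[field] = None
        elif field.endswith("ExpirationDate"):
            # Assumption: value starts with YYYY-MM-DD; any suffix is dropped.
            result[field] = date.fromisoformat(text.strip()[:10])
        else:
            result[field] = text.strip()
    return result
```

Calling `parse_consumables(Path(run_dir, "RunParameters.xml").read_bytes())` for each run would give one record per run, which could then be joined with per-run quality metrics to look for lot effects.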

@tomkinsc
Member

tomkinsc commented Oct 22, 2024

Depending on the sequencer model, it seems InstrumentType is not always present in RunParameters.xml.

Fortunately, there's another way to identify the instrument type if we pip install interop (Python bindings to Illumina's InterOp library):

```python
# pip install interop
from interop.core import interop_run as illumina_interop_run

# Load the run parameters from the run directory
run_params = illumina_interop_run.parameters()
run_params.read("./run_directory_path")

# instrument_type() returns an enum value; convert it to a model name
instrument_type = illumina_interop_run.to_string_instrument_type(run_params.instrument_type())
print(instrument_type)  # ex. output: 'NextSeq1k2k'
```

The function above, run_params.instrument_type(), returns an integer that is mapped to a sequencer model name by interop_run.to_string_instrument_type(), which in turn relies on the INTEROP_ENUM_INSTRUMENT_TYPES enumeration in the InterOp source. (Check for updates to the enum if Illumina releases a new sequencer model.)

A couple of complications:

  • The instrument_type() function does not distinguish between a NextSeq 1000 and a NextSeq 2000 (to_string_instrument_type() returns 'NextSeq1k2k'), but the current NCBI SRA metadata XSD lists NextSeq 1000 and NextSeq 2000 separately and requires one value or the other (i.e. not 'NextSeq 1000/2000', which was previously allowed).
  • Similarly, the interop library does not distinguish a NovaSeq 6000 from a NovaSeq X or X Plus, nor does it distinguish among the various flavors of HiSeq.

At least in the case of a 'NextSeq1k2k', we can fall back to reading InstrumentType from RunParameters.xml, since the field seems to be present for NextSeq runs and its value is more specific than what the Illumina interop library currently returns.
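A minimal sketch of that selection logic, kept in plain Python so it can be tested without interop installed: the caller passes in the string from `to_string_instrument_type()` and the InstrumentType text from RunParameters.xml. The function name, the lookup table, and the use of `UNKNOWN` as the no-answer value (mirroring what runinfo.json currently emits) are assumptions; translating other interop names into the SRA vocabulary is not handled here.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical table: interop names that cover several SRA model names,
# mapped to the specific values we accept from RunParameters.xml.
# Only the NextSeq case is listed, since it is the one with a known fallback.
SPECIFIC_MODELS_BY_INTEROP_TYPE: Dict[str, FrozenSet[str]] = {
    "NextSeq1k2k": frozenset({"NextSeq 1000", "NextSeq 2000"}),
}


def resolve_instrument_model(
    interop_type: Optional[str], xml_instrument_type: Optional[str]
) -> str:
    """Return the most specific sequencer model name available.

    interop_type: output of to_string_instrument_type(), or None if unavailable.
    xml_instrument_type: InstrumentType text from RunParameters.xml, or None.
    """
    xml_value = (xml_instrument_type or "").strip()
    allowed = SPECIFIC_MODELS_BY_INTEROP_TYPE.get(interop_type or "")
    if allowed is not None:
        # Ambiguous interop name: accept the XML value only if it is one of
        # the specific models this interop name is known to cover.
        return xml_value if xml_value in allowed else "UNKNOWN"
    if interop_type:
        # Interop name used as-is (no renaming to SRA vocabulary here).
        return interop_type
    # No interop result at all: fall back to RunParameters.xml alone.
    return xml_value or "UNKNOWN"
```

NovaSeq and HiSeq variants are deliberately left out of the table: they would need some other source of truth before an entry could be added.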
