-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved naming and documentation for flag field accessors and mutators. #868
Conversation
I feel like this is a pretty significant step, so would like fairly broad review if possible. @nh13, @yfarjoun, @jacarey, @lbergelson, @droazen: want to share your 2c each? |
Also, FYI to any reviewers, my apologies for making the review slightly more awkward that needs be. The original code had a block of all the getters followed by a block with all the setters. I've re-arranged to bring the accessor/mutator pair together for each flag, but that makes the diff a little weird looking. |
33f0be3
to
0a2298a
Compare
Codecov Report
@@ Coverage Diff @@
## master #868 +/- ##
===============================================
+ Coverage 64.964% 64.975% +0.012%
- Complexity 7221 7251 +30
===============================================
Files 528 528
Lines 31867 31966 +99
Branches 5442 5457 +15
===============================================
+ Hits 20702 20770 +68
- Misses 9018 9041 +23
- Partials 2147 2155 +8
|
ok for the changes, but please, please keep the java bean nomenclature ! :-) when injecting a SAMRecord in a javascript , servlet , velocity context, etc... we can defacto use the standard APIs.
If my voice matters, I would be happy with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments. not final review.
return (mFlags & SAMFlag.READ_REVERSE_STRAND.flag) != 0; | ||
} | ||
/** Sets whether both reads in the read pair are aligned as expected. */ | ||
public void setProperlyPaired(final boolean paired) { setFlag(SAMFlag.PROPER_PAIR, paired); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that you simply moved the code, but I was wondering if this function should also requiredPaired()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that ship has sailed, in general - it would be a breaking change to add those kinds of checks into the setter, and I don't want to be the one who does that ;) More specifically, I think things are the way they are so that records can be cleaned up without having to worry about ordering. E.g. if you want to make a record "unpaired", you don't have to worry about calling setProperlyPaired(false)
before setPaired(false)
etc.
private boolean getMateNegativeStrandFlagUnchecked() { | ||
return (mFlags & SAMFlag.MATE_REVERSE_STRAND.flag) != 0; | ||
} | ||
/** Returns true if the read represented by this record in unmapped. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/in/is/
/** | ||
* It is preferable to use the get*Flag() methods that handle the flag word symbolically. | ||
*/ | ||
/** It is preferable to use the get*Flag() methods that handle the flag word symbolically. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...but you deprecated all of them :-) ...please reword the comment.
requireReadPaired(); | ||
return getProperPairFlagUnchecked(); | ||
} | ||
/** Sets the "read paired in sequencing" flag to the given value. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here and below, there are two styles of comments:
A. set/get the XXX flag on the read
B. returns true if . False otherwise.
Personally, I prefer the latter, but in any case, I think we should be more consistent....happy to help with wording if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a pass through and tried to be more consistent. Am happy to take suggestions for anywhere you think could still be improved.
@Deprecated public void setSupplementaryAlignmentFlag(final boolean flag) { setSupplementaryAlignment(flag); } | ||
|
||
/** Returns true if the read fails vendor quality checks. */ | ||
public boolean failsQc() { return getFlag(SAMFlag.READ_FAILS_VENDOR_QUALITY_CHECK); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not isPassingFilters()/isNotPassingFilters()
? the description of the flag in the SAM spec is "not passing filters, such as platform/vendor quality controls" so, qc is an example, not the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me. I'll change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (mFlags & SAMFlag.READ_REVERSE_STRAND.flag) != 0; | ||
} | ||
/** Sets whether both reads in the read pair are aligned as expected. */ | ||
public void setProperlyPaired(final boolean paired) { setFlag(SAMFlag.PROPER_PAIR, paired); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that ship has sailed, in general - it would be a breaking change to add those kinds of checks into the setter, and I don't want to be the one who does that ;) More specifically, I think things are the way they are so that records can be cleaned up without having to worry about ordering. E.g. if you want to make a record "unpaired", you don't have to worry about calling setProperlyPaired(false)
before setPaired(false)
etc.
@Deprecated public void setSupplementaryAlignmentFlag(final boolean flag) { setSupplementaryAlignment(flag); } | ||
|
||
/** Returns true if the read fails vendor quality checks. */ | ||
public boolean failsQc() { return getFlag(SAMFlag.READ_FAILS_VENDOR_QUALITY_CHECK); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me. I'll change it.
requireReadPaired(); | ||
return getProperPairFlagUnchecked(); | ||
} | ||
/** Sets the "read paired in sequencing" flag to the given value. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a pass through and tried to be more consistent. Am happy to take suggestions for anywhere you think could still be improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed a few spots. :-)
@Deprecated public void setFirstOfPairFlag(final boolean flag) { setFirstOfPair(flag); } | ||
|
||
|
||
/** Returns true if the second of pair flag is set, false otherwise. Illegal to call on unpaired reads. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...if the read is second in a pair,...."
/** Returns true if the second of pair flag is set, false otherwise. Illegal to call on unpaired reads. */ | ||
public boolean isSecondOfPair() { requireReadPaired(); return getFlag(SAMFlag.SECOND_OF_PAIR); } | ||
|
||
/** Sets the second of pair flag. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Sets whether the read is second in a pair"
public void setReadFailsVendorQualityCheckFlag(final boolean flag) { | ||
setFlag(flag, SAMFlag.READ_FAILS_VENDOR_QUALITY_CHECK.flag); | ||
} | ||
/** Returns true if the first of pair flag is set, false otherwise. Illegal to call on unpaired reads. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...if the read is the first in a pair,...."
/** Returns true if the first of pair flag is set, false otherwise. Illegal to call on unpaired reads. */ | ||
public boolean isFirstOfPair() { requireReadPaired(); return getFlag(SAMFlag.FIRST_OF_PAIR); } | ||
|
||
/** Sets the first of pair flag. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Sets whether the read is first in a pair."
@tfenne It might make sense to make these changes in the context of introducing a new read interface. (We happen to have a new read interface that we think is pretty good https://github.com/broadinstitute/gatk/blob/master/src/main/java/org/broadinstitute/hellbender/utils/read/GATKRead.java) |
|
||
/** | ||
* strand of the mate (false for forward; true for reverse strand). | ||
* True if the read is the second read in a read pair, false otherwise. Illegal to call on unpaired reads. | ||
* Returns the value of flag bit 0x80. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're already making changes changes we might want to update these to match the language in the sam spec. I.e. firstSegment, lastSegment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm. While first/last is indeed the language of the sam-spec, the vast majority of the actual usage pair-ended reads. If we are to go this route (of pretending to have more then 2 reads), we would also need to avoid the use of the term "mate" and start using "next" everywhere...I don't think that the confusion that would cause is worth the adherance to the language in the spec. What we can do is specify in the javadoc precisely what flag is being accessed here (and in other places...)
@tfenne Completely agree with @lbergelson -- it would be extremely useful to finally have a |
@lbergelson & @droazen thanks for the feedback. I'm trying to understand the goal of introducing a I did take a look at the linked I don't think the changes proposed are immediately disruptive - the old methods are deprecated, but not removed. That said, changing them multiple times to align with a future interface would be more disruptive than doing it all at once. My goal is to clean up some of the cruft in SAMRecord and make interacting with nicer, without the need to wrapper it since I'm somewhat afraid of the performance overhead of putting another level of indirection between application code and reads. I'd also like, over time, to do things like get rid of |
@tfenne Deprecating these methods in We've had a need for a Read (and Variant!) interface in htsjdk for a long time -- having these interfaces in the GATK, and then having to adapt or convert the underlying htsjdk types has been painful for us. It would be great if we could take this opportunity to finally design a Read interface in htsjdk that we are all happy with. The design process need not take months! I'm sure I don't need to convince you of the advantages and value of coding against interfaces rather than concrete types in a general sense. The specific use case that prompted us to create the |
@droazen I'm not sure that idiosyncrasies of the GATK4 build can be used as an argument to increase the scope of a PR, or significantly slow it's acceptance. We should be able to accept/reject a PR on its merits and make the build systems/CI we use in projects that depend on htsjdk flexible enough to handle the occasional deprecation. Heaven knows we cause this pain to others.... Regarding the the specific ideas being floated around.
Can we agree on the names/javadoc of the methods in this PR and open a separate issue dedicated to the discussion of the read interface? |
I think that the What's about doing something similar as in GATK and create a new HTSJDK repository for version 3, including interfaces and solving known problems with the API? New issues could be back-ported in the meanwhile, and the new SemVer system could be apply from version 3 onwards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change, but I rather prefer an interface to be able to use some methods in HTSJDK in custom implementations...
public boolean isSecondaryOrSupplementary() { return isSecondaryAlignment() || isSupplementaryAlignment(); } | ||
|
||
/** Sets a specific flag bit to the provided value. */ | ||
private void setFlag(final SAMFlag flag, final boolean value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have more control and do not rely on the new syntax, could this and the getter be public?
@yfarjoun If we do this rename now, and then introduce a Read interface later, we'll have to make two passes through our code base to migrate method calls instead of one. In the past, as a courtesy to downstream projects like IGV, Picard, and GATK, we've made an effort to try to minimize such unnecessary pain. And to be clear, this change likely affects thousands or tens of thousands of lines of code in GATK alone, so this is on a somewhat different level from typical htsjdk breakage :) Also, consider this: if we did this rename now, and then wanted different names in the final It also seems more productive and ultimately useful to work together to design a generalized |
well. I guess that part I disagree with is the "have to". You could also change GATK's build system (temporarily) to allow deprecated calls. |
@yfarjoun Not really a viable option for us, given the number of our dependencies and the frequency with which APIs change out from under us (particularly on the Spark side of things). But there's a bigger problem here: if we rename a bunch of methods in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tfenne looks really great, though I think I would suggest some extra methods like setMapped(final boolean mapped)
and setPositiveStrand(final boolean positive)
, as they seem a bit more natural, for example a read goes from unmapped to mapped so setMapped
.
Also, reading through the discussion, while it is interesting to think about making an interface on top of SAMRecord
for more general representations of sequencing reads, I think it should to be motivated by having a second implementation in htsjdk versus elsewhere. Furthermore, since SAMRecord
is meant to represent the specifics and peculiarities of the SAM spec, I am not sure it makes sense to think of other formats unless they can be converted back into SAM (ex. CRAM, BAM). A common interface that both SAMRecord
and a second implementation would implement seems like a separate discussion (worth having too).
I am also not convinced about the arguments that deprecations will break other folks' build, or that it is a too of work to switch over to the new methods before they are removed. They could not be removed for quite some time allowing folks to make the switch.
throw new IllegalStateException("Inappropriate call if not paired read"); | ||
} | ||
} | ||
/** Sets whether or not the read is part of a read pair. Sets flag bit 0x1. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sets the, or "Sets the 0x1 flag bit"
} | ||
} | ||
/** Sets whether or not the read is part of a read pair. Sets flag bit 0x1. */ | ||
public void setPaired(boolean paired) { setFlag(SAMFlag.READ_PAIRED, paired); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
/** Sets the read's unmapped flag. Sets flag bit 0x4. */ | ||
public void setUnmapped(final boolean unmapped) { | ||
setFlag(SAMFlag.READ_UNMAPPED, unmapped); | ||
setIndexingBin(null); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a setMapped
method too.
return (mFlags & SAMFlag.READ_FAILS_VENDOR_QUALITY_CHECK.flag) != 0; | ||
} | ||
/** Sets whether the read's mate is unmapped. Sets flag bit 0x8. */ | ||
public void setMateUnmapped(final boolean unmapped) { setFlag(SAMFlag.MATE_UNMAPPED, unmapped); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto about a setMateMapped
method.
public final boolean isPositiveStrand() { return !isNegativeStrand(); } | ||
|
||
/** Sets whether the read is mapped to the negative strand of the genome. Sets flag bit 0x10. */ | ||
public void setNegativeStrand(final boolean negative) { setFlag(SAMFlag.READ_REVERSE_STRAND, negative); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why, but I want a setPositiveStrand
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's for symmetry.
I repeat my suggestion about creating a non-backwards compatible API and ask to the community for a clean-up effort of the library. Doing that, this PR could get in and the discussion about what is a good |
@nh13 Well, it's not really practical to push down alternate read implementations into htsjdk, given how strictly we control project dependencies here (eg., I assume you would not want to see a new Google Genomics dependency added to htsjdk, right?). But the fact is that alternate read representations do exist, have been standardized, and the lack of a |
@tfenne I think that this is totally a good direction and I too would love to have better API for sam-record. I also think that a Read API is a good way forward and would hate to change the names twice. I think we are all on the same page. The only question is timing. As we are working towards a presentation in BioIT we are too short-staffed to deal with the breakage that would result (due to the way gatk4 is currently set-up) but would love to be part of designing the Read API. There are important decisions to be made (for example regarding using first / mate / second versus first / next / last ) and we do not have the bandwidth right now to give it the attention it deserves. Can we agree to put this on hold for about a month and then hold a day-long meeting where we will work on the new API including changing names? We're happy to host. I hope I was able to convey our reluctance to making these changes right now and at the same time, our full commitment to getting to them in the near future, and not letting them languish like some htsjdk PRs... |
@yfarjoun: I appreciate your candour. As you know @nh13 and I do most of our development in scala these days, and we discussed two options. The first was to do what this PR is trying to do - i.e. evolve HTSJDK APIs, in this case Waiting a month to have the discussion about starting a longer process of specifying a super-interface for I also echo your own earlier comment that it should not be encumbent on htsjdk to deal with the consequences of choices made in downstream projects. If htsjdk were to be completely beholden to the GATK development team's choices that would be a real problem. Treating use of deprecated code as compile breaks is a choice, but should not mean that we now have to treat deprecation as a breaking change in HTSJDK. If that were true there would be literally no way to evolve the library! In addition, nobody is forcing GATK to use the latest HTSJDK snapshot - if if that's disruptive why not simply pin the dependency at 2.9.1 until you're ready? Also, how is it that GATK is not insulated from changes in SAMRecord via it's use of |
@tfenne Surely changes to a core class like |
If you (@tfenne) and @nh13 are open to the possibility of changing some names (again) in a several weeks, there seems to be a third option that enables you to start moving forward with an experimental API and not disrupting the GATK project, whose objection to the suggested change is temporal, not categorical (as in, in a month or so we will gladly accept the disruption cause by the deprecation, there is no objection to the mere idea of deprecating functions.) The option I'm thinking of is the following: Add the functions that you are currently suggesting annotated with (a new) |
@yfarjoun Thanks for the suggestion. I honestly didn't think this PR was going to generate as much discussion as it did. My goal here was to spruce up the API to SAMRecord here, and then create custom sub-classes in fgbio for use with a The names I picked here are already something of a compromise w.r.t. idiomatic scala. Since the PR generated the response it did, I spent a bit more time on the scala side and have found a neat way of totally encapsulating the API to SAMRecord without having to wrapper it(!). For anyone interested (it's all in scala), the prototype of that work is here. Having done that, I'm not much less dependent on the particulars of the SAMRecord API and am happy to wait. I can either leave the PR open, or close the PR and leave the branch up here, so it's available as input to the future discussion. Thoughts on which is preferable? |
I would prefer that you kept it open and it would serve as the beginning of
a discussion for the HtsJdkRead interface (or whatever we will call it...)
I'm glad that you have a solution that works for you. (if you close it,
the branch could be deleted to easily...)
I hope that you'll be involved in a few weeks when we would like to start
designing the interface.
…On Sat, May 6, 2017 at 1:33 PM, Tim Fennell ***@***.***> wrote:
@yfarjoun <https://github.com/yfarjoun> Thanks for the suggestion. I
honestly didn't think this PR was going to generate as much discussion as
it did. My goal here was to spruce up the API to SAMRecord here, and then
create custom sub-classes in fgbio for use with a SAMRecordFactory.
The names I picked here are already something of a compromise w.r.t.
idiomatic scala. Since the PR generated the response it did, I spent a bit
more time on the scala side and have found a neat way of totally
encapsulating the API to SAMRecord *without* having to wrapper it(!). For
anyone interested (it's all in scala), the prototype of that work is here
<https://github.com/fulcrumgenomics/fgbio/pull/219/files#diff-c9293df9e63c77546b186e59188ff82dR33>
.
Having done that, I'm not much less dependent on the particulars of the
SAMRecord API and am happy to wait. I can either leave the PR open, or
close the PR and leave the branch up here, so it's available as input to
the future discussion. Thoughts on which is preferable?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#868 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ACnk0pM1yulr7vALaY4lyz7yZmBjPfnDks5r3K7rgaJpZM4NMWQk>
.
|
can this PR be closed? |
I guess so. This would be a addressed in the htsjdk-next-beta |
Description
The accessor and mutator methods for flag fields in
SAMRecord
have had really long, ugly names forever; names that don't even follow java standards very well. I feel like I/we should have done this a lot sooner! I've deprecated all the existing flag field methods and re-routed them to new methods with much cleaner names. In addition for some flag fields I've added inverted accessors (e.g.isUnmapped()
andisMapped()
together replacegetReadUnmappedFlag()
).I suspect this might be a little controversial, but since I think we're likely stuck with the existing implementation for SAMRecord it would be nice to start tidying up by improving things and deprecating things we don't like.
While I don't think we want to have a long conversation about the exact names of the new methods, if there are any that really don't sit well with anyone, I'm happy to change them. I struggled to come up with good names for the replacements for
getReadFailsVendorQualityCheck()
which ended up withfailsQc()
andpassesQc()
since all other phrasings sounded weird to me (e.g.isFailingQc()
,isQcFailure()
, etc.).Lastly, if/when we are happy with this, I will take on the task of finding/replacing calls to all the newly deprecated methods within HTSJDK, so that upon-merge there will be no direct use of the deprecated methods within HTSJDK.
Checklist