-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable VcfFileReader and other classes to use Path #1018
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree on using GCS for the tests, more when there is already another filesystem for testing (jimfs).
build.gradle
Outdated
@@ -43,6 +43,8 @@ dependencies { | |||
testRuntime 'org.pegdown:pegdown:1.4.2' // Necessary for generating HTML reports with ScalaTest | |||
testCompile "org.testng:testng:6.9.9" | |||
testCompile "com.google.jimfs:jimfs:1.1" | |||
// https://mvnrepository.com/artifact/com.google.cloud/google-cloud-nio | |||
testCompile group: 'com.google.cloud', name: 'google-cloud-nio', version: '0.5.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use jimfs
instead of google-cloud for testing a different filesystem. Like that, you can avoid the issues with your GCS costs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* @param p a {@link Path} that is to be read in as text | ||
* @return an iterator over the lines in the file | ||
*/ | ||
public static IterableOnceIterator<String> readLines(final Path p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be Files.lines(Path)
used instead of creating a new utility method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
//testing GCS files: | ||
|
||
// this is almost a vcf, but not quite it's missing the #CHROM line and it has no content... | ||
{"gs://htsjdk-testdata/htsjdk/variant/Homo_sapiens_assembly38.tile_db_header.vcf", false, false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests for other FS could use Jimfs: copy the local files into Jimfs, and use the filesystem for the test to create the strings...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
26e7194
to
352d92b
Compare
had to rebase to resolve a conflit. Kept all the original commits |
Codecov Report
@@ Coverage Diff @@
## master #1018 +/- ##
===============================================
- Coverage 66.093% 66.034% -0.058%
- Complexity 7558 7592 +34
===============================================
Files 532 532
Lines 32238 32427 +189
Branches 5494 5531 +37
===============================================
+ Hits 21307 21413 +106
- Misses 8775 8847 +72
- Partials 2156 2167 +11
|
|
||
try { | ||
Files.lines(p).forEach(s -> { | ||
if (!s.trim().isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of calling String.trim()
multiple times here use map()
, e.g.
Files.lines(p).map(String::trim).forEach(...)
You could also filter there too, e.g. .map(...).filter(s -> !s.isEmpty()).forEach()
|
||
while (!stack.empty()) { | ||
final Path p = stack.pop(); | ||
final String name = p.toUri().toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why call .toUri().toString()
here? Seems like p.toString()
would produce the same result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it does. it drops the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no scheme for a Path
. Anyway the code is testing the end of the path, so it doesn't matter what the scheme is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
@@ -1064,6 +1076,60 @@ public static String slurp(final InputStream is, final Charset charSet) { | |||
} | |||
|
|||
/** | |||
* Go through the files provided and if they have one of the provided file extensions pass the file into the output | |||
* otherwise assume that file is a list of filenames and unfold it into the output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method also recursively expands files of filenames, if one of the filenames in a file of filenames points to another file of filenames. If this is the intended behavior it should be documented and tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is tested, will update message.
* Returns true if the given path appears to be a BCF file. | ||
*/ | ||
public static boolean isBCF(final Path path) { | ||
return path.toUri().getRawPath().endsWith(".bcf"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use .toUri()
here and below? In this case you could write path.toString().endsWith(".bcf")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I wanted to remove the possible junk after a ? in a URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case does a Path
have a query string? A Path
is not a URI and a query string would be invalid, as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that path could have a query string...but I could be wrong.
if (includeFiltered || !vc.isFiltered()) { | ||
String name = vc.getID(); | ||
final Integer intervalEnd = vc.getCommonInfo().getAttributeAsInt("END", vc.getEnd()); | ||
if (".".equals(name) || name == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need {
}
around if
body. Also IMO it's more natural to write if (name == null || name.equals("."))
try { | ||
return reader.iterator(); | ||
} catch (final IOException ioe) { | ||
throw new TribbleException("Could not create an iterator from a feature reader.", ioe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this code using TribbleException
if it's not part of htsjdk.tribble
? If you want an unchecked IO Exception you can use UncheckedIOException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a copy of the VCFFileReader...I made very few changes....I think unless we want a serious rewrite of htsjdk (which is out of scope for this PR) I'd vote for keeping the current (slightly annoying) behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
private static final File FIVE_SPACES_THEN_A_NEWLINE_THEN_FIVE_SPACES_FILE = new File("src/test/resources/htsjdk/samtools/io/5newline5.txt"); | ||
|
||
private static final File TEST_DATA_DIR = new File ("src/test/resources/htsjdk/samtools/io/"); | ||
private static final File SLURP_TEST_FILE = new File(TEST_DATA_DIR,"slurptest.txt"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing space after ,
, this occurs many places in this checkin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
private static final File EMPTY_FILE = new File("src/test/resources/htsjdk/samtools/io/empty.txt"); | ||
private static final File FIVE_SPACES_THEN_A_NEWLINE_THEN_FIVE_SPACES_FILE = new File("src/test/resources/htsjdk/samtools/io/5newline5.txt"); | ||
|
||
private static final File TEST_DATA_DIR = new File ("src/test/resources/htsjdk/samtools/io/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to use Path
here if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK...though it makes things a bit more cumbersome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. nice. thanks.
@DataProvider | ||
public Object[][] fofnData() throws IOException { | ||
Path fofnPath1 = inMemoryfileSystem.getPath("Level1.fofn"); | ||
Files.copy(new File(TEST_DATA_DIR.getAbsolutePath(),"Level1.fofn").toPath(), fofnPath1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be TEST_DATA_DIR.getPath().resolve("Level1.fofn")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL. thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comments from the phone. I will do. a proper review in one week if this is still open!
throws IOException { | ||
|
||
final InputStream inputStream; | ||
if (path.startsWith("http:") || path.startsWith("https:") || path.startsWith("ftp:")) { | ||
inputStream = getURLHelper(new URL(path)).openInputStream(); | ||
} else if (IOUtil.hasScheme(path)) { | ||
} else if (IOUtil.hasScheme(path) && !IOUtil.getScheme(path).equals("file")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe get the scheme before, and use it for checking http, ftp, etc...
inputStream = new SeekablePathStream(IOUtil.getPath(path), wrapper); | ||
} else { | ||
File file = new File(path); | ||
File file = IOUtil.getScheme(path) != null && IOUtil.getScheme(path).equals("file") ? new File(path.substring("file:".length())) : new File(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch can be used with the SeekablePathStream too, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem, I think, is that a file can represent a named pipe, which is not seekable.
@@ -16,74 +16,84 @@ | |||
import java.io.Closeable; | |||
import java.io.File; | |||
import java.io.IOException; | |||
import java.nio.file.Path; | |||
|
|||
/** | |||
* Simplified interface for reading from VCF/BCF files. | |||
*/ | |||
public class VCFFileReader implements Closeable, Iterable<VariantContext> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deprecate in favor of the Path version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
@@ -100,16 +101,16 @@ public static InputStream openInputStream(String path) | |||
* @return | |||
* @throws IOException | |||
*/ | |||
public static InputStream openInputStream(String path, Function<SeekableByteChannel, SeekableByteChannel> wrapper) | |||
public static InputStream openInputStream(final String path, final Function<SeekableByteChannel, SeekableByteChannel> wrapper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argument path
should be named uri
since it's not a path but a URI. Making the caller create a URI would be helpful too.
The javadoc for this could be improved too, it should have full sentences for its description text. And omitting the text for a @return
or @throws
is not helpful; either remove the @
form from the javadoc or add the missing documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm. I don't think that changing the API is justified here....I'll develop the docs though
*/ | ||
public static SAMSequenceDictionary getSequenceDictionary(final Path path) { | ||
final SAMSequenceDictionary dict = new VCFPathReader(path, false).getFileHeader().getSequenceDictionary(); | ||
CloserUtil.close(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this trying to do? Path
is not closeable and has no close()
method.
It looks like CloserUtil
might predate Closeable
and should really be cleaned up to only accept closeable objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what I was thinking there....thanks.
*/ | ||
public VCFPathReader(final Path path, final boolean requireIndex) { | ||
// Note how we deal with type safety here, just casting to (FeatureCodec) | ||
// in the call to getFeatureReader is not enough for Java 8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because the compiler can't infer the common type that both branches of the ternary op produces and the function accepts. If you cast using the parameterized type you can inline this expression, e.g. (FeatureCodec<VariantContext, ?>) (isBCF(path) ? new BCF2Codec() : new VCFCodec())
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like isBCF()
is only called in two places and in both places it's used to determine which codec to return. If so you can avoid the code duplication and and replace with isBCF()
with getCodecForPath(Path)
. Or keep isBCF()
and write:
private static FeatureCodec<VariantContext, ?> getCodecForPath(Path path) {
return isBCF(path) ? new BCF2Codec() : new VCFCodec();
}
This also eliminates the need for a cast or introducing a local variable.
public static IntervalList fromVcf(final Path path, final boolean includeFiltered) { | ||
final VCFPathReader vcfReader = new VCFPathReader(path, false); | ||
final IntervalList intervalList = fromVcf(vcfReader, includeFiltered); | ||
vcfReader.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using try-with-resources would eliminate the need to call close()
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
* @return an IntervalList constructed from input vcf | ||
*/ | ||
|
||
public static IntervalList fromVcf(final VCFPathReader vcf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general a static method on object X that also accepts an argument of type X should be converted to a non-static method. E.g. this should be written as:
public IntervalList fromVcf() {
return vcf.fromVcf(false);
}
Although it's unclear to me if the caller would know that this is going to produce an interval list. Normally you see static fromX()
methods to convert object type X to another type. So a better name here might be toIntervalList()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
back to you @pshapiro4broad |
if (URL_SCHEMES.contains(scheme)) { | ||
inputStream = getURLHelper(new URL(uri)).openInputStream(); | ||
} else if (scheme.isEmpty()) { | ||
File file = new File(uri); | ||
inputStream = new FileInputStream(file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when possible use Files.newInputStream
instead of new FileInputStream(
( see https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
inputStream = new SeekablePathStream(IOUtil.getPath(path), wrapper); | ||
} else { | ||
File file = new File(path); | ||
final Set<String> URL_SCHEMES = new HashSet<>(Arrays.asList("http","ftp","https")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should make this a private static to avoid creating every time the method is called.
@@ -111,7 +144,10 @@ SAMSequenceDictionary extractDictionary(final File intervalList) { | |||
applicableExtensions = extensions; | |||
} | |||
|
|||
/** @deprecated in favor of extractDictionary(final Path file) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use @link
here so it shows up as a hyperlink, e.g.
/**
* @deprecated use {@link #extractDictionary(Path)} instead
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -111,7 +144,10 @@ SAMSequenceDictionary extractDictionary(final File intervalList) { | |||
applicableExtensions = extensions; | |||
} | |||
|
|||
/** @deprecated in favor of extractDictionary(final Path file) */ | |||
@Deprecated | |||
abstract SAMSequenceDictionary extractDictionary(final File file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler to make this non-abstract and implement it as
return extractDictionary(file.toPath());
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
} finally { | ||
CloserUtil.close(vcfFileReader); | ||
CloserUtil.close(vcfReader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you replace this close()
with try-with-resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
}, | ||
VCF(IOUtil.VCF_EXTENSIONS) { | ||
@Override | ||
SAMSequenceDictionary extractDictionary(final File vcf) { | ||
VCFFileReader vcfFileReader = null; | ||
VCFPathReader vcfReader = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I don't think renaming the class is necessary. Even though the API now accepts/requires Path
it is still reading a VCF File. The implementation detail of how a file is represented has changed but the functionality hasn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -124,6 +160,17 @@ static TYPE forFile(final File dictionaryExtractable) { | |||
throw new SAMException("Cannot figure out type of file " + dictionaryExtractable.getAbsolutePath() + " from extension. Current implementation understands the following types: " + Arrays.toString(TYPE.values())); | |||
} | |||
|
|||
static TYPE forPath(final Path dictionaryExtractable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The enum should really be called Type
as uppercase is typically reserved for static final objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry...that ship has sailed....I can file it under "things to do in htsjdk3"...
Tests now pass... @jacarey ? @magicDGS ? @pshapiro4broad ? any more comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments below.
In addition, some test data is removed but it does not look that it correspond to this PR. Maybe it should be done in a different one...
inputStream = getURLHelper(new URL(uri)).openInputStream(); | ||
} else if (scheme.isEmpty()) { | ||
File file = new File(uri); | ||
inputStream = Files.newInputStream(file.toPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this handle the case of named-pipes, as @yfarjoun commented in my previous comment? In addition, is there any way to differentiate the pipes from files? If the scheme is empty and it is a File
, it feels wrong to me that the method does not return a SeekablePathStream
and it does if the scheme is "file:".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear.... @lbergelson ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume named pipes can be referred to as either a file or a path, which means that we'd need a different mechanism to check for them. We don't ever do a check for isRegularFile
which is I think what we'd need to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much we care if we return a SeekablePathStream for a file, vs a simple input stream. They should have similar performance I hope, and this function only guarantees an input stream. It is weird though to return different types depending on how you write the path though and it probably has some weird knock on effects. That said, it's not a change from how it was set up before is it?
Why did you change from the previous order and using hasScheme? It seems like you added a new local variable and some additional logic and then only use it to do what the simpler check accomplished?
@@ -142,6 +142,9 @@ public SamReader open(final Path path, | |||
/** Utility method to open the file get the header and close the file */ | |||
abstract public SAMFileHeader getFileHeader(File samFile); | |||
|
|||
/** Utility method to open the file get the header and close the file */ | |||
abstract public SAMFileHeader getFileHeader(Path samFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe HTSJDK should start deprecating the File
versions of the methods in favor of the Path
versions. Anyway, for users using File
the change is as simple as using File.toPath
if there are no differences in the methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably only start doing that when we have full support of Path....
*/ | ||
@Deprecated | ||
public VCFFileReader(final File file, final boolean requireIndex) { | ||
// Note how we deal with type safety here, just casting to (FeatureCodec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to call this(file.toPath(), requireIndex)
instead of doing the magic in place (it should result in the same reader).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
requireIndex); | ||
// Note how we deal with type safety here, just casting to (FeatureCodec) | ||
// in the call to getFeatureReader is not enough for Java 8. | ||
FeatureCodec<VariantContext, ?> codec = isBCF(file) ? new BCF2Codec() : new VCFCodec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@Test | ||
public void testCloneAndEquals() throws Exception { | ||
final SRAFileReader reader = new SRAFileReader(DEFAULT_ACCESSION); | ||
@Test(dataProvider = "serializationTestData") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this correspond to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no....but I noticed it, so I fixed it...it requires a provider but wasn't given one....
|
||
/** | ||
* User: jacob | ||
* Date: 2012-Dec-13 | ||
*/ | ||
public class TestUtils { | ||
public static String DATA_DIR = "src/test/resources/htsjdk/tribble/"; | ||
|
||
|
||
public static Path getTribbleFileInJimfs(String vcf, String index, FileSystem fileSystem) throws IOException, URISyntaxException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a javadoc explaining that the FileSystem
should be a JimfsFileSystem
. It took me a while to figure out what was it.
Another option is to provide a destination JimfsPath
instead to get the FileSystem
from it.
if (index != null) { | ||
final Path idxPath = Paths.get(index); | ||
final Path idxDestination = Paths.get(AbstractFeatureReader.isTabix(vcf, index) ? Tribble.tabixIndexFile(vcf) : Tribble.indexFile(vcf)); | ||
Files.copy(idxPath, root.resolve(idxDestination.getFileName().toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can resolve with the root the vcfPath for the JimfsFileSystem
before, and use the Tribble methods for get the idxDestination name without the need to resolve it again against the root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. that's a bit simpler.
@@ -48,7 +48,6 @@ | |||
new Object[]{"Homo_sapiens_assembly18.trimmed.fasta", "Homo_sapiens_assembly18.trimmed.dict"}, | |||
new Object[]{"test2_comp.interval_list", "Homo_sapiens_assembly18.trimmed.dict"}, | |||
new Object[]{"ScreenSamReads.100.input.sam", "test3_comp.interval_list"}, | |||
new Object[]{"ScreenSamReads.100.input.sam", "test4_comp.interval_list"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test case removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure...putting it back.
/** | ||
* Created by farjoun on 10/12/17. | ||
*/ | ||
public class VCFPathReaderTest extends HtsjdkTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you unified the classes, this should be named VCFFileReaderTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks.
@@ -48,6 +49,7 @@ | |||
import java.io.PrintWriter; | |||
import java.io.StringReader; | |||
import java.math.BigInteger; | |||
import java.nio.file.Path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in this file are not longer need it.
…leReader and deprecating the File versions
e76de8d
to
2d26b7a
Compare
- responding to review comments.
@magicDGS responded to your comments. @jacarey @tfenne @lbergelson, this needs a directive from the maintainers I think... |
👍 for my point of view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yfarjoun Some minor quibbles. Seems fine. There's bunch of now deprecated methods that are now untested, we might want to test both versions until we remove the deprecated ones. We've not been deprecating the file methods until now, but it seems fine to do so, it' just a bit annoying for downstream projects. I'm not sure if it's better to deprecate piecemeal as we add the new versions or all at once when we're ready to pull the plug on files.
* Go through the files provided and if they have one of the provided file extensions pass the file to the output | ||
* otherwise assume that file is a list of filenames and unfold it into the output (recursively). | ||
*/ | ||
public static List<Path> unrollPaths(final Collection<Path> inputs, final String... extensions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yfarjoun Is there a reason we can't replace the file version with a call to the new path code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, it returns a List<Path>
instead of a List<File>
... is that a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do the conversion on the way in and out. It's slightly less efficient, but probably worth it to avoid a bunch of lines of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough.
@@ -1101,6 +1181,20 @@ public static Path getPath(String uriString) throws IOException { | |||
} | |||
} | |||
|
|||
public static List<Path> getPaths(List<String> uriStrings) throws RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is untested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
try { | ||
return IOUtil.getPath(s); | ||
} catch (IOException e) { | ||
throw new RuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RuntimeIOException maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
}).collect(Collectors.toList()); | ||
} | ||
|
||
public static List<Path> promoteFiles(List<File> files){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a better name. maybe filesToPaths
? It's also untested and undocumented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
* @param wrapper to wrap the input stream in, may be used to implement caching or prefetching, etc | ||
* @return | ||
* @throws IOException | ||
* @return An inputStream appropriately created from uri and conditionally wrapped with wrapper (only in certain cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for updating the doc. missing a closing )
inputStream = getURLHelper(new URL(uri)).openInputStream(); | ||
} else if (scheme.isEmpty()) { | ||
File file = new File(uri); | ||
inputStream = Files.newInputStream(file.toPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume named pipes can be referred to as either a file or a path, which means that we'd need a different mechanism to check for them. We don't ever do a check for isRegularFile
which is I think what we'd need to do.
throws IOException { | ||
|
||
// will be empty string if there's no scheme | ||
final String scheme = Optional.ofNullable(IOUtil.getScheme(uri)).orElse(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getScheme
doesn't return a null, there's no need for this complex dance. It's also definitely preferable to just have a ternary expression that checks for null instead of creating a new nullable just to check if it's empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at getScheme:
* @return The scheme component of this URI,
* or {@code null} if the scheme is undefined
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I use a ternary, I need to either add a line of code (to save the result of getSceme) or call it twice....I'll do it if you insist, but I thought that this is a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not calling uri.get scheme, you're calling your own version that you added, IOUtil.getScheme() which returns true/false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ternary or not doesn't really matter, it's just kind of ugly to create a new wrapper object just to hide the null check. I see your point about needing to call it twice or store it in a temp variable. In any case, it feels like something is weird here. You never actually use the scheme you get, you just check if it's empty. Isn't calling IOUtil.hasScheme
, which is what was originally done here correct?
inputStream = getURLHelper(new URL(uri)).openInputStream(); | ||
} else if (scheme.isEmpty()) { | ||
File file = new File(uri); | ||
inputStream = Files.newInputStream(file.toPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much we care if we return a SeekablePathStream for a file, vs a simple input stream. They should have similar performance I hope, and this function only guarantees an input stream. It is weird though to return different types depending on how you write the path though and it probably has some weird knock on effects. That said, it's not a change from how it was set up before is it?
Why did you change from the previous order and using hasScheme? It seems like you added a new local variable and some additional logic and then only use it to do what the simpler check accomplished?
* | ||
* @deprecated in favor of {@link VCFFileReader#isBCF(Path)} | ||
*/ | ||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've typically been avoiding deprecating the file methods so far, but it would be nice to move away from them. Probably fine to do so here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, if we deprecate them, we should remove uses of them except for direct tests. Lots of test code is using the File
constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a larger effort than this PR is meant to do...reversing the deprecations...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be on the roadmap for htsjdk3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any plan appart of #520 to implement htsjdk3?
not sure what you're asking me to do about the |
another look @lbergelson ? |
@yfarjoun A fwe new comments. What I meant with regular file is that we don't ever check for fifos so we're handling them weirdly to begin with, your pr doesn't make it worse as far as I can tell. We should do it correctly, but I think that's a different pr that should hit a lot more places in the codebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
🎉 |
Description
Enables the use of Path, and therefore NIO implementations such as google-cloud-nio when opening VCF readers. (though the google-cloud-nio library is not added)
Cleans up ununsed test files and some white-space/java8 issues
Checklist