Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable VcfFileReader and other classes to use Path #1018

Merged
merged 24 commits into from
Dec 21, 2017
Merged

Conversation

yfarjoun
Copy link
Contributor

@yfarjoun yfarjoun commented Oct 26, 2017

Description

  • Enables the use of Path, and therefore NIO implementations such as google-cloud-nio when opening VCF readers. (though the google-cloud-nio library is not added)

  • Cleans up ununsed test files and some white-space/java8 issues

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

Copy link
Member

@magicDGS magicDGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree on using GCS for the tests, more when there is already another filesystem for testing (jimfs).

build.gradle Outdated
@@ -43,6 +43,8 @@ dependencies {
testRuntime 'org.pegdown:pegdown:1.4.2' // Necessary for generating HTML reports with ScalaTest
testCompile "org.testng:testng:6.9.9"
testCompile "com.google.jimfs:jimfs:1.1"
// https://mvnrepository.com/artifact/com.google.cloud/google-cloud-nio
testCompile group: 'com.google.cloud', name: 'google-cloud-nio', version: '0.5.0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use jimfs instead of google-cloud for testing a different filesystem. Like that, you can avoid the issues with your GCS costs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @param p a {@link Path} that is to be read in as text
* @return an iterator over the lines in the file
*/
public static IterableOnceIterator<String> readLines(final Path p) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be Files.lines(Path) used instead of creating a new utility method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

//testing GCS files:

// this is almost a vcf, but not quite it's missing the #CHROM line and it has no content...
{"gs://htsjdk-testdata/htsjdk/variant/Homo_sapiens_assembly38.tile_db_header.vcf", false, false},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests for other FS could use Jimfs: copy the local files into Jimfs, and use the filesystem for the test to create the strings...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@yfarjoun
Copy link
Contributor Author

had to rebase to resolve a conflit. Kept all the original commits

@codecov-io
Copy link

codecov-io commented Nov 15, 2017

Codecov Report

Merging #1018 into master will decrease coverage by 0.058%.
The diff coverage is 60.274%.

@@               Coverage Diff               @@
##              master     #1018       +/-   ##
===============================================
- Coverage     66.093%   66.034%   -0.058%     
- Complexity      7558      7592       +34     
===============================================
  Files            532       532               
  Lines          32238     32427      +189     
  Branches        5494      5531       +37     
===============================================
+ Hits           21307     21413      +106     
- Misses          8775      8847       +72     
- Partials        2156      2167       +11
Impacted Files Coverage Δ Complexity Δ
...va/htsjdk/tribble/TribbleIndexedFeatureReader.java 68.889% <ø> (+1.667%) 24 <0> (+2) ⬆️
...rc/main/java/htsjdk/samtools/SamReaderFactory.java 63.861% <100%> (+0.73%) 7 <0> (ø) ⬇️
...rc/main/java/htsjdk/tribble/util/ParsingUtils.java 74.346% <100%> (+0.801%) 46 <3> (-1) ⬇️
...rc/main/java/htsjdk/variant/vcf/VCFFileReader.java 50% <55.556%> (-22.549%) 21 <18> (+5)
src/main/java/htsjdk/samtools/util/IOUtil.java 38.083% <57.143%> (+2.721%) 80 <9> (+9) ⬆️
.../variant/utils/SAMSequenceDictionaryExtractor.java 72.5% <61.538%> (-6.987%) 2 <2> (+1)
...samtools/util/AsyncBlockCompressedInputStream.java 72% <0%> (-4%) 12% <0%> (-1%)
...a/htsjdk/samtools/util/AbstractProgressLogger.java 38.889% <0%> (-1.111%) 9% <0%> (+3%)
... and 4 more

@yfarjoun
Copy link
Contributor Author

@magicDGS @jacarey I fixed things as per your comments...care to take another glance?


try {
Files.lines(p).forEach(s -> {
if (!s.trim().isEmpty()) {
Copy link
Contributor

@pshapiro4broad pshapiro4broad Nov 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of calling String.trim() multiple times here use map(), e.g.

Files.lines(p).map(String::trim).forEach(...)

You could also filter there too, e.g. .map(...).filter(s -> !s.isEmpty()).forEach()


while (!stack.empty()) {
final Path p = stack.pop();
final String name = p.toUri().toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call .toUri().toString() here? Seems like p.toString() would produce the same result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it does. it drops the schema.

Copy link
Contributor

@pshapiro4broad pshapiro4broad Nov 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no scheme for a Path. Anyway the code is testing the end of the path, so it doesn't matter what the scheme is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@@ -1064,6 +1076,60 @@ public static String slurp(final InputStream is, final Charset charSet) {
}

/**
* Go through the files provided and if they have one of the provided file extensions pass the file into the output
* otherwise assume that file is a list of filenames and unfold it into the output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method also recursively expands files of filenames, if one of the filenames in a file of filenames points to another file of filenames. If this is the intended behavior it should be documented and tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is tested, will update message.

* Returns true if the given path appears to be a BCF file.
*/
public static boolean isBCF(final Path path) {
return path.toUri().getRawPath().endsWith(".bcf");
Copy link
Contributor

@pshapiro4broad pshapiro4broad Nov 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use .toUri() here and below? In this case you could write path.toString().endsWith(".bcf")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I wanted to remove the possible junk after a ? in a URL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case does a Path have a query string? A Path is not a URI and a query string would be invalid, as far as I know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that path could have a query string...but I could be wrong.

if (includeFiltered || !vc.isFiltered()) {
String name = vc.getID();
final Integer intervalEnd = vc.getCommonInfo().getAttributeAsInt("END", vc.getEnd());
if (".".equals(name) || name == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need { } around if body. Also IMO it's more natural to write if (name == null || name.equals("."))

try {
return reader.iterator();
} catch (final IOException ioe) {
throw new TribbleException("Could not create an iterator from a feature reader.", ioe);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code using TribbleException if it's not part of htsjdk.tribble? If you want an unchecked IO Exception you can use UncheckedIOException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a copy of the VCFFileReader...I made very few changes....I think unless we want a serious rewrite of htsjdk (which is out of scope for this PR) I'd vote for keeping the current (slightly annoying) behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

private static final File FIVE_SPACES_THEN_A_NEWLINE_THEN_FIVE_SPACES_FILE = new File("src/test/resources/htsjdk/samtools/io/5newline5.txt");

private static final File TEST_DATA_DIR = new File ("src/test/resources/htsjdk/samtools/io/");
private static final File SLURP_TEST_FILE = new File(TEST_DATA_DIR,"slurptest.txt");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space after ,, this occurs many places in this checkin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

private static final File EMPTY_FILE = new File("src/test/resources/htsjdk/samtools/io/empty.txt");
private static final File FIVE_SPACES_THEN_A_NEWLINE_THEN_FIVE_SPACES_FILE = new File("src/test/resources/htsjdk/samtools/io/5newline5.txt");

private static final File TEST_DATA_DIR = new File ("src/test/resources/htsjdk/samtools/io/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to use Path here if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK...though it makes things a bit more cumbersome

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. nice. thanks.

@DataProvider
public Object[][] fofnData() throws IOException {
Path fofnPath1 = inMemoryfileSystem.getPath("Level1.fofn");
Files.copy(new File(TEST_DATA_DIR.getAbsolutePath(),"Level1.fofn").toPath(), fofnPath1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be TEST_DATA_DIR.getPath().resolve("Level1.fofn")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL. thanks.

Copy link
Member

@magicDGS magicDGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick comments from the phone. I will do. a proper review in one week if this is still open!

throws IOException {

final InputStream inputStream;
if (path.startsWith("http:") || path.startsWith("https:") || path.startsWith("ftp:")) {
inputStream = getURLHelper(new URL(path)).openInputStream();
} else if (IOUtil.hasScheme(path)) {
} else if (IOUtil.hasScheme(path) && !IOUtil.getScheme(path).equals("file")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe get the scheme before, and use it for checking http, ftp, etc...

inputStream = new SeekablePathStream(IOUtil.getPath(path), wrapper);
} else {
File file = new File(path);
File file = IOUtil.getScheme(path) != null && IOUtil.getScheme(path).equals("file") ? new File(path.substring("file:".length())) : new File(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch can be used with the SeekablePathStream too, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem, I think, is that a file can represent a named pipe, which is not seekable.

@@ -16,74 +16,84 @@
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

/**
* Simplified interface for reading from VCF/BCF files.
*/
public class VCFFileReader implements Closeable, Iterable<VariantContext> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecate in favor of the Path version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@@ -100,16 +101,16 @@ public static InputStream openInputStream(String path)
* @return
* @throws IOException
*/
public static InputStream openInputStream(String path, Function<SeekableByteChannel, SeekableByteChannel> wrapper)
public static InputStream openInputStream(final String path, final Function<SeekableByteChannel, SeekableByteChannel> wrapper)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument path should be named uri since it's not a path but a URI. Making the caller create a URI would be helpful too.

The javadoc for this could be improved too, it should have full sentences for its description text. And omitting the text for a @return or @throws is not helpful; either remove the @ form from the javadoc or add the missing documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm. I don't think that changing the API is justified here....I'll develop the docs though

*/
public static SAMSequenceDictionary getSequenceDictionary(final Path path) {
final SAMSequenceDictionary dict = new VCFPathReader(path, false).getFileHeader().getSequenceDictionary();
CloserUtil.close(path);
Copy link
Contributor

@pshapiro4broad pshapiro4broad Nov 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this trying to do? Path is not closeable and has no close() method.

It looks like CloserUtil might predate Closeable and should really be cleaned up to only accept closeable objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what I was thinking there....thanks.

*/
public VCFPathReader(final Path path, final boolean requireIndex) {
// Note how we deal with type safety here, just casting to (FeatureCodec)
// in the call to getFeatureReader is not enough for Java 8.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the compiler can't infer the common type that both branches of the ternary op produces and the function accepts. If you cast using the parameterized type you can inline this expression, e.g. (FeatureCodec<VariantContext, ?>) (isBCF(path) ? new BCF2Codec() : new VCFCodec()) works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like isBCF() is only called in two places and in both places it's used to determine which codec to return. If so you can avoid the code duplication and and replace with isBCF() with getCodecForPath(Path). Or keep isBCF() and write:

    private static FeatureCodec<VariantContext, ?> getCodecForPath(Path path) {
        return isBCF(path) ? new BCF2Codec() : new VCFCodec();
    }

This also eliminates the need for a cast or introducing a local variable.

public static IntervalList fromVcf(final Path path, final boolean includeFiltered) {
final VCFPathReader vcfReader = new VCFPathReader(path, false);
final IntervalList intervalList = fromVcf(vcfReader, includeFiltered);
vcfReader.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using try-with-resources would eliminate the need to call close() here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

* @return an IntervalList constructed from input vcf
*/

public static IntervalList fromVcf(final VCFPathReader vcf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general a static method on object X that also accepts an argument of type X should be converted to a non-static method. E.g. this should be written as:

public IntervalList fromVcf() {
    return vcf.fromVcf(false);
}

Although it's unclear to me if the caller would know that this is going to produce an interval list. Normally you see static fromX() methods to convert object type X to another type. So a better name here might be toIntervalList().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

@yfarjoun
Copy link
Contributor Author

back to you @pshapiro4broad

if (URL_SCHEMES.contains(scheme)) {
inputStream = getURLHelper(new URL(uri)).openInputStream();
} else if (scheme.isEmpty()) {
File file = new File(uri);
inputStream = new FileInputStream(file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when possible use Files.newInputStream instead of new FileInputStream( ( see https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

inputStream = new SeekablePathStream(IOUtil.getPath(path), wrapper);
} else {
File file = new File(path);
final Set<String> URL_SCHEMES = new HashSet<>(Arrays.asList("http","ftp","https"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should make this a private static to avoid creating every time the method is called.

@@ -111,7 +144,10 @@ SAMSequenceDictionary extractDictionary(final File intervalList) {
applicableExtensions = extensions;
}

/** @deprecated in favor of extractDictionary(final Path file) */
Copy link
Contributor

@pshapiro4broad pshapiro4broad Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use @link here so it shows up as a hyperlink, e.g.

/**
 * @deprecated use {@link #extractDictionary(Path)} instead
 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -111,7 +144,10 @@ SAMSequenceDictionary extractDictionary(final File intervalList) {
applicableExtensions = extensions;
}

/** @deprecated in favor of extractDictionary(final Path file) */
@Deprecated
abstract SAMSequenceDictionary extractDictionary(final File file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to make this non-abstract and implement it as

return extractDictionary(file.toPath());

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

} finally {
CloserUtil.close(vcfFileReader);
CloserUtil.close(vcfReader);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace this close() with try-with-resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

},
VCF(IOUtil.VCF_EXTENSIONS) {
@Override
SAMSequenceDictionary extractDictionary(final File vcf) {
VCFFileReader vcfFileReader = null;
VCFPathReader vcfReader = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't think renaming the class is necessary. Even though the API now accepts/requires Path it is still reading a VCF File. The implementation detail of how a file is represented has changed but the functionality hasn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -124,6 +160,17 @@ static TYPE forFile(final File dictionaryExtractable) {
throw new SAMException("Cannot figure out type of file " + dictionaryExtractable.getAbsolutePath() + " from extension. Current implementation understands the following types: " + Arrays.toString(TYPE.values()));
}

static TYPE forPath(final Path dictionaryExtractable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum should really be called Type as uppercase is typically reserved for static final objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry...that ship has sailed....I can file it under "things to do in htsjdk3"...

@yfarjoun yfarjoun changed the title Yf vcf from path Enable VcfFileReader and other classes to use Path Nov 18, 2017
@samtools samtools deleted a comment from jacarey Nov 18, 2017
@yfarjoun
Copy link
Contributor Author

Tests now pass... @jacarey ? @magicDGS ? @pshapiro4broad ? any more comments?

Copy link
Member

@magicDGS magicDGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments below.

In addition, some test data is removed but it does not look that it correspond to this PR. Maybe it should be done in a different one...

inputStream = getURLHelper(new URL(uri)).openInputStream();
} else if (scheme.isEmpty()) {
File file = new File(uri);
inputStream = Files.newInputStream(file.toPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this handle the case of named-pipes, as @yfarjoun commented in my previous comment? In addition, is there any way to differentiate the pipes from files? If the scheme is empty and it is a File, it feels wrong to me that the method does not return a SeekablePathStream and it does if the scheme is "file:".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear.... @lbergelson ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume named pipes can be referred to as either a file or a path, which means that we'd need a different mechanism to check for them. We don't ever do a check for isRegularFile which is I think what we'd need to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much we care if we return a SeekablePathStream for a file, vs a simple input stream. They should have similar performance I hope, and this function only guarantees an input stream. It is weird though to return different types depending on how you write the path though and it probably has some weird knock on effects. That said, it's not a change from how it was set up before is it?

Why did you change from the previous order and using hasScheme? It seems like you added a new local variable and some additional logic and then only use it to do what the simpler check accomplished?

@@ -142,6 +142,9 @@ public SamReader open(final Path path,
/** Utility method to open the file get the header and close the file */
abstract public SAMFileHeader getFileHeader(File samFile);

/** Utility method to open the file get the header and close the file */
abstract public SAMFileHeader getFileHeader(Path samFile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe HTSJDK should start deprecating the File versions of the methods in favor of the Path versions. Anyway, for users using File the change is as simple as using File.toPath if there are no differences in the methods...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably only start doing that when we have full support of Path....

*/
@Deprecated
public VCFFileReader(final File file, final boolean requireIndex) {
// Note how we deal with type safety here, just casting to (FeatureCodec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to call this(file.toPath(), requireIndex) instead of doing the magic in place (it should result in the same reader).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

requireIndex);
// Note how we deal with type safety here, just casting to (FeatureCodec)
// in the call to getFeatureReader is not enough for Java 8.
FeatureCodec<VariantContext, ?> codec = isBCF(file) ? new BCF2Codec() : new VCFCodec();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@Test
public void testCloneAndEquals() throws Exception {
final SRAFileReader reader = new SRAFileReader(DEFAULT_ACCESSION);
@Test(dataProvider = "serializationTestData")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this correspond to this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no....but I noticed it, so I fixed it...it requires a provider but wasn't given one....


/**
* User: jacob
* Date: 2012-Dec-13
*/
public class TestUtils {
public static String DATA_DIR = "src/test/resources/htsjdk/tribble/";


public static Path getTribbleFileInJimfs(String vcf, String index, FileSystem fileSystem) throws IOException, URISyntaxException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a javadoc explaining that the FileSystem should be a JimfsFileSystem. It took me a while to figure out what was it.

Another option is to provide a destination JimfsPath instead to get the FileSystem from it.

if (index != null) {
final Path idxPath = Paths.get(index);
final Path idxDestination = Paths.get(AbstractFeatureReader.isTabix(vcf, index) ? Tribble.tabixIndexFile(vcf) : Tribble.indexFile(vcf));
Files.copy(idxPath, root.resolve(idxDestination.getFileName().toString()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can resolve with the root the vcfPath for the JimfsFileSystem before, and use the Tribble methods for get the idxDestination name without the need to resolve it again against the root.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. that's a bit simpler.

@@ -48,7 +48,6 @@
new Object[]{"Homo_sapiens_assembly18.trimmed.fasta", "Homo_sapiens_assembly18.trimmed.dict"},
new Object[]{"test2_comp.interval_list", "Homo_sapiens_assembly18.trimmed.dict"},
new Object[]{"ScreenSamReads.100.input.sam", "test3_comp.interval_list"},
new Object[]{"ScreenSamReads.100.input.sam", "test4_comp.interval_list"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this test case removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure...putting it back.

/**
* Created by farjoun on 10/12/17.
*/
public class VCFPathReaderTest extends HtsjdkTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you unified the classes, this should be named VCFFileReaderTest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.

@@ -48,6 +49,7 @@
import java.io.PrintWriter;
import java.io.StringReader;
import java.math.BigInteger;
import java.nio.file.Path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in this file are not longer need it.

@yfarjoun
Copy link
Contributor Author

@magicDGS responded to your comments.

@jacarey @tfenne @lbergelson, this needs a directive from the maintainers I think...

@magicDGS
Copy link
Member

👍 for my point of view.

@yfarjoun
Copy link
Contributor Author

@lbergelson ?

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yfarjoun Some minor quibbles. Seems fine. There's bunch of now deprecated methods that are now untested, we might want to test both versions until we remove the deprecated ones. We've not been deprecating the file methods until now, but it seems fine to do so, it' just a bit annoying for downstream projects. I'm not sure if it's better to deprecate piecemeal as we add the new versions or all at once when we're ready to pull the plug on files.

* Go through the files provided and if they have one of the provided file extensions pass the file to the output
* otherwise assume that file is a list of filenames and unfold it into the output (recursively).
*/
public static List<Path> unrollPaths(final Collection<Path> inputs, final String... extensions) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yfarjoun Is there a reason we can't replace the file version with a call to the new path code?

Copy link
Contributor Author

@yfarjoun yfarjoun Dec 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it returns a List<Path> instead of a List<File>... is that a reason?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do the conversion on the way in and out. It's slightly less efficient, but probably worth it to avoid a bunch of lines of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough.

@@ -1101,6 +1181,20 @@ public static Path getPath(String uriString) throws IOException {
}
}

public static List<Path> getPaths(List<String> uriStrings) throws RuntimeException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is untested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

try {
return IOUtil.getPath(s);
} catch (IOException e) {
throw new RuntimeException(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeIOException maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}).collect(Collectors.toList());
}

public static List<Path> promoteFiles(List<File> files){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a better name. maybe filesToPaths? It's also untested and undocumented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

* @param wrapper to wrap the input stream in, may be used to implement caching or prefetching, etc
* @return
* @throws IOException
* @return An inputStream appropriately created from uri and conditionally wrapped with wrapper (only in certain cases
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for updating the doc. missing a closing )

inputStream = getURLHelper(new URL(uri)).openInputStream();
} else if (scheme.isEmpty()) {
File file = new File(uri);
inputStream = Files.newInputStream(file.toPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume named pipes can be referred to as either a file or a path, which means that we'd need a different mechanism to check for them. We don't ever do a check for isRegularFile which is I think what we'd need to do.

throws IOException {

// will be empty string if there's no scheme
final String scheme = Optional.ofNullable(IOUtil.getScheme(uri)).orElse("");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getScheme doesn't return a null, there's no need for this complex dance. It's also definitely preferable to just have a ternary expression that checks for null instead of creating a new nullable just to check if it's empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at getScheme:

     * @return  The scheme component of this URI,
     *          or {@code null} if the scheme is undefined
     */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I use a ternary, I need to either add a line of code (to save the result of getSceme) or call it twice....I'll do it if you insist, but I thought that this is a better solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not calling uri.get scheme, you're calling your own version that you added, IOUtil.getScheme() which returns true/false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ternary or not doesn't really matter, it's just kind of ugly to create a new wrapper object just to hide the null check. I see your point about needing to call it twice or store it in a temp variable. In any case, it feels like something is weird here. You never actually use the scheme you get, you just check if it's empty. Isn't calling IOUtil.hasScheme, which is what was originally done here correct?

inputStream = getURLHelper(new URL(uri)).openInputStream();
} else if (scheme.isEmpty()) {
File file = new File(uri);
inputStream = Files.newInputStream(file.toPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much we care if we return a SeekablePathStream for a file, vs a simple input stream. They should have similar performance I hope, and this function only guarantees an input stream. It is weird though to return different types depending on how you write the path though and it probably has some weird knock on effects. That said, it's not a change from how it was set up before is it?

Why did you change from the previous order and using hasScheme? It seems like you added a new local variable and some additional logic and then only use it to do what the simpler check accomplished?

*
* @deprecated in favor of {@link VCFFileReader#isBCF(Path)}
*/
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've typically been avoiding deprecating the file methods so far, but it would be nice to move away from them. Probably fine to do so here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, if we deprecate them, we should remove uses of them except for direct tests. Lots of test code is using the File constructors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a larger effort than this PR is meant to do...reversing the deprecations...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should be on the roadmap for htsjdk3?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any plan appart of #520 to implement htsjdk3?

@yfarjoun
Copy link
Contributor Author

not sure what you're asking me to do about the isRegularFile...can you elaborate?

@yfarjoun
Copy link
Contributor Author

another look @lbergelson ?

@lbergelson
Copy link
Member

@yfarjoun A fwe new comments. What I meant with regular file is that we don't ever check for fifos so we're handling them weirdly to begin with, your pr doesn't make it worse as far as I can tell. We should do it correctly, but I think that's a different pr that should hit a lot more places in the codebase.

@yfarjoun
Copy link
Contributor Author

@lbergelson ?

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@lbergelson lbergelson merged commit 80122b9 into master Dec 21, 2017
@lbergelson lbergelson deleted the yf_vcf_from_path branch December 21, 2017 16:24
@yfarjoun
Copy link
Contributor Author

yfarjoun commented Dec 21, 2017

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants