Ensure streams support marks during detection
Ensure we pass in a rewindable InputStream to Tika so that we can start from the beginning of the stream when we do the actual file parsing.

Fixes #20.
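The fix relies on the standard `java.io` mark/reset contract. The sketch below shows it with only `java.io`. `sniffHeader` is a hypothetical stand-in for the detection step, not Tika's API. It marks the stream, peeks at a few bytes, and resets, so a later parser still sees the whole input. `rewindable` follows the same logic as the commit's `createRewindableInputStream`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetDemo {
    // Hypothetical stand-in for format detection: peek at up to `limit` bytes,
    // then rewind so the caller can read from the beginning again.
    static byte[] sniffHeader(InputStream in, int limit) throws IOException {
        in.mark(limit);
        byte[] header = in.readNBytes(limit);
        in.reset();
        return header;
    }

    // Same idea as createRewindableInputStream: only wrap streams that
    // cannot already be marked.
    static InputStream rewindable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a non-markable source (for example, a network stream).
        InputStream raw = new FilterInputStream(
                new ByteArrayInputStream("a,b,c\n1,2,3\n".getBytes(StandardCharsets.UTF_8))) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
        InputStream in = rewindable(raw);
        byte[] header = sniffHeader(in, 4);
        String rest = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(new String(header, StandardCharsets.UTF_8)); // a,b,
        System.out.println(rest.equals("a,b,c\n1,2,3\n")); // true
    }
}
```

Streams that already support marks, such as `ByteArrayInputStream`, are returned unchanged. This avoids adding a redundant buffer layer.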
sagebind committed Sep 14, 2018
1 parent 22ae98d commit 82d29e5
Showing 1 changed file with 9 additions and 3 deletions.
src/main/java/com/widen/tabitha/RowReaderFactory.java
@@ -1,13 +1,12 @@
package com.widen.tabitha;

import com.widen.tabitha.formats.delimited.DelimitedRowReader;
import com.widen.tabitha.formats.delimited.DelimitedFormat;
import com.widen.tabitha.formats.excel.WorkbookRowReader;
import com.widen.tabitha.formats.delimited.DelimitedRowReader;
import com.widen.tabitha.formats.excel.XLSRowReader;
import com.widen.tabitha.formats.excel.XLSXRowReader;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.tika.Tika;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
@@ -69,6 +68,9 @@ public static Optional<RowReader> open(InputStream inputStream) throws IOExcepti
* @return A row reader if the stream is in a supported format.
*/
public static Optional<RowReader> open(InputStream inputStream, String filename) throws IOException {
// If our input stream supports marks, Tika will rewind the stream back to the start for us after detecting the
// format, so ensure our input stream supports it.
inputStream = createRewindableInputStream(inputStream);
String mimeType = tika.detect(inputStream, filename);

switch (mimeType) {
@@ -90,6 +92,10 @@ public static Optional<RowReader> open(InputStream inputStream, String filename)
return Optional.empty();
}

private static InputStream createRewindableInputStream(InputStream inputStream) {
return inputStream.markSupported() ? inputStream : new BufferedInputStream(inputStream);
}

// Apache Tika instance for detecting MIME types.
private static final Tika tika = new Tika();
}
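The next sketch shows the failure mode the wrapper prevents. `detectByBuffering` is hypothetical. It only assumes what the commit comment implies: a detector that receives a stream without mark support reads from it through a private buffer. Those bytes are then gone from the caller's stream. Wrapping the stream first lets the detector mark and reset the caller's own stream instead.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LostBytesDemo {
    // Hypothetical detector: if the stream can't be marked, it wraps it in a
    // private BufferedInputStream, peeks, and resets only that private buffer.
    static void detectByBuffering(InputStream in) throws IOException {
        InputStream peekable = in.markSupported() ? in : new BufferedInputStream(in);
        peekable.mark(16);
        peekable.readNBytes(16);
        peekable.reset();
    }

    // Source that refuses marks, like many network or decompression streams.
    static InputStream nonMarkable(String text) {
        return new FilterInputStream(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    // Run detection, then read whatever the caller's stream still holds.
    static String readAfterDetection(InputStream in) throws IOException {
        detectByBuffering(in);
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String data = "id,name\n1,alpha\n";
        // Without the fix: the private buffer pulled in the data (here, all of it).
        String lost = readAfterDetection(nonMarkable(data));
        // With the fix: wrap first, so the detector marks and resets our stream.
        String kept = readAfterDetection(new BufferedInputStream(nonMarkable(data)));
        System.out.println(lost.isEmpty()); // true
        System.out.println(kept.equals(data)); // true
    }
}
```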
