Support reading OpenDocument spreadsheet (.ods) files #3
Comments
Did some digging into this one. There certainly aren't many ODF/ODS processing libraries for the JVM, but I did find a couple.

This library seems to be a low level-ish wrapper over the raw ODS XML. We could potentially write our own ODS implementation on top of this, pending some more investigation.

This is a higher level library that appears to be written by the same people who created ODFDOM, and is a nice wrapper over ODFDOM. This seemed to be what we were looking for, so I decided to give it a test run to make sure it used constant memory. Here was my test program:

    import org.odftoolkit.simple.SpreadsheetDocument;
    import org.odftoolkit.simple.table.Table;

    public class Sandbox {
        public static void main(String[] args) {
            try {
                // Load the spreadsheet from the classpath.
                SpreadsheetDocument doc = SpreadsheetDocument.loadDocument(
                        Sandbox.class.getResourceAsStream("one_hundred_thousand.ods"));
                Table table = doc.getSheetByIndex(0);
                // Print every row; memory should stay flat if the library streams.
                table.getRowIterator().forEachRemaining(row -> System.out.println(row.toString()));
            } catch (Exception e) {
                System.err.println(e);
            }
        }
    }

Shortly after starting this up with a 100k row ODS, memory usage for my Java process jumped to the maximum (currently configured to be ~2GB), and it printed nothing for the couple of minutes I let it run. I tried again with an ODS file that had 1 million rows and was immediately met with an

Their license scheme requires money to be paid for closed source applications, which isn't what we seem to be going for with this project: http://www.jopendocument.org/support.html. Seems like a showstopper.
The ODFDOM part of the Apache ODF Toolkit is also out of the picture. It is well designed, and it has a clean split between the XML handling and the ZIP handling, but neither layer supports streaming.
While the file format of an ODS file is surprisingly similar to that of an XLSX file (they are both ZIP files with XML in them), the OpenDocument format is much better designed and has none of the problems outlined in #17 if we want to write our own streaming reader. I will see if I can write a basic reader from scratch that supports streaming. The whole format spec is online, and this document is really helpful for seeing how you might implement your own parser: http://incubator.apache.org/odftoolkit/odfdom/Layers.html. Almost as easy as parsing CSV, dare I say...
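To make the "ZIP file with XML in it" idea concrete, here is a rough sketch of what a from-scratch streaming read could look like using only the JDK (`java.util.zip` plus StAX). It is not the reader proposed above, just an illustration under a few assumptions: the class name `OdsRowCounter` and the helper `sampleOds` are made up for this example, the sheet data is assumed to live in the `content.xml` entry, and the sketch only counts `table:table-row` elements rather than decoding cells.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of any existing library.
class OdsRowCounter {
    // Namespace URI of the OpenDocument "table" elements.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Counts table:table-row elements in content.xml without building a DOM. */
    static long countRows(InputStream ods) throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(ods)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue; // skip styles.xml, meta.xml, and so on
                }
                XMLInputFactory factory = XMLInputFactory.newInstance();
                factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
                // The parser pulls bytes from the ZIP entry as it goes.
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                long rows = 0;
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && TABLE_NS.equals(xml.getNamespaceURI())
                            && "table-row".equals(xml.getLocalName())) {
                        rows++;
                    }
                }
                xml.close();
                return rows;
            }
        }
        throw new IOException("no content.xml entry found");
    }

    /** Builds a tiny in-memory ZIP that mimics the shape of an ODS file (test data only). */
    static byte[] sampleOds(int rowCount) throws IOException {
        StringBuilder xml = new StringBuilder();
        xml.append("<office:document-content")
           .append(" xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\"")
           .append(" xmlns:table=\"").append(TABLE_NS).append("\">")
           .append("<office:body><office:spreadsheet><table:table table:name=\"Sheet1\">");
        for (int i = 0; i < rowCount; i++) {
            xml.append("<table:table-row><table:table-cell/></table:table-row>");
        }
        xml.append("</table:table></office:spreadsheet></office:body></office:document-content>");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] ods = sampleOds(5);
        System.out.println(countRows(new ByteArrayInputStream(ods))); // prints 5
    }
}
```

Because both `ZipInputStream` and `XMLStreamReader` are pull-based, memory use should depend on the size of a single XML event rather than on the number of rows. A real reader would also need to handle `table:number-rows-repeated`, which ODS uses to collapse runs of identical rows and which this sketch ignores.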
OpenDocument spreadsheets are the preferred open spreadsheet format outside of Microsoft Office. We should support reading them at the very least for completeness.