Parse Analyze Operate (Big) Data (on N servers in parallel)

landawn edited this page May 1, 2018 · 2 revisions

With the APIs provided in CSVUtil (load/import/export/parse/...), JdbcUtil (extractData/importData/parse/copy/...), IOUtil (parse/read/write/...), DataSet (count/filter/join/group/merge/...), N (...), JSONParser/XMLParser, and Lambda/Stream in Java 8, it is easy and fast to parse, analyze, and operate on GB/TB-scale data stored in files (CSV/JSON/XML/...) or in a database, on a single machine or across multiple machines. RemoteExecutor is designed to run big/heavy data processing jobs on N servers in parallel.

// export the account data to CSV file from database
CSVUtil.exportCSV(file, conn, sql, 0, 1000, true, true);

// load data from CSV file.
DataSet dataset = CSVUtil.loadCSV(Account.class, file);

// find all accounts whose first name ends with "6".
DataSet account6 = dataset.filter("first_name", (String fn) -> fn.endsWith("6"));

// group by last name and count it.
DataSet groupedAccount6 = account6.groupBy("last_name", "last_name", "count", Collectors.counting());

// save the result into CSV
File out = new File("./unittest/result.csv");
groupedAccount6.toCSV(out);
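
For comparison, the filter, group, and count steps above can also be sketched with plain Java 8 streams and no library classes. This is only an illustration of what the pipeline computes, not how DataSet implements it. The class name `AccountCountSketch`, the helper `countByLastName`, and the two-column `first_name,last_name` layout are assumptions made for the example, and the naive comma split does not handle quoted fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AccountCountSketch {

    // Hypothetical helper: counts rows per last name, keeping only rows
    // whose first name ends with the given suffix.
    // Assumes each line is "first_name,last_name" with a header row first;
    // quoted fields and embedded commas are not handled.
    static Map<String, Long> countByLastName(List<String> csvLines, String suffix) {
        return csvLines.stream()
                .skip(1) // skip the header row
                .map(line -> line.split(",", -1))
                .filter(cols -> cols.length >= 2 && cols[0].endsWith(suffix))
                // TreeMap keeps the last names sorted in the result
                .collect(Collectors.groupingBy(cols -> cols[1], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "first_name,last_name",
                "fn6,Smith",
                "fn16,Smith",
                "fn7,Jones",
                "fn26,Jones");
        // prints {Jones=1, Smith=2}
        System.out.println(countByLastName(lines, "6"));
    }
}
```

The stream version holds every line in memory, so it only suits small inputs. For large files, the CSVUtil and DataSet calls shown above remain the intended route.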