dvoros edited this page May 18, 2018 · 21 revisions

Budapest Data Forum 2017 - Hive Workshop

A refined/extended version of this workshop is available at https://github.com/dvoros/best17hive

This repository contains all the training material from the Hive workshop at Budapest Data Forum 2017.

On the transcript page you'll find all the queries we ran, with some explanations.

There is a separate summary page on the query language, intended as a reference, with examples of every query type covered during the workshop.

In the discofull.db folder you'll find the Discogs data dump of 2017.06.01, stored as three ORC tables. For the schema and usage examples, see the end of the transcript. To get the original XML dataset, visit http://data.discogs.com/.


by Zsolt Fekete, Zoltan Haindrich, Daniel Voros -- Hortonworks
