LWES Journal File SerDe README
***
To read LWES journal files from Hive, a SerDe (Serializer/Deserializer)
is needed to map Hive columns to LWES attributes.
***
Prerequisites
- JDK 1.6.x (http://java.sun.com/)
- Maven 2.2.x (http://maven.apache.org/)
***
How to build
% mvn clean package
***
How to install
Hive looks for extensions in the directory named by the environment
variable HIVE_AUX_JARS_PATH.
If that variable is not defined, set it to a directory of your choice.
Copy JournalSerDe-x.x.x.jar into that directory and launch Hive.
***
Creating tables
This is an example of table creation.
Only one event type is currently allowed per table.
The SerDe automatically maps an LWES attribute to the Hive column with
the same name. However, LWES attribute names are case sensitive while
Hive column names are not, and you may also want a Hive column whose name
differs from the LWES attribute. In either case, you can override the
attribute/column mapping with SerDe properties, as in the example below,
where the column sender_ip is mapped to the LWES attribute 'SenderIP'.
The input/output format classes are
INPUTFORMAT 'org.lwes.hadoop.io.JournalInputFormat'
OUTPUTFORMAT 'org.lwes.hadoop.io.JournalOutputFormat'
CREATE TABLE mrkt_auction_complete_hourly (
    a_bid string,
    a_price string,
    a_act_id bigint,
    ......
    x_revenue string
)
PARTITIONED BY(dt STRING)
ROW FORMAT SERDE 'org.lwes.hadoop.hive.EventSerDe'
WITH SERDEPROPERTIES (
    'lwes.event_name'='Auction::Complete',
    'sender_ip'='SenderIP',
    'sender_port'='SenderPort',
    'receipt_time'='ReceiptTime',
    'site_id'='SiteID')
STORED AS
    INPUTFORMAT 'org.lwes.hadoop.io.JournalInputFormat'
    OUTPUTFORMAT 'org.lwes.hadoop.io.JournalOutputFormat'
;
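The column-to-attribute lookup described above can be pictured with a
short Java sketch. It is an illustration only, not the SerDe's actual
code: the class ColumnMappingSketch and the method attributeFor are
hypothetical names, and the fallback to the lower-cased column name is an
assumption based on Hive column names being case insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, NOT part of the SerDe: illustrates the mapping rule only. */
final class ColumnMappingSketch {

    /**
     * Returns the LWES attribute name to read for a Hive column: the
     * explicit SERDEPROPERTIES entry if one exists, otherwise the
     * (lower-cased) column name itself.
     */
    static String attributeFor(String hiveColumn, Map<String, String> serdeProperties) {
        // Assumption: Hive hands over column names case-insensitively,
        // so normalize to lower case before looking up an override.
        String key = hiveColumn.toLowerCase(Locale.ROOT);
        String mapped = serdeProperties.get(key);
        return mapped != null ? mapped : key;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<String, String>();
        props.put("sender_ip", "SenderIP");
        props.put("site_id", "SiteID");

        System.out.println(attributeFor("sender_ip", props)); // prints SenderIP
        System.out.println(attributeFor("a_bid", props));     // prints a_bid
    }
}
```

Columns without an override, such as a_bid, resolve to an attribute of the
same name, which is why only the mixed-case attributes need an entry in
SERDEPROPERTIES.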
LWES does not support FLOAT or DOUBLE, but Hive does.
You can define those columns as float/double and the SerDe
will convert their values using Float.parseFloat(String) and
Double.parseDouble(String).
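As a rough sketch of that conversion, assuming the attribute is carried
as a string on the LWES side: the class FloatConversionSketch and the
method convert are hypothetical names used for illustration, not the
SerDe's real API.

```java
/** Hypothetical helper, NOT part of the SerDe: illustrates the float/double conversion. */
final class FloatConversionSketch {

    /**
     * Converts the string value of an LWES attribute to the Java type
     * matching the declared Hive column type. Other types are returned
     * unchanged in this sketch.
     */
    static Object convert(String lwesValue, String hiveType) {
        if ("float".equalsIgnoreCase(hiveType)) {
            return Float.parseFloat(lwesValue);   // boxed to Float
        }
        if ("double".equalsIgnoreCase(hiveType)) {
            return Double.parseDouble(lwesValue); // boxed to Double
        }
        return lwesValue;
    }

    public static void main(String[] args) {
        System.out.println(convert("1.5", "float"));   // prints 1.5
        System.out.println(convert("2.25", "double")); // prints 2.25
    }
}
```

Both parse methods throw NumberFormatException when the string is not a
valid number, so malformed attribute values need handling at query time.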
I also built a tool that creates table definitions from the ESF file
and will post it to SourceForge as well.
***
Limitations
Since LWES is basically a key/value format, it does not support nested
columns, so arrays and hashes are currently not allowed in a Hive table
that uses this SerDe.
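One way to catch this early is to check the declared column types before
creating the table. The sketch below is an assumption-laden illustration:
ColumnTypeCheckSketch and isSupported are hypothetical names, and treating
struct as unsupported alongside array and map is inferred from "nested
columns" rather than stated by the SerDe.

```java
import java.util.Locale;

/** Hypothetical helper, NOT part of the SerDe: flags nested Hive column types. */
final class ColumnTypeCheckSketch {

    /** Returns false for Hive type strings that describe nested columns. */
    static boolean isSupported(String hiveType) {
        String t = hiveType.trim().toLowerCase(Locale.ROOT);
        // Assumption: array, map (hash) and struct are the nested types to reject.
        return !(t.startsWith("array<") || t.startsWith("map<") || t.startsWith("struct<"));
    }

    public static void main(String[] args) {
        System.out.println(isSupported("bigint"));        // prints true
        System.out.println(isSupported("array<string>")); // prints false
    }
}
```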