-
Notifications
You must be signed in to change notification settings - Fork 339
Performance Optimization Exercises Using TPC H
We use TPC-H tool to generate the original data of text file format. There are altogether 8 tables, with a total size of 100G (See table sizes below).
Then we generate SPL binary files for these text files. Each data table corresponds to a binary file.
Acomposite table has index information that always requires at least a minimum space utilization, so it is not suitable for storing small data tables. That’s why we store small data tables in row-wise storage bin files, which helps decrease the size of data stored and enables loading all data at one time. A large data table, on the other hand, is stored in a column-wise storage composite table that uses primary key as the dimension. When dumping data to the composite table, use sortx() function to sort the data table.
Generate a bin file (Take region table as an example):
A | |
---|---|
1 | =file("region.tbl").import(; , "|").new(_1:R_REGIONKEY, _2:R_NAME, _3:R_COMMENT).sort(R_REGIONKEY) |
2 | =file("region.btx").export@b(A1) |
There is no need to explicitly create data structure for a bin file, we just write data to it directly.
Generate a composite table (Take customer table as an example):
A | |
---|---|
1 | =file("customer.tbl").cursor(; , "|").new(_1:C_CUSTKEY, _2:C_NAME, _3:C_ADDRESS, _4:C_NATIONKEY, _5:C_PHONE, _6:C_ACCTBAL, _7:C_MKTSEGMENT, _8:C_COMMENT).sortx(C_CUSTKEY;15000000) |
2 | =file("customer.ctx").create(#C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT) |
3 | >A2.append(A1) |
Creating a composite table requires listing data structure, specifying primary key field(s) (by preceding them with #), and sorting data by primary key. The second parameter in sortx() function is determined by memory size, and can be omitted without impact.
partsupp and lineitem are sub tables of part and orders respectively. To ensure synced segmentation between sub table and primary table, use @p option at composite table creation. Take lineitem table as an example:
A | |
---|---|
1 | =file("lineitem.tbl").cursor(; , "|").new(_1:L_ORDERKEY, _4:L_LINENUMBER, _2:L_PARTKEY, _3:L_SUPPKEY, _5:L_QUANTITY, _6:L_EXTENDEDPRICE,_7:L_DISCOUNT, _8:L_TAX, _9:L_RETURNFLAG, _10:L_LINESTATUS,_11:L_SHIPDATE, _12:L_COMMITDATE, _13:L_RECEIPTDATE,_14:L_SHIPINSTRUCT, _15:L_SHIPMODE, _16:L_COMMENT).sortx(L_ORDERKEY,L_LINENUMBER;4000000) |
2 | =file("lineitem.ctx").create@p(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT) |
3 | >A2.append(A1) |
A2 uses @p option to ensure that records having same L_ORDERKEY will be put into same segment, so that we can get result of parallel computation correctly.
Below lists sizes of the original files and SPL files:
Table | Row count | tbl size | ctx/btxsize |
---|---|---|---|
region | 5 | 1K | 8K |
nation | 25 | 2K | 10K |
customer | 15M | 2.4G | 1.3G |
part | 20M | 2.4G | 1.0G |
supplier | 1M | 140M | 0.1G |
partsupp | 80M | 12.2G | 4.9G |
orders | 150M | 17.8G | 7.2G |
lineitem | 600M | 79.5G | 29.4G |
Performance Optimization Exercises Using TPC-H – Q1
Performance Optimization Exercises Using TPC-H – Q2
Performance Optimization Exercises Using TPC-H – Q3
Performance Optimization Exercises Using TPC-H – Q4
Performance Optimization Exercises Using TPC-H – Q5
Performance Optimization Exercises Using TPC-H – Q6
Performance Optimization Exercises Using TPC-H – Q7
Performance Optimization Exercises Using TPC-H – Q8
Performance Optimization Exercises Using TPC-H – Q9
Performance Optimization Exercises Using TPC-H – Q10
Performance Optimization Exercises Using TPC-H – Q11
Performance Optimization Exercises Using TPC-H – Q12
Performance Optimization Exercises Using TPC-H – Q13
Performance Optimization Exercises Using TPC-H – Q14
Performance Optimization Exercises Using TPC-H – Q15
Performance Optimization Exercises Using TPC-H – Q16
Performance Optimization Exercises Using TPC-H – Q17
Performance Optimization Exercises Using TPC-H – Q18
Performance Optimization Exercises Using TPC-H – Q19
Performance Optimization Exercises Using TPC-H – Q20
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code