-
Notifications
You must be signed in to change notification settings - Fork 339
SPL:One to One Table Associations
A JOIN operation is used to join records of two or more tables. This essay explains SPL’s way of handling JOINs and the underlying principles.
Table association relationships are data dependence relationships between tables. There are four types of table association relationships – one-to-one, many-to-one, one-to-many and many-to-many. We can combine two or more tables through one of those relationships to achieve the multi-table join query.
In a one-to-one association relationship, each record of a table corresponds to one and only one record in the other table and vice versa. The relationship often exists between two tables using the same primary key field. The common primary key creates a one-to-one correspondence between records of the two tables.
One example is between the student table and the student contact table that use the same primary key student ID. The former stores basic student information like name, gender, birthday, department and class. The latter stores students’ contact information like contacts and addresses. Each student record corresponds to a unique contact record, and each contact record matches one student record only.
【Example 1】 Based on STUDENT table and STUDENT_CONTACTS table, we are trying to find names, contacts and addresses of students who provide their contact information. Below is part of the data:
STUDENT:
ID | NAME | DEPARTMENTID | CLASSID | GENDER | BIRTHDAY |
---|---|---|---|---|---|
1 | Rebecca | 1 | 1 | F | 2010/09/08 |
2 | Ashley | 1 | 1 | F | 2010/10/09 |
3 | Rachel | 1 | 1 | F | 2011/04/29 |
4 | Emily | 1 | 1 | F | 2010/11/24 |
5 | Ashley | 1 | 1 | F | 2011/03/03 |
… | … | … | … | … | … |
STUDENT_CONTACTS:
ID | CONTACTS | ADDRESS |
---|---|---|
1 | Mrs. Moore | 124 Guangming North Road |
2 | Mrs. Wilson | 116 Baishi Road |
3 | Mr. Johnson | No.8, Mingcheng Road, Haidian District |
4 | Mr. Smith | 12 Fuxing Road |
5 | Mr. Smith | 462 Shijingshan Road |
… | … | … |
According to the task description, there are students who don’t leave their contact information. To find the names and contact information of eligible students, we need an inner join that only selects records whose student ID exist in both tables. An inner join is just called the join, which will delete all rows from each of the two tables that cannot match any of the row in the other table.
In SPL, the two or multiple tables that have a one-to-one correspondence relationship are called homo-dimension tables. The SPL join() function is used to perform a join, which is an inner join by default.
The SPL script for achieving the above join query:
A | |
---|---|
1 | =T("Student.txt") |
2 | =T("StudentContacts.txt") |
3 | =join(A1:S,ID;A2:C,ID) |
4 | =A3.new(S.NAME,C.CONTACTS,C.ADDRESS) |
A1: Import STUDENT table from the corresponding file.
A2: Import STUDENT_CONTACTS table from the corresponding file.
A3: Inner joins the two tables on student ID.
A4: Create a new table sequence made up of three fields – student name, contact and address.
SPL can retrieve a data table from a database. Suppose data comes from "STUDENT" table stored in database "db", A1’s code will like this:
A | |
---|---|
1 | =connect("db").query("select * from STUDENT") |
【Example 2】 Based on EMPLOYEE table and MANAGER table, we are trying to get incomes (including the allowance) of all employees (including the manager). Below is part of the data:
EMPLOYEE:
ID | NAME | BIRTHDAY | STATE | DEPT | SALARY |
---|---|---|---|---|---|
1 | Rebecca | 1974/11/20 | California | R&D | 7000 |
2 | Ashley | 1980/07/19 | New York | Finance | 11000 |
3 | Rachel | 1970/12/17 | New Mexico | Sales | 9000 |
4 | Emily | 1985/03/07 | Texas | HR | 7000 |
5 | Ashley | 1975/05/13 | Texas | R&D | 16000 |
… | … | … | … | … | … |
MANAGER:
ID | ALLOWANCE |
---|---|
18 | 7000 |
2 | 11000 |
4 | 7000 |
6 | 10000 |
7 | 9000 |
… | … |
Managers are employees though the MANAGER table stores special information like allowance. To find incomes of all employees, we need a left join that selects all employees, including managers. A left join is also called left outer join, which joins two tables based on the left table. Its result set contains all records in the left table and the records in the right table that have matches in the left table according to the joining field. SPL join() function works with @1 option to perform a left join.
The SPL script for achieving the join query:
A | |
---|---|
1 | =T("Employee.csv") |
2 | =T("Manager.txt") |
3 | =join@1(A1:E, ID; A2:M, ID) |
4 | =A3.new(E.ID, E.NAME, E.SALARY+M.ALLOWANCE:INCOME) |
A1: Import EMPLOYEE table from the corresponding file.
A2: Import MANAGER table from the corresponding file.
A3: Left join the two tables on employee ID based on the first table (EMPLOYEE table).
A4: Create a new table sequence made up of three fields – employee ID, name and total income.
The one-to-one association relationship is the simplest among all association relationships. With such a relationship, two tables are joined directly by their primary keys. Both SQL and SPL can handle join queries with one-to-one association well.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code