提示
Spark SQL 教程 正在筹划编写中,使用过程中有任何建议,提供意见、建议、纠错、催更加微信 gr99123。
Spark SQL 的联接用于根据联接条件组合两个关系中的行。将介绍了整体连接语法,然后各小节介绍了不同类型的连接以及示例。
Spark SQL 的 LIMIT 子句结构为:
relation { [ join_type ] JOIN relation [ join_criteria ] | NATURAL join_type JOIN relation }
参数:
join_criteria 语法:
ON boolean_expression | USING ( column_name [ , ... ] )
join_type 语法:
[ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTI
接下来会详细介绍各个联接的逻辑。
内部联接是 Spark SQL 中的默认联接,它选择在两个关系中都具有匹配值的行。
语法:
relation [ INNER ] JOIN relation [ join_criteria ]
返回左关系中的所有值和右关系中的匹配值,如果不匹配,则追加NULL。它也被称为左外连接。
语法:
relation LEFT [ OUTER ] JOIN relation [ join_criteria ]
返回右关系中的所有值和左关系中的匹配值,如果不匹配,则追加NULL。它也被称为右外连接。
语法:
relation RIGHT [ OUTER ] JOIN relation [ join_criteria ]
返回两个关系中的所有值,在不匹配的一侧追加空值。它也被称为完全外部连接。
语法:
relation FULL [ OUTER ] JOIN relation [ join_criteria ]
返回两个关系的笛卡尔乘积。
语法:
relation CROSS JOIN relation [ join_criteria ]
半联接从与右侧匹配的关系的左侧返回值。它也被称为左半连接。
语法:
relation [ LEFT ] SEMI JOIN relation [ join_criteria ]
反联接从左关系返回与右关系不匹配的值。它也被称为左反连接。
语法:
relation [ LEFT ] ANTI JOIN relation [ join_criteria ]
以下是一些示例讲解:
-- 使用 employee 表和 department 表演示不同类型的联接。
SELECT * FROM employee;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe| 5|
|103| Paul| 3|
|101| John| 1|
|102| Lisa| 2|
|104| Evan| 4|
|106| Amy| 6|
+---+-----+------+
SELECT * FROM department;
+------+-----------+
|deptno| deptname|
+------+-----------+
| 3|Engineering|
| 2| Sales|
| 1| Marketing|
+------+-----------+
-- 内联接。
SELECT id, name, employee.deptno, deptname
FROM employee INNER
JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|103| Paul| 3|Engineering|
|101| John| 1| Marketing|
|102| Lisa| 2| Sales|
+---+-----+------+-----------|
-- left join.
SELECT id, name, employee.deptno, deptname
FROM employee
LEFT JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|105|Chloe| 5| NULL|
|103| Paul| 3|Engineering|
|101| John| 1| Marketing|
|102| Lisa| 2| Sales|
|104| Evan| 4| NULL|
|106| Amy| 6| NULL|
+---+-----+------+-----------|
-- right join.
SELECT id, name, employee.deptno, deptname
FROM employee RIGHT
JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|103| Paul| 3|Engineering|
|101| John| 1| Marketing|
|102| Lisa| 2| Sales|
+---+-----+------+-----------|
-- full join.
SELECT id, name, employee.deptno, deptname
FROM employee
FULL JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|101| John| 1| Marketing|
|106| Amy| 6| NULL|
|103| Paul| 3|Engineering|
|105|Chloe| 5| NULL|
|104| Evan| 4| NULL|
|102| Lisa| 2| Sales|
+---+-----+------+-----------|
-- cross join.
SELECT id, name, employee.deptno, deptname FROM employee
CROSS JOIN department;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|105|Chloe| 5|Engineering|
|105|Chloe| 5| Marketing|
|105|Chloe| 5| Sales|
|103| Paul| 3|Engineering|
|103| Paul| 3| Marketing|
|103| Paul| 3| Sales|
|101| John| 1|Engineering|
|101| John| 1| Marketing|
|101| John| 1| Sales|
|102| Lisa| 2|Engineering|
|102| Lisa| 2| Marketing|
|102| Lisa| 2| Sales|
|104| Evan| 4|Engineering|
|104| Evan| 4| Marketing|
|104| Evan| 4| Sales|
|106| Amy| 4|Engineering|
|106| Amy| 4| Marketing|
|106| Amy| 4| Sales|
+---+-----+------+-----------|
-- semi join.
SELECT * FROM employee
SEMI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|103| Paul| 3|
|101| John| 1|
|102| Lisa| 2|
+---+-----+------+
-- anti join.
SELECT * FROM employee
ANTI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe| 5|
|104| Evan| 4|
|106| Amy| 6|
+---+-----+------+
更新时间:2021-08-20 19:38:50 标签:sql spark join