提示
Spark SQL 教程 正在筹划编写中,使用过程中有任何建议,提供意见、建议、纠错、催更加微信 gr99123。
集合运算符用于将两个输入关系合并为一个输入关系。Spark SQL支持三种类型的集合运算符:EXCEPT/MINUS、INTERSECT、UNION。请注意,输入关系必须具有相同的列数和相应列的兼容数据类型。
EXCEPT 和 EXCEPT ALL 返回在一个关系中找到但在另一个关系中找不到的行。EXCEPT(或者,EXCEPT DISTINCT)只接受不同的行,而 EXCEPT ALL 不会从结果行中删除重复项。请注意,MINUS 是 EXCEPT 的别名。
Spark SQL 的 EXCEPT 语法为:
[ ( ] relation [ ) ] EXCEPT | MINUS [ ALL | DISTINCT ] [ ( ] relation [ ) ]
以下是一些示例讲解:
-- 使用 number1 和 number2 表格演示本页中的 set 运算
SELECT * FROM number1;
+---+
| c|
+---+
| 3|
| 1|
| 2|
| 2|
| 3|
| 4|
+---+
SELECT * FROM number2;
+---+
| c|
+---+
| 5|
| 1|
| 2|
| 2|
+---+
SELECT c FROM number1 EXCEPT SELECT c FROM number2;
+---+
| c|
+---+
| 3|
| 4|
+---+
SELECT c FROM number1 MINUS SELECT c FROM number2;
+---+
| c|
+---+
| 3|
| 4|
+---+
SELECT c FROM number1 EXCEPT ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 3|
| 4|
+---+
SELECT c FROM number1 MINUS ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 3|
| 4|
+---+
INTERSECT 和 INTERSECT ALL 返回在两个关系中找到的行。INTERSECT (或者, INTERSECT DISTINCT) 仅获取不同的行,而 INTERSECT ALL 不会从结果行中删除重复项。
Spark SQL 的 INTERSECT 语法为:
[ ( ] relation [ ) ] INTERSECT [ ALL | DISTINCT ] [ ( ] relation [ ) ]
以下是一些示例讲解:
(SELECT c FROM number1) INTERSECT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
+---+
(SELECT c FROM number1) INTERSECT DISTINCT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
+---+
(SELECT c FROM number1) INTERSECT ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
| 2|
+---+
UNION and UNION ALL 返回在任一关系中找到的行。UNION (或者, UNION DISTINCT) 仅获取不同的行,而 UNION ALL 不会从结果行中删除重复项。
Spark SQL 的 UNION 语法为:
[ ( ] relation [ ) ] UNION [ ALL | DISTINCT ] [ ( ] relation [ ) ]
以下是一些示例讲解:
(SELECT c FROM number1) UNION (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 3|
| 5|
| 4|
| 2|
+---+
(SELECT c FROM number1) UNION DISTINCT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 3|
| 5|
| 4|
| 2|
+---+
SELECT c FROM number1 UNION ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 1|
| 2|
| 2|
| 3|
| 4|
| 5|
| 1|
| 2|
| 2|
+---+
更新时间:2021-08-20 14:48:28 标签:sql spark 集合