Linking data with JOIN
JOIN
is used to link rows from two or more tables together. Hive supports most SQL JOIN
operations, such as INNER JOIN
and OUTER JOIN
. In addition, HQL supports some special joins, such as MapJoin and Semi-Join too. In its earlier version, Hive only supported equal join. After v2.2.0, unequal join is also supported. However, you should be more careful when using unequal join unless you know what is expected, since unequal join is likely to return many rows by producing a Cartesian product of joined tables. When you want to restrict the output of a join, you should apply a WHERE
clause after join as JOIN
occurs before the WHERE
clause. If possible, push filter conditions on the join conditions rather than where conditions to have data filtered earlier. What's more, all types of left/right joins are not commutative and always left/right associative, while INNER
and FULL OUTER JOINS
are both commutative and associative.
INNER JOIN
INNER JOIN
or JOIN
returns rows meeting...