MDEV-39014: FULL JOIN Phase 2 #4940

Open
DaveGosselin-MariaDB wants to merge 9 commits into main from 13.2-mdev-39014-full-join-p2

In phase 1, FULL [OUTER] JOIN was only supported when simplify_joins()
could rewrite it into an equivalent LEFT, RIGHT, or INNER JOIN based
on NULL-rejecting WHERE predicates.  Queries that could not be
rewritten raised ER_NOT_SUPPORTED_YET.  (Phase 1 was not released.)

This commit removes that restriction by adding proper support for FULL
JOIN.  Execution runs a "LEFT JOIN pass" that emits matched rows and
null-complemented rows for unmatched left rows, then a second
"null-complement pass" that rescans the right table to emit
null-complemented rows for right rows that were never matched.
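
The two passes described above can be modelled outside the server.  The
following is a minimal Python sketch, not the server's implementation:
the function name full_join, the per-row matched flags, and the sample
tables are assumptions made for illustration, since the description does
not say how match status is tracked.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

Row = Tuple
# None stands for a null-complemented side of the joined row.
Pair = Tuple[Optional[Row], Optional[Row]]


def full_join(left: Sequence[Row], right: Sequence[Row],
              on: Callable[[Row, Row], bool]) -> Iterator[Pair]:
    # Hypothetical bookkeeping: one "was matched" flag per right row.
    matched = [False] * len(right)

    # Pass 1 ("LEFT JOIN pass"): matched rows, plus left rows that
    # found no match, null-complemented on the right.
    for l in left:
        found = False
        for i, r in enumerate(right):
            if on(l, r):
                matched[i] = True
                found = True
                yield (l, r)
        if not found:
            yield (l, None)

    # Pass 2 ("null-complement pass"): rescan the right table and emit
    # the rows that pass 1 never matched, null-complemented on the left.
    for i, r in enumerate(right):
        if not matched[i]:
            yield (None, r)


def eq_first(l: Row, r: Row) -> bool:
    # SQL-style equality on the first column: NULL (None) never matches.
    return l[0] is not None and l[0] == r[0]


t1 = [(1,), (2,), (None,)]
t2 = [(2,), (3,)]
rows = list(full_join(t1, t2, eq_first))
```

In this model, rows holds the pass 1 output for every t1 row first, and
the unmatched t2 row (3,) appears last, produced by pass 2.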

FULL JOIN supports nested joins on the left of the FULL JOIN,
NATURAL FULL JOIN, semi-joins, CTEs / derived tables (kept
materialized when they participate in a FULL JOIN), prepared
statements, stored procedures, and aggregates.  Examples:

  SELECT * FROM (d1 FULL JOIN d2 ON d1.a = d2.a)
              FULL JOIN t3 ON d1.a = t3.a;

  SELECT * FROM t1 NATURAL FULL JOIN t2;

  SELECT * FROM t1 INNER JOIN t2 FULL JOIN t3 ON t1.a = t3.a;

  PREPARE st FROM
    'SELECT COUNT(*) FROM t1 FULL JOIN t2 ON t1.a = t2.a';

Limitations:
  - The join cache is disabled whenever a FULL JOIN is present, which
    can regress plans for large FULL JOINs compared to the rewritten
    cases.  A follow-up will re-enable it where safe.
  - Statistics and cost estimates for the null-complement pass have
    not been fully implemented; the optimizer may under- or
    over-estimate FULL JOIN costs in plans involving multiple
    FULL JOINs.  Again, a follow-up will optimize the cost calculations.
  - Optimizations for constant tables are not fully supported.
  - Nested tables on the right side of a FULL JOIN are not yet supported.

Add syntax support for FULL JOIN, FULL OUTER JOIN, NATURAL FULL JOIN, and
NATURAL FULL OUTER JOIN to the parser.

While the parser accepts FULL JOIN syntax, such joins are not yet
supported.  Queries specifying any of the above joins fail with
ER_NOT_SUPPORTED_YET.

Allow FULL OUTER JOIN queries to proceed through name resolution.

Permits limited EXPLAIN EXTENDED support so tests can prove that the
JOIN_TYPE_* table markings are reflected when the query is echoed back by the
server.  This happens in at least two places:  via a Warning message during
EXPLAIN EXTENDED and during VIEW .frm file creation.

While the query plan output is mostly meaningless at this point, this
limited EXPLAIN support improves the SELECT_LEX print function for the new
JOIN types.

TODO: fix PS protocol before end of FULL OUTER JOIN development

Rewrite FULL OUTER JOIN queries as either LEFT, RIGHT, or INNER JOIN
by checking if and how the WHERE clause rejects nulls.

For example, the queries in each of the following pairs are equivalent.
In the first pair, the WHERE condition rejects rows in which t1 is
null-complemented, so only matched rows and t1 rows with a
null-complemented t2 remain, which is what a LEFT JOIN produces.  In
the second pair, the WHERE condition rejects null-complemented rows
from both sides, leaving only the INNER JOIN rows:

  SELECT * FROM t1 FULL JOIN t2 ON t1.v = t2.v WHERE t1.v IS NOT NULL;
  SELECT * FROM t1 LEFT JOIN t2 ON t1.v = t2.v WHERE t1.v IS NOT NULL;

  SELECT * FROM t1 FULL JOIN t2 ON t1.v = t2.v WHERE t1.a=t2.a;
  SELECT * FROM t1 INNER JOIN t2 ON t1.v = t2.v WHERE t1.a=t2.a;
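
The equivalences can be checked on a small model.  This Python sketch is
an illustration under stated assumptions, not the simplify_joins() logic:
each table is reduced to a single column v, None stands for NULL, and the
second check uses t1.v = t2.v as a stand-in for the t1.a = t2.a predicate.

```python
def sql_eq(a, b):
    # SQL equality: a comparison involving NULL (None) is never true.
    return a is not None and b is not None and a == b


def inner_join(t1, t2):
    return [(a, b) for a in t1 for b in t2 if sql_eq(a, b)]


def left_join(t1, t2):
    out = []
    for a in t1:
        matches = [(a, b) for b in t2 if sql_eq(a, b)]
        # An unmatched t1 row is null-complemented on the t2 side.
        out.extend(matches or [(a, None)])
    return out


def full_join(t1, t2):
    out = left_join(t1, t2)
    # Add t2 rows that matched nothing, null-complemented on the t1 side.
    out.extend((None, b) for b in t2 if not any(sql_eq(a, b) for a in t1))
    return out


t1 = [1, 2, None]
t2 = [2, 3, None]


def left_not_null(row):
    # Models: WHERE t1.v IS NOT NULL
    return row[0] is not None


def both_equal(row):
    # Models a WHERE equality across both tables (stand-in for t1.a = t2.a).
    return sql_eq(row[0], row[1])


full_as_left = [r for r in full_join(t1, t2) if left_not_null(r)]
left_filtered = [r for r in left_join(t1, t2) if left_not_null(r)]

full_as_inner = [r for r in full_join(t1, t2) if both_equal(r)]
inner_filtered = [r for r in inner_join(t1, t2) if both_equal(r)]
```

Note that the WHERE clause stays in place after the rewrite: in this
model the unfiltered left_join(t1, t2) still contains the t1 row whose v
is NULL, which the filtered FULL JOIN does not return.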

FULL JOIN yields result sets with columns from both tables participating in
the join (for the sake of explanation, assume base tables).  However,
NATURAL FULL JOIN should show each common column only once in the output.

Given the following query:
  SELECT * FROM t1 NATURAL FULL JOIN t2;
transform it into:
  SELECT COALESCE(t1.f_1, t2.f_1), ..., COALESCE(t1.f_n, t2.f_n) FROM
    t1 NATURAL FULL JOIN t2;

This change applies only in the case of NATURAL FULL JOIN.  Otherwise,
NATURAL JOINs work as they have in the past, which is using columns
from the left table for the resulting column set.
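
To see why COALESCE is needed for NATURAL FULL JOIN, consider this
Python sketch.  It is a model only: rows are dicts, None stands for NULL,
and the helper names and the output column order (common columns first,
then the remaining t1 columns, then the remaining t2 columns) are
assumptions of the sketch rather than something this change states.

```python
def coalesce(*values):
    # SQL COALESCE: first non-NULL argument, or NULL if all are NULL.
    return next((v for v in values if v is not None), None)


def natural_full_join(t1, t2, t1_cols, t2_cols):
    common = [c for c in t1_cols if c in t2_cols]
    rest1 = [c for c in t1_cols if c not in common]
    rest2 = [c for c in t2_cols if c not in common]

    def match(l, r):
        # Equality on every common column; NULL never matches.
        return all(l[c] is not None and l[c] == r[c] for c in common)

    def emit(l, r):
        # A missing side is replaced by an all-NULL row.
        if l is None:
            l = dict.fromkeys(t1_cols)
        if r is None:
            r = dict.fromkeys(t2_cols)
        row = {c: coalesce(l[c], r[c]) for c in common}
        row.update({c: l[c] for c in rest1})
        row.update({c: r[c] for c in rest2})
        return row

    out, matched = [], set()
    for l in t1:
        hits = [i for i, r in enumerate(t2) if match(l, r)]
        matched.update(hits)
        out.extend([emit(l, t2[i]) for i in hits])
        if not hits:
            out.append(emit(l, None))
    out.extend([emit(None, r) for i, r in enumerate(t2) if i not in matched])
    return out


t1 = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
t2 = [{"id": 2, "y": "B"}, {"id": 3, "y": "C"}]
result = natural_full_join(t1, t2, ["id", "x"], ["id", "y"])
```

The last row of result comes from the unmatched t2 row.  Taking id from
the left table alone would yield NULL there; COALESCE picks up the value
3 from t2 instead.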

Prevent elimination of tables participating in a FULL OUTER JOIN during
eliminate_tables as part of phase one FULL OUTER JOIN development.

Move the functionality gate for FULL JOIN further into the codebase: convert
LEX::has_full_outer_join to a counter that tracks how many FULL JOINs
remain, so that the gate works correctly after simplify_joins and
eliminate_tables have run.

Fixes an old bug where a null pointer dereference in
Dep_analysis_context::dbug_print_deps would crash the server when
running a debug build in debug mode.

Move the temporary gate against FULL OUTER JOIN deeper into the
codebase, which causes the FULL OUTER JOIN query plans to have
more relevant information (hence the change).  In some cases, the
join order of nested INNER JOINs within the FULL OUTER JOIN changed.

Small cleanups in get_sargable_cond ahead of the feature work in
the next commit.

Fetches the ON condition from the FULL OUTER JOIN as the sargable condition.
We ignore the WHERE clause here because we don't want accidental conversions
from FULL JOIN to INNER JOIN during, for example, range analysis, as that
would produce wrong results.

GCOV shows that existing FULL OUTER JOIN tests exercise this new codepath.
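
One way to see the hazard of deriving a table restriction from the WHERE
clause is the following Python sketch.  It illustrates the general
problem only and does not reproduce the server's range analysis: the
tables, the predicate, and the idea of pre-filtering the t1 scan are all
assumptions made for the example.

```python
def sql_eq(a, b):
    # SQL equality: a comparison involving NULL (None) is never true.
    return a is not None and b is not None and a == b


NULL_ROW = (None, None)


def full_join(t1, t2):
    # Rows are (a, b) tuples; the ON condition is t1.a = t2.a.
    out, matched = [], set()
    for l in t1:
        hits = [i for i, r in enumerate(t2) if sql_eq(l[0], r[0])]
        matched.update(hits)
        out.extend([(l, t2[i]) for i in hits])
        if not hits:
            out.append((l, NULL_ROW))
    out.extend([(NULL_ROW, r) for i, r in enumerate(t2) if i not in matched])
    return out


def where(row):
    # Models: WHERE t1.b < 5 OR t1.b IS NULL  (not NULL-rejecting for t1)
    b = row[0][1]
    return b is None or b < 5


t1 = [(1, 1), (2, 9)]
t2 = [(2, 0)]

# Correct: join on the ON condition first, then apply WHERE.
correct = [r for r in full_join(t1, t2) if where(r)]

# Wrong: restrict the t1 scan with the WHERE range before joining.
# The t2 row (2, 0) loses its match and reappears null-complemented.
t1_restricted = [l for l in t1 if l[1] is None or l[1] < 5]
wrong = [r for r in full_join(t1_restricted, t2) if where(r)]
```

In this model, wrong contains a spurious null-complemented row for the
t2 row (2, 0), which the correct evaluation filters out.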