10g中distinct加强以及anti jion,semi jion

SQL> conn roger/roger
Connected.

SQL> create table t1 as select * from dba_objects where object_id < 30001;

Table created.

SQL> create table t2 as select * from dba_objects where object_id < 20001;

Table created.
SQL>
SQL> set lines 160
SQL> set pagesize 50
SQL> set pagesize 50
SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> analyze table t2 compute statistics;

Table analyzed.

SQL> set autot traceonly
SQL> select distinct a.owner from t2 a where not exists
  2  (select b.owner from t1 b where b.owner=a.owner and b.owner='SYS');

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4203366459

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     8 |   193   (2)| 00:00:03 |
|   1 |  HASH UNIQUE          |      |     1 |     8 |   193   (2)| 00:00:03 |
|*  2 |   HASH JOIN RIGHT ANTI|      | 17022 |   132K|   192   (1)| 00:00:03 |
|*  3 |    TABLE ACCESS FULL  | T1   |  3682 | 14728 |   115   (0)| 00:00:02 |
|   4 |    TABLE ACCESS FULL  | T2   | 19454 | 77816 |    76   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"."OWNER"="A"."OWNER")
   3 - filter("B"."OWNER"='SYS')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        670  consistent gets
          0  physical reads
          0  redo size
        516  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          7  rows processed

我们可以看到此时逻辑读之和为671,没有排序消耗。下面修改一个隐含参数继续比较:

SQL> alter session set "_gby_hash_aggregation_enabled"=false;

Session altered.

SQL> select distinct a.owner from t2 a where not exists
  2  (select b.owner from t1 b where b.owner=a.owner and b.owner='SYS');

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1130070542

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     8 |   193   (2)| 00:00:03 |
|   1 |  SORT UNIQUE          |      |     1 |     8 |   193   (2)| 00:00:03 |
|*  2 |   HASH JOIN RIGHT ANTI|      | 17022 |   132K|   192   (1)| 00:00:03 |
|*  3 |    TABLE ACCESS FULL  | T1   |  3682 | 14728 |   115   (0)| 00:00:02 |
|   4 |    TABLE ACCESS FULL  | T2   | 19454 | 77816 |    76   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"."OWNER"="A"."OWNER")
   3 - filter("B"."OWNER"='SYS')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        670  consistent gets
          0  physical reads
          0  redo size
        516  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          7  rows processed

我们发现,此时虽然逻辑读是一样的,但是多了1个memory的排序操作,可见10.2里面跟以前版本
在distinct 排序操作上有更进一步的优化了。

我们继续来看上面的执行计划,发现是HASH JOIN RIGHT ANTI,也就是anti jion。
我们知道oracle 在处理exist和in时,是使用的semi jion,然而在处理not exists和not in时
是用的anti jion,下面我来验证一下:

SQL> select distinct a.owner from t2 a where  exists
  2  (select b.owner from t1 b where b.owner=a.owner and b.owner='SYS')
  3  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1951262108

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |     8 |   193   (2)| 00:00:03 |
|   1 |  SORT UNIQUE NOSORT |      |     1 |     8 |   193   (2)| 00:00:03 |
|*  2 |   HASH JOIN SEMI    |      |  2432 | 19456 |   192   (1)| 00:00:03 |
|*  3 |    TABLE ACCESS FULL| T2   |  2432 |  9728 |    76   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |  3682 | 14728 |   115   (0)| 00:00:02 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"."OWNER"="A"."OWNER")
   3 - filter("A"."OWNER"='SYS')
   4 - filter("B"."OWNER"='SYS')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        268  consistent gets
          0  physical reads
          0  redo size
        409  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

可以看到此时是semi jion了。

下面来测试下not in的情况:

SQL> select distinct a.owner from t2 a where a.owner  not in
  2  (select b.owner from t1 b where b.owner=a.owner and b.owner='SYS');

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2122336124

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     8 |    32 |  1000   (1)| 00:00:13 |
|   1 |  HASH UNIQUE         |      |     8 |    32 |  1000   (1)| 00:00:13 |
|*  2 |   FILTER             |      |       |       |            |          |
|   3 |    TABLE ACCESS FULL | T2   | 19454 | 77816 |    76   (0)| 00:00:01 |
|*  4 |    FILTER            |      |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| T1   |   460 |  1840 |   115   (0)| 00:00:02 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT 0 FROM "T1" "B" WHERE :B1='SYS' AND
              "B"."OWNER"=:B2 AND LNNVL("B"."OWNER":B3)))
   4 - filter(:B1='SYS')
   5 - filter("B"."OWNER"=:B1 AND LNNVL("B"."OWNER":B2))

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        268  consistent gets
          0  physical reads
          0  redo size
        516  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          7  rows processed

可以看到这里非常奇怪,居然走filter了。

SQL> alter table t1 modify (owner VARCHAR2(30) not null);

Table altered.

SQL> alter table t2 modify (owner VARCHAR2(30) not null);

Table altered.

SQL> select distinct a.owner from t2 a where a.owner  not in
  2  (select b.owner from t1 b where b.owner=a.owner and b.owner='SYS');

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4203366459

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     8 |   193   (2)| 00:00:03 |
|   1 |  HASH UNIQUE          |      |     1 |     8 |   193   (2)| 00:00:03 |
|*  2 |   HASH JOIN RIGHT ANTI|      | 17022 |   132K|   192   (1)| 00:00:03 |
|*  3 |    TABLE ACCESS FULL  | T1   |  3682 | 14728 |   115   (0)| 00:00:02 |
|   4 |    TABLE ACCESS FULL  | T2   | 19454 | 77816 |    76   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"."OWNER"="A"."OWNER")
   3 - filter("B"."OWNER"='SYS')

Statistics
----------------------------------------------------------
        349  recursive calls
          0  db block gets
        715  consistent gets
          1  physical reads
          0  redo size
        516  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          8  sorts (memory)
          0  sorts (disk)
          7  rows processed

可以看到,目前走hash jion了,问兔子,他是这样说的:
not in如果没有约束或语句避免null问题,内部会使用lnnvl函数,11g有null aware优化,
比10g好点,因此not in写法特别注意null,lnnvl是undocument函数.

所以我这里将自动owner修改为not null,下面来看看11g中是不是这样的,我这里以11gR2为例:
——10.2.0.5

SQL> set autot traceonly exp
SQL> select distinct a.object_id from t2 a where a.object_id  not in
  2  (select b.object_id from t1 b);

Execution Plan
----------------------------------------------------------
Plan hash value: 3142026835

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 19453 | 77812 |       |  1122K  (1)| 03:44:29 |
|   1 |  HASH UNIQUE        |      | 19453 | 77812 |   240K|  1122K  (1)| 03:44:29 |
|*  2 |   FILTER            |      |       |       |       |            |          |
|   3 |    TABLE ACCESS FULL| T2   | 19454 | 77816 |       |    76   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |     1 |     4 |       |   115   (0)| 00:00:02 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT 0 FROM "T1" "B" WHERE
              LNNVL("B"."OBJECT_ID":B1)))
   4 - filter(LNNVL("B"."OBJECT_ID":B1))

——-11.2.0.2

SQL>  select * from v$version where rownum < 2;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production

SQL>
SQL> set autot traceonly exp
SQL> select distinct a.object_id from t2 a where a.object_id  not in
  2  (select b.object_id from t1 b);

Execution Plan
----------------------------------------------------------
Plan hash value: 1298667172

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   195 |  1560 |   196   (2)| 00:00:03 |
|   1 |  HASH UNIQUE        |      |   195 |  1560 |   196   (2)| 00:00:03 |
|*  2 |   HASH JOIN ANTI NA |      |   195 |  1560 |   195   (1)| 00:00:03 |
|   3 |    TABLE ACCESS FULL| T2   | 19490 | 77960 |    77   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   | 29490 |   115K|   117   (0)| 00:00:02 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

可以看到11g中,即使object_id字段默认允许为空,sql仍然可以走hash anti jion,
而不是去像10g一样走filter。 这里的NA表示null aware.


评论

《 “10g中distinct加强以及anti jion,semi jion” 》 有 2 条评论

  1. “我们知道oracle 在处理exist和in时,是使用的anti jion,然而在处理not exists和not in时是用的semi jion” 
    这句写反了 🙂

    1. 笔误,已更正,3Q!

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注