Welcome to the Beyond Blog

John's Blog

This is my blog of various topics, from things that I think might be useful to others, to things that I just find interesting personally. If you have any comments or questions on any of my posts then please do ask - any positive contribution is very much welcomed.



The views expressed in this blog and completely my own and do not reflect those of Beyond Systems Ltd. All content is provided for informational purposes only, and you are solely responsible for how you use this on your own or others systems. We accept no liability for any losses or damages caused.
6 minutes reading time (1254 words)

Oracle Row Generators and Cardinality

I'm quite a fan of row generators. If I know I need a row for every day of the year for example, I generally avoid holding (and maintaining) a table of these dates, and instead will generate them. This is by no means a universal rule, so take each case on its merits.

select trunc(sysdate,'YYYY') + level -1 dt
  from dual
connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY');

You can see a few methods (of many) of generating rows on one of my previous posts. The issue with generating rows in this manner is that the optimizer estimates cardinalities based on the number of rows in the table. Which for dual is... 1.
So when we look at the execution plan for such a query, we see this.

SQL_ID  16r2my83pj187, child number 0
-------------------------------------
select /*+gather_plan_statistics*/ trunc(sysdate,'YYYY') + level -1 dt  
 from dual connect by level <= add_months(trunc(sysdate,'YYYY'),12) - 
trunc(sysdate,'YYYY')
 
Plan hash value: 1236776825
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |    365 |00:00:00.01 |       |       |          |
|   1 |  CONNECT BY WITHOUT FILTERING|      |      1 |        |    365 |00:00:00.01 |  2048 |  2048 | 2048  (0)|
|   2 |   FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
----------------------------------------------------------------------------------------------------------------

We thought we would get just 1 row back - but we in fact got 365. Once you start joining out to other tables, the magnitude of that soon becomes more prominent. Let's say I want to generate a row for each employee who is a manager, for each day of the year.


 

select /*+gather_plan_statistics*/ e.first_name, e.last_name, j.job_title, cal.dt 
  from employees e, jobs j,
       (select trunc(sysdate,'YYYY') + level -1 dt
          from dual
        connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY')) cal
 where j.job_id = e.job_id
   and j.job_title like '%Manager%';
SQL_ID  0smma673xxzb2, child number 0
-------------------------------------
select /*+gather_plan_statistics*/ e.first_name, e.last_name, 
j.job_title, cal.dt    from employees e, jobs j,        (select 
trunc(sysdate,'YYYY') + level -1 dt           from dual         connect 
by level <= add_months(trunc(sysdate,'YYYY'),12) - 
trunc(sysdate,'YYYY')) cal  where j.job_id = e.job_id    and 
j.job_title like '%Manager%'
 
Plan hash value: 706006494
 
-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |            |      1 |        |   5110 |00:00:00.01 |      31 |       |       |          |
|   1 |  NESTED LOOPS                     |            |      1 |      5 |   5110 |00:00:00.01 |      31 |       |       |          |
|   2 |   NESTED LOOPS                    |            |      1 |      6 |   5110 |00:00:00.01 |      22 |       |       |          |
|   3 |    MERGE JOIN CARTESIAN           |            |      1 |      1 |   2190 |00:00:00.01 |       9 |       |       |          |
|*  4 |     TABLE ACCESS FULL             | JOBS       |      1 |      1 |      6 |00:00:00.01 |       9 |       |       |          |
|   5 |     BUFFER SORT                   |            |      6 |      1 |   2190 |00:00:00.01 |       0 | 18432 | 18432 |16384  (0)|
|   6 |      VIEW                         |            |      1 |      1 |    365 |00:00:00.01 |       0 |       |       |          |
|   7 |       CONNECT BY WITHOUT FILTERING|            |      1 |        |    365 |00:00:00.01 |       0 |  2048 |  2048 | 2048  (0)|
|   8 |        FAST DUAL                  |            |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|*  9 |    INDEX RANGE SCAN               | EMP_JOB_IX |   2190 |      6 |   5110 |00:00:00.01 |      13 |       |       |          |
|  10 |   TABLE ACCESS BY INDEX ROWID     | EMPLOYEES  |   5110 |      6 |   5110 |00:00:00.01 |       9 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter("J"."JOB_TITLE" LIKE '%Manager%')
   9 - access("J"."JOB_ID"="E"."JOB_ID")

Due to the cardinality mis-estimate on our calendar generation (plan line 6) the optimzer has chosen an index access into employees which has resulted in the index being scanned 2190 times, and the employee table 5110 times based on the rowid's fetch from the previous index scan.

Had the optimizer known the calendar would return 365 rows instead of 1, it is unlikely to have chosen an index/nested loops access approach. But how can we provide that information? Well, we know that we are generating a row per day of the year, and we know that in general there are 365 days in the year. So we can use the cardinality hint in our calendar:

(select /*+cardinality(365)*/ trunc(sysdate,'YYYY') + level -1 dt
  from dual
connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY')) cal

which then yields a somewhat different plan where we don't :

SQL_ID  7vd1csauqf761, child number 0
-------------------------------------
select /*+gather_plan_statistics*/ e.first_name, e.last_name, 
j.job_title, cal.dt    from employees e, jobs j,        (select 
/*+cardinality(365)*/ trunc(sysdate,'YYYY') + level -1 dt           
from dual         connect by level <= 
add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY')) cal  
where j.job_id = e.job_id    and j.job_title like '%Manager%'
 
Plan hash value: 3120087941
 
---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                  |      1 |        |   5110 |00:00:00.02 |      17 |      2 |       |       |          |
|*  1 |  HASH JOIN                       |                  |      1 |   1953 |   5110 |00:00:00.02 |      17 |      2 |  1376K|  1376K| 1668K (0)|
|   2 |   VIEW                           | index$_join$_001 |      1 |    107 |    107 |00:00:00.01 |       8 |      2 |       |       |          |
|*  3 |    HASH JOIN                     |                  |      1 |        |    107 |00:00:00.01 |       8 |      2 |  1316K|  1316K| 1690K (0)|
|   4 |     INDEX FAST FULL SCAN         | EMP_NAME_IX      |      1 |    107 |    107 |00:00:00.01 |       4 |      1 |       |       |          |
|   5 |     INDEX FAST FULL SCAN         | EMP_JOB_IX       |      1 |    107 |    107 |00:00:00.01 |       4 |      1 |       |       |          |
|   6 |   MERGE JOIN CARTESIAN           |                  |      1 |    347 |   2190 |00:00:00.01 |       9 |      0 |       |       |          |
|*  7 |    TABLE ACCESS FULL             | JOBS             |      1 |      1 |      6 |00:00:00.01 |       9 |      0 |       |       |          |
|   8 |    BUFFER SORT                   |                  |      6 |    365 |   2190 |00:00:00.01 |       0 |      0 | 18432 | 18432 |16384  (0)|
|   9 |     VIEW                         |                  |      1 |    365 |    365 |00:00:00.01 |       0 |      0 |       |       |          |
|  10 |      CONNECT BY WITHOUT FILTERING|                  |      1 |        |    365 |00:00:00.01 |       0 |      0 |  2048 |  2048 | 2048  (0)|
|  11 |       FAST DUAL                  |                  |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("J"."JOB_ID"="E"."JOB_ID")
   3 - access(ROWID=ROWID)
   7 - filter("J"."JOB_TITLE" LIKE '%Manager%')

There are still some differences on the "MERGE JOIN CARTESIAN" step, however we are a lot closer than we were previously.

This of course is a small demonstration with very low rowcounts, however I witnessed a query at a client site this week which was generating rows for three years in a similar fashion to above, and the execution plan ended up doing a nested loops operation which was being executed over a billion times due to this exact issue... because the optimizer estimated it was going to get back 1 row instead of nearer 1100.

So whilst the cardinality hint is always discouraged from production code, I'd say when you are geneating a known number of rows in a row generator then it's pretty much essential.

Is your E-Business Suite ERP ready for GDPR?
Oracle 18c Available on Live SQL Now!
 

Comments

No comments made yet. Be the first to submit a comment
Already Registered? Login Here
Guest
Friday, 22 June 2018

Demonstration

Request a demo of our products here

REQUEST DEMO

Contact

Beyond Systems Limited

Barnett House, 53 Fountain Street,

Manchester M2 2AN United Kingdom

 

 

Tel:    +44(0)8450 940 998

Email:  contact@wegobeyond.co.uk