Tony’s Oracle Tips

September 8, 2009

INDEX FULL SCAN (MIN/MAX)

Filed under: SQL Fun — tonyhasler @ 8:39 am

I came across some interesting behaviour with the little-discussed “INDEX FULL SCAN (MIN/MAX)” operation.
First, let us set up an example:

create table t1 (n1 number,n2 number,c1 char(100));

insert into t1 select rownum,1000-rownum,rownum from dual connect by level <= 1000;

create index t1_i on t1(n1,n2,c1) pctfree 99;

exec dbms_stats.gather_table_stats(null,'T1', -
estimate_percent=>100,no_invalidate=>false,cascade=>true) ;

The index has been set up with pctfree 99 to ensure that each entry uses a whole block.
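
As a quick sanity check (not part of the original post, just a sketch against the objects created above), we can confirm that the index really does have roughly one entry per leaf block, and three levels, by looking at its optimizer statistics; BLEVEL counts branch levels, so a value of 2 corresponds to a three-level index:

SQL> select blevel, leaf_blocks, num_rows from user_indexes where index_name = 'T1_I';

With pctfree 99 we would expect LEAF_BLOCKS to be close to 1000 and BLEVEL to be 2, which ties in with the consistent gets figures discussed below.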

Let us autotrace a query. This is the second run to get rid of parsing overhead:

SQL> select max(n1) from t1 ;

   MAX(N1)
----------
      1000

Execution Plan
----------------------------------------------------------
Plan hash value: 1582015564

-----------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |     4 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |      |     1 |     4 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T1_I |  1000 |  4000 |            |          |
-----------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        334  bytes sent via SQL*Net to client
        338  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The 3 consistent gets show that we have traversed the three levels of the index to identify the maximum value. It is clear that the term “FULL SCAN” is a bit misleading, but other than that, so far so good. Now let us add a predicate on the same column:

SQL> select max(n1) from t1 where n1 < 1000;

   MAX(N1)
----------
       999

Execution Plan
----------------------------------------------------------
Plan hash value: 3815976388

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |      |     1 |     4 |            |          |
|   2 |   FIRST ROW                  |      |   999 |  3996 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I |   999 |  3996 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("N1"<1000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        335  bytes sent via SQL*Net to client
        338  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

There are a few things to note at this point. Firstly, we see that the operation has changed from a “FULL SCAN” to a “RANGE SCAN”, and that cardinality and cost estimates have appeared for the scan.

It looks like the cardinality estimate is based on the number of rows that satisfy the access predicate. However, what seems to have happened is that we have identified the point in the index where n1 = 1000 and then performed a descending scan of the index from there. A new operation, “FIRST ROW”, has also appeared in the plan; perhaps its purpose is to halt the scan as soon as we hit the first suitable row.
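
One way to picture the combination of the MIN/MAX scan and the FIRST ROW operation (this rewrite is not from the original post; it is just an illustration of the idea) is as the index-driven equivalent of the following query, which walks the matching n1 values in descending order and stops as soon as the first row is returned:

SQL> select n1 from (select n1 from t1 where n1 < 1000 order by n1 desc) where rownum = 1;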

Perhaps the most interesting observation is that, although the presence of the access predicate means we have to do more work, the cost has actually reduced.

Incidentally, we could have used the MIN function and an ascending scan just as easily.  However, this operation will not work at all, for MIN or MAX, for indexes based on a function of the leading column of the index.  This includes descending indexes.
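
For example (again a sketch, not output from the original post), the MIN equivalent of the earlier query would be expected to use an ascending (MIN/MAX) scan, whereas a descending index on n1 is internally a function-based index on SYS_OP_DESCEND(n1) and so does not qualify for the optimisation:

-- Expected to show an ascending (MIN/MAX) scan, per the discussion above
select min(n1) from t1 where n1 > 1;

-- A descending index stores a function of n1, so a table indexed only in this
-- way cannot use the MIN/MAX optimisation for min(n1) or max(n1)
create index t1_i_desc on t1(n1 desc);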

Now let us look at the case where we supply a predicate on a column other than n1:

SQL> select max(n1) from t1 where n2 = 900;

   MAX(N1)
----------
       100

Execution Plan
----------------------------------------------------------
Plan hash value: 1357677816

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |     8 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |      |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |      |     1 |     8 |     3   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| T1_I |     1 |     8 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("N2"=900)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        903  consistent gets
          0  physical reads
          0  redo size
        334  bytes sent via SQL*Net to client
        338  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Once again we seem to be performing a descending scan of the index until we get a row that matches. On this occasion a full table scan would have been better, but once again our filter predicate has reduced rather than increased the estimated cost.

We can always combine predicates on the non-leading columns and still use this operation; a sketch of such a query appears immediately below. Sometimes we can also combine predicates on the leading column with predicates on other columns, but only under specific circumstances.
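
For instance (an untested sketch rather than output from the original post), a query with predicates on n2 and c1 only should still be able to use the MIN/MAX scan, with both predicates applied as filters during the descending walk of the index:

select max(n1) from t1 where n2 = 900 and c1 = '100';

And here is the case of combining a predicate on the leading column with one on another column: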

SQL> select max(n1) from t1 where n1 < 900 and n2 >= 900;

   MAX(N1)
----------
       100

Execution Plan
----------------------------------------------------------
Plan hash value: 3815976388

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |     8 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |      |     1 |     8 |            |          |
|   2 |   FIRST ROW                  |      |    90 |   720 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I |    90 |   720 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("N1"<900)
       filter("N2">=900)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        803  consistent gets
          0  physical reads
          0  redo size
        334  bytes sent via SQL*Net to client
        338  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Here we have traversed down to the point where n1 = 900 and then performed a descending scan until we reached a row where n2 >= 900. However, if we change the predicate on n2 to n2 = 900, the operation suddenly becomes illegal for some reason:

SQL> select /*+ index(t1) */ max(n1) from t1 where n1 < 900 and n2 = 900;

   MAX(N1)
----------
       100

Execution Plan
----------------------------------------------------------
Plan hash value: 3963813734

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     8 |   902   (0)| 00:00:11 |
|   1 |  SORT AGGREGATE   |      |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I |     1 |     8 |   902   (0)| 00:00:11 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N2"=900 AND "N1"<900)
       filter("N2"=900)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        902  consistent gets
          0  physical reads
          0  redo size
        334  bytes sent via SQL*Net to client
        338  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Quite a bit to this, isn’t there?

One last thing: there do not appear to be specific hints for these operations. There are hints like INDEX_FFS (fast full scan), INDEX_RS_ASC (range scan) and so on for most index operations. However, if we try to identify the operation from the plan’s outline data we just get the “old-fashioned” INDEX hint:

SQL> select * from table(dbms_xplan.display(null,null,'outline')) ;

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Plan hash value: 1582015564

-----------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |     4 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |      |     1 |     4 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T1_I |  1000 |  4000 |            |          |
-----------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1" "T1"."N2" "T1"."C1"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */


4 Comments »

  1. 1) “Perhaps the most interesting observation is that, although the presence of the access predicate means we have to do more work, the cost has actually reduced.”

    My current understanding is that it is not valid to compare costs across different queries (even if you alter a query only by adding a hint). In general a better plan will have a lower cost, but you cannot rely on this metric. The metric is really there for the CBO to choose between alternative plans for a specific query, not to compare plans generated for different queries. You really have to look at metrics like consistent gets and actual execution time when comparing different queries.

    2) “On this occasion a full table scan would have been better”

    Why? Wouldn’t a FTS have required 1000 consistent gets, while this plan did it with 903?

    Comment by James Sinnott — September 8, 2009 @ 4:51 pm | Reply

    • James,

      You are right in saying that the purpose of a cost estimate is to enable the CBO to choose between alternative plans for the same statement. There is, theoretically then, no requirement for costs across queries to correlate. However, in practice the estimate is supposed to correlate with the number of single-block reads (or the equivalent).

      First let us consider the case of two queries that differ only in respect of the presence or absence of a hint.

      When you add a hint you are constraining the CBO from selecting any plan it wants. As a result, when you add a hint you will normally get either a) an identical plan to the one that the CBO would have come up with on its own, or b) a different plan with a cost that is greater than or equal to that of the unhinted one. The argument is that the CBO would have picked a lower-cost plan if one were available without the need for a hint.

      A notable exception is the bushy join that I describe here:

      http://tonyhasler.wordpress.com/2008/12/27/bushy-joins/

      Now let us return to the “INDEX FULL/RANGE SCAN (MIN/MAX)” costings. The inappropriate reduction in cost when you add a filter predicate is important because the estimated cost is wildly different from the number of blocks that need to be accessed. The full table scan is much cheaper because it is only the index that spans 1000 blocks, not the table. Unfortunately, the CBO picked the worse-performing plan as a result of the poor cost estimate. Look at the performance of the query when a FULL hint is added:

      SQL> select /*+ full(t1) */ max(n1) from t1 where n2 = 900;
      
         MAX(N1)
      ----------
             100
      
      
      Execution Plan
      ----------------------------------------------------------
      Plan hash value: 3724264953
      
      ---------------------------------------------------------------------------
      | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
      ---------------------------------------------------------------------------
      |   0 | SELECT STATEMENT   |      |     1 |     8 |     6   (0)| 00:00:01 |
      |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
      |*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |     6   (0)| 00:00:01 |
      ---------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         2 - filter("N2"=900)
      
      
      Statistics
      ----------------------------------------------------------
                0  recursive calls
                0  db block gets
               23  consistent gets
                0  physical reads
                0  redo size
              334  bytes sent via SQL*Net to client
              338  bytes received via SQL*Net from client
                2  SQL*Net roundtrips to/from client
                0  sorts (memory)
                0  sorts (disk)
                1  rows processed
      

      As you can see, the number of blocks that needed to be accessed was much smaller.

      Comment by tonyhasler — September 9, 2009 @ 10:33 am | Reply

  2. [...] was used.  Specifically, the highly efficient MIN/MAX optimisation that I discussed at length here.  Unfortunately no index was used in the last case.  Why? Because you will see that the index of [...]

    Pingback by Tony’s Tirade against TIMESTAMP WITH TIME ZONE « Tony’s Oracle Tips — September 4, 2010 @ 11:34 pm | Reply

  3. [...] through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index [...]

    Pingback by Cost – again « Oracle Scratchpad — January 10, 2011 @ 6:50 pm | Reply

