September | 2015 | Tony's Oracle Tips

September 13, 2015

The importance of the GLOBAL_STATS statistic on partitioned tables

Filed under: Uncategorized — tonyhasler @ 9:22 pm

I am currently working in a team alongside a gentleman known as Roberto Rigliaco. You are unlikely to have heard of Roberto as he is not (yet) a blogger and not (yet) known on the conference circuit.

Nevertheless, Roberto has done some excellent work recently analyzing the use of object statistics in partitioned tables. I haven’t seen his findings published anywhere else and so I am glad to say that Roberto has accepted my offer to publish his findings on my blog.

It turns out that the setting of the GLOBAL_STATS column in ALL_TAB_STATISTICS is critical to the analysis. Don’t be confused into thinking that the column GLOBAL_STATS has anything to do with the level (table, partition or subpartition) that the statistics refer to. In fact, GLOBAL_STATS indicates whether the table statistics have been aggregated from partition level statistics (“NO”) or explicitly gathered or set (“YES”). At the partition level a value of GLOBAL_STATS=”NO” would mean that the partition level statistics were aggregated from subpartition level statistics.

I will begin by giving you the punch lines and then providing some examples to prove the points. First some intuitive stuff:

If more than one partition is referenced then table level statistics will be used if available. No partition or subpartition level statistics will be used.
If partition elimination occurs but the optimizer does not know what single partition will be used then table level statistics will be used if available as above.
If partition elimination occurs (and in the case of composite partitioned tables elimination of all but one subpartition has not occurred) and Oracle knows what partition will be used then the statistics for that partition will be used if available.
If subpartition elimination occurs and the optimizer knows neither the partition nor subpartition that will be used then table level statistics will be used if available.
If subpartition elimination occurs and Oracle knows both the partition and subpartition that will be used then subpartition level statistics will be used if available.

The word “knows” in the above list needs to be taken with a pinch of salt: the optimizer might use bind variable peeking to create an execution plan based on statistics for one (sub)partition and then use that plan on an entirely different (sub)partition later on. As always with bind variable peeking, the execution plan obtained through the explain plan statement may differ from the one seen at runtime.

Now comes the not so intuitive stuff:

If Oracle knows what partition will be used (and in the case of composite partitioned tables elimination of all but one subpartition has not occurred) but partition level statistics are not available then table level statistics will be used only if GLOBAL_STATS is set to “YES” for the table level statistics.
If subpartition elimination occurs and the optimizer knows what single partition will be used but not which single subpartition then table level statistics will be used if available regardless of the value of GLOBAL_STATS.
If Oracle knows what single subpartition will be used and subpartition statistics are not available, then table level statistics will be used only if GLOBAL_STATS is set to “YES” for the table level statistics. Interestingly, partition level statistics will not be used even if GLOBAL_STATS=”YES”.

Roberto provided me with a full suite of test cases to verify the observations above and I have checked them on 11gR2 and 12cR1. For the purpose of this blog I will provide just three cases.

Let us begin by creating our test table

CREATE TABLE stats_test
(
   part_key      INTEGER
  ,subpart_key   INTEGER
)
PARTITION BY RANGE (part_key)
   SUBPARTITION BY LIST (subpart_key)
   (PARTITION p1 VALUES LESS THAN (2)
       (SUBPARTITION p1_sp1 VALUES (1), SUBPARTITION p1_sp2 VALUES (2))
   ,PARTITION p2 VALUES LESS THAN (3)
       (SUBPARTITION p2_sp1 VALUES (1), SUBPARTITION p2_sp2 VALUES (2)));

BEGIN
   DBMS_STATS.set_table_stats (ownname    => USER
                              ,tabname    => 'STATS_TEST'
                              ,partname   => 'P1'
                              ,numrows    => 100);
   DBMS_STATS.set_column_stats (ownname    => USER
                               ,tabname    => 'STATS_TEST'
                               ,partname   => 'P1'
                               ,colname    => 'PART_KEY'
                               ,distcnt    => 5
                               ,density    => 1 / 5);
   DBMS_STATS.set_column_stats (ownname    => USER
                               ,tabname    => 'STATS_TEST'
                               ,partname   => 'P1'
                               ,colname    => 'SUBPART_KEY'
                               ,distcnt    => 5
                               ,density    => 1 / 5);
   DBMS_STATS.set_table_stats (ownname    => USER
                              ,tabname    => 'STATS_TEST'
                              ,partname   => 'P2'
                              ,numrows    => 200);
   DBMS_STATS.set_column_stats (ownname    => USER
                               ,tabname    => 'STATS_TEST'
                               ,partname   => 'P2'
                               ,colname    => 'PART_KEY'
                               ,distcnt    => 2
                               ,density    => 1 / 2);
   DBMS_STATS.set_column_stats (ownname    => USER
                               ,tabname    => 'STATS_TEST'
                               ,partname   => 'P2'
                               ,colname    => 'SUBPART_KEY'
                               ,distcnt    => 2
                               ,density    => 1 / 2);
END;
/

STATS_TEST has two partitions each of which has two subpartitions. The table is empty but I have set relevant statistics at the partition level only. This means that:

There are no subpartition level statistics

The partition level statistics have GLOBAL_STATS=YES
The table level stats have GLOBAL_STATS=NO because they have been aggregated from the partition level statistics.

Now let us run a couple of EXPLAIN PLAN statements and examine the associated execution plans. Let us start without using bind variables.

EXPLAIN PLAN
   FOR
      SELECT COUNT (*)
        FROM stats_test
       WHERE part_key = 1 AND subpart_key = 1;

SELECT *
  FROM TABLE (
          DBMS_XPLAN.display (NULL, NULL, 'basic +rows +partition +note'));
----------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |     1 |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE|            |     1 |     1 |     1 |
|   3 |    PARTITION LIST SINGLE|            |     1 |     1 |     1 |
|   4 |     TABLE ACCESS FULL   | STATS_TEST |     1 |     1 |     1 |
----------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Notice that the precise subpartition is known because literals have been used for the predicates. However, there are no subpartition statistics to use. Partition level statistics have been mysteriously bypassed and the table level statistics not used because GLOBAL_STATS=NO. The fact that no statistics have been used is confirmed by both the row count and the use of dynamic sampling.

Now let us replace one literal with a bind variable.

EXPLAIN PLAN
   FOR
      SELECT COUNT (*)
        FROM stats_test
       WHERE part_key = 1 AND subpart_key = :b1;

SELECT *
  FROM TABLE (
          DBMS_XPLAN.display (NULL, NULL, 'basic +rows +partition +note'));
----------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |     1 |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE|            |    12 |     1 |     1 |
|   3 |    PARTITION LIST SINGLE|            |    12 |   KEY |   KEY |
|   4 |     TABLE ACCESS FULL   | STATS_TEST |    12 |   KEY |   KEY |
----------------------------------------------------------------------

In this example, the optimizer knows what partition is used but not what subpartition. In this case, the optimizer uses the table level statistics regardless of the fact that GLOBAL_STATS=NO. Notice that the row count is 12, obtained by dividing the number of rows the statistics indicate are in the table as a whole by the number of distinct values of the two columns (300/5/5=12).

Finally, let us actually run the query with the bind variable in place:

DECLARE
   dummy       PLS_INTEGER;
   v_subpart   PLS_INTEGER := 1;
BEGIN
   SELECT COUNT (*)
     INTO dummy
     FROM stats_test
    WHERE part_key = 1 AND subpart_key = v_subpart;
END;
/

SELECT *
  FROM TABLE (
          DBMS_XPLAN.display_cursor ('c9uw6vg3204v0'
                                    ,NULL
                                    ,'basic +rows +partition +peeked_binds +note'));
SELECT COUNT (*) FROM STATS_TEST WHERE PART_KEY = 1 AND SUBPART_KEY =
:B1

Plan hash value: 3184952168

----------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |       |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE|            |     1 |     1 |     1 |
|   3 |    PARTITION LIST SINGLE|            |     1 |   KEY |   KEY |
|   4 |     TABLE ACCESS FULL   | STATS_TEST |     1 |   KEY |   KEY |
----------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :B1 (NUMBER): 1

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

The use of bind variable peeking means that the optimizer now knows that it wants to use statistics for subpartition P1_SP1 even though the actual subpartition at runtime might vary. The optimizer reverts to the use of dynamic sampling as with the use of literals.

I don’t know if this behaviour is deliberate or accidental but if it is deliberate the logic escapes me.