Tony’s Oracle Tips

January 23, 2011

MAXIMUM/MINIMUM Column Statistics – Part 1

Filed under: Uncategorized — tonyhasler @ 9:32 pm

I presented a talk at the 2010 UKOUG national conference entitled “Stabilising Statistics” (materials available here) that suggested setting the minimum and maximum values for column statistics to very low and very high values respectively. This technique contributes to execution plan stability. I will refer to this technique as VDV (very different values) for the rest of this note.

During that presentation Wolfgang Breitling posed the question “What happens if you set the maximum and minimum values to NULL”?

I couldn’t answer him at the time and I promised that I would post an answer on my blog.

You will have to wait for the answer a little bit more I am afraid because I must first post some principles that I discovered whilst researching the answer.

If we dive down into the data dictionary we can find a a table called HIST_HEAD$. This table contains the minimum and maximum values for column statistics. What is interesting is that there are two sets of minimum and maximum values. These pairs of values are MINIMUM/MAXIMUM (type NUMBER) and LOWVAL/HIVAL (type RAW(32)). As far as I can see, only datatypes CHAR,NCHAR, VARCHAR2, and NVARCHAR2 use the LOWVAL/HIVAL pair in cardinality calculations. All other datatypes use the MINIMUM/MAXIMUM pair (including RAW!).

Unfortunately, when you look at views such as USER_TAB_COLS all you see is the LOWVAL/HIVAL pair from sys.HIST_HEAD$ reported as LOW_VALUE/HIGH_VALUE. The other pair is completely hidden.

“So what”? You may ask. These values are equivalent aren’t they?

Well some of the time, yes. But not always.

Here is how you might set MIN/MAX column statistics yourself:

DROP TABLE t1;
CREATE TABLE t1 (a NUMBER, b DATE, c TIMESTAMP, d TIMESTAMP WITH LOCAL TIME ZONE,
 e RAW(32), f VARCHAR2(32),g NVARCHAR2(32),h CHAR(32),
i INTERVAL DAY TO SECOND,j INTERVAL YEAR TO MONTH, k ROWID,
l TIMESTAMP WITH TIME ZONE,m nchar(32));

DECLARE
   srec   DBMS_STATS.statrec;
BEGIN
   DBMS_STATS.set_table_stats (ownname            => USER,
                               tabname            => 'T1',
                               numrows            => 20000,
                               numblks            => 1000,
                               avgrlen            => 400,
                               no_invalidate      => FALSE);

   FOR r IN (SELECT *
               FROM user_tab_cols
              WHERE table_name = 'T1')
   LOOP
      srec.epc := 2;                       -- Two endpoints
      srec.bkvals := NULL;                  -- No histogram
      DBMS_STATS.prepare_column_values
                                (srec,
                                 DBMS_STATS.rawarray (
                                 hextoraw('000000000000000000000000000000'),
                                                      -- Minimum
                                 hextoraw('ffffffffffffffffffffffffffffff')
                                                       -- Maximum
                                 ));
      -- srec.novals := dbms_stats.numarray(-9.9e125,9.9e125) ; -- Special fix
      DBMS_STATS.set_column_stats
                                 (ownname       => USER,
                                  tabname       => 'T1',
                                  partname      => NULL,
                                  colname       => r.column_name,
                                  distcnt       => 2,
                                  density       => 0.5,
                                  nullcnt       => 0,
                                  srec          => srec,
                                  avgclen       => 100);
   END LOOP;
END;
/
set autotrace off
column lowval format a30
column hival format a30
select h.lowval,h.hival,h.minimum,h.maximum
from sys.obj$ o,sys.hist_head$ h
where o.name='T1'
  and o.owner# = userenv('SCHEMAID')
  and o.obj#=h.obj# ;
--
set autotrace traceonly explain
select * from t1 where a > 1e30 ;
select * from t1 where a > 1e37 ;
select * from t1 where a < 1e30 ;
select * from t1 where a < 1e37 ;
select * from t1 where a >= 1e30 ;
select * from t1 where a >= 1e37 ;
select * from t1 where a <= 1e30 ;
select * from t1 where a <= 1e37 ;

This code sets the columns statistics to VDV and examines the consequences.
This is what we get:

LOWVAL                         HIVAL                             MINIMUM    MAXIMUM
------------------------------ ------------------------------ ---------- ----------
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          0 1.3292E+36
13 rows selected.

SQL> --
SQL> set autotrace traceonly explain
SQL> select * from t1 where a > 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">1e30)

SQL> select * from t1 where a > 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   400 |   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   400 |   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">1e37)

SQL> select * from t1 where a < 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  3906K|   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  3906K|   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<1e30) SQL> select * from t1 where a < 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<1e37) SQL> select * from t1 where a >= 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">=1e30)

SQL> select * from t1 where a >= 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   400 |   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   400 |   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">=1e37)

SQL> select * from t1 where a <= 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  3906K|   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  3906K|   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<=1e30) SQL> select * from t1 where a <= 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<=1e37)
 

You can see that cardinality estimates are badly affected when very large values are supplied in the predicates. We can get much more intelligible – and consistent – results by setting NOVALS ourselves. This is what happens when you uncomment the line labeled ‘Special Fix’ above.

LOWVAL                         HIVAL                             MINIMUM    MAXIMUM
------------------------------ ------------------------------ ---------- ----------
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
000000000000000000000000000000 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF -9.90E+125 9.900E+125
13 rows selected.

SQL> --
SQL> set autotrace traceonly explain
SQL> select * from t1 where a > 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  3906K|   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  3906K|   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">1e30)

SQL> select * from t1 where a > 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  3906K|   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  3906K|   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">1e37)

SQL> select * from t1 where a < 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  3906K|   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  3906K|   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<1e30) SQL> select * from t1 where a < 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  3906K|   222   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  3906K|   222   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<1e37) SQL> select * from t1 where a >= 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">=1e30)

SQL> select * from t1 where a >= 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A">=1e37)

SQL> select * from t1 where a <= 1e30 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<=1e30) SQL> select * from t1 where a <= 1e37 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 20000 |  7812K|   223   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   | 20000 |  7812K|   223   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"<=1e37)

January 3, 2011

UKOUG national conference 2010

Filed under: Uncategorized — tonyhasler @ 5:16 pm

As I am sure the vast majority of readers of this blog will know, “Blog” stands for “Web Log” and reflects the fact that most blogs are diaries of one form or another.  Mine is one of a minority of blogs in the world (though probably not a minority fo Oracle blogs) that are used as a way of publishing purely technical material, rather than personal experiences.

This entry is a bit different in that, although the material is largely technical,it does indeed reflect some personal experiences.  Apologies to those of you who find the change of tone disappointing; it is not a sign of things to come.

The first, and most important, thing I have to discuss about this years conference concerns a question Wolfgang Breitling asked me during my “Stabilising Statistics” talk.  This presentation (material available for download here) focussed on improving execution plan stability without the use of stored outlines or SQL baselines.  One of the items I discussed was how execution plans changed as a result of bind variable values moving beyond the minimum and maximum values recorded in column statistics.  This question was of considerable interest to Wolfgang as he had himself presented a paper in 2009 on the same subject entitled “Seeding Statistics”.

Wolfgang’s original technique was to set the number of distinct values to 2 and the minimum and maximum values the same!  Of course this cannot happen in real life: if the minimum and maximum are the same then there is only one value.  Wolfgang reported that he got good, stable plans with this approach but my experiments suggested that setting the minimum value of the column to a very low value and the maximum to a very high value achieved better results.

Wolfgang’s question was this:  “What happens if you set the minimum and maximum values to NULL”?  Well, I knew that I had tried it and that the results were not satisfactory but I didn’t recall the specifics of my incomplete analysis.  I promised to blog an answer.  Well it turns out that the topic is quite complicated and Wolfgang and I have collaborated on some research over the holiday period and I will publish the results soon.

The second experience I wanted to share with you was an enlightening conversation I had with a few new friends in the pub.  The topic was Oracle security.  This important topic is something that I find rather dry and have not thought about as much as other aspects of Oracle database technology – such as performance.  I was talking to Martin Nash, Neil Johnson, Ben Thompson and others.  I suggested that if your data was sensitive in nature you had to buy Database Vault as DBAs would otherwise be able to see your data.  It was reminded that privileges such as ALTER DATABASE, ALTER USER, and the SELECT_CATALOG role did not allow a DBA to read any business data at all.  In fact, as long as you control the rare occasions when a database needs to be bounced (that does require SYSDBA) your DBAs should be able to do most if not all of their work without having access to any business data!  Limiting DBA privileges may not be as complete a solution to role segregation as Database Vault but when I thought about it I realised that it certainly goes further than I had previously realised.

Last but not least I want to mention the other event that I was involved in at the conference:  a debate with Jonathan Lewis entitled “Does Oracle Ignore Hints”?  One thing that I learned from Jonathan during the preparation for my debate was that the technical analysis in this entry was completely wrong.  I have posted a correction here.

Blog at WordPress.com.