Tony’s Oracle Tips

January 17, 2010

Cardinality estimates in the CBO

Filed under: Uncategorized — tonyhasler @ 9:58 pm

There is a rather fundamental point about cardinality and cost calculation in the CBO that seems to have escaped the public attention.

Even if you, like I, have bought Jonathan Lewis’s book on the CBO (e.g., from Amazon) you may still be oblivious. Jonathan assures me, however, that the detail in this blog will be covered in his next book.

Consider the following script that joins two identical tables:


set linesize 180
set pages 0
DROP TABLE t1;
DROP TABLE t2;

CREATE TABLE t1 (n1 INTEGER);

INSERT INTO t1
       SELECT   ROWNUM
         FROM   DUAL
   CONNECT BY   LEVEL  USER
                                   ,tabname => table_name
                                   ,partname => NULL
                                   ,estimate_percent => 100
                                   ,block_sample => TRUE
                                   ,method_opt => 'for all columns size 1'
                                   ,degree => NULL
                                   ,granularity => 'default'
                                   ,cascade => TRUE);
   END;
BEGIN
   my_gather('T1');
   my_gather('T2');
END;
/

SELECT   /*+ gather_plan_statistics leading(t2) use_nl(t1) */
         COUNT( * )
  FROM   t1, t2
 WHERE   t1.n1 = t2.n1;

SELECT   * FROM table(DBMS_XPLAN.display_cursor(NULL, NULL, 'IOSTATS LAST'));

The execution plan that results on 10.2.0.4 is:


--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.78 |     100K|
|   2 |   NESTED LOOPS      |      |      1 |    100K|    100K|00:00:01.00 |     100K|
|   3 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.10 |     157 |
|*  4 |    INDEX RANGE SCAN | I_T1 |    100K|      1 |    100K|00:00:00.57 |     100K|
--------------------------------------------------------------------------------------

You will see that the estimated number of rows in operation 4 is 1 but the actual number of rows is 100,000. Now in addition to buying Jonathan Lewis’s book I have both read and taught the concept of Tuning by Cardinality Feedback a tuning methodology invented by Wolfgang Breitling many moons ago. His assertion is that when the estimated and actual cardinalities of an operation in an explain plan differ by several orders of magnitide it is an indication that the CBO is working from a false set of assumptions.

In this case, however, the CBO is not remotely confused. The point is that when it comes to estimated costs and cardinalities the figures reflect a single execution of the operation and the actual figures reflect the total for all executions of the operation. Because T2 has 100,000 rows the index range scan will be executed 100,000 times (as confirmed by the STARTS column above) and each execution will return exactly one row.

So if you want to use Wolfgang’s methodology you need to divide the column LAST_OUTPUT_ROWS in V$SQL_PLAN_STATISTICS_ALL by LAST_STARTS to get a meaningful comparison with CARDINALITY.


WITH X
       AS (SELECT   sql_id
             FROM   v$sql
            WHERE   sql_text LIKE
                       'SELECT   /*+ gather_plan_statistics leading(t2) %'
                    AND ROWNUM = 1)
  SELECT   DEPTH
          ,LPAD(' ', 2 * DEPTH) || operation operation
          ,options
          ,object_name
          ,last_starts
          ,CARDINALITY "EST ROWS"
          ,last_output_rows "ACTUAL ROWS"
          ,last_output_rows / last_starts "ACTUAL ROWS PER OP"
    FROM      v$sql_plan_statistics_all
           NATURAL JOIN
              x
ORDER BY   id;

The output is:



     DEPTH OPERATION                      OPTIONS    OBJECT_NAME LAST_STARTS   EST ROWS ACTUAL ROWS ACTUAL ROWS PER OP
---------- ------------------------------ ---------- ----------- ----------- ---------- ----------- ------------------
         1   SORT                         AGGREGATE                        1          1           1              1
         2     NESTED LOOPS                                                1     100000      100000         100000
         3       TABLE ACCESS             FULL       T2                    1     100000      100000         100000
         3       INDEX                    RANGE SCAN I_T1             100000          1      100000              1

Incidentally, Jonathan has advertised this himself, albeit more briefly, in his blog here

Identifying the SQL_ID of static SQL in PL/SQL blocks

Filed under: Uncategorized — tonyhasler @ 8:34 pm

I was asked recently if it was possible for a PL/SQL block to identify information about SQL statements the block itself was running. For example, it might be nice during development of a rapidly changing piece of code to log the text and/or execution statistics of a piece of static SQL embedded in the block. 

All of this would be possible were it possible to identify the SQL_ID and CHILD_NUMBER of the static SQL you are interested in. To do this you need to understand how the following four columns in V$SESSION are populated: 

  • SQL_ID
  • SQL_CHILD_NUMBER
  • PREV_SQL_ID
  • PREV_CHILD_NUMBER

The basic idea is that the first two columns are supposed to relate to the current SQL statement and the last two to the previous statement. However, these terms are a little ambiguous when you consider that an anonymous PL/SQL block is considered a SQL statement and that, therefore, when a PL/SQL statement executes static SQL there are always at least two ‘current’ SQL statements. Indeed, when we consider recursive SQL and cursors there could be many ‘current’ SQL statements running at the same time. The rules that I have been able to demonstrate are: 

  • When a SQL statement begins ‘running’ (I can’t be much more precise than that) or regains control from another SQL statement it sets the SQL_ID and SQL_CHILD_NUMBER columns.
  • When a SQL statement completes it sets the PREV_SQL_ID and PREV_CHILD_NUMBER columns.
  • When an invalid SQL statement is issued the SQL_ID and SQL_CHILD_NUMBER are set to NULL.

Consider this PL/SQL block: 

DECLARE
   x   NUMBER;
BEGIN
   SELECT   1 INTO x FROM DUAL;

   DBMS_LOCK.sleep(10);
END;
/

When the PL/SQL block starts the SQL_ID column of V$SESSION references the entire block. While the embedded SELECT statement is running the SQL_ID column is changed to reference the SELECT statement but the PREV_SQL_ID column is unchanged. When the SELECT statement completes it sets PREV_SQL_ID. The PL/SQL block then regains control and sets SQL_ID back to reference itself. This is what you will see whilst the sleep is running because PL/SQL functions like DBMS_LOCK.SLEEP do not change the SQL_ID (unless, of course, they contain embedded SQL themselves). 

When the PL/SQL block as a whole completes the PREV_SQL_ID column is set to the same value as the SQL_ID column i.e., to reference the PL/SQL block. 

With this understanding we can now code a function to capture information about an embedded SQL statement in a PL/SQL block. The code will look something like this: 

CREATE OR REPLACE PACKAGE BODY DRD.SQL_UTIL
AS
   FUNCTION GET_LAST_SQLTEXT
      RETURN CLOB
   IS
      l_sql_text   CLOB;
   BEGIN
      SELECT   sql_fulltext
        INTO   l_sql_text
        FROM      v$sql sq
               JOIN
                  v$session se
               ON     sq.sql_id = se.prev_sql_id
                  AND se.prev_child_number = sq.child_number
                  AND sid = SYS_CONTEXT('USERENV', 'SID');

      RETURN l_sql_text;
   END;
END SQL_UTIL;
/

Let us try and use this: 

DECLARE
   x   NUMBER;
BEGIN
   select   1 INTO x FROM dual;
   DBMS_OUTPUT.put_line(sql_util.get_last_sqltext);
END;
/

The output from this PL/SQL block will be


SELECT 1 FROM DUAL

This is because the PL/SQL compiler always uppercases the statement and because the ‘INTO x’ is a PL/SQL specific extension to the SQL statement and not part of the SQL statement proper. 

For cursors you have to wait until the cursor is closed: 

DECLARE    CURSOR x    IS       SELECT /*+ tag1 */    dummy FROM DUAL;
    y        DUAL.dummy%TYPE;
BEGIN
   OPEN x; 
   LOOP
      FETCH x INTO y;
      EXIT;
   END LOOP; 

   CLOSE x; 

   DBMS_OUTPUT.put_line(sql_util.get_last_sqltext);
END;
/ 

The output of this block is:

SELECT /*+ tag1 */ DUMMY FROM DUAL

This demonstrates another cool point. Although most comments are removed from static SQL, you can use a tag that looks like a hint to help you identify your code uniquely.

January 16, 2010

Converting a date to a day of the week

Filed under: Uncategorized — tonyhasler @ 2:32 pm

I try and avoid making blog entries about discoveries I make on the web. After all, why duplicate information?

However, on this occasion Tanel Poder suggested I did after he and I both faced the same issue within days of each other. What is even more bizarre is we both made the same “mistake” whilst developing the same code!!!

We were both writing code to filter performance information based on “windows” that enabled us to eliminate weekend and overnight periods.

The code, unsurprisingly, required us to determine the day of the week for a given date.  The obvious way to do this is:

SELECT TO_CHAR(date '2010-01-16','D') FROM dual;

However, the results of this expression are non-deterministic. If the setting of NLS_TERRITORY is AMERICA then you get 7, indicating the last day of the week i.e. Saturday. However, for some reason Oracle believes that when NLS_TERRITORY is set to UNITED KINGDOM that Monday should be considered the start of the week and so you get 6. I do not know why this is as I was brought up in the UK to believe that Sunday was the first day of the week!

The problem is that NLS_TERRITORY  is set for the Oracle client installation, not just the server so people in different parts of the world get different results for the same query. You could, of course, change the settings for the session explicitly. You might also think that you could use the third argument to TO_CHAR to explicitly specify the NLS_TERRITORY for that particular call. However, despite references to the contrary in the documentation this doesn’t work.

I discovered a neat solution to this problem in Laurent Schneider’s blog here.

SELECT MOD(date '2010-01-16' - date '1000-01-01',7) FROM dual;

In general, Monday will be 0, Tuesday will be 1 and so on with Sunday being 6. So in our case, Saturday will be 5 regardless of the setting of NLS_TERRITORY.

Blog at WordPress.com.