Tony’s Oracle Tips

September 4, 2010

Tony’s Tirade against TIMESTAMP WITH TIME ZONE

Filed under: Uncategorized — tonyhasler @ 11:34 pm

I have been thinking about posting this blog entry for some time but the final straw came a few days ago and I must now strongly warn people off using this poorly thought out datatype.

Before I begin let me briefly summarise the differences between the various datatypes that you can use in Oracle to store a timestamp:

DATE This was the only datatype that you could use prior to 10g and for general purpose use is still the most appropriate.  It does, however, not include any time zone component.  So if you copy a row that contains “04-SEP-2010 15:20” from a database in London accross a database link to a database in New York the value will be stored as “04-SEP-2010 15:20”.  The difference between two DATE values is a number of days (possibly including a fraction). The difference between two values of any of the new datatypes is an INTERVAL DAY TO SECOND. This is usually less convenient than a simple number as in the case of a DATE.
TIMESTAMP This new datatype is used by Oracle (exclusively it seems) for data dictionary objects from 10g onwards.  Many people assume that they should use this in preference to DATE for general purpose use. Yet its only advantage is that it can store times accurate to a fraction of a second, whereas you need to round DATE datatypes to the nearest second.
TIMESTAMP WITH LOCAL TIME ZONE This datatype has the benefit that it is “time zone aware”.  If you copy a value of “04-SEP-2010 15:20 +1:00” from a database in London across a database link to a database in New York the value in both databases will be visible as “04-SEP-2010 15:20 +1:00” to a London client (or more precisely to any client with a session time zone of GB or similar) as before. However, the value in both databases will be seen as “04-SEP-2010 11:20 -04:00” to any client with a session time zone of “America/New_York” or similar.  This may be a bit confusing as the same data will appear to different clients in different ways but there will be no ambiguity.
TIMESTAMP WITH TIME ZONE This looks for all the world as the most powerful datatype. It stores the time zone along with the time so the data is both unambiguous and presented in a consistent way to clients independent of the location of the client.  So in our example the value would be “04-SEP-2010 15:20 +01:00” in both London and New York databases and for both London and New York users.  The time zone reflects British summer time.

Based on the above many people are now using TIMESTAMP WITH TIME ZONE exclusively.  This is an exceptionally bad idea.  There are severe complications with this datatype and I would counsel against its use:

  • For general purpose use (such as logfiles and the like) stick with DATE
  • When you are copying data around different time zones use TIMESTAMP WITH
    LOCAL TIME ZONE.
  • Don’t bother with TIMESTAMP or TIMESTAMP WITH TIME ZONE

Let me begin with a basic usability issue that I have to live with every day at the moment.  I query data from a table called DRD_LOG that resides in a database in New York.  The table has a column called TS that is of type TIMESTAMP WITH TIME ZONE.  I would like to look for entries in the last hour by using a predicate “TS > SYSDATE-1/24“. 
However, this doesn’t work.  Look at this example:

--
-- Enter data in the time zone of the database
--
CREATE TABLE tz_examples (c_date DATE,c_tsltzTIMESTAMP WITH LOCAL TIME ZONE,c_tstz
TIMESTAMP WITH TIME ZONE);

INSERT INTO tz_examples
SELECT SYSDATE - (ROWNUM / 48),
SYSTIMESTAMP - (ROWNUM / 48),
SYSTIMESTAMP - (ROWNUM / 48)
FROM DUAL
CONNECT BY LEVEL <= 1000;

COMMIT ;
--
-- Now query the data from a time zone further East than the database
-- (you may have to change the next line depending on the location of your database)
--

ALTER SESSION SET TIME_ZONE ='+06:00';

SELECT *
FROM tz_examples
WHERE c_tstz > SYSDATE - 1 / 24;

The result is this:


Table created.
1000 rows created.
Commit complete.
Session altered.

C_DATE    C_TSLTZ                            C_TSTZ
--------- ---------------------------------- ----------------------------------
04-SEP-10 04-SEP-10 16.21.40.000000 +01:00   04-SEP-10 16.21.40.000000 +01:00
04-SEP-10 04-SEP-10 15.51.40.000000 +01:00   04-SEP-10 15.51.40.000000 +01:00
04-SEP-10 04-SEP-10 15.21.40.000000 +01:00   04-SEP-10 15.21.40.000000 +01:00
04-SEP-10 04-SEP-10 14.51.40.000000 +01:00   04-SEP-10 14.51.40.000000 +01:00
04-SEP-10 04-SEP-10 14.21.40.000000 +01:00   04-SEP-10 14.21.40.000000 +01:00
04-SEP-10 04-SEP-10 13.51.40.000000 +01:00   04-SEP-10 13.51.40.000000 +01:00
04-SEP-10 04-SEP-10 13.21.40.000000 +01:00   04-SEP-10 13.21.40.000000 +01:00
04-SEP-10 04-SEP-10 12.51.40.000000 +01:00   04-SEP-10 12.51.40.000000 +01:00
04-SEP-10 04-SEP-10 12.21.40.000000 +01:00   04-SEP-10 12.21.40.000000 +01:00
04-SEP-10 04-SEP-10 11.51.40.000000 +01:00   04-SEP-10 11.51.40.000000 +01:00
04-SEP-10 04-SEP-10 11.21.40.000000 +01:00   04-SEP-10 11.21.40.000000 +01:00  

11 rows selected.

I created a table with entries seperated by half an hour with the most recent being 30 minutes ago.  You can see that I actually got back far more rows than I wanted.  I can cure this by typing more:

SELECT *
FROM tz_examples
WHERE c_tstz > SYSTIMESTAMP - NUMTODSINTERVAL (1, 'HOUR');

But I usually do the more concise:

SELECT *
FROM tz_examples
WHERE c_tstz+0 > SYSDATE - 1 / 24;

For an understanding of why this works see my other blog entry here. This issue is specific to the TIMESTAMP WITH TIME ZONE datatype.  But there are far worse issues with TIMESTAMP WITH TIME ZONE than just ease of use.  The CBO can get thoroughly confused.  Let us add a few indexes and gather stats:

CREATE INDEX i_date ON tz_examples(c_date);
CREATE INDEX i_tsltz ON tz_examples(c_tsltz);
-- The next index is useful in some circumstances
CREATE INDEX i_tsltz2 ON tz_examples(SYS_EXTRACT_UTC(c_tsltz));
CREATE INDEX i_tstz ON tz_examples(c_tstz);
BEGIN
   DBMS_STATS.gather_table_stats
                    (USER,
                     'TZ_EXAMPLES',
                     estimate_percent      => 100,
                     method_opt            => 'FOR ALL COLUMNS SIZE 1');
END;
/

Now let us have a look at a few execution plans with autotrace in SQLPLUS:

SQL> SET autotrace traceonly explain lines 180 pages 0
SQL> SELECT MAX (c_date)
  2    FROM tz_examples;

Execution Plan
----------------------------------------------------------
Plan hash value: 2930774790

-------------------------------------------------------------------------------------
| Id  | Operation                  | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |        |     1 |     8 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |        |     1 |     8 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| I_DATE |  1000 |  8000 |            |          |
-------------------------------------------------------------------------------------

SQL> SELECT MAX (c_tsltz)
  2    FROM tz_examples;

Execution Plan
----------------------------------------------------------
Plan hash value: 2609621962

--------------------------------------------------------------------------------------
| Id  | Operation                  | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |         |     1 |    11 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |         |     1 |    11 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| I_TSLTZ |  1000 | 11000 |            |          |
--------------------------------------------------------------------------------------

SQL> SELECT MAX (c_tstz)
  2    FROM tz_examples;

Execution Plan
----------------------------------------------------------
Plan hash value: 608072692

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |     1 |    13 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |             |     1 |    13 |            |          |
|   2 |   TABLE ACCESS FULL| TZ_EXAMPLES |  1000 | 13000 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

You can see that in the first two cases the index was used.  Specifically, the highly efficient MIN/MAX optimisation that I discussed at length here.  Unfortunately no index was used in the last case.  Why? Because you will see that the index of a TIMESTAMP WITH TIME ZONE column is actually a function based index using the function SYS_EXTRACT_UTC.  UTC stands for Universal Coordinated Time (someone please tell me why the abbreviation is not UCT?) - in other words Greenwich Mean Time.  The long and the short of it is that the index could only be used if the function was used in the SQL statement - and even then only in 11g according to my tests.

This next, related, issue affects both TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE alike. Look at the cardinality estimates here:

SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_date > SYSDATE - 1/24;

Execution Plan
----------------------------------------------------------
Plan hash value: 2458437043

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    32 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |     1 |    32 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_DATE      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C_DATE">SYSDATE@!-.0416666666666666666666666666666666666667)

SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_tstz > SYSTIMESTAMP - NUMTODSINTERVAL (1, 'HOUR');

Execution Plan
----------------------------------------------------------
Plan hash value: 3545948392

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |    50 |  1600 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |    50 |  1600 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TSTZ      |     9 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SYS_EXTRACT_UTC("C_TSTZ")>SYS_EXTRACT_UTC(SYSTIMESTAMP(6)-INTERVAL'+0
              00000000 01:00:00.000000000' DAY(9) TO SECOND(9)))

SQL>

You can see that the cardinality estimate for the first query correctly uses the column statistics to identify the low number of rows that the query will return.  The second query uses the CBO built-in arbitrary value of 5% that it uses for range based predicates when no other information is available. When I saw this I thought the problem was that the presence of the SYS_EXTRACT_UTC function in the predicate precluded the use of the column statistics.  But this is not the full story as we will see shortly.

OK, so what was the issue that pushed me over the edge and persuaded me to write this blog?  Look at this:

SQL> SET autotrace traceonly explain
SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_tstz > TIMESTAMP '1999-04-15 8:00:00 -8:00';

Execution Plan
----------------------------------------------------------
Plan hash value: 2535155864

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |  1000 | 32000 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TZ_EXAMPLES |  1000 | 32000 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SYS_EXTRACT_UTC("C_TSTZ")>TIMESTAMP '1999-04-15
              16:00:00.000000000')

SQL>
SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_tstz >
  4            TO_TIMESTAMP_TZ ('1999-04-15 08:00:00 -8:00',
  5                             'YYYY-MM-DD HH24:MI:SS TZH:TZM');

Execution Plan
----------------------------------------------------------
Plan hash value: 3545948392

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |    50 |  1600 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |    50 |  1600 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TSTZ      |     9 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SYS_EXTRACT_UTC("C_TSTZ")>SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('1999-04-15
               08:00:00 -8:00','YYYY-MM-DD HH24:MI:SS TZH:TZM')))

SQL>

You see, whereas the TO_DATE built-in function is just Oracle's way of expressing a date literal (it preceded the now-supported but restricted ANSI syntax) and Oracle understands very well how to deal with expressions like SYSDATE-1, it seems that TO_TIMESTAMP_TZ and expressions based on SYSTIMESTAMP are nowhere near as sophisticated.  In both 10g and 11g you have to use actual literals to get the cardinality estimate correct.

And if this wasn't complicated enough, the use of bind variables creates even more confusion:

SQL> SET autotrace off
SQL>
SQL> BEGIN
  2     EXECUTE IMMEDIATE 'SELECT /* TAG1 */ * from tz_examples WHERE c_tstz > :B1'
  3                 USING
  4                       TO_TIMESTAMP_TZ
  5                             ('1999-04-15 00:00:00 +00:00',
  6                              'YYYY-MM-DD HH24:MI:SS TZH:TZM');
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL>
SQL> SELECT p.*
  2    FROM v$sql s,
  3         TABLE (DBMS_XPLAN.display_cursor (s.sql_id)) p
  4   WHERE sql_text LIKE 'SELECT /* TAG1%';
SQL_ID  36n9vmjmm0803, child number 0
-------------------------------------
SELECT /* TAG1 */ * from tz_examples WHERE c_tstz > :B1

Plan hash value: 3545948392

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |     1 |    32 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TSTZ      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TZ_EXAMPLES"."SYS_NC00004$">SYS_EXTRACT_UTC(:B1))

19 rows selected.

SQL>

What is going on here?  The cardinality estimate here is totally wrong here with 10g although it is correct (1000)  in 11g!

I should emphasise that although the performance impact of these various issues is minimal in this demonstration they can have disastrous performance implications in real life systems using large tables.

Stick to DATE if at all possible.

Advertisements

Create a free website or blog at WordPress.com.