Tony’s Oracle Tips

September 4, 2010

Tony’s Tirade against TIMESTAMP WITH TIME ZONE

Filed under: Uncategorized — tonyhasler @ 11:34 pm

I have been thinking about posting this blog entry for some time, but the final straw came a few days ago and I must now strongly warn people off using this poorly thought-out datatype.

Before I begin let me briefly summarise the differences between the various datatypes that you can use in Oracle to store a timestamp:

DATE This was the only datatype that you could use prior to 10g and for general-purpose use it is still the most appropriate.  It does not, however, include any time zone component.  So if you copy a row that contains “04-SEP-2010 15:20” from a database in London across a database link to a database in New York, the value will be stored as “04-SEP-2010 15:20”.  The difference between two DATE values is a number of days (possibly including a fraction). The difference between two values of any of the new datatypes is an INTERVAL DAY TO SECOND, which is usually less convenient than the simple number you get with DATE.
TIMESTAMP This newer datatype is used by Oracle (exclusively, it seems) for data dictionary objects from 10g onwards.  Many people assume that they should use it in preference to DATE for general-purpose use, yet its only advantage is that it can store times accurate to a fraction of a second, whereas DATE values are rounded to the nearest second.
TIMESTAMP WITH LOCAL TIME ZONE This datatype has the benefit that it is “time zone aware”.  If you copy a value of “04-SEP-2010 15:20 +01:00” from a database in London across a database link to a database in New York, the value in both databases will be visible as “04-SEP-2010 15:20 +01:00” to a London client (or, more precisely, to any client with a session time zone of GB or similar), as before. However, the value in both databases will be seen as “04-SEP-2010 11:20 -04:00” by any client with a session time zone of “America/New_York” or similar.  This may be a little confusing, as the same data will appear to different clients in different ways, but there is no ambiguity.
TIMESTAMP WITH TIME ZONE This looks for all the world like the most powerful datatype. It stores the time zone along with the time, so the data is both unambiguous and presented consistently to clients regardless of their location.  So in our example the value would be “04-SEP-2010 15:20 +01:00” in both the London and New York databases and for both London and New York users.  The time zone reflects British Summer Time.
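The difference between the last two is easiest to see by changing the session time zone and re-querying. Here is a minimal sketch (the table and values are hypothetical, and the rendering assumes default NLS timestamp formats):

-- Hypothetical demo table with one column of each "time zone aware" type
CREATE TABLE tz_demo (t_ltz TIMESTAMP WITH LOCAL TIME ZONE,
                      t_tz  TIMESTAMP WITH TIME ZONE);

INSERT INTO tz_demo
     VALUES (TIMESTAMP '2010-09-04 15:20:00 +01:00',
             TIMESTAMP '2010-09-04 15:20:00 +01:00');

ALTER SESSION SET TIME_ZONE = 'America/New_York';

-- t_ltz is rendered as the session's local wall-clock time: 04-SEP-10 11.20.00
-- t_tz keeps the offset it was entered with:                04-SEP-10 15.20.00 +01:00
SELECT t_ltz, t_tz FROM tz_demo;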

Based on the above, many people are now using TIMESTAMP WITH TIME ZONE exclusively.  This is an exceptionally bad idea.  There are severe complications with this datatype and I would counsel against its use:

  • For general purpose use (such as logfiles and the like) stick with DATE
  • When you are copying data around different time zones use TIMESTAMP WITH
    LOCAL TIME ZONE.
  • Don’t bother with TIMESTAMP or TIMESTAMP WITH TIME ZONE

Let me begin with a basic usability issue that I have to live with every day at the moment.  I query data from a table called DRD_LOG that resides in a database in New York.  The table has a column called TS that is of type TIMESTAMP WITH TIME ZONE.  I would like to look for entries in the last hour by using a predicate “TS > SYSDATE-1/24”.
However, this doesn’t work.  Look at this example:

--
-- Enter data in the time zone of the database
--
CREATE TABLE tz_examples (c_date DATE, c_tsltz TIMESTAMP WITH LOCAL TIME ZONE,
c_tstz TIMESTAMP WITH TIME ZONE);

INSERT INTO tz_examples
SELECT SYSDATE - (ROWNUM / 48),
SYSTIMESTAMP - (ROWNUM / 48),
SYSTIMESTAMP - (ROWNUM / 48)
FROM DUAL
CONNECT BY LEVEL <= 1000;

COMMIT;
--
-- Now query the data from a time zone further East than the database
-- (you may have to change the next line depending on the location of your database)
--

ALTER SESSION SET TIME_ZONE ='+06:00';

SELECT *
FROM tz_examples
WHERE c_tstz > SYSDATE - 1 / 24;

The result is this:


Table created.
1000 rows created.
Commit complete.
Session altered.

C_DATE    C_TSLTZ                            C_TSTZ
--------- ---------------------------------- ----------------------------------
04-SEP-10 04-SEP-10 16.21.40.000000 +01:00   04-SEP-10 16.21.40.000000 +01:00
04-SEP-10 04-SEP-10 15.51.40.000000 +01:00   04-SEP-10 15.51.40.000000 +01:00
04-SEP-10 04-SEP-10 15.21.40.000000 +01:00   04-SEP-10 15.21.40.000000 +01:00
04-SEP-10 04-SEP-10 14.51.40.000000 +01:00   04-SEP-10 14.51.40.000000 +01:00
04-SEP-10 04-SEP-10 14.21.40.000000 +01:00   04-SEP-10 14.21.40.000000 +01:00
04-SEP-10 04-SEP-10 13.51.40.000000 +01:00   04-SEP-10 13.51.40.000000 +01:00
04-SEP-10 04-SEP-10 13.21.40.000000 +01:00   04-SEP-10 13.21.40.000000 +01:00
04-SEP-10 04-SEP-10 12.51.40.000000 +01:00   04-SEP-10 12.51.40.000000 +01:00
04-SEP-10 04-SEP-10 12.21.40.000000 +01:00   04-SEP-10 12.21.40.000000 +01:00
04-SEP-10 04-SEP-10 11.51.40.000000 +01:00   04-SEP-10 11.51.40.000000 +01:00
04-SEP-10 04-SEP-10 11.21.40.000000 +01:00   04-SEP-10 11.21.40.000000 +01:00  

11 rows selected.

I created a table with entries separated by half an hour, the most recent being 30 minutes ago.  You can see that I actually got back far more rows than I wanted.  I can cure this by typing more:

SELECT *
FROM tz_examples
WHERE c_tstz > SYSTIMESTAMP - NUMTODSINTERVAL (1, 'HOUR');

But I usually use the more concise:

SELECT *
FROM tz_examples
WHERE c_tstz+0 > SYSDATE - 1 / 24;

For an understanding of why this works, see my other blog entry here. This issue is specific to the TIMESTAMP WITH TIME ZONE datatype, but there are far worse issues with it than mere ease of use.  The CBO can get thoroughly confused.  Let us add a few indexes and gather stats:

CREATE INDEX i_date ON tz_examples(c_date);
CREATE INDEX i_tsltz ON tz_examples(c_tsltz);
-- The next index is useful in some circumstances
CREATE INDEX i_tsltz2 ON tz_examples(SYS_EXTRACT_UTC(c_tsltz));
CREATE INDEX i_tstz ON tz_examples(c_tstz);
BEGIN
   DBMS_STATS.gather_table_stats
                    (USER,
                     'TZ_EXAMPLES',
                     estimate_percent      => 100,
                     method_opt            => 'FOR ALL COLUMNS SIZE 1');
END;
/

Now let us have a look at a few execution plans with autotrace in SQL*Plus:

SQL> SET autotrace traceonly explain lines 180 pages 0
SQL> SELECT MAX (c_date)
  2    FROM tz_examples;

Execution Plan
----------------------------------------------------------
Plan hash value: 2930774790

-------------------------------------------------------------------------------------
| Id  | Operation                  | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |        |     1 |     8 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |        |     1 |     8 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| I_DATE |  1000 |  8000 |            |          |
-------------------------------------------------------------------------------------

SQL> SELECT MAX (c_tsltz)
  2    FROM tz_examples;

Execution Plan
----------------------------------------------------------
Plan hash value: 2609621962

--------------------------------------------------------------------------------------
| Id  | Operation                  | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |         |     1 |    11 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |         |     1 |    11 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| I_TSLTZ |  1000 | 11000 |            |          |
--------------------------------------------------------------------------------------

SQL> SELECT MAX (c_tstz)
  2    FROM tz_examples;

Execution Plan
----------------------------------------------------------
Plan hash value: 608072692

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |     1 |    13 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |             |     1 |    13 |            |          |
|   2 |   TABLE ACCESS FULL| TZ_EXAMPLES |  1000 | 13000 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

You can see that in the first two cases the index was used, specifically via the highly efficient MIN/MAX optimisation that I discussed at length here.  Unfortunately, no index was used in the last case.  Why? Because an index on a TIMESTAMP WITH TIME ZONE column is actually a function-based index on the function SYS_EXTRACT_UTC.  UTC stands for Universal Coordinated Time (someone please tell me why the abbreviation is not UCT?), in other words Greenwich Mean Time.  The long and the short of it is that the index can only be used if the function appears explicitly in the SQL statement, and even then only in 11g according to my tests.
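To illustrate, a rewrite along the following lines matches the hidden function-based index (a sketch only; per my tests, expect it to help only in 11g, and note that MAX then returns a plain UTC TIMESTAMP rather than a TIMESTAMP WITH TIME ZONE):

-- Ask for the maximum of the indexed expression itself so that the
-- MIN/MAX index scan on I_TSTZ becomes possible; the result is the
-- UTC rendition of the timestamp, not the original value
SELECT MAX (SYS_EXTRACT_UTC (c_tstz))
  FROM tz_examples;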

This next, related, issue affects TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE alike. Look at the cardinality estimates here:

SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_date > SYSDATE - 1/24;

Execution Plan
----------------------------------------------------------
Plan hash value: 2458437043

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    32 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |     1 |    32 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_DATE      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C_DATE">SYSDATE@!-.0416666666666666666666666666666666666667)

SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_tstz > SYSTIMESTAMP - NUMTODSINTERVAL (1, 'HOUR');

Execution Plan
----------------------------------------------------------
Plan hash value: 3545948392

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |    50 |  1600 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |    50 |  1600 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TSTZ      |     9 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SYS_EXTRACT_UTC("C_TSTZ")>SYS_EXTRACT_UTC(SYSTIMESTAMP(6)-INTERVAL'+0
              00000000 01:00:00.000000000' DAY(9) TO SECOND(9)))

SQL>

You can see that the cardinality estimate for the first query correctly uses the column statistics to identify the low number of rows that the query will return.  The second query falls back on the CBO's arbitrary built-in value of 5% that it uses for range-based predicates when no other information is available. When I saw this I thought the problem was that the presence of the SYS_EXTRACT_UTC function in the predicate precluded the use of the column statistics, but this is not the full story, as we will see shortly.

OK, so what was the issue that pushed me over the edge and persuaded me to write this blog?  Look at this:

SQL> SET autotrace traceonly explain
SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_tstz > TIMESTAMP '1999-04-15 8:00:00 -8:00';

Execution Plan
----------------------------------------------------------
Plan hash value: 2535155864

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |  1000 | 32000 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TZ_EXAMPLES |  1000 | 32000 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SYS_EXTRACT_UTC("C_TSTZ")>TIMESTAMP '1999-04-15
              16:00:00.000000000')

SQL>
SQL> SELECT *
  2    FROM tz_examples
  3   WHERE c_tstz >
  4            TO_TIMESTAMP_TZ ('1999-04-15 08:00:00 -8:00',
  5                             'YYYY-MM-DD HH24:MI:SS TZH:TZM');

Execution Plan
----------------------------------------------------------
Plan hash value: 3545948392

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |    50 |  1600 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |    50 |  1600 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TSTZ      |     9 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SYS_EXTRACT_UTC("C_TSTZ")>SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('1999-04-15
               08:00:00 -8:00','YYYY-MM-DD HH24:MI:SS TZH:TZM')))

SQL>

You see, whereas the TO_DATE built-in function is just Oracle's way of expressing a date literal (it predates the now-supported but restricted ANSI syntax) and Oracle understands very well how to deal with expressions like SYSDATE-1, TO_TIMESTAMP_TZ and expressions based on SYSTIMESTAMP are nowhere near as sophisticated.  In both 10g and 11g you have to use actual literals to get the cardinality estimate right.

And if this wasn't complicated enough, the use of bind variables creates even more confusion:

SQL> SET autotrace off
SQL>
SQL> BEGIN
  2     EXECUTE IMMEDIATE 'SELECT /* TAG1 */ * from tz_examples WHERE c_tstz > :B1'
  3                 USING
  4                       TO_TIMESTAMP_TZ
  5                             ('1999-04-15 00:00:00 +00:00',
  6                              'YYYY-MM-DD HH24:MI:SS TZH:TZM');
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL>
SQL> SELECT p.*
  2    FROM v$sql s,
  3         TABLE (DBMS_XPLAN.display_cursor (s.sql_id)) p
  4   WHERE sql_text LIKE 'SELECT /* TAG1%';
SQL_ID  36n9vmjmm0803, child number 0
-------------------------------------
SELECT /* TAG1 */ * from tz_examples WHERE c_tstz > :B1

Plan hash value: 3545948392

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| TZ_EXAMPLES |     1 |    32 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TSTZ      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TZ_EXAMPLES"."SYS_NC00004$">SYS_EXTRACT_UTC(:B1))

19 rows selected.

SQL>

What is going on here?  The cardinality estimate is totally wrong in 10g, although it is correct (1000) in 11g!

I should emphasise that although the performance impact of these various issues is minimal in this demonstration, they can have disastrous performance implications in real-life systems with large tables.

Stick to DATE if at all possible.


9 Comments »

  1. UTC is actually defined as Coordinated Universal Time which should be CUT. When the standard was being internationally agreed the English speaking nations wanted Coordinated Universal Time (CUT), the French speakers wanted Temps Universel Coordonne (TUC) and so a compromise for the abbreviation UTC was eventually agreed and this format matches previous time standards such as UT0, UT1 and UT2.

    Comment by Tony Sleight — November 12, 2010 @ 12:03 pm | Reply

    • Great! Thanks for clearing this up. I always wondered what the history was.

      Comment by tonyhasler — November 13, 2010 @ 4:17 pm | Reply

  2. Thanks for the post – I was wondering whether I was just being lazy by sticking with DATE all those years…

    I usually decipher UTC as “Universal Time Coordinate”, a universal coordinate along the time axis. It might not be accurate, but at least makes some sense :-)

    Cheers!
    Flado

    Comment by Flado — January 11, 2011 @ 9:31 am | Reply

  3. I am about to embark on the journey of heavily using TIMESTAMP WITH TIME ZONE. I have read your reasoning about why I should take a different route, but there is something the TSTZ offers me that nothing else does…

    That is the ability to retain the timezone (or tz offset) that was originally entered by the user.

    TSLTZ is no good because the value is shifted by client settings. But I often want to display a value in the “original/native” timezone that it was entered with. I lose that information unless I use the TSTZ datatype.

    So what am I to do!?

    + ms which for me is always ZLTZ obviously doesn’t

    Comment by DC — April 4, 2011 @ 6:51 pm | Reply

    • You are right that TSTZ is the only datatype that allows you to retain the “original” time zone of the client. If you really want this information remember to use CURRENT_TIMESTAMP rather than SYSTIMESTAMP for any code that specifies the current time.
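      For example (a sketch; the offsets you see depend on where your server runs and on your session settings):

      ALTER SESSION SET TIME_ZONE = 'America/New_York';

      -- SYSTIMESTAMP reports the database server's clock and offset;
      -- CURRENT_TIMESTAMP reports the time in the session's time zone,
      -- which is what you need if the client's "original" zone matters
      SELECT SYSTIMESTAMP, CURRENT_TIMESTAMP FROM DUAL;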

      However, I am curious as to why the time zone that the client is using is of any interest. Can you explain what the business need for this information is?

      Comment by tonyhasler — April 5, 2011 @ 6:49 am | Reply

      • We do a lot of global research, and when displaying reports to clients it is often important to display date/time values on the reports in the time zone that is consistent with the other data on the report which they relate to.

        We could just store a date/time w/o timezone and ensure that it is never converted or normalized to any other zone, however there are often instances where these values need to be used in calculations or are compared to dates of other records which may not be in the same zone.

        Thus we need to both retain and display native zone, as well as have the ability to compare universally consistent points in time.

        Comment by DC — April 8, 2011 @ 4:24 pm

  4. If all you need to do is ensure times are reported in a particular time zone then producing the report using TSLTZ will do what you want providing the session time zone is set correctly (either explicitly or implicitly) before you start. If, however, your report says something like “90% of people in New York start work before 8:00 a.m. EST and only 80% of people in London start work before 8:00 a.m. GMT” then you need TSTZ as changing the session time zone affects the display of every timestamp from TSLTZ in the session. There are numerous similar problems and it sounds like yours is one of them.

    As you say, using DATE or TIMESTAMP would solve this problem but also as you say this precludes comparison.

    So the good news is that you are now forewarned and forearmed: your business problem is very unusual and you should prepare whoever is doing your coding to be sensitive to the possibility of performance issues.

    Comment by tonyhasler — April 8, 2011 @ 10:57 pm | Reply


  6. […] posts timezone issues in more detail from Jonathan Lewis and Tony Hasler. For a quick example script, keep […]

    Pingback by More Date Mysteries: When Is Current SYS? » SQLfail — November 11, 2013 @ 9:01 am | Reply

