Tony’s Oracle Tips

March 27, 2012

FIRST/LAST versus FIRST_VALUE/LAST_VALUE

Filed under: SQL Fun — tonyhasler @ 2:28 pm

As I promised in my last post, I want to discuss the difference between FIRST/LAST and FIRST_VALUE/LAST_VALUE.

Previously I discussed FIRST and LAST purely as aggregate functions but they can be used as analytic functions as well. FIRST_VALUE and LAST_VALUE can only be used as an analytic functions and in that respect at least they are somewhat less powerful.

Most of us take a while to get to grasp with what analytic functions are and how they differ from aggregate functions when we first learn about them but there are many articles and discussions on the web about the general topic. Here is one you might like.

Now that you know about analytic functions in general, let us return to look at an example of FIRST using the same table as last time:

SELECT c1, MIN (c3) KEEP (DENSE_RANK FIRST ORDER BY c1) OVER () "Analytic"
  FROM t1
 WHERE c1 <= 50;
 
 QUERY 1

What this query says is: After filtering out rows where C1 is NULL or has a value greater than 50, find the row or rows that have the minimum value of C1 and amongst those rows find the minimum value of C3. This value is used for the “Analytic” column; all rows in the final result set have the same value for the “Analytic” column.

Now let us look at a similar query using FIRST_VALUE:

SELECT c1, FIRST_VALUE (c3) OVER (ORDER BY c1) "Analytic"
  FROM t1
 WHERE c1 <= 50;
 
 QUERY 2
 

What this query says is: After filtering out rows where C1 is NULL or has a value greater than 50, find the row or rows that have the minimum value of C1 and amongst those rows pick an arbitrary row and then select the value of C3. This value is used for the “Analytic” column; all rows in the final result set have the same value for the “Analytic” column.

Let us first assume that the expression used in the ORDER BY clause is unique (as recommended in the documentation of FIRST_VALUE). In this case, the results of QUERY 2 are deterministic and absolutely the same as QUERY 1. Of course, in this case the aggregate function used in conjunction with FIRST could be MIN, MAX, or even AVG. The result is the same as there is only one row from which to pick.

However, if the ORDER BY CLAUSE does not represent a unique key then I would suggest you use the FIRST function in preference to FIRST_VALUE as it provides a way to explicitly deal with any duplicates.

The FIRST function may also provide better performance. Here is the execution plan for QUERY 1:

Here is the execution plan:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    50 |  1300 |     6   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER     |      |    50 |  1300 |     6   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |    50 |  1300 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

As you can see, we don’t need a sort as explained in my last post but we do need to buffer the rows as we don’t know what to put into the “Analytic” column for any row until we process the last row in our result set.

Now let us look at the execution plan for QUERY 2:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    50 |  1300 |     7  (15)| 00:00:01 |
|   1 |  WINDOW SORT       |      |    50 |  1300 |     7  (15)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |    50 |  1300 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

As you can see, with FIRST_VALUE we have a superfluous sort and as a consequence a higher cost.

You can’t always avoid a sort with FIRST. Consider this example:

  SELECT c1
        ,MIN (c3) KEEP (DENSE_RANK FIRST ORDER BY c1) OVER (PARTITION BY c2)
            "Analytic"
    FROM t1
   WHERE c1 <= 50
ORDER BY c2, c1 DESC;
 
 QUERY 3
 

This query is similar to QUERY 1 but this time there is a seperate value of "Analytic" for each value of C2. This PARTITION clause means we have to sort the data by C2. However, the ORDER BY clause requires that at some stage we need to order the rows by "C2, C1 DESC". This single sort is sufficient for both purposes as we can see in the execution plan:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    50 |  1950 |     6   (0)| 00:00:01 |
|   1 |  WINDOW SORT       |      |    50 |  1950 |     6   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |    50 |  1950 |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------

Let us look at the equivalent statement using FIRST_VALUE:

SELECT c1, FIRST_VALUE (c3) OVER (PARTITION BY c2 ORDER BY c1) "Analytic"
    FROM t1
   WHERE c1 <= 50
ORDER BY c2, c1 DESC;

QUERY 4


Here is the execution plan for QUERY 4:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    50 |  1950 |     8  (25)| 00:00:01 |
|   1 |  SORT ORDER BY      |      |    50 |  1950 |     8  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |      |    50 |  1950 |     8  (25)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |    50 |  1950 |     6   (0)| 00:00:01 |
----------------------------------------------------------------------------

Now we see an extra sort. The reason is that the FIRST_VALUE function requires the data sorted by the composite sort key "c2,c1" and so needs to be reordered by "c2,c1 DESC" for the final result.

Of course, all the above observations can be applied to LAST and LAST_VALUE.

So, is there ever a need for FIRST_VALUE and LAST_VALUE? Well consider this query:

SELECT c1
      ,FIRST_VALUE (
          c3)
       OVER (PARTITION BY c2
             ORDER BY c1
             ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
          "Analytic"
  FROM t1
 WHERE c1 <= 50;
 
QUERY 5

This time FIRST_VALUE is being used almost like LAG(C3,3) in that it is picking up the value of C3 "three rows up" so to speak. The difference is that LAG(c3,3) returns NULL for the first three rows and FIRST_VALUE doesn't. There isn't a way to code this expression using FIRST because FIRST doesn't support either an ORDER BY clause or a windowing clause.

Whilst we are on the topic of windowing clauses, the implicit and unchangeable window clause for the FIRST and LAST functions is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, in other words all rows in our partition. For FIRST_VALUE and LAST_VALUE the default but changeable windowing clause is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, in other words we exclude rows after the current one. Dropping rows off the bottom of a list makes no difference when we are looking for the first row in the list (FIRST_VALUE) but it does make a difference when we are looking for the last row in the list (LAST_VALUE) so you will usually need either to specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING explicitly when using LAST_VALUE or just use FIRST_VALUE and reverse the sort order.

In conclusion, I would recommend that you only ever use FIRST_VALUE or LAST_VALUE when you need a windowing function other than ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. When FIRST or LAST are available the performance is never worse and frequently better than the equivalent FIRST_VALUE or LAST_VALUE construct. In addition, FIRST and LAST avoid the possibility of non-deterministic results when non-unique ORDER BY clauses are supplied.

Advertisements

2 Comments »

  1. Tony,

    a little late but I just wanted to thank you. These were two very fine articles on FIRST / LAST and FIRST_VALUE / LAST_VALUE. Very informative and therefore much appreciated!

    Thanks,
    Randolf

    Comment by Randolf Geist — April 5, 2012 @ 9:43 am | Reply

    • Thanks for the feedback. You are welcome!

      Comment by tonyhasler — April 7, 2012 @ 7:29 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: