Tony’s Oracle Tips

June 11, 2012

Pushing parameters into views a.k.a predicate pushing

Filed under: SQL Fun — tonyhasler @ 12:18 am

It is a very common and oft discussed issue. You have a complex view definition, possibly containing an analytic function, and when you SELECT from the view you can’t get any advantage from a WHERE clause. For an old, and lengthy, discussion of why Oracle “just can’t do it” you can see this article on Ask Tom. Well it turns out that there is something you can do after all. First, let us set the scene. We need a table and associated index with some values:

CREATE TABLE t1
AS
   WITH q1
        AS (    SELECT ROWNUM rn
                  FROM DUAL
            CONNECT BY LEVEL <= 100)
   SELECT MOD (a.rn, 100) grp, a.rn+b.rn fact_int, RPAD ('X', 100) vpad
     FROM q1 a, q1 b;

CREATE INDEX i1
   ON t1 (grp);

We have a table with 10,000 rows with 100 values of GRP. Now let us build our view with the awkward analytic query:

CREATE OR REPLACE VIEW v1
AS
     SELECT grp
           ,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
       FROM t1
   GROUP BY grp;

When we query this view with a specific value for GRP we get no index usage.

SELECT *
  FROM v1
 WHERE grp BETWEEN 50 AND 51;
 
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  9485 |   240K|    49   (3)| 00:00:01 |
|*  1 |  VIEW                | V1   |  9485 |   240K|    49   (3)| 00:00:01 |
|   2 |   WINDOW BUFFER      |      |  9485 |   240K|    49   (3)| 00:00:01 |
|   3 |    SORT GROUP BY     |      |  9485 |   240K|    49   (3)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |  9485 |   240K|    48   (0)| 00:00:01 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("GRP">=50 AND "GRP"<=51)

Note how our predicate is applied at id 1, after all 10,000 rows have been aggregated.
Wouldn’t it be nice if somehow we could convince the optimizer to transform our query into this:

WITH q1
     AS (  SELECT grp
                 ,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
             FROM t1
            WHERE grp BETWEEN 48 AND 51
         GROUP BY grp)
SELECT *
  FROM q1
 WHERE grp BETWEEN 50 AND 51;

For the moving average calculation we actually only need to look at the rows with values within the range being aggregated. Well, it turns out that you can rewrite your view to allow this providing you have a convenient way to identify the complete set of legal values for the predicated column (grp in our case). If the table being queried is a fact table in a data warehouse you may already have a suitable dimension table you can use. In our case we will build one:

CREATE TABLE grp_table (grp PRIMARY KEY)
ORGANIZATION INDEX
AS
   SELECT DISTINCT grp FROM t1;

I have made this an IOT given that (in my example) it has only one column and we plan to use it exclusively for unique primary key lookups. A heap dimension table with several columns is fine if it is reasonably small.
We also need to define a couple of types to reflect the result of our original query:

CREATE OR REPLACE TYPE pair_num AS OBJECT (n1 NUMBER, n2 NUMBER);
CREATE OR REPLACE TYPE pair_num_t AS TABLE OF pair_num;

In our case, our desired query returns two numeric values. We define our types generically so that we can reuse them for other views with similar result sets. Now redefine our view as follows:

CREATE OR REPLACE VIEW v1 (startgrp, endgrp, grp, mv_avg)
AS
   WITH q1 AS (SELECT grp startgrp FROM grp_table)
       ,q2 AS (SELECT grp endgrp FROM grp_table)
     SELECT startgrp, endgrp, n1 AS grp, n2 AS mv_avg
     FROM q1
         ,q2
         ,TABLE (CAST (MULTISET (  SELECT grp
                                         ,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
                                     FROM t1
                                    WHERE grp BETWEEN startgrp-2 AND endgrp
                                 GROUP BY grp) AS pair_num_t)) pairs
    WHERE n1 >= startgrp;

This view now contains two extra columns STARTGRP and ENDGRP that can be used in our WHERE clause to identify the subset of rows that we want. The new query and associated execution plan are as follows:

SELECT grp,mv_avg
  FROM v1
 WHERE startgrp = 50 AND endgrp = 51;
 
---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |    20 |   560 |    30   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                       |                   |    20 |   560 |    30   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                      |                   |     1 |    26 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN                | SYS_IOT_TOP_84132 |     1 |    13 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN                | SYS_IOT_TOP_84132 |     1 |    13 |     0   (0)| 00:00:01 |
|*  5 |   COLLECTION ITERATOR SUBQUERY FETCH|                   |    20 |    40 |    29   (0)| 00:00:01 |
|   6 |    VIEW                             |                   |    24 |   624 |     3   (0)| 00:00:01 |
|   7 |     WINDOW BUFFER                   |                   |    24 |   624 |     3   (0)| 00:00:01 |
|   8 |      SORT GROUP BY NOSORT           |                   |    24 |   624 |     3   (0)| 00:00:01 |
|*  9 |       FILTER                        |                   |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID  | T1                |    24 |   624 |     3   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN            | I1                |    43 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("GRP"=51)
   4 - access("GRP"=50)
   5 - filter(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)>=50 AND "GRP"<=SYS_OP_ATG(VALUE(KOKBF$),1,2,2))
   9 - filter(:B1-2<=:B2)
  11 - access("GRP">=:B1-2 AND "GRP"<=:B2)

Our new query has a small unnecessary overhead where we "look up" the STARTDATE and ENDDATE in our dimension table but on the other hand we have been able to use information about our predicate to restrict the rows aggregated to the point that we can take advantage of our index.
This technique of passing information from one rows source in a join to another is known as a lateral join and is available with the TABLE and XMLTABLE operators.
Note that we will get very undesirable results if a user does not specify a WHERE clause as now we will get results matching every combination of STARTDATE and ENDDATE where ENDDATE >= STARTDATE. If you want to protect against this you can write a pipelined function.

CREATE OR REPLACE VIEW v1 (startgrp, endgrp, grp, mv_avg)
AS
   WITH q1
        AS (  SELECT /*+ no_merge no_eliminate_oby */
                    grp startgrp
                FROM grp_table
            ORDER BY startgrp DESC)
       ,q2
        AS (  SELECT /*+ no_merge no_eliminate_oby */
                    grp endgrp
                FROM grp_table
            ORDER BY endgrp)
   SELECT /*+ leading(q1) use_nl(q2) */
         startgrp, endgrp, n1 AS grp, n2 AS mv_avg
     FROM q1, q2
         ,TABLE (my_package.my_pipelined_function (startgrp, endgrp)) pairs;

Note the order by clauses with associated hints will cause the pipelined function to be first invoked with a value of ENDDATE less than STARTDATE if no predicate is supplied. The package containing the piplelined function might look like this:

CREATE OR REPLACE PACKAGE my_package
AS
   FUNCTION my_pipelined_function (startgrp NUMBER, endgrp NUMBER)
      RETURN pair_num_t
      PIPELINED;
END my_package;

CREATE OR REPLACE PACKAGE BODY my_package
AS
   FUNCTION my_pipelined_function (startgrp NUMBER, endgrp NUMBER)
      RETURN pair_num_t
      PIPELINED
   IS
      CURSOR c1
      IS
         WITH q1
              AS (  SELECT grp
                          ,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
                      FROM t1
                     WHERE grp BETWEEN startgrp - 2 AND endgrp
                  GROUP BY grp)
         SELECT *
           FROM q1
          WHERE grp BETWEEN startgrp AND endgrp;
   BEGIN
      IF startgrp > endgrp
      THEN
         raise_application_error (-20000, 'STARTGRP must be specified and be less than ENDGRP');
      END IF;

      FOR r IN c1
      LOOP
         PIPE ROW (pair_num (r.grp, r.mv_avg));
      END LOOP;
   END my_pipelined_function;
END my_package;
Advertisements

3 Comments »

  1. This is so complicated! Wouldn’t it be easier to just not use the view in the first place?

    Comment by Brad — January 16, 2013 @ 10:37 pm | Reply

    • Brad,

      Yes, you could create a query that accesed the underlying tables using factored subqueries and avoding the view (as in fact I showed in the query following the line “Wouldn’t it be nice if somehow we could convince the optimizer to transform our query into this”). However, this would have to be done for each query; the purpose of any view is to hide the complexity from the caller. In my case, the callers didn’t have the technical ability to code it themselves and in any event they were accessing the view using a data visualisation tool that only supported the most basic of SQL constructs.

      Comment by tonyhasler — January 16, 2013 @ 11:34 pm | Reply

  2. Use context variables instead. Much easier!

    Comment by Martin Rose — June 9, 2015 @ 11:18 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: