It is a very common and oft discussed issue. You have a complex view definition, possibly containing an analytic function, and when you SELECT from the view you can’t get any advantage from a WHERE clause. For an old, and lengthy, discussion of why Oracle “just can’t do it” you can see this article on Ask Tom. Well it turns out that there is something you can do after all. First, let us set the scene. We need a table and associated index with some values:
CREATE TABLE t1
AS
WITH q1
AS ( SELECT ROWNUM rn
FROM DUAL
CONNECT BY LEVEL <= 100)
SELECT MOD (a.rn, 100) grp, a.rn+b.rn fact_int, RPAD ('X', 100) vpad
FROM q1 a, q1 b;
CREATE INDEX i1
ON t1 (grp);
We have a table with 10,000 rows with 100 values of GRP. Now let us build our view with the awkward analytic query:
CREATE OR REPLACE VIEW v1
AS
SELECT grp
,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
FROM t1
GROUP BY grp;
When we query this view with a specific value for GRP we get no index usage.
SELECT *
FROM v1
WHERE grp BETWEEN 50 AND 51;
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9485 | 240K| 49 (3)| 00:00:01 |
|* 1 | VIEW | V1 | 9485 | 240K| 49 (3)| 00:00:01 |
| 2 | WINDOW BUFFER | | 9485 | 240K| 49 (3)| 00:00:01 |
| 3 | SORT GROUP BY | | 9485 | 240K| 49 (3)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T1 | 9485 | 240K| 48 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("GRP">=50 AND "GRP"<=51)
Note how our predicate is applied at id 1, after all 10,000 rows have been aggregated.
Wouldn’t it be nice if somehow we could convince the optimizer to transform our query into this:
WITH q1
AS ( SELECT grp
,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
FROM t1
WHERE grp BETWEEN 48 AND 51
GROUP BY grp)
SELECT *
FROM q1
WHERE grp BETWEEN 50 AND 51;
For the moving average calculation we actually only need to look at the rows with values within the range being aggregated. Well, it turns out that you can rewrite your view to allow this providing you have a convenient way to identify the complete set of legal values for the predicated column (grp in our case). If the table being queried is a fact table in a data warehouse you may already have a suitable dimension table you can use. In our case we will build one:
CREATE TABLE grp_table (grp PRIMARY KEY)
ORGANIZATION INDEX
AS
SELECT DISTINCT grp FROM t1;
I have made this an IOT given that (in my example) it has only one column and we plan to use it exclusively for unique primary key lookups. A heap dimension table with several columns is fine if it is reasonably small.
We also need to define a couple of types to reflect the result of our original query:
CREATE OR REPLACE TYPE pair_num AS OBJECT (n1 NUMBER, n2 NUMBER);
CREATE OR REPLACE TYPE pair_num_t AS TABLE OF pair_num;
In our case, our desired query returns two numeric values. We define our types generically so that we can reuse them for other views with similar result sets. Now redefine our view as follows:
CREATE OR REPLACE VIEW v1 (startgrp, endgrp, grp, mv_avg)
AS
WITH q1 AS (SELECT grp startgrp FROM grp_table)
,q2 AS (SELECT grp endgrp FROM grp_table)
SELECT startgrp, endgrp, n1 AS grp, n2 AS mv_avg
FROM q1
,q2
,TABLE (CAST (MULTISET ( SELECT grp
,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
FROM t1
WHERE grp BETWEEN startgrp-2 AND endgrp
GROUP BY grp) AS pair_num_t)) pairs
WHERE n1 >= startgrp;
This view now contains two extra columns STARTGRP and ENDGRP that can be used in our WHERE clause to identify the subset of rows that we want. The new query and associated execution plan are as follows:
SELECT grp,mv_avg
FROM v1
WHERE startgrp = 50 AND endgrp = 51;
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 560 | 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 20 | 560 | 30 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 26 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | SYS_IOT_TOP_84132 | 1 | 13 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_IOT_TOP_84132 | 1 | 13 | 0 (0)| 00:00:01 |
|* 5 | COLLECTION ITERATOR SUBQUERY FETCH| | 20 | 40 | 29 (0)| 00:00:01 |
| 6 | VIEW | | 24 | 624 | 3 (0)| 00:00:01 |
| 7 | WINDOW BUFFER | | 24 | 624 | 3 (0)| 00:00:01 |
| 8 | SORT GROUP BY NOSORT | | 24 | 624 | 3 (0)| 00:00:01 |
|* 9 | FILTER | | | | | |
| 10 | TABLE ACCESS BY INDEX ROWID | T1 | 24 | 624 | 3 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | I1 | 43 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("GRP"=51)
4 - access("GRP"=50)
5 - filter(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)>=50 AND "GRP"<=SYS_OP_ATG(VALUE(KOKBF$),1,2,2))
9 - filter(:B1-2<=:B2)
11 - access("GRP">=:B1-2 AND "GRP"<=:B2)
Our new query has a small unnecessary overhead where we "look up" the STARTDATE and ENDDATE in our dimension table but on the other hand we have been able to use information about our predicate to restrict the rows aggregated to the point that we can take advantage of our index.
This technique of passing information from one rows source in a join to another is known as a lateral join and is available with the TABLE and XMLTABLE operators.
Note that we will get very undesirable results if a user does not specify a WHERE clause as now we will get results matching every combination of STARTDATE and ENDDATE where ENDDATE >= STARTDATE. If you want to protect against this you can write a pipelined function.
CREATE OR REPLACE VIEW v1 (startgrp, endgrp, grp, mv_avg)
AS
WITH q1
AS ( SELECT /*+ no_merge no_eliminate_oby */
grp startgrp
FROM grp_table
ORDER BY startgrp DESC)
,q2
AS ( SELECT /*+ no_merge no_eliminate_oby */
grp endgrp
FROM grp_table
ORDER BY endgrp)
SELECT /*+ leading(q1) use_nl(q2) */
startgrp, endgrp, n1 AS grp, n2 AS mv_avg
FROM q1, q2
,TABLE (my_package.my_pipelined_function (startgrp, endgrp)) pairs;
Note the order by clauses with associated hints will cause the pipelined function to be first invoked with a value of ENDDATE less than STARTDATE if no predicate is supplied. The package containing the piplelined function might look like this:
CREATE OR REPLACE PACKAGE my_package
AS
FUNCTION my_pipelined_function (startgrp NUMBER, endgrp NUMBER)
RETURN pair_num_t
PIPELINED;
END my_package;
CREATE OR REPLACE PACKAGE BODY my_package
AS
FUNCTION my_pipelined_function (startgrp NUMBER, endgrp NUMBER)
RETURN pair_num_t
PIPELINED
IS
CURSOR c1
IS
WITH q1
AS ( SELECT grp
,AVG (SUM (fact_int)) OVER (ORDER BY grp RANGE 2 PRECEDING) mv_avg
FROM t1
WHERE grp BETWEEN startgrp - 2 AND endgrp
GROUP BY grp)
SELECT *
FROM q1
WHERE grp BETWEEN startgrp AND endgrp;
BEGIN
IF startgrp > endgrp
THEN
raise_application_error (-20000, 'STARTGRP must be specified and be less than ENDGRP');
END IF;
FOR r IN c1
LOOP
PIPE ROW (pair_num (r.grp, r.mv_avg));
END LOOP;
END my_pipelined_function;
END my_package;

This is so complicated! Wouldn’t it be easier to just not use the view in the first place?
Comment by Brad — January 16, 2013 @ 10:37 pm |
Brad,
Yes, you could create a query that accesed the underlying tables using factored subqueries and avoding the view (as in fact I showed in the query following the line “Wouldn’t it be nice if somehow we could convince the optimizer to transform our query into this”). However, this would have to be done for each query; the purpose of any view is to hide the complexity from the caller. In my case, the callers didn’t have the technical ability to code it themselves and in any event they were accessing the view using a data visualisation tool that only supported the most basic of SQL constructs.
Comment by tonyhasler — January 16, 2013 @ 11:34 pm |