One of the exciting new features of Oracle 8.1.5 is the summary management capability. Snapshots have been assimilated into 'materialized views', and a 'query rewrite' facility exists that allows Oracle to determine that a query against a large table can be converted into a similar query against a smaller summarising snapshot or 'materialized view', so that when a user types:
select department, year, sum(daily sales) from enormous_daily_sales_tables group by department, year;
Oracle may actually operate the equivalent of:
select department, year, sum(monthly sales) from small_monthly_sales_tables group by department, year;
This article outlines the method using a small, but not entirely trivial, example. It starts with the table created for an earlier article on 2-dimensional partitions partly so that I do not have to include in this article a section on creating an initial 'large' table, and partly to demonstrate that partitioning and materialized views can be mixed.
In the various datawarehouse environments I have designed, I have found that there is sometimes a physical arrangement of data that can make the performance of a critical set of queries significantly faster- basically we can always consider physically sorting the data along one dimension to expedite queries in that dimension - consequently I am not keen on the idea of letting Oracle's built-in 'snapshot refresh' processes take control of rebuilding summary tables, so this example of using materialized views limits itself to referencing a pre-built table. Later articles will cover some of the features of refreshing, and the packages supplied by Oracle to add value and convenience to materialized views and datawarehouse management.
Preparing the groundwork:
My original sample table in the 2-d partitioning example was a daily sales table, with a key of (product, sales_date), in this article I am going to assume that products sum to product_groups, and product_groups sum to product_depts, so we have a hierarchy of: product_dept --- product_group --- product.
I will be demonstrating that query rewrite will work in two different scenarios - first if the heirarchy is denormalised into a single dimensional table, and secondly if the hierarchy is left in a perfectly normalised form. The attached script will create 4 separate tables (with constraints and indices):
product_hierarchy the denormalised hierarchy products the table of products groups the table of product_group, parent to products departments the table of product_dept, parent to product_group
Once you have created these tables, you then need to create a summary table at the product_group level by executing the following:
CREATE TABLE sales_sum
unrecoverable
PARTITION BY RANGE (sale_date)
(
PARTITION p_1999_jan VALUES LESS THAN (TO_DATE('01-Feb-1999', 'dd-mon-yyyy')),
PARTITION p_1999_feb VALUES LESS THAN (TO_DATE('01-Mar-1999', 'dd-mon-yyyy')),
PARTITION p_1999_mar VALUES LESS THAN (TO_DATE('01-Apr-1999', 'dd-mon-yyyy')),
PARTITION p_1999_apr VALUES LESS THAN (TO_DATE('01-May-1999', 'dd-mon-yyyy')),
PARTITION p_1999_may VALUES LESS THAN (maxvalue)
)
as
select
sal.country,
sal.sale_date,
hir.product_group,
sum(sal.qty) qty,
sum(sal.value) value,
store
from
sales sal,
product_hierarchy hir
where
hir.product = sal.product
group by
sal.country,
sal.sale_date,
hir.product_group,
store
;
The Materialized View and a simple Dimension:
At this point, we need to do two things. Tell Oracle that we want to use the table we have just created as a materialized view that may be used with the Query Rewrite facility, and tell Oracle how the product_hierarchy table (or normalised equivalent) may be used when joining to the materialized view. The two different approaches have to be handled separately, so I will take the simpler denormalised case first.
Before starting this section, though, you will need to ensure that you have the privileges:
create dimension create materialized view query rewrite
Then execute the following script to create the materialized view
rem The materialized view name must match the table name. rem The table could have columns not mentioned in this view rem The order of the various clauses after the 'create' IS important. create materialized view sales_sum on prebuilt table -- the table already exists with reduced precision -- allow column precision mismatch never refresh -- we rebuild it ourselves enable query rewrite -- the point of the exercise as select sal.country, sal.sale_date, hir.product_group, sum(sal.qty) qty, sum(sal.value) value, store from sales sal, product_hierarchy hir where hir.product = sal.product group by sal.country, sal.sale_date, hir.product_group, store ;
One odd consequence of this statement (on 8.1.5) is that my schema ended up with two objects named SALES_SUM, one of them a table, the other of type UNDEFINED - presumably a minor error in the view definition that failed to cater for the object type code for MATERIALIZED VIEW.
After the Materialized view, we create the dimension with this script-
rem Databases operate on meaningless ids, but users rem operate on names, so we have to associated things like rem product_dept (meaningless pk) with department_name create dimension product_dim -- first bit lists the levels level product is product_hierarchy.product level prd_group is product_hierarchy.product_group level department is product_hierarchy.product_dept -- second bit describes the relationships in one possible hierarchy hierarchy product_roll_up( product child of prd_group child of department ) -- third bit identifies extra columns in a given level -- the 'attribute' is actually the name of a level attribute product determines (prod_name) attribute prd_group determines (group_name) attribute department determines (dept_name) ;
Then to test the effect of our efforts so far we execute a query against the original (product-only table). However, before we can use query rewrite, we have to set some init.ora parameters (in my case at the session level) to allow it work.
alter session set query_rewrite_enabled = true; alter session set query_rewrite_integrity = trusted;
The first line allows query rewrite to work, the second tells Oracle to assume that a user-maintained view - which Oracle has not been keeping up to date with its own refresh mechanisms itself - will really be okay. If we then execute and trace the following statement we get a pleasantly quick and cheap result -
select hir.group_name, sum(sal.value) value from sales sal, product_hierarchy hir where hir.product = sal.product and hir.product_group = 2 group by hir.group_name ;
in the absence of the materilized view, this query required several thousand logical I/Os, and accessed the tables as listed, but with the query rewrite available, the cost was 121 logical I/Os, and the plan turned into:
SELECT STATEMENT Optimizer=CHOOSE (Cost=43 Card=4 Bytes=332) SORT (GROUP BY) (Cost=43 Card=4 Bytes=332) MERGE JOIN (CARTESIAN) (Cost=43 Card=1465 Bytes=121595) VIEW (Cost=3 Card=5 Bytes=260) SORT (UNIQUE) (Cost=3 Card=5 Bytes=190) TABLE ACCESS (FULL) OF 'PRODUCT_HIERARCHY' (Cost=1 Card=5 Bytes=190) SORT (JOIN) (Cost=43 Card=293 Bytes=9083) PARTITION RANGE (ALL) TABLE ACCESS (FULL) OF 'SALES_SUM' (Cost=8 Card=293 Bytes=9083)
Oracle rewrote the query to find uniquely the list of required product_groups (in this case just the one) and group_names from the product_hierarchy table, then joined this intermediate result to the materialized view to produce the final result.
The Materialized View and a normalized Dimension:
A more complex example appears when we do NOT normalise the hierarchy. There seem to be some limitations in this approach - the examples I got to work required me to generate a more complex materialized view than should have been necessary. (Don't forget to drop the objects from the previous experiment first).
create materialized view sales_sum on prebuilt table -- the table already exists with reduced precision -- allow column precision mismatch never refresh -- we rebuild it ourselves enable query rewrite -- the point of the exercise as select sal.country, sal.sale_date, grp.product_group, sum(sal.qty) qty, sum(sal.value) value, store from sales sal, products prd, groups grp where prd.product = sal.product and grp.product_group = prd.product_group group by sal.country, sal.sale_date, grp.product_group, sal.store ;
The corresponding Dimension is also more complex, including some JOIN KEY clauses:
rem
rem It may appear a little confusing that the JOIN KEY clause is:
rem join key {child table.column list} references {parent level}
rem
create dimension product_dim level product is products.product level prd_group is groups.product_group level department is departments.product_dept hierarchy product_roll_up( product child of prd_group child of department -- join columns have to be in a child table of the reference level join key products.product_group references prd_group join key groups.product_dept references department ) -- listed columns have to be in the table defined in the attribute/level attribute product determines (products.prod_name) attribute prd_group determines (groups.group_name) attribute department determines (departments.dept_name) ;
The sample SQL I ran looked like this:
alter session set query_rewrite_enabled = true; alter session set query_rewrite_integrity = trusted; select dep.dept_name, grp.group_name, sum(sal.value) value from sales sal, products prd, groups grp, departments dep where prd.product = sal.product and grp.product_group = prd.product_group and dep.product_dept = grp.product_dept group by dep.dept_name, grp.group_name ;
The results were very pleasing: without rewrite, the total cost was some 33,000 logical I/Os, 22,000 physicals and took about 49 seconds to complete. with rewrite the query took 180 logicals, 48 physicals, and completed in 4.5 seconds.
My only concern was that I had to include the GROUPS table in the materialized view in order to make this work. I felt that it should have been sufficient to include only the PRODUCTS table, since the dimension declared products.product_group to be equivalent to groups.product_group. I still need to spend more time tracing the internal execution to figure out why this is not working.