The Jonathan Lewis Pages

Dimensions and summary management in Oracle 8.1.5


One of the exciting new features of Oracle 8.1.5 is the summary management capability. Snapshots have been assimilated into 'materialized views', and a 'query rewrite' facility exists that allows Oracle to determine that a query against a large table can be converted into a similar query against a smaller summarising snapshot or 'materialized view', so that when a user types:

	select department, year, sum(daily sales) 
	from enormous_daily_sales_tables 
	group by department, year;

Oracle may actually operate the equivalent of:

	select department, year, sum(monthly sales) 
	from small_monthly_sales_tables 
	group by department, year;

This article outlines the method using a small, but not entirely trivial, example. It starts with the table created for an earlier article on 2-dimensional partitions partly so that I do not have to include in this article a section on creating an initial 'large' table, and partly to demonstrate that partitioning and materialized views can be mixed.

In the various datawarehouse environments I have designed, I have found that there is sometimes a physical arrangement of data that can make the performance of a critical set of queries significantly faster- basically we can always consider physically sorting the data along one dimension to expedite queries in that dimension - consequently I am not keen on the idea of letting Oracle's built-in 'snapshot refresh' processes take control of rebuilding summary tables, so this example of using materialized views limits itself to referencing a pre-built table. Later articles will cover some of the features of refreshing, and the packages supplied by Oracle to add value and convenience to materialized views and datawarehouse management.

Preparing the groundwork:

My original sample table in the 2-d partitioning example was a daily sales table, with a key of (product, sales_date), in this article I am going to assume that products sum to product_groups, and product_groups sum to product_depts, so we have a hierarchy of: product_dept --- product_group --- product.

I will be demonstrating that query rewrite will work in two different scenarios - first if the heirarchy is denormalised into a single dimensional table, and secondly if the hierarchy is left in a perfectly normalised form. The attached script will create 4 separate tables (with constraints and indices):

	product_hierarchy	the denormalised hierarchy
	products		the table of products
	groups			the table of product_group, parent to products
	departments		the table of product_dept, parent to product_group

Once you have created these tables, you then need to create a summary table at the product_group level by executing the following:

	CREATE TABLE sales_sum
	unrecoverable
	PARTITION BY RANGE (sale_date)
	(
		PARTITION p_1999_jan VALUES LESS THAN (TO_DATE('01-Feb-1999', 'dd-mon-yyyy')),
		PARTITION p_1999_feb VALUES LESS THAN (TO_DATE('01-Mar-1999', 'dd-mon-yyyy')),
		PARTITION p_1999_mar VALUES LESS THAN (TO_DATE('01-Apr-1999', 'dd-mon-yyyy')),
		PARTITION p_1999_apr VALUES LESS THAN (TO_DATE('01-May-1999', 'dd-mon-yyyy')),
		PARTITION p_1999_may VALUES LESS THAN (maxvalue)
	)
	as
	select
		sal.country,
		sal.sale_date,
		hir.product_group,
		sum(sal.qty)		qty,
		sum(sal.value)		value,
		store
	from
		sales			sal,
		product_hierarchy	hir
	where
		hir.product = sal.product
	group by
		sal.country,
		sal.sale_date,
		hir.product_group,
		store
	;

The Materialized View and a simple Dimension:

At this point, we need to do two things. Tell Oracle that we want to use the table we have just created as a materialized view that may be used with the Query Rewrite facility, and tell Oracle how the product_hierarchy table (or normalised equivalent) may be used when joining to the materialized view. The two different approaches have to be handled separately, so I will take the simpler denormalised case first.

Before starting this section, though, you will need to ensure that you have the privileges:

	create dimension
	create materialized view
	query rewrite

Then execute the following script to create the materialized view

	rem	The materialized view name must match the table name.
	rem	The table could have columns not mentioned in this view
	rem	The order of the various clauses after the 'create' IS important.
	
	create materialized view sales_sum
	on prebuilt table 		-- the table already exists
	with reduced precision		-- allow column precision mismatch
	never refresh			-- we rebuild it ourselves
	enable query rewrite		-- the point of the exercise 
	as
	select
		sal.country,
		sal.sale_date,
		hir.product_group,
		sum(sal.qty)	qty,
		sum(sal.value)	value,
		store
	from
		sales			sal,
		product_hierarchy	hir
	where
		hir.product = sal.product
	group by
		sal.country,
		sal.sale_date,
		hir.product_group,
		store
	;

One odd consequence of this statement (on 8.1.5) is that my schema ended up with two objects named SALES_SUM, one of them a table, the other of type UNDEFINED - presumably a minor error in the view definition that failed to cater for the object type code for MATERIALIZED VIEW.

After the Materialized view, we create the dimension with this script-

	rem	Databases operate on meaningless ids, but users
	rem	operate on names, so we have to associated things like
	rem	product_dept (meaningless pk) with department_name
	
	create dimension product_dim 
		--  first bit lists the levels
		level	product		is product_hierarchy.product
		level	prd_group	is product_hierarchy.product_group
		level	department	is product_hierarchy.product_dept
		-- second bit describes the relationships in one possible hierarchy
		hierarchy product_roll_up(
			product 	child of
			prd_group	child of
			department
		)
		-- third bit identifies extra columns in a given level
		-- the 'attribute' is actually the name of a level
		attribute product determines (prod_name)
		attribute prd_group determines (group_name)
		attribute department determines (dept_name)
	;

Then to test the effect of our efforts so far we execute a query against the original (product-only table). However, before we can use query rewrite, we have to set some init.ora parameters (in my case at the session level) to allow it work.

	alter session set query_rewrite_enabled = true;
	alter session set query_rewrite_integrity = trusted;

The first line allows query rewrite to work, the second tells Oracle to assume that a user-maintained view - which Oracle has not been keeping up to date with its own refresh mechanisms itself - will really be okay. If we then execute and trace the following statement we get a pleasantly quick and cheap result -

	select 
		hir.group_name,
		sum(sal.value)	value
	from
		sales			sal,
		product_hierarchy	hir
	where
		hir.product = sal.product
	and	hir.product_group = 2
	group by
		hir.group_name
	;

in the absence of the materilized view, this query required several thousand logical I/Os, and accessed the tables as listed, but with the query rewrite available, the cost was 121 logical I/Os, and the plan turned into:

	SELECT STATEMENT Optimizer=CHOOSE (Cost=43 Card=4 Bytes=332)
	  SORT (GROUP BY) (Cost=43 Card=4 Bytes=332)
	    MERGE JOIN (CARTESIAN) (Cost=43 Card=1465 Bytes=121595)
	      VIEW (Cost=3 Card=5 Bytes=260)
	        SORT (UNIQUE) (Cost=3 Card=5 Bytes=190)
	          TABLE ACCESS (FULL) OF 'PRODUCT_HIERARCHY' (Cost=1 Card=5 Bytes=190)
	      SORT (JOIN) (Cost=43 Card=293 Bytes=9083)
	       	PARTITION RANGE (ALL)
	          TABLE ACCESS (FULL) OF 'SALES_SUM' (Cost=8 Card=293 Bytes=9083)

Oracle rewrote the query to find uniquely the list of required product_groups (in this case just the one) and group_names from the product_hierarchy table, then joined this intermediate result to the materialized view to produce the final result.

The Materialized View and a normalized Dimension:

A more complex example appears when we do NOT normalise the hierarchy. There seem to be some limitations in this approach - the examples I got to work required me to generate a more complex materialized view than should have been necessary. (Don't forget to drop the objects from the previous experiment first).

	create materialized view sales_sum
	on prebuilt table 		-- the table already exists
	with reduced precision		-- allow column precision mismatch
	never refresh			-- we rebuild it ourselves
	enable query rewrite		-- the point of the exercise 
	as
	select
		sal.country,
		sal.sale_date,
		grp.product_group,
		sum(sal.qty)	qty,
		sum(sal.value)	value,
		store
	from
		sales			sal,
		products		prd,
		groups			grp
	where
		prd.product = sal.product
	and	grp.product_group = prd.product_group
	group by
		sal.country,
		sal.sale_date,
		grp.product_group,
		sal.store
	;

The corresponding Dimension is also more complex, including some JOIN KEY clauses:

rem
rem	It may appear a little confusing that the JOIN KEY clause is:
rem		join key {child table.column list} references {parent level}
rem
create dimension product_dim
	level	product		is products.product
	level	prd_group	is groups.product_group
	level	department	is departments.product_dept
	hierarchy product_roll_up(
		product 	child of
		prd_group	child of
		department
		--	join columns have to be in a child table of the reference level
		join key products.product_group references prd_group
		join key groups.product_dept references department
	)
	-- 	listed columns have to be in the table defined in the attribute/level
	attribute product determines (products.prod_name)
	attribute prd_group determines (groups.group_name)
	attribute department determines (departments.dept_name)
;

The sample SQL I ran looked like this:

	alter session set query_rewrite_enabled = true;
	alter session set query_rewrite_integrity = trusted;
	
	select  
		dep.dept_name,
		grp.group_name,
		sum(sal.value)	value
	from
		sales			sal,
		products		prd,
		groups			grp,
		departments		dep
	where
		prd.product = sal.product
	and	grp.product_group = prd.product_group
	and	dep.product_dept = grp.product_dept
	group by
		dep.dept_name,
		grp.group_name
	;

The results were very pleasing: without rewrite, the total cost was some 33,000 logical I/Os, 22,000 physicals and took about 49 seconds to complete. with rewrite the query took 180 logicals, 48 physicals, and completed in 4.5 seconds.

My only concern was that I had to include the GROUPS table in the materialized view in order to make this work. I felt that it should have been sufficient to include only the PRODUCTS table, since the dimension declared products.product_group to be equivalent to groups.product_group. I still need to spend more time tracing the internal execution to figure out why this is not working.


Back to Main Index of Topics