What is Materialized View?
Before understanding this, let's understand what the Standard View is?
People typically use standard views as a tool that helps organize the logical objects and queries in a database. A standard view computes its data each time when the view is used. Now imagine a scenario where your BI report pulls data from a view connected to multiple tables, with lots of joins and aggregate functions, causing more compute-heavy operations such as shuffles and joins in query execution.
Eventually, these queries will take longer time to complete, particularly on large tables. Now, each time your BI model refreshes, it has to run those same complex queries again, which impacts the performance of your BI report, too.
Here comes Materialized Views; A materialized view pre-computes, stores, and maintains its data in dedicated SQL pool just like a table. Recomputation isn't needed each time when a materialized view is used. So, queries that use all or a subset of the data in materialized views can gain faster performance. In above scenario, Users can create materialized view, so there's no recomputation required when this data is needed by queries of BI, allowing lower compute cost and faster query response.
Comparison | View | Materialized View |
---|---|---|
View definition | Stored in Azure data warehouse. | Stored in Azure data warehouse. |
View content | Generated each time when the view is used. | Pre-processed and stored in Azure data warehouse during view creation. Updated as data is added to the underlying tables. |
Data refresh | Always updated | Always updated |
Speed to retrieve view data from complex queries | Slow | Fast |
Extra storage | No | Yes |
Syntax | CREATE VIEW | CREATE MATERIALIZED VIEW AS SELECT |
Benefits of using Materialized Views
-
Reduced execution time for complex queries with JOINs and aggregate functions. The more complex the query, the higher the potential for execution-time saving. The most benefit is gained when a query's computation cost is high, and the resulting data set is small.
- The data distribution is specified at the table creation time and remains unchanged until the table is dropped. Materialized view being a virtual table on disk supports Hash and Round_Robin data distributions. Users can choose a data distribution that's different from the base tables but optimal for the performance of queries that frequently use the views.
Additional benefits of materialized views implemented in dedicated SQL pool compared to other data warehouse providers:
- Automatic and synchronous data refresh with data changes in base tables. No user action is required. A Materialized View stores data in two places, a Clustered Columnstore Index for the initial data at the view creation time, and a Delta Store for the incremental data changes. All data changes from the base tables are automatically added to the Delta Store in a synchronous manner. A background process (Tuple Mover) periodically moves the data from the Delta Store to the view's Columnstore Index. This design allows querying Materialized Views to return the same data as directly querying the base tables.
- Broad aggregate function support. See CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL).
- Support for query-specific materialized view recommendation. See EXPLAIN (Transact-SQL).
Some Key Points
- The query optimizer in dedicated SQL pool can automatically use deployed materialized views to improve query execution plans. This process is transparent to users providing faster query performance.
-
The data in a materialized view can be distributed differently from the base tables.
Steps for Implementing Materialized Views:
Step 1 -> Check the estimated execution plan for the query. Use SSMS tool to check.
SELECT storekey, sum(salesquantity*unitprice) as totalsales, count(*) as ordercount
FROM [cso].[FactOnlineSales]
GROUP BY storekey
ORDER BY storekey
You see, most of the costs, in this execution plan is spent with this shuffle. Shuffle is basically the data movement between distributed nodes. Whenever we querying, we wanna minimize the data movement. Also the query is getting the 12627600 rows from that table.
Step 2 -> Run EXPLAIN WITH_RECOMMENDATIONS query which is built inside of Synapse Dedicated SQL Pool. It returns the query plan with recommendations to optimize the SQL statement performance.
EXPLAIN WITH_RECOMMENDATIONS
SELECT storekey, sum(salesquantity*unitprice) as totalsales, count(*) as ordercount
FROM [cso].[FactOnlineSales]
GROUP BY storekey
ORDER BY storekey
It returns an XML document which recommends some materialized view candidates.
<?xml version="1.0" encoding="utf-8"?>
<dsql_query number_nodes="1" number_distributions="60" number_distributions_per_node="60">
<sql>SELECT storekey, sum(salesquantity*unitprice) as totalsales, count(*) as ordercount FROM [cso].[FactOnlineSales] GROUP BY storekey ORDER BY storekey</sql>
<materialized_view_candidates>
<materialized_view_candidates with_constants="False">
CREATE MATERIALIZED VIEW View1
WITH (
DISTRIBUTION = HASH([Expr0])
) AS
SELECT [prac_dedicated].[cso].[FactOnlineSales].[StoreKey] AS [Expr0],
SUM(CONVERT(money,[prac_dedicated].[cso].[FactOnlineSales].[SalesQuantity],0)*[prac_dedicated].[cso].[FactOnlineSales].[UnitPrice]) AS [Expr1],
COUNT_BIG(CONVERT(money,[prac_dedicated].[cso].[FactOnlineSales].[SalesQuantity],0)*[prac_dedicated].[cso].[FactOnlineSales].[UnitPrice]) AS [Expr3],
COUNT(*) AS [Expr2]
FROM [cso].[FactOnlineSales]
GROUP BY [prac_dedicated].[cso].[FactOnlineSales].[StoreKey]
</materialized_view_candidates>
</materialized_view_candidates>
<dsql_operations total_cost="0.0048" total_number_operations="5">
<dsql_operation operation_type="RND_ID">
<identifier>TEMP_ID_3</identifier>
</dsql_operation>
<dsql_operation operation_type="ON">
<location permanent="false" distribution="AllDistributions" />
<sql_operations>
<sql_operation type="statement">CREATE TABLE [qtabledb].[dbo].[TEMP_ID_3] ([StoreKey] INT NOT NULL, [col] MONEY, [col1] BIGINT ) WITH(DISTRIBUTED_MOVE_FILE='');</sql_operation>
</sql_operations>
</dsql_operation>
<dsql_operation operation_type="SHUFFLE_MOVE">
<operation_cost cost="0.0048" accumulative_cost="0.0048" average_rowsize="20" output_rows="3" GroupNumber="30" />
<source_statement>SELECT [T1_1].[StoreKey] AS [StoreKey], [T1_1].[col] AS [col], [T1_1].[col1] AS [col1] FROM (SELECT SUM([T2_1].[col]) AS [col], COUNT_BIG(CAST ((0) AS INT)) AS [col1], [T2_1].[StoreKey] AS [StoreKey] FROM (SELECT (CONVERT (MONEY, [T3_1].[SalesQuantity], 0) * [T3_1].[UnitPrice]) AS [col], [T3_1].[StoreKey] AS [StoreKey] FROM [prac_dedicated].[cso].[FactOnlineSales] AS T3_1) AS T2_1 GROUP BY [T2_1].[StoreKey]) AS T1_1 OPTION (MAXDOP 1, MIN_GRANT_PERCENT = [MIN_GRANT], DISTRIBUTED_MOVE(N''))</source_statement>
<destination_table>[TEMP_ID_3]</destination_table>
<shuffle_columns>StoreKey;</shuffle_columns>
</dsql_operation>
<dsql_operation operation_type="RETURN">
<location distribution="AllDistributions" />
<select>SELECT [T1_1].[StoreKey] AS [StoreKey], [T1_1].[col1] AS [col], [T1_1].[col] AS [col1] FROM (SELECT CONVERT (INT, [T2_1].[col], 0) AS [col], [T2_1].[StoreKey] AS [StoreKey], [T2_1].[col1] AS [col1] FROM (SELECT ISNULL([T3_1].[col1], CONVERT (BIGINT, 0, 0)) AS [col], [T3_1].[StoreKey] AS [StoreKey], [T3_1].[col] AS [col1] FROM (SELECT SUM([T4_1].[col]) AS [col], SUM([T4_1].[col1]) AS [col1], [T4_1].[StoreKey] AS [StoreKey] FROM [qtabledb].[dbo].[TEMP_ID_3] AS T4_1 GROUP BY [T4_1].[StoreKey]) AS T3_1) AS T2_1) AS T1_1 ORDER BY [T1_1].[StoreKey] ASC OPTION (MAXDOP 1, MIN_GRANT_PERCENT = [MIN_GRANT])</select>
</dsql_operation>
<dsql_operation operation_type="ON">
<location permanent="false" distribution="AllDistributions" />
<sql_operations>
<sql_operation type="statement">DROP TABLE [qtabledb].[dbo].[TEMP_ID_3]</sql_operation>
</sql_operations>
</dsql_operation>
</dsql_operations>
</dsql_query>
Step 3 -> Create Materialized View.
CREATE MATERIALIZED VIEW mvw_salesreport
WITH (
DISTRIBUTION = HASH(StoreKey)
) AS
SELECT StoreKey, sum(salesquantity*unitprice) as totalsales, count_big(salesquantity*unitprice) as cnt_null_exp, COUNT(*) AS ordercount
FROM [cso].[FactOnlineSales]
GROUP BY StoreKey
Now again if you check the execution plan for Step 1 query:
there is no shuffling occurs and instead of scanning whole table, the same query now gets only 3 rows because of this mvw_salesreport materialized view.
The database is intelligent enough to detect a supporting materialized view and uses it to improve the query performance. So, without changing the query we can get better performance.
Note before creating Materialized Views
- For each materialized view, there's a data storage cost and a cost for maintaining the view. As data changes in the base tables, the size of the materialized view increases and its physical structure also changes.
Each materialized view is maintained separately by the data warehouse engine, including moving rows from Delta Store to the Columnstore Index segments and consolidating data changes. The maintenance workload climbs higher when the number of materialized views and base table changes increase.
-- Run this query to list materialized views in a database
SELECT V.name as materialized_view, V.object_id
FROM sys.views V
JOIN sys.indexes I
ON V.object_id= I.object_id AND I.index_id < 2;
- A materialized view is stored in the data warehouse just like a table with Clustered Columnstore Index (CCI). Reading data from a materialized view includes scanning the index and applying changes from the Delta Store. When the number of rows in the Delta Store is too high, resolving a query from a materialized view can take longer than directly querying the base tables.
To avoid query performance degradation, it's a good practice to run DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD to monitor the view's overhead_ratio (total_rows / base_view_row). If the overhead_ratio is too high, consider rebuilding the materialized view so all rows in the delta store are moved to the columnstore index.
Use this query to rebuild a materialized view
ALTER MATERIALIZED VIEW <mv_name> REBUILD
Updated Dec 14, 2024
Version 2.0ayush9892
Brass Contributor
Joined August 03, 2023
Educator Developer Blog
Follow this blog board to get notified when there's new activity