Improving cube processing time

As your SSAS cube gets bigger, cube processing time will become a problem. This is especially true as more and more companies want cube processing during the day instead of the usual off-hours time when no one is using the cube. Partitioning the cube can help to reduce the processing time. So can using a different processing strategy than a Process Full.

The biggest benefit of partitioning is that it allows you to process multiple partitions in parallel on a server that has multiple processors. This can greatly reduce the total cube processing time. Note that partitioning requires the enterprise version of SQL Server 2008 (view version differences).

Regarding the best possible processing strategy, I suggest the following steps:

1. Process Update all the dimensions that could have had data changed. Depending on the nature of the changes in the dimension table, Process Update can affect dependent partitions. If only new members were added, then the partitions are not affected. But if members were deleted or if member relationships changed (e.g., a Customer moved from Redmond to Seattle), then some of the aggregation data and bitmap indexes on the partitions are dropped. The cube is still available for queries, albeit with lower performance (with Process Update flexible aggregations and indexes on related partitions will be dropped). This means after a Process Update you need to do a Process Index on the partitions (see step #3).

2. Process Data the partitions that have changed data (which are usually the most recent partitions). Of course, the smaller the partition, the better, so try to use daily partitions instead of monthly or use monthly partitions instead of yearly. Note that Process Data processes data only without building aggregations or indexes.

3. Process Index for the rest of the partitions (the partitions that have not changed). Note instead of doing step #2 and step #3 separately you could just do a Process Full (which does a Process Data and Process Index behind the scenes). However, the best practice is to do a Process Data and Process Index separately instead of a Process Full, because: it is a bit faster, it reduces the stress on the server, it makes data available to end-users sooner (while a Process Index is happening, users can still query cube), and you can just run Process Index if Process Data completes but Process Index bombs. As another option, instead of doing a Process Index, you can do a Process Default, which will evaluate the state of all the partitions: for one’s that had a Process Full it will ignore them; for one’s that did not process, it may or may not touch them (it will if they were part of a dimension and the aggregations were dropped, in which case it will rebuild those aggregations and indexes via a Process Index but it won’t reprocess the data); for one’s that had a Process Data it will build out the aggregations and indexes (Process Index).

There is one more option to consider, a Process Incremental, which can be used in place of all the above steps. But you can only do a Process Incremental if there are new records for the partition (no updates or deletes), as a Process Incremental never deletes or updates existing members, it only adds new members. A Process Incremental internally creates a temporary partition, processes it with the target fact data, and then merges it with the existing partition. Process Incremental doesn’t drop aggregations and indexes. Note a big point of confusion is that when you choose “Process Incremental” in the GUI, it really translates it into a Process Add, and it only works for partitions and dimensions (yet for some reason the Process Incremental option does not show up on the GUI for dimensions – you need to fire it using XMLA). Even though the GUI has the Process Incremental option available for Cube and Measure Group, it scripts those to work only on the partitions.

I prefer to have my cube do a Process Full occasionally to mop up any deletes or updates to measure groups, and to just generally ensure it’s exactly the same as the tables it’s built from.

Other options for increasing processing speed is to improve I/O: More/Faster spindles, short-stoke the disks, use solid-state disks.

And other ways to improve processing speed: add faster CPU, add more memory, use remote partitions, use a dedicated SSAS server, changing Processing thread pool (used for allocating worker threads for processing jobs), use Jumbo Frames, changing AggregationMemoryLimitMin/AggregationMemoryLimitMax, increase the network packet size, use multiple NICs.

Since a picture is worth a thousand words, here are all the process options depending on the type of object:

More info:

Can you query a SSAS cube while it is processing?

Use cube partitions to improve Analysis Services performance

Dynamic Cube Partitioning in SSAS 2008

Handling And Tuning Really Large Analysis Services Cubes

Video Tuning Analysis Services Processing Performance

SSAS: Process Incremental – What it is & isn’t

SSIS Design Pattern – Incremental Loads

Processing Options and Settings

Analysis Services 2005 Processing Architecture

Different Kinds of SSAS Processing in simple words…..

Video PASSMN May 2010 – SSAS Design & Incremental Processing (blog)

Implementing Analysis Services Process Add Functionality

ProcessAdd on Large Dimensions

The basics of faster fact processing in Analysis Services

SSAS performance tuning (Case Study)

SSAS 2005: Cube Performance Tuning Lessons Learned

The Anatomy of a process update – Part 1

SSIS Package to Process SSAS Cube

Comments

Improving cube processing time — 5 Comments

Robert on June 25, 2014 at 10:30 pm said:

Hi James,

Thanks for your post, very informative and better than anything else I was able to locate on the web.

Can you confirm whether the approach / functionality in this post works the same in SSAS 2005?

Regards – Robert

Reply ↓
Shannon on September 4, 2014 at 7:55 pm said:

Thanks for the great info. I’m a bit confused.
If I do a Process Update on a dimension, shouldn’t I then do a Process Index on the dimension? Then after I Process Data only the partitions that were affected, I Process Index on the partitions?

Have you ever use the Process Affected Objects option? Is this advised or not and do you know why?

Reply ↓
Sam on November 6, 2014 at 7:58 am said:

Hi James,

Thanks for sharing valuable inputs. I have query for you. currently I am using Processfull for partition, which is taking lots of time. I want to convert processfull to increment or process data only. But the problem only if I added 5 row and delete one row when I processdata that delete row is not getting update even there is not data for that row in the fact table. Still it shows in the Cube. Do you have any suggestion ?

Regards,
Sam

Reply ↓
Phil on May 15, 2015 at 1:23 pm said:

Thanks James! Great job explaining the utility of each type and why you use them in your implementation.

Reply ↓
sreez on July 6, 2015 at 7:29 am said:

Thanks for sharing valuable points.:) As part of Project migration process we have upgrade all ssis packages from 2008 to 2012 server and applied connection managers at project level. And scheduled all jobs in SQL Server Agent. Analysis server cube processing task job(last) is running very poor more than 9 hours rest of packages running fine and performance also improved. Before upgrade this package(cube processing task) ran’s around 55 mins. I have checked all the cube related proc’s and view’s but views(Fact tables) query’s are performing very slow. And while processing cube some errors also populating “A duplicate attribute key has been found when processing” for this case applied ignore errors in error configuration.
Please let me know how to improve the cube processing time without doing any optimization in views.

Reply ↓

James Serra's Blog

Big Data and Data Warehousing

Comments

Improving cube processing time — 5 Comments

Leave a Reply Cancel reply

Share:

Leave a Reply Cancel reply