How an MPP appliance solution can improve your future
Massive parallel processing (MPP) is the future for data warehousing.
So what is MPP? SQL Server is a Symmetric Multiprocessing (SMP) solution, which essentially means it uses one server. MPP provides scalability and query performance by running independent servers in parallel. That is the quick definition. For more details, read What MPP means to SQL Server Parallel Data Warehouse.
Microsoft has an MPP appliance called the Analytics Platform System (APS). If you are building a new data warehouse that will be of any decent size in the next few years (i.e. 1TB or greater), it is really a “no brainer” to purchase a MPP solution over a SMP solution,
Looking at the value over time for a MPP appliance vs a SMP solution:
Comparison of Cumulative Cash Flows of Customer Experience Project using a Big Data Solution (MPP) vs. a Traditional Data Warehouse Appliance
Source: Wikibon 2011
The financial metrics of the two approaches were overwhelmingly in favor of the Big Data Solution (MPP) approach:
- Big Data Approach:
- Cumulative 3-year Cash Flow – $152M,
- Net Present Value – $138M,
- Internal Rate of Return (IRR) – 524%,
- Breakeven – 4 months.
- Traditional DW Appliance Approach:
- Cumulative 3-year Cash Flow – $53M,
- Net Present Value – $46M,
- Internal Rate of Return (IRR) – 74%,
- Breakeven – 26 months.
The bottom line is that for big data projects, the traditional data warehouse approach is more expensive in IT resources, takes much longer to do, and provides a less attractive return-on-investment.
Getting into the specific reasons to choose MPP (i.e. APS) over SMP when building a data warehouse:
- Be proactive instead of reactive (solve future performance problems now and not when they rear their ugly head)
- The hardware cost to upgrade to powerful servers that are clustered (for high availability) can be more than a quarter rack of APS (which has high availability built-in)
- You can use the same SQL Server EE licenses for APS
- You can use your current premium support – just add hours for APS
- So you may need to just need to purchase hardware and there are three vendors to choose from: HP, Dell, and Quanta. Each solution uses commodity hardware than can be de-racked and repurposed if for whatever reason the MPP solution did not work out
- It makes data warehouse design easier as you don’t need the typical “band aids” and work-arounds with a SMP solution to get the required performance: star schema’s, aggregate tables, cubes, data marts, multiple data warehouse servers, etc. This also reduces the complexity of ETL and therefore makes it easier to maintain
- Data warehouse development is quicker because of APS speed: https://www.jamesserra.com/archive/2014/10/non-obvious-apspdw-benefits/
- Knocks down any potential hardware barriers down the road as it is long-term solution where you can easily scale your hardware by sliding in a new rack instead of fork-lifting to a new server (i.e. purchase and build and tune, backup and restore to the new hardware, move over security, repoint users to the new server)
- You get 30x-100x performance improvement over SMP and when scaling your hardware you get linear performance benefits (i.e. double your hardware you double your performance) as opposed to the 20-30% performance improvement when fork-lifting to a new server
- Use can use your existing SQL Server skillset as APS is very much like SQL Server so little new training is needed
- You don’t have to say “no” anymore to end-users when they ask for more data because of the reasons: we don’t have room on the SAN, we can’t give you the query performance, we are bumping up into our maintenance window
- You need non-relational data down the road. Use can use Hadoop which is a platform designed and optimized for new forms of data, and then use PolyBase for easy access. You can create a data lake in Hadoop for the end-user to mine data and let you know what is useful
- If your data warehouse has been around a while this may be a good time to re-engineer and development will be so much quicker with MPP
- Even if you don’t have performance problems now but see the value in big data and analytics and want to be ready for it, especially IoT
- Tons of additional benefits that you don’t get with SMP: See Parallel Data Warehouse (PDW) benefits made simple
If you are currently housing a data warehouse on SMP, it will almost always be worth the migration effort to switch to a MPP solution. Remember that old saying: “pay me now or pay me later”!
But there are some reasons where MPP may not be a good fit or should be supplemented with a SMP solution and/or a SSAS cube:
- High volume transactional workloads (OLTP). MPP is for data warehousing (heavy reads and batch writes)
- Small company with small budget
- Thousands of concurrent users
- Super-fast dashboard query performance
- Chatty workload
- 24/7 SLA
- Need replication (true real-time updates)
Another benefit is APS is an appliance solution, meaning it is prebuilt with software, hardware, and networking components already installed. Think of it as “Big data in a box”. Some of the appliance benefits:
- Deploy in hours, not weeks
- Save time by implementing a turnkey solution complete with hardware and software
- Gain confidence and piece-of-mind by deploying a pre-tested and tuned data warehouse optimized for your specific workload
- Reduce operational costs and simplify management
- Reduce energy costs and environmental impact through balanced infrastructure and performance engineering
- Maximize reliability through the use of industry standard infrastructure and software components
- React quickly with unprecedented database agility
- No individualized patching of servers
- Much lower overall TCO
More info:
Financial Comparison of Big Data MPP Solution and Data Warehouse Appliance
Remember that old saying: “pay me know or pay me later”!
Should be “now” not “know” 🙂 (reminds me of the joke where the neighbor is apologizing profusely for using my wife (where his auto-correct keeps replacing wifi 🙂
How does PowerBI make use of the MPP environment?
Rob
Thanks for the catch Rob, I have corrected my misspelling. The various Power BI products will make use of the MPP environment. For example, using Power View the SQL query it runs behind the scenes will run on the APS so it uses the power of the MPP appliance, with the results of the query returned to Power View.
Hi James,
Thanks for all the good stuff you’ve been publishing. Quick question about your “band aid” comment – do you really see a star schema that way? Do you see customers doing less dimensional modeling even for traditional datasets?
Thanks!
Melissa
P.S. I think you need a girl on your Favorite Blogs list. Hint hint.
Hi Melissa!
The two biggest benefits of a star schema are 1) performance, 2) provide an abstract layer for self-service reporting. Benefit #1 goes away with a MPP solution, but benefit #2 can still be very helpful, so I still see a lot of customers doing dimensional modeling. One negative is that you have to have another copy of the data and create/maintain additional ETL, but it is almost always worth it.