SQL Server 2014: Columnstore Index improvements
In SQL Server 2012, a new feature called Columnstore Indexes was added that resulted in huge query performance improvements. In SQL Server 2014, there have been two major improvements to this feature:
Clustered: The columnstore has been enhanced to be a pure columnar store, so a separate rowstore copy of the table (a heap or B-tree) is no longer required underneath it. The In-Memory ColumnStore for data warehousing is implemented as a clustered columnstore index (or CCI) on a table. The data in a CCI is grouped and stored column by column, and the index always includes every column in the table. Unlike the nonclustered columnstore index, the CCI is the data: there is no other underlying data structure.
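Updatable: You are able to insert, update, and delete data in an existing clustered columnstore index. Note that a columnstore index is impossible to update “in-place” due to its highly compressed structure, so “deltastores” are used: new rows are held in a rowstore structure until there are enough of them to compress into columnar format, and deleted rows are marked as deleted rather than physically removed. The same approach was used in v2 of PDW (Parallel Data Warehouse).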
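As a rough illustration of both points, the T-SQL below creates a clustered columnstore index on a heap and then modifies the table. The table dbo.FactSales, its columns, and the index name are hypothetical, made up for this sketch. sys.column_store_row_groups is the catalog view SQL Server 2014 provides for inspecting row groups; rows still sitting in a deltastore appear there with a state of OPEN or CLOSED rather than COMPRESSED.

```sql
-- Hypothetical table for illustration. In SQL Server 2014 a table that gets
-- a clustered columnstore index cannot carry primary key, unique,
-- or foreign key constraints.
CREATE TABLE dbo.FactSales
(
    SaleID    INT   NOT NULL,
    ProductID INT   NOT NULL,
    SaleDate  DATE  NOT NULL,
    Amount    MONEY NOT NULL
);
GO

-- The clustered columnstore index becomes the table's storage.
-- No column list is given because it always covers every column.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;
GO

-- Ordinary DML now works against the columnstore.
INSERT INTO dbo.FactSales (SaleID, ProductID, SaleDate, Amount)
VALUES (1, 100, '2014-01-15', 19.99),
       (2, 101, '2014-01-16', 5.00);

UPDATE dbo.FactSales SET Amount = 24.99 WHERE SaleID = 1;
DELETE FROM dbo.FactSales WHERE SaleID = 2;
GO

-- Inspect the row groups; small inserts like these land in a deltastore first.
SELECT row_group_id, state_description, total_rows, deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.FactSales');
```

Large bulk loads can bypass the deltastore and be compressed directly, so what this query returns depends on how the data arrived.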
Note that you will still be able to create a nonclustered columnstore index, which is not updatable. A table with a clustered columnstore index cannot have any type of nonclustered index.
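For comparison, a nonclustered columnstore index is created with an explicit column list, and in SQL Server 2014 the table is read-only while that index is enabled. The sketch below uses a hypothetical dbo.SalesHistory table and index name, and shows one commonly used workaround for loading data: disable the index, modify the table, then rebuild.

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.SalesHistory
(
    SaleID    INT   NOT NULL,
    ProductID INT   NOT NULL,
    SaleDate  DATE  NOT NULL,
    Amount    MONEY NOT NULL
);
GO

-- A nonclustered columnstore index covers only the columns listed.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesHistory
ON dbo.SalesHistory (ProductID, SaleDate, Amount);
GO

-- While the index is enabled, INSERT, UPDATE, and DELETE on the table fail.
-- Disabling it makes the table writable again.
ALTER INDEX NCCI_SalesHistory ON dbo.SalesHistory DISABLE;
GO

INSERT INTO dbo.SalesHistory (SaleID, ProductID, SaleDate, Amount)
VALUES (1, 100, '2014-01-15', 19.99);
GO

-- Rebuilding re-creates the columnstore from the current table contents.
ALTER INDEX NCCI_SalesHistory ON dbo.SalesHistory REBUILD;
GO
```

The rebuild cost grows with table size, which is part of why the updatable clustered variant matters for frequently loaded tables.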
More info:
CREATE COLUMNSTORE INDEX (Transact-SQL)
SQL SERVER 2014 – Columnstore Index Enhancement – Part 1
Clustered Columnstore Indexes – part 1 (“Intro”)
Updatable columnstore index, sp_spaceused and sys.partitions
SQL SERVER CLUSTERED COLUMNSTORE INDEXES AT TECHED 2013
Video What’s New for Columnstore Indexes and Batch Mode Processing
Enhancements to SQL Server Column Stores
What’s New for Columnstore Indexes in SQL Server 2014
ColumnStore Archival Compression–SQL Server 2014
New Enhanced Column Store Index in SQL Server 2014 – Part 1
Getting Started with Columnstore Index in SQL Server 2014 – Part 1
Nice post. I’m glad some of the PDW work/research is going into the new SQL Server. That thing is a beast!
So just a quick clarifying question. The SQL Server data pages will now hold column information instead of rows when using the new clustered columnstore index, right?
So let’s say a column is 64 KB in size. That one column will take up an entire extent, while another column in the same table could be on other pages in a different extent? I feel like I have to turn my head sideways now when looking at how data is stored.
How does this now-writable index affect the speed of ETL bulk inserts? Is the effect similar to an already page-compressed table as described in the white paper “The Data Loading Performance Guide”?