Top Modern Data Warehouse questions
Below are the top 15 questions I am seeing from customers looking to build a modern data warehouse in the cloud, and the blogs that I have written that try to answer each question (I have updated most of these blogs):
- Do I need a data lake? Data lake details
- Do I need a relational data warehouse? Is the traditional data warehouse dead?
- Do I need a cube? The need for having both a DW and cubes
- Where do I clean my data? Where should I clean my data?
- What is the Common Data Model (CDM)? Common Data Model
- How should I organize the data lake? For this question I defer to my favorite blogger, Melissa Coates (SQL Chick): Zones in a Data Lake, Data Lake Use Cases and Planning Considerations, FAQs About Organizing a Data Lake
- What is the future of Hadoop? The difficulty in answering this question lies in exactly what “Hadoop” is anymore (it used to consist of HDFS, YARN, MapReduce and tools built on top of these such as Hive and Spark). Products like Azure Data Lake Storage Gen2 are HDFS compatible, and I see HDFS being around for a very long time due to the rise of data lakes (I see HDFS as the primary source for a data lake). Meanwhile, SQL Server 2019 Big Data Clusters creates HDFS data nodes and includes Apache Spark. See Did Hadoop kill Data Warehousing or Save it?
- Should I put all our structured data into our data lake? Should I load structured data into my data lake?
- How does a NoSQL database like Cosmos DB compare to a relational database? Understanding Cosmos DB
- Can you explain the use cases for big data products? Use cases of various products for a big data cloud solution
- What are all the products I can use to clean my data? What product to use to transform my data?
- Is Azure Data Factory “SSIS in the cloud”? Azure Data Factory Data Flow
- Can the cloud save us money? Cost savings of the cloud
- What do you think of data virtualization? Data Virtualization vs Data Warehouse
- Do you know a good workshop to learn big data technologies? Big Data Workshop
I have also put these questions under a new FAQ page.
Just wanted to express appreciation for such generous and well-organized knowledge sharing. As a data architect stuck in the slow-to-adapt, on-premises, federal behemoth of data infrastructure, the bullet express train of private sector cloud ETL and warehousing rushing by can be overwhelming to digest.
These 15 questions orbit in one way or another around the Data Lake concept and that was a very helpful focus point for grasping the bigger picture.
Glad you found it helpful, Mike!
Great insights. Really helped me with my client.