Questions to ask when designing a data architecture
When I’m leading a full-day architecture design session (ADS) with a customer and the goal is to come up with a data architecture for them, I start by doing “discovery” with them for at least a couple of hours so I can come up with the best architecture for their particular use case. Discovery involves asking them a bunch of questions so I can design a high-level architecture without mentioning any products at first; once the high-level architecture is complete, I’ll ask more questions and apply products to the architecture. I tell customers Microsoft has many tools in the toolbox for building a solution, and based on the answers to my questions, I’ll reduce that toolbox to just the few tools that are most appropriate for them. I know it is very difficult for customers to keep up with all the new technology Microsoft comes out with on a very frequent basis (after all, you have day jobs), and the reason Microsoft has architects like me is to clear up the confusion and help customers choose the right tools for the job.
As an example, if I were asked what product to use to store data in the Azure cloud, I could come up with at least a dozen options, so I need to ask questions to narrow the choices to the best fit for the customer’s situation. This avoids what I have seen many times: a company chooses a particular product and, after their solution is built, they say the product is “terrible”, when in fact they were using it for a use case it was not designed for. The customer was not aware of a better product for their use case because “they don’t know what they don’t know”. That is why working with an expert architect should be one of your first orders of business: the technology decisions at this early stage of building a solution are vital to get right, as finding out six months or a year later that you made the wrong choice and have to start over can waste a great deal of time and money (and I have seen some shocking waste).
Some of the questions I will ask:
- Can you use the cloud? (Nowadays this is almost always yes; if not, let’s evaluate why and see if we can overcome it)
- Is this a new solution or a migration?
- What is the skillset of the developers?
- Is this an OLTP or OLAP/DW solution?
- Will you use non-relational data (variety)?
- How much data do you need to store (volume)?
- Will you have streaming data (velocity)?
- Will you use dashboards and/or ad-hoc queries?
- Will you use batch and/or interactive queries?
- How fast do the operational reports need to run (SLAs)?
- Will you do predictive analytics/machine learning (ML)?
- Do you want to use Microsoft tools or open source?
- What are your high availability and/or disaster recovery requirements?
- Do you need to master the data (MDM)?
- Are there any security limitations with storing data in the cloud (e.g. restrictions defined in your customer contracts)?
- Does this solution require 24/7 client access?
- How many concurrent users will be accessing the solution at peak time and on average?
- What is the skill level of the end users?
- What is your budget and timeline?
- Is the source data cloud-born and/or on-prem born?
- How much daily data needs to be imported into the solution?
- What are your current pain points or obstacles (performance, scale, storage, concurrency, query times, etc.)?
- Are you ok with using products that are in public or private preview?
- What are your security requirements? Do you need data sovereignty?
- Is data movement a challenge?
- How much self-service BI would you like?
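To make the “reduce the toolbox” idea concrete, here is a minimal sketch in Python of how a handful of the answers above could be recorded and used to filter a list of candidates. Everything in it is an assumption for illustration: the field names, the `shortlist` function, and the tools `store_a` through `store_d` with their capability flags are hypothetical and do not describe any real Microsoft product. A real ADS weighs far more than these few yes/no checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryAnswers:
    """A few discovery answers; the field names are illustrative only."""
    workload: str          # "OLTP" or "OLAP"
    non_relational: bool   # variety
    volume_tb: float       # volume
    streaming: bool        # velocity
    preview_ok: bool       # OK with products in preview?


@dataclass(frozen=True)
class Tool:
    """A hypothetical tool described by made-up capability flags."""
    name: str
    workloads: frozenset
    handles_non_relational: bool
    max_volume_tb: float
    handles_streaming: bool
    in_preview: bool = False


def shortlist(answers, toolbox):
    """Return the names of tools that satisfy every stated requirement."""
    keep = []
    for tool in toolbox:
        if answers.workload not in tool.workloads:
            continue
        if answers.non_relational and not tool.handles_non_relational:
            continue
        if answers.volume_tb > tool.max_volume_tb:
            continue
        if answers.streaming and not tool.handles_streaming:
            continue
        if tool.in_preview and not answers.preview_ok:
            continue
        keep.append(tool.name)
    return keep


# Hypothetical toolbox; names and limits are invented for this sketch.
TOOLBOX = [
    Tool("store_a", frozenset({"OLTP"}), False, 4, False),
    Tool("store_b", frozenset({"OLAP"}), True, 1000, True),
    Tool("store_c", frozenset({"OLAP"}), True, 1000, True, in_preview=True),
    Tool("store_d", frozenset({"OLTP", "OLAP"}), False, 50, False),
]

answers = DiscoveryAnswers(
    workload="OLAP", non_relational=True,
    volume_tb=200, streaming=True, preview_ok=False,
)
print(shortlist(answers, TOOLBOX))
```

The sketch uses hard filters only, which matches the first pass of narrowing a dozen options down to a few. Softer factors from the list, such as developer skillset, budget, and timeline, are trade-offs rather than pass/fail checks and would need a ranking step or, more realistically, a conversation.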
And you have to be flexible: after a day spent in an ADS with the customer, going over all these questions and coming up with the best architecture and products for them, you might hear them say that it will cost too much and they want a more cost-effective solution. I also usually brief customers on the new products and features that Microsoft has in private preview (or that are about to be), as one of them may be worth considering if it fits within their timeline, which is usually quite long when building a data architecture such as a modern data warehouse, data fabric, data lakehouse, or data mesh (see Data Lakehouse, Data Mesh, and Data Fabric).
Sometimes I do miss the days when we just had to worry about a new version of SQL Server every few years, where we would go to a bootcamp for a few weeks and then know everything we needed. But learning is fun, so I do prefer the challenge of today’s world, where the technology is changing on a near-daily basis.
More info:
Hi James, great blog post. It’s a good aid for anyone designing a new data platform. Just a note on the link at the bottom for more info: it’s pretty dated and may be a bit confusing.
Wonderful. Hand-picked questions to be used anytime. Thanks, James.
Great list, thank you!