Designing data products
Summary
This article provides a practical methodology for designing data products as part of a data mesh implementation. The authors advocate a 'working backwards' approach that starts from specific use cases rather than data sources, to avoid analysis paralysis. They define what data products are (self-contained, deployable units serving analytical data) and what they are not (dashboards, PDFs, or entire data warehouses). The process involves running short workshops to identify data products, overlaying multiple use cases to ensure reusability, assigning clear domain ownership, and defining service level objectives (SLOs). The article concludes with implementation guidance, including establishing reusable blueprints and automating governance.
Key Insight
Designing data products should start from concrete use cases and work backwards to data sources, ensuring each product represents a single cohesive information concept describable in one or two sentences.
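The design rules summarised above (start from a use case, keep the description to one or two sentences, give each product a single owning domain, define SLOs) can be sketched as a small checklist. This is a minimal illustration, not something taken from the article: the `DataProductSpec` class, its field names, the `design_issues` function, and the sentence-counting heuristic are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class DataProductSpec:
    """Hypothetical descriptor for one data product (field names are illustrative)."""
    name: str
    description: str
    owner_domains: list[str]
    use_cases: list[str]
    slos: dict[str, str] = field(default_factory=dict)


def count_sentences(text: str) -> int:
    # Rough heuristic: a sentence ends at ., ! or ? followed by whitespace or end of text.
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p.strip()])


def design_issues(spec: DataProductSpec) -> list[str]:
    """Return a list of design concerns; an empty list means no concerns were found."""
    issues = []
    if not spec.use_cases:
        issues.append("no use case: the design should work backwards from one")
    if not 1 <= count_sentences(spec.description) <= 2:
        issues.append("description should fit in one or two sentences")
    if len(spec.owner_domains) != 1:
        issues.append("exactly one owning domain is required")
    if not spec.slos:
        issues.append("no SLOs defined")
    return issues


# Example usage with made-up values.
spec = DataProductSpec(
    name="order_lines",
    description="Customer orders enriched with product details. One row per order line.",
    owner_domains=["sales"],
    use_cases=["churn analysis"],
    slos={"freshness": "24h"},
)
print(design_issues(spec))
```

A check like this could feed the automated governance the summary mentions, though the specific thresholds and fields would depend on the organisation's own blueprint.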
Spicy Quotes
- 3
There's a common tendency to start with the data sources and define data products. Without the constraints of a tangible use case, you won't know when your design is good enough to move forward with implementation, which often leads to analysis paralysis and lots of wasted effort.
- 4
If you find it difficult to describe a data product in one or two simple sentences, it's likely not well-defined.
- 4
No single data product should be owned by multiple domains, as this can lead to confusion and finger-pointing over quality issues.
- 4
For structured data, this usually means a single denormalized table, and for semi-structured or unstructured data, a single dataset. Anything larger is likely trying to do too much.
- 3
Conflating data product with too many different concepts not only creates confusion among teams but also makes it significantly harder to develop reusable blueprints.
- 2
The defined SLOs will guide the architecture, solution design and implementation of the data product.
- 3
If they need to prioritise one aspect of data mesh, it should be 'data as a product'. Focusing on getting that right establishes a strong foundation, enabling the other pillars to follow naturally.
Tone
methodical, practical, prescriptive
