Data Products - The revolutionary change

The way we manage data hasn't changed in the last 30 years

Feb 09, 2023

👋 Hi folks, thanks for reading my newsletter! My name is Diogo Santos, and I write about data product principles, the evolution of the modern data stack, and the journey to data mesh (the future of data architecture).

In today’s article, I’ll discuss one of the trends I’m most excited about: data products. I’ll cover what data products are, why data management is shifting toward them, and the benefits for data teams. Please consider subscribing if you haven’t already. Reach out on LinkedIn if you ever want to connect.

Tags: Data Products | Data Strategy | Data Mesh | Data Product Management

The way we manage data hasn’t changed much in the last 30 years. There was a big shift to the cloud and a rise of new data technologies, which are now part of the Modern Data Stack, but the foundations of data management have stayed the same:

Take data from source applications
Integrate into a new database
Clean the data and process it to a new state
Analyze, and create reports and machine learning models
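
To make these four steps concrete, here is a minimal Python sketch of the traditional flow using the standard library’s sqlite3 module. The order rows, table names, and function names are hypothetical and only illustrate the pattern; a real stack would use its own sources and warehouse.

```python
import sqlite3

# Hypothetical rows standing in for a source application's export.
SOURCE_ROWS = [
    {"order_id": 1, "amount": "19.90", "country": "PT"},
    {"order_id": 2, "amount": None, "country": "pt"},
    {"order_id": 3, "amount": "5.10", "country": "ES"},
]


def extract():
    """Step 1: take data from the source application."""
    return list(SOURCE_ROWS)


def integrate(rows, conn):
    """Step 2: load the raw rows into a new database."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :amount, :country)", rows
    )


def clean(conn):
    """Step 3: clean the data and process it into a new state."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT order_id, CAST(amount AS REAL) AS amount, UPPER(country) AS country
        FROM raw_orders
        WHERE amount IS NOT NULL
        """
    )


def analyze(conn):
    """Step 4: analyze the cleaned data (here, revenue per country)."""
    cursor = conn.execute(
        "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
    )
    return dict(cursor.fetchall())


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    integrate(extract(), conn)
    clean(conn)
    report = analyze(conn)
    conn.close()
    return report
```

Notice that nothing in this sketch says who owns the result, who consumes it, or what they can expect from it. That gap is what the rest of this article is about.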

We have been dealing with data initiatives from a purely technical perspective.

And because of this, to paraphrase a saying often attributed to Einstein, we keep doing the same thing over and over again and expecting different results.

The bottom line: data management can’t be just about technology.

Data management is a socio-technical phenomenon, meaning we need to include people and processes in addition to technology to modernize and enhance our data management efforts.

With this in mind, one of the trends I’m extremely excited about, and one I think is here to stay, is Data Products.

Applying Product Thinking

Before product thinking was applied to software engineering management, teams would spend hours working on features that, in the end, were not relevant to the user. Or worse, they would plan a software product that nobody cared about.

Today, it’s already a common practice to apply product thinking to software development. The main objective is to make sure teams build a product that solves a real user problem.

To accomplish that goal, the team’s product manager:

Gathers feedback from users/customers
Creates software requirements and use cases based on the feedback gathered in the first step
Defines a roadmap
Creates a plan and a backlog of actions
Manages the software releases

The product manager is also responsible for prioritizing features to build, estimating the time involved, and testing key functionalities of the software.

In summary, the PM is obsessed with users getting real value from the product.

Bringing this mindset to data is what leads to generating Data Products.

But what exactly is a data product?

In the context of Data Mesh, a data product is an autonomous data unit that enables teams to own, operate and govern their data to create business value.

Data products should be highly modular (much like application microservices), with clear boundaries and APIs, making it easy for teams to manage and evolve them over time.
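
To make the idea of clear boundaries and APIs concrete, here is a minimal Python sketch of a data product descriptor. The class names DataProduct and OutputPort, their fields, and the example orders product are hypothetical; they are one possible way to model this, not part of any Data Mesh standard or library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class OutputPort:
    """A published interface that consumers are allowed to depend on."""
    name: str
    kind: str                # hypothetical values: "table", "rest_api", "topic"
    schema: Dict[str, str]   # field name -> type name


@dataclass
class DataProduct:
    """Hypothetical descriptor: who owns the product and where its boundaries are."""
    name: str
    owner_team: str
    inputs: List[str] = field(default_factory=list)
    output_ports: List[OutputPort] = field(default_factory=list)

    def port(self, port_name: str) -> OutputPort:
        """Look up an output port; anything not listed here is internal."""
        for candidate in self.output_ports:
            if candidate.name == port_name:
                return candidate
        raise KeyError(f"{self.name} has no output port named {port_name!r}")

    def public_interface(self) -> Dict[str, Dict[str, str]]:
        """Everything a consumer may rely on, keyed by port name."""
        return {p.name: p.schema for p in self.output_ports}


# Example product (all names are made up for illustration).
orders = DataProduct(
    name="orders",
    owner_team="checkout",
    inputs=["checkout_db.orders"],
    output_ports=[
        OutputPort("daily_orders", "table", {"order_id": "int", "amount": "float"}),
    ],
)
```

In this sketch, consumers only ever go through an output port. Whatever the owning team does behind that boundary (source tables, intermediate models) can change without breaking anyone.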

Some examples are:

Analytics – historic/real-time reports, dashboards, charts and data visualizations
Models – domain objects, schema and data models (ER, Ontology, Taxonomy, XSD, etc.), object models (UML), Machine Learning features and attributes
Algorithms – Machine Learning models (production models), scoring, business rules (Rete, Drools, etc.)
Data Services & APIs – Payloads (JSON, Avro, Protobuf), topics/queues (JMS, Kafka, Pulsar, etc.), REST APIs (gateways and contracts), DDD Aggregates (e.g., serialized payloads)

Obviously, not all data must be a data product.

As shared by Jeffrey T. Pollock:

Not all data is exchanged between different entities, and not all data that is exchanged is for the purposes of a specific business outcome.
Ultimately, the exact definition of whether a particular data (datum?) should rise to the level of being a Data Product is up to your organization and whether the added rigor around data product management will help with the innovation, quality and governance of the multi-party exchange. There will inevitably be a lot of data flowing which is not managed ‘as a product.’

Data Product Principles

Regardless of the type of data product, I believe they should all follow a set of principles. Otherwise, they are no different from what data teams have been delivering to businesses for decades.

Ownership

Who owns and is responsible for the data product?
Who fixes the product when it breaks?
Who defines the requirements?

In a real-life scenario, deprecating a seemingly unused Postgres table can break a data pipeline, which means a team won’t get the data it needs, a machine learning inference won’t run, or a real-time dashboard won’t be updated.

Who solves the problem? The software engineering team? The data engineering team? The analytics & machine learning team?

Data contracts are a great help in avoiding such situations in the first place. I will write more about them in a different post.
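
As a rough illustration of how a data contract could catch the Postgres scenario before it happens, here is a minimal Python sketch. The DataContract class, the breaking_changes function, and the example table and consumer names are all hypothetical; real data contract tooling will look different.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: what a table promises, and to whom."""
    table: str
    owner_team: str
    columns: Dict[str, str]   # column name -> type name
    consumers: List[str]      # teams or pipelines that depend on the table


def breaking_changes(
    contract: DataContract, proposed_columns: Optional[Dict[str, str]]
) -> List[str]:
    """List the ways a proposed schema would break the contract.

    Pass None for proposed_columns to model dropping the table entirely.
    """
    if not contract.consumers:
        return []  # nobody depends on the table, so nothing can break
    if proposed_columns is None:
        users = ", ".join(contract.consumers)
        return [f"{contract.table} is dropped but still used by {users}"]
    problems = []
    for column, expected_type in contract.columns.items():
        if column not in proposed_columns:
            problems.append(f"column {column} is removed")
        elif proposed_columns[column] != expected_type:
            problems.append(
                f"column {column} changes type from {expected_type} "
                f"to {proposed_columns[column]}"
            )
    return problems


# Example contract (names are made up for illustration).
contract = DataContract(
    table="public.orders",
    owner_team="checkout",
    columns={"order_id": "bigint", "amount": "numeric"},
    consumers=["revenue_dashboard", "churn_model"],
)
```

A check like this could run before a migration is applied. An empty list means the change is safe for registered consumers, and a non-empty one tells the owning team exactly who to talk to first.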

Boundaries

What are the inputs and outputs of the data product?
What is the roadmap?
How will the roadmap be balanced against other organizational priorities and considerations?

Expectations

Where is the data product being used?
What are the SLAs and SLOs?
How will data quality be ensured?
How will it be monitored and tested?
Are there any security requirements?
Who can access this data product?
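
Some of these expectations can be expressed as automated checks. The sketch below, in Python, evaluates two illustrative SLOs: freshness (how old the data may be) and completeness (what share of rows must have a given field). The function name, parameters, and thresholds are assumptions made for this example, not a standard.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def check_expectations(
    rows: List[Dict[str, Optional[object]]],
    last_updated: datetime,
    now: datetime,
    max_age: timedelta,
    required_field: str,
    min_completeness: float,
) -> Dict[str, bool]:
    """Evaluate two illustrative SLOs: freshness and completeness."""
    # Freshness: the data must have been updated within max_age.
    fresh = (now - last_updated) <= max_age
    # Completeness: share of rows where required_field is present and not None.
    filled = sum(1 for row in rows if row.get(required_field) is not None)
    completeness = filled / len(rows) if rows else 0.0
    return {
        "freshness": fresh,
        "completeness": completeness >= min_completeness,
    }


# Example inputs (made up for illustration).
now = datetime(2023, 2, 9, 12, 0, tzinfo=timezone.utc)
rows = [{"amount": 10.0}, {"amount": None}, {"amount": 3.5}, {"amount": 1.0}]
result = check_expectations(
    rows,
    last_updated=now - timedelta(hours=2),
    now=now,
    max_age=timedelta(hours=6),
    required_field="amount",
    min_completeness=0.7,
)
```

Passing the current time in as a parameter keeps the check deterministic, which makes it easy to test and to replay against past runs.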

Data Consumers

Who will be the consumers of the data product?
How will the data product’s value be measured? User activity? Real active users? Time savings? Financial return?
How will it evolve?
What is the consumption interface?
Are the consumers data-literate? How will the enablement plan work?
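
One simple way to start answering the measurement question is to count distinct consumers per data product from an access log. The Python sketch below assumes a log of (consumer, data product, day) tuples; that shape, the function name, and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

# Each event is (consumer, data_product, day of access); the shape is an assumption.
AccessEvent = Tuple[str, str, date]


def active_consumers(events: List[AccessEvent], since: date) -> Dict[str, int]:
    """Count distinct consumers per data product on or after `since`."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for consumer, product, day in events:
        if day >= since:
            seen[product].add(consumer)
    return {product: len(consumers) for product, consumers in seen.items()}


# Sample log (made up for illustration).
events = [
    ("finance", "orders", date(2023, 2, 1)),
    ("finance", "orders", date(2023, 2, 3)),
    ("marketing", "orders", date(2023, 2, 2)),
    ("marketing", "customers", date(2023, 1, 10)),
]
```

Active consumers is only one signal. Time savings and financial return usually need input from the consuming teams themselves.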

Semantics

How is this data product related to others?
What is the ontology with corresponding definitions?

Decentralization

Small and medium-sized organizations tend to be centralized. Over time, they identify specific domains that need decentralized responsibility.

The Data Mesh trend is to push responsibility for data to the domain team that knows and understands it best, and to let a central entity or team manage the aspects that serve the organization as a whole.

The expectations and semantics principles should be managed by a central entity. That is how you ensure standardization and connections between different data products. All the other principles should be managed by the specific domain, which is best placed to understand them.

Final Takeaway

Data Products are here to stay.

I think the best word to describe their goal is measurement. You want to understand who is consuming which data product and for what reason in order to understand if the data team is being successful and generating value. All the rest is done to fulfill this need.

In the coming years, our idea of what data products should be will evolve as teams start implementing them and learn what works and what doesn’t.

If you want to read more content, make sure to follow me on LinkedIn for weekly posts, and if you liked this article, please consider subscribing.

Thank you so much for reading and let’s talk again soon.

