The Data World in 2023
What to expect from the data engineering and machine learning domains in the future
It’s that time of year again, when data engineering, machine learning and DataOps leaders, consultants and vendors look back at what happened, check the trends and make their predictions. After a whirlwind 2022, it’s no easy task this time around.
Overall, the main data evolution for 2023 will be around the following:
Reducing the bottleneck between data producers and data consumers as the business grows and evolves.
Defining governance to better manage data and machine learning models. This will go hand in hand with DataOps.
Data quality will start getting its due attention, as it is a must for data reliability and trust in organizations and a key to machine learning success.
With these concepts in mind, let’s dive into the predictions for the future of data:
Data engineering will become mainstream
Last year, multiple data initiatives were hurt by poor data modeling and infrastructure, an overwhelming volume of requests to the data engineering team, and data silos created by business and data analysts to fix the data they needed for their work. This made businesses understand that without a good data engineering plan, everything else will struggle to succeed.
Data with quality and not just more data
Gone are the days when businesses cared mostly about collecting more and more data without making sure of its quality. Ensuring data quality at the source might be the best way to achieve successful data usage in the organization.
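To make "quality at the source" a bit more concrete, here is a minimal sketch of validating rows before they are loaded anywhere. The rule names, the field names (order_id, amount) and the sample rows are all hypothetical and only there for illustration; in practice you would likely rely on a dedicated data quality tool rather than hand-rolled checks.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class QualityRule:
    """A named check that returns True when a row passes."""
    name: str
    check: Callable[[dict[str, Any]], bool]


def validate_rows(rows, rules):
    """Split rows into accepted rows and (row, failed_rule_names) pairs."""
    accepted, rejected = [], []
    for row in rows:
        failed = [rule.name for rule in rules if not rule.check(row)]
        if failed:
            rejected.append((row, failed))
        else:
            accepted.append(row)
    return accepted, rejected


# Hypothetical rules for a hypothetical "orders" feed.
RULES = [
    QualityRule("order_id_present", lambda r: r.get("order_id") is not None),
    QualityRule(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

SAMPLE_ROWS = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": -2.0},
]

accepted, rejected = validate_rows(SAMPLE_ROWS, RULES)
for row, failed in rejected:
    print(f"rejected {row}: {', '.join(failed)}")
```

The point is that bad rows are stopped and explained where they are produced, instead of being discovered weeks later in a dashboard.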
Product thinking applied to data and machine learning
Every data professional has seen tables, pipelines, or dashboards that are either deprecated or simply left to die. By treating data as a product, the user and the business become the center of development. Data teams need to ensure the data product fits a specific problem, evaluate user engagement, ensure product progress and maintenance, and assess whether users are educated enough to take full advantage of the technology.
Data contracts are starting to gain traction
Internal systems generate data that lands in the data warehouse, usually through CDC (Change Data Capture). However, the software engineers in charge of these systems are often unaware of the data dependencies that data engineers have built on top of the CDC processes. So when they update their service in a way that changes the schema, the downstream data systems break.
Data contracts are being implemented to enforce data schemas and to make explicit the level of data access required, data ownership, which data is being extracted, anonymization requirements, and which other systems might be impacted if something changes at the source.
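As a rough illustration of the idea, a data contract can be as simple as a structured document plus a check that runs before a schema change ships. The sketch below is an assumption of what that could look like: the CONTRACT layout, the dataset, team and consumer names, and the check_schema helper are all hypothetical, not a standard format.

```python
# Hypothetical contract for a hypothetical "orders" dataset.
CONTRACT = {
    "dataset": "orders",
    "owner": "checkout-team",
    "fields": {"order_id": "int", "customer_email": "str", "amount": "float"},
    "anonymized_fields": ["customer_email"],
    "downstream_consumers": ["revenue_dashboard", "churn_model"],
}


def check_schema(contract, proposed_schema):
    """Return a list of breaking changes in proposed_schema.

    A field that is removed or whose type changes is breaking;
    a newly added field is not.
    """
    violations = []
    for name, expected_type in contract["fields"].items():
        if name not in proposed_schema:
            violations.append(f"missing field: {name}")
        elif proposed_schema[name] != expected_type:
            violations.append(
                f"type changed: {name} {expected_type} -> {proposed_schema[name]}"
            )
    return violations


# A service team proposes renaming "amount" to "total".
proposed = {"order_id": "int", "customer_email": "str", "total": "float"}
problems = check_schema(CONTRACT, proposed)
if problems:
    print("Breaking change for:", ", ".join(CONTRACT["downstream_consumers"]))
    for problem in problems:
        print(" -", problem)
```

Run as part of the service's CI, a check like this would surface the broken dependency to the software engineer before the change reaches the warehouse.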
Distributed and domain-oriented data architecture
Data lakes will evolve from a monolithic architecture to a domain-oriented data mesh, the same way application architecture evolved from monoliths to domain-driven microservices.
New roles across data teams
We will start seeing recruitment for data product managers, to boost adoption and monetization, and for DataOps engineers, to focus on governance and efficiency.
Notebooks are well-positioned to become the new Excel
More and more business users are becoming familiar with Python, SQL and R. Writing small scripts isn’t as scary as it was 10 years ago. The ability to quickly pull data from a database or create a simple app with Streamlit is very powerful.
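Here is the kind of small script a business user might run in a notebook cell. It is only a sketch: it uses Python’s built-in sqlite3 module with an in-memory database so it runs anywhere, and the sales table, its columns and the revenue_by_region helper are made up for the example. Against a real warehouse you would swap in the appropriate database driver.

```python
import sqlite3

# In-memory database standing in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)


def revenue_by_region(connection):
    """Return total sales amount per region as a dict."""
    query = (
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    )
    return dict(connection.execute(query).fetchall())


totals = revenue_by_region(conn)
print(totals)
```

A handful of lines like these replace the classic "export to CSV, open in Excel, build a pivot table" loop, and they can be rerun whenever the data changes.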
Most machine learning models (>50%) will successfully make it to production
Yes, you heard that right. Deploying machine learning in production is no longer a secret science. Making sure the model positively impacts the business, and that it doesn’t break in the first week, will become the new big problem.
Monitoring and observability tooling will consolidate
Real-time monitoring of data pipelines, observability of the data being ingested into the data warehouse, and data lineage will become crucial to scaling data operations.
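To show why lineage matters for operations, here is a small sketch that answers the question "if this table breaks, what else is affected?". The LINEAGE mapping and all asset names in it are hypothetical; real lineage would come from your orchestration or observability tooling rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly on it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.customer_ltv": [],
    "dashboard.weekly_revenue": [],
}


def downstream_of(asset, lineage):
    """Return every asset that depends on `asset`, in breadth-first order."""
    seen = set()
    impacted = []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        impacted.append(current)
        queue.extend(lineage.get(current, []))
    return impacted


impacted = downstream_of("raw.orders", LINEAGE)
print("An incident on raw.orders impacts:", impacted)
```

With that list in hand, a monitoring alert on one pipeline can immediately tell you which dashboards and models to treat with suspicion.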
Is there anything I missed?
If you want to read more, make sure to follow me on LinkedIn, and if you liked this article, please consider subscribing. I post once a week, every week.
Thank you so much for reading and let’s talk again soon.