A data mesh is an analytical data architecture and operating paradigm in which data is treated as a product and owned by the teams who know the data best.
Data is everywhere these days. Every digital activity generates data as a byproduct, and data is produced by everything from systems and processes to sensors. Technology makes it simpler for businesses to acquire and retain data, which they can then use to make better decisions or provide more personalized experiences for their customers.
Organizations, however, struggle to enable and empower their people to make the best and most timely decisions. Centralized data platform designs cannot deliver insights at the pace and scale that businesses need. A data mesh addresses these problems.
Why Use a Data Mesh?
A data mesh, when done correctly, indicates who owns the data and, as a result, who can help add new features, provide further information about anomalies, and interact with business and technical teams to fix gaps.
Data is separated into domains that don’t need to be thoroughly normalized. Fully normalized data is no longer necessary: storage has become inexpensive, and heavy normalization increases join complexity for BI and advanced analytics use cases. Instead, teams usually adopt a “Starflake” schema, a hybrid of the snowflake and star schemas. As a result, they can support more development teams as well as complex analytics and reporting use cases.
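To make the “Starflake” idea concrete, here is a minimal sketch using a hypothetical retail domain: the fact table joins directly to a denormalized date dimension (star-style), while the product dimension keeps a separate, normalized category table (snowflake-style). All table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Star-style: denormalized date dimension, one join away from the fact.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY,
                           day TEXT, month TEXT, year INTEGER);
    -- Snowflake-style: product dimension normalized into a category table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT,
                              category_key INTEGER REFERENCES dim_category);
    CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, amount REAL);
""")
cur.execute("INSERT INTO dim_date VALUES (1, '01', 'Jan', 2024)")
cur.execute("INSERT INTO dim_category VALUES (10, 'Books')")
cur.execute("INSERT INTO dim_product VALUES (100, 'Data Mesh', 10)")
cur.execute("INSERT INTO fact_sales VALUES (1, 100, 42.0)")

# A BI query pays the extra join only on the snowflaked branch.
row = cur.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_category c ON p.category_key = c.category_key
    GROUP BY c.name
""").fetchone()
print(row)  # ('Books', 42.0)
```

The trade-off is visible in the query: the star-style branch needs one join, the snowflaked branch needs two, and a Starflake design chooses per dimension which cost to pay.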
Data Mesh Principles
Data producers and data consumers should collaborate as closely as possible. The ideal situation, organizationally, is when the same team produces and consumes the same data, combining interest, responsibility, and competence in one place. In practice, this is seldom possible, since a data-producing team usually already has too many obligations in its own area to fully own a data-consuming program as well.
Splitting those duties into two teams that interact directly, without the need for a middleman, is a significant step forward. A data-producing team’s purpose should be to publish its data in such a way that others can benefit from it.
Data as a Product
A data mesh brings domain-driven design (DDD) to data. In DDD, the structure of data is determined by an organization’s domains, so each domain drives its own organization and logic.
Because data can be understood as entities and characteristics, both of which are essentially domain-driven, DDD makes at least as much sense here as it does for software engineering. The data mesh applies product thinking to data, with data products being APIs. To be “discoverable”, data must be well-defined and documented.
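A hedged sketch of what “data products as APIs” could look like: a small descriptor object that carries the metadata a mesh catalog would index to make the product discoverable. The `DataProduct` class, its fields, and the `describe` method are all hypothetical names for illustration, not a standard API.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Minimal descriptor for a domain-owned data product (illustrative)."""
    name: str
    domain: str
    owner: str
    description: str
    schema: dict = field(default_factory=dict)

    def describe(self) -> dict:
        # Machine-readable metadata a mesh catalog could index,
        # making the product "discoverable" and documented.
        return {
            "name": self.name,
            "domain": self.domain,
            "owner": self.owner,
            "description": self.description,
            "schema": self.schema,
        }

orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-team@example.com",
    description="One row per confirmed order, updated hourly.",
    schema={"order_id": "str", "amount": "float", "placed_at": "datetime"},
)
print(orders.describe()["owner"])  # sales-team@example.com
```

The point is not the class itself but the contract: every product publishes who owns it, what it contains, and how it is shaped, so consumers can find and trust it without asking around.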
The data mesh concept has a lot in common with traditional data marts: domain-driven data aggregations in a data warehouse, often managed by a small team in a more Agile manner and used to gain new perspectives or address specific strategic problems.
As you can imagine, it takes a lot of infrastructure to design, deploy, run, monitor, and access even a simple hexagon, that is, a single data product. The skills required to supply this infrastructure are specialized and would be impossible to replicate in each domain.
Most crucially, teams can autonomously manage their data products only if they have access to a high-level abstraction of infrastructure that removes the complexity and friction of provisioning and managing the lifecycle of data products. This necessitates a new principle: self-serve data infrastructure as a platform, to enable domain autonomy.
As you can see, a data mesh follows a distributed system architecture: a set of separate data products, each with its own lifecycle, built and released by potentially independent teams.
However, to obtain value in the form of higher-order datasets, insights, or machine intelligence, these disparate data products must interoperate. Consumers must be able to correlate them, form unions, find intersections, and perform other graph or set operations on them at scale.
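A minimal sketch of such cross-product operations, assuming two hypothetical domain products that both key their records by `customer_id`. Real implementations run these operations at scale in a query engine; plain Python sets are used here only to show the semantics.

```python
# Customer IDs exposed by two independent domain data products (made up).
sales_customers = {"c1", "c2", "c3"}      # from the sales domain product
support_customers = {"c2", "c3", "c4"}    # from the support domain product

# Interoperability means these sets can be combined meaningfully:
union = sales_customers | support_customers          # every known customer
intersection = sales_customers & support_customers   # seen in both domains
only_sales = sales_customers - support_customers     # never contacted support

print(sorted(intersection))  # ['c2', 'c3']
```

Such operations only make sense if both products agree on what a `customer_id` is, which is exactly why interoperability requires global standardization across domains.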
Any of this is possible only with a data mesh implementation that incorporates decentralization and domain self-sovereignty, interoperability through global standardization, a dynamic topology, and, most crucially, automated execution of decisions by the platform. This is what I refer to as federated computational governance.
Data as a Product: Data Mesh
The high friction and expense of discovering, understanding, trusting, and ultimately using quality data is one of the problems of today’s analytical data architectures. If not addressed, the problem will only worsen as the number of places and teams providing data grows, a direct consequence of our first principle, decentralization.
The data-as-a-product philosophy is intended to address data quality and the age-old problem of data silos, or dark data. These are, as Gartner defines it, “the information assets businesses acquire, process, and store during routine business activities, but seldom employ for other purposes”. Domain-provided analytical data must be handled as a product, and data consumers should be treated as customers: happy customers.
For domain data to be called a product, a data mesh implementation should provide discoverability, security, explorability, understandability, trustworthiness, and so on. It should also outline the roles that businesses must establish (such as the domain data product owner, who is accountable for the objective metrics that guarantee data is supplied as a product).
These metrics include data quality, reduced data consumption lag time, and overall data user happiness as measured by the net promoter score. The owner of a domain data product must have a thorough grasp of who the data consumers are, how they use the data, and what native methods they prefer to use to consume the data. With this detailed knowledge of data users, the owner designs data product interfaces that meet their needs.
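Of the metrics above, the net promoter score has a simple standard formula: the percentage of promoters (scores 9 to 10) minus the percentage of detractors (scores 0 to 6). A minimal sketch, with made-up survey responses:

```python
def net_promoter_score(scores: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

survey = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]  # hypothetical responses
print(net_promoter_score(survey))  # 20.0
```

A data product owner could track this score per product over time; a falling NPS is an early signal that the product’s interfaces no longer match how consumers actually use the data.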
Honestly, the bulk of data products on the mesh serve just a few traditional personas, each with its own tooling and expectations: data analysts and data scientists. To support them, all data products can expose standardized interfaces. Communication between data consumers and product owners is an important part of designing data product interfaces.
A data mesh integrates siloed data to help businesses move toward scaled, automated analytics. It enables enterprises to break free from the trap of monolithic data architectures and save money on operations and storage. By delegating data administration and ownership to domain-specific business teams, this distributed approach promises to relieve the data access bottlenecks caused by centralized data ownership.