We learn & share

ACA Group Blog

Read more about our thoughts, views, and opinions on various topics, important announcements, useful insights, and advice from our experts.

Featured

8 MAY 2025
Reading time 5 min

In the ever-evolving landscape of data management, investing in platforms and navigating migrations between them is a recurring theme in many data strategies. How can we ensure that these investments remain relevant and can evolve over time, avoiding endless migration projects? The answer lies in embracing 'composability', a key principle for designing robust, future-proof data (mesh) platforms.

Is there a silver bullet we can buy off-the-shelf?

The data-solution market is flooded with vendor tools positioning themselves as the platform for everything: the all-in-one silver bullet. It is important to know that there is no silver bullet. While opting for a single off-the-shelf platform might seem like a quick and easy solution at first, it can lead to problems down the line. These monolithic off-the-shelf platforms often turn out to be too inflexible to support all use cases, not customizable enough, and eventually become outdated. This results in big, complicated migration projects to the next silver-bullet platform, and organizations ending up with multiple all-in-one platforms, causing disruptions in day-to-day operations and hindering overall progress.

Flexibility is key to your data mesh platform architecture

A complete data platform must address numerous aspects: data storage, query engines, security, data access, discovery, observability, governance, developer experience, automation, a marketplace, data quality, and so on. Some vendors claim their all-in-one data solution can tackle all of these. Typically, however, such a platform excels in some aspects but falls short in others. For example, a platform might offer a high-end query engine but lack depth in the data marketplace included in the solution. To future-proof your platform, it must incorporate the best tools for each aspect and evolve as new technologies emerge.
Today's cutting-edge solutions can be outdated tomorrow, so flexibility and evolvability are essential for your data mesh platform architecture.

Embrace composability: engineer your future

Rather than locking into one single tool, aim to build a platform with composability at its core. Picture a platform where different technologies and tools can be seamlessly integrated, replaced, or evolved, with an integrated and automated self-service experience on top. A platform that is both generic at its core and flexible enough to accommodate the ever-changing landscape of data solutions and requirements. A platform with a long-term return on investment, because it allows you to expand capabilities incrementally and avoid costly, large-scale migrations. Composability enables you to continually adapt your platform capabilities by adding new technologies under the umbrella of one stable core platform layer.

Two key ingredients of composability

Building blocks: the individual components that make up your platform.
Interoperability: all building blocks must work together seamlessly to create a cohesive system.

An ecosystem of building blocks

When building composable data platforms, the key lies in sourcing the right building blocks. But where do we get these? Traditional monolithic data platforms aim to solve all problems in one package, but this stifles the flexibility that composability demands. Instead, vendors should focus on decomposing these platforms into specialized, cost-effective components that excel at addressing specific challenges. By offering targeted solutions as building blocks, they empower organizations to assemble a data platform tailored to their unique needs. In addition to vendor solutions, open-source data technologies offer a wealth of building blocks. It should be possible to combine both vendor-specific and open-source tools into a data platform tailored to your needs.
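The two ingredients above, building blocks and interoperability, can be pictured as a plugin mechanism: a stable core interface that concrete building blocks implement, so any block honouring the interface can be swapped in. The sketch below is a minimal illustration in Python; all names (DataCatalog, InMemoryCatalog, Platform) are hypothetical and do not refer to any existing framework.

```python
from abc import ABC, abstractmethod

class DataCatalog(ABC):
    """Stable core interface: any catalog building block must implement this."""

    @abstractmethod
    def register(self, product_id: str, metadata: dict) -> None: ...

    @abstractmethod
    def discover(self, keyword: str) -> list[str]: ...

class InMemoryCatalog(DataCatalog):
    """One interchangeable building block. A vendor or open-source catalog
    could be dropped in instead, as long as it honours the same interface."""

    def __init__(self) -> None:
        self._products: dict[str, dict] = {}

    def register(self, product_id: str, metadata: dict) -> None:
        self._products[product_id] = metadata

    def discover(self, keyword: str) -> list[str]:
        return [pid for pid, meta in self._products.items()
                if keyword.lower() in meta.get("description", "").lower()]

class Platform:
    """The stable core layer composes building blocks through their interfaces,
    so a catalog can be replaced without touching the rest of the platform."""

    def __init__(self, catalog: DataCatalog) -> None:
        self.catalog = catalog

platform = Platform(catalog=InMemoryCatalog())
platform.catalog.register("sales.orders", {"description": "Daily order facts"})
print(platform.catalog.discover("order"))  # ['sales.orders']
```

The point of the sketch is the seam, not the catalog: replacing `InMemoryCatalog` with another implementation changes one constructor argument, which is exactly the incremental evolution composability promises.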
This approach enhances agility, fosters innovation, and allows for continuous evolution by integrating the latest and most relevant technologies.

Standardization as glue between building blocks

To create a truly composable ecosystem, the building blocks must be able to work together; in other words, they must be interoperable. This is where standards come into play, enabling seamless integration between data platform building blocks. Standardization ensures that different tools can operate in harmony, offering a flexible, interoperable platform.

Imagine a standard for data access management that allows seamless integration across various components. It would enable an access management building block to list data products and grant access uniformly. At the same time, it would allow data storage and serving building blocks to integrate their data and permission models, ensuring that any access management solution can be effortlessly composed with them. This creates a flexible ecosystem where data access is consistently managed across different systems.

The discovery of data products in a catalog or marketplace can be greatly enhanced by adopting a standard specification for data products. With such a standard, each data product can be made discoverable in a generic way. When data catalogs or marketplaces adopt the standard, you gain the flexibility to choose and integrate any catalog or marketplace building block into your platform, fostering a more adaptable and interoperable data ecosystem.

A data contract standard allows data products to specify their quality checks, SLOs, and SLAs in a generic format, enabling smooth integration of data quality tools with any data product. It lets you combine the best solutions for ensuring data reliability across different platforms. Widely accepted standards are key to ensuring interoperability through agreed-upon APIs, SPIs, contracts, and plugin mechanisms. In essence, standards act as the glue that binds a composable data ecosystem.
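To make the data contract idea concrete, here is a minimal sketch of what a generic contract for one data product might look like, and how a tool-agnostic quality checker could consume it. The field names and the `violated_checks` helper are illustrative inventions for this sketch, not taken from any published contract standard.

```python
# A hypothetical, tool-agnostic data contract for one data product.
# Field names are illustrative; real standards define their own schema.
contract = {
    "data_product": "sales.orders",
    "owner": "sales-domain-team",
    "schema": [
        {"name": "order_id", "type": "string", "required": True},
        {"name": "amount", "type": "decimal", "required": True},
    ],
    "quality_checks": [
        {"check": "not_null", "column": "order_id"},
        {"check": "min", "column": "amount", "value": 0},
    ],
    "slo": {"freshness_hours": 24, "completeness_pct": 99.5},
    "sla": {"support_response_hours": 8},
}

def violated_checks(rows: list[dict], contract: dict) -> list[str]:
    """A generic quality tool needs only the contract, not the data
    product's internals, to evaluate the declared checks."""
    failures = []
    for check in contract["quality_checks"]:
        col = check["column"]
        if check["check"] == "not_null":
            if any(row.get(col) is None for row in rows):
                failures.append(f"not_null failed on {col}")
        elif check["check"] == "min":
            if any(row.get(col) is not None and row[col] < check["value"]
                   for row in rows):
                failures.append(f"min failed on {col}")
    return failures

rows = [{"order_id": "A1", "amount": 12.5}, {"order_id": None, "amount": -3}]
print(violated_checks(rows, contract))
# ['not_null failed on order_id', 'min failed on amount']
```

Because the contract is declarative, any quality tool that understands the agreed format can run these checks, which is precisely how a standard lets you swap quality building blocks without rewriting each data product.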
A strong belief in evolutionary architectures

At ACA Group, we firmly believe in evolutionary architectures and platform engineering, principles that extend seamlessly to data mesh platforms. It's not about locking yourself into a rigid structure, but about creating an ecosystem that can evolve and stay at the forefront of innovation. That's where composability comes in. Do you want a data platform that not only meets your current needs but also paves the way for the challenges and opportunities of tomorrow?

Let's engineer it together

Ready to learn more about composability in data mesh solutions? Contact us now!

Read more

All blog posts

Let's talk!

We'd love to talk to you!

Contact us and we'll get you connected with the expert you deserve!


Data strategy
Reading time 5 min
6 MAY 2025

You may well be familiar with the term 'data mesh'. It is one of those data buzzwords that has been doing the rounds for some time now. Even though data mesh has the potential to bring a lot of value to an organization in quite a few situations, we should not let ourselves be blinded by all the fancy terminology. If you are looking to develop a proper data strategy, you do well to start by asking yourselves the following questions: what is the challenge we are seeking to tackle with data? And how can a solution contribute to achieving our business goals?

There is certainly nothing new about organizations using data, but we have come a long way. Initially, companies gathered data from various systems in a data warehouse. The drawback was that data management was handled by a central team, and the turnaround time of reports could seriously run up. Moreover, these data engineers needed a solid understanding of the entire business. Over the years that followed, the rise of social media meant the sheer amount of data positively mushroomed, which in turn led to the term Big Data. As a result, tools were developed to analyse huge data volumes, with the focus increasingly shifting towards self-service.

The latter trend means that the business itself is increasingly able to handle data under its own steam. Which in turn brings yet another new challenge: as is often the case, we cannot dissociate technology from the processes at the company or from the people that use the data. Are these people ready to start using data? Do they have the right skills, and have you thought about the kind of skills you will need tomorrow? What are the company's goals, and how can employees contribute towards achieving them? The human aspect is a crucial component of any potent data strategy.

How to make the difference with data?
In practice, the truth is that, when it comes to their data strategies, a lot of companies have not progressed from where they were a few years ago. Needless to say, this is hardly a robust foundation to move on to the next step. So let's hone in on some of the key elements of any data strategy:

Data need to incite action: it is not enough to just compare a few numbers; a high-quality report leads to a decision, or should at the very least make it clear which kind of action is required.

Sharing is caring: if you have data anyway, why not share it? Not just with your own in-house departments, but also with the outside world. If you manage to make data available to the customer again, there is a genuine competitive advantage to be had.

Visualise: data are often collected in poorly organised tables without proper layout. Studies show the human brain struggles to read these kinds of tables. Visualising data (using GeoMapping, for instance) may lead you to insights you had not previously thought of.

Connect data sets: with data sets, 1 + 1 needs to equal 3 at all times. If you are measuring the efficacy of a marketing campaign, for example, do not just look at the number of clicks. The real added value resides in correlating the data you have with data about the business, such as (increased) sales figures.

Make data transparent: be clear about your business goals and KPIs, so everybody in the organization is able to use the data and, in doing so, contribute to meeting a benchmark.

Train people: make sure your people understand how to use the technology, but also how data can simplify their duties and contribute to achieving the company goals.

Which problem are you seeking to resolve with data?

Once you have got the foundations right, we can work up a roadmap. No solution should ever set out from the data themselves; it needs at all times to be linked to a challenge or a goal.
This is why ACA Group always organises a workshop first, in order to establish what the customer's goals are. Based on the outcome of this workshop, we come up with a concrete problem definition, which sets us on the right track to find a solution for each situation. The integration of data sets will gain even greater importance in the near future, among other things as part of sustainability reporting. In order to prepare and guide companies as best as possible, over the course of this year we will be digging deeper into some important terminologies, methods and challenges around data with a series of blogs. In the meantime, are you keen to find out exactly what 'data mesh' entails, and why it could be rewarding for your organization?

Data lake vs. Data mesh
Reading time 6 min
6 MAY 2025

In recent years, the exponential growth of data has led to an increasing demand for more effective ways to manage it. Building a data-driven business remains one of the top strategic goals of many business stakeholders. And while it may seem logical for companies to embrace the idea of being data-driven, it's far more difficult to execute on that idea. Data Mesh and Data Lakes are two important concepts in the world of data architectures that can work together to provide a flexible and scalable approach to data management. Data Lakes have already proven to be a popular solution, but a newer approach, Data Mesh, is gaining attention. This blog will dive into the two concepts and explore how they can complement each other.

Data Lakes

A data lake is a large, central storage repository that holds massive amounts of data, from various sources and in various formats. It can store structured, semi-structured, and unstructured data (e.g. images). Think of it as a huge pool of water, where you can store all sorts of data, such as customer data, transaction data, social media feeds, images, videos and more. It is a cost-effective and accessible solution for companies dealing with large data volumes and diverse data formats. Additionally, data lakes allow teams to work with raw data, without the need for extensive preprocessing or normalization.

Data Mesh

Data Mesh is a relatively new concept that takes a decentralized approach to data management. It treats data as a product, managed by autonomous teams that are each responsible for a particular domain. Data Mesh advocates that data should be owned and managed by the people who understand it best, the domain experts, and should be treated as a product. This means that each team is responsible for the quality, reliability and accessibility of the data within its domain.
This creates a more scalable and flexible approach to data management, where teams can make decisions about their data independently, without requiring intervention from a centralized data team.

How can data lake technology be used in a data mesh approach?

In short, Data Mesh is an architecture where data is owned and managed by individual product teams, creating a decentralized approach to data management. A data lake is a technology that provides a centralized storage solution, allowing teams to store and manage large amounts of data without worrying about data structure or format. Decentralization in Data Mesh is about taking ownership of sharing data as products in a decentralized way. It's not about abandoning centralized storage solutions, such as Data Lakes, but about using them in a way that adheres to the principles of Data Mesh.

Data Mesh is all about defining and managing Data Products as a building block to make data easily accessible and reusable for various use cases. Each ‘Data Product’ should be able to provide its data in multiple ways through different output ports. An output port is aimed at making data natively accessible for a specific use case. Example use cases are analytics and reporting, machine learning, real-time processing, etc. As such, the various types of output ports need corresponding data technologies that enable a specific access mode.

One technology that can support a Data Mesh architecture is a data lake. The data in an output port of a data product can be stored in a data lake. This type of output port then receives all the benefits offered by data lake technology. In a Data Mesh architecture, each data product gets its own segment of the data lake (e.g. an S3 bucket). This segment acts as the output port for the data product, where the team responsible for the data product can write their data to the lake.
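The notion of a data product exposing its data through several output ports can be sketched in a few lines of Python. Everything here is illustrative: the `DataProduct` and `OutputPort` names, the S3 path, and the warehouse table name are assumptions, not a real platform API.

```python
from dataclasses import dataclass, field

@dataclass
class OutputPort:
    """One access mode for a data product (e.g. data lake files, warehouse tables)."""
    name: str
    technology: str  # storage technology backing this port
    location: str    # where consumers find the data

@dataclass
class DataProduct:
    """A data product owned by a single domain team, exposing one or more output ports."""
    name: str
    domain: str
    output_ports: list = field(default_factory=list)

    def add_output_port(self, port: OutputPort):
        self.output_ports.append(port)

# Hypothetical example: one data product with two output ports,
# each backed by a different storage technology.
orders = DataProduct(name="orders", domain="sales")
orders.add_output_port(OutputPort("analytics", "data_lake", "s3://mesh-lake/sales/orders/"))
orders.add_output_port(OutputPort("reporting", "data_warehouse", "warehouse.sales.orders"))

for port in orders.output_ports:
    print(f"{orders.name} -> {port.name} via {port.technology} at {port.location}")
```

The point of the sketch is that the data product, not the storage technology, is the unit of ownership: the same product can add or swap ports without its consumers changing how they discover it.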
By segmenting the data lake in this way, teams can manage and secure their own data without conflicting with other teams. As such, decentralized ownership is made possible, even when using a more centralized storage technology.

While a data lake is an important technology for supporting a Data Mesh architecture, it may not be the ideal solution for every use case. Using a data lake as the only data storage technology may limit the flexibility of the Data Mesh platform, as it provides only one type of storage. For example, when it comes to business intelligence and reporting, a data warehouse technology with tabular storage may be more suitable. In other cases, time series databases or graph databases are a better option because of the type of data we want to make natively reusable. To make the Data Mesh platform more flexible, it should provide the capability to plug in different types of data storage technology, each acting as a different type of output port. In this way, each data product can have its own output ports, with different data storage technologies geared towards specific data usage patterns.

We have noticed that cloud vendors frequently recommend implementing a Data Mesh solution using one of their existing data lake services. Typically, their approach involves defining security boundaries to separate segments within these services, which can be owned by different domain teams to create various data products. However, the reference architectures they provide incorporate only one storage technology, namely their own data lake technology. Consequently, the resulting Data Mesh platform is less adaptable and tied to a single technology. What is lacking is an explicit ‘Data Product’ abstraction that goes beyond merely enforcing security boundaries and allows for the integration of various data storage technologies and solutions.

Conclusion

Data management is a critical component of any organization.
Various technologies and approaches are available: data lakes, data warehouses, data vaults, time series databases, graph databases, etc. They all have their unique strengths and limitations. Ultimately, a successful Data Mesh architecture provides the flexibility to share and reuse data with the right technology for the right use case. While a data lake is a powerful tool for managing raw data, it may not be the best solution for all types of data usage. By considering different types of data storage technologies, teams can choose the solution that best meets their specific needs and optimize their data management workflows. By using data products in a Data Mesh, teams can create a flexible and scalable architecture that can adapt to changing data management needs. Want to find out more about Data Mesh or Data Lakes?

data mesh
Reading time 10 min
6 MAY 2025

Data mesh is revolutionizing the way organizations manage data. Unlike traditional centralized models, data mesh uses a decentralized, domain-oriented structure. But how does governance work in such a distributed system? At ACA Group, we believe data mesh is an answer to the challenge of managing data by focusing on building a decentralized, self-serve data ecosystem. The goal is to embed data-driven innovation within each department or team, making everyone in the organization responsible for creating reusable data that fuels new products and services across departments. In a data mesh, it is not only the management of ownership and infrastructure that is different. The key to success is transforming data governance itself. Instead of making a centralized IT team responsible for data governance, data mesh distributes the responsibility across different teams. This approach, known as "federated computational governance", ensures active participation from both data-producing and data-consuming teams in crafting and adopting governance policies.

Four pillars of data mesh and their governance challenges

To understand the importance of governance in a data mesh, we need to break down the core principles of a data mesh and how they relate to data governance challenges:

1. Decentralization: In a data mesh, data ownership and responsibility are distributed across different business domains or teams. Each domain becomes a self-contained unit, managing its own data products. This also means that each data product and domain is self-governing, but needs to be interoperable with other data products and domains.

2. Domain-oriented approach: Instead of a monolithic data warehouse, a data mesh is made up of interconnected data products. This implies that each data product might come with its own “local dialect”. The challenge here is how to speak the same language, without speaking the same language.

3. Data as a product: This approach treats data as a product, with each domain creating and maintaining data products that are discoverable, accessible, and reusable. Metadata management becomes an important topic, since metadata is used to discover, access, integrate with and use the data encapsulated within a data product.

4. Self-serve platform: This engine and control panel empowers data producers and consumers alike. Developer portals, data catalogs, lineage tools, and collaboration spaces facilitate seamless navigation, while automated policy enforcement and regular audits ensure compliance and promote data product quality without manual intervention. Automating governance is a core challenge associated with the self-serve platform.

Now that you have a better understanding of the central building blocks and challenges of data governance in a data mesh, let's take a closer look at each of these challenges individually.

Federated Governance

A standout feature of data mesh is federated governance. But what does it actually mean? “Federated” refers to the fact that while each domain (and each data product within those domains) has its own autonomy, they come together to hash out the things that are relevant and valuable for everyone. You might think of it as a parliamentary democracy, where representatives come together to make joint decisions, which then need to be broadly implemented. This cross-domain collaboration means that quite a few teams are going to be involved.

Federated Governance Team

This is a group of domain representatives and experts who collaborate across business units and areas of expertise. They ensure data quality, compliance, and alignment with organizational goals. They oversee tasks such as:

- Automated data quality assessments
- Data access and privacy management
- Ensuring data products and datasets can be shared and reused

This team defines standardized data governance policies and ensures that data products and datasets can be shared and reused, while safeguarding overall quality. To continue our earlier comparison, the governance team is like a “parliament” that discusses and passes “laws”.

Platform Team

This team is essential to automate and enforce the governance policies defined by the governance team on the self-serve platform. They ensure that policies can be adopted by data products with little effort, promoting interoperability and collaboration without introducing unnecessary overhead.

Domain Teams

Aligned with business units, domain teams handle operational data governance within their own domains. Responsibilities include:

- Data mapping and documentation
- Ensuring data quality
- Implementing standards defined by the federated governance team

Importantly, each domain team has the autonomy and resources to execute the standards defined by the federated governance team.

In summary: while local domain teams make decisions specific to their domain, federated data governance ensures global rules are applied to all data products and their interfaces. These rules must ensure a healthy and interoperable ecosystem.

How does federated data governance work?

Let's start with an important note: federated governance requires a different way of thinking compared to more traditional governance approaches. It is focused on promoting autonomy and interoperability as much as possible, keeping interference by a centralized team to an absolute minimum. Do you want to successfully implement federated data governance in your organization?
Then make sure you establish the following key foundations:

- Culture of ownership: Teams must feel accountable for their data. This requires a high level of maturity in data literacy, and a willingness to invest in training and continuous education on data management and governance best practices.
- Robust data infrastructure: You need to be ready to invest in scalable and flexible data infrastructure that supports decentralized data management.
- Governance framework: You will need a clear governance framework that defines roles, responsibilities, and processes. This framework should be flexible enough to adapt to the needs of different domains while maintaining overall coherence.
- Cross-functional collaboration: Collaboration between IT, data professionals, and business units is essential.

Enterprise ontology: bridging domain-specific language gaps

Each domain can have its own specific lingo, creating challenges when terms differ in definition across teams. To bridge the gaps between domains, we need a solid basis for “translation” and a common understanding of terms. This is where the enterprise ontology comes in.

What is an enterprise ontology?

You can see it as a large, hierarchically structured “dictionary” that links concepts used in different domains to each other based on a common denominator. For example: a sales team and a finance team both use the term “customer”, but the definitions each team uses are somewhat different. The Sales team calls anyone who has received a quote a customer. The Finance team defines a "customer" as someone with a signed contract and invoicing details; others are referred to as “prospects”. Without a shared ontology, combining the data products from these teams would yield inconsistent results, highlighting the need for clarity.
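The sales/finance “customer” example could be sketched in Python as follows. The record fields, the ontology structure, and the `correlate` helper are all hypothetical; the sketch assumes the email address is the shared unique identifier.

```python
# Hypothetical records from two domains; field names are illustrative.
sales_customers = [
    {"email": "ann@example.com", "quoted": True},
    {"email": "bob@example.com", "quoted": True},
]
finance_customers = [
    {"email": "ann@example.com", "contract_signed": True},
]

# Ontology: both domain-specific terms tag to the unified "customer" concept,
# with "email" as the shared unique identifier.
ontology = {
    "sales.customer":   {"concept": "customer", "identifier": "email"},
    "finance.customer": {"concept": "customer", "identifier": "email"},
}

def correlate(term_a, records_a, term_b, records_b, ontology):
    """Join records from two domains whose terms map to the same ontology concept."""
    entry_a, entry_b = ontology[term_a], ontology[term_b]
    assert entry_a["concept"] == entry_b["concept"], "terms map to different concepts"
    key_a, key_b = entry_a["identifier"], entry_b["identifier"]
    index = {r[key_b]: r for r in records_b}
    return [{**r, **index[r[key_a]]} for r in records_a if r[key_a] in index]

matched = correlate("sales.customer", sales_customers,
                    "finance.customer", finance_customers, ontology)
# Only Ann exists in both domains, so only she is a "customer" in both senses.
```

The design point is that neither team renames its own term; the ontology supplies the translation layer and the shared identifier that makes cross-domain correlation possible.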
How an enterprise ontology works

By tagging domain-specific terms to a unified concept (e.g., "customer") in the ontology, teams can reconcile differences and enable cross-domain understanding. To bridge the gaps between domain-specific terms:

- Tag terms to a common ontology: Terms from each domain are linked to a unified concept in the enterprise ontology using tags. For instance, "sales customer" and "finance customer" might both map to a universal "customer" term.
- Leverage unique identifiers: When consulting the ontology, you might discover that the unique identifier across all “customers” is their email address. Finding a unique identifier across terms linked to the same concept is valuable, as it allows you to correlate data related to the same term across domains.

Metadata: enabling prevention, validation, and auditing

Metadata, often described as "data about data", plays a crucial role in federated data governance within a data mesh. It provides the necessary context to make data understandable, accessible, and usable across different domains.

Key roles of metadata in federated data governance

- Enhancing data discoverability: Metadata enables users to easily find and understand data across the organization. It includes practical information such as the data source(s), creation date, format, and usage instructions, but also information specifically linked to discoverability, like which enterprise ontology tags are applicable, who the owner is, or associated data products. This makes it easier for teams to locate (and integrate with) relevant data products.
- Improving data quality and trust: Metadata includes (or should include) data quality metrics and lineage information, helping teams ensure data accuracy and reliability. It allows users to trace data back to its origin, understand the transformations it has undergone, and assess its quality.
- Facilitating compliance and security: Metadata helps in maintaining compliance with data privacy and security regulations. The data product team can specify who or which roles can access the data and for what purpose, ensuring accountability and transparency. Furthermore, tagging sensitive data elements helps to automatically apply data privacy and masking policies, ensuring regulatory compliance.
- Enabling interoperability: Metadata ensures that data from different domains can be integrated and used together. Standardized metadata formats and definitions enable seamless data exchange and interoperability.

Best practices for metadata management in data mesh

In a data mesh, metadata should be managed as close to the source as possible. Each data product team is responsible for carefully authoring and curating the metadata associated with its data product. Exceptions, like the automated addition of data quality metrics from the self-serve platform, can apply, but the data product itself remains the source of truth and should be managed as such. In short, metadata should be decentrally managed, but centrally consumable. Metadata management should be automated as much as reasonably possible and integrated with data governance tools to ensure accuracy and consistency. Key practices include:

- Careful metadata authoring and curation: Use tools that automatically capture and update metadata. Introduce processes and practices that motivate data product owners to take special care when they create and modify the metadata associated with their data product. The data product owner should ensure that the metadata presented to consumers gives a truthful representation of the content of the data product, so these consumers can make an informed decision about the value of the product for their use case.
- Standardization: Implement standardized metadata formats and definitions across all domains (where appropriate) to ensure maximal interoperability and ease of use.
- Automated validation: Define procedures and policies to automatically validate metadata, in order to spot mistakes and inconsistencies early on and prevent error propagation throughout the system. As always, prevention and validation come first, audits second.
- Regular audits: Conduct regular automated audits to ensure metadata accuracy and compliance with governance policies.

The self-serve platform: automating governance

The self-serve platform embodies "federated computational governance". It provides tools and infrastructure that allow both users and creators to independently access and manage data products without relying on a central IT team.

Key features of a self-serve platform

- Empowering domain teams: Self-serve platforms enable domain teams to take ownership of their data. They can create, manage, and use data products independently, fostering a sense of accountability.
- Ensuring compliance: Self-serve platforms integrate governance controls, ensuring that data usage complies with organizational policies and regulations, balancing autonomy with oversight.
- Metadata management: With the right tooling, the self-serve platform can facilitate the careful curation and automated validation of metadata. This eases both integration with the self-serve platform and management of metadata within the individual data products.
- Policy management: Governance policies can be translated into automated processes, which can be enforced through the platform. Automated policy enforcement ensures that data usage complies with internal guidelines and external regulations.
- Monitoring and auditing: Monitoring and auditing capabilities can be used to track data usage and ensure compliance. Regular audits help identify and address any governance issues. Alerting data product or domain teams about these issues and their consequences allows them to address them in their own way and at their own time.
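Automated metadata validation on a self-serve platform might look like the minimal sketch below. The set of required fields and the `validate_metadata` helper are assumptions chosen for illustration, not a standard.

```python
# Required metadata fields are an assumption; a real platform would define its own schema.
REQUIRED_FIELDS = {"name", "domain", "owner", "update_frequency", "ontology_tags"}

def validate_metadata(metadata: dict) -> list:
    """Return a list of problems; an empty list means the metadata passes validation."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - metadata.keys())]
    if not metadata.get("ontology_tags"):
        problems.append("at least one ontology tag is needed for discoverability")
    return problems

good = {
    "name": "orders", "domain": "sales", "owner": "sales-data-team",
    "update_frequency": "daily", "ontology_tags": ["customer", "order"],
}
bad = {"name": "orders"}

print(validate_metadata(good))  # no problems
print(validate_metadata(bad))   # several problems reported
```

Running such a check on every data product publication catches mistakes before they propagate, which is exactly the "prevention and validation first, audits second" ordering described above.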
Conclusion: striking the balance between autonomy and oversight

Embracing a data mesh architecture requires a different approach to governance. The traditional centralized model of managing data no longer suffices in a world where agility, autonomy, and cross-functional collaboration are paramount. Federated data governance empowers domain teams to take ownership of their data products while ensuring alignment with global organizational standards. By distributing responsibilities across domain teams, supported by a self-serve platform and strong metadata management practices, organizations can enhance data quality, interoperability, and compliance without adding unnecessary complexity. However, the success of data mesh governance depends on fostering a strong culture of data ownership, building a robust self-service platform, and establishing clear frameworks that promote seamless cross-domain collaboration. That's a lot of buzzwords for one sentence, but it rings true nonetheless:

- Data ownership holds people accountable for the data they create and maintain, while allowing them to take full control of their data products.
- Strong infrastructure and a self-service platform are needed to facilitate this practice of ownership, giving data product teams the autonomy they need to put their product out there, while also allowing for collaboration and sharing.
- Clear governance frameworks are needed to establish what quality looks like and to guide data product teams in implementing best practices related to integration, collaboration, and more.

The key to thriving in data mesh is a governance model that strikes the right balance between autonomy and oversight: allowing teams to produce while safeguarding the integrity and value of the organization's data ecosystem. Ready to embrace data mesh? Contact us for expert guidance and tailored solutions!

Reading time 2 min
6 DEC 2023

Make it concrete for all stakeholders

Data Mesh is frequently perceived as highly abstract and theoretical, leaving stakeholders uncertain about its precise implications and potential solutions. Therefore, at ACA Group, we focus on making it as concrete as possible for business stakeholders, technical stakeholders, and other impacted stakeholders in the organization. We recommend simultaneously addressing three key challenges:

- IDENTIFY BUSINESS VALUE – define exactly how Data Mesh contributes to business value by considering data as a product.
- ORGANIZE TEAMS – specify the role of every team, team member and persona within the context of Data Mesh.
- BUILD PLATFORM – show how Data Mesh influences the technical architecture.

Challenge 1: Identifying the Data Mesh Business Value

One of the first challenges in adopting Data Mesh is to explain and prove its business value. At ACA Group, we start by identifying potential data products, domains, and use cases. This process is grounded in business input and results in a data product landscape. An example for an e-commerce company is shown below (boxes are applications, hexagons are data products, colors are owning domains). This landscape serves as a navigation map, inspiring new innovative business ideas and showcasing the value that Data Mesh can bring to the organization. By demonstrating how Data Mesh can enable new possibilities, we clarify its relevance to business stakeholders.

Aligning Data Mesh Solutions with Organizational Goals

To get the most out of Data Mesh, alignment with the organization's overall goals and strategy is paramount. It's essential to ensure that the investment in technology and process aligns with the broader business objectives. This alignment helps maintain support and momentum, crucial for the success of a Data Mesh initiative.

Identifying Data Mesh Opportunities through Game Storming

At ACA Group, we apply game storming techniques to discover domains and data products.
This process begins with business capabilities and data use cases identified through workshops, such as impact mapping. By aligning Data Mesh with these aspects, we identify a data product landscape from two perspectives: an inventory of available data and potential data products inspires and generates new business ideas, while the desired business impact and goals help to identify the required data and data products.

Challenge 2: Organizing Teams and Empowering Individuals

Data Mesh is not just about technology; it's about transforming how teams and team members operate within the organization. ACA Group believes in organizing teams effectively to harness the power of Data Mesh. We interact with existing teams and team members, positioning their valuable roles and expertise within a Data Mesh team organization. This typically involves platform teams, domain teams, enabling teams, and a federated governance team. Additionally, we explore the various user journeys and experiences for each persona, ensuring that Data Mesh positively impacts the organization, its people, and their roles.

Challenge 3: Building the Technical Architecture as a First-Class Component

The technical architecture is a critical aspect of Data Mesh, and ACA Group is committed to making it a tangible reality. We demonstrate how Data Mesh can work in practice by developing a coded, working proof of concept. Leveraging our platform engineering expertise, we bring data products to life, showcasing how Data Mesh can leverage existing data technology while providing a future-proof and flexible architecture tailored to the client's unique context.

Conclusion

Adopting Data Mesh is a transformative journey for any organization. By breaking down the challenges into actionable steps, as ACA Group does, you can make Data Mesh more tangible, clarify its value, and align it with your organization's goals.
These incremental actions serve to demystify Data Mesh, rendering it comprehensible to a wide array of stakeholders and facilitating well-informed decisions. Embracing Data Mesh represents an embrace of the future of data management, with its potential to unlock myriad possibilities for your organization. This journey is about making Data Mesh a practical reality while aligning it with your organizational objectives. 💡 Curious about what else Data Mesh has to offer you? Discover it here ✅

Reading time 3 min
15 NOV 2023

In the world of data mesh, data products are key. But what exactly is a data product, and how do you approach the functional analysis of one? You will discover it in this blog post.

Data Product?

The basic idea of a data mesh consists of two parts:

- Increasing the accessibility, availability and usability of data for business users.
- Reducing dependencies between data teams.

We base our approach on the principles of domain-oriented ownership, federated computational governance, self-service data platforms and product thinking. Particularly the latter is crucial in understanding and developing a data product. We aim to consider and shape data as a reusable product, enabling it to be utilised in various ways, thus maximising its value.

"A data product is an autonomous, read-optimised, standardised data unit containing at least one dataset (Domain Dataset), created for satisfying user needs" — Jacek Majchrzak, author of Data Mesh in Action

It is a logical unit that encompasses all components required to process and store domain-specific data for various use cases, such as data analytics, and makes it accessible to other teams through 'output ports'. A data product also has its own independent lifecycle and management structures. In essence, you can compare a data product to a microservice, but designed for analytical data. Data products connect to sources through input ports, such as operational systems, data platforms, or other data products, and perform specific operations on the data, such as transformations, calculations, data anonymisation, and more. Developing a data product involves addressing various aspects, including defining input and output ports, data cleaning, transformations, field mapping, GDPR compliance, and so on. Therefore, a thorough analysis is crucial. But how do you start such a data product analysis?

Step-by-step Guide for Data Product Analysis

A structured analysis approach ensures the best results.
For data products, we rely on the Data Product Canvas, which we pair with a useful checklist. The canvas is a visual representation that simplifies the display of the various critical components of your data product analysis. The checklist ensures that you don't overlook anything.

Data Product Canvas

With the Data Product Canvas, we ensure a consistent process for designing a data product within an organisation. The canvas succinctly outlines the aspects you need to consider during your analysis. Using this canvas, you can engage the data product's various stakeholders. This collaborative effort leads to the desired result.

Based on our experience with this canvas, we recommend filling it out in a specific sequence. It's best to start with specifying the data product. This way, you immediately capture all descriptive data, ensure that all stakeholders are well-informed, and clarify the data product's purpose.

Next, move on to the output ports, since the data product's stakeholders often have a good understanding of the data they require. You can compare them with end users of an application who benefit from a good user experience. Afterwards, address the input ports. Based on feedback from data consumers, you will identify the input sources capable of providing the necessary data. Finally, conclude with the data product's design. In this phase, input and output converge, and you aim to come up with a logical approach to transform input into the desired output.

Checklist

1. Data Product Specification

Make sure to collect all the descriptive data for your data product. This includes:

- Domain name
- Data product name
- Ownership: who manages the data product
- The technical team's contact information
- The data product's expiration date
- Security: data product security is crucial, as it determines how and by whom the data product can be used. Specify the license or 'terms of use' associated with the data product.
- Update frequency: the frequency determines how often the data product is updated. It's a good idea to also indicate how the updates are performed (e.g., full load, incremental, etc.).
- Data consumers: a data product's data consumers are applications or other data products that will use your data product's output.
- Use cases: use cases describe your data product's purpose. They provide more information about the data product's needs and rationale.

2. Terminology

In many companies and projects, confusion often arises regarding the meaning of certain concepts, terms, or words. Ubiquitous language, a Domain-Driven Design principle, seeks to provide a solution by aiming for a vocabulary that is shared and clearly understood by all stakeholders. To avoid miscommunication and differences in interpretation when developing and analysing a data product, it's crucial to pay attention to ubiquitous language. Therefore, be sure to include definitions of terms that are relevant within the context of the data product. References to other glossaries, wikis, etc. are also welcome.

3. Output Ports

Output ports determine the format and protocol in which data is made available to your data consumers. Discuss with your data consumers (and other stakeholders) the format in which they prefer to consume data. Examples of output port types include analytical data, blob stores, Linked Data, etc.

4. Input Ports

Input ports specify the format and method by which source data can be read. By reaching out to the owners of your source systems (or source data products), you can discover the available options. Be sure to mention the type of input port (e.g., API, database, file, etc.), and also note which tables of the source system the required data can be extracted from. It's also very useful to include a visual representation of the source system's domain at this stage.

5. Data Product Design

This is the final and perhaps the most critical step in a data product's analysis.
In this step, you will think about the logic of the transformations within the data product. Consider:

- Identification of desired input data/fields
- Identification of desired output data/fields
- Constraints on data input (e.g., not all information is available in all cases)
- Data cleaning
- Data transformations
- Input field mapping
- Output field mapping
- Data enrichment
- Data anonymisation (e.g., in compliance with GDPR regulations)
- Data calculations

In the practical example below, we address several of the above topics. It illustrates how source fields are mapped and how to handle them effectively. Additionally, certain fields are calculated based on input data (e.g., location).

In Conclusion

Analysing data products involves many aspects. However, with a well-structured approach, a visual framework, and a simple checklist, you can complete this task with ease.

Read more
Reading time 12 min
10 FEB 2023

Data transformation and generating data from other data are common tasks in software development. Different programming languages have different ways to achieve this, each with their strengths and weaknesses. Depending on the problem, some may be preferable to others. In this blog post, you will find simple but powerful methods for generating and transforming data in Python.

Before we discuss a more complex case, let's start with a basic example. Imagine that we own a few stores and each store has its own database with items added by employees. Some fields are optional, which means employees do not always fill out everything. As we grow, it might become difficult to get a clear view of all the items in our stores. Therefore, we develop a Python script that takes the different items from our stores' databases and collects them in a single unified database.

```python
from typing import Generator

from stores import store_1, store_2, store_3

# Type hints are used throughout the code.
items_1: Generator[Item, None, None] = store_1.get_items()
items_2: Generator[Item, None, None] = store_2.get_items()
items_3: Generator[Item, None, None] = store_3.get_items()
```

Generators

store_1.get_items() returns a generator of items. Generators will have a key role in this blog post. With them, we can set up a complex chain of transformations over massive amounts of data without running out of memory, while keeping our code concise and clean. If you are not familiar with Python yet:

```python
def a_generator():
    for something in some_iterable:
        # do logic
        yield something
```

Two things are important here. First, calling a generator will not return any data; it will return an iterator. Second, values are produced on demand. A more in-depth explanation can be found here.

Syntax

There are two ways to create generators. The first looks like a normal Python function, but has a yield statement instead of a return statement. The other is more concise, but can quickly become convoluted as the logic gets more complex.
It's called the Python generator expression syntax and is mainly used for simpler generators.

```python
from typing import Generator

# Basic generator syntax
def generate_until(n: int) -> Generator[int, None, None]:
    i = 0
    while i < n:
        yield i
        i += 1

# Generator expression syntax
gen_until_5: Generator[int, None, None] = (i for i in range(5))
```

Code

To keep it simple, we run the script once at the end of the day, leaving us with a complete database with all items from all stores.

```python
from typing import Generator

from stores import store_1, store_2, store_3
from database import all_items

# Type hints are used throughout the code.
items_1: Generator[Item, None, None] = store_1.get_items()
items_2: Generator[Item, None, None] = store_2.get_items()
items_3: Generator[Item, None, None] = store_3.get_items()

# Let's assume our `add_or_update()` function accepts generators.
# If an Item already exists, it is updated; otherwise it is added to the database.
# We can just add them one by one like here.
all_items.add_or_update(items_1)
all_items.add_or_update(items_2)
all_items.add_or_update(items_3)

# The database now contains all the latest items from all the stores.
```

For this use case, this is perfectly fine. But when the complexity grows and more stores are added, it can quickly become cluttered. Fortunately, Python has great built-in tools to simplify our code.

Itertools

One module in Python is called itertools. According to the Python docs, "the module standardizes a core set of fast, memory-efficient tools that are useful by themselves or in combination. Together, they form an 'iterator algebra', making it possible to construct specialized tools succinctly and efficiently in pure Python."

A great function is itertools.chain(). It is used to 'chain' together multiple iterables as if they were one. We can use it to chain our generators together.

```python
from itertools import chain
from typing import Generator

from stores import store_1, store_2, store_3
from database import all_items

# Type hints are used throughout the code.
items_1: Generator[Item, None, None] = store_1.get_items()
items_2: Generator[Item, None, None] = store_2.get_items()
items_3: Generator[Item, None, None] = store_3.get_items()

# Using itertools.chain we can add the generators together into one.
# chain itself is also a generator function, so no data will be generated yet.
items: Generator[Item, None, None] = chain(items_1, items_2, items_3)

all_items.add_or_update(items)  # <- data will be generated here

# The database now contains all the latest items from all the stores.
```

Generator Functions

Now let's assume that our item is a tuple with five fields: name, brand, supplier, cost, and the number of pieces in the store. It has the following signature: tuple[str, str, str, int, int]. If we want the total value of the items in the store, we simply need to multiply the number of pieces by the cost.

```python
# both receives and returns a generator
def calc_total_val(items: Generator) -> Generator:
    for item in items:
        # yield the first 3 fields and the product of the last 2
        yield (*item[:3], item[3] * item[4])

# we can also write it as a generator expression since it's so simple
((*item[:3], item[3] * item[4]) for item in items)
```

Now it looks like this: tuple[str, str, str, int]. But we want to output it as JSON. For that, we can just create a generator that returns a dictionary and call json.dumps() on it. Let's assume that we can pass an iterator of dicts to the add_or_update() function and that it automatically calls json.dumps().

```python
# both receives and returns a generator
def as_item_dict(items: Generator) -> Generator:
    for item in items:
        yield {
            "name": item[0],
            "brand": item[1],
            "supplier": item[2],
            "total_value": item[3],
        }
```

Now that we have more logic, let's see how we can put it together. One great thing about generators is how clear and concise they are to use.
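To see these two steps produce real values, here is a standalone run over a couple of made-up item tuples (the sample data is invented for illustration):

```python
from typing import Generator

def calc_total_val(items) -> Generator:
    for item in items:
        # yield the first 3 fields and the product of the last 2
        yield (*item[:3], item[3] * item[4])

def as_item_dict(items) -> Generator:
    for item in items:
        yield {"name": item[0], "brand": item[1],
               "supplier": item[2], "total_value": item[3]}

# invented sample items: (name, brand, supplier, cost, pieces)
sample = [
    ("hammer", "ToolCo", "SupplyInc", 10, 5),
    ("wrench", "ToolCo", "SupplyInc", 7, 2),
]
result = list(as_item_dict(calc_total_val(iter(sample))))
# result[0]["total_value"] is 50 (10 * 5), result[1]["total_value"] is 14 (7 * 2)
```

Note that nothing is computed until list() actually consumes the chained generators.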
We can create a function for each process step and run the data through it.

```python
from itertools import chain

from stores import store_1, store_2, store_3
from database import all_items

def calc_total_val(items):
    for item in items:
        yield (*item[:3], item[3] * item[4])

def as_item_dict(items):
    for item in items:
        yield {
            "name": item[0],
            "brand": item[1],
            "supplier": item[2],
            "total_value": item[3],
        }

items_1 = store_1.get_items()
items_2 = store_2.get_items()
items_3 = store_3.get_items()

items = chain(items_1, items_2, items_3)  # <- make one big iterable
items = calc_total_val(items)             # <- calc the total value
items = as_item_dict(items)               # <- transform it into a dict
all_items.add_or_update(items)            # <- data will be generated here

# The database now contains all the latest items from all the stores.
```

To show the steps that we have taken, I split everything up. There are still some things that could be improved. Take a look at the function calc_total_val(). This is a perfect example of a situation where a generator expression can be used.

```python
from itertools import chain

from stores import store_1, store_2, store_3
from database import all_items

def as_item_dict(items):
    for item in items:
        yield {
            "name": item[0],
            "brand": item[1],
            "supplier": item[2],
            "total_value": item[3],
        }

items_1 = store_1.get_items()
items_2 = store_2.get_items()
items_3 = store_3.get_items()

items = chain(items_1, items_2, items_3)
items = ((*item[:3], item[3] * item[4]) for item in items)
items = as_item_dict(items)
all_items.add_or_update(items)
```

To make it even cleaner, we can put all of our functions into a separate module. That way, our main file only contains the steps the data goes through. If we use descriptive names for our generators, we can immediately see what the code will do. So now we have created a pipeline for the data.
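A quick, self-contained way to verify the 'values on demand' behaviour of such a pipeline (the names below are invented for the demo):

```python
produced = []

def source():
    # record every value the moment it is actually produced
    for i in range(3):
        produced.append(i)
        yield i

def double(items):
    for i in items:
        yield i * 2

pipeline = double(source())  # builds the pipeline; nothing runs yet
before = list(produced)      # still empty at this point
result = list(pipeline)      # consuming the pipeline drives the source
```

Until list() pulls on the final generator, no step in the chain does any work, which is exactly why long pipelines like this stay memory-friendly.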
While this is only a simple example, the same approach works for more complicated workflows.

Data products

Everything we did in the example above can easily be applied to a data product. If you are not familiar with data products, here is a great text on data meshes. Imagine that we have a data product that does some data aggregation. It has multiple inputs with different kinds of data. Each of those inputs needs to be filtered, transformed and cleaned before we can aggregate them into one output. The client requires the output to be a single JSON file stored in an S3 bucket. The existing infrastructure only allows 500 MB of RAM for the containers.

Now let's load all the data, do some transformations, aggregate everything, and parse it into a JSON file.

```python
from json import dumps
from typing import Generator

from input_ports import port_1, port_2
from output_ports import S3_port

data_port_1: Generator = port_1.get_data()
data_port_2: Generator = port_2.get_data()

output = []
for row in data_port_1:
    # do some transformation or filtering here
    output.append(row)

for row in data_port_2:
    # do some transformation or filtering here
    output.append(row)

S3_port.save(dumps(output))
```

While this looks like an excellent solution that does the job and is easy to understand, our container suddenly crashes with an OutOfMemory error. After some local testing on our machine, we see that it has produced an 834 MB file, which cannot work with only 500 MB of RAM for the container. The problem with the code above is that we first collect everything in a list, so all of it is kept in memory.

Solution

Let's give it another try. For S3, we can use MultipartUpload, which means we do not need to keep the entire file in memory. And of course, we should replace our lists with generators.

```python
from itertools import chain
from json import dumps
from typing import Generator

from input_ports import port_1, port_2
from output_ports import S3_port

data_port_1: Generator = port_1.get_data()
data_port_2: Generator = port_2.get_data()

def port_1_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

def port_2_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

output = chain(port_1_transformer(data_port_1), port_2_transformer(data_port_2))

for part in output:
    S3_port.save_part(dumps(part))
```

Since we now only have one item in memory at a time, this uses dramatically less memory than the earlier solution, with almost no extra work. However, sending a POST request to S3 for each item might be a bit much, especially if we have 300,000 items. And there is another issue: the part size of a multipart upload must be between 5 MiB and 5 GiB.

To fix this, we can group multiple parts before we upload them. But if we group too many, we will once again hit the memory limit. The right chunk size therefore depends on how large the individual parts of your data are. To demonstrate this, let's use a size of 1,000. The larger the chunk size, the more memory is used, but the fewer requests go to S3. So we prefer our chunks to be as large as possible without running out of memory.

```python
from itertools import chain, islice
from json import dumps
from typing import Generator

from input_ports import port_1, port_2
from output_ports import S3_port

data_port_1: Generator = port_1.get_data()
data_port_2: Generator = port_2.get_data()

def makebatch(iterable, size):
    # Lazily group an iterator into chunks of at most `size` items.
    for first in iterable:
        yield chain([first], islice(iterable, size - 1))

def port_1_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

def port_2_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

output = chain(port_1_transformer(data_port_1), port_2_transformer(data_port_2))

for chunk in makebatch(output, 1000):
    # materialise the chunk so json.dumps() can serialise it
    S3_port.save_part(dumps(list(chunk)))
```

This is all that needs to happen.
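The batching trick works because islice() keeps consuming the same underlying iterator. A standalone sketch with plain numbers makes that easy to verify (note the explicit iter() call; passing a plain sequence would make islice() restart from the beginning for every batch):

```python
from itertools import chain, islice

def makebatch(iterable, size):
    # lazily group an iterator into chunks of at most `size` items
    for first in iterable:
        yield chain([first], islice(iterable, size - 1))

numbers = iter(range(7))  # must be an iterator, not a plain sequence
batches = [list(chunk) for chunk in makebatch(numbers, 3)]
print(batches)  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Each yielded chunk is itself a lazy iterator, so at no point does the whole stream sit in memory; only the chunk being serialised does.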
It is enough to transform vast amounts of data and save them in an S3 bucket, even when resources are scarce.

Bonus

When your transformations are compute-intensive, running them in parallel is easy. With just a few extra lines, we can spread the work over multiple cores. One caveat: Pool.imap() applies a function to each individual row, so the per-row logic moves out of the generator functions into plain row-level functions.

```python
from multiprocessing.pool import Pool

def transform_row_1(row):
    # do some transformation or filtering here
    return row

def transform_row_2(row):
    # do some transformation or filtering here
    return row

with Pool(4) as pool:
    # imap_unordered could also be used if the order is not important
    data_1 = pool.imap(transform_row_1, data_port_1, chunksize=500)
    data_2 = pool.imap(transform_row_2, data_port_2, chunksize=500)
    output = chain(data_1, data_2)
```

The best part about this? Nothing downstream has to change, because the iterator returned by imap can be consumed just like any other generator. Now let's throw it all together. This is all we need for compute-intensive transformations over large amounts of data, using multiple cores.

```python
from itertools import chain, islice
from json import dumps
from multiprocessing.pool import Pool

from input_ports import port_1, port_2
from output_ports import S3_port

def makebatch(iterable, size):
    # Lazily group an iterator into chunks of at most `size` items.
    for first in iterable:
        yield chain([first], islice(iterable, size - 1))

def transform_row_1(row):
    # do some transformation or filtering here
    return row

def transform_row_2(row):
    # do some transformation or filtering here
    return row

if __name__ == "__main__":  # required for multiprocessing on some platforms
    data_port_1 = port_1.get_data()
    data_port_2 = port_2.get_data()

    with Pool(4) as pool:
        # imap_unordered could also be used if the order is not important
        data_1 = pool.imap(transform_row_1, data_port_1, chunksize=500)
        data_2 = pool.imap(transform_row_2, data_port_2, chunksize=500)
        output = chain(data_1, data_2)

        for chunk in makebatch(output, 1000):
            S3_port.save_part(dumps(list(chunk)))
```

Conclusion

Generators are often misunderstood by new developers, but they can be an excellent tool.
Whether for a simple transformation or something more advanced, such as a data product, Python is a great choice because of its ease of use and the abundance of tools available in the standard library.

Read more