We learn & share

ACA Group Blog

Read more about our thoughts, views, and opinions on various topics, important announcements, useful insights, and advice from our experts.

Featured

8 MAY 2025
Reading time 5 min

In the ever-evolving landscape of data management, investing in platforms and navigating migrations between them is a recurring theme in many data strategies. How can we ensure that these investments remain relevant and can evolve over time, avoiding endless migration projects? The answer lies in embracing ‘composability’, a key principle for designing robust, future-proof data (mesh) platforms.

Is there a silver bullet we can buy off the shelf?

The data-solution market is flooded with vendor tools positioning themselves as the platform for everything, the all-in-one silver bullet. It's important to know that there is no silver bullet. While opting for a single off-the-shelf platform might seem like a quick and easy solution at first, it can lead to problems down the line. These monolithic off-the-shelf platforms often turn out to be too inflexible to support all use cases, not customizable enough, and eventually become outdated. This results in big, complicated migration projects to the next silver-bullet platform, and in organizations ending up with multiple all-in-one platforms, causing disruptions in day-to-day operations and hindering overall progress.

Flexibility is key to your data mesh platform architecture

A complete data platform must address numerous aspects: data storage, query engines, security, data access, discovery, observability, governance, developer experience, automation, a marketplace, data quality, etc. Some vendors claim their all-in-one data solution can tackle all of these. However, such a platform typically excels in certain aspects but falls short in others. For example, a platform might offer a high-end query engine, but lack depth in the features of the data marketplace included in the solution. To future-proof your platform, it must incorporate the best tools for each aspect and evolve as new technologies emerge. Today's cutting-edge solutions can be outdated tomorrow, so flexibility and evolvability are essential for your data mesh platform architecture.

Embrace composability: engineer your future

Rather than locking into one single tool, aim to build a platform with composability at its core. Picture a platform where different technologies and tools can be seamlessly integrated, replaced, or evolved, with an integrated and automated self-service experience on top. A platform that is both generic at its core and flexible enough to accommodate the ever-changing landscape of data solutions and requirements. A platform with a long-term return on investment, because it allows you to expand capabilities incrementally and avoid costly, large-scale migrations. Composability enables you to continually adapt your platform capabilities by adding new technologies under the umbrella of one stable core platform layer.

Two key ingredients of composability

- Building blocks: the individual components that make up your platform.
- Interoperability: all building blocks must work together seamlessly to create a cohesive system.

An ecosystem of building blocks

When building composable data platforms, the key lies in sourcing the right building blocks. But where do we get these? Traditional monolithic data platforms aim to solve all problems in one package, but this stifles the flexibility that composability demands. Instead, vendors should focus on decomposing these platforms into specialized, cost-effective components that excel at addressing specific challenges.
By offering targeted solutions as building blocks, they empower organizations to assemble a data platform tailored to their unique needs. In addition to vendor solutions, open-source data technologies also offer a wealth of building blocks. It should be possible to combine both vendor-specific and open-source tools into a data platform tailored to your needs. This approach enhances agility, fosters innovation, and allows for continuous evolution by integrating the latest and most relevant technologies.

Standardization as glue between building blocks

To create a truly composable ecosystem, the building blocks must be able to work together; in other words, they must be interoperable. This is where standards come into play, enabling seamless integration between data platform building blocks. Standardization ensures that different tools can operate in harmony, offering a flexible, interoperable platform.

Imagine a standard for data access management that allows seamless integration across various components. It would enable an access management building block to list data products and grant access uniformly. Simultaneously, it would allow data storage and serving building blocks to integrate their data and permission models, ensuring that any access management solution can be effortlessly composed with them. This creates a flexible ecosystem where data access is consistently managed across different systems.

The discovery of data products in a catalog or marketplace can be greatly enhanced by adopting a standard specification for data products. With this standard, each data product can be made discoverable in a generic way. When data catalogs or marketplaces adopt this standard, it gives you the flexibility to choose and integrate any catalog or marketplace building block into your platform, fostering a more adaptable and interoperable data ecosystem.

A data contract standard allows data products to specify their quality checks, SLOs, and SLAs in a generic format, enabling smooth integration of data quality tools with any data product. It enables you to combine the best solutions for ensuring data reliability across different platforms. Widely accepted standards are key to ensuring interoperability through agreed-upon APIs, SPIs, contracts, and plugin mechanisms. In essence, standards act as the glue that binds a composable data ecosystem together.

A strong belief in evolutionary architectures

At ACA Group, we firmly believe in evolutionary architectures and platform engineering, principles that extend seamlessly to data mesh platforms. It's not about locking yourself into a rigid structure, but about creating an ecosystem that can evolve and stay at the forefront of innovation. That’s where composability comes in. Do you want a data platform that not only meets your current needs but also paves the way for the challenges and opportunities of tomorrow?

Let’s engineer it together

Ready to learn more about composability in data mesh solutions?
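To make the data contract idea above more tangible, here is a hedged sketch of what such a generic contract might look like, expressed as a plain Python structure. All field names are illustrative; real standards (such as the Open Data Contract Standard) define their own schemas.

# A hypothetical data contract for a data product, as a plain Python dict.
# Field names are illustrative, not taken from any specific standard.
customer_orders_contract = {
    "data_product": "customer-orders",
    "owner": "sales-domain-team",
    "schema": {
        "order_id": "string",
        "customer_id": "string",
        "total_amount": "decimal",
    },
    # Quality checks a data quality tool could pick up generically.
    "quality_checks": [
        {"column": "order_id", "check": "not_null"},
        {"column": "total_amount", "check": "non_negative"},
    ],
    # Service level objectives and agreements for consumers.
    "slo": {"freshness_minutes": 60},
    "sla": {"availability": "99.9%"},
}

Because every building block reads the same contract format, a catalog, a quality tool, and an access management component could each consume this one definition without bespoke integrations.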

Read more

All blog posts

Let's talk!

We'd love to talk to you!

Contact us and we'll get you connected with the expert you deserve!


Reading time 12 min
10 FEB 2023

Data transformation and generating data from other data are common tasks in software development. Different programming languages have different ways to achieve this, each with their strengths and weaknesses. Depending on the problem, some may be preferable to others. In this blog, you will find simple but powerful methods for generating and transforming data in Python.

Before we discuss a more complex case, let's start with a basic example. Imagine that we own a few stores and each store has its own database with items added by employees. Some fields are optional, which means employees do not always fill out everything. As we grow, it might become difficult to get a clear view of all the items in our stores. Therefore, we develop a Python script that takes the different items from our stores' databases and collects them in a single unified database.

from typing import Generator

from stores import Item, store_1, store_2, store_3  # assuming stores also exposes the Item type

# Type hints are used throughout the code.
items_1: Generator[Item, None, None] = store_1.get_items()
items_2: Generator[Item, None, None] = store_2.get_items()
items_3: Generator[Item, None, None] = store_3.get_items()

Generators

store_1.get_items() returns a generator of items. Generators will have a key role in this blog post. With them, we can set up a complex chain of transformations over massive amounts of data without running out of memory, while keeping our code concise and clean. If you are not familiar with Python yet:

def a_generator():
    for something in some_iterable:
        # do logic
        yield something

Two things are important here. First, calling a generator function will not return any data; it will return an iterator. Second, values are produced on demand. A more in-depth explanation can be found here.

Syntax

There are two ways to create generators. The first looks like a normal Python function, but has a yield statement instead of a return statement. The other is more concise but can quickly become convoluted as the logic gets more complex. It's called the Python generator expression syntax and is mainly used for simpler generators.

# Basic generator syntax
def generate_until(n: int) -> Generator[int, None, None]:
    i = 0
    while i < n:
        yield i
        i += 1

# Generator expression syntax
gen_until_5: Generator[int, None, None] = (i for i in range(5))

Code

To keep it simple, we run the script once at the end of the day, leaving us with a complete database with all items from all stores.

from typing import Generator

from stores import Item, store_1, store_2, store_3
from database import all_items

# Type hints are used throughout the code.
items_1: Generator[Item, None, None] = store_1.get_items()
items_2: Generator[Item, None, None] = store_2.get_items()
items_3: Generator[Item, None, None] = store_3.get_items()

# Let's assume our `add_or_update()` function accepts generators.
# If an Item already exists, it updates it; otherwise it adds it to the database.
# We can just add them one by one like here.
all_items.add_or_update(items_1)
all_items.add_or_update(items_2)
all_items.add_or_update(items_3)

# The database now contains all the latest items from all the stores.

For this use case, this is perfectly fine. But when the complexity grows and more stores are added, it can quickly become cluttered. Fortunately, Python has great built-in tools to simplify our code.

Itertools

One module in Python is called itertools. According to the Python docs, “the module standardizes a core set of fast, memory-efficient tools that are useful by themselves or in combination.
Together, they form an “iterator algebra”, making it possible to construct specialized tools succinctly and efficiently in pure Python.”

A great function is itertools.chain(). It is used to ‘chain’ together multiple iterables as if they were one. We can use it to chain our generators together.

from itertools import chain
from typing import Generator

from stores import Item, store_1, store_2, store_3
from database import all_items

# Type hints are used throughout the code.
items_1: Generator[Item, None, None] = store_1.get_items()
items_2: Generator[Item, None, None] = store_2.get_items()
items_3: Generator[Item, None, None] = store_3.get_items()

# Using itertools.chain we can add the generators together into one.
# chain itself is also a generator function, so no data will be generated yet.
items: Generator[Item, None, None] = chain(items_1, items_2, items_3)

all_items.add_or_update(items)  # <- data will be generated here

# The database now contains all the latest items from all the stores.

Generator functions

Now let's assume that our item is a tuple with five fields: name, brand, supplier, cost, and the number of pieces in the store. It has the following signature: tuple[str, str, str, int, int]. If we want the total value of the items in the store, we simply need to multiply the number of pieces by the cost.

# both receives and returns a generator
def calc_total_val(items: Generator) -> Generator:
    for item in items:
        # yield the first 3 fields and the product of the last 2
        yield *item[:3], item[3] * item[4]

# we can also write it as a generator expression since it's so simple
((*item[:3], item[3] * item[4]) for item in items)

Now it looks like this: tuple[str, str, str, int]. But we want to output it as JSON. For that, we can just create a generator that returns a dictionary and call json.dumps() on it. Let's assume that we can pass an iterator of dicts to the add_or_update() function and that it automatically calls json.dumps().

# both receives and returns a generator
def as_item_dict(items: Generator) -> Generator:
    for item in items:
        yield {
            "name": item[0],
            "brand": item[1],
            "supplier": item[2],
            "total_value": item[3],
        }

Now that we have more logic, let's see how we can put it together. One great thing about generators is how clear and concise they are to use. We can create a function for each step of the process and run the data through it.

from itertools import chain

from stores import store_1, store_2, store_3
from database import all_items

def calc_total_val(items):
    for item in items:
        yield *item[:3], item[3] * item[4]

def as_item_dict(items):
    for item in items:
        yield {
            "name": item[0],
            "brand": item[1],
            "supplier": item[2],
            "total_value": item[3],
        }

items_1 = store_1.get_items()
items_2 = store_2.get_items()
items_3 = store_3.get_items()

items = chain(items_1, items_2, items_3)  # <- make one big iterable
items = calc_total_val(items)             # <- calc the total value
items = as_item_dict(items)               # <- transform it into a dict

all_items.add_or_update(items)  # <- data will be generated here

# The database now contains all the latest items from all the stores.

To show the steps that we have taken, I split everything up. There are still some things that could be improved. Take a look at the function calc_total_val(). This is a perfect example of a situation where a generator expression can be used.
from itertools import chain

from stores import store_1, store_2, store_3
from database import all_items

def as_item_dict(items):
    for item in items:
        yield {
            "name": item[0],
            "brand": item[1],
            "supplier": item[2],
            "total_value": item[3],
        }

items_1 = store_1.get_items()
items_2 = store_2.get_items()
items_3 = store_3.get_items()

items = chain(items_1, items_2, items_3)
items = ((*item[:3], item[3] * item[4]) for item in items)
items = as_item_dict(items)

all_items.add_or_update(items)

To make it even cleaner, we can put all of our functions into a separate module. That way, our main file only contains the steps the data goes through. If we use descriptive names for our generators, we can immediately see what the code will do. So now we have created a pipeline for the data. While this is only a simple example, it can also be used for more complicated workflows.

Data products

Everything we did in the example above can easily be applied to a data product. If you are not familiar with data products, here is a great text on data meshes. Imagine that we have a data product that does some data aggregation. It has multiple inputs with different kinds of data. Each of those inputs needs to be filtered, transformed and cleaned before we can aggregate them into one output. The client requires the output to be a single JSON file stored in an S3 bucket. The existing infrastructure only allows 500 MB of RAM for the containers.

Now let's load all the data, do some transformations, aggregate everything, and parse it into a JSON file.

from json import dumps
from typing import Generator

from input_ports import port_1, port_2
from output_ports import S3_port

data_port_1: Generator = port_1.get_data()
data_port_2: Generator = port_2.get_data()

output = []

for row in data_port_1:
    # do some transformation or filtering here
    output.append(row)

for row in data_port_2:
    # do some transformation or filtering here
    output.append(row)

S3_port.save(dumps(output))

While this looks like an excellent solution that does the job and is easy to understand, our container suddenly crashes with an OutOfMemory error. After some local testing on our machine, we see that it has produced an 834 MB file, which cannot work with only 500 MB of RAM for the container. The problem with the code above is that we first collect everything in a list, so all of it is kept in memory.

Solution

Let's give it another try. For S3, we can use a multipart upload. This means we do not need to keep the entire file in memory. And of course, we should replace our lists with generators.

from itertools import chain
from json import dumps
from typing import Generator

from input_ports import port_1, port_2
from output_ports import S3_port

data_port_1: Generator = port_1.get_data()
data_port_2: Generator = port_2.get_data()

def port_1_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

def port_2_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

output = chain(port_1_transformer(data_port_1), port_2_transformer(data_port_2))

for part in output:
    S3_port.save_part(dumps(part))

Since we now only have one item in memory at a time, this uses dramatically less memory than the earlier solution, with almost no extra work. However, sending a POST request to S3 for each item might be a bit much. Especially if we have 300,000 items.
But there is another issue… The part size should be between 5 MiB and 5 GiB. To fix this, we can group multiple parts before we send them. But if we group too many, we will once again reach the memory limit. The chunk size should therefore depend on how large the individual parts of your data are. To demonstrate this, let's use a size of 1,000. The larger the chunk size, the more memory is used but the fewer requests go to S3. So we prefer our chunks to be as large as possible without running out of memory.

from itertools import chain, islice
from json import dumps
from typing import Generator

from input_ports import port_1, port_2
from output_ports import S3_port

data_port_1: Generator = port_1.get_data()
data_port_2: Generator = port_2.get_data()

def makebatch(iterable, size):
    # Lazily yield chunks of `size` items from the iterable.
    for first in iterable:
        yield chain([first], islice(iterable, size - 1))

def port_1_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

def port_2_transformer(data: Generator):
    for row in data:
        # do some transformation or filtering here
        yield row

output = chain(port_1_transformer(data_port_1), port_2_transformer(data_port_2))

for chunk in makebatch(output, 1000):
    # json.dumps() cannot serialize a generator, so materialize each chunk
    # of 1,000 items first; only one chunk is in memory at a time.
    S3_port.save_part(dumps(list(chunk)))

This is all that needs to happen. It is enough to transform vast amounts of data and save it to an S3 bucket, even when resources are scarce.

Bonus

When your calculations are compute-intensive, running them in parallel is easy. With just a few extra lines, we can run our transformations on multiple cores. Note that multiprocessing.Pool.imap() applies its function to each individual item, so the per-port transformers become per-row functions here.

from multiprocessing.pool import Pool

def transform_row_1(row):
    # do some transformation or filtering here
    return row

def transform_row_2(row):
    # do some transformation or filtering here
    return row

with Pool(4) as pool:
    # imap_unordered could also be used if the order is not important
    data_1 = pool.imap(transform_row_1, data_port_1, chunksize=500)
    data_2 = pool.imap(transform_row_2, data_port_2, chunksize=500)
    output = chain(data_1, data_2)

The best part about this? We barely have to change anything else, as the result of imap can be iterated just like any other generator. Now let's throw it all together. This is all we need for compute-intensive transformations over large amounts of data, using multiple cores.

from itertools import chain, islice
from json import dumps
from multiprocessing.pool import Pool
from typing import Generator

from input_ports import port_1, port_2
from output_ports import S3_port

def makebatch(iterable, size):
    # Lazily yield chunks of `size` items from the iterable.
    for first in iterable:
        yield chain([first], islice(iterable, size - 1))

def transform_row_1(row):
    # do some transformation or filtering here
    return row

def transform_row_2(row):
    # do some transformation or filtering here
    return row

if __name__ == "__main__":
    data_port_1: Generator = port_1.get_data()
    data_port_2: Generator = port_2.get_data()

    with Pool(4) as pool:
        # imap_unordered could also be used if the order is not important
        data_1 = pool.imap(transform_row_1, data_port_1, chunksize=500)
        data_2 = pool.imap(transform_row_2, data_port_2, chunksize=500)
        output = chain(data_1, data_2)

        # Consume the results inside the pool context so the workers stay alive.
        for chunk in makebatch(output, 1000):
            S3_port.save_part(dumps(list(chunk)))

Conclusion

Generators are often misunderstood by new developers, but they can be an excellent tool. Whether for a simple transformation or something more advanced such as a data product, Python is a great choice because of its ease of use and the abundance of tools available in the standard library.
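One loose end: the examples above treat S3_port.save_part() as a given. As a rough illustration of what such a port might wrap (a sketch under assumptions; the original post does not show this code, and the bucket and key names are hypothetical), S3's multipart-upload API in boto3 looks like this:

import boto3

client = boto3.client("s3")
bucket, key = "my-data-bucket", "output/items.json"  # hypothetical names

upload = client.create_multipart_upload(Bucket=bucket, Key=key)
parts = []

def save_part(body: bytes):
    # Every part except the last must be between 5 MiB and 5 GiB.
    part_number = len(parts) + 1
    result = client.upload_part(
        Bucket=bucket,
        Key=key,
        PartNumber=part_number,
        UploadId=upload["UploadId"],
        Body=body,
    )
    parts.append({"ETag": result["ETag"], "PartNumber": part_number})

# After the last part has been uploaded:
# client.complete_multipart_upload(
#     Bucket=bucket, Key=key, UploadId=upload["UploadId"],
#     MultipartUpload={"Parts": parts},
# )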

Read more
Reading time 4 min
16 MAY 2022

Python and NodeJS are both great solutions for building modern web applications. But which of the two should you use? In this blog, we explain their most important characteristics, allowing you to choose what is best for your project and team.

In a previous blog, we already explained the difference between front-end and back-end frameworks. From a developer's and architect's perspective, the logic of a program is shared between front-end and back-end. That way, responsibility is properly divided, while development, long-term support and adding new features to the application become easier and more maintainable. It encourages good collaboration between the team members who are working on the project. If you have ever developed a large application using plain JavaScript or jQuery and AJAX, there is an inherent messiness that comes with such a scale in that context. Using front-end frameworks keeps your code structured and clean on the front-end side and speeds up your application development, given that you are familiar with the framework you are using. Since React or VueJS, or any front-end framework, requires a RESTful API to interact with in order to make a web application dynamic and personalized, creating the backend API with something like FastAPI or Flask if you're Python-versed, or NodeJS if you're a JavaScript fan, is a great way of approaching the separation of concerns we talked about in our other blog.

Python or NodeJS?

The server side or back end of an application is the very backbone of the project, and the client side depends on it. However beautiful, performant and responsive your frontend is, the application will fall flat on its face without a robust backend to hold it up. This leads to an inevitable choice for every architect between the two titans of back-end technology: NodeJS and Python. Choosing between those two can be a daunting task, especially without knowing what specific scenarios they are tailored for. Depending on the scale of the project and the team you have at hand, choosing the "ideal" back-end technology may not be essential. Just choose what is ideal for your development team, what will encourage collaboration, maintainability, code quality and speed of development. If you have a great Python team, leverage their skills by having them build the backend in combination with a front-end framework, or with a templating engine (such as Jinja2) if the frontend is simple enough. However, if you have a team of great full-stack developers, leverage their experience and knowledge of JavaScript and have a unified code stack with NodeJS. If you are still considering both options and want to choose what is best for your project, below you will find a list of things that either Python or NodeJS are great at.

The strengths of Python

- Python is a tried and tested programming language that has existed for a long time. It has been heavily optimized in the last few years and now offers very good performance.
- Inherent code clarity makes for an easily maintainable codebase and facilitates collaboration between team members.
- Debugging is simpler than in JavaScript, which further encourages productivity.
- It has tons of existing libraries, integrations, and well-written documentation to supplement back-end functionality without reinventing the wheel.
- Python is THE programming language for Big Data, Machine Learning, AI and many research-based or scientific processing tasks that would be harder to implement using JS.
- Some back-end solutions like FastAPI come with cloud-native solutions like Jina.

The strengths of NodeJS

- A unified-stack project is usually a good idea if you have experienced developers in the stack you are targeting. A Python-only web application has drawbacks and doesn't scale well without a lot of hand-crafted optimizations. So if you're going for a single stack, a JavaScript front-end framework plus NodeJS is the better choice.
- While code maintainability can be an issue for large-scale projects, smaller projects or proofs of concept can be developed faster than with Python.
- It is inherently event-driven and has a non-blocking I/O model, which makes it an ideal option for developing real-time data-streaming applications such as messaging applications or snappy e-commerce websites.
- NodeJS handles input and output streams more gracefully than Python, and is great at displaying and reacting to streams of data. When dealing with a lot of user interactions, where "snappy" or near-instantaneous responses from the UI are desirable, NodeJS is a better bet than Python.

Conclusion

Python offers an amazing solution for back-end services built on a mature programming language. Using it as a full-stack, Python-only solution with templating is nonetheless restrictive. Leveraging Python through FastAPI or Flask, for example, as a backend web server and combining it with a front-end framework such as React or Vue is a very good option for building a modern web application. It comes with all the necessary tools for building great applications that leverage machine learning, AI and data analysis, and is overall a great choice if you need to take into account complex business rules applied to back-end logic.

NodeJS is also fantastic and getting better by the day. It uses an event-driven, non-blocking I/O model, which makes it an ideal option for developing data-intensive real-time applications. In general, it offers greater performance and speed than Python-based apps, but implementing complex business logic is harder than in Python. It is an ideal solution for developing messaging or chat applications, or any sort of data-streaming application. It is also great for e-commerce apps and websites that depend on the speed of processing user requests and responding in a "snappy" manner.
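To make the FastAPI option from the conclusion concrete, here is a minimal sketch of a JSON endpoint that a React or Vue frontend could call. The route, model, and file names are illustrative, not taken from the original post.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

@app.get("/api/items", response_model=list[Item])
async def list_items():
    # In a real application this would query a database or service layer.
    return [Item(name="example", price=9.99)]

# Run with: uvicorn main:app --reload  (assuming this file is main.py)

The frontend then simply fetches /api/items and renders the JSON, keeping the separation of concerns discussed above.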

Read more
Why we chose Python (too) – Python series pt. 1
Reading time 3 min
30 JUN 2021

Why is Python one of the fastest-growing programming languages in the world? And why did we choose Python as one of our programming languages? To answer those questions, some of our experts on the matter will guide you through our story in the following blog posts in this series:

- Why we chose Python (too)
- How companies use Python in the real world
- Why Python is so popular in innovation

The right solution at the right time

ACA Group's tradition and heart rest in customer-centricity and innovation. That's why we don't limit ourselves to just one programming language or one specific technology or solution. We research the market, experiment with new technologies and always search for the best possible solution for our current and future customers. This is the exact reason why we chose Python as one of our programming languages. But before we focus on the "why Python?", let's start with the how.

How did it all start for Python?

Fastest-growing programming language in the world

Python actually started out as a hobby project of Guido van Rossum. Over a long holiday break in December 1989, Guido started developing an ABC-like language that could talk to the OS and would be suitable for quickly developing OS utilities for Amoeba. He named his nascent project Python, taking inspiration from the Monty Python's Flying Circus television program. (Source: Computer - Guido van Rossum: The Early Years of Python, https://www.computer.org/csdl/magazine/co/2015/02/mco2015020007/13rRUy3gmYB)

Flash forward to 2021: Guido's hobby project has witnessed incredible growth and has turned into a world-famous programming language. And you don't need to take my word for it; numerous studies show that Python truly is the fastest-growing programming language in the world, with more than six million developers. Just have a look at one of these popular and well-known data sources: the RedMonk rating, GitHub, Stack Overflow, the PYPL index, SlashData and the TIOBE index.

Why is Python so popular?

Here we go: the answer you have probably all been waiting for. Why is Python so popular, and why did ACA choose Python as one of its programming languages?

Simplicity & efficacy

Python was designed to be a highly readable language, and that simplicity is one of the most important reasons why it's so popular. Python is a powerful and elegant language that aims to be clear and consistent with a simple syntax. This means it's very accessible to beginners and has a relatively uncluttered visual layout. Because it's written in and can be read a lot like everyday English (without punctuation marks), Python has quickly become one of the easiest programming languages to learn. Last but not least, this simplicity and consistency also make it highly effective for programmers to use, and therefore more cost-efficient to build applications with.

Community & libraries

With the Python community by your side, you're never alone. There are a lot of big and active communities across the world that offer plenty of support. The fact that Python is so widespread across different industries, companies and people means that there's a huge number of developers working with the language. A big community like that, which keeps on growing, results in a lot of support material, reliability and trust. Developers can not only rely on the community, but on an excellent and extensive list of libraries as well. These libraries and frameworks are an incredible resource and save time in projects.
In turn, this makes both the libraries and Python even more popular.

Versatility & flexibility

One of the things developers love about this programming language is the fact that it can be used in a variety of projects and across multiple industries, including data science, machine learning, blockchain, and so much more. In other words, Python doesn't restrict you to any sort of application. Python wasn't created to answer a specific need, so it isn't driven by specific templates or APIs, which allows both freedom and suitability for rapid development.

These are the most important and well-known reasons behind Python's success. But what about the possible disadvantages, the usability and its relation to innovation?

Want to know more about Python?

If you know a little bit about Python, you know we only scratched the surface today. No worries! Our next two blog posts will be launched soon and reveal:

- what you should take into account when choosing or starting with Python
- where and how companies across the world are using Python today
- why Python is one of the most sought-after skills in data science
- why Python is so popular in innovation

Stay tuned!

Read more
Reading time 5 min
1 FEB 2021

Python owes much of its development and growth to Google, which has been an avid user of the language. They even employed the Dutchman Guido van Rossum (Python's founder) for a long time. "Python if we can, C if we must" has been a slogan at Google for many years. The reason behind it is fairly obvious: the advantages of Python are crystal clear. It's easy to learn and has a clean, short syntax. That means its code is short and readable, and quickly written. Saving time writing code is one of the biggest advantages of Python.

What is Python used for?

As we mentioned above, Python is heavily used in AI and data science. Additionally, it's also the better choice for DevOps and task automation. On the other hand, web development is a domain where it's taken Python considerable time to become an obvious option. The main reason for this is probably that there's a lot of competition in this area from languages and platforms like Java, PHP, .NET and NodeJS (a JavaScript runtime), with PHP and NodeJS having been developed specifically for the web. There's also another important reason, which we will discuss later in this article.

Python is an interpreted language

Just like JavaScript, Python is an interpreted language. That means Python isn't directly usable by a computer; it needs another program to execute the code. The advantage of this approach is that there's no need for compilation, which saves time. However, this also comes with a downside: an interpreted language is a lot slower than a compiled language. There are Python implementations like Cython and Jython, which compile your Python code to C extensions or JVM bytecode, but they are outside the scope of this article. Although Python and JavaScript are both interpreted languages, Python is (or was) in most cases the slower one. One of the reasons is that NodeJS was built for the web, has advanced multithreading abilities and is event-driven. Python, on the other hand, is as old as the web (first released in 1991) and has had to serve many different purposes. Multithreading is not Python's default mode of operation, but it is available in its toolbox.

Speeding up Python with concurrency

When developing websites or APIs in Python, there are a lot of frameworks to pick from. The most popular are Flask and Django. However, if you only need an API for your fancy React or Angular website, there's a much better option: FastAPI. Currently at version 0.70.0, FastAPI is rapidly climbing the popularity ranks. One of the reasons is that it does what it says: it's fast. Not because it's lightweight, but because it has out-of-the-box support for concurrency (asynchronous functions, which require Python 3.6+).

Concurrency is the element speeding everything up. It's all about optimizing CPU-bound or I/O-bound tasks, with the latter slowing your code down the most. For example, if your API depends on another API to collect some additional data, the HTTP request will take a considerable amount of time (in comparison with CPU tasks). If you have multiple calls, that could slow down your code dramatically. Let's look at an example. The script below downloads 50 websites without concurrency (synchronously).
import time

import requests

def download(url, session):
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")

def download_all(sites):
    with requests.Session() as session:
        for url in sites:
            download(url, session)

if __name__ == "__main__":
    sites = [
        "https://www.python.org",
        "https://www.jython.org",
        "https://pypi.org/",
        "https://fastapi.tiangolo.com/",
        "https://flask.palletsprojects.com/en/2.0.x/",
    ] * 10
    start_time = time.time()
    download_all(sites)
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")

If we execute this script, this is the result:

… output skipped …
Read 41381 from https://flask.palletsprojects.com/en/2.0.x/
Downloaded 50 sites in 5.598579168319702 seconds

5 seconds to download 50 websites is not bad, but we can improve this dramatically. To speed things up, we can choose between the two ways Python handles concurrency: threading and asyncio. Threading has long been part of Python (the higher-level ThreadPoolExecutor API was added in Python 3.2) and works very well. There are a few downsides, though. Threads can interact in ways that are subtle and hard to detect, and you'll need to experiment with the number of threads, as the ideal number varies depending on what you want to achieve. An even better way to speed things up is asyncio, introduced in Python 3.4, with the async/await syntax following in Python 3.5. Asyncio gives you more control and is easier to implement than threading. The same code rewritten to use asyncio would look like this:

import asyncio
import time

import aiohttp

async def download(session, url):
    async with session.get(url) as response:
        print("Read {0} from {1}".format(response.content_length, url))

async def download_all(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download(session, url))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == "__main__":
    sites = [
        "https://www.python.org",
        "https://www.jython.org",
        "https://pypi.org/",
        "https://fastapi.tiangolo.com/",
        "https://flask.palletsprojects.com/en/2.0.x/",
    ] * 10
    start_time = time.time()
    asyncio.get_event_loop().run_until_complete(download_all(sites))
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")

If we execute this script, this is the result:

… output skipped …
Read 41381 from https://flask.palletsprojects.com/en/2.0.x/
Downloaded 50 sites in 1.0634150505065918 seconds

The same 50 websites are downloaded in 1 second, which is 5 times faster than the synchronous code! In benchmarks against NodeJS, FastAPI is faster in almost all cases. Combine this with the development speed of Python, the tons of Python libraries available on the web, the built-in OpenAPI (Swagger) documentation and the full-featured testing system, and the choice is evident. Want to speed up your Python development? Contact us to see how we can help!

Fun fact: there's even Python on Mars

Did you know NASA used Python on the Perseverance Mars mission? The moment of landing itself was recorded by 5 cameras placed in different parts of the probe. Python scripts were used to process the images and transfer them to the flight control center. If NASA uses a programming language, it's probably trustworthy. 😉
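As a footnote to the threading option discussed above (the article only shows asyncio code): a rough ThreadPoolExecutor version of the same download script could look like this. It's a sketch following the common requests-plus-threads pattern, not code from the original post.

import concurrent.futures
import threading

import requests

thread_local = threading.local()

def get_session():
    # requests.Session is not thread-safe, so give each worker its own.
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

def download(url):
    with get_session().get(url) as response:
        print(f"Read {len(response.content)} from {url}")

def download_all(sites):
    # 5 worker threads; as noted above, the ideal number needs experimenting.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download, sites)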

Read more