We learn & share

ACA Group Blog

Read more about our thoughts, views, and opinions on various topics, important announcements, useful insights, and advice from our experts.

Featured

8 MAY 2025

Embracing Composability in Data Mesh Platforms

Reading time 5 min

Tom De Wolf

In the ever-evolving landscape of data management, investing in platforms and navigating migrations between them is a recurring theme in many data strategies. How can we ensure that these investments remain relevant and can evolve over time, avoiding endless migration projects? The answer lies in embracing ‘Composability’ - a key principle for designing robust, future-proof data (mesh) platforms. Is there a silver bullet we can buy off-the-shelf? The data-solution market is flooded with data vendor tools positioning themselves as the platform for everything, as the all-in-one silver bullet. It's important to know that there is no silver bullet. While opting for a single off-the-shelf platform might seem like a quick and easy solution at first, it can lead to problems down the line. These monolithic off-the-shelf platforms often end up inflexible to support all use cases, not customizable enough, and eventually become outdated.This results in big complicated migration projects to the next silver bullet platform, and organizations ending up with multiple all-in-one platforms, causing disruptions in day-to-day operations and hindering overall progress. Flexibility is key to your data mesh platform architecture A complete data platform must address numerous aspects: data storage, query engines, security, data access, discovery, observability, governance, developer experience, automation, a marketplace, data quality, etc. Some vendors claim their all-in-one data solution can tackle all of these. However, typically such a platform excels in certain aspects, but falls short in others. For example, a platform might offer a high-end query engine, but lack depth in features of the data marketplace included in their solution. To future-proof your platform, it must incorporate the best tools for each aspect and evolve as new technologies emerge. Today's cutting-edge solutions can be outdated tomorrow, so flexibility and evolvability are essential for your data mesh platform architecture. Embrace composability: Engineer your future Rather than locking into one single tool, aim to build a platform with composability at its core. Picture a platform where different technologies and tools can be seamlessly integrated, replaced, or evolved, with an integrated and automated self-service experience on top. A platform that is both generic at its core and flexible enough to accommodate the ever-changing landscape of data solutions and requirements. A platform with a long-term return on investment by allowing you to expand capabilities incrementally, avoiding costly, large-scale migrations. Composability enables you to continually adapt your platform capabilities by adding new technologies under the umbrella of one stable core platform layer. Two key ingredients of composability Building blocks: These are the individual components that make up your platform. Interoperability: All building blocks must work together seamlessly to create a cohesive system. An ecosystem of building blocks When building composable data platforms, the key lies in sourcing the right building blocks. But where do we get these? Traditional monolithic data platforms aim to solve all problems in one package, but this stifles the flexibility that composability demands. Instead, vendors should focus on decomposing these platforms into specialized, cost-effective components that excel at addressing specific challenges. By offering targeted solutions as building blocks, they empower organizations to assemble a data platform tailored to their unique needs. In addition to vendor solutions, open-source data technologies also offer a wealth of building blocks. It should be possible to combine both vendor-specific and open-source tools into a data platform tailored to your needs. This approach enhances agility, fosters innovation, and allows for continuous evolution by integrating the latest and most relevant technologies. Standardization as glue between building blocks To create a truly composable ecosystem, the building blocks must be able to work together, i.e. interoperability. This is where standards come into play, enabling seamless integration between data platform building blocks. Standardization ensures that different tools can operate in harmony, offering a flexible, interoperable platform. Imagine a standard for data access management that allows seamless integration across various components. It would enable an access management building block to list data products and grant access uniformly. Simultaneously, it would allow data storage and serving building blocks to integrate their data and permission models, ensuring that any access management solution can be effortlessly composed with them. This creates a flexible ecosystem where data access is consistently managed across different systems. The discovery of data products in a catalog or marketplace can be greatly enhanced by adopting a standard specification for data products. With this standard, each data product can be made discoverable in a generic way. When data catalogs or marketplaces adopt this standard, it provides the flexibility to choose and integrate any catalog or marketplace building block into your platform, fostering a more adaptable and interoperable data ecosystem. A data contract standard allows data products to specify their quality checks, SLOs, and SLAs in a generic format, enabling smooth integration of data quality tools with any data product. It enables you to combine the best solutions for ensuring data reliability across different platforms. Widely accepted standards are key to ensuring interoperability through agreed-upon APIs, SPIs, contracts, and plugin mechanisms. In essence, standards act as the glue that binds a composable data ecosystem. A strong belief in evolutionary architectures At ACA Group, we firmly believe in evolutionary architectures and platform engineering, principles that seamlessly extend to data mesh platforms. It's not about locking yourself into a rigid structure but creating an ecosystem that can evolve, staying at the forefront of innovation. That’s where composability comes in. Do you want a data platform that not only meets your current needs but also paves the way for the challenges and opportunities of tomorrow? Let’s engineer it together Ready to learn more about composability in data mesh solutions? {% module_block module "widget_f1f5c870-47cf-4a61-9810-b273e8d58226" %}{% module_attribute "buttons" is_json="true" %}{% raw %}[{"appearance":{"link_color":"light","primary_color":"primary","secondary_color":"primary","tertiary_color":"light","tertiary_icon_accent_color":"dark","tertiary_text_color":"dark","variant":"primary"},"content":{"arrow":"right","icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"tertiary_icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"text":"Contact us now!"},"target":{"link":{"no_follow":false,"open_in_new_tab":false,"rel":"","sponsored":false,"url":{"content_id":230950468795,"href":"https://25145356.hs-sites-eu1.com/en/contact","href_with_scheme":null,"type":"CONTENT"},"user_generated_content":false}},"type":"normal"}]{% endraw %}{% end_module_attribute %}{% module_attribute "child_css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "definition_id" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "field_types" is_json="true" %}{% raw %}{"buttons":"group","styles":"group"}{% endraw %}{% end_module_attribute %}{% module_attribute "isJsModule" is_json="true" %}{% raw %}true{% endraw %}{% end_module_attribute %}{% module_attribute "label" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "module_id" is_json="true" %}{% raw %}201493994716{% endraw %}{% end_module_attribute %}{% module_attribute "path" is_json="true" %}{% raw %}"@projects/aca-group-project/aca-group-app/components/modules/ButtonGroup"{% endraw %}{% end_module_attribute %}{% module_attribute "schema_version" is_json="true" %}{% raw %}2{% endraw %}{% end_module_attribute %}{% module_attribute "smart_objects" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "smart_type" is_json="true" %}{% raw %}"NOT_SMART"{% endraw %}{% end_module_attribute %}{% module_attribute "tag" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "type" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "wrap_field_tag" is_json="true" %}{% raw %}"div"{% endraw %}{% end_module_attribute %}{% end_module_block %}

We learn & share

ACA Group Blog

Read more about our thoughts, views, and opinions on various topics, important announcements, useful insights, and advice from our experts.

Featured

8 MAY 2025

Embracing Composability in Data Mesh Platforms

Reading time 5 min

Tom De Wolf

Searching for something specific?

Lets' talk!

We'd love to talk to you!

Lets' talk!

We'd love to talk to you!

Lets' talk!

We'd love to talk to you!

Lets' talk!

We'd love to talk to you!

How to build a highly available Atlassian stack on Kubernetes

Reading time 7 min

Bregt Coenen

6 MAY 2025

Within ACA, there are multiple teams working on different (or the same!) projects. Every team has their own domains of expertise, such as developing custom software, marketing and communications, mobile development and more. The teams specialized in Atlassian products and cloud expertise combined their knowledge to create a highly-available Atlassian stack on Kubernetes. Not only could we improve our internal processes this way, we could also offer this solution to our customers! In this blogpost, we’ll explain how our Atlassian and cloud teams built a highly-available Atlassian stack on top of Kubernetes. We’ll also discuss the benefits of this approach as well as the problems we’ve faced along the path. While we’re damn close, we’re not perfect after all 😉 Lastly, we’ll talk about how we monitor this setup. The setup of our Atlassian stack Our Atlassian stack consists of the following products: Amazon EKS Amazon EFS Atlassian Jira Data Center Atlassian Confluence Data Center Amazon EBS Atlassian Bitbucket Data Center Amazon RDS As you can see, we use AWS as the cloud provider for our Kubernetes setup. We create all the resources with Terraform. We’ve written a separate blog post on what our Kubernetes setup exactly looks like. You can read it here ! The image below should give you a general idea. The next diagram should give you an idea about the setup of our Atlassian Data Center. While there are a few differences between the products and setups, the core remains the same. The application is launched as one or more pods described by a StatefulSet. The pods are called node-0 and node-1 in the diagram above. The first request is sent to the load balancer and will be forwarded to either the node-0 pod or the node-1 pod. Traffic is sticky, so all subsequent traffic from that user will be sent to node-1. Both pod-0 and pod-1 require persistent storage which is used for plugin cache and indexes. A different Amazon EBS volume is mounted on each of the pods. Most of the data like your JIRA issues, Confluence spaces, … is stored in a database. The database is shared, node-0 and node-1 both connect to the same database. We usually use PostgreSQL on Amazon RDS. The node-0 and node-1 pod also need to share large files which we don’t want to store in a database, for example attachments. The same Amazon EFS volume is mounted on both pods. When changes are made, for example an attachment is uploaded to an issue, the attachment is immediately available on both pods. We use CloudFront (CDN) to cache static assets and improve the web response times. The benefits of this setup By using this setup, we can leverage the advantages of Docker and Kubernetes and the Data Center versions of the Atlassian tooling. There are a lot of benefits to this kind of setup, but we’ve listed the most important advantages below. It’s a self-healing platform : containers and worker nodes will automatically replace themselves when a failure occurs. In most cases, we don’t even have to do anything and the stack takes care of itself. Of course, it’s still important to investigate any failures so you can prevent them from occurring in the future. Exactly zero downtime deployments : when upgrading the first node within the cluster to a new version, we can still serve the old version to our customers on the second. Once the upgrade is complete, the new version is served from the first node and we can upgrade the second node. This way, the application stays available, even during upgrades. Deployments are predictable : we use the same Docker container for development, staging and production. It’s why we are confident the container will be able to start in our production environment after a successful deploy to staging. Highly available applications: when failure occurs on one of the nodes, traffic can be routed to the other node. This way you have time to investigate the issue and fix the broken node while the application stays available. It’s possible to sync data from one node to the other . For example, syncing the index from one node to the other to fix a corrupt index can be done in just a few seconds, while a full reindex can take a lot longer. You can implement a high level of security on all layers (AWS, Kubernetes, application, …) AWS CloudTrail prevents unauthorized access on AWS and sends an alert in case of anomaly. AWS Config prevents AWS security group changes. You can find out more on how to secure your cloud with AWS Config in our blog post. Terraform makes sure changes on the AWS environment are approved by the team before rollout. Since upgrading Kubernetes master and worker nodes has little to no impact, the stack is always running a recent version with the latest security patches. We use a combination of namespacing and RBAC to make sure applications and deployments can only access resources within their namespace with least privilege . NetworkPolicies are rolled out using Calico. We deny all traffic between containers by default and only allow specific traffic. We use recent versions of the Atlassian applications and implement Security Advisories whenever they are published by Atlassian. Interested in leveraging the power of Kubernetes yourself? You can find more information about how we can help you on our website! {% module_block module "widget_3d4315dc-144d-44ec-b069-8558f77285de" %}{% module_attribute "buttons" is_json="true" %}{% raw %}[{"appearance":{"link_color":"light","primary_color":"primary","secondary_color":"primary","tertiary_color":"light","tertiary_icon_accent_color":"dark","tertiary_text_color":"dark","variant":"primary"},"content":{"arrow":"right","icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"tertiary_icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"text":"Apply the power of Kubernetes"},"target":{"link":{"no_follow":false,"open_in_new_tab":false,"rel":"","sponsored":false,"url":null,"user_generated_content":false}},"type":"normal"}]{% endraw %}{% end_module_attribute %}{% module_attribute "child_css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "definition_id" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "field_types" is_json="true" %}{% raw %}{"buttons":"group","styles":"group"}{% endraw %}{% end_module_attribute %}{% module_attribute "isJsModule" is_json="true" %}{% raw %}true{% endraw %}{% end_module_attribute %}{% module_attribute "label" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "module_id" is_json="true" %}{% raw %}201493994716{% endraw %}{% end_module_attribute %}{% module_attribute "path" is_json="true" %}{% raw %}"@projects/aca-group-project/aca-group-app/components/modules/ButtonGroup"{% endraw %}{% end_module_attribute %}{% module_attribute "schema_version" is_json="true" %}{% raw %}2{% endraw %}{% end_module_attribute %}{% module_attribute "smart_objects" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "smart_type" is_json="true" %}{% raw %}"NOT_SMART"{% endraw %}{% end_module_attribute %}{% module_attribute "tag" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "type" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "wrap_field_tag" is_json="true" %}{% raw %}"div"{% endraw %}{% end_module_attribute %}{% end_module_block %} Apply the power of Kubernetes Problems we faced during the setup Migrating to this stack wasn’t all fun and games. We’ve definitely faced some difficulties and challenges along the way. By discussing them here, we hope we can facilitate your migration to a similar setup! Some plugins (usually older plugins) were only working on the standalone version of the Atlassian application. We needed to find an alternative plugin or use vendor support to have the same functionality on Atlassian Data Center. We had to make some changes to our Docker containers and network policies (i.e. firewall rules) to make sure both nodes of an application could communicate with each other. Most of the applications have some extra tools within the container. For example, Synchrony for Confluence, ElasticSearch for BitBucket, EazyBI for Jira, and so on. These extra tools all needed to be refactored for a multi-node setup with shared data. In our previous setup, each application was running on its own virtual machine. In a Kubernetes context, the applications are spread over a number of worker nodes. Therefore, one worker node might run multiple applications. Each node of each application will be scheduled on a worker node that has sufficient resources available. We needed to implement good placement policies so each node of each application has sufficient memory available. We also needed to make sure one application could not affect another application when it asks for more resources. There were also some challenges regarding load balancing. We needed to create a custom template for nginx ingress-controller to make sure websockets are working correctly and all health checks within the application are reporting a healthy status. Additionally, we needed a different load balancer and URL for our BitBucket SSH traffic compared to our web traffic to the BitBucket UI. Our previous setup contained a lot of data, both on filesystem and in the database. We needed to migrate all the data to an Amazon EFS volume and a new database in a new AWS account. It was challenging to find a way to have a consistent sync process that also didn’t take too long because during migration, all applications were down to prevent data loss. In the end, we were able to meet these criteria and were able to migrate successfully. Monitoring our Atlassian stack We use the following tools to monitor all resources within our setup Datadog to monitor all components created within our stack and to centralize logging of all components. You can read more about monitoring your stack with Datadog in our blog post here . NewRelic for APM monitoring of the Java process (Jira, Confluence, Bitbucket) within the container. If our monitoring detects an anomaly, it creates an alert within OpsGenie . OpsGenie will make sure that this alert is sent to the team or the on-call person that is responsible to fix the problem. If the on-call person does not acknowledge the alert in time, the alert will be escalated to the team that’s responsible for that specific alert. Conclusion In short, we are very happy we migrated to this new stack. Combining the benefits of Kubernetes and the Atlassian Data Center versions of Jira, Confluence and BitBucket feels like a big step in the right direction. The improvements in self-healing, deploying and monitoring benefits us every day and maintenance has become a lot easier. Interested in your own Atlassian Stack? Do you also want to leverage the power of Kubernetes? You can find more information about how we can help you on our website! Our Atlassian hosting offering

What does our Kubernetes setup at ACA look like?

Reading time 6 min

Bregt Coenen

6 MAY 2025

At ACA, we live and breathe Kubernetes. We set up new projects with this popular container orchestration system by default, and we’re also migrating existing customers to Kubernetes. As a result, the amount of Kubernetes clusters the ACA team manages, is growing rapidly! We’ve had to change our setup multiple times to accommodate for more customers, more clusters, more load, less maintenance and so on. From an Amazon ECS to a Kubernetes setup In 2016, we had a lot of projects that were running in Docker containers. At that point in time, our Docker containers were either running in Amazon ECS or on Amazon EC2 Virtual Machines running the Docker daemon. Unfortunately, this setup required a lot of maintenance. We needed a tool that would give us a reliable way to run these containers in production. We longed for an orchestrator that would provide us high availability, automatic cleanup of old resources, automatic container scheduling and so much more. → Enter Kubernetes ! Kubernetes proved to be the perfect candidate for a container orchestration tool. It could reliably run containers in production and reduce the amount of maintenance required for our setup. Creating a Kubernetes-minded approach Agile as we are, we proposed the idea for a Kubernetes setup for one of our next projects. The customer saw the potential of our new approach and agreed to be part of the revolution. At the beginning of 2017, we created our first very own Kubernetes cluster. At this stage, there were only two certainties: we wanted to run Kubernetes and it would run on AWS . Apart from that, there were still a lot of questions and challenges. How would we set up and manage our cluster? Can we run our existing docker containers within the cluster? What type of access and information can we provide the development teams? We’ve learned that in the end, the hardest task was not the cluster setup. Instead, creating a new mindset within ACA Group to accept this new approach, and involving the development teams in our next-gen Kubernetes setup proved to be the harder task at hand. Apart from getting to know the product ourselves and getting other teams involved as well, we also had some other tasks that required our attention: we needed to dockerize every application, we needed to be able to setup applications in the Kubernetes cluster that were high available and if possible also self-healing, and clustered applications needed to be able to share their state using the available methods within the selected container network interface. Getting used to this new way of doing things in combination with other tasks, like setting up good monitoring, having a centralized logging setup and deploying our applications in a consistent and maintainable way, proved to be quite challenging. Luckily, we were able to conquer these challenges and about half a year after we’d created our first Kubernetes cluster, our first production cluster went live (August 2017). These were the core components of our toolset anno 2017: Terraform would deploy the AWS VPC, networking components and other dependencies for the Kubernetes cluster Kops for cluster creation and management An EFK stack for logging was deployed within the Kubernetes cluster Heapster, influxdb and grafana in combination with Librato for monitoring within the cluster Opsgenie for alerting Nice! … but we can do better: reducing costs, components and downtime Once we had completed our first setup, it became easier to use the same topology and we continued implementing this setup for other customers. Through our infrastructure-as-code approach (Terraform) in combination with a Kubernetes cluster management tool (Kops), the effort to create new clusters was relatively low. However, after a while, we started to notice some possible risks related to this setup. The amount of work required for the setup and the impact of updates or upgrades on our Kubernetes stack was too large. At the same time, the number of customers that wanted their very own Kubernetes cluster was growing. So, we needed to make some changes to reduce maintenance effort on the Kubernetes part of this setup to keep things manageable for ourselves. Migration to Amazon EKS and Datadog At this point the Kubernetes service from AWS (Amazon EKS) became generally available. We were able to move all things that are managed by Kops to our Terraform code, making things a lot less complex. As an extra benefit, the Kubernetes master nodes are now managed by EKS. This means we now have less nodes to manage and EKS also provides us cluster upgrades with a touch of the button. Apart from reducing the workloads on our Kubernetes management plane, we’ve also reduced the number of components within our cluster. In the previous setup we were using an EFK (ElasticSearch, Fluentd and Kibana) stack for our logging infrastructure. For our monitoring, we were using a combination of InfluxDB, Grafana, Heapster and Librato. These tools gave us a lot of flexibility but required a lot of maintenance effort, since they all ran within the cluster. We’ve replaced them all with Datadog agent, reducing our maintenance workloads drastically. Upgrades in 60 minutes Furthermore, because of the migration to Amazon EKS and the reduction in the number of components running within the Kubernetes cluster, we were able to reduce the cost and availability impact of our cluster upgrades. With the current stack, using Datadog and Amazon EKS, we can upgrade a Kubernetes cluster within an hour. If we were to use the previous stack, it would take us about 10 hours on average. So where are we now? We currently have 16 Kubernetes clusters up and running , all running the latest available EKS version. Right now, we want to spread our love for Kubernetes wherever we can. Multiple project teams within ACA Group are now using Kubernetes, so we are organizing workshops to help them get up to speed with the technology quickly. At the same time, we also try to catch up with the latest additions to this rapidly changing platform. That’s why we’ve attended the Kubecon conference in Barcelona and shared our opinions in our Kubecon Afterglow event. What’s next? Even though we are very happy with our current Kubernetes setup, we believe there’s always room for improvement . During our Kubecon Afterglow event, we’ve had some interesting discussions with other Kubernetes enthusiasts. These discussions helped us defining our next steps, bringing our Kubernetes setup to an even higher level. Some things we’d like to improve in the near future: add service mesh to our Kubernetes stack, 100% automatic worker node upgrades without application downtime. Of course, these are just a few focus points. We’ll implement many new features and improvements whenever they are released! What about you? Are you interested in your very own Kubernetes cluster? Which improvements do you plan on making to your stack or Kubernetes setup? Or do you have an unanswered Kubernetes question we might be able to help you with? Contact us at cloud@aca-it.be and we will help you out! {% module_block module "widget_7e6bdbd6-406c-4a0a-8393-27a28f436c6d" %}{% module_attribute "buttons" is_json="true" %}{% raw %}[{"appearance":{"link_color":"light","primary_color":"primary","secondary_color":"primary","tertiary_color":"light","tertiary_icon_accent_color":"dark","tertiary_text_color":"dark","variant":"primary"},"content":{"arrow":"right","icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"tertiary_icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"text":"Our Kubernetes services"},"target":{"link":{"no_follow":false,"open_in_new_tab":false,"rel":"","sponsored":false,"url":{"content_id":null,"href":"https://www.acagroup/be/en/services/kubernetes","href_with_scheme":"https://www.acagroup/be/en/services/kubernetes","type":"EXTERNAL"},"user_generated_content":false}},"type":"normal"}]{% endraw %}{% end_module_attribute %}{% module_attribute "child_css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "definition_id" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "field_types" is_json="true" %}{% raw %}{"buttons":"group","styles":"group"}{% endraw %}{% end_module_attribute %}{% module_attribute "isJsModule" is_json="true" %}{% raw %}true{% endraw %}{% end_module_attribute %}{% module_attribute "label" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "module_id" is_json="true" %}{% raw %}201493994716{% endraw %}{% end_module_attribute %}{% module_attribute "path" is_json="true" %}{% raw %}"@projects/aca-group-project/aca-group-app/components/modules/ButtonGroup"{% endraw %}{% end_module_attribute %}{% module_attribute "schema_version" is_json="true" %}{% raw %}2{% endraw %}{% end_module_attribute %}{% module_attribute "smart_objects" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "smart_type" is_json="true" %}{% raw %}"NOT_SMART"{% endraw %}{% end_module_attribute %}{% module_attribute "tag" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "type" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "wrap_field_tag" is_json="true" %}{% raw %}"div"{% endraw %}{% end_module_attribute %}{% end_module_block %}

KubeCon / CloudNativeCon 2022 highlights!

Reading time 7 min

ACA Group Team

5 MAY 2025

Didn’t make it to KubeCon this year? Read along to find out our highlights of the KubeCon / CloudNativeCon conference this year by ACA Group’s Cloud Native team! What is KubeCon / CloudNativeCon? KubeCon (Kubernetes Conference) / CloudNativeCon , organized yearly at EMAE by the Cloud Native Computing Foundation (CNCF), is a flagship conference that gathers adopters and technologists from leading open source and cloud native communities in a location. This year, approximately 5,000 physical and 10,000 virtual attendees showed up for the conference. CNCF is the open source, vendor-neutral hub of cloud native computing, hosting projects like Kubernetes and Prometheus to make cloud native universal and sustainable. Bringing 300+ sessions from partners, industry leaders, users and vendors on topics covering CI/CD, GitOps, Kubernetes, machine learning, observability, networking, performance, service mesh and security. It's clear there's always something interesting to hear about at KubeCon, no matter your area of interest or level of expertise! It's clear that the Cloud Native ecosystem has grown to a mature, trend-setting and revolutionizing game-changer in the industry. All is initiated on the Kubernetes trend and a massive amount of organizations that support, use and have grown their business by building cloud native products or using them in mission-critical solutions. 2022's major themes What struck us during this year’s KubeCon were the following major themes: The first was increasing maturity and stabilization of Kubernetes and associated products for monitoring, CI/CD, GitOps, operators, costing and service meshes, plus bug fixing and small improvements. The second is a more elaborate focus on security . Making pods more secure, preventing pod trampoline breakouts, end-to-end encryption and making full analysis of threats for a complete k8s company infrastructure. The third is sustainability and a growing conscience that systems running k8s and the apps on it consume a lot of energy while 60 to 80% of CPU remains unused. Even languages can be energy (in)efficient. Java is among the most power efficient, while Python apparently is far less due to the nature of the interpreter / compiler. Companies all need to plan and work on decreasing energy footprint in both applications and infrastructure. Autoscaling will play an important role in achieving this. Sessions highlights Sustainability Data centers worldwide consume 8% of all generated electricity worldwide. So we'll need to reflect on the effective usage of our infrastructure and avoid idle time (on average CPU utilization is only between 20 and 40%) when servers are running, make them work with running as many workloads as possible shut down resources when they are not needed by applying autoscaling approaches the coding technology used in your software, some programming languages use less CPU. CICD / GitOps GitOps automates infrastructure updates using a Git workflow with continuous integration (CI) and continuous delivery (CI/CD). When new code is merged, the CI/CD pipeline enacts the change in the environment. Flux is a great example of this. Flux provides GitOps for both apps and infrastructure. It supports GitRepository, HelmRepository, HelmRepository and Bucket CRD as the single source of truth. With A/B or Canary deployments, it makes it easy to deploy new features without impacting all the users. When the deployment fails, it can easily roll back. Checkout the KubeCon schedule page for more information! Kubernetes Even though Kubernetes 1.24 was released a few weeks before the start of the event, not many talks were focused on the Kubernetes core. Most talks were focused on extending Kubernetes (using APIs, controllers, operators, …) or best practices around security, CI/CD, monitoring … for whatever will run within the Kubernetes cluster. If you're interested in the new features that Kubernetes 1.24 has to offer, you can check the official website . Observability Getting insights on how your application is running in your cluster is crucial, but not always practical. This is where eBPF comes into play, which is used by tools such as Pixie to collect data without any code changes. Check out the KubeCon schedule page for more information! FinOps Now that more and more people are using Kubernetes, a lot of workloads have been migrated. All these containers have a footprint. Memory, CPU, storage, … needs to be allocated, and they all have a cost. Cost management was a recurring topic during the talks. Using autoscaling (adding but also removing capacity) to match the required resources and identifying unused resources are part of this new movement. New services like 'kubecost' are becoming increasingly popular. Performance One of the most common problems in a cluster is not having enough space or resources. With the help of a Vertical Pod Autoscaler (VPA) this can be a thing of the past. A VPA will analyze and store Memory and CPU metrics/data to automatically adjust to the right CPU and memory request limits. The benefits of this approach will let you save money, avoid waste, size optimally the underlying hardware, tune resources on worker nodes and optimize placements of pods in a Kubernetes cluster. Check out the KubeCon schedule page for more information! Service mesh We all know it's extremely important to know which application is sharing data with other applications in your cluster. Service mesh provides traffic control inside your cluster(s). You can block or permit any request that is sent or received from any application to other applications. It also provides Metrics, Specs, Split, ... information to understand the data flow. In the talk, Service Mesh at Scale: How Xbox Cloud Gaming Secures 22k Pods with Linkerd , Chris explains why they choose Linkerd and what the benefits are of a service mesh. Check out the KubeCon schedule page for more information! Security Trampoline pods, sounds fun, right? During a talk by two security researchers from Palo Alto Networks, we learned that they aren’t all that fun. In short, these are pods that can be used to gain cluster admin privileges. To learn more about the concept and how to deal with them, we strongly recommend taking a look at the slides on the KubeCon schedule page ! Lachlan Evenson from Microsoft gave a clear explanation of Pod Security in his The Hitchhiker's Guide to Pod Security talk. Pod Security is a built-in admission controller that evaluates Pod specifications against a predefined set of Pod Security Standards and determines whether to admit or deny the pod from running. — Lachlan Evenson , Principal Program Manager at Microsoft P o d Security is replacing PodSecurityPolicy starting fro m Kubernetes 1.23. So if you are using PodSecurityPolicy, now might be a good time to further research Pod Security and the migration path. In version 1.25, support for PodSecurityPolicy will be removed. If you aren’t using PodSecurityPolicy or Pod Security, it is definitely time to further investigate it! Another one of the recurring themes of this KubeCon 2022 were operators. Operators enable the extension of the Kubernetes API with operational knowledge. This is achieved by combining Kubernetes controllers and watched objects that describe the desired state. They introduce Custom Resource Definitions, custom controllers, Kubernetes or cloud resources and logging and metrics, making life easier for Dev as well as Ops. H owever, during a talk by Kevin Ward from ControlPlane, we learned that there are some risks. Additionally, and more importantly, he also talked about how we can identify those risks with tools such as BadRobot and an operator thread matrix . Checkout the KubeCon schedule page for more information! Scheduling Telemetry Aware Scheduling helps you schedule your workloads based on metrics from your worker nodes. You can for example set a rule to not schedule new workloads on worker nodes with more than 90% used memory. The cluster will take this into account when scheduling a pod. Another nice feature of this tool is that it can also reschedule pods to make sure your rules are kept in line. Checkout the KubeCon schedule page for more information! Cluster autoscaling A great way for stateless workloads to scale cost effectively is to use AWS EC2 Spot, which is spare VM capacity available at a discount. To use Spot instances effectively in a K8S cluster, you should use aws-node-termination-handler . This way, you can move your workloads off of a worker node when Spot decides to reclaim it. Another good tool is Karpenter , a tool to provision Spot instances just in time for your cluster. With these two tools, you can cost effectively host your stateless workloads! Check out the KubeCon schedule page for more information! Event-driven autoscaling Using the Horizontal Pod Autoscaler (HPA) is a great way to scale pods based on metrics such as CPU utilization, memory usage, and more. Instead of scaling based on metrics, Kubernetes Event Driven Autoscaling (KEDA) can scale based on events (Apache Kafka, RabbitMQ, AWS SQS, …) and it can even scale to 0 unlike HPA. Check out the KubeCon schedule page for more information! Wrap-up We had a blast this year at the conference. We left with an inspired feeling that we'll no doubt translate into internal projects, apply for new customer projects and discuss with existing customers where applicable. Not only that, but we'll brief our colleagues and organize an afterglow session for those interested back home in Belgium. If you appreciated our blog article, feel free to drop us a small message. We are always happy when the content that we publish is also of any value or interest to you. If you think we can help you or your company in adopting Cloud Native, drop me a note at peter.jans@aca-it.be . As a final note we'd like to thank Mona for the logistics, Stijn and Ronny for this opportunity and the rest of the team who stayed behind to keep an eye on the systems of our valued customers.

How to add functionality to Kubernetes with Operators

Reading time 6 min

Bregt Coenen

16 JUN 2022

I started writing this blog post the day after I came home from KubeCon and CloudNativeCon 2022. The main thing I noticed was that the content of the talks has changed over the last few years. Kubernetes’ new challenges When looking at the topics of this year’s KubeCon / CloudNativeCon, it feels like a lot of questions about Kubernetes, types of cloud, logging tools and more are answered for most companies. This makes sense, because more and more organizations have already successfully adopted Kubernetes. Kubernetes is no longer considered the next big thing, but rather the logical choice. However, we’ve noticed (during the talks, but also in our own journey) that new problems and challenges have arisen, leading to other questions: How can I implement more automation? How can I control/lower the costs for these setups? Is there a way to expand on whatever exists and add my own functionalities to Kubernetes? One of the possible ways to add functionalities to Kubernetes is using Operators. In this blog post, I will briefly explain how Operators work. How Operators work The concept of an operator is quite simple. I believe the easiest way to explain it is by actually installing an operator. Within ACA, we use the istio operator. The exact steps of installing depends on the operator you are installing, but usually they’re quite similar. First, install the istioctl binary on the machine that has access to the Kubernetes api. The next step is to run the command to install the operator. curl -sL https://istio.io/downloadIstioctl | sh - export PATH=$PATH:$HOME/.istioctl/bin istioctl operator init Default This will create the operator resource(s) in the istio-system namespace. You should see a pod running. kubectl get pods -n istio-operator NAMESPACE NAME READY STATUS RESTARTS AGE istio-operator istio-operator-564d46ffb7-nrw2t 1/1 Running 0 20s kubectl get crd NAME CREATED AT istiooperators.install.istio.io 2022-05-21T19:19:43Z Default As you can see, a new CustomResourceDefinition called istiooperators.install.istio.io is created. This is a blueprint that specifies how resource definitions should be added to the cluster. To create config, we need to know what ‘kind’ of config the CRD expects to be created. kubectl get crd istiooperators.install.istio.io -oyaml … status: acceptedNames: kind: IstioOperator … Default Let’s create a simple config file. kubectl apply -f - EOF apiVersion: install.istio.io/v1alpha1 kind: IstioOperator metadata: namespace: istio-system name: istio-controlplane spec: profile: minimal EOF Default Once the ResourceDefinition that contains the configuration is added to the cluster, the operator will make sure the resources in the cluster match whatever is defined in the configuration. You’ll see that new resources are created. kubectl get pods -A istio-system istiod-7dc88f87f4-rsc42 0/1 Pending 0 2m27s Default Since I run a small kind cluster, the istiod pod can’t be scheduled and is stuck in a Pending state. Let me explain the process first before changing this. The istio-operator will keep watching the IstioOperator configuration file for changes. If changes are made to the file, it will only make the changes that are required to update the resources in the cluster to match the state specified in the configuration file. This behavior is called reconciliation . Let’s watch the IstioOperator configuration file status. Note that it’s created in the istio-system namespace. kubectl get istiooperator -n istio-system NAME REVISION STATUS AGE istio-controlplane RECONCILING 3m Default As you can see, this is still reconciling, because the pod can’t start. After some time, it’ll go in an ERROR state. kubectl get istiooperator -n istio-system NAME REVISION STATUS AGE istio-controlplane ERROR 6m58s Default You can also check the istio-operator log for useful information. kubectl -n istio-operator logs istio-operator-564d46ffb7-nrw2t --tail 20 - Processing resources for Istiod. - Processing resources for Istiod. Waiting for Deployment/istio-system/istiod ✘ Istiod encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition. Since I’m running a small demo cluster, I’ll update the memory limit so the POD can be scheduled. This is done within the spec: part of the IstioOperator definition. kubectl -n istio-system edit istiooperator istio-controlplane spec: profile: minimal components: pilot: k8s: resources: requests: memory: 128Mi The istiooperator will go back to a RECONCILING state. kubectl get istiooperator -n istio-system NAME REVISION STATUS AGE istio-controlplane RECONCILING 11m Default And after some time, it becomes HEALTHY . kubectl get istiooperator -n istio-system NAME REVISION STATUS AGE istio-controlplane HEALTHY 12m Default You can see the istiod pod is running. NAMESPACE NAME READY STATUS istio-system istiod-7dc88f87f4-n86z9 1/1 Running Default Apart from the istiod deployment, a lot of new CRDs are added as well. authorizationpolicies.security.istio.io 2022-05-21T20:08:05Z destinationrules.networking.istio.io 2022-05-21T20:08:05Z envoyfilters.networking.istio.io 2022-05-21T20:08:05Z gateways.networking.istio.io 2022-05-21T20:08:05Z istiooperators.install.istio.io 2022-05-21T20:07:01Z peerauthentications.security.istio.io 2022-05-21T20:08:05Z proxyconfigs.networking.istio.io 2022-05-21T20:08:05Z requestauthentications.security.istio.io 2022-05-21T20:08:05Z serviceentries.networking.istio.io 2022-05-21T20:08:05Z sidecars.networking.istio.io 2022-05-21T20:08:05Z telemetries.telemetry.istio.io 2022-05-21T20:08:05Z virtualservices.networking.istio.io 2022-05-21T20:08:05Z wasmplugins.extensions.istio.io 2022-05-21T20:08:06Z workloadentries.networking.istio.io 2022-05-21T20:08:06Z workloadgroups.networking.istio.io 2022-05-21T20:08:06Z Default How the operator works - summary As you can see, this is a very easy way to quickly set up istio within our cluster. In short, these are the steps: Install the operator One (or more) CustomResourceDefinitions is added that provides a blueprint for the objects that can be created/managed. A deployment is created, which in turn creates a Pod that monitors the Configurations of the kinds that are specified by the CRD. The user adds configuration to the cluster, with its type specified by the CRD. The operator POD notices the new configuration and takes all steps that are required to make sure the cluster is in the desired state specified by the configuration. Benefits of the operator approach The operator approach makes it easy to package a set of resources like Deployments, Jobs, CustomResourceDefinitions. This way, it’s easy to add additional behavior and capabilities to Kubernetes. There’s a library which lists the available operators which can be found at https://operatorhub.io/ , counting 255 operators at the moment of writing. The operators are usually installed with just a few commands or lines of code. It’s also possible to create your own operators. It might make sense to package a set of deployments, jobs, CRDs, … that provide a specific functionality as an operator. The operator can be handled as operators and use pipelines for CVE validations, E2E tests, rollout to test environments, and more before a new version is promoted to production. Pitfalls We have been using Kubernetes for a long time within the ACA Group and have collected some security best-practices during this period. We’ve noticed that one-file-deployments and helm charts from the internet are usually not as well configured as we want them to be. Think about RBAC rules that give too many permissions, resources not currently namespaced or containers running as root. When using operators from operatorhub.io, you basically trust the vendor or provider to follow security best-practices. However … one of the talks at KubeCon 2022 that made the biggest impression on me, stated that a lot of the operators have issues regarding security. I would suggest you to watch Tweezering Kubernetes Resources: Operating on Operators - Kevin Ward, ControlPlane before installing. Another thing we’ve noticed is that using operators can speed up the process to implement new tools and features. Be sure to read the documentation that was provided by the creator of an operator before you dive into advanced configuration. It might be possible that not all features are actually implemented on the CRD that is created by the operator. However, it is bad practice to directly manipulate the resources that were created by the operator. The operator is not tested against your manual changes and this might cause inconsistencies. Additionally, new operator versions might (partly) undo your changes, which also might cause problems. At that point, you’re basically stuck, unless you create your own operator that provides additional features. We’ve also noticed that there is no real ‘rule book’ on how to provide CRDs and documentation is not always easy to find or understand. Conclusion Operators are currently a hot topic within the Kubernetes community. The number of available operators is growing fast, making it easy to add functionality to your cluster. However, there is no rule book or a minimal baseline of quality. When installing operators from the operatorhub, be sure to check the contents or validate the created resources on a local setup. We expect to see some changes and improvements in the near future, but at this point they can be very useful already. AUTHOR Bregt Coenen

ACA Group Blog

Featured

ACA Group Blog

Featured

All blog posts

We'd love to talk to you!

We'd love to talk to you!

We'd love to talk to you!

We'd love to talk to you!