Written by Bregt Coenen
Reading time 3 min
8 MAY 2025

In the complex world of modern software development, companies face the challenge of seamlessly integrating diverse applications developed and managed by different teams. An invaluable asset in overcoming this challenge is the service mesh. In this blog article, we delve into Istio Service Mesh and explore why investing in a service mesh like Istio is a smart move.

What is a Service Mesh?

A service mesh is a software layer responsible for all communication between applications, referred to as services in this context. It introduces new functionality to manage the interaction between services, such as monitoring, logging, tracing, and traffic control. A service mesh operates independently of the code of each individual service, enabling it to work across network boundaries and collaborate with various management systems. Thanks to a service mesh, developers can focus on building application features without worrying about the complexity of the underlying communication infrastructure.

Istio Service Mesh in Practice

Consider managing a large cluster that runs multiple applications developed and maintained by different teams, each with diverse dependencies like ElasticSearch or Kafka. Over time, this results in a complex ecosystem of applications and containers, overseen by various teams. The environment becomes so intricate that administrators find it increasingly difficult to maintain a clear overview. This leads to a series of pertinent questions:

- What is the architecture like?
- Which applications interact with each other?
- How is the traffic managed?

Moreover, there are specific challenges that must be addressed for each individual application:

- Handling login processes
- Implementing robust security measures
- Managing network traffic directed towards the application
- ...

A service mesh such as Istio offers a solution to these challenges. Istio acts as a proxy between the various applications (services) in the cluster, with each request passing through a component of Istio.

How Does Istio Service Mesh Work?

Istio introduces a sidecar proxy for each service in the microservices ecosystem. This sidecar proxy manages all incoming and outgoing traffic for the service. Additionally, Istio adds components that handle the traffic entering and leaving the cluster. Istio's control plane lets you define policies for traffic management, security, and monitoring, which are then applied to these components.

For a deeper understanding of Istio Service Mesh functionality, our blog article "Installing Istio Service Mesh: A Comprehensive Step-by-Step Guide" provides a detailed, step-by-step explanation of the installation and use of Istio.

Why Istio Service Mesh?

- Traffic management: Istio enables fine-grained traffic management, allowing developers to easily route, distribute, and control traffic between different versions of their services (see the sketch after this list).
- Security: Istio provides a robust security layer with features such as traffic encryption using its own certificates, role-based access control (RBAC), and capabilities for implementing authentication and authorization policies.
- Observability: through built-in instrumentation, Istio offers deep observability with tools for monitoring, logging, and distributed tracing. This allows IT teams to analyze the performance of services and quickly detect issues.
- Simplified communication: Istio removes the complexity of service communication from application developers, allowing them to focus on building application features.
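As a hedged illustration of that traffic-management capability, here is a minimal sketch of an Istio VirtualService that splits traffic between two versions of a service. The service name, namespace, and subset labels are hypothetical placeholders, not taken from a real setup.

```yaml
# Minimal sketch of weighted routing between two versions of a service.
# All names (reviews, demo, v1/v2) are hypothetical placeholders.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: demo
spec:
  hosts:
    - reviews            # the Kubernetes service receiving traffic
  http:
    - route:
        - destination:
            host: reviews
            subset: v1   # subsets are defined in the DestinationRule below
          weight: 90     # 90% of requests go to v1
        - destination:
            host: reviews
            subset: v2
          weight: 10     # 10% canary traffic to v2
---
# The DestinationRule that defines the subsets referenced above.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: demo
spec:
  host: reviews
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```

Shifting the weights in small steps is a common way to roll out a new version gradually while watching error rates and latency.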
Is Istio Suitable for Your Setup?

While the benefits are clear, it is essential to consider whether the additional complexity of Istio fits your specific setup. First, a sidecar container is required for each deployed service, potentially leading to unwanted memory and CPU overhead. Additionally, your team may lack the specialized knowledge that Istio requires. If you are considering adopting Istio Service Mesh, seek guidance from specialists with expertise. Feel free to ask our experts for assistance.

More Information about Istio

Istio Service Mesh is a technological game-changer for IT professionals aiming for advanced control, security, and observability in their microservices architecture. Istio simplifies and secures communication between services, allowing IT teams to focus on building reliable and scalable applications. Need quick answers to all your questions about Istio Service Mesh? Contact our experts.

Reading time 7 min
6 MAY 2025

Within ACA, there are multiple teams working on different (or the same!) projects. Every team has its own domains of expertise, such as developing custom software, marketing and communications, mobile development and more. The teams specialized in Atlassian products and cloud expertise combined their knowledge to create a highly available Atlassian stack on Kubernetes. Not only could we improve our internal processes this way, we could also offer this solution to our customers! In this blog post, we'll explain how our Atlassian and cloud teams built a highly available Atlassian stack on top of Kubernetes. We'll also discuss the benefits of this approach as well as the problems we've faced along the path. While we're damn close, we're not perfect after all 😉 Lastly, we'll talk about how we monitor this setup.

The setup of our Atlassian stack

Our Atlassian stack consists of the following products:

- Amazon EKS
- Amazon EFS
- Amazon EBS
- Amazon RDS
- Atlassian Jira Data Center
- Atlassian Confluence Data Center
- Atlassian Bitbucket Data Center

As you can see, we use AWS as the cloud provider for our Kubernetes setup. We create all the resources with Terraform. We've written a separate blog post on what our Kubernetes setup looks like exactly. You can read it here! The image below should give you a general idea.

The next diagram should give you an idea about the setup of our Atlassian Data Center. While there are a few differences between the products and setups, the core remains the same.

The application is launched as one or more pods described by a StatefulSet. The pods are called node-0 and node-1 in the diagram above. The first request is sent to the load balancer and is forwarded to either the node-0 pod or the node-1 pod. Traffic is sticky, so all subsequent traffic from that user is sent to the same node.

Both the node-0 and node-1 pods require persistent storage, which is used for plugin cache and indexes. A different Amazon EBS volume is mounted on each of the pods. Most of the data, like your Jira issues, Confluence spaces and so on, is stored in a database. The database is shared: node-0 and node-1 both connect to the same database. We usually use PostgreSQL on Amazon RDS. The node-0 and node-1 pods also need to share large files which we don't want to store in a database, for example attachments. The same Amazon EFS volume is mounted on both pods. When a change is made, for example an attachment is uploaded to an issue, the attachment is immediately available on both pods. We use CloudFront (CDN) to cache static assets and improve web response times.

The benefits of this setup

By using this setup, we can leverage the advantages of Docker and Kubernetes and the Data Center versions of the Atlassian tooling. There are a lot of benefits to this kind of setup, but we've listed the most important advantages below.

- It's a self-healing platform: containers and worker nodes automatically replace themselves when a failure occurs. In most cases, we don't even have to do anything and the stack takes care of itself. Of course, it's still important to investigate any failures so you can prevent them from occurring in the future.
- Zero-downtime deployments: when upgrading the first node within the cluster to a new version, we can still serve the old version to our customers on the second. Once the upgrade is complete, the new version is served from the first node and we can upgrade the second node. This way, the application stays available, even during upgrades.
- Deployments are predictable: we use the same Docker container for development, staging and production. That's why we are confident the container will start in our production environment after a successful deploy to staging.
- Highly available applications: when a failure occurs on one of the nodes, traffic can be routed to the other node. This way you have time to investigate the issue and fix the broken node while the application stays available.
- It's possible to sync data from one node to the other. For example, syncing the index from one node to the other to fix a corrupt index can be done in just a few seconds, while a full reindex can take a lot longer.
- You can implement a high level of security on all layers (AWS, Kubernetes, application, ...):
  - AWS CloudTrail detects unauthorized access on AWS and sends an alert in case of an anomaly.
  - AWS Config watches for unwanted AWS security group changes. You can find out more on how to secure your cloud with AWS Config in our blog post.
  - Terraform makes sure changes to the AWS environment are approved by the team before rollout.
  - Since upgrading Kubernetes master and worker nodes has little to no impact, the stack is always running a recent version with the latest security patches.
  - We use a combination of namespacing and RBAC to make sure applications and deployments can only access resources within their namespace, with least privilege.
  - NetworkPolicies are rolled out using Calico. We deny all traffic between containers by default and only allow specific traffic (see the sketch below).
  - We use recent versions of the Atlassian applications and implement Security Advisories whenever they are published by Atlassian.

Interested in leveraging the power of Kubernetes yourself? You can find more information about how we can help you on our website!
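As a hedged sketch of that default-deny NetworkPolicy approach: the namespace names, labels and port below are hypothetical illustrations, not our actual configuration. The first policy blocks all ingress in a namespace; the second re-allows traffic to one application.

```yaml
# Deny all ingress traffic to every pod in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: jira            # hypothetical namespace
spec:
  podSelector: {}            # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress
---
# Explicitly re-allow traffic to the Jira pods from the ingress controller.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-jira
  namespace: jira
spec:
  podSelector:
    matchLabels:
      app: jira              # hypothetical pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx   # hypothetical namespace label
      ports:
        - protocol: TCP
          port: 8080          # hypothetical application port
```

Starting from deny-all and whitelisting only the flows you actually need keeps the allowed traffic paths explicit and auditable.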
{% module_block module "widget_3d4315dc-144d-44ec-b069-8558f77285de" %}{% module_attribute "buttons" is_json="true" %}{% raw %}[{"appearance":{"link_color":"light","primary_color":"primary","secondary_color":"primary","tertiary_color":"light","tertiary_icon_accent_color":"dark","tertiary_text_color":"dark","variant":"primary"},"content":{"arrow":"right","icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"tertiary_icon":{"alt":null,"height":null,"loading":"disabled","size_type":null,"src":"","width":null},"text":"Apply the power of Kubernetes"},"target":{"link":{"no_follow":false,"open_in_new_tab":false,"rel":"","sponsored":false,"url":null,"user_generated_content":false}},"type":"normal"}]{% endraw %}{% end_module_attribute %}{% module_attribute "child_css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "css" is_json="true" %}{% raw %}{}{% endraw %}{% end_module_attribute %}{% module_attribute "definition_id" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "field_types" is_json="true" %}{% raw %}{"buttons":"group","styles":"group"}{% endraw %}{% end_module_attribute %}{% module_attribute "isJsModule" is_json="true" %}{% raw %}true{% endraw %}{% end_module_attribute %}{% module_attribute "label" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "module_id" is_json="true" %}{% raw %}201493994716{% endraw %}{% end_module_attribute %}{% module_attribute "path" is_json="true" %}{% raw %}"@projects/aca-group-project/aca-group-app/components/modules/ButtonGroup"{% endraw %}{% end_module_attribute %}{% module_attribute "schema_version" is_json="true" %}{% raw %}2{% endraw %}{% end_module_attribute %}{% module_attribute "smart_objects" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "smart_type" is_json="true" %}{% raw %}"NOT_SMART"{% endraw %}{% end_module_attribute %}{% module_attribute "tag" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "type" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% module_attribute "wrap_field_tag" is_json="true" %}{% raw %}"div"{% endraw %}{% end_module_attribute %}{% end_module_block %} Apply the power of Kubernetes Problems we faced during the setup Migrating to this stack wasn’t all fun and games. We’ve definitely faced some difficulties and challenges along the way. By discussing them here, we hope we can facilitate your migration to a similar setup! Some plugins (usually older plugins) were only working on the standalone version of the Atlassian application. We needed to find an alternative plugin or use vendor support to have the same functionality on Atlassian Data Center. We had to make some changes to our Docker containers and network policies (i.e. firewall rules) to make sure both nodes of an application could communicate with each other. Most of the applications have some extra tools within the container. For example, Synchrony for Confluence, ElasticSearch for BitBucket, EazyBI for Jira, and so on. These extra tools all needed to be refactored for a multi-node setup with shared data. In our previous setup, each application was running on its own virtual machine. In a Kubernetes context, the applications are spread over a number of worker nodes. Therefore, one worker node might run multiple applications. 
- Each node of each application is scheduled on a worker node that has sufficient resources available. We needed to implement good placement policies so each node of each application has sufficient memory available. We also needed to make sure one application could not affect another application when it asks for more resources (a sketch follows at the end of this post).
- There were also some challenges regarding load balancing. We needed to create a custom template for the nginx ingress controller to make sure websockets work correctly and all health checks within the application report a healthy status. Additionally, we needed a different load balancer and URL for our Bitbucket SSH traffic compared to the web traffic to the Bitbucket UI.
- Our previous setup contained a lot of data, both on the filesystem and in the database. We needed to migrate all the data to an Amazon EFS volume and a new database in a new AWS account. It was challenging to find a consistent sync process that didn't take too long, because during the migration all applications were down to prevent data loss. In the end, we were able to meet these criteria and migrate successfully.

Monitoring our Atlassian stack

We use the following tools to monitor all resources within our setup:

- Datadog to monitor all components created within our stack and to centralize logging of all components. You can read more about monitoring your stack with Datadog in our blog post here.
- NewRelic for APM monitoring of the Java process (Jira, Confluence, Bitbucket) within the container.

If our monitoring detects an anomaly, it creates an alert within OpsGenie. OpsGenie makes sure this alert is sent to the team or the on-call person responsible for fixing the problem. If the on-call person does not acknowledge the alert in time, it is escalated to the team that's responsible for that specific alert.

Conclusion

In short, we are very happy we migrated to this new stack. Combining the benefits of Kubernetes and the Atlassian Data Center versions of Jira, Confluence and Bitbucket feels like a big step in the right direction. The improvements in self-healing, deployment and monitoring benefit us every day, and maintenance has become a lot easier.

Interested in your own Atlassian stack? Do you also want to leverage the power of Kubernetes? You can find more information about how we can help you on our website!
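To make the placement policies and shared-storage layout described in this post concrete, here is a hedged sketch of the StatefulSet pattern: per-pod EBS volumes via volumeClaimTemplates, a shared EFS-backed volume, guaranteed memory, and anti-affinity so the two nodes never land on the same worker. All names, paths and sizes are hypothetical illustrations, not our actual production configuration.

```yaml
# Hypothetical sketch: two Jira pods, each with its own EBS-backed volume,
# plus a shared EFS-backed volume, spread across different worker nodes.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jira
spec:
  serviceName: jira
  replicas: 2
  selector:
    matchLabels:
      app: jira
  template:
    metadata:
      labels:
        app: jira
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: jira
              topologyKey: kubernetes.io/hostname  # never co-locate the two nodes
      containers:
        - name: jira
          image: atlassian/jira-software:9         # hypothetical image tag
          resources:
            requests:
              memory: "4Gi"   # the scheduler reserves this on the worker node
              cpu: "1"
            limits:
              memory: "6Gi"   # one application cannot starve the others
              cpu: "2"
          volumeMounts:
            - name: local-home       # per-pod EBS volume (plugin cache, indexes)
              mountPath: /var/atlassian/application-data/jira
            - name: shared-home      # EFS volume shared by node-0 and node-1
              mountPath: /var/atlassian/application-data/shared-home
      volumes:
        - name: shared-home
          persistentVolumeClaim:
            claimName: jira-shared-home   # hypothetical PVC backed by EFS
  volumeClaimTemplates:
    - metadata:
        name: local-home
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3             # hypothetical EBS storage class
        resources:
          requests:
            storage: 20Gi
```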

Reading time 6 min
6 MAY 2025

At ACA, we live and breathe Kubernetes. We set up new projects with this popular container orchestration system by default, and we're also migrating existing customers to Kubernetes. As a result, the number of Kubernetes clusters the ACA team manages is growing rapidly! We've had to change our setup multiple times to accommodate more customers, more clusters, more load, less maintenance and so on.

From an Amazon ECS to a Kubernetes setup

In 2016, we had a lot of projects running in Docker containers. At that point in time, our Docker containers were either running in Amazon ECS or on Amazon EC2 virtual machines running the Docker daemon. Unfortunately, this setup required a lot of maintenance. We needed a tool that would give us a reliable way to run these containers in production. We longed for an orchestrator that would provide us with high availability, automatic cleanup of old resources, automatic container scheduling and so much more.

→ Enter Kubernetes! Kubernetes proved to be the perfect candidate for a container orchestration tool. It could reliably run containers in production and reduce the amount of maintenance required for our setup.

Creating a Kubernetes-minded approach

Agile as we are, we proposed the idea of a Kubernetes setup for one of our next projects. The customer saw the potential of our new approach and agreed to be part of the revolution. At the beginning of 2017, we created our very first Kubernetes cluster. At this stage, there were only two certainties: we wanted to run Kubernetes and it would run on AWS. Apart from that, there were still a lot of questions and challenges:

- How would we set up and manage our cluster?
- Can we run our existing Docker containers within the cluster?
- What type of access and information can we provide the development teams?

We've learned that in the end, the hardest task was not the cluster setup. Instead, creating a new mindset within ACA Group to accept this new approach, and involving the development teams in our next-gen Kubernetes setup, proved to be the harder task at hand. Apart from getting to know the product ourselves and getting other teams involved as well, we also had some other tasks that required our attention:

- we needed to dockerize every application,
- we needed to be able to set up applications in the Kubernetes cluster that were highly available and, if possible, also self-healing,
- and clustered applications needed to be able to share their state using the available methods within the selected container network interface.

Getting used to this new way of doing things, in combination with other tasks like setting up good monitoring, having a centralized logging setup and deploying our applications in a consistent and maintainable way, proved to be quite challenging. Luckily, we were able to conquer these challenges, and about half a year after we created our first Kubernetes cluster, our first production cluster went live (August 2017). These were the core components of our toolset anno 2017:

- Terraform to deploy the AWS VPC, networking components and other dependencies for the Kubernetes cluster
- Kops for cluster creation and management
- an EFK stack for logging, deployed within the Kubernetes cluster
- Heapster, InfluxDB and Grafana in combination with Librato for monitoring within the cluster
- Opsgenie for alerting

Nice!
… but we can do better: reducing costs, components and downtime

Once we had completed our first setup, it became easier to reuse the same topology, and we continued implementing this setup for other customers. Through our infrastructure-as-code approach (Terraform) in combination with a Kubernetes cluster management tool (Kops), the effort to create new clusters was relatively low. However, after a while we started to notice some risks related to this setup. The amount of work required for the setup, and the impact of updates or upgrades on our Kubernetes stack, was too large. At the same time, the number of customers that wanted their very own Kubernetes cluster was growing. So, we needed to make some changes to reduce the maintenance effort on the Kubernetes part of this setup to keep things manageable for ourselves.

Migration to Amazon EKS and Datadog

At this point, the managed Kubernetes service from AWS (Amazon EKS) became generally available. We were able to move everything that was managed by Kops to our Terraform code, making things a lot less complex. As an extra benefit, the Kubernetes master nodes are now managed by EKS. This means we have fewer nodes to manage, and EKS also provides us with cluster upgrades at the touch of a button.

Apart from reducing the workload on our Kubernetes management plane, we've also reduced the number of components within our cluster. In the previous setup, we were using an EFK (ElasticSearch, Fluentd and Kibana) stack for our logging infrastructure. For our monitoring, we were using a combination of InfluxDB, Grafana, Heapster and Librato. These tools gave us a lot of flexibility, but required a lot of maintenance effort, since they all ran within the cluster. We've replaced them all with the Datadog agent, reducing our maintenance workload drastically.

Upgrades in 60 minutes

Furthermore, because of the migration to Amazon EKS and the reduction in the number of components running within the Kubernetes cluster, we were able to reduce the cost and availability impact of our cluster upgrades. With the current stack, using Datadog and Amazon EKS, we can upgrade a Kubernetes cluster within an hour. With the previous stack, it would have taken us about 10 hours on average.

So where are we now?

We currently have 16 Kubernetes clusters up and running, all running the latest available EKS version. Right now, we want to spread our love for Kubernetes wherever we can. Multiple project teams within ACA Group are now using Kubernetes, so we are organizing workshops to help them get up to speed with the technology quickly. At the same time, we also try to keep up with the latest additions to this rapidly changing platform. That's why we attended the KubeCon conference in Barcelona and shared our opinions at our KubeCon Afterglow event.

What's next?

Even though we are very happy with our current Kubernetes setup, we believe there's always room for improvement. During our KubeCon Afterglow event, we had some interesting discussions with other Kubernetes enthusiasts. These discussions helped us define our next steps, bringing our Kubernetes setup to an even higher level. Some things we'd like to improve in the near future:

- add a service mesh to our Kubernetes stack,
- 100% automatic worker node upgrades without application downtime.

Of course, these are just a few focus points. We'll implement many new features and improvements whenever they are released! What about you? Are you interested in your very own Kubernetes cluster?
Which improvements do you plan on making to your stack or Kubernetes setup? Or do you have an unanswered Kubernetes question we might be able to help you with? Contact us at cloud@aca-it.be and we will help you out!

Reading time 6 min
16 JUN 2022

I started writing this blog post the day after I came home from KubeCon and CloudNativeCon 2022. The main thing I noticed was that the content of the talks has changed over the last few years.

Kubernetes' new challenges

When looking at the topics of this year's KubeCon / CloudNativeCon, it feels like a lot of questions about Kubernetes, types of cloud, logging tools and more have been answered for most companies. This makes sense, because more and more organizations have already successfully adopted Kubernetes. Kubernetes is no longer considered the next big thing, but rather the logical choice. However, we've noticed (during the talks, but also on our own journey) that new problems and challenges have arisen, leading to other questions:

- How can I implement more automation?
- How can I control or lower the costs of these setups?
- Is there a way to expand on whatever exists and add my own functionality to Kubernetes?

One of the possible ways to add functionality to Kubernetes is using Operators. In this blog post, I will briefly explain how Operators work.

How Operators work

The concept of an operator is quite simple. I believe the easiest way to explain it is by actually installing one. Within ACA, we use the Istio operator. The exact installation steps depend on the operator you are installing, but they're usually quite similar. First, install the istioctl binary on the machine that has access to the Kubernetes API. The next step is to run the command to install the operator.

```
curl -sL https://istio.io/downloadIstioctl | sh -
export PATH=$PATH:$HOME/.istioctl/bin
istioctl operator init
```

This creates the operator resource(s) in the istio-operator namespace. You should see a pod running.

```
kubectl get pods -n istio-operator
NAMESPACE        NAME                              READY   STATUS    RESTARTS   AGE
istio-operator   istio-operator-564d46ffb7-nrw2t   1/1     Running   0          20s
```

```
kubectl get crd
NAME                              CREATED AT
istiooperators.install.istio.io   2022-05-21T19:19:43Z
```

As you can see, a new CustomResourceDefinition called istiooperators.install.istio.io has been created. This is a blueprint that specifies which resource definitions can be added to the cluster. To create configuration, we need to know what 'kind' of config the CRD expects.

```
kubectl get crd istiooperators.install.istio.io -oyaml
…
status:
  acceptedNames:
    kind: IstioOperator
…
```

Let's create a simple config file.

```
kubectl apply -f - <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istio-controlplane
spec:
  profile: minimal
EOF
```

Once the resource that contains the configuration is added to the cluster, the operator makes sure the resources in the cluster match whatever is defined in the configuration. You'll see that new resources are created.

```
kubectl get pods -A
istio-system   istiod-7dc88f87f4-rsc42   0/1   Pending   0   2m27s
```

Since I run a small kind cluster, the istiod pod can't be scheduled and is stuck in a Pending state. Let me explain the process first before fixing this. The istio-operator keeps watching the IstioOperator configuration for changes. If changes are made, it only applies the changes required to bring the resources in the cluster in line with the state specified in the configuration. This behavior is called reconciliation. Let's watch the status of the IstioOperator configuration. Note that it's created in the istio-system namespace.
```
kubectl get istiooperator -n istio-system
NAME                 REVISION   STATUS        AGE
istio-controlplane              RECONCILING   3m
```

As you can see, it is still reconciling, because the pod can't start. After some time, it goes into an ERROR state.

```
kubectl get istiooperator -n istio-system
NAME                 REVISION   STATUS   AGE
istio-controlplane              ERROR    6m58s
```

You can also check the istio-operator log for useful information.

```
kubectl -n istio-operator logs istio-operator-564d46ffb7-nrw2t --tail 20
- Processing resources for Istiod.
- Processing resources for Istiod. Waiting for Deployment/istio-system/istiod
✘ Istiod encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
```

Since I'm running a small demo cluster, I'll lower the memory request so the pod can be scheduled. This is done within the spec: part of the IstioOperator definition.

```
kubectl -n istio-system edit istiooperator istio-controlplane
```

```yaml
spec:
  profile: minimal
  components:
    pilot:
      k8s:
        resources:
          requests:
            memory: 128Mi
```

The IstioOperator resource goes back to a RECONCILING state.

```
kubectl get istiooperator -n istio-system
NAME                 REVISION   STATUS        AGE
istio-controlplane              RECONCILING   11m
```

And after some time, it becomes HEALTHY.

```
kubectl get istiooperator -n istio-system
NAME                 REVISION   STATUS    AGE
istio-controlplane              HEALTHY   12m
```

You can see the istiod pod is running.

```
NAMESPACE      NAME                      READY   STATUS
istio-system   istiod-7dc88f87f4-n86z9   1/1     Running
```

Apart from the istiod deployment, a lot of new CRDs are added as well.

```
authorizationpolicies.security.istio.io     2022-05-21T20:08:05Z
destinationrules.networking.istio.io        2022-05-21T20:08:05Z
envoyfilters.networking.istio.io            2022-05-21T20:08:05Z
gateways.networking.istio.io                2022-05-21T20:08:05Z
istiooperators.install.istio.io             2022-05-21T20:07:01Z
peerauthentications.security.istio.io       2022-05-21T20:08:05Z
proxyconfigs.networking.istio.io            2022-05-21T20:08:05Z
requestauthentications.security.istio.io    2022-05-21T20:08:05Z
serviceentries.networking.istio.io          2022-05-21T20:08:05Z
sidecars.networking.istio.io                2022-05-21T20:08:05Z
telemetries.telemetry.istio.io              2022-05-21T20:08:05Z
virtualservices.networking.istio.io         2022-05-21T20:08:05Z
wasmplugins.extensions.istio.io             2022-05-21T20:08:06Z
workloadentries.networking.istio.io         2022-05-21T20:08:06Z
workloadgroups.networking.istio.io          2022-05-21T20:08:06Z
```

How the operator works - summary

As you can see, this is a very easy way to quickly set up Istio within our cluster. In short, these are the steps:

1. Install the operator.
2. One (or more) CustomResourceDefinitions are added that provide a blueprint for the objects that can be created and managed.
3. A Deployment is created, which in turn creates a pod that monitors the configurations of the kinds specified by the CRD.
4. The user adds configuration to the cluster, with its type specified by the CRD.
5. The operator pod notices the new configuration and takes all steps required to bring the cluster into the desired state specified by the configuration.

Benefits of the operator approach

The operator approach makes it easy to package a set of resources like Deployments, Jobs and CustomResourceDefinitions. This way, it's easy to add additional behavior and capabilities to Kubernetes. There's a library that lists the available operators at https://operatorhub.io/, counting 255 operators at the moment of writing. Operators are usually installed with just a few commands or lines of code. It's also possible to create your own operators.
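To illustrate what such a blueprint looks like when you build your own operator, here is a hedged, minimal CustomResourceDefinition sketch. The group, kind and field below are hypothetical examples, not related to Istio.

```yaml
# Minimal hypothetical CRD: after applying this, the cluster accepts
# resources of kind "Backup" in group "demo.example.com", which a
# custom operator could then watch and reconcile.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.demo.example.com   # must be <plural>.<group>
spec:
  group: demo.example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:          # hypothetical field the operator would read
                  type: string
```

After `kubectl apply`-ing this, `kubectl get crd backups.demo.example.com` would show the new resource type; the actual reconciliation logic lives in the operator pod that watches these objects.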
It might make sense to package a set of deployments, jobs, CRDs, ... that provide a specific functionality as an operator. The operator itself can then go through pipelines for CVE validation, E2E tests, rollout to test environments and more, before a new version is promoted to production.

Pitfalls

We have been using Kubernetes for a long time within the ACA Group and have collected some security best practices during this period. We've noticed that one-file deployments and Helm charts from the internet are usually not as well configured as we want them to be. Think about RBAC rules that give too many permissions, resources that are not namespaced, or containers running as root. When using operators from operatorhub.io, you basically trust the vendor or provider to follow security best practices. However, one of the talks at KubeCon 2022 that made the biggest impression on me stated that a lot of operators have issues regarding security. I would suggest watching "Tweezering Kubernetes Resources: Operating on Operators" by Kevin Ward (ControlPlane) before installing.

Another thing we've noticed is that using operators can speed up the process of implementing new tools and features. Be sure to read the documentation provided by the creator of an operator before you dive into advanced configuration. It's possible that not all features are actually exposed through the CRD created by the operator. However, it is bad practice to directly manipulate the resources that were created by the operator. The operator is not tested against your manual changes, and this might cause inconsistencies. Additionally, new operator versions might (partly) undo your changes, which might also cause problems. At that point, you're basically stuck, unless you create your own operator that provides the additional features. We've also noticed that there is no real 'rule book' on how to provide CRDs, and documentation is not always easy to find or understand.

Conclusion

Operators are currently a hot topic within the Kubernetes community. The number of available operators is growing fast, making it easy to add functionality to your cluster. However, there is no rule book or minimal baseline of quality. When installing operators from OperatorHub, be sure to check the contents or validate the created resources on a local setup. We expect to see some changes and improvements in the near future, but operators can already be very useful at this point.

AUTHOR
Bregt Coenen
