AWS machine learning models
Written by Stijn Van den Enden
Reading time 7 min
8 MAY 2025

In my previous blog post, I emphasized the importance of gathering data. However, in some cases you might not have any suitable data available. You might have raw data, but perhaps it’s unlabeled and unfit for machine learning. If you don’t have the financial resources to label this data, or you don’t have a product like reCAPTCHA to do it for free, there’s another option. Since its launch as a side business over a decade ago, Amazon’s Amazon Web Services cloud platform has grown at a tremendous pace. AWS now offers more than 165 services, giving anyone, from startups to multinational corporations, access to a dependable and scalable technical infrastructure. Some of these services offer what we call pre-trained machine learning models. Amazon’s pre-trained machine learning models can recognize images or objects, process text, give recommendations and more. The best part is that you can use services based on deep learning without having to know anything about machine learning at all. These services are trained by Amazon, using data from its websites, its massive product catalog and its warehouses. The information on the AWS websites might be a bit overwhelming at first. That’s why in this blog post I would like to give an overview of a few services built on Amazon’s machine learning models that I think can easily be introduced into your applications.

Computer vision with Rekognition

Amazon Rekognition is a service that analyzes images and videos. You can use this service to identify people’s faces, everyday objects or even different celebrities. Practical uses include adding labels to videos, for instance following the ball during a football match, or picking out celebrities in an audience. Since Rekognition also has an API to compare similarities between persons in multiple images, you can use it to verify someone’s identity or automatically tag friends on social media. Speaking of social media: depending on the context of a platform, some user contributions might not be deemed acceptable. Through Rekognition, a social media platform can semi-automatically moderate suggestive or explicit content, blurring or rejecting uploaded media when certain labels are associated with it.

Digitize archives with Textract

Amazon Textract allows you to extract text from a scanned document. It uses Optical Character Recognition (OCR) and goes a step further by taking context into account. If your company receives a lot of printed forms instead of their digital counterparts, you might have a few thousand papers you need to digitize manually. With regular OCR, it’s challenging to detect where a form label ends and a form field begins. Likewise, it would be difficult for OCR to read newspapers, where text is placed in two or more columns. Textract is able to identify which groups of words belong together, whether it’s a paragraph, a form field or a data table, helping you reduce the time and effort you need to digitize those archives.

Analyze text with Comprehend

Amazon Comprehend is a Natural Language Processing (NLP) service. It helps you discover the subject of a document, key phrases, important locations, people mentioned and more. One of its features is analyzing the sentiment in a text. This can give you quick insight into interactions with customers: are they happy, angry, satisfied? Amazon Comprehend can even highlight similar frustrations around a certain topic.
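As a minimal sketch, this is roughly what a sentiment call looks like with boto3; the review text is a made-up example and credentials, region and language are assumed to be configured for your account:

```python
import boto3

# Assumes AWS credentials are already configured; region is a placeholder
comprehend = boto3.client("comprehend", region_name="eu-west-1")

review = "The delivery was late, but the support team solved my problem quickly."
response = comprehend.detect_sentiment(Text=review, LanguageCode="en")

print(response["Sentiment"])       # POSITIVE, NEGATIVE, NEUTRAL or MIXED
print(response["SentimentScore"])  # confidence score per sentiment class
```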
If reviews around a certain product are automatically found to be mostly positive, you could easily incorporate this in a promotional campaign. Similarly, if reviews are mostly negative, that might be something to forward to the manufacturer. A subservice of Comprehend, called Comprehend Medical, is used to mine patient records and extract patient data and treatment information. Its goal is to help health care providers quickly get an overview of previous interactions with a patient. By identifying key information in medical notes and adding structure to it, Comprehend Medical helps medical customers process a ton of documents in a short period of time.

Take notes with Transcribe

Amazon Transcribe is a general-purpose service to convert speech to text, with support for 14 languages. It automatically adds punctuation and formatting, making the text easier to read and search through. A great application for this is creating a transcript from an audio file and sending it to Comprehend for further analysis. A call center could use real-time streaming transcription to detect the name of a customer and present their information to the operator. Alternatively, the call center could label conversations with keywords to analyze which issues arise frequently. One of Transcribe’s features is identifying multiple speakers. This is useful for transcribing interviews or creating meeting minutes without having one of the meeting participants spend extra time jotting everything down.

Multilingual with Translate

When you’re getting reactions from customers on your products, you can translate them into your preferred language, so you can grasp the subtle implications of certain words. Or you can extend your reach by translating your social media posts. You can even combine Transcribe and Translate to automatically generate subtitles for live events in multiple languages.

Express yourself with Polly

The Polly service can be considered the inverse of Transcribe. With Polly, you can convert text to speech, making the voice sound as close to natural speech as possible. With support for over 30 languages and many lifelike voices, nothing is stopping you from making your applications talk back to you. Polly has support for Speech Synthesis Markup Language (SSML), which gives you more control over how certain parts of the text are pronounced. Besides adding pauses, you can put emphasis on words, expand acronyms to their unabbreviated form and even add breathing sounds. This amount of customization makes it possible to synthesize voice samples that sound very natural. Generating realistic speech has been a key factor in the success of apps like Duolingo, where pronunciation is of great significance. You can read about this particular use case in this blog post. Bonus: if you don’t feel like reading, you can have it read to you by Polly!
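To give an idea of what that looks like in practice, here is a minimal sketch with boto3 that renders an SSML snippet to an MP3 file; the voice, region, text and output file name are just placeholders:

```python
import boto3

# Assumes AWS credentials are already configured; region and voice are placeholders
polly = boto3.client("polly", region_name="eu-west-1")

ssml = (
    "<speak>"
    "Welcome to <emphasis level='strong'>our store</emphasis>."
    "<break time='400ms'/>"
    'Your order ships on <say-as interpret-as="date" format="dmy">08-05-2025</say-as>.'
    "</speak>"
)

response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",        # interpret the input as SSML instead of plain text
    VoiceId="Joanna",
    OutputFormat="mp3",
)

with open("welcome.mp3", "wb") as audio_file:
    audio_file.write(response["AudioStream"].read())
```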
Make suggestions with Personalize

When you look for any product on Amazon’s website, you immediately get suggestions for similar products or products that other customers have bought in combination. It’s mind-blowing that out of the millions of items offered by Amazon, you get an accurate list of related products the moment the page loads. This powerful tool is available to you through Amazon Personalize. You need to provide an item inventory (products, documents, videos, …) and some demographic information about your users; Personalize combines this with an activity stream from your application to generate recommendations, either in real time or in bulk. This can easily be applied to a multitude of applications. You can present a list of similar items to customers of a webshop. A course provider could suggest courses similar to a topic of interest. Found a restaurant you liked? Here’s a list of similar restaurants in your area. If you can provide the data, Personalize can provide the recommendations.

Create conversations with Lex

Amazon Lex is a service that provides conversational AI. It uses the same Natural Language Understanding technology as Amazon’s virtual assistant Alexa. Users can chat with your application instead of clicking through it. Everything starts with an intent. This defines the intention of the user, the goal we want to achieve for our user. It can be as simple as scheduling an appointment, providing directions to a location or getting a recipe that matches a list of ingredients. Intents are triggered by utterances. An utterance is something you say that has meaning. “I need an appointment with Dr. Smith”, “When can I see Dr. Smith?” and “Is Dr. Smith available next week Wednesday?” are all utterances for the same intent: making an appointment. Lex is powerful enough to generalize these utterances so that slight variations also trigger the correct intent. Finally, in the case of registering an appointment, you need to specify a few slots: pieces of data the user has to provide in order to fulfill the intent. In the example above, that would be the name of the person you want to see, the time period and perhaps the reason for your visit. Even though the requirements are pretty simple, everything depends on the quality of the utterances and the chaining of intents. If you don’t have enough sample sentences, or the conversation keeps asking for information the user has already given, your user will end up frustrated and overwhelmed.

Predict demand with Forecast

A fairly new service provided by AWS is called Forecast. This service also emerged from Amazon’s own need to estimate the demand for its immense product inventory. With Forecast, you can get insight into historical time series data. For instance, you could analyze the energy consumption of a region and project it into the near future, giving you a probability estimate of tomorrow’s electricity demand. Likewise, you might be able to predict that a component of your production facility needs maintenance before it wears out. Forecast can leverage Automated Machine Learning (AutoML) to find the optimal learning parameters for your use case. The quality of this service depends on the amount and quality of the data you can provide. Until very recently, this service was only available to a select group, but it is now open to everyone. You can sign up for Forecast here.

🚀 Takeaway

If you want to bring machine learning to your customers but are held back by a lack of understanding, Amazon offers out-of-the-box services to add intelligence to your applications. These services, trained and used by Amazon itself, can help your business grow and give your customers a personal experience, without any prior knowledge of machine learning.

machine learning
Reading time 4 min
6 MAY 2025

Whether we unlock our phones with facial recognition, shout voice commands to our smart devices from across the room or get served a list of movies we might like… machine learning has in many cases changed our lives for the better. However, as with many great technologies, it has its dark side as well. A major one is the massive, often unregulated, collection and processing of personal data. Sometimes it seems that for every positive story, there’s a negative one about our privacy being at risk. It’s clear that we are forced to give privacy the attention it deserves. Today I’d like to talk about how we can use machine learning applications without privacy concerns or worries that private information might become public.

Machine learning with edge devices

By placing the intelligence on edge devices on premises, we can ensure that certain information never leaves the sensor that captures it. An edge device is a piece of hardware that processes data close to its source. Instead of sending video or sound to a centralized processor, the data is handled on the machine itself. In other words, you avoid transferring all this data to an external application or a cloud-based service. Edge devices are often used to reduce latency: instead of waiting for the data to travel across a network, you get an immediate result. Another reason to employ an edge device is to reduce the cost of bandwidth, since devices that rely on a mobile network might not operate well in rural areas. Self-driving cars, for example, take full advantage of both these reasons. Sending each video capture to a central server would be too time-consuming, and the total latency would interfere with the quick reactions we expect from an autonomous vehicle. Even though these are important aspects to consider, the focus of this blog post is privacy.

With the General Data Protection Regulation (GDPR) put into effect by the European Parliament in 2018, people have become more aware of how their personal information is used. Companies have to ask consent to store and process this information. Moreover, violations of this regulation, for instance by not taking adequate security measures to protect personal data, can result in large fines. This is where edge devices excel. They can immediately process an image or a sound clip without the need for external storage or processing. Since they don’t store the raw data, this information is volatile. For instance, an edge device could use camera images to count the number of people in a room. If the camera image is processed on the device itself and only the size of the crowd is forwarded, everybody’s privacy remains guaranteed.

Prototyping with Edge TPU

Coral, a sub-brand of Google, is a platform that offers software and hardware tools for machine learning. One of the hardware components they offer is the Coral Dev Board. It has been announced as “Google’s answer to Raspberry Pi”. The Coral Dev Board runs a Linux distribution based on Debian and has everything on board to prototype machine learning products. Central to the board is a Tensor Processing Unit (TPU), which was created to run TensorFlow (Lite) operations in a power-efficient way. You can read about TensorFlow and how it helps enable fast machine learning in one of our previous blog posts.
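As a rough illustration of the people-counting idea above, here is a minimal sketch of on-device inference with the TensorFlow Lite runtime and the Edge TPU delegate; the model file and its crowd-size output are hypothetical, and a real application would feed in actual camera frames:

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Load a model compiled for the Edge TPU (hypothetical file name)
interpreter = Interpreter(
    model_path="people_counter_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# In a real application this would be a frame captured from the camera;
# the raw image never has to leave the device.
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

# Only the aggregated result (e.g. a crowd size) would be forwarded.
crowd_size = interpreter.get_tensor(output_details[0]["index"])
print(crowd_size)
```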
If you look closely at a machine learning process, you can identify two stages. The first stage is training a model from examples so that it can learn certain patterns. The second stage is applying the model’s capabilities to new data. With the dev board above, the idea is that you train your model on cloud infrastructure. That makes sense, since this step usually requires a lot of computing power. Once all the elements of your model have been learned, they can be downloaded to the device using a dedicated compiler. The result is a little machine that can run a powerful artificial intelligence algorithm while disconnected from the cloud.

Keeping data local with Federated Learning

The process above might make you wonder which data is used to train the machine learning model. There are a lot of publicly available datasets you can use for this step. In general, these datasets are stored on a central server. To avoid this, you can use a technique called Federated Learning. Instead of having the central server train the entire model, several nodes or edge devices do this individually. Each node sends updates on the parameters it has learned, either to a central server (Single Party) or to each other in a peer-to-peer setup (Multi Party). All of these changes are then combined into one global model. The biggest benefit of this setup is that the recorded (sensitive) data never leaves the local node. This has been used, for example, in Apple’s QuickType keyboard to predict emojis from the usage of a large number of users. Earlier this year, Google released TensorFlow Federated to create applications that learn from decentralized data.

Takeaway

At ACA we highly value privacy, and so do our customers. Keeping your personal data and sensitive information private is (y)our priority. With techniques like federated learning, we can help you unleash your AI potential without compromising on data security. Curious how exactly that would work in your organization? Send us an email through our contact form and we’ll soon be in touch.

ai
Reading time 5 min
6 MAY 2025

In the near future, Artificial Intelligence (AI) will bring your company to the next level. Increasing productivity, use of resources, maintainability, staffing efficiency and much more. But before that can happen, you need to collect data and provide enough examples to train your AI algorithms. Whether your company is active in the financial sector or the medical sector, whether you’re focused on warehousing or garbage disposal, every company has one thing in common: data already flows through the organization. This blog post aims to make you aware of the importance of data collection as a stepping stone to Artificial Intelligence . Only when your data is visible, adequate, and complemented with external data and representative for your demographic, can you profit from positive opportunities that present themselves in today’s world and enables you to make better business decisions. What is Artificial Intelligence? Artificial Intelligence (AI) in its simplest form is the imitation of human intelligence by a machine. In other words, it enables programs to make human-like decisions and follow human-like reasoning. A popular subdomain of Artificial Intelligence is Machine Learning. Instead of explicitly programming a set of rules, Machine Learning applications deduct patterns from examples and ‘learn’ how things work. Unhide your data Accessible data can be put to good use. Surely somebody knows how many people are working for your company, how much inventory you keep, how much stock you’ve been moving over the last couple of months, and how your factory scores on efficiency and productivity. But what happens with this data once it has been acquired? A nice presentation to the board? Are these numbers stored somewhere in the cloud? Perhaps they are available in a centralized database? Or worst of all, perhaps they are in an Excel file on a private drive collecting dust? In many companies, only a limited number of people have access to certain assets. Since this implies that data is isolated from the rest of the organization, we call them information silos. Not only does this imply distrust in the organization, it provides a limitation to the team or application processing the data. For the same data, there might be different interpretations between teams, or a correlation between features might remain hidden because the data is distributed over different silos. There’s a big advantage when data is generally available in a standardized way. Not only can you rely on the trustworthiness of the source, you can guarantee a minimum of quality and completeness. If you build a company culture centered around data and start collecting that data in a uniformed way today, it will fuel your artificial intelligence tomorrow. Keep more than just YOUR data Although predicting the future is never certain, you can avoid surprises by incorporating external factors. For instance, when you’re selling electric cars, an increasing oil price might have a positive influence on your sales. A change of government policy on the other hand might have a negative influence. A heat wave might require that your employees get more breaks to prevent exhaustion, which has an influence on productivity. Even annotating data with company initiatives can be beneficial: marketing campaigns (hopefully) result in increased visibility of your organization and solutions, which leads to more sales. 
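To make that concrete, here is a minimal sketch, with purely hypothetical files and column names, of enriching internal sales figures with external weather data and campaign annotations using pandas:

```python
import pandas as pd

# Hypothetical files and column names, purely for illustration
sales = pd.read_csv("weekly_sales.csv", parse_dates=["week"])       # week, product_id, units_sold
weather = pd.read_csv("weekly_weather.csv", parse_dates=["week"])   # week, avg_temperature, precipitation
campaigns = pd.read_csv("campaigns.csv", parse_dates=["week"])      # week, campaign_active

# Join internal figures with external factors so a model can learn from both
enriched = (
    sales
    .merge(weather, on="week", how="left")
    .merge(campaigns, on="week", how="left")
    .fillna({"campaign_active": 0})
)

print(enriched.head())
```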
That’s why your organization’s numbers should be stored together with external facts and figures that impact the processes that are valuable to your business. A machine learning algorithm can easily take these extra parameters into account to extract a connection between multiple sets of data. It’s able to distinguish between seasonal effects, the effect of climatic conditions and a general trend of increasing sales. Centralizing decision-making around company data is important, but so is external data: the world around us changes constantly. Be prepared to collect a LOT of data.

Be wary of biased data

There are many examples where data mining has wrongly attributed significance to a certain input feature. Having a complete representation of your inventory or customer base is vital to the impact of data analysis. Besides that, normalizing your input can prevent your model from ever picking up unwanted features. A neural network designed to detect skin cancer learned to associate the presence of a ruler next to a tumor with malignancy when analysing pictures. In an attempt to classify wolves and huskies, scientists deliberately selected images with a specific background to train their algorithm, proving that biased data leads to an inaccurate machine learning model. This is a difficulty that even experienced data scientists face. No wonder experts say they spend more time preparing the data than designing and training models…

"It makes more sense to worry about the data and be less picky about what algorithm to apply." – Artificial Intelligence: A Modern Approach (S. Russell and P. Norvig)

Even though collected data is very valuable for your company, you probably didn’t collect it with AI applications in mind. It therefore probably contains disruptive features which will influence the learning process. It’s vital to reflect on and assess your data collection from here on out if you want to prepare it for use in AI applications.

Takeaway

More and more companies are becoming data-driven in order to gain a competitive advantage. To understand how certain aspects influence your productivity, it’s important to collect high-quality data. When your sources are reliable and you have a suitable application to present insightful patterns, you can use this to support business decisions. Today, the hard part is not collecting the data: there are enough tools that will help you do just that. The real challenge lies in structuring and capturing the right data. Finding a solution that fits your specific case isn’t easy, but you can start by setting up a database or data warehouse, thinking about how you’ll structure your data, and then applying it. If you need help or if you have questions, click here to contact us and shoot us a message! Take action today, because knowing how to realize this takes time and practice. Prepare your company for a data-driven culture and start building knowledge on machine learning to leverage the potential benefit you gain from your data.

How we built an intelligent stock management system
Reading time 5 min
5 MAY 2025

What’s the ideal store supply level for a product? How can we determine the number of future sales? Is it possible to reduce the number of product deliveries without going out of stock? ACA is building an intelligent stock management system for a customer with about 30 retail locations. Between these stores, that customer sells several thousand products. Who is able to answer the questions above for all these products, for every store, at all times? Let me tell you the story of how we took a dive into historical data, combined some data science with machine learning and got some answers to the questions above.

Gathering data

For each individual product in the catalogue, a shop manager has to determine the desired store supply. However, a human shop manager simply can’t take as many variables into account as an AI model. This results in more anticipation (a larger buffer of stock) and therefore higher storage costs. Our goal is to help the shop manager figure out the right amount of stock with an intelligent stock management system powered by machine learning. By looking at the evolution of the past, we can give an indication of how many goods will most likely be sold in the coming time period.

"If one wants to define the future, they must study the past." – Confucius

Before we started to think about a model for predicting product demand, we explored the sales data. From the application we are building, we had about 9 months of product history. We were able to consult legacy systems to supplement our data. These two combined gave us 21 months of sales data, which is still less than ideal: when you want to detect seasonal effects, you need multiple years’ worth of data. We decided to give it a go anyway. Our goal was to assist people in their process, not to automate the current system based on the model’s predictions.

Some products are more popular depending on the weather. For instance, de-icing salt sells more during days with freezing conditions. The popularity of pumpkin lanterns peaks right before October 31st. This was also the case for some of the items in our client’s product catalogue. Depending on the type of product, the temperature, precipitation or hours of sunshine might be a factor in customer demand. So we gathered historical weather data for the location of the shop. Needless to say, store opening times, promotions for the product itself or similar products, price variations and product unavailability all influence customer behavior as well. Public holidays or sports events might also affect your business. By adding all these predictor variables, you can further improve the accuracy of a time series model.

Building the model

Now that we had found the necessary data, it was time to start working on a model. Because a daily prediction was too fine-grained for our client, we decided to target a weekly prediction, for a few reasons:

- while the uncertainty increases the further you look into the future, the running costs of a daily prediction are too high.
- there is not enough variation for articles that sell only 0 to 3 times a week.
- a weekly restock is ideal for most retailers and/or suppliers.

However, a weekly prediction posed an additional challenge: on average there are 52.18 weeks in a year, which means that seasonal effects might take place somewhat later each year. There are advantages, too: a weekly prediction gave us the ability to include less popular products, which are not sold on a daily basis. We considered a few techniques for predicting time series. Because of the limited timeframe of the data, we went for a model based on structural time series. To implement the model, we selected the STS module from TensorFlow Probability.
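As an illustration, here is a minimal sketch of what such a model could look like with TensorFlow Probability’s STS module; the sales series, the temperature regressor and the hyperparameters are hypothetical, and the exact API may differ slightly between TFP versions:

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

sts = tfp.sts

# Hypothetical data: observed weekly sales for one product, plus a temperature
# series that also covers the 4 weeks we want to forecast.
weekly_sales = np.random.poisson(20, size=91).astype(np.float32)
temperature = np.random.normal(12.0, 8.0, size=91 + 4).astype(np.float32)

# Structural components: trend + yearly seasonality + a weather regressor.
trend = sts.LocalLinearTrend(observed_time_series=weekly_sales)
seasonal = sts.Seasonal(num_seasons=52, observed_time_series=weekly_sales)
weather = sts.LinearRegression(design_matrix=temperature.reshape(-1, 1))
model = sts.Sum([trend, seasonal, weather], observed_time_series=weekly_sales)

# Fit an approximate posterior over the model parameters with variational inference.
surrogate = tfp.sts.build_factored_surrogate_posterior(model)
tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn=model.joint_distribution(
        observed_time_series=weekly_sales).log_prob,
    surrogate_posterior=surrogate,
    optimizer=tf.optimizers.Adam(0.1),
    num_steps=200,
)

# Forecast the next 4 weeks and read off the expected demand.
forecast = tfp.sts.forecast(
    model,
    observed_time_series=weekly_sales,
    parameter_samples=surrogate.sample(50),
    num_steps_forecast=4,
)
print(forecast.mean().numpy().squeeze())
```

Because the model is a sum of simple components, the fitted trend, seasonal and regression parts can also be inspected separately, which is what makes the feature attribution described below possible.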
The chart below shows the result of a prediction from our model. The red line represents the number of articles in the store for a particular product. The blue line is our model’s weekly prediction, reduced by the items sold each day for that product. Even though at some points we’re going out of stock, this gives a pretty good estimate of how much supply the store needs in the coming week.

Putting it all together

It’s difficult to put an exact value on the cost of oversupply. By looking at the total number of articles stored per week, we estimated that our model would reduce the inventory carrying cost by almost 75%. Clearly, empty shelves in a store are not appealing from a customer’s point of view, but this information gives our client the opportunity to reduce the size and frequency of deliveries to an optimal point. In addition to the fact that the model gives a good prediction, we can also see how much influence each feature has on the model’s prediction. A structural time series is represented as the sum of simpler components, which means we can actually see what effect the temperature has on sales. Furthermore, if we were to start a marketing campaign for this product, we could infer its causal impact: we can estimate how many products would have been sold if we hadn’t run a promotion. There’s often a big challenge in explaining how exactly a machine learning model produces its output. With structural time series, we are able to point out which features have the biggest influence on the prediction. The graph above shows the influence of a season (13 weeks) on product sales. Even in this short time period, there’s a clear increase in sales in July.

Takeaway

There’s no easy way to predict the future. But by looking back in time, we can discover patterns which we can project forward. We used this technique to give one of our clients an idea of how many sales they might generate in the coming week(s). Going further, we can assume our model becomes more and more reliable as the historical data grows. I started this blog post by asking who would be able to determine the store supply of thousands of products in multiple stores. With a little nudge in the right direction from an intelligent stock management system, anybody can be that person.
