In Part 1, "The Scientific Method", we explored how the scientific method applies to solving business challenges. In Part 2, we dive into a real-world problem and show how the scientific method was used at ACA to revolutionize train arrival detection for a client.
The journey began with a client's question: "Can we accurately detect train arrivals and departures at a platform within a few seconds?" To address this, we applied the scientific method's iterative approach, encompassing five key steps:
Solving a question like the above usually takes a few iterations to come to a fully satisfying result.
The research question is the following: "Is it possible to detect the arrival and departure of a train at a platform with the time resolution of a few seconds using a smartphone?" In the first step it is important to analyze the problem: understanding the properties of arrival and departure. Taking into account the project limitations, these properties can be measured using the GPS functionality in a smartphone. Furthermore, the velocity is accurately available in the received data as it is measured using the Doppler shift of the carrier frequencies.
In this case, we could rely on established knowledge and assumptions from earlier work, which gave us a solid starting point for problem-solving. If no known truths exist, you should conduct preliminary investigations first.
Our hypothesis: “When we collect the position and velocity with aGPS, we will be able to reliably determine the arrival and departure at a platform based on the position and velocity characteristics, and with a good time resolution.”
The hypothesis is measurable, which is important for setting up an experiment in which labeled data points can be compared to the predicted data.
In step 3, we set up a proof of concept that focused on the essential data and the proposed solution, avoiding unnecessary details before the hypothesis was proven. We developed an app that stores aGPS position and velocity data, labeled as driving or standing still. Knowing the limitations of aGPS, we applied filters based on accelerometer data, train constraints, and predicted routes, which significantly improved the location accuracy.
The results of the experiments:
Despite good initial results, the proposed solution didn't meet the essential time resolution requirement. Building a system with a consistent output rate is infeasible when the smartphone travels inside what is effectively a massive Faraday cage that blocks signals.
The first iteration didn't meet expectations, but that didn't discourage us. We recognized that we had overlooked the time resolution. Furthermore, during that time at ACA we had successfully applied new techniques to a similar problem.
The accelerometer provided reliable data, and by combining it with the gyroscope and magnetometer, we obtained directional acceleration. Anticipating distinct acceleration patterns for different transportation modes and states, such as standstill and driving, we decided to leverage a Naïve Bayes algorithm. This supervised learning algorithm creates a probabilistic classifier, predicting standstill and driving based on measured properties.
A new hypothesis emerged: "A Naïve Bayes classifier can be used to differentiate between a walking acceleration pattern and its superposition with the acceleration pattern of a train in motion."
An application was created to collect data that was labeled with the correct state. Then, we trained the Naïve Bayes classifier using various data features, such as maximum, minimum, mean, norm, standard deviation, distance to the platform, and minimum required velocity.
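To make the idea of feature-based classification concrete, here is a minimal sketch in pure Python. It is an illustration under stated assumptions, not the code used in the project: the function `window_features`, the class `GaussianNaiveBayes`, the choice of a Gaussian likelihood, and the reduced feature set (maximum, minimum, mean, and standard deviation of the acceleration norm) are all hypothetical. Features such as distance to the platform are left out.

```python
import math
import statistics


def window_features(samples):
    """Summarise one window of (x, y, z) accelerometer samples as a feature vector."""
    norms = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return [
        max(norms),
        min(norms),
        statistics.fmean(norms),
        statistics.pstdev(norms),
    ]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: one independent normal per feature and class."""

    def fit(self, rows, labels):
        self.stats = {}
        for label in set(labels):
            members = [r for r, l in zip(rows, labels) if l == label]
            columns = list(zip(*members))
            self.stats[label] = {
                "prior": len(members) / len(rows),
                "means": [statistics.fmean(c) for c in columns],
                # A small floor keeps every variance strictly positive.
                "vars": [statistics.pvariance(c) + 1e-9 for c in columns],
            }
        return self

    def predict(self, row):
        def log_posterior(s):
            # log prior plus the sum of per-feature Gaussian log-likelihoods
            total = math.log(s["prior"])
            for value, mean, var in zip(row, s["means"], s["vars"]):
                total += -0.5 * math.log(2 * math.pi * var)
                total -= (value - mean) ** 2 / (2 * var)
            return total

        return max(self.stats, key=lambda label: log_posterior(self.stats[label]))
```

Each labeled window becomes one row of features; `fit` estimates a mean and variance per feature and class, and `predict` returns the label with the highest posterior. When the feature distributions of two classes overlap heavily, the posteriors are close and the classifier cannot separate them reliably.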
Multiple classifiers were trained, tested, and assessed using confusion matrices. The results showed driving was correctly identified 94% of the time, but standstill was only identified accurately 48% of the time, with the classifier mistakenly labeling standstill as driving in 52% of cases.
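For readers unfamiliar with confusion matrices, the sketch below shows how such per-class percentages can be derived from labeled predictions. The helper names `confusion_matrix` and `per_class_rates` are hypothetical, and the sample counts in the usage note are made up purely to reproduce the percentages reported above; they are not the project's data.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def per_class_rates(matrix, labels):
    """For each actual label, the share of its samples given each predicted label."""
    rates = {}
    for a in labels:
        total = sum(matrix[(a, p)] for p in labels)
        rates[a] = {p: matrix[(a, p)] / total for p in labels} if total else {}
    return rates


# Illustrative counts only: 100 windows per state.
actual = ["driving"] * 100 + ["standstill"] * 100
predicted = (["driving"] * 94 + ["standstill"] * 6
             + ["standstill"] * 48 + ["driving"] * 52)
rates = per_class_rates(confusion_matrix(actual, predicted),
                        ["driving", "standstill"])
```

With these counts, `rates["standstill"]["driving"]` is 0.52, which corresponds to the case where standstill is mistaken for driving.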
The Naïve Bayes classifiers lacked the needed accuracy for our problem. The features had significant overlap, preventing differentiation between standstill and driving. Additionally, the classifiers couldn't capture the transient nature of arrival and departure accurately, prompting further exploration for a solution.
The previous experiments revealed that velocity and position lacked sufficient time resolution, and that probabilistic models struggled with the transient nature of arrival and departure. Once the importance of the transient aspect is recognized, it becomes evident that properties like velocity or position aren't crucial; acceleration is key. Arrival can be defined as a deceleration followed by a period of no acceleration, while departure is an acceleration preceded by a period of no acceleration.
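These two definitions can be turned into a simple rule-based detector. The following is a minimal sketch, assuming a signed acceleration trace along the direction of travel sampled at a fixed rate. The function name `detect_events` and the values of `threshold` and `min_quiet` are hypothetical, and this is not the signal analysis algorithm developed in the project.

```python
def detect_events(accel, threshold=0.3, min_quiet=5):
    """Return (event, sample_index) pairs found in a signed acceleration trace.

    accel:     acceleration samples along the direction of travel (m/s^2)
    threshold: magnitude below which a sample counts as "no acceleration"
    min_quiet: number of consecutive quiet samples that make up a quiet period
    """
    events = []
    quiet_run = 0       # consecutive samples with |a| <= threshold
    last_active = None  # "decel" or "accel": the most recent non-quiet sample

    for i, a in enumerate(accel):
        if abs(a) <= threshold:
            quiet_run += 1
            # Arrival: a deceleration followed by a long enough quiet period.
            # The event is placed at the first sample of that quiet period.
            if last_active == "decel" and quiet_run == min_quiet:
                events.append(("arrival", i - min_quiet + 1))
            continue

        # Departure: an acceleration preceded by a long enough quiet period.
        if a > threshold and quiet_run >= min_quiet:
            events.append(("departure", i))
        last_active = "accel" if a > threshold else "decel"
        quiet_run = 0

    return events


trace = [-1.0, -1.0, 0, 0, 0, 0, 0, 0, 1.0, 1.0]
events = detect_events(trace)  # [("arrival", 2), ("departure", 8)]
```

A rule like this cannot tell a stop at a platform apart from a stretch of constant speed, so on its own it would report false positives that need to be filtered out with additional information such as position.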
A new hypothesis was formulated: “It is possible to use the transient nature of arrival and departure at a platform to detect the arrival and departure with a good time resolution and good accuracy.”
To focus on accelerometer data for arrival and departure detection, we reused the data from the previous iteration. This took us into a new research domain: building a signal analysis algorithm for classification.
In the following steps, multiple graphs are added. The red background indicates that the train was at the platform, green indicates that it was in motion.
Take a look at the graph of the Y-channel of the accelerometer. We chose not to project the acceleration onto the real-world axis initially to keep it in its raw form, minimizing the required computing power.
This data didn't give our program much to work with; it looks mostly random.
Data analysis occurred primarily during the experiment. Subsequently, our algorithm was tested with unused data, revealing some false positives but no false negatives. This is the favorable case for further optimization: missed events cannot be generated from nothing, whereas false positives can be filtered out by combining the measurements with aGPS data.
The algorithm developed in this iteration of the scientific method proved our last hypothesis:
“It is possible to use the transient nature of arrival and departure at a platform to detect the arrival and departure with a good time resolution and good accuracy.”
In this blog post, we detailed the complete process of solving a complex problem with precision. Through iterations, learning from failures, and discarding assumed knowledge, we crafted a reliable solution.
The algorithm, combining known mathematics with aGPS data, can accurately detect train arrival and departure at a platform using only a smartphone, achieving a time resolution of less than 1 second.