As a modern digital agency, we strive to keep learning and to grow with our environment. One technology that has rapidly gained momentum in recent years and is now familiar to almost everyone is Artificial Intelligence (AI), or more precisely, Machine Learning. It has penetrated numerous industries, making workflows simpler, more efficient, and more cost-effective. We recently had the opportunity to develop an ML project from scratch for a client and to test it in practice. Here's our experience report.
What is Machine Learning Anyway?
Machine Learning is a discipline within the broader field of "artificial intelligence." Traditionally in software development, a developer wrote rules that a computer could use to process inputs and produce outputs. In the classical approach, the inputs and rules are therefore known in advance, while the outputs are the unknowns in the equation.
Machine Learning turns this concept on its head: instead of providing the computer with inputs and rules, we leave the rules as the unknowns and provide the computer only with inputs and their corresponding outputs. This might sound strange at first, but it is easy to explain, at least superficially. We feed a Machine Learning model thousands of input-output combinations (training data), from which the computer "learns" the rules for deducing an output from a given input. It does this by examining the training data over many iterations (known as "epochs"), slightly adjusting the weights of the connections between the individual neurons defined in the model each time.

A Machine Learning model thus functions somewhat like the human brain, but is far smaller and less complex. An AI model cannot think independently or develop empathy. It can only learn and operate within predefined boundaries, i.e., the provided training data. If everything goes right during training, a good model generalizes from the training data, applying the learned rules to new, unseen inputs and arriving at plausible results. Many factors determine whether this succeeds, including the quality and quantity of the training data, its preparation, the number of epochs, the hyperparameters (which we'll get to later), and much more.
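The core idea of training - adjusting weights over many epochs to shrink the prediction error - can be illustrated with a deliberately tiny sketch (plain NumPy, not part of the actual project): a single weight "learns" the rule y = 2x purely from input-output pairs, without the rule ever being programmed.

```python
import numpy as np

# Training data: inputs with their known outputs (the rule y = 2x is
# hidden in the data, never written into the code).
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0    # the unknown "rule": a single learnable weight
lr = 0.01  # learning rate: how strongly each epoch adjusts the weight

for epoch in range(500):
    pred = w * xs
    # Gradient of the mean squared error with respect to w
    grad = np.mean(2 * (pred - ys) * xs)
    w -= lr * grad  # nudge the weight to reduce the error

print(round(w, 2))  # converges toward 2.0
```

Real networks do exactly this, just with millions of weights instead of one.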
Football is Pure Chance! Or Is It?
A while ago, a potential client came to us with a clear task: "Develop a football formula!" Okay, granted: maybe not exactly in those words, but the aim of the project was to calculate winning probabilities in football for an international sports platform. This principle might already be familiar to some from the realm of sports betting. Betting companies (bookmakers) calculate probabilities for various events in each game offered on their platform, such as the winner of the match or the number of goals, using a variety of algorithms. Using machine learning for this purpose, however, is still relatively new.
The question the client wanted to explore with this project was: "Can a Machine Learning model beat the bookmaker?" After agile refinement of the project requirements and the collection of thousands of records from past matches in various football leagues, including the Bundesliga, we started implementing the solution.
The First Test Model
To familiarize ourselves with the data and make initial attempts, we set up a first test model. Before diving into the actual development of the Machine Learning model, it is crucial to prepare the training data so a neural network can actually work with it. For the first model, we used the following data points as input parameters:
- Home team ID
- Away team ID
- Arena ID
- Season (year)
- Matchday
- Weekday
- Evening match
- Home team league points
- Away team league points
In our view, this represents a solid foundation for making initial, very simple predictions about the outcome of a match. Identifiers such as the team and arena IDs were "one-hot encoded." Without this encoding, an ML model might treat them as ordinary numeric values and draw false conclusions (for example, that team 12 is somehow "greater than" team 3). It is therefore important to separate categories clearly, which in practice is often achieved through one-hot encoding: each categorical value is transformed into a one-dimensional array of zeros and ones in which exactly one entry is a one. Other values, such as the matchday or the league points, were kept as plain numbers.
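A minimal sketch of one-hot encoding (team count and IDs are hypothetical, not taken from the project):

```python
import numpy as np

def one_hot(index: int, num_classes: int) -> np.ndarray:
    """Encode a categorical ID as a vector of zeros with a single one."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[index] = 1.0
    return vec

# Hypothetical example: a league with 3 teams, encoding team ID 1
print(one_hot(1, 3))  # [0. 1. 0.]
```

The model now sees three independent features instead of one misleading ordinal number.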
Additionally, for simplicity's sake, we removed all records from this model in which one or more of these data points were missing. This ensured that the ML model learned from the "purest" and simplest data possible.
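With pandas, this kind of filtering is a one-liner; the column names and values below are hypothetical stand-ins for the project's data:

```python
import pandas as pd

# Hypothetical match records; None marks a missing data point
df = pd.DataFrame({
    "home_team_id": [1, 2, None],
    "away_team_id": [3, 1, 2],
    "league_points_home": [10, None, 7],
})

# Keep only records where every data point is present
clean = df.dropna()
print(len(clean))  # 1
```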
Designing the Model
The next step was to design the actual ML model. Again, for the first prototype, we opted for the simplest possible model so that we could optimize and fine-tune it afterward. For development we used Google's well-known framework TensorFlow together with Keras, the high-level API built on top of it. Keras makes it very easy to design simple models from predefined network layers. Without going too deep into the technical side: after several attempts, we ended up with a model featuring an incoming flatten layer (which converts the combination of one-hot encoded and numeric values into a simple, one-dimensional array), two hidden dense layers with associated dropouts, and an output layer.
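Such an architecture can be sketched in a few lines of Keras. The layer sizes, dropout rates, and feature width below are illustrative assumptions, not the values used in the project:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical input width: one-hot encoded IDs plus numeric features
num_features = 64

model = keras.Sequential([
    layers.Input(shape=(num_features,)),
    layers.Flatten(),                       # flatten encoded + numeric inputs
    layers.Dense(128, activation="relu"),   # first hidden layer
    layers.Dropout(0.2),
    layers.Dense(64, activation="relu"),    # second hidden layer
    layers.Dropout(0.2),
    layers.Dense(3, activation="softmax"),  # home win, draw, away win
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The softmax output yields three probabilities that sum to one - exactly the three possible match outcomes.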
Results and Evaluation
Fortunately, just as the first prototype was completed, the new Bundesliga season began - a perfect moment to try out the AI live. After everyone in the agency had placed their personal bets, it was the AI's turn. The first fixture was an admittedly easy call: Eintracht Frankfurt vs. FC Bayern Munich. The AI predicted a 66% chance of a win for Munich; had Munich had home advantage instead, it predicted a win probability of over 75%. Logical - and ultimately correct: Bayern won the game 1:6. Interestingly, the predicted win probability for Munich continued to rise with each matchday. This signaled that the AI was missing crucial data at the start of the season to make truly convincing predictions.
The graphical evaluation based on test data (i.e., records that the AI did not see during training and that were used only for model evaluation) was rather sobering, albeit entirely expected: the accuracy of the AI's predictions converged at around 50%. That is clearly better than the pure random chance of 33% across the three possible match outcomes (home win, draw, away win). But 50% is nowhere near enough to beat the bookmakers' odds.
The Second Model "Jarvis"
So back to the drawing board. Based on our experience with the first model, we fine-tuned it, adjusted the data points used, and slightly modified the data preparation. As this model is the one our client runs in production, we cannot disclose too many details here. However, we supplemented the data points with, for example, the starting line-ups and numerous details about the individual players, giving the model the opportunity to compare the line-ups against each other. We also limited the age of the records, so that data from very old seasons, such as 2011, was no longer included in the training.
Finally, we fine-tuned the model's hyperparameters. The optimal hyperparameters - such as the optimizer, the number of hidden layers, or the loss function - are the subject of numerous discussions in developer communities and research, and they are individual to each model. A popular and simple way to optimize them is an automatic tuner such as "KerasTuner." With KerasTuner, the model is trained repeatedly with varying hyperparameters, which are adjusted by search algorithms until the tuner converges on a well-performing configuration.
After expanding the data points and fine-tuning the hyperparameters, the model delivered convincing results. Our best model achieved an accuracy on the validation data of over 71% - roughly 42% better, in relative terms, than the first model's 50%. A complete success. And the 71% achieved something else: the implied win probability that selected bookmakers assigned to the favorite consistently stayed just below 71%. Our model could therefore achieve better values than the algorithms used by the bookmakers.
Of course, we then had to give the artificial intelligence a name: it was lovingly named Jarvis, after Tony Stark's artificial intelligence in the "Iron Man" films.
Takeaways and Conclusion
This practical project shows what can already be achieved with simple AI models in complex markets, and how much further one can get by optimizing the training data and hyperparameters. The development of Machine Learning models will accompany our agency life ever more intensively in the coming years - and we are prepared for it.