Data Scouting in Football: How Moneyball Finds Hidden Talent

footballytics
19 May 2023
10 min read

Updated: 4 Apr.

Yes, our scouting algorithm would have already found Kaoru Mitoma in 2020 at Kawasaki Frontale in the J1 League

Data-driven scouting is changing how football talent is identified and valued.

By combining traditional expertise with smart data models, clubs uncover hidden potential, reduce uncertainty and make better recruitment decisions.

If you don't know something well, or haven't tried it intensively, it is difficult to assess its full potential. In this blog post we show how we combine different methods and mathematical functions into a smart, complementary data scouting approach that exploits as much of that potential as possible.

Greater reach in scouting through data-based player identification

The global talent pool can no longer be fully covered by traditional scouting alone. Our proven data-based player identification expands your global observation radius without tying up additional resources. Data-driven pre-scouting and precise player profiles significantly reduce the effort involved in pre-selection and provide a focused shortlist of players who are the best fit for your playing philosophy. This allows your scouts to concentrate on the most relevant profiles and apply their expertise where it creates the most value.

Data scouting is player identification and not player selection

The goal of data scouting is not to find the one best player, but to find many good and suitable players. The profiles of many players are analyzed to sort out the unsuitable ones. The more detailed the search profile, the better the filter and the result.

Data is not used to find the single best player. Instead, it is used to quickly narrow 1,000 players down to the 50 most suitable, with the best 10 very likely among them (pre-scouting). This saves valuable scouting time, because the scouts can concentrate on a few players instead of sifting through hundreds. Identifying the very best candidates is and remains the task of the scouts, with their expertise and experience.
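The pre-scouting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production pipeline: the `fit_score` values are random stand-ins for whatever profile-fit score a club computes, and the function name `shortlist` is hypothetical.

```python
import heapq
import random

def shortlist(players, k=50):
    """Return the k players with the highest fit score (pre-scouting)."""
    return heapq.nlargest(k, players, key=lambda p: p["fit_score"])

# Hypothetical pool: 1,000 players with made-up fit scores in [0, 1).
random.seed(7)
pool = [{"name": f"player_{i}", "fit_score": random.random()} for i in range(1000)]

candidates = shortlist(pool, k=50)
print(len(candidates), "players go to the scouts for video and live scouting")
```

The code only filters. Picking the best few out of the 50 remains the job of the scouts.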

Data scouting analyzes the entire football world with one click

How long does it take you to roughly estimate whether a player fits your search profile for a position? Five minutes or ten? At ten minutes per player, you need almost 17 hours for 100 players. An algorithm can do this for 100,000 players in one minute. In addition, the data lets you find surprising and cheaper players, because demand is lower outside the standard mass-market search. Data scouting ensures that the focus does not narrow too quickly to individual or recommended players, and that the entire spectrum is searched, even where no one expects to find anything.
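The arithmetic behind this comparison is easy to check. The short sketch below assumes the upper figure of ten minutes per manual assessment; the function name `manual_hours` is ours and purely illustrative.

```python
MINUTES_PER_PLAYER = 10  # upper figure for a rough manual assessment

def manual_hours(n_players, minutes_per_player=MINUTES_PER_PLAYER):
    """Hours a scout needs to roughly assess n_players by hand."""
    return n_players * minutes_per_player / 60

print(round(manual_hours(100), 1))         # 16.7 hours for 100 players
print(round(manual_hours(100_000) / 24))   # 694 days of nonstop work for 100,000 players
```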

Purely statistically, given the sheer number of players and the variance in ability, one can argue that a once-in-a-decade talent can be found in every league.

Data scouting reduces dependence on consultants

There are still far too many clubs whose scouting is largely reactive. They are offered several players a day by agents, and the scouts have to spend a lot of time and effort on a quick assessment that, for the most part, ends in a rejection. That is a lot of effort for players they are not even interested in.

With this reactive approach, clubs focus on a small number of players much too early. They are, so to speak, only fishing in the ponds of the agents/consultants, and in the agents' interest. However, if you use data to identify suitable players, you can turn the game around and fish in the oceans, and then approach the appropriate consultants as needed. Change the game.

Data Scouting Funnel/Process

The ideal scouting process starts with data scouting (pre-scouting)

All profiles of all available players are analyzed. The output is a candidate list of 30-100 players. After that, the candidates are further evaluated by the scouts using video and live scouting.

See also our blog post: Data scouting, the foundation of successful scouting

Maturity levels in the use of data in scouting

Based on our experience and conversations with national and international clubs, we can categorize clubs into the following levels of maturity when using data.

Foundational: Scouts use data in tools from professional providers. Player statistics and metrics are compared in a simple, standardized way. Scouting is reactive and follows recommendations from consultants/agents.
Emerging: No dedicated analytics resources. Existing scouting staff are assigned analytics tasks in addition to their regular work. Data is exported from tools and analyzed further in third-party applications such as Excel, Tableau or Power BI according to club-specific needs. The club develops its own KPIs. Scouting is a mix of reactive and proactive. Often no end-to-end data process is implemented in scouting.
Established: Clubs have dedicated analysts working with the scouts. They apply advanced KPIs, logic, algorithms and mathematical functions in data scouting and develop their own basic models. Data-driven decisions are established across all stakeholders in scouting and, where applicable, in game analysis.
Advanced: Clubs have entire teams of analysts, data scientists and mathematicians. They take a scientific approach and develop their own methods and logic for analyzing data, including complex models of their own. A data culture is established across the entire club.

There is nothing wrong with using standardized data tools. They are very cheap and very good. But recognizing and exploiting the potential of analytics also requires investment in people, in-house resources and know-how.

Every club should consider exploring the potential themselves. Right now, clubs are looking for data analysts almost daily, to build or strengthen their analytics department.

New technologies always have the potential to transform systems, shift power relationships, and change the demands on leadership positions.

Data analysis tells us that the player in question fits the search profile and has performed well. But the data is only a specific view. Like an additional eye or perspective. Like any other perspective, it is not the only truth. But it would be foolish to disdain this perspective for important decisions.

In any case, the view of the data is an added value. Either it confirms our own perception, or it forces us to look more closely.

All clubs use data to compare and scout players. But only a few clubs really exploit the full potential.

For scouting to be successful in a sustainable and systematic way, different people and groups need to work well together, and several favorable conditions need to come together. Put another way:

In the areas of knowledge, skill, communication, process, and leadership, there is an enormous amount of opportunity to come closer to realizing the potential of data analytics.

Data-driven scouting is more than just juggling numbers

Data scouting is much more than comparing numbers, processing them in Excel, sorting them or slamming them into an x/y scatter chart.

Data analytics combines football, computer science and mathematics. In our experience, the last of these is badly undervalued. Yet without mathematical know-how and functions, it is not possible to analyze and compare mountains of data intelligently.

Our complementary data scouting approach

We aim to find as many good players as possible and not to miss the very best ones.

We use mathematical functions to help us compare and interpret the data. The average, median and standard deviation are probably the better-known examples, while others such as the harmonic mean, sigmoid, z-score and cosine are probably somewhat less well known.
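For readers who have not met the less familiar functions, here is a small Python sketch of each, using only the standard library. The numbers are made up for illustration, and the helper names (`z_scores`, `sigmoid`, `cosine`) are ours; how exactly we combine these functions in our models is not shown here.

```python
import math
import statistics

def z_scores(values):
    """Standardize values: distance from the mean in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def cosine(a, b):
    """Cosine of the angle between two vectors, between -1 and 1."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up values of one metric for five players.
metric = [0.2, 0.5, 0.9, 1.4, 2.0]
z = z_scores(metric)
print([round(v, 2) for v in z])            # centered on 0
print([round(sigmoid(v), 2) for v in z])   # same order, squeezed into (0, 1)

# The harmonic mean punishes one weak value more than the average does.
print(statistics.fmean([0.9, 0.3]), statistics.harmonic_mean([0.9, 0.3]))

# Vectors pointing in the same direction have a cosine of (almost exactly) 1.
print(cosine([1, 2, 3], [2, 4, 6]))
```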

We analyze the available data using several complementary approaches. This ensures that we can find interesting players in multiple ways.

NASA uses the same principle in spaceflight: to maximize fail-safety, the most critical equipment must not only be duplicated, but also built with two different technologies.

1) Value Adjustment - Adjusting Values

To compare players as fairly as possible, we adjust the quantitative offensive and defensive metrics using different units: depending on the metric, per possession, per pass, per touch, or even per specific type of pass. The point is to bring the input into a meaningful relation with the output. As a result, players from teams with less possession also make it into the selection and get their chance to shine.

The stronger the correlation between the quantitative metric and the adjustment unit, the more robust and meaningful the data analysis will be.

Possession adjustment matters because context matters. Most models stop at team possession. We developed a method that adjusts at the player possession level (% possession during the player's playing time). And we may be the only ones doing it. In our view, it is the fairest and most precise way to normalize performance.
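To make the idea concrete, here is a minimal Python sketch of a possession adjustment. It is not our actual method: the linear rescaling to a 50% reference share is one common, simple form and is an assumption here, as are the function name `possession_adjust` and all numbers. The only point it illustrates is that the possession share is measured over the player's own minutes rather than over the team's whole season.

```python
def possession_adjust(per90, opportunity_share, reference_share=0.5):
    """Rescale a per-90 metric to a common reference share of possession.

    opportunity_share: share of possession in which the action was possible,
    measured only over the minutes the player was on the pitch (assumption).
    """
    if not 0.0 < opportunity_share <= 1.0:
        raise ValueError("opportunity_share must be in (0, 1]")
    return per90 * reference_share / opportunity_share

# Two hypothetical defenders with identical raw numbers: 6.0 tackles per 90.
# Tackles are only possible while the opponent has the ball, so the
# opportunity share is 1 minus the own team's possession during their minutes.
team_possession_a = 0.65  # dominant side
team_possession_b = 0.40  # side that defends a lot

adjusted_a = possession_adjust(6.0, 1.0 - team_possession_a)
adjusted_b = possession_adjust(6.0, 1.0 - team_possession_b)
print(round(adjusted_a, 2), round(adjusted_b, 2))
```

With these made-up inputs, the defender on the dominant side ends up with the higher adjusted value, because he had fewer opportunities to tackle. For offensive metrics the opportunity share would be the own team's possession instead.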

A more detailed article on this can be found in our post Compare players fairly.

2) Individual data scouting with position profiles

We have defined over 20 detailed positional profiles that allow for a precise analysis of players and their tactical role. For example, a defensive midfielder requires different key metrics than an attacking midfielder, while a strong central defender with high ball progression has completely different requirements than a classic clearer.

Thanks to this precise fine-tuning, we identify the best and most suitable players, individually tailored to the club's style of play. In addition to classic positions and roles, we also take modern interpretations into account, such as the inverted full-back or the false nine. We have defined seven position profiles for midfielders alone.
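One simple way to turn a position profile into a ranking is a weighted sum of standardized metrics. The Python sketch below illustrates that idea only; the two profiles, their metrics and weights, and the three players are invented for the example and are not our real profiles.

```python
import statistics

# Hypothetical profiles: metric -> weight (weights sum to 1 per profile).
PROFILES = {
    "defensive_midfielder": {"interceptions": 0.6, "progressive_passes": 0.4},
    "attacking_midfielder": {"interceptions": 0.1, "progressive_passes": 0.3,
                             "key_passes": 0.6},
}

# Made-up per-90 values for three players.
PLAYERS = [
    {"name": "player_a", "interceptions": 3.1, "progressive_passes": 6.0, "key_passes": 0.8},
    {"name": "player_b", "interceptions": 1.0, "progressive_passes": 7.5, "key_passes": 2.9},
    {"name": "player_c", "interceptions": 2.2, "progressive_passes": 4.0, "key_passes": 1.5},
]

def standardize(players, metric):
    """Z-score of every player for one metric, relative to the pool."""
    values = [p[metric] for p in players]
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return {p["name"]: (p[metric] - mean) / sd for p in players}

def profile_scores(players, profile):
    """Weighted sum of z-scores for the metrics that define a profile."""
    scores = {p["name"]: 0.0 for p in players}
    for metric, weight in profile.items():
        z = standardize(players, metric)
        for name in scores:
            scores[name] += weight * z[name]
    return scores

for role, profile in PROFILES.items():
    scores = profile_scores(PLAYERS, profile)
    print(role, "->", max(scores, key=scores.get))
```

The same three players are ranked differently depending on the profile, which is why a detailed search profile gives a better filter.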

Here is an example 4-2-3-1 from the Premier League

3) Similarity Algorithm

Similarity scores were first used in sports by sabermetrics pioneer Bill James. His methods were also used in baseball by Billy Beane at the Oakland Athletics, as told in the book and film Moneyball (Moneyball: The Art of Winning an Unfair Game, by Michael Lewis).

The similarity algorithm is a mathematical method for determining the similarity between two vectors. In very simplified terms, it compares how far each player is from the average across all relevant metrics, and how similar those deviations are between players. This allows us to examine each player's dataset for overall similarity and obtain a score between -1 (opposite) and 1 (identical).
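A minimal Python sketch of such a comparison follows. It assumes cosine similarity on z-scored metrics, which matches the description (deviation from the average, result between -1 and 1) but is a simplification of our model. The players, metrics and values are invented, and how a score is turned into the percentages quoted in this post is not shown here.

```python
import math
import statistics

def z_vectors(players, metrics):
    """Express each player as deviations from the pool average, per metric."""
    vectors = {name: [] for name in players}
    for m in metrics:
        values = [p[m] for p in players.values()]
        mean, sd = statistics.fmean(values), statistics.pstdev(values)
        for name, p in players.items():
            vectors[name].append((p[m] - mean) / sd)
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 identical, -1 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(players, metrics, benchmark):
    """Rank all other players by similarity to the benchmark player."""
    vectors = z_vectors(players, metrics)
    target = vectors[benchmark]
    ranked = [(name, cosine_similarity(target, vec))
              for name, vec in vectors.items() if name != benchmark]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Made-up per-90 values for a tiny pool.
METRICS = ["key_passes", "dribbles", "tackles"]
POOL = {
    "benchmark":  {"key_passes": 3.0, "dribbles": 2.5, "tackles": 0.8},
    "creator":    {"key_passes": 2.6, "dribbles": 2.0, "tackles": 1.0},
    "stopper":    {"key_passes": 0.5, "dribbles": 0.4, "tackles": 3.5},
    "allrounder": {"key_passes": 1.5, "dribbles": 1.2, "tackles": 2.0},
}

ranked = most_similar(POOL, METRICS, "benchmark")
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

Note that the score measures the shape of a profile, not its level: a player can be very similar in style without reaching the same values.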

We can thus start from a benchmark player and find similar players.

Let's take Lionel Messi for example. We can use the algorithm to search all the leagues in the world for players who have a profile/expression as similar as possible.

This does not mean that the players found are as good as Messi, but that they have the same strengths and weaknesses and are very similar in style and expression. This method, too, turns up interesting players, some of whom only become more interesting on closer inspection.

As a practical example, let's take Manchester City's Champions League winner Kevin De Bruyne. He has outstanding scores in several dimensions. If we apply the algorithm to him, we find Iliass Bel Hassani of RKC Waalwijk in the Eredivisie, with a similarity of 93.21%. By the way, his contract expires on 31 May 2023 ;-)

Profile comparison.

Bel Hassani is unfortunately already thirty and past the best scouting age. But it gets exciting when we apply the algorithm to younger players.

Among others, we find Matt O'Riley (22) of Celtic Glasgow, with a similarity of 84.8%. He plays in a league of a different strength, of course, but he is a young player capable of development with a strength profile similar to Kevin De Bruyne's.

Update summer 2024: Matt O'Riley was signed by Brighton & Hove Albion on 26 August 2024.

Update December 2024:

Jens Petter Hauge from Bodø/Glimt has a similarity of 95.17% to Florian Wirtz.

No wonder he was signed by Eintracht Frankfurt at the end of 2024

Of course, we also find young players from any other league.

Lennart Karl already showed top indicators in FC Bayern Munich's U17 team, with a similarity score to Ousmane Dembélé of 89.45%.

This allows us to identify the stars of tomorrow at an early stage.

Whether it's De Bruyne, Messi, Rodri, Kimmich, Barella, Grealish, Pedri, Haaland, Neymar, Osimhen, Kim Min-jae or others: with smart math, you can identify talented young players whose profiles are similar to those of the superstars.

The possibilities are extensive. One could, for example, assemble an under-23 "clone team" of Manchester City, restricted to a single league or region if desired.

4) Peak Scouting

Another of our complementary approaches is to find players who outperform everyone else in a single metric, or to count the number of metrics in which a player ranks above the 85th percentile compared to other players. Brighton & Hove Albion in particular follow this approach.

An example is Malik Tillman (21, PSV), who ranks above the 85th percentile in 6 offensive metrics.

Omar Marmoush from Eintracht Frankfurt has six absolute top values in his metrics as of December 2024.
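Counting peaks is straightforward to sketch in Python. The example below assumes a simple percentile definition (share of the pool with a strictly lower value) and uses an invented pool with three invented metrics; the names `percentile_rank` and `peak_count` are ours.

```python
def percentile_rank(value, population):
    """Share of the population (in percent) with a value strictly below `value`."""
    below = sum(1 for v in population if v < value)
    return 100.0 * below / len(population)

def peak_count(player, pool, metrics, threshold=85.0):
    """Count the metrics in which the player ranks above the threshold percentile."""
    return sum(
        1 for m in metrics
        if percentile_rank(player[m], [p[m] for p in pool]) > threshold
    )

METRICS = ["xg", "dribbles", "key_passes"]

# Made-up pool of 20 players with evenly spread values per metric.
pool = [
    {"name": f"player_{i}", "xg": i, "dribbles": (i * 7) % 20, "key_passes": 19 - i}
    for i in range(20)
]
# A hypothetical player who is outstanding in two metrics and weak in the third.
star = {"name": "star", "xg": 25, "dribbles": 25, "key_passes": 3}
pool.append(star)

print(star["name"], "peaks in", peak_count(star, pool, METRICS), "metrics")
```

A player does not need to be good at everything to show up with this method, which is exactly why it complements the profile and similarity searches.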

Data scouting is not an exact science. There is no one golden path. But the more you study and learn about the topic, the more complementary options and solutions you have.

I encourage all clubs to invest not only in tools, but also in people and analytics know-how. Otherwise, they will remain one of the crowd. Start a learning journey and gather your own experience. This is the only way to grasp the potential of data and, where possible, to exploit it.

Data and video are complementary. Together they help us to sharpen our understanding of the game and of the players. Without video, we miss out on information. Without data, we miss out on information.

The smart and successful use of data goes far beyond the mere purchase of standardized tools. We have many years of experience in the areas of football coaching, match development, match analysis and data scouting. Our unique selling point is the combination of sound tactical knowledge and expertise in innovation methods, design thinking and a systemic approach.

We can't help but conclude with a quote from a mathematician.

"Give me a fixed point, and I will unhinge the earth." Archimedes

More on Football Analytics

Data is transforming football in a lasting way. If you want to understand how modern football works with data, explore our content on football analytics, data-driven game analysis and data driven scouting.

Understanding the Basics

Data Analysis in Football – Learn how football analytics has evolved, which metrics are crucial, and how professional clubs use data in practice.

Data Driven Scouting

Player Scouting with Similarity Algorithms – How Algorithms Transform Scouting: Identifying Tomorrow’s Stars with Data

Understanding Metrics

Expected Threat (xT) – One of the most important metrics in football for evaluating every single offensive action based on data and assessing players’ impact more precisely.

Tactical Innovation in Football

Tactical Innovation in Football – How new ideas and data-driven approaches are changing the game, including concrete real-world examples.

Learn Analytics

Learn Football Analytics - 30+ videos and podcasts, explained in a concise, understandable, and practical way.

For Clubs and Agencies

Football Analytics Services – Data analysis, scouting, and customized solutions for better decision-making in professional football. Follow us on LinkedIn for case studies, analyses and new football analytics approaches.

Did you like the quality of our post? Then reward us with your "credits" and like and share this post within your social network. Thank you very much.

To not miss any post you can subscribe to the blog.

footballytics – we know how to make the data talk

We support clubs, coaches, agencies and players with analysis and consulting services in the use and interpretation of data. To make better decisions in scouting, in match analysis and on the pitch.

Work with a partner, not just a platform. Here you find a description of our services

Blog by www.footballytics.ch

About Data Analytics in football. improve the game - change the ǝɯɐƃ
