  • footballytics

Moneyball: Exploiting Potential with Smart Data Scouting


pic by Thomas T  & Dall-E

Yes, our scouting algorithm would have already found Kaoru Mitoma in 2020 at Kawasaki Frontale in the J1 League


If you don't know something well, or haven't tried it intensively, it is difficult to assess its full potential. In this blog we show how we combine different methods and mathematical functions into a smart, complementary data scouting approach that exploits as much of that potential as possible.


Our efforts in school have finally paid off after all. Mathematics can also be used to compare and scout players. This is really one of the best applications so far :-)



Data scouting is player identification and not player selection

The goal of data scouting is not to find the one best player, but to find many good and suitable players. The profiles of many players are analyzed to sort out the unsuitable ones. The more detailed the search profile, the better the filter and the result.


Data is not used to find the best player! Instead, it is used to quickly determine the 50 most suitable players out of 1,000, with the best 10 very likely among them (pre-scouting). This saves valuable time in #scouting, because the scouts can concentrate on just a few players without having to sift through hundreds. Identifying the very best candidates is and remains the task of the scouts with their expertise and experience.



Data scouting analyzes the entire soccer world with one click

How long does it take you to roughly estimate whether a player fits your search position profile or not? Five minutes or ten? At ten minutes per player, you need almost 17 hours for 100 players. An algorithm can do this for 100,000 players in one minute. In addition, the data lets you find surprising and cheaper players, because demand is lower outside the standard mass search. Data scouting ensures that the focus does not narrow too quickly to individual or recommended players, but that the entire spectrum is searched, even where no one expects to find anything.


Purely statistically, given the sheer quantity and variance, one can say that there is a once-in-a-decade talent to be found in every league.


Data scouting reduces dependence on consultants

There are still far too many clubs that run largely reactive scouting. They are offered several players a day by agents, and the scouts have to spend a lot of time and effort on a quick assessment that, for the most part, ends in a rejection. That is a lot of effort for players they are not even interested in.


With this reactive approach, the clubs focus on a small number of players much too early. They are, so to speak, only fishing in the ponds of the agents/consultants, and in their interest. However, if you use data to identify suitable players, you can turn the game around and fish in the oceans, and then approach the appropriate consultants as needed. Change the game.




Data Scouting Funnel/Process

The ideal scouting process starts with data scouting (pre-scouting)

All profiles of all available players are analyzed. The output is a candidate list of 30-100 players. After that, the candidates are further evaluated by the scouts using video and live scouting.





Maturity levels in the use of data in scouting

Based on our experience and conversations with national and international clubs, we can categorize clubs into the following levels of maturity when using data.


  • User: Scouts use data in tools from professional providers. Player statistics and metrics are compared in a simple, standardized way. Scouting is reactive and follows recommendations from consultants/agents.

  • Individual: No dedicated analytics resources. Existing scouting staff are assigned analytics tasks on top of their regular work. Data is exported from tools and analyzed further in third-party applications such as Excel, Tableau or Power BI according to club-specific needs. The club develops its own KPIs. Scouting is a mix of reactive and proactive. Often no end-to-end data process is implemented in scouting.

  • Analyst: Clubs have dedicated analyst resources working with scouts. They apply advanced KPIs, logic, algorithms and mathematical functions in data scouting. Data scouting and data-driven decisions are established across all stakeholders in scouting and, where applicable, in game analysis.

  • Analyst Teams: Clubs have entire teams of analysts, data scientists and mathematicians. They take a scientific approach and develop their own methods and logic for analyzing data. A data culture is established across the entire club.


There is nothing wrong with using standardized data tools. They are very cheap and very good. But recognizing and exploiting the potential of analytics also requires investment in people, in-house resources and know-how.

Every club should consider exploring the potential itself. Right now, clubs are looking for data analysts almost daily to build or strengthen their analytics departments.


New technologies always have the potential to transform systems, shift power relationships, and change the demands on leadership positions.

Data analysis tells us that the player in question fits the search profile and has performed well. But the data is only a specific view. Like an additional eye or perspective. Like any other perspective, it is not the only truth. But it would be foolish to disdain this perspective for important decisions.


In any case, the view of the data is an added value. Either it confirms our own perception, or it forces us to look more closely.

All clubs use data to compare and scout players. But only a few clubs really exploit the full potential.

For scouting to be successful in a sustainable and systematic way, different people and groups need to work well together, and several positive factors need to come together. Put another way:


In the areas of knowledge, skill, communication, process, and leadership, there is an incredible amount of opportunity to come closer to realizing the potential of data analytics.

Data driven scouting is more than just juggling numbers

Data scouting is much more than comparing numbers, processing them in Excel, sorting them or slamming them into an x/y scatter chart.

#DataAnalytics consists of football, computer science and mathematics. In our experience, the latter is badly undervalued. But without mathematical know-how and functions, it is not possible to analyze and compare mountains of data intelligently.


Our complementary data scouting approach

We aim to find as many good players as possible and not to miss the very best ones.

We use mathematical functions to help us compare and interpret the data. Average, median, and standard deviation are probably the better-known examples, while others such as harmonic mean, sigmoid, Z-score and cosine are probably somewhat less well known.
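To make the less familiar functions more tangible, here is a minimal Python sketch. The numbers are invented for illustration and the helper names `sigmoid` and `z_score` are our own; it is not our production code.

```python
import math
import statistics


def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def z_score(value, population):
    """Distance of a value from the population mean, in standard deviations."""
    mean = statistics.mean(population)
    spread = statistics.pstdev(population)  # population standard deviation
    return (value - mean) / spread


# Hypothetical success rates of one player in two different skills.
rates = [0.9, 0.3]
arithmetic = statistics.mean(rates)         # treats both skills equally
harmonic = statistics.harmonic_mean(rates)  # pulled toward the weaker skill

# Hypothetical values of one metric across five players.
league_values = [1.0, 2.0, 3.0, 4.0, 5.0]
best_z = z_score(5.0, league_values)

print(arithmetic, harmonic, best_z, sigmoid(best_z))
```

The harmonic mean is lower than the arithmetic mean here because it penalizes an unbalanced profile, and the sigmoid maps an unbounded value such as a Z-score onto a 0 to 1 scale.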


We analyze the available data using three different approaches. This ensures that we can find interesting players in multiple ways.


NASA also uses this approach in spaceflight. To maximize fail-safety, the most critical equipment must not only be duplicated, but also built with two different technologies.



1) Value Adjustment - Adjusting Values

To compare players as fairly as possible, we adjust the quantitative offensive and defensive metrics with different units: depending on the metric, per possession, per pass, per touch, or even per specific type of pass. The point is to bring the input into a meaningful relation with the output. This also brings players from teams with less possession into the selection and lets them shine.


The stronger the correlation between the quantitative metric and the adjustment unit, the more robust and meaningful the data analysis will be.
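The idea can be sketched in a few lines of Python. This is a simplified illustration with invented numbers: the class `PlayerSeason`, the helpers `per_100` and `pearson`, and the choice of "passes attempted" as the adjustment unit are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    progressive_passes: int  # raw season count (the output)
    passes_attempted: int    # assumed adjustment unit (the input)


def per_100(count, unit_total):
    """Express a raw count per 100 units of the adjustment unit."""
    if unit_total <= 0:
        return 0.0
    return 100.0 * count / unit_total


def pearson(xs, ys):
    """Pearson correlation, to check how well a unit tracks a metric."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical players: A plays in a possession-heavy team, B does not.
player_a = PlayerSeason("Player A", progressive_passes=120, passes_attempted=2000)
player_b = PlayerSeason("Player B", progressive_passes=75, passes_attempted=1000)

for p in (player_a, player_b):
    adjusted = per_100(p.progressive_passes, p.passes_attempted)
    print(p.name, p.progressive_passes, adjusted)
```

On raw counts Player A looks better (120 vs. 75), but per 100 passes Player B is ahead (7.5 vs. 6.0). The `pearson` helper can be run over a whole league to check whether a candidate unit really moves together with the metric before using it for the adjustment.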


A more detailed article on this can be found here: Compare players fairly




2) Position profiles and key metrics with filters.


  • Profiles: First we define the search profiles or positions and their characteristics. In midfield, for example, a Deep Playmaker, a Progressive Midfielder, a Box-to-Box or an Attacking Midfielder. Likewise, a wide winger has different requirements than a central, inverted winger, etc. In this way, the different characteristics of each position can be treated and analyzed differently.

  • Key Metrics: Then we define the key metrics per position and profile. For example, a defensive midfielder has different key metrics than an offensive midfielder, and a central defender with high demands on ball progression has different key metrics than a pure "clearer". It can also be that the same key metrics are defined, but with different requirement levels.

  • Standard deviation & Z-score: Then we calculate the Z-score per player and key metric, using the mean and the standard deviation of that metric. The standard deviation is a measure of the spread of the values (it is the square root of the variance). In simple terms, it describes the typical distance of all measured values of a characteristic from the average. The Z-score makes data comparable by converting it to a standardized scale on which almost all values fall between -3 and 3. It shows where the player lies in the entire performance spectrum compared to all players and the average. This is a good way to identify outliers and gaps. The Z-score per key metric helps us set the level of demand on the filter in the following step.

  • Requirement Filter: We then set Z-score thresholds to define the minimum requirement per key metric. For some metrics a player must be at least average, for others above average, and for others exceptionally good. This naturally varies between role profiles. There is an art to tuning these filters so that the right players clear them. The height of the hurdles is, of course, defined individually for each search mandate. If too many players still pass the filter, the requirements are increased.

  • Fine-tuning: At the end, after some iterations and fine-tuning, comes the final player identification with the most suitable and best players. Surprising players who were not in focus before will certainly appear on this list, as will players who have a lower market value because they do not show up in a standard search.
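The Z-score and requirement filter steps above can be sketched roughly as follows. The players, metric names and thresholds are invented for illustration, and the functions `z_scores` and `shortlist` are hypothetical helpers, not our actual implementation.

```python
import statistics


def z_scores(values):
    """Map each player's raw value to a Z-score within the compared group."""
    raw = list(values.values())
    mean = statistics.mean(raw)
    spread = statistics.pstdev(raw)  # population standard deviation
    if spread == 0:
        return {name: 0.0 for name in values}
    return {name: (v - mean) / spread for name, v in values.items()}


def shortlist(players, thresholds):
    """Keep players whose Z-score meets the minimum on every key metric."""
    z_by_metric = {
        metric: z_scores({name: stats[metric] for name, stats in players.items()})
        for metric in thresholds
    }
    return [
        name
        for name in players
        if all(z_by_metric[m][name] >= t for m, t in thresholds.items())
    ]


# Hypothetical key metrics for a Progressive Midfielder profile.
players = {
    "Player A": {"progressive_passes": 9.0, "pass_accuracy": 88.0},
    "Player B": {"progressive_passes": 6.0, "pass_accuracy": 80.0},
    "Player C": {"progressive_passes": 5.0, "pass_accuracy": 84.0},
    "Player D": {"progressive_passes": 4.0, "pass_accuracy": 76.0},
}

# Strict mandate: clearly above average in progression, at least average accuracy.
strict = shortlist(players, {"progressive_passes": 1.0, "pass_accuracy": 0.0})
# Relaxed mandate: lower the progression hurdle and more players pass.
relaxed = shortlist(players, {"progressive_passes": -0.6, "pass_accuracy": 0.0})
print(strict, relaxed)
```

Raising or lowering a threshold is the tuning step: the strict mandate leaves only Player A, while the relaxed one also lets Player C through.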



Example Longlist: Central Progressive Midfielder, U25, max. 2 million market value, Europe



Here are the top results (without names) in a mandate for a Progressive CM.

The list is sorted by performance within each player's own league, according to the search profile. Market values are from Transfermarkt.


De Bruyne was at the top. Others from the top five leagues and beyond fell victim to the market value limit. However, the list still includes excellent, inexpensive and lesser-known players from many different leagues.







3) Similarity Algorithm

Similarity scores were first used in sports by sabermetrics pioneer Bill James. His methods were also used in baseball by Billy Beane with the Oakland Athletics, as known from the book and movie Moneyball (Moneyball: The Art of Winning an Unfair Game, by Michael Lewis).


It is a mathematical method to determine the similarity between two vectors. In very simplified terms, it compares players by how far they are from the average across all relevant metrics, and how similar those patterns are overall. This allows us to examine the datasets of each player for overall similarity and obtain a score between -1 (opposite) and 1 (identical) as a result.


We can thus start from a benchmark player and find similar players.
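As a rough sketch of such a search, the following Python example assumes that the similarity measure is cosine similarity applied to vectors of Z-scores, which matches the -1 to 1 range described above. The profiles are invented, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a perfectly average profile has no direction
    return dot / (norm_a * norm_b)


def most_similar(benchmark, candidates, top_n=3):
    """Rank candidates by similarity to the benchmark profile, best first."""
    scored = [
        (name, cosine_similarity(benchmark, vector))
        for name, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical Z-score profiles over the same three key metrics.
benchmark = [2.0, 1.5, -0.5]
candidates = {
    "Player X": [1.0, 0.75, -0.25],  # same shape, half the level
    "Player Y": [-2.0, -1.5, 0.5],   # exact opposite profile
    "Player Z": [0.5, 2.0, 1.0],     # partly overlapping strengths
}

ranking = most_similar(benchmark, candidates)
for name, score in ranking:
    print(name, round(score, 3))
```

Player X scores practically 1.0 even though every value is only half of the benchmark's. Cosine similarity compares the shape of a profile, not its level, so a high score points to a similar style rather than to equal quality.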

Let's take Lionel Messi for example. We can use the algorithm to search all the leagues in the world for players who have a profile/expression as similar as possible.


This does not mean that the players found are as good as Messi, but that they have the same strengths and weaknesses and are very similar in style and profile. With this method, too, we find interesting players, some of whom only become really interesting once they are looked at more closely.


As a practical example, let's take Manchester City's Champions League winner Kevin De Bruyne. He has outstanding scores in several dimensions. If we apply the algorithm to him, we find Iliass Bel Hassani from RKC Waalwijk in the Eredivisie, with a similarity of 93.21%. By the way, his contract expires on 31 May 2023 ;-)



Profile comparison.

Bel Hassani is unfortunately already thirty and past the best scouting age. But it gets exciting when we apply the algorithm to younger players.


Among others, we find Matt O'Riley (22) from Celtic Glasgow with a similarity of 84.8%. Of course, in a league with a different league strength, but a young player capable of development with a similar strength profile to Kevin De Bruyne.


Of course, we also find young players from any other league.


Whether it's De Bruyne, Messi, Kimmich, Barella, Grealish, Pedri, Haaland, Neymar, Osimhen, Kim Min-jae or others: with smart math, you are able to identify talented young players with profiles similar to those of the superstars.


The possibilities are extensive. One could, for example, also determine a U23 clone team of Manchester City, restricted to a specific league or region if desired.


Data scouting is not an exact science. There is no one golden path. But the more you study and learn about the topic, the more complementary options and solutions you have.


I encourage all clubs to invest not only in tools, but also in people and analytics know-how. Otherwise, they will remain one of the crowd. Start a learning journey and gather your own experience. This is the only way to grasp the potential of data use and, if possible, to exploit it.

Thank you very much to Ben Griffis for inspiring me!


Here is our data scouting on demand service for support or as a benchmark for your own scouting


"Data and video are complementary. Together they help us to sharpen our understanding of the game and of the players. Without video, we miss out on information. Without data, we miss out on information."


126 million profit in 2 years with data driven scouting

In April 2022, we scouted eleven U21 players with our data-driven algorithm. Two years later, as of April 2024, 8 million in market value had become 134 million (+126 million). Forget Bitcoin :-)

Here is the article with the original and current market values: footballytics U21 (data) Dreamteam




We can't help but conclude with a quote from a mathematician.


"Give me a fixed point, and I will unhinge the earth." Archimedes

Did you like the quality of our post? Then reward us with your "credits" and like and share this post within your social network. Thank you very much.

To not miss any post you can subscribe to the blog.


footballytics - we know how to make the data talk

We support clubs, coaches, agencies and players with analysis and consulting services to use and interpret data. To make better decisions in scouting, in match analysis and on the pitch.

 

Blog by www.footballytics.ch improve the game - change the ǝɯɐƃ
