What Are The Main Data Sources For Sports Analysis?

Intro

Over the past few years, significant advancements in technology have revolutionized the way data is collected by automating the process and minimizing the need for human intervention.

Thanks to the development of high-quality video recording and the enhancement of machine learning, computer vision, and artificial intelligence, data collection has taken on a new dimension. In fact, in many instances, the output is no longer limited to just data, but includes advanced analytics, football game models, and key performance metrics.

This post offers a brief overview of the most widely used data sources in football and their fundamental applications. Specifically, we will describe the three most popular types of game data: event data, tracking data, and video footage.

Event Data

Event data can be traced back to the era of hand annotations, where analysts used annotation schemes to manually track actions during the course of a game on their notepads. Nowadays, this task is typically carried out by companies that provide sports organizations with packages containing annotated matches for entire leagues, with most of the annotations still being done manually.

However, there are no established conventions or standard practices in this field. Each provider has its own glossary of events, which are used to manually tag the games. These glossaries vary significantly in terms of scope, depth, accuracy, and the definitions used for game actions. As a result, conceptualizations, implementations, and data formats differ among providers. Additionally, because the descriptions of match play become complex rather quickly, the logic used when storing this data may also differ among providers.
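
To make the glossary problem concrete, here is a minimal sketch of how labels from two providers could be translated into one shared vocabulary. Everything in it is hypothetical: the provider names, the labels, the `Event` schema, and the `normalize` function are invented for illustration and do not describe any real feed or the data model discussed below.

```python
from dataclasses import dataclass

# Hypothetical glossaries: provider names and labels are invented
# for illustration and do not come from any real data feed.
PROVIDER_GLOSSARIES = {
    "provider_a": {"Pass": "pass", "Shot": "shot", "Ball Recovery": "recovery"},
    "provider_b": {
        "PASS_COMPLETED": "pass",
        "PASS_FAILED": "pass",
        "ATTEMPT": "shot",
        "REGAIN": "recovery",
    },
}


@dataclass(frozen=True)
class Event:
    """A provider-independent event record (assumed common schema)."""
    event_type: str
    minute: int
    player: str


def normalize(provider: str, raw_label: str, minute: int, player: str) -> Event:
    """Translate one provider-specific label into the shared vocabulary."""
    glossary = PROVIDER_GLOSSARIES[provider]
    if raw_label not in glossary:
        # Surface unmapped labels instead of guessing a translation.
        raise KeyError(f"{provider} label {raw_label!r} has no mapping")
    return Event(glossary[raw_label], minute, player)
```

With a mapping like this, `normalize("provider_a", "Pass", 12, "Player 7")` and `normalize("provider_b", "PASS_COMPLETED", 12, "Player 7")` produce the same record. Note that a flat lookup only handles naming differences; differences in depth (for example, one provider splitting passes into completed and failed) lose information unless the common schema carries extra fields.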

SportAnalytics offers a solution to address the inconsistencies among data providers. To achieve this, we have developed the SportAnalytics data model, which serves as the foundation for all the metrics we create. Our data model has undergone verification and validation by video analysts and match analysts from various clubs, with over 300 games used to test it.

Verification involves ensuring that our model implementation accurately reflects the user’s conceptual description of the model and its solution.

Validation, on the other hand, focuses on assessing the degree to which our model is a faithful representation of the real world from the perspective of its intended users.

Our algorithms automatically clean the data and synchronize different data feeds, using data fusion technology to enhance accuracy. We have tested our data model with multiple data providers to ensure consistency in the interpretation of events and conventions at the end of the ingestion process.
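
As an illustration of what synchronizing two feeds can involve, the sketch below aligns an event timestamp with the nearest tracking frame. It is a simplified example, not the SportAnalytics implementation: the function name `nearest_frame` is hypothetical, and it assumes both feeds already share the same clock (seconds from kick-off) and that the frame times are sorted in ascending order.

```python
from bisect import bisect_left


def nearest_frame(frame_times: list[float], event_time: float) -> int:
    """Return the index of the tracking frame closest in time to an event.

    Assumes frame_times is sorted ascending and uses the same clock
    as event_time (seconds from kick-off).
    """
    if not frame_times:
        raise ValueError("no tracking frames to align against")
    i = bisect_left(frame_times, event_time)
    if i == 0:
        return 0  # event is before (or at) the first frame
    if i == len(frame_times):
        return len(frame_times) - 1  # event is after the last frame
    before, after = frame_times[i - 1], frame_times[i]
    # Pick whichever neighbour is closer; ties go to the earlier frame.
    return i - 1 if event_time - before <= after - event_time else i
```

In practice, manually tagged events often lag or lead the true moment of the action, so a nearest-timestamp match like this is usually only a starting point before any offset correction.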

Tracking Data

Tracking data is a crucial and valuable resource for top clubs, but it is also highly confidential because of privacy concerns. While fans may be fascinated by the movement of dots on the screen, the collection and protection of this data are subject to strict regulations that vary from country to country. Typically, the use of this data is restricted to clubs.

There are several types of tracking data, including GPS, sensor-based tracking, broadcast tracking, and multi-camera optical tracking systems. The method of data acquisition has a significant impact on the eventual data, with variations in representation (e.g., latitude and longitude or coordinates in a reference system), frame rates, additional information (e.g., accelerations or impacts), accuracy, and missing data.
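
The difference in representation matters as soon as two sources have to be compared. As a rough sketch, GPS fixes in latitude and longitude can be projected into a pitch-aligned reference system in metres. The function below is an illustrative assumption, not a provider's method: it uses a flat-Earth (equirectangular) approximation, which is reasonable over the length of a pitch, and it assumes we know the GPS position of one corner flag (`origin_lat`, `origin_lon`) and the compass bearing of the touchline (`bearing_deg`).

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def latlon_to_pitch(lat, lon, origin_lat, origin_lon, bearing_deg):
    """Project a GPS fix to (x, y) metres in a pitch-aligned frame.

    Assumptions (hypothetical setup): the origin is one corner flag,
    x runs along the touchline whose compass bearing (degrees clockwise
    from north) is bearing_deg, and y points 90 degrees clockwise from x.
    """
    # Local east/north offsets in metres (equirectangular approximation).
    north = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    east = (
        math.radians(lon - origin_lon)
        * EARTH_RADIUS_M
        * math.cos(math.radians(origin_lat))
    )
    # Rotate the east/north offsets onto the pitch axes.
    b = math.radians(bearing_deg)
    x = east * math.sin(b) + north * math.cos(b)
    y = east * math.cos(b) - north * math.sin(b)
    return x, y
```

Once every source is expressed in the same pitch coordinates, frame rates still have to be reconciled, for example by resampling the slower feed.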

Optical tracking data is the most accurate form of tracking data. It is typically collected using a system of fixed cameras installed inside the stadium, generating a file with roughly three million data points that describe the (x, y) coordinates of all players and the ball at each moment in time. Unlike standard GPS systems, optical tracking also captures the ball’s position, which is key to understanding the tactical context of the game. Compared to broadcast tracking systems, optical tracking provides a full picture, since every player is visible within the camera view, and it is more accurate because it uses multiple points of view (multi-camera). On the other hand, broadcast tracking can be a useful way to obtain data from foreign competitions for scouting and recruitment.
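
To see where a figure of this order can come from, here is a back-of-the-envelope calculation. The inputs are assumptions for illustration, not values stated by any provider: a capture rate of 25 frames per second, a 90-minute match with no stoppage time, and 23 tracked objects (22 players plus the ball), counting each (x, y) position as one data point.

```python
def tracking_points(fps: int, minutes: int, tracked_objects: int) -> int:
    """Number of (x, y) positions recorded over a match."""
    frames = fps * minutes * 60  # total frames captured
    return frames * tracked_objects


# Assumed values: 25 fps, 90 minutes, 22 players plus the ball.
print(tracking_points(25, 90, 23))  # 3105000
```

Under these assumptions the total is just over three million positions per match; a higher frame rate or added stoppage time pushes the number up accordingly.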

Video

Last but not least, video is perhaps the most common data source for a sports organization. Clubs collect footage at every level, from the academy to the senior team, using different camera systems. Combining tracking data with automated event data and video gives an excellent picture of the game. In addition, SportAnalytics solutions are equipped with proprietary data fusion technology that produces an augmented (enriched) data feed to ensure the highest accuracy for our partners.

What’s next?

Sport, and football in particular, is characterized by a very high pace and short possession times. It requires players to have well-trained cognitive abilities and to make decisions quickly.

Body orientation is very useful information for evaluating a player’s cognitive skills and decision-making ability. It can be obtained through pose estimation, a computer vision technique for tracking the movements of a person or an object. This is usually done by locating key points on the given object. Based on these key points, it is possible to compare various movements and postures.
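
As a simple illustration of how key points can be turned into an orientation, the sketch below estimates the direction a player is facing from the two shoulder key points. It is a minimal example under stated assumptions, not a description of any production system: the function name is hypothetical, the key points are assumed to be already projected onto the pitch plane as seen from above (x to the right, y upward), and the facing direction is taken to be perpendicular to the shoulder line.

```python
import math


def body_orientation_deg(left_shoulder, right_shoulder):
    """Estimate the facing direction in degrees, in the range [0, 360).

    Inputs are (x, y) shoulder positions on the pitch plane, viewed
    from above. The angle is measured counter-clockwise from the +x axis.
    """
    # Vector along the shoulder line, from right shoulder to left shoulder.
    sx = left_shoulder[0] - right_shoulder[0]
    sy = left_shoulder[1] - right_shoulder[1]
    if sx == 0 and sy == 0:
        raise ValueError("shoulder key points coincide")
    # Rotating the shoulder vector 90 degrees clockwise gives the
    # direction the chest is pointing.
    fx, fy = sy, -sx
    return math.degrees(math.atan2(fy, fx)) % 360.0
```

For example, a player whose left shoulder is at (0, 1) and right shoulder at (0, -1) is facing along the +x axis, so the function returns an angle of 0 degrees. Real systems typically combine more key points (hips, head) and smooth the estimate over time, since a single frame can be noisy.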

At the moment, this technology has been applied in VAR (Video Assistant Referee), the system that enables referees to review incidents on the pitch, make informed decisions, and ensure fairness. The system performs pose estimation on multiple cameras at once. Implementing the VAR technology and infrastructure requires a substantial financial investment, which currently limits its accessibility to top-tier leagues and clubs.

The next frontier is to use this data for player movement analysis, to help players improve their technical skills and make better decisions. Considering how fast artificial intelligence systems have evolved in recent years, the sky is really the only limit here.