Traditional talent identification:
\(\rightarrow\) Best young athlete + coach intuition
G. Boccia et al. (2017):
\(\simeq\) 60% of 16-year-old elite athletes do not maintain their level of performance
Philip E. Kearney & Philip R. Hayes (2018):
\(\simeq\) only 10% of the senior top 20 were also in the top 20 before age 13
Performances from FF of Swimming members since 2002:
Functional data \(\simeq\) coefficients \(\alpha_k\) of B-spline basis functions:
\[y_i(t) = \sum\limits_{k=1}^{K}{\alpha_k B_k(t)}\]
Clustering: FunHDDC algorithm (Gaussian mixture + EM)
Bouveyron & Jacques - 2011
Using the multidimensional version: curve + derivative
\(\rightarrow\) Information about performance level and trend of improvement
Leroy et al. - 2018
Bishop - 2006 | Rasmussen & Williams - 2006
GPR: a kernel method to estimate \(f\) when:
\[y = f(x) +\epsilon\]
\(\rightarrow\) No restrictions on \(f\) but a prior probability:
\[f \sim \mathcal{GP}(0,C(\cdot,\cdot))\]
An example of squared exponential kernel for the covariance function: \[\mathrm{cov}(f(x),f(x'))= C(x,x') = \alpha \exp\left(- \dfrac{1}{2\theta^2} |x - x'|^2\right)\] Kernel definition \(\Rightarrow\) preferred properties on \(f\)
\(\textbf{y}_{N+1} = (y_1,...,y_{N+1})\) has the following prior density: \[\textbf{y}_{N+1} \sim \mathcal{N}(0, C_{N+1}), \ C_{N+1} = \begin{pmatrix} C_N & k_{N+1} \\ k_{N+1}^T & c_{N+1} \end{pmatrix}\]
When the joint density is Gaussian, so is the conditional density:
\[y_{N+1}|\textbf{y}_{N}, \textbf{x}_{N+1} \sim \mathcal{N}(k_{N+1}^T \color{red}{C_N^{-1}}\textbf{y}_{N}, c_{N+1}- k_{N+1}^T \color{red}{C_N^{-1}} k_{N+1})\]
Key points:
Estimating a GP for each individual (\(O(\color{green}{N_i^3})\)):
\(\rightarrow\) Using the shared information between individuals (GPR-ME)
Shi & Wang - 2008 | Shi & Choi - 2011
\[Y_i(t) = \mu_0(t) + f_i(t) + \epsilon_i\] with:
GPFDA R package
Limits:
\[Y_i(t) = \mu_0(t) + f_i(t) + \epsilon_i\] with:
It follows that:
\[Y_i(\cdot) \vert \mu_0 \sim \mathcal{GP}(\mu_0(\cdot), \Sigma_{\theta_i}(\cdot,\cdot) + \sigma^2), \ Y_i \vert \mu_0 \perp \!\!\! \perp\]
\(\rightarrow\) Shared information through \(\mu_0\) and its uncertainty
\(\rightarrow\) Unified non-parametric probabilistic framework
\(\textbf{y} = (y_1^1,\dots,y_i^k,\dots,y_M^{N_M})^T\)
\(\textbf{t} = (t_1^1,\dots,t_i^k,\dots,t_M^{N_M})^T\)
\(\Theta = \{ \theta_0, (\theta_i)_i, \sigma^2 \}\)
\(\Sigma\): covariance matrix from the process \(f_i\) evaluated on \(\textbf{t}\)
\(\Sigma = \left[ \Sigma_{\theta_i}(t_i^k, t_j^l) \right]_{(i,k), (j,l)}\)
\(\Psi = \Sigma + \sigma^2 Id_N\)
Since \((Y_i)_i\vert \mu_0 \perp \!\!\! \perp\), then:
\[\Psi = \left.\left( \vphantom{\begin{array}{c}1\\1\\1\\1\\1\\1\end{array}} \smash{ \begin{array}{cccccc} \Psi_1&0&\cdots &\cdots&0\\ \vdots&\ddots&&\ddots&\vdots\\ 0&&\Psi_i &&0\\ \vdots&\ddots&&\ddots&\\ 0&\cdots&\cdots&0 &\Psi_M \end{array} } \right)\right\} \,\color{red}{N} \times \color{red}{N} \]
\[\Psi_i = \left.\left( \vphantom{\begin{array}{c}1\\1\end{array}} \smash{ cov(y(t_i^l),y(t_i^k))_{l,k} } \right)\right\} \,\color{green}{N_i}\times\color{green}{N_i}\]
Step E: Computing the posterior
\[p(\mu_0(\textbf{t}) \vert \textbf{t}, \textbf{y}, \Theta) = \mathcal{N}( \hat{\mu}_0(\textbf{t}), \hat{K})\]
Efficiently computable if \(K_{\theta_0}\) is block diagonal
Step M: Estimating \(\Theta\)
\[\hat{\Theta} = \underset{\Theta}{\arg\max} \ \mathbb{E}_{\mu_0} [ \log \ p(\textbf{y}, \mu_0(\textbf{t}) \vert \textbf{t}, \Theta ) \ \vert \Theta]\]
Initialize hyperparameters
while (convergence criterion not met) {
Alternate between the E and M steps }
For a new time \(t^*\), we have a posterior density for \(Y_i(t^*)\)
Prediction + uncertainty for future performances
Split a \(O(\color{red}{N^3})\) problem into \(M \ O(\color{green}{N_i^3})\) problems
Remains computationally expensive but tractable
Code available soon at https://github.com/ArthurLeroy
Mixtures of GPs to perform cluster-specific predictions
Study and design of different covariance functions
Using several other variables, multivariate functional regression
Application to other sports (track and field, rowing, …)