Traditional talent identification:
\(\rightarrow\) Best young athlete + coach intuition
G. Boccia et al. (2017):
\(\simeq\) 60% of 16-year-old elite athletes do not maintain their level of performance
Philip E. Kearney & Philip R. Hayes (2018):
\(\simeq\) only 10% of the senior top 20 were also in the top 20 before age 13
Performances from FF of Swimming members since 2002:
Functional data \(\simeq\) coefficients \(\alpha_k\) of B-spline basis functions:
\[y_i(t) = \sum\limits_{k=1}^{K}{\alpha_k B_k(t)}\]
Clustering: FunHDDC algorithm (Gaussian mixture + EM)
Bouveyron & Jacques - 2011
Using the multidimensional version: curve + derivative
\(\rightarrow\) Information about performance level and trend of improvement
Leroy et al. - 2018
Bishop - 2006 | Rasmussen & Williams - 2006
GPR: a kernel method to estimate \(f\) when:
\[y = f(x) +\epsilon\]
\(\rightarrow\) No restrictions on \(f\) but a prior probability:
\[f \sim \mathcal{GP}(0,C(\cdot,\cdot))\]
An example of squared exponential kernel for the covariance function: \[\mathrm{cov}(f(x),f(x'))= C(x,x') = \alpha \exp\left(- \dfrac{1}{2\theta^2} |x - x'|^2\right)\] Kernel definition \(\Rightarrow\) preferred properties on \(f\)
\(\textbf{y}_{N+1} = (y_1,...,y_{N+1})\) has the following prior density: \[\textbf{y}_{N+1} \sim \mathcal{N}(0, C_{N+1}), \ C_{N+1} = \begin{pmatrix} C_N & k_{N+1} \\ k_{N+1}^T & c_{N+1} \end{pmatrix}\]
When the joint density is Gaussian, so is the conditional density:
\[y_{N+1}|\textbf{y}_{N}, \textbf{x}_{N+1} \sim \mathcal{N}(k_{N+1}^T \color{red}{C_N^{-1}}\textbf{y}_{N}, c_{N+1}- k_{N+1}^T \color{red}{C_N^{-1}} k_{N+1})\]
Key points:
Estimating a GP for each individual (\(O(\color{green}{N_i^3})\)):
\(\rightarrow\) Using the shared information between individuals (GPR-ME)
Shi & Wang - 2008 | Shi & Choi - 2011
\[Y_i(t) = \mu_0(t) + f_i(t) + \epsilon_i\] with:
GPFDA R package
Limits:
\[Y_i(t) = \mu_0(t) + f_i(t) + \epsilon_i\] with:
It follows that:
\[Y_i(\cdot) \vert \mu_0 \sim \mathcal{GP}(\mu_0(\cdot), \Sigma_{\theta_i}(\cdot,\cdot) + \sigma^2), \ Y_i \vert \mu_0 \perp \!\!\! \perp\]
\(\rightarrow\) Shared information through \(\mu_0\) and its uncertainty
\(\rightarrow\) Unified non-parametric probabilistic framework
\(\textbf{y} = (y_1^1,\dots,y_i^k,\dots,y_M^{N_M})^T\)
\(\textbf{t} = (t_1^1,\dots,t_i^k,\dots,t_M^{N_M})^T\)
\(\Theta = \{ \theta_0, (\theta_i)_i, \sigma^2 \}\)
\(\Sigma\): covariance matrix from the process \(f_i\) evaluated on \(\textbf{t}\)
\(\Sigma = \left[ \Sigma_{\theta_i}(t_i^k, t_j^l) \right]_{(i,k), (j,l)}\)
\(\Psi = \Sigma + \sigma^2 Id_N\)
Since \((Y_i)_i\vert \mu_0 \perp \!\!\! \perp\), then:
\[\Psi = \left.\left( \vphantom{\begin{array}{c}1\\1\\1\\1\\1\\1\end{array}} \smash{ \begin{array}{cccccc} \Psi_1&0&\cdots &\cdots&0\\ \vdots&\ddots&&\ddots&\vdots\\ 0&&\Psi_i &&0\\ \vdots&\ddots&&\ddots&\\ 0&\cdots&\cdots&0 &\Psi_M \end{array} } \right)\right\} \,\color{red}{N} \times \color{red}{N} \]
\[\Psi_i = \left.\left( \vphantom{\begin{array}{c}1\\1\end{array}} \smash{ cov(y(t_i^l),y(t_i^k))_{l,k} } \right)\right\} \,\color{green}{N_i}\times\color{green}{N_i}\]
Step E: Computing the posterior
\[p(\mu_0(\textbf{t}) \vert \textbf{t}, \textbf{y}, \Theta) = \mathcal{N}( \hat{\mu}_0(\textbf{t}), \hat{K})\]
Efficiently computable if \(K_{\theta_0}\) is block diagonal
Step M: Estimating \(\Theta\)
\[\hat{\Theta} = \underset{\Theta}{\arg\max} \ \mathbb{E}_{\mu_0} [ \log \ p(\textbf{y}, \mu_0(\textbf{t}) \vert \textbf{t}, \Theta ) \ \vert \Theta]\]
Initialize hyperparameters
while (convergence criterion not met) {
Alternate between the E and M steps }
For a new time \(t^*\), we have a posterior density for \(Y_i(t^*)\)
Prediction + uncertainty for future performances
Split a \(O(\color{red}{N^3})\) problem into \(M \ O(\color{green}{N_i^3})\) problems
Remains computationally expensive but tractable
Code available soon at https://github.com/ArthurLeroy
Mixtures of GPs to perform cluster-specific predictions
Study and design of different covariance functions
Using several other variables, multivariate functional regression
Application to other sports (track and field, rowing, …)