Traditional talent identification:
\(\rightarrow\) Best young athlete + coach intuition
G. Boccia et al. (2017) :
\(\simeq\) 60% of 16-year-old elite athletes do not maintain their level of performance
Philip E. Kearney & Philip R. Hayes (2018) :
\(\simeq\) only 10% of senior top-20 athletes were also top 20 before the age of 13
Performances of French Swimming Federation (FFN) members since 2002:
Functional data \(\simeq\) coefficients \(\alpha_k\) of a B-spline basis expansion:
\[y_i(t) = \sum\limits_{k=1}^{K}{\alpha_k B_k(t)}\]
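A minimal R sketch of this representation (hypothetical data, not the actual FFN preprocessing): one noisy performance curve is projected onto a cubic B-spline basis and summarized by its coefficients \(\alpha_k\).

```r
# Minimal sketch: represent one noisy curve y_i(t) by its B-spline coefficients alpha_k.
library(splines)

set.seed(1)
t_obs <- seq(10, 20, by = 0.5)                                   # hypothetical ages at competition
y_obs <- 60 - 1.2 * (t_obs - 10) + rnorm(length(t_obs), 0, 0.5)  # hypothetical times (s)

B     <- bs(t_obs, df = 8, degree = 3)   # K = 8 cubic B-spline basis functions B_k(t)
alpha <- coef(lm(y_obs ~ 0 + B))         # least-squares estimates of the alpha_k

# Reconstruct the smooth curve y_i(t) = sum_k alpha_k B_k(t) on a fine grid
t_new <- seq(10, 20, length.out = 200)
y_hat <- predict(B, t_new) %*% alpha
```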
Clustering: FunHDDC algorithm (Gaussian mixture + EM)
Bouveyron & Jacques - 2011
Using the multidimensional version : curve + derivative
\(\rightarrow\) Information about performance level and trend of improvement
Leroy et al. - 2018
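As a rough illustration of this clustering step (a stand-in only: a plain Gaussian mixture fitted by EM on the B-spline coefficients via mclust, not the actual FunHDDC algorithm, which additionally models cluster-specific low-dimensional subspaces):

```r
# Stand-in sketch: Gaussian mixture (EM) on simulated B-spline coefficient vectors.
library(mclust)

set.seed(2)
K <- 8; n <- 100                                          # 8 coefficients, 100 hypothetical swimmers
grp       <- sample(1:2, n, replace = TRUE)               # two hypothetical progression profiles
alpha_mat <- matrix(rnorm(n * K), n, K) + 2 * (grp == 2)  # coefficients, shifted for group 2

gmm <- Mclust(alpha_mat, G = 2)          # Gaussian mixture fitted by EM
table(gmm$classification, grp)           # recovered clusters vs. simulated groups
```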
Bishop - 2006 | Rasmussen & Williams - 2006
GPR : a kernel method to estimate \(f\) when:
\[y = f(x) +\epsilon\]
\(\rightarrow\) No parametric restrictions on \(f\), but a prior distribution over functions:
\[f \sim \mathcal{GP}(0,C(\cdot,\cdot))\]
An example of squared-exponential kernel as covariance function: \[\mathrm{cov}(f(x),f(x')) = C(x,x') = \alpha \exp\left(- \dfrac{1}{2\theta^2} |x - x'|^2\right)\] Kernel definition \(\Rightarrow\) preferred properties of \(f\)
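A short R sketch of this kernel and of draws from the corresponding GP prior (hypothetical values for \(\alpha\) and \(\theta\)):

```r
# Squared-exponential kernel C(x, x') = alpha * exp(-|x - x'|^2 / (2 theta^2))
se_kernel <- function(x, xp, alpha = 1, theta = 1) {
  alpha * exp(-0.5 * outer(x, xp, "-")^2 / theta^2)
}

x_grid <- seq(0, 10, length.out = 100)
C <- se_kernel(x_grid, x_grid) + 1e-8 * diag(length(x_grid))  # jitter for numerical stability

# Three draws from the prior f ~ GP(0, C), evaluated on the grid
f_prior <- t(chol(C)) %*% matrix(rnorm(3 * length(x_grid)), length(x_grid), 3)
matplot(x_grid, f_prior, type = "l", lty = 1, ylab = "f(x)")
```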
\(\textbf{y}_{N+1} = (y_1,...,y_{N+1})\) has the following prior density: \[\textbf{y}_{N+1} \sim \mathcal{N}(0, C_{N+1}), \ C_{N+1} = \begin{pmatrix} C_N & k_{N+1} \\ k_{N+1}^T & c_{N+1} \end{pmatrix}\]
When the joint density is Gaussian, so is the conditional density:
\[y_{N+1}|\textbf{y}_{N}, \textbf{x}_{N+1} \sim \mathcal{N}(k_{N+1}^T \color{red}{C_N^{-1}}\textbf{y}_{N}, \ c_{N+1}- k_{N+1}^T \color{red}{C_N^{-1}} k_{N+1})\]
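The same conditional formula, written as a small R sketch on simulated observations (hypothetical data and hyperparameters):

```r
# GP prediction at new inputs by conditioning the joint Gaussian on N noisy observations.
se_kernel <- function(x, xp, alpha = 1, theta = 1.5) {
  alpha * exp(-0.5 * outer(x, xp, "-")^2 / theta^2)
}

set.seed(3)
x_obs  <- runif(15, 0, 10)
y_obs  <- sin(x_obs) + rnorm(15, 0, 0.1)
x_new  <- seq(0, 10, length.out = 200)
sigma2 <- 0.1^2

C_N  <- se_kernel(x_obs, x_obs) + sigma2 * diag(length(x_obs))  # C_N, including the noise
k    <- se_kernel(x_obs, x_new)                                 # k_{N+1}, one column per new input
c_nn <- se_kernel(x_new, x_new)

C_inv     <- solve(C_N)
pred_mean <- t(k) %*% C_inv %*% y_obs            # k_{N+1}^T C_N^{-1} y_N
pred_var  <- diag(c_nn - t(k) %*% C_inv %*% k)   # c_{N+1} - k_{N+1}^T C_N^{-1} k_{N+1}
```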
Key points:
Estimating a GP on each individual (\(O(\color{green}{N_i^3})\)):
\(\rightarrow\) Using the shared information between individuals (GPR-ME)
Shi & Wang - 2008 | Shi & Choi - 2011
\[y_i(t) = \mu_0(t) + f_i(t) + \epsilon_i\] with:
GPFDA R package
Limits:
\[y_i(t) = \mu_0(t) + f_i(t) + \epsilon_i\] with: \(\mu_0(\cdot) \sim \mathcal{GP}(m_0(\cdot), K_{\theta_0}(\cdot,\cdot))\) the common mean process, \(f_i(\cdot) \sim \mathcal{GP}(0, \Sigma_{\theta_i}(\cdot,\cdot))\) the individual-specific process, and \(\epsilon_i \sim \mathcal{N}(0, \sigma_i^2)\) an i.i.d. noise term.
It follows that:
\[y_i(\cdot) \vert \mu_0 \sim \mathcal{GP}(\mu_0(\cdot), \Sigma_{\theta_i}(\cdot,\cdot) + \sigma^2), \ y_i \vert \mu_0 \perp \!\!\! \perp\]
\(\rightarrow\) Shared information through \(\mu_0\) and its uncertainty
\(\rightarrow\) Unified non parametric probabilistic framework
\(\rightarrow\) Effective even for irregular time series
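A small R sketch simulating from this model (hypothetical kernels and hyperparameters), to visualize several individuals sharing one mean process:

```r
# Simulate y_i(t) = mu_0(t) + f_i(t) + eps_i for a few individuals on a common grid.
se_kernel <- function(x, xp, alpha = 1, theta = 1) {
  alpha * exp(-0.5 * outer(x, xp, "-")^2 / theta^2)
}
draw_gp <- function(t_grid, m, C) {
  as.vector(m + t(chol(C + 1e-8 * diag(length(t_grid)))) %*% rnorm(length(t_grid)))
}

set.seed(4)
t_grid <- seq(10, 20, by = 0.25)
mu_0   <- draw_gp(t_grid, 0, se_kernel(t_grid, t_grid, alpha = 4, theta = 5))  # common mean process
y_sim  <- sapply(1:5, function(i) {                                            # M = 5 individuals
  f_i <- draw_gp(t_grid, 0, se_kernel(t_grid, t_grid, alpha = 1, theta = 2))   # individual deviation
  mu_0 + f_i + rnorm(length(t_grid), 0, 0.3)                                   # + i.i.d. noise eps_i
})
matplot(t_grid, y_sim, type = "l", lty = 1); lines(t_grid, mu_0, lwd = 3)
```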
\(\textbf{y} = (y_1^1,\dots,y_i^k,\dots,y_M^{N_M})^T\)
\(\textbf{t} = (t_1^1,\dots,t_i^k,\dots,t_M^{N_M})^T\)
\(\Theta = \{ \theta_0, (\theta_i)_i, \sigma_i^2 \}\)
\(\Sigma\): covariance matrix from the process \(f_i\) evaluated on \(\textbf{t}\)
\(\Sigma = \left[ \Sigma_{\theta_i}(t_i^k, t_j^l) \right]_{(i,k), (j,l)}\)
\(\Psi = \Sigma + \sigma_i^2 Id_N\)
Reminder of its simple definition:
\[ \mathbb{P}(T \vert D) = \dfrac{\mathbb{P}(D \vert T) \mathbb{P}(T)}{\mathbb{P}(D)} \] Powerful implication when it comes to learning from data:
Bayes’ law tells you how and what you should learn about a theory T from the data D:
\(\rightarrow \mathbb{P}(T \vert D)\), what you should think a posteriori.
Computational burden in practice; among the solutions: empirical Bayes
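A toy R illustration of this mechanic (unrelated to the swimming data): updating a prior on a success probability after observing 7 successes in 10 trials.

```r
# Bayes' law on a grid: posterior = prior x likelihood, renormalized.
p_grid <- seq(0, 1, length.out = 501)            # candidate "theories" T
prior  <- dbeta(p_grid, 2, 2)                    # P(T): prior belief
lik    <- dbinom(7, size = 10, prob = p_grid)    # P(D | T): likelihood of the data
post   <- prior * lik
post   <- post / (sum(post) * diff(p_grid)[1])   # P(T | D), normalized on the grid
plot(p_grid, post, type = "l", xlab = "T", ylab = "posterior density")
```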
Step E: Computing the posterior (knowing \(\Theta\))
\[ \begin{align} p(\mu_0(\textbf{t}) \vert \textbf{t}, \textbf{y}, \Theta) &\propto p(\textbf{y} \vert \textbf{t}, \mu_0(\textbf{t}), \Theta) \ p(\mu_0(\textbf{t}) \vert \textbf{t}, \Theta) \\ &\propto \mathcal{N}( \mu_0(\textbf{t}), \Psi) \ \mathcal{N}(m_0, K) \\ &= [Insert \ here \ some \ PhD \ student \ ideas] \\ &= \mathcal{N}( \hat{\mu}_0(\textbf{t}), \hat{K}) \end{align} \]
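For reference, a sketch of the underlying Gaussian-Gaussian conjugacy (written assuming, for simplicity, that all individuals are observed on the common grid \(\textbf{t}\), with \(\Psi_i\) the covariance of \(y_i \vert \mu_0\)):
\[
\hat{K} = \Big( K^{-1} + \sum_{i=1}^{M} \Psi_i^{-1} \Big)^{-1}, \qquad \hat{\mu}_0(\textbf{t}) = \hat{K} \Big( K^{-1} m_0 + \sum_{i=1}^{M} \Psi_i^{-1} y_i \Big)
\]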
Step M: Estimating \(\Theta\) (knowing \(p(\mu_0)\))
\[\hat{\Theta} = \underset{\Theta}{\arg\max} \ \mathbb{E}_{\mu_0} [ \log \ p(\textbf{y}, \mu_0(\textbf{t}) \vert \textbf{t}, \Theta ) \ \vert \Theta]\]
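A simplified flavour of such a step in R: maximizing a Gaussian log-likelihood over kernel hyperparameters with `optim()` (single curve, fixed zero mean, ignoring the expectation over \(\mu_0\), so only a sketch of the idea rather than the actual M step):

```r
# Numerical maximization of a GP log-likelihood over (alpha, theta, sigma2).
library(mvtnorm)
se_kernel <- function(x, xp, alpha = 1, theta = 1) {
  alpha * exp(-0.5 * outer(x, xp, "-")^2 / theta^2)
}

set.seed(6)
x <- runif(30, 0, 10); y <- sin(x) + rnorm(30, 0, 0.2)

neg_loglik <- function(log_par) {
  par <- exp(log_par)                                   # enforce alpha, theta, sigma2 > 0
  K   <- se_kernel(x, x, par[1], par[2]) + par[3] * diag(length(x))
  -dmvnorm(y, sigma = K, log = TRUE)
}
fit <- optim(log(c(1, 1, 0.1)), neg_loglik)             # Nelder-Mead by default
exp(fit$par)                                            # estimated (alpha, theta, sigma2)
```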
Initialize hyperparameters
while (convergence criterion not met) {
  Alternate the E and M steps
}
[Figure: \(\mathbb{E} \left[ \mu_0(\textbf{t}) \vert Data \right] \pm CI_{0.95}\) plotted at iterations 0, 1, 2, 4, 5, 6; at iteration 6 \(\rightarrow\) break and return]
A little contest to find out who is actually following and who is scrolling on Facebook:
The speaker forgot to display one plot and therefore skipped an iteration.
\(\rightarrow\) A beer for the winner. Someday. Maybe.
\[ \forall i, \ \ y_i(t) = \mu_0(t) + f_i(t) + \epsilon_i \] Suppose that, after the learning step, you observe data from a new individual \(\star\). Multitask learning consists in improving performance by sharing information across individuals.
Also recall that if \(p(y_*(\textbf{t}), y_*(t^{new})) = \mathcal{N}( \begin{bmatrix} m_*^{\textbf{t}} \\ m_*^{new} \\ \end{bmatrix}, \begin{pmatrix} \Psi_*^{\textbf{t},\textbf{t}} & \Psi_*^{\textbf{t},new} \\ \Psi_*^{new,\textbf{t}} & \Psi_*^{new,new} \end{pmatrix})\)
then the GP prediction formula gives:
\[\begin{align} p(y_*(t^{new}) \vert y_*(\textbf{t})) &=\mathcal{N} \big( m_*^{new} + \Psi_*^{new,\textbf{t}} {\Psi_*^{\textbf{t},\textbf{t}}}^{-1} (y_*(\textbf{t}) - m_*^{\textbf{t}}); \\ & \hspace{1.2cm} \Psi_*^{new,new} - \Psi_*^{new,\textbf{t}} {\Psi_*^{\textbf{t},\textbf{t}}}^{-1} \Psi_*^{\textbf{t},new} \big) \end{align}\]
Multitask regression: conditioning on observations.
Uncertainty on the mean process: integrating over \(\mu_0\):
\[\begin{align} p(y_* \vert \textbf{y}) &= \int p(y_*, \mu_0 \vert \textbf{y}) \ d \mu_0\\ &\underbrace{=}_{Bayes \heartsuit} \int p(y_* \vert \textbf{y}, \mu_0) p(\mu_0 \vert \textbf{y}) \ d \mu_0\\ &\underbrace{=}_{(y_i \vert \mu_0)_i \perp \!\!\! \perp} \int p(y_* \vert \mu_0) p(\mu_0 \vert \textbf{y}) \ d \mu_0 \\ &= \mathcal{N}( \hat{\mu}_0, \hat{K} + \Psi) \end{align}\]
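A sketch of the resulting prediction step in R, assuming the learned hyper-posterior mean `mu_hat` and covariance `K_hat` are available on a working grid (hypothetical inputs, not an existing package API): compared with single-task GP prediction, the only change is the extra \(\hat{K}\) term in the prior covariance.

```r
# Multitask prediction for a new individual '*' observed at a subset of a working grid.
se_kernel <- function(x, xp, alpha = 1, theta = 1) {
  alpha * exp(-0.5 * outer(x, xp, "-")^2 / theta^2)
}

predict_new_individual <- function(y_star, idx_obs, t_grid, mu_hat, K_hat,
                                   alpha = 1, theta = 2, sigma2 = 0.1) {
  Psi   <- se_kernel(t_grid, t_grid, alpha, theta) + sigma2 * diag(length(t_grid))
  Gamma <- K_hat + Psi                             # covariance of y_* | y on the grid
  G_obs <- Gamma[idx_obs, idx_obs, drop = FALSE]
  G_new <- Gamma[-idx_obs, idx_obs, drop = FALSE]
  G_inv <- solve(G_obs)
  resid <- y_star - mu_hat[idx_obs]
  list(mean = as.vector(mu_hat[-idx_obs] + G_new %*% G_inv %*% resid),
       cov  = Gamma[-idx_obs, -idx_obs] - G_new %*% G_inv %*% t(G_new))
}
```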
Assuming a single underlying mean process might be too strong
\(\rightarrow\) Mixture model of multitask GP:
\[\forall i , \forall k , \ \ y_i(t) \vert (Z_{ik} = 1) = \mu_k(t) + f_i(t) + \epsilon_i\] with: \(Z_i \sim \mathcal{M}(1, \pi)\) the latent cluster membership, \(\mu_k(\cdot)\) the cluster-specific mean processes, and \(f_i\), \(\epsilon_i\) as before.
It follows that:
\[y_i(\cdot) \vert (\mu_k)_k, \pi \sim \sum\limits_{k=1}^K{\pi_k \ \mathcal{GP}(\mu_k(\cdot), \Psi_i)}, \ y_i \vert (\mu_k)_k, \pi \perp \!\!\! \perp\]
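A quick R sketch simulating from this mixture (hypothetical hyperparameters): each individual first draws a cluster \(Z_i\), then follows that cluster's mean process.

```r
# Simulate from the mixture of multitask GPs with K = 2 clusters.
se_kernel <- function(x, xp, alpha = 1, theta = 1) {
  alpha * exp(-0.5 * outer(x, xp, "-")^2 / theta^2)
}
draw_gp <- function(t_grid, m, C) {
  as.vector(m + t(chol(C + 1e-8 * diag(length(t_grid)))) %*% rnorm(length(t_grid)))
}

set.seed(5)
t_grid <- seq(10, 20, by = 0.25)
pi_k   <- c(0.5, 0.5)                                                        # mixing proportions
mu_k   <- replicate(2, draw_gp(t_grid, 0, se_kernel(t_grid, t_grid, 4, 5)))  # cluster mean processes
y_sim  <- sapply(1:10, function(i) {
  z <- sample(1:2, 1, prob = pi_k)                                           # Z_i ~ M(1, pi)
  mu_k[, z] + draw_gp(t_grid, 0, se_kernel(t_grid, t_grid, 1, 2)) + rnorm(length(t_grid), 0, 0.3)
})
matplot(t_grid, y_sim, type = "l", lty = 1)
```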
Posterior dependencies between \((\mu_k)_k\) and \((Z_i)_i\) \(\rightarrow\) variational EM
Step E:
Approximation assumption (mean field): \(r(\mu, Z) = r(\mu) \, r(Z)\)
The log-likelihood decomposes as: \(\log p(\textbf{y} \vert \Theta) = \mathcal{L}(r(\mu, Z)) + KL \big( r(\mu, Z) \vert \vert p(\mu, Z \vert \textbf{y})\big)\) (generic mean-field updates sketched after this list)
Step M:
\(\mathcal{L}(r(\mu, Z))\) provides a lower bound for LL maximization
Then business as usual
\(\rightarrow\) Each step is proved to increase the lower bound. Repeat until convergence.
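For reference (standard variational inference, not specific to this model), the generic mean-field coordinate updates alternated in the E step under the factorization \(r(\mu, Z) = r(\mu)\, r(Z)\):
\[
\log r(\mu) = \mathbb{E}_{r(Z)} \big[ \log p(\textbf{y}, \mu, Z \vert \Theta) \big] + cst, \qquad \log r(Z) = \mathbb{E}_{r(\mu)} \big[ \log p(\textbf{y}, \mu, Z \vert \Theta) \big] + cst
\]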
Making a point prediction means \(\mathbb{P}(\)saying something wrong\() \simeq 1\).
A probabilistic prediction tells you by how much:
Improve and release the package
Enable combination with sparse GP approximations
Integrate to the app and launch tests with FFN
Maybe multivariate functional regression
If you have another idea, let’s work on that together
Write down a thesis. Someday.