The candidate will be in a diverse research environment working with both a lab of statistics for population genetics (http://membres-timc.imag.fr/
Subject: The amount of biological data being generated is growing at an unprecedented pace. Statistical and machine-learning research efforts are increasingly needed to analyse large-scale biological data. As part of this research effort, the objective of the PhD candidate will be to develop statistical algorithms that can scale with the massive dimension of genomic data. One major use of genomic data is to make associations between genomic features and disease traits. Key technical steps in making associations between genes and diseases consist of identifying the genetic variants that are co-located on the same chromosome (phasing) and imputing missing genetic variants based on reference panels (genotype imputation). These two steps are performed with statistical algorithms that may soon hit a computational wall because of the massive dimension of the genomic data. The objective of the PhD will be to develop statistical models and algorithms for handling massive data. A first direction will be to adapt matrix completion techniques, which are popular for web recommender systems and which can also provide accurate and fast results for genotype imputation. The proposed models and algorithms will be implemented as open-source software during the course of the PhD project.
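To give applicants a flavour of the matrix completion idea mentioned above, here is a minimal sketch in Python. It is an illustration only and not the method the project will develop: it assumes a small synthetic individuals-by-variants matrix of exact low rank, missing entries marked as NaN, and a simple iterated truncated SVD. The function name impute_low_rank and its parameters are hypothetical.

```python
import numpy as np


def impute_low_rank(G, rank=2, n_iter=100, tol=1e-6):
    """Fill NaN entries of G by iterating a rank-truncated SVD.

    Hypothetical helper for illustration; observed entries are never modified.
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)
    # Initialise missing entries with the mean of each variant (column).
    col_means = np.nanmean(G, axis=0)
    X = np.where(missing, col_means, G)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        # Best rank-`rank` approximation of the current filled-in matrix.
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed values, update only the missing ones.
        X_new = np.where(missing, low_rank, G)
        change = np.linalg.norm(X_new - X) / max(np.linalg.norm(X), 1e-12)
        X = X_new
        if change < tol:
            break
    return X


# Toy data (an assumption): 50 individuals, 30 variants, true rank 2.
rng = np.random.default_rng(0)
G_true = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30))
mask = rng.random(G_true.shape) < 0.2  # hide about 20% of the entries
G_obs = np.where(mask, np.nan, G_true)

G_hat = impute_low_rank(G_obs, rank=2)
rmse = np.sqrt(np.mean((G_hat[mask] - G_true[mask]) ** 2))
print(f"RMSE on hidden entries: {rmse:.4f}")
```

Real genotype data are discrete and far larger than this toy matrix, and each iteration here recomputes a full SVD, which is exactly the kind of cost that would have to be avoided at genomic scale.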
Profile: The typical background of the candidate will be in statistics or machine learning. Students from related disciplines, such as physics, computer science, or mathematics, are also welcome to apply. Applicants with a genuine interest in interdisciplinary PhD education will be preferred.
The PhD candidate will be co-supervised by Julien Mairal, INRIA researcher in machine learning, and Michael Blum, CNRS researcher in Bayesian statistical genetics.
Applicants should send a CV and a recommendation letter from an academic reference by email.