Research Topics
Estimate the impact of driving habits on claims frequency for Ajusto.
Desjardins General Insurance, Modelling and Research Department, Levis, Quebec, Canada
2022 - present
Ajusto is a telematics program that collects data on driving habits and behaviours using smartphone sensors. The data are used to personalize car insurance premiums. I develop new aggregated variables from the raw telematic data and build machine learning models that predict claims from the ratemaking prediction and the aggregated telematic features. The gains in prediction accuracy and client experience are estimated; we evaluate the pros and cons and make recommendations to the ratemaking team. I also support other teams working on Ajusto by providing reports based on data analysis.
machine learning, structured and unstructured data
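As an illustration of this kind of claims-frequency modelling, here is a minimal sketch of a Poisson regression on aggregated telematic features. The feature names and simulated data are hypothetical; the actual Ajusto variables and models are not shown here.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n = 5000
# Hypothetical aggregated telematic features
hard_brakes = rng.poisson(2.0, n)        # hard-braking events per 100 km
night_share = rng.uniform(0.0, 0.5, n)   # share of km driven at night

# Simulated claim counts with a log-linear (Poisson) rate
true_rate = np.exp(-2.0 + 0.15 * hard_brakes + 1.0 * night_share)
claims = rng.poisson(true_rate)

# GLM-style Poisson regression on the aggregated features
X = np.column_stack([hard_brakes, night_share])
model = PoissonRegressor(alpha=1e-4).fit(X, claims)
pred_freq = model.predict(X)             # expected claims frequency
```

Both fitted coefficients come out positive, recovering the simulated effect of risky driving behaviour on claims frequency.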
Imputation problems in genetic methylation studies: a linear coregionalization model (LMC) with covariates.
Post-Doctorate, Department of Decision Sciences, HEC and UQAM, Montreal, Quebec, Canada
2021 - 2022
Methylation is a process that modifies DNA CpG sites by the addition of a methyl group and is necessary for the body to function. Methylation is measured at all sites but is subject to missing values. The aim is to impute the methylation level at the missing sites; this is a high-dimensional imputation problem with covariates. We propose a method that predicts missing methylation levels from the observed ones and the covariates, capturing the correlation structure of methylation levels across sites and samples. The regression function linking methylation level to covariates is modelled by a linear combination of observed and latent factors (LMC), and the factor effects are assumed to be Gaussian processes. Predictions for the missing data are obtained from the Gaussian conditional equations given the observed data.
Non-separable Gaussian processes (linear coregionalization model), imputation methods, singular value decomposition, Kalman filtering
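The conditional-Gaussian step at the heart of such an imputation can be sketched as follows. This toy example uses a single isotropic kernel over site positions rather than the full LMC with covariates, and all data are simulated.

```python
import numpy as np

def rbf(x, ell=1.0):
    """Squared-exponential covariance between site positions."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(1)
sites = np.linspace(0.0, 10.0, 50)            # toy CpG site positions
K = rbf(sites) + 1e-6 * np.eye(50)            # jitter for conditioning
y = rng.multivariate_normal(np.zeros(50), K)  # latent methylation profile

miss = np.zeros(50, dtype=bool)
miss[::5] = True                              # 20% of sites unobserved
obs = ~miss

# Conditional Gaussian prediction: E[y_m | y_o] = K_mo K_oo^{-1} y_o
K_oo = K[np.ix_(obs, obs)]
K_mo = K[np.ix_(miss, obs)]
y_hat = K_mo @ np.linalg.solve(K_oo, y[obs])

rmse = float(np.sqrt(np.mean((y_hat - y[miss]) ** 2)))
```

Because neighbouring observed sites are strongly correlated with the missing ones, the conditional mean recovers the missing levels closely.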
Identification of factors impacting virus transmission.
Post-Doctorate, BioSP, INRAE, Avignon, France
2019 - 2021
This post-doctorate is part of the ANR SMITID project, which develops statistical methods for inferring infectious disease transmission from high-throughput sequencing data. The aim of the post-doc is to develop statistical methods for detecting the impact of environmental factors once the transmission tree has been inferred.
Permutation tests, Spearman correlation, Equine Influenza, Covid-19
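A permutation test for a Spearman correlation, of the kind used to relate transmission to environmental factors, can be sketched as follows (variable names and simulated data are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def perm_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation."""
    rng = np.random.default_rng(seed)
    obs = spearmanr(x, y)[0]
    null = np.array([spearmanr(x, rng.permutation(y))[0]
                     for _ in range(n_perm)])
    return (np.sum(np.abs(null) >= abs(obs)) + 1) / (n_perm + 1)

rng = np.random.default_rng(2)
temperature = rng.normal(size=100)                  # environmental factor
intensity = 0.6 * temperature + rng.normal(scale=0.5, size=100)
unrelated = rng.normal(size=100)

p_signal = perm_pvalue(intensity, temperature)      # small p-value
p_noise = perm_pvalue(intensity, unrelated)         # non-significant
```

Permuting the factor breaks any association with the transmission proxy, so the null distribution is built without any distributional assumption on the data.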
Covid-19: predicting deaths worldwide
The aim is to predict the number of deaths per country over the medium to long term, using a mixture model based on other countries that are further along in the epidemic. The methodology is summarized on the BioSP blog and described in detail in Soubeyrand S, et al. (2020). Among other contributions, I improved the visualization of interactive graphs via plotly in the Shiny application dedicated to this research.
Plotly, RShiny, mixture model
Covid-19: ICU bed occupancy predictions in Vaucluse
The aim is to predict when intensive care unit (ICU) beds will be fully occupied in the Vaucluse département. The method is based on the temporal evolution of ICU bed occupancy in other French départements. A summary of the results is available on the BioSP blog dedicated to the Covid-19 pandemic.
Clustering, linear regression
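A minimal sketch of the idea, matching a department's trajectory against other departments and extrapolating a linear trend to full occupancy, on toy simulated data (the actual data and model details are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)
days = np.arange(14)

# Toy ICU occupancy curves (% of beds) for 20 reference departments
slopes = rng.uniform(2.0, 6.0, size=20)
trajs = 10.0 + slopes[:, None] * days + rng.normal(scale=2.0, size=(20, 14))

# Target department observed over the first 7 days only
target = 10.0 + 4.0 * days[:7] + rng.normal(scale=2.0, size=7)

# 1) clustering step (simplified): keep the 5 closest reference curves
dists = np.linalg.norm(trajs[:, :7] - target, axis=1)
nearest = trajs[np.argsort(dists)[:5]]

# 2) linear trend on the target, extrapolated to 100% occupancy
slope, intercept = np.polyfit(days[:7], target, 1)
saturation_day = (100.0 - intercept) / slope
```

The reference curves that evolved similarly over the shared window provide a sanity check on the extrapolated saturation date.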
Kriging for turbomachinery design: large dimensions and robust optimization
PhD, ICJ, Lyon 1, École Centrale de Lyon, defended in October 2018
2015 - 2018
This thesis is part of the ANR PEPITO project for the transport industry, in collaboration with industrial partners (Valéo, Intes, InModelia) and other academics. The aim is to build efficient turbomachinery. The numerical code used by Valéo to simulate turbomachinery operation is too expensive to be used directly to address the problem.
Data-based algorithms for the construction of an isotropic group kernel for the high-dimensional kriging metamodel
The algorithms developed use only the available data to construct kernels that are isotropic group by group. The methods are based on combinatorics and clustering. The corresponding published article is available here.
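A sketch of what a kernel that is isotropic group by group looks like; the grouping itself, obtained by combinatorics and clustering in the thesis, is fixed by hand in this toy example:

```python
import numpy as np

def iso_group_kernel(X1, X2, groups, ells):
    """Product of RBF kernels, each isotropic within one group of
    input dimensions (one shared lengthscale per group)."""
    K = np.ones((X1.shape[0], X2.shape[0]))
    for g, ell in zip(groups, ells):
        d2 = ((X1[:, None, g] - X2[None, :, g]) ** 2).sum(axis=-1)
        K *= np.exp(-0.5 * d2 / ell ** 2)
    return K

rng = np.random.default_rng(4)
X = rng.uniform(size=(30, 6))
groups = [[0, 1, 2], [3, 4, 5]]        # grouping fixed by hand here
K = iso_group_kernel(X, X, groups, ells=[0.5, 2.0])
```

Grouping dimensions this way keeps only one lengthscale per group instead of one per dimension, which is what makes the kriging metamodel tractable in high dimension.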
Robust optimization strategies on kriging metamodels
Robustness is taken into account through two mean/variance criteria based on a Taylor expansion. The metamodel used is co-kriging with derivatives. The seven strategies developed follow a classical sequential scheme of design-of-experiments enrichment. The choice of enrichment points relies on expected improvement criteria, clustering methods and a multi-objective genetic optimization algorithm (NSGA-II). The published article is available here and the conference proceedings article is available here.
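The expected improvement criterion that drives the enrichment can be sketched as follows, here for a generic minimization problem rather than the thesis's mean/variance criteria:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """EI (minimization) at candidate points with kriging mean mu,
    kriging standard deviation sigma and current best value best."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([1.0, 0.2, 0.8])       # kriging means at 3 candidates
sigma = np.array([0.1, 0.3, 0.05])   # kriging standard deviations
ei = expected_improvement(mu, sigma, best=0.5)
next_point = int(np.argmax(ei))      # point added to the design
```

EI balances exploitation (low predicted mean) and exploration (high predictive uncertainty): here the second candidate wins because it combines a promising mean with sizeable uncertainty.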
Performance of the kriging model on foundation data
Freelance mission, Fondasol
2019
Preliminary study of the prediction quality of the kriging model on foundation measurements.
Kriging, Rmarkdown report
Morphofunctional study of the iliac auricular surface in felids
Collaboration with Pallandre, J-P. (Museum national d'histoire naturelle)
2018
Study of the links between the shape of the auricular surface of the sacroiliac joint in felines and the selection of their prey, the type of bites inflicted and their body mass. Creation of an R-Shiny application. The published articles are available here and here.
Post-hoc tests, R-Shiny
Metamodel comparison
Second-year master's internship, ICJ, Lyon 1, École Centrale de Lyon
2015
This internship took place in the same context as the thesis. Several preliminary studies were carried out: comparison of metamodels (kriging, linear regression and generalized additive models), dimension reduction using sensitivity analysis, and the co-kriging method.
Kriging, co-kriging, linear model, GAM, cross validation
Identification of fraudsters and water meter leaks
First-year master's internship, United Water, Paramus, New Jersey
2014
Analysis of a city's water-consumption data to detect fraud and leaks. The data are transmitted automatically by water meters every minute (Big Data) and need to be updated, validated and analyzed. The methods developed are based on statistical tests, linear regression and analysis of variance.
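One leak signature that such analyses can exploit is night-time flow that never drops to zero. A toy sketch with simulated readings and a hypothetical threshold rule (not the methods actually deployed at United Water):

```python
import numpy as np

rng = np.random.default_rng(6)
# Toy per-minute readings over 24 h for 50 meters
base = rng.gamma(2.0, 1.0, size=(50, 1440))
night = slice(120, 300)              # 02:00-05:00, normally near-zero use
base[:, night] *= 0.05
base[7, night] += 3.0                # meter 7 leaks: constant night flow

# A leak never lets consumption drop to ~0 at night: flag meters whose
# minimum night reading sits far above the population's
night_min = base[:, night].min(axis=1)
threshold = np.median(night_min) + 5.0 * night_min.std()
leaks = np.flatnonzero(night_min > threshold)
```

Using the minimum over the night window rather than the mean makes the rule robust to occasional legitimate night-time use.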