Research Topics

Estimating the impact of driving habits on claims frequency for Ajusto.

Desjardins General Insurance, Modelling and Research Department, Lévis, Quebec, Canada

2022 - present

Ajusto is a telematics program that collects data on driving habits and behaviours using smartphone sensors. These data are used to personalize car insurance premiums. I develop new aggregated variables from raw telematics data and build machine learning models that predict claims from the ratemaking predictions and the aggregated telematics features. We estimate the resulting gain in prediction and in client experience, evaluate the pros and cons, and make recommendations to the ratemaking team. I also support other teams working on Ajusto by providing data analysis reports.

Machine learning, structured and unstructured data
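To make the idea of aggregated telematics variables concrete, here is a minimal sketch. It assumes a hypothetical trip-level feed with fields `driver_id`, `distance_km`, `hard_brakes` and `night_km`; these names and the two features are illustrative only, not the actual Ajusto variables. Raw events are summed per driver and then normalised by distance so that drivers with different mileage can be compared.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trip:
    """One trip from a hypothetical telematics feed (field names are assumptions)."""
    driver_id: str
    distance_km: float
    hard_brakes: int   # number of harsh-braking events detected during the trip
    night_km: float    # kilometres driven at night


def aggregate_trips(trips):
    """Aggregate raw trips into per-driver features.

    Returns {driver_id: {"total_km", "hard_brakes_per_100km", "night_share"}}.
    """
    totals = defaultdict(lambda: {"km": 0.0, "brakes": 0, "night_km": 0.0})
    for t in trips:
        acc = totals[t.driver_id]
        acc["km"] += t.distance_km
        acc["brakes"] += t.hard_brakes
        acc["night_km"] += t.night_km

    features = {}
    for driver, acc in totals.items():
        km = acc["km"]
        features[driver] = {
            "total_km": km,
            # Normalise counts by distance so high- and low-mileage drivers are comparable
            "hard_brakes_per_100km": 100.0 * acc["brakes"] / km if km > 0 else 0.0,
            "night_share": acc["night_km"] / km if km > 0 else 0.0,
        }
    return features
```

Features of this kind could then be joined to policy-level data and fed, together with the ratemaking prediction, into a claims-frequency model.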

Imputation problems in genetic methylation studies: a linear coregionalization model (LMC) with covariates.

Post-Doctorate, Department of Decision Sciences, HEC and UQAM, Montreal, Quebec, Canada

2021 - 2022

Methylation is a process that modifies DNA CpG sites by adding a methyl group. This phenomenon is essential for the body to function. Methylation is measured at all sites, but the measurements contain missing values. The aim is to impute methylation levels at the missing sites; this is a high-dimensional imputation problem with covariates. We propose a method for predicting missing methylation levels from the observed ones and the covariates. The method captures the correlation structure of methylation levels between sites and between samples. The regression function linking methylation level to covariates is modeled as a linear combination of observed and latent factors (LMC). We assume that the effects of the factors are Gaussian processes. Predictions for the missing data are obtained from the conditional distribution given the observed data.

Non-separable Gaussian processes (linear coregionalization model), Imputation methods, Singular Value Decomposition, Kalman Filtering
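The two ingredients above, an LMC covariance and prediction conditional on the observed data, can be sketched on a toy problem. This is a minimal illustration, not the method itself. It assumes a single scalar input `x` (standing in for a covariate), several outputs (standing in for sites), squared-exponential latent processes, and a known zero mean. The LMC covariance is the sum over latent processes q of (a_q a_q^T) ⊗ K_q, and missing entries are imputed with the Gaussian conditional mean and covariance.

```python
import numpy as np


def rbf(x, lengthscale):
    """Squared-exponential correlation matrix for 1-D inputs x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def lmc_covariance(x, loadings, lengthscales):
    """LMC covariance: sum over q of outer(a_q, a_q) kron K_q.

    loadings: array (Q, P), one loading vector a_q per latent Gaussian process.
    Entries are ordered output-major: index p * n + i is output p at input x[i].
    """
    n = len(x)
    p = loadings.shape[1]
    cov = np.zeros((p * n, p * n))
    for a_q, ell in zip(loadings, lengthscales):
        cov += np.kron(np.outer(a_q, a_q), rbf(x, ell))
    return cov


def conditional_impute(y, mean, cov, missing, jitter=1e-8):
    """Gaussian conditional mean and covariance of y[missing] given y[~missing]."""
    obs = ~missing
    k_oo = cov[np.ix_(obs, obs)] + jitter * np.eye(obs.sum())
    k_mo = cov[np.ix_(missing, obs)]
    k_mm = cov[np.ix_(missing, missing)]
    # Solve linear systems instead of forming the inverse of K_oo explicitly
    alpha = np.linalg.solve(k_oo, y[obs] - mean[obs])
    cond_mean = mean[missing] + k_mo @ alpha
    cond_cov = k_mm - k_mo @ np.linalg.solve(k_oo, k_mo.T)
    return cond_mean, cond_cov
```

Because the covariance is a sum of Kronecker products rather than a single one, it is non-separable: the dependence between outputs can differ across latent scales. In realistic dimensions the dense solve would be replaced by structured computations (for instance via SVD-based factorisations), which this sketch does not attempt.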

Identification of factors impacting virus transmission.

Post-Doctorate, BioSP, INRAE Avignon, France

2019 - 2021

This post-doctorate is part of the ANR SMITID project, which develops statistical methods for inferring infectious disease transmission from high-throughput sequencing data. My work focuses on statistical methods for detecting the impact of environmental factors on transmission once the transmission tree has been inferred.

Permutation tests, Spearman correlation, Equine Influenza, Covid-19
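A permutation test on a Spearman correlation is one simple way to check whether an environmental factor is associated with a transmission quantity. The sketch below assumes two paired vectors (for example a hypothetical environmental covariate and a transmission measure read off the inferred tree) and independent pairs. Spearman's rho is computed as the Pearson correlation of average ranks, and the p-value comes from shuffling one variable.

```python
import numpy as np


def average_ranks(v):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def permutation_test(x, y, n_perm=9999, seed=0):
    """Two-sided permutation p-value for H0: no monotone association."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = spearman(x, y)
    count = 0
    for _ in range(n_perm):
        if abs(spearman(x, rng.permutation(y))) >= abs(observed):
            count += 1
    # The observed labelling counts as one permutation, so the p-value is never 0
    return observed, (count + 1) / (n_perm + 1)
```

If the data were pairwise (distance matrices between hosts, say), rows and columns would have to be permuted jointly, as in a Mantel-type test; the plain shuffle above is only valid for independent pairs.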


Covid-19: prediction of deaths worldwide

The aim is to predict the number of deaths per country over the medium to long term, using a mixture model built on other countries that are further along in the epidemic. The methodology is summarized on the BioSP blog and described in detail in Soubeyrand S, et al. (2020). Among other contributions, I wrote R code to improve the interactive plotly graphs in the Shiny application dedicated to this research.

Plotly, RShiny, mixture model


Covid-19: ICU bed occupancy predictions in Vaucluse

The aim is to predict when intensive care unit (ICU) beds will be fully occupied in the Vaucluse département. The method is based on the temporal evolution of ICU bed occupancy in other French départements. A summary of the results is available on the BioSP blog dedicated to the Covid-19 pandemic.

Clustering, linear regression

Kriging for turbomachinery design: high dimension and robust optimization

PhD, ICJ, Lyon1, École Centrale de Lyon, defended in October 2018

2015 - 2018

This thesis is part of the ANR PEPITO project for the transport industry, a collaboration between industrial partners (Valéo, Intes, InModelia) and academic partners. The aim is to design efficient turbomachinery. The numerical code Valéo uses to simulate turbomachinery operation is too computationally expensive to be used directly for this design problem, which motivates the use of kriging metamodels.

Data-based algorithms for constructing an isotropic-by-group kernel for high-dimensional kriging

The algorithms use only the available data to partition the input variables into groups and build a kernel that is isotropic within each group. The methods are based on combinatorics and clustering. The corresponding published article is available here.
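An isotropic-by-group kernel has one range parameter per group of inputs instead of one per input, which reduces the number of hyperparameters in high dimension. The minimal sketch below assumes a Gaussian kernel and a partition of the inputs that is already given; finding that partition from the data is what the algorithms above do, and it is not shown. A zero-mean simple kriging predictor shows how the kernel is used.

```python
import numpy as np


def group_isotropic_kernel(X1, X2, groups, lengthscales, variance=1.0):
    """Gaussian kernel that is isotropic within each group of input variables.

    groups: list of column-index lists partitioning the inputs, e.g. [[0, 2], [1]].
    lengthscales: one range parameter per group (instead of one per dimension).
    """
    X1 = np.atleast_2d(X1)
    X2 = np.atleast_2d(X2)
    exponent = np.zeros((X1.shape[0], X2.shape[0]))
    for idx, ell in zip(groups, lengthscales):
        # Pairwise differences restricted to this group, shape (n1, n2, |group|)
        diff = X1[:, idx][:, None, :] - X2[:, idx][None, :, :]
        exponent += np.sum(diff ** 2, axis=-1) / (2.0 * ell ** 2)
    return variance * np.exp(-exponent)


def simple_kriging_mean(X_train, y_train, X_new, groups, lengthscales, nugget=1e-10):
    """Zero-mean simple kriging predictor built on the group-isotropic kernel."""
    K = group_isotropic_kernel(X_train, X_train, groups, lengthscales)
    K += nugget * np.eye(len(X_train))  # small nugget for numerical stability
    k_new = group_isotropic_kernel(X_new, X_train, groups, lengthscales)
    return k_new @ np.linalg.solve(K, y_train)
```

With a single group the kernel reduces to a fully isotropic Gaussian kernel, and with one group per input it becomes the usual anisotropic kernel; grouping interpolates between these two extremes.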

Robust optimization strategies on kriging metamodels

Robustness is handled by defining two mean/variance criteria based on a Taylor expansion. The metamodel used is co-kriging with derivatives. The seven strategies developed follow a classical sequential design-enrichment scheme. Enrichment points are chosen using expected improvement criteria, clustering methods and a multi-objective genetic optimization algorithm (NSGA-II). The published article is available here and the proceedings article is available here.
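Two building blocks of this kind of strategy can be sketched in a few lines: a first-order Taylor propagation of input uncertainty, which turns a function value and its gradient (available from co-kriging with derivatives) into a mean/variance pair, and the standard expected improvement criterion for minimisation. This is a generic sketch under the assumption of a small zero-mean input perturbation with known covariance; the exact criteria of the thesis may differ, for instance by including second-order terms.

```python
import math

import numpy as np


def taylor_mean_variance(f_value, gradient, input_cov):
    """First-order Taylor propagation of input uncertainty through f.

    With X = x + eps, E[eps] = 0 and Cov[eps] = input_cov:
    E[f(X)] ~ f(x) and Var[f(X)] ~ grad_f(x)^T input_cov grad_f(x).
    """
    g = np.asarray(gradient, dtype=float)
    variance = float(g @ np.asarray(input_cov, dtype=float) @ g)
    return float(f_value), variance


def expected_improvement(mu, sigma, f_min):
    """Expected improvement (minimisation) given a kriging mean mu and std sigma."""
    if sigma <= 0.0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))       # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (f_min - mu) * cdf + sigma * pdf
```

In a robust setting, the mean and the variance would each be treated as an objective, and the resulting bi-objective problem passed to a multi-objective optimizer such as NSGA-II; that step is not shown here.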

Performance of the kriging model on foundation data

Freelance mission, Fondasol

2019

Preliminary study on the prediction quality of the kriging model on foundation measurements.

Kriging, R Markdown report


Morphofunctional study of the iliac auricular surface in felids

Collaboration with J.-P. Pallandre (Muséum national d'Histoire naturelle)

2018

Study of the links between the shape of the auricular surface of the sacroiliac joint in felids and their prey selection, the type of bite they inflict, and their body mass. I created an R-Shiny application. The published articles are available here and here.

Post-hoc tests, R-Shiny

Metamodel comparison

Second-year master's internship, ICJ, Lyon1, École Centrale de Lyon

2015

This internship took place in the same context as the thesis. Several preliminary studies were carried out: a comparison of metamodels (kriging, linear regression and generalized additive models), dimension reduction using sensitivity analysis, and a study of the co-kriging method.

Kriging, co-kriging, linear model, GAM, cross validation
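Metamodels are typically compared by cross-validation. The minimal sketch below computes a k-fold cross-validated RMSE for any model exposed as a `fit(X, y)` function returning a predictor. Polynomial regressions of degree 1 and 2 serve as simple stand-ins; a kriging predictor or a GAM could be plugged into the same interface. The `fit_polynomial` helper is hypothetical and only handles one input.

```python
import numpy as np


def kfold_rmse(X, y, fit, k=5, seed=0):
    """K-fold cross-validated RMSE of a model given as fit(X, y) -> predict function."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        predict = fit(X[train_idx], y[train_idx])   # train on the other k-1 folds
        sq_errors.append((predict(X[test_idx]) - y[test_idx]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))


def fit_polynomial(degree):
    """Least-squares polynomial regression on the first input column (degree 1 = linear model)."""
    def fit(X, y):
        coef = np.polyfit(X[:, 0], y, degree)
        return lambda X_new: np.polyval(coef, X_new[:, 0])
    return fit
```

Using the same folds (same `seed`) for every candidate keeps the comparison paired, so differences in RMSE reflect the models rather than the data split.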

Identification of fraudsters and water meter leaks

First-year master's internship, United Water, Paramus, New Jersey

2014

Analysis of a city's water consumption data to detect fraud and leaks. Water meters transmit data automatically every minute, producing large volumes of data (Big Data). The data must be updated, validated and analyzed. The methods developed are based on statistical tests, linear regression and analysis of variance.

Linear regression, ANOVA, goodness-of-fit test, automatic data update, automatic reporting
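With per-minute meter data, one simple leak indicator is continuous flow: a household normally has at least one minute without consumption over a day. The sketch below flags meters whose flow stays above a threshold for a full window of consecutive minutes. This rule is an illustrative heuristic swapped in for clarity, not the statistical tests used in the internship; the window of 1440 minutes (24 hours) and the zero threshold are assumptions.

```python
def has_continuous_flow(readings, window=1440, threshold=0.0):
    """True if flow stays above `threshold` for `window` consecutive minutes.

    readings: per-minute consumption for one meter, in chronological order.
    window: 1440 minutes = 24 hours without a single zero-flow minute.
    """
    run = 0
    for volume in readings:
        if volume > threshold:
            run += 1
            if run >= window:
                return True
        else:
            run = 0  # a minute without flow breaks the run
    return False


def flag_meters(meter_readings, window=1440, threshold=0.0):
    """Return the sorted ids of meters with at least one continuous-flow window."""
    return sorted(
        meter_id
        for meter_id, readings in meter_readings.items()
        if has_continuous_flow(readings, window, threshold)
    )
```

Flagged meters would then be candidates for closer statistical analysis or an on-site check rather than confirmed leaks.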