Data augmentation, a technique from machine learning, lets researchers increase the diversity of training data, albeit synthetically, without collecting further data. We explore its use to alleviate a bottleneck faced by researchers interested in telematics-based auto insurance: the limited amount of publicly available telematics data.
Models that classify drivers by a risk profile inferred from telematics data are predominantly unsupervised, with riskiness inferred from the profile of the drivers in each cluster. This is largely because telematics data accompanied by historical claim records is not publicly available. In this talk, we explore training a model to assign a risk score to drivers. Specifically, telematics data is collected from a small cohort of drivers and supplemented with self-disclosed historical claim records, a self-evaluation survey, and a peer assessment of riskiness. The data is then augmented, based on the identification of driving events such as turns and braking in the collected telematics data, to represent the telematics data of a larger cohort of synthetic drivers with varying risk profiles. We will report the results from our implementation and discuss avenues for further enhancement.
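
To make the event-based augmentation step concrete, the following is a minimal sketch and not the implementation reported in the talk. It assumes a trip is a list of speeds in m/s sampled at a fixed interval, treats any run of samples decelerating at 2 m/s^2 or more as a braking event, and creates synthetic drivers by scaling the severity of those events. The function names, the threshold, and the scaling range are all hypothetical choices made for illustration.

```python
import random


def find_braking_events(speed, dt=1.0, threshold=-2.0):
    """Return (start, end) index pairs of runs where acceleration <= threshold.

    Indices refer to the acceleration series: accel[i] is the change from
    speed[i] to speed[i + 1] divided by dt. The end index is exclusive.
    The threshold of -2 m/s^2 is an assumed value, not one from the talk.
    """
    accel = [(b - a) / dt for a, b in zip(speed, speed[1:])]
    events, start = [], None
    for i, a in enumerate(accel):
        if a <= threshold and start is None:
            start = i                      # a braking run begins
        elif a > threshold and start is not None:
            events.append((start, i))      # the run ended at sample i
            start = None
    if start is not None:                  # trip ended while still braking
        events.append((start, len(accel)))
    return events


def augment_trip(speed, intensity, dt=1.0, threshold=-2.0):
    """Build a synthetic trip whose braking events are scaled by `intensity`.

    Simplification: speeds are re-integrated from the modified accelerations
    and clipped at zero, so the speed after a scaled event differs from the
    original for the rest of the trip.
    """
    accel = [(b - a) / dt for a, b in zip(speed, speed[1:])]
    for start, end in find_braking_events(speed, dt, threshold):
        for i in range(start, end):
            accel[i] *= intensity
    out = [speed[0]]
    for a in accel:
        out.append(max(0.0, out[-1] + a * dt))
    return out


def synthetic_cohort(speed, n_drivers, seed=0):
    """Generate (intensity, trip) pairs for hypothetical synthetic drivers."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n_drivers):
        intensity = rng.uniform(0.5, 1.5)  # assumed range: gentler to harsher
        cohort.append((intensity, augment_trip(speed, intensity)))
    return cohort


trip = [20, 20, 17, 14, 14, 15, 12, 12]
print(find_braking_events(trip))           # [(1, 3), (5, 6)]
print(len(synthetic_cohort(trip, 5)))      # 5
```

In this sketch the scaling factor attached to each synthetic trip could serve as a crude proxy for that driver's riskiness. A realistic pipeline would also need to handle turns, sensor noise, and the claim and survey labels described above.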