Predicting traffic accident severities in inter-city roads from Spain and United Kingdom by means of statistical models based on Random Forest and Logistic Regression
This thesis proposes a regression model that allows to predict the severity of a traffic accident for each of its occupants. The model is intended to provide a collective intelli- gence to the vehicle and also will be useful to improve the userâĂŹs driving behavior.
For this purpose, a probabilistic classification model is proposed that allows to pre- dict the severity of a traffic accident based on a set of predictors that were involved in the collision. A massive dataset of accidents has been used to test the techniques pre- sented in this study. Different approaches for reducing the number of features involved in each road accident are presented and tested.
Based on the previous analysis of all the characteristics that involve traffic accidents and through autonomous systems that are able to learn from new incidents, this study will be of possible application to avoid future undesirable events.
The model proposed is based on the Random Forest and BayesGLM algorithms and allow us to infer the relationships between accidents and their contributing factors, in order to recognize the causes that determine the physical damages associated to each victim, thus allowing us to extract information that can be of great importance when planning traffic policies by governments. In addition, another important applications involves the possibility of providing vehicles with the intelligence to choose safer routes, thus increasing the overall road safety.
This work has been applied in two practical cases with very satisfactory results. On the one hand, the research includes the prediction of injuries to vehicle occupants by analyzing the traffic accidents that occurred from 2011 to 2015 in Spain and from 2009 to 2014 in the United Kingdom. The results presented demonstrate that the system is able to learn from a series of data collected on traffic accidents and to find trends in them that can be statistically demonstrable. The algorithm proposed is able to process the information and classify it by a supervised combination of regression and classification techniques.