An Efficient Random Forest Model for Predicting Respiratory Toxicity of Organic Chemicals
Aubin N’GUESSAN
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Ludovic AKONAN
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Désiré MELEDJE
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Hermann N’GUESSAN
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Gabin Placide ALLANGBA
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Logbo MOUSSE
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Nahossé ZIAO
Laboratory of Thermodynamics and Physico-Chemistry of the Environment, Nangui Abrogoua University, Côte d’Ivoire.
Melalie KEITA
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Raymond KRE
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Eugène MEGNASSAN*
Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire; International Center for Theoretical Physics, ICTP-UNESCO, Coastal Road 11, I-34151 Trieste, Italy; Laboratory of Crystallography and Molecular Physics, University of Cocody (now Felix Houphouet-Boigny), Abidjan 02, Côte d’Ivoire; Laboratory of Material Sciences, The Environment and Solar Energy and Laboratory of Structural and Theoretical Organic Chemistry, University Felix Houphouet-Boigny, Abidjan 02, Côte d’Ivoire; and QLS, ICTP-UNESCO, I 34151 Trieste, Italy.
*Author to whom correspondence should be addressed.
Abstract
This study developed a random forest (RF) model, based on a large and diverse dataset, to classify organic chemicals and drugs as respiratory toxicants or non-toxicants. Drug-induced respiratory toxicity remains a major cause of drug candidate failure in clinical trials, which contributes to the high cost of bringing drugs to market. In addition, animal models for the experimental determination of the respiratory toxicity of chemicals are costly and time-consuming. There is therefore a pressing need for a machine learning model that can qualitatively identify toxicants within a large dataset of drugs and chemical compounds associated with respiratory system toxicity. However, using an excessive number of descriptors increases the risk of overfitting and thereby reduces the model's ability to generalise. More robust methods are needed that capture the relevant information without burdening the model with unnecessary variables. In this study, the RF machine learning method combined with only nine descriptors was used to build an efficient binary classification model for predicting the pulmonary or respiratory toxicity of chemicals. To demonstrate its predictive reliability, the global respiratory toxicity model was assessed using 10-fold internal cross-validation alongside external test set validation. The RF model achieved a prediction accuracy of 76.66% and an AUC of 0.83 for the compounds in the test set. These findings emphasize the importance of rigorous descriptor selection and streamlined models to achieve reliable predictions in real-world scenarios, and they offer valuable contributions to respiratory toxicity assessment during early-stage drug discovery and environmental safety evaluations.

Keywords: Binary classification, organic chemicals, respiratory toxicity, random forest (RF), machine learning
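
The evaluation protocol summarised in the abstract (an RF binary classifier on nine descriptors, 10-fold internal cross-validation, then accuracy and AUC on an external test set) can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the synthetic data from make_classification, the 80/20 split, the stratified folds, the number of trees, and the helper name evaluate_rf are all assumptions made for illustration, and the scores it produces have no relation to the reported 76.66% and 0.83.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split


def evaluate_rf(X, y, seed=0):
    """Hypothetical helper: 10-fold CV on a training set, then external test metrics."""
    # Hold out an external test set (the 80/20 ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Random forest classifier; n_estimators=100 is an illustrative choice.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)

    # 10-fold internal cross-validation on the training set only.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

    # Refit on the full training set and score the held-out compounds.
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    test_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class

    return {
        "n_folds": len(cv_scores),
        "cv_accuracy_mean": float(cv_scores.mean()),
        "test_accuracy": float(accuracy_score(y_test, test_pred)),
        "test_auc": float(roc_auc_score(y_test, test_proba)),
    }


# Synthetic stand-in for a table of nine molecular descriptors (not real toxicity data).
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=0
)
results = evaluate_rf(X, y)
print(results)
```

With real data, X would hold the nine selected descriptors per compound and y the binary toxicant label. Keeping cross-validation inside the training split means the external test set plays no part in model selection.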