Abstract

Combined SVM-PLS Method for Predicting Estrogenic Activities of Organic Chemicals in the Coastal Water

Fei Li, Lulu Cao, Huifeng Wu, Jianmin Zhao

A data set of 517 natural, synthetic and environmental chemicals belonging to a broad range of structural classes has been tested for estrogenic activity (expressed as logREC10) toward the estrogen receptor (ER) using a yeast two-hybrid assay. In this study, quantitative structure-activity relationships (QSARs) were determined using two methods, partial least squares (PLS) and support vector machine (SVM). The Q²cum of the PLS model is 0.678, indicating high robustness and good predictive ability. The correlation coefficient (R) between the observed and the predicted values is 0.870, indicating that the values predicted by the final QSAR models were in good agreement with the corresponding experimental values. Eight DRAGON descriptors were included in the PLS model, namely Mor03p, L3e, R8p, RTv+, R8e, R1p+, R7p+ and HATSv, implying that chemical estrogenic activities are related to atomic properties (atomic Sanderson electronegativities, polarizabilities and van der Waals volumes). A comparison of the results obtained from the two models showed that the SVM method gave the better overall performance, with an RMS error of 0.145 logREC10 units for the whole set. Moreover, three linear QSAR models were constructed for specific families of chemicals grouped by chemical structure. These predictive models should be useful for rapidly identifying potential estrogenic endocrine disrupting chemicals.
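For readers unfamiliar with the statistics quoted above, the following minimal sketch shows how an RMS error, a correlation coefficient R and a cross-validated Q² are commonly computed from observed and predicted logREC10 values. It is not the authors' code. The function names and the example numbers are hypothetical, and the Q² shown is the usual single-value form 1 - PRESS/SS, which may differ from the cumulative, per-component Q²cum reported by PLS software.

```python
import math


def rms_error(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient R between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (dev_o * dev_p)


def q2(observed, cv_predicted):
    """Cross-validated Q2 = 1 - PRESS / SS.

    cv_predicted must hold predictions made for each compound while it was
    left out of model fitting (assumption: the caller ran the cross-validation).
    """
    n = len(observed)
    mean_o = sum(observed) / n
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    ss = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - press / ss


if __name__ == "__main__":
    # Hypothetical logREC10 values for illustration only; not data from the study.
    obs = [-7.2, -6.5, -5.9, -5.1, -4.4]
    pred = [-7.0, -6.7, -5.6, -5.3, -4.5]
    print("RMS error:", rms_error(obs, pred))
    print("R:", pearson_r(obs, pred))
    print("Q2:", q2(obs, pred))
```

A Q² close to 1 means the left-out predictions reproduce most of the variance in the observed activities, while the RMS error is expressed in the same logREC10 units as the data.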