The aim of this study was to evaluate several types of kinase descriptors and to compare the performance of different multivariate correlation methods in large-scale proteochemometric modelling of protein kinase-inhibitor interactions.

Results

Performance of different types of kinase descriptors in PCA and PLS-DA models

In order to compare the performance of the alignment-based approach and the five alignment-independent approaches used herein for describing protein kinase sequences, we applied principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). PCA was performed to visualize how the different types of descriptors separate the seven groups of protein kinases contained in the data set of 317 sequences. PLS-DA was used to obtain a quantitative measure of the ability of the descriptors to discriminate these groups.
The seven kinase groups were as defined in, namely AGC, CaMK, CK1, CMGC, STE, TK, and TKL. The first three principal components of the PCA models for the six sets of descriptors are visualized in Figure 1, Panels A to F. As seen from Panels A and B, the SO-PAA and CTD descriptors distribute the kinases in a more or less random fashion, although some of the tyrosine kinases are separated from the other groups, and the STE and CK1 groups are quite compact. Clustering into groups is more evident when the AAC-DC descriptors and the MACCs of z-scale descriptors are used. For these descriptors the TK group, which is the largest group in the data set, shows almost no overlap with the other groups.
Finally, the ACCs of z-scale descriptors and the z-scale descriptors of aligned sequences give good separation of most of the kinase groups. However, a notable difference between the latter two is that the ACCs separate subgroups of TKs, while the first three PCs of the descriptors of the aligned sequences do not reveal such sub-clustering. On the other hand, the alignment-based descriptors are the only ones that separate the CMGC kinases as being substantially different from the other groups. As seen from Panel F, for the alignment-based approach the CMGC kinases form a distinct cluster in the first two PCs. PLS-DA finds the directions in PC space where maximum separation among the classes is obtained and where each class forms a maximally compact cluster.
In an ideal situation, a cross-validated correlation coefficient Q2 = 1 indicates that all members of a class are predicted to have y = 1, whereas all non-members are predicted to have y = 0. In reality Q2 is always lower than 1, which is due to intra-class variation. Nevertheless, a Q2 within the range 0.6 to 0.8 still indicates a good separation of classes, with few or no mispredictions. Should Q2 drop to 0.4 to 0.6, or even lower, this is a warning that the classes overlap and that the model will make multiple mispredictions.
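
To make the interpretation of Q2 concrete, the following minimal sketch computes it for one class from dummy-coded memberships (y = 1 for members, y = 0 for non-members) and cross-validated predictions. It assumes the conventional definition Q2 = 1 - PRESS / SS_total, which the text does not state explicitly. The function names `q2` and `interpret_q2` and the example values are hypothetical and not taken from the study, and the 0.6 cut-off simply encodes the rule of thumb given above.

```python
def q2(y_true, y_cv_pred):
    """Cross-validated Q2, assuming Q2 = 1 - PRESS / SS_total.

    y_true    : dummy-coded class membership (1 = member, 0 = non-member)
    y_cv_pred : predictions for the same objects, each made by a model
                that did not see that object during fitting
    """
    if len(y_true) != len(y_cv_pred):
        raise ValueError("y_true and y_cv_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # PRESS: predictive residual sum of squares over the held-out predictions
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    # Total sum of squares of y around its mean
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    if ss_total == 0:
        raise ValueError("y_true must contain both members and non-members")
    return 1.0 - press / ss_total


def interpret_q2(value):
    """Rule of thumb from the text: 0.6 or higher suggests good separation."""
    if value >= 0.6:
        return "good separation"
    return "classes overlap"


# Hypothetical example: two members and two non-members of one kinase group
y_true = [1, 1, 0, 0]
y_cv_pred = [0.9, 0.8, 0.1, 0.2]
score = q2(y_true, y_cv_pred)
print(round(score, 3), interpret_q2(score))
```

Predicting the class mean for every object gives Q2 = 0 under this definition, so values well below 0.6 mean that the model adds little beyond a constant guess.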