Molecular Similarity

Selected Abstracts


Data and Graph Mining in Chemical Space for ADME and Activity Data Sets

MOLECULAR INFORMATICS, Issue 3 2006

Abstract: We present a classification method based on a coordinate-free chemical space. It therefore does not depend on the descriptor values commonly used by coordinate-based chemical space methods. In our method, the molecular similarity of chemical structures is evaluated by a generalized maximum common graph isomorphism, which supports numerical physicochemical atom property labels in addition to discrete atom-type labels. The Maximum Common Substructure (MCS) algorithm applies the Highest Scoring Common Substructure (HSCS) ranking of Sheridan and co-workers, which penalizes discontinuous fragments. We assess the usefulness of all classification algorithms compared in this work against two objectives. First, we are interested in highly accurate and general hypotheses. Second, interpretability is highly important for increasing our structural knowledge of the ADME data sets and the activity data set investigated in this work.
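
The idea of ranking common substructures while penalizing discontinuous fragments can be illustrated with a small sketch. This is not the HSCS score of Sheridan and co-workers, nor the authors' algorithm: the scoring rule (matched atoms minus a penalty per extra fragment), the function names, and the brute-force search are all assumptions made for illustration. It also handles only discrete atom-type labels and is practical only for very small graphs.

```python
def make_bonds(pairs):
    """Store undirected bonds as a set of frozensets (hypothetical helper)."""
    return {frozenset(pair) for pair in pairs}


def count_fragments(atoms, bonds):
    """Number of connected components among `atoms`, using only bonds between them."""
    remaining, fragments = set(atoms), 0
    while remaining:
        stack = [remaining.pop()]
        fragments += 1
        while stack:
            current = stack.pop()
            linked = {a for a in remaining if frozenset((current, a)) in bonds}
            remaining -= linked
            stack.extend(linked)
    return fragments


def best_common_substructure(labels_a, bonds_a, labels_b, bonds_b, penalty=1.0):
    """Exhaustively search atom mappings that preserve labels and bonding.

    Assumed toy score: matched atoms - penalty * (fragments - 1).
    Returns (best_score, mapping from atoms of A to atoms of B).
    """
    atoms_a = sorted(labels_a)
    best = (0.0, {})

    def bonded(bonds, i, j):
        return frozenset((i, j)) in bonds

    def extend(index, mapping):
        nonlocal best
        if index == len(atoms_a):
            if mapping:
                fragments = count_fragments(mapping, bonds_a)
                score = len(mapping) - penalty * (fragments - 1)
                if score > best[0]:
                    best = (score, dict(mapping))
            return
        a = atoms_a[index]
        extend(index + 1, mapping)  # option 1: leave atom a unmatched
        used = set(mapping.values())
        for b in sorted(labels_b):  # option 2: match a to a compatible atom b
            if b in used or labels_a[a] != labels_b[b]:
                continue
            if all(bonded(bonds_a, a, a2) == bonded(bonds_b, b, b2)
                   for a2, b2 in mapping.items()):
                mapping[a] = b
                extend(index + 1, mapping)
                del mapping[a]

    extend(0, {})
    return best
```

For two five-atom chains that differ only in the central atom (C-C-N-C-C versus C-C-O-C-C), a small penalty favors the four-atom match split into two fragments, whereas a large penalty favors a single connected two-atom fragment.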


Analysis of water solubility data on the basis of HYBOT descriptors. Part 1.

MOLECULAR INFORMATICS, Issue 9-10 2003
Abstract: This work describes the analysis of water-gas phase partitioning data, Lw = Cw/Cg, for 559 organic chemicals on the basis of physicochemical descriptors calculated by the HYBOT program package. Physicochemical descriptors combined with indicator variables, as well as a new approach combining traditional QSAR and molecular similarity, are used to take structural features into account. The H-bond acceptor ability of chemicals (i.e. the interaction of acceptor atoms with the hydrogen atoms of water) is the main factor influencing the partitioning of vapors into water. Considering H-bond acceptor and donor factors simultaneously describes the solubility of vapors with a correlation coefficient of about 0.92. The steric interactions of solutes with water molecules (characterized by means of molecular polarizability) make a small but statistically significant contribution. A set of indicator variables for hydrocarbons, for molecules containing amino, amido, CX3, ether and nitro groups, and for molecules able to form intramolecular hydrogen bonds improves the correlation and helps to take structural features into account. Furthermore, an approach that calculates additional contributions to solubility by considering 'nearest neighbor chemicals' and their differences in physicochemical parameters gives good results in many cases and could be very useful in the analysis of vast data sets.
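
The nearest-neighbor idea described above can be sketched as follows, under stated assumptions. The abstract does not give the exact formula, so this sketch assumes that the prediction is the observed value of the most similar reference chemical plus a linear correction for the descriptor differences. The function name, the tuple layout, and the coefficients are hypothetical, and the similarity values are taken as precomputed.

```python
def predict_with_neighbor(query_descriptors, references, coefficients):
    """Predict a property from the most similar reference chemical.

    references: list of (similarity_to_query, descriptors, observed_value).
    coefficients: assumed regression weights, one per descriptor.
    """
    # Pick the reference chemical with the highest similarity to the query.
    _, neighbor_descriptors, neighbor_value = max(references, key=lambda r: r[0])
    # Correct the neighbor's observed value for the descriptor differences.
    correction = sum(
        c * (q - n)
        for c, q, n in zip(coefficients, query_descriptors, neighbor_descriptors)
    )
    return neighbor_value + correction
```

With illustrative numbers, a query with descriptors (1.5, 0.5), a nearest neighbor with descriptors (1.0, 0.5) and an observed value of 3.0, and coefficients (2.0, 1.0), the sketch returns 3.0 + 2.0 * 0.5 = 4.0.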


A structural systematic study of four isomers of difluoro- N -(3-pyridyl)benzamide

ACTA CRYSTALLOGRAPHICA SECTION C, Issue 7 2009
Joyce McMahon
The four isomers 2,4-, (I), 2,5-, (II), 3,4-, (III), and 3,5-difluoro-N-(3-pyridyl)benzamide, (IV), all with formula C12H8F2N2O, display molecular similarity, with interplanar angles between the C6/C5N rings ranging from 2.94 (11)° in (IV) to 4.48 (18)° in (I), although the amide group is twisted from either plane by 18.0 (2)–27.3 (3)°. Compounds (I) and (II) are isostructural but are not isomorphous. Intermolecular N-H...O=C interactions form one-dimensional C(4) chains along [010]. The only other significant interaction is C-H...F. The pyridyl (py) N atom does not participate in hydrogen bonding; the closest H...Npy contact is 2.71 Å in (I) and 2.69 Å in (II). Packing of pairs of one-dimensional chains in a herring-bone fashion occurs via π-stacking interactions. Compounds (III) and (IV) are essentially isomorphous (their a and b unit-cell lengths differ by 9%, due mainly to 3,4-F2 and 3,5-F2 substitution patterns in the arene ring) and are quasi-isostructural. In (III), benzene rotational disorder is present, with the meta F atom occupying both 3- and 5-F positions with site occupancies of 0.809 (4) and 0.191 (4), respectively. The N-H...Npy intermolecular interactions dominate as C(5) chains in tandem with C-H...Npy interactions. C-H...O=C interactions form R₂²(8) rings about inversion centres, and there are π-π stacks about inversion centres, all combining to form a three-dimensional network. By contrast, (IV) has no strong hydrogen bonds; the N-H...Npy interaction is 0.3 Å longer than in (III). The carbonyl O atom participates only in weak interactions and is surrounded in a square-pyramidal contact geometry with two intramolecular and three intermolecular C-H...O=C interactions.
Compounds (III) and (IV) are interesting examples of two isomers with similar unit-cell parameters and gross packing but which display quite different intermolecular interactions at the primary level due to subtle packing differences at the atom/group/ring level arising from differences in the peripheral ring-substitution patterns.


Comparative QSAR Studies on Toxicity of Phenol Derivatives Using Quantum Topological Molecular Similarity Indices

CHEMICAL BIOLOGY & DRUG DESIGN, Issue 5 2010
Bahram Hemmateenejad
Quantitative structure-activity relationship (QSAR) analyses using a novel type of electronic descriptor, quantum topological molecular similarity (QTMS) indices, were performed to describe and compare the mechanisms of toxicity of phenols toward five different strains (i.e., Tetrahymena pyriformis, L1210 leukemia, Pseudomonas putida, Raja japonica and Cucumis sativus). QSAR models for the toxicity data were obtained separately for each strain using partial least squares (PLS) regression combined with genetic algorithms (GA) as a variable selection method. The resulting QSAR models were used to identify molecular fragments of phenol derivatives whose electronic properties contribute significantly to the observed toxicities. Using this information, it was feasible to discriminate between the mechanisms of action of phenol toxicity toward the studied strains. The toxicities of phenols to all strains except L1210 leukemia were found to be significantly affected by electronic features of the phenolic hydroxyl group (C-O-H). The resulting models can also describe the inductive and resonance effects of substituents on the various toxicities.
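
To make the combination of PLS regression with GA-based variable selection more concrete, here is a minimal sketch under several assumptions. It is not the authors' procedure: it uses a one-component PLS1 model, a fitness of R² minus a small penalty per selected variable, and simple tournament selection with elitism. All function names, parameter values, and the synthetic demo data are hypothetical.

```python
import random


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1_r2(X, y, mask):
    """R^2 of a one-component PLS1 fit using only the columns switched on in mask."""
    cols = [center([row[j] for row in X]) for j, on in enumerate(mask) if on]
    yc = center(y)
    w = [sum(a * b for a, b in zip(col, yc)) for col in cols]  # weights ~ X^T y
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return 0.0  # no columns selected, or no covariance with y
    w = [v / norm for v in w]
    t = [sum(w[k] * cols[k][i] for k in range(len(cols))) for i in range(len(yc))]
    q = sum(a * b for a, b in zip(t, yc)) / sum(v * v for v in t)
    ss_res = sum((b - q * a) ** 2 for a, b in zip(t, yc))
    return 1.0 - ss_res / sum(b * b for b in yc)


def fitness(X, y, mask, penalty=0.01):
    """Assumed fitness: goodness of fit minus a cost per selected variable."""
    return pls1_r2(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    """Evolve boolean masks over the columns of X (needs at least two columns)."""
    rng = random.Random(seed)
    n_vars = len(X[0])

    def score(mask):
        return fitness(X, y, mask)

    population = [[True] * n_vars]  # include the full model as a baseline
    population += [[rng.random() < 0.5 for _ in range(n_vars)]
                   for _ in range(pop_size - 1)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = max(rng.sample(population, 3), key=score)  # tournament selection
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=score)
    return best


def demo_data(seed=1, n_samples=30):
    """Synthetic data: y depends on columns 0 and 2 only; columns 1 and 3 are noise."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(n_samples)]
    y = [2.0 * row[0] - row[2] for row in X]
    return X, y
```

Because the full mask is in the initial population and the best mask is carried over unchanged each generation, the returned mask never scores worse than using all variables. A real study would typically use multi-component PLS and a cross-validated fitness rather than the in-sample R² used here.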


Filtering and Counting of Extended Connectivity Fingerprint Features Maximizes Compound Recall and the Structural Diversity of Hits

CHEMICAL BIOLOGY & DRUG DESIGN, Issue 1 2009
Ye Hu
Extended connectivity fingerprints produce variable numbers of structural features for molecules, and quantitative comparison of feature ensembles is typically carried out as a measure of molecular similarity. As an alternative way to utilize the information content of extended connectivity fingerprint features, we have introduced a compound class-directed feature filtering technique. In combination with a simple feature counting protocol, feature filtering significantly improves the performance of extended connectivity fingerprint similarity searching compared with state-of-the-art fingerprint search methods. Subsets of extended connectivity fingerprint features that are unique to active compounds are found to be responsible for high compound recall. Moreover, feature filtering and counting is shown to result in significantly higher scaffold hopping potential than data fusion or fingerprint averaging methods. Extended connectivity fingerprint feature filtering and counting represents one of the simplest similarity search methods introduced to date, yet it produces top compound recall and maximizes the scaffold diversity of hits, which is a longstanding goal of similarity searching.
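
The filtering and counting steps can be sketched as follows. The abstract does not specify the exact filtering rule, so this sketch assumes that a feature is retained when it occurs in at least one active reference compound and in no background compound, and that candidates are ranked by how many retained features they contain. Fingerprints are represented as plain sets of integer feature identifiers, and the function names are hypothetical.

```python
def filter_features(active_fingerprints, background_fingerprints):
    """Keep features seen in active compounds but in no background compound."""
    active_features = set().union(*active_fingerprints)
    background_features = set().union(*background_fingerprints)
    return active_features - background_features


def rank_by_feature_count(candidates, selected_features):
    """Rank candidates by the number of selected features they contain.

    candidates: dict mapping a compound name to its set of feature identifiers.
    Ties are broken alphabetically by name so that the order is reproducible.
    """
    scores = {name: len(features & selected_features)
              for name, features in candidates.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In practice the feature sets would come from a fingerprint generator in a cheminformatics toolkit; the set-based representation here only illustrates the counting logic.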