fepops package

Subpackages

fepops.fepops_persistent package

Submodules

fepops.fepops module

class fepops.fepops.GetFepopStatusCode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

FAILED_RETRIEVED_NONE = 4

FAILED_TO_GENERATE = 2

FAILED_TO_RETRIEVE = 3

SUCCESS = 1

class fepops.fepops.OpenFEPOPS(*, kmeans_method: Literal['sklearn', 'pytorchcpu', 'pytorchgpu'] = 'sklearn', max_tautomers: int | None = 5, num_fepops_per_mol: int = 7, num_centroids_per_fepop: int = 4, descriptor_means: Tuple[float, ...] = (-0.28971602, 0.5181022, 0.37487135, 0.99922747, -0.04187301, 1.03382471, 0.27407036, 0.99853436, 0.09725517, 1.12824307, 0.23735556, 0.99882914, 0.35977538, 0.66653514, 0.41238282, 0.99902545, 5.71261449, 6.37716992, 6.47293777, 6.26134733, 6.20354385, 6.23201498), descriptor_stds: Tuple[float, ...] = (0.35110473, 1.00839329, 0.4838859, 0.02769204, 0.15418035, 0.86446056, 0.44583626, 0.0381767, 0.16095862, 0.92079483, 0.42526185, 0.03413741, 0.35756229, 1.36093993, 0.4921059, 0.0311619, 1.9668792, 2.31266486, 2.50699385, 2.41269982, 2.30018205, 2.31527129))

Bases: object

OpenFEPOPS (Feature Points) molecular similarity object

FEPOPS allows the comparison of molecules using feature points; see the original publication for more information: https://doi.org/10.1021/jm049654z. In short, feature points reduce the number of points used to represent a molecule by combining atoms and their properties. They are typically used to compare libraries of small molecules against known actives, in the hope of discovering biosimilars based on queries.

Parameters:

kmeans_method (str, optional) – String literal denoting the method used for k-means calculations. May be one of “sklearn”, “pytorchcpu”, or “pytorchgpu”. If “sklearn” is passed, scikit-learn’s k-means implementation is used. A faster implementation from the fast_pytorch_kmeans package can be used instead if PyTorch is available, and may be run in CPU-only mode or GPU-accelerated mode. Note: GPU-accelerated mode should only be used if you are stretching the capabilities in terms of feature points for large molecules. Small molecules will not benefit from GPU acceleration because of overheads. By default “sklearn”.
max_tautomers (Union[int, None], optional) – Maximum number of tautomers to generate. Internally, this implementation of FEPOPS relies on RDKit’s TautomerEnumerator to generate tautomers, and limits the number generated to 5, following the original FEPOPS paper. Unless the molecules (or macromolecules) you are working with generate massive numbers of tautomers, this may optionally be set to None, implying that no limit should be placed on tautomer generation. By default 5.
num_fepops_per_mol (int, optional) – Number of feature points to use in the representation of a molecule. Literature notes that 7 has been empirically found to be a good number of feature points for performant representations of small molecules. This might be increased if you are dealing with large and very flexible molecules. By default 7.
num_centroids_per_fepop (int, optional) – Each fepop is represented by a number of centres, into which atom properties are compressed. Literature notes that this has been empirically determined to be 4 for a performant representation of small molecules. By default 4
descriptor_means (Tuple[float, ...], optional) – Because FEPOPS descriptors must be scaled before scoring, the DUDE diversity set has been profiled and the means collected for all contained FEPOPS. This allows centering and scaling of FEPOPS before scoring. This field contains default values for FEPOP means calculated with num_fepops_per_mol=7, num_centroids_per_fepop=4, and kmeans_method=‘sklearn’. New values should be supplied if the FEPOPS object uses different settings for these parameters. By default, the 22 values shown in the class signature above.
descriptor_stds (Tuple[float, ...], optional) – Because FEPOPS descriptors must be scaled before scoring, the DUDE diversity set has been profiled and the standard deviations collected for all contained FEPOPS. This allows centering and scaling of FEPOPS before scoring. This field contains default values for FEPOP standard deviations calculated with num_fepops_per_mol=7, num_centroids_per_fepop=4, and kmeans_method=‘sklearn’. New values should be supplied if the FEPOPS object uses different settings for these parameters. By default, the 22 values shown in the class signature above.

Raises:

ValueError – Invalid kmeans method
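
The centering and scaling that descriptor_means and descriptor_stds enable can be sketched as an element-wise z-score. This is a minimal illustration in plain Python, not the library's code: the helper name scale_descriptor and the three-element toy vectors are assumptions made for the example, whereas real OpenFEPOPS descriptors have 22 elements.

```python
def scale_descriptor(descriptor, means, stds):
    """Centre and scale one descriptor vector element-wise (z-score).

    Hypothetical helper for illustration; not part of the fepops API.
    """
    if not (len(descriptor) == len(means) == len(stds)):
        raise ValueError("descriptor, means and stds must have equal length")
    # Subtract the per-element mean, then divide by the per-element std.
    return [(x - m) / s for x, m, s in zip(descriptor, means, stds)]


# Toy three-element vectors; real descriptors would use all 22 defaults.
toy_means = (1.0, 2.0, 3.0)
toy_stds = (0.5, 1.0, 2.0)
scaled = scale_descriptor([1.5, 2.0, 1.0], toy_means, toy_stds)
print(scaled)  # [1.0, 0.0, -1.0]
```

If num_fepops_per_mol, num_centroids_per_fepop, or kmeans_method are changed, the same profiling would presumably need to be repeated on a reference set to obtain matching means and standard deviations.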

Calculate FEPOPS similarity

Method for calculating molecular similarity based on their OpenFEPOPS descriptors. Centres and scales FEPOPS descriptors using parameters passed upon object initialisation.

Parameters:

query (Union[np.ndarray, str, None]) – A NumPy array containing the FEPOPS descriptors of the query molecule, or a SMILES string from which to generate FEPOPS descriptors for the query molecule. Can also be None, in which case np.nan is returned as the score.
candidate (Union[np.ndarray, str, None, list[np.ndarray, str, None]]) – A NumPy array containing the FEPOPS descriptors of the candidate molecule, or a SMILES string from which to generate FEPOPS descriptors for the candidate molecule. Can also be None, in which case np.nan is returned as the score, or a list of any of these. If it is a list, a list of scores against the single query is returned.

Returns:

Fepops similarity between two molecules

Return type:

float

generate_conformers(mol: Mol, random_state: int = 42) → list

Generate conformers with rotatable bonds

Generate conformers for a molecule, enumerating rotatable bonds in increments of 90 degrees. This 90-degree increment was deemed optimal in the literature.

Parameters:

mol (Chem.rdchem.Mol) – The RDKit mol object of the input molecule.
random_state (int) – Integer to use as a random state when seeding the random number generator. By default 42.

Returns:

A list containing mol objects of different conformers with different angles of rotatable bonds

Return type:

List
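
To give a feel for what enumerating rotatable bonds in 90-degree steps implies, the sketch below lists every combination of torsion angles for a given number of rotatable bonds. It is an illustration only: the helper enumerate_torsion_settings is hypothetical, and this section does not say whether the library visits every combination or samples from them.

```python
from itertools import product


def enumerate_torsion_settings(num_rotatable_bonds, step_degrees=90):
    """List every combination of torsion angles at the given increment.

    Hypothetical helper for illustration; not part of the fepops API.
    """
    angles = range(0, 360, step_degrees)  # 0, 90, 180, 270 for 90-degree steps
    return list(product(angles, repeat=num_rotatable_bonds))


settings = enumerate_torsion_settings(2)
print(len(settings))  # 16, i.e. 4 angles for each of 2 bonds
print(settings[:3])   # [(0, 0), (0, 90), (0, 180)]
```

The count grows as 4 to the power of the number of rotatable bonds, which is why very flexible molecules yield many candidate conformers.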

get_centroid_pharmacophoric_features(mol: Mol) → ndarray

Obtain centroids and their corresponding pharmacophoric features

Obtain centroids and then calculate and assign their corresponding pharmacophoric features (logP, charges, HBA, HBD, and distances between the centroids, following the pattern used for calculation of matrix determinants - in the case of 4 centroids, this is: d1-4, d1-2, d2-3, d3-4, d1-3, d2-4)

Parameters:

mol (Chem.rdchem.Mol) – The RDKit mol object of the input molecule.

Returns:

A NumPy array containing 22 pharmacophoric features for all conformers.

Return type:

np.ndarray
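
The distance ordering given above for four centroids (d1-4, d1-2, d2-3, d3-4, d1-3, d2-4) can be made concrete with a short sketch. The helper centroid_distances and the unit-square coordinates are assumptions for illustration. The closing arithmetic, 4 centroids with 4 properties each plus 6 distances, is one reading that is consistent with the 22 features mentioned above. The actual column order of the descriptor is not stated in this section.

```python
from math import dist

# 0-based index pairs for the documented order: d1-4, d1-2, d2-3, d3-4, d1-3, d2-4
DISTANCE_PAIRS = ((0, 3), (0, 1), (1, 2), (2, 3), (0, 2), (1, 3))


def centroid_distances(centroids):
    """Return the six inter-centroid distances in the documented order.

    Hypothetical helper for illustration; not part of the fepops API.
    """
    if len(centroids) != 4:
        raise ValueError("this sketch only covers the 4-centroid case")
    return [dist(centroids[i], centroids[j]) for i, j in DISTANCE_PAIRS]


# Toy centroids: corners of a unit square in the z = 0 plane.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
distances = centroid_distances(square)
print(distances)  # four sides of length 1.0, then the two diagonals

# 4 centroids x 4 properties (logP, charge, HBA, HBD) + 6 distances
num_features = 4 * 4 + len(DISTANCE_PAIRS)
print(num_features)  # 22
```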

get_fepops(mol: str | None | Mol, is_canonical: bool = False) → Tuple[GetFepopStatusCode, ndarray | None]

Get Fepops descriptors for a molecule

Parameters:

mol (Union[str, None, Chem.rdchem.Mol]) – Molecule as a SMILES string or RDKit molecule. Can also be None, in which case a failure error status is returned along with None in place of the requested FEPOPS descriptors.

Returns:

A tuple, with the first value being a GetFepopStatusCode (enum) denoting SUCCESS or FAILED_TO_GENERATE. The second tuple element is either None (if unsuccessful), or a np.ndarray containing the calculated FEPOPS descriptors of the requested input molecule.

Return type:

Tuple[GetFepopStatusCode, Union[np.ndarray, None]]
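
A caller would normally check the status code before using the descriptors. The sketch below shows one way to do that without importing the package: the enum is a local stand-in that mirrors the member names and values documented for GetFepopStatusCode, and unwrap_fepops is a hypothetical helper, not part of the fepops API.

```python
from enum import Enum


class GetFepopStatusCode(Enum):
    """Local stand-in mirroring the documented members and values."""

    SUCCESS = 1
    FAILED_TO_GENERATE = 2
    FAILED_TO_RETRIEVE = 3
    FAILED_RETRIEVED_NONE = 4


def unwrap_fepops(result):
    """Return the descriptors on SUCCESS, otherwise None.

    `result` is a (status, descriptors) tuple shaped like the documented
    return value of get_fepops. Hypothetical helper for illustration.
    """
    status, descriptors = result
    if status is GetFepopStatusCode.SUCCESS:
        return descriptors
    return None


# Hand-built tuples standing in for real get_fepops results.
ok = unwrap_fepops((GetFepopStatusCode.SUCCESS, [[0.1, 0.2]]))
failed = unwrap_fepops((GetFepopStatusCode.FAILED_TO_GENERATE, None))
print(ok)      # [[0.1, 0.2]]
print(failed)  # None
```

With the real package, the enum would be imported from fepops.fepops instead of being redefined.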

pairwise_correlation(A: ndarray, B: ndarray)

Fast method to generate pairwise correlation values (Pearson)

Parameters:

A (np.ndarray) – First features array (1D)
B (np.ndarray) – Second features array (1D)

Returns:

2D matrix containing A vs B feature correlations

Return type:

np.ndarray
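
For readers unfamiliar with the calculation, the following plain-Python sketch computes Pearson correlation for every pairing of a vector from one collection with a vector from another, giving a 2D result. It is a reference illustration under stated assumptions, not the library's vectorised implementation: the names pearson and pairwise_pearson are hypothetical, and inputs are taken to be lists of equal-length feature vectors.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # A constant vector has zero variance and raises ZeroDivisionError here.
    return cov / sqrt(var_a * var_b)


def pairwise_pearson(rows_a, rows_b):
    """Matrix whose entry [i][j] is pearson(rows_a[i], rows_b[j])."""
    return [[pearson(ra, rb) for rb in rows_b] for ra in rows_a]


matrix = pairwise_pearson([[1, 2, 3], [3, 2, 1]], [[2, 4, 6]])
print(matrix)  # [[1.0], [-1.0]]
```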

Module contents

FEPOPS module containing the OpenFEPOPS class