fepops package
Subpackages
- fepops.fepops_persistent package
- Submodules
- fepops.fepops_persistent.fepops_persistent_abc module
FepopsPersistentAbstractBaseClass
FepopsPersistentAbstractBaseClass.add_fepop()
FepopsPersistentAbstractBaseClass.calc_similarity()
FepopsPersistentAbstractBaseClass.fepop_exists()
FepopsPersistentAbstractBaseClass.get_cansmi_to_mol_dict_not_in_database()
FepopsPersistentAbstractBaseClass.get_fepops()
FepopsPersistentAbstractBaseClass.save_descriptors()
FepopsPersistentAbstractBaseClass.write()
- fepops.fepops_persistent.fepopsdb_json module
- fepops.fepops_persistent.fepopsdb_sqlite module
- fepops.fepops_persistent.utils module
- Module contents
Submodules
fepops.fepops module
- class fepops.fepops.GetFepopStatusCode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
- FAILED_RETRIEVED_NONE = 4
- FAILED_TO_GENERATE = 2
- FAILED_TO_RETRIEVE = 3
- SUCCESS = 1
- class fepops.fepops.OpenFEPOPS(*, kmeans_method: Literal['sklearn', 'pytorchcpu', 'pytorchgpu'] = 'sklearn', max_tautomers: int | None = 5, num_fepops_per_mol: int = 7, num_centroids_per_fepop: int = 4, descriptor_means: Tuple[float, ...] = (-0.28971602, 0.5181022, 0.37487135, 0.99922747, -0.04187301, 1.03382471, 0.27407036, 0.99853436, 0.09725517, 1.12824307, 0.23735556, 0.99882914, 0.35977538, 0.66653514, 0.41238282, 0.99902545, 5.71261449, 6.37716992, 6.47293777, 6.26134733, 6.20354385, 6.23201498), descriptor_stds: Tuple[float, ...] = (0.35110473, 1.00839329, 0.4838859, 0.02769204, 0.15418035, 0.86446056, 0.44583626, 0.0381767, 0.16095862, 0.92079483, 0.42526185, 0.03413741, 0.35756229, 1.36093993, 0.4921059, 0.0311619, 1.9668792, 2.31266486, 2.50699385, 2.41269982, 2.30018205, 2.31527129))[source]
Bases:
object
OpenFEPOPS (Feature Points) molecular similarity object
FEPOPS compares molecules using feature points; see the original publication for more information: https://doi.org/10.1021/jm049654z. In short, feature points reduce the number of points used to represent a molecule by combining atoms and their properties. FEPOPS is typically used to compare libraries of small molecules against known actives, in the hope of discovering biosimilars of the query molecules.
- Parameters:
kmeans_method (str, optional) – String literal denoting the method used for k-means calculations. May be one of “sklearn”, “pytorchcpu”, or “pytorchgpu”. If “sklearn” is passed, Scikit-learn’s k-means implementation is used. A faster implementation from the fast_pytorch_kmeans package can be used instead if PyTorch is available, running either in CPU-only mode (“pytorchcpu”) or in GPU-accelerated mode (“pytorchgpu”). Note: GPU-accelerated mode should only be used if you are stretching the capabilities in terms of feature points for large molecules. Small molecules will not benefit from GPU acceleration because of overheads. By default “sklearn”.
max_tautomers (Union[int, None], optional) – Maximum number of tautomers to generate. Internally, this implementation of FEPOPS relies on RDKit’s TautomerEnumerator to generate tautomers and, following the original FEPOPS paper, limits the number generated to 5. Unless the molecules (or macromolecules) you are working with generate massive numbers of tautomers, this may optionally be set to None, meaning that no limit is placed on tautomer generation. By default 5.
num_fepops_per_mol (int, optional) – Number of feature points to use in the representation of a molecule. Literature notes that 7 has been empirically found to be a good number of feature points for performant representations of small molecules. This might be increased if you are dealing with large and very flexible molecules. By default 7.
num_centroids_per_fepop (int, optional) – Each fepop is represented by a number of centres, into which atom properties are compressed. Literature notes that this has been empirically determined to be 4 for a performant representation of small molecules. By default 4
descriptor_means (Tuple[float, ...], optional) – Because FEPOPS descriptors need to be scaled, the DUDE diversity set has been profiled and the means collected for all contained FEPOPS. This allows centering and scaling of FEPOPS before scoring. This field contains default values for FEPOP means calculated with num_fepops_per_mol = 7, num_centroids_per_fepop = 4, and kmeans_method = ‘sklearn’. New values should be supplied if the FEPOPS object is using different values for these parameters. By default, the 22 values shown for descriptor_means in the class signature above.
descriptor_stds (Tuple[float, ...], optional) – Because FEPOPS descriptors need to be scaled, the DUDE diversity set has been profiled and the standard deviations collected for all contained FEPOPS. This allows centering and scaling of FEPOPS before scoring. This field contains default values for FEPOP standard deviations calculated with num_fepops_per_mol = 7, num_centroids_per_fepop = 4, and kmeans_method = ‘sklearn’. New values should be supplied if the FEPOPS object is using different values for these parameters. By default, the 22 values shown for descriptor_stds in the class signature above.
- Raises:
ValueError – Invalid kmeans method
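The centering and scaling that descriptor_means and descriptor_stds make possible can be sketched as an element-wise standardisation. This is a minimal illustration, assuming each descriptor value has its matching mean subtracted and is then divided by its matching standard deviation; the helper name center_and_scale is hypothetical and is not part of the fepops API.

```python
from typing import Sequence


def center_and_scale(
    descriptor: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
) -> list[float]:
    """Return (x - mean) / std for each element of one descriptor row.

    Hypothetical helper for illustration; not taken from the fepops source.
    """
    if not (len(descriptor) == len(means) == len(stds)):
        raise ValueError("descriptor, means and stds must have equal length")
    return [(x - m) / s for x, m, s in zip(descriptor, means, stds)]


# First four default means and standard deviations from the class signature.
means = (-0.28971602, 0.5181022, 0.37487135, 0.99922747)
stds = (0.35110473, 1.00839329, 0.4838859, 0.02769204)

# A value equal to its mean scales to 0; a value one standard deviation
# above its mean scales to 1.
row = (means[0], means[1] + stds[1], means[2], means[3])
scaled = center_and_scale(row, means, stds)
```

Scaling of this kind keeps the large-valued distance entries from dominating the smaller per-centroid properties when descriptors are compared.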
- calc_similarity(query: ndarray | str | None, candidate: ndarray | str | None | list[numpy.ndarray, str, None]) float [source]
Calculate FEPOPS similarity
Calculates the similarity of molecules from their OpenFEPOPS descriptors. FEPOPS descriptors are centred and scaled using the parameters passed upon object initialisation.
- Parameters:
query (Union[np.ndarray, str, None]) – A Numpy array containing the FEPOPS descriptors of the query molecule, or a SMILES string from which to generate FEPOPS descriptors for the query molecule. Can also be None, in which case np.nan is returned as the score.
candidate (Union[np.ndarray, str, None, list[np.ndarray, str, None]]) – A Numpy array containing the FEPOPS descriptors of the candidate molecule, or a SMILES string from which to generate FEPOPS descriptors for the candidate molecule. Can also be None, in which case np.nan is returned as the score, or a list of any of these. If it is a list, then a list of scores against the single query is returned.
- Returns:
FEPOPS similarity between the two molecules, or a list of similarities if candidate is a list
- Return type:
float
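The input handling described above (None gives np.nan, a list of candidates gives a list of scores) can be sketched without the library. This is only an illustration of the documented contract, not the fepops implementation: score_with_none_handling and toy_score are hypothetical names, descriptors are plain tuples rather than Numpy arrays, and math.nan stands in for np.nan.

```python
import math
from typing import Callable, Optional, Sequence, Union

Descriptor = Sequence[float]


def score_with_none_handling(
    query: Optional[Descriptor],
    candidate: Union[None, Descriptor, list],
    score_fn: Callable[[Descriptor, Descriptor], float],
) -> Union[float, list]:
    """Apply score_fn while following the documented None and list rules."""
    # A list of candidates yields one score per candidate against the query.
    if isinstance(candidate, list):
        return [score_with_none_handling(query, c, score_fn) for c in candidate]
    # A missing query or candidate yields NaN instead of raising.
    if query is None or candidate is None:
        return math.nan
    return score_fn(query, candidate)


def toy_score(q: Descriptor, c: Descriptor) -> float:
    """Placeholder score (assumption): 1.0 for identical inputs, lower otherwise."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(q, c)))


scores = score_with_none_handling((1.0, 2.0), [(1.0, 2.0), None], toy_score)
missing_query = score_with_none_handling(None, (1.0, 2.0), toy_score)
```

Returning NaN for missing inputs lets a screening run continue past molecules whose descriptors could not be generated, with the failures still visible in the output.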
- generate_conformers(mol: Mol, random_state: int = 42) list [source]
Generate conformers with rotatable bonds
Generate conformers for a molecule by enumerating rotatable bonds in 90-degree increments. This 90-degree increment was deemed optimal in the literature.
- Parameters:
mol (Chem.rdchem.Mol) – The RDKit mol object of the input molecule.
random_state (int) – Integer to use as a random state when seeding the random number generator. By default 42.
- Returns:
A list containing mol objects of the different conformers, each with different rotatable bond angles
- Return type:
list
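To see how a 90-degree increment affects the size of the conformer search, the sketch below lists every combination of torsion angles for a given number of rotatable bonds. It assumes the four angles 0, 90, 180 and 270 degrees and a full grid over all bonds. Whether the library caps, prunes or samples this grid is not stated here, and torsion_grid is a hypothetical helper, not a fepops function.

```python
from itertools import product

# Assumed torsion angles for a 90-degree increment.
ANGLES = (0, 90, 180, 270)


def torsion_grid(num_rotatable_bonds: int) -> list[tuple[int, ...]]:
    """Return every assignment of one angle to each rotatable bond."""
    return list(product(ANGLES, repeat=num_rotatable_bonds))


# Three rotatable bonds give 4 ** 3 = 64 angle combinations.
grid = torsion_grid(3)
```

The grid grows as 4 to the power of the number of rotatable bonds, which is why very flexible molecules are the expensive case.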
- get_centroid_pharmacophoric_features(mol: Mol) ndarray [source]
Obtain centroids and their corresponding pharmacophoric features
Obtain centroids, then calculate and assign their corresponding pharmacophoric features (logP, charges, HBA, HBD, and the distances between the centroids). The distances follow the pattern used for the calculation of matrix determinants; in the case of 4 centroids, this is: d1-4, d1-2, d2-3, d3-4, d1-3, d2-4.
- Parameters:
mol (Chem.rdchem.Mol) – The RDKit mol object of the input molecule.
- Returns:
A Numpy array containing 22 pharmacophoric features for all conformers.
- Return type:
np.ndarray
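The distance ordering documented for four centroids can be made concrete with a small sketch. It assumes Euclidean distances between 3D centroid coordinates, and centroid_distances is a hypothetical helper rather than the library's own code. The count of 22 features matches four per-centroid properties (logP, charge, HBA, HBD) for each of 4 centroids plus 6 pairwise distances.

```python
import math

# Centroid index pairs (1-based) in the order documented for four centroids:
# d1-4, d1-2, d2-3, d3-4, d1-3, d2-4.
DISTANCE_PAIRS = ((1, 4), (1, 2), (2, 3), (3, 4), (1, 3), (2, 4))


def centroid_distances(centroids):
    """Return the six inter-centroid distances in the documented order."""
    if len(centroids) != 4:
        raise ValueError("this sketch only covers the four-centroid case")
    return [
        math.dist(centroids[i - 1], centroids[j - 1]) for i, j in DISTANCE_PAIRS
    ]


# Four centroids on the corners of a unit square in the z = 0 plane.
centroids = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
distances = centroid_distances(centroids)

# 4 properties x 4 centroids + 6 distances = 22 features per FEPOP.
num_features = 4 * 4 + len(DISTANCE_PAIRS)
```

For the unit square, the first four distances are the sides and the last two are the diagonals.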
- get_fepops(mol: str | None | Mol, is_canonical: bool = False) Tuple[GetFepopStatusCode, ndarray | None] [source]
Get Fepops descriptors for a molecule
- Parameters:
mol (Union[str, None, Chem.rdchem.Mol]) – Molecule as a SMILES string or RDKit molecule. Can also be None, in which case a failure error status is returned along with None in place of the requested Fepops descriptors.
- Returns:
Returns a tuple, with the first value being a GetFepopStatusCode (enum) denoting SUCCESS or FAILED_TO_GENERATE. The second tuple element is either None (if unsuccessful), or a np.ndarray containing the calculated Fepops descriptors of the requested input molecule.
- Return type:
Tuple[GetFepopStatusCode, Union[np.ndarray, None]]
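Because get_fepops returns a status code alongside the descriptors, callers usually separate successes from failures before scoring. The sketch below uses a local mirror of the documented enum values so that it runs without the package installed. The results dictionary is made-up illustrative data and split_results is a hypothetical helper.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class GetFepopStatusCode(Enum):
    """Local mirror of the documented enum values, for illustration only."""

    SUCCESS = 1
    FAILED_TO_GENERATE = 2
    FAILED_TO_RETRIEVE = 3
    FAILED_RETRIEVED_NONE = 4


Result = Tuple[GetFepopStatusCode, Optional[Sequence[float]]]


def split_results(results: dict[str, Result]):
    """Separate usable descriptors from failures, keeping the failure status."""
    ok, failed = {}, {}
    for smiles, (status, descriptors) in results.items():
        if status is GetFepopStatusCode.SUCCESS and descriptors is not None:
            ok[smiles] = descriptors
        else:
            failed[smiles] = status
    return ok, failed


# Made-up results shaped like the (status, descriptors) tuples described above.
results = {
    "CCO": (GetFepopStatusCode.SUCCESS, (0.1, 0.2)),
    "not-a-smiles": (GetFepopStatusCode.FAILED_TO_GENERATE, None),
}
ok, failed = split_results(results)
```

Checking the status rather than only testing for None keeps the reason for each failure available for logging.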
Module contents
FEPOPS module containing the OpenFEPOPS class