fepops.fepops_persistent package
Submodules
fepops.fepops_persistent.fepops_persistent_abc module
- class fepops.fepops_persistent.fepops_persistent_abc.FepopsPersistentAbstractBaseClass(database_file: str | Path, kmeans_method: str = 'sklearn', parallel: bool = True, n_jobs: int = -1)[source]
Bases:
object
Abstract base class for persistent fepops storage
New storage backends may be implemented, as demonstrated in fepopsdb_json.py and fepopsdb_sqlite.py, by extending this abstract base class, which provides some required functionality:
- save_descriptors(smiles: Union[str, Path, list[str]]) saves a SMILES file or a list of SMILES to the persistent storage.
- get_cansmi_to_mol_dict_not_in_database(smiles: Union[str, Path, list[str]]) returns a dictionary of unique molecules not already stored in the database, with canonical SMILES as keys and RDKit mol objects as values.
When writing your own persistent storage methods, you must override the following methods:
add_fepop(rdkit_canonical_smiles: str, fepops: np.ndarray)
Add the fepop to persistent storage. super().add_fepop may be called by the overridden function to perform type checks on arguments.
fepop_exists(rdkit_canonical_smiles: str)
Return True if the canonical smiles is already in the database, and False if not. super().fepop_exists may be called by the overridden function to perform type checks on arguments.
get_fepops(rdkit_canonical_smiles: str)
Return a fepop from persistent storage. If it does not exist, generate it by calling self.fepops_object.get_fepops, which is supplied by this base class. super().get_fepops may be called by the overridden function to perform type checks on arguments. Having this function in place gives interface compatibility with a standard Fepops object.
Inheriting classes may also define __enter__ and __exit__ methods for use as context managers. If none are defined, empty ones are provided. This is useful for tasks such as writing out large files after descriptor generation when incremental writes are not possible, as is the case for the FepopsDBJSON child class.
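The override pattern described above can be sketched as follows. This is a minimal illustration only: PersistentBase is a hypothetical stand-in for the real base class (so the snippet runs without fepops installed), InMemoryStore is an invented toy backend, and plain Python lists stand in for NumPy arrays. Unlike the real get_fepops, the toy version does not generate missing descriptors.

```python
from abc import ABC, abstractmethod


class PersistentBase(ABC):
    """Hypothetical stand-in for FepopsPersistentAbstractBaseClass."""

    def __init__(self, database_file):
        self.database_file = database_file

    @staticmethod
    def _check_smiles(rdkit_canonical_smiles):
        if not isinstance(rdkit_canonical_smiles, str):
            raise TypeError("rdkit_canonical_smiles must be a str")

    # Abstract methods may still carry a body; subclasses reach it
    # through super() to reuse the argument checks.
    @abstractmethod
    def add_fepop(self, rdkit_canonical_smiles, fepops):
        self._check_smiles(rdkit_canonical_smiles)

    @abstractmethod
    def fepop_exists(self, rdkit_canonical_smiles):
        self._check_smiles(rdkit_canonical_smiles)

    @abstractmethod
    def get_fepops(self, rdkit_canonical_smiles):
        self._check_smiles(rdkit_canonical_smiles)

    # Empty context-manager hooks for subclasses that define none.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return None


class InMemoryStore(PersistentBase):
    """Toy backend that keeps descriptors in a dict (illustration only)."""

    def __init__(self, database_file):
        super().__init__(database_file)
        self._records = {}

    def add_fepop(self, rdkit_canonical_smiles, fepops):
        super().add_fepop(rdkit_canonical_smiles, fepops)
        self._records[rdkit_canonical_smiles] = fepops

    def fepop_exists(self, rdkit_canonical_smiles):
        super().fepop_exists(rdkit_canonical_smiles)
        return rdkit_canonical_smiles in self._records

    def get_fepops(self, rdkit_canonical_smiles):
        super().get_fepops(rdkit_canonical_smiles)
        # The real method would generate a missing entry; this toy does not.
        return self._records.get(rdkit_canonical_smiles)
```

Because the stand-in base supplies empty __enter__ and __exit__ methods, the toy store can be used in a with block without defining its own.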
- Parameters:
database_file (Union[str, Path]) – File to use for persistent storage.
kmeans_method (str, optional) – Method which should be used for kmeans calculation by fepops objects; one of “sklearn”, “pytorchgpu”, or “pytorchcpu”.
parallel (bool, optional) – Run in parallel (using joblib), by default True
n_jobs (int, optional) – Number of jobs to be spawned with joblib. If -1, then use all available cores. By default -1
- abstract add_fepop(rdkit_canonical_smiles: str, fepops: ndarray)[source]
Add canonical smiles and fepop to database. Must be overridden
This abstractmethod must be overridden by the inheriting object, but provides some functionality for sanity checking input and may be called by the inheriting class.
- calc_similarity(fepops_features_1: ndarray | str | None, fepops_features_2: ndarray | str | None, is_canonical=True)[source]
Calculate FEPOPS similarity
A static method for calculating molecular similarity based on their FEPOPS descriptors.
- Parameters:
fepops_features_1 (Union[np.ndarray, str, None]) – A Numpy array containing the FEPOPS descriptors of the query molecule or a smiles string from which to generate FEPOPS descriptors for the query molecule.
fepops_features_2 (Union[np.ndarray, str, None, list[np.ndarray, str, None]]) – A Numpy array containing the FEPOPS descriptors of the candidate molecule or a smiles string from which to generate FEPOPS descriptors for the candidate molecule. Can also be None, in which case np.nan is returned as a score, or a list of any of these. If it is a list, then a list of scores, one per candidate against the single query, is returned.
- Returns:
Fepops similarity between two molecules
- Return type:
float
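The handling of None and list inputs described above might look like the sketch below. It does not reproduce the FEPOPS scoring itself: score_pair is a hypothetical callable supplied by the caller, and returning NaN when the query is None is an assumption (the documentation only states this for the candidate).

```python
import math


def similarity_dispatch(query, candidates, score_pair):
    """Score one query against a single candidate or a list of candidates.

    score_pair is a hypothetical callable (query, candidate) -> float.
    Descriptors are assumed not to be Python lists (the library uses
    NumPy arrays), so a list always means "several candidates".
    """

    def score_one(candidate):
        # A missing descriptor yields NaN rather than raising.
        if query is None or candidate is None:
            return math.nan
        return score_pair(query, candidate)

    if isinstance(candidates, list):
        return [score_one(c) for c in candidates]
    return score_one(candidates)
```

Returning NaN for failed molecules keeps the output list aligned with the input list, so scores can be matched back to candidates by position.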
- abstract fepop_exists(rdkit_canonical_smiles: str) bool [source]
Return True if canonical smiles already exist in the database
This abstractmethod must be overridden by the inheriting object, but provides some functionality for sanity checking input and may be called by the inheriting class.
- get_cansmi_to_mol_dict_not_in_database(smiles: str | Path | list[str], smiles_guaranteed_rdkit_canonical: bool = False)[source]
Get smiles to mol dict for smiles not in the database
- Parameters:
smiles (Union[str, Path, list[str]]) – If a string is passed, it is assumed to be the path of a SMILES file, and this file is loaded for processing. Similarly, Path objects are assumed to point at SMILES files for processing. To pass SMILES strings to this function, wrap them in a list: a single SMILES string becomes a one-element list, and large multi-SMILES lists are operated upon directly
smiles_guaranteed_rdkit_canonical (bool, optional) – If the supplied SMILES are canonical RDKit-generated SMILES, then regeneration of these SMILES strings for uniquification and database lookup may be skipped, by default False
- Returns:
Dictionary with SMILES as keys and RDKit molecules as the values for molecules not present in the current database
- Return type:
dict
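The input convention above (a str or Path names a file, a list holds SMILES directly) can be sketched as below. Several parts are assumptions made for illustration: the file is taken to hold one SMILES as the first whitespace-separated token per line, canonicalise is a hypothetical callable standing in for RDKit canonicalisation, known stands in for the database lookup, and the original string is stored where the library would store an RDKit mol.

```python
from pathlib import Path


def smiles_not_in_database(smiles, known, canonicalise=lambda s: s):
    """Return unique canonical SMILES from `smiles` that are absent from `known`."""
    if isinstance(smiles, (str, Path)):
        # A bare string is a file path, never a SMILES string.
        lines = Path(smiles).read_text().splitlines()
        entries = [line.split()[0] for line in lines if line.strip()]
    else:
        entries = list(smiles)

    unique = {}
    for entry in entries:
        key = canonicalise(entry)
        # Skip unparsable input, known molecules, and repeats.
        if key is not None and key not in known and key not in unique:
            unique[key] = entry  # the library stores an RDKit mol here
    return unique
```

Passing a single SMILES as a bare string would be read as a file path, which is why the documentation asks for a one-element list.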
- abstract get_fepops(smiles: str | Mol | ndarray, is_canonical: bool = True) None [source]
Get a FEPOP from the database using its SMILES. Must be overridden
This abstractmethod must be overridden by the inheriting object, but provides some functionality for sanity checking input and may be called by the inheriting class.
- Parameters:
smiles (Union[str, Chem.rdchem.Mol, np.ndarray]) – The molecule to look up, supplied as one of the types listed in the signature
is_canonical (bool, optional) – If True, then the supplied SMILES are guaranteed to be canonical SMILES generated by RDKit which allows skipping of a sanitisation step, by default True
- save_descriptors(smiles: str | Path | list[str], add_failures_to_database: bool = True, smiles_guaranteed_rdkit_canonical: bool = False, fepops_object_constructor_kwargs: dict = {})[source]
Pregenerate FEPOPS descriptors for a set of SMILES strings
- Parameters:
smiles (Union[str, Path, list[str]]) – Path to a SMILES file (as a string or Path) which should be read in, or a list of SMILES strings; each molecule within is added to the database
add_failures_to_database (bool) – If True, then a record is kept in the database for problematic SMILES for which FEPOPS descriptor generation failed, by default True
smiles_guaranteed_rdkit_canonical (bool) – If True, then the supplied SMILES are guaranteed to be canonical SMILES generated by RDKit which allows skipping of a sanitisation step, by default False
fepops_object_constructor_kwargs (dict) – Dictionary of kwargs which will be passed to the FEPOPS object upon initialisation, by default {}
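The effect of add_failures_to_database can be illustrated with a short sketch. It is not the library's implementation: store is any dict-like mapping, and generate is a hypothetical callable that returns descriptors, or None when generation fails.

```python
def save_all(store, smiles_list, generate, add_failures_to_database=True):
    """Generate and store descriptors for SMILES not yet in `store`."""
    for smi in smiles_list:
        if smi in store:
            continue  # already cached, including cached failures
        descriptors = generate(smi)
        if descriptors is None and not add_failures_to_database:
            continue  # leave no record, so this SMILES is retried next time
        store[smi] = descriptors
    return store
```

Recording failures as None means a second pass over the same input skips the problematic molecules instead of spending time regenerating them.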
fepops.fepops_persistent.fepopsdb_json module
- class fepops.fepops_persistent.fepopsdb_json.FepopsDBJSON(database_file: str | Path, kmeans_method: str = 'sklearn', parallel: bool = True, n_jobs: int = -1)[source]
Bases:
FepopsPersistentAbstractBaseClass
FepopsDBJSON - allows reading and writing to a simple JSON style cache
- add_fepop(rdkit_canonical_smiles: str, fepops: ndarray | None)[source]
Add a FEPOP to the database using the supplied SMILES as a key
- Parameters:
rdkit_canonical_smiles (str) – Canonical SMILES string generated by RDKit which represents the molecule used to generate the FEPOPS
fepops (Union[np.ndarray, None]) – Array containing calculated FEPOPS descriptors. If None, then None is stored in the database, which is useful for indicating that the canonical SMILES supplied did not succeed in generating a molecule and subsequent FEPOPS. Marking these difficult SMILES in the database means they can be checked and ignored without further time being spent to regenerate them again.
- fepop_exists(rdkit_canonical_smiles: str) bool [source]
Check if Fepop exists in the database
If the fepops object was constructed with a database file, then query if the supplied canonical SMILES is included. If no database is present, then False is returned, as if it is not included.
- Parameters:
rdkit_canonical_smiles (str) – Canonical smiles to check
- Returns:
True if the canonical smiles exists in the database
- Return type:
bool
- get_fepops(smiles: str, is_canonical: bool = False) ndarray | None [source]
Get FEPOPS from the database for a given SMILES
- Parameters:
smiles (str) – The SMILES string of the molecule
is_canonical (bool, optional) – If True, the supplied SMILES string is guaranteed to be canonical and generated by RDKit, in which case a cleaning step may be skipped, by default False
- Returns:
Returns an array representing the retrieved FEPOPS, or None if None was stored in the database under the supplied SMILES key
- Return type:
Union[np.ndarray, None]
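As a rough illustration of a JSON-style cache that is written out in one go when a with block ends, consider the sketch below. JsonCache is a hypothetical toy class, not the FepopsDBJSON implementation, and descriptors are plain nested lists here (a NumPy array would first need converting, for example with tolist()).

```python
import json
from pathlib import Path


class JsonCache:
    """Toy JSON-backed cache keyed by canonical SMILES (illustration only)."""

    def __init__(self, database_file):
        self.path = Path(database_file)
        self.records = {}
        if self.path.exists():
            self.records = json.loads(self.path.read_text())

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # A JSON document cannot be appended to incrementally,
        # so the whole mapping is written when the block ends.
        self.path.write_text(json.dumps(self.records))

    def add(self, smiles, descriptors):
        # None serialises to JSON null and marks a failed molecule.
        self.records[smiles] = descriptors
```

Nothing touches the disk until the with block exits, which is the trade-off of a format that does not support incremental writes.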
fepops.fepops_persistent.fepopsdb_sqlite module
- class fepops.fepops_persistent.fepopsdb_sqlite.FepopsDBSqlite(database_file: str | Path, kmeans_method: str = 'sklearn', parallel: bool = True, n_jobs: int = -1)[source]
Bases:
FepopsPersistentAbstractBaseClass
FepopsDBSqlite - allows reading and writing to a sqlite cache/database
- add_fepop(rdkit_canonical_smiles: str, fepops: ndarray | None)[source]
Add a FEPOP to the database using the supplied SMILES as a key
- Parameters:
rdkit_canonical_smiles (str) – Canonical SMILES string generated by RDKit which represents the molecule used to generate the FEPOPS
fepops (Union[np.ndarray, None]) – Array containing calculated FEPOPS descriptors. If None, then None is stored in the database, which is useful for indicating that the canonical SMILES supplied did not succeed in generating a molecule and subsequent FEPOPS. Marking these difficult SMILES in the database means they can be checked and ignored without further time being spent to regenerate them again.
- fepop_exists(rdkit_canonical_smiles: str) bool [source]
Check if Fepop exists in the database
If the fepops object was constructed with a database file, then query if the supplied canonical SMILES is included. If no database is present, then False is returned, as if it is not included.
- Parameters:
rdkit_canonical_smiles (str) – Canonical smiles to check
- Returns:
True if supplied canonical smiles exists in the database
- Return type:
bool
- get_fepops(smiles, is_canonical=False) ndarray | None [source]
Get FEPOPS from the database for a given SMILES
- Parameters:
smiles (str) – The SMILES string of the molecule
is_canonical (bool, optional) – If True, the supplied SMILES string is guaranteed to be canonical and generated by RDKit, in which case a cleaning step may be skipped, by default False
- Returns:
Returns an array representing the retrieved FEPOPS, or None if None was stored in the database under the supplied SMILES key
- Return type:
Union[np.ndarray, None]
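The distinction between a SMILES that is absent and one stored with None can be sketched with the standard-library sqlite3 module. The table name, column names, and JSON text encoding below are assumptions chosen for illustration; the actual schema used by FepopsDBSqlite is not documented here.

```python
import json
import sqlite3


class SqliteCache:
    """Toy SQLite-backed cache keyed by canonical SMILES (illustration only)."""

    def __init__(self, database_file):
        self.con = sqlite3.connect(database_file)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS fepops "
            "(cansmi TEXT PRIMARY KEY, descriptors TEXT)"
        )

    def add(self, cansmi, descriptors):
        # A failed molecule is stored as SQL NULL.
        payload = None if descriptors is None else json.dumps(descriptors)
        with self.con:  # commits on success, rolls back on error
            self.con.execute(
                "INSERT OR REPLACE INTO fepops VALUES (?, ?)",
                (cansmi, payload),
            )

    def exists(self, cansmi):
        row = self.con.execute(
            "SELECT 1 FROM fepops WHERE cansmi = ?", (cansmi,)
        ).fetchone()
        return row is not None

    def get(self, cansmi):
        row = self.con.execute(
            "SELECT descriptors FROM fepops WHERE cansmi = ?", (cansmi,)
        ).fetchone()
        if row is None:
            raise KeyError(cansmi)
        return None if row[0] is None else json.loads(row[0])
```

Unlike the JSON approach, each add is committed immediately, so no context manager is needed to flush results.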
fepops.fepops_persistent.utils module
- fepops.fepops_persistent.utils.get_persistent_fepops_storage_object(database_file: str | Path, kmeans_method: str = 'sklearn', parallel: bool = True, n_jobs: int = -1) FepopsDBSqlite | FepopsDBJSON [source]
Module contents
fepops_persistent module contains functionality to cache/save FEPOPS descriptors to a file