B_HIT.st.tl.ClusterAutoK#

class B_HIT.st.tl.ClusterAutoK(n_clusters, max_runs=5, convergence_tol=0.01, model_class=None)#: Identify the best candidates for the number of clusters.

Attributes table#

`best_k`	The number of clusters with the highest silhouette score.
`silhouette_scores`

Methods table#

`fit`(adata[, use_rep, verbose])	Fit the clustering model with a range of cluster numbers and calculate silhouette scores.
`predict`(adata[, use_rep, k, store_labels, ...])	Predict cluster labels for the data in the given representation and optionally store the labels in `adata.obs`.

ClusterAutoK.best_k#: The number of clusters with the highest silhouette score.

ClusterAutoK.fit(adata, use_rep='X_cellcharter', verbose=True)#

Fit the clustering model with a range of cluster numbers and calculate silhouette scores.

Parameters:

adata (AnnData) – AnnData object containing the data to cluster.
use_rep (str (default: 'X_cellcharter')) – The key in adata.obsm to use for clustering.
verbose (bool (default: True)) – Whether to display the progress of the fitting process.
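
The selection rule described above (score each candidate number of clusters by its silhouette score, then keep the highest) can be sketched in plain Python. This is only an illustration under stated assumptions, not B_HIT's implementation: the helper `silhouette_score`, the toy 1-D points, and the hand-written labelings per k are all hypothetical, and the real class fits clustering models on `adata.obsm[use_rep]` rather than taking labels as input.

```python
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points (illustrative helper, not part of B_HIT)."""
    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Common convention: a point in a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point itself contributes a distance of 0 to the sum).
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(abs(p - q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data with three well-separated groups (assumed for illustration).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]

# Hypothetical labelings for each candidate number of clusters.
candidate_labels = {
    2: [0, 0, 0, 1, 1, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 0, 1, 1, 1, 2, 2, 3],
}

# One silhouette score per candidate k, then pick the k with the highest score.
silhouette_scores = {k: silhouette_score(points, labs) for k, labs in candidate_labels.items()}
best_k = max(silhouette_scores, key=silhouette_scores.get)
```

Here the three-cluster labeling matches the structure of the data, so it receives the highest score and is selected as `best_k`.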

ClusterAutoK.predict(adata, use_rep=None, k=None, store_labels=False, store_column='predicted_labels')#

Predict cluster labels for the data in the given representation and optionally store the labels in adata.obs.

Parameters:

adata (AnnData) – AnnData object containing the dataset. The data to be clustered is accessed from adata.obsm or adata.X.
use_rep (Optional[str] (default: None)) – The key in adata.obsm to use as the data representation for clustering. If None, the method uses adata.obsm['X_cellcharter'] if it exists and falls back to adata.X otherwise.
k (Optional[int] (default: None)) – The number of clusters to predict labels for. If not specified, the best number of clusters (self.best_k) will be used. Must be one of the values in self.n_clusters.
store_labels (bool (default: False)) – If True, the predicted labels are stored in adata.obs under the column name specified by store_column.
store_column (str (default: 'predicted_labels')) – The name of the column in adata.obs where the predicted labels are stored if store_labels is True.

Return type:

Categorical

Returns:

A pandas Categorical object (pd.Categorical) containing the predicted cluster labels. The labels are integers ranging from 0 to k-1.

Raises:

AssertionError – If k is provided and it is not in self.n_clusters.

Notes

This method relies on the clustering models stored in self.best_models for label prediction.
Ensure that the model for the desired k clusters has been fitted prior to calling this method.
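
The documented defaults for use_rep and k can be summarized with a small sketch. It assumes nothing about B_HIT's internals: `resolve_representation` and `resolve_k` are hypothetical helpers written only to mirror the behavior described in the parameter list, the list of candidate values stands in for `self.n_clusters`, and a `SimpleNamespace` stands in for an AnnData object.

```python
from types import SimpleNamespace


def resolve_representation(adata, use_rep=None):
    """Hypothetical helper mirroring the documented use_rep fallback."""
    if use_rep is not None:
        # An explicit key always takes precedence.
        return adata.obsm[use_rep]
    if "X_cellcharter" in adata.obsm:
        return adata.obsm["X_cellcharter"]
    # Fallback when no 'X_cellcharter' representation exists.
    return adata.X


def resolve_k(k, n_clusters, best_k):
    """Hypothetical helper: default to best_k and reject values that were not fitted."""
    if k is None:
        return best_k
    assert k in n_clusters, f"k={k} is not one of the fitted values {list(n_clusters)}"
    return k


# Stand-in for an AnnData object (only the attributes used here; assumed for illustration).
adata = SimpleNamespace(X=[[1.0], [2.0]], obsm={"X_pca": [[0.1], [0.2]]})

fitted_ks = [2, 3, 4, 5]  # assumed candidate values, standing in for self.n_clusters

rep_default = resolve_representation(adata)            # no 'X_cellcharter' key, so adata.X is used
rep_explicit = resolve_representation(adata, "X_pca")  # explicit key is used as given
k_default = resolve_k(None, fitted_ks, best_k=3)       # k omitted, so best_k is used
k_explicit = resolve_k(4, fitted_ks, best_k=3)         # k given and among the fitted values
```

Calling `resolve_k(9, fitted_ks, best_k=3)` would raise an AssertionError, which corresponds to the condition listed under Raises.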