B_HIT.st.tl.ClusterAutoK

class B_HIT.st.tl.ClusterAutoK(n_clusters, max_runs=5, convergence_tol=0.01, model_class=None)

Identify the best candidates for the number of clusters.

Attributes table

best_k

The number of clusters with the highest silhouette score.

silhouette_scores

Methods table

fit(adata[, use_rep, verbose])

Fit the clustering model with a range of cluster numbers and calculate silhouette scores.

predict(adata[, use_rep, k, store_labels, ...])

Predict cluster labels for the data in the given representation and optionally store the labels in adata.obs.

Attributes

ClusterAutoK.best_k

The number of clusters with the highest silhouette score.
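As a rough illustration of what "highest silhouette score" means here, the selection can be sketched as an argmax over the candidate cluster numbers. This is a minimal sketch, not the library's code: the helper `pick_best_k`, the candidate list, and the score values are all hypothetical.

```python
# Hypothetical inputs: candidate cluster numbers and one silhouette score
# per candidate (values invented for illustration).
n_clusters = [2, 3, 4, 5]
silhouette_scores = [0.41, 0.58, 0.52, 0.37]


def pick_best_k(candidates, scores):
    """Return the candidate with the highest score (first one wins on ties)."""
    if len(candidates) != len(scores):
        raise ValueError("need exactly one score per candidate")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index]


best_k = pick_best_k(n_clusters, silhouette_scores)
print(best_k)  # 3
```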

ClusterAutoK.silhouette_scores: ndarray

Methods

ClusterAutoK.fit(adata, use_rep='X_cellcharter', verbose=True)

Fit the clustering model with a range of cluster numbers and calculate silhouette scores.

Parameters:
  • adata (AnnData) – AnnData object containing the data to cluster.

  • use_rep (str (default: 'X_cellcharter')) – The key in adata.obsm to use for clustering.

  • verbose (bool (default: True)) – Whether to display the fitting process.
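To make "fit with a range of cluster numbers and calculate silhouette scores" concrete, the sketch below scores one labeling per candidate k with a hand-written mean silhouette coefficient and keeps the best. It is an illustration under stated assumptions: the clustering step is omitted, the labelings are hypothetical stand-ins for what a fitted model would produce, and the library may compute its scores differently.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical labelings, one per candidate k (stand-ins for fitted models).
labelings = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in labelings.items()}
best_k = max(scores, key=scores.get)
print(best_k)  # 2
```

Splitting one of the tight groups for k=3 lowers the mean silhouette, so the two-cluster labeling wins on this toy data.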

ClusterAutoK.predict(adata, use_rep=None, k=None, store_labels=False, store_column='predicted_labels')

Predict cluster labels for the data in the given representation and optionally store the labels in adata.obs.

Parameters:
  • adata (AnnData) – AnnData object containing the dataset. The data to be clustered is accessed from adata.obsm or adata.X.

  • use_rep (Optional[str] (default: None)) – The key in adata.obsm to use as the data representation for clustering. If None, the method uses adata.obsm['X_cellcharter'] if it exists and falls back to adata.X otherwise.

  • k (Optional[int] (default: None)) – The number of clusters to predict labels for. If not specified, the best number of clusters (self.best_k) will be used. Must be one of the values in self.n_clusters.

  • store_labels (bool (default: False)) – If True, the predicted labels are stored in adata.obs under the column name given by store_column.

  • store_column (str (default: 'predicted_labels')) – The name of the column in adata.obs where the predicted labels are stored when store_labels is True.

Return type:

Categorical

Returns:

A pandas Categorical (pd.Categorical) containing the predicted cluster labels. The labels are integers ranging from 0 to k-1.

Raises:

AssertionError – If k is provided and it is not in self.n_clusters.
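The rule for k described above (fall back to best_k when k is None, otherwise require membership in n_clusters) might look roughly like the sketch below. The helper `resolve_k` and the example values are hypothetical, not part of the library.

```python
def resolve_k(k, best_k, n_clusters):
    """Return the k to predict with, following the documented rule."""
    if k is None:
        return best_k  # no explicit k: use the best-scoring one
    # Mirrors the documented AssertionError for values that were never fitted.
    assert k in n_clusters, f"k={k} is not among the fitted values {sorted(n_clusters)}"
    return k


n_clusters = range(2, 6)  # hypothetical candidates: 2, 3, 4, 5
best_k = 3                # hypothetical result of a previous fit

chosen_default = resolve_k(None, best_k, n_clusters)
chosen_explicit = resolve_k(4, best_k, n_clusters)

try:
    resolve_k(9, best_k, n_clusters)
    rejected = False
except AssertionError:
    rejected = True

print(chosen_default, chosen_explicit, rejected)  # 3 4 True
```

Note that Python skips assert statements when run with the -O flag, so a check written this way is not enforced in optimized mode.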

Notes

  • This method relies on the clustering models stored in self.best_models for label prediction.

  • Ensure that the model for the desired k clusters has been fitted prior to calling this method.
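The use_rep fallback and the optional label storage described in the parameters above can be sketched with plain dictionaries standing in for adata.obsm and adata.obs. Both helpers are hypothetical and only illustrate the documented behavior; they are not the library's implementation.

```python
def resolve_representation(obsm, X, use_rep=None):
    """Pick the matrix to cluster, following the documented use_rep fallback."""
    if use_rep is not None:
        return obsm[use_rep]          # explicit key; raises KeyError if missing
    if "X_cellcharter" in obsm:
        return obsm["X_cellcharter"]  # preferred default representation
    return X                          # fallback when no such key exists


def maybe_store_labels(obs, labels, store_labels=False, store_column="predicted_labels"):
    """Write labels into obs only when requested, and always return them."""
    if store_labels:
        obs[store_column] = list(labels)
    return labels


# Plain dicts and lists stand in for adata.obsm, adata.X and adata.obs.
obsm = {"X_cellcharter": [[0.1, 0.2], [0.3, 0.4]]}
X = [[1.0, 2.0], [3.0, 4.0]]
obs = {}

features = resolve_representation(obsm, X)  # picks obsm["X_cellcharter"]
fallback = resolve_representation({}, X)    # no key available: uses X
labels = maybe_store_labels(obs, [0, 1], store_labels=True)

print(sorted(obs))  # ['predicted_labels']
```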