B_HIT.sVDJ.tl.compute_correlation

B_HIT.sVDJ.tl.compute_correlation(cloneRich, groupby_cols, corr1, corr2, save=False, path=None, compute_corr_matrix=False)

Compute the Pearson correlation between two columns of a DataFrame, separately for each group defined by the given grouping columns.

Parameters:
  • cloneRich (pd.DataFrame) – The input DataFrame containing the data.

  • groupby_cols (list of str) – List of columns to group by (e.g., ['Cregion_simple', 'tissue']).

  • corr1 (str) – The name of the first column to compute correlation for (e.g., 'BaggArea').

  • corr2 (str) – The name of the second column to compute correlation for (e.g., 'gini_index').

  • save (bool, optional) – Whether to save the output to CSV. Default is False.

  • path (str, optional) – The file path to save the output CSV. Default is None.

  • compute_corr_matrix (bool, optional) – Whether to compute and return the correlation matrix and p-value matrix. Default is False.

Returns:

pd.DataFrame

A DataFrame containing the correlation and p-value for each group.

pd.DataFrame, optional

A correlation matrix (if compute_corr_matrix is True).

pd.DataFrame, optional

A p-value matrix (if compute_corr_matrix is True).
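
The per-group computation described above can be illustrated with a minimal sketch. This is not the B_HIT implementation: grouped_pearson is a hypothetical stand-in built on pandas and scipy.stats.pearsonr, the output column names "correlation" and "p_value" are assumptions, and the toy values are made up purely for illustration. Only the parameter names and example column names come from the documentation above.

```python
import pandas as pd
from scipy.stats import pearsonr


def grouped_pearson(df, groupby_cols, corr1, corr2):
    """Hypothetical stand-in: Pearson r and p-value of corr1 vs corr2 per group."""
    rows = []
    for keys, group in df.groupby(groupby_cols):
        # A single grouping column may yield a scalar key; normalize to a tuple.
        if not isinstance(keys, tuple):
            keys = (keys,)
        # Keep only rows where both values are present.
        pair = group[[corr1, corr2]].dropna()
        # pearsonr needs at least two observations.
        if len(pair) < 2:
            continue
        r, p = pearsonr(pair[corr1], pair[corr2])
        row = dict(zip(groupby_cols, keys))
        row["correlation"] = r  # assumed output column name
        row["p_value"] = p      # assumed output column name
        rows.append(row)
    return pd.DataFrame(rows)


# Toy data (invented for illustration only).
toy = pd.DataFrame({
    "Cregion_simple": ["IGHM"] * 4 + ["IGHG"] * 4,
    "tissue": ["LN"] * 8,
    "BaggArea": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "gini_index": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
})

result = grouped_pearson(toy, ["Cregion_simple", "tissue"], "BaggArea", "gini_index")
print(result)
```

The sketch returns one row per group, matching the shape of the first return value described above. It does not cover the save, path, or compute_corr_matrix options, whose exact behavior is not specified here.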