B_HIT.sVDJ.tl.compute_richness

B_HIT.sVDJ.tl.compute_richness(df, extra_cols, groupby_cols, richness_name, default_value=0, return_df=False)

Compute the richness of groups based on unique combinations of key columns.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing the data to compute richness.

  • extra_cols (list) – Columns that are not in groupby_cols but must also be selected (e.g. ['sample', 'family_id']).

  • groupby_cols (list) – Columns that define groups.

  • richness_name (str) – Name of the richness column.

  • default_value (int, optional (default=0)) – Value used for any group that is missing from the richness lookup.

  • return_df (bool, optional (default=False)) – If True, return the richness DataFrame instead of mapped values.

Returns:

pd.DataFrame or pd.Series – If return_df=True, returns the richness DataFrame. If return_df=False, returns a Series with richness values mapped to the rows of df.
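
The parameters above can be illustrated with a minimal pandas sketch. This is not the B_HIT implementation. It assumes that richness means the number of unique combinations of groupby_cols + extra_cols found within each group, and that default_value fills in rows whose group has no entry in the lookup. The function name richness_sketch and the columns clone_id, sample and family_id are hypothetical and used only for illustration.

```python
import pandas as pd


def richness_sketch(df, extra_cols, groupby_cols, richness_name,
                    default_value=0, return_df=False):
    """Hypothetical stand-in for compute_richness (illustration only)."""
    groupby_cols = list(groupby_cols)
    key_cols = groupby_cols + list(extra_cols)

    # Keep one row per unique combination of the key columns.
    unique_keys = df[key_cols].drop_duplicates()

    # Assumed definition: richness = number of unique combinations per group.
    richness_df = (
        unique_keys.groupby(groupby_cols)
        .size()
        .reset_index(name=richness_name)
    )
    if return_df:
        return richness_df

    # Map richness back onto every row of df; groups absent from the
    # lookup (e.g. NaN group keys) receive default_value.
    merged = df[groupby_cols].merge(richness_df, on=groupby_cols, how="left")
    values = merged[richness_name].fillna(default_value)
    values.index = df.index
    return values


# Toy data with hypothetical column names.
cells = pd.DataFrame({
    "clone_id": ["c1", "c1", "c1", "c2", "c2"],
    "sample": ["s1", "s1", "s2", "s1", "s1"],
    "family_id": ["f1", "f2", "f2", "f1", "f1"],
})

richness_table = richness_sketch(
    cells, ["sample", "family_id"], ["clone_id"], "richness", return_df=True
)
per_row = richness_sketch(
    cells, ["sample", "family_id"], ["clone_id"], "richness"
)
```

With return_df=True the sketch yields one row per group. With the default return_df=False it yields a Series aligned to the index of the input, which can be assigned directly as a new column.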