API Reference
Core Functions
Preprocessing
- cellcommunicationpf2.import_data.prepare_dataset(X: anndata.AnnData, condition_name: str, geneThreshold: float, normalize: bool = False)
Preprocess single-cell RNA-seq data for CCC-RISE analysis.
This function performs essential preprocessing steps: cell and gene filtering, optional normalization and log transformation (when normalize is True), and creation of the condition indices required for CCC-RISE decomposition.
- Parameters:
X (anndata.AnnData) – AnnData object containing raw count data in sparse matrix format.
condition_name (str) – Name of the column in X.obs that specifies experimental conditions for each cell.
geneThreshold (float) – Minimum mean expression threshold for gene filtering. Genes with mean expression below this value are removed.
normalize (bool, optional) – If True, performs normalization and log transformation. If False, keeps raw counts.
- Returns:
Preprocessed AnnData object with filtered cells/genes, an added condition_unique_idxs column in X.obs, and gene means in X.var["means"].
- Return type:
anndata.AnnData
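The gene-filtering rule described for geneThreshold can be illustrated without the package. The following is a minimal plain-Python sketch, not the function's actual implementation: the count matrix, gene names, and threshold are hypothetical, and the real function operates on a sparse AnnData matrix.

```python
# Illustrative only: plain-Python stand-in for mean-based gene filtering.
# Rows are cells, columns are genes; all names and values are hypothetical.
counts = [
    [0, 4, 1],
    [0, 6, 0],
    [1, 2, 0],
]
gene_names = ["GENE_A", "GENE_B", "GENE_C"]
gene_threshold = 0.5

n_cells = len(counts)
# Mean expression of each gene across all cells.
gene_means = [
    sum(row[g] for row in counts) / n_cells for g in range(len(gene_names))
]

# Genes with a mean below the threshold are removed; the rest are kept.
kept = [g for g, mean in enumerate(gene_means) if mean >= gene_threshold]
filtered_counts = [[row[g] for g in kept] for row in counts]
kept_names = [gene_names[g] for g in kept]
```

Only GENE_B survives here, because the other two genes average one count across three cells.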
- cellcommunicationpf2.import_data.import_ligand_receptor_pairs(filename: str = '/opt/andrew/ccc/Human-2020-Jin-LR-pairs.csv.zst', update_interaction_names: bool = True)
Import ligand-receptor pairs from a compressed CSV file with caching.
Loads a curated database of ligand-receptor pairs from CellChat. The function uses LRU caching to avoid repeated file reads. For protein complexes, subunits are separated by & (e.g., CD74&CD44).
- Parameters:
- Returns:
DataFrame with ligand and receptor columns containing gene names.
- Return type:
pd.DataFrame
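The two ideas in this entry, LRU caching and the & separator for complex subunits, can be sketched with the standard library alone. The helper name split_complex is hypothetical and is not part of the package; it caches a pure string operation rather than a file read, purely to show the mechanism.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def split_complex(name: str, sep: str = "&") -> tuple[str, ...]:
    """Split a complex such as 'CD74&CD44' into its subunit gene names.

    Hypothetical helper for illustration; returns a tuple so the cached
    value cannot be mutated by callers.
    """
    return tuple(part.strip() for part in name.split(sep))


subunits = split_complex("CD74&CD44")  # first call: computed
single = split_complex("TGFB1")        # a name without a separator
split_complex("CD74&CD44")             # repeated call: served from the cache
cache_hits = split_complex.cache_info().hits
```

Because lru_cache keys on the call arguments, a repeated call with the same arguments skips the work entirely, which is the same reason a cached loader avoids re-reading its file.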
- cellcommunicationpf2.import_data.add_cond_idxs(X: anndata.AnnData, condition_key: str)
Add unique 0-indexed condition identifiers to an AnnData object.
Creates a new column condition_unique_idxs in X.obs that maps condition labels to 0-based integer indices. This is required for proper data partitioning in the PARAFAC2 decomposition.
- Parameters:
X (anndata.AnnData) – AnnData object to add condition indices to.
condition_key (str) – Column name in X.obs containing condition identifiers.
- Returns:
AnnData object with added condition_unique_idxs column in X.obs.
- Return type:
anndata.AnnData
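The label-to-index mapping can be pictured as follows. This is a sketch under one stated assumption: indices are assigned in sorted label order, which the reference above does not specify. The condition labels are made up.

```python
# Hypothetical per-cell condition labels, as they might appear in X.obs.
conditions = ["ctrl", "treated", "ctrl", "day7", "treated"]

# Assumption: indices follow sorted label order; the package may order
# conditions differently (for example by first appearance).
label_to_idx = {label: i for i, label in enumerate(sorted(set(conditions)))}

# One 0-based integer per cell, suitable for partitioning cells by condition.
condition_unique_idxs = [label_to_idx[c] for c in conditions]
```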
Factorization
- cellcommunicationpf2.tensor.run_ccc_rise_workflow(adata: anndata.AnnData, rise_rank: int, lr_pairs: pd.DataFrame, cp_rank: int = None, condition_column: str = 'sample', n_iter_max: int = 100, tol: float = 1e-3, random_state: int = None, complex_sep: str = None, doEmbedding: bool = True, svd_init: str = 'svd')
Execute the complete CCC-RISE workflow including RISE decomposition, CPD factorization, and result storage.
This is the main function for performing CCC-RISE analysis. It executes both the RISE (PARAFAC2) decomposition of expression data and the CPD decomposition of the resulting interaction tensor. The function decomposes cell-cell communication into four interpretable factor matrices representing conditions, sender cells, receiver cells, and ligand-receptor pairs.
- Parameters:
adata (anndata.AnnData) – AnnData object with preprocessed scRNA-seq data. Must have condition_unique_idxs in adata.obs.
rise_rank (int) – Number of PARAFAC2 components to extract from expression data. Typically chosen based on FMS and R²X analysis.
lr_pairs (pd.DataFrame) – DataFrame of ligand-receptor pairs with 'ligand' and 'receptor' columns.
cp_rank (int, optional) – Number of CPD components for factorizing the interaction tensor. If None, defaults to rise_rank.
condition_column (str, optional) – Column name in adata.obs containing condition identifiers.
n_iter_max (int, optional) – Maximum iterations for decomposition.
tol (float, optional) – Convergence tolerance for optimization.
random_state (int, optional) – Random seed for reproducibility of decomposition.
complex_sep (str, optional) – Separator for protein complexes in L-R pairs (typically "&").
doEmbedding (bool, optional) – If True, automatically computes PaCMAP embeddings for visualization and stores them in adata.obsm["PaCMAP"].
svd_init (str, optional) – Initialization method for CPD ('svd' or 'random').
- Returns:
Tuple containing the updated AnnData object with stored results and the R²X value (variance explained).
- Return type:
tuple[anndata.AnnData, float]
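To make the four-factor structure and the R²X value concrete, here is a small plain-Python sketch of a rank-1 CP model and a variance-explained calculation. It assumes the common definition R²X = 1 - ||X - X̂||² / ||X||²; the reference above does not state the exact formula the package uses, and the factor values and mode ordering below are hypothetical.

```python
from itertools import product


def cp_reconstruct(A, B, C, D):
    """Rebuild a 4-way tensor (dict keyed by index tuples) from CP factors."""
    rank = len(A[0])
    shape = (len(A), len(B), len(C), len(D))
    return {
        (i, j, k, m): sum(
            A[i][r] * B[j][r] * C[k][r] * D[m][r] for r in range(rank)
        )
        for i, j, k, m in product(*(range(n) for n in shape))
    }


def r2x(observed, reconstructed):
    """1 minus squared reconstruction error over squared norm of the data."""
    err = sum((observed[idx] - reconstructed[idx]) ** 2 for idx in observed)
    total = sum(v ** 2 for v in observed.values())
    return 1.0 - err / total


# Hypothetical rank-1 factors: 2 conditions, 2 sender states,
# 2 receiver states, 3 L-R pairs.
A = [[1.0], [2.0]]
B = [[1.0], [0.5]]
C = [[2.0], [1.0]]
D = [[1.0], [0.0], [3.0]]

tensor = cp_reconstruct(A, B, C, D)
exact_fit = r2x(tensor, tensor)

# Perturb one entry of the "data" so the model no longer fits perfectly.
noisy = dict(tensor)
noisy[(0, 0, 0, 0)] += 1.0
noisy_fit = r2x(noisy, tensor)
```

A value of 1.0 means the factors reproduce the tensor exactly; any mismatch pulls the value below 1.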
- cellcommunicationpf2.tensor.calculate_interaction_tensor(X_filtered: anndata.AnnData, lr_pairs: pd.DataFrame, rise_rank: int)
Calculate the interaction tensor from AnnData object using PARAFAC2 and communication scores.
This function performs RISE decomposition and computes cell-cell communication scores for all ligand-receptor pairs across sender-receiver latent cell state pairs. The resulting tensor has dimensions (rise_rank × rise_rank × n_lr_pairs × n_conditions).
- Parameters:
X_filtered (anndata.AnnData) – AnnData object containing preprocessed expression data.
lr_pairs (pd.DataFrame) – DataFrame of ligand-receptor pairs with 'ligand' and 'receptor' columns.
rise_rank (int) – Number of PARAFAC2 components to extract before computing communication scores.
- Returns:
Interaction tensor with dimensions (rise_rank × rise_rank × n_lr_pairs × n_conditions).
- Return type:
np.ndarray
- cellcommunicationpf2.tensor.run_fms_r2x_analysis(interaction_tensor: np.ndarray, rank_list: list[int] = None, runs: int = 1, svd_init: str = 'svd')
Run Factor Match Score (FMS) and R²X analysis across different CPD ranks to assess stability and fit.
This function helps determine the optimal CPD rank by evaluating model stability (FMS) through bootstrap resampling and variance explained (R²X). FMS values above 0.6 indicate reliable, reproducible components. Use this before finalizing your CPD rank choice.
- Parameters:
interaction_tensor (np.ndarray) – Pre-computed interaction tensor from calculate_interaction_tensor().
rank_list (list of int, optional) – List of CPD ranks to test (e.g., [1, 3, 5, 7, 9]). If None, defaults to [1, 3].
runs (int, optional) – Number of bootstrap runs for stability assessment.
svd_init (str, optional) – Initialization method ('svd' or 'random').
- Returns:
DataFrame with columns ['Run', 'Component', 'FMS', 'R2X'] for each rank and run.
- Return type:
pd.DataFrame
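For intuition about what an FMS value measures, the sketch below computes a simplified factor match score between two CP models in plain Python. It is not the package's implementation: it assumes the components of the two models are already matched in order and ignores component weights, and the factor matrices are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (neither may be all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def column(matrix, r):
    return [row[r] for row in matrix]


def factor_match_score(factors_1, factors_2):
    """Simplified FMS: mean over components of the product, across modes,
    of absolute cosine similarities between corresponding columns."""
    rank = len(factors_1[0][0])
    per_component = []
    for r in range(rank):
        score = 1.0
        for f1, f2 in zip(factors_1, factors_2):
            score *= abs(cosine(column(f1, r), column(f2, r)))
        per_component.append(score)
    return sum(per_component) / rank


# Two-mode, rank-2 toy models (each model is a list of factor matrices).
model_1 = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]]
model_2 = [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 2.0], [3.0, 4.0]]]  # rescaled copy
model_3 = [[[1.0, 1.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]]  # 2nd component differs

same_up_to_scale = factor_match_score(model_1, model_2)
half_matched = factor_match_score(model_1, model_3)
```

Rescaling a component leaves the score at 1, while a component that points in an unrelated direction in any mode contributes 0, so the second comparison lands at 0.5.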
Visualization Functions
Rank Selection
- cellcommunicationpf2.figures.commonFuncs.plotGeneral.plot_fms_r2x_diff_ranks(X: anndata.AnnData, condition_name: str, ax1: matplotlib.axes.Axes, ax2: matplotlib.axes.Axes, ranksList: list[int], runs: int)
Plot Factor Match Score (FMS) and R²X across different RISE ranks for rank selection.
This function evaluates multiple RISE ranks by computing FMS (stability) and R²X (variance explained) metrics. It performs bootstrap resampling to assess component reproducibility. Use this to determine the optimal RISE rank before running the full CCC-RISE workflow.
- Parameters:
X (anndata.AnnData) – AnnData object with preprocessed data. Must have condition_unique_idxs in X.obs.
condition_name (str) – Column name in X.obs containing condition identifiers for bootstrap resampling.
ax1 (matplotlib.axes.Axes) – Matplotlib axes object for plotting FMS values.
ax2 (matplotlib.axes.Axes) – Matplotlib axes object for plotting R²X values.
ranksList (list of int) – List of RISE ranks to evaluate (e.g., [5, 10, 15, 20, 25, 30, 35, 40]).
runs (int) – Number of bootstrap runs for stability assessment per rank.
Factor Plotting
- cellcommunicationpf2.figures.commonFuncs.plotFactors.plot_condition_factors(data: anndata.AnnData, ax: matplotlib.axes.Axes, cond: str = 'Condition', cond_group_labels: pd.Series = None, color_key: list = None, group_cond: bool = False, normalize: bool = False)
Plot Factor A (condition factors) as a heatmap showing how conditions contribute to components.
This visualization shows how each experimental condition (rows) contributes to each CCC-RISE component (columns). High values indicate strong association between a condition and a component’s communication pattern.
- Parameters:
data (anndata.AnnData) – AnnData object with stored CCC-RISE results. Must contain data.uns["A"] and data.obs[cond].
ax (matplotlib.axes.Axes) – Matplotlib axes object to plot on.
cond (str, optional) – Name of column in data.obs containing condition labels.
cond_group_labels (pd.Series, optional) – Series mapping conditions to group labels for colored row annotations.
color_key (list, optional) – Custom colors for condition group labels.
group_cond (bool, optional) – If True and cond_group_labels provided, sorts conditions by group.
normalize (bool, optional) – If True, normalizes each component to [-1, 1] range.
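One way to bring each component into the [-1, 1] range is to divide every column by its largest absolute value. The sketch below shows that approach on a made-up factor matrix; whether the function uses exactly this max-abs scaling is an assumption, since the reference only states the target range.

```python
# Hypothetical condition factor matrix: rows are conditions, columns are components.
factor_a = [
    [2.0, -0.5],
    [-4.0, 0.25],
    [1.0, 1.0],
]
n_components = len(factor_a[0])

# Largest absolute value in each column (component).
col_max = [max(abs(row[c]) for row in factor_a) for c in range(n_components)]

# Assumed scheme: max-abs scaling, so every column's extreme value becomes +1 or -1.
normalized = [
    [row[c] / col_max[c] for c in range(n_components)] for row in factor_a
]
```

Signs are preserved, so a condition that loads negatively on a component still appears negative after scaling.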
- cellcommunicationpf2.figures.commonFuncs.plotFactors.plot_eigenstate_factors(data: anndata.AnnData, ax: matplotlib.axes.Axes, factor_type: str)
Plot Factor B (sender eigenstates) or Factor C (receiver eigenstates) as a heatmap.
Eigenstate factors represent the underlying cell state patterns across components in the latent RISE space. Each row represents a latent dimension from RISE and each column represents a CCC-RISE component.
- Parameters:
data (anndata.AnnData) – AnnData object with stored CCC-RISE results. Must contain data.uns["B"] or data.uns["C"].
ax (matplotlib.axes.Axes) – Matplotlib axes object to plot on.
factor_type (str) – Either "B" for sender eigenstates or "C" for receiver eigenstates.
- cellcommunicationpf2.figures.commonFuncs.plotFactors.plot_lr_factors(data: anndata.AnnData, ax: matplotlib.axes.Axes, trim: bool = True, weight: float = 0.08)
Plot Factor D (ligand-receptor pairs) as a heatmap showing which L-R pairs drive each component.
This visualization reveals coordinated signaling programs by showing which ligand-receptor pairs (rows) are highly weighted in each component (columns). Only L-R pairs with maximum absolute weight above the threshold are displayed.
- Parameters:
data (anndata.AnnData) – AnnData object with stored CCC-RISE results. Must contain data.uns["D"] and data.uns["lr_pairs"].
ax (matplotlib.axes.Axes) – Matplotlib axes object to plot on.
trim (bool, optional) – If True, filters L-R pairs based on the weight parameter.
weight (float, optional) – Minimum absolute weight threshold for including L-R pairs.
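The trimming rule (keep an L-R pair only if its maximum absolute weight across components exceeds the threshold) can be sketched in a few lines. The pair labels and weights are placeholders, and this is an illustration of the rule as described rather than the plotting code itself.

```python
# Hypothetical L-R factor matrix: rows are L-R pairs, columns are components.
lr_names = ["pair_1", "pair_2", "pair_3"]
factor_d = [
    [0.02, -0.05],
    [0.30, 0.01],
    [-0.01, -0.12],
]
weight = 0.08  # same default as in the signature above

# Keep rows whose largest absolute weight in any component is above the threshold.
keep = [
    i for i, row in enumerate(factor_d) if max(abs(v) for v in row) > weight
]
trimmed_names = [lr_names[i] for i in keep]
trimmed_d = [factor_d[i] for i in keep]
```

Note that pair_3 is retained through a negative weight: the comparison uses absolute values, so strongly negative pairs are not filtered out.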
- cellcommunicationpf2.figures.commonFuncs.plotFactors.plot_lr_factors_partial(X: anndata.AnnData, cmp: int, ax: matplotlib.axes.Axes, geneAmount: int = 5, top: bool = True)
Plot the top or bottom weighted ligand-receptor pairs for a specific component as a bar plot.
This visualization identifies the most positively or negatively weighted L-R pairs for a single component, revealing which specific interactions are most associated with that communication pattern.
- Parameters:
X (anndata.AnnData) – AnnData object with stored CCC-RISE results. Must contain X.uns["D"] and X.uns["lr_pairs"].
cmp (int) – Component number to visualize (1-indexed).
ax (matplotlib.axes.Axes) – Matplotlib axes object to plot on.
geneAmount (int, optional) – Number of L-R pairs to display.
top (bool, optional) – If True, shows highest-weighted pairs; if False, shows lowest-weighted pairs.
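The selection behind this plot amounts to sorting one column of the L-R factor matrix, remembering that cmp is 1-indexed. A minimal sketch with a hypothetical helper, rank_lr_pairs, and invented weights:

```python
def rank_lr_pairs(factor_d, lr_names, cmp, n=5, top=True):
    """Return the n highest- (top=True) or lowest-weighted (top=False)
    L-R pairs for a 1-indexed component. Hypothetical helper for illustration."""
    col = cmp - 1  # convert the 1-indexed component number to a column index
    weights = [(name, row[col]) for name, row in zip(lr_names, factor_d)]
    return sorted(weights, key=lambda item: item[1], reverse=top)[:n]


lr_names = ["pair_1", "pair_2", "pair_3", "pair_4"]
factor_d = [
    [0.1, -0.4],
    [0.3, 0.2],
    [-0.2, 0.5],
    [0.0, -0.1],
]

highest = rank_lr_pairs(factor_d, lr_names, cmp=2, n=2, top=True)
lowest = rank_lr_pairs(factor_d, lr_names, cmp=2, n=2, top=False)
```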
PaCMAP Visualization
- cellcommunicationpf2.figures.commonFuncs.plotPaCMAP.plot_labels_pacmap(X: anndata.AnnData, labelType: str, ax: matplotlib.axes.Axes, condition: list = None, cmap: str = 'tab20', color_key: list = None)
Plot PaCMAP embedding colored by categorical labels (cell type or condition).
This visualization shows the overall structure of the cell embedding in the latent communication space, revealing how cells cluster by cell type or experimental condition based on their communication patterns.
- Parameters:
X (anndata.AnnData) – AnnData object with RISE decomposition results. Must contain X.obsm["PaCMAP"] and X.obs[labelType].
labelType (str) – Name of column in X.obs containing categorical labels to color by.
ax (matplotlib.axes.Axes) – Matplotlib axes object to plot on.
condition (list of str, optional) – If provided, only highlights cells from these specific conditions.
cmap (str, optional) – Matplotlib colormap name for coloring categories.
color_key (list, optional) – Custom list of colors for categories.
- cellcommunicationpf2.figures.commonFuncs.plotPaCMAP.plot_wc_pacmap(X: anndata.AnnData, cmp: int, ax: matplotlib.axes.Axes, cbarMax: float = 1.0, factor_matrix: str = None)
Plot PaCMAP embedding colored by weighted projections for a specific component.
This visualization shows which cells contribute most strongly to a specific component by coloring them according to their sender (Factor B) or receiver (Factor C) weights. Higher values indicate stronger association with the component’s communication pattern.
- Parameters:
X (anndata.AnnData) – AnnData object with RISE decomposition results. Must contain X.obsm["PaCMAP"] and either X.obsm["sc_B"] or X.obsm["rc_C"].
cmp (int) – Component number to visualize (1-indexed).
ax (matplotlib.axes.Axes) – Matplotlib axes object to plot on.
cbarMax (float, optional) – Maximum value for the color scale.
factor_matrix (str, optional) – Either "B" for sender weights or "C" for receiver weights.
Analysis Functions
- cellcommunicationpf2.utils.expression_product_matrix(X1: anndata.AnnData, X2: anndata.AnnData, ligand: str, receptor: str)
Calculate the expression product matrix for a specific ligand-receptor pair between cell populations.
For each cell in X1 (senders) and each cell in X2 (receivers), this computes the product of ligand expression (in sender) and receptor expression (in receiver). This represents the potential for that specific L-R interaction between each sender-receiver cell pair.
- Parameters:
X1 (anndata.AnnData) – AnnData object containing sender cells.
X2 (anndata.AnnData) – AnnData object containing receiver cells.
ligand (str) – Ligand gene name (must be present in X1.var_names).
receptor (str) – Receptor gene name (must be present in X2.var_names).
- Returns:
DataFrame with sender cells as rows and receiver cells as columns, values are expression products.
- Return type:
pd.DataFrame
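The computation described here is an outer product of two expression vectors. The sketch below reproduces it with plain dictionaries instead of AnnData objects and a DataFrame; the cell identifiers and expression values are hypothetical.

```python
# Ligand expression per sender cell and receptor expression per receiver cell.
ligand_in_senders = {"s1": 2.0, "s2": 0.0}
receptor_in_receivers = {"r1": 1.5, "r2": 3.0, "r3": 0.0}

# Outer product: one row per sender, one column per receiver.
product_matrix = {
    sender: {
        receiver: lig * rec for receiver, rec in receptor_in_receivers.items()
    }
    for sender, lig in ligand_in_senders.items()
}
```

A sender with no ligand expression (s2) or a receiver with no receptor expression (r3) yields zeros, reflecting that the interaction needs both sides to be expressed.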
- cellcommunicationpf2.utils.average_product_matrix_ccc(df: pd.DataFrame)
Reduce an expression product matrix to a 10×10 matrix by binning rows and columns and averaging.
This function groups rows and columns into 10 bins each and computes the mean within each bin, creating a summarized heatmap suitable for visualization of large cell-cell interaction matrices.
- Parameters:
df (pd.DataFrame) – Expression product matrix from expression_product_matrix().
- Returns:
10×10 DataFrame with averaged expression products.
- Return type:
pd.DataFrame
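The bin-and-average reduction can be sketched in plain Python as follows. The binning scheme is an assumption: rows and columns are split into 10 contiguous bins of near-equal size in their existing order, which the reference does not spell out. The helper names are hypothetical, and the input needs at least 10 rows and 10 columns so that no bin is empty.

```python
def bin_edges(n, n_bins=10):
    """Boundaries of n_bins contiguous, near-equal-size bins over range(n)."""
    return [i * n // n_bins for i in range(n_bins + 1)]


def average_into_bins(matrix, n_bins=10):
    """Reduce a 2-D list to n_bins x n_bins by averaging within each block."""
    row_edges = bin_edges(len(matrix), n_bins)
    col_edges = bin_edges(len(matrix[0]), n_bins)
    reduced = []
    for r0, r1 in zip(row_edges, row_edges[1:]):
        out_row = []
        for c0, c1 in zip(col_edges, col_edges[1:]):
            block = [matrix[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out_row.append(sum(block) / len(block))
        reduced.append(out_row)
    return reduced


# Hypothetical 20 x 30 product matrix whose value equals the row index.
matrix = [[float(r) for _ in range(30)] for r in range(20)]
reduced = average_into_bins(matrix)
```

With 20 rows, each row bin covers two consecutive rows, so the first row of the reduced matrix holds 0.5 and the last holds 18.5.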