Tutorial for CCC-RISE: Cell-Cell Communication Analysis
This tutorial demonstrates the complete CCC-RISE workflow for analyzing cell-cell communication in single-cell RNA-seq data across experimental conditions.
Installation
To add CCC-RISE to your Python package, add the following line to your requirements.txt and remake your virtual environment:
git+https://github.com/meyer-lab/cellcommunication-Pf2.git@main
Preprocessing the Dataset
Input Requirements
Your AnnData object must meet the following requirements:
Condition Index: Include an observations column
condition_unique_idxsthat is a 0-indexed array indicating which condition each cell is derived from, along with the cell barcode. Condition 1 cells are indexed as 0, Condition 2 as 1, and so on.Preprocessing: Your AnnData object must be preprocessed (doublets removed, genes filtered, normalized, and log-transformed) before running CCC-RISE. Standard preprocessing functions can assist with gene filtering, normalization, and assigning
condition_unique_idxs.
Ligand-Receptor Pairs
CCC-RISE requires a DataFrame of ligand-receptor pairs to analyze cell-cell communication. This can be obtained from resources like CellChat or other ligand-receptor databases.
Required DataFrame Format:
Must contain columns named
ligandandreceptorGene names should match those in your AnnData object (typically uppercase)
For protein complexes, subunits should be separated by
&(e.g.,CD74&CD44)Example structure:
ligand receptor CCL19 CCR7 PTN PTPRZ1 CD74&CD44 CD44
The package includes a default function import_ligand_receptor_pairs() that loads a curated database from CellChat, but you can provide your own DataFrame following this format.
Import and Prepare the Dataset
The prepare_dataset function assists with preprocessing your data. Parameters:
X: AnnData object containing raw count data in sparse matrix formatcondition_name: Name of the column inX.obsthat specifies experimental conditions for each cellgeneThreshold: Minimum mean expression threshold for gene filtering (genes with mean expression below this value are removed)deviance: If True, applies deviance transformation instead of log normalization (default: False)
The function performs the following steps:
Filters cells with fewer than 10 total counts
Filters genes based on the
geneThresholdparameterNormalizes total counts per cell to the median
Scales gene expression values
Applies log₁₀((1000 × normalized_value) + 1) transformation (or deviance transformation if specified)
Creates
condition_unique_idxscolumn inX.obswith 0-indexed condition assignmentsPre-calculates gene means and stores in
X.var["means"]
Example Code:
Import your dataset as an AnnData object with preprocessed data:
from cellcommunicationpf2.import_data import prepare_dataset, import_ligand_receptor_pairs
import anndata
# Load your data
X = anndata.read_h5ad("your_data.h5ad")
# Prepare the dataset
X = prepare_dataset(X, condition_name="condition", geneThreshold=0.01)
# Load ligand-receptor pairs
lr_pairs = import_ligand_receptor_pairs()
Choosing the Rank
CCC-RISE involves two sequential factorization steps, each requiring rank selection:
RISE Rank Selection: Determines how many components to extract from the raw scRNA-seq data
CPD Rank Selection: Determines how many components to use when factorizing the interaction tensor
Both steps require careful evaluation using two complementary metrics:
R²X (Variance Explained): Measures the proportion of variance in the data explained by the model. Higher values indicate better model fit, but values plateau as rank increases, suggesting diminishing returns.
FMS (Factor Match Score): Measures the stability/reproducibility of components across different random initializations or bootstrap samples. Values above ~0.6 indicate stable, reliable components. Low FMS suggests overfitting or noise.
RISE Rank Selection
What is RISE?
RISE (Reduction and Insight in Single-cell Exploration) is an unsupervised, tensor-based computational method designed for the integrative analysis of single-cell RNA sequencing (scRNA-seq) data across multiple experimental conditions, such as drug treatments, patient cohorts, or time points. Built upon the PARAFAC2 tensor decomposition framework, RISE preserves the inherent three-dimensional structure of multi-condition single-cell data—conditions × cells × genes—instead of flattening it into a conventional two-dimensional matrix. The RISE rank determines how many latent patterns are extracted before computing cell-cell communication scores.
Evaluate RISE Ranks
Test different RISE ranks to find the optimal balance between model complexity and stability:
from cellcommunicationpf2.figures.commonFuncs.plotGeneral import plot_fms_r2x_diff_ranks
import matplotlib.pyplot as plt
# Create figure with two subplots
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
rise_ranks = list(range(5, 41, 5))
# Plot FMS and R2X for different RISE ranks
plot_fms_r2x_diff_ranks(
X,
condition_name="condition_unique_idxs",
ax1=ax[0],
ax2=ax[1],
ranksList=rise_ranks,
runs=e
)
ax[0].set_title('RISE: Factor Match Score (FMS)')
ax[0].set_xlabel('RISE Rank')
ax[1].set_title('RISE: Variance Explained (R²X)')
ax[1].set_xlabel('RISE Rank')
plt.tight_layout()
plt.show()
Interpretation:
R²X: Look for where the curve begins to plateau. Beyond this point, additional components explain minimal additional variance.
FMS: Choose a rank where FMS is above 0.6, indicating stable components. Higher FMS means the factors are more reproducible.
Recommendation: Balance both metrics. Typically, select the rank where FMS exceeds 0.6 and R²X shows diminishing returns (e.g., RISE rank = 20-40 for most datasets).
CPD Rank Selection
What is CPD?
After RISE, an interaction tensor is computed from cell-cell communication scores between all cell pairs across L-R pairs and conditions. CPD (Canonical Polyadic Decomposition) factorizes this tensor to identify interpretable communication patterns. The CPD rank determines how many communication components are extracted.
Evaluate CPD Ranks
First, calculate the interaction tensor using a selected RISE rank, then test different CPD ranks:
from cellcommunicationpf2.tensor import calculate_interaction_tensor, run_fms_r2x_analysis
import seaborn as sns
import matplotlib.pyplot as plt
# Calculate interaction tensor with chosen RISE rank
rise_rank = 35 # Based on RISE analysis above
interaction_tensor = calculate_interaction_tensor(X, lr_pairs, rise_rank=rise_rank)
# Test CPD ranks
rank_list = list(range(1, 11, 2))
df = run_fms_r2x_analysis(
interaction_tensor,
rank_list=rank_list,
runs=3,
svd_init="random"
)
# Plot CPD FMS and R2X
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
sns.lineplot(data=df, x="Component", y="FMS", ax=ax[0], label="FMS")
ax[0].set_ylim(0, 1)
ax[0].set_xlabel('CPD Rank')
sns.lineplot(data=df, x="Component", y="R2X", ax=ax[1], color="orange", label="R2X")
ax[1].set_ylim(0, np.max(df["R2X"]) + 0.02)
ax[1].set_xlabel('CPD Rank')
plt.tight_layout()
plt.show()
Interpretation:
R²X: Monitor how much of the interaction tensor’s variance is explained. Look for an elbow point where additional components provide minimal improvement.
FMS: Ensure components are stable (FMS > 0.6). Lower FMS indicates components may vary significantly with different initializations.
Recommendation: Select the CPD rank where both FMS is high (>0.6) and R²X shows an elbow. Typically 5-20 components capture most meaningful communication patterns.
Running the Factorization
Perform CCC-RISE Factorization
Based on the rank selection analysis above, choose appropriate RISE and CPD ranks and run the complete CCC-RISE workflow. This performs the two-step factorization: first extracting latent communication modes from expression data (RISE), then factorizing the resulting interaction tensor (CPD):
from cellcommunicationpf2.tensor import run_ccc_rise_workflow
# Select ranks based on FMS/R2X analysis
rise_rank = 35 # Selected from RISE rank analysis
cp_rank = 8 # Selected from CPD rank analysis
# Run the complete workflow
X, r2x = run_ccc_rise_workflow(
adata=X,
rise_rank=rise_rank,
cp_rank=cp_rank,
lr_pairs=lr_pairs,
condition_column="condition_column",
n_iter_max=10000,
tol=1e-9,
random_state=42,
complex_sep="&",
doEmbedding=True
)
print(f"Variance Explained (R²X): {r2x:.4f}")
Function Parameters
The run_ccc_rise_workflow function executes the complete CCC-RISE pipeline:
adata: AnnData object with preprocessed scRNA-seq datarise_rank: Number of PARAFAC2 components to extract from expression datacp_rank: Number of CPD components for factorizing the interaction tensor (if None, defaults torise_rank)lr_pairs: DataFrame of ligand-receptor pairs with ‘ligand’ and ‘receptor’ columnscondition_column: Column name inadata.obscontaining condition identifiers (default: “sample”)n_iter_max: Maximum iterations for decomposition (default: 100)tol: Convergence tolerance (default: 1e-3)random_state: Random seed for reproducibility (default: None)complex_sep: Separator for protein complexes in L-R pairs (default: None)doEmbedding: If True, computes PaCMAP embeddings for visualization (default: True)svd_init: Initialization method for CPD (‘svd’ or ‘random’, default: ‘svd’)
What Gets Stored
The function returns the AnnData object with added results and the final R²X value. The following are stored in the AnnData object:
Factor Matrices (in X.uns):
X.uns["weights"]- Component weights (array of lengthcp_rank)X.uns["A"]- Condition factor matrix (n_conditions ×cp_rank)X.uns["B"]- Sender cell type factor matrix (rise_rank×cp_rank)X.uns["C"]- Receiver cell type factor matrix (rise_rank×cp_rank)X.uns["D"]- Ligand-receptor pair factor matrix (n_lr_pairs ×cp_rank)
Cell Projections (in X.obsm):
X.obsm["projections"]- Cell projections from PARAFAC2 (n_cells ×rise_rank)X.obsm["sc_B"]- Sender cell loadings (n_cells ×cp_rank)X.obsm["rc_C"]- Receiver cell loadings (n_cells ×cp_rank)X.obsm["PaCMAP"]- PaCMAP embedding for visualization (n_cells × 2, ifdoEmbedding=True)
Additional Information:
X.uns["lr_pairs"]- Array of ligand-receptor pair names used in the analysisX.uns["r2x"]- Variance explained by the CPD factorization
Visualizing the Factors
CCC-RISE decomposes cell-cell communication into four factor matrices, each revealing different aspects of the communication patterns. The package provides specialized plotting functions to visualize each factor type.
Overview of the Four Factors
The factorization produces four interpretable factor matrices:
Factor A (Condition Factor): Shows how each experimental condition (e.g., patient sample, time point) contributes to each component
Factor B (Sender Cell Eigenstate): Represents sender cell patterns in the latent RISE space
Factor C (Receiver Cell Eigenstate): Represents receiver cell patterns in the latent RISE space
Factor D (Ligand-Receptor Pairs): Shows which L-R pairs drive each communication component
Visualize All Four Factors
Create visualizations for each of the four factor types:
from cellcommunicationpf2.figures.commonFuncs.plotFactors import (
plot_condition_factors,
plot_eigenstate_factors,
plot_lr_factors
)
import matplotlib.pyplot as plt
# Prepare condition grouping (if you have multiple conditions per group)
condition_column = "sample" # or your condition column name
group_col = "condition" # optional: higher-level grouping
# Create mapping from samples to their condition groups
sample_to_group = X.obs.drop_duplicates(
subset=[condition_column, group_col]
).set_index(condition_column)[group_col]
# Factor A: Condition factors
fig, ax = plt.subplots(figsize=(6, 8))
plot_condition_factors(
X,
ax,
cond=condition_column,
cond_group_labels=sample_to_group,
group_cond=True, # Sort by condition groups
normalize=True # Normalize each component
)
ax.set_title("Factor A: Condition")
plt.show()
# Factor B: Sender cell eigenstates
fig, ax = plt.subplots(figsize=(6, 8))
plot_eigenstate_factors(X, ax, factor_type="B")
ax.set_title("Factor B: Sender Cell Eigenstate")
plt.tight_layout()
plt.show()
# Factor C: Receiver cell eigenstates
fig, ax = plt.subplots(figsize=(6, 8))
plot_eigenstate_factors(X, ax, factor_type="C")
ax.set_title("Factor C: Receiver Cell Eigenstate")
plt.tight_layout()
plt.show()
# Factor D: Ligand-receptor pairs
fig, ax = plt.subplots(figsize=(6, 8))
plot_lr_factors(
X,
ax,
trim=True, # Show only top L-R pairs
weight=0.08 # Threshold for inclusion
)
ax.set_title("Factor D: LR Pairs")
plt.tight_layout()
plt.show()
Understanding Each Factor
Factor A (Condition): Heatmap rows represent conditions/samples, columns represent components. High values indicate which conditions are enriched in each communication pattern.
Factor B (Sender Eigenstate): Heatmap shows sender cell patterns in the latent space. Each row is a latent dimension from RISE, each column is a CPD component.
Factor C (Receiver Eigenstate): Similar to Factor B but for receiver cells. Reveals which latent receiver cell states participate in each component.
Factor D (LR Pairs): Shows which ligand-receptor interactions drive each component. Only top-weighted pairs are displayed for clarity.
Visualize Cell Projections with PaCMAP
Explore how individual cells are positioned in the latent communication space:
from cellcommunicationpf2.figures.commonFuncs.plotPaCMAP import plot_labels_pacmap
import matplotlib.pyplot as plt
import seaborn as sns
# Create figure with 3 subplots
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
# Plot colored by cell type
pal = sns.color_palette(palette="Set3")
pal = pal.as_hex()
plot_labels_pacmap(X, labelType="celltype", ax=axes[0], color_key=pal)
axes[0].set_title("Cells Colored by Cell Type")
# Plot colored by condition
pal = sns.color_palette("Set2")
pal = [pal[0], pal[1], pal[2]]
pal = [f"#{int(r*255):02x}{int(g*255):02x}{int(b*255):02x}" for r, g, b in pal]
plot_labels_pacmap(X, labelType="condition", ax=axes[1], color_key=pal)
axes[1].set_title("Cells Colored by Condition")
# Plot colored by sample
plot_labels_pacmap(X, labelType="sample", ax=axes[2])
axes[2].set_title("Cells Colored by Sample")
plt.tight_layout()
plt.show()
Interpreting a Component
Once you’ve identified components of interest, dig deeper into the specific cell types and ligand-receptor pairs driving the communication patterns.
Visualize Specific Component Loadings
Use specialized plotting functions to visualize component weightings for a specific component:
from cellcommunicationpf2.figures.commonFuncs.plotPaCMAP import plot_wc_pacmap
import matplotlib.pyplot as plt
# Select component to visualize
component = 6
# Create figure with 2 subplots
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
# Plot sender cell weightings
plot_wc_pacmap(X, component-1, ax[0], factor_matrix="B", cbarMax=0.3)
ax[0].set_title(f"Cmp.{component} Sender Cells")
# Plot receiver cell weightings
plot_wc_pacmap(X, component-1, ax[1], factor_matrix="C", cbarMax=0.3)
ax[1].set_title(f"Cmp.{component} Receiver Cells")
plt.tight_layout()
plt.show()
Violin Plots of Component Weights by Cell Type
Examine the distribution of component weights for specific cell types:
import seaborn as sns
import pandas as pd
# Select a component and cell type of interest
component = 6
cell_type = "mDC"
# Filter cells by cell type
X_celltype = X[X.obs["celltype"] == cell_type]
# Extract sender cell weights for this component
sender_weights = X_celltype.obsm["sc_B"][:, component-1]
# Create violin plot
fig, ax = plt.subplots(figsize=(6, 6))
sns.violinplot(data=sender_weights, ax=ax)
ax.set_ylim(-0.1, max(sender_weights) + 0.1)
ax.set_xlabel(f"{cell_type} Weight Distribution")
ax.set_ylabel(f"Sender Cell Component {component} Association")
ax.set_title(f"Distribution of Component {component} Weights in {cell_type}")
plt.tight_layout()
plt.show()
This visualization reveals whether a component is broadly active across all cells of a type or concentrated in a subset.
Analyze Top LR Pairs per Component
Visualize the most important ligand-receptor pairs for a specific component:
from cellcommunicationpf2.figures.commonFuncs.plotFactors import plot_lr_factors_partial
import matplotlib.pyplot as plt
cmp = 6
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
# Plot top 10 LR pairs
plot_lr_factors_partial(X, cmp, ax[0], geneAmount=10, top=True)
ax[0].set_title(f"Component {cmp}: Top 10 LR Pairs")
# Plot bottom 10 LR pairs
plot_lr_factors_partial(X, cmp, ax[1], geneAmount=10, top=False)
ax[1].set_title(f"Component {cmp}: Bottom 10 LR Pairs")
plt.tight_layout()
plt.show()
Expression Product Heatmaps
Visualize the ligand-receptor expression products between sender and receiver cell populations to validate predicted interactions:
from cellcommunicationpf2.utils import (
expression_product_matrix,
average_product_matrix_ccc
)
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Select component and L-R pair of interest
ccc_rise_cmp = 6
ligand = "CCL19"
receptor = "CCR7"
# Filter and sort sender cells by component weight
X_mdc_sender = X[X.obs["celltype"] == "mDC"]
X_mdc_sender = X_mdc_sender[np.argsort(-X_mdc_sender.obsm["sc_B"][:, ccc_rise_cmp-1])]
# Filter and sort receiver cells by component weight
X_mdc_receiver = X[X.obs["celltype"] == "mDC"]
X_mdc_receiver = X_mdc_receiver[np.argsort(X_mdc_receiver.obsm["rc_C"][:, ccc_rise_cmp-1])]
# Calculate expression product matrix for mDC -> mDC
df = expression_product_matrix(X_mdc_sender, X_mdc_receiver, ligand, receptor)
df = average_product_matrix_ccc(df)
# Create heatmap
fig, ax = plt.subplots(figsize=(6, 6))
sns.heatmap(df, ax=ax, cmap="viridis", vmax=0.12)
ax.set_xlabel("Receiver mDCs")
ax.set_ylabel("Sender mDCs")
ax.set_title("CCL19-CCR7 Interaction")
ax.set_xticks([])
ax.set_yticks([])
plt.tight_layout()
plt.show()
This heatmap shows the product of ligand expression (in senders) and receptor expression (in receivers), revealing which cell pairs have the highest potential for this specific interaction.