API Reference
=============

Core Functions
--------------

Preprocessing
~~~~~~~~~~~~~

.. py:function:: prepare_dataset(X: anndata.AnnData, condition_name: str, geneThreshold: float, normalize: bool = False)
   :module: cellcommunicationpf2.import_data

   Preprocess single-cell RNA-seq data for CCC-RISE analysis.
   
   This function performs essential preprocessing steps including cell and gene filtering, 
   normalization, log transformation, and creation of condition indices required for 
   CCC-RISE decomposition.
   
   :param X: AnnData object containing raw count data in sparse matrix format.
   :type X: anndata.AnnData
   :param condition_name: Name of the column in X.obs that specifies experimental conditions for each cell.
   :type condition_name: str
   :param geneThreshold: Minimum mean expression threshold for gene filtering. Genes with mean expression below this value are removed.
   :type geneThreshold: float
   :param normalize: If True, performs normalization and log transformation. If False, keeps raw counts.
   :type normalize: bool, optional
   :returns: Preprocessed AnnData object with filtered cells/genes, added condition_unique_idxs column, and gene means in X.var["means"].
   :rtype: anndata.AnnData

.. py:function:: import_ligand_receptor_pairs(filename: str = "/opt/andrew/ccc/Human-2020-Jin-LR-pairs.csv.zst", update_interaction_names: bool = True)
   :module: cellcommunicationpf2.import_data

   Import ligand-receptor pairs from a compressed CSV file with caching.
   
   Loads a curated database of ligand-receptor pairs from CellChat. The function uses 
   LRU caching to avoid repeated file reads. For protein complexes, subunits are separated 
   by & (e.g., CD74&CD44).
   
   :param filename: Path to the compressed CSV file containing ligand-receptor pairs.
   :type filename: str, optional
   :param update_interaction_names: If True, updates interaction names to use standardized formatting (uppercase, & for complexes).
   :type update_interaction_names: bool, optional
   :returns: DataFrame with ligand and receptor columns containing gene names.
   :rtype: pd.DataFrame

.. py:function:: add_cond_idxs(X: anndata.AnnData, condition_key: str)
   :module: cellcommunicationpf2.import_data

   Add unique 0-indexed condition identifiers to an AnnData object.
   
   Creates a new column condition_unique_idxs in X.obs that maps condition labels 
   to 0-based integer indices. This is required for proper data partitioning in the 
   PARAFAC2 decomposition.
   
   :param X: AnnData object to add condition indices to.
   :type X: anndata.AnnData
   :param condition_key: Column name in X.obs containing condition identifiers.
   :type condition_key: str
   :returns: AnnData object with added condition_unique_idxs column in X.obs.
   :rtype: anndata.AnnData


Factorization
~~~~~~~~~~~~~

.. py:function:: run_ccc_rise_workflow(adata: anndata.AnnData, rise_rank: int, lr_pairs: pd.DataFrame, cp_rank: int = None, condition_column: str = "sample", n_iter_max: int = 100, tol: float = 1e-3, random_state: int = None, complex_sep: str = None, doEmbedding: bool = True, svd_init: str = "svd")
   :module: cellcommunicationpf2.tensor

   Execute the complete CCC-RISE workflow including RISE decomposition, CPD factorization, and result storage.
   
   This is the main function for performing CCC-RISE analysis. It executes both the RISE 
   (PARAFAC2) decomposition of expression data and the CPD decomposition of the resulting 
   interaction tensor. The function decomposes cell-cell communication into four interpretable 
   factor matrices representing conditions, sender cells, receiver cells, and ligand-receptor pairs.
   
   :param adata: AnnData object with preprocessed scRNA-seq data. Must have condition_unique_idxs in adata.obs.
   :type adata: anndata.AnnData
   :param rise_rank: Number of PARAFAC2 components to extract from expression data. Typically chosen based on FMS and R²X analysis.
   :type rise_rank: int
   :param lr_pairs: DataFrame of ligand-receptor pairs with 'ligand' and 'receptor' columns.
   :type lr_pairs: pd.DataFrame
   :param cp_rank: Number of CPD components for factorizing the interaction tensor. If None, defaults to rise_rank.
   :type cp_rank: int, optional
   :param condition_column: Column name in adata.obs containing condition identifiers.
   :type condition_column: str, optional
   :param n_iter_max: Maximum iterations for decomposition.
   :type n_iter_max: int, optional
   :param tol: Convergence tolerance for optimization.
   :type tol: float, optional
   :param random_state: Random seed for reproducibility of decomposition.
   :type random_state: int, optional
   :param complex_sep: Separator for protein complexes in L-R pairs (typically "&").
   :type complex_sep: str, optional
   :param doEmbedding: If True, automatically computes PaCMAP embeddings for visualization and stores in adata.obsm["PaCMAP"].
   :type doEmbedding: bool, optional
   :param svd_init: Initialization method for CPD ('svd' or 'random').
   :type svd_init: str, optional
   :returns: Tuple containing the updated AnnData object with stored results and the R²X value (variance explained).
   :rtype: tuple[anndata.AnnData, float]

.. py:function:: calculate_interaction_tensor(X_filtered: anndata.AnnData, lr_pairs: pd.DataFrame, rise_rank: int)
   :module: cellcommunicationpf2.tensor

   Calculate the interaction tensor from AnnData object using PARAFAC2 and communication scores.
   
   This function performs RISE decomposition and computes cell-cell communication scores for 
   all ligand-receptor pairs across sender-receiver latent cell state pairs. The resulting 
   tensor has dimensions (rise_rank × rise_rank × n_lr_pairs × n_conditions).
   
   :param X_filtered: AnnData object containing preprocessed expression data.
   :type X_filtered: anndata.AnnData
   :param lr_pairs: DataFrame of ligand-receptor pairs with 'ligand' and 'receptor' columns.
   :type lr_pairs: pd.DataFrame
   :param rise_rank: Number of PARAFAC2 components to extract before computing communication scores.
   :type rise_rank: int
   :returns: Interaction tensor with dimensions (rise_rank × rise_rank × n_lr_pairs × n_conditions).
   :rtype: np.ndarray

.. py:function:: run_fms_r2x_analysis(interaction_tensor: np.ndarray, rank_list: list[int] = None, runs: int = 1, svd_init: str = "svd")
   :module: cellcommunicationpf2.tensor

   Run Factor Match Score (FMS) and R²X analysis across different CPD ranks to assess stability and fit.
   
   This function helps determine the optimal CPD rank by evaluating model stability (FMS) 
   through bootstrap resampling and variance explained (R²X). FMS values above 0.6 indicate 
   reliable, reproducible components. Use this before finalizing your CPD rank choice.
   
   :param interaction_tensor: Pre-computed interaction tensor from calculate_interaction_tensor().
   :type interaction_tensor: np.ndarray
   :param rank_list: List of CPD ranks to test (e.g., [1, 3, 5, 7, 9]). If None, defaults to [1, 3].
   :type rank_list: list of int, optional
   :param runs: Number of bootstrap runs for stability assessment.
   :type runs: int, optional
   :param svd_init: Initialization method ('svd' or 'random').
   :type svd_init: str, optional
   :returns: DataFrame with columns ['Run', 'Component', 'FMS', 'R2X'] for each rank and run.
   :rtype: pd.DataFrame


Visualization Functions
-----------------------

Rank Selection
~~~~~~~~~~~~~~

.. py:function:: plot_fms_r2x_diff_ranks(X: anndata.AnnData, condition_name: str, ax1: matplotlib.axes.Axes, ax2: matplotlib.axes.Axes, ranksList: list[int], runs: int)
   :module: cellcommunicationpf2.figures.commonFuncs.plotGeneral

   Plot Factor Match Score (FMS) and R²X across different RISE ranks for rank selection.
   
   This function evaluates multiple RISE ranks by computing FMS (stability) and R²X 
   (variance explained) metrics. It performs bootstrap resampling to assess component 
   reproducibility. Use this to determine the optimal RISE rank before running the full 
   CCC-RISE workflow.
   
   :param X: AnnData object with preprocessed data. Must have condition_unique_idxs in X.obs.
   :type X: anndata.AnnData
   :param condition_name: Column name in X.obs containing condition identifiers for bootstrap resampling.
   :type condition_name: str
   :param ax1: Matplotlib axes object for plotting FMS values.
   :type ax1: matplotlib.axes.Axes
   :param ax2: Matplotlib axes object for plotting R²X values.
   :type ax2: matplotlib.axes.Axes
   :param ranksList: List of RISE ranks to evaluate (e.g., [5, 10, 15, 20, 25, 30, 35, 40]).
   :type ranksList: list of int
   :param runs: Number of bootstrap runs for stability assessment per rank.
   :type runs: int


Factor Plotting
~~~~~~~~~~~~~~~

.. py:function:: plot_condition_factors(data: anndata.AnnData, ax: matplotlib.axes.Axes, cond: str = "Condition", cond_group_labels: pd.Series = None, color_key: list = None, group_cond: bool = False, normalize: bool = False)
   :module: cellcommunicationpf2.figures.commonFuncs.plotFactors

   Plot Factor A (condition factors) as a heatmap showing how conditions contribute to components.
   
   This visualization shows how each experimental condition (rows) contributes to each CCC-RISE 
   component (columns). High values indicate strong association between a condition and a 
   component's communication pattern.

   :param data: AnnData object with stored CCC-RISE results. Must contain data.uns["A"] and data.obs[cond].
   :type data: anndata.AnnData
   :param ax: Matplotlib axes object to plot on.
   :type ax: matplotlib.axes.Axes
   :param cond: Name of column in data.obs containing condition labels.
   :type cond: str, optional
   :param cond_group_labels: Series mapping conditions to group labels for colored row annotations.
   :type cond_group_labels: pd.Series, optional
   :param color_key: Custom colors for condition group labels.
   :type color_key: list, optional
   :param group_cond: If True and cond_group_labels provided, sorts conditions by group.
   :type group_cond: bool, optional
   :param normalize: If True, normalizes each component to [-1, 1] range.
   :type normalize: bool, optional

.. py:function:: plot_eigenstate_factors(data: anndata.AnnData, ax: matplotlib.axes.Axes, factor_type: str)
   :module: cellcommunicationpf2.figures.commonFuncs.plotFactors

   Plot Factor B (sender eigenstates) or Factor C (receiver eigenstates) as a heatmap.
   
   Eigenstate factors represent the underlying cell state patterns across components in the 
   latent RISE space. Each row represents a latent dimension from RISE and each column 
   represents a CCC-RISE component.
   
   :param data: AnnData object with stored CCC-RISE results. Must contain data.uns["B"] or data.uns["C"].
   :type data: anndata.AnnData
   :param ax: Matplotlib axes object to plot on.
   :type ax: matplotlib.axes.Axes
   :param factor_type: Either "B" for sender eigenstates or "C" for receiver eigenstates.
   :type factor_type: str

.. py:function:: plot_lr_factors(data: anndata.AnnData, ax: matplotlib.axes.Axes, trim: bool = True, weight: float = 0.08)
   :module: cellcommunicationpf2.figures.commonFuncs.plotFactors

   Plot Factor D (ligand-receptor pairs) as a heatmap showing which L-R pairs drive each component.
   
   This visualization reveals coordinated signaling programs by showing which ligand-receptor 
   pairs (rows) are highly weighted in each component (columns). Only L-R pairs with maximum 
   absolute weight above the threshold are displayed.
   
   :param data: AnnData object with stored CCC-RISE results. Must contain data.uns["D"] and data.uns["lr_pairs"].
   :type data: anndata.AnnData
   :param ax: Matplotlib axes object to plot on.
   :type ax: matplotlib.axes.Axes
   :param trim: If True, filters L-R pairs based on the weight parameter.
   :type trim: bool, optional
   :param weight: Minimum absolute weight threshold for including L-R pairs.
   :type weight: float, optional

.. py:function:: plot_lr_factors_partial(X: anndata.AnnData, cmp: int, ax: matplotlib.axes.Axes, geneAmount: int = 5, top: bool = True)
   :module: cellcommunicationpf2.figures.commonFuncs.plotFactors

   Plot the top or bottom weighted ligand-receptor pairs for a specific component as a bar plot.
   
   This visualization identifies the most positively or negatively weighted L-R pairs for a 
   single component, revealing which specific interactions are most associated with that 
   communication pattern.
   
   :param X: AnnData object with stored CCC-RISE results. Must contain X.uns["D"] and X.uns["lr_pairs"].
   :type X: anndata.AnnData
   :param cmp: Component number to visualize (1-indexed).
   :type cmp: int
   :param ax: Matplotlib axes object to plot on.
   :type ax: matplotlib.axes.Axes
   :param geneAmount: Number of L-R pairs to display.
   :type geneAmount: int, optional
   :param top: If True, shows highest-weighted pairs; if False, shows lowest-weighted pairs.
   :type top: bool, optional


PaCMAP Visualization
~~~~~~~~~~~~~~~~~~~~

.. py:function:: plot_labels_pacmap(X: anndata.AnnData, labelType: str, ax: matplotlib.axes.Axes, condition: list = None, cmap: str = "tab20", color_key: list = None)
   :module: cellcommunicationpf2.figures.commonFuncs.plotPaCMAP

   Plot PaCMAP embedding colored by categorical labels (cell type or condition).
   
   This visualization shows the overall structure of the cell embedding in the latent 
   communication space, revealing how cells cluster by cell type or experimental condition 
   based on their communication patterns.
   
   :param X: AnnData object with RISE decomposition results. Must contain X.obsm["PaCMAP"] and X.obs[labelType].
   :type X: anndata.AnnData
   :param labelType: Name of column in X.obs containing categorical labels to color by.
   :type labelType: str
   :param ax: Matplotlib axes object to plot on.
   :type ax: matplotlib.axes.Axes
   :param condition: If provided, only highlights cells from these specific conditions.
   :type condition: list of str, optional
   :param cmap: Matplotlib colormap name for coloring categories.
   :type cmap: str, optional
   :param color_key: Custom list of colors for categories.
   :type color_key: list, optional

.. py:function:: plot_wc_pacmap(X: anndata.AnnData, cmp: int, ax: matplotlib.axes.Axes, cbarMax: float = 1.0, factor_matrix: str = None)
   :module: cellcommunicationpf2.figures.commonFuncs.plotPaCMAP

   Plot PaCMAP embedding colored by weighted projections for a specific component.
   
   This visualization shows which cells contribute most strongly to a specific component by 
   coloring them according to their sender (Factor B) or receiver (Factor C) weights. Higher 
   values indicate stronger association with the component's communication pattern.
   
   :param X: AnnData object with RISE decomposition results. Must contain X.obsm["PaCMAP"] and X.obsm["sc_B"] or X.obsm["rc_C"].
   :type X: anndata.AnnData
   :param cmp: Component number to visualize (1-indexed).
   :type cmp: int
   :param ax: Matplotlib axes object to plot on.
   :type ax: matplotlib.axes.Axes
   :param cbarMax: Maximum value for the color scale.
   :type cbarMax: float, optional
   :param factor_matrix: Either "B" for sender weights or "C" for receiver weights.
   :type factor_matrix: str


Analysis Functions
~~~~~~~~~~~~~~~~~~

.. py:function:: expression_product_matrix(X1: anndata.AnnData, X2: anndata.AnnData, ligand: str, receptor: str)
   :module: cellcommunicationpf2.utils

   Calculate the expression product matrix for a specific ligand-receptor pair between cell populations.
   
   For each cell in X1 (senders) and each cell in X2 (receivers), this computes the product 
   of ligand expression (in sender) and receptor expression (in receiver). This represents 
   the potential for that specific L-R interaction between each sender-receiver cell pair.
   
   :param X1: AnnData object containing sender cells.
   :type X1: anndata.AnnData
   :param X2: AnnData object containing receiver cells.
   :type X2: anndata.AnnData
   :param ligand: Ligand gene name (must be present in X1.var_names).
   :type ligand: str
   :param receptor: Receptor gene name (must be present in X2.var_names).
   :type receptor: str
   :returns: DataFrame with sender cells as rows and receiver cells as columns, values are expression products.
   :rtype: pd.DataFrame

.. py:function:: average_product_matrix_ccc(df: pd.DataFrame)
   :module: cellcommunicationpf2.utils

   Reduce an expression product matrix to a 10×10 matrix by binning rows and columns and averaging.
   
   This function groups rows and columns into 10 bins each and computes the mean within 
   each bin, creating a summarized heatmap suitable for visualization of large cell-cell 
   interaction matrices.
   
   :param df: Expression product matrix from expression_product_matrix().
   :type df: pd.DataFrame
   :returns: 10×10 DataFrame with averaged expression products.
   :rtype: pd.DataFrame