Data Preparation

Before training spacer, save your spatial transcriptomics dataset as an .h5ad file that Scanpy can read. This section describes the preprocessing, annotation, and data structure that spacer requires of its input.

—

Preprocessing Workflow

Spacer expects a Scanpy-compatible AnnData object (adata) as input. You can build this object from a raw count matrix with Scanpy using the following preprocessing steps:

import scanpy as sc

# Load raw data (example)
adata = sc.read_10x_h5("sample_filtered_feature_bc_matrix.h5")

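# Step 1: Filter low-quality data
# Example thresholds only (assumed here, not prescribed by spacer); tune them for your dataset.
min_genes = 200  # drop cells with fewer detected genes than this
min_cells = 3    # drop genes detected in fewer cells than this
sc.pp.filter_cells(adata, min_genes=min_genes)
sc.pp.filter_genes(adata, min_cells=min_cells)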

# Step 2: Normalize and log-transform
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Step 3: Annotate cell types (user-defined)
# e.g., adata.obs['cell_type'] = cell_type_annotation_vector
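
Step 3 is left to the user, so the following is only a sketch of one way to attach annotations safely. It assumes the annotations and coordinates come from an external table indexed by cell barcode; the `annotations` table, the barcodes, and the `label` column are all hypothetical, and a plain pandas DataFrame stands in for `adata.obs`. The point it illustrates is aligning on the barcode index rather than on row order, since filtering in Step 1 may have removed cells.

```python
import pandas as pd

# Stand-in for adata.obs after filtering: one row per remaining cell.
obs = pd.DataFrame(index=["AAAC-1", "AAAG-1", "AACT-1"])

# Hypothetical annotation table from an external pipeline. Its row order
# differs from obs, and it still contains a cell that was filtered out.
annotations = pd.DataFrame(
    {
        "label": ["bcell", "tcell", "fibroblast", "tcell"],
        "X": [10.0, 12.5, 40.0, 99.0],
        "Y": [5.0, 7.5, 22.0, 99.0],
    },
    index=["AAAG-1", "AAAC-1", "AACT-1", "ZZZZ-1"],
)

# Align on barcode so every value lands on the right cell.
aligned = annotations.reindex(obs.index)

# Fail early if a cell that survived filtering has no annotation.
missing = aligned.index[aligned["label"].isna()]
if len(missing) > 0:
    raise ValueError(f"{len(missing)} cells have no annotation")

for column in ("label", "X", "Y"):
    obs[column] = aligned[column]
```

With a real AnnData object, the same assignments would target `adata.obs` directly.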

—

Required Data Structure

Spacer requires the .h5ad object to contain the following essential fields:

Expression matrix (`adata.X`): normalized and log-transformed gene expression values.
Metadata table (`adata.obs`): must contain the following columns:

| Column | Description |
| --- | --- |
| `X`, `Y` | Spatial coordinates of each cell (in microns or pixel units). |
| `cell_type` | Integer-encoded major cell class: 0 → other cell, 1 → recruiting cell, 2 → engaging cell (customizable, e.g., T/B/macrophage). |
| `<EngagingTag>` | Binary indicator for whether the cell at the center of the bag belongs to the target engaging cell type (1) or not (0). |
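
If your annotations are strings, the integer `cell_type` codes can be derived from them. The sketch below assumes a hypothetical `label` column and a hypothetical helper, `encode_cell_type`, which is not part of spacer; the choice of which labels count as recruiting or engaging is arbitrary here and depends on your study.

```python
import pandas as pd

# Integer codes for the cell_type column described above.
OTHER, RECRUITING, ENGAGING = 0, 1, 2


def encode_cell_type(labels, recruiting, engaging):
    """Map string labels to 0/1/2 codes; unlisted labels become 0 (other)."""
    recruiting, engaging = set(recruiting), set(engaging)
    overlap = recruiting & engaging
    if overlap:
        raise ValueError(f"labels assigned to both groups: {sorted(overlap)}")
    codes = pd.Series(OTHER, index=labels.index, dtype="int64")
    codes.loc[labels.isin(recruiting)] = RECRUITING
    codes.loc[labels.isin(engaging)] = ENGAGING
    return codes


# Stand-in for adata.obs with a hypothetical string annotation column.
obs = pd.DataFrame({"label": ["tcell", "macrophage", "fibroblast", "tcell"]})
obs["cell_type"] = encode_cell_type(
    obs["label"],
    recruiting=["macrophage"],  # illustrative choice only
    engaging=["tcell"],         # illustrative choice only
)
print(obs["cell_type"].tolist())  # [2, 1, 0, 2]
```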

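The <EngagingTag> column (a placeholder for the actual column name) defines which cells will serve as the center for each neighborhood (“bag”) in spacer. Below is the default mapping used in our study; each value is the column name used for the corresponding cell population: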

mapping = {
    'tcell': 'T',
    'bcell': 'B',
    'macrophage': 'Macrophage',
    'neutrophil': 'Neutrophil',
    'fibroblast': 'Fibroblast',
    'endothelial': 'Endothelial',
}

For example, if you are studying T-cell recruitment, the column name in adata.obs should be “T”, and its values should be 1 for T cells and 0 for all other cells.
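
The T-cell example can be sketched as follows. This assumes the annotation strings in a hypothetical `label` column match the mapping keys (e.g., 'tcell'); `add_engaging_tag` is an illustrative helper, not a spacer function, and a pandas DataFrame stands in for `adata.obs`.

```python
import pandas as pd

# Default mapping from the section above.
mapping = {
    "tcell": "T",
    "bcell": "B",
    "macrophage": "Macrophage",
    "neutrophil": "Neutrophil",
    "fibroblast": "Fibroblast",
    "endothelial": "Endothelial",
}


def add_engaging_tag(obs, label_column, target_label, mapping):
    """Add a 0/1 column named mapping[target_label] and return its name."""
    tag = mapping[target_label]
    obs[tag] = (obs[label_column] == target_label).astype(int)
    return tag


# Stand-in for adata.obs with a hypothetical string annotation column.
obs = pd.DataFrame({"label": ["tcell", "bcell", "tcell", "fibroblast"]})
tag = add_engaging_tag(obs, "label", "tcell", mapping)
print(tag, obs[tag].tolist())  # T [1, 0, 1, 0]
```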

—

Customizing the Mapping

In this work, we used the above mapping to ensure consistent annotation across datasets involving multiple stromal and immune cell types. Each key in the mapping corresponds to a general immune or stromal population, while the assigned value (e.g., “T”, “B”, “Macrophage”) serves as a compact label for downstream modeling and visualization.

However, this mapping is fully customizable. Users can freely modify or extend it to match their experimental context or cell annotation schema. For instance, if you are analyzing brain tissues, you could define:

mapping = {
    'microglia': 'Microglia',
    'astrocyte': 'Astrocyte',
    'oligodendrocyte': 'Oligodendrocyte',
}

Spacer will automatically adapt its bag construction and learning process to your new mapping. The only requirement is that the corresponding binary column (e.g., “Microglia”) exists in adata.obs with values 1 for target cells and 0 otherwise.
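
Before training with a custom mapping, it may help to confirm that the table meets the requirements listed in this section. The check below is a minimal sketch under that reading of the requirements; `check_obs` is a hypothetical helper rather than part of spacer, and a pandas DataFrame stands in for `adata.obs`.

```python
import pandas as pd


def check_obs(obs, tag):
    """Return a list of problems found in an obs-like table (empty if none)."""
    problems = []
    for column in ("X", "Y", "cell_type", tag):
        if column not in obs.columns:
            problems.append(f"missing column: {column}")
    if tag in obs.columns and not obs[tag].isin([0, 1]).all():
        problems.append(f"{tag} must contain only 0 and 1")
    if "cell_type" in obs.columns and not obs["cell_type"].isin([0, 1, 2]).all():
        problems.append("cell_type must contain only 0, 1 and 2")
    return problems


# Custom brain-tissue mapping from the section above.
mapping = {
    "microglia": "Microglia",
    "astrocyte": "Astrocyte",
    "oligodendrocyte": "Oligodendrocyte",
}

# Stand-in for adata.obs that only defines the Microglia tag.
obs = pd.DataFrame(
    {
        "X": [1.0, 2.0, 3.0],
        "Y": [4.0, 5.0, 6.0],
        "cell_type": [2, 0, 1],
        "Microglia": [1, 0, 0],
    }
)

print(check_obs(obs, mapping["microglia"]))  # []
print(check_obs(obs, mapping["astrocyte"]))  # ['missing column: Astrocyte']
```

Returning a list of messages instead of raising on the first problem lets you see every issue in one pass.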