SPAtial CEllular Graphical Modeling (SPACE-GM) is a graph neural network developed and published by Enable Medicine and collaborators [1]. Methods are detailed in the manuscript [1], and the code used to construct spatial cellular graphs and analyze the model outputs is available in a GitLab repository [2]. Below we describe how we apply SPACE-GM for patient stratification within the Insight Reports.
Cell centroids (based on segmentation results) and cell type annotations are used to create cellular graphs of each tissue sample, in which each cell is represented by a node and spatial neighbor relationships are represented by edges.
These cellular graphs are sampled to create a collection of 3-hop neighborhood subgraphs (microenvironments). In addition, a physical distance constraint is applied such that cells beyond 75 µm from a given subgraph’s center cell are excluded from that subgraph.
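The subgraph sampling described above can be sketched as a breadth-first search followed by a distance filter. This is a minimal illustration, not the SPACE-GM implementation: the function name, the dictionary-based graph representation, and the choice to apply the 75 µm filter after the 3-hop expansion are all assumptions made for the example.

```python
from collections import deque
from math import dist


def extract_microenvironment(center, neighbors, centroids, n_hops=3, max_dist_um=75.0):
    """Return the cells within `n_hops` of `center` that also lie within `max_dist_um` of it.

    neighbors: dict mapping cell id -> iterable of adjacent cell ids (hypothetical layout)
    centroids: dict mapping cell id -> (x, y) position in micrometers
    """
    # Breadth-first search that records the hop count of every reached cell.
    hops = {center: 0}
    queue = deque([center])
    while queue:
        cell = queue.popleft()
        if hops[cell] == n_hops:
            continue  # do not expand beyond the hop limit
        for nb in neighbors[cell]:
            if nb not in hops:
                hops[nb] = hops[cell] + 1
                queue.append(nb)

    # Assumption: the physical distance constraint is applied as a filter
    # on the n-hop neighborhood, measured from the center cell's centroid.
    origin = centroids[center]
    return {c for c in hops if dist(centroids[c], origin) <= max_dist_um}
```

For a chain of cells spaced 30 µm apart, the cell three hops away sits at 90 µm and is dropped by the distance filter even though it is within the hop limit.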
Node features include the cell type. Edge features include the spatial distance between the two connected cells and the edge type, as defined in the SPACE-GM manuscript [1].
Since our goal is to identify microenvironments that stratify the study cohort, we train SPACE-GM on microenvironments using all samples. SPACE-GM is trained to predict the provided clinical metadata feature as a graph-level label, based on each graph’s node and edge features.
We evaluate the model every 3,000 iterations during training. For evaluation, per-node predictions are mean-aggregated over each sample's graph to compute a sample-level prediction. To avoid overfitting, training is stopped when the graph classification accuracy no longer improves. The model weights at this point are used for clustering of microenvironment embeddings.
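The evaluation step can be illustrated with a short sketch. It assumes a binary label, a decision threshold of 0.5, and a `patience` parameter for the stopping rule; none of these are specified above, and all function names are hypothetical.

```python
from collections import defaultdict


def aggregate_by_sample(sample_ids, predictions):
    """Mean-aggregate per-node predictions into one prediction per sample."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sid, p in zip(sample_ids, predictions):
        totals[sid] += p
        counts[sid] += 1
    return {sid: totals[sid] / counts[sid] for sid in totals}


def sample_accuracy(sample_preds, sample_labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label.

    Assumption: binary labels (0/1) and a fixed decision threshold.
    """
    correct = sum(
        (sample_preds[s] >= threshold) == bool(sample_labels[s]) for s in sample_preds
    )
    return correct / len(sample_preds)


def should_stop(accuracy_history, patience=1):
    """True if the last `patience` evaluations failed to beat the earlier best.

    Assumption: `patience` is a hypothetical knob; the text only states that
    training stops when accuracy no longer improves.
    """
    if len(accuracy_history) <= patience:
        return False
    best_before = max(accuracy_history[:-patience])
    return max(accuracy_history[-patience:]) <= best_before
```

In a training loop, `should_stop` would be called after each evaluation (every 3,000 iterations here) with the running list of sample-level accuracies.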
For clustering, we randomly sample and extract a subset of microenvironments, denoted as the reference dataset. SPACE-GM is applied to the reference dataset, from which reference embeddings and reference predictions are collected.
We fit a PCA model to the reference embeddings, and extract the top 20 principal components (PCs). We then apply K-means clustering on these PCs, where K is selected to maximize the silhouette score.
The PCA and K-means models that were fitted to the reference dataset are then applied to the full dataset to label all microenvironments.
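The reference-then-full clustering workflow above could look roughly like the following scikit-learn sketch. The candidate range for K, the random seed, and the function names are assumptions for illustration; only the 20 principal components and the silhouette criterion come from the description.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score


def fit_reference_models(ref_embeddings, n_components=20, k_candidates=range(2, 11), seed=0):
    """Fit PCA and K-means on the reference embeddings, choosing K by silhouette score."""
    pca = PCA(n_components=n_components, random_state=seed).fit(ref_embeddings)
    ref_pcs = pca.transform(ref_embeddings)

    best_score, best_kmeans = -np.inf, None
    for k in k_candidates:  # assumption: the search range for K is not given in the text
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(ref_pcs)
        score = silhouette_score(ref_pcs, kmeans.labels_)
        if score > best_score:
            best_score, best_kmeans = score, kmeans
    return pca, best_kmeans


def label_microenvironments(pca, kmeans, embeddings):
    """Project embeddings with the fitted PCA and assign each to its nearest cluster."""
    return kmeans.predict(pca.transform(embeddings))
```

Fitting on a random subset and then calling `label_microenvironments` on every embedding keeps the cost of the silhouette search bounded by the reference set size, while still labeling the full dataset with the same models.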
We compute cell type composition and global abundance of the microenvironment clusters as described in the SPACE-GM manuscript [1].
To compute cell type adjacency for each microenvironment cluster, the pairwise interaction counts (the cell types of each pair of nodes connected by an edge) are aggregated across all subgraphs of that cluster.
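As a rough illustration of this aggregation, the sketch below counts cell type pairs over the edges of each subgraph and sums them per cluster. It assumes that pairs are unordered and that an edge appearing in several overlapping subgraphs is counted once per subgraph; the function name and input layout are hypothetical.

```python
from collections import Counter


def cell_type_adjacency(subgraph_edges, cluster_labels, cell_types):
    """Aggregate cell type pair counts over all subgraphs of each cluster.

    subgraph_edges: list of edge lists, one per subgraph, each [(u, v), ...]
    cluster_labels: cluster assignment for each subgraph, in the same order
    cell_types: dict mapping cell id -> cell type name
    """
    counts = {}
    for edges, cluster in zip(subgraph_edges, cluster_labels):
        counter = counts.setdefault(cluster, Counter())
        for u, v in edges:
            # Assumption: interactions are unordered, so (A, B) and (B, A)
            # are folded into one key by sorting the two type names.
            pair = tuple(sorted((cell_types[u], cell_types[v])))
            counter[pair] += 1
    return counts
```

The resulting per-cluster counters can be reshaped into a symmetric cell type by cell type matrix for display.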