Distance from Tumor Boundary Analysis

Given a mask or image annotation marking the tumor areas in an image, we can compute the distance of each cell from the boundary of that mask. This distance can help characterize the degree and depth of immune cell infiltration, or serve as a covariate alongside other phenotypic measures.

To compute this value, the tissue annotation is first converted into a binary mask in which tumor areas have a value of 1 and non-tumor areas a value of 0. The function distance_transform_edt from scipy.ndimage is then used to compute a Euclidean distance transform. Because this function replaces each nonzero element with its distance to the nearest zero-valued element, running it on this mask yields, for each point inside a tumor area, its distance to the nearest non-tumor area. The original mask is then inverted (setting tumor areas to 0 and non-tumor areas to 1) and distance_transform_edt is run again to compute, for each point outside the tumor area, its distance to the nearest tumor area.
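The two distance transforms can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the function name `boundary_distance_maps` and the `pixel_size_um` parameter are assumptions introduced here, and the mask is assumed to be a 2D array in which nonzero values mark tumor.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def boundary_distance_maps(tumor_mask, pixel_size_um=1.0):
    """Return (dist_inside, dist_outside) maps for a binary tumor mask.

    tumor_mask    : 2D array, nonzero where the annotation marks tumor.
    pixel_size_um : hypothetical parameter, physical size of one pixel.
    """
    tumor = np.asarray(tumor_mask).astype(bool)

    # distance_transform_edt replaces every nonzero element with its
    # Euclidean distance to the nearest zero element. On the tumor mask
    # this gives, for tumor pixels, the distance to the nearest
    # non-tumor pixel (non-tumor pixels stay at 0).
    dist_inside = distance_transform_edt(tumor, sampling=pixel_size_um)

    # On the inverted mask it gives, for non-tumor pixels, the distance
    # to the nearest tumor pixel (tumor pixels stay at 0).
    dist_outside = distance_transform_edt(~tumor, sampling=pixel_size_um)

    return dist_inside, dist_outside


# Toy example: a single row with a three-pixel tumor in the middle.
mask = np.array([[0, 0, 1, 1, 1, 0, 0]])
inside, outside = boundary_distance_maps(mask)
```

Distances are measured between pixel centers, so a pixel directly adjacent to the boundary has a distance of one pixel rather than zero.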

After the point-by-point distances are calculated, each cell is assigned the distance value found at the location of its centroid (derived from the whole-cell segmentation mask).
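A minimal sketch of this lookup is shown below. It assumes centroids are given as (row, column) pixel coordinates and rounds them to the nearest pixel; the function name `distance_at_centroids` and the rounding and clipping choices are illustrative assumptions, not details taken from the method description.

```python
import numpy as np


def distance_at_centroids(distance_map, centroids_rc):
    """Look up the distance-map value under each cell centroid.

    distance_map : 2D array of per-pixel distances.
    centroids_rc : (n_cells, 2) array of (row, col) centroid coordinates
                   in pixel units (assumed layout).
    """
    centroids_rc = np.asarray(centroids_rc, dtype=float)

    # Round to the nearest pixel and clip so that centroids lying on or
    # just beyond the image edge still map to a valid pixel.
    rows = np.rint(centroids_rc[:, 0]).astype(int)
    cols = np.rint(centroids_rc[:, 1]).astype(int)
    rows = np.clip(rows, 0, distance_map.shape[0] - 1)
    cols = np.clip(cols, 0, distance_map.shape[1] - 1)

    return distance_map[rows, cols]


# Toy example with a 3 x 4 map whose values are 0..11.
toy_map = np.arange(12).reshape(3, 4)
toy_centroids = [[0.2, 1.1], [2.4, 2.9], [5.0, -1.0]]
cell_distances = distance_at_centroids(toy_map, toy_centroids)
```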

To normalize for variability in the tissue area available at each distance from these boundaries, we divide the tissue into 20 µm wide bands and compute the total tissue area within each band. For each cell phenotype in a provided set of phenotypes, we compute the density of that phenotype in each band as $cells/mm^2$. Samples with too few cells of a given phenotype, and tissue bands with too small an area, will be excluded from this analysis.
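The banding step could look roughly like the sketch below, which bins both the cell distances and the per-pixel tissue distances into bands of equal width. The function name, its parameters, and the `min_area_mm2` cutoff are hypothetical; the text does not state the actual exclusion thresholds, so the default here only masks bands with no tissue at all.

```python
import numpy as np


def band_densities(cell_distances_um, tissue_distances_um, pixel_area_um2,
                   band_width_um=20.0, min_area_mm2=0.0):
    """Cell density (cells/mm^2) per distance band for one phenotype.

    cell_distances_um   : distance of each cell of this phenotype.
    tissue_distances_um : distance of every tissue pixel, e.g. the
                          distance map restricted to a tissue mask.
    pixel_area_um2      : area of one pixel in square micrometers.
    min_area_mm2        : hypothetical cutoff; bands at or below this
                          area are reported as NaN.
    """
    cell_d = np.asarray(cell_distances_um, dtype=float)
    tissue_d = np.asarray(tissue_distances_um, dtype=float)

    # Band edges at 0, w, 2w, ... covering the largest tissue distance.
    n_bands = max(int(np.ceil(tissue_d.max() / band_width_um)), 1)
    edges = np.arange(n_bands + 1) * band_width_um

    n_cells, _ = np.histogram(cell_d, bins=edges)
    n_pixels, _ = np.histogram(tissue_d, bins=edges)

    # Convert pixel counts to mm^2 (1 mm^2 = 1e6 um^2).
    area_mm2 = n_pixels * pixel_area_um2 / 1e6

    with np.errstate(divide="ignore", invalid="ignore"):
        density = np.where(area_mm2 > min_area_mm2,
                           n_cells / area_mm2, np.nan)

    centers = (edges[:-1] + edges[1:]) / 2
    return centers, n_cells, area_mm2, density


# Toy example: each "pixel" is 1 mm^2 to keep the arithmetic simple.
centers, n_cells, area_mm2, density = band_densities(
    cell_distances_um=[1, 2, 3, 4, 5, 6, 21, 39],
    tissue_distances_um=[5, 10, 15, 25, 30],
    pixel_area_um2=1e6,
)
```

Dividing by the tissue area in each band, rather than by a fixed area, is what keeps a band that happens to contain little tissue from appearing artificially sparse.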

With each cell’s distance to the nearest tumor boundary quantified, we can proceed with three analyses:

We can transform the distances to the tumor margin for cells inside the tumor to negative values. With this transformation done, we can visualize the density of each cell phenotype within each band of distance from the tumor margin on a single density curve for each sample.

For each annotated cell type, within each sample, we will calculate the average distance outside and inside the tumor as a weighted average across bands:

$$
\mathit{avg\ dist} = \dfrac{\sum_{b \in B} \dfrac{n_b \cdot \mathit{dist}_b}{\mathit{area}_b}}{\sum_{b \in B} \dfrac{n_b}{\mathit{area}_b}}
$$

$$
\begin{aligned}
\text{where} \quad & b \in B && \text{is each band } b \text{ in the set of bands } B \text{,}\\
& n_b && \text{is the number of cells in band } b \text{,}\\
& \mathit{dist}_b && \text{is the center point of distance enclosed by band } b \text{, and}\\
& \mathit{area}_b && \text{is the area enclosed by band } b.
\end{aligned}
$$
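As a rough illustration of this formula, the sketch below computes the density-weighted average distance from per-band counts, band centers, and band areas. The function name and the optional `max_distance_um` cap are assumptions made for this example; bands with zero area are skipped because their density is undefined.

```python
import numpy as np


def weighted_average_distance(n_cells, band_centers_um, band_areas,
                              max_distance_um=None):
    """Density-weighted average distance across bands.

    n_cells         : n_b, number of cells in each band.
    band_centers_um : dist_b, center of each band.
    band_areas      : area_b, tissue area of each band (any fixed unit).
    max_distance_um : hypothetical optional cap on |dist_b|.
    """
    n = np.asarray(n_cells, dtype=float)
    d = np.asarray(band_centers_um, dtype=float)
    a = np.asarray(band_areas, dtype=float)

    keep = a > 0
    if max_distance_um is not None:
        keep &= np.abs(d) <= max_distance_um

    density = n[keep] / a[keep]          # n_b / area_b
    total = density.sum()                # denominator of the formula
    if total == 0:
        return float("nan")              # no cells in the kept bands
    return float((density * d[keep]).sum() / total)


# Two bands centered at 10 and 30 um with densities 2 and 1.
avg_all = weighted_average_distance([6, 2], [10, 30], [3, 2])
avg_capped = weighted_average_distance([6, 2], [10, 30], [3, 2],
                                       max_distance_um=20)
```

In the toy example the densities are 2 and 1, so the uncapped average is (2 × 10 + 1 × 30) / 3 ≈ 16.67, while capping at 20 µm leaves only the first band and gives 10.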

For datasets with multiple samples, a maximum threshold will be set on the distances included in this weighted average, so that all samples cover the same range of distances. This prevents results from being skewed by variation in image or tissue size.

Finally, if a metadata trait specifying cohort definitions is provided, the result of a statistical comparison between groups will also be reported for each of these average distances:
- For cohort definitions with two groups, the result of a two-sample t-test will be reported.
- For cohort definitions with more than two groups, the result of a one-way ANOVA will be reported, followed by Tukey’s post-hoc test to identify significant pairwise differences when the ANOVA result is significant.
- All reported p-values will be corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure.
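
The comparison logic above might be sketched as follows using SciPy. Several details are assumptions, since the text does not specify them: the t-test variant (SciPy's default equal-variance test is used here), the significance level `alpha` that triggers the post-hoc test, and the helper names `compare_cohorts` and `benjamini_hochberg`. `tukey_hsd` requires a reasonably recent SciPy release.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind, tukey_hsd


def compare_cohorts(groups, alpha=0.05):
    """Compare per-sample average distances between cohorts.

    groups : dict mapping cohort label -> sequence of values.
    alpha  : assumed threshold for running the post-hoc test.
    """
    labels = list(groups)
    samples = [np.asarray(groups[k], dtype=float) for k in labels]
    if len(samples) < 2:
        raise ValueError("need at least two cohorts")

    if len(samples) == 2:
        # Two groups: two-sample t-test (equal variances assumed).
        res = ttest_ind(samples[0], samples[1])
        return {"test": "t-test", "pvalue": float(res.pvalue)}

    # More than two groups: one-way ANOVA.
    res = f_oneway(*samples)
    out = {"test": "anova", "pvalue": float(res.pvalue)}
    if res.pvalue < alpha:
        # Tukey's HSD only after a significant ANOVA result.
        posthoc = tukey_hsd(*samples)
        out["pairwise_pvalues"] = {
            (labels[i], labels[j]): float(posthoc.pvalue[i, j])
            for i in range(len(labels))
            for j in range(i + 1, len(labels))
        }
    return out


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


two = compare_cohorts({"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.5]})
three = compare_cohorts({"A": [1, 2, 3], "B": [10, 11, 12], "C": [20, 21, 22]})
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The correction would be applied once across the full set of reported p-values, for example across all cell types, rather than separately per test.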