| Title: | Swiss-army toolkit for selecting niche fronts and invasive margins in spatial transcriptomics data |
|---|---|
| Description: | Battlefield is a Swiss-army toolkit originally developed to define and extract spatial spots from specific tissue regions (such as front regions, niche borders, invasive margins, and cluster interfaces) using spatial transcriptomics data or clustered tissue maps. It has since been extended to support trajectory selection and layer inspection, and now provides a collection of low-level utilities for spatial transcriptomics analysis. These utilities are primarily intended to be reused within higher-level analytical packages. It is designed to work with sequencing-based platforms such as Visium at several resolutions and Visium HD (binned). |
| Authors: | Jean-Philippe Villemin [aut, cre] (ORCID: <https://orcid.org/0000-0002-1838-5880>), European Research Council [fnd] (ERC-2022) |
| Maintainer: | Jean-Philippe Villemin <[email protected]> |
| License: | CeCILL \| file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-29 08:45:03 UTC |
| Source: | https://github.com/bioc/Battlefield |
This function takes spot selection results (either border or core spots) and adds them as annotations in the colData of a SpatialExperiment object.
add_borders_to_spe(spe, border = NULL, core = NULL, erase = FALSE)
spe |
A SpatialExperiment object from which the selection dataframes were derived. |
border |
Optional data.frame of border spots. Can be output from either [select_border_spots()] or [build_all_borders()]. Expected columns: spot_id, interface, mode. |
core |
Optional data.frame of core spots. Can be output from either [select_core_spots()] or [build_all_cores()]. Expected columns: spot_id, interface, mode. |
erase |
Logical. If 'TRUE', erase pre-existing battlefield columns before adding new ones. If 'FALSE' (default), issue a warning if columns already exist and skip adding them. |
At least 'border' must be provided. 'core' is optional. If 'core' is provided without 'border', an error is raised.
The following unified columns are added:
- 'is_border': logical, TRUE if spot is a border spot, FALSE or NA otherwise
- 'is_core': logical, TRUE if spot is a core spot, FALSE or NA otherwise
- 'interface': the target interface cluster
- 'border_mode': character, "inner", "outer", or NA (from the 'mode' column in input data)
The SpatialExperiment object with updated colData.
data("visiumHD_16um_simulated_spe", package = "Battlefield")
spe <- visiumHD_16um_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)

# Using build_all_borders and build_all_cores
all_borders <- build_all_borders(df, k = 6)
all_cores <- build_all_cores(df, all_borders, mode = "inner")

# Or using individual select functions
border_3_to_4 <- select_border_spots(df, cluster = 3, interface = 4, k = 6)
core_3_to_4 <- select_core_spots(df, all_borders, cluster = 3, interface = 4)
This function takes layer classification results (border, intermediate, core) and adds them as annotations in the colData of a SpatialExperiment object.
add_layers_to_spe(spe, layer = NULL, erase = FALSE)
spe |
A SpatialExperiment object from which the layer dataframes were derived. |
layer |
A data.frame of layer classifications. Can be output from either [create_cluster_layers()] or [create_all_layers()]. Expected columns: spot_id, layer. |
erase |
Logical. If 'TRUE', erase pre-existing layer columns before adding new ones. If 'FALSE' (default), issue a warning if columns already exist and skip adding them. |
The 'layer' parameter must be provided and contain at least one row.
The following column is added:
- 'layer': character, one of "border", "intermediate", "core", or NA
The SpatialExperiment object with updated colData.
data("visiumHD_16um_simulated_spe", package = "Battlefield")
spe <- visiumHD_16um_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)

# Or for a single cluster
cluster_1_layers <- create_cluster_layers(df, target_cluster = 1, k = 6)
spe <- add_layers_to_spe(spe, layer = cluster_1_layers)
This function takes trajectory results (e.g., from [build_similar_trajectories()]) and adds trajectory metadata to the colData of a SpatialExperiment object.
add_trajectories_to_spe(spe, trajectory = NULL, erase = FALSE)
spe |
A SpatialExperiment object from which the trajectory dataframes were derived. |
trajectory |
A data.frame of trajectory spots. Expected output from [build_similar_trajectories()]. Expected columns: spot_id, trajectory_id, offset, pos_on_seg, dist_to_seg. |
erase |
Logical. If 'TRUE', erase pre-existing trajectory columns before adding new ones. If 'FALSE' (default), issue a warning if columns already exist and skip adding them. |
The 'trajectory' parameter must be provided and contain at least one row.
The following columns are added:
- 'trajectory_id': character, identifier for each trajectory (e.g., "main", "left_1", "right_2")
- 'offset': numeric, the offset distance used to build each trajectory line
- 'pos_on_seg': numeric in [0, 1], the position along the trajectory
- 'dist_to_seg': numeric, the distance to the original segment
The SpatialExperiment object with updated colData.
data("visiumHD_16um_simulated_spe", package = "Battlefield")
spe <- visiumHD_16um_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)
centroids <- compute_centroids(df)
A <- centroids[centroids$cluster == 1, c("x", "y")]
B <- centroids[centroids$cluster == 3, c("x", "y")]
trajs <- build_similar_trajectories(df, A, B, top_n = 10, n_extra = 1, side = "both")
spe <- add_trajectories_to_spe(spe, trajectory = trajs)
Given an endpoint (typically the start or end of a previously selected path), this function computes a target point located one perpendicular step away from the endpoint, on the "left" or "right" side relative to the directed segment A -> B. It then returns the closest spot to that target among the remaining candidates 'df_rest'.
adjacent_endpoint(df_rest, endpoint, A, B, spacing, side = c("left", "right"))
df_rest |
A data frame of candidate spots containing at least columns 'x' and 'y'. |
endpoint |
A one-row data frame with columns 'x' and 'y' representing the endpoint from which to step sideways. If it has multiple rows, only the first row is used. |
A |
A data frame with columns 'x' and 'y' representing point A defining the direction of the segment. Only the first row is used. |
B |
A data frame with columns 'x' and 'y' representing point B defining the direction of the segment. Only the first row is used. |
spacing |
Numeric scalar; step size (in the same coordinate units as 'x'/'y') used to move perpendicularly from 'endpoint'. |
side |
Character; which side to step to relative to the vector A -> B ('"left"' or '"right"'). |
The unit direction vector is computed from A -> B and a unit left normal $\mathbf{n}$ is derived. The target point is:

$$\mathrm{target} = \mathrm{endpoint} + \mathrm{sign} \cdot \mathrm{spacing} \cdot \mathbf{n}$$

where 'sign = +1' for '"left"' and '-1' for '"right"'.
The closest spot is selected with closest_spot() using squared Euclidean distance.
A one-row data frame (same columns as 'df_rest') corresponding to the spot closest to the computed perpendicular target point.
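The perpendicular step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `step_sideways()` is a hypothetical helper name, and the left normal is assumed to be the direction vector rotated 90 degrees counter-clockwise, which agrees with the documented behavior that "left" of a segment pointing along +x is toward positive y.

```r
# Hypothetical helper (not part of Battlefield): compute the perpendicular
# target point and return the nearest candidate spot.
step_sideways <- function(df_rest, endpoint, A, B, spacing,
                          side = c("left", "right")) {
  side <- match.arg(side)
  d <- c(B$x[1] - A$x[1], B$y[1] - A$y[1])
  u <- d / sqrt(sum(d^2))            # unit direction of A -> B
  n_left <- c(-u[2], u[1])           # assumed left normal: 90 degrees CCW
  sgn <- if (side == "left") 1 else -1
  target <- c(endpoint$x[1], endpoint$y[1]) + sgn * spacing * n_left
  # Squared distances are enough to find the argmin
  d2 <- (df_rest$x - target[1])^2 + (df_rest$y - target[2])^2
  df_rest[which.min(d2), , drop = FALSE]
}

df <- data.frame(x = c(0, 1, 1, 2, 2), y = c(0, 0, 1, 0, 1), id = 1:5)
A <- data.frame(x = 0, y = 0)
B <- data.frame(x = 2, y = 0)
endpoint <- data.frame(x = 1, y = 0)

left_hit <- step_sideways(df, endpoint, A, B, spacing = 1, side = "left")
right_hit <- step_sideways(df, endpoint, A, B, spacing = 1, side = "right")
```

With these inputs the left target is (1, 1), which coincides with spot 3. The right target (1, -1) has no spot on it, so the nearest remaining candidate (spot 2, the endpoint itself) is returned. In practice the endpoint would usually be absent from 'df_rest'.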
df <- data.frame(x = c(0, 1, 1, 2, 2), y = c(0, 0, 1, 0, 1), id = 1:5)
A <- data.frame(x = 0, y = 0)
B <- data.frame(x = 2, y = 0)
endpoint <- data.frame(x = 1, y = 0)

# Step "left" of A->B (here, toward positive y)
adjacent_endpoint(df, endpoint, A, B, spacing = 1, side = "left")

# Step "right" of A->B (here, toward negative y)
adjacent_endpoint(df, endpoint, A, B, spacing = 1, side = "right")
Generates integer grid coordinates along the line segment connecting two points using Bresenham's line algorithm.
bresenham_line(p0, p1, snap = TRUE)
p0 |
A 'data.frame' with at least columns 'x' and 'y'. If it contains multiple rows, only the first row is used. |
p1 |
A 'data.frame' with at least columns 'x' and 'y'. If it contains multiple rows, only the first row is used. |
snap |
Logical; if 'TRUE' (default), 'p0' and 'p1' coordinates are rounded to the nearest integer before running the algorithm. If 'FALSE', coordinates are truncated via 'as.integer()'. |
The function:
- Validates inputs are data frames with 'x'/'y' columns and at least one row.
- Converts endpoints to integer coordinates (optionally rounding first).
- Applies the classic Bresenham algorithm to enumerate grid points.
This is useful for tracing discrete paths on an integer lattice (e.g., pixels, tile grids, spatial transcriptomics spot grids).
A 'data.frame' with columns 'x' and 'y' giving the integer grid points visited by the line, including both endpoints, in traversal order from 'p0' to 'p1'.
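For readers unfamiliar with the algorithm, the following is a compact base R sketch of the integer error-accumulation form of Bresenham's line algorithm. The function name `bresenham_sketch()` and its scalar arguments are illustrative assumptions. It mirrors only the 'snap = TRUE' behavior (rounding before tracing) and may differ in detail from the package's code.

```r
# Illustrative sketch (hypothetical name): all-octant Bresenham line tracing.
bresenham_sketch <- function(x0, y0, x1, y1) {
  # Round endpoints to the nearest integer, as with snap = TRUE
  x0 <- as.integer(round(x0)); y0 <- as.integer(round(y0))
  x1 <- as.integer(round(x1)); y1 <- as.integer(round(y1))
  dx <- abs(x1 - x0)
  dy <- -abs(y1 - y0)
  sx <- if (x0 < x1) 1L else -1L   # step direction along x
  sy <- if (y0 < y1) 1L else -1L   # step direction along y
  err <- dx + dy                   # accumulated error term
  xs <- integer(0); ys <- integer(0)
  repeat {
    xs <- c(xs, x0); ys <- c(ys, y0)
    if (x0 == x1 && y0 == y1) break
    e2 <- 2L * err
    if (e2 >= dy) { err <- err + dy; x0 <- x0 + sx }
    if (e2 <= dx) { err <- err + dx; y0 <- y0 + sy }
  }
  data.frame(x = xs, y = ys)
}

pts <- bresenham_sketch(1.2, 2.7, 7.9, 5.1)
pts
```

Here the endpoints round to (1, 3) and (8, 5), so the traced path has eight grid points, one per unit step along the longer (x) axis.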
p0 <- data.frame(x = 1.2, y = 2.7)
p1 <- data.frame(x = 7.9, y = 5.1)

# Default (snap = TRUE): rounds endpoints first
bresenham_line(p0, p1)

# No rounding (snap = FALSE): integer coercion truncates toward zero
bresenham_line(p0, p1, snap = FALSE)
This function iterates over all **ordered** cluster pairs (A -> B) and returns a single data.frame containing the border spots for each interface, as computed by a border-detection function (e.g. [select_border_spots()]).
build_all_borders(
  df,
  k = 6,
  max_dist = NULL,
  mode = "both",
  pairs = NULL,
  coord_cols = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least coordinate columns and a cluster label column. |
k |
Integer. Number of nearest neighbors to consider (excluding self) when computing borders. Default is 6. |
max_dist |
Optional numeric. Maximum Euclidean distance for neighbors to be considered. If 'NULL', no distance filtering is applied. |
mode |
Character. One of "inner", "outer", or "both". Controls which border direction(s) to select for each pair. Default is "both". |
pairs |
Optional data.frame of oriented cluster pairs, typically produced by [directed_cluster_interface_pairs()]. Must contain 'cluster' and 'interface' columns. If 'NULL', it is computed from 'df[[cluster_col]]'. |
coord_cols |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
cluster_col |
Character. Name of the column containing cluster labels. Default is '"cluster"'. |
If 'pairs' is not provided, it is generated from the cluster labels using [directed_cluster_interface_pairs()].
A data.frame produced by row-binding the result of border selection for each oriented pair. Typically contains the original columns of 'df' plus interface annotation columns from the border selector.
**Note on mode column**: The 'mode' column in the returned data.frame will always contain either "inner" or "outer", never "both". Even when the 'mode' parameter is set to "both", each border spot row is labeled with its actual direction (cluster→interface is "inner", interface→cluster is "outer").
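To make the notion of **ordered** pairs concrete, the sketch below enumerates every (cluster, interface) combination from a label vector in base R. `ordered_cluster_pairs()` is a hypothetical helper written for illustration. Whether [directed_cluster_interface_pairs()] additionally restricts pairs (for example to spatially adjacent clusters) is not stated here, so treat this only as an approximation of the idea.

```r
# Hypothetical helper: all ordered pairs of distinct cluster labels.
ordered_cluster_pairs <- function(clusters) {
  lv <- sort(unique(clusters))
  pairs <- expand.grid(cluster = lv, interface = lv, stringsAsFactors = FALSE)
  # Drop self-pairs; keep both A -> B and B -> A
  pairs[pairs$cluster != pairs$interface, , drop = FALSE]
}

pairs <- ordered_cluster_pairs(c("A", "B", "C", "A"))
pairs
```

Three labels give six ordered pairs, because A -> B and B -> A are treated as distinct interfaces.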
# Example with synthetic data
set.seed(1)
df_ex <- data.frame(
  x = rnorm(200),
  y = rnorm(200),
  cluster = sample(c("A","B","C"), 200, replace = TRUE)
)
all_borders <- build_all_borders(df_ex, k = 6)
head(all_borders)
This function iterates over all **ordered** cluster pairs (A -> B) and returns a single data.frame containing the core (control) spots for each pair, as computed by [select_core_spots()].
build_all_cores(
  df,
  border_df,
  mode = "both",
  pairs = NULL,
  coord_cols = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least coordinate columns and a cluster label column. |
border_df |
A data.frame of border spots from [build_all_borders()]. |
mode |
Character. One of "inner", "outer", or "both". Controls which mode values from border_df to use when counting border spots. Default is "both". |
pairs |
Optional data.frame of oriented cluster pairs, typically produced by [directed_cluster_interface_pairs()]. Must contain 'cluster' and 'interface' columns. If 'NULL', it is computed from 'df[[cluster_col]]'. |
coord_cols |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
cluster_col |
Character. Name of the column containing cluster labels. Default is '"cluster"'. |
If 'pairs' is not provided, it is generated from the cluster labels using [directed_cluster_interface_pairs()].
A data.frame produced by row-binding the result of core spot selection for each oriented pair. Typically contains the original columns of 'df' plus annotation columns (is_core, interface, mode) from the core spot selector.
**Note on mode column**: The 'mode' column in the returned data.frame reflects which border modes were used for each core spot selection: "inner", "outer", or "both". This differs from 'build_all_borders()' which only returns "inner" or "outer".
# Example with synthetic data
set.seed(1)
df_ex <- data.frame(
  spot_id = paste0("spot_", seq_len(200)),
  x = rnorm(200),
  y = rnorm(200),
  cluster = sample(c("A","B","C"), 200, replace = TRUE)
)
all_borders <- build_all_borders(df_ex, k = 6)
all_cores <- build_all_cores(df_ex, all_borders, mode = "both")
head(all_cores)
Selects spots near the segment defined by endpoints 'A' and 'B' using 'build_one_trajectory()', then orders the selected spots by their projection parameter (from 'A' to 'B'). The result is returned as a data frame.
build_one_line(df_rest, A, B, top_n = 19, max_dist = NULL)
df_rest |
A data frame of candidate spots containing at least columns 'x' and 'y'. |
A |
A data frame with columns 'x' and 'y' defining endpoint A of the segment. If it contains multiple rows, only the first row is used. |
B |
A data frame with columns 'x' and 'y' defining endpoint B of the segment. If it contains multiple rows, only the first row is used. |
top_n |
Integer; number of closest spots to keep (default '19'). If 'NULL', no top-N truncation is applied. |
max_dist |
Numeric; if not 'NULL', only spots with distance to the segment '<= max_dist' are kept. |
This function is a thin wrapper around 'build_one_trajectory()' that returns only the ordered 'selected' data frame (and not the segment metadata).
It uses the base R pipe '|>' and 'dplyr::arrange()'.
A data frame of selected spots (subset of 'df_rest') ordered along the segment (in increasing 'pos_on_seg'). The output includes the extra columns added by 'build_one_trajectory()' (typically 'dist_to_seg' and 'pos_on_seg').
df <- data.frame(
  x = c(0, 1, 2, 3, 4, 2),
  y = c(0, 0, 0, 0, 0, 1),
  id = 1:6
)
A <- data.frame(x = 0, y = 0)
B <- data.frame(x = 4, y = 0)
build_one_line(df, A, B, top_n = 5)
Computes the distance from each spot in 'df' to the segment '[p0, p1]', optionally filters by a maximum distance, keeps the 'top_n' closest spots, and finally orders the retained spots by their projection position 'pos_on_seg' along the segment (from 'p0' to 'p1'). This is the foundation function for building trajectory lines across spatial spots.
build_one_trajectory(df, p0, p1, top_n = 100, max_dist = NULL)
df |
A data frame of spots containing at least columns 'x' and 'y'. Additional columns are preserved in the output. |
p0 |
A data frame with columns 'x' and 'y' defining the first endpoint of the segment. If it has multiple rows, only the first row is used. |
p1 |
A data frame with columns 'x' and 'y' defining the second endpoint of the segment. If it has multiple rows, only the first row is used. |
top_n |
Integer; number of closest spots to keep (default '100'). If 'NULL', no top-N truncation is applied. |
max_dist |
Numeric; if not 'NULL', only spots with distance to the segment '<= max_dist' are kept. |
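Distances are computed using point_segment_distance_vec().
Ordering uses the projection parameter

$$t = \frac{(s - p_0) \cdot (p_1 - p_0)}{\lVert p_1 - p_0 \rVert^2}$$

clamped to '[0, 1]', where $s$ denotes the spot coordinates.
This function uses dplyr verbs ('mutate', 'filter', 'arrange', 'slice_head') via the '|>' pipe.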
A data frame containing the selected spots, with two extra columns:
- 'dist_to_seg': Euclidean distance from the spot to the segment.
- 'pos_on_seg': Clamped projection parameter in '[0, 1]' indicating position along the segment (0 at 'p0', 1 at 'p1').
The rows are ordered by 'pos_on_seg' (i.e., from 'p0' to 'p1').
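The selection pipeline described above (project, optionally filter by distance, keep the closest 'top_n', order along the segment) can be approximated in base R as follows. The helper names `project_on_segment()` and `select_near_segment()` are hypothetical, and the sketch assumes 'p0' and 'p1' are distinct points. The package itself relies on point_segment_distance_vec() and dplyr verbs, whose internals are not reproduced here.

```r
# Hypothetical helper: clamped projection parameter and distance to [p0, p1].
project_on_segment <- function(df, p0, p1) {
  vx <- p1$x[1] - p0$x[1]
  vy <- p1$y[1] - p0$y[1]
  len2 <- vx^2 + vy^2                      # assumes p0 != p1
  t <- ((df$x - p0$x[1]) * vx + (df$y - p0$y[1]) * vy) / len2
  t <- pmin(pmax(t, 0), 1)                 # clamp to [0, 1]
  px <- p0$x[1] + t * vx                   # closest point on the segment
  py <- p0$y[1] + t * vy
  df$dist_to_seg <- sqrt((df$x - px)^2 + (df$y - py)^2)
  df$pos_on_seg <- t
  df
}

# Hypothetical helper: filter, truncate to the closest spots, then order.
select_near_segment <- function(df, p0, p1, top_n = 100, max_dist = NULL) {
  out <- project_on_segment(df, p0, p1)
  if (!is.null(max_dist)) out <- out[out$dist_to_seg <= max_dist, , drop = FALSE]
  out <- out[order(out$dist_to_seg), , drop = FALSE]
  if (!is.null(top_n)) out <- head(out, top_n)
  out[order(out$pos_on_seg), , drop = FALSE]
}

df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 1, 1, 2, 4), id = letters[1:5])
res <- select_near_segment(df, data.frame(x = 0, y = 0),
                           data.frame(x = 4, y = 0), top_n = 3)
res
```

For this input the three closest spots are a, b and c (distances 0, 1, 1), which are then ordered by their positions 0, 0.25 and 0.5 along the segment.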
# Minimal example with base data.frame inputs
df <- data.frame(
  x = c(0, 1, 2, 3, 4),
  y = c(0, 1, 1, 2, 4),
  id = letters[1:5]
)
p0 <- data.frame(x = 0, y = 0)
p1 <- data.frame(x = 4, y = 0)

# Keep 3 closest spots (no distance threshold)
res <- build_one_trajectory(df, p0, p1, top_n = 3)
head(res)
Constructs a central line of spots near the segment defined by endpoints 'A' and 'B', then builds additional parallel lines on the left, right, or both sides by translating the segment along its left unit normal. After each line is built, its spots are removed from the remaining pool to avoid reuse.
build_similar_trajectories(
  df,
  A,
  B,
  top_n = 19,
  n_extra = 2,
  side = c("left", "right", "both"),
  lane_width_factor = 1.15,
  max_dist = NULL
)
df |
A data frame of candidate spots containing at least columns 'x' and 'y'. Additional columns are preserved. |
A |
A data frame with columns 'x' and 'y' defining endpoint A of the central segment. If it contains multiple rows, only the first row is used. |
B |
A data frame with columns 'x' and 'y' defining endpoint B of the central segment. If it contains multiple rows, only the first row is used. |
top_n |
Integer; number of closest spots to keep per line (default '19'). If 'NULL', no top-N truncation is applied in the underlying selection. |
n_extra |
Integer; number of additional lines to build on each requested side (default '2'). For example, 'n_extra = 2' with 'side = "both"' yields 1 center line + 2 left + 2 right. |
side |
Character; which side(s) to build relative to the vector A -> B ('"left"', '"right"', or '"both"'). |
lane_width_factor |
Numeric scalar; multiplier applied to the estimated spot spacing to obtain the inter-line offset (default '1.15'). |
max_dist |
Numeric; if not 'NULL', only spots with distance to each (translated) segment '<= max_dist' are considered when building each line. |
The inter-line distance is computed as:

$$d = \mathrm{lane\_width\_factor} \times \mathrm{spacing}$$

where 'spacing' is the median nearest-neighbor distance estimated from 'df'.
Lines are generated by shifting both endpoints 'A' and 'B' by 'offset * n', where 'n' is the left unit normal of A -> B. Positive offsets correspond to the left side; negative offsets correspond to the right side.
This function relies on helpers such as 'estimate_spot_spacing()', 'unit_normal_left()', 'shift_point()', 'build_one_line()', and 'remove_used_points()'.
The implementation uses the base R pipe '|>' and 'dplyr::mutate()' / 'dplyr::bind_rows()'. Ensure 'dplyr' is installed.
A data frame containing all selected spots from all lines, stacked together, with additional columns 'trajectory_id' (e.g., '"main"', '"left_1"', '"right_1"') and 'offset' (signed offset used for that line).
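The geometry of the parallel lanes can be illustrated without the package. The base R sketch below estimates spacing as the median nearest-neighbor distance and shifts both endpoints along the left unit normal. `median_nn_spacing()` and `parallel_segments()` are hypothetical stand-ins for helpers such as 'estimate_spot_spacing()' and 'shift_point()', whose actual implementations may differ. The sketch only builds the shifted segments and does not select or remove spots.

```r
# Hypothetical stand-in: median nearest-neighbor distance between spots.
median_nn_spacing <- function(df) {
  d <- as.matrix(dist(df[, c("x", "y")]))
  diag(d) <- Inf                       # ignore self-distances
  median(apply(d, 1, min))
}

# Hypothetical helper: endpoints of the main line and its shifted copies.
parallel_segments <- function(A, B, spacing, n_extra = 1,
                              lane_width_factor = 1.15) {
  stopifnot(n_extra >= 1)
  d <- c(B$x[1] - A$x[1], B$y[1] - A$y[1])
  u <- d / sqrt(sum(d^2))              # unit direction of A -> B
  n_left <- c(-u[2], u[1])             # left unit normal
  step <- lane_width_factor * spacing  # inter-line distance
  i <- seq_len(n_extra)
  offset <- c(0, i * step, -i * step)  # positive = left, negative = right
  data.frame(
    trajectory_id = c("main", paste0("left_", i), paste0("right_", i)),
    offset = offset,
    Ax = A$x[1] + offset * n_left[1],
    Ay = A$y[1] + offset * n_left[2],
    Bx = B$x[1] + offset * n_left[1],
    By = B$y[1] + offset * n_left[2],
    stringsAsFactors = FALSE
  )
}

df <- data.frame(x = rep(1:10, each = 3), y = rep(1:3, times = 10))
spacing <- median_nn_spacing(df)
segs <- parallel_segments(data.frame(x = 1, y = 2), data.frame(x = 10, y = 2),
                          spacing, n_extra = 1)
segs
```

On this unit grid the estimated spacing is 1, so the left and right segments sit 1.15 units above and below the main line at y = 2.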
df <- data.frame(
  x = rep(1:10, each = 3),
  y = rep(1:3, times = 10),
  id = seq_len(30)
)
A <- data.frame(x = 1, y = 2)
B <- data.frame(x = 10, y = 2)
out <- build_similar_trajectories(df, A, B, top_n = 5, n_extra = 1, side = "both")
head(out)
unique(out$trajectory_id)
Returns the row of 'df' corresponding to the spot nearest to the target coordinates '(tx, ty)' using squared Euclidean distance.
closest_spot(df, tx, ty)
df |
A data frame containing at least numeric columns 'x' and 'y' representing spot coordinates. |
tx |
Numeric scalar; target x-coordinate. |
ty |
Numeric scalar; target y-coordinate. |
The function minimizes '(x - tx)^2 + (y - ty)^2'. Squared distances are used for efficiency; taking the square root is unnecessary for argmin.
If multiple spots are tied for the minimum distance, the first occurrence is returned (as per 'which.min()').
A one-row data frame (same columns as 'df') corresponding to the closest spot. The result is always a data frame ('drop = FALSE').
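The argmin and tie-breaking behavior can be reproduced with a few lines of base R. `nearest_row()` is a hypothetical name used only for this sketch. It follows the documented recipe (squared distances, 'which.min()', 'drop = FALSE') but is not taken from the package source.

```r
# Hypothetical helper: row of df closest to (tx, ty) by squared distance.
nearest_row <- function(df, tx, ty) {
  d2 <- (df$x - tx)^2 + (df$y - ty)^2
  df[which.min(d2), , drop = FALSE]   # which.min() keeps the first minimum
}

df <- data.frame(x = c(0, 2, 5), y = c(0, 2, 1), id = c("a", "b", "c"))
hit <- nearest_row(df, tx = 1.9, ty = 2.1)

# Two spots equidistant from the target: the first row wins
ties <- data.frame(x = c(-1, 1), y = c(0, 0), id = c("first", "second"))
tie_hit <- nearest_row(ties, tx = 0, ty = 0)
```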
df <- data.frame(x = c(0, 2, 5), y = c(0, 2, 1), id = c("a", "b", "c"))
closest_spot(df, tx = 1.9, ty = 2.1)
Computes the centroid of each cluster by taking the mean of the 'x' and 'y' coordinates for each 'cluster' level.
compute_centroids(df)
df |
A 'data.frame' containing at least the columns 'cluster', 'x', and 'y'. 'cluster' can be any type accepted by 'aggregate()' grouping (e.g., integer, factor, character). 'x' and 'y' must be numeric. |
This is a thin wrapper around aggregate() using FUN = mean. Missing values are handled according to the default behavior of 'mean()' (i.e., 'NA's will propagate unless you pre-handle them).
A 'data.frame' with one row per cluster and columns:
- 'cluster': Cluster identifier.
- 'x': Mean x-coordinate for the cluster.
- 'y': Mean y-coordinate for the cluster.
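The NA behavior mentioned above is easy to check with 'aggregate()' directly. The sketch below uses the data-frame method of 'aggregate()' as an assumed equivalent of the wrapper (the exact call inside the package is not shown here) and contrasts the default with passing 'na.rm = TRUE' through to 'mean()'.

```r
df <- data.frame(
  cluster = c(1, 1, 2, 2),
  x = c(0, 2, 10, 12),
  y = c(1, 3, 5, 7)
)

# Per-cluster means of x and y (assumed equivalent of the wrapper)
centroids <- aggregate(df[, c("x", "y")], by = list(cluster = df$cluster),
                       FUN = mean)

# A missing coordinate propagates to the centroid with the default mean()
df_na <- df
df_na$x[1] <- NA
with_na <- aggregate(df_na[, c("x", "y")], by = list(cluster = df_na$cluster),
                     FUN = mean)

# Extra arguments are forwarded to FUN, so NAs can be dropped explicitly
cleaned <- aggregate(df_na[, c("x", "y")], by = list(cluster = df_na$cluster),
                     FUN = mean, na.rm = TRUE)
```

Pre-filtering rows with missing coordinates before calling the centroid function is an alternative when the wrapper does not expose 'na.rm'.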
df <- data.frame(
  cluster = c(1, 1, 2, 2),
  x = c(0, 2, 10, 12),
  y = c(1, 3, 5, 7)
)
compute_centroids(df)
This function counts inlaid/annotation composition within each cluster in a single call. It returns a dataframe with counts of inlaid types for each source cluster.
count_all_inlaids(
  df,
  clusters = NULL,
  inlaid_col = "cluster",
  cluster_col = "cluster"
)
df |
A data.frame containing at least the columns 'x', 'y', a cluster identifier column, and 'spot_id'.
- 'x', 'y': numeric coordinates of spots
- cluster column (name specified by 'cluster_col'): cluster assignment for each spot
- inlaid column (name specified by 'inlaid_col'): inlaid/annotation for each spot
- 'spot_id': unique identifier for each spot |
clusters |
Optional vector of cluster labels to process. If 'NULL', all unique clusters in 'df' are used. |
inlaid_col |
Character. Name of the column containing inlaid/annotation values. Default is '"cluster"'. |
cluster_col |
Character. Name of the column containing cluster assignments. Default is '"cluster"'. |
A data.frame with columns:
The source cluster being analyzed.
The inlaid/annotation type.
Number of spots with this inlaid type in the cluster.
Proportion of this inlaid type among all spots in cluster.
Rows are grouped by source cluster and sorted by count within each group (descending).
data("visium_simulated_spe", package = "Battlefield")
spe <- visium_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster,
  inlaid = sample(paste0("type_", 1:3), length(colnames(spe)), replace = TRUE)
)

# Get inlaid statistics for all clusters
all_inlaids <- count_all_inlaids(df, inlaid_col = "inlaid")
head(all_inlaids)
This function computes neighborhood statistics for all clusters in a dataset in a single call. It returns a dataframe with counts of neighboring cluster types for each source cluster.
count_all_neighborhoods(
  df,
  clusters = NULL,
  k = 100,
  max_dist = NULL,
  inlaid_col = NULL,
  coords = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least the columns 'x', 'y', a cluster identifier column, and 'spot_id'.
- 'x', 'y': numeric coordinates of spots
- cluster column (name specified by 'cluster_col'): cluster assignment for each spot
- 'spot_id': unique identifier for each spot (optional, required for detailed analysis) |
clusters |
Optional vector of cluster labels to process. If 'NULL', all unique clusters in 'df' are used. |
k |
Integer. Number of nearest neighbors to consider. Default is 100. Larger values capture more distant neighbors. |
max_dist |
Numeric. Maximum distance from the target to consider. If 'NULL' (default), all neighbors up to k are included without distance filtering. |
inlaid_col |
Character. Name of the column containing inlaid/annotation values to count in neighborhoods. If 'NULL' (default), uses the 'cluster_col' value. This allows counting neighborhood composition by any column (e.g., cell type, tissue type). |
coords |
Character vector of length 2 giving the coordinate column names. Default is 'c("x", "y")'. |
cluster_col |
Character. Name of the column containing cluster assignments. Default is '"cluster"'. |
This is a batch processing function that applies [count_neighborhood()] to all clusters and combines the results into a single dataframe. Each row represents a neighbor cluster found around a source cluster, with counts and proportions.
A data.frame with columns:
The source cluster being analyzed.
The neighbor cluster type or inlaid value.
Number of neighbor spots of this cluster type.
Proportion of this cluster among all neighbors of the cluster.
Rows are grouped by source cluster and sorted by count within each group (descending).
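As a rough illustration of what a neighborhood count involves, the base R sketch below gathers the k nearest neighbors of every spot in a source cluster, drops spots belonging to the source cluster itself, and tabulates the remaining labels. Everything here is an assumption made for illustration: the helper name `neighborhood_counts()`, the output column names, and the choice to pool neighbors as a union over source spots. The actual definition used by [get_neighborhood_spots()] may differ.

```r
# Hypothetical helper: composition of the k-nearest-neighbor shell of a cluster.
neighborhood_counts <- function(df, cluster, k = 3, max_dist = NULL) {
  d <- as.matrix(dist(df[, c("x", "y")]))
  diag(d) <- Inf                                  # a spot is not its own neighbor
  src <- which(df$cluster == cluster)
  nb <- unique(unlist(lapply(src, function(i) {
    idx <- order(d[i, ])[seq_len(min(k, nrow(df) - 1))]
    if (!is.null(max_dist)) idx <- idx[d[i, idx] <= max_dist]
    idx
  })))
  nb <- setdiff(nb, src)                          # exclude the source cluster
  counts <- table(df$cluster[nb])
  out <- data.frame(neighbor = names(counts), count = as.integer(counts),
                    stringsAsFactors = FALSE)
  out$proportion <- out$count / sum(out$count)
  out[order(-out$count), , drop = FALSE]          # most frequent first
}

# Six spots on a line: clusters A, A, B, B, C, C
df <- data.frame(x = 1:6, y = 0,
                 cluster = c("A", "A", "B", "B", "C", "C"),
                 stringsAsFactors = FALSE)
nb_B <- neighborhood_counts(df, cluster = "B", k = 3)
nb_B
```

The full pairwise distance matrix keeps the sketch short but scales quadratically with the number of spots, so a dedicated nearest-neighbor search would be preferable on real Visium HD data.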
data("visium_simulated_spe", package = "Battlefield")
spe <- visium_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)

# Get neighborhood statistics for all clusters
all_neighbors <- count_all_neighborhoods(df, k = 50)
head(all_neighbors)
This function counts the composition of an inlaid/annotation column within a target source cluster. Unlike [count_neighborhood()], this counts types *inside* the source cluster itself, not around it.
count_inlaid(df, cluster, inlaid_col = "cluster", cluster_col = "cluster")
df |
A data.frame containing at least the columns 'x', 'y', a cluster identifier column, and 'spot_id'.
- 'x', 'y': numeric coordinates of spots
- cluster column (name specified by 'cluster_col'): cluster assignment for each spot
- inlaid column (name specified by 'inlaid_col'): inlaid/annotation for each spot
- 'spot_id': unique identifier for each spot |
cluster |
The cluster label to analyze inlaid composition for. |
inlaid_col |
Character. Name of the column containing inlaid/annotation values. Default is '"cluster"' (to analyze cluster composition within another grouping). |
cluster_col |
Character. Name of the column containing cluster assignments. Default is '"cluster"'. |
A data.frame with columns:
Inlaid/annotation value.
Number of spots with this inlaid type in the cluster.
Proportion of this inlaid type among all spots in cluster.
Rows are sorted by count in descending order.
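Counting composition inside a cluster amounts to a filtered frequency table. The base R sketch below shows one way to obtain counts and proportions sorted in descending order. The helper name `inlaid_composition()` and the output column names ('inlaid', 'count', 'proportion') are assumptions chosen for the example and are not confirmed names from the package.

```r
# Hypothetical helper: counts and proportions of an annotation within a cluster.
inlaid_composition <- function(df, cluster, inlaid_col = "cluster",
                               cluster_col = "cluster") {
  inside <- df[df[[cluster_col]] == cluster, , drop = FALSE]
  counts <- table(inside[[inlaid_col]])
  out <- data.frame(inlaid = names(counts), count = as.integer(counts),
                    stringsAsFactors = FALSE)
  out$proportion <- out$count / sum(out$count)
  out[order(-out$count), , drop = FALSE]   # descending by count
}

df <- data.frame(
  cluster = c(1, 1, 1, 2, 2),
  inlaid = c("a", "b", "a", "b", "b"),
  stringsAsFactors = FALSE
)
comp <- inlaid_composition(df, cluster = 1, inlaid_col = "inlaid")
comp
```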
data("visium_simulated_spe", package = "Battlefield")
spe <- visium_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster,
  inlaid = sample(paste0("type_", 1:3), length(colnames(spe)), replace = TRUE)
)

# Count inlaid types within cluster 1
inlaid_counts <- count_inlaid(df, cluster = 1, inlaid_col = "inlaid")
inlaid_counts
This function computes summary statistics of cluster types in the neighborhood of a target cluster or a specific point. It uses [get_neighborhood_spots()] internally to identify neighbors, then counts how many belong to each cluster type.
count_neighborhood(
  df,
  cluster = NULL,
  spot_id = NULL,
  k = 100,
  max_dist = NULL,
  inlaid_col = NULL,
  coords = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least the columns 'x', 'y', cluster identifier column, and 'spot_id'. - 'x', 'y': numeric coordinates of spots - cluster column (name specified by 'cluster_col'): cluster assignment for each spot - 'spot_id': unique identifier for each spot (optional, required for detailed analysis) |
cluster |
The cluster label for which to analyze the neighborhood. Either 'cluster' or 'spot_id' must be provided, but not both. |
spot_id |
Character. The spot identifier to find neighbors for. Either 'cluster' or 'spot_id' must be provided, but not both. |
k |
Integer. Number of nearest neighbors to consider. Default is 100. Larger values capture more distant neighbors. |
max_dist |
Numeric. Maximum distance from the target to consider. If 'NULL' (default), all neighbors up to k are included without distance filtering. |
inlaid_col |
Character. Name of the column containing inlaid/annotation values to count in neighborhoods. If 'NULL' (default), uses the 'cluster_col' value. This allows counting neighborhood composition by any column (e.g., cell type, tissue type). |
coords |
Character vector of length 2 giving the coordinate column names. Default is 'c("x", "y")'. |
cluster_col |
Character. Name of the column containing cluster assignments. Default is '"cluster"'. |
This function wraps [get_neighborhood_spots()] and summarizes the results by counting spots from each neighboring cluster or inlaid type. In cluster mode, the source cluster itself is excluded from the counts. In point mode, all neighboring clusters are counted.
The result shows the composition of the neighborhood: how many spots of each cluster type surround the target.
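The cluster-mode summary can be sketched in base R as follows. This is an illustration under stated assumptions, not the package code: it replaces the internal call to [get_neighborhood_spots()] with a brute-force distance matrix, and the helper name `neighborhood_composition` and the output column names are hypothetical.

```r
# Hypothetical helper: neighborhood composition around a cluster (cluster mode).
neighborhood_composition <- function(df, cluster, k = 4, max_dist = NULL) {
  d <- as.matrix(dist(df[, c("x", "y")]))   # all pairwise Euclidean distances
  diag(d) <- Inf                            # a spot is not its own neighbor
  src <- which(df$cluster == cluster)
  k <- min(k, nrow(df) - 1)
  nb <- unique(unlist(lapply(src, function(i) {
    idx <- order(d[i, ])[seq_len(k)]        # k nearest spots to spot i
    if (!is.null(max_dist)) idx <- idx[d[i, idx] <= max_dist]
    idx
  })))
  nb <- setdiff(nb, src)                    # exclude the source cluster itself
  counts <- sort(table(df$cluster[nb]), decreasing = TRUE)
  data.frame(
    neighborhood = names(counts),
    n = as.integer(counts),
    prop = as.integer(counts) / length(nb),
    stringsAsFactors = FALSE
  )
}

# Toy square grid: columns x = 1, 2 are "A", x = 3 is "B", x = 4 is "C".
toy_grid <- expand.grid(x = 1:4, y = 1:4)
toy_grid$cluster <- ifelse(toy_grid$x <= 2, "A", ifelse(toy_grid$x == 3, "B", "C"))
nb_counts <- neighborhood_composition(toy_grid, cluster = "A", k = 4, max_dist = 1.01)
nb_counts
```

With the distance cutoff just above one grid step, only the four "B" spots in the column next to "A" are counted, and "C" does not appear in the summary.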
A data.frame with columns:
Cluster label or inlaid value (from neighboring spots).
Number of times this cluster/type appears in the neighborhood.
Proportion of this cluster/type relative to all neighbors.
Rows are sorted by count in descending order.
data("visium_simulated_spe", package = "Battlefield")
spe <- visium_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)

# Count cluster types in neighborhood of cluster 1
neighbor_counts <- count_neighborhood(df, cluster = 1, k = 50)
neighbor_counts

# Count cluster types around a specific point
first_spot <- df$spot_id[1]
point_neighbors <- count_neighborhood(df, spot_id = first_spot, k = 10)
point_neighbors
Batch create layer classifications for multiple clusters and optionally generate individual plots for each cluster.
create_all_layers(
  df,
  clusters = NULL,
  k = 6,
  max_dist = NULL,
  intermediate_quantile = 0.5,
  coord_cols = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least coordinate and cluster columns. |
clusters |
Optional vector of cluster labels to process. If 'NULL', all unique clusters are used. |
k |
Integer. Number of nearest neighbors to consider. Default is 6. |
max_dist |
Optional numeric. Maximum Euclidean distance for neighbors. |
intermediate_quantile |
Numeric between 0 and 1. The quantile of distances to border used to define the intermediate layer threshold. Default is 0.5 (median). |
coord_cols |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
cluster_col |
Character. Name of the column containing cluster labels. Default is '"cluster"'. |
A data.frame combining all processed clusters with the 'layer' column added.
data("visiumHD_16um_simulated_spe", package = "Battlefield")
spe <- visiumHD_16um_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)
all_layers <- create_all_layers(df, k = 6, intermediate_quantile = 0.5)
head(all_layers)
This function classifies spots within a target cluster into three layers: core, intermediate, and border. The border layer consists of spots at the cluster interface (touching other clusters). The core layer consists of spots far from any border. The intermediate layer bridges the two, with depth defined by proximity to the border.
create_cluster_layers(
  df,
  target_cluster,
  k = 6,
  max_dist = NULL,
  intermediate_quantile = 0.5,
  coord_cols = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least the coordinate columns and a cluster label column. |
target_cluster |
Cluster label for which to create layers. |
k |
Integer. Number of nearest neighbors to consider (excluding self) when identifying border spots. Default is 6. |
max_dist |
Optional numeric. Maximum Euclidean distance for neighbors. If 'NULL', no distance filtering is applied. |
intermediate_quantile |
Numeric between 0 and 1. The quantile of distances to border used to define the intermediate layer threshold. Default is 0.5 (median). Spots with distance to border <= this quantile threshold are classified as intermediate. Use lower values (e.g., 0.33) for narrower intermediate layers, higher values (e.g., 0.75) for wider intermediate layers. |
coord_cols |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
cluster_col |
Character. Name of the column containing cluster labels. Default is '"cluster"'. |
Layer definitions:
- **Border**: Spots in the target cluster that touch at least one spot from a different cluster (kNN-based).
- **Intermediate**: Non-border spots whose distance to the nearest border spot is within the 'intermediate_quantile' threshold of all non-border spots.
- **Core**: All remaining non-border spots that are far from the border.
The 'intermediate_quantile' parameter controls the depth of the intermediate layer:
- 0.33: Narrow intermediate layer (only very close to border)
- 0.50: Moderate intermediate layer (median distance threshold)
- 0.75: Wide intermediate layer (most non-border spots included)
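The three layer definitions can be sketched with base R. The helper `classify_layers` below is hypothetical and uses a brute-force distance matrix instead of the package's neighbor search. It is demonstrated on a toy square grid with `k = 4`, whereas the documented default of `k = 6` matches a hexagonal layout.

```r
# Hypothetical helper: border / intermediate / core layers for one cluster.
classify_layers <- function(df, target_cluster, k = 4, max_dist = NULL,
                            intermediate_quantile = 0.5) {
  d <- as.matrix(dist(df[, c("x", "y")]))
  diag(d) <- Inf
  in_target <- which(df$cluster == target_cluster)
  k <- min(k, nrow(df) - 1)

  # Border: at least one of the k nearest neighbors lies in another cluster.
  is_border <- vapply(in_target, function(i) {
    idx <- order(d[i, ])[seq_len(k)]
    if (!is.null(max_dist)) idx <- idx[d[i, idx] <= max_dist]
    any(df$cluster[idx] != target_cluster)
  }, logical(1))

  out <- df[in_target, , drop = FALSE]
  out$layer <- ifelse(is_border, "border", "core")

  border_idx <- in_target[is_border]
  inner_idx <- in_target[!is_border]
  if (length(border_idx) > 0 && length(inner_idx) > 0) {
    # Distance from each non-border spot to its nearest border spot.
    d_border <- apply(d[inner_idx, border_idx, drop = FALSE], 1, min)
    threshold <- quantile(d_border, probs = intermediate_quantile, names = FALSE)
    out$layer[!is_border] <- ifelse(d_border <= threshold, "intermediate", "core")
  }
  out
}

# Toy data: a 7 x 7 block of cluster "A" surrounded by a one-spot ring of "B".
toy_layers <- expand.grid(x = 1:9, y = 1:9)
toy_layers$cluster <- ifelse(
  toy_layers$x %in% 2:8 & toy_layers$y %in% 2:8, "A", "B"
)
layers_toy <- classify_layers(toy_layers, target_cluster = "A")
table(layers_toy$layer)
```

On this toy block the outer ring of "A" (24 spots) is the border, the next ring (16 spots) falls at or below the median distance and becomes intermediate, and the remaining 9 spots form the core.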
A data.frame based on 'df' filtered to contain only spots from 'target_cluster', with an additional column: - 'layer': character, one of "border", "intermediate", or "core"
data("visiumHD_16um_simulated_spe", package = "Battlefield")
spe <- visiumHD_16um_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)

# Default: moderate depth
layers <- create_cluster_layers(df, target_cluster = 1, k = 6)
table(layers$layer)

# Narrow intermediate layer
layers_narrow <- create_cluster_layers(df, target_cluster = 1, k = 6, intermediate_quantile = 0.33)
table(layers_narrow$layer)
This function inspects k-nearest-neighbor distances in a 2D coordinate set to infer whether the points lie on a **square** or **hexagonal** lattice.
detect_grid_type(
  df,
  coords = c("x", "y"),
  k = 12,
  tolerance = 0.1,
  verbose = TRUE
)
df |
A data.frame containing spatial coordinates. |
coords |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
k |
Integer. Number of neighbors to consider (internally uses 'k + 1' to include the point itself, then removes it). Default is 12. |
tolerance |
Numeric. Tolerance for matching the characteristic ratio to sqrt(2) (square grid) or sqrt(3) (hexagonal grid). Default is 0.1. |
verbose |
Logical. If 'TRUE', prints diagnostic messages. Default is 'TRUE'. |
The method:
Computes the nearest-neighbor distance for each point and uses the median of those distances as an estimate of the grid step size.
Normalizes all neighbor distances by this step size.
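Looks for a characteristic secondary distance peak:
square grid: ratio close to sqrt(2) (about 1.414), the diagonal of a unit cell
hexagonal grid: ratio close to sqrt(3) (about 1.732), the second shell of a hexagonal lattice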
The grid step size is estimated as the median of the per-point nearest-neighbor distance. The "median distance ratio" is computed from normalized neighbor distances restricted to the interval (1.2, 2.0), which tends to capture the second-shell distances on regular grids.
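The heuristic can be reproduced with base R. The sketch below is an assumption-laden illustration, not the package source: the helper name `guess_grid_type` is hypothetical, the element names `step` and `ratio` are chosen here for convenience, and a full distance matrix stands in for a proper nearest-neighbor search.

```r
# Hypothetical helper mirroring the described heuristic with base R only.
guess_grid_type <- function(df, coords = c("x", "y"), k = 12, tolerance = 0.1) {
  d <- as.matrix(dist(df[, coords]))
  diag(d) <- Inf
  k <- min(k, nrow(df) - 1)
  # k nearest-neighbor distances per point, stored as a k x n matrix.
  knn <- matrix(apply(d, 1, function(row) sort(row)[seq_len(k)]), nrow = k)
  step <- median(knn[1, ])                 # grid step = median nearest distance
  ratios <- as.vector(knn) / step          # normalized neighbor distances
  second_shell <- ratios[ratios > 1.2 & ratios < 2.0]
  ratio <- median(second_shell)            # NA when the interval is empty
  grid_type <- if (is.na(ratio)) {
    "unknown"
  } else if (abs(ratio - sqrt(2)) <= tolerance) {
    "square"
  } else if (abs(ratio - sqrt(3)) <= tolerance) {
    "hexagonal"
  } else {
    "unknown"
  }
  list(grid_type = grid_type, step = step, ratio = ratio)
}

sq_toy <- expand.grid(x = 0:9, y = 0:9)
hex_toy <- expand.grid(i = 0:9, j = 0:9)
hex_toy$x <- hex_toy$i + 0.5 * (hex_toy$j %% 2)
hex_toy$y <- (sqrt(3) / 2) * hex_toy$j

type_sq <- guess_grid_type(sq_toy)$grid_type
type_hex <- guess_grid_type(hex_toy)$grid_type
c(square = type_sq, hexagonal = type_hex)
```

The open interval (1.2, 2.0) excludes both the first shell (ratio 1) and the straight two-step distance (ratio 2), which is why the median inside it isolates the diagonal or second-shell distance.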
A list with:
One of '"square"', '"hexagonal"', or '"unknown"'.
Estimated grid step size (median nearest-neighbor distance).
Median normalized distance ratio in the (1.2, 2.0) range.
# --- Example 1: square grid ---
sq <- expand.grid(x = 0:9, y = 0:9)
res_sq <- detect_grid_type(sq, coords = c("x","y"), k = 12, tolerance = 0.1)
res_sq$grid_type

# --- Example 2: hexagonal grid (pointy-top axial-like layout) ---
nx <- 10; ny <- 10
hex <- expand.grid(i = 0:(nx-1), j = 0:(ny-1))
hex$x <- hex$i + 0.5 * (hex$j %% 2)
hex$y <- (sqrt(3)/2) * hex$j
hex <- hex[, c("x","y")]
res_hex <- detect_grid_type(hex, coords = c("x","y"), k = 12, tolerance = 0.1)
res_hex$grid_type
Given a vector of cluster labels, this function returns all **ordered** (oriented) pairs of distinct clusters. For each pair ('cluster', 'interface'), it also creates a 'directed_pair' string label such as '"A-B"'.
directed_cluster_interface_pairs(
  cluster_labels,
  interface_separator = "-",
  sort_clusters = FALSE
)
cluster_labels |
A vector of cluster labels (character, factor, numeric, etc.). 'NA' values are ignored. |
interface_separator |
Character string used to join 'cluster' and 'interface' into the 'directed_pair' label. Default is '"-"'. |
sort_clusters |
Logical. If 'TRUE', unique cluster labels are sorted (lexicographically) before building pairs. If 'FALSE', preserves first appearance order. Default is 'FALSE'. |
A data.frame with columns:
'cluster': Source cluster label (character).
'interface': Target cluster label (character).
'directed_pair': Directed pair label, e.g. "A-B".
If fewer than 2 unique (non-NA) clusters are present, returns an empty data.frame with the same columns.
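For n distinct labels there are n * (n - 1) ordered pairs. A minimal base R sketch of that enumeration follows. The helper name `directed_pairs` is hypothetical and the row order is an assumption of this sketch, so the real function may order pairs differently.

```r
# Hypothetical helper: all ordered pairs of distinct cluster labels.
directed_pairs <- function(cluster_labels, interface_separator = "-",
                           sort_clusters = FALSE) {
  labs <- unique(as.character(cluster_labels[!is.na(cluster_labels)]))
  if (sort_clusters) labs <- sort(labs)
  empty <- data.frame(cluster = character(0), interface = character(0),
                      directed_pair = character(0), stringsAsFactors = FALSE)
  if (length(labs) < 2) return(empty)
  # expand.grid varies its first argument fastest, so rows are grouped by cluster.
  grid <- expand.grid(interface = labs, cluster = labs, stringsAsFactors = FALSE)
  grid <- grid[grid$cluster != grid$interface, c("cluster", "interface")]
  grid$directed_pair <- paste(grid$cluster, grid$interface,
                              sep = interface_separator)
  rownames(grid) <- NULL
  grid
}

pairs_toy <- directed_pairs(c("1", "1", "2", "3", NA))
pairs_toy$directed_pair
```

Three labels give six directed pairs, and "1-2" and "2-1" are kept as separate rows because direction matters.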
cl <- c("1","1","2","3", NA)
directed_cluster_interface_pairs(cl)
Estimates the typical spacing between spots by computing each spot's nearest-neighbor distance (excluding itself) and returning the median of those distances. For large datasets, the estimate is computed on a random subsample for speed.
estimate_spot_spacing(df, sample_n = 1000)
df |
A data frame containing at least columns 'x' and 'y' (numeric) representing spot coordinates. |
sample_n |
Integer; maximum number of spots to sample (default '1000'). If 'nrow(df) > sample_n', a random subset of size 'sample_n' is used; otherwise all spots are used. |
Nearest neighbors are computed with RANN::nn2() using 'k = 2'. The first neighbor is the point itself (distance 0), so the function uses the second neighbor distance 'nn$nn.dists[, 2]' as the true nearest-neighbor distance.
Missing values are handled via 'median(..., na.rm = TRUE)'.
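The same idea can be illustrated without RANN. The sketch below swaps the kd-tree search for a brute-force distance computation, so it is only suitable for small inputs. The helper name `spot_spacing_sketch` is hypothetical, and the choice to subsample only the query points while searching among all spots is an assumption of this sketch.

```r
# Hypothetical base-R stand-in for the RANN-based computation described above.
spot_spacing_sketch <- function(df, sample_n = 1000) {
  xy <- as.matrix(df[, c("x", "y")])
  # Query points: a random subsample when the data set is large.
  query <- if (nrow(xy) > sample_n) {
    sample(nrow(xy), sample_n)
  } else {
    seq_len(nrow(xy))
  }
  nn_dist <- vapply(query, function(i) {
    d <- sqrt((xy[, 1] - xy[i, 1])^2 + (xy[, 2] - xy[i, 2])^2)
    sort(d)[2]   # first value is the point itself (0), second is its neighbor
  }, numeric(1))
  median(nn_dist, na.rm = TRUE)
}

grid_toy <- data.frame(x = rep(1:5, each = 5), y = rep(1:5, times = 5))
spacing_toy <- spot_spacing_sketch(grid_toy)
spacing_toy
```

On a unit grid every spot has a neighbor at distance 1, so the median nearest-neighbor distance is 1.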
A numeric scalar: the median nearest-neighbor distance (in the same units as 'x' and 'y').
set.seed(1)
df <- data.frame(
  x = rep(1:5, each = 5),
  y = rep(1:5, times = 5)
)
estimate_spot_spacing(df)
Filters trajectory data by checking the cluster labels at each trajectory's endpoints. For every 'trajectory_id', the start endpoint is defined as the spot with the smallest projection parameter 'pos_on_seg', and the end endpoint as the spot with the largest 'pos_on_seg'. Only trajectories whose start cluster is in 'allowed_start_clusters' *and* whose end cluster is in 'allowed_end_clusters' are kept.
filter_out_by_endpoint_clusters(
  out,
  allowed_start_clusters,
  allowed_end_clusters
)
out |
A data frame containing selected spots with columns 'trajectory_id', 'cluster', and 'pos_on_seg', typically the output of 'build_similar_trajectories()'. |
allowed_start_clusters |
Vector of allowed cluster labels for the start endpoint. |
allowed_end_clusters |
Vector of allowed cluster labels for the end endpoint. |
The endpoint clusters are computed per 'trajectory_id':
'start_cluster = cluster[which.min(pos_on_seg)]'
'end_cluster = cluster[which.max(pos_on_seg)]'
If multiple spots share the same minimum/maximum 'pos_on_seg', the first is taken (as per 'which.min()' / 'which.max()').
This function uses 'dplyr' ('group_by', 'summarise', 'filter', 'pull') and the base R pipe '|>'.
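For readers who prefer base R, the endpoint rule can be sketched without dplyr. This is an equivalent-in-spirit illustration, not the package code, and the helper name `filter_by_endpoints` is hypothetical.

```r
# Hypothetical base-R equivalent of the endpoint filter described above.
filter_by_endpoints <- function(out, allowed_start_clusters,
                                allowed_end_clusters) {
  groups <- split(out, out$trajectory_id, drop = TRUE)
  keep <- vapply(groups, function(g) {
    # which.min()/which.max() return the first index in case of ties.
    start_cluster <- g$cluster[which.min(g$pos_on_seg)]
    end_cluster <- g$cluster[which.max(g$pos_on_seg)]
    start_cluster %in% allowed_start_clusters &&
      end_cluster %in% allowed_end_clusters
  }, logical(1))
  out[out$trajectory_id %in% names(keep)[keep], , drop = FALSE]
}

traj_toy <- data.frame(
  trajectory_id = c("L1", "L1", "L2", "L2"),
  pos_on_seg = c(0, 1, 0, 1),
  cluster = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)
kept <- filter_by_endpoints(traj_toy, allowed_start_clusters = "A",
                            allowed_end_clusters = "B")
kept
```

Both trajectories start in "A", but only "L1" ends in "B", so the two rows of "L2" are dropped.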
The same data frame as 'out', filtered to keep only the rows of trajectories whose start and end clusters satisfy the allowed endpoint cluster constraints.
# Minimal example
out <- data.frame(
  trajectory_id = c("L1","L1","L2","L2"),
  pos_on_seg = c(0.0, 1.0, 0.0, 1.0),
  cluster = c("A", "B", "A", "C"),
  x = 1:4,
  y = 1:4
)

# Keep only trajectories starting in A and ending in B
filter_out_by_endpoint_clusters(out, allowed_start_clusters = "A", allowed_end_clusters = "B")
This function retrieves all spots from a source cluster along with their inlaid/annotation values. Unlike [get_neighborhood_spots()], this returns spots *inside* the source cluster itself, not around it.
get_inlaid_spots(
  df,
  cluster,
  inlaid_col = "cluster",
  cluster_col = "cluster",
  coords = c("x", "y")
)
df |
A data.frame containing at least the columns 'x', 'y', cluster identifier column, inlaid column, and 'spot_id'. - 'x', 'y': numeric coordinates of spots - cluster column (name specified by 'cluster_col'): cluster assignment for each spot - inlaid column (name specified by 'inlaid_col'): inlaid/annotation for each spot - 'spot_id': unique identifier for each spot |
cluster |
The cluster label to retrieve inlaid spots for. |
inlaid_col |
Character. Name of the column containing inlaid/annotation values. Default is '"cluster"'. |
cluster_col |
Character. Name of the column containing cluster assignments. Default is '"cluster"'. |
coords |
Character vector of length 2 giving the coordinate column names. Default is 'c("x", "y")'. |
A data.frame with columns:
Unique spot identifier.
X coordinate.
Y coordinate.
Inlaid/annotation value.
The source cluster being analyzed.
Logical, always TRUE for returned rows.
Rows are sorted by inlaid and spot_id.
data("visium_simulated_spe", package = "Battlefield")
spe <- visium_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster,
  inlaid = sample(paste0("type_", 1:3), length(colnames(spe)), replace = TRUE)
)

# Get all inlaid spots within cluster 1
inlaid_spots <- get_inlaid_spots(df, cluster = 1, inlaid_col = "inlaid")
head(inlaid_spots)
This helper detects the spatial grid type (square vs hexagonal) and returns recommended neighborhood parameters for building local adjacency / neighbor queries (e.g., for border detection or spatial graphs).
get_neighborhood_params(
  df,
  coords = c("x", "y"),
  square_connectivity = c(4, 8),
  tolerance = 0.1,
  verbose = TRUE
)
df |
A data.frame containing spatial coordinates. |
coords |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
square_connectivity |
Integer-like choice for square grids: '4' (Von Neumann) or '8' (Moore). Default is 'c(4, 8)' which selects the first value ('4'). |
tolerance |
Numeric. Passed to [detect_grid_type()] to match characteristic ratios (sqrt(2) / sqrt(3)). Default is 0.1. |
verbose |
Logical. If 'TRUE', prints diagnostic messages. Default is 'TRUE'. |
It relies on [detect_grid_type()] to estimate the grid step size and grid geometry, then chooses typical parameters:
**Hexagonal (Visium)**: 6-neighborhood, radius ~ step
**Square (Visium HD)**: 4- or 8-neighborhood, radius ~ step (4) or ~ sqrt(2)*step (8)
Returned values:
'grid_type': Detected grid type ('"hexagonal"', '"square"', or '"unknown"').
'connectivity': Nominal grid connectivity (6 for hex, 4 or 8 for square).
'radius': Recommended distance threshold (radius) to define adjacency.
'k': A suggested upper bound for kNN queries (used as a safe cap).
'comment': Human-readable description of the choice.
Notes:
'radius' is set to '1.01 * step' (or '1.01 * sqrt(2) * step' for 8-connectivity) to be slightly permissive under small numerical noise.
'k' is deliberately larger than the nominal connectivity to make sure enough candidates are retrieved before applying distance-based filtering.
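The effect of the radius rule can be checked on toy grids with base R. The snippet below only illustrates the quoted formulas and does not call the package. The variable names are hypothetical.

```r
# Radius rule quoted in the notes above, checked on toy grids.
grid_step <- 1
radius_step <- 1.01 * grid_step            # 4-connectivity (square) or hexagonal
radius_diag <- 1.01 * sqrt(2) * grid_step  # 8-connectivity (square)

# Square grid: count neighbors of the central spot within each radius.
sq_toy <- expand.grid(x = 0:4, y = 0:4)
d_sq <- as.matrix(dist(sq_toy))
centre_sq <- which(sq_toy$x == 2 & sq_toy$y == 2)
n_4 <- sum(d_sq[centre_sq, -centre_sq] <= radius_step)
n_8 <- sum(d_sq[centre_sq, -centre_sq] <= radius_diag)

# Hexagonal grid: every interior spot has six neighbors at one grid step.
hex_toy <- expand.grid(i = 0:4, j = 0:4)
hex_toy$x <- hex_toy$i + 0.5 * (hex_toy$j %% 2)
hex_toy$y <- (sqrt(3) / 2) * hex_toy$j
d_hex <- as.matrix(dist(hex_toy[, c("x", "y")]))
centre_hex <- which(hex_toy$i == 2 & hex_toy$j == 2)
n_6 <- sum(d_hex[centre_hex, -centre_hex] <= radius_step)

c(n_4 = n_4, n_8 = n_8, n_6 = n_6)
```

The 1% margin matters mostly on the hexagonal layout, where the diagonal distances are computed from `sqrt(3) / 2` and may differ from the step by floating-point error.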
A named list with neighborhood parameters: 'grid_type', 'connectivity', 'radius', 'k', and 'comment'.
# Square grid example
sq <- expand.grid(x = 0:9, y = 0:9)
get_neighborhood_params(sq, square_connectivity = 4, verbose = FALSE)

# Hexagonal grid example
nx <- 10; ny <- 10
hex <- expand.grid(i = 0:(nx-1), j = 0:(ny-1))
hex$x <- hex$i + 0.5 * (hex$j %% 2)
hex$y <- (sqrt(3)/2) * hex$j
hex <- hex[, c("x","y")]
get_neighborhood_params(hex, verbose = FALSE)
This function identifies all spots in the neighborhood of a target cluster or a specific point using k-nearest neighbors search. It returns all neighbor spots that were identified, marking which ones are neighbors.
get_neighborhood_spots(
  df,
  cluster = NULL,
  spot_id = NULL,
  k = 100,
  max_dist = NULL,
  inlaid_col = NULL,
  coords = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least the columns 'x', 'y', cluster identifier column, and 'spot_id'. - 'x', 'y': numeric coordinates of spots - cluster column (name specified by 'cluster_col'): cluster assignment for each spot - 'spot_id': unique identifier for each spot |
cluster |
The cluster label for which to analyze the neighborhood. Either 'cluster' or 'spot_id' must be provided, but not both. |
spot_id |
Character. The spot identifier to find neighbors for. Either 'cluster' or 'spot_id' must be provided, but not both. |
k |
Integer. Number of nearest neighbors to consider. Default is 100. Larger values capture more distant neighbors. If k exceeds the number of spots, it will be capped at nrow(df) - 1. |
max_dist |
Numeric. Maximum distance from the target to consider. If 'NULL' (default), all neighbors up to k are included without distance filtering. |
inlaid_col |
Character. Name of the column containing inlaid/annotation values to return in the 'neighborhood' column. If 'NULL' (default), uses the 'cluster_col' value. This allows counting neighborhood composition by any column (e.g., cell type, tissue type). |
coords |
Character vector of length 2 giving the coordinate column names. Default is 'c("x", "y")'. |
cluster_col |
Character. Name of the column containing cluster assignments. Default is '"cluster"'. |
The function can work in two modes:
- **Cluster mode**: If 'cluster' is provided, computes the k-nearest neighbors of every spot in that cluster and returns all neighbors found, excluding spots of the source cluster itself.
- **Point mode**: If 'spot_id' is provided, finds the k-nearest neighbors of that specific spot.
A data.frame with columns:
Unique spot identifier.
X coordinate.
Y coordinate.
Value from 'inlaid_col' (or cluster if inlaid_col is NULL) of the neighbor spot.
The source cluster or spot_id being analyzed.
Logical, always TRUE for returned rows.
Rows are sorted by neighborhood and spot_id.
data("visium_simulated_spe", package = "Battlefield")
spe <- visium_simulated_spe
df <- data.frame(
  spot_id = colnames(spe),
  x = SpatialExperiment::spatialCoords(spe)[, 1],
  y = SpatialExperiment::spatialCoords(spe)[, 2],
  cluster = SummarizedExperiment::colData(spe)$cluster
)

# Get all neighbor spots for cluster 1
neighbors_cluster <- get_neighborhood_spots(df, cluster = 1, k = 50)
head(neighbors_cluster)

# Get neighbors for a specific point
first_spot <- df$spot_id[1]
neighbors_point <- get_neighborhood_spots(df, spot_id = first_spot, k = 10)
head(neighbors_point)
Computes the Euclidean distance from one or more points '(px, py)' to the line segment defined by endpoints 'A(ax, ay)' and 'B(bx, by)'. The projection is clamped to the segment (i.e., 'pos_on_seg' is restricted to '[0, 1]').
point_segment_distance_vec(px, py, ax, ay, bx, by)
px, py |
Numeric vectors (or scalars) giving the x/y coordinates of the query point(s). |
ax, ay |
Numeric scalars giving the x/y coordinates of endpoint A. |
bx, by |
Numeric scalars giving the x/y coordinates of endpoint B. |
Let 'v = B - A' and 'pos_on_seg = ((P - A) · v) / ||v||^2'. The closest point on the infinite line is 'A + pos_on_seg v'; clamping 'pos_on_seg' to '[0, 1]' yields the closest point on the segment.
If 'A' and 'B' are identical ('||v||^2 == 0'), the function returns the distance from 'P' to 'A'.
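The projection and clamping steps translate directly into vectorized R. The following is a sketch written from the formulas above, not a copy of the package source, and the function name `segment_distance` is hypothetical.

```r
# Hypothetical re-implementation of the clamped projection described above.
segment_distance <- function(px, py, ax, ay, bx, by) {
  vx <- bx - ax
  vy <- by - ay
  len2 <- vx^2 + vy^2                          # ||v||^2
  if (len2 == 0) {
    # Degenerate segment: A and B coincide, so measure the distance to A.
    return(sqrt((px - ax)^2 + (py - ay)^2))
  }
  pos_on_seg <- ((px - ax) * vx + (py - ay) * vy) / len2
  pos_on_seg <- pmin(pmax(pos_on_seg, 0), 1)   # clamp to the segment
  cx <- ax + pos_on_seg * vx                   # closest point on the segment
  cy <- ay + pos_on_seg * vy
  sqrt((px - cx)^2 + (py - cy)^2)
}

# One point above the segment, one before A, one beyond B.
seg_d <- segment_distance(px = c(1, -3, 7), py = c(2, 4, 0),
                          ax = 0, ay = 0, bx = 3, by = 0)
seg_d
```

The first point projects inside the segment (distance 2), the second is clamped to A (distance 5), and the third is clamped to B (distance 4).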
A numeric vector of distances, with length equal to 'max(length(px), length(py))' (after R's usual vector recycling rules).
# Single point to a horizontal segment
point_segment_distance_vec(px = 1, py = 2, ax = 0, ay = 0, bx = 3, by = 0)

# Multiple points (vectorized)
px <- c(0, 1, 2, 3)
py <- c(1, 1, 1, 1)
point_segment_distance_vec(px, py, ax = 0, ay = 0, bx = 3, by = 0)

# Degenerate segment (A == B)
point_segment_distance_vec(px = c(0, 1), py = c(0, 1), ax = 0, ay = 0, bx = 0, by = 0)
Filters 'df' to drop any rows whose rounded coordinate key (as produced by '.xy_key(x, y)') appears in 'used_df'.
remove_used_points(df, used_df)
df |
A data frame containing at least columns 'x' and 'y'. |
used_df |
A data frame containing at least columns 'x' and 'y' defining the set of already-used points to exclude. |
Matching is performed on rounded coordinates, not exact floating-point values. Internally, keys are computed as 'paste0(round(x), "_", round(y))'.
This function uses 'dplyr::filter()' and the native pipe '|>', so 'dplyr' must be installed and R 4.1.0 or later is required.
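The key-based matching can be sketched in base R. The helper `xy_key` below is a stand-in that follows the quoted expression and is not the package's internal `.xy_key()`. The helper `drop_used` is likewise hypothetical and avoids dplyr.

```r
# Hypothetical stand-in for the internal key helper and the filter built on it.
xy_key <- function(x, y) paste0(round(x), "_", round(y))

drop_used <- function(df, used_df) {
  used_keys <- xy_key(used_df$x, used_df$y)
  # Keep rows whose rounded-coordinate key is absent from the used set.
  df[!(xy_key(df$x, df$y) %in% used_keys), , drop = FALSE]
}

pts <- data.frame(x = c(0.1, 1.2, 1.6), y = c(0.2, 3.4, 3.4), id = 1:3)
used <- data.frame(x = 1.49, y = 3.49)
remaining <- drop_used(pts, used)
remaining$id
```

Here the used point rounds to the key "1_3", which matches only the second row. Note that `round()` in R follows the IEC 60559 convention and rounds exact halves to the even digit, so `round(0.5)` is 0 and `round(2.5)` is 2.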
A filtered data frame with the same columns as 'df', excluding rows whose rounded '(x, y)' matches any row in 'used_df'.
df <- data.frame(x = c(0.1, 1.2, 1.6, 2.0), y = c(0.2, 3.4, 3.4, 9.5), id = 1:4)
used_df <- data.frame(x = c(1.49, 2.1), y = c(3.49, 9.49))

# Removes rows rounding to (1,3) and (2,9)
remove_used_points(df, used_df)
This function identifies **directed interface spots**: spots belonging to 'cluster' (A) that have at least one neighbor in 'interface' (B) within a k-nearest-neighbors neighborhood (optionally constrained by a distance cutoff).
select_border_spots(
  df,
  cluster,
  interface,
  mode = "inner",
  k = 7,
  max_dist = NULL,
  coord_cols = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least the coordinate columns and a cluster label column. |
cluster |
Cluster label for the source cluster (A). Spots in this cluster are tested for adjacency to 'interface'. |
interface |
Cluster label for the target cluster (B). |
mode |
Character. One of "inner", "outer", or "both". - "inner": returns border spots from cluster → interface (default). - "outer": returns border spots from interface → cluster (swaps cluster/interface). - "both": returns border spots from both directions, row-bound together. Default is "inner". |
k |
Integer. Number of nearest neighbors to consider (excluding self). Internally uses 'k + 1' to include self then removes it. Default is 7. |
max_dist |
Optional numeric. Maximum Euclidean distance for a neighbor to be considered (distance cutoff). If 'NULL', no distance filtering is applied. |
coord_cols |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
cluster_col |
Character. Name of the column containing cluster labels. Default is '"cluster"'. |
In addition, it flags spots that also touch **other clusters** (i.e., junction / multi-interface spots) and reports which other clusters are touched.
Steps:
Builds a kNN neighborhood for each spot in 'cluster'.
Optionally removes neighbors farther than 'max_dist'.
Marks a spot as a border spot if it has at least one neighbor in 'interface'.
Flags a spot as multi-interface if it also has a neighbor belonging to a cluster different from 'cluster' and 'interface'.
**Note on mode column**: The 'mode' column in the returned data.frame will always contain either "inner" or "outer", even if the 'mode' parameter is set to "both". When 'mode="both"', the function internally calls the selection algorithm twice (once for cluster→interface and once for interface→cluster), then row-binds the results, so each row is labeled with its actual direction.
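The four steps listed above can be sketched for a single direction (cluster → interface) in base R. This is an illustrative approximation, not the package implementation: the helper name `border_spots_sketch` is hypothetical, a brute-force distance matrix replaces the kNN search, and the `mode` handling is omitted.

```r
# Hypothetical sketch of the four steps above using base R only.
border_spots_sketch <- function(df, cluster, interface, k = 7, max_dist = NULL) {
  d <- as.matrix(dist(df[, c("x", "y")]))
  diag(d) <- Inf
  k <- min(k, nrow(df) - 1)
  src <- which(df$cluster == cluster)
  rows <- lapply(src, function(i) {
    idx <- order(d[i, ])[seq_len(k)]                    # step 1: kNN
    if (!is.null(max_dist)) {
      idx <- idx[d[i, idx] <= max_dist]                 # step 2: distance cutoff
    }
    nb <- unique(as.character(df$cluster[idx]))
    if (!(interface %in% nb)) return(NULL)              # step 3: touches interface?
    others <- setdiff(nb, c(cluster, interface))        # step 4: junction spots
    other_label <- if (length(others) > 0) {
      paste(sort(others), collapse = ",")
    } else {
      NA_character_
    }
    data.frame(
      df[i, , drop = FALSE],
      interface = interface,
      directed_pair = paste(cluster, interface, sep = "-"),
      is_border = TRUE,
      is_border_multiple = length(others) > 0,
      other_adjacent_borders = other_label,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

# Toy 3 x 3 grid with one column per cluster: A | B | C.
toy_border <- expand.grid(x = 1:3, y = 1:3)
toy_border$cluster <- c("A", "B", "C")[toy_border$x]
res_ab <- border_spots_sketch(toy_border, "A", "B", k = 4, max_dist = 1.01)
res_ba <- border_spots_sketch(toy_border, "B", "A", k = 4, max_dist = 1.01)
res_ba[, c("x", "y", "directed_pair", "is_border_multiple", "other_adjacent_borders")]
```

In this layout the "A" spots touch only "B", while every "B" spot touches both "A" and "C" and is therefore flagged as a multi-interface spot.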
The returned data.frame contains only spots from 'cluster' that touch 'interface', with columns:
Spot identifier.
X coordinate.
Y coordinate.
Cluster label (same as 'cluster').
The value of 'interface'.
Interface label combining cluster and 'interface', e.g. '"A-B"'.
Undirected pair label (both directions), e.g. '"A-B / B-A"'.
Always 'TRUE' for returned rows (kept for clarity).
'TRUE' if the spot also touches other clusters.
Comma-separated list of other clusters touched, or 'NA'.
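The direction of selection for this row: "inner" or "outer". With 'mode = "both"', rows of both kinds are returned, and the value "both" itself never appears.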
A data.frame subset of 'df' containing border spots from 'cluster' to 'interface' with columns: spot_id, x, y, cluster, interface, directed_pair, is_border, is_border_multiple, other_adjacent_borders.
# Minimal example (requires RANN and dplyr)
set.seed(1)
df_ex <- data.frame(
  x = rnorm(200),
  y = rnorm(200),
  cluster = sample(c("A","B","C"), 200, replace = TRUE)
)
res <- select_border_spots(df_ex, cluster = "A", interface = "B", k = 7)
head(res)
This function selects core (non-interface) spots from a cluster that match the count of border spots for a specific directed pair. The 'mode' value is read from the 'border_df' dataframe for this cluster/interface pair. If multiple mode values exist in border_df for the pair, a warning is printed and the first value is used.
select_core_spots(
  df,
  border_df,
  cluster,
  interface,
  mode = "both",
  coord_cols = c("x", "y"),
  cluster_col = "cluster"
)
df |
A data.frame containing at least the coordinate columns and a cluster label column. |
border_df |
A data.frame of border spots from [build_all_borders()], containing columns 'spot_id', 'cluster', 'interface', 'directed_pair', etc. |
cluster |
Character. Label of the cluster from which to select core spots. |
interface |
Character. Label of the target cluster for the directed pair. |
mode |
Character. One of "inner", "outer", or "both". Controls which mode values from border_df to use when counting border spots. When mode="both", both "inner" and "outer" mode rows from border_df are used. Default is "both". |
coord_cols |
Character vector of length 2 giving the coordinate column names. Default is 'c("x","y")'. |
cluster_col |
Character. Name of the column containing cluster labels. Default is '"cluster"'. |
For example, with 'cluster="1"' and 'interface="2"':
- If mode in df is "inner": counts only border spots from 1→2
- If mode in df is "outer": counts only border spots from 2→1
- If mode in df is "both": counts border spots from both 1→2 AND 2→1
This returns N randomly sampled core spots from the cluster, where N equals the border count.
Steps:
1. Counts directed border spots based on the 'mode' parameter.
2. Gets all spots in 'cluster' that are not at any border.
3. Randomly samples the same number of core spots as the border count.
4. Returns these sampled core spots with an 'interface' annotation.
If not enough core spots exist to match the border count, all available core spots are returned with a warning.
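The steps above can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: 'sample_core_sketch' is a hypothetical helper, and it assumes that 'df' carries a 'spot_id' column and that 'border_df' has the 'spot_id', 'cluster' and 'interface' columns described above.

```r
# Hypothetical helper for illustration only; not part of Battlefield.
sample_core_sketch <- function(df, border_df, cluster, interface,
                               mode = c("both", "inner", "outer"),
                               cluster_col = "cluster") {
  mode <- match.arg(mode)

  # Step 1: count directed border spots according to 'mode'.
  fwd <- border_df$cluster == cluster & border_df$interface == interface
  bwd <- border_df$cluster == interface & border_df$interface == cluster
  n_border <- switch(mode,
    inner = sum(fwd),
    outer = sum(bwd),
    both  = sum(fwd) + sum(bwd)
  )

  # Step 2: spots of 'cluster' that are not at any border.
  in_cluster <- df[[cluster_col]] == cluster
  at_border <- df$spot_id %in% border_df$spot_id
  core <- df[in_cluster & !at_border, , drop = FALSE]

  # Step 3: sample as many core spots as border spots, capped at what exists.
  if (nrow(core) < n_border) {
    warning("Not enough core spots; returning all available core spots.")
    n_border <- nrow(core)
  }
  picked <- core[sample.int(nrow(core), n_border), , drop = FALSE]

  # Step 4: annotate the sampled spots with the interface label.
  picked$interface <- rep(interface, nrow(picked))
  picked
}

# Toy data: two clusters of five spots, three of which sit at the border.
df <- data.frame(
  spot_id = paste0("s", 1:10),
  cluster = rep(c("1", "2"), each = 5),
  stringsAsFactors = FALSE
)
border_df <- data.frame(
  spot_id   = c("s4", "s5", "s6"),
  cluster   = c("1", "1", "2"),
  interface = c("2", "2", "1"),
  stringsAsFactors = FALSE
)

set.seed(1)
core_inner <- sample_core_sketch(df, border_df, "1", "2", mode = "inner")
core_both  <- sample_core_sketch(df, border_df, "1", "2", mode = "both")
```

In this toy input, "inner" counts the two 1→2 border spots and "both" counts all three, so the two calls sample two and three core spots of cluster "1" respectively.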
A data.frame of sampled core spots from 'cluster', with all columns from 'df' plus 'interface' annotation.
# Create synthetic spatial transcriptomics data with 3 clusters
df_ex <- data.frame(
  spot_id = paste0("spot_", seq_len(300)),
  x = rnorm(300),
  y = rnorm(300),
  cluster = sample(c("A", "B", "C"), 300, replace = TRUE)
)

# Step 1: Identify all border spots between clusters
all_borders <- build_all_borders(df_ex, k = 6)

# Step 2: Select core spots for the directed pair A→B
# matching the number of border spots found
cores_inner <- select_core_spots(
  df_ex,
  border_df = all_borders,
  cluster = "A",
  interface = "B",
  mode = "inner"
)
head(cores_inner)

# Step 3: Select core spots considering both directions (A→B AND B→A)
cores_both <- select_core_spots(
  df_ex,
  border_df = all_borders,
  cluster = "A",
  interface = "B",
  mode = "both"
)
head(cores_both)
Translates a point 'P' by an amount 'offset' in the direction '(nx, ny)'. The returned point is $P' = P + \mathrm{offset} \cdot (n_x, n_y)$.
shift_point(P, nx, ny, offset)
P |
A data frame with columns 'x' and 'y' representing the point to shift. If it contains multiple rows, only the first row is used. |
nx |
Numeric scalar; x-component of the direction vector. |
ny |
Numeric scalar; y-component of the direction vector. |
offset |
Numeric scalar; translation magnitude (in the same units as 'x'/'y'). Positive values move in the '(nx, ny)' direction; negative values move in the opposite direction. |
A one-row data frame with columns 'x' and 'y' giving the shifted point.
# Shift the point (1, 2) by 3 units in the direction (0, 1)
P <- data.frame(x = 1, y = 2)
shift_point(P, nx = 0, ny = 1, offset = 3)
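The translation can be written out in a few lines of base R. The sketch below is a hypothetical stand-in, 'shift_point_sketch', not the package's code. It assumes that '(nx, ny)' is applied as given, without re-normalization, so 'offset' is only a true distance when the direction has unit length.

```r
# Hypothetical stand-in for illustration; not Battlefield's implementation.
shift_point_sketch <- function(P, nx, ny, offset) {
  P <- P[1, , drop = FALSE]  # only the first row is used
  # Assumption: (nx, ny) is used as given, without normalization.
  data.frame(x = P$x + offset * nx, y = P$y + offset * ny)
}

P <- data.frame(x = 1, y = 2)
up   <- shift_point_sketch(P, nx = 0, ny = 1, offset = 3)   # moves to (1, 5)
down <- shift_point_sketch(P, nx = 0, ny = 1, offset = -3)  # moves to (1, -1)
```

A negative 'offset' moves the point against the direction vector, as described for the 'offset' argument.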
Returns the unit-length normal vector pointing to the left of the directed
segment $A \to B$. For a direction vector $(d_x, d_y) = (B_x - A_x, B_y - A_y)$,
the left normal $(-d_y, d_x)$ is normalized by $\sqrt{d_x^2 + d_y^2}$.
unit_normal_left(A, B)
A |
A data frame with columns 'x' and 'y' representing point A. If it contains multiple rows, only the first row is used. |
B |
A data frame with columns 'x' and 'y' representing point B. If it contains multiple rows, only the first row is used. |
A numeric vector of length 2 with named components:
x-component of the left unit normal.
y-component of the left unit normal.
# Horizontal segment to the right: (0,0) -> (1,0)
# Left normal points upward: (0, 1)
A <- data.frame(x = 0, y = 0)
B <- data.frame(x = 1, y = 0)
unit_normal_left(A, B)
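For a non-axis-aligned segment, the left normal can be checked by hand. The base R sketch below rotates the direction vector by 90 degrees counter-clockwise and divides by its length. 'unit_normal_left_sketch' is a hypothetical stand-in, and both the component names 'nx'/'ny' and the error raised for coincident points are assumptions made for this illustration, not documented behaviour of the package.

```r
# Hypothetical stand-in for illustration; not Battlefield's implementation.
unit_normal_left_sketch <- function(A, B) {
  dx <- B$x[1] - A$x[1]  # only the first row of each input is used
  dy <- B$y[1] - A$y[1]
  len <- sqrt(dx^2 + dy^2)
  if (len == 0) stop("A and B coincide; the normal is undefined.")
  # Rotate (dx, dy) by +90 degrees, then scale to unit length.
  c(nx = -dy / len, ny = dx / len)  # component names are assumed
}

A <- data.frame(x = 0, y = 0)
B <- data.frame(x = 3, y = 4)
n <- unit_normal_left_sketch(A, B)  # direction (3, 4) has length 5
```

Here the result is (-0.8, 0.6), which has unit length and is orthogonal to the segment direction (3, 4).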
A simulated SpatialExperiment object mimicking a 10x Genomics
Visium spatial transcriptomics layout. The dataset contains spot-level
spatial coordinates and cluster annotations, and is intended for
testing.
data(visium_simulated_spe)
A SpatialExperiment object with:
One dummy assay (required container structure)
Spot metadata including barcode and
cluster
Numeric matrix of x/y spot coordinates
This dataset contains expression values for a single gene,
named FAKE_GENE.
The dummy assay is included solely to satisfy the
SummarizedExperiment container requirements.
Spot coordinates follow a Visium-like hexagonal grid. Cluster labels are simulated and have no biological meaning.
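To make the layout concrete, the sketch below builds a generic hexagonal grid in base R. It is illustrative only: the code used to simulate this dataset is not shown in this manual, and 'hex_grid_sketch', its arguments and the 'spot_' naming are hypothetical.

```r
# Illustrative only: a generic hexagonal grid, not the package's simulation.
hex_grid_sketch <- function(n_rows, n_cols, spacing = 1) {
  grid <- expand.grid(col = seq_len(n_cols), row = seq_len(n_rows))
  # Shift every other row by half a spacing and compress rows by sqrt(3)/2
  # so that each interior spot has six equidistant neighbours.
  x <- (grid$col + ifelse(grid$row %% 2 == 0, 0.5, 0)) * spacing
  y <- grid$row * spacing * sqrt(3) / 2
  data.frame(spot_id = paste0("spot_", seq_len(nrow(grid))), x = x, y = y)
}

hex <- hex_grid_sketch(n_rows = 4, n_cols = 5)

# Pairwise distances: interior spots have six neighbours at distance 'spacing'.
d <- as.matrix(dist(hex[, c("x", "y")]))
n_neighbours <- rowSums(abs(d - 1) < 1e-8)
```

Edge spots have fewer than six neighbours, which is worth keeping in mind when choosing a neighbourhood size 'k' for border detection on such a grid.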
Simulated internally for package development.
visium_simulated_spe
data(visium_simulated_spe)
visium_simulated_spe
A simulated SpatialExperiment object mimicking a 10x Genomics
Visium spatial transcriptomics layout. The dataset contains spot-level
spatial coordinates and cluster annotations, and is intended for
testing.
data(visiumHD_16um_simulated_spe)
A SpatialExperiment object with:
One dummy assay (required container structure)
Spot metadata including barcode and
cluster
Numeric matrix of x/y spot coordinates
This dataset contains expression values for a single gene,
named FAKE_GENE.
The dummy assay is included solely to satisfy the
SummarizedExperiment container requirements.
Spot coordinates follow a Visium-like square grid. Cluster labels are simulated and have no biological meaning.
The layout is intended to mimic a Visium HD *binned* layer (e.g., 16 µm bin size), where each spot represents one bin and bins are arranged on a contiguous square grid.
Simulated internally for package development.
visiumHD_16um_simulated_spe
data(visiumHD_16um_simulated_spe)
visiumHD_16um_simulated_spe
A simulated SpatialExperiment object mimicking a 10x Genomics
Visium spatial transcriptomics layout. The dataset contains spot-level
spatial coordinates and cluster annotations, and is intended for
testing.
data(visiumHD_8um_simulated_spe)
A SpatialExperiment object with:
One dummy assay (required container structure)
Spot metadata including barcode and
cluster
Numeric matrix of x/y spot coordinates
This dataset contains expression values for a single gene,
named FAKE_GENE.
The dummy assay is included solely to satisfy the
SummarizedExperiment container requirements.
Spot coordinates follow a Visium-like square grid. Cluster labels are simulated and have no biological meaning.
The layout is intended to mimic a Visium HD *binned* layer (e.g., 8 µm bin size), where each spot represents one bin and bins are arranged on a contiguous square grid.
Simulated internally for package development.
visiumHD_8um_simulated_spe
data(visiumHD_8um_simulated_spe)
visiumHD_8um_simulated_spe