Identify contiguous clusters of time points in a time series where a variable exceeds a positive threshold, falls below a negative threshold, or both. Clusters are defined as consecutive time points satisfying the thresholding condition.
Usage
find_clusters(
  data,
  threshold = 10,
  group = NULL,
  threshold_type = c("both", "above", "below")
)
Arguments
- data
A data frame containing at least two columns named "time" and "value". If group is not NULL, data must also contain that grouping column.
- threshold
Numeric scalar specifying the (positive) threshold used to define clusters. Positive clusters are defined where value >= threshold, and negative clusters where value <= 1/threshold. Must be non-negative when threshold_type = "both".
- group
Optional grouping column name (character scalar) used to find clusters independently within each group level (e.g., "participant"). Set to NULL (default) to ignore grouping.
- threshold_type
Character scalar controlling which clusters are detected. Must be one of "above", "below", or "both" (default). When "above", clusters are formed where value >= threshold. When "below", clusters are formed where value <= 1/threshold. When "both", both types are detected and the returned data include a sign column.
Value
A data frame with one row per detected cluster and columns:
- id: integer cluster index (starting at 1). When group is provided, id restarts at 1 within each group level;
- onset: time of the first point in the cluster;
- offset: time of the last point in the cluster;
- n_points: number of time points in the cluster;
- sign: character indicating cluster sign ("positive" or "negative").
If group is not NULL, the returned data frame also contains
the grouping column (named as in group).
If no clusters are found, an empty data frame with the same column structure is returned.
Details
By default, the function detects both positive and negative clusters in a single call and returns a column indicating the cluster sign.
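The thresholding rules given under Arguments can be illustrated with a small base-R sketch. The helper name flag_points is hypothetical and is not part of the package; it only mirrors the documented conditions (value >= threshold for positive points, value <= 1/threshold for negative points) and makes no claim about how find_clusters() is implemented internally.

```r
# Hypothetical helper (not part of the package): label each value according
# to the documented thresholding rules.
flag_points <- function(value, threshold = 10,
                        threshold_type = c("both", "above", "below")) {
  threshold_type <- match.arg(threshold_type)
  above <- value >= threshold       # candidate positive-cluster points
  below <- value <= 1 / threshold   # candidate negative-cluster points
  switch(
    threshold_type,
    above = ifelse(above, "positive", NA_character_),
    below = ifelse(below, "negative", NA_character_),
    both  = ifelse(above, "positive",
                   ifelse(below, "negative", NA_character_))
  )
}

# With threshold = 10, values >= 10 are "positive" and values <= 0.1 are
# "negative"; everything in between is left as NA.
flag_points(c(0.05, 0.5, 2, 12, 30), threshold = 10)
#> [1] "negative" NA         NA         "positive" "positive"
```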
If a grouping variable is provided (e.g., "participant"), clusters
are detected independently within each group level.
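To see why independent detection within groups matters, consider the following base-R sketch. Both count_runs and by_group are hypothetical helpers written for illustration only; they are assumptions, not the package's code. In the toy data, the last run of P01 and the first run of P02 sit on adjacent rows, so they would merge into a single run if the grouping column were ignored.

```r
# Hypothetical per-series helper: count runs of consecutive values at or
# above the threshold.
count_runs <- function(d, threshold) {
  d <- d[order(d$time), ]
  r <- rle(d$value >= threshold)
  data.frame(n_clusters = sum(r$values))
}

# Hypothetical split-apply-combine helper: run `f` within each group level
# and prepend the grouping column to each result.
by_group <- function(data, group, f, ...) {
  pieces <- split(data, data[[group]])
  out <- lapply(names(pieces), function(g) {
    res <- f(pieces[[g]], ...)
    cbind(setNames(data.frame(rep(g, nrow(res))), group), res)
  })
  do.call(rbind, out)
}

toy <- data.frame(
  participant = rep(c("P01", "P02"), each = 5),
  time = rep(1:5, times = 2),
  value = c(0, 12, 0, 11, 13,   # P01: two runs
            12, 12, 12, 0, 0)   # P02: one run
)

by_group(toy, "participant", count_runs, threshold = 10)
#>   participant n_clusters
#> 1         P01          2
#> 2         P02          1
```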
The function assumes that the time variable is numeric and that
consecutive rows correspond to consecutive time points (within each group
if grouping is used). Internally, the data are:
- filtered to remove rows with missing values;
- arranged by time (and by group then time, if used);
- thresholded to identify positive and/or negative excursions;
- segmented into runs of consecutive threshold-exceeding values, which define clusters.
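The four steps above can be sketched in base R using run-length encoding. This is a minimal illustration under stated assumptions, not the package's actual implementation: the function name sketch_clusters is hypothetical, grouping and the id column are omitted for brevity, and clusters are returned in time order, which may differ from the row order find_clusters() produces.

```r
# Hypothetical sketch of the documented steps (not the package's own code).
sketch_clusters <- function(data, threshold = 10) {
  # 1. Filter: drop rows with a missing time or value
  data <- data[stats::complete.cases(data[, c("time", "value")]), ]
  # 2. Arrange: sort by time
  data <- data[order(data$time), ]
  # 3. Threshold: label each point as positive, negative, or neither
  label <- ifelse(data$value >= threshold, "positive",
                  ifelse(data$value <= 1 / threshold, "negative", "none"))
  # 4. Segment: run-length encode the labels; each non-"none" run is a cluster
  runs <- rle(label)
  last <- cumsum(runs$lengths)        # row index of each run's last point
  first <- last - runs$lengths + 1    # row index of each run's first point
  keep <- runs$values != "none"
  data.frame(
    onset = data$time[first[keep]],
    offset = data$time[last[keep]],
    n_points = runs$lengths[keep],
    sign = runs$values[keep]
  )
}

toy <- data.frame(
  time = 1:8,
  value = c(1, 12, 15, 1, 0.05, 0.01, 0.02, 1)
)
sketch_clusters(toy, threshold = 10)
#>   onset offset n_points     sign
#> 1     2      3        2 positive
#> 2     5      7        3 negative
```

Run-length encoding is a natural fit here because a cluster is, by definition, a maximal run of consecutive points sharing the same label.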
Author
Ladislas Nalborczyk ladislas.nalborczyk@cnrs.fr
Examples
set.seed(666)
df <- data.frame(
  time = seq(0, 1, length.out = 100),
  value = c(
    rnorm(30, 0, 1),
    rnorm(20, 4, 1), # positive cluster
    rnorm(20, -4, 1), # negative cluster
    rnorm(30, 0, 1)
  )
)
# Detect both positive and negative clusters
find_clusters(data = df, threshold = 3, threshold_type = "both")
#> id onset offset n_points sign
#> 1 1 0.30303030 0.36363636 7 positive
#> 2 2 0.38383838 0.44444444 7 positive
#> 3 3 0.46464646 0.49494949 4 positive
#> 4 1 0.02020202 0.02020202 1 negative
#> 5 2 0.04040404 0.04040404 1 negative
#> 6 3 0.06060606 0.09090909 4 negative
#> 7 4 0.11111111 0.11111111 1 negative
#> 8 5 0.13131313 0.15151515 3 negative
#> 9 6 0.18181818 0.18181818 1 negative
#> 10 7 0.20202020 0.21212121 2 negative
#> 11 8 0.23232323 0.29292929 7 negative
#> 12 9 0.50505051 0.71717172 22 negative
#> 13 10 0.73737374 0.79797980 7 negative
#> 14 11 0.82828283 0.84848485 3 negative
#> 15 12 0.88888889 0.94949495 7 negative
#> 16 13 0.97979798 1.00000000 3 negative
# One-sided detection (positive only)
find_clusters(data = df, threshold = 3, threshold_type = "above")
#> id onset offset n_points sign
#> 1 1 0.3030303 0.3636364 7 positive
#> 2 2 0.3838384 0.4444444 7 positive
#> 3 3 0.4646465 0.4949495 4 positive
# One-sided detection (negative only)
find_clusters(data = df, threshold = 3, threshold_type = "below")
#> id onset offset n_points sign
#> 1 1 0.02020202 0.02020202 1 negative
#> 2 2 0.04040404 0.04040404 1 negative
#> 3 3 0.06060606 0.09090909 4 negative
#> 4 4 0.11111111 0.11111111 1 negative
#> 5 5 0.13131313 0.15151515 3 negative
#> 6 6 0.18181818 0.18181818 1 negative
#> 7 7 0.20202020 0.21212121 2 negative
#> 8 8 0.23232323 0.29292929 7 negative
#> 9 9 0.50505051 0.71717172 22 negative
#> 10 10 0.73737374 0.79797980 7 negative
#> 11 11 0.82828283 0.84848485 3 negative
#> 12 12 0.88888889 0.94949495 7 negative
#> 13 13 0.97979798 1.00000000 3 negative
# Grouped example (e.g., per participant)
df_g <- rbind(
  transform(df, participant = "P01"),
  transform(df, participant = "P02")
)
find_clusters(
  data = df_g,
  threshold = 3,
  group = "participant",
  threshold_type = "both"
)
#> participant id onset offset n_points sign
#> 1 P01 1 0.30303030 0.36363636 7 positive
#> 2 P01 2 0.38383838 0.44444444 7 positive
#> 3 P01 3 0.46464646 0.49494949 4 positive
#> 4 P01 1 0.02020202 0.02020202 1 negative
#> 5 P01 2 0.04040404 0.04040404 1 negative
#> 6 P01 3 0.06060606 0.09090909 4 negative
#> 7 P01 4 0.11111111 0.11111111 1 negative
#> 8 P01 5 0.13131313 0.15151515 3 negative
#> 9 P01 6 0.18181818 0.18181818 1 negative
#> 10 P01 7 0.20202020 0.21212121 2 negative
#> 11 P01 8 0.23232323 0.29292929 7 negative
#> 12 P01 9 0.50505051 0.71717172 22 negative
#> 13 P01 10 0.73737374 0.79797980 7 negative
#> 14 P01 11 0.82828283 0.84848485 3 negative
#> 15 P01 12 0.88888889 0.94949495 7 negative
#> 16 P01 13 0.97979798 1.00000000 3 negative
#> 17 P02 1 0.30303030 0.36363636 7 positive
#> 18 P02 2 0.38383838 0.44444444 7 positive
#> 19 P02 3 0.46464646 0.49494949 4 positive
#> 20 P02 1 0.02020202 0.02020202 1 negative
#> 21 P02 2 0.04040404 0.04040404 1 negative
#> 22 P02 3 0.06060606 0.09090909 4 negative
#> 23 P02 4 0.11111111 0.11111111 1 negative
#> 24 P02 5 0.13131313 0.15151515 3 negative
#> 25 P02 6 0.18181818 0.18181818 1 negative
#> 26 P02 7 0.20202020 0.21212121 2 negative
#> 27 P02 8 0.23232323 0.29292929 7 negative
#> 28 P02 9 0.50505051 0.71717172 22 negative
#> 29 P02 10 0.73737374 0.79797980 7 negative
#> 30 P02 11 0.82828283 0.84848485 3 negative
#> 31 P02 12 0.88888889 0.94949495 7 negative
#> 32 P02 13 0.97979798 1.00000000 3 negative
