Identify contiguous clusters of time points in a time series where a variable exceeds a positive threshold or falls below a negative threshold. Clusters are defined as consecutive time points satisfying the thresholding condition.
Usage
find_clusters(
data,
threshold = 10,
group = NULL,
above_threshold = TRUE,
both_signs = TRUE
)Arguments
- data
A data frame containing at least two columns named
"time"and"value". Ifgroupis notNULL,datamust also contain that grouping column.- threshold
Numeric scalar specifying the (positive) threshold used to define clusters. Positive clusters are defined where
value >= threshold, and negative clusters wherevalue <= -threshold. Must be non-negative whenboth_signs = TRUE.- group
Optional grouping column name (character scalar) used to find clusters independently within each group level (e.g.,
"participant"). Set toNULL(default) to ignore grouping.- above_threshold
Logical scalar used only when
both_signs = FALSE. IfTRUE(default), clusters are formed wherevalue >= threshold; ifFALSE, clusters are formed wherevalue <= threshold.- both_signs
Logical scalar indicating whether to detect both positive and negative clusters in a single call (default:
TRUE). WhenTRUE,above_thresholdis ignored.
Value
A data frame with one row per detected cluster and columns:
cluster_id: integer cluster index (starting at 1). Whengroupis provided,cluster_idrestarts at 1 within each group level;cluster_onset: time of the first point in the cluster;cluster_offset: time of the last point in the cluster;n_points: number of time points in the cluster;sign: character indicating cluster sign ("positive"or"negative").
If group is not NULL, the returned data frame also contains
the grouping column (named as in group).
If no clusters are found, an empty data frame with the same column structure is returned.
Details
By default, the function detects both positive and negative clusters in a single call and returns a column indicating the cluster sign.
If a grouping variable is provided (e.g., "participant"), clusters
are detected independently within each group level.
The function assumes that the time variable is numeric and that
consecutive rows correspond to consecutive time points (within each group
if grouping is used). Internally, the data are:
filtered to remove rows with missing values;
arranged by
time(and bygroupthentime, if used);thresholded to identify positive and/or negative excursions;
segmented into runs of consecutive threshold-exceeding values, which define clusters.
Author
Ladislas Nalborczyk ladislas.nalborczyk@cnrs.fr
Examples
set.seed(666)
df <- data.frame(
time = seq(0, 1, length.out = 100),
value = c(
rnorm(30, 0, 1),
rnorm(20, 4, 1), # positive cluster
rnorm(20, -4, 1), # negative cluster
rnorm(30, 0, 1)
)
)
# Detect both positive and negative clusters
find_clusters(data = df, threshold = 3)
#> cluster_id cluster_onset cluster_offset n_points sign
#> 1 1 0.3030303 0.3636364 7 positive
#> 2 2 0.3838384 0.4444444 7 positive
#> 3 3 0.4646465 0.4949495 4 positive
#> 4 1 0.5050505 0.5050505 1 negative
#> 5 2 0.5252525 0.5858586 7 negative
#> 6 3 0.6060606 0.6262626 3 negative
#> 7 4 0.6464646 0.6767677 4 negative
#> 8 5 0.6969697 0.6969697 1 negative
# One-sided detection (positive only)
find_clusters(data = df, threshold = 3, both_signs = FALSE, above_threshold = TRUE)
#> cluster_id cluster_onset cluster_offset n_points sign
#> 1 1 0.3030303 0.3636364 7 positive
#> 2 2 0.3838384 0.4444444 7 positive
#> 3 3 0.4646465 0.4949495 4 positive
# Grouped example (e.g., per participant)
df_g <- rbind(
transform(df, participant = "P01"),
transform(df, participant = "P02")
)
find_clusters(
data = df_g,
threshold = 3,
group = "participant"
)
#> participant cluster_id cluster_onset cluster_offset n_points sign
#> 1 P01 1 0.3030303 0.3636364 7 positive
#> 2 P01 2 0.3838384 0.4444444 7 positive
#> 3 P01 3 0.4646465 0.4949495 4 positive
#> 4 P02 1 0.3030303 0.3636364 7 positive
#> 5 P02 2 0.3838384 0.4444444 7 positive
#> 6 P02 3 0.4646465 0.4949495 4 positive
#> 7 P01 1 0.5050505 0.5050505 1 negative
#> 8 P01 2 0.5252525 0.5858586 7 negative
#> 9 P01 3 0.6060606 0.6262626 3 negative
#> 10 P01 4 0.6464646 0.6767677 4 negative
#> 11 P01 5 0.6969697 0.6969697 1 negative
#> 12 P02 1 0.5050505 0.5050505 1 negative
#> 13 P02 2 0.5252525 0.5858586 7 negative
#> 14 P02 3 0.6060606 0.6262626 3 negative
#> 15 P02 4 0.6464646 0.6767677 4 negative
#> 16 P02 5 0.6969697 0.6969697 1 negative
