2 - Data Analysis
Marina Papadopoulou
Source: vignettes/step2_data_analysis.Rmd
swaRmverse provides a pipeline to extract metrics of collective motion from the trajectories of grouping individuals. The metrics capture either global (group-level) or pairwise (individual-level) characteristics of the group. After calculating the timeseries of these metrics, the package estimates their averages over each ‘event’ of collective motion. More details about how an event is defined are given below. Let’s start with ..
2.1 Velocity estimations
We start by adding headings and speeds to the trajectory data, and splitting the whole dataframe into a list of dataframes, one per set. For this, we need to specify whether the data correspond to geo data (lon-lat) or not.
library(swaRmverse)
#data_df <- trackdf::tracks
#raw$set <- c(rep('ctx1', nrow(raw)/2 ), rep('ctx2', nrow(raw)/2))
raw <- read.csv(system.file("extdata/video/01.csv", package = "trackdf"))
raw <- raw[!raw$ignore, ]
## Add fake context
raw$context <- c(rep("ctx1", nrow(raw) / 2), rep("ctx2", nrow(raw) / 2))
data_df <- set_data_format(raw_x = raw$x,
raw_y = raw$y,
raw_t = raw$frame,
raw_id = raw$id,
origin = "2020-02-1 12:00:21",
period = "0.04S",
tz = "America/New_York",
raw_context = raw$context
)
is_geo <- FALSE
data_dfs <- add_velocities(data_df,
geo = is_geo,
verbose = TRUE,
parallelize = FALSE
) ## A list of dataframes
## Adding velocity info to every set of the dataset..
## Done!
## [1] "Velocity information added for 2 sets."
If the dataset contains a high number of sets, the function can be parallelized by setting the parallelize argument to TRUE. This is not recommended for small to intermediate data sizes.
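To make the velocity step concrete, here is a minimal base R sketch of how a heading and a speed can be derived from consecutive positions of one individual. It is an illustration under stated assumptions, not the internals of add_velocities: the toy track, the column names head and speed, and the choice to leave the first sample as NA are all assumptions made for this example, and the sketch covers only non-geo (planar) coordinates.

```r
# Toy trajectory of one individual sampled every 0.04 s
# (illustrative values, not taken from the dataset)
track <- data.frame(
  t = c(0, 0.04, 0.08, 0.12),
  x = c(0, 1, 2, 2),
  y = c(0, 0, 1, 2)
)

# Displacement and elapsed time between consecutive samples
dx <- diff(track$x)
dy <- diff(track$y)
dt <- diff(track$t)

# Heading: angle of the displacement vector (rads).
# Speed: distance travelled per unit of time.
# The first sample has no previous position, so it is set to NA
# (an assumption of this sketch).
track$head <- c(NA, atan2(dy, dx))
track$speed <- c(NA, sqrt(dx^2 + dy^2) / dt)

print(track)
```

For geo data (lon-lat), the Euclidean distance above would need to be replaced by a geodesic one, which is why the pipeline asks for the geo argument.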
2.2 Group characteristics
Based on the list of positional data and calculated velocities, we can now calculate the timeseries of group polarization, average speed, and shape. As a proxy for group shape we use the angle between the object-oriented bounding box that includes the positions of all group members and the average heading of the group. Small angles close to 0 rads represent oblong groups, while large angles close to pi/2 rads represent wide groups. The group_metrics_per_set function calculates the timeseries of each measurement across sets. To reduce noise, the function further calculates the smoothed timeseries of speed and polarization over a given time window (using a moving average).
sampling_timestep <- 0.04
time_window <- 1 # seconds
smoothing_time_window <- time_window / sampling_timestep
g_metr <- group_metrics_per_set(data_list = data_dfs,
mov_av_time_window = smoothing_time_window,
step2time = sampling_timestep,
geo = is_geo,
parallelize = FALSE
)
summary(g_metr)
## set t pol
## Length:2802 Min. :2020-02-01 12:00:21.03 Min. :0.01027
## Class :character 1st Qu.:2020-02-01 12:00:49.04 1st Qu.:0.20701
## Mode :character Median :2020-02-01 12:01:17.05 Median :0.32532
## Mean :2020-02-01 12:01:17.03 Mean :0.33785
## 3rd Qu.:2020-02-01 12:01:45.02 3rd Qu.:0.44768
## Max. :2020-02-01 12:02:13.03 Max. :0.97476
## NA's :2
## speed shape N missing_ind
## Min. : 35.42 Min. :0.0002811 Min. :3.000 Min. :0.0000
## 1st Qu.: 132.02 1st Qu.:0.4236876 1st Qu.:7.000 1st Qu.:0.0000
## Median : 175.98 Median :0.8327600 Median :7.000 Median :1.0000
## Mean : 742.80 Mean :0.8127599 Mean :7.291 Mean :0.5543
## 3rd Qu.: 243.84 3rd Qu.:1.1974449 3rd Qu.:8.000 3rd Qu.:1.0000
## Max. :12232.96 Max. :1.5706044 Max. :9.000 Max. :5.0000
## NA's :2 NA's :2 NA's :2
## speed_av pol_av
## Min. : 111.5 Min. :0.1696
## 1st Qu.: 426.3 1st Qu.:0.2852
## Median : 670.1 Median :0.3259
## Mean : 746.7 Mean :0.3379
## 3rd Qu.:1005.7 3rd Qu.:0.3812
## Max. :2241.2 Max. :0.5599
## NA's :50 NA's :50
As before, one can parallelize the function if the data come from many different days/sets. The columns N and missing_ind are added to the dataframe, showing the group size at each time point and how many individuals have NA data.
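The polarization and smoothing described in this section can be sketched in base R as follows. This is a hedged illustration rather than the code of group_metrics_per_set: polarization is taken here as the length of the mean unit heading vector (a common definition), and the moving average is assumed to be centered. The helper names polarization and moving_average are hypothetical and do not belong to the package.

```r
# Headings (rads) of four individuals at one time step (toy values)
headings <- c(0, 0.1, -0.1, pi)

# Polarization: length of the mean unit heading vector
# (1 = perfectly aligned, 0 = no common direction)
polarization <- function(h) {
  sqrt(mean(cos(h))^2 + mean(sin(h))^2)
}
pol_now <- polarization(headings)

# Centered moving average over a window of k samples;
# positions without a full window are returned as NA
# (the centered window is an assumption of this sketch)
moving_average <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# Toy polarization timeseries smoothed over 3 samples
pol_series <- c(0.2, 0.4, 0.6, 0.8, 1.0)
pol_smooth <- moving_average(pol_series, k = 3)
print(pol_smooth)
```

In the vignette the window is expressed in samples, which is why the 1 second time window is divided by the sampling timestep before being passed to the function.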
2.3 Pairwise measurements
From the timeseries of positions and velocities, we can calculate information concerning the nearest neighbor of each group member. Here we estimate the distance and the bearing angle (the angle between the focal individual’s heading and the direction to its neighbor) to the nearest neighbor of each individual. These, along with the id of the nearest neighbor, are added as columns to the positional timeseries dataframe:
data_df <- pairwise_metrics(data_list = data_dfs,
geo = is_geo,
verbose = TRUE,
parallelize = FALSE,
add_coords = FALSE # could be set to TRUE if the relative positions of neighbors are needed
)
## Pairwise analysis started..
#tail(data_df)
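As an illustration of the pairwise measurements, the sketch below finds the nearest neighbor, its distance, and the bearing angle for a single time step in base R. It is not the implementation of pairwise_metrics: the toy positions, the column names nn_id, nnd and bangl, and the choice of an unsigned bearing angle in [0, pi] are assumptions of this example.

```r
# Positions and headings of three individuals at one time step (toy values)
pos <- data.frame(
  id = c("a", "b", "c"),
  x = c(0, 1, 5),
  y = c(0, 0, 0),
  head = c(0, pi / 2, pi),
  stringsAsFactors = FALSE
)

# Pairwise Euclidean distances; the diagonal is set to Inf so that
# an individual is never selected as its own nearest neighbor
d <- as.matrix(dist(pos[, c("x", "y")]))
diag(d) <- Inf

# Row index, id and distance of each individual's nearest neighbor
nn_idx <- apply(d, 1, which.min)
pos$nn_id <- pos$id[nn_idx]
pos$nnd <- d[cbind(seq_len(nrow(pos)), nn_idx)]

# Bearing angle: unsigned angle between the focal heading and the
# direction towards the nearest neighbor, wrapped to [0, pi]
to_nn <- atan2(pos$y[nn_idx] - pos$y, pos$x[nn_idx] - pos$x)
diff_angle <- to_nn - pos$head
pos$bangl <- abs(atan2(sin(diff_angle), cos(diff_angle)))

print(pos)
```

Wrapping the angular difference through atan2(sin, cos) avoids values outside [-pi, pi] when the two angles lie on either side of the +/- pi boundary.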
2.4 Metrics of collective motion
Based on the global and local measurements, we then calculate a series of metrics that aim to capture the dynamics of the collective motion of the group. These metrics are calculated over the parts of the trajectories during which the group performs coordinated collective motion, that is, when the group is moving (average speed higher than a given threshold) and is somewhat polarized (polarization higher than a given threshold). These parts are defined as ‘events’. If ‘interactive_mode’ is activated, the user is asked for the thresholds at run time, after the quantiles of average speed and polarization across all data are printed. Otherwise, the thresholds (pol_lim and speed_lim) should be given as inputs. If both limits are set to 0, a set is taken as a complete event. The time between observations is needed as input to distinguish between continuous events. Once the group and pairwise timeseries are calculated, one can calculate the metrics per event:
### Interactive mode, if the limits of speed and polarization are unknown
# new_species_metrics <- col_motion_metrics(data_df,
# global_metrics = g_metr,
# step2time = sampling_timestep,
# verbose = TRUE,
# speed_lim = NA,
# pol_lim = NA
#
# )
new_species_metrics <- col_motion_metrics(data_df,
global_metrics = g_metr,
step2time = sampling_timestep,
verbose = TRUE,
speed_lim = 150,
pol_lim = 0.3
)
# summary(new_species_metrics)
The number of events and their total duration given the input thresholds are also printed. If we are not interested in inspecting the timeseries of the measurements, one can calculate the metrics directly from the formatted dataset:
new_species_metrics <- col_motion_metrics_from_raw(data_df,
mov_av_time_window = smoothing_time_window,
step2time = sampling_timestep,
geo = is_geo,
verbose = TRUE,
speed_lim = 150,
pol_lim = 0.3,
parallelize_all = FALSE
)
## Adding velocity info to every set of the dataset..
## Done!
# summary(new_species_metrics)
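The event definition used in this section (smoothed speed and polarization above their thresholds over consecutive time steps) can be sketched in base R as below. This is a simplified illustration, not the code of col_motion_metrics: the toy timeseries, the strict "greater than" comparison, the event numbering, and the use of the smoothed speed for the coefficient of variation are assumptions of this example.

```r
# Toy smoothed group timeseries (illustrative values only)
step <- 0.04
g <- data.frame(
  t = seq(0, by = step, length.out = 8),
  speed_av = c(100, 160, 170, 120, 180, 190, 200, 90),
  pol_av = c(0.5, 0.4, 0.35, 0.5, 0.2, 0.4, 0.5, 0.6)
)
speed_lim <- 150
pol_lim <- 0.3

# A time step belongs to an event when both thresholds are exceeded
in_event <- g$speed_av > speed_lim & g$pol_av > pol_lim

# A new event starts wherever in_event switches from FALSE to TRUE;
# consecutive qualifying steps share the same event id
starts <- in_event & !c(FALSE, head(in_event, -1))
g$event <- ifelse(in_event, cumsum(starts), NA)

# Per-event summaries: duration and CV of the (smoothed) speed
events <- split(g[in_event, ], g$event[in_event])
event_dur <- sapply(events, nrow) * step
cv_speed <- sapply(events, function(e) sd(e$speed_av) / mean(e$speed_av))

print(g)
print(event_dur)
```

With this definition an event that lasts a single time step has a duration of one timestep and an undefined standard deviation, which is one reason for filtering out very short events before comparing datasets.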
Since we are interested in comparing different datasets across species or contexts, a new species id column should be added:
new_species_metrics$species <- "new_species_1"
head(new_species_metrics)
## event N set start_time mean_mean_nnd mean_sd_nnd
## 1 1 8 2020-02-01_ctx1 2020-02-01 12:00:21 260.5298 196.39392
## 2 2 8 2020-02-01_ctx1 2020-02-01 12:00:23 194.2149 125.30859
## 3 3 7 2020-02-01_ctx1 2020-02-01 12:00:25 178.0867 70.25351
## 4 4 8 2020-02-01_ctx1 2020-02-01 12:00:26 214.7076 68.20413
## 5 5 8 2020-02-01_ctx1 2020-02-01 12:00:27 156.2709 132.57649
## 6 6 7 2020-02-01_ctx1 2020-02-01 12:00:28 159.9396 97.68897
## sd_mean_nnd mean_pol sd_pol cv_speed mean_sd_front mean_mean_bangl
## 1 3.779780 0.3334596 0.1445072 1.8472052 0.2927575 1.585528
## 2 29.110134 0.3260762 0.1675038 1.6394169 0.2781643 1.222012
## 3 25.204944 0.3260333 0.2012695 1.8615297 0.3184363 1.807799
## 4 NA 0.4152229 NA NA 0.2949830 1.495182
## 5 4.245226 0.2857535 0.1470665 0.9509542 0.2441024 1.513099
## 6 26.605974 0.3996904 0.1786352 1.9967636 0.2936794 1.691812
## mean_shape sd_shape event_dur species
## 1 0.7726862 0.4708161 1.28 new_species_1
## 2 0.7073765 0.3808718 1.12 new_species_1
## 3 0.9843532 0.4146392 1.08 new_species_1
## 4 1.1002695 NA 0.04 new_species_1
## 5 1.0568951 0.3726853 0.32 new_species_1
## 6 0.9988084 0.3790848 7.80 new_species_1
## Uncomment below to save the output in order to combine it with other datasets (replace 'path2file' with an appropriate local path and name).
# write.csv(new_species_metrics, file = "path2file.csv", row.names = FALSE) # OR as an R object
# save(new_species_metrics, file = "path2file.rda")
The duration, starting time and group size (N) of each event are also added to the result dataframe. We suggest filtering out events of very short duration and events with fewer than 3 individuals (singletons and pairs). The calculated metrics are:
- mean_mean_nnd: the temporal average of the group’s average nearest neighbor distance
- mean_sd_nnd: the temporal average of the group’s standard deviation in nearest neighbor distance
- sd_mean_nnd: the temporal standard deviation of the group’s average nearest neighbor distance
- mean_pol: the average of the group’s polarization during the event
- sd_pol: the standard deviation of the group’s polarization during the event
- cv_speed: the CV (coefficient of variation) of the group’s average speed during the event
- mean_sd_front: the average standard deviation of the individuals’ frontness to their nearest neighbor during an event
- mean_mean_bangl: the temporal average of the group’s average bearing angle to the nearest neighbor
- mean_shape: the average shape index of the group during an event (the angle in rads between the bounding box and the group’s heading: values close to 0 indicate oblong groups and values close to pi/2 indicate wide groups)
- sd_shape: the standard deviation of the group’s shape index during an event.
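The suggested filtering of short events and very small groups, followed by stacking the metrics of several datasets, could look like the sketch below. The toy table only mimics the column names of the result dataframe; the cutoff min_dur, the label "new_species_2", and the object names are hypothetical and should be adapted to the data at hand.

```r
# Toy table with the same column names as the event metrics
# (values are illustrative, not results)
metrics <- data.frame(
  event = 1:4,
  N = c(8, 2, 7, 8),
  event_dur = c(1.28, 3.00, 0.04, 7.80),
  species = "new_species_1",
  stringsAsFactors = FALSE
)

min_dur <- 1  # seconds; hypothetical cutoff, choose per dataset
min_n <- 3    # drops singletons and pairs

# Keep events that are long enough and have at least min_n individuals
keep <- metrics$event_dur >= min_dur & metrics$N >= min_n
metrics_filtered <- metrics[keep, ]

# Metrics of another dataset with the same columns can then be stacked;
# here a relabelled copy stands in for a second species
other <- transform(metrics_filtered, species = "new_species_2")
all_metrics <- rbind(metrics_filtered, other)

print(table(all_metrics$species))
```

Because rbind requires identical column names, the species column should be added to every dataset before combining them.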