The $preprocess() method runs the preprocessing pipeline, which includes data standardization, filtering, imputation, and aggregation. See the More on data preparation vignette for details on data processing, and the More examples of R6 classes vignette for usage examples.

Usage

preprocess(
  data,
  is_timevar = FALSE,
  is_aggregated = FALSE,
  special_case = NULL,
  family = NULL,
  time_freq = NULL,
  freq_threshold = 0
)

Arguments

data

An object of class data.frame (or one that can be coerced to that class) that satisfies the requirements specified in the More on data preparation vignette.

is_timevar

Logical indicating whether the data contains time-varying components.

is_aggregated

Logical indicating whether the data is already aggregated.
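
As a rough illustration of the difference between individual-level and aggregated data, the base-R sketch below collapses one-row-per-person records into one row per demographic cell. The column names (sex, age, total, positive) are assumptions made for this example only; the format the method actually requires is described in the More on data preparation vignette.

```r
# Toy individual-level data: one row per person (hypothetical columns).
individual <- data.frame(
  sex = c("female", "female", "male", "male", "male"),
  age = c("18-34", "18-34", "18-34", "35-64", "35-64"),
  total = 1,                    # each row counts as one observation
  positive = c(1, 0, 1, 1, 1)   # binary outcome
)

# Sum the counts within each sex-by-age cell.
aggregated <- aggregate(
  individual[c("total", "positive")],
  by = individual[c("sex", "age")],
  FUN = sum
)
```

Data that already looks like `aggregated` in this sketch would correspond to is_aggregated = TRUE.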

special_case

Character string specifying special case handling. Options are NULL (the default), "covid", and "poll".

family

Character string specifying the distribution family for the outcome variable. Options are "binomial" for binary outcome measures and "normal" for continuous outcome measures.

time_freq

Character string specifying the time indexing frequency or time length for grouping dates (YYYY-MM-DD) in the data. Options are NULL (the default), "week", "month", and "year". This parameter must be NULL for cross-sectional data or time-varying data that already has time indices.
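
To make the idea of grouping dates by frequency concrete, here is a minimal base-R sketch. The helper group_dates() is hypothetical and is not the package's internal code; it only shows one way YYYY-MM-DD dates can be mapped to the start of their week, month, or year.

```r
# Hypothetical helper: label each date with the first day of its period.
# Relies on base R's cut() method for Date objects.
group_dates <- function(x, time_freq = c("week", "month", "year")) {
  time_freq <- match.arg(time_freq)
  as.Date(as.character(cut(x, breaks = time_freq)))
}

dates <- as.Date(c("2024-01-03", "2024-01-29", "2024-02-14", "2025-02-14"))

by_month <- group_dates(dates, "month")
by_year <- group_dates(dates, "year")
```

Dates that share a label fall into the same time index under this scheme; how the method itself numbers the resulting periods is not shown here.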

freq_threshold

Numeric value specifying the minimum frequency a value must have for its observations to be included. Any row containing a value that occurs less often than this threshold is removed entirely. The default value is 0 (no filtering).
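
The filtering rule can be sketched in base R as follows. The helper filter_by_frequency() and the toy columns are hypothetical, and which columns the method actually applies the threshold to is an assumption here; the sketch also assumes the checked columns contain no missing values.

```r
# Hypothetical helper: drop every row in which any checked column holds a
# value that occurs fewer than `freq_threshold` times in that column.
filter_by_frequency <- function(df, cols, freq_threshold = 0) {
  keep <- rep(TRUE, nrow(df))
  for (col in cols) {
    counts <- table(df[[col]])
    # Look up each row's value count by name.
    freq <- as.integer(counts[as.character(df[[col]])])
    keep <- keep & freq >= freq_threshold
  }
  df[keep, , drop = FALSE]
}

toy <- data.frame(
  county = c("A", "A", "A", "B", "C", "C"),
  positive = c(1, 0, 1, 0, 1, 0)
)

# County "B" appears only once, so its row is removed at a threshold of 2.
filtered <- filter_by_frequency(toy, cols = "county", freq_threshold = 2)

# A threshold of 0 keeps every row.
unfiltered <- filter_by_frequency(toy, cols = "county", freq_threshold = 0)
```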

Value

No return value, called for side effects.