This vignette provides more details about the individual methods of the MRPWorkflow object, including their purposes, input requirements, and caveats. For a general overview, please refer to the “Getting started with shinymrp” vignette.

Initializing the workflow

The process begins with creating an MRPWorkflow object using the mrp_workflow function. This object-oriented design lets the object manage the essential data under the hood and offer a clean interface, so users do not have to supply the same piece of information repeatedly.

workflow <- mrp_workflow()

Data preparation

The goal of this stage is to produce two essential pieces of data:

  • Sample data: The analysis sample that includes the outcome of interest (either binary or continuous) and predictors, such as COVID-19 test records or survey responses.
  • Poststratification data: A table containing the sizes of the groups in the target population, as defined by demographic and geographic factors.
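As an illustration of the second piece, the sketch below builds a toy poststratification table in the aggregated shape described above. The column names follow the package's naming convention (see the "Required Columns and Categories" list later in this vignette); the cell counts are invented for illustration, whereas in practice they would come from a census product such as the ACS.

```r
# Toy poststratification table (invented counts): one row per
# population cell, with the cell size in the `total` column
pstrat <- data.frame(
  sex   = rep(c("male", "female"), each = 3),
  race  = rep(c("black", "white", "other"), times = 2),
  total = c(1200, 5400, 800, 1300, 5600, 900)
)
pstrat
```

A table like this could then be passed to the workflow's load_pstrat method, as shown in the "Load custom poststratification data" section below.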

Use cases

sample_data <- example_sample_data(
  is_timevar = TRUE,
  is_aggregated = TRUE,
  special_case = NULL,
  family = "binomial"
)

Individual-level vs. Aggregated Data

The interface accepts data in two formats:

  • Individual-level: Each row contains information for one individual.
  • Aggregated: Each row contains information for one group (e.g., White males aged 18-30 in Michigan), with geographic-demographic factors, the total number of individuals, and a summary of the outcomes.

Data with continuous outcome measures are expected only at the individual level. For data with binary outcome measures, the aggregated format is preferred for its computational benefits; individual-level data are automatically aggregated upon upload. Data requirements vary slightly between the two formats, mainly in how the outcome measure is recorded.
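To make the two formats concrete, the sketch below aggregates a toy individual-level binary-outcome data set into the grouped form described above, using base R. The column names (sex, race, age, positive, total) follow the package's naming convention; the data values are invented for illustration, and this is only a sketch of the aggregation the package performs automatically on upload.

```r
# Toy individual-level data: one row per person (invented values),
# with a binary outcome indicator in `positive`
indiv <- data.frame(
  sex      = c("male", "male", "female", "female", "female"),
  race     = c("white", "white", "black", "black", "other"),
  age      = c("18-30", "18-30", "18-30", "31-45", "31-45"),
  positive = c(1, 0, 1, 0, 1)
)

# Aggregated form: one row per demographic cell, summing the
# positive responses and counting individuals per cell
indiv$total <- 1
agg <- aggregate(cbind(positive, total) ~ sex + race + age,
                 data = indiv, FUN = sum)
agg
```

The resulting table has one row per sex-race-age cell, with `total` holding the cell size and `positive` the number of positive responses in that cell.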

Required Columns and Categories

The application screens input data using a specific naming convention. Here’s a list of the expected columns and their values (case-insensitive):

  • Sex: male, female
  • Race: Black, White, other
  • Age
  • Educational attainment (edu): below high school (no hs), high school (hs), some college, 4-year college, post-grad
  • ZIP code
  • County
  • State
  • Week indices (time)
  • Date
  • Continuous outcome measure (outcome)
  • Positive response indicator or number of positive responses (positive)
  • Cross-tabulation cell counts (total)
  • Survey weights (weight)

workflow$preprocess(
  sample_data,
  is_timevar = TRUE,
  is_aggregated = TRUE,
  special_case = NULL,
  family = "binomial"
)
#> Preprocessing sample data...

Linking data to the ACS

workflow$link_acs(link_geo = "zip", acs_year = 2021)
#> Linking data to the ACS...

Loading custom poststratification data

pstrat_data <- example_pstrat_data()
workflow$load_pstrat(pstrat_data, is_aggregated = TRUE)
#> Preprocessing poststratification data...

Visualizing descriptive statistics

workflow$demo_bars(demo = "sex")

workflow$sample_size_map()
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
workflow$outcome_plot()

workflow$outcome_map()

Model fitting and diagnostics

Model specification

model1 <- workflow$create_model(
  model_spec = list(
    Intercept = list(
      Intercept = ""
    ),
    fixed = list(
      sex = "",
      race = ""
    ),
    varying = list(
      age = "",
      time = ""
    )
  )
)
model2 <- workflow$create_model(
  model_spec = list(
    Intercept = list(
      Intercept = ""
    ),
    fixed = list(
      sex = "",
      race = ""
    ),
    varying = list(
      age = "",
      time = ""
    ),
    interaction = list(
      `age:time` = "",
      `race:time` = "",
      `sex:time` = ""
    )
  )
)

Running MCMC

model1$fit(n_iter = 1000, n_chains = 2, seed = 123)
#> Running MCMC with 2 parallel chains, with 1 thread(s) per chain...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 finished in 2.1 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 2.3 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 2.2 seconds.
#> Total execution time: 2.5 seconds.
model2$fit(n_iter = 1000, n_chains = 2, seed = 123)
#> Running MCMC with 2 parallel chains, with 1 thread(s) per chain...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 4.0 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 4.1 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 4.1 seconds.
#> Total execution time: 4.2 seconds.

Diagnostics

model1$diagnostics()
#>               Metric
#> 1         Divergence
#> 2 Maximum tree depth
#> 3             E-BFMI
#>                                                        Message
#> 1         0 of 1000 (0.0%) transitions ended with a divergence
#> 2 0 of 1000 transitions hit the maximum tree depth limit of 10
#> 3                    0 of 2 chains had an E-BFMI less than 0.3
model1$summary()
#> $fixed
#>              Estimate Est.Error    l-95% CI  u-95% CI    R-hat  Bulk_ESS
#> Intercept  -0.1866930 0.2117891 -0.59747478 0.2586310 1.000951  373.0936
#> sex.male    0.2863758 0.1349500  0.02541517 0.5568895 1.004192 1132.3139
#> race.black         NA        NA          NA        NA       NA        NA
#> race.other  0.7895794 0.1662696  0.47458995 1.1313765 1.002426  717.0990
#> race.white  0.8447851 0.1643350  0.53300762 1.1576902 1.004416  676.1751
#>            Tail_ESS
#> Intercept  440.6858
#> sex.male   871.7198
#> race.black       NA
#> race.other 766.9965
#> race.white 526.8130
#> 
#> $varying
#>                   Estimate Est.Error   l-95% CI  u-95% CI    R-hat Bulk_ESS
#> age (intercept)  0.2559446 0.1914442 0.02188407 0.7371756 1.012975 273.8812
#> time (intercept) 0.2770072 0.1287866 0.04232063 0.5839267 1.011389 283.4464
#>                  Tail_ESS
#> age (intercept)  234.0232
#> time (intercept) 255.2071
#> 
#> $other
#> data frame with 0 columns and 0 rows
workflow$pp_check(model1)
#> Running posterior predictive check...

model2$diagnostics()
#>               Metric
#> 1         Divergence
#> 2 Maximum tree depth
#> 3             E-BFMI
#>                                                        Message
#> 1         0 of 1000 (0.0%) transitions ended with a divergence
#> 2 0 of 1000 transitions hit the maximum tree depth limit of 10
#> 3                    0 of 2 chains had an E-BFMI less than 0.3
model2$summary()
#> $fixed
#>              Estimate Est.Error    l-95% CI  u-95% CI    R-hat Bulk_ESS
#> Intercept  -0.5214502 0.4191023 -1.43555700 0.1838107 1.003030 251.0171
#> sex.male    0.6115240 0.3541460  0.02897347 1.3808167 1.005676 207.0159
#> race.black         NA        NA          NA        NA       NA       NA
#> race.other  0.8247892 0.1998794  0.46095992 1.2296932 1.001261 888.3374
#> race.white  0.8536937 0.1894617  0.50019513 1.2313683 0.999741 774.0723
#>            Tail_ESS
#> Intercept  345.7252
#> sex.male   417.8717
#> race.black       NA
#> race.other 602.0792
#> race.white 673.4264
#> 
#> $varying
#>                        Estimate Est.Error    l-95% CI  u-95% CI    R-hat
#> age (intercept)       0.2582248 0.2241035 0.020979727 0.8445295 0.998914
#> time (intercept)      0.1762351 0.1212142 0.012252043 0.4545369 1.006875
#> age:time (intercept)  0.2030316 0.1291391 0.008395438 0.4855221 1.002693
#> race:time (intercept) 0.2018816 0.1226739 0.013140893 0.4657229 1.016398
#> sex:time (intercept)  0.4122678 0.2068856 0.051697240 0.8969206 1.010689
#>                       Bulk_ESS Tail_ESS
#> age (intercept)       264.0830 418.8403
#> time (intercept)      295.6898 450.1611
#> age:time (intercept)  271.1361 412.1944
#> race:time (intercept) 221.8008 284.8612
#> sex:time (intercept)  206.5233 156.3357
#> 
#> $other
#> data frame with 0 columns and 0 rows
workflow$pp_check(model2)
#> Running posterior predictive check...

workflow$compare_models(model1, model2)
#> Running leave-one-out cross-validation...
#> Running leave-one-out cross-validation...
#>        elpd_diff  se_diff
#> model2  0.000000 0.000000
#> model1 -1.368539 2.503422

Saving and loading models

workflow$save_model(model1, file = "model1.qs")
model1 <- workflow$load_model("model1.qs")

Visualizing estimates

workflow$estimate_plot(model1, group = "sex")
#> Running post-stratification...

workflow$estimate_map(model1, geo = "county")