designit: a flexible engine to generate experiment layouts

Introduction

Examples in this vignette are used were used in our presentation.

It uses a subset of the longitudinal_subject_samples dataset.

data("longitudinal_subject_samples")

dat <- longitudinal_subject_samples |> 
  filter(Group %in% 1:5, Week %in% c(1, 4)) |>
  select(SampleID, SubjectID, Group, Sex, Week)

# for simplicity: remove two subjects that don't have both visits
dat <- dat |>
  filter(SubjectID %in%
    (dat |> count(SubjectID) |> filter(n == 2) |> pull(SubjectID)))


subject_data <- dat |>
  select(SubjectID, Group, Sex) |>
  unique()

Batch effects matter

Here’s an example of plate effect. Here both top and bottom rows of the plate are used as controls.

This is the experiment design:

These are the readouts:

Due to the plate effect, the control rows are affected differently. It is virtually impossible to normalize readouts in a meaningful way.

Go fully random?

Could it be sufficient to randomly distribute samples across batches?
Not necessarily!
- Often sample sizes are too small to avoid grouping by change
- Experimental constraints might not allow for a fully random layout

Gone wrong: Random distribution of 31 grouped subjects into 3 batches turns out unbalanced:

“Block what you can and randomize what you cannot.” (G. Box, 1978)

designit

To avoid batch or gradient effects in complex experiments, designit is an R package that offers flexible ways to allocate a given set of samples to experiment layouts. It’s strength is that it implements a very general framework that can easily be customized and extended to fit specific constrained layouts.

Data structure: BatchContainer class
- R6 object storing:
  - Experiment dimensions (cages, plates…)
  - Sample annotation
  - Scoring functions for sample distribution
Main function: optimize_design()
- Optimizes the layout with user defined
  - Scores for sample distribution
  - Optimization protocols
  - Sample shuffling functions
- Returns improved design and optimization trace

Sample Batching

Setup

Assign 31 samples to 3 equally sized batches
Balance by:
- treatment group (higher priority)
- sex (lower priority)

bc <- BatchContainer$new(
  dimensions = list("batch" = 3, "location" = 11)
) |>
  assign_random(subject_data)

Batch composition before optimization

bc$get_samples()

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P32	5	M
1	3	P10	3	F
...	...	...	...	...
3	9	P31	3	F
3	10	P33	5	M
3	11	P24	5	F

Optimization

Assign 31 samples to 3 equally sized batches
Balance by:
- treatment group (higher priority)
- sex (lower priority)

bc <- optimize_design(
  bc,
  scoring = list(
    group = osat_score_generator(
      batch_vars = "batch",
      feature_vars = "Group"
    ),
    sex = osat_score_generator(
      batch_vars = "batch",
      feature_vars = "Sex"
    )
  ),
  n_shuffle = 1,
  acceptance_func =
    ~ accept_leftmost_improvement(..., tolerance = 0.01),
  max_iter = 150,
  quiet = TRUE
)

Batch composition after optimization

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P01	1	F
1	3	P10	3	F
...	...	...	...	...
3	9	P29	5	F
3	10	P33	5	M
3	11	P12	3	F

Plate layouts

Continuous confounding

Assays are often performed in well plates (24, 96, 384)

Observed effects

Edge effects (bad plate sealing)
Gradients (non-equal temperature distribution)
Row / column effects (pipetting issues)

Since plate effects often cannot be avoided, we aim to distribute sample groups of interest evenly across the plate and adjust for the effect computationally.

Setup

Assume previous batches are 24-well plates
Within plate optimization & across plate blocking
Balanced by:
- treatment group (higher priority)
- sex (lower priority)

set.seed(4)

bc <- BatchContainer$new(
  dimensions = list("plate" = 3, "row" = 4, "col" = 6)
) |>
  assign_in_order(dat)

plot_plate(bc,
  plate = plate, row = row, column = col,
  .color = Group, title = "Initial layout by Group"
)
plot_plate(bc,
  plate = plate, row = row, column = col,
  .color = Sex, title = "Initial layout by Sex"
)

2-step optimization

Across plate optimization using osat score as before

bc1 <- optimize_design(
  bc,
  scoring = list(
    group = osat_score_generator(
      batch_vars = "plate",
      feature_vars = "Group"
    ),
    sex = osat_score_generator(
      batch_vars = "plate",
      feature_vars = "Sex"
    )
  ),
  n_shuffle = 1,
  acceptance_func =
    ~ accept_leftmost_improvement(..., tolerance = 0.01),
  max_iter = 150,
  quiet = TRUE
)

Within plate optimization using distance based sample scoring function

bc2 <- optimize_design(
  bc1,
  scoring = mk_plate_scoring_functions(
    bc1,
    plate = "plate", row = "row", column = "col",
    group = "Group"
  ),
  shuffle_proposal_func = shuffle_with_constraints(dst = plate == .src$plate),
  max_iter = 150,
  quiet = TRUE
)

2-step optimization `multi_plate_layout()`

We are performing the same optimization as before, but using the multi_plate_layout() function to combine the two steps.

bc <- optimize_multi_plate_design(
  bc,
  across_plates_variables = c("Group", "Sex"),
  within_plate_variables = c("Group"),
  plate = "plate", row = "row", column = "col",
  n_shuffle = 2,
  max_iter = 500 # 2000
)
#> 1 ... 2 ... 3 ...

#> Warning: Removed 4509 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 4509 rows containing missing values or values outside the scale range
#> (`geom_point()`).

Glimpse on more complex application

Goal:

Assign 3 treatment conditions to 59 animals, representing 2 relevant strains
Avoid confounding by sex, weight and age

Constraints:

Cages host ideally 3 animals (preferably 2-5)
Strain, Sex and Treatment must be homogeneous within a cage
Don’t put males from different litters same cage; litter mixing is possible for females!
Average weight and age composition comparable between treatment groups and cages
Avoid animals with identical ear markings in same cage (if possible)
Treatment distribution across animal subgroups (if specified) has to be respected

see vignette invivo_study_design for the full story.

Conclusion

designit aims to be general and adaptable
- One framework to address simple batching as well as complex multi-step procedures
- Easy add-ons: custom scoring-functions, acceptance-criteria and shuffling-procedures can be passed to optimize_design by the user
Includes functions and vignettes for frequently used layouts such as plates.

Acknowledgements

Martha Serrano
Sabine Wilson
David Jitao Zhang
Fabian Birzele
PMDA group for feedback?

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P32	5	M
1	3	P10	3	F
...	...	...	...	...
3	9	P31	3	F
3	10	P33	5	M
3	11	P24	5	F

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P01	1	F
1	3	P10	3	F
...	...	...	...	...
3	9	P29	5	F
3	10	P33	5	M
3	11	P12	3	F

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P32	5	M
1	3	P10	3	F
...	...	...	...	...
3	9	P31	3	F
3	10	P33	5	M
3	11	P24	5	F

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P01	1	F
1	3	P10	3	F
...	...	...	...	...
3	9	P29	5	F
3	10	P33	5	M
3	11	P12	3	F

Juliane Siebourg-Polster, Iakov Davydov, Guido Steiner, Balazs Banfai

Introduction

Batch effects matter

Go fully random?

designit

Sample Batching

Setup

Optimization

Plate layouts

Continuous confounding

Setup

2-step optimization

Across plate optimization using osat score as before

Within plate optimization using distance based sample scoring function

2-step optimization multi_plate_layout()

Glimpse on more complex application

Conclusion

2-step optimization `multi_plate_layout()`

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P32	5	M
1	3	P10	3	F
...	...	...	...	...
3	9	P31	3	F
3	10	P33	5	M
3	11	P24	5	F

batch	location	SubjectID	Group	Sex
1	1	NA	NA	NA
1	2	P01	1	F
1	3	P10	3	F
...	...	...	...	...
3	9	P29	5	F
3	10	P33	5	M
3	11	P12	3	F