The OSAT score is intended to ensure an even distribution of samples across batches and is closely related to the chi-square test on a contingency table (Yan et al. (2012) doi:10.1186/1471-2164-13-689).

osat_score(bc, batch_vars, feature_vars, expected_dt = NULL, quiet = FALSE)

Arguments

bc

A BatchContainer with samples, or a data.table/data.frame in which every row is a location in a container together with the sample assigned to that location.

batch_vars

Character vector of batch variable names to take into account for the score computation.

feature_vars

Character vector of sample variable names to take into account for the score computation.

expected_dt

A data.table with the expected number of samples for each combination of sample variables and batch variables. This argument is optional. However, the expected counts do not change during the optimization process, so it is a good idea to compute them once and cache them.

quiet

If TRUE, do not warn about NAs in feature columns.

Value

A list with two elements: $score (numeric score value) and $expected_dt (expected counts data.table for reuse).
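The reuse of $expected_dt could look roughly like the sketch below. It relies only on the signature and return value documented here; the names samples, reassigned, first and second are illustrative, and the sketch assumes that the same set of samples and the same batch sizes are scored in both calls, so that the cached expected counts still apply.

```r
# Minimal sketch; runs only if the designit package is installed
if (requireNamespace("designit", quietly = TRUE)) {
  samples <- data.frame(
    SampleType = c("Case", "Case", "Case", "Control", "Control", "Control"),
    Sex = c("Female", "Female", "Male", "Female", "Female", "Male"),
    plate = c(1, 1, 2, 2, 1, 2)
  )

  # First call: expected counts are computed and returned alongside the score
  first <- designit::osat_score(
    samples,
    batch_vars = "plate",
    feature_vars = c("SampleType", "Sex")
  )

  # Hypothetical alternative assignment of the same samples to the same plates
  reassigned <- samples
  reassigned$plate <- c(1, 2, 1, 2, 1, 2)

  # Later calls: pass the cached expected counts instead of recomputing them
  second <- designit::osat_score(
    reassigned,
    batch_vars = "plate",
    feature_vars = c("SampleType", "Sex"),
    expected_dt = first$expected_dt
  )
}
```

In an optimization loop, the first call would typically happen once before the loop and every subsequent scoring call would pass the cached table.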

Examples

sample_assignment <- tibble::tribble(
  ~ID, ~SampleType, ~Sex, ~plate,
  1, "Case", "Female", 1,
  2, "Case", "Female", 1,
  3, "Case", "Male", 2,
  4, "Control", "Female", 2,
  5, "Control", "Female", 1,
  6, "Control", "Male", 2,
  NA, NA, NA, 1,
  NA, NA, NA, 2,
)

osat_score(sample_assignment,
  batch_vars = "plate",
  feature_vars = c("SampleType", "Sex")
)
#> Warning: NAs in features / batch columns; they will be excluded from scoring
#> $score
#> [1] 3
#> 
#> $expected_dt
#> Key: <plate, SampleType, Sex>
#>    plate SampleType    Sex .n_expected
#>    <num>     <char> <char>       <num>
#> 1:     1       Case Female         1.0
#> 2:     1       Case   Male         0.5
#> 3:     1    Control Female         1.0
#> 4:     1    Control   Male         0.5
#> 5:     2       Case Female         1.0
#> 6:     2       Case   Male         0.5
#> 7:     2    Control Female         1.0
#> 8:     2    Control   Male         0.5
#>
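The score of 3 in the output above can be reproduced by hand in base R. This is a sketch under two assumptions that are consistent with the printed output but are not confirmed by this page: the score is the sum of squared differences between observed and expected counts, and the expected count is the feature-combination total multiplied by the batch's share of all locations (empty locations included). The variable names are illustrative and the actual implementation may differ.

```r
# Example assignment from above, keeping the two empty locations (NA samples)
samples <- data.frame(
  SampleType = c("Case", "Case", "Case", "Control", "Control", "Control", NA, NA),
  Sex = c("Female", "Female", "Male", "Female", "Female", "Male", NA, NA),
  plate = c(1, 1, 2, 2, 1, 2, 1, 2)
)

# Assumption: batch share is computed over all locations, empty ones included
batch_size <- table(samples$plate)
batch_fraction <- as.numeric(batch_size / sum(batch_size))

# Observed counts per plate x SampleType x Sex; table() drops NA rows by default
observed <- table(samples$plate, samples$SampleType, samples$Sex)

# Total number of samples for each SampleType x Sex combination
feature_total <- apply(observed, c(2, 3), sum)

# Expected counts: feature total times batch share, as a plate x SampleType x Sex array
expected <- outer(batch_fraction, feature_total)

# Assumption: score is the sum of squared deviations from the expected counts
score <- sum((unclass(observed) - expected)^2)
score
```

The expected array holds the same values as the .n_expected column above (1.0 for Female and 0.5 for Male combinations on each plate), and the resulting score agrees with the printed $score.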