The OSAT score is intended to ensure an even distribution of samples across batches and is closely related to the chi-square test on a contingency table (Yan et al. (2012), doi:10.1186/1471-2164-13-689).
osat_score(bc, batch_vars, feature_vars, expected_dt = NULL, quiet = FALSE)
BatchContainer with samples, or a data.table/data.frame where every row is a location in a container and a sample in this location.
character vector with batch variable names to take into account for the score computation.
character vector with sample variable names to take into account for score computation.
A data.table with the expected number of samples for each combination of sample variables and batch variables. This argument is optional. Because the expected counts do not change during the optimization process, it is a good idea to compute them once and cache the value.
Do not warn about NAs in feature columns.
a list with two attributes: $score (numeric score value) and $expected_dt (expected counts data.table for reuse)
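The caching advice for expected_dt can be sketched as follows. This is a minimal illustration, not the package's own example: it assumes osat_score() is provided by the designit package, and the data frames layout_a and layout_b are hypothetical assignments of the same six samples to two plates.

```r
# Assumption: osat_score() comes from the designit package.
library(designit)

# Two hypothetical layouts of the same samples; only the plate column differs.
layout_a <- data.frame(
  SampleType = c("Case", "Case", "Case", "Control", "Control", "Control"),
  Sex        = c("Female", "Female", "Male", "Female", "Female", "Male"),
  plate      = c(1, 1, 2, 2, 1, 2)
)
layout_b <- layout_a
layout_b$plate <- c(1, 2, 1, 1, 2, 2)

# The first call computes the expected counts and returns them in $expected_dt.
first <- osat_score(layout_a,
  batch_vars = "plate",
  feature_vars = c("SampleType", "Sex"),
  quiet = TRUE
)

# Later calls reuse the cached table instead of recomputing it.
cached <- osat_score(layout_b,
  batch_vars = "plate",
  feature_vars = c("SampleType", "Sex"),
  expected_dt = first$expected_dt,
  quiet = TRUE
)

# Reference call without the cache, for comparison.
fresh <- osat_score(layout_b,
  batch_vars = "plate",
  feature_vars = c("SampleType", "Sex"),
  quiet = TRUE
)
```

Reusing first$expected_dt is only appropriate while the set of samples and the batch structure stay the same, which is the case when an optimizer merely shuffles samples between locations.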
sample_assignment <- tibble::tribble(
  ~ID, ~SampleType, ~Sex, ~plate,
  1, "Case", "Female", 1,
  2, "Case", "Female", 1,
  3, "Case", "Male", 2,
  4, "Control", "Female", 2,
  5, "Control", "Female", 1,
  6, "Control", "Male", 2,
  NA, NA, NA, 1,
  NA, NA, NA, 2,
)
osat_score(sample_assignment,
  batch_vars = "plate",
  feature_vars = c("SampleType", "Sex")
)
#> Warning: NAs in features / batch columns; they will be excluded from scoring
#> $score
#> [1] 3
#>
#> $expected_dt
#> Key: <plate, SampleType, Sex>
#> plate SampleType Sex .n_expected
#> <num> <char> <char> <num>
#> 1: 1 Case Female 1.0
#> 2: 1 Case Male 0.5
#> 3: 1 Control Female 1.0
#> 4: 1 Control Male 0.5
#> 5: 2 Case Female 1.0
#> 6: 2 Case Male 0.5
#> 7: 2 Control Female 1.0
#> 8: 2 Control Male 0.5
#>
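To make the output above easier to interpret, the score can be recomputed by hand in base R. The sketch below assumes the score is the sum of squared differences between observed and expected counts, and that expected counts split each SampleType/Sex total evenly across plates (reasonable here because both plates have the same number of locations). These assumptions reproduce the values shown above, but this is not necessarily how the function is implemented internally. The names samples, observed, deviation and manual_score are illustrative only.

```r
# Non-missing rows of the example above (rows with NA are excluded from scoring).
samples <- data.frame(
  SampleType = c("Case", "Case", "Case", "Control", "Control", "Control"),
  Sex        = c("Female", "Female", "Male", "Female", "Female", "Male"),
  plate      = c(1, 1, 2, 2, 1, 2)
)

# Observed counts for every plate x SampleType x Sex combination,
# including combinations that do not occur (count 0).
observed <- table(samples$plate, samples$SampleType, samples$Sex)

# Expected counts under the equal-batch-size assumption:
# the total of each SampleType x Sex stratum divided by the number of plates.
n_plates <- dim(observed)[1]
expected_per_plate <- apply(observed, c(2, 3), sum) / n_plates

# Subtract the expected count from every plate's observed count.
deviation <- sweep(observed, c(2, 3), expected_per_plate)

# Sum of squared deviations over all combinations.
manual_score <- sum(deviation^2)
manual_score
#> [1] 3
```

The two Case/Female samples both sit on plate 1, so that stratum alone contributes 1 + 1 = 2 to the score; spreading them across both plates would lower it.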