Tokenize strings by a delimiter character, in a way similar to the strsplit
function in the base package. When index is missing, the function returns a
matrix containing all tokenized fields; when index is given, only the fields
at the selected position(s) are returned. See examples.
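When every string contains the same number of fields, as in the examples, the matrix form can be reproduced in base R by row-binding the list that strsplit returns. This only illustrates the shape of the output. It is not how strtoken is implemented, and because rbind recycles shorter rows it suits only regular input.

```r
myStr <- c("HSV\t1887", "FCB\t1900", "FCK\t1948")

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(myStr, "\t")

# Row-binding those vectors gives one row per string, one column per field
tokens <- do.call(rbind, pieces)
tokens[, 2]
#> [1] "1887" "1900" "1948"
```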
strtoken(x, split, index, ...)
x: A vector of character strings; non-character vectors (such as factors) are coerced to character.
split: A character string used to split the strings.
index: Numeric vector indicating which fields should be returned; if missing or
set to NULL, a matrix containing all fields is returned.
...: Other parameters passed to strsplit.
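The ... argument matters because base strsplit treats split as a regular expression unless fixed = TRUE is given. Assuming strtoken forwards split and ... to strsplit unchanged, as the argument list states, the base function shows the effect of a period used as the delimiter. A tab, as used in the examples, has no special meaning in a regular expression, so fixed makes no difference there.

```r
x <- "HSV.1887"

# As a regular expression, "." matches every character, so every token is empty
strsplit(x, ".")[[1]]
#> [1] "" "" "" "" "" "" "" ""

# With fixed = TRUE the period is matched literally
strsplit(x, ".", fixed = TRUE)[[1]]
#> [1] "HSV"  "1887"
```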
A matrix if index is missing, NULL, or contains more than one index;
otherwise a character vector.
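The switch between a matrix and a vector matches ordinary R matrix subsetting, where selecting a single column drops the result to a vector. The sketch below builds a token matrix by hand, shaped like the one in the examples, and uses base indexing only; it is an illustration of the rule, not code taken from the package.

```r
# A token matrix shaped like the output of strtoken(myStr, "\t")
tokens <- matrix(c("HSV", "FCB", "FCK", "1887", "1900", "1948"), ncol = 2)

# One index: the single column is dropped to a character vector
tokens[, 1L]
#> [1] "HSV" "FCB" "FCK"

# Several indices: the result stays a matrix, with columns in the requested order
is.matrix(tokens[, c(2L, 1L)])
#> [1] TRUE
```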
The main body of the function is modified from the strsplit2 function in the
limma package.
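The strsplit2-style approach can be sketched in base R: coerce the input to character, split every string, allocate a character matrix as wide as the longest result, and fill it row by row. The helper name split_to_matrix is hypothetical, and padding short rows with empty strings is an assumption about how strings with fewer fields are handled, not a documented property of strtoken.

```r
# Hypothetical helper in the spirit of limma's strsplit2; not the package source
split_to_matrix <- function(x, split, ...) {
  x <- as.character(x)                       # factors and numbers become character
  pieces <- strsplit(x, split = split, ...)  # list of token vectors
  widths <- lengths(pieces)                  # number of fields per string
  # Pre-fill with "" so strings with fewer fields are padded on the right
  out <- matrix("", nrow = length(x), ncol = max(0L, widths))
  for (i in seq_along(pieces)) {
    if (widths[i] > 0L) out[i, seq_len(widths[i])] <- pieces[[i]]
  }
  out
}

# Ragged input: the second and third rows are padded with ""
split_to_matrix(c("a\tb\tc", "d\te", "f"), "\t")
```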
myStr <- c("HSV\t1887", "FCB\t1900", "FCK\t1948")
strsplit(myStr, "\t")
#> [[1]]
#> [1] "HSV"  "1887"
#>
#> [[2]]
#> [1] "FCB"  "1900"
#>
#> [[3]]
#> [1] "FCK"  "1948"
#>
strtoken(myStr, "\t")
#>      [,1]  [,2]
#> [1,] "HSV" "1887"
#> [2,] "FCB" "1900"
#> [3,] "FCK" "1948"
strtoken(myStr, "\t", index=1L)
#> [1] "HSV" "FCB" "FCK"
strtoken(myStr, "\t", index=2L)
#> [1] "1887" "1900" "1948"
myFac <- factor(myStr)
strtoken(myFac, "\t")
#>      [,1]  [,2]
#> [1,] "HSV" "1887"
#> [2,] "FCB" "1900"
#> [3,] "FCK" "1948"
strtoken(myFac, "\t", index=1L)
#> [1] "HSV" "FCB" "FCK"
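Casting to character matters because base strsplit rejects factors outright. Assuming strtoken calls as.character() on its input, as the description of x states, the calls below contrast strsplit on a raw factor with strsplit after conversion.

```r
myFac <- factor(c("HSV\t1887", "FCB\t1900", "FCK\t1948"))

# strsplit() refuses a factor; capture the error message instead of stopping
res <- tryCatch(strsplit(myFac, "\t"), error = function(e) conditionMessage(e))
res
#> [1] "non-character argument"

# Converting the factor first yields the expected tokens
strsplit(as.character(myFac), "\t")[[1]]
#> [1] "HSV"  "1887"
```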