Unlike specifying the encoding argument in as_string() and as_character(), which is only declarative, these functions actually attempt to convert the encoding of their input. There are two possible cases:

  • The string is tagged as UTF-8 or latin1, the only two encodings for which R has specific support. In this case, converting to the same encoding is a no-op, and converting to native works as expected as long as the native encoding (the one specified by the LC_CTYPE locale, see mut_utf8_locale()) supports all characters occurring in the strings. Unrepresentable characters are serialised as Unicode code points: "<U+xxxx>".

  • The string is not tagged. R assumes that it is encoded in the native encoding. Conversion to native is a no-op, and conversion to UTF-8 should work as long as the string is actually encoded in the locale codeset.
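The two cases can be reproduced with base R alone. The sketch below is only an illustration and does not show how these functions are implemented: it assumes that base R's Encoding(), enc2utf8() and enc2native() follow the same tagging rules as described above, and the variable names are made up for the example.

```r
# Case 1: a string tagged as UTF-8 (the \u escape guarantees the tag).
tagged <- "caf\u00e9"
Encoding(tagged)
#> [1] "UTF-8"

# Converting to the encoding it already has changes nothing.
identical(enc2utf8(tagged), tagged)
#> [1] TRUE

# A string tagged as latin1 is re-encoded, not just relabelled:
# the single byte e9 becomes the two-byte UTF-8 sequence c3 a9.
latin1 <- "caf\xe9"
Encoding(latin1) <- "latin1"
charToRaw(latin1)
#> [1] 63 61 66 e9
charToRaw(enc2utf8(latin1))
#> [1] 63 61 66 c3 a9

# Case 2: an untagged string is assumed to be in the native encoding,
# so converting it to native leaves it as it is.
untagged <- "cafe"
Encoding(untagged)
#> [1] "unknown"
identical(enc2native(untagged), untagged)
#> [1] TRUE
```

Note that `Encoding(x) <- "latin1"` only declares the encoding and leaves the bytes alone, whereas enc2utf8() rewrites them. This is the same distinction as between a declarative encoding argument and an actual conversion.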

as_utf8_character(x)

as_native_character(x)

as_utf8_string(x)

as_native_string(x)

Arguments

x

An object to coerce.

Examples

# Let's create a string marked as UTF-8 (which is guaranteed by the
# Unicode escaping in the string):
utf8 <- "caf\uE9"
str_encoding(utf8)
#> [1] "UTF-8"
as_bytes(utf8)
#> [1] 63 61 66 c3 a9
# It can then be converted to a native encoding, that is, the
# encoding specified in the current locale:
not_run({
  mut_latin1_locale()
  latin1 <- as_native_string(utf8)
  str_encoding(latin1)
  as_bytes(latin1)
})
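
The block above is not run because it changes the session locale. To see the bytes a latin1 conversion produces without touching the locale, base R's iconv() can be used instead. This is a stand-in for illustration, not the function documented here, and `latin1_bytes` is a name made up for the example.

```r
# Start from the same UTF-8 string as above.
utf8 <- "caf\u00e9"
charToRaw(utf8)
#> [1] 63 61 66 c3 a9

# Re-encode explicitly from UTF-8 to latin1, independently of the
# current locale: the two bytes c3 a9 collapse into the single byte e9.
latin1_bytes <- charToRaw(iconv(utf8, from = "UTF-8", to = "latin1"))
latin1_bytes
#> [1] 63 61 66 e9
```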