Unlike specifying the
encoding argument in
as_character(), which is only declarative, these functions
actually attempt to convert the encoding of their input. There are
two possible cases:
The string is tagged as UTF-8 or latin1, the only two encodings
for which R has specific support. In this case, converting to the
same encoding is a no-op, and converting to native always works
as expected, as long as the native encoding (the one specified by
the LC_CTYPE locale; see mut_utf8_locale()) supports
all characters occurring in the strings. Unrepresentable
characters are serialised as Unicode code points: "<U+xxxx>".
The string is not tagged. R assumes that it is encoded in the native encoding. Conversion to native is a no-op, and conversion to UTF-8 should work as long as the string is actually encoded in the locale codeset.
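The difference between declaring an encoding and actually converting to it can be sketched with base R alone. This is only an analogy built on base functions (Encoding(), enc2utf8(), charToRaw()); it is not the implementation of the functions documented here, and the variable names are made up for illustration.

```r
# Build "café" from raw latin1 bytes; rawToChar() leaves the result untagged.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x)              # "unknown": R would assume the native encoding

# Declarative step: tag the string as latin1. The bytes do not change.
Encoding(x) <- "latin1"
charToRaw(x)             # still 63 61 66 e9

# Actual conversion: re-encode the latin1 bytes as UTF-8.
y <- enc2utf8(x)
Encoding(y)              # "UTF-8"
charToRaw(y)             # 63 61 66 c3 a9
```

Tagging leaves the four bytes untouched, whereas the conversion replaces the single latin1 byte e9 with the two-byte UTF-8 sequence c3 a9.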
as_utf8_character(x)
as_native_character(x)
as_utf8_string(x)
as_native_string(x)
x: An object to coerce.
# Let's create a string marked as UTF-8 (which is guaranteed by the
# Unicode escaping in the string):
utf8 <- "caf\uE9"
str_encoding(utf8)
#> [1] "UTF-8"
as_bytes(utf8)
#> [1] 63 61 66 c3 a9