Primitive Type char
A character type.
The char type represents a single character. More specifically, since
‘character’ isn’t a well-defined concept in Unicode, char is a ‘Unicode
scalar value’.
This documentation describes a number of methods and trait implementations on the
char type. For technical reasons, there is additional, separate
documentation in the std::char module as well.
§Validity and Layout
A char is a ‘Unicode scalar value’, which is any ‘Unicode code point’
other than a surrogate code point. This has a fixed numerical definition:
code points are in the range 0 to 0x10FFFF, inclusive.
Surrogate code points, used by UTF-16, are in the range 0xD800 to 0xDFFF.
No char may be constructed, whether as a literal or at runtime, that is not a
Unicode scalar value. Violating this rule causes undefined behavior.
Unicode scalar values are also the exact set of values that may be encoded in UTF-8. Because
char values are Unicode scalar values and functions may assume incoming str values are
valid UTF-8, it is safe to store any char in a str or read
any character from a str as a char.
The gap in valid char values is understood by the compiler, so in the
below example the two ranges are understood to cover the whole range of
possible char values and there is no error for a non-exhaustive match.
All Unicode scalar values are valid char values, but not all of them represent a real
character. Many Unicode scalar values are not currently assigned to a character, but may be in
the future (“reserved”); some will never be a character (“noncharacters”); and some may be given
different meanings by different users (“private use”).
char is guaranteed to have the same size, alignment, and function call ABI as u32 on all
platforms.
§Representation
char is always four bytes in size. This is a different representation than
a given character would have as part of a String. For example:
```rust
let v = vec!['h', 'e', 'l', 'l', 'o'];

// five elements times four bytes for each element
assert_eq!(20, v.len() * size_of::<char>());

let s = String::from("hello");

// five elements times one byte per element
assert_eq!(5, s.len() * size_of::<u8>());
```

As always, remember that a human intuition for ‘character’ might not map to Unicode’s definitions. For example, despite looking similar, the ‘é’ character is one Unicode code point while ‘é’ is two Unicode code points:
```rust
let mut chars = "é".chars();
// U+00e9: 'latin small letter e with acute'
assert_eq!(Some('\u{00e9}'), chars.next());
assert_eq!(None, chars.next());

let mut chars = "é".chars();
// U+0065: 'latin small letter e'
assert_eq!(Some('\u{0065}'), chars.next());
// U+0301: 'combining acute accent'
assert_eq!(Some('\u{0301}'), chars.next());
assert_eq!(None, chars.next());
```

This means that the contents of the first string above will fit into a
char while the contents of the second string will not. Trying to create
a char literal with the contents of the second string gives an error:
```text
error: character literal may only contain one codepoint: 'é'
let c = 'é';
        ^^^
```

Another implication of the 4-byte fixed size of a char is that
per-char processing can end up using a lot more memory:
§Implementations

impl char
1.83.0 · pub const MIN: char = '\0'
The lowest valid code point a char can have, '\0'.
Unlike integer types, char actually has a gap in the middle,
meaning that the range of possible chars is smaller than you
might expect. Ranges of char will automatically hop this gap
for you:
```rust
let dist = u32::from(char::MAX) - u32::from(char::MIN);
let size = (char::MIN..=char::MAX).count() as u32;
assert!(size < dist);
```

Despite this gap, the MIN and MAX values can be used as bounds for
all char values.
§Examples
1.52.0 · pub const MAX: char = '\u{10ffff}'
The highest valid code point a char can have, '\u{10FFFF}'.
Unlike integer types, char actually has a gap in the middle,
meaning that the range of possible chars is smaller than you
might expect. Ranges of char will automatically hop this gap
for you:
```rust
let dist = u32::from(char::MAX) - u32::from(char::MIN);
let size = (char::MIN..=char::MAX).count() as u32;
assert!(size < dist);
```

Despite this gap, the MIN and MAX values can be used as bounds for
all char values.
§Examples
1.93.0 · pub const MAX_LEN_UTF8: usize = 4

The maximum number of bytes required to encode a char in UTF-8.
1.93.0 · pub const MAX_LEN_UTF16: usize = 2

The maximum number of 16-bit code units required to encode a char in UTF-16.
1.52.0 · pub const REPLACEMENT_CHARACTER: char = '�'
U+FFFD REPLACEMENT CHARACTER (�) is used in Unicode to represent a
decoding error.
It can occur, for example, when giving ill-formed UTF-8 bytes to
String::from_utf8_lossy.
1.52.0 (const: 1.81.0) · pub const unsafe fn from_u32_unchecked(i: u32) -> char
Converts a u32 to a char, ignoring validity.
Note that all chars are valid u32s, and can be cast to one with
as:
However, the reverse is not true: not all valid u32s are valid
chars. from_u32_unchecked() will ignore this, and blindly cast to
char, possibly creating an invalid one.
§Safety
This function is unsafe, as it may construct invalid char values.
For a safe version of this function, see the from_u32 function.
§Examples
Basic usage:
1.0.0 (const: 1.67.0) · pub const fn to_digit(self, radix: u32) -> Option<u32>
Converts a char to a digit in the given radix.
A ‘radix’ here is sometimes also called a ‘base’. To give some common values: a radix of two indicates a binary number; a radix of ten, decimal; and a radix of sixteen, hexadecimal. Any radix from 2 to 36 is supported.
‘Digit’ is defined to be only the following characters:
0-9a-zA-Z
§Errors
Returns None if the char does not refer to a digit in the given radix.
§Panics
Panics if given a radix smaller than 2 or larger than 36.
§Examples
Basic usage:
Passing a non-digit results in failure:
Passing a large radix, causing a panic:
Passing a small radix, causing a panic:
1.0.0 (const: 1.52.0) · pub const fn len_utf8(self) -> usize
Returns the number of bytes this char would need if encoded in UTF-8.
That number of bytes is always between 1 and 4, inclusive.
§Examples
Basic usage:
```rust
let len = 'A'.len_utf8();
assert_eq!(len, 1);

let len = 'ß'.len_utf8();
assert_eq!(len, 2);

let len = 'ℝ'.len_utf8();
assert_eq!(len, 3);

let len = '💣'.len_utf8();
assert_eq!(len, 4);
```

The &str type guarantees that its contents are UTF-8, and so we can compare the length it
would take if each code point was represented as a char vs in the &str itself:
```rust
// as chars
let eastern = '東';
let capital = '京';

// both can be represented as three bytes
assert_eq!(3, eastern.len_utf8());
assert_eq!(3, capital.len_utf8());

// as a &str, these two are encoded in UTF-8
let tokyo = "東京";

let len = eastern.len_utf8() + capital.len_utf8();

// we can see that they take six bytes total...
assert_eq!(6, tokyo.len());

// ... just like the &str
assert_eq!(len, tokyo.len());
```