char

Primitive Type char 

1.0.0

A character type.

The char type represents a single character. More specifically, since ‘character’ isn’t a well-defined concept in Unicode, char is a ‘Unicode scalar value’.

This documentation describes a number of methods and trait implementations on the char type. For technical reasons, there is additional, separate documentation in the std::char module as well.

§Validity and Layout

A char is a ‘Unicode scalar value’, which is any ‘Unicode code point’ other than a surrogate code point. This has a fixed numerical definition: code points are in the range 0 to 0x10FFFF, inclusive. Surrogate code points, used by UTF-16, are in the range 0xD800 to 0xDFFF.

No char may be constructed, whether as a literal or at runtime, that is not a Unicode scalar value. Violating this rule causes undefined behavior.

// Each of these is a compiler error
['\u{D800}', '\u{DFFF}', '\u{110000}'];
// Panics; from_u32 returns None.
char::from_u32(0xDE01).unwrap();
// Undefined behavior
let _ = unsafe { char::from_u32_unchecked(0x110000) };

Unicode scalar values are also the exact set of values that may be encoded in UTF-8. Because char values are Unicode scalar values and functions may assume incoming str values are valid UTF-8, it is safe to store any char in a str or read any character from a str as a char.
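The round trip between char and str described above can be sketched as follows. This is only an illustration; the helper name `round_trip` is hypothetical and not part of the standard library.

```rust
/// Hypothetical helper: stores `c` in a fresh `String` and reads it back.
fn round_trip(c: char) -> Option<char> {
    let mut s = String::new();
    // Every char is a Unicode scalar value, so it always has a UTF-8 encoding.
    s.push(c);
    // The string now holds exactly the UTF-8 bytes of `c`.
    debug_assert_eq!(s.len(), c.len_utf8());
    // Decoding the only character in the string yields the original char.
    s.chars().next()
}
```

Calling `round_trip` with any char, including the extremes `'\0'` and `char::MAX`, should give back `Some` of the same char.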

The compiler understands the gap in valid char values, so in the example below the two ranges are known to cover every possible char value, and the match is not rejected as non-exhaustive.

let c: char = 'a';
match c {
    '\0' ..= '\u{D7FF}' => false,
    '\u{E000}' ..= '\u{10FFFF}' => true,
};

All Unicode scalar values are valid char values, but not all of them represent a real character. Many Unicode scalar values are not currently assigned to a character, but may be in the future (“reserved”); some will never be a character (“noncharacters”); and some may be given different meanings by different users (“private use”).
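As a small sketch of the point above, validity as a char depends only on being a Unicode scalar value, not on being an assigned character. The helper name `is_scalar_value` is hypothetical; it simply wraps `char::from_u32`.

```rust
/// Hypothetical helper: true if `n` is a Unicode scalar value and therefore
/// a valid `char`, whether or not it is assigned to a real character.
fn is_scalar_value(n: u32) -> bool {
    char::from_u32(n).is_some()
}
```

Under this check, the noncharacter U+FFFF and the private-use code point U+E000 are both accepted, while the surrogate 0xD800 and the out-of-range value 0x110000 are rejected.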

char is guaranteed to have the same size, alignment, and function call ABI as u32 on all platforms.

use std::alloc::Layout;
assert_eq!(Layout::new::<char>(), Layout::new::<u32>());

§Representation

char is always four bytes in size. This is a different representation than a given character would have as part of a String. For example:

let v = vec!['h', 'e', 'l', 'l', 'o'];

// five elements times four bytes for each element
assert_eq!(20, v.len() * size_of::<char>());

let s = String::from("hello");

// five elements times one byte per element
assert_eq!(5, s.len() * size_of::<u8>());

As always, remember that a human intuition for ‘character’ might not map to Unicode’s definitions. For example, despite looking similar, the ‘é’ character is one Unicode code point while ‘é’ is two Unicode code points:

let mut chars = "é".chars();
// U+00e9: 'latin small letter e with acute'
assert_eq!(Some('\u{00e9}'), chars.next());
assert_eq!(None, chars.next());

let mut chars = "é".chars();
// U+0065: 'latin small letter e'
assert_eq!(Some('\u{0065}'), chars.next());
// U+0301: 'combining acute accent'
assert_eq!(Some('\u{0301}'), chars.next());
assert_eq!(None, chars.next());

This means that the contents of the first string above will fit into a char while the contents of the second string will not. Trying to create a char literal with the contents of the second string gives an error:

error: character literal may only contain one codepoint: 'é'
let c = 'é';
        ^^^

Another implication of the 4-byte fixed size of a char is that per-char processing can end up using a lot more memory:

let s = String::from("love: ❤️");
let v: Vec<char> = s.chars().collect();

assert_eq!(12, size_of_val(&s[..]));
assert_eq!(32, size_of_val(&v[..]));

Implementations§

impl char

1.83.0 · Source

pub const MIN: char = '\0'

The lowest valid code point a char can have, '\0'.

Unlike integer types, char actually has a gap in the middle, meaning that the range of possible chars is smaller than you might expect. Ranges of char will automatically hop this gap for you:

let dist = u32::from(char::MAX) - u32::from(char::MIN);
let size = (char::MIN..=char::MAX).count() as u32;
assert!(size < dist);

Despite this gap, the MIN and MAX values can be used as bounds for all char values.

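§Examples

// Stand-in for any function that returns a char.
fn something_which_returns_char() -> char { 'a' }
let c: char = something_which_returns_char();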
assert!(char::MIN <= c);

let value_at_min = u32::from(char::MIN);
assert_eq!(char::from_u32(value_at_min), Some('\0'));

1.52.0 · Source

pub const MAX: char = '\u{10ffff}'

The highest valid code point a char can have, '\u{10FFFF}'.

Unlike integer types, char actually has a gap in the middle, meaning that the range of possible chars is smaller than you might expect. Ranges of char will automatically hop this gap for you:

let dist = u32::from(char::MAX) - u32::from(char::MIN);
let size = (char::MIN..=char::MAX).count() as u32;
assert!(size < dist);

Despite this gap, the MIN and MAX values can be used as bounds for all char values.

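§Examples

// Stand-in for any function that returns a char.
fn something_which_returns_char() -> char { 'a' }
let c: char = something_which_returns_char();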
assert!(c <= char::MAX);

let value_at_max = u32::from(char::MAX);
assert_eq!(char::from_u32(value_at_max), Some('\u{10FFFF}'));
assert_eq!(char::from_u32(value_at_max + 1), None);
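The size of the gap can be made concrete with a short sketch. The names `SURROGATES`, `count_all_chars`, and `expected_char_count` are hypothetical, and the literal `'\0'` is used instead of `char::MIN` so that the code also compiles on toolchains older than 1.83.

```rust
/// Number of surrogate code points, 0xD800 through 0xDFFF (hypothetical name).
const SURROGATES: u32 = 0xDFFF - 0xD800 + 1;

/// Hypothetical helper: counts every char by walking the full range.
/// The range iterator skips the surrogate gap on its own.
fn count_all_chars() -> u32 {
    ('\0'..=char::MAX).count() as u32
}

/// Hypothetical helper: the same count derived arithmetically.
fn expected_char_count() -> u32 {
    // Code points 0..=0x10FFFF, minus the surrogates in the gap.
    (u32::from(char::MAX) + 1) - SURROGATES
}
```

Both helpers should agree, and a range that straddles the gap, such as `'\u{D7FF}'..='\u{E000}'`, contains only its two endpoints.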
1.93.0 · Source

pub const MAX_LEN_UTF8: usize = 4usize

The maximum number of bytes required to encode a char in UTF-8.
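One plausible use of this bound is sizing a stack buffer for `encode_utf8`. The sketch below uses the literal `4` rather than `char::MAX_LEN_UTF8` so that it also compiles on toolchains that predate the constant; the helper name `utf8_bytes` is hypothetical.

```rust
/// Hypothetical helper: returns the UTF-8 bytes of `c`.
fn utf8_bytes(c: char) -> Vec<u8> {
    // Four bytes is the documented maximum, so this buffer always suffices.
    let mut buf = [0u8; 4];
    // encode_utf8 writes into the buffer and returns the used part as a &mut str.
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}
```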

1.93.0 · Source

pub const MAX_LEN_UTF16: usize = 2usize

The maximum number of two-byte (16-bit) code units required to encode a char in UTF-16.
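A similar sketch applies to UTF-16, where a two-element `u16` buffer is always large enough for `encode_utf16`. The literal `2` stands in for `char::MAX_LEN_UTF16` so the code compiles on older toolchains, and the helper name `utf16_units` is hypothetical.

```rust
/// Hypothetical helper: returns the UTF-16 code units of `c`.
fn utf16_units(c: char) -> Vec<u16> {
    // Two code units is the documented maximum (a surrogate pair).
    let mut buf = [0u16; 2];
    // encode_utf16 writes into the buffer and returns the used part.
    c.encode_utf16(&mut buf).to_vec()
}
```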

1.52.0 · Source

pub const REPLACEMENT_CHARACTER: char = '�'

U+FFFD REPLACEMENT CHARACTER (�) is used in Unicode to represent a decoding error.

It can occur, for example, when giving ill-formed UTF-8 bytes to String::from_utf8_lossy.
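A minimal sketch of that situation: the byte 0xFF can never appear in well-formed UTF-8, so lossy decoding substitutes U+FFFD for it. The helper name `decode_lossy` is hypothetical.

```rust
/// Hypothetical helper: decodes bytes as UTF-8, replacing ill-formed
/// sequences with U+FFFD REPLACEMENT CHARACTER.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

For example, `decode_lossy(b"a\xFFb")` contains `char::REPLACEMENT_CHARACTER` between the two valid letters, while well-formed input is returned unchanged.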

1.52.0 (const: 1.81.0) · Source

pub const unsafe fn from_u32_unchecked(i: u32) -> char

Converts a u32 to a char, ignoring validity.

Note that all chars are valid u32s, and can be cast to one with as:

let c = '💯';
let i = c as u32;

assert_eq!(128175, i);

However, the reverse is not true: not all valid u32s are valid chars. from_u32_unchecked() will ignore this, and blindly cast to char, possibly creating an invalid one.

§Safety

This function is unsafe because it may construct invalid char values. The caller must ensure that i is a valid Unicode scalar value.

For a safe version of this function, see the from_u32 function.

§Examples

Basic usage:

let c = unsafe { char::from_u32_unchecked(0x2764) };

assert_eq!('❤', c);
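To illustrate what upholding the safety requirement looks like, here is a sketch that validates the input before the unchecked call. It is not how the standard library implements `from_u32`; the helper name `checked_from_u32` is hypothetical, and in practice calling `char::from_u32` directly is the simpler choice.

```rust
/// Hypothetical helper: validates `n` by hand, then converts it.
fn checked_from_u32(n: u32) -> Option<char> {
    // A Unicode scalar value is a code point in 0..=0x10FFFF
    // that is not a surrogate (0xD800..=0xDFFF).
    let in_range = n <= 0x10FFFF;
    let is_surrogate = (0xD800..=0xDFFF).contains(&n);
    if in_range && !is_surrogate {
        // SAFETY: `n` was just checked to be a Unicode scalar value.
        Some(unsafe { char::from_u32_unchecked(n) })
    } else {
        None
    }
}
```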
1.0.0 (const: 1.67.0) · Source

pub const fn to_digit(self, radix: u32) -> Option<u32>

Converts a char to a digit in the given radix.

A ‘radix’ here is sometimes also called a ‘base’. Common values are two (binary), ten (decimal), and sixteen (hexadecimal). Any radix from 2 to 36, inclusive, is supported.

‘Digit’ is defined to be only the following characters:

  • 0-9
  • a-z
  • A-Z

§Errors

Returns None if the char does not refer to a digit in the given radix.

§Panics

Panics if given a radix smaller than 2 or larger than 36.

§Examples

Basic usage:

assert_eq!('1'.to_digit(10), Some(1));
assert_eq!('f'.to_digit(16), Some(15));

Passing a non-digit results in failure:

assert_eq!('f'.to_digit(10), None);
assert_eq!('z'.to_digit(16), None);

Passing a large radix, causing a panic:

// this panics
let _ = '1'.to_digit(37);

Passing a small radix, causing a panic:

// this panics
let _ = '1'.to_digit(1);
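Building on the examples above, a whole string can be parsed digit by digit. The following is a sketch under stated assumptions: the helper name `parse_in_radix` is hypothetical, and it returns None for an out-of-range radix instead of letting to_digit panic.

```rust
/// Hypothetical helper: parses `s` as an unsigned number in `radix`.
/// Returns None on an empty string, an invalid digit, an unsupported
/// radix, or overflow of u32.
fn parse_in_radix(s: &str, radix: u32) -> Option<u32> {
    // to_digit panics outside 2..=36, so reject those radices up front.
    if !(2..=36).contains(&radix) || s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in s.chars() {
        // None here means `c` is not a digit in this radix.
        let digit = c.to_digit(radix)?;
        // Shift the accumulated value one place and add the new digit,
        // bailing out on overflow.
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```

For everyday parsing, `u32::from_str_radix` already covers this ground; the sketch only shows how to_digit composes.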
1.0.0 (const: 1.52.0) · Source

pub const fn len_utf8(self) -> usize

Returns the number of bytes this char would need if encoded in UTF-8.

That number of bytes is always between 1 and 4, inclusive.

§Examples

Basic usage:

let len = 'A'.len_utf8();
assert_eq!(len, 1);

let len = 'ß'.len_utf8();
assert_eq!(len, 2);

let len = 'ℝ'.len_utf8();
assert_eq!(len, 3);

let len = '💣'.len_utf8();
assert_eq!(len, 4);

The &str type guarantees that its contents are UTF-8, so we can compare the total length obtained by calling len_utf8 on each char with the length of the &str itself:

// as chars
let eastern = '東';
let capital = '京';

// both can be represented as three bytes
assert_eq!(3, eastern.len_utf8());
assert_eq!(3, capital.len_utf8());

// as a &str, these two are encoded in UTF-8
let tokyo = "東京";

let len = eastern.len_utf8() + capital.len_utf8();

// the &str takes six bytes in total...
assert_eq!(6, tokyo.len());

// ... which matches the sum of the two chars' UTF-8 lengths
assert_eq!(len, tokyo.len());
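The same relationship holds for any string, which the following sketch generalizes. The helper name `utf8_len_by_chars` is hypothetical.

```rust
/// Hypothetical helper: sums the UTF-8 length of every char in `s`.
/// Because a &str is always valid UTF-8, the result equals `s.len()`.
fn utf8_len_by_chars(s: &str) -> usize {
    s.chars().map(char::len_utf8).sum()
}
```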

Trait Implementations§

1.0.0 (const: unstable) · Source§

impl Clone for char

fn clone(&self) -> Self

Returns a duplicate of the value.

1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
1.0.0 (const: unstable) · Source§

impl Default for char

fn default() -> char

Returns the default value, '\x00'.

1.0.0 (const: unstable) · Source§

impl Ord for char

fn cmp(&self, other: &Self) -> Ordering

This method returns an Ordering between self and other.

1.21.0 · Source§

fn max(self, other: Self) -> Self
where Self: Sized,

Compares and returns the maximum of two values.

1.21.0 · Source§

fn min(self, other: Self) -> Self
where Self: Sized,

Compares and returns the minimum of two values.

1.50.0 · Source§

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

Restrict a value to a certain interval.
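For char, these comparisons follow the numeric code point value, so every uppercase ASCII letter sorts before every lowercase one. A brief sketch, with the hypothetical helper name `clamp_to_ascii_lowercase_range`:

```rust
/// Hypothetical helper: clamps `c` into the range 'a'..='z' by code point.
/// This is not a case conversion; it only restricts the value to the interval.
fn clamp_to_ascii_lowercase_range(c: char) -> char {
    c.clamp('a', 'z')
}
```

Because `'A'` (U+0041) is below `'a'` (U+0061), it clamps to `'a'`, and `'~'` (U+007E) clamps to `'z'`.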
1.0.0 (const: unstable) · Source§

impl PartialEq for char

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Self) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
1.0.0 (const: unstable) · Source§

impl PartialOrd for char

fn partial_cmp(&self, other: &Self) -> Option<Ordering>

This method returns an ordering between self and other values if one exists.

fn lt(&self, other: &Self) -> bool

Tests less than (for self and other) and is used by the < operator.

fn le(&self, other: &Self) -> bool

Tests less than or equal to (for self and other) and is used by the <= operator.

fn gt(&self, other: &Self) -> bool

Tests greater than (for self and other) and is used by the > operator.

fn ge(&self, other: &Self) -> bool

Tests greater than or equal to (for self and other) and is used by the >= operator.

impl ConstParamTy_ for char

1.0.0 · Source§

impl Copy for char

1.0.0 (const: unstable) · Source§

impl Eq for char

impl StructuralPartialEq for char

Auto Trait Implementations§

impl Freeze for char

impl Send for char

impl Sync for char

impl Unpin for char

Blanket Implementations§

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.
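As a sketch of how this blanket implementation applies to char: the standard library provides `From<char> for u32`, so `char` gets `Into<u32>` for free. The helper name `code_point` is hypothetical.

```rust
/// Hypothetical helper: converts a char to its code point value.
fn code_point(c: char) -> u32 {
    // `u32: From<char>` exists, so the blanket impl supplies `char: Into<u32>`.
    c.into()
}
```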

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.