TfUtf8CodePointIterator Class Reference (final)

Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.

#include <unicodeUtils.h>

Classes

class  PastTheEndSentinel
 Models the end of iteration, which is reached when the underlying iterator's end condition has been met.
 

Public Types

using iterator_category = std::forward_iterator_tag
 
using value_type = TfUtf8CodePoint
 
using difference_type = std::ptrdiff_t
 
using pointer = void
 
using reference = TfUtf8CodePoint
 

Public Member Functions

 TfUtf8CodePointIterator (const std::string_view::const_iterator &it, const std::string_view::const_iterator &end)
 Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it.
 
value_type operator* () const
 Retrieves the current UTF-8 character in the sequence as its Unicode code point value.
 
std::string_view::const_iterator GetBase () const
 Retrieves the wrapped string iterator.
 
bool operator== (const TfUtf8CodePointIterator &rhs) const
 Determines if two iterators are equal.
 
bool operator!= (const TfUtf8CodePointIterator &rhs) const
 Determines if two iterators are unequal.
 
TfUtf8CodePointIterator & operator++ ()
 Advances the iterator logically one UTF-8 character sequence in the string.
 
TfUtf8CodePointIterator operator++ (int)
 Advances the iterator logically one UTF-8 character sequence in the string.
 

Friends

bool operator== (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel)
 Checks whether the lhs iterator is at or past the end of the underlying string_view.
 
bool operator== (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs)
 
bool operator!= (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel rhs)
 
bool operator!= (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs)
 

Detailed Description

Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.

UTF-8 is a variable length encoding, meaning that one Unicode code point can be encoded in UTF-8 as 1, 2, 3, or 4 bytes. This iterator takes care of consuming the valid UTF-8 bytes for a code point while incrementing.
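The variable length described above is signalled by the leading byte of each sequence. The following is a minimal sketch of that classification, not the code in unicodeUtils.h; the helper name Utf8SequenceLength is hypothetical. It checks only the bit pattern of the leading byte, so it does not reject leading bytes that are structurally plausible but never appear in valid UTF-8 (for example 0xC0, 0xC1, or 0xF5 and above).

```cpp
#include <cstddef>

// Hypothetical helper, not part of unicodeUtils.h.
// Returns the number of bytes announced by the leading byte `lead`,
// or 0 if `lead` cannot start a UTF-8 sequence.
constexpr std::size_t Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: single byte (ASCII)
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: two-byte sequence
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: three-byte sequence
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: four-byte sequence
    return 0;                             // 10xxxxxx (continuation) or 11111xxx
}
```

An iterator built on such a classification would typically read the leading byte, then consume up to that many bytes in total to form one code point.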

Definition at line 99 of file unicodeUtils.h.


Class Documentation

◆ TfUtf8CodePointIterator::PastTheEndSentinel

class TfUtf8CodePointIterator::PastTheEndSentinel

Models the end of iteration, which is reached when the underlying iterator's end condition has been met.

Definition at line 109 of file unicodeUtils.h.

Member Typedef Documentation

◆ difference_type

using difference_type = std::ptrdiff_t

Definition at line 103 of file unicodeUtils.h.

◆ iterator_category

using iterator_category = std::forward_iterator_tag

Definition at line 101 of file unicodeUtils.h.

◆ pointer

using pointer = void

Definition at line 104 of file unicodeUtils.h.

◆ reference

using reference = TfUtf8CodePoint

Definition at line 105 of file unicodeUtils.h.

◆ value_type

using value_type = TfUtf8CodePoint

Definition at line 102 of file unicodeUtils.h.

Constructor & Destructor Documentation

◆ TfUtf8CodePointIterator()

TfUtf8CodePointIterator ( const std::string_view::const_iterator &  it,
const std::string_view::const_iterator &  end 
)
inline

Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it.

end is used as a guard against reading byte sequences past the end of the source string.

When working with views of substrings, end must not point to a continuation byte in a valid UTF-8 byte sequence to avoid decoding errors.
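One way to satisfy the substring requirement above is to move the end of the view back until it no longer lands on a continuation byte. The sketch below illustrates this under the assumption that the input is valid UTF-8; IsContinuationByte and Utf8SafePrefix are hypothetical names and are not provided by unicodeUtils.h.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `b` has the continuation form 10xxxxxx.
constexpr bool IsContinuationByte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: returns a prefix of `s` that is at most `maxBytes`
// long and whose end does not point at a continuation byte, so the prefix
// never cuts a multi-byte sequence in half (assuming `s` is valid UTF-8).
constexpr std::string_view Utf8SafePrefix(std::string_view s,
                                          std::size_t maxBytes)
{
    std::size_t n = maxBytes < s.size() ? maxBytes : s.size();
    // s[n] is the byte the prefix's end would point at. While it is a
    // continuation byte, the cut is inside a sequence, so back up.
    while (n > 0 && n < s.size() &&
           IsContinuationByte(static_cast<unsigned char>(s[n]))) {
        --n;
    }
    return s.substr(0, n);
}
```

The begin and end iterators of the resulting view could then be passed as it and end to the constructor.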

Definition at line 118 of file unicodeUtils.h.

Member Function Documentation

◆ GetBase()

std::string_view::const_iterator GetBase ( ) const
inline

Retrieves the wrapped string iterator.

Definition at line 136 of file unicodeUtils.h.

◆ operator!=()

bool operator!= ( const TfUtf8CodePointIterator & rhs) const
inline

Determines if two iterators are unequal.

This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.

Definition at line 154 of file unicodeUtils.h.

◆ operator*()

value_type operator* ( ) const
inline

Retrieves the current UTF-8 character in the sequence as its Unicode code point value.

Returns TfUtf8InvalidCodePoint when the byte sequence pointed to by the iterator cannot be decoded.

A code point might be invalid because it's incorrectly encoded, exceeds the maximum allowed value, or is in the disallowed surrogate range.
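Two of the three conditions above, the maximum value and the surrogate range, can be checked on the decoded value alone. The sketch below shows such a check using the limits defined by the Unicode standard; the function name IsValidUnicodeScalar is hypothetical, and the sketch does not cover the third condition (incorrect encoding), which depends on the bytes rather than on the decoded value.

```cpp
#include <cstdint>

// Hypothetical helper, not part of unicodeUtils.h.
// Returns true if `cp` is within the Unicode range and is not a surrogate.
constexpr bool IsValidUnicodeScalar(std::uint32_t cp)
{
    constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;  // highest code point
    constexpr std::uint32_t kSurrogateFirst = 0xD800;  // first surrogate
    constexpr std::uint32_t kSurrogateLast = 0xDFFF;   // last surrogate
    const bool inRange = cp <= kMaxCodePoint;
    const bool isSurrogate = cp >= kSurrogateFirst && cp <= kSurrogateLast;
    return inRange && !isSurrogate;
}
```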

Definition at line 130 of file unicodeUtils.h.

◆ operator++() [1/2]

TfUtf8CodePointIterator & operator++ ( )
inline

Advances the iterator logically one UTF-8 character sequence in the string.

The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.
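The advancing rule above can be illustrated with byte offsets instead of iterators. This is a sketch of the described behavior under stated assumptions, not the implementation in unicodeUtils.h; Utf8Advance is a hypothetical name. It always consumes the current byte, then consumes continuation bytes only up to the length announced by the leading byte, stopping early at the first non-continuation byte or at the end of the string.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of unicodeUtils.h.
// Returns the offset of the next sequence after the one starting at `pos`.
constexpr std::size_t Utf8Advance(std::string_view s, std::size_t pos)
{
    if (pos >= s.size()) return s.size();  // already at or past the end

    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t expected = 1;  // ASCII, stray continuation, or invalid lead
    if ((lead & 0xE0) == 0xC0) expected = 2;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xF8) == 0xF0) expected = 4;

    std::size_t next = pos + 1;  // the current byte is always consumed
    // Consume continuation bytes only; a non-continuation byte belongs to
    // the next sequence and is left in place.
    while (next < s.size() && next < pos + expected &&
           (static_cast<unsigned char>(s[next]) & 0xC0) == 0x80) {
        ++next;
    }
    return next;
}
```

With this rule a truncated sequence costs only the bytes that actually belong to it, so the character that follows can still be decoded.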

Definition at line 164 of file unicodeUtils.h.

◆ operator++() [2/2]

TfUtf8CodePointIterator operator++ ( int  )
inline

Advances the iterator logically one UTF-8 character sequence in the string.

The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.

Definition at line 196 of file unicodeUtils.h.

◆ operator==()

bool operator== ( const TfUtf8CodePointIterator & rhs) const
inline

Determines if two iterators are equal.

This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.

Definition at line 145 of file unicodeUtils.h.

Friends And Related Function Documentation

◆ operator!= [1/2]

bool operator!= ( const TfUtf8CodePointIterator &  lhs,
PastTheEndSentinel  rhs 
)
friend

Definition at line 217 of file unicodeUtils.h.

◆ operator!= [2/2]

bool operator!= ( PastTheEndSentinel  lhs,
const TfUtf8CodePointIterator &  rhs 
)
friend

Definition at line 222 of file unicodeUtils.h.

◆ operator== [1/2]

bool operator== ( const TfUtf8CodePointIterator &  lhs,
PastTheEndSentinel   
)
friend

Checks whether the lhs iterator is at or past the end of the underlying string_view.
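The sentinel comparison lets a loop terminate without constructing a second iterator for the end position. The sketch below shows the pattern with a hypothetical byte-level stand-in, ByteCursorSketch, whose four comparison operators mirror the friend functions listed for this class. It steps over bytes rather than code points and is not the implementation in unicodeUtils.h.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in: a cursor that knows its own end, plus an empty
// sentinel type used only to ask "is this cursor at or past the end?".
struct ByteCursorSketch {
    struct PastTheEnd {};

    const char* it;   // current position
    const char* end;  // end of the underlying view

    friend constexpr bool operator==(const ByteCursorSketch& lhs, PastTheEnd)
    {
        return lhs.it >= lhs.end;  // at or past the end
    }
    friend constexpr bool operator==(PastTheEnd lhs,
                                     const ByteCursorSketch& rhs)
    {
        return rhs == lhs;
    }
    friend constexpr bool operator!=(const ByteCursorSketch& lhs,
                                     PastTheEnd rhs)
    {
        return !(lhs == rhs);
    }
    friend constexpr bool operator!=(PastTheEnd lhs,
                                     const ByteCursorSketch& rhs)
    {
        return !(lhs == rhs);
    }
};

// Counts the steps taken before the cursor compares equal to the sentinel.
constexpr std::size_t CountBytesUntilSentinel(std::string_view s)
{
    ByteCursorSketch c{s.data(), s.data() + s.size()};
    std::size_t n = 0;
    for (; c != ByteCursorSketch::PastTheEnd{}; ++c.it) {
        ++n;
    }
    return n;
}
```

Based on the signatures listed on this page, a loop over a TfUtf8CodePointIterator would presumably take the same shape, comparing the iterator against a PastTheEndSentinel value as its termination condition.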

Definition at line 205 of file unicodeUtils.h.

◆ operator== [2/2]

bool operator== ( PastTheEndSentinel  lhs,
const TfUtf8CodePointIterator &  rhs 
)
friend

Definition at line 211 of file unicodeUtils.h.


The documentation for this class was generated from the following file: