Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.
#include <unicodeUtils.h>
Classes | |
class | PastTheEndSentinel |
Model iteration ending when the underlying iterator's end condition has been met. | |
Public Types | |
using | iterator_category = std::forward_iterator_tag |
using | value_type = TfUtf8CodePoint |
using | difference_type = std::ptrdiff_t |
using | pointer = void |
using | reference = TfUtf8CodePoint |
Public Member Functions | |
TfUtf8CodePointIterator (const std::string_view::const_iterator &it, const std::string_view::const_iterator &end) | |
Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it. | |
value_type | operator* () const |
Retrieves the current UTF-8 character in the sequence as its Unicode code point value. | |
std::string_view::const_iterator | GetBase () const |
Retrieves the wrapped string iterator. | |
bool | operator== (const TfUtf8CodePointIterator &rhs) const |
Determines if two iterators are equal. | |
bool | operator!= (const TfUtf8CodePointIterator &rhs) const |
Determines if two iterators are unequal. | |
TfUtf8CodePointIterator & | operator++ () |
Advances the iterator logically one UTF-8 character sequence in the string. | |
TfUtf8CodePointIterator | operator++ (int) |
Advances the iterator logically one UTF-8 character sequence in the string. | |
Friends | |
bool | operator== (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel) |
Checks whether the lhs iterator is at or past the end of the underlying string_view.
bool | operator== (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
bool | operator!= (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel rhs) |
bool | operator!= (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.
UTF-8 is a variable length encoding, meaning that one Unicode code point can be encoded in UTF-8 as 1, 2, 3, or 4 bytes. This iterator takes care of consuming the valid UTF-8 bytes for a code point while incrementing.
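The variable-length rule can be illustrated with a small standalone sketch. It does not use this class; `Utf8SequenceLength` and `CountCodePoints` are hypothetical helper names, and the sketch only inspects lead-byte bit patterns without validating the sequences.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of bytes a UTF-8 sequence occupies, judged
// from the bit pattern of its lead byte alone. Returns 0 for a byte that
// cannot start a sequence (for example a continuation byte).
inline std::size_t Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: one byte (ASCII)
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: two bytes
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: three bytes
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: four bytes
    return 0;
}

// Counts code points by hopping from lead byte to lead byte. Bytes that
// cannot start a sequence are counted one at a time, and a sequence cut
// short by the end of the text is counted once.
inline std::size_t CountCodePoints(std::string_view text)
{
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t length =
            Utf8SequenceLength(static_cast<unsigned char>(text[pos]));
        if (length == 0) length = 1;
        if (length > text.size() - pos) length = text.size() - pos;
        pos += length;
        ++count;
    }
    return count;
}
```

A string holding one code point of each encoded length (1, 2, 3, and 4 bytes) occupies ten bytes but is counted as four code points.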
Definition at line 99 of file unicodeUtils.h.
class TfUtf8CodePointIterator::PastTheEndSentinel |
Model iteration ending when the underlying iterator's end condition has been met.
Definition at line 109 of file unicodeUtils.h.
using difference_type = std::ptrdiff_t |
Definition at line 103 of file unicodeUtils.h.
using iterator_category = std::forward_iterator_tag |
Definition at line 101 of file unicodeUtils.h.
using pointer = void |
Definition at line 104 of file unicodeUtils.h.
using reference = TfUtf8CodePoint |
Definition at line 105 of file unicodeUtils.h.
using value_type = TfUtf8CodePoint |
Definition at line 102 of file unicodeUtils.h.
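TfUtf8CodePointIterator (const std::string_view::const_iterator &it, const std::string_view::const_iterator &end) |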
inline |
Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it.
end is used as a guard against reading byte sequences past the end of the source string.
When working with views of substrings, end must not point to a continuation byte in a valid UTF-8 byte sequence to avoid decoding errors.
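One way to satisfy the substring requirement is to pick the cut position so that it never lands on a continuation byte. The sketch below is standalone and illustrative only; `IsContinuationByte` and `SafeCutOffset` are hypothetical names, not part of this header.

```cpp
#include <cstddef>
#include <string_view>

// True for bytes of the form 10xxxxxx, which appear only inside a
// multi-byte UTF-8 sequence and never start one.
inline bool IsContinuationByte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: returns the largest offset <= limit at which text
// can be cut without the cut pointing at a continuation byte.
inline std::size_t SafeCutOffset(std::string_view text, std::size_t limit)
{
    std::size_t cut = limit < text.size() ? limit : text.size();
    // Back up while the byte at the cut position is a continuation byte.
    while (cut > 0 && cut < text.size() &&
           IsContinuationByte(static_cast<unsigned char>(text[cut]))) {
        --cut;
    }
    return cut;
}
```

An end iterator derived from such an offset points at a lead byte or at the end of the string, which is the condition described above.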
Definition at line 118 of file unicodeUtils.h.
std::string_view::const_iterator GetBase () const |
inline |
Retrieves the wrapped string iterator.
Definition at line 136 of file unicodeUtils.h.
bool operator!= (const TfUtf8CodePointIterator &rhs) const |
inline |
Determines if two iterators are unequal.
This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.
Definition at line 154 of file unicodeUtils.h.
value_type operator* () const |
inline |
Retrieves the current UTF-8 character in the sequence as its Unicode code point value.
Returns TfUtf8InvalidCodePoint when the byte sequence pointed to by the iterator cannot be decoded.
A code point might be invalid because it's incorrectly encoded, exceeds the maximum allowed value, or is in the disallowed surrogate range.
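The three failure cases can be sketched with a standalone decoder. This is not the library's implementation: `DecodeCodePoint` and `kInvalidCodePoint` are hypothetical names, and using U+FFFD as the invalid marker is an assumption made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Assumption: U+FFFD stands in for the library's invalid code point value.
constexpr std::uint32_t kInvalidCodePoint = 0xFFFD;

// Hypothetical decoder: returns the code point that starts at pos, or
// kInvalidCodePoint if the bytes there are not a valid UTF-8 sequence.
inline std::uint32_t DecodeCodePoint(std::string_view text, std::size_t pos)
{
    if (pos >= text.size()) return kInvalidCodePoint;
    const auto byteAt = [&](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(text[i]));
    };
    const std::uint32_t lead = byteAt(pos);
    if (lead < 0x80) return lead;  // single-byte (ASCII) sequence

    std::size_t length = 0;
    std::uint32_t value = 0;
    std::uint32_t minimum = 0;  // smallest value this length may encode
    if ((lead & 0xE0) == 0xC0)      { length = 2; value = lead & 0x1F; minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; value = lead & 0x0F; minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; value = lead & 0x07; minimum = 0x10000; }
    else return kInvalidCodePoint;  // stray continuation or invalid lead byte

    if (length > text.size() - pos) return kInvalidCodePoint;  // truncated
    for (std::size_t i = 1; i < length; ++i) {
        const std::uint32_t next = byteAt(pos + i);
        if ((next & 0xC0) != 0x80) return kInvalidCodePoint;  // not a continuation
        value = (value << 6) | (next & 0x3F);
    }
    if (value < minimum) return kInvalidCodePoint;   // incorrectly encoded (overlong)
    if (value > 0x10FFFF) return kInvalidCodePoint;  // exceeds the maximum value
    if (value >= 0xD800 && value <= 0xDFFF) return kInvalidCodePoint;  // surrogate
    return value;
}
```

The last three checks correspond, in order, to the incorrectly encoded, out-of-range, and surrogate cases listed above.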
Definition at line 130 of file unicodeUtils.h.
TfUtf8CodePointIterator & operator++ () |
inline |
Advances the iterator logically one UTF-8 character sequence in the string.
The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.
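The advancing rule can be sketched on plain byte offsets. The helper below is a standalone illustration under the assumption that the current byte is always consumed and that only continuation bytes are consumed after it; `NextCodePointOffset` is a hypothetical name, not part of this header.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: offset of the next code point after the one that
// starts at pos.
inline std::size_t NextCodePointOffset(std::string_view text, std::size_t pos)
{
    if (pos >= text.size()) return text.size();
    const unsigned char lead = static_cast<unsigned char>(text[pos]);

    // Sequence length implied by the lead byte. Anything that is not a
    // recognizable multi-byte lead is stepped over as a single byte.
    std::size_t expected = 1;
    if ((lead & 0xE0) == 0xC0) expected = 2;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xF8) == 0xF0) expected = 4;

    // Consume the current byte, then only continuation bytes, stopping
    // early at the end of the text or at a byte that could begin a new
    // sequence.
    std::size_t next = pos + 1;
    while (next < text.size() && next - pos < expected &&
           (static_cast<unsigned char>(text[next]) & 0xC0) == 0x80) {
        ++next;
    }
    return next;
}
```

With a truncated three-byte sequence followed by an ASCII letter, the helper stops at the letter instead of swallowing it, so the following character is still decoded on its own.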
Definition at line 164 of file unicodeUtils.h.
TfUtf8CodePointIterator operator++ (int) |
inline |
Advances the iterator logically one UTF-8 character sequence in the string.
The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.
Definition at line 196 of file unicodeUtils.h.
bool operator== (const TfUtf8CodePointIterator &rhs) const |
inline |
Determines if two iterators are equal.
This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.
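The effect of ignoring the end iterator can be shown with a simplified stand-in type. `CodePointCursor` and `SamePositionDifferentViews` are hypothetical names, and raw `const char*` pointers replace string_view iterators to keep the sketch portable.

```cpp
#include <string_view>

// Hypothetical stand-in for the iterator's state: a current position plus
// the end guard supplied at construction.
struct CodePointCursor {
    const char* base;
    const char* end;
};

// Equality looks at the current position only, not at the end guard.
inline bool operator==(const CodePointCursor& lhs, const CodePointCursor& rhs)
{
    return lhs.base == rhs.base;
}

inline bool operator!=(const CodePointCursor& lhs, const CodePointCursor& rhs)
{
    return !(lhs == rhs);
}

// Builds two cursors at the first byte of text, one guarded by the end of
// the whole string and one guarded by the end of a shorter prefix view,
// and reports whether they compare equal.
inline bool SamePositionDifferentViews(std::string_view text)
{
    const std::string_view prefix = text.substr(0, text.size() / 2);
    const CodePointCursor whole{text.data(), text.data() + text.size()};
    const CodePointCursor partial{prefix.data(), prefix.data() + prefix.size()};
    return whole == partial;
}
```

Had the end guard taken part in the comparison, the two cursors would compare unequal even though they refer to the same byte of the same string.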
Definition at line 145 of file unicodeUtils.h.
bool operator!= (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel rhs) |
friend |
Definition at line 217 of file unicodeUtils.h.
bool operator!= (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
friend |
Definition at line 222 of file unicodeUtils.h.
bool operator== (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel) |
friend |
Checks whether the lhs iterator is at or past the end of the underlying string_view.
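The sentinel pattern can be sketched with stand-in types. `EndSentinel`, `ByteCursor`, and `CountSteps` are hypothetical names, and the cursor steps one byte at a time instead of one code point to keep the example short.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins: an empty tag type and a byte-level cursor that
// carries its own end guard.
struct EndSentinel {};

struct ByteCursor {
    const char* base;
    const char* end;
};

// "At or past the end" rather than "exactly at the end", so a cursor that
// has overshot its guard still terminates a loop.
inline bool operator==(const ByteCursor& lhs, EndSentinel) { return lhs.base >= lhs.end; }
inline bool operator==(EndSentinel lhs, const ByteCursor& rhs) { return rhs == lhs; }
inline bool operator!=(const ByteCursor& lhs, EndSentinel rhs) { return !(lhs == rhs); }
inline bool operator!=(EndSentinel lhs, const ByteCursor& rhs) { return !(rhs == lhs); }

// Counts the steps taken before the cursor compares equal to the sentinel.
inline std::size_t CountSteps(std::string_view text)
{
    ByteCursor cursor{text.data(), text.data() + text.size()};
    std::size_t steps = 0;
    while (cursor != EndSentinel{}) {
        ++cursor.base;
        ++steps;
    }
    return steps;
}
```

Because the cursor already knows its end, the loop condition needs no second iterator; the empty sentinel only selects the comparison overload.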
Definition at line 205 of file unicodeUtils.h.
bool operator== (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
friend |
Definition at line 211 of file unicodeUtils.h.