Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.

#include <unicodeUtils.h>

Classes
class	PastTheEndSentinel
	Model iteration ending when the underlying iterator's end condition has been met.

Public Types
using	iterator_category = std::forward_iterator_tag

using	value_type = TfUtf8CodePoint

using	difference_type = std::ptrdiff_t

using	pointer = void

using	reference = TfUtf8CodePoint

Public Member Functions
	TfUtf8CodePointIterator (const std::string_view::const_iterator &it, const std::string_view::const_iterator &end)
	Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it.

value_type	operator* () const
	Retrieves the current UTF-8 character in the sequence as its Unicode code point value.

std::string_view::const_iterator	GetBase () const
	Retrieves the wrapped string iterator.

bool	operator== (const TfUtf8CodePointIterator &rhs) const
	Determines if two iterators are equal.

bool	operator!= (const TfUtf8CodePointIterator &rhs) const
	Determines if two iterators are unequal.

TfUtf8CodePointIterator &	operator++ ()
	Advances the iterator logically one UTF-8 character sequence in the string.

TfUtf8CodePointIterator	operator++ (int)
	Advances the iterator logically one UTF-8 character sequence in the string.

Friends
bool	operator== (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel)
	Checks if the `lhs` iterator is at or past the end of the underlying `string_view`.

bool	operator== (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs)

bool	operator!= (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel rhs)

bool	operator!= (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs)

Detailed Description

Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.

UTF-8 is a variable length encoding, meaning that one Unicode code point can be encoded in UTF-8 as 1, 2, 3, or 4 bytes. This iterator takes care of consuming the valid UTF-8 bytes for a code point while incrementing.
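
The variable-length rule can be illustrated with a small standalone sketch. It does not use or reproduce the code in unicodeUtils.h; the names Utf8SequenceLength and CountCodePoints are hypothetical helpers introduced only for this example, and the counting helper assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Length in bytes that a UTF-8 sequence starting with `lead` claims to have,
// or 0 when `lead` cannot start a sequence (a continuation byte, or 0xF8-0xFF).
inline std::size_t Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Counts code points by hopping from lead byte to lead byte.
// Precondition (assumed, not verified beyond the assert): `text` is
// well-formed UTF-8.
inline std::size_t CountCodePoints(std::string_view text)
{
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t len =
            Utf8SequenceLength(static_cast<unsigned char>(text[pos]));
        assert(len != 0 && "text must be well-formed UTF-8");
        pos += (len != 0) ? len : 1;  // still make progress if asserts are off
        ++count;
    }
    return count;
}
```

A 10-byte string holding one 1-byte, one 2-byte, one 3-byte, and one 4-byte sequence therefore contains four code points, which is the count a code point iterator would visit.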

Definition at line 99 of file unicodeUtils.h.

Class Documentation

◆ TfUtf8CodePointIterator::PastTheEndSentinel

class TfUtf8CodePointIterator::PastTheEndSentinel

Model iteration ending when the underlying iterator's end condition has been met.

Definition at line 109 of file unicodeUtils.h.

Member Typedef Documentation

◆ difference_type

using difference_type = std::ptrdiff_t

Definition at line 103 of file unicodeUtils.h.

◆ iterator_category

using iterator_category = std::forward_iterator_tag

Definition at line 101 of file unicodeUtils.h.

◆ pointer

using pointer = void

Definition at line 104 of file unicodeUtils.h.

◆ reference

using reference = TfUtf8CodePoint

Definition at line 105 of file unicodeUtils.h.

◆ value_type

using value_type = TfUtf8CodePoint

Definition at line 102 of file unicodeUtils.h.

Constructor & Destructor Documentation

◆ TfUtf8CodePointIterator()

TfUtf8CodePointIterator	(	const std::string_view::const_iterator &	it,
		const std::string_view::const_iterator &	end
	)

inline

Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it.

end is used as a guard against reading byte sequences past the end of the source string.

When working with views of substrings, end must not point to a continuation byte in a valid UTF-8 byte sequence to avoid decoding errors.
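
One way to satisfy the constraint on end when cutting a substring is to move the cut position back until it no longer lands on a continuation byte. The following is a minimal standalone sketch under that assumption; IsUtf8ContinuationByte and SnapToCodePointBoundary are hypothetical helper names, not part of unicodeUtils.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Continuation bytes in UTF-8 have the bit pattern 10xxxxxx.
inline bool IsUtf8ContinuationByte(char byte)
{
    return (static_cast<unsigned char>(byte) & 0xC0) == 0x80;
}

// Returns the largest position <= pos that does not point at a continuation
// byte, so that a view ending there does not split a multi-byte sequence.
// Assumes `text` itself is well-formed UTF-8.
inline std::size_t SnapToCodePointBoundary(std::string_view text,
                                           std::size_t pos)
{
    assert(pos <= text.size());
    while (pos > 0 && pos < text.size() &&
           IsUtf8ContinuationByte(text[pos])) {
        --pos;
    }
    return pos;
}
```

With a helper like this, text.substr(0, SnapToCodePointBoundary(text, n)) yields a view whose end() does not point into the middle of a sequence and so could be passed as end.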

Definition at line 118 of file unicodeUtils.h.

Member Function Documentation

◆ GetBase()

std::string_view::const_iterator GetBase ( ) const

inline

Retrieves the wrapped string iterator.

Definition at line 136 of file unicodeUtils.h.

◆ operator!=()

bool operator!= ( const TfUtf8CodePointIterator & rhs ) const

inline

Determines if two iterators are unequal.

This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.

Definition at line 154 of file unicodeUtils.h.

◆ operator*()

value_type operator* ( ) const

inline

Retrieves the current UTF-8 character in the sequence as its Unicode code point value.

Returns TfUtf8InvalidCodePoint when the byte sequence pointed to by the iterator cannot be decoded.

A code point might be invalid because it's incorrectly encoded, exceeds the maximum allowed value, or is in the disallowed surrogate range.
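
The three failure categories listed above can be made concrete with a standalone decoding sketch. This is not the implementation behind operator*; DecodeFirstCodePoint and kInvalidCodePoint are hypothetical names, and using U+FFFD as the failure value is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Illustrative stand-in for the value reported on failure (U+FFFD,
// REPLACEMENT CHARACTER); an assumption, not taken from unicodeUtils.h.
constexpr std::uint32_t kInvalidCodePoint = 0xFFFD;

// Decodes the UTF-8 sequence at the start of `bytes`.
inline std::uint32_t DecodeFirstCodePoint(std::string_view bytes)
{
    if (bytes.empty()) return kInvalidCodePoint;

    const auto lead = static_cast<unsigned char>(bytes[0]);
    if (lead < 0x80) return lead;  // single-byte (ASCII) sequence

    std::size_t length = 0;     // total bytes in the sequence
    std::uint32_t value = 0;    // payload bits gathered so far
    std::uint32_t minimum = 0;  // smallest value that needs `length` bytes
    if ((lead & 0xE0) == 0xC0) {
        length = 2; value = lead & 0x1Fu; minimum = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
        length = 3; value = lead & 0x0Fu; minimum = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
        length = 4; value = lead & 0x07u; minimum = 0x10000;
    } else {
        return kInvalidCodePoint;  // stray continuation byte or bad lead byte
    }

    if (bytes.size() < length) return kInvalidCodePoint;  // truncated
    for (std::size_t i = 1; i < length; ++i) {
        assert(i < bytes.size());  // guaranteed by the size check above
        const auto next = static_cast<unsigned char>(bytes[i]);
        if ((next & 0xC0) != 0x80) return kInvalidCodePoint;  // not 10xxxxxx
        value = (value << 6) | (next & 0x3Fu);
    }

    if (value < minimum) return kInvalidCodePoint;   // overlong encoding
    if (value > 0x10FFFF) return kInvalidCodePoint;  // beyond the Unicode range
    if (value >= 0xD800 && value <= 0xDFFF) {
        return kInvalidCodePoint;                    // surrogate range
    }
    return value;
}
```

Bad lead bytes, missing continuation bytes, and overlong forms are the "incorrectly encoded" case; the last two checks cover the maximum value and the surrogate range.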

Definition at line 130 of file unicodeUtils.h.

◆ operator++() [1/2]

TfUtf8CodePointIterator & operator++ ( )

inline

Advances the iterator logically one UTF-8 character sequence in the string.

The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.
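
The advancing rule described above can be sketched as a free function over raw string_view iterators. This is an illustration of the stated behavior under the assumption that the lead byte determines how many continuation bytes may be skipped; AdvanceOneSequence and ByteIter are hypothetical names, and the real operator++ may differ in detail.

```cpp
#include <cassert>
#include <string_view>

using ByteIter = std::string_view::const_iterator;

// Steps over one UTF-8 sequence starting at `it`. The current byte is always
// consumed; after that, at most as many continuation bytes (10xxxxxx) as the
// lead byte announces are consumed, stopping early at `end` or at the first
// byte that is not a continuation byte.
inline ByteIter AdvanceOneSequence(ByteIter it, ByteIter end)
{
    assert(it != end && "cannot advance past the end");
    const auto lead = static_cast<unsigned char>(*it);
    ++it;

    int expected = 0;  // continuation bytes announced by the lead byte
    if ((lead & 0xE0) == 0xC0) expected = 1;
    else if ((lead & 0xF0) == 0xE0) expected = 2;
    else if ((lead & 0xF8) == 0xF0) expected = 3;

    while (expected > 0 && it != end &&
           (static_cast<unsigned char>(*it) & 0xC0) == 0x80) {
        ++it;
        --expected;
    }
    return it;
}
```

Because the loop stops at the first non-continuation byte, a truncated sequence does not swallow the start of the character that follows it.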

Definition at line 164 of file unicodeUtils.h.

◆ operator++() [2/2]

TfUtf8CodePointIterator operator++ ( int )

inline

Advances the iterator logically one UTF-8 character sequence in the string.

The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.

Definition at line 196 of file unicodeUtils.h.

◆ operator==()

bool operator== ( const TfUtf8CodePointIterator & rhs ) const

inline

Determines if two iterators are equal.

This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.

Definition at line 145 of file unicodeUtils.h.

Friends And Related Function Documentation

◆ operator!= [1/2]

bool operator!=	(	const TfUtf8CodePointIterator &	lhs,
		PastTheEndSentinel	rhs
	)

friend

Definition at line 217 of file unicodeUtils.h.

◆ operator!= [2/2]

bool operator!=	(	PastTheEndSentinel	lhs,
		const TfUtf8CodePointIterator &	rhs
	)

friend

Definition at line 222 of file unicodeUtils.h.

◆ operator== [1/2]

bool operator==	(	const TfUtf8CodePointIterator &	lhs,
		PastTheEndSentinel
	)

friend

Checks if the lhs iterator is at or past the end of the underlying string_view.
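
The sentinel comparison pattern can be shown with a simplified standalone cursor. This sketch steps one byte at a time rather than one code point, and ByteCursor, EndSentinel, and CountSteps are hypothetical names; it only illustrates how an "at or past the end" check against an empty tag type lets a loop terminate without constructing a second iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Empty tag type standing in for a past-the-end sentinel.
struct EndSentinel {};

// Simplified cursor that remembers its own end position.
class ByteCursor {
public:
    ByteCursor(std::string_view::const_iterator it,
               std::string_view::const_iterator end)
        : _it(it), _end(end) {}

    ByteCursor& operator++()
    {
        assert(_it < _end);
        ++_it;
        return *this;
    }

    // Equal to the sentinel once the cursor is at or past its end.
    friend bool operator==(const ByteCursor& lhs, EndSentinel)
    {
        return lhs._it >= lhs._end;
    }
    friend bool operator!=(const ByteCursor& lhs, EndSentinel rhs)
    {
        return !(lhs == rhs);
    }
    friend bool operator==(EndSentinel lhs, const ByteCursor& rhs)
    {
        return rhs == lhs;
    }
    friend bool operator!=(EndSentinel lhs, const ByteCursor& rhs)
    {
        return !(rhs == lhs);
    }

private:
    std::string_view::const_iterator _it;
    std::string_view::const_iterator _end;
};

// Walks the whole view using the sentinel as the loop bound.
inline std::size_t CountSteps(std::string_view text)
{
    std::size_t steps = 0;
    for (ByteCursor c(text.begin(), text.end()); c != EndSentinel{}; ++c) {
        ++steps;
    }
    return steps;
}
```

Providing both argument orders for == and != mirrors the four friend overloads listed for this class.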

Definition at line 205 of file unicodeUtils.h.

◆ operator== [2/2]

bool operator==	(	PastTheEndSentinel	lhs,
		const TfUtf8CodePointIterator &	rhs
	)

friend

Definition at line 211 of file unicodeUtils.h.

The documentation for this class was generated from the following file:

pxr/base/tf/unicodeUtils.h
