unicodeUtils.h File Reference

Definitions of basic UTF-8 utilities in tf.


Classes

class  TfUtf8CodePoint
 Wrapper for a 32-bit code point value that can be encoded as UTF-8.
 
class  TfUtf8CodePointIterator
 Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.
 
class  TfUtf8CodePointIterator::PastTheEndSentinel
 Models the end of iteration, reached when the underlying iterator's end condition has been met.
 
class  TfUtf8CodePointView
 Wrapper for a UTF-8 encoded std::string_view that can be iterated over as code points instead of bytes.
 

Functions

TF_API std::ostream & operator<< (std::ostream &, const TfUtf8CodePoint)
 
constexpr TfUtf8CodePoint TfUtf8CodePointFromAscii (const char value)
 Constructs a TfUtf8CodePoint from an ASCII character (0-127).
 
TF_API bool TfIsUtf8CodePointXidStart (uint32_t codePoint)
 Determines whether the given Unicode codePoint is in the XID_Start character class.
 
bool TfIsUtf8CodePointXidStart (const TfUtf8CodePoint codePoint)
 Determines whether the given Unicode codePoint is in the XID_Start character class.
 
TF_API bool TfIsUtf8CodePointXidContinue (uint32_t codePoint)
 Determines whether the given Unicode codePoint is in the XID_Continue character class.
 
bool TfIsUtf8CodePointXidContinue (const TfUtf8CodePoint codePoint)
 Determines whether the given Unicode codePoint is in the XID_Continue character class.
 

Variables

constexpr TfUtf8CodePoint TfUtf8InvalidCodePoint
 The replacement code point can be used to signal that a code point could not be decoded and needed to be replaced.
 

Detailed Description

Definitions of basic UTF-8 utilities in tf.

Definition in file unicodeUtils.h.


Class Documentation

◆ TfUtf8CodePointIterator::PastTheEndSentinel

class TfUtf8CodePointIterator::PastTheEndSentinel

Models the end of iteration, reached when the underlying iterator's end condition has been met.

Definition at line 109 of file unicodeUtils.h.
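
The sentinel idea can be illustrated with a minimal, self-contained sketch. It does not use the tf classes: ByteSentinel, ByteIterator, ByteView, and CountElements are hypothetical names, and the sketch walks bytes rather than code points. It only shows how an iterator can be compared against an empty sentinel type so that iteration ends when the underlying end condition is met.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins, not the tf classes.
// An empty type that marks "past the end" for ByteIterator.
struct ByteSentinel {};

class ByteIterator {
public:
    explicit ByteIterator(std::string_view text)
        : _pos(text.data()), _end(text.data() + text.size()) {}

    char operator*() const {
        assert(_pos != _end);  // dereferencing past the end is a bug
        return *_pos;
    }

    ByteIterator& operator++() {
        ++_pos;
        return *this;
    }

    // The iterator equals the sentinel once its own end condition is met.
    friend bool operator==(const ByteIterator& it, ByteSentinel) {
        return it._pos == it._end;
    }
    friend bool operator!=(const ByteIterator& it, ByteSentinel s) {
        return !(it == s);
    }

private:
    const char* _pos;
    const char* _end;
};

// A view whose begin() and end() return different types (allowed in C++17
// range-based for loops).
struct ByteView {
    std::string_view text;
    ByteIterator begin() const { return ByteIterator(text); }
    ByteSentinel end() const { return {}; }
};

// Counts elements by walking the view until the sentinel compares equal.
inline std::size_t CountElements(std::string_view text) {
    std::size_t count = 0;
    for (char c : ByteView{text}) {
        (void)c;
        ++count;
    }
    return count;
}
```

Because the sentinel carries no state, the end test lives entirely in the iterator, which already knows where its input stops.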

Function Documentation

◆ TfIsUtf8CodePointXidContinue() [1/2]

bool TfIsUtf8CodePointXidContinue ( const TfUtf8CodePoint  codePoint)
inline

Determines whether the given Unicode codePoint is in the XID_Continue character class.

This is an overloaded function, provided for convenience. It differs from the uint32_t overload only in the type of argument it accepts.

Definition at line 414 of file unicodeUtils.h.

◆ TfIsUtf8CodePointXidContinue() [2/2]

TF_API bool TfIsUtf8CodePointXidContinue ( uint32_t  codePoint)

Determines whether the given Unicode codePoint is in the XID_Continue character class.

The XID_Continue class of characters includes those in XID_Start plus characters having the Unicode General_Category of nonspacing mark, spacing combining mark, decimal number, or connector punctuation. That is, the character must have a category of XID_Start | Nd | Mn | Mc | Pc.
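
The category rule can be written out as a small sketch. The GeneralCategory enum and both helper functions are hypothetical and are not part of unicodeUtils.h. The sketch works on category labels only, performs no Unicode table lookup, and ignores the Other_ID_Start and Pattern_* adjustments mentioned for XID_Start.

```cpp
#include <cassert>

// Hypothetical labels for the General_Category values named in this file's
// documentation; every other category is folded into Other.
enum class GeneralCategory { Lu, Ll, Lt, Lm, Lo, Nl, Mn, Mc, Nd, Pc, Other };

// Categories documented for XID_Start: Lu | Ll | Lt | Lm | Lo | Nl.
constexpr bool IsStartCategory(GeneralCategory c) {
    switch (c) {
        case GeneralCategory::Lu:
        case GeneralCategory::Ll:
        case GeneralCategory::Lt:
        case GeneralCategory::Lm:
        case GeneralCategory::Lo:
        case GeneralCategory::Nl:
            return true;
        default:
            return false;
    }
}

// Categories documented for XID_Continue: XID_Start | Nd | Mn | Mc | Pc.
constexpr bool IsContinueCategory(GeneralCategory c) {
    return IsStartCategory(c) ||
           c == GeneralCategory::Nd ||
           c == GeneralCategory::Mn ||
           c == GeneralCategory::Mc ||
           c == GeneralCategory::Pc;
}

// Every start category is also a continue category, but not the reverse.
static_assert(IsContinueCategory(GeneralCategory::Lu));
static_assert(!IsStartCategory(GeneralCategory::Nd));
```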

◆ TfIsUtf8CodePointXidStart() [1/2]

bool TfIsUtf8CodePointXidStart ( const TfUtf8CodePoint  codePoint)
inline

Determines whether the given Unicode codePoint is in the XID_Start character class.

This is an overloaded function, provided for convenience. It differs from the uint32_t overload only in the type of argument it accepts.

Definition at line 393 of file unicodeUtils.h.

◆ TfIsUtf8CodePointXidStart() [2/2]

TF_API bool TfIsUtf8CodePointXidStart ( uint32_t  codePoint)

Determines whether the given Unicode codePoint is in the XID_Start character class.

The XID_Start class of characters is derived from the Unicode General_Category of uppercase letters, lowercase letters, titlecase letters, modifier letters, other letters, and letter numbers, plus Other_ID_Start, minus Pattern_Syntax and Pattern_White_Space code points. That is, the character must have a category of Lu | Ll | Lt | Lm | Lo | Nl.
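
A typical use of the two character classes is identifier validation: the first code point must be XID_Start and every later one XID_Continue. The sketch below is self-contained, so it replaces the real predicates with ASCII-only stand-ins (IsXidStartAscii and IsXidContinueAscii, both hypothetical). Within ASCII, XID_Start covers the letters, and XID_Continue adds the digits and the underscore. In real code the stand-ins would presumably be swapped for TfIsUtf8CodePointXidStart and TfIsUtf8CodePointXidContinue.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// ASCII-only stand-in (hypothetical): letters A-Z and a-z.
constexpr bool IsXidStartAscii(char32_t cp) {
    return (cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z');
}

// ASCII-only stand-in (hypothetical): XID_Start plus digits (Nd) and
// the underscore (Pc).
constexpr bool IsXidContinueAscii(char32_t cp) {
    return IsXidStartAscii(cp) || (cp >= U'0' && cp <= U'9') || cp == U'_';
}

// Returns true when the sequence is non-empty, starts with an XID_Start
// code point, and continues with XID_Continue code points only.
inline bool IsIdentifier(std::u32string_view codePoints) {
    if (codePoints.empty() || !IsXidStartAscii(codePoints.front())) {
        return false;
    }
    for (std::size_t i = 1; i < codePoints.size(); ++i) {
        if (!IsXidContinueAscii(codePoints[i])) {
            return false;
        }
    }
    return true;
}
```

Note that under this rule an underscore can continue an identifier but cannot begin one, since connector punctuation is only part of XID_Continue.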

◆ TfUtf8CodePointFromAscii()

constexpr TfUtf8CodePoint TfUtf8CodePointFromAscii ( const char  value)
constexpr

Constructs a TfUtf8CodePoint from an ASCII character (0-127).

Definition at line 85 of file unicodeUtils.h.
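
The range restriction can be sketched without the tf types. AsciiToCodePoint and kReplacementValue are hypothetical names, and plain uint32_t values stand in for TfUtf8CodePoint. This page does not state what the real function returns for input outside 0-127; the sketch assumes the replacement value, taken here to be U+FFFD, the Unicode replacement character.

```cpp
#include <cassert>
#include <cstdint>

// Assumed replacement value (U+FFFD); hypothetical name.
constexpr std::uint32_t kReplacementValue = 0xFFFD;

// Maps an ASCII character to its code point. Input outside 0-127 yields the
// replacement value (an assumption of this sketch).
constexpr std::uint32_t AsciiToCodePoint(const char value) {
    // char may be signed, so convert through unsigned char before comparing.
    const auto byte = static_cast<unsigned char>(value);
    return byte < 128 ? static_cast<std::uint32_t>(byte) : kReplacementValue;
}

// Being constexpr, the mapping can be checked at compile time.
static_assert(AsciiToCodePoint('A') == 0x41);
static_assert(AsciiToCodePoint('\x80') == kReplacementValue);
```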

Variable Documentation

◆ TfUtf8InvalidCodePoint

constexpr TfUtf8CodePoint TfUtf8InvalidCodePoint
constexpr
Initial value: brace-initialized from ReplacementValue, a static constexpr uint32_t (defined at unicodeUtils.h:40) that replaces code points that cannot be decoded or are outside of the valid range.

The replacement code point can be used to signal that a code point could not be decoded and needed to be replaced.

Definition at line 81 of file unicodeUtils.h.
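
To show how a replacement code point signals a decoding failure, here is a minimal UTF-8 decoder sketch. It is not the tf implementation: DecodeOne, DecodeAll, and kReplacementCodePoint are hypothetical names, the replacement value is assumed to be U+FFFD, and the error-recovery policy (how many bytes are consumed after a bad sequence) is one possible choice among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Assumed replacement value (U+FFFD); hypothetical name.
constexpr std::uint32_t kReplacementCodePoint = 0xFFFD;

// Decodes one code point starting at text[pos] and advances pos past the
// bytes it consumed. Returns kReplacementCodePoint on any malformed input.
inline std::uint32_t DecodeOne(std::string_view text, std::size_t& pos) {
    assert(pos < text.size());
    const auto lead = static_cast<unsigned char>(text[pos++]);
    if (lead < 0x80) {
        return lead;  // single-byte (ASCII) sequence
    }

    std::size_t extra = 0;       // number of continuation bytes expected
    std::uint32_t cp = 0;        // code point accumulated so far
    std::uint32_t minValue = 0;  // smallest value this length may encode
    if ((lead & 0xE0) == 0xC0) {
        extra = 1; cp = lead & 0x1F; minValue = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
        extra = 2; cp = lead & 0x0F; minValue = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
        extra = 3; cp = lead & 0x07; minValue = 0x10000;
    } else {
        return kReplacementCodePoint;  // stray continuation or invalid lead
    }

    for (std::size_t i = 0; i < extra; ++i) {
        if (pos >= text.size()) {
            return kReplacementCodePoint;  // truncated sequence
        }
        const auto next = static_cast<unsigned char>(text[pos]);
        if ((next & 0xC0) != 0x80) {
            // Not a continuation byte; leave it to be decoded on its own.
            return kReplacementCodePoint;
        }
        cp = (cp << 6) | (next & 0x3F);
        ++pos;
    }

    // Reject overlong encodings, surrogates, and out-of-range values.
    const bool isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
    if (cp < minValue || cp > 0x10FFFF || isSurrogate) {
        return kReplacementCodePoint;
    }
    return cp;
}

// Decodes a whole string, substituting the replacement code point wherever
// decoding fails.
inline std::vector<std::uint32_t> DecodeAll(std::string_view text) {
    std::vector<std::uint32_t> out;
    for (std::size_t pos = 0; pos < text.size();) {
        out.push_back(DecodeOne(text, pos));
    }
    return out;
}
```

A caller can then compare each decoded value against the replacement code point to detect that something in the input could not be decoded, at the cost of not being able to distinguish a genuine U+FFFD in the input from a substituted one.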