STRINGS
Strings are sequences of characters. However, what constitutes a character depends on the language used and the settings of the operating system on which the application runs. Gone are the days when you could assume each character in a string is represented by a single byte. Multibyte encodings (either fixed-length or variable-length) of Unicode are needed to accurately store text in today’s global economy.
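To make the variable-length case concrete, here is a minimal sketch in Java (the class name EncodingDemo is ours, chosen for illustration) that prints how many bytes UTF-8, a common variable-length Unicode encoding, uses for characters from different scripts:

    import java.nio.charset.StandardCharsets;

    public class EncodingDemo {
        public static void main(String[] args) {
            // One user-perceived character each, but the UTF-8 encoding
            // varies from one to four bytes.
            String[] samples = {
                "A",            // U+0041, ASCII Latin letter
                "\u00E9",       // U+00E9, Latin small letter e with acute
                "\u4E2D",       // U+4E2D, a CJK ideograph
                "\uD83D\uDE00"  // U+1F600, an emoji, written here as a UTF-16 surrogate pair
            };
            for (String s : samples) {
                byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                System.out.println(s + " -> " + utf8.length + " byte(s) in UTF-8");
            }
        }
    }

Run against these samples, the byte counts range from 1 for the ASCII letter up to 4 for the emoji, so code that assumes one byte per character breaks on all but the first.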
More recently designed languages, such as Java and C#, have a multibyte fundamental character type, whereas a char in C and C++ is always a single byte. (Recent versions of C and C++ also define a wide character type, wchar_t, which is usually multibyte.) Even with built-in multibyte character types, properly handling all cases of Unicode can be tricky: more than 100,000 code points (representation-independent character definitions) are defined in Unicode, so they can't all be represented with a single 2-byte Java or C# char. This problem is typically solved using...
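Java's own APIs make the mismatch between char values and code points visible. As a minimal sketch (the class name CodePointDemo is illustrative), the following builds a string from the supplementary code point U+1F600 and shows that it occupies two 16-bit char values while counting as a single code point:

    public class CodePointDemo {
        public static void main(String[] args) {
            // U+1F600 lies above U+FFFF, so it cannot fit in one 16-bit char;
            // Java stores it as two chars (a UTF-16 surrogate pair).
            String s = new String(Character.toChars(0x1F600));
            System.out.println("char values: " + s.length());                      // 2
            System.out.println("code points: " + s.codePointCount(0, s.length())); // 1
            System.out.println("surrogate pair? "
                    + Character.isSurrogatePair(s.charAt(0), s.charAt(1)));         // true
        }
    }

When walking such strings one character at a time, iterating with codePointAt rather than charAt avoids splitting a surrogate pair in half.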