This class represents a simple stateless converter to and from UCS-4 for each single code point.

#include <booster/booster/locale/util.h>

Public Member Functions
virtual int	max_len () const

virtual bool	is_thread_safe () const

virtual base_converter *	clone () const

virtual uint32_t	to_unicode (char const *&begin, char const *end)

virtual uint32_t	from_unicode (uint32_t u, char *begin, char const *end)

Static Public Attributes
static const uint32_t	illegal =utf::illegal

static const uint32_t	incomplete =utf::incomplete

Detailed Description

This class represents a simple stateless converter to and from UCS-4 for each single code point.

This class is used to create a std::codecvt facet for converting UTF-16/UTF-32 to the encoding supported by this converter.

Please note that this converter should be fully stateless. Fully stateless means that it must never assume it is called on the text in any particular order. Even if an encoding appears to be stateless, like windows-1255 or Shift-JIS, some encoders (most notably iconv) can compose several code points into one, or decompose them when composite characters are found. So be very careful when implementing these converters for such character sets.
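The following is a minimal sketch of a stateless single-byte converter, assuming ISO-8859-1 (Latin-1), where every code point up to U+00FF maps to one byte of the same value. To stay self-contained it derives from a hypothetical stand-in for base_converter that mirrors only the members documented on this page; the two sentinel values are placeholders assumed for the sketch, not necessarily the real utf::illegal and utf::incomplete. Each call depends only on its arguments, so the converter keeps no state between calls.

```cpp
#include <cstdint>

// Hypothetical stand-in for booster::locale::util::base_converter, reduced to
// the members documented on this page. Sentinel values are assumptions.
class base_converter {
public:
    static const uint32_t illegal = 0xFFFFFFFFu;     // assumed placeholder
    static const uint32_t incomplete = 0xFFFFFFFEu;  // assumed placeholder
    virtual ~base_converter() {}
    virtual int max_len() const = 0;
    virtual bool is_thread_safe() const = 0;
    virtual base_converter *clone() const = 0;
    virtual uint32_t to_unicode(char const *&begin, char const *end) = 0;
    virtual uint32_t from_unicode(uint32_t u, char *begin, char const *end) = 0;
};

// Latin-1: byte value == code point value for U+0000..U+00FF.
class latin1_converter : public base_converter {
public:
    int max_len() const override { return 1; }          // one byte per code point
    bool is_thread_safe() const override { return true; } // no mutable members
    base_converter *clone() const override { return new latin1_converter(*this); }

    uint32_t to_unicode(char const *&begin, char const *end) override {
        if (begin == end)
            return incomplete;              // no input byte available yet
        unsigned char c = static_cast<unsigned char>(*begin);
        ++begin;                            // consume exactly one byte
        return c;
    }

    uint32_t from_unicode(uint32_t u, char *begin, char const *end) override {
        if (u > 0xFF)
            return illegal;                 // not representable in Latin-1
        if (end - begin < 1)
            return incomplete;              // output buffer too small
        *begin = static_cast<char>(u);
        return 1;                           // one byte written
    }
};
```

Because nothing is carried from one call to the next, the order in which the framework converts code points cannot change the result.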

Member Function Documentation

virtual base_converter* booster::locale::util::base_converter::clone ( ) const

inline, virtual

Creates a polymorphic copy of this object. It is usually called only if is_thread_safe() returns false.
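A caller that follows this rule might look like the sketch below. The names converter_handle and acquire() are hypothetical, and the interface is a stand-in reduced to the two members involved; it only illustrates using the shared object directly when it is thread safe and a private clone otherwise.

```cpp
#include <memory>

// Hypothetical stand-in exposing only the two members used here.
struct base_converter {
    virtual ~base_converter() {}
    virtual bool is_thread_safe() const = 0;
    virtual base_converter *clone() const = 0;
};

// Hypothetical handle: ptr is what the caller uses; owned is set only when
// a private copy had to be made, and frees it automatically.
struct converter_handle {
    std::unique_ptr<base_converter> owned;
    base_converter *ptr = nullptr;
};

// Returns a converter the calling thread may use without synchronization.
converter_handle acquire(base_converter &prototype) {
    converter_handle h;
    if (prototype.is_thread_safe()) {
        h.ptr = &prototype;              // safe to share: no copy needed
    } else {
        h.owned.reset(prototype.clone()); // private copy for this user
        h.ptr = h.owned.get();
    }
    return h;
}
```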

virtual uint32_t booster::locale::util::base_converter::from_unicode	(	uint32_t	u,
		char *	begin,
		char const *	end
	)

inline, virtual

Converts a single code point u into this encoding and stores it in the range [begin,end).

If u is an invalid Unicode code point, or it cannot be mapped correctly to the represented character set, illegal should be returned.

If u can be converted to a sequence of bytes c1, ..., cN (1 <= N <= max_len()), then:

If end - begin >= N, the bytes c1, ..., cN are written starting at begin and N is returned.
If end - begin < N, incomplete is returned, and the contents of the range [begin,end) are unspecified.

References booster::locale::util::create_codecvt(), booster::locale::util::create_simple_codecvt(), booster::locale::util::create_simple_converter(), booster::locale::util::create_utf8_codecvt(), booster::locale::util::create_utf8_converter(), illegal, and incomplete.
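As an illustration of this contract, here is a sketch of encoding one code point as UTF-8. It is a free function rather than a member, and the two sentinel constants are placeholder values assumed for the sketch. Surrogates and values above U+10FFFF are rejected as illegal; otherwise N (1 to 4) is computed first, and nothing is written unless the whole sequence fits.

```cpp
#include <cstdint>

const uint32_t illegal = 0xFFFFFFFFu;     // assumed stand-in for utf::illegal
const uint32_t incomplete = 0xFFFFFFFEu;  // assumed stand-in for utf::incomplete

// Encodes u as UTF-8 into [begin,end), following the from_unicode() contract.
uint32_t utf8_from_unicode(uint32_t u, char *begin, char const *end) {
    if (u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF))
        return illegal;                    // not a valid Unicode scalar value
    int n = u < 0x80 ? 1 : u < 0x800 ? 2 : u < 0x10000 ? 3 : 4;
    if (end - begin < n)
        return incomplete;                 // buffer shorter than N bytes
    unsigned char *out = reinterpret_cast<unsigned char *>(begin);
    switch (n) {
    case 1:
        out[0] = static_cast<unsigned char>(u);
        break;
    case 2:
        out[0] = static_cast<unsigned char>(0xC0 | (u >> 6));
        out[1] = static_cast<unsigned char>(0x80 | (u & 0x3F));
        break;
    case 3:
        out[0] = static_cast<unsigned char>(0xE0 | (u >> 12));
        out[1] = static_cast<unsigned char>(0x80 | ((u >> 6) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | (u & 0x3F));
        break;
    default:
        out[0] = static_cast<unsigned char>(0xF0 | (u >> 18));
        out[1] = static_cast<unsigned char>(0x80 | ((u >> 12) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | ((u >> 6) & 0x3F));
        out[3] = static_cast<unsigned char>(0x80 | (u & 0x3F));
        break;
    }
    return static_cast<uint32_t>(n);       // number of bytes written
}
```

For example, U+20AC (the euro sign) needs three bytes, E2 82 AC, so a two-byte buffer yields incomplete.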

virtual bool booster::locale::util::base_converter::is_thread_safe ( ) const

inline, virtual

Returns true if calling the functions from_unicode, to_unicode, and max_len is thread safe.

Rule of thumb: if the implementation uses only tables that never change, or is purely algorithmic like UTF-8, so that independent to_unicode and from_unicode calls share no mutable state, you may return true. Otherwise, for example if you use an iconv_t descriptor or a UConverter as the conversion object, return false, and this object will be cloned for each use.
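To illustrate the false case, here is a toy converter that memoizes its last lookup in member variables. It is hypothetical and only stands in for a converter wrapping an iconv_t or UConverter; the Latin-1 "lookup" is a placeholder table, and the base class and sentinel values are assumed stand-ins. Because every call on the same object reads and writes the cache, concurrent calls would race, so it reports false and its clone() hands out an independent copy.

```cpp
#include <cstdint>

// Hypothetical stand-in interface (only the members used here).
struct base_converter {
    static const uint32_t illegal = 0xFFFFFFFFu;     // assumed placeholder
    static const uint32_t incomplete = 0xFFFFFFFEu;  // assumed placeholder
    virtual ~base_converter() {}
    virtual bool is_thread_safe() const = 0;
    virtual base_converter *clone() const = 0;
    virtual uint32_t from_unicode(uint32_t u, char *begin, char const *end) = 0;
};

class caching_converter : public base_converter {
public:
    // The cache below is shared mutable state: not safe to call concurrently.
    bool is_thread_safe() const override { return false; }
    // Each clone starts with its own, empty cache.
    base_converter *clone() const override { return new caching_converter(); }

    uint32_t from_unicode(uint32_t u, char *begin, char const *end) override {
        if (u != last_u_) {           // cache miss: perform the lookup
            last_u_ = u;
            last_ok_ = (u <= 0xFF);   // placeholder table: Latin-1 range
        }
        if (!last_ok_)
            return illegal;
        if (end - begin < 1)
            return incomplete;
        *begin = static_cast<char>(last_u_);
        return 1;
    }

private:
    uint32_t last_u_ = 0;     // last code point looked up
    bool last_ok_ = true;     // whether it was mappable (U+0000 is)
};
```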

virtual int booster::locale::util::base_converter::max_len ( ) const

inline, virtual

Returns the maximum number of bytes that one Unicode code point can be converted to: for example, 4 for UTF-8, 2 for Shift-JIS, and 1 for ISO-8859-1.
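Because no code point needs more than max_len() bytes, a caller can size a per-code-point scratch buffer from it, and from_unicode() should then never report incomplete. The sketch below is a hypothetical helper, encode_all(), written against any converter type exposing max_len(), from_unicode(), illegal and incomplete with the documented meanings; ascii_converter and its sentinel values are assumptions for illustration, not part of booster.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical duck-typed converter used only to exercise encode_all().
struct ascii_converter {
    static const uint32_t illegal = 0xFFFFFFFFu;     // assumed placeholder
    static const uint32_t incomplete = 0xFFFFFFFEu;  // assumed placeholder
    int max_len() const { return 1; }
    uint32_t from_unicode(uint32_t u, char *begin, char const *end) {
        if (u > 0x7F) return illegal;
        if (end - begin < 1) return incomplete;
        *begin = static_cast<char>(u);
        return 1;
    }
};

// Appends the encoding of text to out; returns false on an unmappable code point.
template <typename Converter>
bool encode_all(Converter &cvt, std::u32string const &text, std::string &out) {
    // max_len() bytes always suffice for one code point.
    std::vector<char> scratch(static_cast<std::size_t>(cvt.max_len()));
    for (char32_t cp : text) {
        uint32_t n = cvt.from_unicode(static_cast<uint32_t>(cp), scratch.data(),
                                      scratch.data() + scratch.size());
        if (n == Converter::illegal || n == Converter::incomplete)
            return false;  // incomplete is not expected with a max_len() buffer
        out.append(scratch.data(), n);
    }
    return true;
}
```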

virtual uint32_t booster::locale::util::base_converter::to_unicode	(	char const *&	begin,
		char const *	end
	)

inline, virtual

Converts a single character, starting at begin and ending no later than end, to a Unicode code point.

If a valid input sequence is found in [begin,code_point_end), such that begin < code_point_end && code_point_end <= end, it is converted to its Unicode code point equivalent and begin is set to code_point_end.

If an incomplete input sequence is found in [begin,end), i.e. there may be some code_point_end > end such that [begin,code_point_end) would be a valid input sequence, incomplete is returned and begin stays unchanged. For example, for UTF-8 conversion, *begin == 0xC2 with begin + 1 == end is such a situation.

If an invalid input sequence is found, i.e. there is a sequence [begin,code_point_end) with code_point_end <= end that is illegal for this encoding, illegal is returned and begin stays unchanged. For example, for UTF-8, if *begin == 0xFF and begin < end, illegal is returned.

References illegal, and incomplete.
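The three cases above can be sketched for UTF-8 as a free function. The sentinel constants are placeholder values assumed for the sketch. It checks each byte against the well-formed UTF-8 byte ranges from the Unicode Standard, so overlong forms, surrogates and values above U+10FFFF are reported as illegal as soon as they are detectable, while a truncated sequence that is valid so far is reported as incomplete. In both failure cases begin is left untouched.

```cpp
#include <cstdint>

const uint32_t illegal = 0xFFFFFFFFu;     // assumed stand-in for utf::illegal
const uint32_t incomplete = 0xFFFFFFFEu;  // assumed stand-in for utf::incomplete

// Decodes one UTF-8 sequence from [begin,end) following the to_unicode() contract.
uint32_t utf8_to_unicode(char const *&begin, char const *end) {
    if (begin == end)
        return incomplete;                       // nothing to decode yet
    unsigned char lead = static_cast<unsigned char>(*begin);
    if (lead < 0x80) {                           // ASCII: one byte
        ++begin;
        return lead;
    }
    int n;
    uint32_t cp;
    if (lead >= 0xC2 && lead <= 0xDF)      { n = 2; cp = lead & 0x1Fu; }
    else if (lead >= 0xE0 && lead <= 0xEF) { n = 3; cp = lead & 0x0Fu; }
    else if (lead >= 0xF0 && lead <= 0xF4) { n = 4; cp = lead & 0x07u; }
    else return illegal;  // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF

    unsigned char lo = 0x80, hi = 0xBF;          // allowed range for the second byte
    if (lead == 0xE0) lo = 0xA0;                 // excludes overlong 3-byte forms
    else if (lead == 0xED) hi = 0x9F;            // excludes UTF-16 surrogates
    else if (lead == 0xF0) lo = 0x90;            // excludes overlong 4-byte forms
    else if (lead == 0xF4) hi = 0x8F;            // excludes values above U+10FFFF

    char const *p = begin + 1;
    for (int i = 1; i < n; ++i, ++p) {
        if (p == end)
            return incomplete;                   // valid so far, more bytes needed
        unsigned char c = static_cast<unsigned char>(*p);
        if (c < lo || c > hi)
            return illegal;
        cp = (cp << 6) | (c & 0x3Fu);
        lo = 0x80;                               // later bytes: plain continuation
        hi = 0xBF;
    }
    begin = p;                                   // consume the whole sequence
    return cp;
}
```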

Member Data Documentation

const uint32_t booster::locale::util::base_converter::illegal =utf::illegal

static

This value should be returned when an illegal input sequence or code point is observed: for example, if a UCS-4 code point is in the range reserved for UTF-16 surrogates, or an invalid UTF-8 sequence is found.

Referenced by from_unicode(), and to_unicode().

const uint32_t booster::locale::util::base_converter::incomplete =utf::incomplete

static

This value is returned in the following cases: an incomplete input sequence was found, or the output buffer provided was too small for the complete output to be written.

Referenced by from_unicode(), and to_unicode().

The documentation for this class was generated from the following file:

booster/locale/util.h
