CppCMS
booster::locale::util::base_converter Class Reference

This class represents a simple stateless converter to and from UCS-4 for a single code point.

#include <booster/booster/locale/util.h>

Public Member Functions

virtual int max_len () const
 
virtual bool is_thread_safe () const
 
virtual base_converter * clone () const
 
virtual uint32_t to_unicode (char const *&begin, char const *end)
 
virtual uint32_t from_unicode (uint32_t u, char *begin, char const *end)
 

Static Public Attributes

static const uint32_t illegal =utf::illegal
 
static const uint32_t incomplete =utf::incomplete
 

Detailed Description

This class represents a simple stateless converter to and from UCS-4 for a single code point.

This class is used to create a std::codecvt facet that converts between UTF-16/UTF-32 and the encoding supported by this converter.

Please note that this converter must be fully stateless: it must never assume that it is called in any specific order on the text. Even if the encoding itself seems stateless, like windows-1255 or Shift-JIS, some encoders (most notably iconv) can compose several code points into one, or decompose them when composite characters are found. So be very careful when implementing these converters for certain character sets.

Member Function Documentation

virtual base_converter* booster::locale::util::base_converter::clone ( ) const
inline virtual

Creates a polymorphic copy of this object. Usually called only if is_thread_safe() returns false.

virtual uint32_t booster::locale::util::base_converter::from_unicode ( uint32_t  u,
char *  begin,
char const *  end 
)
inline virtual

Convert a single code point u into the target encoding and store it in the range [begin,end).

If u is an invalid Unicode code point, or it cannot be mapped correctly to the represented character set, illegal should be returned.

If u can be converted to a sequence of bytes c1, ... , cN (1 <= N <= max_len()), then:

  1. If end - begin >= N, c1, ... , cN are written starting at begin and N is returned.
  2. If end - begin < N, incomplete is returned; it is unspecified what is stored in the bytes in the range [begin,end).
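The three outcomes of this contract can be illustrated with a minimal sketch for ISO-8859-1, where N is always 1. The function name latin1_from_unicode_sketch is hypothetical, it is written as a free function rather than an override of base_converter so that it compiles on its own, and the numeric values of the two sentinels are assumptions standing in for utf::illegal and utf::incomplete.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in sentinels; the real class takes them from utf::illegal and
// utf::incomplete. The numeric values below are assumptions of this sketch.
static const uint32_t illegal = 0xFFFFFFFFu;
static const uint32_t incomplete = 0xFFFFFFFEu;

// Hypothetical free function following the from_unicode() contract for
// ISO-8859-1, an encoding whose max_len() would be 1.
uint32_t latin1_from_unicode_sketch(uint32_t u, char *begin, char const *end)
{
    if (u > 0xFF)
        return illegal;      // code point has no ISO-8859-1 representation
    if (end - begin < 1)
        return incomplete;   // output range is too small for one byte
    *begin = static_cast<char>(static_cast<unsigned char>(u));
    return 1;                // N, the number of bytes written
}

// Small self-check that exercises all three outcomes.
void latin1_from_unicode_demo()
{
    char buf[1];
    assert(latin1_from_unicode_sketch(0xE9, buf, buf + 1) == 1);       // fits
    assert(latin1_from_unicode_sketch(0x20AC, buf, buf + 1) == illegal);
    assert(latin1_from_unicode_sketch(0x41, buf, buf) == incomplete);  // empty range
}
```

A caller that sizes its output buffer to max_len() bytes per code point should never see incomplete from a call like this.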

References booster::locale::util::create_codecvt(), booster::locale::util::create_simple_codecvt(), booster::locale::util::create_simple_converter(), booster::locale::util::create_utf8_codecvt(), booster::locale::util::create_utf8_converter(), illegal, and incomplete.

virtual bool booster::locale::util::base_converter::is_thread_safe ( ) const
inline virtual

Returns true if calling the functions from_unicode, to_unicode, and max_len is thread safe.

Rule of thumb: if this class's implementation uses simple tables that never change, or is purely algorithmic like UTF-8, so that independent to_unicode and from_unicode calls share no mutable state, you may return true. Otherwise, for example if you use an iconv_t descriptor or a UConverter as the conversion object, return false, and this object will be cloned for each use.
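The rule of thumb can be sketched as follows. Everything here is hypothetical: converter_sketch is a stand-in for the relevant part of the base_converter interface, the two derived types only model "no mutable state" versus "owns a mutable handle", and private_copy_if_needed shows one way a caller might act on is_thread_safe(), not how the library itself does it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the two members of base_converter discussed here.
struct converter_sketch {
    virtual ~converter_sketch() {}
    virtual bool is_thread_safe() const { return false; }
    virtual converter_sketch *clone() const { return new converter_sketch(*this); }
};

// Purely table-driven or algorithmic: no mutable members, safe to share.
struct table_converter_sketch : converter_sketch {
    bool is_thread_safe() const override { return true; }
    converter_sketch *clone() const override { return new table_converter_sketch(*this); }
};

// Owns mutable scratch state (standing in for an iconv_t or UConverter),
// so each user needs a private copy.
struct handle_converter_sketch : converter_sketch {
    mutable int scratch = 0;
    bool is_thread_safe() const override { return false; }
    converter_sketch *clone() const override { return new handle_converter_sketch(*this); }
};

// Caller-side pattern: returns a private clone only when sharing is unsafe,
// and a null pointer when the shared object can be used directly.
std::unique_ptr<converter_sketch> private_copy_if_needed(converter_sketch const &shared)
{
    if (shared.is_thread_safe())
        return nullptr;
    return std::unique_ptr<converter_sketch>(shared.clone());
}

// Small self-check of both branches.
void thread_safety_demo()
{
    table_converter_sketch shared_ok;
    handle_converter_sketch needs_copy;
    assert(private_copy_if_needed(shared_ok) == nullptr);
    assert(private_copy_if_needed(needs_copy) != nullptr);
}
```

Returning an owning pointer from the helper keeps the raw clone() result from leaking, since clone() hands back a plain pointer.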

virtual int booster::locale::util::base_converter::max_len ( ) const
inline virtual

Returns the maximal length, in bytes, that one Unicode code point can be converted to. For example, it is 4 for UTF-8, 2 for Shift-JIS, and 1 for ISO-8859-1.

virtual uint32_t booster::locale::util::base_converter::to_unicode ( char const *&  begin,
char const *  end 
)
inline virtual

Convert a single character starting at begin and ending at most at end to a Unicode code point.

If a valid input sequence is found in [begin,code_point_end) such that begin < code_point_end && code_point_end <= end, it is converted to its Unicode code point equivalent and begin is set to code_point_end.

If an incomplete input sequence is found in [begin,end), i.e. there may be a code_point_end such that code_point_end > end and [begin,code_point_end) would be a valid input sequence, then incomplete is returned and begin stays unchanged. For example, for UTF-8 conversion, *begin = 0xc2 with begin + 1 = end is such a situation.

If an invalid input sequence is found, i.e. there is a sequence [begin,code_point_end) with code_point_end <= end that is illegal for this encoding, illegal is returned and begin stays unchanged. For example, if *begin = 0xFF and begin < end for UTF-8, then illegal is returned.
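The three cases above can be sketched with a simplified UTF-8 decoder. This is an illustration under stated assumptions, not the library's implementation: utf8_to_unicode_sketch is a hypothetical free function, the sentinel values stand in for utf::illegal and utf::incomplete, and an empty input range is assumed to count as incomplete. The sketch is also slightly lenient: a truncated sequence that could never become valid (such as 0xE0 0x80 followed by end) is reported as incomplete rather than illegal.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in sentinels; the real class takes them from utf::illegal and
// utf::incomplete. The numeric values below are assumptions of this sketch.
static const uint32_t illegal = 0xFFFFFFFFu;
static const uint32_t incomplete = 0xFFFFFFFEu;

// Hypothetical free function following the to_unicode() contract for UTF-8.
uint32_t utf8_to_unicode_sketch(char const *&begin, char const *end)
{
    if (begin == end)
        return incomplete;   // assumption: an empty range counts as incomplete

    unsigned char lead = static_cast<unsigned char>(*begin);
    int len;
    uint32_t cp;
    if (lead < 0x80)                       { len = 1; cp = lead; }
    else if (lead >= 0xC2 && lead <= 0xDF) { len = 2; cp = lead & 0x1Fu; }
    else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0Fu; }
    else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07u; }
    else
        return illegal;      // e.g. 0xFF, or a stray continuation byte

    char const *p = begin + 1;
    for (int i = 1; i < len; ++i, ++p) {
        if (p == end)
            return incomplete;           // sequence is cut off by end
        unsigned char c = static_cast<unsigned char>(*p);
        if ((c & 0xC0) != 0x80)
            return illegal;              // not a continuation byte
        cp = (cp << 6) | (c & 0x3Fu);
    }

    // Reject overlong forms, surrogates, and values beyond U+10FFFF.
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return illegal;

    begin = p;               // advance only on success
    return cp;
}

// Small self-check for the valid case: U+00E9 encoded as 0xC3 0xA9.
void utf8_to_unicode_demo()
{
    char const text[] = "\xC3\xA9";
    char const *p = text;
    assert(utf8_to_unicode_sketch(p, text + 2) == 0xE9);
    assert(p == text + 2);
}
```

Note that begin is only assigned at the very end, so both error paths leave it unchanged as the contract requires.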

References illegal, and incomplete.

Member Data Documentation

const uint32_t booster::locale::util::base_converter::illegal =utf::illegal
static

This value should be returned when an illegal input sequence or code point is observed, for example if a UCS-4 code point is in the range reserved for UTF-16 surrogates or an invalid UTF-8 sequence is found.

Referenced by from_unicode(), and to_unicode().

const uint32_t booster::locale::util::base_converter::incomplete =utf::incomplete
static

This value is returned in the following cases: an incomplete input sequence was found, or an insufficient output buffer was provided so the complete output could not be written.

Referenced by from_unicode(), and to_unicode().

