Boost.Nowide
Boost.Nowide is a library implemented by Artyom Beilis that makes cross-platform, Unicode-aware programming easier.
The library provides implementations of standard C and C++ library functions whose inputs are UTF-8 aware on Windows, without requiring the use of the Wide API.
Consider a simple application that splits a big file into chunks so that they can be sent by e-mail. It requires doing a few very simple tasks:
- Access the command-line arguments: int main(int argc,char **argv)
- Open the input and output files: std::fstream::open(char const *,std::ios::openmode m)
- Remove files: std::remove(char const *file)
- Print file names to the console: std::cout << file_name
Unfortunately it is impossible to implement this simple task in plain C++ if the file names contain non-ASCII characters.
A simple program that uses this API would work on systems that use UTF-8 internally, which covers the vast majority of Unix-like operating systems: Linux, Mac OS X, Solaris, BSD. But it would fail on files like War and Peace - Война и мир - מלחמה ושלום.zip under Microsoft Windows, because the native Unicode-aware Windows API is the Wide API, which uses UTF-16.
This incredibly trivial task is very hard to implement in a cross-platform manner.
Boost.Nowide provides a set of standard library functions that are UTF-8 aware and makes Unicode-aware programming easier.
The library provides:
- argc, argv and env parameters of main that use UTF-8
- stdio.h functions: fopen, freopen, remove, rename
- stdlib.h functions: system, getenv, setenv, unsetenv, putenv
- fstream: filebuf, fstream/ofstream/ifstream
- iostream: cout, cerr, clog, cin
Why not provide both Wide and Narrow implementations so the developer can choose to use Wide characters on Unix-like platforms?
Several reasons:
- wchar_t is not really portable: it can be 2 bytes, 4 bytes or even 1 byte, which makes Unicode-aware programming harder.
- There is no fopen(wchar_t const *,wchar_t const *) in the standard library, so it is better to stick to the standards rather than re-implement the Wide API in "Microsoft Windows Style".
The library is mostly header-only; only console I/O requires separate compilation under Windows.
As a developer you are expected to use boost::nowide functions instead of the functions available in the std namespace.
For example, here is a Unicode unaware implementation of a line counter:
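A minimal sketch of such a line counter, using only the standard library. The function names count_lines and line_counter are illustrative, and the program body is written as a function so that it can be called from main.

```cpp
#include <fstream>
#include <iostream>

// Counts '\n' characters in the file at `path`.
// Returns -1 if the file cannot be opened.
// On Windows, std::ifstream interprets `path` in the current ANSI code page,
// so file names outside that code page cannot be opened.
long count_lines(const char* path)
{
    std::ifstream f(path);
    if(!f)
        return -1;
    long total_lines = 0;
    char c;
    while(f.get(c)) {
        if(c == '\n')
            ++total_lines;
    }
    return total_lines;
}

// Program body (hypothetical name); argv is used exactly as the OS passed it.
int line_counter(int argc, char** argv)
{
    if(argc != 2) {
        std::cerr << "Usage: file_name" << std::endl;
        return 1;
    }
    const long total_lines = count_lines(argv[1]);
    if(total_lines < 0) {
        std::cerr << "Can't open " << argv[1] << std::endl;
        return 1;
    }
    std::cout << "File " << argv[1] << " has " << total_lines << " lines" << std::endl;
    return 0;
}
```

A real program would simply return line_counter(argc, argv) from main.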
To make this program handle Unicode properly, we make the following changes:
This very simple and straightforward approach helps in writing Unicode-aware programs.
Of course, this simple set of functions does not cover all needs. If you need to access the Wide API from a Windows application that uses UTF-8 internally, you can use functions like boost::nowide::widen and boost::nowide::narrow.
For example:
The conversion is done at the last stage, and you continue using UTF-8 strings everywhere else. You only switch to the Wide API at glue points.
boost::nowide::widen returns std::wstring. Sometimes it is useful to prevent allocation and use on-stack buffers instead. Boost.Nowide provides the boost::nowide::basic_stackstring class for this purpose.
The example above could be rewritten as:
Convenience typedefs are also available: stackstring and wstackstring, which use 256-character buffers, and short_stackstring and wshort_stackstring, which use 16-character buffers. If the string is longer, they fall back to memory allocation.
The library does not include windows.h, in order to prevent namespace pollution with its numerous defines and types. Instead, the library declares the prototypes of the Win32 API functions it needs.
However, you may request the use of the windows.h header by defining BOOST_NOWIDE_USE_WINDOWS_H before including any of the Boost.Nowide headers.
Boost.Filesystem supports selection of the narrow encoding. Unfortunately, the default narrow encoding on Windows isn't UTF-8. You can enable UTF-8 as the default encoding in Boost.Filesystem by calling boost::nowide::nowide_filesystem() at the beginning of your program.
For Microsoft Windows, the library provides UTF-8 aware variants of some std:: functions in the boost::nowide namespace. For example, std::fopen becomes boost::nowide::fopen.
Under POSIX platforms, the functions in boost::nowide are aliases of their standard library counterparts:
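The pattern can be sketched as follows. This is an illustration in a hypothetical namespace sketch, not the library's own source: on Windows the UTF-8 arguments are converted with the Win32 function MultiByteToWideChar and passed to _wfopen, while on other platforms the name is a plain alias of std::fopen.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <string>
#include <windows.h>
#endif

namespace sketch { // hypothetical namespace, for illustration only

#ifdef _WIN32
// Converts NUL-terminated UTF-8 to UTF-16; returns an empty string on failure.
inline std::wstring utf8_to_wide(const char* s)
{
    const int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, nullptr, 0);
    if(n <= 0)
        return std::wstring();
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s, -1, &w[0], n);
    w.resize(static_cast<std::size_t>(n) - 1); // drop the terminating NUL
    return w;
}

// Windows: UTF-8 aware variant built on the wide CRT function.
inline std::FILE* fopen(const char* name, const char* mode)
{
    return _wfopen(utf8_to_wide(name).c_str(), utf8_to_wide(mode).c_str());
}
#else
// POSIX: the standard function is used as-is.
using std::fopen;
#endif

} // namespace sketch
```

Calling code is then written once, for example sketch::fopen(name, "r"), and compiles unchanged on both kinds of platform.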
Console I/O is implemented as a wrapper around ReadConsoleW/WriteConsoleW (used when the stream goes to the "real" console) and ReadFile/WriteFile (used when the stream was piped/redirected).
This approach eliminates the need for manual code page handling. If TrueType fonts are used, Unicode-aware input and output work as intended.
Q: Why doesn't the library convert the string to/from the locale's encoding (instead of UTF-8) on POSIX systems?
A: It is inherently incorrect to convert strings to/from locale encodings on POSIX platforms.
You can create a file named "\xFF\xFF.txt" (invalid UTF-8), remove it, or pass its name as a parameter to a program, and it would work whether the current locale is UTF-8 or not. Also, changing the locale from, say, en_US.UTF-8 to en_US.ISO-8859-1 would not magically change all files in the OS or the strings a user may pass to the program (which is different on Windows).
POSIX OSs treat strings as NUL-terminated cookies, so altering their content according to the locale would actually lead to incorrect behavior.
For example, this is a naive implementation of a standard program "rm"
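A minimal sketch of such a program, written as a function with the hypothetical name naive_rm so that main can simply forward its arguments to it. Each argument is handed to std::remove byte for byte, with no locale-dependent conversion.

```cpp
#include <cstdio>

// Body of a naive "rm": removes every file named on the command line.
// The names are passed through unchanged, whatever bytes they contain.
int naive_rm(int argc, char** argv)
{
    for(int i = 1; i < argc; i++)
        std::remove(argv[i]);
    return 0;
}
```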
It would work with ANY locale and changing the strings would lead to incorrect behavior.
The meaning of a locale under POSIX and Windows platforms is different and has very different effects.
It is possible to use the Nowide library without having the huge Boost project as a dependency. There is a standalone version that has all the functionality in the nowide namespace instead of boost::nowide. The example above would look like
The upstream sources can be found at GitHub: https://github.com/artyom-beilis/nowide
You can download the latest sources there.