Apache C++ Standard Library User's Guide

40.3 Example 1: Defining a Narrow Character Code Conversion (ASCII <-> EBCDIC)

As an example of how file stream buffers and code conversion facets collaborate, we now implement a code conversion facet that translates text files encoded in EBCDIC into character streams encoded in ASCII. The conversion between ASCII characters and EBCDIC characters is a constant-size code conversion in which each character is represented by one byte. Hence the conversion can be done on a character-by-character basis.

To implement and use an ASCII-EBCDIC code conversion facet, we:

Derive a new facet type from the standard code conversion facet type std::codecvt.
Specialize the new facet type for the character type char.
Implement the member functions that are used by the file buffer.
Imbue a file stream's buffer with a locale that carries an ASCII-EBCDIC code conversion facet.

The following sections explain these steps in detail.

40.3.1 Derive a New Facet Type

Here is the new code conversion facet type AsciiEbcdicConversion:

template <class internT, class externT, class stateT>
class AsciiEbcdicConversion
  : public std::codecvt<internT, externT, stateT>
{ };

It is empty because we will specialize the class template for the character type char.

40.3.2 Specialize the New Facet Type and Implement the Member Functions

Each code conversion facet has two main member functions, in() and out():

The public member function in() is responsible for the conversion done on reading from the external device.
The public member function out() is responsible for the conversion necessary for writing to the external device.

The other member functions of a code conversion facet used by a file stream buffer are:

The public member function always_noconv(), which returns true if the facet performs no conversion. File stream buffers use it to bypass the code conversion facet when no conversion is necessary; for example, when the external encoding is identical to the internal one. Our facet does perform a conversion and must not be bypassed, so always_noconv() returns false in our example.
The public member function encoding(), which provides information about the type of conversion; that is, whether it is state-dependent or constant-size, etc. In our example, the conversion is constant-size. For a constant-size conversion, encoding() returns the number of external characters needed to produce one internal character, which is 1 here because each EBCDIC byte corresponds to exactly one ASCII character.
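For comparison, the standard facet std::codecvt<char, char, std::mbstate_t> performs the identity conversion, so querying it shows the values a file stream buffer sees when it is allowed to bypass the facet. The following is only a sketch; the helper name describe_default_codecvt is hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: reports what the default narrow-character code
// conversion facet of the classic "C" locale says about itself.
std::string describe_default_codecvt()
{
    typedef std::codecvt<char, char, std::mbstate_t> Facet;

    // Every locale is required to carry this facet.
    assert(std::has_facet<Facet>(std::locale::classic()));
    const Facet& cvt = std::use_facet<Facet>(std::locale::classic());

    std::ostringstream os;
    os << "always_noconv=" << cvt.always_noconv()   // bool prints as 0 or 1
       << " encoding="     << cvt.encoding();
    return os.str();
}
```

The identity facet reports always_noconv() as true, which is what lets a file buffer skip it. An ASCII-EBCDIC facet must report false instead, while its encoding() stays at 1 because the conversion is still one byte to one byte.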

All public member functions of a facet call the respective, protected virtual member function, named do_...(). Here is the declaration of the specialized facet type:

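template <>
class AsciiEbcdicConversion<char, char, std::mbstate_t>
  : public std::codecvt<char, char, std::mbstate_t>
{
protected:

  // Converts external (EBCDIC) characters read from the file
  // into internal (ASCII) characters.
  virtual std::codecvt_base::result
  do_in (std::mbstate_t& state,
         const char* from,  const char* from_end,
         const char*& from_next,
         char* to,          char* to_limit,
         char*& to_next) const;

  // Converts internal (ASCII) characters into external (EBCDIC)
  // characters to be written to the file.
  virtual std::codecvt_base::result
  do_out(std::mbstate_t& state,
         const char* from,  const char* from_end,
         const char*& from_next,
         char* to,          char* to_limit,
         char*& to_next) const;

  // A conversion is always performed, so the facet must not be bypassed.
  virtual bool do_always_noconv() const throw() {
    return false;
  }

  // Constant-size conversion: one external character per internal character.
  virtual int do_encoding() const throw() {
    return 1;
  }

};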

For the sake of brevity, we implement only those functions used by this implementation of file stream buffers. If you want to provide a code conversion facet that is more widely usable, you must also override the virtual member functions do_length() and do_max_length().

The implementation of the functions do_in() and do_out() is straightforward. Each function translates the sequence of characters in the range [from,from_end) into the corresponding sequence in [to,to_limit). On return, the pointers from_next and to_next point one beyond the last character successfully converted. In principle, you can do whatever you want, or whatever it takes, in these functions. However, for effective communication with the file stream buffer, it is important to indicate success or failure properly through the returned std::codecvt_base::result value: ok, partial, error, or noconv.
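The guide does not list the bodies of do_in() and do_out(), so the following is one possible table-driven sketch rather than the guide's own implementation. The names EbcdicSketchCodecvt and ebcdic_to_ascii are hypothetical. The sketch assumes C++11 or later (it uses noexcept and override where the declaration above uses throw()), assumes the compiler's own character set is ASCII, and maps only the space, the digits, and the unaccented letters, whose EBCDIC code points are common to the usual EBCDIC code pages. Any other byte is reported as error.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical sketch facet: external EBCDIC bytes <-> internal ASCII chars.
class EbcdicSketchCodecvt : public std::codecvt<char, char, std::mbstate_t>
{
public:
    explicit EbcdicSketchCodecvt(std::size_t refs = 0)
        : std::codecvt<char, char, std::mbstate_t>(refs)
    {
        for (int i = 0; i < 256; ++i)
            to_ascii_[i] = to_ebcdic_[i] = 0;          // 0 marks "no mapping"
        map_run(0x40, ' ', 1);
        map_run(0xF0, '0', 10);
        map_run(0xC1, 'A', 9); map_run(0xD1, 'J', 9); map_run(0xE2, 'S', 8);
        map_run(0x81, 'a', 9); map_run(0x91, 'j', 9); map_run(0xA2, 's', 8);
    }

protected:
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, char* to, char* to_limit,
                 char*& to_next) const override
    { return translate(to_ascii_, from, from_end, from_next, to, to_limit, to_next); }

    result do_out(std::mbstate_t&, const char* from, const char* from_end,
                  const char*& from_next, char* to, char* to_limit,
                  char*& to_next) const override
    { return translate(to_ebcdic_, from, from_end, from_next, to, to_limit, to_next); }

    bool do_always_noconv() const noexcept override { return false; }
    int  do_encoding()      const noexcept override { return 1; }

private:
    // Registers `count` consecutive code points in both directions.
    void map_run(int ebcdic_first, char ascii_first, int count)
    {
        assert(count > 0 && ebcdic_first + count <= 256);
        for (int i = 0; i < count; ++i) {
            unsigned char e = static_cast<unsigned char>(ebcdic_first + i);
            unsigned char a = static_cast<unsigned char>(ascii_first + i);
            to_ascii_[e]  = a;
            to_ebcdic_[a] = e;
        }
    }

    // Shared loop: stops with partial when the output is full and with
    // error at the first byte that has no mapping.
    static result translate(const unsigned char* table,
                            const char* from, const char* from_end,
                            const char*& from_next,
                            char* to, char* to_limit, char*& to_next)
    {
        from_next = from;
        to_next   = to;
        while (from_next != from_end) {
            if (to_next == to_limit)
                return partial;
            unsigned char mapped = table[static_cast<unsigned char>(*from_next)];
            if (mapped == 0)
                return error;
            *to_next++ = static_cast<char>(mapped);
            ++from_next;
        }
        return ok;
    }

    unsigned char to_ascii_[256];
    unsigned char to_ebcdic_[256];
};

// Hypothetical helper: converts an EBCDIC byte string through in() and
// returns the successfully converted prefix (real code should also
// inspect the result value).
std::string ebcdic_to_ascii(const std::string& ebcdic)
{
    std::string ascii(ebcdic.size(), '\0');
    if (ebcdic.empty())
        return ascii;
    EbcdicSketchCodecvt cvt(1);            // refs = 1: this scope owns the facet
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    char* to_next = 0;
    cvt.in(state, ebcdic.data(), ebcdic.data() + ebcdic.size(), from_next,
           &ascii[0], &ascii[0] + ascii.size(), to_next);
    ascii.resize(static_cast<std::size_t>(to_next - &ascii[0]));
    return ascii;
}
```

With this sketch, ebcdic_to_ascii("\xC8\x85\x93\x93\x96") yields "Hello". Keeping both tables in one object lets do_in() and do_out() share a single loop, so the two directions cannot drift apart in how they report partial and error.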

40.3.3 Use the New Code Conversion Facet

Here is an example of how the new code conversion facet can be used:

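std::fstream inout("/tmp/fil");                               //1
// The facet is allocated with new: a facet constructed with refs == 0
// is deleted by the last locale that refers to it, so it must not be
// a local (stack) object.
std::locale cvtloc(std::locale(),
    new AsciiEbcdicConversion<char,char,std::mbstate_t>);
inout.rdbuf()->pubimbue(cvtloc);                              //2
std::cout << inout.rdbuf();                                   //3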

`//1`	When a file stream is created, a snapshot of the current global locale is attached as its default locale. Remember that a stream has two locale objects: one used for formatting numeric items, and a second used by the stream's buffer for code conversions.
`//2`	Here the stream buffer's locale is replaced by a copy of the global locale that has an ASCII-EBCDIC code conversion facet.
`//3`	The content of the EBCDIC file `"/tmp/fil"` is read, automatically converted to ASCII, and written to `std::cout`.
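The same steps can be tried end to end with a deliberately tiny stand-in facet. In the sketch below, the class DigitsFromEbcdic and the helper read_converted are hypothetical names. The facet converts only on input and understands only the EBCDIC space (0x40) and digits (0xF0 to 0xF9), which is enough to watch a file buffer call the facet. C++11 or later and an ASCII execution character set are assumed.

```cpp
#include <cassert>
#include <cwchar>    // std::mbstate_t
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in facet: EBCDIC space and digits -> ASCII, input only.
class DigitsFromEbcdic : public std::codecvt<char, char, std::mbstate_t>
{
protected:
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, char* to, char* to_limit,
                 char*& to_next) const override
    {
        from_next = from;
        to_next   = to;
        for (; from_next != from_end; ++from_next, ++to_next) {
            if (to_next == to_limit)
                return partial;                       // output buffer is full
            const unsigned char byte = static_cast<unsigned char>(*from_next);
            if (byte == 0x40)
                *to_next = ' ';
            else if (byte >= 0xF0 && byte <= 0xF9)
                *to_next = static_cast<char>('0' + (byte - 0xF0));
            else
                return error;                         // outside this sketch
        }
        return ok;
    }

    // The file buffer must not bypass the facet.
    bool do_always_noconv() const noexcept override { return false; }
};

// Hypothetical helper: reads the file at `path` through the facet and
// returns the converted text.
std::string read_converted(const char* path)
{
    assert(path != 0);
    std::ifstream in(path, std::ios::in | std::ios::binary);

    // The locale owns the heap-allocated facet and deletes it when the
    // last locale referring to it is destroyed.
    std::locale cvtloc(std::locale(), new DigitsFromEbcdic);
    in.rdbuf()->pubimbue(cvtloc);

    std::ostringstream sink;
    sink << in.rdbuf();          // conversion happens inside the file buffer
    return sink.str();
}
```

A file that holds the two bytes 0xF4 0xF2 is returned by read_converted() as the text "42". Because the facet is derived from std::codecvt<char, char, std::mbstate_t> and inherits its id, installing it in a locale replaces the standard narrow-character conversion facet that the file buffer looks up.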