Monday, June 11, 2007

How to write a file in Unicode using std::fstream?

If you try to use a wide fstream to write a file in Unicode like this (tofstream here is presumably a typedef for std::basic_ofstream<TCHAR>, i.e. wofstream in a Unicode build), it will fail:

tofstream testFile( "test.txt" ) ;

testFile << _T("ABC") ;

You would expect the above code to produce a 3-byte file when compiled for single-byte characters and a 6-byte file when compiled for Unicode (two bytes per wchar_t). Except it doesn't: you get a 3-byte file in both cases. WTH is going on?!

It turns out that the C++ standard requires wide file streams to convert their wide characters to narrow (char) characters when writing to a file: basic_filebuf passes every character through the codecvt facet of the stream's locale. So in the example above, the wide string L"ABC" (6 bytes with MSVC's 2-byte wchar_t) is converted to a narrow string (3 bytes) before it is written to the file. And if that wasn't bad enough, how this conversion is done depends on the locale and the implementation.

I haven't been able to find a definitive explanation of why things were specified this way. My best guess is that a file is, by definition, considered to be a stream of (single-byte) characters, and allowing data to be written 2 bytes at a time would break that abstraction. Right or wrong, this causes serious problems. For example, you can't write binary data to a wofstream, because the class will try to narrow it first (usually failing miserably) before writing it out.
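You can see the narrowing for yourself by writing L"ABC" through a std::wofstream and checking the file size. This is a minimal sketch assuming the default "C" locale and an ASCII-only string; wide_write_size is a hypothetical helper name, and the path you pass is up to you.

```cpp
#include <cassert>
#include <fstream>

// Writes L"ABC" through a std::wofstream and returns the size of the
// resulting file in bytes, or -1 if the file cannot be reopened.
long wide_write_size(const char* path)
{
    {
        std::wofstream out(path);   // default "C" locale: wchar_t -> char via codecvt
        out << L"ABC";              // 3 wide characters go into the stream buffer
        out.flush();                // buffered characters are converted and written here
        assert(out.good());         // the conversion succeeded for plain ASCII
    }                               // destructor closes the file

    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at end
    if (!in) return -1;
    return static_cast<long>(in.tellg());                      // end offset == file size
}
```

With MSVC, and with GCC or Clang, this typically reports 3 bytes, not 6 (or 12 where wchar_t is 4 bytes): the wide characters never reach the file as wide characters.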

Solution 1: Use fstream::write() in binary mode

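#include <fstream>
#include <sstream>
#include <string>
#include <tchar.h>   // _T() and the wide-path open() overload are MSVC-specific
using namespace std;

void Func10()
{
            wstringstream wss;
            wss << L"hello" << L"你好" << L"test";

            // Keep the wstring alive: wss.str() returns a temporary, so a pointer
            // taken from wss.str().c_str() dangles as soon as the statement ends.
            wstring ws = wss.str();
            const char* st = reinterpret_cast<const char*>(ws.c_str());   // important!

            fstream m_ofs;
            // binary mode: bytes are written as-is, with no newline translation
            m_ofs.open(_T("c:\\AllElem.txt"), ios_base::binary | ios_base::out);

            // write exactly the string's size in bytes instead of a fixed 100
            m_ofs.write(st, ws.size() * sizeof(wchar_t));
            m_ofs.close();
}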
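Solution 1 dumps the raw in-memory bytes of the wstring, so the file is UTF-16LE only where wchar_t is 2 bytes, as with MSVC; where wchar_t is 4 bytes (GCC and Clang on Linux, for example) the same code writes UTF-32 instead. If you specifically want UTF-16LE, one option is to emit each UTF-16 code unit low byte first, preceded by a byte-order mark. This is a minimal sketch assuming C++11 char16_t strings; write_utf16le is a hypothetical helper name, not a library function.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes a UTF-16 string to `path` as UTF-16LE with a byte-order mark.
// Bytes are emitted explicitly, so the result does not depend on
// sizeof(wchar_t) or on the host's endianness.
bool write_utf16le(const char* path, const std::u16string& text)
{
    std::ofstream out(path, std::ios::binary);   // binary: no newline translation
    if (!out) return false;

    auto put16 = [&out](char16_t unit) {
        out.put(static_cast<char>(unit & 0xFF));         // low byte first
        out.put(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    };

    put16(0xFEFF);                 // BOM, stored on disk as FF FE
    for (char16_t unit : text)
        put16(unit);

    out.close();                   // flush and close; sets failbit on error
    return !out.fail();
}
```

For example, write_utf16le("c:\\AllElem.txt", u"hello\u4F60\u597Dtest") should produce a 24-byte file: the 2-byte BOM followed by 11 two-byte code units.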

 

Solution 2: Write a codecvt-derived class that "converts" wchar_t to wchar_t (i.e. does nothing) and imbue it into the wofstream object.

http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp

 

#include <locale>

// nb: MSVC6+Stlport can't handle "std::"

// appearing in the NullCodecvtBase typedef.

using std::codecvt ;

typedef codecvt < wchar_t , char , mbstate_t > NullCodecvtBase ;

 

class NullCodecvt

    : public NullCodecvtBase

{

 

public:

    typedef wchar_t _E ;

    typedef char _To ;

    typedef mbstate_t _St ;

 

    explicit NullCodecvt( size_t _R=0 ) : NullCodecvtBase(_R) { }

 

protected:

    virtual result do_in( _St& _State ,

                   const _To* _F1 , const _To* _L1 , const _To*& _Mid1 ,

                   _E* F2 , _E* _L2 , _E*& _Mid2

                   ) const

    {

        return noconv ;

    }

    virtual result do_out( _St& _State ,

                   const _E* _F1 , const _E* _L1 , const _E*& _Mid1 ,

                   _To* _F2 , _To* _L2 , _To*& _Mid2   // _L2 must be _To*, or this does not override do_out

                   ) const

    {

        return noconv ;

    }

    virtual result do_unshift( _St& _State ,

            _To* _F2 , _To* _L2 , _To*& _Mid2 ) const

    {

        return noconv ;

     }

    virtual int do_length( _St& _State , const _To* _F1 ,

           const _To* _L1 , size_t _N2 ) const _THROW0()

    {

        return (_N2 < (size_t)(_L1 - _F1)) ? _N2 : _L1 - _F1 ;

    }

    virtual bool do_always_noconv() const _THROW0()

    {

        return true ;

    }

    virtual int do_max_length() const _THROW0()

    {

        return 2 ;

    }

    virtual int do_encoding() const _THROW0()

    {

        return 2 ;

    }

};

 

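// Imbue a copy of the classic locale whose codecvt<wchar_t, char, mbstate_t>
// facet is replaced by NullCodecvt. This uses the standard
// locale(const locale&, Facet*) constructor instead of the MSVC-internal
// _Addfac(); the locale owns the facet (refs == 0), so nothing leaks.
#define IMBUE_NULL_CODECVT( outputFile ) \
{ \
            locale loc( locale::classic() , new NullCodecvt ) ; \
            (outputFile).imbue( loc ) ; \
}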


tofstream testFile ;

IMBUE_NULL_CODECVT( testFile ) ;

testFile.open( "test.txt" , ios::out | ios::binary ) ;

testFile << _T("ABC") ;


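BTW: if you get the following compile error in VC 2005 (I did):

- error C2661: 'std::locale::facet::operator new' : no overloaded function takes 3 arguments

Solution:
Look for the following line somewhere in your code:

#define new DEBUG_NEW

The MFC wizard puts it in, and it interferes with class-specific operator new(). In debug builds DEBUG_NEW expands to new(THIS_FILE, __LINE__), so "new NullCodecvt" becomes a call to an operator new taking three arguments (size, file name, line number). The facet class now provides its own operator new, which hides the global overloads and has no three-argument version.

Commenting that line out fixes the problem.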
