kfengbest: How to write a BOM (Byte Order Mark) into a file using fstream

The Byte Order Mark (BOM) is the Unicode character U+FEFF. (It can also represent a Zero Width No-Break Space.) The code point U+FFFE is a noncharacter in Unicode and should never appear in interchanged Unicode text. Therefore the BOM can be used as the first character of a file (or, more generally, of a string) to indicate endian-ness. With UTF-16, if the first 16-bit code unit, read in the machine's native byte order, is 0xFEFF, then the text has the same endian-ness as the machine reading it. If it reads as 0xFFFE, the endian-ness is reversed and every 16-bit unit must be byte-swapped as it is read in. In terms of raw bytes, FE FF means big-endian and FF FE means little-endian. In the same way, the BOM indicates the endian-ness of text encoded in UTF-32.
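To make the 0xFEFF / 0xFFFE rule concrete, here is a minimal sketch. The helper names `swapBytes` and `readUtf16Units` are hypothetical, not part of any library. It assumes the whole UTF-16 buffer is already in memory: it copies the bytes into 16-bit units in the machine's native order, checks the first unit, and byte-swaps every unit when the BOM comes out reversed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: swap the two bytes of a 16-bit code unit.
inline std::uint16_t swapBytes(std::uint16_t u) {
    return static_cast<std::uint16_t>((u << 8) | (u >> 8));
}

// Hypothetical helper: interpret raw bytes as UTF-16 code units in native
// byte order and use the BOM to decide whether every unit must be swapped.
// The BOM itself is dropped from the result; a trailing odd byte is ignored.
std::vector<std::uint16_t> readUtf16Units(const std::vector<unsigned char>& bytes) {
    std::vector<std::uint16_t> units(bytes.size() / 2);
    if (!units.empty())
        std::memcpy(units.data(), bytes.data(), units.size() * 2);  // native-order read

    bool swap = false;
    std::size_t start = 0;
    if (!units.empty() && units[0] == 0xFEFF) {
        start = 1;          // same endian-ness as this machine
    } else if (!units.empty() && units[0] == 0xFFFE) {
        start = 1;          // reversed endian-ness: swap every unit
        swap = true;
    }
    // No BOM: units are left in native order here; a strict reader would
    // assume big-endian instead.

    std::vector<std::uint16_t> out;
    for (std::size_t i = start; i < units.size(); ++i)
        out.push_back(swap ? swapBytes(units[i]) : units[i]);
    return out;
}
```

Because the decision is made on the native-order value rather than on the raw bytes, the same code gives the right answer on both little-endian and big-endian hosts.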

Note, however, that not all files start with a BOM. For the UTF-16 and UTF-32 encoding schemes, the Unicode Standard specifies that text without a BOM is big-endian, unless a higher-level protocol says otherwise.

The character U+FEFF also serves as an encoding signature for the Unicode encoding forms. The table below shows how U+FEFF is encoded in each of them. Note that, by definition, text labeled UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE should not have a BOM, because the endian-ness is already stated in the label.

For text that is compressed with the SCSU (Standard Compression Scheme for Unicode) algorithm, there is also a recommended signature.

Encoding Form	BOM bytes (hex)
UTF-8	EF BB BF
UTF-16 (big-endian)	FE FF
UTF-16 (little-endian)	FF FE
UTF-16BE, UTF-32BE (big-endian)	No BOM!
UTF-16LE, UTF-32LE (little-endian)	No BOM!
UTF-32 (big-endian)	00 00 FE FF
UTF-32 (little-endian)	FF FE 00 00
SCSU (compression)	0E FE FF
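The table can be turned directly into a signature check. The following is a minimal sketch with a hypothetical helper name `detectBom`. Longer signatures are tested first, because the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE). One known ambiguity remains: UTF-16LE text whose first character after the BOM is U+0000 looks like a UTF-32LE BOM. When no signature matches, the function reports "none", and per the rule above, UTF-16/UTF-32 text would then default to big-endian.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper: name the encoding signalled by the leading bytes of a
// buffer, following the BOM table. Longer signatures are checked first.
std::string detectBom(const unsigned char* p, std::size_t n) {
    auto starts = [&](std::initializer_list<unsigned char> sig) {
        if (n < sig.size()) return false;
        std::size_t i = 0;
        for (unsigned char b : sig)
            if (p[i++] != b) return false;
        return true;
    };
    if (starts({0x00, 0x00, 0xFE, 0xFF})) return "UTF-32 (big-endian)";
    if (starts({0xFF, 0xFE, 0x00, 0x00})) return "UTF-32 (little-endian)";
    if (starts({0xEF, 0xBB, 0xBF}))       return "UTF-8";
    if (starts({0x0E, 0xFE, 0xFF}))       return "SCSU";
    if (starts({0xFE, 0xFF}))             return "UTF-16 (big-endian)";
    if (starts({0xFF, 0xFE}))             return "UTF-16 (little-endian)";
    return "none";
}
```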

Solutions:

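Case 1: using standard std::ofstream:

#include <fstream>
int main() {
    wchar_t BOM = 0xFEFF;  // written in native byte order; sizeof(wchar_t) is 2 on Windows, 4 on Linux
    std::ofstream outFile("filename.dat", std::ios::out | std::ios::binary);
    outFile.write(reinterpret_cast<const char*>(&BOM), sizeof(wchar_t));
}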
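Writing a `wchar_t` as in Case 1 produces whatever width and byte order the platform uses (FF FE on Windows, FF FE 00 00 on a little-endian Linux machine). If a specific encoding is wanted, one option is to write the exact bytes from the table. The sketch below assumes a hypothetical enum `BomKind` and helper `writeBom`; the enum values mean the BOM-carrying "UTF-16"/"UTF-32" forms, not the UTF-16BE/LE labels, which take no BOM.

```cpp
#include <fstream>
#include <string>

// Hypothetical enum naming the BOM-carrying encoding forms from the table.
enum class BomKind { Utf8, Utf16Big, Utf16Little, Utf32Big, Utf32Little };

// Hypothetical helper: write the BOM bytes for `kind` exactly as listed in the
// table, independent of sizeof(wchar_t) and of the host byte order.
// Returns false if opening or writing failed (errors on the final close are
// not checked).
bool writeBom(const std::string& path, BomKind kind) {
    std::string bom;
    switch (kind) {
        case BomKind::Utf8:        bom = "\xEF\xBB\xBF"; break;
        case BomKind::Utf16Big:    bom = "\xFE\xFF"; break;
        case BomKind::Utf16Little: bom = "\xFF\xFE"; break;
        case BomKind::Utf32Big:    bom = std::string("\x00\x00\xFE\xFF", 4); break;
        case BomKind::Utf32Little: bom = std::string("\xFF\xFE\x00\x00", 4); break;
    }
    std::ofstream out(path, std::ios::out | std::ios::binary);
    out.write(bom.data(), static_cast<std::streamsize>(bom.size()));
    return static_cast<bool>(out);
}
```

The UTF-32 signatures are built with an explicit length because they contain zero bytes, which would otherwise end the string literal early.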

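Case 2: using standard std::wofstream:

#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17, but still shipped)
#include <fstream>
#include <locale>

int main() {
    const wchar_t BOM = 0xFEFF;
    const char *fname = "abc.txt";

    std::wofstream wfout;
    // wofstream converts each wchar_t to bytes through its locale's codecvt facet.
    // The default "C" locale cannot encode U+FEFF, so imbue a UTF-8 converter
    // before opening; the BOM is then written as EF BB BF.
    wfout.imbue(std::locale(wfout.getloc(), new std::codecvt_utf8<wchar_t>));
    wfout.open(fname, std::ios_base::binary);

    //S1: formatted output
    wfout << BOM;

    //S2 (alternative): unformatted output
    //wfout.put(BOM);
}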
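If UTF-16 little-endian output is wanted rather than UTF-8, one locale-free alternative to Case 2 is to write the BOM and each UTF-16 code unit as explicit bytes through a binary `std::ofstream`. This is a sketch under stated assumptions: the helper name `writeUtf16LE` is hypothetical, and the input is assumed to already be UTF-16 (as `u"..."` literals are).

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write `text` as UTF-16 little-endian, preceded by the
// FF FE BOM. Each char16_t code unit is emitted low byte first, so the output
// does not depend on the host byte order, sizeof(wchar_t), or any codecvt facet.
bool writeUtf16LE(const std::string& path, const std::u16string& text) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    out.put(static_cast<char>(0xFF));                    // BOM, low byte
    out.put(static_cast<char>(0xFE));                    // BOM, high byte
    for (char16_t unit : text) {
        out.put(static_cast<char>(unit & 0xFF));         // low byte
        out.put(static_cast<char>((unit >> 8) & 0xFF));  // high byte
    }
    return static_cast<bool>(out);
}
```

For example, `writeUtf16LE("out.txt", u"Hi")` should produce the six bytes FF FE 48 00 69 00.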

Monday, June 11, 2007
