
How to Print UTF-8 Character Correctly in Windows Console with German Characters?

Published on 2024-11-09


Proper UTF-8 Character Printing in Windows Console

This article addresses a common problem: UTF-8 text, such as German umlauts and ß, is displayed incorrectly when printed to the Windows console.

Issue Description

The following snippet sets the console output code page to UTF-8, converts a UTF-8 string to UTF-16, and prints it with wprintf, yet the German characters do not appear:

#include <wchar.h>    // wprintf
#include <windows.h>  // SetConsoleOutputCP, MultiByteToWideChar, CP_UTF8

int main() {
  SetConsoleOutputCP(CP_UTF8);
  // German characters not appearing
  char const* text = "aäbcdefghijklmnoöpqrsßtuüvwxyz";
  int len = MultiByteToWideChar(CP_UTF8, 0, text, -1, 0, 0);
  wchar_t *unicode_text = new wchar_t[len];
  MultiByteToWideChar(CP_UTF8, 0, text, -1, unicode_text, len);
  wprintf(L"%s", unicode_text);
}

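Despite setting the output code page to UTF-8, the German characters are not printed correctly. The console code page only controls how the console decodes the bytes it receives; it does not change how the C runtime converts wide characters before writing them, so wprintf in its default text mode does not produce UTF-8 bytes for the non-ASCII characters.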

Solution

To print Unicode data correctly in the Windows console, there are several available methods:

  1. Using WriteConsoleW Directly: Call the console API function WriteConsoleW with UTF-16 text. This approach ensures data is written correctly to the console. However, WriteConsoleW works only on console handles, so the program must check whether output goes to a real console or has been redirected to a file or pipe, and handle the two cases separately.
  2. Setting Output Mode: Set the output mode of standard output file descriptors to "_O_U16TEXT" or "_O_U8TEXT" via _setmode. This enables wide character output functions to output Unicode data correctly to the console. Note that this method requires using only wide character functions on the selected stream.
  3. CP_UTF8 Encoding: Print UTF-8 text directly to the console by setting the console output codepage to CP_UTF8 and using appropriate low-level functions or a custom ostream implementation.
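
The first method can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name write_utf8 is hypothetical, it assumes the text is held as UTF-8 in a std::string, and it uses GetConsoleMode as the test for "is this a real console". When output is redirected (or the code is built on another platform), the UTF-8 bytes are passed through unchanged.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: writes UTF-8 text to standard output.
// Returns true if all of the text was written.
bool write_utf8(const std::string& utf8) {
    if (utf8.empty()) return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode succeeds only for a console handle, so it fails
    // when stdout has been redirected to a file or a pipe.
    if (out != nullptr && out != INVALID_HANDLE_VALUE &&
        GetConsoleMode(out, &mode)) {
        const int bytes = static_cast<int>(utf8.size());
        // First call: ask how many UTF-16 code units are needed.
        int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes,
                                       nullptr, 0);
        if (wlen <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(wlen), L'\0');
        // Second call: perform the conversion into the buffer.
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, &wide[0], wlen);
        std::fflush(stdout);  // keep ordering with earlier stdio output
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wlen),
                             &written, nullptr) != 0 &&
               written == static_cast<DWORD>(wlen);
    }
#endif
    // Not a console (or not Windows): pass the UTF-8 bytes through as-is.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
}
```

A call such as write_utf8("a\303\244b\n") would then take the WriteConsoleW path in an interactive console and the plain byte path when the output is piped to a file.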

Troubleshooting

In case of incorrect output with the third method:

putchar('\302'); putchar('\260'); // doesn't work with CP_UTF8

puts("\302\260"); // correctly writes UTF-8 data to Windows console with CP_UTF8 

This is because the two bytes of the character reach the console in separate write calls, and with CP_UTF8 the console treats each lone byte as an illegal encoding instead of combining them into one character.

To resolve this, consider creating a streambuf subclass that accurately handles multibyte character conversion and maintains conversion state between writes.
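
One way such a streambuf might look is sketched below. The class name Utf8ChunkBuf and its sink callback are hypothetical, introduced only for illustration. The sketch does not convert or validate anything; it only holds back a trailing incomplete UTF-8 sequence so that each call to the sink carries whole characters. Malformed bytes are passed through unchanged.

```cpp
#include <cstddef>
#include <functional>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical streambuf that forwards only whole UTF-8 sequences to `sink`.
// An unfinished trailing sequence is kept until its remaining bytes arrive.
class Utf8ChunkBuf : public std::streambuf {
public:
    using Sink = std::function<void(const std::string&)>;
    explicit Utf8ChunkBuf(Sink sink) : sink_(std::move(sink)) {}

protected:
    // Called for single characters, e.g. from std::ostream::put.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            pending_.push_back(traits_type::to_char_type(ch));
            flush_complete();
        }
        return traits_type::not_eof(ch);
    }
    // Called for character runs, e.g. from operator<< on a string.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        pending_.append(s, static_cast<std::size_t>(n));
        flush_complete();
        return n;
    }

private:
    // Sequence length implied by a lead byte; 1 for ASCII and for bytes
    // that cannot start a sequence.
    static std::size_t sequence_length(unsigned char lead) {
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        if ((lead & 0xF8) == 0xF0) return 4;
        return 1;
    }
    void flush_complete() {
        const std::size_t n = pending_.size();
        std::size_t keep = 0;  // bytes of an unfinished trailing sequence
        // Look back at most 4 bytes for the last lead byte.
        for (std::size_t back = 1; back <= 4 && back <= n; ++back) {
            unsigned char c = static_cast<unsigned char>(pending_[n - back]);
            if ((c & 0xC0) == 0x80) continue;  // continuation byte
            if (sequence_length(c) > back) keep = back;  // still incomplete
            break;
        }
        if (n > keep) {
            sink_(pending_.substr(0, n - keep));
            pending_.erase(0, n - keep);
        }
    }

    Sink sink_;
    std::string pending_;  // state carried over between writes
};
```

With a std::ostream constructed on this buffer, os.put('\302') followed by os.put('\260') results in a single sink call containing both bytes. On Windows, the sink could be a function that hands each chunk to the console in one write after SetConsoleOutputCP(CP_UTF8); that wiring is left out here, and any bytes still pending when the buffer is destroyed are dropped in this sketch.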
