Proper UTF-8 Character Printing in Windows Console
This article addresses the problems encountered when printing UTF-8 characters in the Windows console.
Issue Description
The following program tries to print a string containing German characters (ä, ö, ü, ß), but they are not displayed correctly:
#include <stdio.h>
#include <windows.h>

int main() {
    SetConsoleOutputCP(CP_UTF8);
    // The German characters below do not appear correctly
    char const* text = "aäbcdefghijklmnoöpqrsßtuüvwxyz";
    // The first call only returns the required length in wide characters (terminator included)
    int len = MultiByteToWideChar(CP_UTF8, 0, text, -1, 0, 0);
    wchar_t *unicode_text = new wchar_t[len];
    MultiByteToWideChar(CP_UTF8, 0, text, -1, unicode_text, len);
    wprintf(L"%s", unicode_text);
    delete[] unicode_text;
}
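Despite setting the output code page to UTF-8, the German characters are not printed correctly, because by default the wide-character output functions such as wprintf do not handle characters outside the ASCII range on Windows.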
Solution
To print Unicode data correctly in the Windows console, there are several available methods:
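Three approaches commonly used for this are sketched below: calling the console API directly with WriteConsoleW, switching stdout to a Unicode mode with _setmode and _O_U16TEXT, and writing UTF-8 bytes with narrow functions after SetConsoleOutputCP(CP_UTF8). These are assumptions made for illustration and not necessarily the exact methods the article has in mind. The helper names (write_console_utf16, enable_wide_stdout, write_utf8_bytes) are hypothetical, and the Windows calls are guarded so the sketch also compiles on other platforms.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <io.h>     // _setmode
#include <fcntl.h>  // _O_U16TEXT
#endif

// Approach 1 (assumed): write UTF-16 text through the console API.
// Returns false when stdout is not a console (or when not on Windows),
// so the caller can fall back to another output path.
bool write_console_utf16(const std::wstring& text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || !GetConsoleMode(out, &mode)) return false;
    DWORD written = 0;
    return WriteConsoleW(out, text.data(), static_cast<DWORD>(text.size()),
                         &written, nullptr) != 0;
#else
    (void)text;
    return false;
#endif
}

// Approach 2 (assumed): put stdout into UTF-16 text mode. Afterwards only
// wide functions such as wprintf should be used on stdout.
bool enable_wide_stdout() {
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    return false;
#endif
}

// Approach 3 (assumed): select the UTF-8 code page and hand the whole
// UTF-8 string to a narrow function in a single call.
bool write_utf8_bytes(const std::string& utf8) {
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif
    bool ok = std::fputs(utf8.c_str(), stdout) >= 0;
    std::fflush(stdout);
    return ok;
}
```

The second and third approaches should not be mixed on the same stream: once stdout is in a Unicode mode, narrow output functions are no longer usable on it.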
Troubleshooting
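The third method, writing UTF-8 bytes while the console output code page is set to CP_UTF8, can still produce incorrect output depending on how the bytes are written: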
putchar('\302'); putchar('\260'); // doesn't work with CP_UTF8
puts("\302\260"); // correctly writes UTF-8 data to Windows console with CP_UTF8
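This is because the console API converts the data passed in each call on its own. The bytes '\302' and '\260' together form the two-byte UTF-8 encoding of the degree sign (U+00B0), so when they are passed in separate calls with CP_UTF8 active, each piece is treated as an illegal encoding.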
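To make the byte-level picture concrete, here is a minimal sketch of a structural UTF-8 check. The function names utf8_sequence_length and is_whole_utf8 are hypothetical, and the check only looks at lead and continuation bytes; it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Number of bytes a UTF-8 sequence should have, judged from its first byte.
// Returns 0 for a byte that cannot start a sequence (e.g. a continuation byte).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or invalid lead byte
}

// True when `chunk` consists only of whole UTF-8 sequences.
bool is_whole_utf8(const std::string& chunk) {
    std::size_t i = 0;
    while (i < chunk.size()) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(chunk[i]));
        if (len == 0 || i + len > chunk.size()) return false;
        for (std::size_t k = 1; k < len; ++k) {
            // Every byte after the lead must look like 10xxxxxx
            if ((static_cast<unsigned char>(chunk[i + k]) & 0xC0) != 0x80) return false;
        }
        i += len;
    }
    return true;
}
```

With this check, "\302" alone is an unfinished two-byte sequence and "\260" alone is a stray continuation byte, while "\302\260" passed as one chunk is a whole character.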
To resolve this, consider creating a streambuf subclass that handles multibyte character conversion correctly and maintains the conversion state between writes, so that an incomplete character is never passed to the console on its own.