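Accurately determining the encoding of text is crucial when processing data from different sources. A .NET `string` is always UTF-16 in memory, so in practice what has to be detected is the encoding of the raw bytes (for example, a file) that the string is decoded from. This article explores how to do this efficiently in C#.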
There are several ways to determine the encoding of text that does not declare it explicitly. The solution below combines three of them. It first checks for a byte order mark (BOM). If no BOM is found, it uses a detector to heuristically identify common encodings such as UTF-8 and UTF-16. Finally, if no suitable encoding is found, it falls back to the system's default code page.
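BOM detection is cheap and, when a BOM is present, conclusive. The listing below omits this step for brevity, so here is a minimal sketch of it, written as C# top-level statements. `DetectBom` and `StartsWith` are hypothetical helper names, not .NET APIs; the byte signatures are the standard Unicode BOMs for UTF-32, UTF-8 and UTF-16.

```c#
using System.Text;

// Hypothetical helper: returns the encoding announced by a byte order mark (BOM)
// at the start of 'bytes', or null if there is none. 'bomLength' receives the
// size of the BOM so the caller can skip those bytes when decoding.
static Encoding? DetectBom(byte[] bytes, out int bomLength)
{
    // Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE
    // BOM (FF FE), so the 4-byte signatures are tested before the 2-byte ones.
    var signatures = new (byte[] Bom, Encoding Detected)[]
    {
        (new byte[] { 0x00, 0x00, 0xFE, 0xFF }, new UTF32Encoding(bigEndian: true, byteOrderMark: true)), // UTF-32 BE
        (new byte[] { 0xFF, 0xFE, 0x00, 0x00 }, Encoding.UTF32),            // UTF-32 LE
        (new byte[] { 0xEF, 0xBB, 0xBF },       Encoding.UTF8),             // UTF-8
        (new byte[] { 0xFE, 0xFF },             Encoding.BigEndianUnicode), // UTF-16 BE
        (new byte[] { 0xFF, 0xFE },             Encoding.Unicode),          // UTF-16 LE
    };

    foreach (var (bom, encoding) in signatures)
    {
        if (StartsWith(bytes, bom))
        {
            bomLength = bom.Length;
            return encoding;
        }
    }

    bomLength = 0;
    return null;
}

// True if 'data' begins with every byte of 'prefix'.
static bool StartsWith(byte[] data, byte[] prefix)
{
    if (data.Length < prefix.Length) return false;
    for (int i = 0; i < prefix.Length; i++)
        if (data[i] != prefix[i]) return false;
    return true;
}
```

When the helper returns `null`, no BOM was found and the heuristic stage takes over.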
The `detectTextEncoding` method below not only detects the encoding but also returns the decoded text through an `out` parameter, so the caller gets both in a single call.
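Returning the decoded text means the BOM bytes must not end up in it. `Encoding.GetString` decodes every byte it is given, so a UTF-8 BOM survives as an invisible U+FEFF character at the start of the string. The short top-level C# program below (the sample bytes are invented for illustration) shows the difference:

```c#
using System;
using System.Text;

// Sample data: "Hi" encoded as UTF-8 and preceded by the 3-byte UTF-8 BOM (EF BB BF).
byte[] fileBytes = { 0xEF, 0xBB, 0xBF, (byte)'H', (byte)'i' };

// Decoding the whole buffer keeps the BOM as a leading U+FEFF character.
string withBom = Encoding.UTF8.GetString(fileBytes);

// Skipping the preamble bytes (3 for UTF-8) yields only the actual text.
int bomLength = Encoding.UTF8.GetPreamble().Length;
string clean = Encoding.UTF8.GetString(fileBytes, bomLength, fileBytes.Length - bomLength);

Console.WriteLine(withBom.Length); // 3
Console.WriteLine(clean);          // Hi
```

This is why a BOM-aware decoder passes an offset and count to `GetString` rather than decoding the whole buffer.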
The following C# code implements this solution:
```c#
public Encoding detectTextEncoding(string filename, out String text, int taster = 1000)
{
    // Check for a BOM
    // (omitted for brevity)

    // Detector-based encoding detection
    bool utf8 = false;
    int i = 0;
    while (i
    // ...
```
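The listing breaks off at the start of the detection loop. The following is a minimal sketch of what the heuristic stage and the fallback can look like, written as C# top-level statements under a few assumptions: it takes the file's bytes (for example from `File.ReadAllBytes`) instead of a file name, `DetectWithoutBom` is a hypothetical name, the UTF-8 check is deliberately simplified, and the 10% zero-byte threshold for UTF-16 is an arbitrary choice. It illustrates the technique described above rather than reproducing the author's exact code.

```c#
using System.Text;

// Hypothetical heuristic stage, meant to run only after a BOM check has failed.
// 'taster' caps how many bytes are inspected; 0 means "inspect everything".
static Encoding DetectWithoutBom(byte[] b, out string text, int taster = 1000)
{
    if (taster == 0 || taster > b.Length) taster = b.Length;

    // 1. UTF-8: look for multi-byte sequences whose lead and continuation bytes
    //    are in range. Pure ASCII is valid UTF-8 but proves nothing, so 'utf8'
    //    only becomes true once a multi-byte sequence has been seen.
    //    Simplification: overlong forms and encoded surrogates are not rejected.
    bool utf8 = false;
    int i = 0;
    while (i < taster)
    {
        if (b[i] <= 0x7F) { i++; continue; }              // ASCII byte
        int length = b[i] >= 0xC2 && b[i] <= 0xDF ? 2     // 2-byte lead byte
                   : b[i] >= 0xE0 && b[i] <= 0xEF ? 3     // 3-byte lead byte
                   : b[i] >= 0xF0 && b[i] <= 0xF4 ? 4     // 4-byte lead byte
                   : 0;                                   // invalid lead byte
        bool valid = length > 0 && i + length <= b.Length;
        for (int k = 1; valid && k < length; k++)
            valid = b[i + k] >= 0x80 && b[i + k] <= 0xBF; // continuation byte
        if (!valid) { utf8 = false; break; }
        utf8 = true;
        i += length;
    }
    if (utf8)
    {
        text = Encoding.UTF8.GetString(b);
        return Encoding.UTF8;
    }

    // 2. UTF-16 without a BOM: mostly-Latin UTF-16 text has a zero in every other
    //    byte. Zeros at even offsets suggest big-endian, at odd offsets little-endian.
    //    The 10% threshold is an arbitrary assumption.
    const double threshold = 0.1;
    int evenZeros = 0, oddZeros = 0;
    for (int n = 0; n < taster; n++)
    {
        if (b[n] != 0) continue;
        if (n % 2 == 0) evenZeros++; else oddZeros++;
    }
    if (taster > 0 && (double)evenZeros / taster > threshold)
    {
        text = Encoding.BigEndianUnicode.GetString(b);
        return Encoding.BigEndianUnicode;
    }
    if (taster > 0 && (double)oddZeros / taster > threshold)
    {
        text = Encoding.Unicode.GetString(b);
        return Encoding.Unicode;
    }

    // 3. Fallback. Encoding.Default is the system ANSI code page on .NET Framework,
    //    but always UTF-8 on .NET Core and .NET 5+.
    text = Encoding.Default.GetString(b);
    return Encoding.Default;
}
```

Because of that runtime difference, reaching a legacy code page such as Windows-1252 on .NET Core or .NET 5+ requires registering `CodePagesEncodingProvider.Instance` with `Encoding.RegisterProvider` and then calling `Encoding.GetEncoding` explicitly.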
To use this code, pass the file path to `detectTextEncoding`. The method returns the detected encoding and hands back the decoded text through the `out` parameter. Here is an example:
```c#
string text;
Encoding encoding = detectTextEncoding("my_file.txt", out text);
Console.WriteLine("Detected encoding: " + encoding.EncodingName);
Console.WriteLine("Decoded text: " + text);
```

In short, this code determines the encoding of text in C# by combining BOM detection with heuristic checks. The BOM check is definitive when a BOM is present; the heuristics are a best guess, so short or unusual inputs can still be misidentified.