How to Extract UCS-2 Code Points from UTF-8 Characters in PHP?

Posted on 2025-02-06
Determining UCS-2 Code Points for UTF-8 Characters in PHP

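The goal is to extract the UCS-2 code point of each character in a UTF-8 string. UCS-2 covers only the Basic Multilingual Plane (U+0000 to U+FFFF), where its values are identical to the Unicode code points. PHP strings are plain byte sequences, so a small custom function is needed to decode them.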

First, recall how UTF-8 works. Each character is encoded as a sequence of 1 to 4 bytes, depending on its Unicode code point. The bit layout for each sequence length is as follows (each x is a payload bit):

  • 0xxxxxxx: 1 byte
  • 110xxxxx 10xxxxxx: 2 bytes
  • 1110xxxx 10xxxxxx 10xxxxxx: 3 bytes
  • 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 4 bytes
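To make these layouts concrete, the following minimal sketch prints the raw bits of a few characters. The helper name `utf8_bits` is hypothetical and not part of the original article; the characters are written as byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper: render each byte of a string as 8 binary digits.
function utf8_bits(string $s): string
{
    $parts = [];
    foreach (str_split($s) as $byte) {
        $parts[] = sprintf('%08b', ord($byte));
    }
    return implode(' ', $parts);
}

echo utf8_bits("A"), "\n";             // 01000001 (1 byte)
echo utf8_bits("\xC3\xB1"), "\n";      // "ñ": 11000011 10110001 (2 bytes)
echo utf8_bits("\xE2\x82\xAC"), "\n";  // "€": 11100010 10000010 10101100 (3 bytes)
```

The prefixes 0, 110 and 1110 on the first byte, and 10 on every following byte, match the list above.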

To determine the number of bytes in a character, examine the leading bits of its first byte:

  • 0: 1 byte character
  • 110: 2 byte character
  • 1110: 3 byte character
  • 11110: 4 byte character
  • 10: Continuation byte (never the first byte of a character)
  • 11111: Invalid byte
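The classification above can be sketched with bit masks. The function name `utf8_sequence_length` is an assumption made for illustration, and the sketch checks only the leading bits; it does not reject lead bytes that are disallowed for other reasons, such as 0xC0 or 0xC1.

```php
<?php
// Hypothetical helper: classify a byte by its leading bits.
// Returns 1 to 4 for a lead byte, or 0 for a continuation or invalid byte.
function utf8_sequence_length(int $byte): int
{
    if (($byte & 0x80) === 0x00) { return 1; } // 0xxxxxxx
    if (($byte & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($byte & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($byte & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0;                                  // 10xxxxxx or 11111xxx
}

echo utf8_sequence_length(0xC3), "\n"; // first byte of "ñ": 2
```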

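Once the number of bytes is known, bit manipulation extracts the code point: mask off the length prefix of the first byte, then shift in the low 6 bits of each continuation byte. Note that 4-byte sequences encode code points above U+FFFF, which UCS-2 cannot represent.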
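As a worked example of the bit manipulation, here is the 2-byte case done by hand for "ñ" (bytes 0xC3 0xB1). The variable names are illustrative only.

```php
<?php
// Bytes of "ñ" (U+00F1) in UTF-8: 11000011 10110001
$lead = 0xC3;
$cont = 0xB1;

$high = $lead & 0x1F; // drop the 110 prefix, leaving 00011 (3)
$low  = $cont & 0x3F; // drop the 10 prefix, leaving 110001 (49)

// Concatenate the payload bits: 3 * 64 + 49 = 241
$codePoint = ($high << 6) | $low;

printf("U+%04X (%d)\n", $codePoint, $codePoint); // U+00F1 (241)
```

Longer sequences follow the same pattern, with a narrower mask on the first byte and one more 6-bit shift per continuation byte.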

Custom PHP Function:

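Based on the above analysis, here's a custom PHP function that takes a single, well-formed UTF-8 character as input and returns its code point as an integer, or false if the first byte cannot start a character. For characters of up to 3 bytes, the result is also the UCS-2 code point:

function get_ucs2_codepoint($char)
{
    // Initialize the code point
    $codePoint = 0;

    // Get the first byte
    $firstByte = ord($char);

    // Determine the number of bytes and keep the payload bits of the first byte
    if ($firstByte < 0x80) { return $firstByte; }                                          // 0xxxxxxx
    elseif (($firstByte & 0xE0) === 0xC0) { $length = 2; $codePoint = $firstByte & 0x1F; } // 110xxxxx
    elseif (($firstByte & 0xF0) === 0xE0) { $length = 3; $codePoint = $firstByte & 0x0F; } // 1110xxxx
    elseif (($firstByte & 0xF8) === 0xF0) { $length = 4; $codePoint = $firstByte & 0x07; } // 11110xxx
    else { return false; }                                                                 // continuation or invalid byte
    // Shift in the low 6 bits of each continuation byte
    for ($i = 1; $i < $length; $i++) {
        $codePoint = ($codePoint << 6) | (ord($char[$i]) & 0x3F);
    }
    return $codePoint;
}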

Example Usage:

To use the function, pass it a single UTF-8 character:

$char = "ñ";
$codePoint = get_ucs2_codepoint($char);
echo "UCS-2 code point: $codePoint\n";

Output:

UCS-2 code point: 241
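For a string with more than one character, the same decoding can be applied in a loop over byte offsets. The sketch below is one possible approach, not part of the original function: the name `utf8_code_points` is hypothetical, the input is assumed to be well-formed UTF-8 (stray bytes are skipped rather than repaired), and each result is checked against the UCS-2 limit of U+FFFF.

```php
<?php
// Hypothetical helper: decode every character of a UTF-8 string.
// Assumes well-formed input; stray or invalid bytes are skipped.
function utf8_code_points(string $s): array
{
    $points = [];
    $n = strlen($s);
    for ($i = 0; $i < $n; ) {
        $b = ord($s[$i]);
        if ($b < 0x80)                { $len = 1; $cp = $b; }
        elseif (($b & 0xE0) === 0xC0) { $len = 2; $cp = $b & 0x1F; }
        elseif (($b & 0xF0) === 0xE0) { $len = 3; $cp = $b & 0x0F; }
        elseif (($b & 0xF8) === 0xF0) { $len = 4; $cp = $b & 0x07; }
        else                          { $i++; continue; } // continuation or invalid byte
        if ($i + $len > $n) { break; } // truncated sequence at end of input
        for ($j = 1; $j < $len; $j++) {
            $cp = ($cp << 6) | (ord($s[$i + $j]) & 0x3F);
        }
        $points[] = $cp;
        $i += $len;
    }
    return $points;
}

// "a", "ñ", "€" and an emoji (U+1F600), written as byte escapes
foreach (utf8_code_points("a\xC3\xB1\xE2\x82\xAC\xF0\x9F\x98\x80") as $cp) {
    $note = $cp <= 0xFFFF ? 'fits in UCS-2' : 'outside UCS-2';
    printf("U+%04X %s\n", $cp, $note);
}
```

The first three characters fit in UCS-2; the emoji decodes to U+1F600, which lies beyond U+FFFF and would need a surrogate pair in UTF-16.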