
How to Load 8 Chars into an __m256 Variable as Packed Single Precision Floats?

Published on 2024-11-06

Loading 8 Chars from Memory into an __m256 Variable as Packed Single Precision Floats

To optimize a Gaussian blur algorithm, the goal is to replace a float buffer with an __m256 intrinsic variable. The question is which instructions are best for loading 8 chars from memory and converting them to packed single-precision floats.

Instructions for AVX2 Architecture:

  • Use VPMOVZXBD to zero-extend the 8 chars into 32-bit integers in a 256-bit register.
  • Convert to float in place with VCVTDQ2PS.

    ; rsi = new_image
    VPMOVZXBD   ymm0,  [rsi]   ; or SX to sign-extend  (Byte to DWord)
    VCVTDQ2PS   ymm0, ymm0     ; convert to packed float

Additional Strategies:

  • When converting 16 bytes at a time, consider a 128-bit broadcast load that feeds vpmovzxbd ymm,xmm for the low 64 bits and vpshufb ymm (_mm256_shuffle_epi8) for the high 64 bits. This approach reduces the uop count and can be beneficial on Ryzen CPUs.
  • Avoid adding extra shuffle instructions when shuffle throughput is already the bottleneck.
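
The broadcast-load strategy can be sketched with intrinsics roughly as follows. This is a minimal illustration, not a tuned implementation: the function name u8x16_to_float and the choice to store the results to a float array are assumptions made for the example, and the target attribute is a GCC/Clang extension (compiling with -mavx2 would be the alternative).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: convert 16 unsigned bytes at src into 16 floats at dst. */
__attribute__((target("avx2")))
void u8x16_to_float(const uint8_t *src, float *dst)
{
    /* 128-bit broadcast load: both 128-bit lanes hold the same 16 bytes. */
    __m256i both = _mm256_broadcastsi128_si256(
        _mm_loadu_si128((const __m128i *)src));

    /* Low 8 bytes: vpmovzxbd ymm, xmm reads the low 64 bits of the source. */
    __m256i lo = _mm256_cvtepu8_epi32(_mm256_castsi256_si128(both));

    /* High 8 bytes: vpshufb works within each 128-bit lane. Lane 0 picks
       bytes 8..11 and lane 1 picks bytes 12..15; an index of -1 zeroes the
       destination byte, which gives the zero-extension to 32 bits. */
    const __m256i ctrl = _mm256_setr_epi8(
         8, -1, -1, -1,   9, -1, -1, -1,  10, -1, -1, -1,  11, -1, -1, -1,
        12, -1, -1, -1,  13, -1, -1, -1,  14, -1, -1, -1,  15, -1, -1, -1);
    __m256i hi = _mm256_shuffle_epi8(both, ctrl);

    /* vcvtdq2ps on each half, then store 8 floats apiece. */
    _mm256_storeu_ps(dst,     _mm256_cvtepi32_ps(lo));
    _mm256_storeu_ps(dst + 8, _mm256_cvtepi32_ps(hi));
}
```

In a real blur kernel the two converted vectors would normally stay in registers rather than being stored immediately; the stores here only make the result easy to inspect.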

Instructions for AVX1 Architecture:

  • Perform the following steps:

    VPMOVZXBD   xmm0,  [rsi]
    VPMOVZXBD   xmm1,  [rsi+4]
    VINSERTF128 ymm0, ymm0, xmm1, 1   ; put the 2nd load of data into the high128 of ymm0
    VCVTDQ2PS   ymm0, ymm0     ; convert to packed float
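
One possible intrinsics rendering of the AVX1 sequence above is sketched here. The function name u8x8_to_float_avx1, the memcpy-based 4-byte loads, and the store to a float array are assumptions made for the example; the target attribute is a GCC/Clang extension, and whether a given compiler folds the 4-byte loads into the VPMOVZXBD memory operand is not guaranteed.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert 8 unsigned bytes at src into 8 floats at dst
   using only AVX1 (no 256-bit integer instructions). */
__attribute__((target("avx")))
void u8x8_to_float_avx1(const uint8_t *src, float *dst)
{
    int32_t lo4, hi4;
    memcpy(&lo4, src, 4);       /* bytes 0..3, alignment-safe load */
    memcpy(&hi4, src + 4, 4);   /* bytes 4..7 */

    /* vpmovzxbd xmm: zero-extend 4 bytes to 4 dwords in each half. */
    __m128i lo = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(lo4));
    __m128i hi = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(hi4));

    /* vinsertf128: place the second half into the high 128 bits. */
    __m256i v = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);

    /* vcvtdq2ps ymm: convert the 8 dwords to packed floats. */
    _mm256_storeu_ps(dst, _mm256_cvtepi32_ps(v));
}
```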

Intrinsics Considerations:

  • GCC and MSVC may require special handling to ensure optimal code generation when using intrinsics for VPMOVZXBD ymm,[mem].
  • Consider using the _mm_loadl_epi64 intrinsic for the load; GCC 9 and later can fold it into the memory operand of VPMOVZXBD, giving optimal asm at -O3.
  • For AVX1-only targets, an intrinsics version is possible but tedious to write.
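
A minimal sketch of the AVX2 path with intrinsics, using _mm_loadl_epi64 for the 8-byte load. The names load_8_u8_as_ps and store_8_u8_as_float are hypothetical, the store wrapper exists only so the result can be checked from ordinary code, and the target attribute is a GCC/Clang extension (compiling with -mavx2 would be the alternative).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: load 8 unsigned bytes and return them as 8 packed
   single-precision floats. */
__attribute__((target("avx2")))
static inline __m256 load_8_u8_as_ps(const uint8_t *p)
{
    /* 64-bit load into the low half of an xmm register. */
    __m128i bytes = _mm_loadl_epi64((const __m128i *)p);
    /* vpmovzxbd: zero-extend 8 bytes to 8 dwords. Use
       _mm256_cvtepi8_epi32 instead to sign-extend signed chars. */
    __m256i dwords = _mm256_cvtepu8_epi32(bytes);
    /* vcvtdq2ps: convert the dwords to packed floats. */
    return _mm256_cvtepi32_ps(dwords);
}

/* Hypothetical wrapper: write the converted values to a float array. */
__attribute__((target("avx2")))
void store_8_u8_as_float(const uint8_t *src, float *dst)
{
    _mm256_storeu_ps(dst, load_8_u8_as_ps(src));
}
```

Inside a blur loop, the __m256 returned by load_8_u8_as_ps would typically feed the multiply-accumulate steps directly, so the float buffer is no longer needed.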