Loading 8 Chars from Memory into an __m256 Variable as Packed Single Precision Floats
To optimize a Gaussian blur, you want to replace a float buffer with an __m256 variable: load 8 unsigned chars (pixel values) from memory and convert them to 8 packed single-precision floats. The question is which instructions do this most efficiently.
Instructions for AVX2:
    ; rsi = pointer to 8 bytes of new_image
    VPMOVZXBD  ymm0, [rsi]     ; zero-extend 8 bytes to 8 dwords (use VPMOVSXBD to sign-extend)
    VCVTDQ2PS  ymm0, ymm0      ; convert packed int32 to packed float
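The same AVX2 sequence can be written with intrinsics. This is a minimal sketch assuming x86-64 with GCC or Clang; the `target("avx2")` attribute lets it compile without `-mavx2`, and the names `load8_u8_as_ps` and `load8_u8_to_f32` are hypothetical. `_mm_loadl_epi64` reads exactly 8 bytes, and the pair `_mm256_cvtepu8_epi32` / `_mm256_cvtepi32_ps` maps to VPMOVZXBD / VCVTDQ2PS (whether the load is folded into VPMOVZXBD's memory operand depends on the compiler).

```c
#include <immintrin.h>  /* x86 SIMD intrinsics */
#include <stdint.h>

/* Zero-extend 8 unsigned bytes to 8 int32 lanes, then convert to float.
 * _mm_loadl_epi64 is a 64-bit load, so it never reads past src[7]. */
static inline __attribute__((target("avx2")))
__m256 load8_u8_as_ps(const uint8_t *src)
{
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src); /* low 8 bytes */
    __m256i ints  = _mm256_cvtepu8_epi32(bytes);            /* VPMOVZXBD */
    return _mm256_cvtepi32_ps(ints);                         /* VCVTDQ2PS */
}

/* Hypothetical wrapper: stores the __m256 to a float[8] so callers built
 * without AVX2 never pass an __m256 across a function boundary. */
__attribute__((target("avx2")))
void load8_u8_to_f32(const uint8_t *src, float dst[8])
{
    _mm256_storeu_ps(dst, load8_u8_as_ps(src));
}
```

In a blur kernel you would keep using `load8_u8_as_ps` inside other `target("avx2")` code and only store to memory at the end.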
Additional Strategies:
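Instructions for AVX1:
AVX1 has no 256-bit integer instructions, so VPMOVZXBD cannot write a ymm register. Instead, widen each group of 4 bytes into an xmm register and combine the two halves: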
    VPMOVZXBD   xmm0, [rsi]          ; zero-extend bytes 0..3 to 4 dwords
    VPMOVZXBD   xmm1, [rsi+4]        ; zero-extend bytes 4..7 to 4 dwords
    VINSERTF128 ymm0, ymm0, xmm1, 1  ; put the second half into the high 128 bits of ymm0
    VCVTDQ2PS   ymm0, ymm0           ; convert packed int32 to packed float
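An intrinsics version of the AVX1 sequence might look like this. It is a sketch assuming x86-64 with GCC or Clang; `target("avx")` also enables the SSE4.1 `_mm_cvtepu8_epi32`, and the function names are hypothetical. Each 4-byte half is loaded with `memcpy` to avoid alignment and aliasing issues, widened in an XMM register, then merged with `_mm256_insertf128_si256` (VINSERTF128).

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>   /* memcpy for an alignment-safe 4-byte load */

/* AVX1 path: widen bytes 0..3 and 4..7 separately, then glue the halves. */
static inline __attribute__((target("avx")))
__m256 load8_u8_as_ps_avx1(const uint8_t *src)
{
    int32_t lo_bits, hi_bits;
    memcpy(&lo_bits, src, 4);        /* bytes 0..3 */
    memcpy(&hi_bits, src + 4, 4);    /* bytes 4..7 */
    __m128i lo = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(lo_bits)); /* VPMOVZXBD xmm */
    __m128i hi = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(hi_bits));
    /* Low half stays in lane 0; hi goes into the upper 128 bits. */
    __m256i both = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
    return _mm256_cvtepi32_ps(both);                              /* VCVTDQ2PS */
}

/* Hypothetical wrapper that stores the result to a float[8]. */
__attribute__((target("avx")))
void load8_u8_to_f32_avx1(const uint8_t *src, float dst[8])
{
    _mm256_storeu_ps(dst, load8_u8_as_ps_avx1(src));
}
```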
Intrinsics Considerations:
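Two points worth keeping in mind with intrinsics. First, if the source data is signed char, use the sign-extending `_mm256_cvtepi8_epi32` (VPMOVSXBD) so that a byte of -1 becomes -1.0f rather than 255.0f. Second, prefer the 8-byte `_mm_loadl_epi64` over a 16-byte `_mm_loadu_si128`, which would read 8 bytes past the data you need and could touch an unmapped page at the end of the image. A sketch of the signed variant, assuming x86-64 with GCC or Clang and hypothetical function names:

```c
#include <immintrin.h>
#include <stdint.h>

/* Sign-extend 8 signed bytes to int32 (VPMOVSXBD), then convert to float. */
static inline __attribute__((target("avx2")))
__m256 load8_s8_as_ps(const int8_t *src)
{
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src); /* exactly 8 bytes */
    return _mm256_cvtepi32_ps(_mm256_cvtepi8_epi32(bytes));
}

/* Hypothetical wrapper that stores the result to a float[8]. */
__attribute__((target("avx2")))
void load8_s8_to_f32(const int8_t *src, float dst[8])
{
    _mm256_storeu_ps(dst, load8_s8_as_ps(src));
}
```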