"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
When Should You Use _mm_sfence, _mm_lfence, and _mm_mfence?

Posted on 2025-02-06

Multi-threaded programming introduces concurrency-related complexities, necessitating mechanisms to maintain data integrity and synchronization. Intel's intrinsics library provides several functions, including _mm_sfence, _mm_lfence, and _mm_mfence, to control memory ordering in x86 architectures.

Memory Ordering in x86

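x86 CPUs have a strongly ordered memory model, but C and C++ have weaker ones, so the compiler is free to reorder memory accesses that the hardware would have kept in order. The required ordering therefore has to be expressed in the source, normally through std::atomic rather than through fence intrinsics. At the hardware level, the only reordering x86 permits for ordinary (write-back) memory is StoreLoad: a later load can read its value before an earlier store has become globally visible.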
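To make the point about the language-level model concrete, here is a minimal sketch of publishing data with release/acquire ordering and no fence intrinsics at all. The names payload, ready, produce, and try_consume are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state, for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // publication flag

// Producer side: write the data, then publish it with a release store.
// On x86 both stores are ordinary mov instructions; the release ordering
// mainly stops the compiler from moving the payload store below the flag store.
void produce(int value) {
    // Single-shot protocol: rewriting payload after publishing would be a data race.
    assert(!ready.load(std::memory_order_relaxed));
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Consumer side: returns true and fills `out` only once the flag is visible.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) {
        return false;             // not published yet
    }
    out = payload;                // ordered after the acquire load
    return true;
}
```

In a real program produce and try_consume would run on different threads, with the consumer polling until try_consume returns true.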

_mm_sfence

_mm_sfence is primarily used after non-temporal (NT) stores (_mm_stream_*), typically just before setting a flag that other threads will check. NT stores are weakly ordered, meaning they can become globally visible out of order relative to other stores. _mm_sfence creates a barrier that ensures every store issued before it, including the NT stores, becomes globally visible before any store issued after it.
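The pattern described above might look like the following sketch. The buffer, its size, and the function names are hypothetical. The sketch also assumes the compiler does not move the flag store above _mm_sfence, which holds for mainstream compilers in practice but is not something the C++ standard itself promises for intrinsics.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Hypothetical buffer and flag, for illustration only.
constexpr std::size_t kCount = 1024;
static_assert(kCount % 4 == 0, "each NT store writes four 32-bit values");
alignas(16) std::int32_t buffer[kCount];
std::atomic<bool> buffer_ready{false};

// Fill the buffer with non-temporal stores, then publish it.
void fill_and_publish(std::int32_t value) {
    // _mm_stream_si128 requires a 16-byte aligned destination.
    assert(reinterpret_cast<std::uintptr_t>(buffer) % 16 == 0);
    const __m128i v = _mm_set1_epi32(value);
    for (std::size_t i = 0; i < kCount; i += 4) {
        _mm_stream_si128(reinterpret_cast<__m128i*>(buffer + i), v);
    }
    _mm_sfence();  // make the NT stores globally visible before the flag store
    buffer_ready.store(true, std::memory_order_release);
}

// Reader side: checks the flag first, then inspects the data.
bool buffer_is_filled_with(std::int32_t value) {
    if (!buffer_ready.load(std::memory_order_acquire)) {
        return false;
    }
    for (std::size_t i = 0; i < kCount; ++i) {
        if (buffer[i] != value) return false;
    }
    return true;
}
```

Without the _mm_sfence call, a reader on another core could observe buffer_ready as true while some of the streamed data is still sitting in a write-combining buffer.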

_mm_lfence

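_mm_lfence is rarely needed as a load fence, because ordinary loads on x86 are already ordered with respect to each other. As a memory barrier it only has relevance when loading from Write-Combining (WC) memory regions, such as video RAM. Separately, _mm_lfence prevents later instructions from starting to execute until all earlier instructions have completed locally, which can be useful for microbenchmarking.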
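As a rough sketch of the microbenchmarking use, the timestamp counter can be read between two _mm_lfence calls so that the timed work does not overlap with the reads. The helper names are hypothetical, and the <x86intrin.h> header is an assumption that fits GCC and Clang (MSVC provides __rdtsc through <intrin.h>).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <x86intrin.h>  // assumption: GCC/Clang; MSVC uses <intrin.h>

// Read the timestamp counter with an lfence on each side.
inline std::uint64_t fenced_rdtsc() {
    _mm_lfence();                  // wait for earlier instructions to complete
    const std::uint64_t t = __rdtsc();
    _mm_lfence();                  // keep later instructions from starting early
    return t;
}

// Time a single run of `work` in TSC ticks (reference cycles, not core cycles).
template <typename F>
std::uint64_t time_cycles(F&& work) {
    const std::uint64_t start = fenced_rdtsc();
    work();
    const std::uint64_t stop = fenced_rdtsc();
    return stop - start;
}

// Repeat the measurement and keep the smallest value to reduce noise.
template <typename F>
std::uint64_t min_cycles(F&& work, int samples) {
    assert(samples > 0);
    std::uint64_t best = UINT64_MAX;
    for (int i = 0; i < samples; ++i) {
        best = std::min(best, time_cycles(work));
    }
    return best;
}
```

The numbers include the overhead of the fences and of rdtsc itself, so they are best used to compare alternatives rather than as absolute costs.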

_mm_mfence

_mm_mfence is a full barrier: it ensures that subsequent loads cannot read values until after preceding stores have become globally visible, which is the ordering that sequential consistency requires. It can be useful if you implement your own version of std::atomic or need to block the StoreLoad reordering that x86 otherwise permits.
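A small sketch of where the StoreLoad barrier matters: two threads each raise their own flag and then check the other's. The names flag_a, flag_b, try_enter, and leave are hypothetical, and the sketch assumes the compiler treats _mm_mfence as a compiler barrier too, as mainstream compilers do in practice. The portable spelling would be std::atomic_thread_fence(std::memory_order_seq_cst).

```cpp
#include <atomic>
#include <cassert>
#include <immintrin.h>

// Hypothetical per-thread flags, for illustration only.
std::atomic<int> flag_a{0};
std::atomic<int> flag_b{0};

// Raise our own flag, then check the other thread's flag.
// Returns true if the other thread has not announced itself.
bool try_enter(std::atomic<int>& mine, const std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);
    _mm_mfence();  // the store must be globally visible before the load runs
    return theirs.load(std::memory_order_relaxed) == 0;
}

// Lower our flag again, whether or not try_enter succeeded.
void leave(std::atomic<int>& mine) {
    assert(mine.load(std::memory_order_relaxed) == 1);
    mine.store(0, std::memory_order_release);
}
```

With the fence in place, two threads calling try_enter(flag_a, flag_b) and try_enter(flag_b, flag_a) concurrently cannot both get true. Without it, each load could execute while the other thread's store is still in its store buffer, and both could succeed.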

Summary

  • Use _mm_sfence after NT stores, before publishing a flag or pointer that tells other threads the data is ready.
  • Avoid _mm_lfence for load ordering unless specifically working with WC memory regions.
  • _mm_mfence offers sequential consistency but may be less efficient than locked atomic read-modify-write operations.
  • Consider using C++11 std::atomic or C11 stdatomic.h for memory synchronization, as they provide a more convenient and optimized approach.