Tricky Golang interview questions - Part 7: Data Race

Here is another code review interview question for you. This question is more advanced than the previous ones and is targeted toward a more senior audience. The problem requires knowledge of slices and sharing data between parallel processes.

If you're not familiar with slices and how they are constructed, please check out my previous article about the Slice Header.

What is a Data Race?

A data race occurs when two or more threads (or goroutines, in the case of Go) concurrently access shared memory, and at least one of those accesses is a write operation. If there are no proper synchronization mechanisms (such as locks or channels) in place to manage access, the result can be unpredictable behavior, including corruption of data, inconsistent states, or crashes.

In essence, a data race happens when:

  • Two or more threads (or goroutines) access the same memory location at the same time.
  • At least one of the threads (or goroutines) is writing to that memory.
  • There is no synchronization to control the access to that memory.

Because of this, the order in which the threads or goroutines access or modify the shared memory is unpredictable, leading to non-deterministic behavior that can vary between runs.

      ----------------------        --------------------- 
     | Thread A: Write      |      | Thread B: Read      |
      ----------------------        --------------------- 
     | 1. Reads x           |      | 1. Reads x          |
     | 2. Adds 1 to x       |      |                     |
     | 3. Writes new value  |      |                     |
      ----------------------        --------------------- 

                    Shared variable x
                    (Concurrent access without synchronization)

Here, Thread A is modifying x (writing to it), while Thread B is reading it at the same time. If both threads are running concurrently and there’s no synchronization, Thread B could read x before Thread A has finished updating it. As a result, the data could be incorrect or inconsistent.
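To make the scenario above concrete, here is a minimal, self-contained sketch (my illustration, not part of the code under review) with an unsynchronized writer and reader; running it with go run -race reports the conflict:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    x := 0

    wg.Add(2)

    // "Thread A": writes x without any synchronization.
    go func() {
        defer wg.Done()
        for i := 0; i < 1000; i++ {
            x++ // unsynchronized read-modify-write on shared memory
        }
    }()

    // "Thread B": reads x concurrently with the writer.
    go func() {
        defer wg.Done()
        for i := 0; i < 1000; i++ {
            _ = x // unsynchronized read of the same memory
        }
    }()

    wg.Wait()
    fmt.Println("final x:", x) // the final value is not guaranteed; -race flags the access pattern
}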

Question: One of your teammates submitted the following code for a code review. Please review the code carefully and identify any potential issues.
Here is the code you have to review:

package main

import (
    "bufio"
    "bytes"
    "io"
    "math/rand"
    "time"
)

func genData() []byte {
    r := rand.New(rand.NewSource(time.Now().Unix()))
    buffer := make([]byte, 512)
    if _, err := r.Read(buffer); err != nil {
        return nil
    }
    return buffer
}

func publish(input []byte, output chan<- []byte) {
    reader := bytes.NewReader(input)
    buffer := make([]byte, 8) // one shared 8-byte buffer, reused for every chunk
    for {
        n, err := reader.Read(buffer)
        if err == io.EOF {
            break // end of the input data
        }
        if err != nil {
            break // any other read error also stops publishing
        }
        output <- buffer[:n] // sends a slice that still points at the shared buffer
    }
    close(output) // signal the consumers that no more chunks are coming
}

func consume(chunk []byte) {
    scanner := bufio.NewScanner(bytes.NewReader(chunk))
    for scanner.Scan() {
        b := scanner.Bytes() // current token of the chunk
        _ = b                // placeholder for real processing
    }
}

What do we have here?

The publish() function is responsible for reading the input data chunk by chunk and sending each chunk to the output channel. It begins by using bytes.NewReader(input) to create a reader from the input data, which allows the data to be read sequentially. A buffer of size 8 is created to hold each chunk of data as it’s being read from the input. During each iteration, reader.Read(buffer) reads up to 8 bytes from the input, and the function then sends a slice of this buffer (buffer[:n]) containing up to 8 bytes to the output channel. The loop continues until reader.Read(buffer) either encounters an error or reaches the end of the input data.

The consume() function handles the data chunks received from the channel. It processes these chunks using a bufio.Scanner, which scans each chunk of data, potentially breaking it into lines or tokens depending on how it’s configured. The variable b := scanner.Bytes() retrieves the current token being scanned. This function represents a basic input processing.

The main() function creates a buffered channel chunkChannel with a capacity equal to workersCount, which is set to 4 in this case. It then launches 4 worker goroutines, each of which reads data from the chunkChannel concurrently. Every time a worker receives a chunk of data, it processes the chunk by calling the consume() function. The publish() function reads the generated data, breaks it into chunks of up to 8 bytes, and sends them to the channel.

The program uses goroutines to create multiple consumers, allowing for concurrent data processing. Each consumer runs in a separate goroutine, processing chunks of data independently.
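One piece is missing from the listing above: the main() function was cut off in this copy of the article. Based on the description just given (workersCount set to 4, a buffered chunkChannel, four workers calling consume(), and publish() feeding the channel), a reconstruction could look roughly like this; the sync.WaitGroup, and the extra "sync" import it requires, is my assumption about how the program waits for its workers:

const workersCount = 4

func main() {
    chunkChannel := make(chan []byte, workersCount)

    var wg sync.WaitGroup // assumed shutdown mechanism; add "sync" to the imports
    for i := 0; i < workersCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for chunk := range chunkChannel { // each worker drains the channel concurrently
                consume(chunk)
            }
        }()
    }

    publish(genData(), chunkChannel) // produce chunks; publish closes the channel when done
    wg.Wait()                        // wait for the workers to finish processing
}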

If you run this code, nothing suspicious will happen:

[Running] go run "main.go"

[Done] exited with code=0 in 0.94 seconds

But there is a problem: a data race risk. In this code, there's a potential data race because the publish() function reuses the same buffer slice for each chunk. The consumers read from this buffer concurrently, and since slices share their underlying memory, multiple goroutines end up accessing the same memory at once, leading to a data race. Let's try race detection. Go provides a built-in tool for exactly this: the race detector. You can enable it by running your program with the -race flag:

go run -race main.go
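The -race flag is not limited to go run. The same detector can be enabled for tests and regular builds as well (the package path and output name below are placeholders):

go test -race ./...
go build -race -o main_race .

In all cases the toolchain instruments memory accesses at build time and reports conflicting accesses while the program runs.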

If we add the -race flag to the run command we will receive the following output:

[Running] go run -race "main.go"

==================
WARNING: DATA RACE
Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44  0xcc
  bufio.(*Scanner).Scan()
      /GOROOT/go1.22.0/src/bufio/scan.go:219  0xef4
  main.consume()
      /GOPATH/example/main.go:40  0x140
  main.main.func1()
      /GOPATH/example/main.go:55  0x48

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44  0x168
  main.publish()
      /GOPATH/example/main.go:27  0xe4
  main.main()
      /GOPATH/example/main.go:60  0xdc

Goroutine 6 (running) created at:
  main.main()
      /GOPATH/example/main.go:53  0x50
==================
Found 1 data race(s)
exit status 66


The warning you’re seeing is a classic data race detected by Go’s race detector. The warning message indicates that two goroutines are accessing the same memory location (0x00c00011e018) concurrently. One goroutine is reading from this memory, while another goroutine is writing to it at the same time, without proper synchronization.

The first part of the warning tells us that Goroutine 6 (which is one of the worker goroutines in your program) is reading from the memory address 0x00c00011e018 during a call to bufio.Scanner.Scan() inside the consume() function.

Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
  /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
  /GOROOT/go1.22.0/src/bytes/reader.go:44  0xcc
  bufio.(*Scanner).Scan()
  /GOROOT/go1.22.0/src/bufio/scan.go:219  0xef4
  main.consume()
  /GOPATH/example/main.go:40  0x140
  main.main.func1()
  /GOPATH/example/main.go:55  0x48

The second part of the warning shows that the main goroutine previously wrote to the same memory location (0x00c00011e018) during a call to bytes.Reader.Read() inside the publish() function.

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
  /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
  /GOROOT/go1.22.0/src/bytes/reader.go:44  0x168
  main.publish()
  /GOPATH/example/main.go:27  0xe4
  main.main()
  /GOPATH/example/main.go:60  0xdc

The final part of the warning explains that Goroutine 6 was created in the main function.

Goroutine 6 (running) created at:
  main.main()
  /GOPATH/example/main.go:53  0x50

In this case, while one goroutine (Goroutine 6) is reading from the buffer in consume(), the publish() function in the main goroutine is simultaneously writing to the same buffer, leading to the data race.

 -------------------                 -------------------- 
|     Publisher     |               |      Consumer      |
 -------------------                 -------------------- 
        |                                   |
        v                                   |
1. Read data into buffer                    |
        |                                   |
        v                                   |
2. Send slice of buffer to chunkChannel     |
        |                                   |
        v                                   |
  ----------------                          |
 |  chunkChannel  |                         |
  ----------------                          |
        |                                   |
        v                                   |
3. Consume reads from slice                 |
                                            v
                                    4. Concurrent access
                                    (Data Race occurs)

Why the Data Race Occurs

The data race in this code arises because of how Go slices work and how memory is shared between goroutines when a slice is reused. To fully understand this, let’s break it down into two parts: the behavior of the buffer slice and the mechanics of how the race occurs. When you pass a slice like buffer[:n] to a function or channel, what you are really passing is the slice header which contains a reference to the slice’s underlying array. Any modifications to the slice or the underlying array will affect all other references to that slice.
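A tiny standalone example of that sharing (my illustration, not the reviewed code): two slice values built on the same backing array see each other's writes:

package main

import "fmt"

func main() {
    buffer := []byte("abcdefgh")
    view := buffer[:4] // copies the slice header only, not the bytes

    buffer[0] = 'Z' // a write through one reference...

    fmt.Println(string(view)) // ...is visible through the other: prints "Zbcd"
}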

buffer = [ a, b, c, d, e, f, g, h ]
           ^
           buffer[:n] is only a slice header pointing into this same backing array

func publish(input []byte, output chan<- []byte) {
    reader := bytes.NewReader(input)
    buffer := make([]byte, 8)
    for {
        n, err := reader.Read(buffer)
        if err == io.EOF {
            break
        }
        output <- buffer[:n] // the slice header sent to the channel still points at buffer
    }
    close(output)
}

If you send buffer[:n] to a channel, both the publish() function and any consumer goroutines will be accessing the same memory. During each iteration, the reader.Read(buffer) function reads up to 8 bytes from the input data into this buffer slice. After reading, the publisher sends buffer[:n] to the output channel, where n is the number of bytes read in the current iteration.

The problem here is that buffer is reused across iterations. Every time reader.Read() is called, it overwrites the data stored in buffer.

  • Iteration 1: The publish() function reads the first 8 bytes into buffer and sends buffer[:n] (say, [a, b, c, d, e, f, g, h]) to the channel.
  • Iteration 2: The publish() function overwrites the buffer with the next 8 bytes, let’s say [i, j, k, l, m, n, o, p], and sends buffer[:n] again.

At this point, if one of the worker goroutines is still processing the first chunk, it is now reading stale or corrupted data because the buffer has been overwritten by the second chunk. Reusing a slice means sharing the same memory.
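The effect can be demonstrated even without goroutines (an illustrative sketch, not the reviewed code): the receiver holds a slice header into the producer's buffer, so the second write shows through the chunk it already received:

package main

import "fmt"

func main() {
    ch := make(chan []byte, 1)
    buffer := make([]byte, 8)

    copy(buffer, "AAAAAAAA")
    ch <- buffer[:8] // "iteration 1": send a view of the shared buffer

    copy(buffer, "BBBBBBBB") // "iteration 2": reuse the buffer before the chunk is read

    chunk := <-ch
    fmt.Println(string(chunk)) // prints "BBBBBBBB": the first chunk's bytes are gone
}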

How to fix the Data Race?

To avoid the race condition, we must ensure that each chunk of data sent to the channel has its own independent memory. This can be achieved by creating a new slice for each chunk and copying the data from the buffer to this new slice. The key fix is to copy the contents of the buffer into a new slice before sending it to the chunkChannel:

chunk := make([]byte, n)    // Step 1: Create a new slice with its own memory
copy(chunk, buffer[:n])     // Step 2: Copy data from buffer to the new slice
output <- chunk             // Step 3: Send the independent copy to the channel



Why does this fix work? By creating a new slice (chunk) for each iteration, you ensure that each chunk has its own memory. This prevents the consumers from reading from the buffer that the publisher is still modifying. The copy() function copies the contents of the buffer into the newly allocated slice (chunk), which decouples the memory used by each chunk from the buffer. Now, when the publisher reads new data into the buffer, it doesn't affect the chunks that have already been sent to the channel.
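For what it's worth, the explicit make + copy pair can also be written more compactly. These equivalents (my additions, not part of the original review) allocate a fresh backing array in the same way; bytes.Clone needs Go 1.20 or newer:

chunk := append([]byte(nil), buffer[:n]...) // allocate-and-copy in a single expression

chunk := bytes.Clone(buffer[:n])            // Go 1.20+: same effect via the standard library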

 -------------------------             ------------------------ 
|  Publisher (New Memory) |           | Consumers (Read Copy)  |
|  [ a, b, c ] --> chunk1 |           |  Reading: chunk1       |
|  [ d, e, f ] --> chunk2 |           |  Reading: chunk2       |
 -------------------------             ------------------------ 
         ↑                                    ↑
        (1)                                  (2)
   Publisher Creates New Chunk          Consumers Read Safely

The reason this solution works is that it breaks the connection between the publisher and the consumers by eliminating shared memory. Each consumer now works on its own copy of the data, which the publisher does not modify. Here's how the modified publish() function looks:

func publish(input []byte, output chan<- []byte) {
    reader := bytes.NewReader(input)
    buffer := make([]byte, 8)
    for {
        n, err := reader.Read(buffer)
        if err == io.EOF {
            break
        }
        if err != nil {
            break
        }
        chunk := make([]byte, n) // fresh backing array for every chunk
        copy(chunk, buffer[:n])  // copy the bytes out of the reused buffer
        output <- chunk          // consumers receive an independent copy
    }
    close(output)
}

Summary

Slices Are Reference Types:
As mentioned earlier, a Go slice behaves like a reference type: the slice header points to an underlying array. When you pass a slice to a channel or a function, you're passing that header, a reference to the array, not a copy of the data itself. This is why reusing a slice leads to a data race: multiple goroutines end up referencing and modifying the same memory.

Memory Allocation:
When we create a new slice with make([]byte, n), Go allocates a separate block of memory for that slice. This means the new slice (chunk) has its own backing array, independent of the buffer. By copying the data from buffer[:n] into chunk, we ensure that each chunk has its own private memory space.

Decoupling Memory:
By decoupling the memory of each chunk from the buffer, the publisher can continue to read new data into the buffer without affecting the chunks that have already been sent to the channel. Each chunk now has its own independent copy of the data, so the consumers can process the chunks without interference from the publisher.

Preventing Data Races:
The main source of the data race was the concurrent access to the shared buffer. By creating new slices and copying the data, we eliminate the shared memory, and each goroutine operates on its own data. This removes the possibility of a race condition because there’s no longer any contention over the same memory.

Conclusion

The core of the fix is simple but powerful: by ensuring that each chunk of data has its own memory, we eliminate the shared resource (the buffer) that was causing the data race. This is achieved by copying the data from the buffer into a new slice before sending it to the channel. With this approach, each consumer works on its own copy of the data, independent of the publisher’s actions, ensuring safe concurrent processing without race conditions. This method of decoupling shared memory is a fundamental strategy in concurrent programming. It prevents the unpredictable behavior caused by race conditions and ensures that your Go programs remain safe, predictable, and correct, even when multiple goroutines are accessing data concurrently.

It's that easy!

This article is reproduced from: https://dev.to/crusty0gphr/tricky-golang-interview-questions-part-7-data-race-753