」工欲善其事,必先利其器。「—孔子《論語.錄靈公》
首頁 > 程式設計 > 棘手的 Golang 面試問題 - 部分數據競賽

棘手的 Golang 面試問題 - 部分數據競賽

發佈於2024-11-09
瀏覽:581

Tricky Golang interview questions - Part Data Race

Here is another code review interview question for you. This question is more advanced than the previous ones and is targeted toward a more senior audience. The problem requires knowledge of slices and sharing data between parallel processes.

If you're not familiar with the slices and how they are constructed, please check out my previous article about the Slice Header

What is a Data Race?

A data race occurs when two or more threads (or goroutines, in the case of Go) concurrently access shared memory, and at least one of those accesses is a write operation. If there are no proper synchronization mechanisms (such as locks or channels) in place to manage access, the result can be unpredictable behavior, including corruption of data, inconsistent states, or crashes.

In essence, a data race happens when:

  • Two or more threads (or goroutines) access the same memory location at the same time.
  • At least one of the threads (or goroutines) is writing to that memory.
  • There is no synchronization to control the access to that memory.

Because of this, the order in which the threads or goroutines access or modify the shared memory is unpredictable, leading to non-deterministic behavior that can vary between runs.

      ----------------------        --------------------- 
     | Thread A: Write      |      | Thread B: Read      |
      ----------------------        --------------------- 
     | 1. Reads x           |      | 1. Reads x          |
     | 2. Adds 1 to x       |      |                     |
     | 3. Writes new value  |      |                     |
      ----------------------        --------------------- 

                    Shared variable x
                    (Concurrent access without synchronization)

Here, Thread A is modifying x (writing to it), while Thread B is reading it at the same time. If both threads are running concurrently and there’s no synchronization, Thread B could read x before Thread A has finished updating it. As a result, the data could be incorrect or inconsistent.

Question: One of your teammates submitted the following code for a code review. Please review the code carefully and identify any potential issues.
And here the code that you have to review:

package main  

import (  
    "bufio"  
    "bytes"
    "io"
    "math/rand"
    "time"
)  

func genData() []byte {  
    r := rand.New(rand.NewSource(time.Now().Unix()))  
    buffer := make([]byte, 512)  
    if _, err := r.Read(buffer); err != nil {  
       return nil  
    }  
    return buffer  
}  

func publish(input []byte, output chan



What we have here?

The publish() function is responsible for reading the input data chunk by chunk and sending each chunk to the output channel. It begins by using bytes.NewReader(input) to create a reader from the input data, which allows the data to be read sequentially. A buffer of size 8 is created to hold each chunk of data as it’s being read from the input. During each iteration, reader.Read(buffer) reads up to 8 bytes from the input, and the function then sends a slice of this buffer (buffer[:n]) containing up to 8 bytes to the output channel. The loop continues until reader.Read(buffer) either encounters an error or reaches the end of the input data.

The consume() function handles the data chunks received from the channel. It processes these chunks using a bufio.Scanner, which scans each chunk of data, potentially breaking it into lines or tokens depending on how it’s configured. The variable b := scanner.Bytes() retrieves the current token being scanned. This function represents a basic input processing.

The main() creates a buffered channel chunkChannel with a capacity equal to workersCount, which is set to 4 in this case. The function then launches 4 worker goroutines, each of which will read data from the chunkChannel concurrently. Every time a worker receives a chunk of data, it processes the chunk by calling the consume() function. The publish() function reads the generated data, breaks it into chunks of up to 8 bytes, and sends them to the channel.

The program uses goroutines to create multiple consumers, allowing for concurrent data processing. Each consumer runs in a separate goroutine, processing chunks of data independently.

If you run this code, noting suspicious will happen:

[Running] go run "main.go"

[Done] exited with code=0 in 0.94 seconds

But there is a problem. We have a Data Race Risk. In this code, there’s a potential data race because the publish() function reuses the same buffer slice for each chunk. The consumers are reading from this buffer concurrently, and since slices share underlying memory, multiple consumers could be reading the same memory, leading to a data race. Let's try to use a race detection. Go provides a built-in tool to detect data races: the race detector. You can enable it by running your program with the -race flag:

go run -race main.go

If we add the -race flag to the run command we will receive the following output:

[Running] go run -race "main.go"

==================
WARNING: DATA RACE
Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44  0xcc
  bufio.(*Scanner).Scan()
      /GOROOT/go1.22.0/src/bufio/scan.go:219  0xef4
  main.consume()
      /GOPATH/example/main.go:40  0x140
  main.main.func1()
      /GOPATH/example/main.go:55  0x48

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44  0x168
  main.publish()
      /GOPATH/example/main.go:27  0xe4
  main.main()
      /GOPATH/example/main.go:60  0xdc

Goroutine 6 (running) created at:
  main.main()
      /GOPATH/example/main.go:53  0x50
==================
Found 1 data race(s)
exit status 66

[Done] exited with code=0 in 0.94 seconds

The warning you’re seeing is a classic data race detected by Go’s race detector. The warning message indicates that two goroutines are accessing the same memory location (0x00c00011e018) concurrently. One goroutine is reading from this memory, while another goroutine is writing to it at the same time, without proper synchronization.

The first part of the warning tells us that Goroutine 6 (which is one of the worker goroutines in your program) is reading from the memory address 0x00c00011e018 during a call to bufio.Scanner.Scan() inside the consume() function.

Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
  /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
  /GOROOT/go1.22.0/src/bytes/reader.go:44  0xcc
  bufio.(*Scanner).Scan()
  /GOROOT/go1.22.0/src/bufio/scan.go:219  0xef4
  main.consume()
  /GOPATH/example/main.go:40  0x140
  main.main.func1()
  /GOPATH/example/main.go:55  0x48

The second part of the warning shows that the main goroutine previously wrote to the same memory location (0x00c00011e018) during a call to bytes.Reader.Read() inside the publish() function.

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
  /GOROOT/go1.22.0/src/runtime/slice.go:325  0x0
  bytes.(*Reader).Read()
  /GOROOT/go1.22.0/src/bytes/reader.go:44  0x168
  main.publish()
  /GOPATH/example/main.go:27  0xe4
  main.main()
  /GOPATH/example/main.go:60  0xdc

The final part of the warning explains that Goroutine 6 was created in the main function.

Goroutine 6 (running) created at:
  main.main()
  /GOPATH/example/main.go:53  0x50

In this case, while one goroutine (Goroutine 6) is reading from the buffer in consume(), the publish() function in the main goroutine is simultaneously writing to the same buffer, leading to the data race.

 -------------------                 -------------------- 
|     Publisher     |               |      Consumer      |
 -------------------                 -------------------- 
        |                                   |
        v                                   |
1. Read data into buffer                    |
        |                                   |
        v                                   |
2. Send slice of buffer to chunkChannel     |
        |                                   |
        v                                   |
  ----------------                          |
 |  chunkChannel  |                         |
  ----------------                          |
        |                                   |
        v                                   |
3. Consume reads from slice                 |
                                            v
                                    4. Concurrent access
                                    (Data Race occurs)

Why the Data Race Occurs

The data race in this code arises because of how Go slices work and how memory is shared between goroutines when a slice is reused. To fully understand this, let’s break it down into two parts: the behavior of the buffer slice and the mechanics of how the race occurs. When you pass a slice like buffer[:n] to a function or channel, what you are really passing is the slice header which contains a reference to the slice’s underlying array. Any modifications to the slice or the underlying array will affect all other references to that slice.

buffer = [ a, b, c, d, e, f, g, h ]  





func publish(input []byte, output chan



If you send buffer[:n] to a channel, both the publish() function and any consumer goroutines will be accessing the same memory. During each iteration, the reader.Read(buffer) function reads up to 8 bytes from the input data into this buffer slice. After reading, the publisher sends buffer[:n] to the output channel, where n is the number of bytes read in the current iteration.

The problem here is that buffer is reused across iterations. Every time reader.Read() is called, it overwrites the data stored in buffer.

  • Iteration 1: publish() function reads the first 8 bytes into buffer and sends buffer[:n] (say, [a, b, c, d, e, f, g, h]) to the channel.
  • Iteration 2: The publish() function overwrites the buffer with the next 8 bytes, let’s say [i, j, k, l, m, n, o, p], and sends buffer[:n] again.

At this point, if one of the worker goroutines is still processing the first chunk, it is now reading stale or corrupted data because the buffer has been overwritten by the second chunk. Reusing a slice neans sharing the same memory.

How to fix the Data Race?

To avoid the race condition, we must ensure that each chunk of data sent to the channel has its own independent memory. This can be achieved by creating a new slice for each chunk and copying the data from the buffer to this new slice. The key fix is to copy the contents of the buffer into a new slice before sending it to the chunkChannel:

chunk := make([]byte, n)    // Step 1: Create a new slice with its own memory
copy(chunk, buffer[:n])     // Step 2: Copy data from buffer to the new slice
output 



Why this fix works? By creating a new slice (chunk) for each iteration, you ensure that each chunk has its own memory. This prevents the consumers from reading from the buffer that the publisher is still modifying. copy() function copies the contents of the buffer into the newly allocated slice (chunk). This decouples the memory used by each chunk from the buffer. Now, when the publisher reads new data into the buffer, it doesn’t affect the chunks that have already been sent to the channel.

 -------------------------             ------------------------ 
|  Publisher (New Memory) |           | Consumers (Read Copy)  |
|  [ a, b, c ] --> chunk1 |           |  Reading: chunk1       |
|  [ d, e, f ] --> chunk2 |           |  Reading: chunk2       |
 -------------------------             ------------------------ 
         ↑                                    ↑
        (1)                                  (2)
   Publisher Creates New Chunk          Consumers Read Safely

This solution works is that it breaks the connection between the publisher and the consumers by eliminating shared memory. Each consumer now works on its own copy of the data, which the publisher does not modify. Here’s how the modified publish() function looks:

func publish(input []byte, output chan



Summary

Slices Are Reference Types:
As mentioned earlier, Go slices are reference types, meaning they point to an underlying array. When you pass a slice to a channel or a function, you’re passing a reference to that array, not the data itself. This is why reusing a slice leads to a data race: multiple goroutines end up referencing and modifying the same memory.

Memory Allocation:
When we create a new slice with make([]byte, n), Go allocates a separate block of memory for that slice. This means the new slice (chunk) has its own backing array, independent of the buffer. By copying the data from buffer[:n] into chunk, we ensure that each chunk has its own private memory space.

Decoupling Memory:
By decoupling the memory of each chunk from the buffer, the publisher can continue to read new data into the buffer without affecting the chunks that have already been sent to the channel. Each chunk now has its own independent copy of the data, so the consumers can process the chunks without interference from the publisher.

Preventing Data Races:
The main source of the data race was the concurrent access to the shared buffer. By creating new slices and copying the data, we eliminate the shared memory, and each goroutine operates on its own data. This removes the possibility of a race condition because there’s no longer any contention over the same memory.

Conclusion

The core of the fix is simple but powerful: by ensuring that each chunk of data has its own memory, we eliminate the shared resource (the buffer) that was causing the data race. This is achieved by copying the data from the buffer into a new slice before sending it to the channel. With this approach, each consumer works on its own copy of the data, independent of the publisher’s actions, ensuring safe concurrent processing without race conditions. This method of decoupling shared memory is a fundamental strategy in concurrent programming. It prevents the unpredictable behavior caused by race conditions and ensures that your Go programs remain safe, predictable, and correct, even when multiple goroutines are accessing data concurrently.

It's that easy!

版本聲明 本文轉載於:https://dev.to/crusty0gphr/tricky-golang-interview-questions-part-7-data-race-753?1如有侵犯,請聯絡[email protected]刪除
最新教學 更多>
  • 如何在 Golang 中檢索 XML 陣列中的所有元素而不僅限於第一個元素?
    如何在 Golang 中檢索 XML 陣列中的所有元素而不僅限於第一個元素?
    在XML 中解組數組元素:檢索所有元素,而不僅僅是第一個當使用xml.Unmarshal( 在Golang 中解組XML 陣列時[]byte(p.Val.Inner), &t),您可能會遇到僅檢索第一個元素的情況。若要解決此問題,請利用 xml.Decoder 並重複呼叫其 Decode 方法。 解...
    程式設計 發佈於2024-11-09
  • 帶有管理面板的輕量級 Rest Api,可輕鬆管理食物食譜。
    帶有管理面板的輕量級 Rest Api,可輕鬆管理食物食譜。
    你好, ?所有這篇文章都是關於我剛剛在 Github 上發布的 Django Rest Framework API。 如果您正在尋找一些簡單且高效的 API 來從管理面板管理食物食譜並將其返回以供客戶端使用,那麼此儲存庫適合您。 該程式碼是輕量級的,可以在任何低功耗迷你 PC(如 Raspbe...
    程式設計 發佈於2024-11-09
  • 如何在Java中使用堆疊將算術表達式解析為樹結構?
    如何在Java中使用堆疊將算術表達式解析為樹結構?
    在Java 中將算術表達式解析為樹結構從算術表達式創建自定義樹可能是一項具有挑戰性的任務,特別是在確保樹結構時準確反映表達式的操作和優先順序。 要實現這一點,一種有效的方法是使用堆疊。以下是該過程的逐步描述:初始化:從空堆疊開始。 處理令牌:迭代表達式中的每個標記:如果標記是左括號,則將其壓入堆疊。...
    程式設計 發佈於2024-11-09
  • 如何使用正規表示式來匹配帶有或不帶有可選 HTTP 和 WWW 前綴的 URL?
    如何使用正規表示式來匹配帶有或不帶有可選 HTTP 和 WWW 前綴的 URL?
    使用可選 HTTP 和 WWW 前綴匹配 URL正則表達式是執行複雜模式匹配任務的強大工具。當涉及到符合 URL 時,格式通常會有所不同,例如是否包含「http://www」。 使用正規表示式的解決方案匹配帶或不帶「http://www」的 URL。前綴,可以使用以下正規表示式:((https?|f...
    程式設計 發佈於2024-11-09
  • 如何在不依賴副檔名的情況下確定檔案類型?
    如何在不依賴副檔名的情況下確定檔案類型?
    如何在不依賴副檔名的情況下偵測檔案類型除了檢查檔案的副檔名之外,確定檔案是mp3 還是圖像格式是很有價值的程式設計中的任務。這是一個不依賴擴充的全面解決方案:PHP >= 5.3:$mimetype = finfo_fopen(fopen($filename, 'r'), FILEINFO_MIME...
    程式設計 發佈於2024-11-09
  • 在 Go 中使用 WebSocket 進行即時通信
    在 Go 中使用 WebSocket 進行即時通信
    构建需要实时更新的应用程序(例如聊天应用程序、实时通知或协作工具)需要一种比传统 HTTP 更快、更具交互性的通信方法。这就是 WebSockets 发挥作用的地方!今天,我们将探讨如何在 Go 中使用 WebSocket,以便您可以向应用程序添加实时功能。 在这篇文章中,我们将介绍: WebSoc...
    程式設計 發佈於2024-11-09
  • 在 JavaScript 中實作斐波那契數列:常見方法和變體
    在 JavaScript 中實作斐波那契數列:常見方法和變體
    作為開發人員,您可能遇到過編寫函數來計算斐波那契數列中的值的任務。這個經典問題經常出現在程式設計面試中,通常要求遞歸實現。然而,面試官有時可能會要求具體的方法。在本文中,我們將探討 JavaScript 中最常見的斐波那契數列實作。 什麼是斐波那契數列? 首先,讓我們回顧一下。斐波...
    程式設計 發佈於2024-11-09
  • 如何使用 .htaccess 更改共享伺服器上的 PHP 版本?
    如何使用 .htaccess 更改共享伺服器上的 PHP 版本?
    在共享伺服器上透過.htaccess 更改PHP 版本如果您正在操作共享伺服器並且需要更改PHP 版本,可以透過.htaccess文件來做到這一點。這允許您為您的網站運行特定的 PHP 版本,同時伺服器維護其預設版本。 要切換 PHP 版本,請按照下列步驟操作:找到 . htaccess 檔案: 該...
    程式設計 發佈於2024-11-09
  • 如何在Ajax資料載入過程中顯示進度條?
    如何在Ajax資料載入過程中顯示進度條?
    如何在Ajax 資料載入期間顯示進度條處理使用者觸發的事件(例如從下拉方塊中選擇值)時,通常會使用非同步擷取資料阿賈克斯。在獲取數據時,向用戶提供正在發生某事的視覺指示是有益的。本文探討了一種在 Ajax 請求期間顯示進度條的方法。 使用 Ajax 實作進度條要建立一個準確追蹤 Ajax 呼叫進度的...
    程式設計 發佈於2024-11-09
  • TCJavaScript 更新、TypeScript Beta、Node.js 等等
    TCJavaScript 更新、TypeScript Beta、Node.js 等等
    歡迎來到新一期的「JavaScript 本週」! 今天,我們從 TC39、Deno 2 正式版本、TypeScript 5.7 Beta 等方面獲得了一些針對 JavaScript 語言的巨大新更新,所以讓我們開始吧! TC39 更新:JavaScript 有何變化? 最近在東京...
    程式設計 發佈於2024-11-09
  • 為什麼 Bootstrap 用戶應該在下一個專案中考慮使用 Tailwind CSS?
    為什麼 Bootstrap 用戶應該在下一個專案中考慮使用 Tailwind CSS?
    Tailwind CSS 入门 Bootstrap 用户指南 大家好! ?如果您是 Bootstrap 的长期用户,并且对过渡到 Tailwind CSS 感到好奇,那么本指南适合您。 Tailwind 是一个实用程序优先的 CSS 框架,与 Bootstrap 基于组件的结构相比...
    程式設計 發佈於2024-11-09
  • 組合與繼承
    組合與繼承
    介绍 继承和组合是面向对象编程(OOP)中的两个基本概念,但它们的用法不同并且具有不同的目的。这篇文章的目的是回顾这些目的,以及选择它们时要记住的一些事情。 继承的概念 当我们考虑在设计中应用继承时,我们必须了解: 定义:在继承中,一个类(称为派生类或子类)可以从另...
    程式設計 發佈於2024-11-09
  • 如何在 JavaScript 中將浮點數轉換為整數?
    如何在 JavaScript 中將浮點數轉換為整數?
    如何在 JavaScript 中將浮點數轉換為整數要將浮點數轉換為整數,您可以使用 JavaScript內建數學物件。 Math 物件提供了多種處理數學運算的方法,包括舍入和截斷。 方法:1。截斷:截斷去除數字的小數部分。要截斷浮點數,請使用 Math.floor()。此方法向下舍入到小於或等於原始...
    程式設計 發佈於2024-11-09
  • 標準字串實作中的 c_str() 和 data() 是否有顯著差異?
    標準字串實作中的 c_str() 和 data() 是否有顯著差異?
    標準字串實作中的c_str()與data()STL中c_str()和data()函數的區別人們普遍認為類似的實作是基於空終止的。據推測,c_str() 總是提供以 null 結尾的字串,而 data() 則不然。 然而,在實踐中,實作經常透過讓 data() 在內部呼叫 c_str() 來消除這種區...
    程式設計 發佈於2024-11-09
  • C/C++ 中的類型轉換如何運作以及程式設計師應該注意哪些陷阱?
    C/C++ 中的類型轉換如何運作以及程式設計師應該注意哪些陷阱?
    了解C/C 中的類型轉換型別轉換是C 和C 程式設計的一個重要方面,涉及將資料從一種類型轉換為另一種類型。它在記憶體管理、資料操作和不同類型之間的互通性方面發揮著重要作用。然而,了解類型轉換的工作原理及其限制對於防止潛在錯誤至關重要。 明確型別轉換使用 (type) 語法執行的明確型別轉換可讓程式設...
    程式設計 發佈於2024-11-09

免責聲明: 提供的所有資源部分來自互聯網,如果有侵犯您的版權或其他權益,請說明詳細緣由並提供版權或權益證明然後發到郵箱:[email protected] 我們會在第一時間內為您處理。

Copyright© 2022 湘ICP备2022001581号-3