Golang: How Observability and Profiling Revealed Nearly Undetectable Throttling

Published on 2024-11-08

I have a personal project written in Go that collects information on financial assets from Bovespa.
The system makes heavy use of concurrency and parallelism with goroutines, updating asset information (along with business calculations) every 8 seconds.
Initially, no errors or warnings appeared, but I noticed that some goroutines were taking longer than others to execute.

To be more specific, the p99 time was normally around 0.03 ms, but at some points it rose to 0.9 ms. This led me to investigate the problem further.
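For readers unfamiliar with the metric, the sketch below shows one common way to compute a percentile such as p99 from a set of latency samples (the nearest-rank method). The function name `percentile` and the sample values are illustrative assumptions, not code from the project; a real system would usually get this number from its metrics library.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of samples using
// the nearest-rank method. It returns 0 for an empty slice.
// Hypothetical helper written for this illustration.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]float64, len(samples))
	copy(sorted, samples)
	sort.Float64s(sorted)

	// Nearest rank: the smallest rank that covers p percent of the samples.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// 98 fast executions and 2 slow ones, in milliseconds (made-up data).
	samples := make([]float64, 0, 100)
	for i := 0; i < 98; i++ {
		samples = append(samples, 0.03)
	}
	samples = append(samples, 0.9, 0.9)

	fmt.Println("p50 (ms):", percentile(samples, 50))
	fmt.Println("p99 (ms):", percentile(samples, 99))
}
```

With data like this the median stays low while the p99 jumps, which is why a tail percentile can reveal slowdowns that averages hide.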

I traced it to my goroutine pool: a semaphore whose size was taken from the GOMAXPROCS variable.
That turned out to be the source of the problem.

By default, GOMAXPROCS does not reflect the number of cores available to the container. If the container is allowed fewer cores than the VM has in total, the Go runtime still uses the VM's total. For example, my VM has 8 cores, but the container was limited to 4. As a result, the pool allowed 8 goroutines to run at the same time on a 4-core quota, which caused CPU throttling.
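As a rough illustration of this kind of pool, the sketch below uses a buffered channel as a semaphore and sizes it from `runtime.GOMAXPROCS(0)` (passing 0 only queries the current value). The helper `runLimited` is a hypothetical name made up for this example, not the code from my project. It also reports the peak number of tasks that ran at once, so you can see how the limit follows GOMAXPROCS.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runLimited starts one goroutine per task but lets at most limit of them
// execute at once, using a buffered channel as a semaphore. It returns the
// highest number of tasks observed running concurrently.
func runLimited(limit int, tasks []func()) int {
	semaphore := make(chan struct{}, limit)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, task := range tasks {
		semaphore <- struct{}{} // blocks while limit tasks are in flight
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-semaphore }() // release the slot

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("CPUs visible to the process:", runtime.NumCPU())
	fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0))

	tasks := make([]func(), 100)
	for i := range tasks {
		tasks[i] = func() {}
	}
	limit := runtime.GOMAXPROCS(0)
	fmt.Println("peak concurrent tasks:", runLimited(limit, tasks))
}
```

If the limit is taken from a value larger than the container's quota, the pool happily schedules more CPU-bound work than the container is allowed to run.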

After a night of research, I found a library developed by Uber that automatically sets GOMAXPROCS to an appropriate value, whether or not the program runs in a container. This solution proved to be extremely stable and efficient: automaxprocs

uber-go / automaxprocs

Automatically set GOMAXPROCS to match Linux container CPU quota.

Installation

go get -u go.uber.org/automaxprocs

Quick Start

import _ "go.uber.org/automaxprocs"

func main() {
  // Your application logic here.
}

Performance

Data measured from Uber's internal load balancer. We ran the load balancer with 200% CPU quota (i.e., 2 cores):

GOMAXPROCS            RPS          P50 (ms)   P99.9 (ms)
1                     28,893.18    1.46       19.70
2 (equal to quota)    44,715.07    0.84       26.38
3                     44,212.93    0.66       30.07
4                     41,071.15    0.57       42.94
8                     33,111.69    0.43       64.32
Default (24)          22,191.40    0.45       76.19

When GOMAXPROCS is increased above the CPU quota, we see P50 decrease slightly, but see significant increases to P99. We also see that the total RPS handled also decreases.

When GOMAXPROCS is higher than the CPU quota allocated, we also saw significant throttling:

$ cat /sys/fs/cgroup/cpu,cpuacct/system.slice/[...]/cpu.stat
nr_periods 42227334
nr_throttled 131923
throttled_time 88613212216618

Once GOMAXPROCS was reduced to match the CPU quota, we saw no CPU throttling.
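To check for throttling from inside a program rather than with `cat`, the counters above can be parsed directly. The sketch below is a minimal example that works on the text of a `cpu.stat` file with the `key value` layout shown; the function names `parseCPUStat` and `throttledPercent` are assumptions made for this illustration, and the location of the file depends on your cgroup setup.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCPUStat parses "key value" lines, as found in a cgroup cpu.stat
// file, into a map. Lines that do not match that shape are skipped.
func parseCPUStat(content string) map[string]uint64 {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		stats[fields[0]] = value
	}
	return stats
}

// throttledPercent returns the percentage of enforcement periods in which
// the cgroup was throttled, or 0 when no periods have been recorded.
func throttledPercent(stats map[string]uint64) float64 {
	periods := stats["nr_periods"]
	if periods == 0 {
		return 0
	}
	return float64(stats["nr_throttled"]) / float64(periods) * 100
}

func main() {
	// Sample values copied from the output above.
	sample := "nr_periods 42227334\nnr_throttled 131923\nthrottled_time 88613212216618\n"
	stats := parseCPUStat(sample)
	fmt.Printf("throttled in %.2f%% of periods\n", throttledPercent(stats))
}
```

Exporting a ratio like this as a metric makes throttling visible on a dashboard instead of only showing up as unexplained latency.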


After I adopted this library, the problem was resolved, and the p99 time now stays constantly at 0.02 ms. This experience highlighted the importance of observability and profiling in concurrent systems.
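To make "matching the container CPU quota" more concrete, here is a simplified sketch of how a core count can be derived from a cgroup v2 `cpu.max` value, which holds `<quota> <period>` in microseconds, or `max <period>` when there is no limit. This is not the library's actual implementation: the function name `procsFromCPUMax`, rounding down, and the minimum of 1 are all assumptions of this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// procsFromCPUMax converts the content of a cgroup v2 cpu.max file into a
// whole number of CPUs. ok is false when no limit is set or the content
// cannot be parsed.
func procsFromCPUMax(content string) (procs int, ok bool) {
	fields := strings.Fields(content)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil || quota <= 0 {
		return 0, false
	}
	period, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil || period <= 0 {
		return 0, false
	}
	procs = int(quota / period) // integer division rounds down
	if procs < 1 {
		procs = 1 // a fractional quota still needs one scheduler thread
	}
	return procs, true
}

func main() {
	for _, content := range []string{"400000 100000", "50000 100000", "max 100000"} {
		procs, ok := procsFromCPUMax(content)
		fmt.Printf("%q -> procs=%d limited=%v\n", content, procs, ok)
	}
}
```

A quota of 400000 with a period of 100000 corresponds to the 4-core container from my case, which is the value the pool should have been sized with.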

The following is a very simple example, but one that demonstrates the difference in performance.

Using Go's built-in testing package and its benchmark support, I created two files:

benchmarking_with_enhancement_test.go:

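package main

import (
    _ "go.uber.org/automaxprocs"
    "runtime"
    "sync"
    "testing"
)

// BenchmarkWithEnhancement is the version with the improvement. It appends each loop index to a slice of integers.
func BenchmarkWithEnhancement(b *testing.B) {
    // Get the number of CPUs available
    numCPUs := runtime.NumCPU()
    // Set the maximum number of CPUs the program may use.
    // runtime.GOMAXPROCS returns the previous setting, which sizes the pool below.
    maxGoroutines := runtime.GOMAXPROCS(numCPUs)
    // Create the semaphore
    semaphore := make(chan struct{}, maxGoroutines)

    var (
        // Waits for the group of goroutines to finish
        wg sync.WaitGroup
        // Guards access to the list
        mu sync.Mutex
        // List that stores the integers
        list []int
    )

    // Loop over a million indices.
    // NOTE: the source listing is cut off after "for i := 0; i". The rest is
    // reconstructed from the variables declared above and may differ from
    // the author's exact code.
    for i := 0; i < 1000000; i++ {
        semaphore <- struct{}{} // acquire a slot
        wg.Add(1)
        go func(x int) {
            defer wg.Done()
            defer func() { <-semaphore }() // release the slot
            mu.Lock()
            list = append(list, x)
            mu.Unlock()
        }(i)
    }
    wg.Wait()
}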

benchmarking_without_enhancement_test.go:

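package main

import (
    "runtime"
    "sync"
    "testing"
)

// BenchmarkWithoutEnhancement is the version without the improvement. It appends each loop index to a slice of integers.
func BenchmarkWithoutEnhancement(b *testing.B) {
    // Get the number of CPUs available
    numCPUs := runtime.NumCPU()
    // Set the maximum number of CPUs the program may use.
    // runtime.GOMAXPROCS returns the previous setting, which sizes the pool below.
    maxGoroutines := runtime.GOMAXPROCS(numCPUs)
    // Create the semaphore
    semaphore := make(chan struct{}, maxGoroutines)

    var (
        // Waits for the group of goroutines to finish
        wg sync.WaitGroup
        // Guards access to the list
        mu sync.Mutex
        // List that stores the integers
        list []int
    )

    // Loop over a million indices.
    // NOTE: the source listing is cut off after "for i := 0; i". The rest is
    // reconstructed from the variables declared above and may differ from
    // the author's exact code.
    for i := 0; i < 1000000; i++ {
        semaphore <- struct{}{} // acquire a slot
        wg.Add(1)
        go func(x int) {
            defer wg.Done()
            defer func() { <-semaphore }() // release the slot
            mu.Lock()
            list = append(list, x)
            mu.Unlock()
        }(i)
    }
    wg.Wait()
}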

The only difference between them is that the first one imports the Uber library.

When running the benchmark assuming that 2 CPUs would be used, the result was:

[Image: benchmark results for the two files]

ns/op: the average time, in nanoseconds, that a single operation takes.

Note that my machine has 8 cores in total, and that is what runtime.NumCPU() returned. However, I ran the benchmark restricted to only two CPUs. The file that did not use automaxprocs still set the limit to 8 goroutines running at a time, while the most efficient limit would be 2: allocating only as many concurrent goroutines as there are usable CPUs makes execution more efficient.

So, the importance of observability and profiling of our applications is clear.

This article is reproduced from: https://dev.to/mggcmatheus/golang-como-a-observabilidade-e-profiling-revelaram-um-throttling-quase-indetectavel-1h5p?1