"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Front page > Programming > How can I efficiently read and write CSV files in Go using concurrency?

How can I efficiently read and write CSV files in Go using concurrency?

Published on 2024-11-09
Browse:775

How can I efficiently read and write CSV files in Go using concurrency?

Efficient CSV Read and Write in Go

The task of reading and writing a CSV file efficiently in Go involves optimizing the I/O operations. Consider the following code snippet that reads a CSV file, performs calculations on the data, and writes the results to a new CSV file:

package main

import (
  "encoding/csv"
  "fmt"
  "log"
  "os"
  "strconv"
)

func ReadRow(r *csv.Reader) (map[string]string, error) {
  record, err := r.Read()
  if err == io.EOF {
    return nil, io.EOF
  }
  if err != nil {
      return nil, err
  }
  m := make(map[string]string)
  for i, v := range record {
    m[strconv.Itoa(i)] = v
  }
  return m, nil
}

func main() {
  // load data csv
  csvFile, err := os.Open("./path/to/datafile.csv")
  if err != nil {
    log.Fatal(err)
  }
  defer csvFile.Close()

  // create channel to process rows concurrently
  recCh := make(chan map[string]string, 10)
  go func() {
    defer close(recCh)
    r := csv.NewReader(csvFile)
    if _, err := r.Read(); err != nil { //read header
        log.Fatal(err)
    }

    for {
        rec, err := ReadRow(r)
        if err == io.EOF {
          return  // no more rows to read
        }
        if err != nil {
          log.Fatal(err)
        }
        recCh <- rec
    }
  }()

  // write results to a new csv
  outfile, err := os.Create("./where/to/write/resultsfile.csv"))
  if err != nil {
    log.Fatal("Unable to open output")
  }
  defer outfile.Close()
  writer := csv.NewWriter(outfile)

  for record := range recCh {
    time := record["0"]
    value := record["1"]

    // get float values
    floatValue, err := strconv.ParseFloat(value, 64)
    if err != nil {
      log.Fatal("Record: %v, Error: %v", floatValue, err)
    }

    // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
    score := calculateStuff(floatValue)

    valueString := strconv.FormatFloat(floatValue, 'f', 8, 64)
    scoreString := strconv.FormatFloat(prob, 'f', 8, 64)
    //fmt.Printf("Result: %v\n", []string{time, valueString, scoreString})

    writer.Write([]string{time, valueString, scoreString})
  }

  writer.Flush()
}

The key improvement in this code is the use of concurrency to process CSV rows one at a time. By using a channel, we can read rows from the input CSV file in a goroutine and write the results to the output CSV file in the main routine concurrently. This approach avoids loading the entire file into memory, which can significantly reduce the memory consumption and improve the performance.

Latest tutorial More>

Disclaimer: All resources provided are partly from the Internet. If there is any infringement of your copyright or other rights and interests, please explain the detailed reasons and provide proof of copyright or rights and interests and then send it to the email: [email protected] We will handle it for you as soon as possible.

Copyright© 2022 湘ICP备2022001581号-3