Export JSON to CSV: a note on CSV and Unicode

Published on 2024-08-01

There are multiple JS libraries that allow export to Excel. But what if we want to follow a minimalist approach and avoid extra dependencies?

The simplest approach is to produce CSV from JSON, which can easily be opened in Excel.

But before I show the conversion logic, let's understand what CSV is and which encoding we should use when creating a CSV file.

CSV format

RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files, documents the CSV format. Note that it is an informational memo only, as the CSV format is not officially standardized.

Main definitions

Each record is located on a separate line, delimited by a line break (CRLF).
The last record in the file may or may not have an ending line break.
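There may be an optional header line as the first line of the file, in the same format as normal record lines. Each line should contain the same number of fields throughout the file.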
Each field may or may not be enclosed in double quotes.
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
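
The rules above can be sketched in a few lines of plain JavaScript. This is only a minimal illustration, not the conversion logic promised for this series: the function names `escapeCsvField` and `jsonToCsv` are hypothetical, and the sketch assumes the input is a flat array of objects whose first element defines the columns.

```javascript
// Hypothetical helper: quote a single field when RFC 4180 asks for it.
function escapeCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  // Quote when the field contains a comma, a double quote, CR or LF.
  if (/[",\r\n]/.test(text)) {
    // A double quote inside a quoted field is escaped by doubling it.
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Hypothetical helper: turn an array of flat objects into a CSV string.
function jsonToCsv(rows) {
  if (rows.length === 0) return "";
  // Assumption: the keys of the first object define the header and column order.
  const header = Object.keys(rows[0]);
  const lines = [header.map(escapeCsvField).join(",")];
  for (const row of rows) {
    lines.push(header.map((key) => escapeCsvField(row[key])).join(","));
  }
  // Records are delimited by CRLF; no line break after the last record.
  return lines.join("\r\n");
}

const csv = jsonToCsv([
  { name: "Søren", note: 'said "hei"' },
  { name: "Åse, Jr.", note: "ok" },
]);
console.log(csv);
```

Only the fields that need it are quoted here, which the format allows because each field "may or may not" be enclosed in double quotes.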

Encoding

In my JSON data I have the characters æ å ø from the ISO Latin-1 (ISO/IEC 8859-1) character set, which have to be considered when creating a CSV file.

Some excerpts from The Unicode® Standard, Version 15.0, to consider.

Unicode vs UTF-8/16/32

From The Unicode® Standard Version 15.0:

Unicode is the universal character encoding standard for written characters and text, containing 149,186 characters from the world’s scripts.
Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8).
The Unicode Consortium fully endorses the use of any of the three Unicode encoding forms as a conformant way of implementing the Unicode Standard. It is important not to fall into the trap of trying to distinguish “UTF-8 versus Unicode,” for example. UTF-8, UTF-16, and UTF-32 are all equally valid and conformant ways of implementing the encoded characters of the Unicode Standard.
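
To make the difference between a character and its encoded bytes concrete, here is a small sketch that prints the code point and the UTF-8 bytes of the three characters from my data. It relies on the standard `TextEncoder`, which always encodes to UTF-8; the helper name `utf8Hex` is made up for this illustration.

```javascript
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// Hypothetical helper: UTF-8 bytes of a string as space-separated hex.
function utf8Hex(text) {
  return Array.from(encoder.encode(text), (byte) =>
    byte.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

for (const ch of ["æ", "å", "ø"]) {
  const codePoint = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`${ch}  U+${codePoint}  UTF-8: ${utf8Hex(ch)}`);
}
// æ  U+00E6  UTF-8: C3 A6
// å  U+00E5  UTF-8: C3 A5
// ø  U+00F8  UTF-8: C3 B8
```

In ISO Latin-1 each of these characters is a single byte (E6, E5, F8), while in UTF-8 each takes two bytes. A program that guesses the wrong encoding therefore shows garbage instead of æ å ø.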

Byte Order Mark (BOM)

The character U+FEFF (UTF-8: EF BB BF) used for the byte order mark is named zero width no-break space.
The UTF-16 and UTF-32 encoding forms of Unicode plain text are sensitive to the byte ordering that is used when writing data to a file.
Identification of the EF BB BF byte sequence at the beginning of a data stream can be taken as a near-certain indication that the data stream is using the UTF-8 encoding scheme.

In short, adding a zero width no-break space before the CSV string makes Excel apply UTF-8 encoding instead of 1252: Western European (Windows) or whatever other encoding Excel would choose when the U+FEFF character is not provided.

I will show the difference between producing a CSV file with and without the zero width no-break space in the next post of this series...
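
As a rough sketch of that idea (the name `withBom` is hypothetical, and the sample CSV string is made up), the character can simply be prepended to the CSV string before it is encoded as UTF-8:

```javascript
const BOM = "\uFEFF"; // zero width no-break space, used as the byte order mark

// Hypothetical helper: prepend the BOM unless the string already starts with it.
function withBom(csv) {
  return csv.startsWith(BOM) ? csv : BOM + csv;
}

const sampleCsv = "name\r\nSøren";
const bytes = new TextEncoder().encode(withBom(sampleCsv));

// The first three bytes are the UTF-8 form of U+FEFF.
const firstThree = Array.from(bytes.slice(0, 3), (byte) =>
  byte.toString(16).toUpperCase()
).join(" ");
console.log(firstThree); // EF BB BF
```

In a browser, the resulting string could then be wrapped in a `Blob` with a type such as `text/csv;charset=utf-8` and offered as a download.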

Release Statement: This article is reproduced from https://dev.to/andrewelans/export-json-to-csv-what-is-csv-and-unicode-2341?1