Skipping Rows During CSV Import with Pandas
When importing CSV data with Pandas, you often need to skip rows that should not be part of your analysis. The skiprows argument of read_csv handles this, but its description is ambiguous about what an integer value means.
The Pandas documentation describes skiprows as follows:
skiprows : list-like or integer
Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.
The question arises: How does Pandas know whether to skip the first row or the row with index 1 when skiprows=1 is specified?
To unravel this, let's perform an experiment using a sample CSV file with three rows:
1, 2
3, 4
5, 6
Skipping the Row with Index 1
If you want to skip the row with index 1, pass skiprows as a list:
import pandas as pd
from io import StringIO
# Sample CSV data: three rows, two columns
s = """1, 2
3, 4
5, 6"""
df = pd.read_csv(StringIO(s), skiprows=[1], header=None) # Skip row with index 1
print(df)
Output:
   0  1
0  1  2
1  5  6
Skipping a Number of Rows
To skip a specific number of rows (in this case, 1), pass skiprows as an integer:
df = pd.read_csv(StringIO(s), skiprows=1, header=None) # Skip the first row
print(df)
Output:
   0  1
0  3  4
1  5  6
The skiprows argument therefore behaves differently depending on the type you pass. A list is read as the 0-indexed row numbers to skip, so skiprows=[1] drops the second row. An integer is read as the number of rows to skip from the beginning of the file, so skiprows=1 drops the first row.
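As a quick check of this rule, the following minimal sketch runs both forms side by side on the same three rows. The helper name read_rows and the constant CSV_TEXT are illustrative assumptions and not part of Pandas. The sample data is written without spaces after the commas to keep the parsing unambiguous.

```python
from io import StringIO

import pandas as pd

# Same three rows as the sample file (hypothetical constant name).
CSV_TEXT = "1,2\n3,4\n5,6"


def read_rows(skiprows):
    """Parse CSV_TEXT with the given skiprows value and return the rows as lists.

    read_rows is a hypothetical helper for this comparison, not a Pandas API.
    """
    df = pd.read_csv(StringIO(CSV_TEXT), skiprows=skiprows, header=None)
    return df.values.tolist()


as_int = read_rows(1)        # integer: skip the first 1 row
as_list = read_rows([1])     # list: skip the row with index 1
first_only = read_rows([0])  # list: skip the row with index 0

print(as_int)      # [[3, 4], [5, 6]]
print(as_list)     # [[1, 2], [5, 6]]
print(first_only)  # [[3, 4], [5, 6]]
```

Note that skiprows=[0] gives the same result as skiprows=1, which shows that the list form can express the integer form, but not the other way around.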