Pandas DataFrame GroupBy Multiple Columns for Value Counts
Grouping a Pandas DataFrame by more than one column makes it possible to count how often each combination of values occurs. This article demonstrates how to count observations while grouping by two columns, and how to determine the highest count within each group.
Given a DataFrame with multiple columns, it is possible to apply the 'groupby' function to group data based on specific columns. Here, we have a DataFrame named 'df' with five columns: 'col1', 'col2', 'col3', 'col4', and 'col5'.
import pandas as pd
df = pd.DataFrame([
[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3],
list('AAABBBBABCBDDD'),
[1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8],
['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],
['1','3','3','2','4','2','5','3','6','3','5','1','1','1']
]).T
df.columns = ['col1','col2','col3','col4','col5']
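Note that transposing a list of mixed-type rows typically leaves every column with the generic object dtype, so 'col1' and 'col3' are not stored as floats. As a minimal sketch of an alternative (not the construction used above), the same values can be passed as a dictionary of columns so that each column keeps its own dtype. The name df_typed is chosen here for illustration only.

```python
import pandas as pd

# Same values as above, passed column by column instead of as transposed rows.
df_typed = pd.DataFrame({
    'col1': [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    'col2': list('AAABBBBABCBDDD'),
    'col3': [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    'col4': ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z',
             'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# The numeric columns are now stored as floats rather than generic objects.
print(df_typed.dtypes)
```

The grouping examples in this article behave the same with either construction, because they only use 'col2' and 'col5'.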
Counting by Row Groups
To count the number of rows in each group, call 'groupby' with the desired columns and then apply the 'size' method, which returns the number of rows that share each combination of values.
result = df.groupby(['col5', 'col2']).size()
This will produce a Series, not a DataFrame. Its index is a MultiIndex built from the grouped columns ('col5', 'col2'), and its values are the number of rows in each group.
print(result)
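The grouped counts can be awkward to work with while the group keys sit in the index. The following sketch, which rebuilds only the two grouping columns from the data above, shows one way to look up a single count and to flatten the result into ordinary columns. The names counts and flat, and the column label 'count', are assumptions chosen for this example.

```python
import pandas as pd

# Only the two grouping columns from the article's data are needed here.
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

counts = df.groupby(['col5', 'col2']).size()

# A single count is looked up with a (col5, col2) tuple.
# Three rows have col5 == '1' and col2 == 'D'.
print(counts.loc[('1', 'D')])

# reset_index(name=...) moves both index levels back into regular columns
# and gives the count column an explicit name.
flat = counts.reset_index(name='count')
print(flat)
```

The flattened form is convenient when the counts need to be filtered, sorted, or merged with other data.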
Determining the Highest Count
To determine the maximum count for each 'col2' value, start from the sizes grouped by 'col5' and 'col2', group that result again by the 'col2' level of its index (level=1), and apply the 'max' function.
result = df.groupby(['col5', 'col2']).size().groupby(level=1).max()
This will produce a Series with the maximum count for each 'col2' value.
print(result)
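The maximum alone does not say which 'col5' value produced it. As a hedged sketch of one common idiom (not necessarily the approach the original question intended), the counts can be flattened and the row holding the largest count per 'col2' selected with 'idxmax'. The names flat and top, and the column label 'count', are assumptions made for this example.

```python
import pandas as pd

# Only the two grouping columns from the article's data are needed here.
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# One row per (col5, col2) pair with its count.
flat = df.groupby(['col5', 'col2']).size().reset_index(name='count')

# idxmax returns, for each col2 group, the row label of its largest count.
# When several rows tie for the maximum, the first one encountered is kept.
top = flat.loc[flat.groupby('col2')['count'].idxmax()]
print(top)
```

With this data, 'B' has two 'col5' values tied at a count of 2, so only the first of them appears in the result.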
In summary, chaining 'groupby' with 'size' counts the rows in each combination of the grouping columns, and a second 'groupby' on one index level followed by 'max' reduces those counts to the largest one per group.