"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Front page > Programming > Amazon product dataset

Amazon product dataset

Published on 2024-08-29
Browse:999

Hi, I found a dataset of Amazon products in Kaggle and decided to find a relationship between price and star rating.

Full code in :
https://github.com/victordalet/Kaggle_analysis/tree/feat/amazon_products


I - Preparing data

To do this, I use SQLAlchemy to convert the csv file into a small database, and plotly to display the information.

pip install SQLAlchemy
pip install plotly

In the following script, I extract the data and obtain :

  • ratio between price and number of stars
  • final rating and number of stars
  • price and number of stars
import pandas as pd
from sqlalchemy import create_engine, text
import plotly.express as px


class Main:
    def __init__(self):
        self.result = None
        self.connection = None

        self.engine = create_engine("sqlite:///my_database.db", echo=False)
        self.df = pd.read_csv("amazon_product.csv")
        self.df.to_sql("products", self.engine, index=False, if_exists="append")

        self.get_data()
        self.transform_data()
        self.display_graph()
        self.get_data_number_start_and_price()
        self.transform_data()
        self.display_graph()
        self.get_data_number_start_and_start()
        self.display_graph()

    def get_data(self):
        self.connection = self.engine.connect()
        query = text(
            "SELECT product_price, product_star_rating FROM products where product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()

    def get_data_number_start_and_price(self):
        query = text(
            "SELECT product_price, product_num_ratings FROM products where product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()

    def get_data_number_start_and_start(self):
        query = text(
            "SELECT product_star_rating, product_num_ratings FROM products where product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()
        for i in range(len(self.result)):
            self.result[i] = [self.result[i][0], self.result[i][1]]

    def transform_data(self):
        for i in range(len(self.result)):
            self.result[i] = [float(self.result[i][0].split("$")[1]), self.result[i][1]]

    def display_graph(self):
        fig = px.scatter(
            self.result, x=0, y=1, title="Amazon Product Price vs Star Rating"
        )
        fig.show()


Main()

II - Result

Price and notation

Amazon product dataset

Price and number of notation

Amazon product dataset

Notation and number of opinion

Amazon product dataset

III - Conclusion

We can see, there's not necessarily a relationship between price and rating, but the higher the price, the lower the rating, and the more reviews, the higher the rating.
Which seems logical, since if a product is bought a lot, it means it's popular.

Release Statement This article is reproduced at: https://dev.to/victordalet/amazon-product-dataset-h00?1 If there is any infringement, please contact [email protected] to delete it
Latest tutorial More>

Disclaimer: All resources provided are partly from the Internet. If there is any infringement of your copyright or other rights and interests, please explain the detailed reasons and provide proof of copyright or rights and interests and then send it to the email: [email protected] We will handle it for you as soon as possible.

Copyright© 2022 湘ICP备2022001581号-3