This project asks ChatGPT to teach you how to generate Python code for creating plots and visualizations. Simply provide a brief description of the plot you want to create, and ChatGPT will generate the necessary code using the popular Matplotlib library. This tool allows even users with no programming experience to develop professional-grade plots for their data. Whether you’re a researcher, data scientist, or student, ChatGPT makes it easy to create beautiful and informative plots in Python. (Generated by ChatGPT with minor changes).
The project is available for free as a Jupyter Notebook on Cognitive Class.
We start by asking: “teach me how to plot in python”
The result:
There are many ways to plot data in Python. Here is a simple example using Matplotlib, a popular data visualization library in Python:
```python
# Using the code provided by ChatGPT to run in the notebook
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data
ax.plot(x, y)

# Add a title
ax.set_title('My Plot')

# Show the plot
plt.show()
```
This will create a simple line plot of the data in x and y. There are many options and customization options available for creating plots in Matplotlib. For example, you can change the appearance of the lines, change the axis labels, add a legend, etc.
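The customizations mentioned above (line appearance, axis labels, legend) can be sketched as follows. This is a minimal illustration rather than part of ChatGPT's answer; the helper name `plot_styled_line` and the label text are assumptions made for the example.

```python
import matplotlib.pyplot as plt


def plot_styled_line(x, y):
    """Draw a line plot with a custom line style, axis labels, and a legend."""
    fig, ax = plt.subplots()
    # Dashed red line with circular markers; 'label' is the text shown in the legend
    ax.plot(x, y, color="red", linestyle="--", marker="o", label="y = x squared")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("My Styled Plot")
    ax.legend()
    return fig, ax


fig, ax = plot_styled_line([1, 2, 3, 4], [1, 4, 9, 16])
# In a notebook the figure renders automatically; in a script, call plt.show()
```

Returning `fig` and `ax` from the helper makes it easy to keep adjusting the plot afterwards, for example with another `ax.set_title(...)` call.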
Here is some more information on Matplotlib and other plotting libraries in Python:
The Matplotlib documentation: https://matplotlib.org/stable/index.html
A tutorial on Matplotlib: https://matplotlib.org/stable/tutorials/index.html
Seaborn, a data visualization library based on Matplotlib: https://seaborn.pydata.org/
Plotly, a library for creating interactive plots: https://plotly.com/python/
As you can see, ChatGPT provides introductory text and code blocks (with a copy icon for convenience), and concludes the conversation with additional useful information.
Next, we simply ask “show me more examples of plots in python”. ChatGPT remembers the earlier messages in the conversation, so there is no need to repeat the context.
The result:
Here are a few more examples of plots that you can create in Python using the Matplotlib library:
Scatter plot
```python
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data
ax.scatter(x, y)

# Add a title
ax.set_title('My Scatter Plot')

# Show the plot
plt.show()
```
Bar plot
```python
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data
ax.bar(x, y)

# Add a title
ax.set_title('My Bar Plot')

# Show the plot
plt.show()
```
Pie chart
```python
import matplotlib.pyplot as plt

# Sample data
labels = ['Apples', 'Bananas', 'Oranges']
sizes = [30, 45, 25]

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data
ax.pie(sizes, labels=labels)

# Add a title
ax.set_title('My Pie Chart')

# Show the plot
plt.show()
```
Keep in mind that these are just a few examples, and there are many other types of plots that you can create in Python using Matplotlib. You can find more information and examples in the Matplotlib documentation: https://matplotlib.org/stable/index.html.
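To compare plot types directly, several of them can share one figure. The sketch below is our own addition, not ChatGPT output; it reuses the same sample data, and the figure size is an arbitrary choice.

```python
import matplotlib.pyplot as plt

# Same sample data as in the examples above
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# One row with three axes that share a single figure
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

axes[0].plot(x, y)
axes[0].set_title("Line")

axes[1].scatter(x, y)
axes[1].set_title("Scatter")

axes[2].bar(x, y)
axes[2].set_title("Bar")

# Adjust spacing so the titles and tick labels do not overlap
fig.tight_layout()
```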
Let's plot from a sample dataset
The first step is to load data. We can ask ChatGPT to suggest some sample datasets by asking “Could you tell me sample dataset for beginners in python?”
The result:
Sure! Here are a few datasets that might be useful for beginners in Python:
Iris dataset: This is a small dataset that contains information about different species of iris flowers, including the sepal and petal length and width. It is often used as a beginner’s dataset for learning machine learning and data analysis techniques. You can load the Iris dataset in Python using the sklearn.datasets module.
Boston Housing dataset: This dataset contains information about housing prices in the Boston area, including the median value of homes, crime rate, and other factors. It is often used as a beginner’s dataset for learning regression techniques. You can load the Boston Housing dataset in Python using the sklearn.datasets module.
MNIST dataset: This is a dataset of handwritten digits, with 60,000 training examples and 10,000 test examples. It is often used as a beginner’s dataset for learning image classification techniques. You can load the MNIST dataset in Python using the keras.datasets module.
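From these suggested datasets, we select the Iris dataset as our working sample. Note that the Boston Housing loader (load_boston) was removed from scikit-learn in version 1.2, so that suggestion no longer works with recent releases.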
We need a proper sample in tabular format, so we asked “load iris data with table looking format”
The result:
To load the Iris dataset in a table-like format in Python, you can use the Pandas library. Here is an example of how you can load the Iris dataset and display it in a table-like format:
```python
import pandas as pd
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()

# Create a DataFrame with the features and target values
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Display the DataFrame
print(df)

# --> or, in a notebook, evaluate the DataFrame directly to display it
df
```
_This will create a Pandas DataFrame with the features (sepal length, sepal width, petal length, and petal width) of the Iris dataset as columns, and the target values (the species of iris) as an additional column.
You can then use the usual Pandas functions to manipulate and analyze the data in the DataFrame. For example, you can use the head() function to display the first few rows of the DataFrame, or use the describe() function to get summary statistics for the numerical columns._
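The `head()` and `describe()` calls mentioned above can be tried as in the short sketch below. The `value_counts()` step is our own addition to check how many rows belong to each species code.

```python
import pandas as pd
from sklearn import datasets

# Rebuild the DataFrame so this sketch runs on its own
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# First five rows of the table
print(df.head())

# Summary statistics for the four measurement columns
summary = df[iris.feature_names].describe()
print(summary)

# Number of rows for each species code (0, 1, 2)
counts = df["target"].value_counts().sort_index()
print(counts)
```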
Conveniently, we can ask follow-up questions; here we asked “plot it”
The result:
To plot the Iris dataset using Matplotlib, you can use the following code:
```python
# --> Trying the suggestion from ChatGPT
# --> We removed the redundant lines (e.g. loading libraries and importing data) from the suggestion

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data, coloring the points by the target values
scatter = ax.scatter(df['sepal length (cm)'], df['sepal width (cm)'], c=df['target'])

# Add a legend
legend = ax.legend(*scatter.legend_elements(), title='Species')

# Add a title
ax.set_title('Iris dataset')

# Show the plot
plt.show()
```
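In the suggested code the legend shows the numeric target codes (0, 1, 2) rather than species names. One possible way to show the names, sketched here as our own variation on the suggestion, is to keep the handles from `legend_elements()` and pair them with `iris.target_names`.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

# Rebuild the DataFrame so this sketch runs on its own
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

fig, ax = plt.subplots()
scatter = ax.scatter(df["sepal length (cm)"], df["sepal width (cm)"], c=df["target"])

# legend_elements() returns one handle per distinct target value (0, 1, 2);
# pair those handles with the species names instead of the numeric codes
handles, _ = scatter.legend_elements()
species_names = [str(name) for name in iris.target_names]
ax.legend(handles, species_names, title="Species")

ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Sepal width (cm)")
ax.set_title("Iris dataset")
```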
We can also ask to plot a specific column in the dataset. For example, asking “show me the distribution of the column ‘sepal length (cm)’”
The result:
To show the distribution of a column in a Pandas DataFrame, you can use the hist() function of the DataFrame. For example, to show the distribution of the ‘sepal length (cm)’ column in the Iris dataset, you can use the following code:
```python
# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the distribution of the 'sepal length (cm)' column
df['sepal length (cm)'].hist(ax=ax)

# Add a title
ax.set_title('Distribution of sepal length')

# Show the plot
plt.show()
```
As you may notice, the plot does not have axis labels. Therefore, we can ask “redo it with proper axis labels”.
The result:
```python
# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the distribution of the 'sepal length (cm)' column
df['sepal length (cm)'].hist(ax=ax)

# Add axis labels
ax.set_xlabel('Sepal length (cm)')
ax.set_ylabel('Frequency')

# Add a title
ax.set_title('Distribution of sepal length')

# Show the plot
plt.show()
```
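A natural follow-up is to look at the same distribution per species. The sketch below is our own extension, not ChatGPT output; the bin count and transparency are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

# Rebuild the DataFrame so this sketch runs on its own
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

fig, ax = plt.subplots()
column = "sepal length (cm)"

# Draw one semi-transparent histogram per species code (0, 1, 2)
for code, name in enumerate(iris.target_names):
    values = df.loc[df["target"] == code, column]
    ax.hist(values, bins=10, alpha=0.5, label=str(name))

ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of sepal length by species")
ax.legend(title="Species")
```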
Another useful plot for showing a distribution is the boxplot, which visualizes the median and quartiles (the spread of the data) and helps identify outliers.
Here we ask for boxplots of all the columns in a single plot by asking “show me boxplot of distributions of all columns in same plot”.
The result:
To show the distribution of all columns in a Pandas DataFrame as boxplots on the same plot, you can use the boxplot() function of the DataFrame.
For example, to show the distributions of all columns in the Iris dataset as boxplots, you can use the following code:
```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()

# Create a DataFrame with the features
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add a column with the target values
df['target'] = iris.target

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the boxplots of all columns
df.boxplot(ax=ax)

# Add a title
ax.set_title('Distributions of all columns')

# Show the plot
plt.show()
```
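The suggested code also draws a box for the target column, which is a class code rather than a measurement. A small variation, sketched here as our own adjustment, drops that column before plotting; the y-axis label is an assumption based on the column names.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

# Rebuild the DataFrame so this sketch runs on its own
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

fig, ax = plt.subplots()

# Keep only the four measurement columns; 'target' is a class code, not a measurement
features = df.drop(columns="target")
features.boxplot(ax=ax)

ax.set_ylabel("Centimeters")
ax.set_title("Distributions of the iris measurements")
```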
ChatGPT is an excellent tool for working on your project. It is important to frame each question in a way that addresses your problem. Keep in mind that dividing your project into smaller blocks and addressing them individually is the key to completing it.
To ask ChatGPT a programming question, provide a clear and concise description of your problem, along with any relevant code or error messages. You should also specify the programming language and any libraries or frameworks that you are using.
So, let's ask ChatGPT what it thinks.
When formulating a prompt for ChatGPT, it’s important to keep a few things in mind in order to get the most out of the model:
Be specific and clear about what you want the model to do. For example, instead of asking “What do you know about this topic?”, ask “Can you provide a brief overview of the history of the topic?”
Provide context if necessary. If you’re asking a question about a specific event or person, give some background information so that the model can understand the context in which you’re asking the question.
Be concise. Long and complex prompts can be difficult for the model to understand and may result in less accurate or relevant responses.
Avoid using overly technical language or jargon that the model may not be familiar with.
Avoid asking multiple questions in one prompt; it is better to ask one question at a time.
Here are a few examples of good prompts for ChatGPT:
“Can you explain the main difference between supervised and unsupervised learning?”
“What are the main components of a neural network and how do they work?”
“Please write a short story about a robot who can read minds”
On the other hand, some examples of poor prompts would be:
“everything”
“What can you tell me?”
“what you know”
By following these guidelines, you can help ensure that you get the most accurate and relevant responses from the model when using ChatGPT.
ChatGPT is not perfect: the example it provided loads the data from sklearn. In the real world, you need to load your own data from your own source, for example with Pandas from a file in local storage.
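A minimal sketch of loading iris data from local storage with Pandas might look like the following. The file name `iris_sample.csv` and the column names are hypothetical; a few rows of the iris data are written to a temporary file first so the example runs anywhere. In practice, you would point `csv_path` at your own file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical local file: a few iris rows written to a temporary CSV so the
# sketch runs anywhere; in practice, point csv_path at your own data file
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "iris_sample.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

# Load the file into a DataFrame, as you would for your own data
df_local = pd.read_csv(csv_path)
print(df_local.head())
print(df_local.dtypes)
```

Once the data is in a DataFrame, the plotting code from the earlier sections applies after adjusting the column names.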
In general, as a practicing data scientist, you will frequently encounter the multifaceted process of data collection, preparation, model construction, and interpretation. These stages each have their own unique requirements in terms of skills and resources. For instance, you may utilize pandas to load and manipulate your datasets. It’s important to keep in mind that the examples provided by ChatGPT are intended to serve as a foundational learning aid, and may not be suitable for handling more complex or substantial datasets.
So, let's ask ChatGPT what it thinks.
There are several reasons why ChatGPT and other large language models are not going to replace data scientists:
Data scientists have a much broader skill set: Data scientists are trained in a wide range of skills, including data analysis, machine learning, programming, statistics, and more. ChatGPT and other large language models are only capable of generating text based on a given prompt, and they do not have the other skills that data scientists possess.
Data scientists can interpret and explain the results of their work: Data scientists are trained to interpret and explain the results of their work, which is critical for decision-making and communication with stakeholders. ChatGPT and other large language models are not capable of interpreting or explaining the results of their work.
Data scientists can work with a wide range of data: Data scientists are trained to work with a wide range of data, including structured and unstructured data, and they can use a variety of tools and techniques to extract insights from the data. ChatGPT and other large language models are only capable of generating text based on a given prompt, and they are not able to work with other types of data.
Data scientists can work with a wide range of stakeholders: Data scientists are trained to work with a wide range of stakeholders, including business leaders, subject matter experts, and technical experts, and they can communicate complex concepts in a clear and understandable way. ChatGPT and other large language models are not able to work with stakeholders or communicate complex concepts.
Check out this beginner-friendly guided project and other free courses developed by IBM developers and data scientists at https://cognitiveclass.ai/
Until next time!