In today's era of Artificial Intelligence (AI), scaling businesses and streamlining workflows has never been easier or more accessible. AI and machine learning equip companies to make informed decisions, giving them a superpower to predict the future with just a few lines of code. Before taking a significant risk, wouldn't knowing if it's worth it be beneficial? Have you ever wondered how these AIs and machine learning models are trained to make such precise predictions?
In this article, we will explore, hands-on, how to create a machine-learning model that can make predictions from our input data. Join me on this journey as we delve into these principles together.
This is the first part of a series on mastering machine learning, focusing on the foundations and key concepts. In the second part, we will dive deeper into advanced techniques and real-world applications.
Machine Learning (ML) essentially means training a model to solve problems. It involves feeding large amounts of data (input-data) to a model, enabling it to learn and discover patterns from the data. Interestingly, the model's accuracy depends solely on the quantity and quality of data it is fed.
Machine learning extends beyond making predictions for enterprises; it powers innovations like self-driving cars, robotics, and much more. With continuous advancements in ML, there's no telling what incredible achievements lie ahead - it's simply amazing, right?
There's no contest as to why Python remains one of the most sought-after programming languages for machine learning. Its vast libraries, such as Scikit-Learn and Pandas, and its easy-to-read syntax make it ideal for ML tasks. Python offers a simplified and well-structured environment that allows developers to maximize their potential. As an open-source programming language, it benefits from contributions worldwide, making it even more suitable and advantageous for data science and machine learning.
Machine Learning (ML) is a vast and complex field that requires years of continuous learning and practice. While it's impossible to cover everything in this article, let's look into some important fundamentals of machine learning, specifically:
Regression: When a variable input (i) is supplied as data to train a machine, it produces a continuous numerical output (j). For example, a regression algorithm can be used to predict the price of an item based on its size and other features.
Classification: This algorithm makes predictions based on grouping by determining certain attributes that make up the group. For example, predicting whether a product review is positive, negative, or neutral.
When setting up your environment to create your first model, it's essential to understand some basic steps in ML and familiarize yourself with the libraries and tools we will explore in this article.
NumPy: Known for providing multidimensional arrays, NumPy is fundamental for numerical computations.
Pandas: A data analysis library that offers data frames (two-dimensional data structures similar to Excel spreadsheets) with rows and columns.
Matplotlib: Matplotlib is a two-dimensional plotting library for creating graphs and plots.
Scikit-Learn: The most popular machine learning library, providing all common algorithms like decision trees, neural networks, and more.
Standard IDEs such as VS Code or terminals may not be ideal when creating a model due to the difficulty in inspecting data while writing code. For our learning purposes, the recommended environment is Jupyter Notebook, which provides an interactive platform to write and execute code, visualize data, and document the process simultaneously.
Download Anaconda:
Anaconda is a popular distribution of Python and R for scientific computing and data science. It includes the Jupyter Notebook and other essential tools.
Download Anaconda from this link.
Install Anaconda:
Follow the installation instructions based on your operating system (Windows, macOS, or Linux).
After the installation is complete, you will have access to the Anaconda Navigator, which is a graphical interface for managing your Anaconda packages, environments, and notebooks.
Launching Jupyter Notebook:
Open the Anaconda Navigator
In the Navigator, click on the "Environments" tab.
Select the "base (root)" environment, and then click "Open with Terminal" or "Open Terminal" (the exact wording may vary depending on the OS).
In the terminal window that opens, type the command jupyter notebook and press Enter.
This command will launch the Jupyter Notebook server and automatically open a new tab in your default web browser, displaying the Jupyter Notebook interface.
The browser window will show a file directory where you can navigate to your project folder or create new notebooks.
Click "New" and select "Python 3" (or the appropriate kernel) to create a new Jupyter Notebook.
You can now start writing and executing your code in the cells of the notebook. The interface allows you to document your code, visualize data, and explore datasets interactively.
In building your first model, we have to take cognizance of the steps in Machine Learning as discussed earlier, which are:
Now, let's assume a scenario involving an online bookstore where users sign up and provide their necessary information such as name, age, and gender. Based on their profile, we aim to recommend various books they are likely to buy and build a model that helps boost sales.
First, we need to feed the model with sample data from existing users. The model will learn patterns from this data to make predictions. When a new user signs up, we can tell the model, "Hey, we have a new user with this profile. What kind of book are they likely to be interested in?" The model will then recommend, for instance, a history or a romance novel, and based on that, we can make personalized suggestions to the user.
By following these steps, you will be able to build a machine-learning model that effectively recommends books to users, enhancing their experience and boosting sales for the online bookstore. You can gain access to the datasets used in this tutorial here.
Let's walk through a sample code snippet to illustrate the process of testing the accuracy of the model:
import pandas as pd from sklearn.tree import DecisionTreeClassifier from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score
We start by importing the essential libraries. pandas is used for data manipulation and analysis, while DecisionTreeClassifier, train_test_split, and accuracy_score are from Scikit-learn, a popular machine learning library.
book_data = pd.read_csv('book_Data.csv') Read the dataset from a `CSV file` into a pandas DataFrame.
X = book_data.drop(columns=['Genre']) y = book_data['Genre']
Create a feature matrix X by dropping the 'Genre' column from the dataset and a target vector y containing the 'Genre' column.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Split the data into training and testing sets with 80% for training and 20% for testing.
model = DecisionTreeClassifier() model.fit(X_train, y_train)
Initialize the DecisionTreeClassifier model and train it using the training data.
predictions = model.predict(X_test) score = accuracy_score(y_test, predictions) print(score)
Make predictions on the test data and calculate the accuracy of the model by comparing the test labels to the predictions. Finally, print the accuracy score to the console.
In this example, we start by importing the essential libraries. Pandas is used for data manipulation and analysis, while DecisionTreeClassifier, train_test_split, and accuracy_score are from Scikit-learn, a popular machine learning library. We then read the dataset from a CSV file into a pandas DataFrame, prepare the data by creating a feature matrix X and a target vector y, split the data into training and testing sets, initialize and train the DecisionTreeClassifier model, make predictions on the test data, and calculate the accuracy of the model by comparing the test labels to the predictions.
Depending on the data you're using, the results will vary. For instance, in the output below, the accuracy score displayed is 0.7, but it may show 0.5 when the code is run again with a different dataset. The accuracy score will vary, a higher score indicates a more accurate model.
Output:
Now that you've successfully created your model, it's important to note that the kind of data used to train your model is crucial to the accuracy and reliability of your predictions. In Mastering Data Analysis: Unveiling the Power of Fairness and Bias in Information, I discussed extensively the importance of data cleaning and ensuring data fairness. Depending on what you intend to do with your model, it is essential to consider if your data is fair and free of any bias. Data cleaning is a very vital part of machine learning, ensuring that your model is trained on accurate, unbiased data. Some of these ethical considerations are:
Removing Outliers: Ensure that the data does not contain extreme values that could skew the model's predictions.
Handling Missing Values: Address any missing data points to avoid inaccurate predictions.
Standardizing Data: Make sure the data is in a consistent format, allowing the model to interpret it correctly.
Balancing the Dataset: Ensure that your dataset represents all categories fairly to avoid bias in predictions.
Ensuring Data Fairness: Check for any biases in your data that could lead to unfair predictions and take steps to mitigate them.
By addressing these ethical considerations, you ensure that your model is not only accurate but also fair and reliable, providing meaningful predictions.
Machine learning is a powerful tool that can transform data into valuable insights and predictions. In this article, we explored the fundamentals of machine learning, focusing on supervised and unsupervised learning, and demonstrated how to set up your environment and build a simple machine learning model using Python and its libraries. By following these steps and experimenting with different algorithms and datasets, you can unlock the potential of machine learning to solve complex problems and make data-driven decisions.
In the next part of this series, we will dive deeper into advanced techniques and real-world applications of machine learning, exploring topics such as feature engineering, model evaluation, and optimization. Stay tuned for more insights and practical examples to enhance your machine-learning journey.
Programming with Mosh
Machine Learning Tutorial geeksforgeeks
免責聲明: 提供的所有資源部分來自互聯網,如果有侵犯您的版權或其他權益,請說明詳細緣由並提供版權或權益證明然後發到郵箱:[email protected] 我們會在第一時間內為您處理。
Copyright© 2022 湘ICP备2022001581号-3