
Mastering Machine Learning with Python: Foundations and Key Concepts

Published on 2024-11-04

In today's era of Artificial Intelligence (AI), scaling businesses and streamlining workflows has become far more accessible. AI and machine learning help companies make informed decisions by forecasting likely outcomes from historical data, often with just a few lines of code. Before taking a significant risk, wouldn't it be useful to know whether it is worth it? Have you ever wondered how these machine learning models are trained to make their predictions?
In this article, we will explore, hands-on, how to create a machine-learning model that can make predictions from our input data. Join me on this journey as we delve into these principles together.
This is the first part of a series on mastering machine learning, focusing on the foundations and key concepts. In the second part, we will dive deeper into advanced techniques and real-world applications.

Introduction:

Machine Learning (ML) essentially means training a model to solve problems. It involves feeding large amounts of input data to a model so that it can learn and discover patterns in that data. The model's accuracy depends heavily on the quantity and quality of the data it is fed, alongside the choice of algorithm and how it is configured.

Machine learning extends beyond making predictions for enterprises; it powers innovations like self-driving cars, robotics, and much more. With continuous advancements in ML, there's no telling what incredible achievements lie ahead - it's simply amazing, right?

It is easy to see why Python remains one of the most sought-after programming languages for machine learning. Its extensive libraries, such as Scikit-Learn and Pandas, and its easy-to-read syntax make it well suited to ML tasks. As an open-source programming language, it benefits from contributions worldwide, which keeps its data science and machine learning ecosystem growing.

Fundamentals Of Machine Learning

Machine Learning (ML) is a vast and complex field that requires years of continuous learning and practice. While it's impossible to cover everything in this article, let's look into some important fundamentals of machine learning, specifically:

  • Supervised Machine Learning: As the name suggests, supervised machine learning involves some form of guidance. The machine is given labeled data, that is, inputs (i) paired with their known outputs (j), and an algorithm learns a function that maps each input to its output. Once trained, the model predicts an output (j) for any new input (i) it is given. Supervised ML can be further classified into:

Regression: The model is trained to produce a continuous numerical output (j) from an input (i). For example, a regression algorithm can be used to predict the price of an item based on its size and other features.

Classification: The model assigns each input to one of a fixed set of categories based on its attributes. For example, predicting whether a product review is positive, negative, or neutral.

  • Unsupervised Machine Learning: Unsupervised machine learning works with unlabeled data. Unlike supervised learning, where models are trained on labeled data, unsupervised learning algorithms identify patterns and relationships in data without prior knowledge of the outcomes. For example, grouping customers based on their purchasing behavior.
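To make the three ideas above concrete, here is a minimal sketch using Scikit-Learn. The numbers are made-up toy data, and the choice of LinearRegression, DecisionTreeClassifier, and KMeans is only an assumption for illustration; other algorithms would serve equally well.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: labeled data with a continuous target (item size -> price).
sizes = [[50], [60], [80], [100]]
prices = [150.0, 180.0, 240.0, 300.0]  # toy data: exactly 3.0 per unit of size
regressor = LinearRegression().fit(sizes, prices)
predicted_price = regressor.predict([[70]])[0]
print(round(predicted_price, 2))  # approximately 210.0

# Classification: labeled data with a categorical target (star rating -> sentiment).
ratings = [[1], [2], [4], [5]]
sentiments = ["negative", "negative", "positive", "positive"]
classifier = DecisionTreeClassifier(random_state=0).fit(ratings, sentiments)
predicted_sentiment = classifier.predict([[5]])[0]
print(predicted_sentiment)

# Clustering (unsupervised): no labels; the algorithm groups similar rows itself.
monthly_spend = [[10], [12], [11], [200], [210], [205]]
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(monthly_spend)
print(clusters)  # low spenders share one label, high spenders the other
```

Note that the two supervised models are given both inputs and answers when they are fitted, while KMeans only ever sees the inputs.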

Setting Up Your Environment

When setting up your environment to create your first model, it's essential to understand some basic steps in ML and familiarize yourself with the libraries and tools we will explore in this article.

Steps in Machine Learning:

  1. Import the Data: Gather the data you need for your analysis.
  2. Clean the Data: Ensure your data is in good and complete shape by handling missing values and correcting inconsistencies.
  3. Split the Data: Divide the data into training and test sets.
  4. Create a Model: Choose your preferred algorithm to analyze the data and build your model.
  5. Train the Model: Use the training set to teach your model.
  6. Make Predictions: Use the test set to make predictions with your trained model.
  7. Evaluate and Improve: Assess the model's performance and refine it based on the outputs.

Common Libraries and Tools:

  • NumPy: Known for providing multidimensional arrays, NumPy is fundamental for numerical computations.

  • Pandas: A data analysis library that offers data frames (two-dimensional data structures similar to Excel spreadsheets) with rows and columns.

  • Matplotlib: Matplotlib is a two-dimensional plotting library for creating graphs and plots.

  • Scikit-Learn: One of the most popular machine learning libraries, providing many common algorithms such as decision trees, neural networks, and more.
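As a quick taste of the first two libraries, the sketch below stores the same made-up numbers in a NumPy array and in a Pandas data frame. The column names Age and Purchases are hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# NumPy: a two-dimensional array and a vectorised computation over its columns.
ages_and_purchases = np.array([[20, 3], [35, 5], [50, 1]])
column_means = ages_and_purchases.mean(axis=0)
print(column_means)  # mean age and mean number of purchases

# Pandas: the same values as a data frame with named columns.
users = pd.DataFrame(ages_and_purchases, columns=["Age", "Purchases"])
print(users)
print(users.describe())  # summary statistics for each column
```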

Recommended Development Environment:

Standard IDEs such as VS Code or terminals may not be ideal when creating a model due to the difficulty in inspecting data while writing code. For our learning purposes, the recommended environment is Jupyter Notebook, which provides an interactive platform to write and execute code, visualize data, and document the process simultaneously.

Step-by-Step Setup:

Download Anaconda:

Anaconda is a popular distribution of Python and R for scientific computing and data science. It includes the Jupyter Notebook and other essential tools. Download Anaconda from this link.

Install Anaconda:

Follow the installation instructions based on your operating system (Windows, macOS, or Linux). After the installation is complete, you will have access to the Anaconda Navigator, which is a graphical interface for managing your Anaconda packages, environments, and notebooks.

Launching Jupyter Notebook:

  1. Open the Anaconda Navigator.
  2. In the Navigator, click on the "Environments" tab.
  3. Select the "base (root)" environment, and then click "Open with Terminal" or "Open Terminal" (the exact wording may vary depending on the OS).
  4. In the terminal window that opens, type the command jupyter notebook and press Enter.

This command will launch the Jupyter Notebook server and automatically open a new tab in your default web browser, displaying the Jupyter Notebook interface.

Using Jupyter Notebook:

The browser window will show a file directory where you can navigate to your project folder or create new notebooks.
Click "New" and select "Python 3" (or the appropriate kernel) to create a new Jupyter Notebook.
You can now start writing and executing your code in the cells of the notebook. The interface allows you to document your code, visualize data, and explore datasets interactively.
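A reasonable first cell to run in a new notebook is a check that the libraries used in this article are available. The sketch below is one way to do that with the standard library only; the helper name find_missing is hypothetical, not part of Jupyter or Anaconda.

```python
import importlib.util
import sys

# Libraries this tutorial relies on; note that Scikit-Learn is imported as "sklearn".
required_modules = ["numpy", "pandas", "matplotlib", "sklearn"]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
missing = find_missing(required_modules)
print("Missing libraries:", missing if missing else "none")
```

If anything is reported as missing, it can usually be installed from the Anaconda Navigator's "Environments" tab.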

Building Your First Machine Learning Model

To build your first model, we follow the machine learning steps discussed earlier:

  1. Import the Data
  2. Clean the Data
  3. Split the Data
  4. Create a Model
  5. Train the Model
  6. Make Predictions
  7. Evaluate and Improve

Now, let's assume a scenario involving an online bookstore where users sign up and provide their necessary information such as name, age, and gender. Based on their profile, we aim to recommend various books they are likely to buy and build a model that helps boost sales.

First, we need to feed the model with sample data from existing users. The model will learn patterns from this data to make predictions. When a new user signs up, we can tell the model, "Hey, we have a new user with this profile. What kind of book are they likely to be interested in?" The model will then recommend, for instance, a history or a romance novel, and based on that, we can make personalized suggestions to the user.

Let's break down the process step-by-step:

  1. Import the Data: Load the dataset containing user profiles and their book preferences.
  2. Clean the Data: Handle missing values, correct inconsistencies, and prepare the data for analysis.
  3. Split the Data: Divide the dataset into training and testing sets to evaluate the model's performance.
  4. Create a Model: Choose a suitable machine learning algorithm to build the recommendation model.
  5. Train the Model: Train the model using the training data to learn the patterns and relationships within the data.
  6. Make Predictions: Use the trained model to predict book preferences for new users based on their profiles.
  7. Evaluate and Improve: Assess the model's accuracy using the testing data and refine it to improve its performance.

By following these steps, you will be able to build a machine-learning model that effectively recommends books to users, enhancing their experience and boosting sales for the online bookstore. You can gain access to the datasets used in this tutorial here.

Let's walk through a sample code snippet to illustrate the process of testing the accuracy of the model:

  • Import the necessary libraries:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

We start by importing the essential libraries. pandas is used for data manipulation and analysis, while DecisionTreeClassifier, train_test_split, and accuracy_score are from Scikit-learn, a popular machine learning library.

  • Load the dataset:
book_data = pd.read_csv('book_Data.csv')

Read the dataset from a CSV file into a pandas DataFrame.

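If you do not have the file at hand, the following sketch shows what loading and inspecting such a dataset might look like. The rows and the column names Age and Gender are assumptions made for illustration (only the Genre column is confirmed by the code in this article), and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of book_Data.csv; the real file may differ.
# Gender is assumed to be encoded as 1 (male) / 0 (female).
csv_text = """Age,Gender,Genre
21,1,Fantasy
23,0,Romance
30,1,History
31,0,Mystery
34,1,History
24,0,Romance
"""

book_data = pd.read_csv(io.StringIO(csv_text))

print(book_data.head())   # first rows, to check that the columns parsed as expected
print(book_data.shape)    # (number of rows, number of columns)
print(book_data.dtypes)   # data type of each column
```

Inspecting the shape and data types right after loading is a cheap way to catch a wrong delimiter or a column that was read as text by mistake.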
  • Prepare the data:
X = book_data.drop(columns=['Genre'])
y = book_data['Genre']

Create a feature matrix X by dropping the 'Genre' column from the dataset and a target vector y containing the 'Genre' column.
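One caveat worth illustrating: Scikit-Learn's DecisionTreeClassifier expects numeric features, so a text column has to be encoded before training. The sketch below assumes a hypothetical dataset in which Gender is stored as text, and uses one-hot encoding via pd.get_dummies as one possible approach.

```python
import pandas as pd

# Hypothetical rows where Gender is stored as text rather than as a number.
book_data = pd.DataFrame({
    "Age": [21, 23, 34, 36],
    "Gender": ["male", "female", "male", "female"],
    "Genre": ["Fantasy", "Romance", "History", "Mystery"],
})

X = book_data.drop(columns=["Genre"])  # features: everything except the target
y = book_data["Genre"]                 # target: the column we want to predict

# One-hot encode the text column so that every feature is numeric.
X_encoded = pd.get_dummies(X, columns=["Gender"])
print(X_encoded.columns.tolist())  # ['Age', 'Gender_female', 'Gender_male']
```

If your dataset already stores every feature as a number, this step is unnecessary.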

  • Split the data:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Split the data into training and testing sets with 80% for training and 20% for testing.
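Because train_test_split shuffles the rows randomly, each run produces a different split. The sketch below, on made-up data, shows two optional parameters that make the split easier to reason about: random_state fixes the shuffle, and stratify keeps the class proportions the same in both sets.

```python
from sklearn.model_selection import train_test_split

# Ten hypothetical samples: one feature each and a balanced two-class target.
X = [[i] for i in range(10)]
y = ["History"] * 5 + ["Romance"] * 5

# random_state fixes the shuffle; stratify=y keeps the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))  # 8 2
print(sorted(y_test))             # ['History', 'Romance']
```

The value 42 is arbitrary; any fixed integer gives a repeatable split.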

  • Initialize and train the model:
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

Initialize the DecisionTreeClassifier model and train it using the training data.

  • Make predictions and evaluate the model:
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
print(score)

Make predictions on the test data and calculate the accuracy of the model by comparing the test labels to the predictions. Finally, print the accuracy score to the console.
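The same predict call can be used for the scenario described earlier, where a new user signs up and we ask the model what they are likely to read. The sketch below uses hypothetical training rows and assumes the features are Age and Gender, with Gender encoded as 1 (male) / 0 (female).

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data; Gender is assumed to be encoded as 1 (male) / 0 (female).
book_data = pd.DataFrame({
    "Age": [20, 22, 25, 30, 33, 36, 21, 24, 31, 35],
    "Gender": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "Genre": ["Fantasy", "Fantasy", "Fantasy", "History", "History",
              "History", "Romance", "Romance", "Mystery", "Mystery"],
})
X = book_data.drop(columns=["Genre"])
y = book_data["Genre"]

# Train on all rows here, since the goal is to serve predictions, not to measure accuracy.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Two new sign-ups: a 23-year-old man and a 34-year-old woman.
new_users = pd.DataFrame({"Age": [23, 34], "Gender": [1, 0]})
recommendations = model.predict(new_users)
print(recommendations)  # ['Fantasy' 'Mystery']
```

The new rows must have the same columns, in the same order, as the data the model was trained on.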

To recap: we read the dataset from a CSV file into a pandas DataFrame, prepared the data by creating a feature matrix X and a target vector y, split the data into training and testing sets, initialized and trained the DecisionTreeClassifier model, made predictions on the test data, and calculated the accuracy of the model by comparing the test labels to the predictions.

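Depending on the data you're using, the results will vary. For instance, one run produced an accuracy score of 0.7, but it may show 0.5 when the code is run again with a different dataset. Even with the same dataset, the score can change between runs, because train_test_split shuffles the rows randomly unless a random_state is set. Accuracy is the fraction of test samples predicted correctly, so a higher score indicates a more accurate model.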
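Since a single train/test split can be lucky or unlucky, one common way to get a steadier estimate is cross-validation, which repeats the split several times and averages the scores. The sketch below uses synthetic data from make_classification as a stand-in, not the bookstore dataset, so the numbers it prints say nothing about the book model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 rows, 4 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = DecisionTreeClassifier(random_state=0)

# Five train/test rounds, each holding out a different fifth of the data.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(scores)         # one accuracy score per round
print(scores.mean())  # the average is a steadier estimate than any single split
```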

Data Preprocessing:

Now that you've successfully created your model, it's important to note that the data used to train it is crucial to the accuracy and reliability of its predictions. In Mastering Data Analysis: Unveiling the Power of Fairness and Bias in Information, I discussed extensively the importance of data cleaning and ensuring data fairness. Depending on what you intend to do with your model, it is essential to consider whether your data is fair and free of bias. Data cleaning is a vital part of machine learning, because it helps ensure that your model is trained on accurate, unbiased data. Some key data-quality and ethical considerations are:

  1. Removing Outliers: Ensure that the data does not contain extreme values that could skew the model's predictions.

  2. Handling Missing Values: Address any missing data points to avoid inaccurate predictions.

  3. Standardizing Data: Make sure the data is in a consistent format, allowing the model to interpret it correctly.

  4. Balancing the Dataset: Ensure that your dataset represents all categories fairly to avoid bias in predictions.

  5. Ensuring Data Fairness: Check for any biases in your data that could lead to unfair predictions and take steps to mitigate them.

By addressing these considerations, you help ensure that your model is not only accurate but also fair and reliable, providing meaningful predictions.
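Three of the items above (missing values, outliers, and dataset balance) can be sketched with Pandas as follows. The data is made up, and the specific choices, filling gaps with the median and dropping values outside 1.5 times the interquartile range, are common conventions rather than the only valid approach.

```python
import pandas as pd

# Hypothetical raw sign-up data with a missing age, an implausible age,
# and an unevenly represented target column.
raw = pd.DataFrame({
    "Age": [21, 23, None, 30, 33, 240],
    "Gender": [1, 0, 1, 0, 1, 0],
    "Genre": ["Fantasy", "Romance", "Fantasy", "History", "Fantasy", "Romance"],
})

# Handling missing values: fill the missing age with the median age.
cleaned = raw.copy()
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Removing outliers: keep ages within 1.5 * IQR of the first and third quartiles.
q1, q3 = cleaned["Age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = cleaned["Age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = cleaned[in_range].reset_index(drop=True)

# Balancing check: the share of each genre shows whether one class dominates.
genre_share = cleaned["Genre"].value_counts(normalize=True)

print(cleaned)
print(genre_share)
```

Here the row with age 240 is dropped, and the genre shares reveal that Fantasy makes up most of what remains, which is a signal to collect more data for the other genres before trusting the model's recommendations.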

Conclusion:

Machine learning is a powerful tool that can transform data into valuable insights and predictions. In this article, we explored the fundamentals of machine learning, focusing on supervised and unsupervised learning, and demonstrated how to set up your environment and build a simple machine learning model using Python and its libraries. By following these steps and experimenting with different algorithms and datasets, you can unlock the potential of machine learning to solve complex problems and make data-driven decisions.

In the next part of this series, we will dive deeper into advanced techniques and real-world applications of machine learning, exploring topics such as feature engineering, model evaluation, and optimization. Stay tuned for more insights and practical examples to enhance your machine-learning journey.

Additional Resources:

  • Programming with Mosh

  • Machine Learning Tutorial (GeeksforGeeks)

Source: this article is reposted from https://dev.to/eztosin/mastering-machine-learning-with-python-foundations-and-key-concepts-54di?1