
Mastering Machine Learning with Python: Foundations and Key Concepts

Published on 2024-11-04

In today's era of Artificial Intelligence (AI), scaling businesses and streamlining workflows has never been easier or more accessible. AI and machine learning equip companies to make informed decisions, giving them a superpower to predict the future with just a few lines of code. Before taking a significant risk, wouldn't knowing if it's worth it be beneficial? Have you ever wondered how these AIs and machine learning models are trained to make such precise predictions?
In this article, we will explore, hands-on, how to create a machine-learning model that can make predictions from our input data. Join me on this journey as we delve into these principles together.
This is the first part of a series on mastering machine learning, focusing on the foundations and key concepts. In the second part, we will dive deeper into advanced techniques and real-world applications.

Introduction:

Machine Learning (ML) essentially means training a model to solve problems. It involves feeding large amounts of input data to a model, enabling it to learn and discover patterns in that data. A model's accuracy depends heavily on the quantity and quality of the data it is fed, as well as on the choice of algorithm and how it is configured.

Machine learning extends beyond making predictions for enterprises; it powers innovations like self-driving cars, robotics, and much more. With continuous advancements in ML, there's no telling what incredible achievements lie ahead - it's simply amazing, right?

It is no surprise that Python remains one of the most sought-after programming languages for machine learning. Its extensive libraries, such as Scikit-Learn and Pandas, and its easy-to-read syntax make it well suited to ML tasks. As an open-source language, it also benefits from contributions by developers worldwide, which keeps its data science and machine learning ecosystem growing.

Fundamentals Of Machine Learning

Machine Learning (ML) is a vast and complex field that requires years of continuous learning and practice. While it's impossible to cover everything in this article, let's look into some important fundamentals of machine learning, specifically:

  • Supervised Machine Learning: As the name suggests, supervised machine learning involves some form of guidance. The model is given labeled examples, each pairing an input (i) with its known output (j), and a learning algorithm uses them to learn a mapping from inputs to outputs. Once trained, the model can predict the output (j) for an input (i) it has not seen before. Supervised ML can be further divided into:

Regression: The model predicts a continuous numerical output (j) from the input (i). For example, a regression algorithm can be used to predict the price of an item based on its size and other features.

Classification: The model assigns each input to one of a fixed set of categories based on its attributes. For example, predicting whether a product review is positive, negative, or neutral.
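
To make the difference between the two concrete, here is a minimal sketch using Scikit-Learn. The tiny datasets (item sizes with prices, and word counts with sentiment labels) are invented purely for illustration and are not part of the bookstore dataset used later in this article.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: predict a continuous price from an item's size.
# Made-up data in which price is exactly 10 * size + 5.
sizes = [[1], [2], [3], [4]]
prices = [15, 25, 35, 45]
regressor = LinearRegression()
regressor.fit(sizes, prices)
predicted_price = regressor.predict([[5]])[0]
print(round(predicted_price, 2))

# Classification: predict a discrete label.
# Made-up features: [number of positive words, number of negative words].
reviews = [[5, 0], [4, 1], [0, 5], [1, 4], [2, 2], [3, 3]]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(reviews, labels)
predicted_label = classifier.predict([[6, 0]])[0]
print(predicted_label)
```

The regressor returns a number on a continuous scale, while the classifier returns one of the labels it saw during training.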

  • Unsupervised Machine Learning: Unsupervised machine learning works with unlabeled data. Unlike supervised learning, where models are trained on labeled examples, unsupervised learning algorithms identify patterns and relationships in data without prior knowledge of the outcomes. For example, grouping customers based on their purchasing behavior.
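
As a rough illustration of the customer-grouping example, the sketch below uses Scikit-Learn's KMeans clustering. The customer figures and the choice of two clusters are assumptions made for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customers: [orders per month, average order value].
customers = np.array([
    [1, 10], [2, 12], [1, 11],      # occasional, low-spend shoppers
    [20, 95], [22, 100], [21, 98],  # frequent, high-spend shoppers
])

# Ask for two groups; note that no labels are supplied.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(customers)
print(cluster_ids)
```

The cluster numbers themselves are arbitrary; what matters is that similar customers end up with the same number.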

Setting Up Your Environment

When setting up your environment to create your first model, it's essential to understand some basic steps in ML and familiarize yourself with the libraries and tools we will explore in this article.

Steps in Machine Learning:

  1. Import the Data: Gather the data you need for your analysis.
  2. Clean the Data: Ensure your data is in good and complete shape by handling missing values and correcting inconsistencies.
  3. Split the Data: Divide the data into training and test sets.
  4. Create a Model: Choose your preferred algorithm to analyze the data and build your model.
  5. Train the Model: Use the training set to teach your model.
  6. Make Predictions: Use the test set to make predictions with your trained model.
  7. Evaluate and Improve: Assess the model's performance and refine it based on the outputs.

Common Libraries and Tools:

  • NumPy: Known for providing multidimensional arrays, NumPy is fundamental for numerical computations.

  • Pandas: A data analysis library that offers data frames (two-dimensional data structures similar to Excel spreadsheets) with rows and columns.

  • Matplotlib: A plotting library for creating graphs and charts, primarily two-dimensional ones.

  • Scikit-Learn: One of the most popular machine learning libraries, providing many common algorithms such as decision trees, neural networks, and more.
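
To give a feel for the first two libraries, here is a small sketch of a NumPy array and a pandas DataFrame. The values and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: a two-dimensional array and a computation over all its values.
ages = np.array([[25, 31], [42, 19]])
mean_age = ages.mean()

# pandas: a DataFrame with named columns, similar to a spreadsheet.
users = pd.DataFrame({
    "Age": [25, 31, 42, 19],
    "Genre": ["History", "Romance", "History", "Fantasy"],
})

# Select only the rows whose Genre is "History".
history_readers = users[users["Genre"] == "History"]

print(mean_age)
print(history_readers)
```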

Recommended Development Environment:

Standard IDEs such as VS Code or terminals may not be ideal when creating a model due to the difficulty in inspecting data while writing code. For our learning purposes, the recommended environment is Jupyter Notebook, which provides an interactive platform to write and execute code, visualize data, and document the process simultaneously.

Step-by-Step Setup:

Download Anaconda:
Anaconda is a popular distribution of Python and R for scientific computing and data science. It includes the Jupyter Notebook and other essential tools.

Download Anaconda from this link.

Install Anaconda:
Follow the installation instructions based on your operating system (Windows, macOS, or Linux).
After the installation is complete, you will have access to the Anaconda Navigator, which is a graphical interface for managing your Anaconda packages, environments, and notebooks.

Launching Jupyter Notebook:

Open the Anaconda Navigator
In the Navigator, click on the "Environments" tab.
Select the "base (root)" environment, and then click "Open with Terminal" or "Open Terminal" (the exact wording may vary depending on the OS).
In the terminal window that opens, type the command jupyter notebook and press Enter.

This command will launch the Jupyter Notebook server and automatically open a new tab in your default web browser, displaying the Jupyter Notebook interface.

Using Jupyter Notebook:

The browser window will show a file directory where you can navigate to your project folder or create new notebooks.
Click "New" and select "Python 3" (or the appropriate kernel) to create a new Jupyter Notebook.
You can now start writing and executing your code in the cells of the notebook. The interface allows you to document your code, visualize data, and explore datasets interactively.
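
A reasonable first cell to run is a quick check that the libraries are available. This sketch assumes NumPy, pandas, and Scikit-Learn are installed, as they normally are with Anaconda; it only prints their version numbers.

```python
import numpy
import pandas
import sklearn

# Collect the installed version of each library.
versions = {
    "numpy": numpy.__version__,
    "pandas": pandas.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails, the library is missing from the active environment and can be installed through Anaconda Navigator.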

Building Your First Machine Learning Model

To build your first model, we will follow the machine learning steps discussed earlier:

  1. Import the Data
  2. Clean the Data
  3. Split the Data
  4. Create a Model
  5. Train the Model
  6. Make Predictions
  7. Evaluate and Improve

Now, let's assume a scenario involving an online bookstore where users sign up and provide their necessary information such as name, age, and gender. Based on their profile, we aim to recommend various books they are likely to buy and build a model that helps boost sales.

First, we need to feed the model with sample data from existing users. The model will learn patterns from this data to make predictions. When a new user signs up, we can tell the model, "Hey, we have a new user with this profile. What kind of book are they likely to be interested in?" The model will then recommend, for instance, a history or a romance novel, and based on that, we can make personalized suggestions to the user.

Let's break down the process step-by-step:

  1. Import the Data: Load the dataset containing user profiles and their book preferences.
  2. Clean the Data: Handle missing values, correct inconsistencies, and prepare the data for analysis.
  3. Split the Data: Divide the dataset into training and testing sets to evaluate the model's performance.
  4. Create a Model: Choose a suitable machine learning algorithm to build the recommendation model.
  5. Train the Model: Train the model using the training data to learn the patterns and relationships within the data.
  6. Make Predictions: Use the trained model to predict book preferences for new users based on their profiles.
  7. Evaluate and Improve: Assess the model's accuracy using the testing data and refine it to improve its performance.

By following these steps, you will be able to build a machine-learning model that effectively recommends books to users, enhancing their experience and boosting sales for the online bookstore. You can gain access to the datasets used in this tutorial here.

Let's walk through a sample code snippet to illustrate the process of testing the accuracy of the model:

  • Import the necessary libraries:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

We start by importing the essential libraries. pandas is used for data manipulation and analysis, while DecisionTreeClassifier, train_test_split, and accuracy_score are from Scikit-learn, a popular machine learning library.

  • Load the dataset:
book_data = pd.read_csv('book_Data.csv')

Read the dataset from a CSV file into a pandas DataFrame.

  • Prepare the data:
X = book_data.drop(columns=['Genre'])
y = book_data['Genre']

Create a feature matrix X by dropping the 'Genre' column from the dataset and a target vector y containing the 'Genre' column.
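
One thing to watch at this step: Scikit-Learn's decision tree cannot train on text values, so any text feature (gender, for instance) has to be encoded first. The sketch below shows one common approach, one-hot encoding with pandas. The small DataFrame and the column names Age and Gender are assumptions for illustration; the actual CSV used in this tutorial may be structured differently or may already be numeric.

```python
import pandas as pd

# Hypothetical stand-in for book_Data.csv; column names are assumed.
book_data = pd.DataFrame({
    "Age": [22, 35, 41, 28],
    "Gender": ["F", "M", "F", "M"],
    "Genre": ["Romance", "History", "History", "Fantasy"],
})

X = book_data.drop(columns=["Genre"])
y = book_data["Genre"]

# One-hot encode the text column so no feature contains strings.
X_encoded = pd.get_dummies(X, columns=["Gender"])
print(X_encoded.columns.tolist())
```

After encoding, the single Gender column is replaced by one indicator column per category, and X_encoded can be passed to the model in place of X.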

  • Split the data:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Split the data into training and testing sets with 80% for training and 20% for testing.
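
By default, train_test_split shuffles the rows randomly, so each run produces a different split. The sketch below shows two optional parameters that can make the split reproducible and keep the class proportions balanced. The ten-row dataset is made up for illustration.

```python
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples, evenly divided between two genres.
X = [[age] for age in range(20, 30)]
y = ["History", "Romance"] * 5

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of the rows go to the test set
    random_state=42,   # fixed seed: the same split on every run
    stratify=y,        # keep the genre proportions equal in both sets
)
print(len(X_train), len(X_test))
```

Fixing random_state is useful while learning, because it lets you rerun a notebook and compare results without the split changing underneath you.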

  • Initialize and train the model:
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

Initialize the DecisionTreeClassifier model and train it using the training data.

  • Make predictions and evaluate the model:
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
print(score)

Make predictions on the test data and calculate the accuracy of the model by comparing the test labels to the predictions. Finally, print the accuracy score to the console.
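
The same predict method can be used to answer the question from the bookstore scenario: what might a new user want to read? The sketch below trains on a small made-up dataset and asks for a recommendation for one hypothetical user. The column names and the 0/1 encoding of gender are assumptions for this example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up training data; Gender is encoded as 0 (female) or 1 (male).
users = pd.DataFrame({
    "Age":    [20, 23, 25, 26, 29, 30, 31, 33, 37],
    "Gender": [1, 1, 1, 0, 0, 0, 1, 1, 0],
    "Genre":  ["Fantasy", "Fantasy", "Fantasy",
               "Romance", "Romance", "Romance",
               "History", "History", "History"],
})

X = users.drop(columns=["Genre"])
y = users["Genre"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# A new sign-up: a 21-year-old male user.
new_user = pd.DataFrame({"Age": [21], "Gender": [1]})
recommended_genre = model.predict(new_user)[0]
print(recommended_genre)
```

Here the whole dataset is used for training because the goal is a recommendation rather than an accuracy estimate; when measuring accuracy, keep a separate test set as shown above.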

To recap, the script reads the dataset from a CSV file into a pandas DataFrame, separates it into a feature matrix X and a target vector y, splits the data into training and testing sets, trains a DecisionTreeClassifier on the training set, and measures accuracy by comparing the predictions for the test set against the true test labels.

Results will vary with the data you use and from run to run, because train_test_split shuffles the rows randomly before splitting. For instance, the run captured for this article displayed an accuracy score of 0.7, but running the code again, or using a different dataset, may give a value such as 0.5. A higher score indicates a more accurate model.
Output:

0.7
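
Because a single train/test split can give a noisy score, one common way to get a steadier estimate is cross-validation, which repeats the train-and-score cycle on several different partitions of the data. The sketch below uses Scikit-Learn's built-in Iris dataset as a stand-in, since the bookstore CSV is not reproduced here; the choice of five folds is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Built-in sample dataset used as a stand-in for the bookstore data.
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Train and score the model on 5 different train/test partitions.
scores = cross_val_score(model, X, y, cv=5)
print(scores)
print(scores.mean())
```

The mean of the five scores is usually a more trustworthy summary than any single run.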

Data Preprocessing:

Now that you've successfully created your model, it's important to note that the data used to train it is crucial to the accuracy and reliability of its predictions. In Mastering Data Analysis: Unveiling the Power of Fairness and Bias in Information, I discussed extensively the importance of data cleaning and ensuring data fairness. Depending on what you intend to do with your model, it is essential to consider whether your data is fair and free of bias. Data cleaning is a vital part of machine learning, ensuring that your model is trained on accurate, unbiased data. Some key data quality and fairness considerations are:

  1. Removing Outliers: Ensure that the data does not contain extreme values that could skew the model's predictions.

  2. Handling Missing Values: Address any missing data points to avoid inaccurate predictions.

  3. Standardizing Data: Make sure the data is in a consistent format, allowing the model to interpret it correctly.

  4. Balancing the Dataset: Ensure that your dataset represents all categories fairly to avoid bias in predictions.

  5. Ensuring Data Fairness: Check for any biases in your data that could lead to unfair predictions and take steps to mitigate them.

By addressing these considerations, you ensure that your model is not only accurate but also fair and reliable, providing meaningful predictions.
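
Several of the points above can be sketched with a few pandas operations. The raw data, the column names, and the plausible age range of 0 to 120 are all assumptions for illustration; real datasets usually need rules chosen with knowledge of the domain.

```python
import pandas as pd

# Made-up raw user data with typical problems.
raw = pd.DataFrame({
    "Age":    [22, 35, None, 28, 250, 41],
    "Gender": ["F", "m", "F", "M ", "F", "M"],
    "Genre":  ["Romance", "History", "Romance",
               "Fantasy", "Romance", "History"],
})

# Handle missing values: drop rows that have no age.
clean = raw.dropna(subset=["Age"])

# Remove outliers: keep only plausible ages (range is an assumption).
clean = clean[clean["Age"].between(0, 120)]

# Standardize formats: trim whitespace and use upper-case gender codes.
clean = clean.assign(Gender=clean["Gender"].str.strip().str.upper())

# Check balance: count how many rows each genre has.
genre_counts = clean["Genre"].value_counts()

print(clean)
print(genre_counts)
```

Dropping rows is the simplest option but discards data; filling missing values or collecting more examples for under-represented categories are alternatives worth considering.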

Conclusion:

Machine learning is a powerful tool that can transform data into valuable insights and predictions. In this article, we explored the fundamentals of machine learning, focusing on supervised and unsupervised learning, and demonstrated how to set up your environment and build a simple machine learning model using Python and its libraries. By following these steps and experimenting with different algorithms and datasets, you can unlock the potential of machine learning to solve complex problems and make data-driven decisions.

In the next part of this series, we will dive deeper into advanced techniques and real-world applications of machine learning, exploring topics such as feature engineering, model evaluation, and optimization. Stay tuned for more insights and practical examples to enhance your machine-learning journey.

Additional Resources:

  • Programming with Mosh

  • Machine Learning Tutorial (GeeksforGeeks)

Source notice: This article is reposted from https://dev.to/eztosin/mastering-machine-learning-with-python-foundations-and-key-concepts-54di?1 If there is any infringement, please contact [email protected] to have it removed.