”工欲善其事,必先利其器。“—孔子《论语.录灵公》
首页 > 编程 > 使用 JIT 编译器让我的 Python 循环变慢?

使用 JIT 编译器让我的 Python 循环变慢?

发布于2024-09-01
浏览:527

If you haven't heard, Python loops can be slow--especially when working with large datasets. If you're trying to make calculations across millions of data points, execution time can quickly become a bottleneck. Luckily for us, Numba has a Just-in-Time (JIT) compiler that we can use to help speed up our numerical computations and loops in Python.

The other day, I found myself in need of a simple exponential smoothing function in Python. This function needed to take in array and return an array of the same length with the smoothed values. Typically, I try and avoid loops where possible in Python (especially when dealing with Pandas DataFrames). At my current level of capability, I didn't see how to avoid using a loop to exponentially smooth an array of values.

I am going to walk through the process of creating this exponential smoothing function and testing it with and without the JIT compilation. I'll briefly touch on JIT and how I made sure to code the the loop in a manner that worked with the nopython mode.

What is JIT?

JIT compilers are particularly useful with higher-level languages like Python, JavaScript, and Java. These languages are known for their flexibility and ease of use, but they can suffer from slower execution speeds compared to lower-level languages like C or C . JIT compilation helps bridge this gap by optimizing the execution of code at runtime, making it faster without sacrificing the advantages of these higher-level languages.

When using the nopython=True mode in the Numba JIT compiler, the Python interpreter is bypassed entirely, forcing Numba to compile everything down to machine code. This results in even faster execution by eliminating the overhead associated with Python's dynamic typing and other interpreter-related operations.

Building the fast exponential smoothing function

Exponential smoothing is a technique used to smooth out data by applying a weighted average over past observations. The formula for exponential smoothing is:

St=αVt (1α)St1 S_t = \alpha \cdot V_t (1 - \alpha) \cdot S_{t-1} St=α⋅Vt (1−α)⋅St−1

where:

  • StS_tSt : Represents the smoothed value at time ttt .
  • VtV_tVt : Represents the original value at time ttt from the values array.
  • α\alphaα : The smoothing factor, which determines the weight of the current value VtV_tVt in the smoothing process.
  • St1S_{t-1}St−1 : Represents the smoothed value at time t1t-1t−1 , i.e., the previous smoothed value.

The formula applies exponential smoothing, where:

  • The new smoothed value StS_tSt is a weighted average of the current value VtV_tVt and the previous smoothed value St1S_{t-1}St−1 .
  • The factor α\alphaα determines how much influence the current value VtV_tVt has on the smoothed value compared to the previous smoothed value St1S_{t-1}St−1 .

To implement this in Python, and stick to functionality that works with nopython=True mode, we will pass in an array of data values and the alpha float. I default the alpha to 0.33333333 because that fits my current use case. We will initialize an empty array to store the smoothed values in, loop and calculate, and return smoothed values. This is what it looks like:

@jit(nopython=True) 
def fast_exponential_smoothing(values, alpha=0.33333333): 

    smoothed_values = np.zeros_like(values) # Array of zeros the same length as values
    smoothed_values[0] = values[0] # Initialize the first value 

    for i in range(1, len(values)): 
        smoothed_values[i] = alpha * values[i]   (1 - alpha) * smoothed_values[i - 1]
    return smoothed_values

Simple, right? Let's see if JIT is doing anything now. First, we need to create a large array of integers. Then, we call the function, time how long it took to compute, and print the results.

# Generate a large random array of a million integers
large_array = np.random.randint(1, 100, size=1_000_000)

# Test the speed of fast_exponential_smoothing
start_time = time.time()
smoothed_result = fast_exponential_smoothing(large_array)
end_time = time.time()
print(f"Exponential Smoothing with JIT took {end_time - start_time:.6f} seconds with 1,000,000 sample array.")

This can be repeated and altered just a bit to test the function without the JIT decorator. Here are the results that I got:

Using JIT-compilers to make my Python loops slower?

Wait, what the f***?

I thought JIT was supposed to speed it up. It looks like the standard Python function beat the JIT version and a version that attempts to use no recursion. That's strange. I guess you can't just slap the JIT decorator on something and make it go faster? Perhaps simple array loops and NumPy operations are already pretty efficient? Perhaps I don't understand the use case for JIT as well as I should? Maybe we should try this on a more complex loop?

Here is the entire code python file I created for testing:

import numpy as np
from numba import jit
import time

@jit(nopython=True) 
def fast_exponential_smoothing(values, alpha=0.33333333): 

    smoothed_values = np.zeros_like(values) # Array of zeros the same length as values
    smoothed_values[0] = values[0] # Initialize the first value 

    for i in range(1, len(values)): 
        smoothed_values[i] = alpha * values[i]   (1 - alpha) * smoothed_values[i - 1]
        return smoothed_values

def fast_exponential_smoothing_nojit(values, alpha=0.33333333):

    smoothed_values = np.zeros_like(values) # Array of zeros the same length as values
    smoothed_values[0] = values[0] # Initialize the first value 

    for i in range(1, len(values)): 
        smoothed_values[i] = alpha * values[i]   (1 - alpha) * smoothed_values[i - 1]
        return smoothed_values

def non_recursive_exponential_smoothing(values, alpha=0.33333333):
    n = len(values)
    smoothed_values = np.zeros(n)

    # Initialize the first value
    smoothed_values[0] = values[0]

    # Calculate the rest of the smoothed values
    decay_factors = (1 - alpha) ** np.arange(1, n)
    cumulative_weights = alpha * decay_factors
    smoothed_values[1:] = np.cumsum(values[1:] * np.flip(cumulative_weights))   (1 - alpha) ** np.arange(1, n) * values[0]

    return smoothed_values

# Generate a large random array of a million integers
large_array = np.random.randint(1, 1000, size=10_000_000)

# Test the speed of fast_exponential_smoothing
start_time = time.time()
smoothed_result = fast_exponential_smoothing_nojit(large_array)
end_time = time.time()
print(f"Exponential Smoothing without JIT took {end_time - start_time:.6f} seconds with 1,000,000 sample array.")

# Test the speed of fast_exponential_smoothing
start_time = time.time()
smoothed_result = fast_exponential_smoothing(large_array)
end_time = time.time()
print(f"Exponential Smoothing with JIT took {end_time - start_time:.6f} seconds with 1,000,000 sample array.")

# Test the speed of fast_exponential_smoothing
start_time = time.time()
smoothed_result = non_recursive_exponential_smoothing(large_array)
end_time = time.time()
print(f"Exponential Smoothing with no recursion or JIT took {end_time - start_time:.6f} seconds with 1,000,000 sample array.")

I attempted to create the non-recursive version to see if vectorized operations across arrays would make it go faster, but it seems to be pretty damn fast as it is. These results remained the same all the way up until I didn't have enough memory to make the array of random integers.

Let me know what you think about this in the comments. I am by no means a professional developer, so I am accepting all comments, criticisms, or educational opportunities.

Until next time.

Happy coding!

版本声明 本文转载于:https://dev.to/kanndide/using-jit-compilers-to-make-my-python-loops-slower-4m66?1如有侵犯,请联系[email protected]删除
最新教程 更多>
  • 如何在 PHP 中组合两个关联数组,同时保留唯一 ID 并处理重复名称?
    如何在 PHP 中组合两个关联数组,同时保留唯一 ID 并处理重复名称?
    在 PHP 中组合关联数组在 PHP 中,将两个关联数组组合成一个数组是一项常见任务。考虑以下请求:问题描述:提供的代码定义了两个关联数组,$array1 和 $array2。目标是创建一个新数组 $array3,它合并两个数组中的所有键值对。 此外,提供的数组具有唯一的 ID,而名称可能重合。要求...
    编程 发布于2024-12-25
  • 大批
    大批
    方法是可以在对象上调用的 fns 数组是对象,因此它们在 JS 中也有方法。 slice(begin):将数组的一部分提取到新数组中,而不改变原始数组。 let arr = ['a','b','c','d','e']; // Usecase: Extract till index p...
    编程 发布于2024-12-25
  • 除了“if”语句之外:还有什么地方可以在不进行强制转换的情况下使用具有显式“bool”转换的类型?
    除了“if”语句之外:还有什么地方可以在不进行强制转换的情况下使用具有显式“bool”转换的类型?
    无需强制转换即可上下文转换为 bool您的类定义了对 bool 的显式转换,使您能够在条件语句中直接使用其实例“t”。然而,这种显式转换提出了一个问题:“t”在哪里可以在不进行强制转换的情况下用作 bool?上下文转换场景C 标准指定了四种值可以根据上下文转换为的主要场景bool:语句:if、whi...
    编程 发布于2024-12-25
  • 在 Go 中使用 WebSocket 进行实时通信
    在 Go 中使用 WebSocket 进行实时通信
    构建需要实时更新的应用程序(例如聊天应用程序、实时通知或协作工具)需要比传统 HTTP 更快、更具交互性的通信方法。这就是 WebSockets 发挥作用的地方!今天,我们将探讨如何在 Go 中使用 WebSocket,以便您可以向应用程序添加实时功能。 在这篇文章中,我们将介绍: WebSocke...
    编程 发布于2024-12-25
  • 如何使用 MySQL 查找今天生日的用户?
    如何使用 MySQL 查找今天生日的用户?
    如何使用 MySQL 识别今天生日的用户使用 MySQL 确定今天是否是用户的生日涉及查找生日匹配的所有行今天的日期。这可以通过一个简单的 MySQL 查询来实现,该查询将存储为 UNIX 时间戳的生日与今天的日期进行比较。以下 SQL 查询将获取今天有生日的所有用户: FROM USERS ...
    编程 发布于2024-12-25
  • 如何修复 macOS 上 Django 中的“配置不正确:加载 MySQLdb 模块时出错”?
    如何修复 macOS 上 Django 中的“配置不正确:加载 MySQLdb 模块时出错”?
    MySQL配置不正确:相对路径的问题在Django中运行python manage.py runserver时,可能会遇到以下错误:ImproperlyConfigured: Error loading MySQLdb module: dlopen(/Library/Python/2.7/site-...
    编程 发布于2024-12-25
  • 如何将 Pandas DataFrame 字符串条目分解(拆分)为单独的行?
    如何将 Pandas DataFrame 字符串条目分解(拆分)为单独的行?
    将 Pandas DataFrame 字符串条目分解(拆分)为单独的行在 Pandas 中,一个常见的要求是将逗号分隔的值拆分为文本字符串列并为每个条目创建一个新行。这可以通过各种方法来实现。使用Series.explode()或DataFrame.explode()对于Pandas版本0.25.0...
    编程 发布于2024-12-25
  • 插入数据时如何修复“常规错误:2006 MySQL 服务器已消失”?
    插入数据时如何修复“常规错误:2006 MySQL 服务器已消失”?
    插入记录时如何解决“一般错误:2006 MySQL 服务器已消失”介绍:将数据插入 MySQL 数据库有时会导致错误“一般错误:2006 MySQL 服务器已消失”。当与服务器的连接丢失时会出现此错误,通常是由于 MySQL 配置中的两个变量之一所致。解决方案:解决此错误的关键是调整wait_tim...
    编程 发布于2024-12-25
  • HTML 格式标签
    HTML 格式标签
    HTML 格式化元素 **HTML Formatting is a process of formatting text for better look and feel. HTML provides us ability to format text without us...
    编程 发布于2024-12-25
  • 尽管代码有效,为什么 POST 请求无法捕获 PHP 中的输入?
    尽管代码有效,为什么 POST 请求无法捕获 PHP 中的输入?
    解决 PHP 中的 POST 请求故障在提供的代码片段中:action=''而不是:action="<?php echo $_SERVER['PHP_SELF'];?>";?>"检查 $_POST数组:表单提交后使用 var_dump 检查 $_POST 数...
    编程 发布于2024-12-25
  • Bootstrap 4 Beta 中的列偏移发生了什么?
    Bootstrap 4 Beta 中的列偏移发生了什么?
    Bootstrap 4 Beta:列偏移的删除和恢复Bootstrap 4 在其 Beta 1 版本中引入了重大更改柱子偏移了。然而,随着 Beta 2 的后续发布,这些变化已经逆转。从 offset-md-* 到 ml-auto在 Bootstrap 4 Beta 1 中, offset-md-*...
    编程 发布于2024-12-25
  • Java中如何使用Selenium WebDriver高效上传文件?
    Java中如何使用Selenium WebDriver高效上传文件?
    在 Java 中使用 Selenium WebDriver 上传文件:详细指南将文件上传到 Web 应用程序是软件测试期间的一项常见任务。 Selenium WebDriver 是一种流行的自动化框架,它提供了一种使用 Java 代码上传文件的简单方法。然而,重要的是要明白,在 Selenium 中...
    编程 发布于2024-12-24
  • 使用 GNU Emacs 进行 C 语言开发
    使用 GNU Emacs 进行 C 语言开发
    Emacs is designed with programming in mind, it supports languages like C, Python, and Lisp natively, offering advanced features such as syntax highli...
    编程 发布于2024-12-24
  • 如何在 PHP 中打印单引号内的变量?
    如何在 PHP 中打印单引号内的变量?
    无法直接回显带有单引号的变量需要在单引号字符串中打印变量?直接这样做是不可能的。如何在单引号内打印变量:方法 1:使用串联追加 为此,请使用点运算符将变量连接到字符串上:echo 'I love my ' . $variable . '.';此方法将变量追加到字符串中。方法 2:使用双引号或者,在字...
    编程 发布于2024-12-24
  • std::vector 与普通数组:性能何时真正重要?
    std::vector 与普通数组:性能何时真正重要?
    std::vector 与普通数组:性能评估虽然人们普遍认为 std::vector 的操作与数组类似,但最近的测试对这一概念提出了挑战。在本文中,我们将研究 std::vector 和普通数组之间的性能差异,并阐明根本原因。为了进行测试,实施了一个基准测试,其中涉及重复创建和修改大型数组像素对象。...
    编程 发布于2024-12-24

免责声明: 提供的所有资源部分来自互联网,如果有侵犯您的版权或其他权益,请说明详细缘由并提供版权或权益证明然后发到邮箱:[email protected] 我们会第一时间内为您处理。

Copyright© 2022 湘ICP备2022001581号-3