Building a cache in Python

Published: 2024-11-03

Caching. Useful stuff. If you're not familiar with it, it's a way to keep data around in memory (or on disk) for fast retrieval. Think of querying a database for some information. Rather than do that every time an application asks for data, we can do it once and keep the result in a cache. Subsequent calls for the data will return the copy from cache instead of making a database query. In theory this improves the performance of your application.
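As a rough illustration of that idea, here is a minimal sketch that uses a plain dictionary as the cache. The slow_lookup function is a hypothetical stand-in for a database query and is not part of the cache we build below.

```python
import time

_cache = {}
call_count = 0


def slow_lookup(user_id):
    """Hypothetical stand-in for an expensive database query."""
    global call_count
    call_count += 1
    time.sleep(0.01)  # pretend the query takes a while
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    """Return the cached result if present, otherwise query and cache it."""
    if user_id not in _cache:
        _cache[user_id] = slow_lookup(user_id)
    return _cache[user_id]


first = get_user(42)   # runs slow_lookup
second = get_user(42)  # served from the dictionary, no second lookup
```

The dictionary never forgets anything, which is one reason a real cache usually adds some notion of expiry.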

Let's build a simple cache for use in Python programs.

Cache API

I'll start by creating a new module called simplecache, and defining a class Cache in it. I won't implement anything yet; I just want to define the API my cache will use.

class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """


    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
        pass


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        pass


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        pass


    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        pass


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        pass


    def close(self):
        """ Close the underlying connection used by the cache. """
        pass

So far so good. We can tell the cache to create a new in-memory or file based cache via the __init__ method. We can add items to the cache using update, which will overwrite the item if it already exists. We can get an item by key using get. Finally, we can remove an item by key using remove, or empty the cache of expired items using purge (which optionally allows purging all items). When we are finished with the cache, close releases the underlying connection.
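To get a feel for how calling code might use this API, the sketch below wraps it in a hypothetical get_or_load helper. Since Cache is not implemented yet, a dictionary-backed FakeCache (also hypothetical, and ignoring the ttl) stands in for it.

```python
class FakeCache:
    """Hypothetical dictionary-backed stand-in with the same get/update API."""

    def __init__(self):
        self._items = {}

    def update(self, key, item, ttl=60):
        self._items[key] = item  # ttl is ignored in this stand-in

    def get(self, key):
        return self._items.get(key)


def get_or_load(cache, key, loader, ttl=60):
    """Return the cached item, or call loader() and cache its result."""
    item = cache.get(key)
    if item is None:
        item = loader()
        cache.update(key, item, ttl)
    return item


loads = []


def load_report():
    """Hypothetical expensive operation; records each time it is called."""
    loads.append(1)
    return [1, 2, 3]


cache = FakeCache()
a = get_or_load(cache, "report", load_report)  # miss: calls load_report
b = get_or_load(cache, "report", load_report)  # hit: served from the cache
```

One consequence of this design is that get signals a miss by returning None, so a cached value of None cannot be told apart from a missing key.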

Where to cache the data?

So where is this class going to cache data? SQLite support ships with the Python standard library (the sqlite3 module), and is ideal for this sort of thing. In fact one of the suggested use cases for SQLite is caching. It allows us to create an in-memory or file based SQL database, which covers both of our use cases. Let's design a SQL table that can hold our cached data.

CREATE TABLE 'Cache' (
    'Key' TEXT NOT NULL UNIQUE,
    'Item' BLOB NOT NULL,
    'CreatedOn' TEXT NOT NULL,
    'TimeToLive' TEXT NOT NULL,
    PRIMARY KEY('Key')
);

To break this down, we have a table called Cache that has four fields. The Key is a string, and will act as a unique primary key on the table. Next up our Item field is a blob of binary data - I'm thinking here that we will serialise objects added to the cache before saving them in the database. The last two fields are used to determine the lifetime of the item in the cache - CreatedOn is the timestamp of when the item was added, and TimeToLive is the length of time we need to hang on to the item for.
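To make the four fields concrete, here is a small sketch that creates the table in an in-memory database, stores one pickled object, and reads it back. It is only an illustration of the schema, not the cache class itself; the key "demo" and the sample item are made up, and the identifiers are written with standard double quotes.

```python
import pickle
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE "Cache" (
        "Key" TEXT NOT NULL UNIQUE,
        "Item" BLOB NOT NULL,
        "CreatedOn" TEXT NOT NULL,
        "TimeToLive" TEXT NOT NULL,
        PRIMARY KEY("Key")
    )
""")

# Serialise a Python object to bytes and store it in the BLOB column.
# SQLite's datetime() supplies the creation timestamp as text.
item = {"name": "example", "values": [1, 2, 3]}
connection.execute(
    "INSERT INTO Cache VALUES (?, ?, datetime(), ?)",
    ("demo", pickle.dumps(item), 60))
connection.commit()

blob, created_on, ttl = connection.execute(
    "SELECT Item, CreatedOn, TimeToLive FROM Cache WHERE Key = ?",
    ("demo",)).fetchone()
restored = pickle.loads(blob)
connection.close()
```

Because TimeToLive is a TEXT column, the value comes back as a string, which is why the code later in the article converts it with int() before doing any arithmetic.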

Constructing the cache

Let's start by importing the sqlite3 library into our module.

import sqlite3

Then we need to turn our attention to the __init__ method. We have two scenarios to support: one where we are given a filename, and one where we are not.

def __init__(self, filename=None):
    """ Construct a new in-memory or file based cache."""
    if filename is None:
        self._connection = sqlite3.connect(":memory:")
    else:
        self._connection = sqlite3.connect(filename)
    self._create_schema()        

We can open a connection and keep it around for the lifetime of the Cache instance. Therefore we set the self._connection attribute to hold the connection, before calling an internal method _create_schema (more on that in a little while).
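The difference between the two scenarios can be seen in a short sketch. The helper names (open_connection, write_row, table_names) and the Demo table are hypothetical and exist only for this illustration.

```python
import os
import sqlite3
import tempfile


def open_connection(filename=None):
    """Same branch as __init__: no filename means an in-memory database."""
    if filename is None:
        return sqlite3.connect(":memory:")
    return sqlite3.connect(filename)


def write_row(connection):
    connection.execute("CREATE TABLE IF NOT EXISTS Demo (Value TEXT)")
    connection.execute("INSERT INTO Demo VALUES ('hello')")
    connection.commit()


def table_names(connection):
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [row[0] for row in rows]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cache.db")
    first = open_connection(path)
    write_row(first)
    first.close()
    # Reopening the same file finds the table written earlier.
    second = open_connection(path)
    tables_in_file = table_names(second)
    second.close()

# Every ":memory:" connection starts with its own empty database.
memory_a = open_connection()
write_row(memory_a)
memory_b = open_connection()
tables_in_memory_b = table_names(memory_b)
memory_a.close()
memory_b.close()
```

So a file based cache can outlive the process that created it, while an in-memory cache always starts empty.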

Bootstrapping the schema

If we are using an in-memory database, then our schema will not exist yet. However, for file based databases that may not be the case: the file may already exist and have been set up with our schema. Let's see the code that handles this process.

def _create_schema(self):
    table_name = "Cache"
    cursor = self._connection.cursor()
    result = cursor.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
        (table_name,))
    cache_exists = result.fetchone() is not None
    if cache_exists:
        return    
    sql = """
        CREATE TABLE 'Cache' (
        'Key' TEXT NOT NULL UNIQUE,
        'Item' BLOB NOT NULL,
        'CreatedOn' TEXT NOT NULL,
        'TimeToLive' TEXT NOT NULL,
        PRIMARY KEY('Key'))
    """
    cursor.execute(sql)
    cursor.close()

First we define our table name and open a cursor to perform some database operations. Next we check for the existence of our table in the master table. If it's there we exit the method, otherwise we execute a CREATE TABLE... statement to build our table in the database.
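As an aside, SQLite can do the existence check itself. The sketch below is an alternative to the method above rather than the article's approach: it uses CREATE TABLE IF NOT EXISTS, and the standalone create_schema function is a hypothetical name used for illustration.

```python
import sqlite3


def create_schema(connection):
    """Create the Cache table only if it is not already present."""
    connection.execute("""
        CREATE TABLE IF NOT EXISTS "Cache" (
            "Key" TEXT NOT NULL UNIQUE,
            "Item" BLOB NOT NULL,
            "CreatedOn" TEXT NOT NULL,
            "TimeToLive" TEXT NOT NULL,
            PRIMARY KEY("Key")
        )
    """)
    connection.commit()


connection = sqlite3.connect(":memory:")
create_schema(connection)
create_schema(connection)  # the second call does nothing rather than failing
tables = connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
connection.close()
```

The explicit check against sqlite_master has the advantage of making the two cases visible in the code, which may be why it reads better in a tutorial.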

We can now instantiate our Cache class with either an in-memory or file based database to cache objects in.

Adding to the cache

Let us now turn our attention to adding items to the cache. Recall that the update method from our API is responsible for this. If a key already exists in the cache, we are going to replace its entry with whatever is supplied.

def update(self, key, item, ttl=60):
    """ Add or update an item in the cache using the supplied key. Optional
    ttl specifies how many seconds the item will live in the cache for. """
    sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is not None:
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
    sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
    pickled_item = pickle.dumps(item)
    cursor.execute(sql, (key, pickled_item, ttl))
    self._connection.commit()
    cursor.close()

First we open a cursor on the cache's database connection. We then test for the existence of the supplied key in the cache. If it exists, it is removed from the cache. Next we pickle the supplied item, turning it into a binary blob (this needs import pickle at the top of the module). We then insert the key, pickled item, and ttl into the cache, letting SQLite's datetime() function supply the creation timestamp.
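The select, delete, insert sequence can also be collapsed into one statement. The following sketch is an alternative under the same schema, not the method used in this article: it relies on SQLite's INSERT OR REPLACE, and the upsert function name is hypothetical.

```python
import pickle
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE "Cache" (
        "Key" TEXT NOT NULL UNIQUE,
        "Item" BLOB NOT NULL,
        "CreatedOn" TEXT NOT NULL,
        "TimeToLive" TEXT NOT NULL,
        PRIMARY KEY("Key")
    )
""")


def upsert(connection, key, item, ttl=60):
    """Insert the item, replacing any existing row that has the same key."""
    connection.execute(
        "INSERT OR REPLACE INTO Cache VALUES (?, ?, datetime(), ?)",
        (key, pickle.dumps(item), ttl))
    connection.commit()


upsert(connection, "key", "first")
upsert(connection, "key", "second")  # replaces the row written above
rows = connection.execute("SELECT Key, Item FROM Cache").fetchall()
stored = [(key, pickle.loads(blob)) for key, blob in rows]
connection.close()
```

This works because Key is declared unique, so the second insert conflicts with the first row and SQLite removes the old row before writing the new one.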

Getting data from the cache

Our behaviour for getting an item from the cache is mostly straightforward. We look for an entry with the specified key and unpickle the associated item. The complication arises when the item has outlived its ttl value. That is, the date of creation plus the ttl is earlier than the current time. If this is the case, then the item has expired and should be removed from the cache. The date arithmetic uses the standard datetime module, so that needs importing too.

There is a philosophical issue with this method. There is a school of thought that says methods should either return a value (read) or perform a mutation of data (write). Here we are potentially doing both. We are deliberately introducing a side effect (deleting an expired item). I think it is OK in this case, but other programmers might argue otherwise.

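def get(self, key):
    """ Get an item from the cache using the specified key. """
    sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is None:
        return
    item = pickle.loads(row[0])
    expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
    # SQLite's datetime() stores UTC, so compare against the current UTC time.
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    if expiry_date < now:
        # The item has expired: delete it and report a cache miss.
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
        item = None
    cursor.close()
    return item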

Removing and purging items in the cache

Removing an item is simple - in fact we have already done it (twice) in our other two methods. That's an ideal candidate for refactoring (which we will look at later on). For now, we will implement the method directly.

def remove(self, key):
    """ Remove an item from the cache using the specified key. """
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    cursor.execute(sql, (key,))
    self._connection.commit()
    cursor.close()

Purging items is a little more complex. We have two scenarios to support - purging all items and purging only expired items. Let's see how we can achieve this.

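def purge(self, all=False):
    """ Remove expired items from the cache, or all items if flag is set. """
    cursor = self._connection.cursor()
    if all:
        sql = "DELETE FROM 'Cache'"
        cursor.execute(sql)
        self._connection.commit()
    else:
        sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
        # Fetch every row first so the deletes below do not disturb the loop.
        for row in cursor.execute(sql).fetchall():
            expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
            # SQLite's datetime() stores UTC, so compare against the current UTC time.
            now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
            if expiry_date < now:
                sql = "DELETE FROM 'Cache' WHERE Key = ?"
                cursor.execute(sql, (row[0],))
                self._connection.commit()
    cursor.close()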

Deleting all is simple enough. We simply run SQL to delete everything in our Cache table. To purge only the expired items we need to loop through each row, compute the expiry date, and determine if it should be deleted. Again, this latter piece of code has been repeated from one of our other methods (get in this case). Another candidate for refactoring.
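The row-by-row loop could also be pushed down into SQLite. The sketch below is an alternative, not the approach taken in this article, and it assumes CreatedOn was written by SQLite's datetime() so that both sides of the comparison are UTC text in the same format. The keys, placeholder items, and offsets are made up for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE "Cache" (
        "Key" TEXT NOT NULL UNIQUE,
        "Item" BLOB NOT NULL,
        "CreatedOn" TEXT NOT NULL,
        "TimeToLive" TEXT NOT NULL,
        PRIMARY KEY("Key")
    )
""")

# Both rows were "created" two minutes ago; only the first has outlived its ttl.
rows = [
    ("stale", b"x", "-120 seconds", 60),
    ("fresh", b"y", "-120 seconds", 300),
]
for key, item, offset, ttl in rows:
    connection.execute(
        "INSERT INTO Cache VALUES (?, ?, datetime('now', ?), ?)",
        (key, item, offset, ttl))

# Build a '+N seconds' modifier from TimeToLive and compare with the current time.
connection.execute("""
    DELETE FROM Cache
    WHERE datetime(CreatedOn, '+' || TimeToLive || ' seconds') < datetime('now')
""")
connection.commit()
remaining = [row[0] for row in connection.execute("SELECT Key FROM Cache")]
connection.close()
```

A single statement avoids loading every row into Python, at the cost of moving the expiry rule into SQL where it is harder to share with get.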

Refactoring

We have a working cache implementation that satisfies our original API specification. There is however some repeated code, which we can factor out into separate methods. Let's start with the delete logic that is present in the get, update, remove, and purge methods. These instances can all be replaced with a call to the following new method.

def _remove_item(self, key, cursor):
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor.execute(sql, (key,))
    cursor.connection.commit()

We can see this has a big impact on our code. Four other methods are now calling the one common _remove_item method. Next let's take a look at the expiry date checking code.

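def _item_has_expired(self, created, ttl):
    expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
    # SQLite's datetime() stores UTC, so compare against the current UTC time.
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return expiry_date < now
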

Great. That's two more places we have reduced code repetition in.
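To check the expiry arithmetic by hand, here is a standalone variant of the same calculation. The optional now parameter is an assumption added for this sketch (the method in the article always uses the current time); it lets us feed in fixed timestamps and see which side of the boundary an item falls on.

```python
import datetime


def item_has_expired(created, ttl, now=None):
    """Return True when created + ttl seconds lies before now (UTC)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    expiry_date = (datetime.datetime.fromisoformat(created)
                   + datetime.timedelta(seconds=int(ttl)))
    return expiry_date < now


created = "2024-11-03 12:00:00"  # the text format SQLite's datetime() produces
just_before = datetime.datetime(2024, 11, 3, 12, 0, 59)
just_after = datetime.datetime(2024, 11, 3, 12, 1, 1)

fresh = item_has_expired(created, "60", now=just_before)  # expiry is 12:01:00
stale = item_has_expired(created, "60", now=just_after)
```

Being able to pass the time in like this may also make the first exercise at the end of the article easier, since tests would not need to sleep.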

Thread safety with locks

We are almost done. For this to be a robust class, we need to ensure we are thread safe. Caches can often surface as singleton instances in an application, so thread safety is important. We will achieve this by using locks around the destructive cache operations. This is how our whole class looks with locking added. Note the with blocks around the add and delete operations. These ensure the lock is released even if something goes wrong.

#! /usr/bin/env python3


import datetime
import pickle
import sqlite3
import threading


class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """

    _lock = threading.Lock()

    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
        if filename is None:
            self._connection = sqlite3.connect(":memory:")
        else:
            self._connection = sqlite3.connect(filename)
        self._create_schema()


    def _create_schema(self):
        table_name = "Cache"
        cursor = self._connection.cursor()
        result = cursor.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
            (table_name,))
        cache_exists = result.fetchone() is not None
        if cache_exists:
            return    
        sql = """
            CREATE TABLE 'Cache' (
            'Key' TEXT NOT NULL UNIQUE,
            'Item' BLOB NOT NULL,
            'CreatedOn' TEXT NOT NULL,
            'TimeToLive' TEXT NOT NULL,
            PRIMARY KEY('Key'))
        """
        cursor.execute(sql)
        cursor.close()


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
        cursor = self._connection.cursor()
        result = cursor.execute(sql, (key,))
        row = result.fetchone()
        with self.__class__._lock:
            if row is not None:
                self._remove_item(key, cursor)
            sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
            pickled_item = pickle.dumps(item)
            cursor.execute(sql, (key, pickled_item, ttl))
            self._connection.commit()
        cursor.close()


    def _remove_item(self, key, cursor):
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        cursor.connection.commit()


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
        cursor = self._connection.cursor()
        result = cursor.execute(sql, (key,))
        row = result.fetchone()
        if row is None:
            return
        item = pickle.loads(row[0])
        if self._item_has_expired(row[1], row[2]):
            with self.__class__._lock:
                self._remove_item(key, cursor)
            item = None
        cursor.close()
        return item


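    def _item_has_expired(self, created, ttl):
        expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
        # SQLite's datetime() stores UTC, so compare against the current UTC time.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
        return expiry_date < now


    # The listing is cut off at this point in the source. The remaining
    # methods are reconstructed from the earlier sections.
    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            self._remove_item(key, cursor)
        cursor.close()


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            if all:
                cursor.execute("DELETE FROM 'Cache'")
                self._connection.commit()
            else:
                sql = "SELECT Key, CreatedOn, TimeToLive FROM 'Cache'"
                # Fetch every row first so the deletes do not disturb the loop.
                for row in cursor.execute(sql).fetchall():
                    if self._item_has_expired(row[1], row[2]):
                        self._remove_item(row[0], cursor)
        cursor.close()


    def close(self):
        """ Close the underlying connection used by the cache. """
        self._connection.close()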
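One caveat when sharing a cache between threads: by default, Python's sqlite3 module raises ProgrammingError if a connection is used from a thread other than the one that created it. The sketch below is a rough, standalone check of the locking idea, assuming the connection is opened with check_same_thread=False (something the class above does not do). The simplified two-column table and the writer function are made up for the example.

```python
import sqlite3
import threading

# check_same_thread=False lets several threads share this connection;
# the lock below then serialises access to it.
connection = sqlite3.connect(":memory:", check_same_thread=False)
connection.execute("CREATE TABLE Cache (Key TEXT PRIMARY KEY, Item TEXT)")
lock = threading.Lock()


def writer(worker_id, count):
    """Write count rows, holding the lock for each write and commit."""
    for n in range(count):
        with lock:
            connection.execute(
                "INSERT OR REPLACE INTO Cache VALUES (?, ?)",
                (f"worker-{worker_id}-{n}", str(n)))
            connection.commit()


threads = [threading.Thread(target=writer, args=(i, 25)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

with lock:
    total = connection.execute("SELECT COUNT(*) FROM Cache").fetchone()[0]
connection.close()
```

Note that reads take the lock here as well, since they go through the same shared connection.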



Testing the cache

Time to test our cache. We can do this by spinning up an interactive session as follows.

python -i simplecache.py

Now we can new up an in-memory cache and test our methods.

>>> c = Cache()
>>> c.update("key", "some value")
>>> c.update("key2", [1, 2, 3], 300)
>>> c.get("key")
'some value'
>>> c.remove("key")
>>> c.purge()
>>> c.get("key2")
[1, 2, 3]
>>> c.purge(True)
>>> c.get("key2")
>>> c.close()
>>>

Exercises for the reader

  1. Write a suite of unit tests for the Cache class. How easy is it to test? Do you need to make any changes to accommodate testing?

  2. Make the time to live a sliding window instead of a fixed time. That is, whenever an item is retrieved from the cache, its time to live value starts over.

  3. Add a method to write the contents of the cache out to the screen.

Reprinted from: https://dev.to/robc79/building-a-cache-in-python-3moa?1