Building a cache in Python

Published 2024-11-03

Caching. Useful stuff. If you're not familiar with it, it's a way to keep data around in memory (or on disk) for fast retrieval. Think of querying a database for some information. Rather than do that every time an application asks for data, we can do it once and keep the result in a cache. Subsequent calls for the data will return the copy from cache instead of making a database query. In theory this improves the performance of your application.
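As a rough illustration of the idea, here is a minimal sketch that uses a plain dictionary as the cache. The names query_database, get_user and query_count are made up for this example, and the "database" is only simulated with a short sleep.

```python
import time

_cache = {}          # hypothetical in-memory store: user_id -> query result
query_count = 0      # counts how often the slow path actually runs


def query_database(user_id):
    """Stand-in for an expensive database query (hypothetical)."""
    global query_count
    query_count += 1
    time.sleep(0.01)  # simulate latency
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    """Return the cached copy if there is one, otherwise query and remember it."""
    if user_id not in _cache:
        _cache[user_id] = query_database(user_id)
    return _cache[user_id]


first = get_user(42)   # slow: goes to the "database"
second = get_user(42)  # fast: served from the dictionary
print(query_count)
```

The second call never reaches query_database, which is the whole point. What this sketch lacks is expiry and persistence, and those are what the rest of this article adds.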

Let's build a simple cache for use in Python programs.

Cache API

I'll start by creating a new module called simplecache, and defining a class Cache in it. I won't implement anything yet; I just want to define the API my cache will use.

class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """


    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
        pass


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        pass


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        pass


    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        pass


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        pass


    def close(self):
        """ Close the underlying connection used by the cache. """
        pass

So far so good. We can tell the cache to create a new in-memory or file based cache via the __init__ method. We can add items to the cache using update, which will overwrite the item if it already exists. We can get an item by key using get. Finally, we can remove an item by key using remove, empty the cache of expired items using purge (which optionally allows purging all items), and release the underlying connection using close.

Where to cache the data?

So where is this class going to cache data? SQLite ships with the Python standard library (as the sqlite3 module), and is ideal for this sort of thing. In fact, one of the suggested use cases for SQLite is caching. It allows us to create an in-memory or file based SQL database, which covers both of our use cases. Let's design a SQL table that can hold our cached data.

CREATE TABLE 'Cache' (
    'Key' TEXT NOT NULL UNIQUE,
    'Item' BLOB NOT NULL,
    'CreatedOn' TEXT NOT NULL,
    'TimeToLive' TEXT NOT NULL,
    PRIMARY KEY('Key')
);

To break this down, we have a table called Cache that has four fields. The Key is a string, and will act as a unique primary key on the table. Next up our Item field is a blob of binary data - I'm thinking here that we will serialise objects added to the cache before saving them in the database. The last two fields are used to determine the lifetime of the item in the cache - CreatedOn is the timestamp of when the item was added, and TimeToLive is the length of time we need to hang on to the item for.

Constructing the cache

Let's start by importing the SQLite library into our module.

import sqlite3

Then we need to turn our attention to the __init__ method. We have two scenarios to support: one where we are given a filename, and one where we are not.

def __init__(self, filename=None):
    """ Construct a new in-memory or file based cache."""
    if filename is None:
        self._connection = sqlite3.connect(":memory:")
    else:
        self._connection = sqlite3.connect(filename)
    self._create_schema()        

We can open a connection and keep it around for the lifetime of the instance. Therefore we set the self._connection attribute to hold the connection, before calling an internal method _create_schema (more on that in a little while).

Bootstrapping the schema

If we are using an in-memory database, then our schema will not exist yet. However, for file based databases that may not be the case. This may be an existing file that has already been set up with our schema. Let's see the code that handles this process.

def _create_schema(self):
    table_name = "Cache"
    cursor = self._connection.cursor()
    result = cursor.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
        (table_name,))
    cache_exists = result.fetchone() is not None
    if cache_exists:
        cursor.close()
        return
    sql = """
        CREATE TABLE 'Cache' (
        'Key' TEXT NOT NULL UNIQUE,
        'Item' BLOB NOT NULL,
        'CreatedOn' TEXT NOT NULL,
        'TimeToLive' TEXT NOT NULL,
        PRIMARY KEY('Key'))
    """
    cursor.execute(sql)
    cursor.close()

First we define our table name and open a cursor to perform some database operations. Next we check for the existence of our table in the master table. If it's there we exit the method, otherwise we execute a CREATE TABLE... statement to build our table in the database.

We can now instantiate our Cache class with either an in-memory or file based database to cache objects in.
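As an aside, SQLite can do the existence check itself with CREATE TABLE IF NOT EXISTS. The following is a minimal sketch of that alternative, not the approach used in this article. The standalone create_schema function is a name invented for the example, and the identifiers are written unquoted for brevity.

```python
import sqlite3


def create_schema(connection):
    """Create the Cache table unless it already exists (safe to call repeatedly)."""
    connection.execute("""
        CREATE TABLE IF NOT EXISTS Cache (
            Key TEXT NOT NULL PRIMARY KEY,
            Item BLOB NOT NULL,
            CreatedOn TEXT NOT NULL,
            TimeToLive TEXT NOT NULL
        )
    """)


connection = sqlite3.connect(":memory:")
create_schema(connection)
create_schema(connection)  # the second call is a no-op rather than an error
tables = connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```

The trade-off is that the explicit lookup in sqlite_master makes the two cases (new database, existing database) visible in the code, whereas this version hides them inside one statement.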

Adding to the cache

Let us now turn our attention to adding items to the cache. Recall that the update method from our API is responsible for this. If a key already exists in the cache, we are going to replace its entry with whatever is supplied.

def update(self, key, item, ttl=60):
    """ Add or update an item in the cache using the supplied key. Optional
    ttl specifies how many seconds the item will live in the cache for. """
    sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is not None:
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
    sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
    # pickle must be imported at the top of the module (import pickle).
    pickled_item = pickle.dumps(item)
    cursor.execute(sql, (key, pickled_item, ttl))
    self._connection.commit()
    cursor.close()

First we open a cursor on the database connection. We then test for the existence of the supplied key in the cache. If it exists, it is removed from the cache. Next we pickle the supplied item, turning it into a binary blob. We then insert the key, pickled item, and ttl into the cache, letting SQLite's datetime() function record the creation time (in UTC).
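The check, delete, insert sequence above can also be collapsed into a single statement using SQLite's INSERT OR REPLACE. This is a sketch of that alternative under the same table layout, not the code this article goes on to use. The upsert function name is an assumption made for the example.

```python
import pickle
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE Cache (
        Key TEXT NOT NULL PRIMARY KEY,
        Item BLOB NOT NULL,
        CreatedOn TEXT NOT NULL,
        TimeToLive TEXT NOT NULL
    )
""")


def upsert(connection, key, item, ttl=60):
    """Insert the item, replacing any existing row that has the same key."""
    sql = "INSERT OR REPLACE INTO Cache VALUES (?, ?, datetime(), ?)"
    connection.execute(sql, (key, pickle.dumps(item), ttl))
    connection.commit()


upsert(connection, "greeting", "hello")
upsert(connection, "greeting", "goodbye", ttl=300)  # overwrites the first row
rows = connection.execute("SELECT Key, Item, TimeToLive FROM Cache").fetchall()
print(len(rows), pickle.loads(rows[0][1]))
```

Because the replacement happens inside one statement, there is no window between the existence check and the insert, which matters once more than one thread is involved.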

Getting data from the cache

Our behaviour for getting an item from the cache is mostly straightforward. We look for an entry with the specified key and unpickle the associated item. The complication in this process arises when the ttl value has expired. That is, the date of creation plus the ttl is no later than the current time. If this is the case, then the item has expired and should be removed from the cache.

There is a philosophical issue with this method. There is a school of thought that says methods should either return a value (read) or perform a mutation of data (write). Here we are potentially doing both. We are deliberately introducing a side effect (deleting an expired item). I think it is OK in this case, but other programmers might argue otherwise.

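def get(self, key):
    """ Get an item from the cache using the specified key. """
    sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is None:
        cursor.close()
        return
    item = pickle.loads(row[0])
    expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
    # CreatedOn comes from SQLite's datetime(), which is UTC, so compare against UTC.
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    if expiry_date <= now:
        # The item has outlived its ttl: delete it and report a miss.
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
        item = None
    cursor.close()
    return item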



Removing and purging items in the cache

Removing an item is simple - in fact we have already done it (twice) in our other two methods. That's an ideal candidate for refactoring (which we will look at later on). For now, we will implement the method directly.

def remove(self, key):
    """ Remove an item from the cache using the specified key. """
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    cursor.execute(sql, (key,))
    self._connection.commit()
    cursor.close()

Purging items is a little more complex. We have two scenarios to support - purging all items and purging only expired items. Let's see how we can achieve this.

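def purge(self, all=False):
    """ Remove expired items from the cache, or all items if flag is set. """
    cursor = self._connection.cursor()
    if all:
        sql = "DELETE FROM 'Cache'"
        cursor.execute(sql)
        self._connection.commit()
    else:
        sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
        # Fetch all rows up front so the deletes below do not disturb the iteration.
        for row in cursor.execute(sql).fetchall():
            expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
            # CreatedOn is stored in UTC by SQLite's datetime().
            now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
            if expiry_date <= now:
                sql = "DELETE FROM 'Cache' WHERE Key = ?"
                cursor.execute(sql, (row[0],))
                self._connection.commit()
    cursor.close()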



Deleting all is simple enough. We simply run SQL to delete everything in our Cache table. To delete only the expired items, we need to loop through each row, compute the expiry date, and determine if it should be deleted. Again, this latter piece of code has been repeated from one of our other methods (get in this case). Another candidate for refactoring.
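The row-by-row loop described above could also be pushed down into SQLite, since its datetime() function accepts modifiers such as '+60 seconds'. The following is a sketch of that alternative rather than the method used in this article. The purge_expired name and the two sample rows are assumptions made purely for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE Cache (
        Key TEXT NOT NULL PRIMARY KEY,
        Item BLOB NOT NULL,
        CreatedOn TEXT NOT NULL,
        TimeToLive TEXT NOT NULL
    )
""")
# One row created two minutes ago with a 60 second ttl (expired),
# and one created just now with a 300 second ttl (still fresh).
connection.execute(
    "INSERT INTO Cache VALUES ('stale', x'00', datetime('now', '-120 seconds'), 60)")
connection.execute(
    "INSERT INTO Cache VALUES ('fresh', x'00', datetime('now'), 300)")
connection.commit()


def purge_expired(connection):
    """Delete every expired row in one statement; return how many were removed."""
    sql = """
        DELETE FROM Cache
        WHERE datetime(CreatedOn, '+' || TimeToLive || ' seconds') <= datetime('now')
    """
    cursor = connection.execute(sql)
    connection.commit()
    return cursor.rowcount


removed = purge_expired(connection)
remaining = [row[0] for row in connection.execute("SELECT Key FROM Cache")]
print(removed, remaining)
```

Both sides of the comparison are produced by SQLite in UTC, so this version sidesteps any mismatch between the database clock and Python's local time.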

Refactoring

We have a working cache implementation that satisfies our original API specification. There is, however, some repeated code, which we can factor out into its own methods. Let's start with the delete logic that is present in the get, update, remove, and purge methods. These instances can all be replaced with a call to the following new method.

def _remove_item(self, key, cursor):
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor.execute(sql, (key,))
    cursor.connection.commit()

We can see this has a big impact on our code. Four other methods are now calling the one common _remove_item method. Next let's take a look at the expiry date checking code.

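def _item_has_expired(self, created, ttl):
    expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
    # CreatedOn is written by SQLite's datetime(), which is UTC, so compare against UTC.
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return expiry_date <= now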



Great. That's two more places where we have reduced code repetition.

Thread safety with locks

We are almost done. For this to be a robust class, we need to ensure we are thread safe. Caches can often surface as singleton instances in an application, so thread safety is important. We will achieve this by using locks around the destructive cache operations. This is how our whole class looks with locking added. Note the with blocks around the add and delete operations. These ensure the lock is released even if something goes wrong.

#! /usr/bin/env python3


import datetime
import pickle
import sqlite3
import threading


class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """

    _lock = threading.Lock()

    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
        if filename is None:
            self._connection = sqlite3.connect(":memory:")
        else:
            self._connection = sqlite3.connect(filename)
        self._create_schema()


    def _create_schema(self):
        table_name = "Cache"
        cursor = self._connection.cursor()
        result = cursor.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
            (table_name,))
        cache_exists = result.fetchone() is not None
        if cache_exists:
            cursor.close()
            return
        sql = """
            CREATE TABLE 'Cache' (
            'Key' TEXT NOT NULL UNIQUE,
            'Item' BLOB NOT NULL,
            'CreatedOn' TEXT NOT NULL,
            'TimeToLive' TEXT NOT NULL,
            PRIMARY KEY('Key'))
        """
        cursor.execute(sql)
        cursor.close()


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
        cursor = self._connection.cursor()
        # Check and replace under one lock so two threads cannot both insert the same key.
        with self.__class__._lock:
            result = cursor.execute(sql, (key,))
            row = result.fetchone()
            if row is not None:
                self._remove_item(key, cursor)
            sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
            pickled_item = pickle.dumps(item)
            cursor.execute(sql, (key, pickled_item, ttl))
            self._connection.commit()
        cursor.close()


    def _remove_item(self, key, cursor):
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        cursor.connection.commit()


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
        cursor = self._connection.cursor()
        result = cursor.execute(sql, (key,))
        row = result.fetchone()
        if row is None:
            cursor.close()
            return
        item = pickle.loads(row[0])
        if self._item_has_expired(row[1], row[2]):
            with self.__class__._lock:
                self._remove_item(key, cursor)
            item = None
        cursor.close()
        return item


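    def _item_has_expired(self, created, ttl):
        expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
        # CreatedOn is written by SQLite's datetime(), which is UTC, so compare against UTC.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
        return expiry_date <= now


    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            self._remove_item(key, cursor)
        cursor.close()


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            if all:
                cursor.execute("DELETE FROM 'Cache'")
                self._connection.commit()
            else:
                sql = "SELECT Key, CreatedOn, TimeToLive FROM 'Cache'"
                # Fetch all rows up front so the deletes below do not disturb the iteration.
                for row in cursor.execute(sql).fetchall():
                    if self._item_has_expired(row[1], row[2]):
                        self._remove_item(row[0], cursor)
        cursor.close()


    def close(self):
        """ Close the underlying connection used by the cache. """
        self._connection.close()
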

Testing the cache

Time to test our cache. We can do this by spinning up an interactive session as follows.

python -i simplecache.py

Now we can new up an in-memory cache and test our methods.

>>> c = Cache()
>>> c.update("key", "some value")
>>> c.update("key2", [1, 2, 3], 300)
>>> c.get("key")
'some value'
>>> c.remove("key")
>>> c.purge()
>>> c.get("key2")
[1, 2, 3]
>>> c.purge(True)
>>> c.get("key2")
>>> c.close()
>>>
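The session above runs in a single thread. One thing worth checking before sharing a cache between threads is how the sqlite3 module treats a connection that is used outside the thread that created it. The sketch below is an illustration only, with use_in_worker being a helper name invented for it. It compares a default connection with one opened using check_same_thread=False.

```python
import sqlite3
import threading


def use_in_worker(connection):
    """Run a trivial query on the connection from a new thread; report the outcome."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = connection.execute("SELECT 1").fetchone()[0]
        except sqlite3.ProgrammingError as error:
            outcome["error"] = error

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome


strict = sqlite3.connect(":memory:")                           # default settings
shared = sqlite3.connect(":memory:", check_same_thread=False)  # opt in to sharing

strict_outcome = use_in_worker(strict)
shared_outcome = use_in_worker(shared)
print("error" in strict_outcome, shared_outcome.get("value"))

strict.close()
shared.close()
```

With the default settings the worker thread gets a ProgrammingError, so a Cache instance that is meant to be shared would likely need its connection opened with check_same_thread=False. The Python documentation notes that writes may then need to be serialised by the caller, which is the job the lock is doing.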

Exercises for the reader

  1. Write a suite of unit tests for the Cache class. How easy is it to test? Do you need to make any changes to accommodate testing?

  2. Make the time to live a sliding window instead of a fixed time. That is, whenever an item is retrieved from the cache, its time to live value starts over.

  3. Add a method to write the contents of the cache out to the screen.

Reposted from: https://dev.to/robc79/building-a-cache-in-python-3moa?1