Published on 2024-11-03

Building a cache in Python

Caching. Useful stuff. If you're not familiar with it, it's a way to keep data around in memory (or on disk) for fast retrieval. Think of querying a database for some information. Rather than do that every time an application asks for data, we can do it once and keep the result in a cache. Subsequent calls for the data will return the copy from cache instead of making a database query. In theory this improves the performance of your application.
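
As a rough illustration of the idea, here is a minimal sketch that uses a plain dictionary as the cache. The names slow_lookup, cached_lookup and call_count are made up for this example; slow_lookup simply stands in for a real database query.

```python
# Hypothetical stand-in for an expensive database query.
call_count = 0


def slow_lookup(user_id):
    global call_count
    call_count += 1
    return {"id": user_id, "name": f"user-{user_id}"}


_cache = {}


def cached_lookup(user_id):
    # Only fall through to the slow lookup on a cache miss.
    if user_id not in _cache:
        _cache[user_id] = slow_lookup(user_id)
    return _cache[user_id]


first = cached_lookup(42)
second = cached_lookup(42)  # served from the dictionary
print(call_count)
```

The second call never reaches slow_lookup. A dictionary like this has no expiry and no persistence, which is what the rest of the article sets out to add.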

Let's build a simple cache for use in Python programs.

Cache API

I'll start by creating a new module called simplecache, and defining a class Cache in it. I won't implement anything yet; I just want to define the API my cache will use.

class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """


    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
        pass


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        pass


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        pass


    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        pass


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        pass


    def close(self):
        """ Close the underlying connection used by the cache. """
        pass

So far so good. We can tell the cache to create a new in-memory or file based cache via the __init__ method. We can add items to the cache using update, which will overwrite the item if it already exists. We can fetch an item by key using get. Finally we can remove an item by key using remove, or empty the cache of expired items using purge (which optionally allows purging all items). The close method releases the underlying connection when we are done.

Where to cache the data?

So where is this class going to cache data? SQLite support ships with the Python standard library (the sqlite3 module), and is ideal for this sort of thing. In fact, one of the suggested use cases for SQLite is caching. It lets us create an in-memory or file based SQL database, which covers both of our use cases. Let's design a SQL table that can hold our cached data.

CREATE TABLE 'Cache' (
    'Key' TEXT NOT NULL UNIQUE,
    'Item' BLOB NOT NULL,
    'CreatedOn' TEXT NOT NULL,
    'TimeToLive' TEXT NOT NULL,
    PRIMARY KEY('Key')
);

To break this down, we have a table called Cache that has four fields. The Key is a string, and will act as a unique primary key on the table. Next up our Item field is a blob of binary data - I'm thinking here that we will serialise objects added to the cache before saving them in the database. The last two fields are used to determine the lifetime of the item in the cache - CreatedOn is the timestamp of when the item was added, and TimeToLive is the length of time we need to hang on to the item for.
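
To see what a row in this table looks like in practice, here is a small sketch that creates the table in an in-memory database and stores one pickled item. The key "greeting" and the dictionary being cached are arbitrary example values, and the column constraints are written in a slightly shorter form than above.

```python
import pickle
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE Cache (
        "Key" TEXT NOT NULL PRIMARY KEY,
        Item BLOB NOT NULL,
        CreatedOn TEXT NOT NULL,
        TimeToLive TEXT NOT NULL
    )
""")

# datetime('now') yields a UTC timestamp formatted as 'YYYY-MM-DD HH:MM:SS'.
connection.execute(
    "INSERT INTO Cache VALUES (?, ?, datetime('now'), ?)",
    ("greeting", pickle.dumps({"hello": "world"}), 300),
)

row = connection.execute(
    'SELECT "Key", Item, CreatedOn, TimeToLive FROM Cache'
).fetchone()
key, blob, created_on, ttl = row
print(key, pickle.loads(blob), created_on, repr(ttl))
connection.close()
```

Because TimeToLive is declared as TEXT, the integer 300 comes back as the string '300'. That is why the code later in the article wraps the value in int() before using it.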

Constructing the cache

Let's start by importing the sqlite3 module into our module.

import sqlite3

Then we need to turn our attention to the __init__ method. We have two scenarios to support: one where we are given a filename, and one where we are not.

def __init__(self, filename=None):
    """ Construct a new in-memory or file based cache."""
    if filename is None:
        self._connection = sqlite3.connect(":memory:")
    else:
        self._connection = sqlite3.connect(filename)
    self._create_schema()        

We can open a connection and keep it around for the lifetime of the class. Therefore we set the self._connection attribute to hold the connection instance, before calling an internal method _create_schema (more on that in a little while).
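
The difference between the two scenarios can be seen in a short sketch. The table name Demo and the file name demo-cache.db are invented for the example; the file is created in a temporary directory.

```python
import os
import sqlite3
import tempfile

# Two ":memory:" connections are two unrelated, empty databases.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")
first.execute("CREATE TABLE Demo (Payload TEXT)")
tables_in_second = second.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_in_second)
first.close()
second.close()

# A file based database keeps its contents between connections.
path = os.path.join(tempfile.mkdtemp(), "demo-cache.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE Demo (Payload TEXT)")
writer.commit()
writer.close()

reader = sqlite3.connect(path)
tables_in_file = reader.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_in_file)
reader.close()
```

This is why holding on to a single connection matters for the in-memory case: the data lives and dies with that connection.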

Bootstrapping the schema

If we are using an in-memory database, then our schema will not exist yet. However, for file based databases that may not be the case: the file may already exist and have been set up with our schema. Let's see the code that handles this process.

def _create_schema(self):
    table_name = "Cache"
    cursor = self._connection.cursor()
    result = cursor.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
        (table_name,))
    cache_exists = result.fetchone() is not None
    if cache_exists:
        # Release the cursor before leaving early
        cursor.close()
        return
    sql = """
        CREATE TABLE 'Cache' (
        'Key' TEXT NOT NULL UNIQUE,
        'Item' BLOB NOT NULL,
        'CreatedOn' TEXT NOT NULL,
        'TimeToLive' TEXT NOT NULL,
        PRIMARY KEY('Key'))
    """
    cursor.execute(sql)
    cursor.close()

First we define our table name and open a cursor to perform some database operations. Next we check for the existence of our table in the master table. If it's there we exit the method, otherwise we execute a CREATE TABLE... statement to build our table in the database.
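
A possible alternative, not the approach used in this article, is to let SQLite do the existence check with CREATE TABLE IF NOT EXISTS. The sketch below uses a hypothetical standalone function create_schema rather than a method on the class.

```python
import sqlite3


def create_schema(connection):
    # IF NOT EXISTS makes the statement a no-op when the table is already there.
    connection.execute("""
        CREATE TABLE IF NOT EXISTS Cache (
            "Key" TEXT NOT NULL PRIMARY KEY,
            Item BLOB NOT NULL,
            CreatedOn TEXT NOT NULL,
            TimeToLive TEXT NOT NULL
        )
    """)
    connection.commit()


connection = sqlite3.connect(":memory:")
create_schema(connection)
create_schema(connection)  # the second call does not raise
tables = connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)
```

The explicit lookup in sqlite_master is more verbose, but it makes the two cases visible in the Python code, which suits a walkthrough.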

We can now instantiate our Cache class with either an in-memory or file based database to cache objects in.

Adding to the cache

Let us now turn our attention to adding items to the cache. Recall that the update method from our API is responsible for this. If a key already exists in the cache, we are going to replace its entry with whatever is supplied.

def update(self, key, item, ttl=60):
    """ Add or update an item in the cache using the supplied key. Optional
    ttl specifies how many seconds the item will live in the cache for. """
    sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is not None:
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
    sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
    pickled_item = pickle.dumps(item)
    cursor.execute(sql, (key, pickled_item, ttl))
    self._connection.commit()
    cursor.close()

First we open a cursor on our database connection. We then test for the existence of the supplied key in the cache. If it exists, it is removed from the cache. Next we pickle the supplied item value, turning it into a binary blob (this needs import pickle at the top of the module). We then insert the key, pickled item, and ttl into the cache.
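
For comparison, SQLite can also do the delete-then-insert in one statement with INSERT OR REPLACE. This is a sketch of that alternative, not the method used in this article; upsert is a hypothetical helper name.

```python
import pickle
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE Cache (
        "Key" TEXT NOT NULL PRIMARY KEY,
        Item BLOB NOT NULL,
        CreatedOn TEXT NOT NULL,
        TimeToLive TEXT NOT NULL
    )
""")


def upsert(connection, key, item, ttl=60):
    # OR REPLACE removes any row that conflicts on the primary key, then inserts.
    connection.execute(
        "INSERT OR REPLACE INTO Cache VALUES (?, ?, datetime('now'), ?)",
        (key, pickle.dumps(item), ttl),
    )
    connection.commit()


upsert(connection, "key", "first value")
upsert(connection, "key", "second value")
rows = connection.execute('SELECT "Key", Item FROM Cache').fetchall()
print(len(rows), pickle.loads(rows[0][1]))
```

Only one row remains for the key, holding the most recent value and a fresh CreatedOn timestamp.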

Getting data from the cache

Our behaviour for getting an item from the cache is mostly straight forward. We look for an entry with the specified key and unpickle the associated item. The complication in this process arises when the ttl value has expired. That is, the date of creation plus the ttl are less than the current time. If this is the case, then the item has expired and should be removed from the cache.
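
The expiry rule can be tried out on its own before wiring it into the class. In this sketch has_expired is a hypothetical helper that takes the current time as a parameter so the result is predictable; the timestamps are example values in the format SQLite's datetime() produces.

```python
import datetime


def has_expired(created_on, ttl, now):
    # created_on is a 'YYYY-MM-DD HH:MM:SS' string, ttl a number of seconds.
    created = datetime.datetime.fromisoformat(created_on)
    expiry_date = created + datetime.timedelta(seconds=int(ttl))
    return expiry_date < now


created_on = "2024-11-03 12:00:00"
still_fresh = has_expired(created_on, "60", datetime.datetime(2024, 11, 3, 12, 0, 59))
gone_stale = has_expired(created_on, "60", datetime.datetime(2024, 11, 3, 12, 1, 1))
print(still_fresh, gone_stale)
```

One detail to keep in mind: SQLite's datetime() reports UTC, so the "now" it is compared with should be UTC as well, otherwise items expire early or late by the local UTC offset.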

There is a philosophical issue with this method. There is a school of thought that says methods should either return a value (read) or perform a mutation of data (write). Here we are potentially doing both. We are deliberately introducing a side effect (deleting an expired item). I think it is OK in this case, but other programmers might argue otherwise.

def get(self, key):
    """ Get an item from the cache using the specified key. """
    sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is None:
        return
    item = pickle.loads(row[0])
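    expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
    # SQLite's datetime() returns UTC, so compare against the current UTC time
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    if expiry_date < now:
        # The item has expired: remove it and report a cache miss
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
        item = None
    cursor.close()
    return item
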

Removing and purging items in the cache

Removing an item is simple - in fact we have already done it (twice) in our other two methods. That's an ideal candidate for refactoring (which we will look at later on). For now, we will implement the method directly.

def remove(self, key):
    """ Remove an item from the cache using the specified key. """
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    cursor.execute(sql, (key,))
    self._connection.commit()
    cursor.close()

Purging items is a little more complex. We have two scenarios to support - purging all items and purging only expired items. Let's see how we can achieve this.

def purge(self, all=False):
    """ Remove expired items from the cache, or all items if flag is set. """
    cursor = self._connection.cursor()
    if all:
        sql = "DELETE FROM 'Cache'"
        cursor.execute(sql)
        self._connection.commit()
    else:
        sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
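        # fetchall() first, so deleting with the same cursor does not disturb the loop
        for row in cursor.execute(sql).fetchall():
            expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
            # SQLite's datetime() returns UTC, so compare against the current UTC time
            now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
            if expiry_date < now:
                sql = "DELETE FROM 'Cache' WHERE Key = ?"
                cursor.execute(sql, (row[0],))
                self._connection.commit()
    cursor.close()
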

Deleting all is simple enough. We simply run SQL to delete everything in our Cache table. For the expired only items we need to loop through each row, compute the expiry date, and determine if it should be deleted. Again, this latter piece of code has been repeated from one of our other methods (get in this case). Another candidate for refactoring.
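
As an aside, the expired-only case could also be pushed down into SQLite, which can do the date arithmetic itself. The sketch below is an alternative to the row-by-row loop, not the approach taken in this article; purge_expired is a hypothetical function name and the two rows are example data.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE Cache (
        "Key" TEXT NOT NULL PRIMARY KEY,
        Item BLOB NOT NULL,
        CreatedOn TEXT NOT NULL,
        TimeToLive TEXT NOT NULL
    )
""")
# One row created an hour ago with a 60 second ttl, one created just now.
connection.execute(
    "INSERT INTO Cache VALUES ('stale', x'00', datetime('now', '-1 hour'), 60)")
connection.execute(
    "INSERT INTO Cache VALUES ('fresh', x'00', datetime('now'), 300)")


def purge_expired(connection):
    # Build a modifier such as '+60 seconds' from each row's TimeToLive.
    connection.execute("""
        DELETE FROM Cache
        WHERE datetime(CreatedOn, '+' || TimeToLive || ' seconds') < datetime('now')
    """)
    connection.commit()


purge_expired(connection)
remaining = [row[0] for row in connection.execute('SELECT "Key" FROM Cache')]
print(remaining)
```

Both sides of the comparison are produced by SQLite in UTC, so there is no time zone mismatch to worry about with this variant.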

Refactoring

We have a working cache implementation that satisfies our original API specification. There is, however, some repeated code which we can factor out into its own methods. Let's start with the delete logic that is present in the get, update, remove, and purge methods. These instances can all be replaced with a call to the following new method.

def _remove_item(self, key, cursor):
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor.execute(sql, (key,))
    cursor.connection.commit()

We can see this has a big impact on our code. Four other methods are now calling the one common _remove_item method. Next let's take a look at the expiry date checking code.

def _item_has_expired(self, created, ttl):
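    expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
    # SQLite's datetime() returns UTC, so compare against the current UTC time
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return expiry_date < now
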

Great. That's two more places we have reduced code repetition in.

Thread safety with locks

We are almost done. For this to be a robust class, we need to ensure we are thread safe. Caches can often surface as singleton instances in an application, so thread safety is important. We will achieve this by using locks around the destructive cache operations. This is how our whole class looks with locking added. Note the with blocks around the add and delete operations. These ensure the lock is released even if something goes wrong.

#! /usr/bin/env python3


import datetime
import pickle
import sqlite3
import threading


class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """

    _lock = threading.Lock()

    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
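        # check_same_thread=False lets threads other than the creating one use
        # the connection; by default sqlite3 refuses cross-thread use
        if filename is None:
            self._connection = sqlite3.connect(":memory:", check_same_thread=False)
        else:
            self._connection = sqlite3.connect(filename, check_same_thread=False)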
        self._create_schema()


    def _create_schema(self):
        table_name = "Cache"
        cursor = self._connection.cursor()
        result = cursor.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
            (table_name,))
        cache_exists = result.fetchone() is not None
        if cache_exists:
            # Release the cursor before leaving early
            cursor.close()
            return
        sql = """
            CREATE TABLE 'Cache' (
            'Key' TEXT NOT NULL UNIQUE,
            'Item' BLOB NOT NULL,
            'CreatedOn' TEXT NOT NULL,
            'TimeToLive' TEXT NOT NULL,
            PRIMARY KEY('Key'))
        """
        cursor.execute(sql)
        cursor.close()


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        cursor = self._connection.cursor()
        # Hold the lock across the existence check and the write, so that two
        # threads cannot both decide to insert the same key
        with self.__class__._lock:
            sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
            result = cursor.execute(sql, (key,))
            row = result.fetchone()
            if row is not None:
                self._remove_item(key, cursor)
            sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
            pickled_item = pickle.dumps(item)
            cursor.execute(sql, (key, pickled_item, ttl))
            self._connection.commit()
        cursor.close()


    def _remove_item(self, key, cursor):
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        cursor.connection.commit()


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
        cursor = self._connection.cursor()
        result = cursor.execute(sql, (key,))
        row = result.fetchone()
        if row is None:
            return
        item = pickle.loads(row[0])
        if self._item_has_expired(row[1], row[2]):
            with self.__class__._lock:
                self._remove_item(key, cursor)
            item = None
        cursor.close()
        return item


    def _item_has_expired(self, created, ttl):
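        expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
        # SQLite's datetime() returns UTC, so compare against the current UTC time
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
        return expiry_date < now


    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            self._remove_item(key, cursor)
        cursor.close()


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            if all:
                sql = "DELETE FROM 'Cache'"
                cursor.execute(sql)
                self._connection.commit()
            else:
                sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
                # fetchall() first, so deleting with the same cursor does not disturb the loop
                for row in cursor.execute(sql).fetchall():
                    if self._item_has_expired(row[1], row[2]):
                        self._remove_item(row[0], cursor)
        cursor.close()


    def close(self):
        """ Close the underlying connection used by the cache. """
        self._connection.close()
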

Testing the cache

Time to test our cache. We can do this by spinning up an interactive session as follows.

python -i simplecache.py

Now we can new up an in-memory cache and test our methods.

>>> c = Cache()
>>> c.update("key", "some value")
>>> c.update("key2", [1, 2, 3], 300)
>>> c.get("key")
'some value'
>>> c.remove("key")
>>> c.purge()
>>> c.get("key2")
[1, 2, 3]
>>> c.purge(True)
>>> c.get("key2")
>>> c.close()
>>>
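
The session above only exercises a single thread. As a rough sketch of what sharing one SQLite connection between threads involves, the example below has four threads write through a common lock. It does not use the Cache class; the table layout, the writer function and the worker counts are simplified, made-up values for illustration.

```python
import sqlite3
import threading

# check_same_thread=False lets threads other than the creator use the
# connection; without it sqlite3 raises ProgrammingError on cross-thread use.
connection = sqlite3.connect(":memory:", check_same_thread=False)
connection.execute('CREATE TABLE Cache ("Key" TEXT PRIMARY KEY, Item TEXT)')
lock = threading.Lock()


def writer(worker_id):
    for n in range(50):
        # Serialise every use of the shared connection.
        with lock:
            connection.execute(
                "INSERT OR REPLACE INTO Cache VALUES (?, ?)",
                (f"worker-{worker_id}-{n}", "value"),
            )
            connection.commit()


threads = [threading.Thread(target=writer, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

with lock:
    count = connection.execute("SELECT COUNT(*) FROM Cache").fetchone()[0]
print(count)
```

Each of the four workers writes fifty distinct keys, so every row should survive. Disabling the same-thread check hands responsibility for serialising access to the caller, which is the job the lock does here.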

Exercises for the reader

  1. Write a suite of unit tests for the Cache class. How easy is it to test? Do you need to make any changes to accommodate testing?

  2. Make the time to live a sliding window instead of a fixed time. That is, whenever an item is retrieved from the cache, its time to live value starts over.

  3. Add a method to write the contents of the cache out to the screen.

Copyright notice: this article is reproduced from https://dev.to/robc79/building-a-cache-in-python-3moa?1