Building a cache in Python

Published 2024-11-03

Caching. Useful stuff. If you're not familiar with it, it's a way to keep data around in memory (or on disk) for fast retrieval. Think of querying a database for some information. Rather than do that every time an application asks for data, we can do it once and keep the result in a cache. Subsequent calls for the data will return the copy from cache instead of making a database query. In theory this improves the performance of your application.
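To make the idea concrete before building the real thing, here is a minimal sketch that uses a plain dictionary as the cache. The names slow_lookup and cached_lookup are hypothetical, and the sleep merely stands in for a database round trip.

```python
import time

_cache = {}
calls = 0  # counts how often the slow path is taken


def slow_lookup(user_id):
    """Hypothetical stand-in for a slow database query."""
    time.sleep(0.01)  # simulate I/O latency
    return {"id": user_id, "name": f"user-{user_id}"}


def cached_lookup(user_id):
    """Return the cached result if present, otherwise fetch and store it."""
    global calls
    if user_id not in _cache:
        calls += 1
        _cache[user_id] = slow_lookup(user_id)
    return _cache[user_id]


first = cached_lookup(42)   # goes to the "database"
second = cached_lookup(42)  # served from the dictionary
```

The second call never reaches slow_lookup. What this sketch lacks (expiry, persistence, removal) is what the rest of the article adds.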

Let's build a simple cache for use in Python programs.

Cache API

I'll start by creating a new module called simplecache, and defining a class Cache in it. I won't implement anything yet; I just want to define the API my cache will use.

class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """


    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
        pass


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        pass


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        pass


    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        pass


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        pass


    def close(self):
        """ Close the underlying connection used by the cache. """
        pass

So far so good. We can tell the cache to create a new in-memory or file based cache via the __init__ method. We can add items to the cache using update, which will overwrite the item if it already exists. We can get an item by key using get. We can remove an item by key using remove, or empty the cache of expired items using purge (which optionally allows purging all items). Finally, close releases the underlying connection.

Where to cache the data?

So where is this class going to cache data? The sqlite3 module ships with the Python standard library, and SQLite is ideal for this sort of thing. In fact one of the suggested use cases for SQLite is caching. It allows us to create an in-memory or file based SQL database, which covers both of our use cases. Let's design a SQL table that can hold our cached data.

CREATE TABLE 'Cache' (
    'Key' TEXT NOT NULL UNIQUE,
    'Item' BLOB NOT NULL,
    'CreatedOn' TEXT NOT NULL,
    'TimeToLive' TEXT NOT NULL,
    PRIMARY KEY('Key')
);

To break this down, we have a table called Cache that has four fields. The Key is a string, and will act as a unique primary key on the table. Next up our Item field is a blob of binary data - I'm thinking here that we will serialise objects added to the cache before saving them in the database. The last two fields are used to determine the lifetime of the item in the cache - CreatedOn is the timestamp of when the item was added, and TimeToLive is the length of time we need to hang on to the item for.
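As a quick check of the design, the following sketch creates the table in a throwaway in-memory database, stores one pickled object, and reads it back. The key name "example" and the stored dictionary are made up for illustration.

```python
import pickle
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE 'Cache' (
        'Key' TEXT NOT NULL UNIQUE,
        'Item' BLOB NOT NULL,
        'CreatedOn' TEXT NOT NULL,
        'TimeToLive' TEXT NOT NULL,
        PRIMARY KEY('Key')
    )
""")

# Serialise a Python object to bytes so it fits the BLOB column.
blob = pickle.dumps({"answer": 42})
connection.execute(
    "INSERT INTO 'Cache' VALUES (?, ?, datetime(), ?)",
    ("example", blob, 60))
connection.commit()

row = connection.execute(
    "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?",
    ("example",)).fetchone()
restored = pickle.loads(row[0])   # back to the original dictionary
created_on, ttl = row[1], row[2]  # both come back as text
connection.close()
```

Because CreatedOn and TimeToLive are TEXT columns, both values come back as strings, which is why the later code converts the ttl with int() before using it.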

Constructing the cache

Let's start by importing the SQLite library into our module.

import sqlite3

Then we need to turn our attention to the __init__ method. We have two scenarios to support: one where we are given a filename, and one where we are not.

def __init__(self, filename=None):
    """ Construct a new in-memory or file based cache."""
    if filename is None:
        self._connection = sqlite3.connect(":memory:")
    else:
        self._connection = sqlite3.connect(filename)
    self._create_schema()        

We open a connection and keep it around for the lifetime of the Cache instance. To do that we store the connection in the self._connection attribute, before calling an internal method _create_schema (more on that in a little while).
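The practical difference between the two scenarios can be seen in a short sketch. The Demo table and the demo_cache.db file name are hypothetical and unrelated to the cache schema.

```python
import os
import sqlite3
import tempfile

# Each ":memory:" connection gets its own private, empty database.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")
first.execute("CREATE TABLE Demo (Value TEXT)")
tables_seen_by_second = second.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
first.close()
second.close()

# A file based database keeps its contents between connections.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo_cache.db")  # hypothetical file name
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE Demo (Value TEXT)")
    writer.commit()
    writer.close()

    reader = sqlite3.connect(path)
    tables_seen_by_reader = reader.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    reader.close()
```

The second in-memory connection sees no tables, while the second file connection sees the table created by the first. This is why a file based cache may already contain the schema when it is opened.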

Bootstrapping the schema

If we are using an in-memory database, then our schema will not exist yet. For file based databases that may not be the case: the file may already exist and have been set up with our schema. Let's see the code that handles this process.

def _create_schema(self):
    table_name = "Cache"
    cursor = self._connection.cursor()
    result = cursor.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
        (table_name,))
    cache_exists = result.fetchone() is not None
    if cache_exists:
        cursor.close()
        return
    sql = """
        CREATE TABLE 'Cache' (
        'Key' TEXT NOT NULL UNIQUE,
        'Item' BLOB NOT NULL,
        'CreatedOn' TEXT NOT NULL,
        'TimeToLive' TEXT NOT NULL,
        PRIMARY KEY('Key'))
    """
    cursor.execute(sql)
    cursor.close()

First we define our table name and open a cursor to perform some database operations. Next we check for the existence of our table in the master table. If it's there we exit the method, otherwise we execute a CREATE TABLE... statement to build our table in the database.
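As an aside, SQLite can perform the existence check itself. The sketch below is an alternative to the approach above, not the method used in this article: it relies on CREATE TABLE IF NOT EXISTS, and the create_schema function name is hypothetical.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE IF NOT EXISTS 'Cache' (
    'Key' TEXT NOT NULL UNIQUE,
    'Item' BLOB NOT NULL,
    'CreatedOn' TEXT NOT NULL,
    'TimeToLive' TEXT NOT NULL,
    PRIMARY KEY('Key'))
"""


def create_schema(connection):
    """Create the Cache table unless it is already present."""
    connection.execute(SCHEMA)


connection = sqlite3.connect(":memory:")
create_schema(connection)
create_schema(connection)  # the second call is a no-op rather than an error
table_count = connection.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'Cache'"
).fetchone()[0]
connection.close()
```

The explicit lookup in sqlite_master used in the article does the same job and makes the decision visible in Python.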

We can now instantiate our Cache class with either an in-memory or file based database to cache objects in.

Adding to the cache

Let us now turn our attention to adding items to the cache. Recall that the update method from our API is responsible for this. If a key already exists in the cache, we are going to replace its entry with whatever is supplied.

def update(self, key, item, ttl=60):
    """ Add or update an item in the cache using the supplied key. Optional
    ttl specifies how many seconds the item will live in the cache for. """
    sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is not None:
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
    sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
    pickled_item = pickle.dumps(item)
    cursor.execute(sql, (key, pickled_item, ttl))
    self._connection.commit()
    cursor.close()

First we open a cursor on the database connection. We then test for the existence of the supplied key in the cache. If it exists, it is removed from the cache. Next we pickle the supplied item, turning it into a binary blob (this needs import pickle at the top of the module). We then insert the key, pickled item, and ttl into the cache, letting SQLite's datetime() function record the creation time.
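The select, delete, insert sequence can also be collapsed into a single statement. The following is a sketch of that alternative, assuming the same table layout; it uses SQLite's INSERT OR REPLACE, and the upsert function name is hypothetical.

```python
import pickle
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE 'Cache' (
    'Key' TEXT NOT NULL UNIQUE,
    'Item' BLOB NOT NULL,
    'CreatedOn' TEXT NOT NULL,
    'TimeToLive' TEXT NOT NULL,
    PRIMARY KEY('Key'))
""")


def upsert(connection, key, item, ttl=60):
    """Insert the item, replacing any existing row that has the same key."""
    sql = "INSERT OR REPLACE INTO 'Cache' VALUES (?, ?, datetime(), ?)"
    connection.execute(sql, (key, pickle.dumps(item), ttl))
    connection.commit()


upsert(connection, "key", "first value")
upsert(connection, "key", "second value", ttl=300)

rows = connection.execute("SELECT Item, TimeToLive FROM 'Cache'").fetchall()
stored = pickle.loads(rows[0][0])
connection.close()
```

After both calls the table holds a single row containing the second value. The article keeps the explicit delete because it is reused during the refactoring later on.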

Getting data from the cache

Our behaviour for getting an item from the cache is mostly straightforward. We look for an entry with the specified key and unpickle the associated item. The complication arises when the item's ttl has run out, that is, when the creation date plus the ttl is earlier than the current time. If this is the case, then the item has expired and should be removed from the cache.

There is a philosophical issue with this method. There is a school of thought that says methods should either return a value (read) or perform a mutation of data (write). Here we are potentially doing both. We are deliberately introducing a side effect (deleting an expired item). I think it is OK in this case, but other programmers might argue otherwise.
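The expiry rule itself is easy to check in isolation. Here is a small worked example with fixed timestamps; has_expired is a hypothetical helper that takes the current time as a parameter so the result is repeatable. Note that SQLite's datetime() produces UTC timestamps, so the value passed as now should be UTC as well.

```python
import datetime


def has_expired(created_on, ttl, now):
    """Return True when created_on plus ttl seconds is earlier than now."""
    created = datetime.datetime.fromisoformat(created_on)
    expiry_date = created + datetime.timedelta(seconds=int(ttl))
    return expiry_date < now


created_on = "2024-11-03 12:00:00"  # the text format SQLite's datetime() produces
thirty_seconds_later = datetime.datetime(2024, 11, 3, 12, 0, 30)
two_minutes_later = datetime.datetime(2024, 11, 3, 12, 2, 0)

# With a ttl of 60 seconds the item expires at 12:01:00.
expired_early = has_expired(created_on, "60", thirty_seconds_later)  # False
expired_late = has_expired(created_on, "60", two_minutes_later)      # True
```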

def get(self, key):
    """ Get an item from the cache using the specified key. """
    sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    result = cursor.execute(sql, (key,))
    row = result.fetchone()
    if row is None:
        return
    item = pickle.loads(row[0])
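    expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
    # SQLite's datetime() stores UTC, so compare against the current UTC time.
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    if expiry_date < now:
        # The item has expired: delete it and report a miss.
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        self._connection.commit()
        item = None
    cursor.close()
    return item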



Removing and purging items in the cache

Removing an item is simple - in fact we have already done it (twice) in our other two methods. That's an ideal candidate for refactoring (which we will look at later on). For now, we will implement the method directly.

def remove(self, key):
    """ Remove an item from the cache using the specified key. """
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor = self._connection.cursor()
    cursor.execute(sql, (key,))
    self._connection.commit()
    cursor.close()

Purging items is a little more complex. We have two scenarios to support - purging all items and purging only expired items. Let's see how we can achieve this.

def purge(self, all=False):
    """ Remove expired items from the cache, or all items if flag is set. """
    cursor = self._connection.cursor()
    if all:
        sql = "DELETE FROM 'Cache'"
        cursor.execute(sql)
        self._connection.commit()
    else:
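        sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
        # Fetch every row first so the deletes below do not disturb the SELECT.
        rows = cursor.execute(sql).fetchall()
        # SQLite's datetime() stores UTC, so compare against the current UTC time.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
        for row in rows:
            expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
            if expiry_date < now:
                cursor.execute("DELETE FROM 'Cache' WHERE Key = ?", (row[0],))
        self._connection.commit()
    cursor.close()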



Deleting all is simple enough. We simply run SQL to delete everything in our Cache table. For the expired only items we need to loop through each row, compute the expiry date, and determine if it should be deleted. Again, this latter piece of code has been repeated from one of our other methods (get in this case). Another candidate for refactoring.
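For larger caches, the row by row loop could in principle be pushed down into SQLite. The sketch below is an alternative, not the article's approach: it assumes the same table layout and uses SQLite's date modifiers to compute each row's expiry inside the DELETE statement. The purge_expired name and the sample rows are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE 'Cache' (
    'Key' TEXT NOT NULL UNIQUE,
    'Item' BLOB NOT NULL,
    'CreatedOn' TEXT NOT NULL,
    'TimeToLive' TEXT NOT NULL,
    PRIMARY KEY('Key'))
""")
# Two rows created an hour ago: one with a 60 second ttl, one with a two hour ttl.
connection.executemany(
    "INSERT INTO 'Cache' VALUES (?, ?, datetime('now', '-1 hour'), ?)",
    [("stale", b"x", 60), ("fresh", b"y", 7200)])


def purge_expired(connection):
    """Delete every row whose creation time plus ttl is in the past."""
    sql = """
        DELETE FROM 'Cache'
        WHERE datetime(CreatedOn, '+' || TimeToLive || ' seconds') < datetime('now')
    """
    connection.execute(sql)
    connection.commit()


purge_expired(connection)
remaining = [row[0] for row in connection.execute("SELECT Key FROM 'Cache'")]
connection.close()
```

Only the row with the two hour ttl survives. Both sides of the comparison are produced by SQLite in UTC, so no conversion is needed in Python.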

Refactoring

We have a working cache implementation that satisfies our original API specification. There is however some repeated code, which we can factor out into its own methods. Let's start with the delete logic that is present in the get, update, remove, and purge methods. These instances can all be replaced with a call to the following new method.

def _remove_item(self, key, cursor):
    sql = "DELETE FROM 'Cache' WHERE Key = ?"
    cursor.execute(sql, (key,))
    cursor.connection.commit()

We can see this has a big impact on our code. Four other methods are now calling the one common _remove_item method. Next let's take a look at the expiry date checking code.

def _item_has_expired(self, created, ttl):
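    expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
    # SQLite's datetime() stores UTC, so compare against the current UTC time.
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return expiry_date < now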



Great. That's two more places we have reduced code repetition in.

Thread safety with locks

We are almost done. For this to be a robust class, we need to ensure we are thread safe. Caches can often surface as singleton instances in an application, so thread safety is important. We will achieve this by using locks around the destructive cache operations. This is how our whole class looks with locking added. Note the with blocks around the add and delete operations. These ensure the lock is released even if something goes wrong.

#! /usr/bin/env python3


import datetime
import pickle
import sqlite3
import threading


class Cache:
    """ A simple caching class that works with an in-memory or file based
    cache. """

    _lock = threading.Lock()

    def __init__(self, filename=None):
        """ Construct a new in-memory or file based cache."""
        if filename is None:
            self._connection = sqlite3.connect(":memory:")
        else:
            self._connection = sqlite3.connect(filename)
        self._create_schema()


    def _create_schema(self):
        table_name = "Cache"
        cursor = self._connection.cursor()
        result = cursor.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
            (table_name,))
        cache_exists = result.fetchone() is not None
        if cache_exists:
            cursor.close()
            return
        sql = """
            CREATE TABLE 'Cache' (
            'Key' TEXT NOT NULL UNIQUE,
            'Item' BLOB NOT NULL,
            'CreatedOn' TEXT NOT NULL,
            'TimeToLive' TEXT NOT NULL,
            PRIMARY KEY('Key'))
        """
        cursor.execute(sql)
        cursor.close()


    def update(self, key, item, ttl=60):
        """ Add or update an item in the cache using the supplied key. Optional
        ttl specifies how many seconds the item will live in the cache for. """
        sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
        cursor = self._connection.cursor()
        result = cursor.execute(sql, (key,))
        row = result.fetchone()
        with self.__class__._lock:
            if row is not None:
                self._remove_item(key, cursor)
            sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
            pickled_item = pickle.dumps(item)
            cursor.execute(sql, (key, pickled_item, ttl))
            self._connection.commit()
        cursor.close()


    def _remove_item(self, key, cursor):
        sql = "DELETE FROM 'Cache' WHERE Key = ?"
        cursor.execute(sql, (key,))
        cursor.connection.commit()


    def get(self, key):
        """ Get an item from the cache using the specified key. """
        sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
        cursor = self._connection.cursor()
        result = cursor.execute(sql, (key,))
        row = result.fetchone()
        if row is None:
            return
        item = pickle.loads(row[0])
        if self._item_has_expired(row[1], row[2]):
            with self.__class__._lock:
                self._remove_item(key, cursor)
            item = None
        cursor.close()
        return item


    def _item_has_expired(self, created, ttl):
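        expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
        # SQLite's datetime() stores UTC, so compare against the current UTC time.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
        return expiry_date < now


    def remove(self, key):
        """ Remove an item from the cache using the specified key. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            self._remove_item(key, cursor)
        cursor.close()


    def purge(self, all=False):
        """ Remove expired items from the cache, or all items if flag is set. """
        cursor = self._connection.cursor()
        with self.__class__._lock:
            if all:
                cursor.execute("DELETE FROM 'Cache'")
                self._connection.commit()
            else:
                sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
                # Fetch every row first so the deletes do not disturb the SELECT.
                for row in cursor.execute(sql).fetchall():
                    if self._item_has_expired(row[1], row[2]):
                        self._remove_item(row[0], cursor)
        cursor.close()


    def close(self):
        """ Close the underlying connection used by the cache. """
        self._connection.close()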
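One caveat on the threading story: by default sqlite3.connect ties a connection to the thread that created it (check_same_thread=True), and using it from another thread raises sqlite3.ProgrammingError. The sketch below is not part of the Cache class; it assumes you would open the connection with check_same_thread=False and guard every access with the lock. The Counter table and worker function are hypothetical.

```python
import sqlite3
import threading

# check_same_thread=False lets other threads use this connection; the lock
# then has to serialise every access, reads included.
connection = sqlite3.connect(":memory:", check_same_thread=False)
connection.execute("CREATE TABLE Counter (Value INTEGER)")
lock = threading.Lock()


def worker(value):
    """Insert one row while holding the lock."""
    with lock:
        connection.execute("INSERT INTO Counter VALUES (?)", (value,))
        connection.commit()


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

with lock:
    row_count = connection.execute(
        "SELECT count(*) FROM Counter").fetchone()[0]
connection.close()
```

If the cache really is shared as a singleton across threads, a similar change to the connect calls in __init__ would probably be needed, along with locking in get.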



Testing the cache

Time to test our cache. We can do this by spinning up an interactive session as follows.

python -i simplecache.py

Now we can new up an in-memory cache and test our methods.

>>> c = Cache()
>>> c.update("key", "some value")
>>> c.update("key2", [1, 2, 3], 300)
>>> c.get("key")
'some value'
>>> c.remove("key")
>>> c.purge()
>>> c.get("key2")
[1, 2, 3]
>>> c.purge(True)
>>> c.get("key2")
>>> c.close()
>>>

Exercises for the reader

  1. Write a suite of unit tests for the Cache class. How easy is it to test? Do you need to make any changes to accommodate testing?

  2. Make the time to live a sliding window instead of a fixed time. That is, whenever an item is retrieved from the cache, its time to live value starts over.

  3. Add a method to write the contents of the cache out to the screen.

Source: this article is reproduced from https://dev.to/robc79/building-a-cache-in-python-3moa?1