」工欲善其事,必先利其器。「—孔子《論語.錄靈公》
首頁 > 程式設計 > Step Functions 中間件:自動儲存和載入來自 Amazon S3 的有效負載

Step Functions 中間件:自動儲存和載入來自 Amazon S3 的有效負載

發佈於2024-08-23
瀏覽:633

I would like to introduce middy-store, a new library I built over the last couple of months. I've been pondering this idea for a while, going back to this feature request I opened more than a year ago. middy-store is a middleware for Middy that automatically stores and loads payloads from and to a Store like Amazon S3 or potentially other services.

Motivation

AWS services have certain limits that one must be aware of. For example, AWS Lambda has a payload limit of 6MB for synchronous invocations and 256KB for asynchronous invocations. AWS Step Functions allows for a maximum input or output size of 256KB of data as a UTF-8 encoded string. If you exceed this limit when returning data, you will encounter the infamous States.DataLimitExceeded exception.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

The usual workaround for this limitation is to check the size of your payload and save it temporarily in persistent storage such as Amazon S3. Then, you return the object URL or ARN for S3. The next Lambda checks if there is a URL or ARN in the input and loads the payload from S3. As one can imagine, this results in a lot of boilerplate code to store and load the payload from and to Amazon S3, which has to be repeated in every Lambda.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

This becomes even more cumbersome when you only want to save part of the payload to S3 and leave the rest as is. For example, when working with Step Functions, the payload could contain control flow data for states like Choice or Map, which has to be accessed directly. This means the first Lambda saves a partial payload to S3, and the next Lambda has to load the partial payload from S3 and merge it with the rest of the payload. This requires ensuring that the types are consistent across multiple functions, which is, of course, very error-prone.

How it works

middy-store is a middleware for Middy. It's attached to a Lambda function and is called twice during a Lambda invocation: before and after the Lambda handler() runs. It receives the input before the handler runs and receives the output from the handler after it has finished.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

Let's start at the end with the output after a successful invocation to make it easier to follow: middy-store receives the output (the payload) from the handler() function and checks the size. To calculate the size, it stringifies the payload, if it is an object, and uses Buffer.byteLength() to calculate the UTF-8 encoded string size. If the size is larger than a certain configurable threshold, the payload is stored in a Store like Amazon S3. The reference to the stored payload (e.g., an S3 URL or ARN) is then returned as the output instead of the original output.

Now let's look at the next Lambda function (e.g. in a state machine), which will receive this output as its input. This time we are looking at the input before the handler() is invoked: middy-store receives the input to the handler and searches for a reference to a stored payload. If it finds one, the payload is loaded from the Store and returned as the input to the handler. The handler uses the payload as if it was passed directly to it.

Here's an example to illustrate how middy-store works:

/* ./src/functions/handler1.ts */
export const handler1 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Return 1MB of random data as a base64 encoded string as output 
    return randomBytes(1024 * 1024).toString('base64');
  });

/* ./src/functions/handler2.ts */
export const handler2 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Print the size of the input
    return console.log(`Size: ${Buffer.from(input, "base64").byteLength / 1024 / 1024} MB`);
  });

/* ./src/workflow.ts */
// First Lambda returns a large output
// It automatically uploads the data to S3 
const output1 = await handler1({});

// Output is a reference to the S3 object: { "@middy-store": "s3://bucket/key"}
console.log(output1); 

// Second Lambda receives the output as input
// It automatically downloads the data from S3
const output2 = await handler2(output1);

What's a Store?

In general, a Store is any service that allows you to store and load arbitrary payloads, like Amazon S3 or other persistent storage systems. Databases like DynamoDB can also act as a Store. The Store receives a payload from the Lambda handler, serializes it (if it's an object), and stores it in persistent storage. When the next Lambda handler needs the payload, the Store loads the payload from the storage, deserializes and returns it.

middy-store interacts with a Store through a StoreInterface interface, which every Store has to implement. The interface defines the functions canStore() and store() to store payloads, and canLoad() and load() to load payloads.

interface StoreInterface {
  name: string;
  canLoad: (args: LoadArgs) => boolean;
  load: (args: LoadArgs) => Promise;
  canStore: (args: StoreArgs) => boolean;
  store: (args: StoreArgs) => Promise;
}
  • canStore() serves as a guard to check if the Store can store a given payload. It receives the payload and its byte size and checks if the payload fits within the maximum size limits of the Store. For example, a Store backed by DynamoDB has a maximum item size of 400KB, while an S3 store has effectively no limit on the payload size it can store.

  • store() receives a payload and stores it in its underlying storage system. It returns a reference to the payload, which is a unique identifier to identify the stored payload within the underlying service. For example, the Amazon S3 Store uses an S3 URI in the format s3:/// as a reference, while other Amazon services might use ARNs.

  • canLoad() acts like a filter to check if the Store can load a certain reference. It receives the reference to a stored payload and checks if it's a valid identifier for the underlying storage system. For example, the Amazon S3 Store checks if the reference is a valid S3 URI, while a DynamoDB Store would check if it's a valid ARN.

  • load() receives the reference to a stored payload and loads the payload from storage. Depending on the Store, the payload will be deserialized into its original type according to the metadata that was stored alongside it. For example, a payload of type application/json will get parsed back into a JSON object, while a plain string of type text/plain will remain unaltered.

Single and Multiple Stores

Most of the time, you will only need one Store, like Amazon S3, which can effectively store any payload. However, middy-store lets you work with multiple Stores at the same time. This can be useful if you want to store different types of payloads in different Stores. For example, you might want to store large payloads in S3 and small payloads in DynamoDB.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

middy-store accepts an Array in the options to provide one or more Stores. When middy-store runs before the handler and finds a reference in the input, it will iterate over the Stores and call canLoad() with the reference for each Store. The first Store that returns true will be used to load the payload with load().

On the other hand, when middy-store runs after the handler and the output is larger than the maximum allowed size, it will iterate over the Stores and call canStore() for each Store. The first Store that returns true will be used to store the payload with store().

Therefore, it is important to note that the order of the Stores in the array is important.

References

When a payload is stored in a Store, middy-store will return a reference to the stored payload. The reference is a unique identifier to find the stored payload in the Store. The value of the identifier depends on the Store and its configuration. For example, the Amazon S3 Store will use an S3 URI by default. However, it can also be configured to return other formats like an ARN arn:aws:s3:::/, an HTTP endpoint https://.s3.us-west-1.amazonaws.com/, or a structured object with the bucket and key.

The output from the handler after middy-store will contain the reference to the stored payload:

/* Output with reference */
{
  "@middy-store": "s3://bucket/key"
}

middy-store embeds the reference from the Store in the output as an object with a key "@middy-store". This allows middy-store to quickly find all references when the next Lambda function is called and load the payloads from the Store before the handler runs. In case you are wondering, middy-store recursively iterates through the input object and searches for the "@middy-store" key. That means the input can contain multiple references, even from different Stores, and middy-store will find and load them.

Selecting a Payload

By default, middy-store will store the entire output of the handler as a payload in the Store. However, you can also select only a part of the output to be stored. This is useful for workflows like AWS Step Functions, where you might need some of the data for control flow, e.g., a Choice state.

middy-store accepts a selector in its storingOptions config. The selector is a string path to the relevant value in the output that should be stored.

Here's an example:

const output = {
  a: {
    b: ['foo', 'bar', 'baz'],
  },
};

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        selector: '',          /* select the entire output as payload */
        // selector: 'a';      /* selects the payload at the path 'a' */
        // selector: 'a.b';    /* selects the payload at the path 'a.b' */
        // selector: 'a.b[0]'; /* selects the payload at the path 'a.b[0]' */
        // selector: 'a.b[*]'; /* selects the payloads at the paths 'a.b[0]', 'a.b[1]', 'a.b[2]', etc. */
      }
    })
  )
  .handler(async () => output);

await handler({});

The default selector is an empty string (or undefined), which selects the entire output as a payload. In this case, middy-store will return an object with only one property, which is the reference to the stored payload.

/* selector: '' */
{
  "@middy-store": "s3://bucket/key"
}

The selectors a, a.b, or a.b[0] select the value at the path and store only this part in the Store. The reference to the stored payload will be inserted at the path in the output, thereby replacing the original value.

/* selector: 'a' */
{
  a: {
    "@middy-store": "s3://bucket/key"
  }
}
/* selector: 'a.b' */
{
  a: {
    b: {
      "@middy-store": "s3://bucket/key"
    }
  }
}
/* selector: 'a.b[0]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      'bar', 
      'baz'
    ]
  }
}

A selector ending with [*] like a.b[*] acts like an iterator. It will select the array at a.b and store each element in the array in the Store separately. Each element will be replaced with the reference to the stored payload.

/* selector: 'a.b[*]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }
    ]
  }
}

Size Limit

middy-store will calculate the size of the entire output returned from the handler. The size is calculated by stringifying the output, if it's not already a string, and calculating the UTF-8 encoded size of the string in bytes. It will then compare this size to the configured minSize in the storingOptions config. If the output size is equal to or greater than the minSize, it will store the output or a part of it in the Store.

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        minSize: Sizes.STEP_FUNCTIONS,  /* 256KB */
        // minSize: Sizes.LAMBDA_SYNC,  /* 6MB */
        // minSize: Sizes.LAMBDA_ASYNC, /* 256KB */
        // minSize: 1024 * 1024,        /* 1MB */
        // minSize: Sizes.ZERO,         /* 0 */
        // minSize: Sizes.INFINITY,     /* Infinity */
        // minSize: Sizes.kb(512),      /* 512KB */
        // minSize: Sizes.mb(1),        /* 1MB */
      }
    })
  )
  .handler(async () => output);

await handler({});

middy-store provides a Sizes helper with some predefined limits for Lambda and Step Functions. If minSize is not specified, it will use Sizes.STEP_FUNCTIONS with 256KB as the default minimum size. The Sizes.ZERO (equal to the number 0) means that middy-store will always store the payload in a Store, ignoring the actual output size. On the other hand, Sizes.INFINITY (equal to Math.POSITIVE_INFINITY) means that it will never store the payload in a Store.

Stores

Currently, there is only one Store implementation for Amazon S3, but I'm planning to implement a Store backed by DynamoDB and DAX. DynamoDB, with its Time-To-Live (TTL) feature, provides a great option for short-term payloads that only need to exist during the execution of a workflow like Step Functions.

Amazon S3

The middy-store-s3 package provides a store implementation for Amazon S3. It uses the official @aws-sdk/client-s3 package to interact with S3.

import { middyStore } from 'middy-store';
import { S3Store } from 'middy-store-s3';

const handler = middy()
  .use(
    middyStore({
      stores: [
        new S3Store({
          config: { region: "us-east-1" },
          bucket: "bucket",
          key: () => randomUUID(),
          format: "arn",
        }),
      ],
    }),
  )
  .handler(async (input) => {
    return { /* ... */ };
  });

The S3Store only requires a bucket where the payloads are being stored. The key is optional and defaults to randomUUID(). The format configures the style of the reference that is returned after a payload is stored. The supported formats include arn, object, or one of the URL formats from the amazon-s3-url package. It's important to note that S3Store can load any of these formats; the format config only concerns the returned reference. The config is the S3 client configuration and is optional. If not set, the S3 client will resolve the config (credentials, region, etc.) from the environment or file system.

Custom Store

A new Store can be implemented as a class or a plain object, as long as it provides the required functions from the StoreInterface interface.

Here's an example of a Store to store and load payloads as base64 encoded data URLs:

import { StoreInterface, middyStore } from 'middy-store';

const base64Store: StoreInterface = {
  name: "base64",
  /* Reference must be a string starting with "data:text/plain;base64," */
  canLoad: ({ reference }) => {
    return (
      typeof reference === "string" &&
      reference.startsWith("data:text/plain;base64,")
    );
  },
  /* Decode base64 string and parse into object */
  load: async ({ reference }) => {
    const base64 = reference.replace("data:text/plain;base64,", "");
    return Buffer.from(base64, "base64").toString();
  },
  /* Payload must be a string or an object */
  canStore: ({ payload }) => {
    return typeof payload === "string" || typeof payload === "object";
  },
  /* Stringify object and encode as base64 string */
  store: async ({ payload }) => {
    const base64 = Buffer.from(JSON.stringify(payload)).toString("base64");
    return `data:text/plain;base64,${base64}`;
  },
};

const handler = middy()
  .use(
    middyStore({
      stores: [base64Store],
      storingOptions: {
        minSize: Sizes.ZERO, /* Always store the data */ 
      }
    }),
  )
  .handler(async (input) => {
    /* Random text with 100 words */ 
    return `Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.`;
  });

const output = await handler(null, context);

/* Prints: { '@middy-store': 'data:text/plain;base64,IkxvcmVtIGlwc3VtIGRvbG9yIHNpdC...' } */
console.log(output);

This example is the perfect way to try middy-store, because it doesn't rely on external resources like an S3 bucket. You will find it in the repository at examples/custom-store and should be able to run it locally.

Contributions and Feedback

I've been tinkering with the API design for a while, and it's definitely not stable yet. I would love to get feedback on the current state as well as suggestions for changes or improvements. If you are eager to contribute to this project, please go ahead and submit feature requests or pull requests.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 zirkelc / middy-store

Middleware for Step Functions: Automatically Store and Load Payloads

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

Middleware middy-store

middy-store is a middleware for Lambda that automatically stores and loads payloads from and to a Store like Amazon S3 or potentially other services.

Installation

You will need @middy/core >= v5 to use middy-store Please be aware that the API is not stable yet and might change in the future. To avoid accidental breaking changes, please pin the version of middy-store and its sub-packages in your package.json to an exact version.

npm install --save-exact @middy/core middy-store middy-store-s3 
Enter fullscreen modeExit fullscreen mode

Motivation

AWS services have certain limits that one must be aware of. For example, AWS Lambda has a payload limit of 6MB for synchronous invocations and 256KB for asynchronous invocations. AWS Step Functions allows for a maximum input or output size of 256KB of data as a UTF-8 encoded string. If you exceed this limit when returning data, you will encounter the infamous States.DataLimitExceeded exception.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

The usual workaround for this…

View on GitHub
版本聲明 本文轉載於:https://dev.to/aws-builders/middleware-for-step-functions-automatically-store-and-load-payloads-from-amazon-s3-18g4?1如有侵犯,請洽study_golang @163.com刪除
最新教學 更多>
  • 哪個 PHP 函式庫提供卓越的 SQL 注入防護:PDO 還是 mysql_real_escape_string?
    哪個 PHP 函式庫提供卓越的 SQL 注入防護:PDO 還是 mysql_real_escape_string?
    PDO vs. mysql_real_escape_string:綜合指南查詢轉義對於防止 SQL 注入至關重要。雖然 mysql_real_escape_string 提供了轉義查詢的基本方法,但 PDO 成為了一種具有眾多優點的卓越解決方案。 什麼是 PDO? PHP 資料物件 (PDO) 是一...
    程式設計 發佈於2024-11-06
  • React 入門:初學者的路線圖
    React 入門:初學者的路線圖
    大家好! ? 我剛開始學習 React.js 的旅程。這是一次令人興奮(有時甚至具有挑戰性!)的冒險,我想分享一下幫助我開始的步驟,以防您也開始研究 React。這是我的處理方法: 1.掌握 JavaScript 基礎 在開始使用 React 之前,我確保溫習一下我的 JavaScript 技能,...
    程式設計 發佈於2024-11-06
  • 如何引用 JavaScript 物件中的內部值?
    如何引用 JavaScript 物件中的內部值?
    如何在JavaScript 物件中引用內部值在JavaScript 中,存取引用同一物件中其他值的物件中的值有時可能具有挑戰性。考慮以下程式碼片段:var obj = { key1: "it ", key2: key1 " works!" }; a...
    程式設計 發佈於2024-11-06
  • Python 列表方法快速指南及範例
    Python 列表方法快速指南及範例
    介紹 Python 清單用途廣泛,並附帶各種內建方法,有助於有效地操作和處理資料。以下是所有主要清單方法的快速參考以及簡短的範例。 1. 追加(項目) 將項目新增至清單末端。 lst = [1, 2, 3] lst.append(4) # [1, 2, 3, ...
    程式設計 發佈於2024-11-06
  • C++ 中何時需要使用者定義的複製建構函式?
    C++ 中何時需要使用者定義的複製建構函式?
    何時需要使用者定義的複製建構子? 複製建構子是 C 物件導向程式設計的組成部分,提供了一種基於現有實例初始化物件的方法。雖然編譯器通常會為類別產生預設的複製建構函數,但在某些情況下需要進行自訂。 需要使用者定義複製建構子的情況當預設複製建構子不夠時,程式設計師會選擇使用者定義的複製建構子來實作自訂複...
    程式設計 發佈於2024-11-06
  • 試...捕捉 V/s 安全分配 (?=):現代發展的福音還是詛咒?
    試...捕捉 V/s 安全分配 (?=):現代發展的福音還是詛咒?
    最近,我發現了 JavaScript 中引入的新安全賦值運算子 (?.=),我對它的簡單性著迷。 ? 安全賦值運算子 (SAO) 是傳統 try...catch 區塊的簡寫替代方案。它允許您內聯捕獲錯誤,而無需為每個操作編寫明確的錯誤處理程式碼。這是一個例子: const [error, resp...
    程式設計 發佈於2024-11-06
  • 如何在Python中優化固定寬度檔案解析?
    如何在Python中優化固定寬度檔案解析?
    優化固定寬度文件解析為了有效地解析固定寬度文件,可以考慮利用Python的struct模組。此方法利用 C 來提高速度,如下例所示:import struct fieldwidths = (2, -10, 24) fmtstring = ' '.join('{}{}'.format(abs(fw),...
    程式設計 發佈於2024-11-06
  • 蠅量級
    蠅量級
    結構模式之一旨在透過與相似物件共享盡可能多的資料來減少記憶體使用。 在處理大量相似物件時特別有用,為每個物件建立一個新實例在記憶體消耗方面會非常昂貴。 關鍵概念: 內在狀態:多個物件之間共享的狀態獨立於上下文,並且在不同物件之間保持相同。 外部狀態:每個物件唯一的、從客戶端傳遞的狀態。此狀態可...
    程式設計 發佈於2024-11-06
  • 解鎖您的 MySQL 掌握:MySQL 實作實驗室課程
    解鎖您的 MySQL 掌握:MySQL 實作實驗室課程
    透過全面的 MySQL 實作實驗室課程提升您的 MySQL 技能並成為資料庫專家。這種實踐學習體驗旨在引導您完成一系列實踐練習,使您能夠克服複雜的 SQL 挑戰並優化資料庫效能。 深入了解 MySQL 無論您是想要建立強大 MySQL 基礎的初學者,還是想要提升專業知識的經驗豐富的...
    程式設計 發佈於2024-11-06
  • 資料夾
    資料夾
    ? ?大家好,我是尼克? ? 利用專家工程解決方案提升您的專案 探索我的產品組合,了解我如何將尖端技術、強大的問題解決能力和創新熱情結合起來,建立可擴展的高效能應用程式。無論您是尋求增強開發流程還是解決複雜的技術挑戰,我都可以幫助您實現願景。看看我的工作,讓我們合作做一些非凡的事情! 在這裡聯絡...
    程式設計 發佈於2024-11-06
  • 透過 Gmail 發送電子郵件時如何修復「SMTP Connect() 失敗」錯誤?
    透過 Gmail 發送電子郵件時如何修復「SMTP Connect() 失敗」錯誤?
    SMTP 連線失敗:解決「SMTP Connect() 失敗」錯誤嘗試使用Gmail 發送電子郵件時,您可能會遇到錯誤訊息指出「SMTP -> 錯誤:無法連線到伺服器:連線逾時(110)\nSMTP Connect()失敗。 要解決此問題,您需要修改負責發送電子郵件的 PHP 程式碼。具體來說,刪除...
    程式設計 發佈於2024-11-06
  • 如何使用 Pillow 在 Python 中水平連接多個映像?
    如何使用 Pillow 在 Python 中水平連接多個映像?
    以Python水平連接影像水平組合多個影像是影像處理中的常見任務。 Python 提供了強大的工具來使用 Pillow 函式庫來實現此目的。 問題描述考慮三個尺寸為 148 x 95 的方形 JPEG 影像。目標是水平連接這些影像影像,同時避免結果輸出中出現任何部分影像。 建議的解決方案以下程式碼片...
    程式設計 發佈於2024-11-06
  • REST API 設計與命名約定指南
    REST API 設計與命名約定指南
    有效地設計RESTful API對於創建可擴展、可維護且易於使用的系統至關重要。雖然存在某些標準,但許多標準並不是嚴格的規則,而是指導 API 設計的最佳實踐。一種廣泛使用的 API 架構模式是 MVC(模型-視圖-控制器),但它本身並不能解決 API 設計的更精細方面,例如命名和結構。在本文中,我...
    程式設計 發佈於2024-11-06
  • Java 中具有給定總和的子數組的不同方法
    Java 中具有給定總和的子數組的不同方法
    尋找具有給定總和的子數組是編碼面試和競爭性程式設計中經常出現的常見問題。這個問題可以使用各種技術來解決,每種技術在時間複雜度和空間複雜度方面都有自己的權衡。在本文中,我們將探索多種方法來解決在 Java 中尋找具有給定總和的子數組的問題。 問題陳述 給定一個整數數組和一個目標和,在數組中找到一個連...
    程式設計 發佈於2024-11-06

免責聲明: 提供的所有資源部分來自互聯網,如果有侵犯您的版權或其他權益,請說明詳細緣由並提供版權或權益證明然後發到郵箱:[email protected] 我們會在第一時間內為您處理。

Copyright© 2022 湘ICP备2022001581号-3