”工欲善其事,必先利其器。“—孔子《论语.录灵公》
首页 > 编程 > Step Functions 中间件:自动存储和加载来自 Amazon S3 的有效负载

Step Functions 中间件:自动存储和加载来自 Amazon S3 的有效负载

发布于2024-08-23
浏览:234

I would like to introduce middy-store, a new library I built over the last couple of months. I've been pondering this idea for a while, going back to this feature request I opened more than a year ago. middy-store is a middleware for Middy that automatically stores and loads payloads from and to a Store like Amazon S3 or potentially other services.

Motivation

AWS services have certain limits that one must be aware of. For example, AWS Lambda has a payload limit of 6MB for synchronous invocations and 256KB for asynchronous invocations. AWS Step Functions allows for a maximum input or output size of 256KB of data as a UTF-8 encoded string. If you exceed this limit when returning data, you will encounter the infamous States.DataLimitExceeded exception.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

The usual workaround for this limitation is to check the size of your payload and save it temporarily in persistent storage such as Amazon S3. Then, you return the object URL or ARN for S3. The next Lambda checks if there is a URL or ARN in the input and loads the payload from S3. As one can imagine, this results in a lot of boilerplate code to store and load the payload from and to Amazon S3, which has to be repeated in every Lambda.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

This becomes even more cumbersome when you only want to save part of the payload to S3 and leave the rest as is. For example, when working with Step Functions, the payload could contain control flow data for states like Choice or Map, which has to be accessed directly. This means the first Lambda saves a partial payload to S3, and the next Lambda has to load the partial payload from S3 and merge it with the rest of the payload. This requires ensuring that the types are consistent across multiple functions, which is, of course, very error-prone.

How it works

middy-store is a middleware for Middy. It's attached to a Lambda function and is called twice during a Lambda invocation: before and after the Lambda handler() runs. It receives the input before the handler runs and receives the output from the handler after it has finished.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

Let's start at the end with the output after a successful invocation to make it easier to follow: middy-store receives the output (the payload) from the handler() function and checks the size. To calculate the size, it stringifies the payload, if it is an object, and uses Buffer.byteLength() to calculate the UTF-8 encoded string size. If the size is larger than a certain configurable threshold, the payload is stored in a Store like Amazon S3. The reference to the stored payload (e.g., an S3 URL or ARN) is then returned as the output instead of the original output.

Now let's look at the next Lambda function (e.g. in a state machine), which will receive this output as its input. This time we are looking at the input before the handler() is invoked: middy-store receives the input to the handler and searches for a reference to a stored payload. If it finds one, the payload is loaded from the Store and returned as the input to the handler. The handler uses the payload as if it was passed directly to it.

Here's an example to illustrate how middy-store works:

/* ./src/functions/handler1.ts */
export const handler1 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Return 1MB of random data as a base64 encoded string as output 
    return randomBytes(1024 * 1024).toString('base64');
  });

/* ./src/functions/handler2.ts */
export const handler2 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Print the size of the input
    return console.log(`Size: ${Buffer.from(input, "base64").byteLength / 1024 / 1024} MB`);
  });

/* ./src/workflow.ts */
// First Lambda returns a large output
// It automatically uploads the data to S3 
const output1 = await handler1({});

// Output is a reference to the S3 object: { "@middy-store": "s3://bucket/key"}
console.log(output1); 

// Second Lambda receives the output as input
// It automatically downloads the data from S3
const output2 = await handler2(output1);

What's a Store?

In general, a Store is any service that allows you to store and load arbitrary payloads, like Amazon S3 or other persistent storage systems. Databases like DynamoDB can also act as a Store. The Store receives a payload from the Lambda handler, serializes it (if it's an object), and stores it in persistent storage. When the next Lambda handler needs the payload, the Store loads the payload from the storage, deserializes and returns it.

middy-store interacts with a Store through a StoreInterface interface, which every Store has to implement. The interface defines the functions canStore() and store() to store payloads, and canLoad() and load() to load payloads.

interface StoreInterface {
  name: string;
  canLoad: (args: LoadArgs) => boolean;
  load: (args: LoadArgs) => Promise;
  canStore: (args: StoreArgs) => boolean;
  store: (args: StoreArgs) => Promise;
}
  • canStore() serves as a guard to check if the Store can store a given payload. It receives the payload and its byte size and checks if the payload fits within the maximum size limits of the Store. For example, a Store backed by DynamoDB has a maximum item size of 400KB, while an S3 store has effectively no limit on the payload size it can store.

  • store() receives a payload and stores it in its underlying storage system. It returns a reference to the payload, which is a unique identifier to identify the stored payload within the underlying service. For example, the Amazon S3 Store uses an S3 URI in the format s3:/// as a reference, while other Amazon services might use ARNs.

  • canLoad() acts like a filter to check if the Store can load a certain reference. It receives the reference to a stored payload and checks if it's a valid identifier for the underlying storage system. For example, the Amazon S3 Store checks if the reference is a valid S3 URI, while a DynamoDB Store would check if it's a valid ARN.

  • load() receives the reference to a stored payload and loads the payload from storage. Depending on the Store, the payload will be deserialized into its original type according to the metadata that was stored alongside it. For example, a payload of type application/json will get parsed back into a JSON object, while a plain string of type text/plain will remain unaltered.

Single and Multiple Stores

Most of the time, you will only need one Store, like Amazon S3, which can effectively store any payload. However, middy-store lets you work with multiple Stores at the same time. This can be useful if you want to store different types of payloads in different Stores. For example, you might want to store large payloads in S3 and small payloads in DynamoDB.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

middy-store accepts an Array in the options to provide one or more Stores. When middy-store runs before the handler and finds a reference in the input, it will iterate over the Stores and call canLoad() with the reference for each Store. The first Store that returns true will be used to load the payload with load().

On the other hand, when middy-store runs after the handler and the output is larger than the maximum allowed size, it will iterate over the Stores and call canStore() for each Store. The first Store that returns true will be used to store the payload with store().

Therefore, it is important to note that the order of the Stores in the array is important.

References

When a payload is stored in a Store, middy-store will return a reference to the stored payload. The reference is a unique identifier to find the stored payload in the Store. The value of the identifier depends on the Store and its configuration. For example, the Amazon S3 Store will use an S3 URI by default. However, it can also be configured to return other formats like an ARN arn:aws:s3:::/, an HTTP endpoint https://.s3.us-west-1.amazonaws.com/, or a structured object with the bucket and key.

The output from the handler after middy-store will contain the reference to the stored payload:

/* Output with reference */
{
  "@middy-store": "s3://bucket/key"
}

middy-store embeds the reference from the Store in the output as an object with a key "@middy-store". This allows middy-store to quickly find all references when the next Lambda function is called and load the payloads from the Store before the handler runs. In case you are wondering, middy-store recursively iterates through the input object and searches for the "@middy-store" key. That means the input can contain multiple references, even from different Stores, and middy-store will find and load them.

Selecting a Payload

By default, middy-store will store the entire output of the handler as a payload in the Store. However, you can also select only a part of the output to be stored. This is useful for workflows like AWS Step Functions, where you might need some of the data for control flow, e.g., a Choice state.

middy-store accepts a selector in its storingOptions config. The selector is a string path to the relevant value in the output that should be stored.

Here's an example:

const output = {
  a: {
    b: ['foo', 'bar', 'baz'],
  },
};

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        selector: '',          /* select the entire output as payload */
        // selector: 'a';      /* selects the payload at the path 'a' */
        // selector: 'a.b';    /* selects the payload at the path 'a.b' */
        // selector: 'a.b[0]'; /* selects the payload at the path 'a.b[0]' */
        // selector: 'a.b[*]'; /* selects the payloads at the paths 'a.b[0]', 'a.b[1]', 'a.b[2]', etc. */
      }
    })
  )
  .handler(async () => output);

await handler({});

The default selector is an empty string (or undefined), which selects the entire output as a payload. In this case, middy-store will return an object with only one property, which is the reference to the stored payload.

/* selector: '' */
{
  "@middy-store": "s3://bucket/key"
}

The selectors a, a.b, or a.b[0] select the value at the path and store only this part in the Store. The reference to the stored payload will be inserted at the path in the output, thereby replacing the original value.

/* selector: 'a' */
{
  a: {
    "@middy-store": "s3://bucket/key"
  }
}
/* selector: 'a.b' */
{
  a: {
    b: {
      "@middy-store": "s3://bucket/key"
    }
  }
}
/* selector: 'a.b[0]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      'bar', 
      'baz'
    ]
  }
}

A selector ending with [*] like a.b[*] acts like an iterator. It will select the array at a.b and store each element in the array in the Store separately. Each element will be replaced with the reference to the stored payload.

/* selector: 'a.b[*]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }
    ]
  }
}

Size Limit

middy-store will calculate the size of the entire output returned from the handler. The size is calculated by stringifying the output, if it's not already a string, and calculating the UTF-8 encoded size of the string in bytes. It will then compare this size to the configured minSize in the storingOptions config. If the output size is equal to or greater than the minSize, it will store the output or a part of it in the Store.

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        minSize: Sizes.STEP_FUNCTIONS,  /* 256KB */
        // minSize: Sizes.LAMBDA_SYNC,  /* 6MB */
        // minSize: Sizes.LAMBDA_ASYNC, /* 256KB */
        // minSize: 1024 * 1024,        /* 1MB */
        // minSize: Sizes.ZERO,         /* 0 */
        // minSize: Sizes.INFINITY,     /* Infinity */
        // minSize: Sizes.kb(512),      /* 512KB */
        // minSize: Sizes.mb(1),        /* 1MB */
      }
    })
  )
  .handler(async () => output);

await handler({});

middy-store provides a Sizes helper with some predefined limits for Lambda and Step Functions. If minSize is not specified, it will use Sizes.STEP_FUNCTIONS with 256KB as the default minimum size. The Sizes.ZERO (equal to the number 0) means that middy-store will always store the payload in a Store, ignoring the actual output size. On the other hand, Sizes.INFINITY (equal to Math.POSITIVE_INFINITY) means that it will never store the payload in a Store.

Stores

Currently, there is only one Store implementation for Amazon S3, but I'm planning to implement a Store backed by DynamoDB and DAX. DynamoDB, with its Time-To-Live (TTL) feature, provides a great option for short-term payloads that only need to exist during the execution of a workflow like Step Functions.

Amazon S3

The middy-store-s3 package provides a store implementation for Amazon S3. It uses the official @aws-sdk/client-s3 package to interact with S3.

import { middyStore } from 'middy-store';
import { S3Store } from 'middy-store-s3';

const handler = middy()
  .use(
    middyStore({
      stores: [
        new S3Store({
          config: { region: "us-east-1" },
          bucket: "bucket",
          key: () => randomUUID(),
          format: "arn",
        }),
      ],
    }),
  )
  .handler(async (input) => {
    return { /* ... */ };
  });

The S3Store only requires a bucket where the payloads are being stored. The key is optional and defaults to randomUUID(). The format configures the style of the reference that is returned after a payload is stored. The supported formats include arn, object, or one of the URL formats from the amazon-s3-url package. It's important to note that S3Store can load any of these formats; the format config only concerns the returned reference. The config is the S3 client configuration and is optional. If not set, the S3 client will resolve the config (credentials, region, etc.) from the environment or file system.

Custom Store

A new Store can be implemented as a class or a plain object, as long as it provides the required functions from the StoreInterface interface.

Here's an example of a Store to store and load payloads as base64 encoded data URLs:

import { StoreInterface, middyStore } from 'middy-store';

const base64Store: StoreInterface = {
  name: "base64",
  /* Reference must be a string starting with "data:text/plain;base64," */
  canLoad: ({ reference }) => {
    return (
      typeof reference === "string" &&
      reference.startsWith("data:text/plain;base64,")
    );
  },
  /* Decode base64 string and parse into object */
  load: async ({ reference }) => {
    const base64 = reference.replace("data:text/plain;base64,", "");
    return Buffer.from(base64, "base64").toString();
  },
  /* Payload must be a string or an object */
  canStore: ({ payload }) => {
    return typeof payload === "string" || typeof payload === "object";
  },
  /* Stringify object and encode as base64 string */
  store: async ({ payload }) => {
    const base64 = Buffer.from(JSON.stringify(payload)).toString("base64");
    return `data:text/plain;base64,${base64}`;
  },
};

const handler = middy()
  .use(
    middyStore({
      stores: [base64Store],
      storingOptions: {
        minSize: Sizes.ZERO, /* Always store the data */ 
      }
    }),
  )
  .handler(async (input) => {
    /* Random text with 100 words */ 
    return `Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.`;
  });

const output = await handler(null, context);

/* Prints: { '@middy-store': 'data:text/plain;base64,IkxvcmVtIGlwc3VtIGRvbG9yIHNpdC...' } */
console.log(output);

This example is the perfect way to try middy-store, because it doesn't rely on external resources like an S3 bucket. You will find it in the repository at examples/custom-store and should be able to run it locally.

Contributions and Feedback

I've been tinkering with the API design for a while, and it's definitely not stable yet. I would love to get feedback on the current state as well as suggestions for changes or improvements. If you are eager to contribute to this project, please go ahead and submit feature requests or pull requests.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 zirkelc / middy-store

Middleware for Step Functions: Automatically Store and Load Payloads

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

Middleware middy-store

middy-store is a middleware for Lambda that automatically stores and loads payloads from and to a Store like Amazon S3 or potentially other services.

Installation

You will need @middy/core >= v5 to use middy-store Please be aware that the API is not stable yet and might change in the future. To avoid accidental breaking changes, please pin the version of middy-store and its sub-packages in your package.json to an exact version.

npm install --save-exact @middy/core middy-store middy-store-s3 
Enter fullscreen modeExit fullscreen mode

Motivation

AWS services have certain limits that one must be aware of. For example, AWS Lambda has a payload limit of 6MB for synchronous invocations and 256KB for asynchronous invocations. AWS Step Functions allows for a maximum input or output size of 256KB of data as a UTF-8 encoded string. If you exceed this limit when returning data, you will encounter the infamous States.DataLimitExceeded exception.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

The usual workaround for this…

View on GitHub
版本声明 本文转载于:https://dev.to/aws-builders/middleware-for-step-functions-automatically-store-and-load-payloads-from-amazon-s3-18g4?1如有侵犯,请联系[email protected]删除
最新教程 更多>
  • MySQL 中的数据库分片:综合指南
    MySQL 中的数据库分片:综合指南
    随着数据库变得越来越大、越来越复杂,有效地控制性能和扩展就出现了。数据库分片是用于克服这些障碍的一种方法。称为“分片”的数据库分区将大型数据库划分为更小、更易于管理的段(称为“分片”)。通过将每个分片分布在多个服务器上(每个服务器保存总数据的一小部分),可以提高可扩展性和吞吐量。 在本文中,我们将探...
    编程 发布于2024-11-06
  • 如何将 Python 日期时间对象转换为秒?
    如何将 Python 日期时间对象转换为秒?
    在 Python 中将日期时间对象转换为秒在 Python 中使用日期时间对象时,通常需要将它们转换为秒以适应各种情况分析目的。但是,toordinal() 方法可能无法提供所需的输出,因为它仅区分具有不同日期的日期。要准确地将日期时间对象转换为秒,特别是对于 1970 年 1 月 1 日的特定日期...
    编程 发布于2024-11-06
  • 如何使用 Laravel Eloquent 的 firstOrNew() 方法有效优化 CRUD 操作?
    如何使用 Laravel Eloquent 的 firstOrNew() 方法有效优化 CRUD 操作?
    使用 Laravel Eloquent 优化 CRUD 操作在 Laravel 中使用数据库时,插入或更新记录是很常见的。为了实现这一点,开发人员经常求助于条件语句,在决定执行插入或更新之前检查记录是否存在。firstOrNew() 方法幸运的是, Eloquent 通过firstOrNew() 方...
    编程 发布于2024-11-06
  • 为什么在 PHP 中重写方法参数违反了严格的标准?
    为什么在 PHP 中重写方法参数违反了严格的标准?
    在 PHP 中重写方法参数:违反严格标准在面向对象编程中,里氏替换原则 (LSP) 规定:子类型的对象可以替换其父对象,而不改变程序的行为。然而,在 PHP 中,用不同的参数签名覆盖方法被认为是违反严格标准的。为什么这是违规?PHP 是弱类型语言,这意味着编译器无法在编译时确定变量的确切类型。这意味...
    编程 发布于2024-11-06
  • 哪个 PHP 库提供卓越的 SQL 注入防护:PDO 还是 mysql_real_escape_string?
    哪个 PHP 库提供卓越的 SQL 注入防护:PDO 还是 mysql_real_escape_string?
    PDO vs. mysql_real_escape_string:综合指南查询转义对于防止 SQL 注入至关重要。虽然 mysql_real_escape_string 提供了转义查询的基本方法,但 PDO 成为了一种具有众多优点的卓越解决方案。什么是 PDO?PHP 数据对象 (PDO) 是一个数...
    编程 发布于2024-11-06
  • React 入门:初学者的路线图
    React 入门:初学者的路线图
    大家好! ? 我刚刚开始学习 React.js 的旅程。这是一次令人兴奋(有时甚至具有挑战性!)的冒险,我想分享一下帮助我开始的步骤,以防您也开始研究 React。这是我的处理方法: 1.掌握 JavaScript 基础知识 在开始使用 React 之前,我确保温习一下我的 JavaScript 技...
    编程 发布于2024-11-06
  • 如何引用 JavaScript 对象中的内部值?
    如何引用 JavaScript 对象中的内部值?
    如何在 JavaScript 对象中引用内部值在 JavaScript 中,访问引用同一对象中其他值的对象中的值有时可能具有挑战性。考虑以下代码片段:var obj = { key1: "it ", key2: key1 " works!" }; ...
    编程 发布于2024-11-06
  • Python 列表方法快速指南及示例
    Python 列表方法快速指南及示例
    介绍 Python 列表用途广泛,并附带各种内置方法,有助于有效地操作和处理数据。下面是所有主要列表方法的快速参考以及简短的示例。 1. 追加(项目) 将项目添加到列表末尾。 lst = [1, 2, 3] lst.append(4) # [1, 2, 3, 4]...
    编程 发布于2024-11-06
  • C++ 中何时需要用户定义的复制构造函数?
    C++ 中何时需要用户定义的复制构造函数?
    何时需要用户定义的复制构造函数?复制构造函数是 C 面向对象编程的组成部分,提供了一种基于现有实例初始化对象的方法。虽然编译器通常会为类生成默认的复制构造函数,但在某些情况下需要进行自定义。需要用户定义复制构造函数的情况当默认复制构造函数不够时,程序员会选择用户定义的复制构造函数来实现自定义复制行为...
    编程 发布于2024-11-06
  • 尝试...捕获 V/s 安全分配 (?=):现代发展的福音还是诅咒?
    尝试...捕获 V/s 安全分配 (?=):现代发展的福音还是诅咒?
    最近,我发现了 JavaScript 中引入的新安全赋值运算符 (?.=),我对它的简单性着迷。 ? 安全赋值运算符 (SAO) 是传统 try...catch 块的简写替代方案。它允许您内联捕获错误,而无需为每个操作编写显式的错误处理代码。这是一个例子: const [error, respons...
    编程 发布于2024-11-06
  • 如何在Python中优化固定宽度文件解析?
    如何在Python中优化固定宽度文件解析?
    优化固定宽度文件解析为了有效地解析固定宽度文件,可以考虑利用Python的struct模块。此方法利用 C 来提高速度,如以下示例所示:import struct fieldwidths = (2, -10, 24) fmtstring = ' '.join('{}{}'.format(abs(fw...
    编程 发布于2024-11-06
  • 蝇量级
    蝇量级
    结构模式之一旨在通过与相似对象共享尽可能多的数据来减少内存使用。 在处理大量相似对象时特别有用,为每个对象创建一个新实例在内存消耗方面会非常昂贵。 关键概念: 内在状态:多个对象之间共享的状态独立于上下文,并且在不同对象之间保持相同。 外部状态:每个对象唯一的、从客户端传递的状态。此状态可能会有所不...
    编程 发布于2024-11-06
  • 解锁您的 MySQL 掌握:MySQL 实践实验室课程
    解锁您的 MySQL 掌握:MySQL 实践实验室课程
    通过全面的 MySQL 实践实验室课程提高您的 MySQL 技能并成为数据库专家。这种实践学习体验旨在指导您完成一系列实践练习,使您能够克服复杂的 SQL 挑战并优化数据库性能。 深入了解 MySQL 无论您是想要建立强大 MySQL 基础的初学者,还是想要提升专业知识的经验丰富的开...
    编程 发布于2024-11-06
  • 文件夹
    文件夹
    ? ?大家好,我是尼克?? 利用专家工程解决方案提升您的项目 探索我的产品组合,了解我如何将尖端技术、强大的问题解决能力和创新热情结合起来,构建可扩展的高性能应用程序。无论您是寻求增强开发流程还是解决复杂的技术挑战,我都可以帮助您实现愿景。看看我的工作,让我们合作做一些非凡的事情! 在这里联系我:作...
    编程 发布于2024-11-06

免责声明: 提供的所有资源部分来自互联网,如果有侵犯您的版权或其他权益,请说明详细缘由并提供版权或权益证明然后发到邮箱:[email protected] 我们会第一时间内为您处理。

Copyright© 2022 湘ICP备2022001581号-3