”工欲善其事,必先利其器。“—孔子《论语.录灵公》
首页 > 编程 > Step Functions 中间件:自动存储和加载来自 Amazon S3 的有效负载

Step Functions 中间件:自动存储和加载来自 Amazon S3 的有效负载

发布于2024-08-23
浏览:384

I would like to introduce middy-store, a new library I built over the last couple of months. I've been pondering this idea for a while, going back to this feature request I opened more than a year ago. middy-store is a middleware for Middy that automatically stores and loads payloads from and to a Store like Amazon S3 or potentially other services.

Motivation

AWS services have certain limits that one must be aware of. For example, AWS Lambda has a payload limit of 6MB for synchronous invocations and 256KB for asynchronous invocations. AWS Step Functions allows for a maximum input or output size of 256KB of data as a UTF-8 encoded string. If you exceed this limit when returning data, you will encounter the infamous States.DataLimitExceeded exception.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

The usual workaround for this limitation is to check the size of your payload and save it temporarily in persistent storage such as Amazon S3. Then, you return the object URL or ARN for S3. The next Lambda checks if there is a URL or ARN in the input and loads the payload from S3. As one can imagine, this results in a lot of boilerplate code to store and load the payload from and to Amazon S3, which has to be repeated in every Lambda.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

This becomes even more cumbersome when you only want to save part of the payload to S3 and leave the rest as is. For example, when working with Step Functions, the payload could contain control flow data for states like Choice or Map, which has to be accessed directly. This means the first Lambda saves a partial payload to S3, and the next Lambda has to load the partial payload from S3 and merge it with the rest of the payload. This requires ensuring that the types are consistent across multiple functions, which is, of course, very error-prone.

How it works

middy-store is a middleware for Middy. It's attached to a Lambda function and is called twice during a Lambda invocation: before and after the Lambda handler() runs. It receives the input before the handler runs and receives the output from the handler after it has finished.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

Let's start at the end with the output after a successful invocation to make it easier to follow: middy-store receives the output (the payload) from the handler() function and checks the size. To calculate the size, it stringifies the payload, if it is an object, and uses Buffer.byteLength() to calculate the UTF-8 encoded string size. If the size is larger than a certain configurable threshold, the payload is stored in a Store like Amazon S3. The reference to the stored payload (e.g., an S3 URL or ARN) is then returned as the output instead of the original output.

Now let's look at the next Lambda function (e.g. in a state machine), which will receive this output as its input. This time we are looking at the input before the handler() is invoked: middy-store receives the input to the handler and searches for a reference to a stored payload. If it finds one, the payload is loaded from the Store and returned as the input to the handler. The handler uses the payload as if it was passed directly to it.

Here's an example to illustrate how middy-store works:

/* ./src/functions/handler1.ts */
export const handler1 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Return 1MB of random data as a base64 encoded string as output 
    return randomBytes(1024 * 1024).toString('base64');
  });

/* ./src/functions/handler2.ts */
export const handler2 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Print the size of the input
    return console.log(`Size: ${Buffer.from(input, "base64").byteLength / 1024 / 1024} MB`);
  });

/* ./src/workflow.ts */
// First Lambda returns a large output
// It automatically uploads the data to S3 
const output1 = await handler1({});

// Output is a reference to the S3 object: { "@middy-store": "s3://bucket/key"}
console.log(output1); 

// Second Lambda receives the output as input
// It automatically downloads the data from S3
const output2 = await handler2(output1);

What's a Store?

In general, a Store is any service that allows you to store and load arbitrary payloads, like Amazon S3 or other persistent storage systems. Databases like DynamoDB can also act as a Store. The Store receives a payload from the Lambda handler, serializes it (if it's an object), and stores it in persistent storage. When the next Lambda handler needs the payload, the Store loads the payload from the storage, deserializes and returns it.

middy-store interacts with a Store through a StoreInterface interface, which every Store has to implement. The interface defines the functions canStore() and store() to store payloads, and canLoad() and load() to load payloads.

interface StoreInterface {
  name: string;
  canLoad: (args: LoadArgs) => boolean;
  load: (args: LoadArgs) => Promise;
  canStore: (args: StoreArgs) => boolean;
  store: (args: StoreArgs) => Promise;
}
  • canStore() serves as a guard to check if the Store can store a given payload. It receives the payload and its byte size and checks if the payload fits within the maximum size limits of the Store. For example, a Store backed by DynamoDB has a maximum item size of 400KB, while an S3 store has effectively no limit on the payload size it can store.

  • store() receives a payload and stores it in its underlying storage system. It returns a reference to the payload, which is a unique identifier to identify the stored payload within the underlying service. For example, the Amazon S3 Store uses an S3 URI in the format s3:/// as a reference, while other Amazon services might use ARNs.

  • canLoad() acts like a filter to check if the Store can load a certain reference. It receives the reference to a stored payload and checks if it's a valid identifier for the underlying storage system. For example, the Amazon S3 Store checks if the reference is a valid S3 URI, while a DynamoDB Store would check if it's a valid ARN.

  • load() receives the reference to a stored payload and loads the payload from storage. Depending on the Store, the payload will be deserialized into its original type according to the metadata that was stored alongside it. For example, a payload of type application/json will get parsed back into a JSON object, while a plain string of type text/plain will remain unaltered.

Single and Multiple Stores

Most of the time, you will only need one Store, like Amazon S3, which can effectively store any payload. However, middy-store lets you work with multiple Stores at the same time. This can be useful if you want to store different types of payloads in different Stores. For example, you might want to store large payloads in S3 and small payloads in DynamoDB.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

middy-store accepts an Array in the options to provide one or more Stores. When middy-store runs before the handler and finds a reference in the input, it will iterate over the Stores and call canLoad() with the reference for each Store. The first Store that returns true will be used to load the payload with load().

On the other hand, when middy-store runs after the handler and the output is larger than the maximum allowed size, it will iterate over the Stores and call canStore() for each Store. The first Store that returns true will be used to store the payload with store().

Therefore, it is important to note that the order of the Stores in the array is important.

References

When a payload is stored in a Store, middy-store will return a reference to the stored payload. The reference is a unique identifier to find the stored payload in the Store. The value of the identifier depends on the Store and its configuration. For example, the Amazon S3 Store will use an S3 URI by default. However, it can also be configured to return other formats like an ARN arn:aws:s3:::/, an HTTP endpoint https://.s3.us-west-1.amazonaws.com/, or a structured object with the bucket and key.

The output from the handler after middy-store will contain the reference to the stored payload:

/* Output with reference */
{
  "@middy-store": "s3://bucket/key"
}

middy-store embeds the reference from the Store in the output as an object with a key "@middy-store". This allows middy-store to quickly find all references when the next Lambda function is called and load the payloads from the Store before the handler runs. In case you are wondering, middy-store recursively iterates through the input object and searches for the "@middy-store" key. That means the input can contain multiple references, even from different Stores, and middy-store will find and load them.

Selecting a Payload

By default, middy-store will store the entire output of the handler as a payload in the Store. However, you can also select only a part of the output to be stored. This is useful for workflows like AWS Step Functions, where you might need some of the data for control flow, e.g., a Choice state.

middy-store accepts a selector in its storingOptions config. The selector is a string path to the relevant value in the output that should be stored.

Here's an example:

const output = {
  a: {
    b: ['foo', 'bar', 'baz'],
  },
};

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        selector: '',          /* select the entire output as payload */
        // selector: 'a';      /* selects the payload at the path 'a' */
        // selector: 'a.b';    /* selects the payload at the path 'a.b' */
        // selector: 'a.b[0]'; /* selects the payload at the path 'a.b[0]' */
        // selector: 'a.b[*]'; /* selects the payloads at the paths 'a.b[0]', 'a.b[1]', 'a.b[2]', etc. */
      }
    })
  )
  .handler(async () => output);

await handler({});

The default selector is an empty string (or undefined), which selects the entire output as a payload. In this case, middy-store will return an object with only one property, which is the reference to the stored payload.

/* selector: '' */
{
  "@middy-store": "s3://bucket/key"
}

The selectors a, a.b, or a.b[0] select the value at the path and store only this part in the Store. The reference to the stored payload will be inserted at the path in the output, thereby replacing the original value.

/* selector: 'a' */
{
  a: {
    "@middy-store": "s3://bucket/key"
  }
}
/* selector: 'a.b' */
{
  a: {
    b: {
      "@middy-store": "s3://bucket/key"
    }
  }
}
/* selector: 'a.b[0]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      'bar', 
      'baz'
    ]
  }
}

A selector ending with [*] like a.b[*] acts like an iterator. It will select the array at a.b and store each element in the array in the Store separately. Each element will be replaced with the reference to the stored payload.

/* selector: 'a.b[*]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }
    ]
  }
}

Size Limit

middy-store will calculate the size of the entire output returned from the handler. The size is calculated by stringifying the output, if it's not already a string, and calculating the UTF-8 encoded size of the string in bytes. It will then compare this size to the configured minSize in the storingOptions config. If the output size is equal to or greater than the minSize, it will store the output or a part of it in the Store.

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        minSize: Sizes.STEP_FUNCTIONS,  /* 256KB */
        // minSize: Sizes.LAMBDA_SYNC,  /* 6MB */
        // minSize: Sizes.LAMBDA_ASYNC, /* 256KB */
        // minSize: 1024 * 1024,        /* 1MB */
        // minSize: Sizes.ZERO,         /* 0 */
        // minSize: Sizes.INFINITY,     /* Infinity */
        // minSize: Sizes.kb(512),      /* 512KB */
        // minSize: Sizes.mb(1),        /* 1MB */
      }
    })
  )
  .handler(async () => output);

await handler({});

middy-store provides a Sizes helper with some predefined limits for Lambda and Step Functions. If minSize is not specified, it will use Sizes.STEP_FUNCTIONS with 256KB as the default minimum size. The Sizes.ZERO (equal to the number 0) means that middy-store will always store the payload in a Store, ignoring the actual output size. On the other hand, Sizes.INFINITY (equal to Math.POSITIVE_INFINITY) means that it will never store the payload in a Store.

Stores

Currently, there is only one Store implementation for Amazon S3, but I'm planning to implement a Store backed by DynamoDB and DAX. DynamoDB, with its Time-To-Live (TTL) feature, provides a great option for short-term payloads that only need to exist during the execution of a workflow like Step Functions.

Amazon S3

The middy-store-s3 package provides a store implementation for Amazon S3. It uses the official @aws-sdk/client-s3 package to interact with S3.

import { middyStore } from 'middy-store';
import { S3Store } from 'middy-store-s3';

const handler = middy()
  .use(
    middyStore({
      stores: [
        new S3Store({
          config: { region: "us-east-1" },
          bucket: "bucket",
          key: () => randomUUID(),
          format: "arn",
        }),
      ],
    }),
  )
  .handler(async (input) => {
    return { /* ... */ };
  });

The S3Store only requires a bucket where the payloads are being stored. The key is optional and defaults to randomUUID(). The format configures the style of the reference that is returned after a payload is stored. The supported formats include arn, object, or one of the URL formats from the amazon-s3-url package. It's important to note that S3Store can load any of these formats; the format config only concerns the returned reference. The config is the S3 client configuration and is optional. If not set, the S3 client will resolve the config (credentials, region, etc.) from the environment or file system.

Custom Store

A new Store can be implemented as a class or a plain object, as long as it provides the required functions from the StoreInterface interface.

Here's an example of a Store to store and load payloads as base64 encoded data URLs:

import { StoreInterface, middyStore } from 'middy-store';

const base64Store: StoreInterface = {
  name: "base64",
  /* Reference must be a string starting with "data:text/plain;base64," */
  canLoad: ({ reference }) => {
    return (
      typeof reference === "string" &&
      reference.startsWith("data:text/plain;base64,")
    );
  },
  /* Decode base64 string and parse into object */
  load: async ({ reference }) => {
    const base64 = reference.replace("data:text/plain;base64,", "");
    return Buffer.from(base64, "base64").toString();
  },
  /* Payload must be a string or an object */
  canStore: ({ payload }) => {
    return typeof payload === "string" || typeof payload === "object";
  },
  /* Stringify object and encode as base64 string */
  store: async ({ payload }) => {
    const base64 = Buffer.from(JSON.stringify(payload)).toString("base64");
    return `data:text/plain;base64,${base64}`;
  },
};

const handler = middy()
  .use(
    middyStore({
      stores: [base64Store],
      storingOptions: {
        minSize: Sizes.ZERO, /* Always store the data */ 
      }
    }),
  )
  .handler(async (input) => {
    /* Random text with 100 words */ 
    return `Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.`;
  });

const output = await handler(null, context);

/* Prints: { '@middy-store': 'data:text/plain;base64,IkxvcmVtIGlwc3VtIGRvbG9yIHNpdC...' } */
console.log(output);

This example is the perfect way to try middy-store, because it doesn't rely on external resources like an S3 bucket. You will find it in the repository at examples/custom-store and should be able to run it locally.

Contributions and Feedback

I've been tinkering with the API design for a while, and it's definitely not stable yet. I would love to get feedback on the current state as well as suggestions for changes or improvements. If you are eager to contribute to this project, please go ahead and submit feature requests or pull requests.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 zirkelc / middy-store

Middleware for Step Functions: Automatically Store and Load Payloads

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

Middleware middy-store

middy-store is a middleware for Lambda that automatically stores and loads payloads from and to a Store like Amazon S3 or potentially other services.

Installation

You will need @middy/core >= v5 to use middy-store Please be aware that the API is not stable yet and might change in the future. To avoid accidental breaking changes, please pin the version of middy-store and its sub-packages in your package.json to an exact version.

npm install --save-exact @middy/core middy-store middy-store-s3 
Enter fullscreen mode Exit fullscreen mode

Motivation

AWS services have certain limits that one must be aware of. For example, AWS Lambda has a payload limit of 6MB for synchronous invocations and 256KB for asynchronous invocations. AWS Step Functions allows for a maximum input or output size of 256KB of data as a UTF-8 encoded string. If you exceed this limit when returning data, you will encounter the infamous States.DataLimitExceeded exception.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

The usual workaround for this…

View on GitHub
版本声明 本文转载于:https://dev.to/aws-builders/middleware-for-step-functions-automatically-store-and-load-payloads-from-amazon-s3-18g4?1如有侵犯,请联系[email protected]删除
最新教程 更多>
  • HTML格式标签
    HTML格式标签
    HTML 格式化元素 **HTML Formatting is a process of formatting text for better look and feel. HTML provides us ability to format text without us...
    编程 发布于2025-07-10
  • 为什么HTML无法打印页码及解决方案
    为什么HTML无法打印页码及解决方案
    无法在html页面上打印页码? @page规则在@Media内部和外部都无济于事。 HTML:Customization:@page { margin: 10%; @top-center { font-family: sans-serif; font-weight: bo...
    编程 发布于2025-07-10
  • Python读取CSV文件UnicodeDecodeError终极解决方法
    Python读取CSV文件UnicodeDecodeError终极解决方法
    在试图使用已内置的CSV模块读取Python中时,CSV文件中的Unicode Decode Decode Decode Decode decode Error读取,您可能会遇到错误的错误:无法解码字节 在位置2-3中:截断\ uxxxxxxxx逃脱当CSV文件包含特殊字符或Unicode的路径逃...
    编程 发布于2025-07-10
  • PHP阵列键值异常:了解07和08的好奇情况
    PHP阵列键值异常:了解07和08的好奇情况
    PHP数组键值问题,使用07&08 在给定数月的数组中,键值07和08呈现令人困惑的行为时,就会出现一个不寻常的问题。运行print_r($月)返回意外结果:键“ 07”丢失,而键“ 08”分配给了9月的值。此问题源于PHP对领先零的解释。当一个数字带有0(例如07或08)的前缀时,PHP将其...
    编程 发布于2025-07-10
  • 如何使用替换指令在GO MOD中解析模块路径差异?
    如何使用替换指令在GO MOD中解析模块路径差异?
    在使用GO MOD时,在GO MOD 中克服模块路径差异时,可能会遇到冲突,其中3个Party Package将另一个PAXPANCE带有导入式套件之间的另一个软件包,并在导入式套件之间导入另一个软件包。如回声消息所证明的那样: go.etcd.io/bbolt [&&&&&&&&&&&&&&&&...
    编程 发布于2025-07-10
  • `console.log`显示修改后对象值异常的原因
    `console.log`显示修改后对象值异常的原因
    foo = [{id:1},{id:2},{id:3},{id:4},{id:id:5},],]; console.log('foo1',foo,foo.length); foo.splice(2,1); console.log('foo2', foo, foo....
    编程 发布于2025-07-10
  • 如何避免Go语言切片时的内存泄漏?
    如何避免Go语言切片时的内存泄漏?
    ,a [j:] ...虽然通常有效,但如果使用指针,可能会导致内存泄漏。这是因为原始的备份阵列保持完整,这意味着新切片外部指针引用的任何对象仍然可能占据内存。 copy(a [i:] 对于k,n:= len(a)-j i,len(a); k
    编程 发布于2025-07-10
  • 为什么使用固定定位时,为什么具有100%网格板柱的网格超越身体?
    为什么使用固定定位时,为什么具有100%网格板柱的网格超越身体?
    网格超过身体,用100%grid-template-columns 为什么在grid-template-colms中具有100%的显示器,当位置设置为设置的位置时,grid-template-colly修复了?问题: 考虑以下CSS和html: class =“ snippet-code”> g...
    编程 发布于2025-07-10
  • PHP与C++函数重载处理的区别
    PHP与C++函数重载处理的区别
    作为经验丰富的C开发人员脱离谜题,您可能会遇到功能超载的概念。这个概念虽然在C中普遍,但在PHP中构成了独特的挑战。让我们深入研究PHP功能过载的复杂性,并探索其提供的可能性。在PHP中理解php的方法在PHP中,函数超载的概念(如C等语言)不存在。函数签名仅由其名称定义,而与他们的参数列表无关。...
    编程 发布于2025-07-10
  • 如何在Java中正确显示“ DD/MM/YYYY HH:MM:SS.SS”格式的当前日期和时间?
    如何在Java中正确显示“ DD/MM/YYYY HH:MM:SS.SS”格式的当前日期和时间?
    如何在“ dd/mm/yyyy hh:mm:mm:ss.ss”格式“ gormat 解决方案: args)抛出异常{ 日历cal = calendar.getInstance(); SimpleDateFormat SDF =新的SimpleDateFormat(“...
    编程 发布于2025-07-09
  • 如何从PHP中的Unicode字符串中有效地产生对URL友好的sl。
    如何从PHP中的Unicode字符串中有效地产生对URL友好的sl。
    为有效的slug生成首先,该函数用指定的分隔符替换所有非字母或数字字符。此步骤可确保slug遵守URL惯例。随后,它采用ICONV函数将文本简化为us-ascii兼容格式,从而允许更广泛的字符集合兼容性。接下来,该函数使用正则表达式删除了不需要的字符,例如特殊字符和空格。此步骤可确保slug仅包含...
    编程 发布于2025-07-09
  • 如何使用Depimal.parse()中的指数表示法中的数字?
    如何使用Depimal.parse()中的指数表示法中的数字?
    在尝试使用Decimal.parse(“ 1.2345e-02”中的指数符号表示法表示的字符串时,您可能会遇到错误。这是因为默认解析方法无法识别指数符号。 成功解析这样的字符串,您需要明确指定它代表浮点数。您可以使用numbersTyles.Float样式进行此操作,如下所示:[&& && && ...
    编程 发布于2025-07-09
  • 在UTF8 MySQL表中正确将Latin1字符转换为UTF8的方法
    在UTF8 MySQL表中正确将Latin1字符转换为UTF8的方法
    在UTF8表中将latin1字符转换为utf8 ,您遇到了一个问题,其中含义的字符(例如,“jáuòiñe”)在utf8 table tabled tablesset中被extect(例如,“致电。为了解决此问题,您正在尝试使用“ mb_convert_encoding”和“ iconv”转换受...
    编程 发布于2025-07-09
  • 用户本地时间格式及时区偏移显示指南
    用户本地时间格式及时区偏移显示指南
    在用户的语言环境格式中显示日期/时间,并使用时间偏移在向最终用户展示日期和时间时,以其localzone and格式显示它们至关重要。这确保了不同地理位置的清晰度和无缝用户体验。以下是使用JavaScript实现此目的的方法。方法:推荐方法是处理客户端的Javascript中的日期/时间格式化和时...
    编程 发布于2025-07-09
  • 为什么我会收到MySQL错误#1089:错误的前缀密钥?
    为什么我会收到MySQL错误#1089:错误的前缀密钥?
    mySQL错误#1089:错误的前缀键错误descript [#1089-不正确的前缀键在尝试在表中创建一个prefix键时会出现。前缀键旨在索引字符串列的特定前缀长度长度,可以更快地搜索这些前缀。了解prefix keys `这将在整个Movie_ID列上创建标准主键。主密钥对于唯一识别...
    编程 发布于2025-07-09

免责声明: 提供的所有资源部分来自互联网,如果有侵犯您的版权或其他权益,请说明详细缘由并提供版权或权益证明然后发到邮箱:[email protected] 我们会第一时间内为您处理。

Copyright© 2022 湘ICP备2022001581号-3