
Simplifying File Uploads in NestJS: Efficient In-Memory Parsing of CSV and XLSX Without Disk Storage

Published on 2024-11-06

Effortless File Parsing in NestJS: Manage CSV and XLSX Uploads in Memory for Speed, Security, and Scalability

Introduction

Handling file uploads in a web application is a common task, but dealing with different file types and ensuring they are processed correctly can be challenging. Often, developers need to parse uploaded files without saving them to the server, which is especially important for reducing server storage costs and ensuring that sensitive data is not unnecessarily retained. In this article, we’ll walk through the process of creating a custom NestJS module to handle file uploads specifically for CSV and XLS/XLSX files, and we’ll parse these files in memory using Node.js streams, so no static files are created on the server.

Why NestJS?

NestJS is a progressive Node.js framework that leverages TypeScript and provides an out-of-the-box application architecture that enables you to build highly testable, scalable, loosely coupled, and easily maintainable applications. By using NestJS, we can take advantage of its modular structure, powerful dependency injection system, and extensive ecosystem.

Step 1: Setting Up the Project

Before we dive into the code, let’s set up a new NestJS project. If you haven’t already, install the NestJS CLI:

npm install -g @nestjs/cli

Create a new NestJS project:

nest new your-super-name

Navigate into the project directory:

cd your-super-name

Step 2: Installing Required Packages

We’ll need to install some additional packages to handle file uploads and parsing:

npm install @nestjs/platform-express multer exceljs file-type

  • Multer: A middleware for handling multipart/form-data, which is primarily used for uploading files.
  • ExcelJS: A library for reading and writing CSV and XLSX files.
  • File-Type: A library for detecting the file type of a stream or buffer.

Step 3: Creating the Multer Storage Engine Without Saving Files

To customize the file upload process, we’ll create a custom Multer storage engine. This engine will ensure that only CSV and XLS/XLSX files are accepted, parse them in memory using Node.js streams, and return the parsed data without saving any files to disk.
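As a rough illustration of the contract a Multer storage engine fulfils, here is a minimal sketch that buffers the incoming stream in memory. It uses only Node.js built-ins; `IncomingFile`, `HandleFileCallback`, and `InMemoryEngine` are hypothetical, simplified stand-ins for Multer's own types, not part of the article's code.

```typescript
import { Readable } from 'node:stream';

// Hypothetical, simplified stand-ins for Multer's file and callback types.
interface IncomingFile {
  fieldname: string;
  originalname: string;
  mimetype: string;
  stream: Readable;
}
type HandleFileCallback = (
  error: Error | null,
  info?: Record<string, unknown>,
) => void;

// A storage engine needs two methods: _handleFile consumes file.stream and
// reports a result, _removeFile undoes whatever _handleFile stored.
class InMemoryEngine {
  _handleFile(_req: unknown, file: IncomingFile, cb: HandleFileCallback): void {
    const chunks: Buffer[] = [];
    file.stream.on('data', (chunk: Buffer | string) => {
      chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk);
    });
    file.stream.on('error', (error) => cb(error));
    file.stream.on('end', () => {
      const buffer = Buffer.concat(chunks);
      // The object passed here is merged into the file object the route handler receives.
      cb(null, { buffer, size: buffer.length });
    });
  }

  _removeFile(
    _req: unknown,
    _file: IncomingFile,
    cb: (error: Error | null) => void,
  ): void {
    // Nothing was written to disk, so there is nothing to clean up.
    cb(null);
  }
}
```

The engine in this step follows the same shape, but hands the stream to a parser instead of collecting raw bytes.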

Create a new file for our engine:

import { PassThrough } from 'stream';
import * as fileType from 'file-type';
import { BadRequestException } from '@nestjs/common';
import { Request } from 'express';
import { Workbook } from 'exceljs';
import { createParserCsvOrXlsx } from './parser-factory.js';

// Typed as readonly string[] so that .includes() accepts an arbitrary MIME string.
const ALLOWED_MIME_TYPES: readonly string[] = [
  'text/csv',
  'text/comma-separated-values',
  'application/vnd.ms-excel',
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
];

export class CsvOrXlsxMulterEngine {
  private destKey: string;
  private maxFileSize: number;
  constructor(opts: { destKey: string; maxFileSize: number }) {
    this.destKey = opts.destKey;
    this.maxFileSize = opts.maxFileSize;
  }
  async _handleFile(req: Request, file: any, cb: any) {
    try {
      const contentLength = Number(req.headers['content-length']);
      // Number() yields NaN when the header is missing, so test for a finite value.
      if (Number.isFinite(contentLength) && contentLength > this.maxFileSize) {
        throw new BadRequestException(
          `Max file size is ${this.maxFileSize} bytes.`,
        );
      }
      const fileStream = await fileType.fileTypeStream(file.stream);
      const mime = fileStream.fileType?.mime ?? file.mimetype;
      if (!ALLOWED_MIME_TYPES.includes(mime)) {
        throw new BadRequestException('File must be *.csv or *.xlsx');
      }
      const replacementStream = new PassThrough();
      fileStream.pipe(replacementStream);
      const parser = createParserCsvOrXlsx(mime);
      const data = await parser.read(replacementStream);
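      // The XLSX parser resolves to a Workbook and the CSV parser to a Worksheet.
      // Inspecting the result (instead of comparing mime with 'text/csv') also
      // covers 'text/comma-separated-values'.
      cb(null, {
        [this.destKey]: 'getWorksheet' in data ? data.getWorksheet() : data,
      });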
    } catch (error) {
      cb(error);
    }
  }
  _removeFile(req: Request, file: any, cb: any) {
    cb(null);
  }
}

This custom storage engine checks the file’s MIME type and ensures it’s either a CSV or XLS/XLSX file. Because file-type identifies files by their binary signature, plain-text CSV uploads are not detected, and the engine falls back to the MIME type declared by the client. The file is then processed in memory using Node.js streams, so no temporary files are created on the server and no sensitive data is left on disk.
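To make the detect-then-fall-back step more concrete, here is a minimal sketch of the same idea without the file-type package. It relies on one known fact: XLSX files are ZIP archives, and ZIP archives begin with the bytes `PK\x03\x04`. The helpers `sniffMime` and `resolveMime` are hypothetical names used for illustration only. The real library inspects far more formats and looks deeper into the archive.

```typescript
// ZIP local file header signature: "PK" followed by 0x03 0x04.
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Hypothetical helper: returns a MIME type only when the leading bytes are recognisable.
function sniffMime(head: Buffer): string | undefined {
  const isZip = head.length >= 4 && head.subarray(0, 4).equals(ZIP_SIGNATURE);
  return isZip ? 'application/zip' : undefined;
}

// Mirrors `fileStream.fileType?.mime ?? file.mimetype` from the engine:
// trust the bytes when they say something, otherwise use the declared type.
function resolveMime(head: Buffer, declaredMime: string): string {
  return sniffMime(head) ?? declaredMime;
}
```

A CSV file has no signature, so `resolveMime(Buffer.from('a,b\n1,2\n'), 'text/csv')` returns the declared `'text/csv'`. For CSV uploads the allow-list is therefore only as trustworthy as the client-supplied header.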

Step 4: Creating the Parser Factory

The parser factory is responsible for determining the appropriate parser based on the file type.

Create a new file for our parser:

import excel from 'exceljs';

export function createParserCsvOrXlsx(mime: string) {
  const workbook = new excel.Workbook();
  return [
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'application/vnd.ms-excel',
  ].includes(mime)
    ? workbook.xlsx
    : workbook.csv;
}

This factory function checks the MIME type and returns the appropriate parser (either xlsx or csv).
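Since the two parsers resolve to different shapes (a workbook for XLSX, a worksheet for CSV, as the engine's handling of the result suggests), callers may find it useful to know which branch was taken. The following is a small sketch of that decision as a pure function. `XLSX_MIME_TYPES`, `ParserKind`, and `parserKindFor` are hypothetical names, not part of the article's code.

```typescript
// MIME types the factory routes to the XLSX parser; everything else is treated as CSV.
const XLSX_MIME_TYPES: readonly string[] = [
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  'application/vnd.ms-excel',
];

type ParserKind = 'xlsx' | 'csv';

// Hypothetical helper: the same decision as createParserCsvOrXlsx, made explicit.
function parserKindFor(mime: string): ParserKind {
  return XLSX_MIME_TYPES.includes(mime) ? 'xlsx' : 'csv';
}
```

One design consequence worth noting: any MIME type outside the XLSX list, including `text/comma-separated-values`, lands on the CSV branch, so the allow-list check in the engine has to run before the factory is called.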

Step 5: Configuring Multer in the NestJS Controller

Next, let’s create a controller to handle file uploads using our custom storage engine.

Generate a new controller:

nest g controller files

In the files.controller.ts, configure the file upload using Multer and the custom storage engine:

import {
  Controller,
  Post,
  UploadedFile,
  UseInterceptors,
} from '@nestjs/common';
import { FileInterceptor } from '@nestjs/platform-express';
import { Worksheet } from 'exceljs';
import { CsvOrXlsxMulterEngine } from '../../shared/multer-engines/csv-xlsx/engine.js';
import { FilesService } from './files.service.js';

const MAX_FILE_SIZE_IN_BYTES = 1_000_000_000; // ~1 GB; the engine compares this with Content-Length in bytes. For testing only.

@Controller('files')
export class FilesController {
  constructor(private readonly filesService: FilesService) {}
  @UseInterceptors(
    FileInterceptor('file', {
      storage: new CsvOrXlsxMulterEngine({
        maxFileSize: MAX_FILE_SIZE_IN_BYTES,
        destKey: 'worksheet',
      }),
    }),
  )
  @Post()
  create(@UploadedFile() data: { worksheet: Worksheet }) {
    return this.filesService.format(data.worksheet);
  }
}

This controller sets up an endpoint to handle file uploads. The uploaded file is processed by the CsvOrXlsxMulterEngine, and the parsed data is returned in the response without ever being saved to disk.

Step 6: Setting Up the Module

Finally, we need to set up a module to include our controller.

Generate a new module:

nest g module files

In the files.module.ts, import the controller:

import { Module } from '@nestjs/common';
import { FilesController } from './files.controller.js';
import { FilesService } from './files.service.js';

@Module({
  providers: [FilesService],
  controllers: [FilesController],
})
export class FilesModule {}

Make sure to import this module into your AppModule; the complete AppModule is shown in Step 7.

Step 7: Testing the File Upload with HTML

To test the file upload functionality, we can create a simple HTML page that allows users to upload CSV or XLS/XLSX files. This page will send the file to our /api/files endpoint, where it will be parsed and processed in memory.

Here’s the basic HTML file for testing the file upload:



[HTML listing not preserved: a page titled “File Upload” with the heading “Upload a File (CSV or XLSX)” and a file-upload form.]



To render the HTML page for file uploads, we first need to install an additional NestJS module called @nestjs/serve-static. You can do this by running the following command:

npm install @nestjs/serve-static

After installing, we need to configure this module in AppModule:

import { Module } from '@nestjs/common';
import { join } from 'path';
import { ServeStaticModule } from '@nestjs/serve-static';
import { FilesModule } from './modules/files/files.module.js';

@Module({
  imports: [
    FilesModule,
    ServeStaticModule.forRoot({
      rootPath: join(new URL('..', import.meta.url).pathname, 'public'),
      serveRoot: '/',
    }),
  ],
})
export class AppModule {}

This setup serves static files from the public directory. You can now open the file upload page by navigating to http://localhost:3000 in your browser.
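One detail in the rootPath expression may be worth a second look: `URL.pathname` keeps percent-encoding (a space stays `%20`) and does not produce a usable path on Windows. A possible alternative, sketched here with Node's built-in `fileURLToPath`, resolves the same directory; `publicDirFrom` is a hypothetical helper name.

```typescript
import { join } from 'node:path';
import { fileURLToPath } from 'node:url';

// Hypothetical helper: resolves "<parent of the module's directory>/public",
// decoding percent-escapes along the way.
function publicDirFrom(moduleUrl: string): string {
  const parentDir = fileURLToPath(new URL('..', moduleUrl));
  return join(parentDir, 'public');
}
```

In AppModule this would be used as `rootPath: publicDirFrom(import.meta.url)`.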


Upload Your File

To upload a file, follow these steps:

  1. Choose a file by clicking on the ‘Choose file’ button.
  2. Click on the ‘Upload’ button to start the upload process.

Once the file is uploaded successfully, you should see a confirmation that the file has been uploaded and formatted.
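The same upload can also be exercised from a script instead of the browser. The sketch below uses the `fetch`, `FormData`, and `Blob` globals available in Node.js 18+; it assumes the app listens on localhost:3000 and exposes /api/files as described above. `buildUploadForm` and `uploadCsv` are hypothetical helper names.

```typescript
// Builds the multipart body; 'file' must match the field name given to FileInterceptor.
function buildUploadForm(csvText: string, filename: string): FormData {
  const form = new FormData();
  form.append('file', new Blob([csvText], { type: 'text/csv' }), filename);
  return form;
}

// Assumption: the server from this article is running locally on port 3000.
async function uploadCsv(csvText: string, filename = 'sample.csv'): Promise<string> {
  const response = await fetch('http://localhost:3000/api/files', {
    method: 'POST',
    body: buildUploadForm(csvText, filename),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  // The response shape depends on FilesService.format, so return the raw text.
  return response.text();
}
```

Calling `uploadCsv('name,age\nAda,36\n')` sends a two-line CSV; what comes back depends on how the service formats the worksheet.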


Note: I haven’t included code for formatting the uploaded file, as this depends on the library you choose for processing CSV or XLS/XLSX files. You can view the complete implementation on GitHub.

Comparing Pros and Cons of In-Memory File Processing

When deciding whether to use in-memory file processing or saving files to disk, it’s important to understand the trade-offs.

Pros of In-Memory Processing:

No Temporary Files on Disk:

  • Security: Sensitive data isn’t left on the server’s disk, reducing the risk of data leaks.
  • Resource Efficiency: The server doesn’t need to allocate disk space for temporary files, which can be particularly useful in environments with limited storage.

Faster Processing:

  • Performance: Parsing files in memory can be faster since it eliminates the overhead of writing and reading files from disk.
  • Reduced I/O Operations: Fewer disk I/O operations mean lower latency and potentially higher throughput for file processing.

Simplified Cleanup:

  • No Cleanup Required: Since files aren’t saved to disk, there’s no need to manage or clean up temporary files, simplifying the codebase.

Cons of In-Memory Processing:

Memory Usage:

  • High Memory Consumption: Large files can consume significant amounts of memory, which might lead to out-of-memory errors if the server doesn’t have enough resources.
  • Scalability: Handling large files or multiple file uploads simultaneously may require careful memory management and scaling strategies.

File Size Limitations:

  • Limited by Memory: The maximum file size that can be processed is limited by the available memory on the server. This can be a significant drawback for applications dealing with very large files.
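Because memory is the limiting factor, it can help to enforce the size limit on the bytes actually received, not only on the Content-Length header, which may be missing or inaccurate. Here is a minimal sketch using Node's built-in streams; `byteLimit` and `readWithLimit` are hypothetical helpers, not part of the engine shown earlier.

```typescript
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper: passes data through unchanged and fails once more than
// maxBytes have been seen, regardless of what Content-Length claimed.
function byteLimit(maxBytes: number): Transform {
  let seen = 0;
  return new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      seen += chunk.length;
      if (seen > maxBytes) {
        callback(new Error(`Upload exceeds ${maxBytes} bytes.`));
      } else {
        callback(null, chunk);
      }
    },
  });
}

// Collects a source stream into one Buffer, enforcing the limit on the way.
async function readWithLimit(source: Readable, maxBytes: number): Promise<Buffer> {
  const chunks: Buffer[] = [];
  const sink = new Writable({
    write(chunk: Buffer, _encoding, callback) {
      chunks.push(chunk);
      callback();
    },
  });
  await pipeline(source, byteLimit(maxBytes), sink);
  return Buffer.concat(chunks);
}
```

In the engine, a transform like `byteLimit` could sit between the upload stream and the parser. Multer's own `limits.fileSize` option is another place to set a cap.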

Complexity in Error Handling:

  • Error Management: Managing errors in streaming data can be more complex than handling files on disk, especially in cases where partial data might need to be recovered or analyzed.

When to Use In-Memory Processing:

Small to Medium Files: If your application deals with relatively small files, in-memory processing can offer speed and simplicity.

Security-Sensitive Applications: When handling sensitive data that shouldn’t be stored on disk, in-memory processing can reduce the risk of data breaches.

High-Performance Scenarios: Applications that require high throughput and minimal latency may benefit from the reduced overhead of in-memory processing.

When to Consider Disk-Based Processing:

Large Files: If your application needs to process very large files, disk-based processing may be necessary to avoid running out of memory.

Resource-Constrained Environments: In cases where server memory is limited, processing files on disk can prevent memory exhaustion and allow for better resource management.

Persistent Storage Needs: If you need to retain a copy of the uploaded file for auditing, backup, or later retrieval, saving files to disk is necessary.

Integration with External Storage Services: For large files, consider uploading them to external storage services like AWS S3, Google Cloud Storage, or Azure Blob Storage. These services allow you to offload storage from your server, and you can process the files in the cloud or retrieve them for in-memory processing as needed.

Scalability: Cloud storage solutions can handle massive files and provide redundancy, ensuring that your data is safe and easily accessible from multiple geographic locations.

Cost Efficiency: Using cloud storage can be more cost-effective for handling large files, as it reduces the need for local server resources and provides pay-as-you-go pricing.

Conclusion

In this article, we’ve created a custom file upload module in NestJS that handles CSV and XLS/XLSX files, parses them in memory, and returns the parsed data without saving any files to disk. This approach leverages the power of Node.js streams, making it both efficient and secure, as no temporary files are left on the server.

We’ve also explored the pros and cons of in-memory file processing versus saving files to disk. While in-memory processing offers speed, security, and simplicity, it’s important to consider the memory usage and potential file size limitations before adopting this approach.

Whether you’re building an enterprise application or a small project, handling file uploads and parsing correctly is crucial. With this setup, you’re well on your way to mastering file uploads in NestJS without worrying about unnecessary server storage or data security issues.


Copyright notice: this article is reposted from https://dev.to/damir_maham/streamline-file-uploads-in-nestjs-efficient-in-memory-parsing-for-csv-xlsx-without-disk-storage-145g?1 In case of infringement, please contact [email protected] for removal.