
Maintaining an Open-Source Backup Tool: Insights and More

Published on 2024-07-31

Backup strategies might seem like a solved problem, yet system administrators often struggle with questions about how to back up data properly, where to store it, and how to standardize the backup process across different software environments. In 2011, we developed custom backup scripts that efficiently handled backups for our clients' web projects. These scripts served us well for many years, storing backups in both our own storage and external repositories as needed. However, as our software ecosystem grew and diversified, the scripts fell short, lacking support for newer technologies such as Redis and MySQL/PostgreSQL. They also became cumbersome, with no monitoring other than email alerts.

Our once compact scripts evolved into a complex and unmanageable system. Updating these scripts for different customers became challenging, particularly when they used customized versions. By early last year, we realized we needed a more modern solution.

In this article, we explain the difficulties we faced while developing nxs-backup and share what we learned along the way. You can also test the tool on your own project and share your experience; we would be very interested to hear from you. Now, let's get started!

We listed our requirements for a new system:

  • Backup data of the most commonly used software: (Files: discrete and incremental; MySQL; PostgreSQL; MongoDB; Redis);
  • Store backups in popular repositories: (FTP; SSH; SMB; NFS; WebDAV; S3);
  • Receive alerts in case of problems during the backup process;
  • Have a unified configuration file to manage backups centrally;
  • Add support for new software by connecting external modules;
  • Specify extra options for collecting dumps;
  • Be able to restore backups with standard tools;
  • Ease of initial configuration.

All these requirements were drawn up based on our needs about 5 years ago. Unfortunately, not all of them were implemented.

We looked at existing open-source solutions even before creating the first version of nxs-backup, but they all had their flaws. Bacula, for example, is overloaded with functions we do not need, its initial configuration is laborious because of the amount of manual work (for example, writing or finding scripts for database backups), and restoring copies requires special utilities.

Unsurprisingly, we faced the same problem when we came up with the idea of rewriting our tool. The chance that something had changed in four years and new tools had appeared was not high, but we checked anyway.

We studied a couple of new tools that we had not considered before, but they did not suit us either because they did not fully meet our requirements.

We finally came to two important conclusions:

  1. None of the existing solutions was fully suitable for us;
  2. It seems we had enough experience and craziness to write our own solution the first time, so we could do it again. And that’s what we did.

Before exploring the new version, let’s take a look at what we had before and why it was not enough for us.

The old version supported such DBs as MySQL, PostgreSQL, Redis, MongoDB, discrete and incremental copying of files, multiple remote storages (S3; SMB; NFS; FTP; SSH; WebDAV) and had such features as backup rotation, logging, e-mail notifications, and external modules.

Now, more on what we were concerned about.

Run a single binary on any Linux without working from source

Over time, the list of systems we work with has grown considerably. We now serve projects that run on distributions other than the standard deb- and rpm-compatible ones, such as Arch, SUSE, Alt, etc.

These systems had difficulty running nxs-backup because we only built deb and rpm packages and supported a limited list of system versions. In some places we repackaged the whole package, in others we used just the binary, and in others we simply had to run the source code.

Working with the old version was very inconvenient for engineers because of the need to work with the source. Installation and updates in this mode also took more time: instead of setting up 10 servers per hour, you ended up spending an hour on a single server.

We’ve known for a long time that it’s much better when you have a binary without system dependencies that you can run on any distribution and not experience problems with different versions of libraries and architectural differences in systems. We wanted this tool to be the same.

Minimize the Docker image with nxs-backup and support ENV in configuration files

Lately, many projects run in containerized environments. These projects also require backups, and we run nxs-backup in containers. For containerized environments, it’s very important to minimize the image size and be able to work with environment variables.

The old version did not provide a way to work with environment variables. The main problem was that passwords had to be stored directly in the config. Because of this, instead of a set of variables containing only passwords, you had to put the whole config into a variable. Editing large environment variables requires more concentration from engineers and makes troubleshooting a bit more difficult.

Also, when working with the old version, we had to use an already large Debian image, in which we needed to add several libraries and applications for correct backups.

Even with a slim version of the image we got a minimum size of ~250 MB, which is quite a lot for one small utility. In some cases, this delayed the start of the backup process because the image took so long to pull onto the node. We wanted an image no larger than 50 MB.

Work with remote storage without FUSE

Another problem for container environments is using FUSE to mount remote storage.

While you are running backups on the host, this is still acceptable: you have installed the right packages and enabled FUSE in the kernel, and now it works.

Things get interesting when you need FUSE in a container. The problem cannot be solved without elevated privileges and direct access to the kernel of the host system, and that significantly lowers the security level.

This has to be agreed with the customer, and not all customers agree to weaken their security policies. That’s why we had to build a terrible number of workarounds that we don’t even want to recall. Furthermore, the additional layer increases the probability of failure and requires additional monitoring of the state of the mounted resources. It is safer and more stable to work with remote storage directly through its API.

Monitoring status and sending notifications not only to email

Today, teams use email less and less in their daily work. That is understandable, because it is much faster to discuss an issue in a group chat or on a group call. This is why Telegram, Slack, Mattermost, MS Teams, and similar products are so widely used.

We also have a bot that sends us various alerts. And of course, we would like to see reports of failed backups in a workspace such as Telegram rather than in email, buried among hundreds of other messages. Some customers also want to see information about failures in their Slack or other messenger.
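As a rough illustration of delivering such an alert to a chat instead of a mailbox, here is a minimal Go sketch that posts a JSON message to a webhook. It is not taken from nxs-backup: the payload shape (a single "text" field, which Slack- and Mattermost-style incoming webhooks accept), the message format, and the function names are assumptions, and a local test server stands in for the real endpoint.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// alert is a hypothetical payload shape with a single "text" field.
type alert struct {
	Text string `json:"text"`
}

// buildPayload renders a one-line alert message as JSON.
func buildPayload(project, server, msg string) ([]byte, error) {
	return json.Marshal(alert{Text: fmt.Sprintf("[%s/%s] %s", project, server, msg)})
}

// send posts the payload to the webhook URL and treats any non-2xx status as a failure.
func send(client *http.Client, url string, payload []byte) error {
	resp, err := client.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// A local test server stands in for the real messenger endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	payload, err := buildPayload("My Best Project", "wp-server", "backup job failed")
	if err != nil {
		panic(err)
	}
	client := &http.Client{Timeout: 5 * time.Second}
	if err := send(client, srv.URL, payload); err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
}
```

Setting an explicit client timeout matters here: a hung notification endpoint should not be able to stall the backup run itself.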

In addition, we have long wanted to be able to track the status and see the details of the work in real time. To do this, the application has to change its format and become a daemon.

Insufficient performance

Another acute pain was insufficient performance in certain scenarios.

One of our clients has a huge file storage of almost a terabyte, all of it small files: text, pictures, etc. We collect incremental copies of it and have the following problem: a yearly copy takes THREE days. The old version simply cannot digest that volume in less than a day.

Given the circumstances, we are in fact unable to recover data for a specific date, which we do not like at all.

Initially, we implemented our backup solution in Python due to its simplicity and flexibility. However, as demands grew, the Python-based solution became inadequate. After a thorough discussion, we decided to rewrite the system in Go for several reasons:

  1. Compilation and Dependencies: Go's AOT compiler produces a single, dependency-free binary, simplifying deployment across different systems;
  2. Performance: Go's built-in concurrency support promised better performance;
  3. Team Expertise: We had more developers experienced in Go than in Python.

Finding a solution

All of the above problems, to a greater or lesser extent, caused quite palpable pain for the IT department: engineers spent precious time on things that were certainly important, but those costs could have been avoided. Moreover, in certain situations they created risks for business owners: the probability of being left without data for a certain day was extremely low, but not zero. We refused to accept this state of affairs.

Nxs-backup 3.0

The result of our work was a new version, nxs-backup v3.0, which was recently updated to v3.8.0.
Key features of the new version:

  • Corresponding interfaces are implemented for all storages and all types of backups. Jobs and storages are initialized at startup rather than while the work is running;
  • Work with remote storage via API. For this, we use various libraries;
  • Use environment variables in configs, thanks to the go-nxs-appctx mini-application framework that we use in our projects;
  • Send log events via hooks. You can configure different levels and receive only errors or events of the desired level;
  • Specify not only a retention period for backups, but also a specific number of backups to keep;
  • Backups now simply run on any Linux starting with the 2.6 kernel. This made it much easier to work with non-standard systems and faster to build Docker images. The image itself was reduced to 23 MB (with additional MySQL and SQL clients included);
  • Collect, export, and save different metrics in a Prometheus-compatible format;
  • Limit resource consumption: the local disk rate and the remote storage rate.

We have tried to keep most of the configurations and application logic, but some changes are present. All of them are related to the optimization and correction of defects in the previous version.

For example, we moved the connection parameters for remote repositories into the basic configuration so that they do not have to be repeated for each type of backup.

Below is an example of the basic configuration for backups. It contains general settings such as notification channels, remote storage, logging, and the job list. This is the basic main config with mail notification; we strongly recommend using email notifications as the default method. If you need more features, see the reference in the documentation.

server_name: wp-server
project_name: My Best Project

loglevel: info

notifications:
  mail:
    enabled: true
    smtp_server: smtp.gmail.com
    smtp_port: 465
    smtp_user: [email protected]
    smtp_password: some5Tr0n9P@s5worD
    recipients:
      - [email protected]
      - [email protected]
  webhooks: []
storage_connects: []
jobs: []
include_jobs_configs: [ "conf.d/*.conf" ]

A few words about pitfalls

We expected to face certain challenges; it would be foolish to think otherwise. But two problems caused the most pain.


Memory leak or non-optimal algorithm

Even in the previous version of nxs-backup we used our own implementation of file archiving. The logic of this solution was to try to avoid using external tools to create backups, and working with files was the easiest step possible.

In practice, the solution proved to be workable, although not particularly effective on a large number of files, as could be seen from the tests. Back then we wrote it off to Python’s specifics and hoped to see a significant difference when we switched to Go.

When we finally got to load testing the new version, the results were disappointing. There were no performance gains, and memory consumption was even higher than before. We looked for a solution and read a lot of articles and research on the topic, but they all said that filepath.Walk and filepath.WalkDir are the best option and that the performance of these functions only improves with new releases of the language.

In an attempt to optimize memory consumption, we even introduced mistakes into the creation of incremental copies. Incidentally, the broken variants were actually more efficient. For obvious reasons, we did not use them.

Eventually, it all came down to the number of files to be processed. We tested with 10 million. The garbage collector did not seem to be able to keep up with that many generated variables.

Eventually, realizing that we could bury too much time here, we decided to abandon our implementation in favor of a time-tested and truly effective solution — GNU tar.
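Delegating archiving to GNU tar from a Go program could look roughly like the sketch below. The article does not say which flags nxs-backup passes, so the argument list is an assumption: it uses GNU tar's --listed-incremental option with a snapshot file to produce incremental archives. The function names and paths are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// tarArgs builds GNU tar arguments for a gzip-compressed archive of srcDir.
// When snapshot is non-empty, --listed-incremental makes the run incremental
// relative to the state recorded in that snapshot file.
func tarArgs(archive, srcDir, snapshot string) []string {
	args := []string{"--create", "--gzip", "--file", archive}
	if snapshot != "" {
		args = append(args, "--listed-incremental="+snapshot)
	}
	return append(args, "--directory", srcDir, ".")
}

// runTar executes tar and returns its combined output as part of the error on failure.
func runTar(archive, srcDir, snapshot string) error {
	out, err := exec.Command("tar", tarArgs(archive, srcDir, snapshot)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("tar failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	// The paths are placeholders; this only prints the command line
	// instead of touching the file system.
	fmt.Println(tarArgs("/backup/site.tar.gz", "/var/www/site", "/backup/site.snar"))
}
```

Passing arguments as a slice rather than through a shell avoids quoting problems with unusual file names.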

We may come back to the idea of self-implementation later when we come up with a more efficient solution to handle tens of millions of files.

Such a different FTP

Another problem came up when working with FTP. It turned out that different servers behave differently for the same requests.

And it is a really serious problem when, for the same request, you get a normal answer, an error that does not seem to have anything to do with your request, or no error when you expect one.

So we had to give up the “prasad83/goftp” library in favor of the simpler “jlaffaye/ftp”, because the former could not work correctly with the Selectel server. When connecting, it tried to get the list of files in the working directory and received an access-rights error for a higher-level directory. With “jlaffaye/ftp” there is no such problem, because it is simpler and does not send any such requests to the server.

The next problem was disconnection when there were no requests. Not all servers behave this way, but some do. So before each request we had to check whether the connection had dropped and, if it had, reconnect.
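The "check, then reconnect" step can be sketched as follows. This is a self-contained illustration with a fake connection, not the project's code: the conn interface, the client type, and the ensure method are hypothetical. With jlaffaye/ftp, a cheap NOOP command could presumably serve as the liveness probe, but that mapping is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical minimal view of an FTP connection.
type conn interface {
	NoOp() error // cheap liveness probe
	Quit() error
}

// client keeps one connection and a function that can open a fresh one.
type client struct {
	dial func() (conn, error)
	c    conn
}

// ensure returns a live connection, redialing if the probe fails.
func (cl *client) ensure() (conn, error) {
	if cl.c != nil {
		if err := cl.c.NoOp(); err == nil {
			return cl.c, nil
		}
		_ = cl.c.Quit() // best effort; the server has likely dropped us already
		cl.c = nil
	}
	c, err := cl.dial()
	if err != nil {
		return nil, fmt.Errorf("reconnect: %w", err)
	}
	cl.c = c
	return c, nil
}

// fakeConn simulates a server that drops idle connections.
type fakeConn struct{ dead bool }

func (f *fakeConn) NoOp() error {
	if f.dead {
		return errors.New("connection closed")
	}
	return nil
}

func (f *fakeConn) Quit() error { return nil }

func main() {
	dials := 0
	var last *fakeConn
	cl := &client{dial: func() (conn, error) {
		dials++
		last = &fakeConn{}
		return last, nil
	}}
	if _, err := cl.ensure(); err != nil { // first use dials
		panic(err)
	}
	last.dead = true                       // the server silently drops the idle connection
	if _, err := cl.ensure(); err != nil { // probe fails, so we dial again
		panic(err)
	}
	fmt.Println("dials:", dials)
}
```

Every storage operation would call ensure first, so a dropped connection costs one extra round trip instead of a failed backup.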

The cherry on top was the problem of getting files from the server or, to be precise, attempting to get a file that did not exist. Some servers return an error when you try to access such a file; others return a valid io.Reader that can even be read, only you get an empty byte slice.

All of these situations were discovered empirically and have to be handled on our side.

Conclusions

Most importantly, we fixed the problems of the old version, the things that affected the work of engineers and created certain risks for business.

We still have unrealized “wants” from the last version, such as:

  • Backup encryption;
  • Restore from backup using nxs-backup tools;
  • Web interface to manage the list of jobs and their settings.

This list is now extended with new ones:

  • Own job scheduler. Use customized settings instead of system cron;
  • New backup types (ClickHouse, Elastic, LVM, etc).

And, of course, we will be happy to know the community’s opinion. What other development opportunities do you see? What options would you add?

You can read the documentation and learn more about nxs-backup on its website. There is also a troubleshooting section on our website where you can report any issues.

We already made a poll in our Telegram channel about upcoming features. Follow us to participate in such activities and contribute to the development of the tool!

See you next time!

Reposted from: https://dev.to/nixys/maintaining-an-open-source-backup-tool-insights-and-more-1n1e?1