Synchronizing Files Between Two Directories Using Python

Published on August 16, 2024

Synchronizing files between directories is a common task for managing backups, ensuring consistency across multiple storage locations, or simply keeping data organized.

While there are many tools available to do this, creating a Python script to handle directory synchronization offers flexibility and control.

This guide will walk you through a Python script designed to synchronize files between two directories.

Introduction to the Script

The script begins by importing several essential Python libraries.

These include os for interacting with the operating system, shutil for high-level file operations, filecmp for comparing files, argparse for parsing command-line arguments, and tqdm for displaying progress bars during lengthy operations.

These libraries work together to create a robust solution for directory synchronization.

import os
import shutil
import filecmp
import argparse
from tqdm import tqdm

The script mainly uses Python built-in modules, but for the progress bar it uses the tqdm library, which needs to be installed with:

pip install tqdm
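If installing tqdm is not an option, one possible approach is to fall back to a small stand-in. The sketch below is not part of the original script: the fallback class is hypothetical and only covers the calls the script makes (the with-block, set_description, and update), without drawing any bar.

```python
# Use the real tqdm when available; otherwise fall back to a minimal stand-in.
try:
    from tqdm import tqdm
except ImportError:
    class tqdm:  # hypothetical fallback, not part of the original script
        """Covers only the calls the sync script makes; draws no progress bar."""

        def __init__(self, total=None, desc="", unit="it"):
            self.total = total
            self.desc = desc
            self.unit = unit
            self.n = 0  # number of items processed so far

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            return False  # never swallow exceptions

        def set_description(self, desc):
            self.desc = desc

        def update(self, n=1):
            self.n += n


# The same calls work with either implementation.
with tqdm(total=2, desc="Syncing files", unit="file") as pbar:
    pbar.set_description("Processing 'example.txt'")
    pbar.update(1)
    pbar.update(1)
```

With this in place the rest of the script can stay unchanged, at the cost of losing the visual progress output when tqdm is missing.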

Checking and Preparing Directories

Before starting the synchronization, the script needs to check if the source directory exists.

If the destination directory doesn't exist, the script will create it.

This step is important to make sure the synchronization process can run smoothly without any issues caused by missing directories.

# Function to check if the source and destination directories exist
def check_directories(src_dir, dst_dir):
    # Check if the source directory exists
    if not os.path.exists(src_dir):
        print(f"\nSource directory '{src_dir}' does not exist.")
        return False
    # Create the destination directory if it does not exist
    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
        print(f"\nDestination directory '{dst_dir}' created.")
    return True

The check_directories function makes sure that both the source and destination directories are ready for synchronization. Here's how it works:

The function uses os.path.exists() to check if the directories exist.
If the source directory is missing, the script tells the user and stops running.
If the destination directory is missing, the script creates it automatically using os.makedirs(). This ensures that the necessary directory structure is in place.
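One caveat is that os.path.exists() also returns True for a regular file, so a file passed as the source would pass this check. The following is a minimal sketch of a stricter variant, assuming that rejecting non-directory paths is the desired behavior. The name prepare_directories is hypothetical and does not appear in the original script.

```python
import os
import tempfile


def prepare_directories(src_dir, dst_dir):
    """Hypothetical variant of check_directories that rejects non-directory paths."""
    # os.path.isdir() is False for both missing paths and regular files
    if not os.path.isdir(src_dir):
        print(f"Source '{src_dir}' is missing or is not a directory.")
        return False
    if os.path.exists(dst_dir) and not os.path.isdir(dst_dir):
        print(f"Destination '{dst_dir}' exists but is not a directory.")
        return False
    # exist_ok=True makes the call a no-op when the directory is already there
    os.makedirs(dst_dir, exist_ok=True)
    return True


# Small demonstration inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(src)
    dst = os.path.join(tmp, "backup", "nested")
    created = prepare_directories(src, dst)
    missing = prepare_directories(os.path.join(tmp, "nope"), dst)
    dst_exists = os.path.isdir(dst)
```

Here created is True and the nested destination is created in one call, while the missing source makes the helper return False.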

Synchronizing Files Between Directories

The main job of the script is to synchronize files between the source and destination directories.

The sync_directories function handles this task by first going through the source directory to gather a list of all files and subdirectories.

The os.walk function helps by generating file names in the directory tree, allowing the script to capture every file and folder within the source directory.
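To get a feel for what os.walk yields, here is a small self-contained sketch. The helper list_tree and the sample file names are made up for illustration. It shows that, with the default top-down traversal, a directory is always listed before the entries inside it, which is what lets a sync script create parent directories before copying their files.

```python
import os
import tempfile


def list_tree(top):
    """Return paths under top, relative to top, directories before their contents."""
    entries = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place makes the traversal order deterministic
        for name in dirs + sorted(files):
            entries.append(os.path.relpath(os.path.join(root, name), top))
    return entries


# Build a tiny tree in a throwaway directory and list it
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "docs", "drafts"))
    sample_files = (
        "a.txt",
        os.path.join("docs", "b.txt"),
        os.path.join("docs", "drafts", "c.txt"),
    )
    for rel in sample_files:
        with open(os.path.join(tmp, rel), "w") as handle:
            handle.write("x")
    tree = list_tree(tmp)
```

The resulting list starts with docs and a.txt, then continues with the contents of docs, and finally the contents of docs/drafts.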

# Function to synchronize files between two directories
def sync_directories(src_dir, dst_dir, delete=False):
    # Get a list of all files and directories in the source directory
    files_to_sync = []
    for root, dirs, files in os.walk(src_dir):
        for directory in dirs:
            files_to_sync.append(os.path.join(root, directory))
        for file in files:
            files_to_sync.append(os.path.join(root, file))

    # Iterate over each file in the source directory with a progress bar
    with tqdm(total=len(files_to_sync), desc="Syncing files", unit="file") as pbar:
        # Iterate over each file in the source directory
        for source_path in files_to_sync:
            # Get the corresponding path in the replica directory
            replica_path = os.path.join(dst_dir, os.path.relpath(source_path, src_dir))

            # Check if path is a directory and create it in the replica directory if it does not exist
            if os.path.isdir(source_path):
                if not os.path.exists(replica_path):
                    os.makedirs(replica_path)
            # Copy all files from the source directory to the replica directory
            else:
                # Check if the file exists in the replica directory and if it is different from the source file
                if not os.path.exists(replica_path) or not filecmp.cmp(source_path, replica_path, shallow=False):
                    # Set the description of the progress bar and print the file being copied
                    pbar.set_description(f"Processing '{source_path}'")
                    print(f"\nCopying {source_path} to {replica_path}")

                    # Copy the file from the source directory to the replica directory
                    shutil.copy2(source_path, replica_path)

            # Update the progress bar
            pbar.update(1)

Once the list of files and directories is compiled, the script uses a progress bar provided by tqdm to give the user feedback on the synchronization process.

For each file and directory in the source, the script calculates the corresponding path in the destination.

If the path is a directory, the script ensures it exists in the destination.

If the path is a file, the script checks whether the file already exists in the destination and whether it is identical to the source file.

If the file is missing or different, the script copies it to the destination.

This way, the script keeps the destination directory up-to-date with the source directory.
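The copy decision can be isolated into a small function to see how it behaves. This is a sketch only: needs_copy is a hypothetical helper that mirrors the condition used in the script, and the file names are invented. Passing shallow=False makes filecmp.cmp compare file contents rather than relying only on the os.stat() signature (type, size, modification time).

```python
import filecmp
import os
import tempfile


def needs_copy(source_path, replica_path):
    """Hypothetical helper mirroring the script's copy condition."""
    if not os.path.exists(replica_path):
        return True
    # shallow=False compares contents instead of trusting os.stat() signatures
    return not filecmp.cmp(source_path, replica_path, shallow=False)


def write_text(path, text):
    with open(path, "w") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.txt")
    identical = os.path.join(tmp, "identical.txt")
    changed = os.path.join(tmp, "changed.txt")
    write_text(source, "version 1")
    write_text(identical, "version 1")
    write_text(changed, "version 2")  # same size, different content

    missing_result = needs_copy(source, os.path.join(tmp, "missing.txt"))
    same_result = needs_copy(source, identical)
    changed_result = needs_copy(source, changed)
```

Only the missing and changed replicas are reported as needing a copy. The trade-off of a content comparison is that both files are read, which can be slow for large files.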

Cleaning Up Extra Files

The script also has an optional feature to delete files in the destination directory that are not in the source directory.

This is controlled by a --delete flag that the user can set.

If this flag is used, the script goes through the destination directory and compares each file and folder to the source.

If it finds anything in the destination that isn't in the source, the script deletes it.

This ensures that the destination directory is an exact copy of the source directory.

    # Clean up files in the destination directory that are not in the source directory, if delete flag is set
    if delete:
        # Get a list of all files and directories in the destination directory
        files_to_delete = []
        for root, dirs, files in os.walk(dst_dir):
            for directory in dirs:
                files_to_delete.append(os.path.join(root, directory))
            for file in files:
                files_to_delete.append(os.path.join(root, file))

        # Iterate over each path in the destination directory with a progress bar
        with tqdm(total=len(files_to_delete), desc="Deleting files", unit="file") as pbar:
            for replica_path in files_to_delete:
                # Get the corresponding path in the source directory
                source_path = os.path.join(src_dir, os.path.relpath(replica_path, dst_dir))
                # Skip paths that were already removed together with a parent directory;
                # otherwise os.remove() would raise FileNotFoundError for them
                if not os.path.exists(source_path) and os.path.lexists(replica_path):
                    # Set the description of the progress bar
                    pbar.set_description(f"Processing '{replica_path}'")
                    print(f"\nDeleting {replica_path}")

                    # Check if the path is a directory and remove it
                    if os.path.isdir(replica_path):
                        shutil.rmtree(replica_path)
                    else:
                        # Remove the file from the destination directory
                        os.remove(replica_path)

                # Update the progress bar
                pbar.update(1)

This part of the script uses techniques similar to those in the synchronization process.

It uses os.walk() to gather files and directories and tqdm to show progress.
The shutil.rmtree() function is used to remove directories, while os.remove() handles individual files.
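Because deletion cannot be undone, it can help to preview what --delete would remove before running it. The following is a minimal dry-run sketch and not part of the original script. The helper find_extra_paths is hypothetical and only reports paths, it never deletes anything. It also prunes extra directories from the walk, so their contents are not listed separately.

```python
import os
import tempfile


def find_extra_paths(src_dir, dst_dir):
    """Hypothetical dry-run helper: destination paths with no counterpart in the source."""
    extras = []
    for root, dirs, files in os.walk(dst_dir):
        rel_root = os.path.relpath(root, dst_dir)
        keep = []
        for name in sorted(dirs):
            relative = os.path.normpath(os.path.join(rel_root, name))
            if os.path.exists(os.path.join(src_dir, relative)):
                keep.append(name)
            else:
                extras.append(relative)
        # Assigning to dirs[:] tells os.walk not to descend into pruned directories
        dirs[:] = keep
        for name in sorted(files):
            relative = os.path.normpath(os.path.join(rel_root, name))
            if not os.path.exists(os.path.join(src_dir, relative)):
                extras.append(relative)
    return extras


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    os.makedirs(os.path.join(dst, "stale"))
    for path in (
        os.path.join(src, "keep.txt"),
        os.path.join(dst, "keep.txt"),
        os.path.join(dst, "old.txt"),
        os.path.join(dst, "stale", "inner.txt"),
    ):
        with open(path, "w") as handle:
            handle.write("x")
    extras = find_extra_paths(src, dst)
```

In this example the preview lists the stale directory and old.txt, while keep.txt is left alone because it exists on both sides.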

Running the Script

The script is designed to be run from the command line, with arguments specifying the source and destination directories.

The argparse module makes it easy to handle these arguments, allowing users to simply provide the necessary paths and options when running the script.

# Main function to parse command line arguments and synchronize directories
if __name__ == "__main__":
    # Parse command line arguments
    parser = argparse.ArgumentParser(description="Synchronize files between two directories.")
    parser.add_argument("source_directory", help="The source directory to synchronize from.")
    parser.add_argument("destination_directory", help="The destination directory to synchronize to.")
    parser.add_argument("-d", "--delete", action="store_true",
                        help="Delete files in destination that are not in source.")
    args = parser.parse_args()

    # If the delete flag is set, print a warning message
    if args.delete:
        print("\nExtraneous files in the destination will be deleted.")

    # Check the source and destination directories
    if not check_directories(args.source_directory, args.destination_directory):
        raise SystemExit(1)

    # Synchronize the directories
    sync_directories(args.source_directory, args.destination_directory, args.delete)
    print("\nSynchronization complete.")

The main block brings everything together.

It processes the command-line arguments, checks the directories, and then performs the synchronization.

If the --delete flag is set, it also handles the cleanup of extra files.
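The argument handling can be tried out without touching any files by passing an explicit list to parse_args. The sketch below assumes the same three arguments as the script. The build_parser wrapper and the directory names photos and photos_backup are made up for illustration.

```python
import argparse


def build_parser():
    """Build a parser with the same arguments as the script (wrapper name is hypothetical)."""
    parser = argparse.ArgumentParser(description="Synchronize files between two directories.")
    parser.add_argument("source_directory", help="The source directory to synchronize from.")
    parser.add_argument("destination_directory", help="The destination directory to synchronize to.")
    parser.add_argument("-d", "--delete", action="store_true",
                        help="Delete files in destination that are not in source.")
    return parser


# Passing a list avoids reading sys.argv, which is convenient for experiments and tests
args = build_parser().parse_args(["photos", "photos_backup", "--delete"])
default_args = build_parser().parse_args(["photos", "photos_backup"])

print(args.source_directory, args.destination_directory, args.delete)
print(default_args.delete)
```

Because --delete uses action="store_true", it defaults to False and becomes True only when the flag is present.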

Examples

Let's see some examples of how to run the script with the different options.

Source to Destination

python file_sync.py d:\sync d:\sync_copy

Destination directory 'd:\sync2' created.
Processing 'd:\sync\video.mp4': 0%| | 0/5 [00:00, ?file/s]
Copying d:\sync\video.mp4 to d:\sync2\video.mp4
Processing 'd:\sync\video_final.mp4': 20%|██████████████████▌ | 1/5 [00:00, ?file/s]
Copying d:\sync\video_final.mp4 to d:\sync2\video_final.mp4
Processing 'd:\sync\video_single - Copy (2).mp4': 40%|████████████████████████████████▍ | 2/5 [00:00, ?file/s]
Copying d:\sync\video_single - Copy (2).mp4 to d:\sync2\video_single - Copy (2).mp4
Processing 'd:\sync\video_single - Copy.mp4': 60%|█████████████████████████████████████████████▌ | 3/5 [00:00

Source to Destination with Cleanup of Extra Files

python file_sync.py d:\sync d:\sync_copy -d

Extraneous files in the destination will be deleted.
Syncing files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00

Conclusion

This Python script offers a flexible way to synchronize files between two directories.

It relies on the standard-library modules os, shutil, filecmp, and argparse, and uses tqdm to report progress.

Because files are compared by content and only missing or changed files are copied, repeated runs avoid unnecessary copying.
Whether you're maintaining backups or keeping several storage locations consistent, this script can be a useful addition to your toolkit.

This article is reproduced from: https://dev.to/devasservice/synchronizing-files-between-two-directories-using-python-19li?1
