Easily Split and Rename PDFs for Skyward

Published on 2024-07-31

Why build it and what does it do

A few weeks ago, my supervisor challenged me to come up with a workflow for a particular problem we were having. We wanted to get Pre/ACT letters into our SMS (Student Management System), which in our case is Skyward. The problem is that Pre/ACT letters come either as one bulk PDF or as one PDF per student, and to import them into Skyward we needed a separate PDF for each student, named with that student's ID number. To accomplish this, I wrote a program in Python, using Streamlit for the UI.

Let's look at the problems we need to address, starting with the PDF. It made more sense to grab the single bulk PDF export of the letters, which meant we needed to split that export into individual PDFs. While each letter is typically 2 pages, that isn't always the case, so simply breaking after every other page would be error-prone.
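To see why splitting on ID pages is safer than a fixed two-page break, here is a small pure-Python sketch. It assumes you already have the 0-based indexes of the pages that contain an ID (which is what find_id_pages returns later in this post) and the total page count; the function names page_ranges and fixed_ranges are hypothetical, chosen for illustration.

```python
def page_ranges(id_pages, total_pages):
    """Turn sorted 0-based ID-page indexes into (start, end) ranges, end exclusive.

    Each letter starts on a page containing an ID and runs until the next
    ID page, or to the end of the document for the last letter.
    """
    ranges = []
    for i, start in enumerate(id_pages):
        end = id_pages[i + 1] if i + 1 < len(id_pages) else total_pages
        ranges.append((start, end))
    return ranges


def fixed_ranges(total_pages, pages_per_letter=2):
    """Naive alternative: assume every letter has the same number of pages."""
    return [(start, min(start + pages_per_letter, total_pages))
            for start in range(0, total_pages, pages_per_letter)]


# Hypothetical 7-page bulk export: letters start on pages 0, 2, and 5,
# so the second letter is 3 pages long.
print(page_ranges([0, 2, 5], 7))  # [(0, 2), (2, 5), (5, 7)]
print(fixed_ranges(7))            # [(0, 2), (2, 4), (4, 6), (6, 7)]
```

With the fixed split, page 4 (the last page of the second letter) ends up bundled with the first page of the third letter, which is exactly the kind of error that splitting on ID pages avoids.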

The second issue was reading each student's PDF and renaming it to the corresponding ID number. This mostly hinged on a regex pattern that pulls out the ID.
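The pattern used in the app looks for text shaped like (ID#: 01234). Here is a quick sketch of how it behaves on a few strings; the sample text and the extract_id helper name are made up for illustration. Because the match is kept as a string, leading zeros in the ID survive, which matters when the ID becomes a file name.

```python
import re

# Same pattern as in the app: "(ID#:", optional whitespace, one or more digits, ")"
ID_PATTERN = re.compile(r'\(ID#:\s*(\d+)\)')


def extract_id(text):
    """Return the first ID found in the text as a string, or None."""
    match = ID_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_id("Student: Jane Doe (ID#: 01234)"))  # 01234
print(extract_id("Student: Jane Doe (ID#:98765)"))   # 98765
print(extract_id("Page 2 of the letter"))             # None
```

If your letters print the ID differently, this pattern is the piece to adjust.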

Since this was also a time challenge, I worked with AI to help generate the code. NOTE: This is not a replacement for knowing the logic and language you are using. When writing this with an AI/LLM, I used a chain-of-thought approach: giving it bite-sized chunks of what I wanted, then debugging and testing each chunk before adding more. The code below is the final code that was used, and I'll break it down section by section. If you're looking to implement this as a solution at your district, see the TLDR at the end of this post.

Requirements and Imports

This part is fairly straightforward and is the foundation the program runs on.

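Streamlit for our UI
PyPDF2 and PyMuPDF (imported in Python as fitz) for PDF manipulation

Content of requirements.txt (the fitz module is provided by pymupdf; the separate fitz package on PyPI is unrelated, and installing it can break import fitz)

streamlit
pypdf2
pymupdf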

The app.py imports

import PyPDF2
import fitz  # PyMuPDF
import re
from pathlib import Path
import concurrent.futures
import streamlit as st
import shutil
import zipfile
import os

Finding IDs

This next snippet finds the IDs in the bulk PDF and builds a list of the pages where they appear, which is then used to split the PDF. This is the part that hinges on the regex, so it may need to be changed for your situation.

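def find_id_pages(input_pdf):
    doc = fitz.open(input_pdf)
    id_pages = []
    # Matches text like "(ID#: 01234)"; \d+ captures one or more digits
    id_pattern = re.compile(r'\(ID#:\s*(\d+)\)')

    # Record the 0-based index of every page that contains an ID
    for i, page in enumerate(doc):
        text = page.get_text()
        if id_pattern.search(text):
            id_pages.append(i)

    doc.close()
    return id_pages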
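Because each output file is named after the ID, a student who appears twice in the bulk export would produce two files with the same name (Path.rename silently replaces an existing file on Unix and raises FileExistsError on Windows). One way to catch that before splitting is to record which ID each page carries. This sketch works on plain page text instead of a real PDF; find_id_pages_in_text and duplicate_ids are hypothetical names, and with PyMuPDF you would feed it page.get_text() for each page. The sample page text is made up.

```python
import re
from collections import Counter

ID_PATTERN = re.compile(r'\(ID#:\s*(\d+)\)')


def find_id_pages_in_text(page_texts):
    """Return (page_index, student_id) for every page whose text contains an ID."""
    hits = []
    for i, text in enumerate(page_texts):
        match = ID_PATTERN.search(text)
        if match:
            hits.append((i, match.group(1)))
    return hits


def duplicate_ids(hits):
    """Return the IDs that start more than one letter, sorted."""
    counts = Counter(student_id for _, student_id in hits)
    return sorted(student_id for student_id, n in counts.items() if n > 1)


pages = [
    "Dear family (ID#: 1001)", "page 2",
    "Dear family (ID#: 1002)", "page 2",
    "Dear family (ID#: 1001)", "page 2",
]
hits = find_id_pages_in_text(pages)
print(hits)                 # [(0, '1001'), (2, '1002'), (4, '1001')]
print(duplicate_ids(hits))  # ['1001']
```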

Splitting the PDFs

As the title says, this function splits up the bulk PDF. Each chunk starts on a page that contains an ID and runs until the next ID page (or the end of the document). It relies on a helper function to extract the ID used to name each individual PDF, and it splits them in parallel, up to 10 at a time, to improve performance.

def split_pdf(input_pdf, output_folder, progress_callback):
    input_path = Path(input_pdf)
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)

    # Find pages with IDs
    id_pages = find_id_pages(input_pdf)

    if not id_pages:
        st.error("No ID pages found in the PDF.")
        return

    pdf_reader = PyPDF2.PdfReader(str(input_path))
    total_pages = len(pdf_reader.pages)
    temp_pdfs = []

    for i in range(len(id_pages)):
        # Each letter runs from its ID page up to (not including) the next ID page
        start_page = id_pages[i]
        end_page = id_pages[i + 1] if i + 1 < len(id_pages) else total_pages
        # ...





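The helper below reads the ID from the first page of each split PDF and renames the file to match. If no ID is found, the file gets an unknown_ prefix instead.

def extract_and_rename_pdf(pdf_path, output_folder):
    # Read the text of the first page of one split PDF
    doc = fitz.open(pdf_path)
    text_first_page = doc[0].get_text()
    doc.close()  # release the file before renaming it (an open file can't be renamed on Windows)

    # Extract ID using a regex pattern for the format (ID#: 01234)
    match_first_page = re.search(r'\(ID#:\s*(\d+)\)', text_first_page)

    if match_first_page:
        id_value = match_first_page.group(1)
        new_pdf_path = output_folder / f'{id_value}.pdf'
    else:
        new_pdf_path = output_folder / f'unknown_{pdf_path.stem}.pdf'

    pdf_path.rename(new_pdf_path)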
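The part of split_pdf that processes the files in parallel is not shown in full above. As a rough sketch of the general pattern (not the author's exact code), a concurrent.futures.ThreadPoolExecutor with max_workers=10 can run a per-file function over a list of paths and report progress as a fraction between 0 and 1, a value st.progress accepts. run_in_parallel and work are hypothetical names; in the app, work would be something like lambda p: extract_and_rename_pdf(p, output_folder).

```python
import concurrent.futures


def run_in_parallel(items, work, progress_callback, max_workers=10):
    """Apply work(item) to each item using up to max_workers threads.

    progress_callback is called from the calling thread with the fraction of
    items finished (0.0 to 1.0) each time one completes. Results are returned
    in completion order, not input order.
    """
    total = len(items)
    if total == 0:
        progress_callback(1.0)
        return []
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(work, item) for item in items]
        for done, future in enumerate(concurrent.futures.as_completed(futures), start=1):
            # result() re-raises any exception raised inside work()
            results.append(future.result())
            progress_callback(done / total)
    return results


# Example with a stand-in task instead of real PDF work
updates = []
out = run_in_parallel([1, 2, 3, 4], lambda n: n * n, updates.append)
print(sorted(out))  # [1, 4, 9, 16]
print(updates)      # [0.25, 0.5, 0.75, 1.0]
```

Keeping the progress updates in the loop over as_completed, rather than inside the worker threads, is deliberate: Streamlit elements are generally safest to update from the script's own thread.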





  
  
Almost there

Next up are a couple of short functions: one to zip all the split PDFs (in case you want to run this on an internal server), and one to clean up any temporary files so no student PII is left lying around where it doesn't need to be.



def zip_output_folder(output_folder, zip_name):
 shutil.make_archive(zip_name, 'zip', output_folder)
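shutil.make_archive writes output_pdfs.zip to the current working directory, which is why the cleanup step has to delete it afterwards. An alternative sketch, assuming you would rather never write the archive to disk, uses the zipfile module (already in the imports, though unused in the code shown) with an in-memory buffer; st.download_button accepts raw bytes for its data argument. zip_folder_to_bytes is a hypothetical name.

```python
import io
import zipfile
from pathlib import Path


def zip_folder_to_bytes(folder):
    """Zip every file directly inside folder and return the archive as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                # Store files by name only, so the zip has no folder prefix
                zf.write(path, arcname=path.name)
    return buffer.getvalue()
```

In the UI this could feed the download button directly, for example data=zip_folder_to_bytes(output_folder).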






def clean_up(output_folder, zip_name):
 shutil.rmtree(output_folder)
 os.remove(f"{zip_name}.zip")
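In the UI code, clean_up only runs after a successful split; if an exception is raised partway through, temp_input.pdf and output_folder stay on disk. The fixed names could also collide if two people use a shared server at the same time. A sketch of one alternative, assuming the whole job can run inside a per-run temporary directory: tempfile.TemporaryDirectory deletes the directory and everything in it when the with block exits, even on an exception. run_job and process are hypothetical names; process stands in for the split-and-rename step.

```python
import shutil
import tempfile
from pathlib import Path


def run_job(pdf_bytes, process):
    """Run process(input_pdf, output_folder) in a private temp directory.

    Returns the zipped output as bytes. The temporary directory, including the
    uploaded PDF, the split files, and the zip, is deleted when the with block
    exits, whether or not process() raised.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        input_pdf = tmp / "input.pdf"
        output_folder = tmp / "output"
        output_folder.mkdir()
        input_pdf.write_bytes(pdf_bytes)

        process(input_pdf, output_folder)

        # make_archive appends ".zip" and returns the archive's path
        archive = shutil.make_archive(str(tmp / "output_pdfs"), "zip", output_folder)
        return Path(archive).read_bytes()
```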





  
  
Building the UI

The last bit of code is the UI. Streamlit serves a web UI, which keeps things versatile (yes, you can also run it on a single machine). After a few attempts, and with usability in mind, I kept it simple and distilled it down to an upload button, an action button (i.e., split), and a download button to get the zipped PDFs.



# Streamlit App Portion
st.title("PDF Splitter and Renamer")

uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")
output_folder = "output_folder"

if st.button("Split and Rename PDF"):
    if uploaded_file and output_folder:
        try:
            # Save uploaded file temporarily
            with open("temp_input.pdf", "wb") as f:
                f.write(uploaded_file.getbuffer())

            progress_bar = st.progress(0)

            def update_progress(progress):
                progress_bar.progress(progress)

            split_pdf("temp_input.pdf", output_folder, update_progress)

            zip_name = "output_pdfs"
            zip_output_folder(output_folder, zip_name)
            st.success("PDF split and renamed successfully!")

            with open(f"{zip_name}.zip", "rb") as f:
                st.download_button(
                    label="Download ZIP",
                    data=f.read(),  # read the bytes now; the zip is deleted below
                    file_name=f"{zip_name}.zip",
                    mime="application/zip"
                )

            # Remove temporary files
            Path("temp_input.pdf").unlink()
            clean_up(output_folder, zip_name)
        except Exception as e:
            st.error(f"An error occurred: {e}")
    else:
        st.error("Please upload a PDF file.")





  
  
TLDR to get up and running

To get up and running, use the following commands (these assume Linux, WSL, or macOS). You'll then be able to reach the app at http://localhost:8501.



git clone https://github.com/Blacknight318/act-to-sms.git
cd act-to-sms
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run app.py





  
  
In Closing

If you're at a K-12 school, I hope you find this helpful. If you do, clap or consider buying me a coffee. Till next time, fair winds and following seas.

This article is reproduced from: https://dev.to/blacknight318/easily-split-and-rename-pdfs-for-skyward-17ha?1