Blog | retzdev

Process

Before doing a difficult task, it helps to have a process. Processes allow you to reduce the cognitive load of a task by outsourcing part of the "figuring out" to the process.

Processes can also reduce the chance that you will miss something obvious, or waste time doing something in a way that depends on your current mood, feelings, or energy level.

In this article, I am going to walk through a simple process for figuring out how to do something in Python. I will share the steps that I like to follow and the resources I like to use.

The task? Merge and split PDFs.

This is fairly easy and straightforward in Python. However, I have never done it. There are more difficult tasks that I could have chosen, but the first step when choosing to do something in Python is effort consideration and research.

Research

Before I decide to take on a task with Python, I need to ask myself one (if not all) of the following:

How much effort is this going to take?
Is this something that is within the bounds of my skill level?
How much time is this going to take?
Will the time/effort equal the reward?

An example of a task that is beyond my skill level, would take more time than I have, and has no immediate benefit for me, is using the OpenCV library to try to identify a car.

An example of a task that is within the bounds of my skill level and satisfies the effort/reward ratio is merging and splitting PDFs for a blog post.

Next, I can use a search engine to see if this has been done successfully by other programmers. Using DuckDuckGo, I searched for python pdf merge. The results are promising.

When I search for something about programming, I hope to find an answer on Stack Overflow (SO). I am in luck because the first hit is a post on SO.

After clicking on the post (https://stackoverflow.com/questions/3444645/merge-pdf-files), I found:

It's a task people have been asking about for many years
It can be done without a lot of code
The best library for the task is PyPDF2

If this were not the case, I would have a few more options:

Look for other blog articles
Try a different search
Reconsider what I am trying to do

If I don't find something tangible quickly, I may need to reconsider what I am doing, or how I am doing it. Typically, I think that if other people aren't doing [this thing I am trying to do], then maybe I should not be doing it either.

Fortunately, that is not the case, and we can move on with setting up a playground.

Set-Up

I know that I can use the PyPDF2 library to merge PDFs. Next, I need to set up a small playground environment where I can get to know the library. This step requires getting the proper data, files, URLs, libraries, etc.

I need to do two things:

Acquire two PDFs
Install PyPDF2

In a terminal, I installed the library.

pip3 install pypdf2
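Before opening the shell, it can help to confirm that the install landed in the interpreter I plan to use. This is a minimal sketch using only the standard library; the helper name is_installed is made up for this example and is not part of PyPDF2.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The import name is "PyPDF2", even though the pip command above used lowercase.
print("PyPDF2 available:", is_installed("PyPDF2"))
```

If this prints False, the library was probably installed for a different Python than the one running the check.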

In my desktop folder, I have two PDFs: example_file.pdf and example_file2.pdf.

Now, I can open up Python's IDLE application and try to merge the PDFs.
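A quick check that both files are where I think they are can save a confusing error later. The sketch below assumes the files carry a .pdf extension and live in a Desktop folder under the home directory; missing_files is a hypothetical helper name.

```python
from pathlib import Path


def missing_files(folder, names):
    """Return the entries of `names` that are not regular files inside `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


# Assumed location and file names; adjust them to match your own machine.
desktop = Path.home() / "Desktop"
print(missing_files(desktop, ["example_file.pdf", "example_file2.pdf"]))
```

An empty list means both PDFs were found.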

Merging PDF Files with Python

It's important that I have an uncomplicated environment, and that I am using small files. When I am doing something with Python that is new to me, I want to make the initial task trivial.

My goal may be to build a cloud function that parses PDFs and conditionally merges or splits them for an application. Or, it can be a contrived example in a blog post. Either way, I want to be successful with the library as soon as possible.

This means I need to do something small.

What's nice about this task is that there is example code online that I can follow. There are many different ways that people have done this on SO, but they all seem to import the same class. In the Python shell, I am going to import that class.

>>> from PyPDF2 import PdfFileMerger

Documentation

I used Stack Overflow and got a pretty good start.

I saw that people were using the PyPDF2 library, and that there was a class in the library named PdfFileMerger. Before getting too much into copying and running code, I want to investigate this class further.

Next, I searched the documentation online and found information on how to use the PdfFileMerger class.

In the documentation, I see a method named merge.
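The shell itself can back up what the documentation says. The sketch below lists the public methods of a class with the standard inspect module. DemoMerger is a hypothetical stand-in so the example runs without PyPDF2 installed; it is not the real class.

```python
import inspect


def public_methods(cls):
    """Return the sorted names of the public callables available on a class."""
    return sorted(
        name
        for name, _member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    )


class DemoMerger:
    """Hypothetical stand-in for a merger class, used only for illustration."""

    def merge(self, position, fileobj):
        pass

    def write(self, fileobj):
        pass

    def _internal(self):
        pass


print(public_methods(DemoMerger))  # ['merge', 'write']
```

With the library imported, passing PdfFileMerger to public_methods, or calling help(PdfFileMerger.merge), should show the real method names and arguments.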

This is an ideal progression:

Find some starting code on Stack Overflow
Quickly locate useful documentation

So, I believe I can create a merger object and then use the merge() method.

Merging

First, I can create the merger object.

>>> file_merger = PdfFileMerger()

Then, I can call the merge() method once for each file, passing in a position (the page number at which to insert the file) and the path to the file.

>>> file_merger.merge(0, "./example_file.pdf")
>>> file_merger.merge(1, "./example_file2.pdf")

The data is in the file_merger object.

Finally, we can write that data to a new file.

>>> file_merger.write("./merged_example_files.pdf")

With this code, I am able to create a new file that merges the two PDFs!
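The shell steps can be collected into one small function. This is a sketch under a few assumptions, not the definitive way to do it. As I read the PyPDF2 documentation, the first argument to merge() is the page number after which the file is inserted, so merge(1, ...) only puts the second file at the end when the first file is a single page. The sketch uses append() instead, which is documented as adding a file to the end. The merge_pdfs name and the merger_factory parameter are my own additions; the parameter exists only so the function can be tried with a stand-in class. It also assumes a PyPDF2 release that still provides PdfFileMerger, since newer releases have renamed it.

```python
def merge_pdfs(input_paths, output_path, merger_factory=None):
    """Append each PDF in input_paths, in order, and write one combined file."""
    if merger_factory is None:
        # Imported lazily so the function can be exercised without the library.
        # Assumes a PyPDF2 release that still provides PdfFileMerger.
        from PyPDF2 import PdfFileMerger
        merger_factory = PdfFileMerger

    merger = merger_factory()
    try:
        for path in input_paths:
            merger.append(path)  # adds the file after the current last page
        merger.write(output_path)
    finally:
        merger.close()  # release any file handles the merger opened
    return output_path
```

For the two files in this article, the call would be merge_pdfs(["./example_file.pdf", "./example_file2.pdf"], "./merged_example_files.pdf").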

Conclusion

I was able to accomplish this new task with only a few lines of careful code. I combined informal information on Stack Overflow with formal documentation on the library website to conserve time and limit frustration.

I now feel comfortable using two new methods and a new library to merge PDFs with Python!

Learning How to Do Things with Python