Process
Before doing a difficult task, it helps to have a process. Processes allow you to reduce the cognitive load of a task by outsourcing part of the "figuring out" to the process.
Processes can also reduce the chance that you will miss something obvious, or waste time approaching a task in a way that depends on your current mood, feelings, or energy level.
In this article, I am going to walk through a simple process for figuring out how to do something in Python. I will share the steps that I like to follow and the resources I like to use.
The task? Merge and split PDFs.
This is fairly easy and straightforward in Python, but I have never done it. I could have chosen a more difficult task, but the first step when deciding to do something in Python is to consider the effort involved and do some research.
Research
Before I decide to take on a task with Python, I need to ask myself at least one (if not all) of the following questions:
- How much effort is this going to take?
- Is this something that is within the bounds of my skill level?
- How much time is this going to take?
- Is the reward worth the time and effort?
An example of a task that is beyond my skill level, would take more time than I have, and has no immediate benefit for me, is using the OpenCV library to try to identify a car.
An example of a task that is within the bounds of my skill level, and that satisfies the effort/reward ratio, is merging and splitting PDFs for a blog post.
Next, I can use a search engine to see if this has been done successfully by other programmers. Using DuckDuckGo, I searched for "python pdf merge". The results are promising.
When I search for a programming topic, I hope to find something on Stack Overflow (SO). I am in luck, because the first hit is a post on SO.
After clicking on the post (https://stackoverflow.com/questions/3444645/merge-pdf-files), I found:
- It's a task people have been asking about for many years
- It can be done without a lot of code
- The best library for the task is PyPDF2
If this were not the case, I would have a few more options:
- Look for other blog articles
- Try a different search
- Reconsider what I am trying to do
If I don't find something tangible quickly, I may need to reconsider what I am doing, or how I am doing it. Typically, I think that if other people aren't doing [this thing I am trying to do], then maybe I should not be doing it either.
Fortunately, that is not the case, and we can move on with setting up a playground.
Set-Up
I know that I can use the PyPDF2 library to merge PDFs. Next, I need to set up a small playground environment where I can get to know the library. This step requires getting the proper data, files, URLs, libraries, etc.
I need to do two things:
- Acquire two PDFs
- Install PyPDF2
In a terminal, I installed the library.
pip3 install pypdf2
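To confirm that the install worked before opening a shell, a small check along these lines can help. This is only a sketch using the standard library (Python 3.8 or later), and the helper name installed_version is my own, not part of PyPDF2.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("PyPDF2")
    if version is None:
        print("PyPDF2 is not installed; run: pip3 install pypdf2")
    else:
        print(f"PyPDF2 {version} is installed")
```

Knowing the version is useful because class names have changed between PyPDF2 releases, so example code found online may assume an older or newer version than the one installed.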
In my desktop folder, I have two PDFs: example_file and example_file2.
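Before going further, it can be worth confirming that both files are where I think they are and that they really are PDFs. The sketch below relies on the fact that PDF files begin with the bytes %PDF-. The helper name looks_like_pdf is hypothetical, and the file names are assumed to match the paths used later in this article.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF magic bytes."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


if __name__ == "__main__":
    # Assumed file names, matching the paths used later in this article.
    for name in ("example_file.pdf", "example_file2.pdf"):
        print(name, looks_like_pdf(name))
```

This only checks the file header, so it is a quick sanity check rather than full validation.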
Now, I can open up Python's IDLE application and try to merge the PDFs.
Merging PDF Files with Python
It's important that I have an uncomplicated environment, and that I am using small files. When I am doing something with Python that is new to me, I want to make the initial task trivial.
My goal may be to build a cloud function that parses PDFs and conditionally merges or splits them for an application. Or, it can be a contrived example in a blog post. Either way, I want to be successful with the library as soon as possible.
This means I need to do something small.
What's nice about this task is that there is example code online that I can follow. There are many different ways that people have done this on SO, but they all seem to import the same class. In the Python shell, I am going to import that class.
>>> from PyPDF2 import PdfFileMerger
Documentation
I used Stack Overflow and got a pretty good start.
I saw that people were using the PyPDF2 library, and that there was a class in the library named PdfFileMerger. Before getting too much into copying and running code, I want to investigate this class further.
Next, I searched the documentation online and found information on how to use the PdfFileMerger class.
In the documentation, I see a method named merge.
This is an ideal progression:
- Find some starting code on Stack Overflow
- Quickly locate useful documentation
So, I believe I can create a merger object and then use the merge() method.
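The documentation can also be cross-checked from the shell itself. Python's built-in help() prints a docstring, and the inspect module can list what a class offers. The sketch below shows the second idea; public_methods is a hypothetical helper of my own, and it is demonstrated on the built-in list type so that it runs without PyPDF2 installed.

```python
import inspect


def public_methods(cls):
    """Return the sorted names of a class's public callable attributes."""
    return sorted(
        name
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    )


if __name__ == "__main__":
    # Demonstrated on a built-in type so the sketch runs without PyPDF2;
    # in the playground shell, pass PdfFileMerger instead.
    print(public_methods(list))
```

In the playground, calling public_methods(PdfFileMerger) after the import should list merge and write among the results, and help(PdfFileMerger.merge) shows the arguments that merge expects.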
Merging
First, I can create the merger object.
>>> file_merger = PdfFileMerger()
Then, I can call the merge() method once for each file, passing in the position at which to insert the file and the path to the file.
>>> file_merger.merge(0, "./example_file.pdf")
>>> file_merger.merge(1, "./example_file2.pdf")
The data is in the file_merger object.
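One detail is worth a closer look. My reading of the documentation is that the first argument to merge() is the page number at which the file is inserted, not the order of the file. The sketch below models that behavior with a plain Python list. It is not PyPDF2 code, and merge_at and the page labels are made up for illustration.

```python
def merge_at(pages, position, new_pages):
    """Insert new_pages into pages at index position, like a slice assignment."""
    pages[position:position] = new_pages
    return pages


# Hypothetical page labels: file A has three pages, file B has two.
file_a = ["A1", "A2", "A3"]
file_b = ["B1", "B2"]

# Inserting file B at position 1 places it after the first page of file A.
merged = merge_at([], 0, file_a)
merged = merge_at(merged, 1, file_b)
print(merged)  # ['A1', 'B1', 'B2', 'A2', 'A3']

# Using the current page count as the position appends instead.
appended = merge_at([], 0, file_a)
appended = merge_at(appended, len(appended), file_b)
print(appended)  # ['A1', 'A2', 'A3', 'B1', 'B2']
```

With one-page example files, both approaches give the same result. For longer files, PdfFileMerger also has an append() method that adds a file after the pages already in the merger.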
Finally, we can write that data to a new file.
>>> file_merger.write("./merged_example_files.pdf")
With this code, I am able to create a new file that merges the two PDFs!
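Once the shell session works, the same steps can be collected into a small function. This is a sketch under a few assumptions: it requires a PyPDF2 release that still provides the PdfFileMerger class, it uses append() rather than merge() so that each file lands after the previous one regardless of page count, and the function name merge_pdfs is my own.

```python
from pathlib import Path


def merge_pdfs(input_paths, output_path):
    """Merge the given PDF files, in order, into a single output file."""
    paths = [Path(p) for p in input_paths]
    if len(paths) < 2:
        raise ValueError("need at least two PDFs to merge")
    for path in paths:
        if not path.is_file():
            raise FileNotFoundError(f"no such file: {path}")

    # Imported here so the argument checks above run even without the library.
    # Assumes a PyPDF2 release that still provides PdfFileMerger.
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    try:
        for path in paths:
            # append() adds each file after the pages already in the merger.
            merger.append(str(path))
        merger.write(str(output_path))
    finally:
        # Release the file handles the merger opened.
        merger.close()
    return Path(output_path)
```

Calling merge_pdfs(["./example_file.pdf", "./example_file2.pdf"], "./merged_example_files.pdf") should reproduce the shell session above. Checking the inputs first means a mistyped path fails with a clear message before any output file is created.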
Conclusion
I was able to accomplish this new task with only a few lines of careful code. I combined informal information on Stack Overflow with formal documentation on the library website to conserve time and limit frustration.
I now feel comfortable using two new methods and a new library to merge PDFs with Python!