In this tutorial we will explore how to download a PDF from a URL using Python.
Introduction
A lot of product manuals, instructions, books, and other files with lots of text are mainly available online in PDF format.
Downloading several files manually can be a very time consuming task, so in this tutorial we will focus on the automation of this process.
To continue following this tutorial we will need the following Python library: requests.
Requests is a simple Python library that allows you to send HTTP requests.
If you don’t have it installed, open a terminal (“Command Prompt” on Windows) and install it with the following command:
pip install requests
Download PDF from URL using Python
In this section we will learn how to download a PDF from a URL using Python.
Here, we will assume you have the URL of the specific PDF file (and not just a webpage).
As the first step, we will import the required dependency and define a function we will use to download PDFs, which will have 3 inputs:
- url – URL of the specific PDF file
- file_name – name for the saved PDF file
- headers – the dictionary of HTTP headers that will be sent with the request
import requests
def download_pdf(url, file_name, headers):
    pass  # Placeholder; the body is filled in step by step below
Now we can send a GET request to the URL along with the headers, which will return a Response (a server’s response to an HTTP request):
import requests
def download_pdf(url, file_name, headers):
    # Send GET request
    response = requests.get(url, headers=headers)
If the HTTP request completes successfully, the response will carry the status code 200 (OK).
We are going to check whether the status code is 200. If it is, we will save the PDF (which is the content of the response); otherwise we will print out the status code:
import requests
def download_pdf(url, file_name, headers):
    # Send GET request
    response = requests.get(url, headers=headers)
    # Save the PDF
    if response.status_code == 200:
        with open(file_name, "wb") as f:
            f.write(response.content)
    else:
        print(response.status_code)
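Printing the status code is easy to miss when the function is called from a larger script. As a minimal sketch of an alternative, the variant below passes a timeout and calls raise_for_status(), so a 4xx or 5xx response raises requests.HTTPError instead of being printed. The name download_pdf_strict and the 10-second default are assumptions made for illustration; they are not part of the function built in this tutorial.

```python
import requests


def download_pdf_strict(url, file_name, headers, timeout=10):
    """Hypothetical variant of download_pdf that raises on failure."""
    # The timeout (in seconds) keeps the call from waiting forever
    # on a server that never answers.
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    # Only reached when no error was raised: write the raw bytes to disk.
    with open(file_name, "wb") as f:
        f.write(response.content)
    return file_name
```

A caller can then wrap the call in try/except requests.RequestException to handle HTTP errors, timeouts, and connection failures in one place.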
The function to download a PDF from a URL is ready, and now we just need to define the url, file_name, and headers, and then run the code.
For example, in one of the previous tutorials, we used a sample PDF file, and you can find it here.
The URL looks like this:
https://pyshark.com/wp-content/uploads/2022/05/merged_all_pages.pdf
You can see that it has the .pdf extension, meaning that this is a URL to a specific PDF file.
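If you would rather reuse the name that appears in the URL than pick one by hand, it can be derived with the standard library alone. The sketch below is only an illustration: file_name_from_url and its default value "download.pdf" are hypothetical names, and the fallback rule (use the default when the last path segment does not end in .pdf) is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def file_name_from_url(url, default="download.pdf"):
    """Return the last path segment of url if it ends in .pdf, else default."""
    # Look only at the path component, so query strings and
    # fragments (?x=1, #page=2) are ignored.
    path = unquote(urlparse(url).path)
    name = PurePosixPath(path).name
    return name if name.lower().endswith(".pdf") else default


url = "https://pyshark.com/wp-content/uploads/2022/05/merged_all_pages.pdf"
print(file_name_from_url(url))  # merged_all_pages.pdf
```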
We will save this PDF as ‘file1.pdf’.
For the headers, we are only using the User-Agent request header, which lets the server identify the user agent making the request (a computer program acting on behalf of a person, such as a browser or an app accessing the webpage).
import requests
def download_pdf(url, file_name, headers):
    # Send GET request
    response = requests.get(url, headers=headers)
    # Save the PDF
    if response.status_code == 200:
        with open(file_name, "wb") as f:
            f.write(response.content)
    else:
        print(response.status_code)
if __name__ == "__main__":
    # Define HTTP Headers
    headers = {
        "User-Agent": "Chrome/51.0.2704.103",
    }
    # Define URL of a PDF
    url = "https://pyshark.com/wp-content/uploads/2022/05/merged_all_pages.pdf"
    # Define PDF file name
    file_name = "file1.pdf"
    # Download PDF
    download_pdf(url, file_name, headers)
Run the code and you should see file1.pdf created in the current working directory (the same directory as main.py if you run the script from there).
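A status code of 200 does not guarantee that the saved bytes are really a PDF; a server can also answer 200 with an HTML error or login page. One quick sanity check, sketched here under the assumption that the file begins with the standard PDF header, is to look for the bytes %PDF- at the start of the file. The helper name looks_like_pdf is hypothetical.

```python
def looks_like_pdf(file_name):
    """Return True if the file starts with the PDF header bytes."""
    # PDF files begin with a header such as "%PDF-1.7",
    # so the first five bytes should be b"%PDF-".
    with open(file_name, "rb") as f:
        return f.read(5) == b"%PDF-"
```

After the download you could call looks_like_pdf("file1.pdf") and treat a False result as a sign that the URL did not return a PDF.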
Conclusion
In this article we explored how to download a PDF from a URL using Python.
Feel free to leave comments below if you have any questions or have suggestions for some edits and check out more of my Python Programming tutorials.