
Extract Metadata from PDF using Python

27/06/2022

In this tutorial, we will explore how to extract metadata from a PDF file using Python.

Table of Contents

Introduction
Sample PDF
Extract metadata from PDF using Python
Conclusion

Introduction

PDF metadata is information about the PDF document itself, such as its title, author, subject, keywords, and creation date. These fields are stored inside the file, so they can be searched and retrieved programmatically.

To follow this tutorial, we will need the pikepdf Python library.

If you don’t have it installed, open a terminal (“Command Prompt” on Windows) and install it with pip:


pip install pikepdf

Sample PDF

To continue with this tutorial, we will need a PDF file to work with.

Let’s reuse one of the PDFs we created in a previous tutorial:

Sample file: webpage (PDF download)