In this tutorial we will explore HTML files, how Python interacts with HTML files, and everything you need to know about them as a Python developer.


Introduction

HTML, or HyperText Markup Language, is the standard language used to create and design documents on the World Wide Web. HTML documents are the building blocks of a website which define its structure and content.

HTML files consist of a series of elements which tell the browser how to display the content. Each element is denoted by tags which label pieces of content such as “heading,” “paragraph,” “table,” and so on, making up the structure of a webpage.


Why HTML?

In web development, HTML serves as the foundation upon which websites are built. Understanding HTML is essential for Python developers for web development projects as well as web scraping, automated testing, and others, as many Python frameworks and libraries interact directly with HTML content.


How does HTML work with Python?

Python is a general-purpose language, and HTML is not part of its syntax. However, it can read, parse, and generate HTML through its standard library (for example, the html.parser module) and through third-party libraries and frameworks.

Libraries like BeautifulSoup and lxml can help you parse HTML files which allows Python to understand the file structure and content. This enables Python developers to programmatically interact with HTML files and perform tasks like web scraping, automated testing of web applications, and even creating and manipulating HTML files directly.
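
To make the idea of parsing a bit more concrete, here is a minimal sketch that uses only html.parser from Python's standard library, so it runs before anything is installed. The TagCollector class and the sample HTML string are made up for this illustration; the libraries mentioned above offer a much friendlier interface for the same job.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every opening tag and every non-empty text chunk."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # Called by the parser for each opening tag, e.g. <h1>
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only chunks
        text = data.strip()
        if text:
            self.texts.append(text)


collector = TagCollector()
collector.feed("<html><body><h1>Hello</h1><p>World</p></body></html>")

print(collector.tags)   # ['html', 'body', 'h1', 'p']
print(collector.texts)  # ['Hello', 'World']
```

The parser walks through the markup and reports each tag and text chunk it finds, which is the basic mechanism that higher-level libraries build on.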

Python’s web frameworks, such as Flask and Django, use HTML templates to dynamically generate web pages.


Environment setup

To follow along with this tutorial we will need the following Python libraries: beautifulsoup4, requests, and lxml.

If you don’t have them installed, please open “Command Prompt” (on Windows) or a terminal (on macOS or Linux) and install them using the following commands:


pip install beautifulsoup4
pip install requests
pip install lxml
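
If you would like to confirm that the installation worked, one option is to ask Python which versions it can find. The sketch below relies on importlib.metadata from the standard library (Python 3.8+); the installed_versions helper is a name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["beautifulsoup4", "requests", "lxml"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed with the matching pip command above.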

Basic HTML concepts

In this section we will cover the fundamental concepts of HTML: how an HTML file is structured, and the common HTML tags and their uses.

Structure of HTML file

To get started, let’s look at a sample HTML file and discuss each part:

<!DOCTYPE html>
<html>
<head>
    <title>This is a Title</title>
</head>
<body>
    <h1>This is a Heading</h1>
    <p>This is a paragraph.</p>
</body>
</html>

What do we see in the above code?

  • Doctype – it appears at the beginning of HTML file and tells the browser which version of HTML is used in this file. Since our file is written in HTML5, we see <!DOCTYPE html>
  • <html> tag – the opening <html> tag comes right after the Doctype and the closing </html> tag comes at the very end of the file. Together they wrap all the content except the Doctype.
  • <head> tag – the section that includes all the meta information about the HTML file, including: title, character encoding, style, and others. In our case, we have included a title using the <title> tag.
  • <body> tag – the section that includes all the content of the HTML file that will appear on the webpage, including: text, lists, images, links, and others.

When rendered in the browser, the title appears in the browser tab, while the heading and the paragraph appear on the page:

This is a Title

This is a Heading

This is a paragraph.
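
Since this tutorial is aimed at Python developers, here is a small sketch of how the same structure looks from the Python side. It assumes beautifulsoup4 is installed (see the environment setup) and embeds the sample file as a string so that it runs on its own.

```python
from bs4 import BeautifulSoup

html = """<!DOCTYPE html>
<html>
<head>
    <title>This is a Title</title>
</head>
<body>
    <h1>This is a Heading</h1>
    <p>This is a paragraph.</p>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# soup.<tag> returns the first matching tag; .string is the text inside it
page_title = soup.title.string   # lives in <head>
heading = soup.h1.string         # lives in <body>
paragraph = soup.p.string        # lives in <body>

print(page_title)  # This is a Title
print(heading)     # This is a Heading
print(paragraph)   # This is a paragraph.
```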


HTML tags and their uses

Now that we have covered the basic structure of HTML files, let’s discuss some of the common HTML tags and their uses:


Heading tag (<h1> to <h6>)

What: Heading tags are used to define headings of different levels on a web page. <h1> denotes the most important heading, while <h6> denotes the least important.

When to use: Use heading tags for structuring your content hierarchically. It’s good for SEO and accessibility as it outlines the content, making it easier for search engines and screen readers to navigate the page.

HTML code:

<!DOCTYPE html>
<html>
<body>
<h1>This is a Level 1 Heading: Most Important</h1>
<h2>This is a Level 2 Heading</h2>
<h3>This is a Level 3 Heading</h3>
<h4>This is a Level 4 Heading</h4>
<h5>This is a Level 5 Heading</h5>
<h6>This is a Level 6 Heading: Least Important</h6>
</body>
</html>

Rendered file:

This is a Level 1 Heading: Most Important

This is a Level 2 Heading

This is a Level 3 Heading

This is a Level 4 Heading

This is a Level 5 Heading

This is a Level 6 Heading: Least Important
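
Because heading levels describe the hierarchy of a page, they are handy when analyzing HTML with Python. The following sketch (assuming beautifulsoup4 is installed) collects all headings and prints them as an indented outline. The heading texts are invented for this example.

```python
from bs4 import BeautifulSoup

html = """<body>
<h1>Python and HTML</h1>
<h2>Reading HTML</h2>
<h3>From a URL</h3>
<h3>From a file</h3>
<h2>Writing HTML</h2>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# find_all accepts a list of tag names and returns matches in document order
heading_names = ["h1", "h2", "h3", "h4", "h5", "h6"]
outline = []
for tag in soup.find_all(heading_names):
    level = int(tag.name[1])  # "h2" -> 2
    outline.append((level, tag.get_text(strip=True)))

# Indent each heading by its level to show the hierarchy
for level, text in outline:
    print("  " * (level - 1) + text)
```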

Paragraph tag (<p>)

What: The paragraph tag <p> is used to define a block of text as a paragraph.

When to use: Paragraph tags are block-level elements and automatically start on a new line. Text within a paragraph is styled in a way that makes it easy to read as a block of text.

HTML code:

<!DOCTYPE html>
<html>
<body>
<p>This is a paragraph of text. All the text inside this tag will be considered a single paragraph.</p>
</body>
</html>

Rendered file:

This is a paragraph of text. All the text inside this tag will be considered a single paragraph.


Anchor tag (<a>)

What: The anchor tag <a> is used to create hyperlinks to other web pages or locations within the same page.

When to use: Anchor tags are often used with the href attribute which holds the URL of the link. It can also be used with other attributes like target="_blank" to open the link in a new tab. You can also link to sections within the same page using # followed by the id of the element you wish to navigate to on the page.

HTML code:

<!DOCTYPE html>
<html>
<body>
<a href="https://www.pyshark.com">Visit PyShark</a>
<br>
<a href="https://www.pyshark.com" target="_blank">Visit PyShark in a New Tab</a>
</body>
</html>

Rendered file:

Visit PyShark
Visit PyShark in a New Tab

Note that I used the line break tag <br> between the two anchor tags because I want each link to appear on a new line.
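
Collecting links is one of the most common web scraping tasks, so here is a rough sketch of reading the attributes discussed above with BeautifulSoup (assumed to be installed). The third link, pointing to a #contact section, is a made-up example of an in-page link.

```python
from bs4 import BeautifulSoup

html = """<body>
<a href="https://www.pyshark.com">Visit PyShark</a>
<br>
<a href="https://www.pyshark.com" target="_blank">Visit PyShark in a New Tab</a>
<a href="#contact">Jump to the contact section</a>
</body>"""

soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.find_all("a"):
    links.append({
        "text": a.get_text(strip=True),
        "href": a.get("href"),                    # None if the attribute is missing
        "new_tab": a.get("target") == "_blank",
        "same_page": (a.get("href") or "").startswith("#"),
    })

for link in links:
    print(link)
```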


Image tag (<img>)

What: The image tag <img> is used to embed images into an HTML document.

When to use: Image tags are used to display images or alternative text. Image tags are often used with the src attribute which holds the path to the image file, and alt attribute which provides alternative text for the image if it can’t be displayed.

HTML code:

<!DOCTYPE html>
<html>
<body>
<img src="image.jpg" alt="Description of image">
</body>
</html>

Rendered file:

Description of image

Note that since we don’t have any image in the project directory, the alternative text will be displayed.
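
From Python, the src and alt attributes can be read in the same way as any other attribute. The sketch below (assuming beautifulsoup4 is installed) also flags images that have no alternative text; the second image, logo.png, is a hypothetical addition for this purpose.

```python
from bs4 import BeautifulSoup

html = """<body>
<img src="image.jpg" alt="Description of image">
<img src="logo.png">
</body>"""

soup = BeautifulSoup(html, "html.parser")

# Collect (src, alt) pairs; .get returns None when an attribute is absent
images = [(img.get("src"), img.get("alt")) for img in soup.find_all("img")]

# Images without alt text are worth flagging for accessibility
missing_alt = [src for src, alt in images if not alt]

print(images)
print(missing_alt)  # ['logo.png']
```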


List tags (<ul>, <ol>, <li>)

What: List tags are used to create lists of items.

When to use: List tag <ul> creates an unordered list with bullet points, and list tag <ol> creates an ordered list with numbers or letters. List tag <li> must be used for each item in the list.

HTML code:

<!DOCTYPE html>
<html>
<body>
<ul>
    <li>Item A</li>
    <li>Item B</li>
    <li>Item C</li>
</ul>
<br>
<ol>
    <li>Item A</li>
    <li>Item B</li>
    <li>Item C</li>
</ol>
</body>
</html>

Rendered file:

  • Item A
  • Item B
  • Item C

  1. Item A
  2. Item B
  3. Item C

Note that I used the line break tag <br> between the two lists because I want an empty line between them.
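
Lists map naturally to Python lists. As a quick sketch (assuming beautifulsoup4 is installed), the same markup can be turned into two Python lists like this:

```python
from bs4 import BeautifulSoup

html = """<ul>
    <li>Item A</li>
    <li>Item B</li>
    <li>Item C</li>
</ul>
<br>
<ol>
    <li>Item A</li>
    <li>Item B</li>
    <li>Item C</li>
</ol>"""

soup = BeautifulSoup(html, "html.parser")

# soup.ul / soup.ol give the first list of each kind; find_all("li") gives its items
unordered = [li.get_text(strip=True) for li in soup.ul.find_all("li")]
ordered = [li.get_text(strip=True) for li in soup.ol.find_all("li")]

for item in unordered:
    print(f"• {item}")
for number, item in enumerate(ordered, start=1):
    print(f"{number}. {item}")
```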


The examples above cover some of the most fundamental and widely used HTML tags. Combined with the basic file structure covered earlier, they let you create informative and navigable web documents.

As you become more comfortable with these tags, you’ll be able to create more complex and styled web content, as well as understand and analyze HTML content much quicker for web scraping.


Read HTML files in Python

In this section we will explore how to use Python to interact with HTML files, focusing on reading HTML content using Python libraries.

We will start by looking at how to read HTML content using Python. Generally you will have two cases when reading HTML content: reading it from a URL and reading it from a local file.


Reading HTML from URL

Let’s use an example URL from which we would like to read the HTML content: https://example.com/

Using requests and beautifulsoup4 libraries, we can easily access, read, parse, and print out the HTML content from a URL:


import requests
from bs4 import BeautifulSoup

# Fetch HTML content from a web page
response = requests.get("https://example.com")

# Use BeautifulSoup to parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Print formatted HTML content
print(soup.prettify())

and you should get:

<!DOCTYPE html>
<html>
 <head>
  <title>
   Example Domain
  </title>
  <meta charset="utf-8"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <style type="text/css">
   body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;        

    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
  </style>
 </head>
 <body>
  <div>
   <h1>
    Example Domain
   </h1>
   <p>
    This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.
   </p>
   <p>
    <a href="https://www.iana.org/domains/example">
     More information...
    </a>
   </p>
  </div>
 </body>
</html>

In summary this is what’s happening in the code:

  • requests.get(URL) fetches the content of the page
  • BeautifulSoup(response.content, 'html.parser') parses the HTML content fetched
  • soup.prettify() formats the HTML content for printing
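
In practice a request can fail or hang, and usually we want specific pieces of the page rather than the whole document. The following is a slightly more defensive sketch of the same steps, not the only way to do it. The function names fetch_soup and page_summary and the 10-second timeout are assumptions chosen for this example.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Fetch a page and parse it; raises requests.HTTPError on 4xx/5xx responses."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


def page_summary(soup):
    """Return the page title (or None) and the href of every link that has one."""
    title = soup.title.get_text(strip=True) if soup.title else None
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links


# Example call (needs network access):
# title, links = page_summary(fetch_soup("https://example.com"))
# print(title, links)
```

Keeping the fetching and the extraction in separate functions makes it possible to test the extraction on a saved HTML string without touching the network.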

Reading HTML from file

Now we will need a sample HTML file, placed in the same directory as your Python script.

For example, we can create a simple index.html with the following code:

<!DOCTYPE html>
<html>
<body>
<ul>
    <li>Item A</li>
    <li>Item B</li>
    <li>Item C</li>
</ul>
</body>
</html>

Your project structure should look like this:

HTML Tutorial
 ├──   index.html
 └──  main.py 

With the HTML file ready, we can now use Python to read the HTML content from file:


from bs4 import BeautifulSoup

# Open the HTML file
with open('index.html', 'r') as html_file:

    # Fetch the HTML content from file
    content = html_file.read()

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(content, 'html.parser')

# Print formatted HTML content
print(soup.prettify())

and you should get:

<!DOCTYPE html>
<html>
 <body>
  <ul>
   <li>
    Item A
   </li>
   <li>
    Item B
   </li>
   <li>
    Item C
   </li>
  </ul>
 </body>
</html>

In summary this is what’s happening in the code:

  • open('index.html', 'r') opens the file in read mode
  • html_file.read() reads the entire file content into a string
  • BeautifulSoup(content, 'html.parser') parses the HTML content
  • soup.prettify() formats the HTML content for printing
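
Once the file is parsed, we usually want to pull data out of it rather than just print it. Here is a compact sketch that reads the file with pathlib and returns the text of every list item. The read_list_items function is a made-up helper, and the explicit UTF-8 encoding is an assumption about how the file was saved.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def read_list_items(path):
    """Parse the HTML file at `path` and return the text of each <li> element."""
    html = Path(path).read_text(encoding="utf-8")
    soup = BeautifulSoup(html, "html.parser")
    return [li.get_text(strip=True) for li in soup.find_all("li")]


# Example call, assuming index.html from above sits next to this script:
# print(read_list_items("index.html"))
```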

Writing HTML Files

In this section we will explore how to use Python to write HTML files, again using the BeautifulSoup library.

We will start with creating a text file (html_code.txt) in the same directory as the Python code file with some sample HTML code:

<!DOCTYPE html>
<html>
<body>
<ul>
    <li>Item A</li>
    <li>Item B</li>
    <li>Item C</li>
</ul>
</body>
</html>

Your project structure should look like this:

HTML Tutorial
 ├──  html_code.txt
 └──  main.py 

Now we can read the content of the text file, parse it as HTML code, and write it to an HTML file:


from bs4 import BeautifulSoup

# Read HTML content from the TXT file
with open('html_code.txt', 'r') as file:
    html_content = file.read()

# Parse it as HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Write the formatted HTML code to a new HTML file
with open('new.html', 'w') as file:
    file.write(soup.prettify())

and you will see a new HTML file new.html appear in your project directory:

HTML Tutorial
 ├──  html_code.txt
 ├──  main.py
 └──  new.html

with the following content:

<!DOCTYPE html>
<html>
 <body>
  <ul>
   <li>
    Item A
   </li>
   <li>
    Item B
   </li>
   <li>
    Item C
   </li>
  </ul>
 </body>
</html>
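
The HTML does not have to come from an existing file. As a sketch of generating it from Python data instead, the example below builds the list items with BeautifulSoup and writes the result to disk. The function names build_list_page and write_list_page and the output file name generated.html are assumptions made for this example.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def build_list_page(items):
    """Return a formatted HTML document with one <li> per item."""
    soup = BeautifulSoup(
        "<!DOCTYPE html><html><body><ul></ul></body></html>", "html.parser"
    )
    for item in items:
        li = soup.new_tag("li")   # create an empty <li> tag
        li.string = item          # set its text content
        soup.ul.append(li)        # attach it to the <ul>
    return soup.prettify()


def write_list_page(items, path):
    """Build the page and save it to `path` as UTF-8."""
    Path(path).write_text(build_list_page(items), encoding="utf-8")


# Example call:
# write_list_page(["Item A", "Item B", "Item C"], "generated.html")
```

Setting the text through li.string, rather than pasting strings together by hand, lets BeautifulSoup escape characters such as & and < in the output.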

Integrating HTML files with Python web frameworks

This section is more advanced and shows how HTML integrates with Python web frameworks to create websites and web applications.

Brief Overview of Python Web Frameworks

  1. Flask
  2. Django

Flask and Django are the two biggest Python web frameworks and are extensively used by many companies worldwide.

In order to run the code in this section, you will need to install Flask (pip install flask). Django is discussed only at a high level here, so installing it (pip install django) is optional.


Flask

Flask is a lightweight and flexible micro web framework. It’s easy to learn and provides the essentials for web development while allowing you to choose your tools and extensions.

Flask uses the Jinja2 templating language, which allows for embedding Python-like expressions within HTML. This lets developers dynamically generate HTML content based on data and logic.

Your project structure should look like this:

Flask App
 ├──  templates/
 │      └──  index.html 
 └──  app.py 

with HTML template (index.html) having the following content:

<!DOCTYPE html>
<html>

<head>
    <title>Home Page</title>
</head>
<body>
    <h1>Welcome!</h1>
    <p>This is a simple Flask application.</p>
</body>
</html>

and the main Python file containing Flask application logic (app.py) having the following content:


from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def home():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)

Once you run the code, the terminal will display a few messages, one of which will contain a link to open your app in the browser (locally): http://127.0.0.1:5000

and you will see the app running with the HTML file rendered and showing the following output:

Home Page

Welcome!

This is a simple Flask application.
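
The template above is static, so it does not yet show the dynamic side of Jinja2. Below is a minimal sketch of passing data from Python into a template. To keep it in a single file it uses Flask's render_template_string with an inline template instead of a file in templates/, and the variable names title and entries are made up for this example. It also uses Flask's test client to render the page without starting a server.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Inline stand-in for a file in templates/; {{ ... }} inserts a value,
# {% for %} ... {% endfor %} repeats the enclosed HTML for each entry
PAGE = """<!DOCTYPE html>
<html>
<head>
    <title>{{ title }}</title>
</head>
<body>
    <h1>{{ title }}</h1>
    <ul>
    {% for entry in entries %}
        <li>{{ entry }}</li>
    {% endfor %}
    </ul>
</body>
</html>"""


@app.route("/")
def home():
    # Keyword arguments become variables inside the template
    return render_template_string(
        PAGE, title="Welcome!", entries=["Item A", "Item B", "Item C"]
    )


# Render the page through the built-in test client (no server needed)
with app.test_client() as client:
    html = client.get("/").get_data(as_text=True)

print(html)
```

In a real project the same keyword arguments can be passed to render_template('index.html', ...) and the placeholders placed in the template file.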


Django

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. It’s known for its “batteries-included” approach, providing an ORM, admin panel, and many other features out of the box.

Django ships with its own templating engine, the Django Template Language, whose syntax is similar to Jinja2. Rather than arbitrary Python code, its templates use variables, tags, and filters, and they support template inheritance to create dynamic and reusable HTML content.

A demo of a Django app is outside the scope of this tutorial as it’s significantly more complicated than Flask; however, there will be tutorials on my blog explaining how to build websites and web apps using Django in the future!


Conclusion

In this tutorial we explored a wide range of topics required to understand and utilize HTML in the context of Python development.

We started by introducing HTML and its significance in web development, followed by setting up the environment necessary for HTML and Python integration.

Then we delved into the basics of HTML, discussing its structure and common tags. Moving forward, we explored how Python can interact with HTML, using requests and BeautifulSoup to read, parse, and write HTML content.

We also examined how Python’s web frameworks, Flask and Django, integrate with HTML to create dynamic web applications.

Feel free to leave comments below if you have any questions or have suggestions for some edits and check out more of my Python Functions tutorials.