Chapter 3 HTML Fundamentals
A webpage on the internet is simply a set of files that the browser renders (shows) in a particular way, allowing you to interact with it. The most basic way to control how a browser displays content (e.g., words, images, etc.) is by encoding that content in HTML.
HTML (HyperText Markup Language) is a language that is used to give meaning to otherwise plain text, which the browser can then use to determine how to display that text. HTML is not a programming language but rather a markup language: it adds additional details to information (like notes in the margin of a book), but doesn’t contain any logic. HTML is a “hypertext” markup language because it was originally intended to mark up a document with hyperlinks, or links to other documents. In modern usage, HTML describes the semantic meaning of content: it marks what content is a heading, what content is a paragraph, what content is a definition, what content is an image, what content is a hyperlink, and so forth.
- HTML serves a similar function to Markdown, but is much more expressive and powerful.
This chapter provides an overview and explanation of HTML’s syntax (how to use it to annotate content). HTML’s syntax is very simple, and generally only takes someone a few days to learn—though using it effectively can require more practice.
3.1 HTML Elements
HTML content is normally written in .html files. By using the .html extension, your editor, computer, and browser should automatically know that this file will contain content marked up in HTML.
As mentioned in Chapter 2, most web servers will by default serve a file named index.html, and so that filename is traditionally used for a website’s home page.
As with source code files in programming languages, .html files are really just plain text files with a special extension, so they can be created in any text editor. However, using a coding editor such as VS Code provides additional helpful features that can speed up your development process.
HTML files contain the content of your web page: the text that you want to show on the page. This content is then annotated (marked up) by surrounding it with tags.
The opening/start tag comes before the content and tells the computer “I’m about to give you content with some meaning”, while the closing/end tag comes after the content to tell the computer “I’m done giving content with that meaning.” For example, the <h1> tag represents a top-level heading (equivalent to one # in Markdown), and so the opening tag says “here’s the start of the heading” and the closing tag says “that’s the end of the heading”.
Tags are written with a less-than symbol <, then the name of the tag (often a single letter), then a greater-than symbol >. An end tag is written just like a start tag, but includes a forward slash / immediately after the less-than symbol; this indicates that the tag is closing the annotation.
HTML tag names are not case sensitive, but you should always write them in all lowercase.
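As a small illustration of the tag syntax described above, the sketch below marks up a top-level heading. The heading text is a placeholder chosen for this example, and the notes between <!-- and --> are HTML comments, which the browser does not display.

```html
<!-- start tag: <h1>    content: Hello World!    end tag: </h1> -->
<h1>Hello World!</h1>

<!-- Tag names are not case sensitive, so this marks up the same kind of
     heading, but the all-lowercase form above is the recommended style -->
<H1>Hello World!</H1>
```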
Line breaks and white space around tags (including indentation) are ignored. Tags may thus be written on their own line, or inline with the content. These two uses of the <p> tag (which marks a paragraph of content) are equivalent:
<p>
    The itsy bitsy spider went up the water spout.
</p>

<p>The itsy bitsy spider went up the water spout.</p>
Taken together, the tags and the content they contain are called an HTML Element. A website is made of a bunch of these elements.
The start tag of an element may also contain one or more attributes. These are similar to attributes in object-oriented programming: they specify properties, options, or otherwise add additional meaning to an element. Like named parameters in R or HTTP query parameters, attributes are written in the format attributeName=value (by convention, with no spaces around the =); values of attributes are almost always strings, and so are written in quotes. Multiple attributes are separated by spaces:
<tag attributeA="value" attributeB="value"> content </tag>
For example, a hyperlink anchor (<a>) uses an href (“hypertext reference”) attribute to specify where the content should link to:
<a href="https://ischool.uw.edu">iSchool homepage</a>
In a hyperlink, the content of the tag is the displayed text, and the attribute specifies the link’s URL. Contrast this to the same link in Markdown:
[iSchool homepage](https://ischool.uw.edu)
Similarly, an image (<img>) uses the src (source) attribute to specify what picture it is showing. The alt attribute contains alternate text to use if the browser can’t show images, such as with screen readers (for the visually impaired) and search engine indexers.
<img src="baby_picture.jpg" alt="a cute baby">
- Note that because an <img> has no textual content, it is an empty element (see below).
There are also a number of global attributes that can be used on any element. For example:
Every HTML element can include an id attribute, which gives that element a unique identifier. id attributes are named like variable names, and must be unique on the page.
<h1 id="title">My Web Page</h1>
The id attribute is most commonly used to create “bookmark hyperlinks”, which are hyperlinks to a particular location on a page (i.e., that cause the page to scroll down to that element). You do this by including the id as the fragment of the URI to link to (i.e., after the # in the URI).
<a href="index.html#nav">Link to element on `index.html` with `id="nav"`</a>
<a href="#footnote">Link to element on current page with `id="footnote"`</a>
Note that the id attribute does NOT contain the # symbol, but the URI to link to does.
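To see both halves of a bookmark hyperlink together, consider the following sketch. The id value footnote matches the link example above, while the paragraph text is made up for illustration.

```html
<!-- The link's href uses # followed by the id of the target element -->
<p>See the <a href="#footnote">footnote</a> for details.</p>

<!-- ...more page content would go here... -->

<!-- The target element declares its id WITHOUT the # symbol -->
<p id="footnote">Footnote: this paragraph is the link's destination.</p>
```

Clicking the link scrolls the page to the paragraph whose id matches the fragment.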
The lang attribute is used to indicate the language in which the element’s content is written. Programs reading this file might use that to properly index the content, correctly pronounce it via a screen reader, or even translate it into another language:
<p lang="es">No me gusta</p>
Specify the lang attribute for the <html> element (see below) to define the default language of the page; that way you don’t need to mark the language of every element. Always include this attribute.
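As a rough sketch of how a page-wide default and a per-element override can combine, the fragment below sets English as the default and marks a single paragraph as Spanish ("es" is the standard two-letter code for Spanish). The paragraph text is illustrative only.

```html
<!-- The default language for the whole page is English -->
<html lang="en">
    <body>
        <!-- No lang attribute here, so this paragraph uses the page default -->
        <p>I do not like it.</p>

        <!-- This one paragraph overrides the default with Spanish -->
        <p lang="es">No me gusta</p>
    </body>
</html>
```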
A few HTML elements don’t require a closing tag because they can’t contain any content. These tags are often used for inserting media into a web page, such as with the <img> tag. With an <img> tag, you can specify the path to the image file in the src attribute, but the image element itself can’t contain additional text or other content. Since it can’t contain any content, we leave off the end tag entirely:
<img src="picture.png" alt="description of image for screen readers and indexers">
Older XML-based versions of HTML such as XHTML (and current related languages like XML) required you to include a forward slash / just before the greater-than symbol. This “end” slash indicated that the element was complete and expected no further content:
<img src="picture.png" alt="description of image for screen readers and indexers" />
This is no longer required in HTML5, so feel free to omit that forward slash (though some purists, or those working with XML, will still include it).
3.2 Nesting Elements
Web pages are made up of many (hundreds! thousands!) HTML elements. Moreover, HTML elements can be nested: that is, the content of an HTML element can contain other HTML tags (and thus other HTML elements).
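A minimal sketch of such nesting is shown here. The phrase “(with emphasis)” is the one discussed in the next paragraph; the rest of the heading text is a placeholder assumed for illustration.

```html
<!-- The <em> element is nested inside the content of the <h1> element -->
<h1>Hello world <em>(with emphasis)</em></h1>
```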
The semantic meaning indicated by an element applies to all its content: thus all the text in the above example is a top-level heading, and the content “(with emphasis)” is emphasized in addition.
Because elements can contain elements which can themselves contain elements, an HTML document ends up being structured as a “tree” of elements.
In an HTML document, the “root” element of the tree is always an <html> element. Inside this we put a <body> element to contain the document’s “body” (that is, the shown content):
<html lang="en">
    <body>
        <h1>Hello world!</h1>
        <p>This is <em>conteeeeent</em>!</p>
    </body>
</html>
Caution! HTML elements have to be “closed” correctly, or the semantic meaning may be incorrect! If you forget to close the <h1> tag, then all of the following content will be considered part of the heading! Remember to close your inner tags before you close the outer ones. Validating your HTML can help with this.
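The closing order can be illustrated with a short contrast. The first paragraph below is nested correctly; the second deliberately shows the mistake, so it is invalid markup and should not be copied. The text content is made up for this example.

```html
<!-- Correct: the inner <em> is closed before the outer <p> -->
<p>This is <em>important</em> content.</p>

<!-- Incorrect: the outer <p> is closed while the inner <em> is still open -->
<p>This is <em>important content.</p></em>
```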
Block vs. Inline Elements
All HTML elements fall into one of two categories:
Block elements form a visible “block” on a page; in particular, they will be on a new line from the previous content, and any content after them will also be on a new line. These tend to be structural elements for a page: headings (<h1>), paragraphs (<p>), lists (<ul>), etc.
<div>Block element</div>
<div>Block element</div>
Inline elements are contained “in the line” of content. These will not have a line break after them. Inline elements are used to modify the content rather than set it apart, such as giving it emphasis (<em>) or declaring it to be a hyperlink (<a>).
<span>Inline element</span>
<span>Other inline element</span>
Inline elements go inside of block elements, and it’s common to put block elements inside of other block elements (e.g., an <li> inside of a <ul>, or a <p> inside of a <div>). However, it is generally invalid to nest a block element inside of an inline element: the content won’t make sense, and probably won’t look right.
Some elements have further restrictions on nesting. For example, a <ul> (unordered list) is only allowed to contain <li> elements; anything else is invalid markup.
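The nesting rules above can be sketched in one small fragment. The element choices follow the examples already mentioned in this section; the text content is assumed for illustration.

```html
<!-- Block inside block: a <p> and a <ul> inside a <div> -->
<div>
    <!-- Inline inside block: <em> and <a> inside a <p> -->
    <p>An <em>emphasized</em> word and a <a href="https://ischool.uw.edu">link</a>.</p>

    <!-- A <ul> contains only <li> elements; other content goes inside each <li> -->
    <ul>
        <li>First item</li>
        <li>Second item with <em>inline emphasis</em></li>
    </ul>
</div>
```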
3.3 Web Page Structure
Now that you understand how to specify HTML elements, you can begin making real web pages! However, there are a few more tags you need to know and include for a valid, modern web page.
All HTML files start with a document type declaration, commonly referred to as the “Doctype.” This tells the rendering program (e.g., the browser) what format and syntax your document is using. Since you’re writing pages with HTML5, you can declare it as follows:
<!DOCTYPE html>
<html lang="en">
    ...
</html>
<!DOCTYPE> isn’t technically an HTML tag (it is a document type declaration rather than an element). While modern browsers will perform a “best guess” at how to render a page that lacks a Doctype, it is best practice to specify it explicitly. Always include the DOCTYPE at the start of your HTML files!
In addition to the <body> element that defines the displayed content, you should also include a <head> element that acts as the document “header” (the <head> is nested inside the <html> at the same level as the <body>). The content of the <head> element is not shown on the web page; instead it provides extra (meta) information about the document being rendered.
There are a couple of common elements you should include in the <head>:
A <title>, which specifies the “title” of the webpage:
<title>My Page Title</title>
Browsers will show the page title in the tab at the top of the browser window, and use that as the default bookmark name if you bookmark the page. But the title is also used by search indexers and screen readers for the blind, since it often provides a strong signal about the page’s subject. Thus your title should be informative and reflective of the content.
A <meta> tag that specifies the character encoding of the page:
<meta charset="utf-8">
The <meta> tag itself represents “metadata” (information about the page’s data), and uses an attribute and value to specify that information. The most important <meta> tag is for the character set, which tells the browser how to convert the bytes sent by the server into letters. Nearly all editors these days will save files in the UTF-8 character set, which supports the mixing of different scripts (Latin, Cyrillic, Chinese, Arabic, etc.) in the same file.
You can also use the <meta> tag to include more information about the author, description, and keywords for your page:
<meta name="author" content="your name">
<meta name="description" content="description of your page">
<meta name="keywords" content="list,of,keywords,separated,by,commas">
Note that the name attribute is used to specify the “variable name” for that piece of metadata, while the content attribute is used to specify the “value” of that metadata.
<meta> elements are empty elements and have no content of their own.
Again, these are not visible in the browser window (because they are in the <head>!), but may be used by search engines to index your page.
- At the very least, always include author information for the pages you create!
We will discuss additional elements for the <head> section throughout the text, such as using <link> to include CSS and using <script> to include JavaScript.
3.4 Web Page Template
Putting this all together produces the following “template” for making a web page:
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <meta name="author" content="your name">
        <meta name="description" content="description of your page">
        <title>My Page Title</title>
    </head>
    <body>
        ...
        Content goes here!
        ...
    </body>
</html>
You can use this to start off every web page you ever create from now on!
Also remember you can view the HTML page source of any webpage you visit. Use that to explore how others have developed pages and to learn new tricks and techniques!