Human language markup

Use HTML markup to indicate the human language of text

Indicating the text language (English, Spanish, etc.) used on a web page is critical because assistive technologies need to know how to correctly pronounce the text. Language markup is especially relevant at IU, since the university has a large international audience.

If the language of content is not specified, assistive technology will read the text in the language set in the visitor's operating system. Unfortunately, the page language and the operating system language don’t always match. In such cases, users from non-English-speaking countries could have difficulty understanding the page content.

Guidelines

For HTML pages, the HTML lang attribute should be used both to set the default language and to indicate any language changes in the text.

For XHTML pages, use the xml:lang attribute.

For pages that mix HTML and XML, use both the lang attribute and the xml:lang attribute together.
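As a minimal sketch of the XHTML case, assuming an XHTML 1.1 page served as application/xhtml+xml, the default language and a language change could both be marked with xml:lang alone. The page title and body text here are hypothetical, chosen only for illustration:

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<!-- xml:lang on the root element sets the default language -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
  <head>
    <title>Example XHTML page</title>
  </head>
  <body>
    <!-- xml:lang on an inline element marks a language change -->
    <p>The motto is <span xml:lang="la">Lux et Veritas</span>.</p>
  </body>
</html>
```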

Benefits

Language markup lets assistive technology choose the correct voice and pronunciation rules for each passage, so users hear the text as it was meant to be read.

It also allows search engines to determine the text language and index the page’s content more accurately.

Examples

Setting the default page language for HTML pages

For HTML pages written entirely in US English, add the lang="en-US" attribute to the <html> tag.

See the lang attribute on line 2 below:

<!doctype html>
<html lang="en-US">
  <head>
    <title>Markup Text Language - Web Accessibility Top Ten - UITS</title>
  </head>
  <body>
    <h1>Provide a unique and descriptive page title</h1>
    …
  </body>
</html>

Indicating language changes in HTML pages

The lang attribute can also be applied to other HTML elements, such as <blockquote>, <p>, <q>, <a>, <div>, and <span>, to indicate language changes. The lang attribute value applies to any text in the element's attributes, as well as the element's child text.

To indicate language changes, do not apply the lang attribute to <applet>, <base>, <basefont>, <br>, <frame>, <frameset>, <iframe>, <param>, or <script>.

See the lang attributes on lines 7 and 10 below:

<!doctype html>
<html lang="en-US">
  <head>
    <title>Markup Text Language - Web Accessibility Top Ten Tips - accessibility.iu.edu</title>
  </head>
  <body>
    <p><span lang="es">"No hables a menos que puedas mejorar el silencio"</span>, the father said.</p>
    …
    <p>Translate into
      <span title="French"><a lang="fr" href="text-french.html">Français</a></span>
    </p>
  </body>
</html>
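
Because the lang value also covers text in an element's own attributes, the attribute belongs on whichever element carries the foreign-language text. The sketch below illustrates this with hypothetical file names and wording. The first case puts lang on an image whose alt text is French. The second keeps an English title on a wrapper element so that it is not read as French:

```html
<!-- Hypothetical file names and text, for illustration only. -->

<!-- The alt text is French, so lang="fr" goes on the <img> itself. -->
<p>The sign at the entrance reads:
  <img lang="fr" src="panneau.png" alt="Bienvenue à la bibliothèque">
</p>

<!-- The link text is French but the title is English, so the title
     sits on a wrapper <span> that inherits the page's default language. -->
<p>
  <span title="Read this page in French"><a lang="fr" href="text-french.html">Version française</a></span>
</p>
```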

Setting the default page language for HTML pages with XML

See the lang and xml:lang attributes on line 2 below:

<!doctype html>
<html lang="en-US" xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Markup Text Language - Web Accessibility Top Ten Tips - accessibility.iu.edu</title>
  </head>
  <body>
    …
  </body>
</html>

Choosing language attributes

The lang attribute uses a primary code to indicate the language of a web page and optional subcodes to indicate the specific variant of the language. Subcodes are separated from the primary code by a hyphen, e.g., "en-US" for United States English and "es-MX" for Mexican Spanish.

Language codes must be chosen from the standard list of language codes in ISO 639. Two-letter country subcodes are optional; valid subcodes are listed in ISO 3166.

Notes: Be sure to use a hyphen, not an underscore. Do not include any spaces.

When choosing a primary language code from ISO-639, the rule is to keep the lang attribute value as short as possible. Use the two-letter code whenever possible. Only use a three-letter code if no two-letter code is available.

For additional information, see BCP 47, the IETF specification that describes the rules and standards for creating language attribute values.
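
The rules above can be sketched with a few sample values. The paragraph text is placeholder content, and the Hawaiian example assumes a language that has a three-letter ISO 639 code ("haw") but no two-letter one:

```html
<!-- Well-formed values -->
<p lang="en">Hello</p>        <!-- primary code only -->
<p lang="en-US">Hello</p>     <!-- primary code plus country subcode -->
<p lang="es-MX">Hola</p>      <!-- Mexican Spanish -->
<p lang="haw">Aloha</p>       <!-- no two-letter code exists, so three letters -->

<!-- Values to avoid -->
<p lang="en_US">Hello</p>     <!-- underscore instead of a hyphen -->
<p lang="en - US">Hello</p>   <!-- contains spaces -->
<p lang="eng">Hello</p>       <!-- three-letter code where "en" is available -->
```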

Best practices

  • Place a lang attribute on the HTML element to set the default language for the entire page
  • Use the shortest available language code
    • Use a three-letter code only if no two-letter code is available
  • Insert lang attributes anywhere the default language changes to a different language

Checking for correct usage

  • The <html> element of the document should have the correct lang attribute value, plus a matching xml:lang attribute where appropriate.
  • The lang and/or xml:lang attributes should be applied to indicate any language changes in the content.
  • The values of the lang and/or xml:lang attributes need to meet the standards of BCP 47.

Relevant standards

WCAG 2.0 SC 3.1.1 (Level A) - Language of Page
The default human language of each web page can be programmatically determined.

WCAG 2.0 SC 3.1.2 (Level AA) - Language of Parts
The human language of each passage or phrase in the content can be programmatically determined except for proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text.