Notes on language codes / language tags in HTML


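The abridged translation of Language tags in HTML and XML below is based on the version written on 2006-11-09. The original article was updated on 2009-12-09, so please note that some information here may be out of date or may no longer agree with the original. Also, BCP 47 corresponded to RFC 4646 when this was written, but RFC 4646 has since been obsoleted by RFC 5646.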


Language tags in HTML and XML




Note that the HTML specification still recommends the use of RFC 1766 for identifying language but you should use RFC 4646 despite what the HTML specification currently says.

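That is the key point. HTML4 refers to RFC 1766, but the article says that, regardless of that (presumably following the usual RFC practice), you should refer to RFC 4646, which was the latest RFC at the time.
Incidentally, RFC 4646 obsoletes RFC 3066, which in turn obsoleted RFC 1766.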


The golden rule when creating language tags is to keep the tag as short as possible. Avoid region, script or other subtags except where they add useful distinguishing information. For instance, use ja for Japanese and not ja-JP, unless there is a particular reason that you need to say that this is Japanese as spoken in Japan.


XML also provides a means to prevent inheritance of language using the empty string, ie. xml:lang="".


Essentially, this says: I do not want to associate any language with this information.
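
As a rough illustration of how the empty string stops language inheritance, here is a minimal Python sketch. The function name effective_languages and the sample document are made up for this note; the only library behavior relied on is that xml.etree.ElementTree exposes xml:lang under the XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree stores the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_languages(elem, inherited=None, out=None):
    """Collect (element name, effective language) pairs, top-down.

    None means no language was declared on the element or any ancestor;
    "" means inheritance was explicitly switched off with xml:lang="".
    """
    if out is None:
        out = []
    # An element's own xml:lang (even an empty one) overrides the inherited value.
    lang = elem.get(XML_LANG, inherited)
    out.append((elem.tag, lang))
    for child in elem:
        effective_languages(child, lang, out)
    return out

# Hypothetical sample document, not taken from the article.
doc = ET.fromstring(
    '<doc xml:lang="en"><p>Hello</p><code xml:lang="">x = 1</code></doc>'
)
result = effective_languages(doc)
```

In this sample, p inherits en from doc, while code ends up with the empty string, that is, with no language associated.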








The entries in the registry follow certain conventions with regard to upper and lowercasing - for example, language tags are lower case, alphabetic region subtags are upper case, and script tags begin with an initial capital. This is only a convention! When you use these subtags you are free to do as you like.
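
Since the casing is only a convention, a tool may still want to print tags in the conventional form. The sketch below is one way to do that; conventional_case is a hypothetical helper name. It works from subtag length and position alone and does not consult the registry, and lowercasing everything after a single-letter subtag is my reading of the formatting advice in RFC 5646, not something the article states.

```python
def conventional_case(tag):
    """Return the tag with conventional casing: language in lower case,
    four-letter script subtags with an initial capital, two-letter region
    subtags in upper case, everything else in lower case."""
    subtags = tag.split("-")
    result = [subtags[0].lower()]
    # Assumption: after a singleton such as "x", all subtags stay lower case.
    after_singleton = len(subtags[0]) == 1
    for sub in subtags[1:]:
        if len(sub) == 1:
            after_singleton = True
        if after_singleton:
            result.append(sub.lower())
        elif len(sub) == 4 and sub.isalpha():
            result.append(sub.capitalize())   # script, e.g. Hant
        elif len(sub) == 2 and sub.isalpha():
            result.append(sub.upper())        # region, e.g. HK
        else:
            result.append(sub.lower())        # variants, numeric regions
    return "-".join(result)
```

For example, conventional_case("ZH-hant-hk") gives zh-Hant-HK. Both spellings identify the same language tag, so this is purely cosmetic.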


language subtag

All language tags must begin with a language subtag.


Examples of simple, language-only language tags include:

  • en (English)
  • ast (Asturian - no two-letter code exists for Asturian in the ISO lists)

As is probably well known, ISO 639 language codes come in both two-letter and three-letter forms. On this point the article says:

These codes come from, and are kept up to date with, ISO 639 language codes. Because RFC 3066 didn't provide a list of valid subtags and just referred users to ISO 639, there was sometimes confusion about how to tag languages when the ISO code lists contained both two-letter and three-letter codes (and sometimes more than one three-letter code). Now all valid subtags are listed in a single IANA registry, which adopts only one value from the ISO lists per language. If a two-letter ISO code is available, this will be the one in the registry. Otherwise the registry will contain one three-letter code. This should make things simpler.

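The earlier RFC 3066 merely referred to ISO 639, so there was sometimes confusion about whether to use the two-letter or the three-letter code (and sometimes there is more than one three-letter code; I think this refers to ISO 639-2 having both a bibliographic "B code" and a terminological "T code", which differ for some languages). An IANA registry was therefore created to serve as the single reference. Two-letter ISO codes are registered as they are, and where no two-letter code exists, the three-letter ISO code is registered instead. It does not strike me as all that complicated, though.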

script subtag


Examples of language tags including script tags are:

  • zh-Hans (Simplified Chinese)
  • az-Latn (Azerbaijani, written in Latin script - since Azerbaijani can also be written using the Arabic script)

The relatively familiar example zh-Hans (Simplified Chinese) strikes me as the biggest highlight of RFC 4646 (more on this below).

The script subtag is new in RFC 4646. The subtags come from, and are kept up to date with, the list of ISO 15924 script codes.

Only one script subtag can appear in a language tag, and it must immediately follow the language subtag. It is always four letters long.

In short, the script subtag was newly introduced in RFC 4646, and its values follow the current list of ISO 15924 codes.

Although for common uses of language tags it is not likely that you will need to specify the script, there are one or two situations that have been crying out for it for some time. One such example is Chinese. There are many Chinese dialects, often mutually unintelligible, but these dialects are all written using either Simplified or Traditional Chinese script. People typically want to label Chinese text as either Simplified or Traditional, but until recently there was no way to do so. People had to bend something like zh-CN (meaning Chinese as spoken in China) to mean Simplified Chinese, even in Singapore, and zh-TW (meaning Chinese as spoken in Taiwan) for Traditional Chinese. Some people, however, use zh-HK for Traditional Chinese. The availability of zh-Hans and zh-Hant for Chinese written in Simplified and Traditional scripts should improve consistency and accuracy, and is already becoming widely used.


region subtag


Examples of language tags including region subtags include:

  • en-GB (British English)
  • es-005 (South American Spanish)
  • zh-Hant-HK (Traditional Chinese as used in Hong Kong)


The region subtag in RFC 3066 took its values from the ISO 3166 country codes. These two-letter codes are still available from the new registry, but the registry also lists 3-digit UN M.49 region codes. The advantage of these codes is that they can represent more than just countries. For example, localization groups have for some time wanted to label their carefully crafted translations as Latin-American Spanish, rather than the Spanish of any particular country. With RFC 4646 this is now possible. (The appropriate language tag is es-419.)

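Either the two-letter country codes of ISO 3166 (strictly speaking, ISO 3166-1) or the three-digit UN M.49 region codes can be used. UN M.49 also covers groupings larger than a single country, such as Latin America.

Codes from ISO 3166-2 (for Japan, roughly the prefecture codes of JIS X 0401) cannot be used. So broad regions are covered, but fine-grained subdivisions are not.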

variant subtags

It is unlikely that you will need to use variant subtags unless you are working in a specialised area.

The following examples may help you understand what these subtags do.

  • sl-nedis (the Nadiza dialect of Slovenian)
  • sl-rozaj (the Rezijan dialect of Slovenian)
  • sl-IT-nedis (the specific variant of the Nadiza dialect of Slovenian that is spoken in Italy)
  • de-CH-1901 (the variant of German orthography dating from the 1901 reforms, as seen in Switzerland)


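  • The Nadiza dialect of Slovenian
  • The Rezijan dialect of Slovenian
  • The specific variant of the Nadiza dialect of Slovenian spoken in Italy
  • The variant of German orthography that dates from the 1901 reforms, as seen in Switzerland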


extension and private-use subtags

We will mention these other subtags in passing, but if you feel you really need to use these tags, you should read the specification, rather than this article.
Extension subtags allow for future extensions to the language tag. There are no such registered tags at the moment.
Private-use subtags do not appear in the subtag registry, and are chosen and maintained by private agreement amongst parties.
Extension and private use tags are introduced by a single letter tag, or 'singleton'. The singleton for private use is x.

In short, extension subtags leave room for future extensions to the language tag, and at present no such tags are registered.
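
To pull the subtag types together, here is a minimal sketch that splits a tag into language, script, region and variants, based only on the shapes described above (four letters for a script, two letters or three digits for a region). The name split_language_tag and the returned dictionary layout are made up for this note. The variant shape (5 to 8 alphanumerics, or 4 characters starting with a digit) is my reading of the RFC rather than something the article states. Extended language subtags are not handled, and nothing is checked against the IANA registry, so a well-shaped but unregistered subtag is accepted.

```python
import re

SCRIPT = r"[A-Za-z]{4}"
REGION = r"[A-Za-z]{2}|[0-9]{3}"
VARIANT = r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}"   # assumed shape, see note above

def split_language_tag(tag):
    """Split a tag into its subtags by position and shape only."""
    subtags = tag.split("-")
    parts = {"language": subtags[0].lower(), "script": None,
             "region": None, "variants": [], "rest": []}
    i = 1
    # At most one script subtag, immediately after the language subtag.
    if i < len(subtags) and re.fullmatch(SCRIPT, subtags[i]):
        parts["script"] = subtags[i].capitalize()
        i += 1
    # At most one region subtag: ISO 3166 letters or UN M.49 digits.
    if i < len(subtags) and re.fullmatch(REGION, subtags[i]):
        parts["region"] = subtags[i].upper()
        i += 1
    # Any number of variant subtags.
    while i < len(subtags) and re.fullmatch(VARIANT, subtags[i]):
        parts["variants"].append(subtags[i].lower())
        i += 1
    # Singletons (extension, private use) and anything unrecognized.
    parts["rest"] = [s.lower() for s in subtags[i:]]
    return parts
```

With this sketch, zh-Hant-HK splits into language zh, script Hant and region HK, while sl-IT-nedis splits into language sl, region IT and the variant nedis.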



Matching different language tags is important for a number of applications. According to BCP 47 'en' can be said to match 'en-GB'. For example, the following CSS code colors all English text red in browsers that support the pseudo-class :lang.

:lang(en) { color: red; }

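In other words, matching different language tags matters for many applications, and according to BCP 47 'en' can be said to match 'en-GB'. The CSS above therefore colors all English text red in browsers that support the :lang pseudo-class.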

In the following code, the text described as lang="en-GB" will be red.

<p>En janvier, toutes les boutiques de Londres affichent des panneaux 
<span lang="en-GB">SALE</span>, mais en fait ces magasins sont bien propres!</p>


On the other hand, given the following CSS declaration,

:lang(en-GB) { color: red; }

the word 'SALE' should not be red in the following code.

<p>En janvier, toutes les boutiques de Londres affichent des panneaux 
<span lang="en">SALE</span>, mais en fait ces magasins sont bien propres!</p>
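
The two cases above come down to prefix matching: the range given to :lang matches a tag that is identical to it or that begins with it followed by a hyphen. Here is a minimal sketch of that rule. The name lang_matches is made up for this note, and the code is only a model of the behavior described in the article, not a description of how any browser implements :lang.

```python
def lang_matches(language_range, tag):
    """Return True if tag equals language_range or starts with it plus '-'.

    The comparison ignores case, since subtag casing is only a convention.
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")
```

So lang_matches("en", "en-GB") is true, which is why SALE turns red in the first example, while lang_matches("en-GB", "en") is false, as in the second. The trailing hyphen in the check also keeps en from matching an unrelated tag that merely begins with the same letters.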