Thursday, October 14, 2010

XHTML Encoding

XHTML files, like any other text files, are stored using a particular character encoding. Since there are many different character encodings in the world, and you have no idea what the default settings in a visitor's browser are, it is always a good idea to state explicitly which encoding your web page uses. Here is an example of how to declare the character encoding; in this case, the Unicode encoding UTF-8 is used:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>

When the browser sees this tag, it knows that the page is encoded in UTF-8 and displays it correctly. XHTML requires you to declare the encoding if it is anything other than the defaults, UTF-8 or UTF-16. You can also declare the character encoding in the XML declaration. Some character encodings are almost identical: iso-8859-1 and windows-1252, for example, differ only in the byte range 0x80 to 0x9F.
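
As a rough sketch of the XML option mentioned above, the encoding can be named in an XML declaration on the very first line of the file. The iso-8859-1 value, the XHTML 1.0 Strict doctype, and the page content are assumptions chosen only for illustration. The meta tag is kept so that a browser treating the page as text/html still sees the same charset:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- The XML declaration above must be the first thing in the file. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- Same charset as the XML declaration (illustrative value). -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Encoding example</title>
</head>
<body>
<p>Café</p>
</body>
</html>
```

Both declarations should name the same encoding, and that encoding should be the one the file is actually saved in.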

International Character Set Codes
The World Wide Web Consortium (W3C) recommends using UTF-8 wherever possible: UTF-8 can be used for all languages and is the recommended charset on the Internet. Support is growing fast. However, here is a partial list of languages, with their two-letter codes, and the older character sets generally used for them:


Language (code)             Charset

Afrikaans (AF)              iso-8859-1, windows-1252
Catalan (CA)                iso-8859-1, windows-1252
Bulgarian (BG)              iso-8859-5
Basque (EU)                 iso-8859-1, windows-1252
Byelorussian (BE)           iso-8859-5
Croatian (HR)               iso-8859-2, windows-1250
Czech (CS)                  iso-8859-2
Danish (DA)                 iso-8859-1, windows-1252
Dutch (NL)                  iso-8859-1, windows-1252
English (EN)                iso-8859-1, windows-1252
Finnish (FI)                iso-8859-1, windows-1252
Albanian (SQ)               iso-8859-1, windows-1252
Esperanto (EO)              iso-8859-3*
Estonian (ET)               iso-8859-15
Hungarian (HU)              iso-8859-2
Faroese (FO)                iso-8859-1, windows-1252
Hebrew (IW)                 iso-8859-8
French (FR)                 iso-8859-1, windows-1252
Galician (GL)               iso-8859-1, windows-1252
German (DE)                 iso-8859-1, windows-1252
Icelandic (IS)              iso-8859-1, windows-1252
Inuit (Eskimo) languages    iso-8859-10*
Macedonian (MK)             iso-8859-5, windows-1251
Irish (GA)                  iso-8859-1, windows-1252
Italian (IT)                iso-8859-1, windows-1252
Japanese (JA)               shift_jis, iso-2022-jp, euc-jp
Korean (KO)                 euc-kr
Latvian (LV)                iso-8859-13, windows-1257
Lapp                        iso-8859-10* **
Lithuanian (LT)             iso-8859-13, windows-1257
Portuguese (PT)             iso-8859-1, windows-1252
Maltese (MT)                iso-8859-3*
Polish (PL)                 iso-8859-2
Norwegian (NO)              iso-8859-1, windows-1252
Serbian (SR) Latin          iso-8859-2, windows-1250
Swedish (SV)                iso-8859-1, windows-1252
Russian (RU)                koi8-r, iso-8859-5
Romanian (RO)               iso-8859-2
Serbian (SR) Cyrillic       windows-1251, iso-8859-5***
Turkish (TR)                iso-8859-9, windows-1254
Slovak (SK)                 iso-8859-2
Slovenian (SL)              iso-8859-2, windows-1250
Spanish (ES)                iso-8859-1, windows-1252
Scottish (GD)               iso-8859-1, windows-1252
Ukrainian (UK)              iso-8859-5
Arabic (AR)                 iso-8859-6

