ECMA-94

ISO 8859 encoding family
Standard	ISO/IEC 8859
Classification	8-bit extended ASCII, ISO/IEC 4873 level 1
Extends	US-ASCII
Preceded by	ISO/IEC 646
Succeeded by	ISO/IEC 10646 (Unicode)
Other related encoding(s)	ISO/IEC 10367, Windows-125x

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12.^[1] The ISO working group maintaining this series of standards has been disbanded.

ISO/IEC 8859 parts 1, 2, 3, and 4 were originally Ecma International standard ECMA-94.

Introduction

While the bit patterns of the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use Latin alphabets need additional symbols not covered by ASCII. Early encodings were limited to 7 bits, partly because of restrictions in some data transmission protocols and partly for historical reasons. ISO/IEC 8859 sought to remedy this problem by using the eighth bit of an 8-bit byte to provide positions for another 96 printable characters. However, more characters were needed than could fit in a single 8-bit character encoding, so several mappings were developed, including at least ten suitable for various Latin alphabets.
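The effect of having several mappings for the same 96 high positions can be illustrated with a short Python sketch. It assumes Python's built-in `iso8859_n` codecs as stand-ins for the corresponding parts; the helper name `decode_byte` is hypothetical and chosen only for this illustration.

```python
import unicodedata

# 0x20-0x7E are the 95 printable ASCII positions; setting the eighth bit
# opens up 0xA0-0xFF, the 96 positions that each ISO/IEC 8859 part fills.
ASCII_PRINTABLE = range(0x20, 0x7F)
HIGH_GRAPHIC = range(0xA0, 0x100)


def decode_byte(value: int, codec: str) -> str:
    """Decode a single byte under the given Python codec name (illustrative helper)."""
    return bytes([value]).decode(codec)


print(len(ASCII_PRINTABLE), len(HIGH_GRAPHIC))

# The same byte value names a different character in each part.
for codec in ("iso8859_1", "iso8859_2", "iso8859_5"):
    char = decode_byte(0xA1, codec)
    print(codec, f"U+{ord(char):04X}", unicodedata.name(char))
```

Because a byte alone does not say which part it belongs to, the part has to be communicated out of band, for example through a charset label.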

The ISO/IEC 8859 standard parts define only printable characters. They explicitly set apart the byte ranges 0x00–1F and 0x7F–9F as "combinations that do not represent graphic characters" (i.e. reserved for use as control characters) in accordance with ISO/IEC 4873, and they were designed to be used in conjunction with a separate standard defining the control functions associated with these bytes, such as ISO 6429 or ISO 6630.^[2] To this end, a series of encodings registered with the IANA add the C0 control set (control characters mapped to bytes 0 to 31) from ISO 646 and the C1 control set (control characters mapped to bytes 128 to 159) from ISO 6429, resulting in full 8-bit character maps with most, if not all, bytes assigned. These sets have ISO-8859-n as their preferred MIME name or, where no preferred MIME name is specified, as their canonical name. The terms ISO/IEC 8859-n and ISO-8859-n are often used interchangeably, although strictly the former names the standard and the latter the IANA charset. ISO/IEC 8859-11 did not get such a charset assigned, presumably because it was almost identical to TIS 620.
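The split between graphic and non-graphic byte ranges can be sketched in Python as follows. The function name `is_graphic_position` is hypothetical. The last lines assume that Python's `iso8859_1` codec behaves like the IANA ISO-8859-1 charset, in that it maps the C1 range to Unicode control characters.

```python
import unicodedata


def is_graphic_position(byte: int) -> bool:
    """True if the byte lies outside the ranges 0x00-1F and 0x7F-9F,
    which the standard sets apart as non-graphic combinations."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit value")
    return not (byte <= 0x1F or 0x7F <= byte <= 0x9F)


# 95 positions in 0x20-0x7E plus 96 positions in 0xA0-0xFF.
graphic = [b for b in range(256) if is_graphic_position(b)]
print(len(graphic))

# Under Python's iso8859_1 codec, a C1 byte decodes to a control character
# (Unicode general category "Cc") rather than to a printable one.
c1_char = bytes([0x85]).decode("iso8859_1")
print(f"U+{ord(c1_char):04X}", unicodedata.category(c1_char))
```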

Characters

The ISO/IEC 8859 standard is designed for reliable information exchange, not typography; the standard omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO/IEC 8859 standards, or use Unicode instead.

An inexact rule based on practical experience states that if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it did not get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages.

French did not get its œ and Œ ligatures because they could be typed as 'oe'. Likewise, Ÿ, needed for all-caps text, was dropped as well.^[3]^[4]^[5] Albeit under different codepoints, these three characters were later reintroduced with ISO/IEC 8859-15 in 1999, which also introduced the new euro sign character €. Likewise Dutch did not get the ĳ and Ĳ letters, because Dutch speakers had become used to typing these as two letters instead.
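As a rough illustration of how ISO/IEC 8859-15 reintroduced these characters at new positions, the following Python sketch lists the positions where two parts disagree. It assumes Python's built-in `iso8859_1` and `iso8859_15` codecs reflect the respective parts; `differing_positions` is a hypothetical helper name.

```python
import unicodedata


def differing_positions(codec_a: str, codec_b: str) -> dict:
    """Map each byte in 0xA0-0xFF to (char_a, char_b) where the codecs differ."""
    diffs = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            diffs[value] = (char_a, char_b)
    return diffs


changes = differing_positions("iso8859_1", "iso8859_15")
for value, (old, new) in sorted(changes.items()):
    # Print character names rather than glyphs so the output is ASCII-safe.
    print(f"0x{value:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```

Text that uses none of the listed positions decodes identically under both parts.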

Romanian did not initially get its Ș/ș and Ț/ț (with comma) letters, because these letters were initially unified with Ş/ş and Ţ/ţ (with cedilla) by the Unicode Consortium, considering the shapes with comma beneath to be glyph variants of the shapes with cedilla. However, the letters with explicit comma below were later added to the Unicode standard and are also in ISO/IEC 8859-16.

Most of the ISO/IEC 8859 encodings provide diacritic marks required for various European languages using the Latin script. Others provide non-Latin alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. Most of the encodings contain only spacing characters, although the Thai, Hebrew, and Arabic ones also contain combining characters.

The standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions either (unless combining diacritics are used, as in Windows-1258). Each Japanese syllabic alphabet (hiragana or katakana, see Kana) would fit, as in JIS X 0201, but like several other alphabets of the world they are not encoded in the ISO/IEC 8859 system.

The parts of ISO/IEC 8859

ISO/IEC 8859 is divided into the following parts:

Part	Name	Revisions	Other standards	Description
Part 1	Latin-1 Western European	1987, 1998	ECMA-94 (1985, 1986)	Perhaps the most widely used part of ISO/IEC 8859, covering most Western European languages: Danish (partial),^{[nb 1]} Dutch (partial),^{[nb 2]} English, Faeroese, Finnish (partial),^{[nb 3]} French (partial),^{[nb 3]} German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Catalan, and Swedish. Languages from other parts of the world are also covered, including: Eastern European Albanian, Southeast Asian Indonesian, as well as the African languages Afrikaans and Swahili. A modification of DEC MCS; the first (1985) standard version at the ECMA level lacked the times sign and division obelus, which were added the next year. The missing euro sign and capital Ÿ are in the revised version ISO/IEC 8859-15 (see below). The corresponding IANA character set is ISO-8859-1.
Part 2	Latin-2 Central European	1987, 1999	ECMA-94 (1986)^{[nb 4]}	Supports those Central and Eastern European languages that use the Latin alphabet, including Bosnian, Polish, Croatian, Czech, Slovak, Slovene, Serbian, and Hungarian. The missing euro sign can be found in version ISO/IEC 8859-16.
Part 3	Latin-3 South European	1988, 1999		Turkish, Maltese, and Esperanto. Largely superseded by ISO/IEC 8859-9 for Turkish.
Part 4	Latin-4 North European	1988, 1998		Estonian, Latvian, Lithuanian, Greenlandic, and Sami.
Part 5	Latin/Cyrillic	1988, 1999	ECMA-113 (1988, 1999)^{[nb 5]}	Covers mostly Slavic languages that use a Cyrillic alphabet, including Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian (partial).^{[nb 6]}
Part 6	Latin/Arabic	1987, 1999	ASMO 708 (1986) ECMA-114 (1986, 2000)	Covers the most common Arabic language characters. Does not support other languages using the Arabic script. Text needs bidirectional (BiDi) and cursive-joining processing for display.
Part 7	Latin/Greek	1987, 2003	ELOT 928 (1986) ECMA-118 (1986)	Covers the modern Greek language (monotonic orthography). Can also be used for Ancient Greek written without accents or in monotonic orthography, but lacks the diacritics for polytonic orthography. These were introduced with Unicode. Updated 2003 to add the euro sign, drachma sign and spacing ypogegrammeni.
Part 8	Latin/Hebrew	1988, 1999	ECMA-121 (1987, 2000) SI 1311 (2002)	Covers the modern Hebrew alphabet as used in Israel. In practice two different encodings exist, logical order (needs to be BiDi processed for display) and visual (left-to-right) order (in effect, after bidi processing and line breaking). Updated 1999 to add LRM and RLM. Updated at national standard level in 2002 to add euro and shekel signs and more bidirectional format effectors; the 2002 additions were never incorporated back into the ISO standard version.
Part 9	Latin-5 Turkish	1989, 1999	TS 5881 (1988) ECMA-128 (1988, 1999)	Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones.
Part 10	Latin-6 Nordic	1992, 1998	ECMA-144 (1990, 1992, 2000)	A rearrangement of Latin-4, considered more useful for Nordic languages; Baltic languages more often use Latin-4.
Part 11	Latin/Thai	2001	TIS-620 (1986, 1990)	Contains characters needed for the Thai language. First revision established in 1986 at national standard level as TIS 620. Elevated to ISO standard status as a part of ISO 8859 in 2001, with the addition of a non-breaking space.
~~Part 12~~	Latin/Devanagari	N/A	-	The work in making a part of 8859 for Devanagari was officially abandoned in 1997. ISCII and Unicode/ISO/IEC 10646 cover Devanagari.
Part 13	Latin-7 Baltic Rim	1998	-	Added some characters for Baltic languages which were missing from Latin-4 and Latin-6. Related to the earlier-published^{[nb 7]} Windows-1257.
Part 14	Latin-8 Celtic	1998	-	Covers Celtic languages such as Gaelic and the Breton language. Welsh letters correspond to the earlier (1994) ISO-IR-182.
Part 15	Latin-9	1999	-	A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign € and the letters Š, š, Ž, ž, Œ, œ, and Ÿ, which completes the coverage of French, Finnish and Estonian.
Part 16	Latin-10 South-Eastern European	2001	SR 14111 (1998)	Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovene, but also Finnish, French, German and Irish Gaelic (new orthography). The focus lies more on letters than symbols. The generic currency sign is replaced with the euro sign.

Each part of ISO/IEC 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, there are some characters and language combinations that are not accommodated without transcriptions. Efforts were made to make conversions as smooth as possible. For example, German has all of its seven special characters at the same positions in all Latin variants (1–4, 9, 10, 13–16), and in many positions the characters only differ in the diacritics between the sets. In particular, variants 1–4 were designed jointly, and have the property that every encoded character appears either at a given position or not at all.
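The claim about the German special characters can be checked with a small Python sketch. It assumes that Python's built-in `iso8859_n` codecs match the listed Latin parts; the names `LATIN_PARTS`, `GERMAN_SPECIALS` and `positions` are hypothetical and introduced only for this illustration.

```python
# The Latin variants named in the text: parts 1-4, 9, 10 and 13-16.
LATIN_PARTS = (1, 2, 3, 4, 9, 10, 13, 14, 15, 16)

# The seven German special characters: three umlauts in both cases, plus sharp s.
GERMAN_SPECIALS = "ÄÖÜäöüß"


def positions(char: str) -> dict:
    """Return the byte value assigned to `char` in each Latin part."""
    return {part: char.encode(f"iso8859_{part}")[0] for part in LATIN_PARTS}


for char in GERMAN_SPECIALS:
    values = set(positions(char).values())
    # One distinct byte value means the character sits at the same position everywhere.
    print(f"U+{ord(char):04X}", sorted(f"0x{v:02X}" for v in values))
```

A single byte value per character indicates that German text can be relabelled from one of these parts to another without its special characters changing.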

Table

Source: https://en.wikipedia.org?pojem=ECMA-94
Text is available under the Creative Commons Attribution/Share-Alike License 3.0 Unported; additional terms may apply. See the Terms of Use page for details.

Comparison of the various parts (1–16) of ISO/IEC 8859
Binary	Oct	Dec	Hex	1	2	3	4	5	6	7	8	9	10	11	13	14	15	16
1010 0000	240	160	A0	Non-breaking space (NBSP)
1010 0001	241	161	A1	¡	Ą	Ħ	Ą	Ё		‘		¡	Ą	ก	”	Ḃ	¡	Ą
1010 0010	242	162	A2	¢	˘		ĸ	Ђ		’	¢		Ē	ข	¢	ḃ	¢	ą
1010 0011	243	163	A3	£	Ł	£	Ŗ	Ѓ		£			Ģ	ฃ	£			Ł
1010 0100	244	164	A4	¤				Є	¤	€	¤		Ī	ค	¤	Ċ	€
1010 0101	245	165	A5	¥	Ľ		Ĩ	Ѕ		₯	¥		Ĩ	ฅ	„	ċ	¥	„
1010 0110	246	166	A6	¦	Ś	Ĥ	Ļ	І		¦			Ķ	ฆ	¦	Ḋ	Š
1010 0111	247	167	A7	§				Ї		§				ง	§
1010 1000	250	168	A8	¨				Ј		¨			Ļ	จ	Ø	Ẁ	š
1010 1001	251	169	A9	©	Š	İ	Š	Љ		©			Đ	ฉ	©
1010 1010	252	170	AA	ª	Ş		Ē	Њ		ͺ	×	ª	Š	ช	Ŗ	Ẃ	ª	Ș
1010 1011	253	171	AB	«	Ť	Ğ	Ģ	Ћ		«			Ŧ	ซ	«	ḋ	«
1010 1100	254	172	AC	¬	Ź	Ĵ	Ŧ	Ќ	،	¬			Ž	ฌ	¬	Ỳ	¬	Ź
1010 1101	255	173	AD	Soft hyphen (SHY)										ญ	SHY
1010 1110	256	174	AE	®	Ž		Ž	Ў			®		Ū	ฎ	®			ź
1010 1111	257	175	AF	¯	Ż		¯	Џ		―	¯		Ŋ	ฏ	Æ	Ÿ	¯	Ż
1011 0000	260	176	B0	°				А		°				ฐ	°	Ḟ	°
1011 0001	261	177	B1	±	ą	ħ	ą	Б		±			ą	ฑ	±	ḟ	±
1011 0010	262	178	B2	²	˛	²	˛	В		²			ē	ฒ	²	Ġ	²	Č
1011 0011	263	179	B3	³	ł	³	ŗ	Г		³			ģ	ณ	³	ġ	³	ł
1011 0100	264	180	B4	´				Д		΄	´		ī	ด	“	Ṁ	Ž
1011 0101	265	181	B5	µ	ľ	µ	ĩ	Е		΅	µ		ĩ	ต	µ	ṁ	µ	”
1011 0110	266	182	B6	¶	ś	ĥ	ļ	Ж		Ά	¶		ķ	ถ	¶
1011 0111	267	183	B7	·	ˇ	·	ˇ	З		·				ท	·	Ṗ	·
1011 1000	270	184	B8	¸				И		Έ	¸		ļ	ธ	ø	ẁ	ž
1011 1001	271	185	B9	¹	š	ı	š	Й		Ή	¹		đ	น	¹	ṗ	¹	č
1011 1010	272	186	BA	º	ş		ē	К		Ί	÷	º	š	บ	ŗ	ẃ	º	ș
1011 1011	273	187	BB	»	ť	ğ	ģ	Л	؛	»			ŧ	ป	»	Ṡ	»
1011 1100	274	188	BC

[1]

[2]

[3]

[4]

[5]

[nb 1]

[nb 2]

[nb 3]

[nb 4]

[nb 5]

[nb 6]

[nb 7]