Unicode block

A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

Each block is generally, but not always, meant to supply glyphs used by one or more specific languages, or in some general application area such as mathematics, surveying, decorative typesetting, social forums, etc.

Design and implementation

Unicode blocks are identified by unique names, which use only ASCII characters and are usually descriptive, in English, of the nature of the symbols, such as "Tibetan" or "Supplemental Arrows-A". (When comparing block names, one is supposed to equate uppercase with lowercase letters and to ignore any whitespace, hyphens, and underscores, so the last name is equivalent to "supplemental_arrows__a" and "SUPPLEMENTALARROWSA".)^[1]
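The loose comparison rule for block names can be sketched in a few lines of Python. The function name normalize_block_name is hypothetical, chosen only for this illustration; the rule itself (ignore case, whitespace, hyphens, and underscores) is the one stated above.

```python
import re

def normalize_block_name(name: str) -> str:
    """Build a comparison key: drop whitespace, hyphens and underscores, then lowercase."""
    return re.sub(r"[\s\-_]+", "", name).lower()

# The three spellings mentioned in the text collapse to a single key.
names = ["Supplemental Arrows-A", "supplemental_arrows__a", "SUPPLEMENTALARROWSA"]
keys = {normalize_block_name(n) for n in names}
print(keys)
```

Two block names are then treated as equivalent exactly when their keys are equal.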

Blocks are pairwise disjoint; that is, they do not overlap. The starting code point and the size (number of code points) of each block are always multiples of 16; therefore, in the hexadecimal notation, the starting (smallest) point is U+xxx0 and the ending (largest) point is U+yyyF, where xxx and yyy are three or more hexadecimal digits. (These constraints are intended to simplify the display of glyphs in Unicode Consortium documents, as tables with 16 rows labeled with the last hexadecimal digit of the code point.^[1]) The size of a block may range from the minimum of 16 to a maximum of 65,536 code points.
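As an illustration, the constraints above (alignment to multiples of 16, size limits, no overlap) can be checked mechanically. The sketch below uses a handful of ranges copied from the block table in this article; the names SAMPLE_BLOCKS, is_well_formed, and are_disjoint are hypothetical helpers, not part of any Unicode library.

```python
# (first code point, last code point, block name), copied from the table in this article
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0860, 0x086F, "Syriac Supplement"),
    (0x1400, 0x167F, "Unified Canadian Aboriginal Syllabics"),
]

def is_well_formed(start: int, end: int) -> bool:
    """Start and size are multiples of 16, and the size is between 16 and 65,536."""
    size = end - start + 1
    return start % 16 == 0 and size % 16 == 0 and 16 <= size <= 65536

def are_disjoint(blocks) -> bool:
    """After sorting by start, each block must end before the next one begins."""
    ordered = sorted(blocks)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))

for start, end, name in SAMPLE_BLOCKS:
    print(f"U+{start:04X}..U+{end:04X}  {name}: {end - start + 1} code points, "
          f"well formed: {is_well_formed(start, end)}")
print("pairwise disjoint:", are_disjoint(SAMPLE_BLOCKS))
```

Because every start is a multiple of 16 and every size is a multiple of 16, the last hexadecimal digit of a block's first code point is always 0 and that of its last code point is always F.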

Every assigned code point has a property called "Block", whose value is a character string naming the unique block that owns that point.^[2] However, a block may also contain unassigned code points, usually reserved for future additions of characters that "logically" should belong to that block. Code points that do not belong to any of the named blocks, e.g. those in the unassigned planes 4–13, have the value Block="No_Block".^[1]
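A lookup of the Block value for a code point can be sketched as a binary search over sorted block ranges. Python's standard unicodedata module does not expose the Block property, so this sketch assumes a small hand-written table (KNOWN_BLOCKS, with ranges copied from the table in this article) and a hypothetical function block_of. With such a partial table, code points in blocks that are not listed also fall through to "No_Block"; a real implementation would load the complete list.

```python
from bisect import bisect_right

# Partial table for illustration only, sorted by first code point.
KNOWN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0F00, 0x0FFF, "Tibetan"),
]
STARTS = [start for start, _, _ in KNOWN_BLOCKS]

def block_of(cp: int) -> str:
    """Return the name of the block containing cp, or "No_Block" if none is listed."""
    i = bisect_right(STARTS, cp) - 1          # last block starting at or before cp
    if i >= 0 and cp <= KNOWN_BLOCKS[i][1]:   # cp must also lie before that block's end
        return KNOWN_BLOCKS[i][2]
    return "No_Block"

print(block_of(ord("A")))   # a letter in Basic Latin
print(block_of(0x0378))     # an unassigned code point that still lies inside Greek and Coptic
print(block_of(0x40000))    # plane 4, outside every block
```

The second call shows the point made above: a code point can lie inside a block's range without having a character assigned to it.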

Belonging to a particular Unicode block does not guarantee that a character has any particular property. The identity of any character is determined by its properties as stated in the Unicode Character Database. For example, the contiguous range of 32 noncharacter code points U+FDD0..U+FDEF shares none of the properties common to the other characters in the Arabic Presentation Forms-A block: they are not Arabic script characters or "right-to-left noncharacters", and were placed there as filler, given that it had been agreed that no further Arabic compatibility characters would be encoded.^[3]
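The noncharacter example can be inspected with Python's standard unicodedata module, which reports General Category and bidirectional class but not the Block property. The sketch below compares the noncharacter range with U+FDF2, an ordinary Arabic ligature from the same block; the choice of U+FDF2 as the comparison character is an assumption made for illustration.

```python
import unicodedata

NONCHARACTERS = range(0xFDD0, 0xFDF0)   # U+FDD0..U+FDEF inclusive
print(len(NONCHARACTERS))               # number of code points in the range

# Every code point in the range reports General Category "Cn" (not assigned).
categories = {unicodedata.category(chr(cp)) for cp in NONCHARACTERS}
print(categories)

# An ordinary character from the same block, for contrast.
ligature = "\uFDF2"
print(unicodedata.name(ligature))
print(unicodedata.category(ligature), unicodedata.bidirectional(ligature))
```

Both kinds of code point sit in the same block, yet their properties differ, which is why the block alone cannot be used to infer them.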

Other classifications

Each Unicode code point also has a property called "General Category", which attempts to describe the role of the corresponding symbol in the languages or applications for whose sake it was included in the system. Examples of General Categories are "Lu" (upper-case letter), "Nd" (decimal digit), "Pi" (open-quote punctuation), and "Mn" (non-spacing mark, i.e. a diacritic for the preceding glyph). This division is completely independent of blocks: the code points with a given General Category generally span many blocks, and do not have to be consecutive, not even within a single block.^[4]
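The independence of General Category and block can be observed with Python's standard unicodedata.category function. The sketch below checks one sample character per category named above, then counts the categories that occur inside the Basic Latin block (U+0000..U+007F); the sample characters are chosen here for illustration.

```python
import unicodedata
from collections import Counter

# One sample character for each General Category mentioned in the text.
samples = {"A": "Lu", "5": "Nd", "\u00AB": "Pi", "\u0301": "Mn"}
for ch, expected in samples.items():
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), "expected", expected)

# A single block mixes several categories ...
basic_latin = Counter(unicodedata.category(chr(cp)) for cp in range(0x0000, 0x0080))
print(sorted(basic_latin.items()))

# ... and a single category spans several blocks (Basic Latin, Greek and Coptic, Cyrillic).
print([unicodedata.category(ch) for ch in ("A", "\u03A9", "\u0414")])
```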

Each code point also has a script property, specifying which writing system it is intended for, or whether it is intended for multiple writing systems. This, also, is independent of block.

In descriptions of the Unicode system, a block may be subdivided into more specific subgroups, such as the "Chess symbols" in the Miscellaneous Symbols block (not to be confused with the separate Chess Symbols block). Those subgroups are not "blocks" in the technical sense used by the Unicode Consortium, and are named only for the convenience of users.

List of blocks

Unicode 15.1 defines 328 blocks:^[1]

164 in plane 0, the Basic Multilingual Plane (in table below: § BMP)
151 in plane 1, the Supplementary Multilingual Plane (§ SMP)
7 in plane 2, the Supplementary Ideographic Plane (§ SIP)
2 in plane 3, the Tertiary Ideographic Plane (§ TIP)
2 in plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (§ SSP)
One each in the planes 15 (F_hex) and 16 (10_hex), called Supplementary Private Use Area-A and -B (§ PUA-A)
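The per-plane counts above can be cross-checked against the stated total, and the plane of any code point follows from its value, since each plane holds 65,536 (0x10000) code points. This is a minimal sketch; BLOCKS_PER_PLANE and plane_of are hypothetical names introduced for the illustration.

```python
# Block counts per plane, as listed above for Unicode 15.1.
BLOCKS_PER_PLANE = {0: 164, 1: 151, 2: 7, 3: 2, 14: 2, 15: 1, 16: 1}
total_blocks = sum(BLOCKS_PER_PLANE.values())
print(total_blocks)

def plane_of(cp: int) -> int:
    """Return the plane number (0..16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp >> 16   # same as cp // 0x10000

for cp in (0x0041, 0x1F600, 0x20000, 0xE0001, 0x10FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```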

Source: https://en.wikipedia.org?pojem=Unicode_block
Text is available under the Creative Commons Attribution/Share-Alike License 3.0 Unported; additional terms may apply. See the Terms of Use page for details.

Unicode blocks and contained scripts
Plane	Block range	Block name	Code points^[a]	Assigned characters	Scripts^[b]^[c]^[d]^[e]^[f]
0 BMP	U+0000..U+007F	Basic Latin^[g]	128	128	Latin (52 characters), Common (76 characters)
0 BMP	U+0080..U+00FF	Latin-1 Supplement^[h]	128	128	Latin (64 characters), Common (64 characters)
0 BMP	U+0100..U+017F	Latin Extended-A	128	128	Latin
0 BMP	U+0180..U+024F	Latin Extended-B	208	208	Latin
0 BMP	U+0250..U+02AF	IPA Extensions	96	96	Latin
0 BMP	U+02B0..U+02FF	Spacing Modifier Letters	80	80	Bopomofo (2 characters), Latin (14 characters), Common (64 characters)
0 BMP	U+0300..U+036F	Combining Diacritical Marks	112	112	Inherited
0 BMP	U+0370..U+03FF	Greek and Coptic	144	135	Coptic (14 characters), Greek (117 characters), Common (4 characters)
0 BMP	U+0400..U+04FF	Cyrillic	256	256	Cyrillic (254 characters), Inherited (2 characters)
0 BMP	U+0500..U+052F	Cyrillic Supplement	48	48	Cyrillic
0 BMP	U+0530..U+058F	Armenian	96	91	Armenian
0 BMP	U+0590..U+05FF	Hebrew	112	88	Hebrew
0 BMP	U+0600..U+06FF	Arabic	256	256	Arabic (238 characters), Common (6 characters), Inherited (12 characters)
0 BMP	U+0700..U+074F	Syriac	80	77	Syriac
0 BMP	U+0750..U+077F	Arabic Supplement	48	48	Arabic
0 BMP	U+0780..U+07BF	Thaana	64	50	Thaana
0 BMP	U+07C0..U+07FF	NKo	64	62	N’Ko
0 BMP	U+0800..U+083F	Samaritan	64	61	Samaritan
0 BMP	U+0840..U+085F	Mandaic	32	29	Mandaic
0 BMP	U+0860..U+086F	Syriac Supplement	16	11	Syriac
0 BMP	U+0870..U+089F	Arabic Extended-B	48	41	Arabic
0 BMP	U+08A0..U+08FF	Arabic Extended-A	96	96	Arabic (95 characters), Common (1 character)
0 BMP	U+0900..U+097F	Devanagari	128	128	Devanagari (122 characters), Common (2 characters), Inherited (4 characters)
0 BMP	U+0980..U+09FF	Bengali	128	96	Bengali
0 BMP	U+0A00..U+0A7F	Gurmukhi	128	80	Gurmukhi
0 BMP	U+0A80..U+0AFF	Gujarati	128	91	Gujarati
0 BMP	U+0B00..U+0B7F	Oriya	128	91	Oriya
0 BMP	U+0B80..U+0BFF	Tamil	128	72	Tamil
0 BMP	U+0C00..U+0C7F	Telugu	128	100	Telugu
0 BMP	U+0C80..U+0CFF	Kannada	128	91	Kannada
0 BMP	U+0D00..U+0D7F	Malayalam	128	118	Malayalam
0 BMP	U+0D80..U+0DFF	Sinhala	128	91	Sinhala
0 BMP	U+0E00..U+0E7F	Thai	128	87	Thai (86 characters), Common (1 character)
0 BMP	U+0E80..U+0EFF	Lao	128	83	Lao
0 BMP	U+0F00..U+0FFF	Tibetan	256	211	Tibetan (207 characters), Common (4 characters)
0 BMP	U+1000..U+109F	Myanmar	160	160	Myanmar
0 BMP	U+10A0..U+10FF	Georgian	96	88	Georgian (87 characters), Common (1 character)
0 BMP	U+1100..U+11FF	Hangul Jamo	256	256	Hangul
0 BMP	U+1200..U+137F	Ethiopic	384	358	Ethiopic
0 BMP	U+1380..U+139F	Ethiopic Supplement	32	26	Ethiopic
0 BMP	U+13A0..U+13FF	Cherokee	96	92	Cherokee
0 BMP	U+1400..U+167F	Unified Canadian Aboriginal Syllabics	640	640	Canadian Aboriginal
0 BMP	U+1680..U+169F	Ogham	32	29	Ogham
0 BMP	U+16A0..U+16FF	Runic	96	89	Runic (86 characters), Common (3 characters)
0 BMP	U+1700..U+171F	Tagalog	32	23	Tagalog
0 BMP	U+1720..U+173F	Hanunoo	32	23	Hanunoo (21 characters), Common (2 characters)
0 BMP	U+1740..U+175F	Buhid	32	20	Buhid
0 BMP	U+1760..U+177F	Tagbanwa	32	18	Tagbanwa
0 BMP	U+1780..U+17FF	Khmer	128	114	Khmer
0 BMP	U+1800..U+18AF	Mongolian	176	158	Mongolian (155 characters), Common (3 characters)
0 BMP	U+18B0..U+18FF	Unified Canadian Aboriginal Syllabics Extended	80	70	Canadian Aboriginal
0 BMP	U+1900..U+194F	Limbu	80	68	Limbu
0 BMP	U+1950..U+197F	Tai Le	48	35	Tai Le
0 BMP	U+1980..U+19DF	New Tai Lue	96	83	New Tai Lue
0 BMP	U+19E0..U+19FF	Khmer Symbols	32	32	Khmer
0 BMP	U+1A00..U+1A1F	Buginese	32	30	Buginese
0 BMP	U+1A20..U+1AAF	Tai Tham	144	127	Tai Tham
0 BMP	U+1AB0..U+1AFF	Combining Diacritical Marks Extended	80	31	Inherited
0 BMP	U+1B00..U+1B7F	Balinese

[1]

[2]

[3]

[4]

[a]

[b]

[c]

[d]

[e]

[f]

[g]

[h]