BKUNICOD.RVW   960907

"The Unicode Standard, Version 2.0", The Unicode Consortium, 1996, 0-201-483455-9
%A   The Unicode Consortium unicode-inc@unicode.org
%C   1 Jacob Way, Reading, MA   01867-9984
%D   1996
%G   0-201-483455-9
%I   Addison-Wesley Publishing Co.
%O   800-527-5210 617-944-3700 bkexpress@aw.com
%T   "The Unicode Standard, Version 2.0"

In the dim and distant past, the late, and generally unlamented, SUZY information system was born in Vancouver. Rather an oddball as far as online services went, SUZY had one "feature": the programmer had tried to allow the use of all of the IBM graphics characters. This led to an entirely new field of "smiley" or "emoticon" (emotional icon) endeavors. Instead of the usual sideways happy face made from a colon, hyphen and right parenthesis (":-)"), we were able to use the "Ctrl-A" character of the IBM PC character set. With a decimal value of one, this character displays as an upright happy face. It allowed other expansions, such as Ctrl-A followed by a right square bracket, which looks like a face beside a telephone handset and was used (usually in the "chat" modes) for "I am on the phone."

"How nice," I hear you mutter between clenched teeth. "Can we now get on with the review?"

Patience, stout nerds. This *is* the review.

As SUZY users, particularly those who had been introduced to computer communications on the system, moved on to other services or local bulletin boards, they were usually quite shocked to find that their favorite symbols no longer worked. The little heart (Ctrl-C) would kill a message on a VAX. Fidonet users might find that the cute tagline they had formed from graphics characters completely disappeared when they sent the message through an Internet gateway.

ASCII (the American Standard Code for Information Interchange) is widely, and mistakenly, believed to define two hundred and fifty-six characters. It doesn't.
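To make the seven-bit point concrete, here is a minimal Python sketch. It is not from the book, and the names `LOW_CODES`, `classify_ascii` and `decodes_as_ascii` are hypothetical. It counts the control and printable characters among ASCII's 128 code points, shows that a byte value above 127 is not ASCII at all, and pairs the ASCII control names for Ctrl-A through Ctrl-F with the glyphs the IBM PC character set (code page 437) displays for the same values. The glyph table is typed in by hand from the usual code page 437 chart, because Python's own `cp437` codec maps these low values to control characters rather than to the pictures.

```python
# Minimal sketch (not from the book): ASCII versus the IBM PC character set.
# LOW_CODES, classify_ascii and decodes_as_ascii are hypothetical names.
import unicodedata

ASCII_SIZE = 2 ** 7  # seven bits give 128 code points, 0 through 127

# ASCII control names for values 1-6, paired with the glyphs that the IBM PC
# character set (code page 437) shows for the same values. Hand-typed table.
LOW_CODES = {
    1: ("SOH", "\u263A"),  # start of heading / white smiling face
    2: ("STX", "\u263B"),  # start of text / black smiling face
    3: ("ETX", "\u2665"),  # end of text / heart suit
    4: ("EOT", "\u2666"),  # end of transmission / diamond suit
    5: ("ENQ", "\u2663"),  # enquiry / club suit
    6: ("ACK", "\u2660"),  # acknowledge / spade suit
}


def classify_ascii():
    """Split the 128 ASCII code points into control and printable lists."""
    controls = [c for c in range(ASCII_SIZE) if c < 0x20 or c == 0x7F]
    printable = [c for c in range(ASCII_SIZE) if 0x20 <= c <= 0x7E]
    return controls, printable


def decodes_as_ascii(value: int) -> bool:
    """Return True if a single byte with this value is valid ASCII."""
    try:
        bytes([value]).decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    controls, printable = classify_ascii()
    print(len(controls), "control,", len(printable), "printable")
    print(decodes_as_ascii(0x7F), decodes_as_ascii(0x80))
    for value, (name, glyph) in LOW_CODES.items():
        # Ctrl-A is value 1, Ctrl-B is 2, and so on ('@' is value 64).
        print(f"Ctrl-{chr(ord('@') + value)} = {value}: "
              f"ASCII {name}, IBM PC glyph {unicodedata.name(glyph)}")
```

Run as a script, it should print `33 control, 95 printable`, then `True False`, then one line per control code, for example `Ctrl-C = 3: ASCII ETX, IBM PC glyph BLACK HEART SUIT`. The glyphs are reported by their Unicode names so the output stays plain ASCII.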
Furthermore, of the hundred and twenty-eight characters it does define, many are "control" rather than printable characters. (The "card suit" symbols in the IBM PC graphics set are defined as "end of text", "end of transmission", "enquiry" and "acknowledge" under the real ASCII standard.) In addition, many believe ASCII to be a universal standard; that is also not true. An octet with the decimal value thirty-five, for example, is the number sign (sometimes called an "octothorpe") in the United States, but a pound sign (the British currency symbol) in the British national version. As with most fields of computer endeavour, the nice thing about standards is that there are so many to choose from. Many vary only slightly, but they vary.

The point is that there are a number of symbols which we commonly know, but which cannot be consistently displayed on terminals or printers. Certain terminals have "international" character sets, but not all of these are identical. Accents and other phonetic modifiers may be difficult to handle: entire character sets are given over strictly to accented characters. (In Canada we are acutely aware of the problems, with "French" keyboards used at many sites. On one, I was having difficulty finding some punctuation marks needed for network addressing, and asked a Francophone programmer for help. "Who knows," he growled, "I never use the ____ things!")

Unicode seeks to address this problem. It includes not only the variations on the Latin alphabet, but also Greek, Cyrillic, Hebrew and other alphabets, along with punctuation, diacriticals, mathematical and scientific symbols, and miscellaneous graphics. Asian ideographs are also assigned codes. A repertoire this large obviously cannot fit in a seven-bit code, and Unicode is based on a sixteen-bit code space.

The first five chapters give background, general principles, rules for conformance, character properties, and implementation guidelines.
To comment on these in any meaningful way would be to rewrite the book. This is technical material, though not the kind of technology that computer types are used to. Some background study in linguistics would be a good idea, although it is not strictly necessary in order to understand and use the Unicode standard. There is, however, a wealth of symbols, punctuation marks and typesetting codes to which Unicode gives standardized access. On the other hand, any application which used the standard in a significant way would likely require a linguistics background in any case.

The bulk of the book (now reduced to a single volume) is, of course, taken up with the actual code charts. The charts are augmented with verbal definitions of the symbols, and with cross-references to similar forms. A CD-ROM is included, which contains text-only (ASCII) code charts, as well as Unicode documents for the full charts.

The Unicode standard is recent, and in comparative terms its current usage is negligible. However, it is the de facto standard for broadly based international character sets.

copyright Robert M. Slade, 1993, 1996   BKUNICOD.RVW   960907

======================
roberts@decus.ca         rslade@vcn.bc.ca         slade@freenet.victoria.bc.ca
link to virus, book info at http://www.freenet.victoria.bc.ca/techrev/rms.html
Author "Robert Slade's Guide to Computer Viruses" 0-387-94663-2 (800-SPRINGER)