Localisation

Northern Sámi

At present (January 2002), the Northern Sámi project is run in a 7-bit fashion, with digraphs (a1, c1, d1, n1, s1, t1, z1) for the 7 Sámi letters. This is an ad hoc solution. We hope to migrate either to UTF-8 or to an 8-bit format, either ISO-IR 197 or Latin 4. Both the Linux localisers and the Xerox tools manuals claim UTF-8 compatibility, so in theory this should be possible. Still, Xerox advises us to use an 8-bit solution internally. This must be sorted out.
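A migration from the digraph notation to UTF-8 could look roughly like the sketch below. The mapping from digraphs to letters (a1 = á, c1 = č, d1 = đ, n1 = ŋ, s1 = š, t1 = ŧ, z1 = ž) and the upper-case variants (A1, C1, ...) are assumptions, and the function names are made up for illustration.

```python
import re

# Assumed mapping: the digraphs are listed in the text above,
# but the letter each one stands for is our guess.
DIGRAPHS = {
    "a1": "á", "c1": "č", "d1": "đ", "n1": "ŋ",
    "s1": "š", "t1": "ŧ", "z1": "ž",
}
# Assumed upper-case convention (A1 -> Á, C1 -> Č, ...).
DIGRAPHS.update({k.upper(): v.upper() for k, v in list(DIGRAPHS.items())})

# One pattern that matches any of the known digraphs.
_PATTERN = re.compile("|".join(map(re.escape, DIGRAPHS)))


def digraphs_to_unicode(text):
    """Replace every known digraph in text with its Unicode letter."""
    return _PATTERN.sub(lambda m: DIGRAPHS[m.group(0)], text)


def convert_bytes(data):
    """Convert 7-bit digraph-encoded bytes to UTF-8 bytes."""
    return digraphs_to_unicode(data.decode("ascii")).encode("utf-8")


print(digraphs_to_unicode("Sa1megiella"))
```

Note that a plain substitution like this also rewrites any literal "a1", "c1" etc. that was not meant as a digraph, so the input would have to be checked for such cases first.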

Should we go for UTF-8, the following must be in place:

Latin 4 and ISO-IR 197, the two 8-bit code tables, are both supported by iconv, and both contain the required symbols. Of the two, Latin 4 might be better supported, but ISO-IR 197 is a true superset of the alphabetic repertoire of Latin 1 and should thus give no compatibility problems with Latin 1 input. Since the general name files use virtually every Latin 1 character, this compatibility makes ISO-IR 197 better suited than Latin 4. Furthermore, ISO-IR 197 makes it possible to represent Skolt and Inari Sámi as well. In the long run, Unicode and UTF-8 are still the desired output, and migrating directly from 7-bit to UTF-8 seems a better solution. Support in Emacs, shells, etc. is crucial.
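The repertoire claims above can be checked mechanically. The following is a minimal sketch using Python's built-in codecs; the helper name `unencodable` is made up for illustration. Python's standard codecs do not appear to include ISO-IR 197, so only Latin 4 (`iso8859_4`) and Latin 1 (`latin_1`) are tested here.

```python
def unencodable(text, codec):
    """Return the distinct characters of text that codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# The seven Northern Sámi letters, lower and upper case.
NORTHERN_SAMI = "áčđŋšŧžÁČĐŊŠŦŽ"

# Latin 4 should cover all of them; Latin 1 only covers á and Á.
print("Missing from Latin 4:", unencodable(NORTHERN_SAMI, "iso8859_4"))
print("Missing from Latin 1:", unencodable(NORTHERN_SAMI, "latin_1"))

# Some Latin 1 letters that Latin 4 does not have.
print("Latin 1 letters missing from Latin 4:", unencodable("ñèç", "iso8859_4"))
```

The last check illustrates why Latin 1 input may not survive a move to Latin 4 unchanged.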

Lule and Southern Sámi

Southern and Lule Sámi are adequately represented in 8-bit format by Latin 1 (the Lule Sámi files use ñ for n-acute). These files can be carried over to both ISO-IR 197 and (?) Latin 4, or they can be kept in Latin 1.
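If these Latin 1 files were instead taken directly to UTF-8, the ñ placeholder could be replaced by a real n-acute on the way. The sketch below assumes that every ñ/Ñ in the Lule Sámi files stands for ń/Ń and nothing else; the function name and the sample word are made up for illustration.

```python
def lule_latin1_to_utf8(data):
    """Decode Latin 1 bytes, restore n-acute, and return UTF-8 bytes.

    Assumption: every n-tilde in the input is a stand-in for n-acute.
    """
    text = data.decode("latin_1")
    text = text.replace("ñ", "ń").replace("Ñ", "Ń")
    return text.encode("utf-8")


# b"\xf1" is ñ in Latin 1; the sample bytes are illustrative only.
print(lule_latin1_to_utf8(b"ma\xf1\xf1el").decode("utf-8"))
```

Files that contain a genuine ñ (for example in foreign names) would need to be handled separately.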

Inari and Skolt Sámi

Inari Sámi may be represented in the current 8-bit format, with the vowels directly in 8-bit Latin 1 and the consonants as digraphs c1 etc.

Note that it would be much harder to represent Skolt Sámi the way sketched for Inari Sámi. The localisation issue should thus be solved before Skolt Sámi can be included.


Last modified: Fri May 17 00:10:06 CEST 2002