! Divvun & Giellatekno - open source grammars for Sámi and other languages ! Copyright © 2000-2010 The University of Tromsø & the Norwegian Sámi Parliament ! http://giellatekno.uit.no & http://divvun.no ! ! This program is free software; you can redistribute and/or modify ! this file under the terms of the GNU General Public License as published by ! the Free Software Foundation, either version 3 of the License, or ! (at your option) any later version. The GNU General Public License ! is found at http://www.gnu.org/licenses/gpl.html. It is ! also available in the file $GTHOME/LICENSE.txt. ! ! Other licensing options are available upon request, please contact ! giellatekno@hum.uit.no or divvun@samediggi.no Multichar_Symbols @U.DATE.1@ @U.DATE.2@ @U.DATE.3@ @U.DATE.4@ @U.DATE.5@ @U.DATE.6@ @U.DATE.7@ @U.DATE.8@ @U.DATE.9@ @U.DATE.10@ @U.DATE.11@ @U.DATE.12@ @U.DATE.13@ @U.DATE.14@ @U.DATE.15@ @U.DATE.16@ @U.DATE.17@ @U.DATE.18@ @U.DATE.19@ @U.DATE.20@ @U.DATE.21@ @U.DATE.22@ @U.DATE.23@ @U.DATE.24@ @U.DATE.25@ @U.DATE.26@ @U.DATE.27@ @U.DATE.28@ @U.DATE.29@ @U.DATE.30@ @U.DATE.31@ +Use/NG ! Do not generate, for isme-ped.fst and apertium +String ! Tag to denote non-numeric strings +NumNum ! Tag to denote real numbers in one form or another ! Note! the diacritic marks in this file are only part of the lower ! representation. This means they can only be used from number to letter ! conversion of date and month representation, not the other way around. ! In order to get a two-way conversion, they must be added to the ! upper level as well. LEXICON Root ! < [a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|q|y|z|æ|ø|å|ä|ö|á|č|đ|ŋ|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|Æ|Ø|Å|Ä|Ö|Á|Č|Đ|Ŋ|Š|Ŧ|Ž]* > # ; ! This first line is to allow all letter strings. ! The regex in the previous line is now commented out. If you want this fst to ! pass through all letters untouched (but transform all numbers) ! then comment it in again. It becomes critical for the invers sme-inum.fst, that ! does not work with this line. If it really is needed, we might filter it out ! for sme-inum.fst. NUMBERSECTION ; ! Numerals COMMASECTION ; ! Separator section DATESECTION ; ! Date expressions (for TTS) ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! Number section !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! LEXICON NUMBERSECTION HUNDREDSM ; ! 200M čuođi:1 HUNDREDM ; ! 100M TENSM ; ! 20-99M TEENSM ; ! 10-19M ONESM ; ! 1-9M ONESG ; ! 1-9G HUNDREDST ; ! 200000-999999 čuođi:1 HUNDREDT ; ! 100000-100999 TENST ; ! 20000-99999,10000-10999 TEENST ; ! 11000-19999 ONEST ; ! 2000-9999 duhát:1 THOUSAND ; ! 1000-1999 UNDERTHOUSAND ; ! 100-999 TENS ; ! 20-99 TEENS ; ! 10-19 ONES ; ! 1-9 NUMJAHKASAS ; LEXICON ONESG miljárda:1 OVERMILLIONS ; guokte:2 GIGA ; golbma:3 GIGA ; njeallje:4 GIGA ; vihtta:5 GIGA ; guhtta:6 GIGA ; čieža:7 GIGA ; gávcci:8 GIGA ; ovcci:9 GIGA ; LEXICON GIGA miljárdda: OVERMILLIONS ; LEXICON HUNDREDSM guokte:2 CUODIM ; golbma:3 CUODIM ; njeallje:4 CUODIM ; vihtta:5 CUODIM ; guhtta:6 CUODIM ; čieža:7 CUODIM ; gávcci:8 CUODIM ; ovcci:9 CUODIM ; LEXICON CUODIM čuođi: HUNDREDM ; LEXICON HUNDREDM TENSM ; TEENSM ; :%0 ONESM ; :%0%0 MILJON ; LEXICON TEENSM :1 TEENM ; LEXICON TEENM okta:1 LOHKAIM ; akta+Use/NG:1 LOHKAIM ; guokte:2 LOHKAIM ; golbma:3 LOHKAIM ; njeallje:4 LOHKAIM ; vihtta:5 LOHKAIM ; guhtta:6 LOHKAIM ; čieža:7 LOHKAIM ; gávcci:8 LOHKAIM ; ovcci:9 LOHKAIM ; LEXICON LOHKAIM nuppelohkái: MILJON ; LEXICON TENSM logi:1%0 MILJON ; guokte:2 LOGIM ; golbma:3 LOGIM ; njeallje:4 LOGIM ; vihtta:5 LOGIM ; guhtta:6 LOGIM ; čieža:7 LOGIM ; gávcci:8 LOGIM ; ovcci:9 LOGIM ; LEXICON LOGIM logi:%0 MILJON ; logi: ONESM ; LEXICON ONESM miljon:1 OVERTHOUSANDS ; guokte:2 MILJON ; golbma:3 MILJON ; njeallje:4 MILJON ; vihtta:5 MILJON ; guhtta:6 MILJON ; čieža:7 MILJON ; gávcci:8 MILJON ; ovcci:9 MILJON ; LEXICON MILJON miljovnna: OVERTHOUSANDS ; ! ================= ! Under the million ! ================= LEXICON OVERMILLIONS HUNDREDSM ; čuođi:1 HUNDREDM ; :%0 TENSM ; :%0 TEENSM ; :%0%0 MILJON ; :%0%0 ONESM ; :%0%0%0 HUNDREDST ; ! x00.000.000 čuođi:%0%0%01 HUNDREDT ; ! 100.000.000 :%0%0%0%0 TENST ; ! 20.000.000 :%0%0%0%0 TEENST ; ! 10.000.000 :%0%0%0%0%0 ONEST ; ! 2.000.000 duhát:%0%0%0%0%01 THOUSAND ; ! 1.000.000 :%0%0%0%0%0%0 UNDERTHOUSAND ; :%0%0%0%0%0%0%0 TENS ; :%0%0%0%0%0%0%0 TEENS ; :%0%0%0%0%0%0%0%0 ONES ; :%0%0%0%0%0%0%0%0%0 COMMA ; ! x000000 LEXICON OVERTHOUSANDS HUNDREDST ; ! x00.000.000 čuođi:1 HUNDREDT ; ! 100.000.000 :%0 TENST ; ! 20.000.000 :%0 TEENST ; ! 10.000.000 :%0%0 ONEST ; ! 2.000.000 duhát:%0%01 THOUSAND ; ! 1.000.000 :%0%0%0 UNDERTHOUSAND ; :%0%0%0%0 TENS ; :%0%0%0%0 TEENS ; :%0%0%0%0%0 ONES ; :%0%0%0%0%0%0 COMMA ; ! x000000 LEXICON HUNDREDST guokte:2 CUODIT ; ! 200000-299999 golbma:3 CUODIT ; ! 300000-399999 njeallje:4 CUODIT ; ! 400000-499999 vihtta:5 CUODIT ; ! 500000-599999 guhtta:6 CUODIT ; ! 600000-699999 čieža:7 CUODIT ; ! 700000-799999 gávcci:8 CUODIT ; ! 800000-899999 ovcci:9 CUODIT ; ! 900000-999999 LEXICON CUODIT čuođi: HUNDREDT ; ! 100000-199999 LEXICON HUNDREDT ! X = 1-9, Y = 0-9 TENST ; ! X2XYYY, X10YYY TEENST ; ! X1XYYY okta:%01 THOUSANDS ; akta+Use/NG:%01 THOUSANDS ; :%0 ONEST ; ! XX0YYY :%0%0 THOUSANDS ; ! X00YYY LEXICON TEENST :1 TEENT ; LEXICON TEENT logi:%0 THOUSANDS ; okta:1 LOHKAIT ; akta+Use/NG:1 LOHKAIT ; guokte:2 LOHKAIT ; golbma:3 LOHKAIT ; njeallje:4 LOHKAIT ; vihtta:5 LOHKAIT ; guhtta:6 LOHKAIT ; čieža:7 LOHKAIT ; gávcci:8 LOHKAIT ; ovcci:9 LOHKAIT ; LEXICON LOHKAIT nuppelohkái: THOUSANDS ; LEXICON TENST guoktelogi:2 LOGIT ; golbmalogi:3 LOGIT ; njealljelogi:4 LOGIT ; vihttalogi:5 LOGIT ; guhttalogi:6 LOGIT ; čiežalogi:7 LOGIT ; gávccilogi:8 LOGIT ; ovccilogi:9 LOGIT ; LEXICON LOGIT :%0 THOUSANDS ; okta:1 THOUSANDS ; akta+Use/NG:1 THOUSANDS ; ONEST ; LEXICON ONEST guokte:2 THOUSANDS ; golbma:3 THOUSANDS ; njeallje:4 THOUSANDS ; vihtta:5 THOUSANDS ; guhtta:6 THOUSANDS ; čieža:7 THOUSANDS ; gávcci:8 THOUSANDS ; ovcci:9 THOUSANDS ; LEXICON THOUSANDS ! x > 1 duhát: THOUSAND ; LEXICON THOUSAND UNDERTHOUSAND ; ! 1100-1999 :%0 TENS ; ! 1020-1099 :%0 TEENS ; ! 1010-1019 :%0%0 ONES ; ! 1001-1009 :%0%0%0 COMMA ; ! 1000 !=========================== !Here starts the 999 numbers !=========================== LEXICON UNDERTHOUSAND HUNDREDS ; čuođi:1 HUNDRED ; čuođát:1 CUODAAT ; LEXICON HUNDREDS guokte:2 CUODI ; golbma:3 CUODI ; njeallje:4 CUODI ; vihtta:5 CUODI ; guhtta:6 CUODI ; čieža:7 CUODI ; gávcci:8 CUODI ; ovcci:9 CUODI ; LEXICON CUODI čuođi: HUNDRED ; čuođát: CUODAAT ; LEXICON HUNDRED TENS ; TEENS ; :%0 ONES ; :%0%0 COMMA ; LEXICON CUODAAT :%0%0. COMMA ; LEXICON TEENS :1 TEEN ; LEXICON TEEN okta:1 LOHKAI ; akta+Use/NG:1 LOHKAI ; guokte:2 LOHKAI ; golbma:3 LOHKAI ; njeallje:4 LOHKAI ; vihtta:5 LOHKAI ; guhtta:6 LOHKAI ; čieža:7 LOHKAI ; gávcci:8 LOHKAI ; ovcci:9 LOHKAI ; LEXICON LOHKAI nuppelohkái: COMMA ; nuppelot: JAHKASAS ; :%. LOGAAT ; LEXICON LOGAAT nuppelogát: COMMA ; LEXICON TENS logi:1%0 COMMA ; logi:1%0 JAHKASAS ; logát:1%0%. COMMA ; guokte:2 LOGI ; golbma:3 LOGI ; njeallje:4 LOGI ; vihtta:5 LOGI ; guhtta:6 LOGI ; čieža:7 LOGI ; gávcci:8 LOGI ; ovcci:9 LOGI ; LEXICON LOGI logi:%0 COMMA ; logi: NUMJAHKASAS ; !logát:%0 COMMA ; logi: ONES ; lot:%0 JAHKASAS ; lot:%0 LOHKU ; LEXICON NUMJAHKASAS ! experiment for TTS guovtte:2 JAHKASAS ; golmma:3 JAHKASAS ; njealje:4 JAHKASAS ; viđa:5 JAHKASAS ; guđa:6 JAHKASAS ; čieža:7 JAHKASAS ; gávcci:8 JAHKASAS ; ovcci:9 JAHKASAS ; LEXICON LOHKU ! experiment for TTS logu:-logu ENDLEX ; logus:-logus ENDLEX ; lohkui:-lohkui ENDLEX ; loguin:-loguin ENDLEX ; LEXICON JAHKASAS ! experiment for TTS jahkása:-jahkása JAHKASAS_CASE ; gearddása:-gearddása JAHKASAS_CASE ; mánnosa:-mánnosa JAHKASAS_CASE ; vahkkosa:-vahkkosa JAHKASAS_CASE ; LEXICON JAHKASAS_CASE ! experiment for TTS š ENDLEX ; čča ENDLEX ; ččat ENDLEX ; ččaid ENDLEX ; žžii ENDLEX ; ččaide ENDLEX ; ččas ENDLEX ; ččain ENDLEX ; ččaiguin ENDLEX ; žžan ENDLEX ; LEXICON ONES CARDINAL ; ORDINAL ; LEXICON CARDINAL okta:1 COMMA ; akta+Use/NG:1 COMMA ; guokte:2 COMMA ; golbma:3 COMMA ; njeallje:4 COMMA ; vihtta:5 COMMA ; guhtta:6 COMMA ; čieža:7 COMMA ; gávcci:8 COMMA ; ovcci:9 COMMA ; LEXICON ORDINAL vuosttaš:1%. COMMA ; nubbi:2%. COMMA ; goalmmát:3%. COMMA ; njealját:4%. COMMA ; viđát:5%. COMMA ; guđát:6%. COMMA ; čihččet:7%. COMMA ; gávccát:8%. COMMA ; ovccát:9%. COMMA ; ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! Comma section !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! LEXICON COMMA ENDLEX ; COMMASECTION ; LEXICON COMMASECTION ENDLEX ; % komma% :, NUMBERSECTION ; % čuokkis% :%. NUMBERSECTION ; ! these will overlap with dates % kolon% :%: NUMBERSECTION ; % sárggis% :%- NUMBERSECTION ; % lea% :%= NUMBERSECTION ; % gráda% :%° NUMBERSECTION ; % paragráfa% :§ NUMBERSECTION ; % násti% :%* NUMBERSECTION ; % ja% :& NUMBERSECTION ; ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! Date section !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!! !!!!!!!!!!!!!!!!!!! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! [(0)[1-9]|[12][0-9]|3[01]]\.[3[01]]\.[([12][0-9])[0-9][0-9]]] ! 1.1.1934 = ođđajagimánu vuosttaš beaivi ovccinuppelotčuođigolbmaloginjeallje ! 3.11.2009 = skábmamánu goalmmát beaivi guokteduhátovcci ! 5.6.09 = geassemánu viđát beaivi nulla ovcci ! are there other formats in use? ! E.g. ská. 3. - ie combinations of abbr text and numbers? ! what about other orders? In Sweden, the year typically comes first: ! 2009.11.3 ! Is this order found in Sámi texts, eg. from Sweden? ! and irrespective of written order, should the spoken order always be the same? ! Strategy: ! Use flag diacritics: Do the dates in one batch and the months in another ! Under this regime, we do this: ! skábmamánu { goalmmát | njealjjat } ! 3.@U.d.3@ 11. @U.d.3@ @U.d.4@ ! Then, the flag diacritic will remove non-harmonic strings @R.3@...@D.4@ ! And only the correct 3-3 string will be left intact. LEXICON DATESECTION :%0 SMALLDATEDAYMARK ; ! Eventual leading zero for one-digit day SMALLDATEDAYMARK ; ! No leading zero for one-digit day LARGEDATEDAYMARK ; ! No leading zero for two-digit day LEXICON SMALLDATEDAYMARK :@U.DATE.1@1 DATEMONTHSEPARATOR ; :@U.DATE.2@2 DATEMONTHSEPARATOR ; :@U.DATE.3@3 DATEMONTHSEPARATOR ; :@U.DATE.4@4 DATEMONTHSEPARATOR ; :@U.DATE.5@5 DATEMONTHSEPARATOR ; :@U.DATE.6@6 DATEMONTHSEPARATOR ; :@U.DATE.7@7 DATEMONTHSEPARATOR ; :@U.DATE.8@8 DATEMONTHSEPARATOR ; :@U.DATE.9@9 DATEMONTHSEPARATOR ; LEXICON LARGEDATEDAYMARK :@U.DATE.10@1%0 DATEMONTHSEPARATOR ; :@U.DATE.11@11 DATEMONTHSEPARATOR ; :@U.DATE.12@12 DATEMONTHSEPARATOR ; :@U.DATE.13@13 DATEMONTHSEPARATOR ; :@U.DATE.14@14 DATEMONTHSEPARATOR ; :@U.DATE.15@15 DATEMONTHSEPARATOR ; :@U.DATE.16@16 DATEMONTHSEPARATOR ; :@U.DATE.17@17 DATEMONTHSEPARATOR ; :@U.DATE.18@18 DATEMONTHSEPARATOR ; :@U.DATE.19@19 DATEMONTHSEPARATOR ; :@U.DATE.20@2%0 DATEMONTHSEPARATOR ; :@U.DATE.21@21 DATEMONTHSEPARATOR ; :@U.DATE.22@22 DATEMONTHSEPARATOR ; :@U.DATE.23@23 DATEMONTHSEPARATOR ; :@U.DATE.24@24 DATEMONTHSEPARATOR ; :@U.DATE.25@25 DATEMONTHSEPARATOR ; :@U.DATE.26@26 DATEMONTHSEPARATOR ; :@U.DATE.27@27 DATEMONTHSEPARATOR ; :@U.DATE.28@28 DATEMONTHSEPARATOR ; :@U.DATE.29@29 DATEMONTHSEPARATOR ; :@U.DATE.30@3%0 DATEMONTHSEPARATOR ; :@U.DATE.31@31 DATEMONTHSEPARATOR ; LEXICON DATEMONTHSEPARATOR % :%. MONTHZERO ; LEXICON MONTHZERO :%0 SMALLMONTH ; ! Eventual leading zero SMALLMONTH ; ! no leading zero LARGEMONTH ; LEXICON SMALLMONTH ođđajágimánu:1 DATEDAYSEPARATOR ; guovvamánu:2 DATEDAYSEPARATOR ; njukčamánu:3 DATEDAYSEPARATOR ; cuoŋománu:4 DATEDAYSEPARATOR ; miessemánu:5 DATEDAYSEPARATOR ; geassemánu:6 DATEDAYSEPARATOR ; suoidnemánu:7 DATEDAYSEPARATOR ; borgemánu:8 DATEDAYSEPARATOR ; čakčamánu:9 DATEDAYSEPARATOR ; LEXICON LARGEMONTH golggotmánu:1%0 DATEDAYSEPARATOR ; skábmamánu:11 DATEDAYSEPARATOR ; juovlamánu:12 DATEDAYSEPARATOR ; LEXICON DATEDAYSEPARATOR % :%. DATENAME ; LEXICON DATENAME vuosttaš:@U.DATE.1@ DAYTEXT ; nubbi:@U.DATE.2@ DAYTEXT ; goalmmát:@U.DATE.3@ DAYTEXT ; njealját:@U.DATE.4@ DAYTEXT ; viđát:@U.DATE.5@ DAYTEXT ; guđát:@U.DATE.6@ DAYTEXT ; čihččet:@U.DATE.7@ DAYTEXT ; gávccát:@U.DATE.8@ DAYTEXT ; ovccát:@U.DATE.9@ DAYTEXT ; logát:@U.DATE.10@ DAYTEXT ; oktanuppelogát:@U.DATE.11@ DAYTEXT ; aktanuppelogát+Use/NG:@U.DATE.11@ DAYTEXT ; guoktenuppelogát:@U.DATE.12@ DAYTEXT ; golbmanuppelogát:@U.DATE.13@ DAYTEXT ; njealljenuppelogát:@U.DATE.14@ DAYTEXT ; vihttanuppelogát:@U.DATE.15@ DAYTEXT ; guhttanuppelogát:@U.DATE.16@ DAYTEXT ; čiežanuppelogát:@U.DATE.17@ DAYTEXT ; gávccinuppelogát:@U.DATE.18@ DAYTEXT ; ovccinuppelogát:@U.DATE.19@ DAYTEXT ; guoktelogát:@U.DATE.20@ DAYTEXT ; guoktelogivuosttaš:@U.DATE.21@ DAYTEXT ; guokteloginubbi:@U.DATE.22@ DAYTEXT ; guoktelogigoalmmát:@U.DATE.23@ DAYTEXT ; guokteloginjealját:@U.DATE.24@ DAYTEXT ; guoktelogiviđát:@U.DATE.25@ DAYTEXT ; guoktelogiguđát:@U.DATE.26@ DAYTEXT ; guoktelogičihččet:@U.DATE.27@ DAYTEXT ; guoktelogigávccát:@U.DATE.28@ DAYTEXT ; guoktelogiovccát:@U.DATE.29@ DAYTEXT ; golbmalogát:@U.DATE.30@ DAYTEXT ; golbmalogivuosttaš:@U.DATE.31@ DAYTEXT ; LEXICON DAYTEXT % beaivi% : DATEYEAR ; LEXICON DATEYEAR guokte:2 THOUSANDS ; ! 2000 -> golbma:3 THOUSANDS ; ! etc.-> njeallje:4 THOUSANDS ; vihtta:5 THOUSANDS ; guhtta:6 THOUSANDS ; čieža:7 THOUSANDS ; gávcci:8 THOUSANDS ; ovcci:9 THOUSANDS ; YEARTEENS ; ! 1000 - 1999 YEARHUNDREDS ; ! 100-999 TWODIGITYEAR ; ! 10 - 99 nolla% :%0 ONES ; ! 01-09 ENDLEX ; ! no year LEXICON YEARTEENS :1 YEARTEEN ; LEXICON YEARTEEN okta:1 YEARLOGAI ; akta+Use/NG:1 YEARLOGAI ; guokte:2 YEARLOGAI ; golbma:3 YEARLOGAI ; njeallje:4 YEARLOGAI ; vihtta:5 YEARLOGAI ; guhtta:6 YEARLOGAI ; čieža:7 YEARLOGAI ; gávcci:8 YEARLOGAI ; ovcci:9 YEARLOGAI ; LEXICON YEARLOGAI nuppelohkái: TWODIGITYEAR ; LEXICON YEARHUNDREDS okta:1 YEARHUNDRED ; akta+Use/NG:1 YEARHUNDRED ; guokte:2 YEARHUNDRED ; golbma:3 YEARHUNDRED ; njeallje:4 YEARHUNDRED ; vihtta:5 YEARHUNDRED ; guhtta:6 YEARHUNDRED ; čieža:7 YEARHUNDRED ; gávcci:8 YEARHUNDRED ; ovcci:9 YEARHUNDRED ; LEXICON YEARHUNDRED čuođi: TWODIGITYEAR ; LEXICON TWODIGITYEAR TENS ; ! 20-99 TEENS ; ! 10-19 LEXICON ENDLEX # ;