!!!TODO for maintaining our dictionaries

!!Intro

At present, we have six bilingual dictionaries:
* smesmj: North Sámi - Lule Sámi (words/dicts/smesmj/smesmj.xml)
* smenob: North Sámi - Bokmål Norwegian (words/dicts/smenob/smenob.xml)
* fkvnob: Kven - Bokmål (kvensk/fkvnob/fkvnob.xml)
* nobfkv: Bokmål - Kven (kvensk/nobfkv/nobfkv.xml)
* kaldan: Greenlandic - Danish (st/kal/src/kaldan-lex.txt)
* komfin: Komi - Finnish, English (kt/kom/src/kom-lex.xml)

We would like them to serve several purposes:
# As resources for a glossing function for the analysis
# As input to an online dictionary via XQuery
# As a transfer lexicon for machine translation
# As input to browsable html dictionaries online

Status quo as of 08.08.08:
# Glossing function for the analysis: smenob and kaldan already serve this purpose
# Online dictionary via XQuery: up and running in principle, but the G5 is down at the moment
# Machine translation: project not started
# Browsable html dictionaries online: the html can be generated as soon as the xml is ready, but it is not online

The main obstacle used to be the internal discussion at the Sámi parliament; now it is the machine situation. We will have to discuss the schedule with Sjur.

!!Common challenges

!Feedback

We need feedback on the dictionary work. One possibility would be to set up a beta page for all these dictionary projects: either one beta page with links to all of them, or a separate page for each language.

__TODO:__
* Set up an XQuery framework for all of them
* Make the necessary pages
* A further development would be to direct inflected wordforms to the dictionary via an analysis function

Status quo: We have a framework, but it is cumbersome. Cf. {{gtsvn/xtdoc/gtuit/src/documentation/content/xdocs/dict.eng.xml}} (forthcoming in the documentation).

We have a dictionary containing inflected wordforms: the smenob Mac OS 10.5 dictionary.

!Work environment

The lexicographers' workbench is still an open question.
Our plan was to use XMLEditor, but it seems to have problems when the dictionary files become too large. One alternative is Oxygen. This is not a free program, and it is unclear whether it offers a graphical interface (via xscheme).

TODO: Look into this issue and test alternative editors.
* We could make an in-house application, based on perl/QT

!Infrastructure

__TODO__:
* eXist on G5 - done
* risten.no on G5 - homepage done, but the port needs tweaking/changing
** some work, possibly with Børre
* risten.no XQuery framework - NOT done
** this should be easy and quick
* XInclude conversion script - NOT done
** this might take some work
* homepage on the documentation pages - draft done
* Make a routine for synchronising eXist and the dictionaries.

The dictionaries are stored here (the name to the left of '=' refers to the catalogue in termdb/src/db-colls/dicts/, and the path to the right refers to the xml file accessed by the interface):
* catalogue = original
* fkvnob = kvensk/fkvnob/fkvnob.xml
* kaldan = st/kal/src/kaldan-lex.txt (format kaldan)
* komi-JMR = kt/kom/src/kom-lex.xml
* kotussanalista (this one is not updated)
* nobfkv = kvensk/nobfkv/nobfkv.xml
* nobsme = (words/dicts/nobsme/src/nobsme.xml) (does not exist yet)
* smenob = words/dicts/smenob/src/smenob.xml
* smesmj = words/dicts/smesmj/src/smesmj.xml (format smesmjpos)
* smjsme = (words/dicts/smjsme/src/smjsme.xml) (does not exist yet)

!!smesmj
* words/dicts/smenob/smenob.xml

The smesmj.xml file is still formatted as a csv file. Since the two languages are very similar, a transfer lexicon basically only needs to contain one-to-one lemma pairs.

TODO: Look at the Makefile and improve it (although it does work).

!!smenob
* words/dicts/smenob/smenob.xml
* words/dicts/smenob/Makefile

The smenob.xml file is an xml file. It is based on input from a school dictionary (Nils Jernsletten) and the risten.no termbase. We are about to write translations for the missing entries.
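As a rough aid for finding the missing entries, a small script could list every lemma that has no translation yet. This is a minimal Python sketch, not part of our current tool chain. It assumes a hypothetical entry layout with the lemma in {{lg/l}} and the translations in {{mg/tg/t}} inside each {{e}} element; these element names are placeholders and must be adjusted to whatever smenob.xml actually uses.

```python
import sys
import xml.etree.ElementTree as ET


def missing_translations(root):
    """Return the lemmas of all entries that have no non-empty translation.

    Assumed (hypothetical) entry layout, to be adapted to smenob.xml:
    <e><lg><l>lemma</l></lg><mg><tg><t>translation</t></tg></mg></e>
    """
    missing = []
    for entry in root.iter("e"):
        lemma = entry.findtext("lg/l", default="").strip()
        # Only count translations that contain more than whitespace.
        translations = [t.text for t in entry.iter("t") if t.text and t.text.strip()]
        if not translations:
            missing.append(lemma)
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python missing.py words/dicts/smenob/src/smenob.xml
    for lemma in missing_translations(ET.parse(sys.argv[1]).getroot()):
        print(lemma)
```

Empty {{t}} elements are treated the same as absent ones, so placeholder slots left in the file also show up in the list.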
TODO:
* The smenob.xml file should be sorted. We need a function sortdic to keep xml files sorted
* The Makefile should make the targets smenob.fst and smenob.html
** smenob.fst: pair lemma and first translation ( and )
** smenob.html: represent the content of the
* The smenob.xml file should be set up to serve a glossing function, like the kaldan one
** In order to do that, we need another version of lookup2cg, one that does not make new compounds. Apart from that, it should keep the rating system (choose the reading with the fewest compound boundaries). The reason is that smenob.xml may then be able to find the parts of the compound and give translations for them.
* The smenob.xml file should be exported to XQuery, like the komfin one

Today the Makefile contains some amateurish attempts at fulfilling the first of these goals, via a shell script. Making a Twig routine for pick

!!fkvnob

This is an xml file with an accompanying dtd, css and xsl. It is part of our cooperation with Kvensk Institutt, and is based on Terje Aronsen's dictionary. The dictionary is now being completed at Kvensk Institutt. In order to facilitate the work, and also to get comments, we need a beta version of it online.

TODO:
* Put the fkvnob dictionary online via the XQuery framework

!!komfin

This project is not active at the moment; it serves as an example of how to add the other dictionaries to the XQuery format. The Komi file is in xml format.

TODO:
* Add this dictionary to the dictionary beta page

!!kaldan

The Greenlandic file stems from an Access database in Nuuk; we will not edit our version. It serves as an example for the smenob glossing function.

!!nobsme

Immediately after we publish smenob, there will be a demand for nobsme. The dictionary nobsme should be made in the following way:

Quasicode for creating nobsme.xml:
# The xml structure is the same as for smenob.xml.
# The source file is words/dicts/smenob/src/smenob.xml
# If within there is one , make a new as follows:
## ->
## ->
## ->
## ->
# If within there is more than one , then make one for each
## for each :
## ->
## ->
## ->
## ->
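The quasicode above amounts to inverting every lemma-translation pair, with one new entry per translation. A minimal Python sketch follows. Since the element names are not spelled out in the quasicode here, it assumes (hypothetically) that an entry {{e}} holds the lemma in {{lg/l}} and the translations in {{mg/tg/t}}, and that attributes such as pos simply travel with each word; these are placeholders, not the actual smenob.xml schema.

```python
import xml.etree.ElementTree as ET


def invert_dictionary(src_root):
    """Build a nobsme-style tree from a smenob-style tree.

    Assumed (hypothetical) layout:
    <r><e><lg><l pos="..">sme</l></lg><mg><tg><t pos="..">nob</t></tg></mg></e></r>
    Every non-empty <t> yields its own new entry, so an entry with
    n translations becomes n entries in the output.
    """
    out_root = ET.Element("r")
    for entry in src_root.iter("e"):
        lemma = entry.find("lg/l")
        if lemma is None or not (lemma.text or "").strip():
            continue  # skip entries without a usable lemma
        for t in entry.iter("t"):
            if not (t.text or "").strip():
                continue  # untranslated slot: nothing to invert
            new_e = ET.SubElement(out_root, "e")
            # The Bokmål translation becomes the new lemma (keeping its attributes) ...
            new_l = ET.SubElement(ET.SubElement(new_e, "lg"), "l", t.attrib)
            new_l.text = t.text.strip()
            # ... and the North Sámi lemma becomes the new translation.
            tg = ET.SubElement(ET.SubElement(new_e, "mg"), "tg")
            new_t = ET.SubElement(tg, "t", lemma.attrib)
            new_t.text = lemma.text.strip()
    return out_root
```

To write the result to disk, one could use {{ET.ElementTree(invert_dictionary(ET.parse(path).getroot())).write("nobsme.xml", encoding="utf-8", xml_declaration=True)}}. The sketch deliberately does not merge entries that end up with the same Bokmål lemma; merging and sorting would be a separate step.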