!!!Meeting setup * Date: 12.3.2008 * Time: 09.30 Norw. time * Place: Internet * Tools: SubEthaEdit, iChat/Skype !!!Agenda Cf. one of the following, depending on context: * the upper bar of the SEE window (provided you use the JSPWiki syntax mode) * the TOC in Forrest-rendered output, like HTML and PDF !!!Opening, agenda review, participants Opened at 10:30. Present: __Børre, Per-Eric, Sjur, Thomas, Tomi__ Absent: __Maaren, Trond__ Agenda accepted as is. !!!Updated task status since last meeting !!Børre * start to reorganise the documentation * gather {{sma}} texts * improve forrest stability with i18n, site look * set up the Leopard Server features for collaborative support * Hunspell lexicon conversion * InDesign documentation * investigate the NSIS installer * give contract with blank fields to __Per-Eric__ * [fix bugs!|http://giellatekno.uit.no/bugzilla] !!Lene * Ped project !!Maaren * Put the list of possible {{sma}} corpus sources into a document * update the ''Changes'' document !!Per-Eric * keep the contact with Kurt Tores family about his texts. ** Nothing * try to find other authors who have smj texts digitaly, send contracts to them ** Working on it, find some * Work with a few things missing from the bible texts. ** Worked and still working * [fix bugs!|http://giellatekno.uit.no/bugzilla] ** Nothing done !!Saara * add new XSL/XML headers for proofing test docs * Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not) * discuss more parallel texts !!Sjur * start to reorganise the documentation ** nothing * gather {{sma}} texts ** presented the project in Trondheim, will get some text as a result * improve forrest stability with i18n, site look ** nothing * set up the Leopard Server features for collaborative support ** nothing * name db/risten.no ** nothing useful * investigate the NSIS installer ** nothing * make a first {{sma}} project plan ** nothing new * publish corpus contracts and project infra as open-source on NoDaLi-sta ** nothing * call Julie Eira about Maaren's share of the Divvun project ** done. Maaren can do as much as she finds time for, and as much as we need her * prepare Trondheim presentation ** done * [fix bugs!|http://giellatekno.uit.no/bugzilla] !!Thomas * look at test cases still not behaving properly ** not much done * [fix bugs!|http://giellatekno.uit.no/bugzilla] ** worked some !!Tomi * Hunspell lexicon conversion ** not done * document how compounding is controlled in the PLX conversion ** not done * fix double hyphen bugs ** not done * [fix bugs!|http://giellatekno.uit.no/bugzilla] ** not done * other ** worked with XML processing ** initial work for making sfst transducer(s) !!Trond * Reorganise documentation (with Børre and Sjur) ** Fiddling around myself, awaiting a plenary discussion. * Gather sma texts (with Børre and Sjur) ** Not done * Name lexicon project: Test editing xml files (when they are ready for it) ** Waiting * Work on {{sma}} analyser and visl integration ** Huge progress here. Started including the Errata sheet from Verbh, still things to be done. * [fix bugs!|http://giellatekno.uit.no/bugzilla]. !!!Pedagogical software online __TODO:__ * get an easy-to-remember URL (__UiT/IT__) * More thorough skin, layout, ... (__External person within the Ped team__, __Internal forrest expert__) This we will postpone until later !!!Documentation __TODO:__ * start to reorganise the documentation (__Børre, Sjur, Trond__) !!!Corpus gathering The Trondheim visit and presentation has brought both {{smj}} and {{sma}} texts. The {{smj}} texts are translations by __Samuel Gælok__ of public reports, for which also {{sme}} translations should exists (as well as an English), and the Norwegian original. __TODO:__ * follow-up on the {{smj}} texts from __Kurt Tore__ (__Per-Eric__) * get texts from __Sigga Tuolja Sandstrøm__, possibly through __Olavi Korhonen__ (contract is ok now) (__Per-Eric__) * other contacts: Nord-Salten avis, Børge Strandskog, Lena Davidsson daughter to Lars-Matto Tuolja * gather {{sma}} texts (__Børre, Sjur, Trond, Joseph__) ** __Sjur__ got some important contacts in Trondheim * Put the list of possible corpus sources into a document {{gt/doc/lang/sma/sma-corpus-plan.jspwiki}} (__Maaren__) * give contract with blank fields to __Per-Eric__ (__Børre__) !!!Infrastructure __TODO:__ * add Jabber account in iChat (__all__) * improve forrest stability with i18n, site look (__Børre, Sjur, Tomi__) * set up the Leopard Server features for collaborative support - permanent chat rooms, project calendar(s), wiki? (__Børre, Sjur__) !!!Linguistics !!North Sámi Hyphenation bugs still there, now properly documented by the improved test bench. !!Lule Sámi All missing lists now added. A few things missing from the bible texts. Hyphenation: same as for {{sme}}. __TODO:__ * {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and increase {{smj}} coverage (__Trond, Svenne__). * Add the words when all words are ready. !!South Sámi __Tomi__ has added even more verbs. The corrections to ''Verbh'' are now added to the lexicon. __Joseph__ is hunting for {{sma}} texts at SD. !!!Name lexicon infrastructure __TODO:__ # fix i18n bug in risten.no/G5 (so they will work without the proper locale request) (__Sjur__) ## it works ok locally, set-up / config needs to be checked on the G5; probably easy to fix ### it works the same both locally and on the G5, relates to i18n setup in forrest # fix bugs in lexc2xml; add comments to the log element (__Saara__) # finish first version of the editing (__Sjur__) # test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__) # make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__) # convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__) # implement data synchronisation between [risten.no|http://www.risten.no] and the cvs repo, and possibly other servers (ie the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way) # start to use the xml file as source file # clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (__Thomas, Maaren, linguists__) # merge placenames which are errouneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__) # publish the name lexicon on risten.no (__Sjur__) # add missing parallel names for placenames (__linguists__) # add informative links between first names like Niillas and Nils (__linguists__) !!!Proofing tools !!Hunspell __TODO:__ # add {{smj}} to the soup, make sure it works roughly as good as {{sme}} # fix the remaining conversion bugs for {{sme}} # return to {{smj}}, and fix whatever is left to fix # integrate the derivations as separate "continuation lexicons" !!Testing !Spelling Error Markup __TODO:__ * Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (__Saara__) * test new and nested error markup (__Sjur__) !!Speller bugs Open issues based on test results : !sme Version: __Davvisámi, version 1.0.1, 2008-02-17__ * 426 - comp words from Divvun.no - ''guoktedássásaš'' accepted - still open * 435 - roman number - single letter numbers now recognised ** we should pregenerate all numbers once and for all, and store them in a separate lexicon file * 595 - prefix+name wihtout hyphen (''ovdaLot'' instead of ''ovda-Lot'') * 600 - __REGRESSION:__ gen+hyph compound ''sámi-dáru'' * 603 - suomabealdi, norggabealdi accepted * 606 - speller accepts VUOHTA compound * 607 - acro + hyphen **''NRKGA'' is acro + clitic accepted without colon - what is correct? * 611 - double hyphen sugg still accepted * 613 - short gen. as second compound part * 619 - numerals and pronouns to NAMÁK and SASJ fails * 627 - prefix + hyhpen does not get accepted * 629 - ''a'' taking part in compounding without hyphen * 633 - double hyphens accepted in Word, not by cmdline speller * 634 - PropGen+hyph+PropGen * 641 - numeral+noun compounds * 642 - noun/adj/proper + hyphen + ain * 644 - cased numeral+numeral compund * 646 - adverb + hyphen + noun * 647 - numerals+NOUN * 648 - unmotivated suggestions with numeral+noun * 649 - name + adj compound without hyphen * 654 - speller does not recognize ordinals on -nuppelogát * 655 - pron + nai * 658 - Suggestion saame * 660 - abbr. not recognised !smj Version: __Julevsáme, version 1.0.1, 2008-02-14__ * 435 - roman number - single letter numbers now recognised ** we should pregenerate all numbers once and for all, and store them in a separate lexicon file * 595 - prefix+name wihtout hyphen (''tsåhkeLot'' instead of ''tsåhke-Lot'') * 600 - __REGRESSION:__ gen+hyph compound ''sáme-dáro'' * 607 - acro + hyphen **''NRKGA'' is acro + clitic accepted without colon - what is correct? * 616 - Bispadime-me-ráden - still __OPEN__, try to find an acro or abbr ''me'' * 619 - numerals and pronouns to NAMÁK and SASJ fails - still __OPEN__ * 629 - ''a'' taking part in compound - still __OPEN__ * 634 - rop gen + hyphen + Prop gen - still __OPEN__ * 641 - numeral+noun compounds - still __OPEN__ * 644 - cased numeral+numeral compund * 647 - numerals+NOUN * 648 - unmotivated suggestions with numeral+noun * 649 - name + adj compound without hyphen * 650 - noun prefix+name compound without hyphen * 658 - Suggestion saame __TODO:__ * look at test cases still not behaving properly (__Thomas, Tomi__) * document how compounding is controlled in the PLX conversion (__Tomi__) !!Hyphenator bugs Open issues based on test results : !sme * 468 - ''Márkomeanu'' -> Polderland - __FIXED__ * 548 - ''duostan'' -> Polderland - __FIXED__ * 549 - missing hyph at word boundary -> Polderland - __FIXED__ * 633 - extra hyphen inserted -> Divvun - __FIXED__ There are still some bugs found in the wordtypes test file. They should be added to Bugzilla. __TODO:__ * add remaining hyphenation bugs to Bugzilla (__Thomas__) !smj * 549 - missing hyph at word boundary -> Polderland - __FIXED__ * 633 - extra hyphen inserted -> Polderland - __FIXED__ * 636 - hyphen before last char -> Divvun The wordtypes test file does contain another problem, but that one belongs to Polderland, and is reported. __TODO:__ * lexicalise ''europarádeministarjuogos'' (__Thomas__) * try to fix 636 (__Thomas__) !!InDesign tools We're waiting for an update from Polderland. !!Windows installer __TODO:__ * investigate the NSIS installer (__Børre, Sjur__) !!Releases __TODO:__ * update the ''Changes'' document (__Maaren__) * documentation (__Sjur__) ** Norwegian translation received from Davvi Girji !!!Other !!Corpus contracts + open source TODO: * publish corpus contracts and project infra as open-source on NoDaLi-sta (__Sjur__) !!Easter vacation Please report all ''real'' vacation days (as opposed to taking extra hours off) to __Ellen Mienna Guttorm__ or __Julie Eira__. || Name || Period | Børre | 10.-16.3. (winter holidays) | Sjur | 17.-24.3 | Thomas | 17.-24.3 | Jovsset | 17.-24.3. | Maaren | 19.-24.3. | Per-Eric | 25.-28.3. | Tomi | nothing special, writing on the thesis !!!Next meeting, closing The next meeting is 25.3.2008. The meeting was closed at 11:29. !!!Appendix - task lists for the next five days !!Boerre [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Boerre.ics] * start to reorganise the documentation * gather {{sma}} texts * improve forrest stability with i18n, site look * set up the Leopard Server features for collaborative support * Hunspell lexicon conversion * InDesign documentation * investigate the NSIS installer * give contract with blank fields to __Per-Eric__ * [fix bugs!|http://giellatekno.uit.no/bugzilla] !!Lene * Ped project !!Maaren [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Maaren.ics] * Put the list of possible {{sma}} corpus sources into a document * update the ''Changes'' document !!Per-Eric [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Per-Eric.ics] * keep the contact with Kurt Tores family about his texts. * try to find other authors who have smj texts digitaly, send contracts to them * Work with missing list from the bible texts. * Keep the contact with Ulf-Stefan Winka who has many more smj texts to add. * [fix bugs!|http://giellatekno.uit.no/bugzilla] !!Saara [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Saara.ics] * add new XSL/XML headers for proofing test docs * Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not) * discuss more parallel texts !!Sjur [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Sjur.ics] * start to reorganise the documentation * gather {{sma}} texts * improve forrest stability with i18n, site look * set up the Leopard Server features for collaborative support * name db/risten.no * investigate the NSIS installer * make a first {{sma}} project plan * publish corpus contracts and project infra as open-source on NoDaLi-sta * [fix bugs!|http://giellatekno.uit.no/bugzilla] !!Thomas [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Thomas.ics] * look at test cases still not behaving properly * add remaining hyphenation bugs to Bugzilla * lexicalise ''europarádeministarjuogos'' * try to fix 636 * [fix bugs!|http://giellatekno.uit.no/bugzilla] !!Tomi [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Tomi.ics] * Hunspell lexicon conversion * document how compounding is controlled in the PLX conversion * fix double hyphen bugs * [fix bugs!|http://giellatekno.uit.no/bugzilla] !!Trond [iCal|/doc/admin/weekly/2008/Tasks_2008-03-12_Trond.ics] * Reorganise documentation (with Børre and Sjur) * Gather sma texts (with Børre and Sjur) * Name lexicon project: Test editing xml files (when they are ready for it) * Work on {{sma}} analyser and visl integration * [fix bugs!|http://giellatekno.uit.no/bugzilla].