!!!Meeting setup

* Date: 29.10.2007
* Time: 09.30 Norw. time
* Place: Internet
* Tools: SubEthaEdit, iChat/Skype

!!!Agenda

# Opening, agenda review
# Reviewing the task list from last week
# Documentation - divvun.no
# Corpus gathering
# Corpus infrastructure
# Infrastructure
# Linguistics
# name lexicon infrastructure
# Spellers
# Other issues
# Summary, task lists
# Closing

!!!1. Opening, agenda review, participants

Opened at 10:15.

Present: __Børre, Per-Eric, Risten, Sjur, Thomas, Tomi, Trond__

Absent: __Ilona__

Agenda accepted as is.

!!!2. Updated task status since last meeting

!!Børre

* move __Steinar's__ error markup in the xml files to (a copy of) the original
* begin adding support for the sami languages in OpenOffice.org
** Began making locales, __Sjur__ added a request for spelling
* fix Unicode bug in Hunspell conversion java code
** not done
* test closed POSes in Hunspell speller
** did some testing
* buy InDesign CS3: one Mac upgrade, one Mac full, one Windows
** done
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Ilona

* lexicalise {{smj}} missing words
* add {{smj}} proper nouns
* other {{smj}} tasks

!!Maaren

* lexicalise actio compounds

!!Per-Eric

* lexicalise words from the Olavi missing list
** Worked and still working
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** fixed some

!!Risten

* finish the design/text for the CD and the cover
* set up CD-printing printer

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* add nested error markup to xml conversion

!!Sjur

* document the AppleScript speller test output
** done
* work on the XML name editor/risten.no integration
** nothing
* set up risten.no on the G5 again
** nope
* test new and nested error markup
** waiting for __Saara__
* get command line hyphenator for
automated testing of the hyph-lexicons
** still not received
* add hyphenation testing
** waiting for command line hyphenator
* add hunspell testing
** looked at it, installed the first alpha in OpenOffice.org
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
** I have some ideas, but nothing done yet
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** reported new ones
* other:
** requested support for [Sámi languages|http://www.openoffice.org/issues/show_bug.cgi?id=82927] in speller language list in OOo
** tested installation from CD, on both Windows and Mac

!!Thomas

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
** not anything this week
* add {{smj}} proper nouns
** some added
* check for bad hyphenation
** worked
* look at test cases still not behaving properly
** worked
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** worked

!!Tomi

* Hunspell lexicon conversion
** did some
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
** not done
* fix Unicode bug in Hunspell conversion java code
** not done
* test closed POSes in Hunspell speller
** tested
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** fixed

!!Trond

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
** Worked hard on this issue. Continued the work based on joint work with Thomas. We will
* [fix bugs!|http://giellatekno.uit.no/bugzilla].

!!!3. Documentation

[Bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]

__TODO:__

* fix bug 550 (__Børre, Sjur__)

!!!4. Corpus infrastructure

Nothing.

!!!5. Infrastructure

Nothing except bug 550 (see above).

!!!6. Linguistics

!!North Sámi

No real issues at the moment.

!!Lule Sámi

__Thomas__ has looked at the missing baseforms.

__Trond__: we are generating {{smj}} words from {{sme}} as part of the Univ. project for MT dictionary creation.
__TODO:__

* lexicalise words from the Olavi missing list, but check against the pdf original where in doubt (__Per-Eric__)
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and increase {{smj}} coverage (__Trond, Thomas, Svenne__)
* add proper nouns (__Thomas, Ilona__)
* look at missing baseforms (__Thomas__)

!!!7. Name lexicon infrastructure

This sub-project needs to get up and running soon. Mainly __Sjur's__ task. Decisions made in Tromsø can be found in [this meeting memo.|/admin/physical_meetings/tromso-2006-08-propnoun.html]

__TODO:__

# set up Tomcat and risten.no on the G5 again (__Sjur, Børre__)
## install risten.no
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no; it might be faster and better suited than the official one. Local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils (__linguists__)

!!!8. Proofing tools

!!Hunspell

The initial alpha is working, and looks promising.
__TODO:__

# Begin adding support for the sami languages in OpenOffice.org (__Børre__)
## __Sjur__ started the process
# Hunspell lexicon conversion (__Tomi, Børre__)
## fix Unicode bug in Hunspell conversion java code (__Tomi, Børre__)
### it seems to work now on the G5
## test closed POSes (__Tomi, Børre__)
### done some
## add hunspell testing to the make file (__Sjur__)
### not yet done
## debug and fix remaining conversion issues (__Børre, Tomi__)
### a lot of work to do here :)

!!Testing

!Spelling Error Markup

__TODO:__

* Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (__Saara__)
* move __Steinar's__ error markup in the xml files to (a copy of) the original (__Børre, Kimme__)
* add nested error markup to xml conversion (__Saara__)
* test new and nested error markup (__Sjur__)

!Automated testing

__TODO:__

* document the AppleScript speller test output (__Sjur__)
** done
* add hyphenation testing
** waiting for testing tool

!!Lexicon conversion to the PLX format

Open issues based on test results:

!smj

482 - still problematic, 506 - fixed, 518 - fixed again, 552 - still open, Svierigadárogielan - still rejected

!sme

408 - not really a prefix, but PLX-encoded as such, 419, 423, 425 - Roman numeral, 426 - fixed, 431, 449 - fixed (almost), 452 - ''miel'' is a prefix, 461 - almost 50 % (17) get a correct suggestion, 489, 508 - regression, 518 - fixed again, 522, 524.

__TODO:__

* look at test cases still not behaving properly (__Thomas, Tomi__)

!!InDesign tools

__TODO:__

* add hyphenation testing (__Sjur__)
* buy InDesign CS3: one Mac upgrade, one Mac full, one Windows (__Børre__)
** ordered

!!Hyphenators

We should look into the possibility of generating pattern-based hyphenation for OOo. It shouldn't be too hard, or require too much work, but needs investigation.

__TODO:__

* get command line hyphenator (__Sjur__)

!!Release version

The CD cover etc.
will be worked on by __John-Marcus Kuhmunen__, and will follow the SD design rules. He is now waiting for the text to be put on the CD cover and other places. Network printer for CD printing is ok.

__TODO:__

* write text to go on the CD cover (__Risten__)
* set up CD-printing printer (__Risten__)

!!Actual release

One of the days December 11-13.

!!!9. Other

!!Corpus contracts

Delayed till after final release.

__TODO:__

* publish corpus contracts and project infra as open-source on NoDaLi-sta (__Sjur__)

!!Faroese

Speller for {{fao}} using our infrastructure and the knowledge we have.

!!Bug fixing

When fixing bugs, record the version number containing the fix in the Bugzilla bug report, so that for each bug we know exactly when it should have been fixed, in what file(s) and in what version.

__69__ open Divvun/Disamb bugs (__35__ of these are speller-related bugs, __34__ are other bugs), and __23__ risten.no bugs

!!SD yearly personnel seminar

6.-7. December. __Sjur__ will discuss it with Julia, but our view is that we don't have time to go this year. The release is just a few days later.

!!Software updates

* SubEthaEdit
* Leopard, 10.5
* Skype 2.6.x

!!!10. Next meeting, closing

The next meeting is 5.11.2007, 09:30 Norwegian time.

The meeting was closed at 10:49.
!!!Appendix - task lists for the next week

!!Børre

* move __Steinar's__ error markup in the xml files to (a copy of) the original
* adding support for the sami languages in OpenOffice.org
* fix Unicode bug in Hunspell conversion java code
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Ilona

* lexicalise {{smj}} missing words
* add {{smj}} proper nouns
* other {{smj}} tasks

!!Maaren

* lexicalise actio compounds

!!Per-Eric

* lexicalise words from the Olavi missing list
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Risten

* finish the design/text for the CD and the cover
* set up CD-printing printer

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* add nested error markup to xml conversion

!!Sjur

* work on the XML name editor/risten.no integration
* set up risten.no on the G5 again
* test new and nested error markup
* get command line hyphenator for automated testing of the hyph-lexicons
* add hyphenation testing
* add hunspell testing
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* add {{smj}} proper nouns
* check for bad hyphenation
* look at test cases still not behaving properly
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi

* Hunspell lexicon conversion
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* fix Unicode bug in Hunspell conversion java code
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* [fix bugs!|http://giellatekno.uit.no/bugzilla].