!!!Meeting setup

* Date: 19.11.2007
* Time: 09:30 Norwegian time
* Place: Internet
* Tools: SubEthaEdit, iChat/Skype

!!!Agenda

# Opening, agenda review
# Reviewing the task list from last week
# Documentation - divvun.no
# Corpus gathering
# Corpus infrastructure
# Infrastructure
# Linguistics
# Name lexicon infrastructure
# Spellers
# Other issues
# Summary, task lists
# Closing

!!!1. Opening, agenda review, participants

Opened at 10:15.

Present: __Børre, Ilona, Per-Eric, Sjur, Thomas, Tomi__

Absent: __Risten, Trond__

Agenda accepted as is.

!!!2. Updated task status since last meeting

!!Børre

* move __Steinar's__ error markup in the xml files to (a copy of) the original
** not done
* fix Unicode bug in Hunspell conversion java code
** don't know the reason for this one
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
** not done
* move Bugzilla to the G5.
** done
* fix Windows CD installation bug
** not done
* discuss more parallel texts
** nothing con
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Ilona

* lexicalise {{smj}} missing words
** Done, at least most of it.
* other {{smj}} tasks, ask __Thomas__

!!Maaren

* lexicalise actio compounds

!!Per-Eric

* lexicalise words from the Olavi missing list
** Worked on this and still working. It will be ready this week; some strange words are left. We have to make a new missing list to see which words are left.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** Nothing to fix

!!Risten

* finish the design/text for the CD and the cover
** done
* set up CD-printing printer
* try to burn a CD at SD
** done
* get price and schedule for printed CD cover

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* add nested error markup to xml conversion
** almost finished, just some testing left
* discuss more parallel texts

!!Sjur

* work on the XML name editor/risten.no integration
** nothing - this will have to wait till Divvun2
* set up risten.no on the G5 again
** really tried last week, to help __Trond__ with some work, but failed (eXist couldn't reliably restore files from a local backup, leaving the dictionary and term collections incomplete and useless; and Forrest refused to change port despite explicit requests for it, and port-crashed with the existing divvun.no site running off the same computer, which made the whole portal dysfunctional)
* test new and nested error markup
** __Saara__ tried getting it in place, but is not finished with her work, thus nothing to test yet
* get command line hyphenator for automated testing of the hyph-lexicons
** done
* add hyphenation testing
** first rough version done
* improve paradigm testing
** done
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
** not yet
* follow-up support for the Sámi languages in OpenOffice.org
** done, they will be included in the next OOo release - 2.4
* fix Windows CD installation bug
** not yet done
* fix circularity issue in nonrec transducers
** __Tomi__ did this on his own - great!
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
** not this week
* check for bad hyphenation
** not this week
* look at test cases still not behaving properly
** worked on it
* paradigm testing
** worked a lot on it
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** worked on it

!!Tomi

* Hunspell lexicon conversion
** not done
* fix circularity issue in nonrec transducers
** fixed
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
* other
** installed Leopard

!!Trond

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* fix hyphenation of derivations, inflections
* telephone meeting with __Sjur__ and the Faroese group re Faroese speller
* fix circularity issue in nonrec transducers
* discuss more parallel texts
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!!3. Documentation

Nothing new.

__TODO:__

* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550] (__Børre, Sjur__)

!!!4. Corpus infrastructure

Nothing.

!!!5. Infrastructure

Bugzilla is up and running again.

__TODO:__

* add Jabber account in iChat (__all__)

!!!6. Linguistics

!!North Sámi

__TODO:__

* fix hyphenation of derivations (__Thomas, Tomi, Sjur, Trond__)
** now tested, and much improved, but still needs improvements and further investigation
* fix circularity issue (__Sjur, Tomi, Trond__)
** __Tomi__ fixed it

!!Lule Sámi

__TODO:__

* lexicalise words from the Olavi missing list, but check against the pdf original where in doubt (__Per-Eric__)
** almost finished - only parts of the letter ''r'' still missing
* {{sme->smj}} lexicon conversion to build bilingual lexicon resources, and increase {{smj}} coverage (__Trond, Thomas, Svenne__)
* look at missing baseforms (__Thomas__)
** done

!!!7. Name lexicon infrastructure

This sub-project needs to get up and running soon. Mainly __Sjur's__ task.
Decisions made in Tromsø can be found in [this meeting memo.|/admin/physical_meetings/tromso-2006-08-propnoun.html]

__TODO:__

# set up Tomcat and risten.no on the G5 again (__Sjur, Børre__)
## install risten.no
### really tried, but got problems
# fix bugs in lexc2xml; add comments to the log element (__Saara__)
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__)
# implement data synchronisation between [risten.no|http://www.risten.no] and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils (__linguists__)

!!!8. Proofing tools

!!Hunspell

__TODO:__

# Follow-up support for the Sámi languages in OpenOffice.org (__Børre, Sjur__)
## done, scheduled for version 2.4
# Hunspell lexicon conversion (__Tomi, Børre__)
## improved: noun compilation is much improved, adjectives also convert, but with errors, and verbs are quite rough

!!Testing

!Spelling Error Markup

This will wait till after the release.
__TODO:__

* Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (__Saara__)
* move __Steinar's__ error markup in the xml files to (a copy of) the original (__Børre, Kimme__)
* add nested error markup to xml conversion (__Saara__)
* test new and nested error markup (__Sjur__)

!Automated testing

__TODO:__

* add hyphenation testing (__Sjur__)
** rough version added
* improve paradigm testing report (__Sjur__)
** done

!MS Office

An important aspect of this testing is to document in the user guide anything that could be a problem for users.

__TODO:__

* test the proofing tools with all MS Office applications (__Børre, Thomas__)

!!Lexicon conversion to the PLX format

PLX conversion update: we will soon get an updated speller engine with flags for marking word-initial, word-internal and word-final positions. That will make it possible for us to resolve some outstanding issues in the PLX conversion.

Open issues based on test results:

!smj

482 - still problematic (prefix), 484 - double hyphens suggested; 552 - now fixed!, Svierigadárogielan - still rejected (prefix)

!sme

397 - double hyphens (name+name), 419, 425 - Roman numeral, 431 - does not accept the correct string, but does suggest the same; also hyphen-final forms are accepted, but not the same form when part of a compound, 452 - ''miel'' is a prefix, 461 - ''ovda'' accepted, almost 50 % (17) gets correct suggestion, 489, 522, 524, Guovdageainnu-láđđi not accepted.

{{{
Guovdageaidnu-láđđi nom-
Guovdageainnu-láđđi gen-
}}}

It should be Guovdageaidnu-láđđi OR Guovdageainnu láđđi. The first one is suggested + Guovdageainnutláđđi

''Harstad-biila'' (nom) is ok, whereas gen. ''Harstada-biila'' is not, i.e. the same pattern.
{{{
ovda- ovda- ovda+Cmpnd
}}}

__TODO:__

* look at test cases still not behaving properly (__Thomas, Tomi__)

!!InDesign tools

__TODO:__

* add hyphenation testing (__Sjur__)
** done but not finished

!!Hyphenators

We should look into the possibility of generating pattern-based hyphenation for OOo. It shouldn't be too hard, or require too much work, but needs investigation. => Divvun2.

__TODO:__

* get command line hyphenator (__Sjur__)
** done

!!Release version

Schedule and tasks for the remaining weeks:

__TODO:__

* try to burn a CD at SD (__Risten, Leif-Åge__)
** done, it is working exactly as one burned on the Mac
* finish text to go on the CD cover (__Risten__)
** done
* set up CD-printing printer (__Risten__)
** in the works
* fix Windows CD installation bug (__Sjur, Børre__)
** not yet done
* get price and schedule for printed CD cover (__Risten__)
** not received
* fix remaining bugs - golden master by next Monday (__all__)
* finalise InDesign hyphenator (__Sjur, Børre, Thomas__)
** testing
** documentation
** installation
* update usage and installation documentation (__Børre, Thomas, Sjur__)
* translate all new documentation (__all__)
* QA all documentation (__all__)
* do as much Hunspell as possible (__Børre, Tomi__)

!!Actual release

One of the days December 11-13. Hotel rooms received for all except Ilona; one will be received for her as well. There will be a release party in the afternoon.

!!!9. Other

!!Corpus contracts

Delayed till after final release.

__TODO:__

* publish corpus contracts and project infra as open-source on NoDaLi-sta (__Sjur__)

!!Faroese

Speller for {{fao}} using our infrastructure and the knowledge we have.

__TODO:__

* set up a telephone meeting with them and __Sjur__ (__Trond__)

!!Bug fixing

When fixing bugs, record the version number containing the fix in the Bugzilla bug report, such that for each bug, we know exactly when it should have been fixed, in what file(s) and what version.
__69__ open Divvun/Disamb bugs (__35__ of these 56 are speller-related bugs, __34__ are other bugs), and __23__ risten.no bugs

!!Dictionaries

__TODO:__

* eXist on G5 - done
* risten.no on G5 - homepage done, but needs port tweaking/changing
* risten.no XQuery framework - done
* XInclude conversion script - done
* homepage on the documentation pages - draft done

!!Parallel corpora

__TODO:__

* discuss more parallel texts (__Børre, Saara, Trond__)

!!SD yearly personnel seminar

6.-7. December. __Sjur__ has discussed it with Julia, and we won't go there.

!!Software updates

* Leopard, 10.5
** Ilona - Doesn't have a proper DVD yet.
** Trond
** Per-Eric

!!!10. Next meeting, closing

The next meeting is 26.11.2007, 09:30 Norwegian time.

The meeting was closed at 11:34.

!!!Appendix - task lists for the next week

!!Børre

* move __Steinar's__ error markup in the xml files to (a copy of) the original
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
* fix Windows CD installation bug
* discuss more parallel texts
* finalise InDesign hyphenator
* update usage and installation documentation
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Ilona

* lexicalise {{smj}} missing words
* other {{smj}} tasks, ask __Thomas__
* Buy the DVD for Leopard

!!Maaren

* lexicalise actio compounds

!!Per-Eric

* lexicalise words from the Olavi missing list
* derivations tests
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Risten

* set up CD-printing printer
* get price and schedule for printed CD cover

!!Saara

* add new XSL/XML headers for proofing test docs
* Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
* add nested error markup to xml conversion
* discuss more parallel texts

!!Sjur

* work on the XML name editor/risten.no integration
* set up risten.no on the G5 again
* test new and nested error markup
* improve hyphenation testing
* fix [bug 550|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=550]
* fix Windows CD installation bug
* finalise InDesign hyphenator
* update usage and installation documentation
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Thomas

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* check for bad hyphenation
* look at test cases still not behaving properly
* paradigm testing
* test the proofing tools with all MS Office applications
* finalise InDesign hyphenator
* update usage and installation documentation
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Tomi

* Hunspell lexicon conversion
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!!Trond

* {{sme->smj}} lexicon conversion to build bilingual lexicon resources
* telephone meeting with __Sjur__ and the Faroese group re Faroese speller
* discuss more parallel texts
* [fix bugs!|http://giellatekno.uit.no/bugzilla]