!!!Meeting setup

* Date: 29.01.2007
* Time: 09.00 Norwegian time
* Place: Internet
* Tools: SubEthaEdit, iChat

!!!Agenda

# Opening, agenda review
# Reviewing the task list from last week
# Documentation - divvun.no
# Corpus gathering
# Corpus infrastructure
# Infrastructure
# Linguistics
# Name lexicon infrastructure
# Spellers
# Other issues
# Summary, task lists
# Closing

!!!1. Opening, agenda review, participants

Opened at 10:10.

Present: __Børre, Sjur, Steinar, Thomas, Tomi, Trond__

Absent: __Maaren, Saara__

Agenda accepted as is.

!!!2. Updated task status since last meeting

!! Børre

* send {{smj}} translations to Polderland
** not done
* write form to request corpus user account
** not done
* document how to apply for access to closed corpus, and details on the corpus and its use in general
** not done
* add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation
** not done
* update and fix our documentation and infrastructure as __Steinar__ finds problem areas
** not done
* continue work on script for automatic testing of the spell checker in Word
** not done
* fix {{sme}} texts in corpus this month
** not done
* find missing {{nob}} parallel texts in corpus
** not done
* translate Windows installer text to {{sme}}
** some done
* work on the Polderland data generation (PLX format conversion)
** Concentrate on compounding
*** compounds done
*** some done on numerals
* go through other directories, fix parallelity information for other documents
** not done
* add {{sma}} texts to the corpus repository
** not done
* order Intel Macs
** done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** not done

!! Maaren

* tasks according to Thomas

!! Saara

* fix {{sme}} texts in corpus this month
** done: character issues and MS Word doc table formatting
* send aligned, xml {{nob}} texts to __Kristen__
** done
* fix problems with xml2lexc if needed
* check the problem with pdf-conversion cutting wordforms.
** in progress
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Sjur

* name lexicon:
** restructure interface code for easier maintenance, coding and use
*** a lot of work, but still moving forward too slowly - probably need help with this (__Tomi?__)
** refactor the rest of the SD-terms editor code
** implement missing propnouns editing functions
** implement improvements decided upon in Tromsø
* hire linguist and programmer
** the candidate I contacted for the linguist position has answered. He is very interested and can start April 1.
* publish corpus contracts and project infra on NoDaLi-sta
** not done
* fix stuorra-oslolaš lower case {{o}}
** not done
* write form to request corpus user account
** not done
* document how to apply for access to closed corpus, and details on the corpus and its use in general
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Steinar

* test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to __Børre__. Start: At the front page.
** not done, waiting for a necessary update of our front page of the web site
* Complete the semantic sets in sme-dis.rle
** worked with verbal sets and bird names
* missing lists
** not done
* report conversion errors to __Saara__
** not done
* Look at the actio compound issue when adding from missing lists
** not done
* lexicalise actio compounds. Example: ''vuolggasadji'' vs. ''vuolginsadji''
** not done
* Go through the Num bugs
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Thomas

* refine {{smj}} proper noun lexica, cf. the propernoun-smj-lex.txt
** not this week either
* work with compounding
** awaiting answer from Polder
* lexicalise actio compounds
** redirected to Maaren
* Lack of lowering before hyphen: Twol rewrite.
** not this week either
* Go through the Num bugs
** begun
* fix stuorra-oslolaš lower case {{o}}
** not this week either
* implement discontinuous case inflection for numbers
** done smj
* produce correct number base forms in the analyzer
** done smj
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** not this week

!! Tomi

* add compound stems to the PLX generation
** done
* include numerals in the speller
** cardinals done?
* add prefixes to the PLX
** not done
* add {{smj}} to PLX conversion
** not done
* add derivations to the PLX generation
** not done
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Trond

* update the {{smj}} proper noun lexicon, and refine the morphological analysis, cf. the propernoun-smj-lex.txt
** No smj yet.
* fix {{sme}} texts in corpus this month
** Discussed with Saara and Ilona.
* find missing {{nob}} parallel texts in corpus, go through Saara's list
** Not done.
* report conversion errors to __Saara__
** Not done.
* Go through the Num bugs
** Not done.
* Make numeral testbed for smj as well
** Done.
* Get input on {{sma}} hyphenations
** Done. Improved version is checked in.
* implement discontinuous case inflection for numbers
** Not done.
* produce correct number base forms in the analyzer
** Not done.
* write form to request corpus user account
** Not done.
* document how to apply for access to closed corpus, and details on the corpus and its use in general
** Not done.
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** Done some.

!!!3. Documentation

Nothing done last week.

TODO:

* write form to request corpus user account (__Børre, Sjur, Trond__)
* document how to apply for access to closed corpus, and details on the corpus and its use in general (__Børre, Sjur, Trond__)
* add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation (__Børre__)

!!!4. Corpus gathering

Nothing new.
We need to work systematically on filling the holes in our corpus, although not this month or the next.

TODO:

* {{sme}} texts: no new additions, fix corpus errors during this month (__Børre, Trond, Saara__)
* missing {{nob}} parallel texts should be added if such holes are found (__Børre, Trond__)
* Go through the list of missing or erroneous {{nob}} texts, based upon __Saara's__ perfect list (__Børre, Trond__)
* add {{sma}} texts to the corpus repository (__Børre__)

!!!5. Corpus infrastructure

!!Alignment

__TODO:__

* go through other directories (nob directories, sd directories), fix parallelity information for other documents (2 hours) (__Børre__)
* when aligned, send aligned, xml {{nob}} texts to __Kristin__ (__Saara__)
** done

!!Conversion issues

__TODO:__

* report conversion errors to __Saara__ (__Trond, Steinar__)
* Have a look at the two suggestions for pdf discussed in the previous meeting (__Saara__)
** implemented replacement of {{r vv}} with {{rvv}}. The error arises in the pdf conversion, where the gap between r and the double v is wrongly interpreted as a space character. This concerns only one document, and only r followed by double v.
*** Comment: any word-initial double consonant indicates a spurious space (Sámi has no initial geminates).
** The hyphens in page breaks are now replaced with , although I'm still testing it.

!!!6. Infrastructure

Nothing happened last week.

__TODO:__

* test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to __Børre__. Start: At the front page. (__Steinar__)
* update and fix our documentation and infrastructure as __Steinar__ finds problem areas (__Børre__)

!!!7. Linguistics

!!North Sámi

Maaren is now working on lexicalising the actio compounds.

TODO:

* lexicalise actio compounds. Example: ''vuolggasadji'' vs. ''vuolginsadji'' (__Thomas, Maaren, Steinar__)
* fix stuorra-oslolaš lower case {{o}} (__Sjur, Thomas, Trond__)

!Numbers:

TODO:

* discontinuous case inflection (but only for maximally three-part compound numerals) ({{viđain/goalmmát/logiin}} and {{guvttiin/logiin/viđain}}) (__Thomas, Trond__)
** done {{smj}}
* produce correct number base forms in the analyzer (__Thomas, Trond__)
** done {{smj}}
* Go through the Num bugs (__Trond, Thomas, Steinar__)
** done {{smj}}, one bug #372
* Preprocessing of ordinals at the end of sentences - reported as bug #368. (__Trond__)

!Hyphenation problem

TODO:

* ask Ove Lorentz to report on our {{sma}} hyphenator (__Trond__)
** Done. Still minor problems with handling of all-caps forms, but otherwise ok.

!!Lule Sámi

TODO:

* refine {{smj}} proper noun lexica, cf. the propernoun-smj-lex.txt (__Thomas, Trond__)
* Lack of lowering/fronting before hyphen: Twol rewrite. (__Thomas, Trond__)
* Set up a test bed for numerals, test and revise (__Trond__)
** done
* also done: numbers

!!!8. Name lexicon infrastructure

Decisions made in Tromsø can be found in [the meeting memo|/admin/physical_meetings/tromso-2006-08-propnoun.html].

Postponed:

* data synchronisation between [risten.no|http://www.risten.no] and the cvs repo

TODO:

# restructure interface code for easier maintenance, coding and use
## well under way, still some work
# finish first version of the editing (__Sjur__)
# test editing of the xml files. If ok, then: (__Sjur, Thomas, Trond__)
# make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (__Sjur, Saara__)
# convert propernoun-($lang)-lex.txt to a derived file from common xml files (__Sjur, Tomi, Saara__)
# start to use the xml file as source file
# clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (__Thomas, Maaren, linguists__)
# merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (__linguists__)
# publish the name lexicon on risten.no (__Sjur__)
# add missing parallel names for placenames (__linguists__)
# add informative links between first names like Niillas and Nils (__linguists__)

!!!9. Spellers

!!Polderland data generation

__TODO:__

# add {{smj}} to PLX conversion (__Børre, Tomi__)
# Include numerals in the speller (__Børre, Tomi__)
## first version done, but needs more work
# add prefixes to the PLX (__Børre, Tomi__)
## not yet
# add derivations to the PLX generation (__Børre, Tomi__)

!!Aspell

TODO when the major part of the PLX conversion is done:

* add Aspell/Hunspell data generation to the lexc2xspell (__Tomi__ - after the PLX data generation is finished)
* study Hunspell, perhaps also Soikko (__Børre, Sjur, Tomi__)

!!Testing

__TODO:__

* get an Intel Mac for testing Windows spellers (__Børre__)
** done

!!Localisation

TODO:

* translate Windows installer text to {{sme}} (__Børre, Thomas__)
** some more done, roughly 50 % done
* send {{smj}} translations to Polderland (__Børre__)
** not yet

!!!10. Other

!!Corpus contracts

TODO:

* publish corpus contracts and project infra on NoDaLi-sta (__Sjur__)

!!Bug fixing

__57__ open Divvun/Disamb bugs, and __23__ risten.no bugs

!!KUNSTI final meeting

Conference invitation can be found [here|http://www.forskningsradet.no/servlet/Satellite?c=GenerellArtikkel&cid=1148232784218&p=1088796623254&pagename=kunsti%2FGenerellArtikkel%2FVis_i_dette_menypunkt&site=kunsti].

http://tinyurl.com/326lfy

8.-9. February (Thursday & Friday), Oslo. Thomas could present the morphological work, if he wants to.

!!!11. Next meeting, closing

The next meeting is 5.2.2007, 09:30 Norwegian time.

The meeting was closed at 11:01.

!!!Appendix - task lists for the next week

!! Børre

* send {{smj}} translations to Polderland
* write form to request corpus user account
* document how to apply for access to closed corpus, and details on the corpus and its use in general
* add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation
* update and fix our documentation and infrastructure as __Steinar__ finds problem areas
* continue work on script for automatic testing of the spell checker in Word
* fix {{sme}} texts in corpus this month
* find missing {{nob}} parallel texts in corpus
* translate Windows installer text to {{sme}}
* work on the Polderland data generation (PLX format conversion)
** Concentrate on compounding
* go through other directories, fix parallelity information for other documents
* add {{sma}} texts to the corpus repository
* order Intel Macs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Maaren

* tasks according to Thomas

!! Saara

* fix {{sme}} texts in corpus this month
* continue aligning the rest of the parallel files
* fix problems with xml2lexc if needed
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Sjur

* name lexicon:
** restructure interface code for easier maintenance, coding and use
** refactor the rest of the SD-terms editor code
** implement missing propnouns editing functions
** implement improvements decided upon in Tromsø
* hire linguist and programmer
* publish corpus contracts and project infra on NoDaLi-sta
* fix stuorra-oslolaš lower case {{o}}
* write form to request corpus user account
* document how to apply for access to closed corpus, and details on the corpus and its use in general
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Steinar

* test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to __Børre__. Start: At the front page.
* Complete the semantic sets in sme-dis.rle
* missing lists
* report conversion errors to __Saara__
* Look at the actio compound issue when adding from missing lists
* lexicalise actio compounds. Example: ''vuolggasadji'' vs. ''vuolginsadji''
* Go through the Num bugs
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Thomas

* refine {{smj}} proper noun lexica, cf. the propernoun-smj-lex.txt
* work with compounding
* lexicalise actio compounds
* Lack of lowering before hyphen: Twol rewrite.
* Go through the Num bugs
* fix stuorra-oslolaš lower case {{o}}
* implement discontinuous case inflection for numbers
* produce correct number base forms in the analyzer
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Tomi

* add compound stems to the PLX generation
* include numerals in the speller
* add prefixes to the PLX
* add {{smj}} to PLX conversion
* add derivations to the PLX generation
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Trond

* update the {{smj}} proper noun lexicon, and refine the morphological analysis, cf. the propernoun-smj-lex.txt
* fix {{sme}} texts in corpus this month
* find missing {{nob}} parallel texts in corpus, go through Saara's list
* report conversion errors to __Saara__
* Go through the Num bugs
* implement discontinuous case inflection for numbers
* produce correct number base forms in the analyzer
* write form to request corpus user account
* document how to apply for access to closed corpus, and details on the corpus and its use in general
* Write project presentation
* [fix bugs!|http://giellatekno.uit.no/bugzilla]