!!!Meeting setup

* Date: 04.09.2006
* Time: 09.30 Norw. time
* Place: Where we are
* Tools: SubEthaEdit, iChat

!!!Agenda

# Opening, agenda review
# Reviewing the task list from two weeks ago
# Documentation - divvun.no
# Corpus gathering
# Corpus infrastructure
# Infrastructure
# Linguistics
# name lexicon infrastructure
# Spellers
# Other issues
# Summary, task lists
# Closing

!!!1. Opening, agenda review, participants

Opened at 09:47.

Present: __Børre, Sjur, Thomas, Tomi, Trond__

Absent: __Maaren, Saara__

Agenda accepted as is.

!!!2. Updated task status since last meeting

!! Børre
* corpus collection:
** send out contracts with accompanying letter
** Gather public texts, preferrably also parallel ones
** Send out letters to the rest of the Iđut authors
** contact __Ája__ (Kåfjord), talk to __Lene__
** send contracts to __Čálliid Lágádus__
** contact __Richard Valkepää__ at NSI about older Min Áigi and Áššu files
** contact __Bård Eriksen__ again
* corpus conversion:
** convert nob and nno bible texts to be used as part of a parallel corpus
** convert fin, swe to paratext or directly to our XML
** review the paratext2xml converter
** Move norwegian documents in Min Áigi from sme to nob
* corpus access:
** possibly deploy the user account form as an HTML form
** make a test user
** Write both user and admin documentation (__Børre__, review: __Sjur, Thomas__)
*** User documentation probably in several languages. This covers how to apply
    for an account, on what grounds one can apply, and pointers to documentation
    telling how to use the corpus.
*** Admin documentation, telling how we set the permissions to the corpus files,
   and whatever other processes and tasks needed to set up a corpus account.
* set up Bugzilla automatic reminders for open issues
* create document & document entry for semantic double-tagging of names (for
  __Trond__)
* Update forrests to svn version r430284
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** None of the above done, due to illness
** Made Forrest font embedding in pdf work by using absolute paths;
   worked on Forrest i18n

!! Maaren
* On sick leave

!! Saara
* Create a parallel corpora of the new testaments
* add more texts to the graphical corpus interface
* Implement parallel corpus upload in web upload script
* remove headers and footers from pdf documents
** not done
* Implement server of the analysis tools.
** not done
* Add more languages to the lexc2xml propernoun conversion.
** not done
* Refine the namelex output
** done according to the spec in the last meeting
* finish M4 work
** done
* implement tools for locating problems in the corpus files
** done some
* [fix bugs!|http://giellatekno.uit.no/bugzilla]

!! Sjur
* fst gymnastics to add hyphenation and word boundary marks to hyphenation
  transducer
** done some with __Trond__: we now have a transducer that produces no
   hyphenation in the analysis output. Left: produce a transducer that __only__
   adds hyphenation marks to the input. Basic schema for such a setup is done.
* name lexicon:
** implement editing functions
*** more work, not finished
** finalise refactoring for multiple collections
*** waiting for the CInclude bug to be fixed
** implement improvements decided upon in Tromsø
*** some done, still more, see below
* review user and admin documentation for corpus access
** not done
* write user account form, probably ask for copy of existing ones from the IT
  centre (with Trond)
** not done
* add time-based corpus summary as a feature request to Bugzilla
** done
* check in meeting memos from Tromsø
** done
* start hiring process of linguist and programmer
** started
* order AirPort Express to the Tromsø gang
** done
* install Gobby (using DarwinPorts)
** done
* fix e-mail address for __Thomas__
** __Trond__ fixed him an address at the Univ., __Sjur__ has asked __Roy__ at SD
  to change the mail forwarding - it is done now. We only have to wait for UiTø
  to finish their job
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
** looked at several bugs, and commented / updated them

!! Thomas
* sme G3 issue
** nothing this week
* bug-fixing!
** fixed some
* refine smj proper noun lexica, cf. the propernoun-smj-lex.txt 
** not done
* review user documentation for corpus access
** not done
* find and study all derived verbs in our corpus (depends on __Trond__)
** not done
* suggest which derivations could be generated (depends on __Trond__)
** not done

!! Tomi
* new proper name lexicon
** data synchronisation of proper nouns between risten.no and CVS
*** not done
** XQuery refactoring and code development for our proper noun editor
*** not done
** new version of xml2lexc (based on ccat), should handle complex names correct:
   construct entries like we have now from the different parts of a complex name
   entry
*** not done
** implement improvements decided upon in Tromsø
*** not done
* read aligner docu, install, provide feedback
* Set up the mechanism for the hash-mark transducer package
* test the new xml output of the xml-tagged analyses
* export corpus tools to {{/opt}} (with cron)
* Do the sme Der/ change (with __Trond__)
* consider __Trond's__ suggestion of a makefile for corpus conversion
* [fix bugs!|http://giellatekno.uit.no/bugzilla]
* looked at the CInclude bug
** no luck

!! Trond
* better smj NT text
** Not done
* get fin, swe, nob and nno NT and OT in __paratext__ format
** Not done
* contact __Bergen__ about aligner issues
* Done, made progress (many improvements), we still negotiate a commandline version
* fst gymnastics to add hyphenation and word boundary marks to hyphenation
  transducer
** These were two issues, one is fixed, one to go.  
* make shell script wrappers for the most common commands for user friendlyness
** Not done.
* write user account form, probably ask for copy of existing ones from the IT
  centre (with Sjur)
** Not done  
* write documentation for our {{bound}} users, with pointers to the ordinary 
  documentation  
** Not done  
* write documentation on semantic double-tagging of names
** Not done  
* discuss web-only user access management with Oslo
** Not done  
* change tagging of derived stems in the disamb output, to facilitate much
  easier extraction of non-lexicalised derivations
** Done
* do the sme Der/ change (with __Tomi__)
** Done  
* write short user guide for the corpus web interface
** Not done  
* install Gobby (using DarwinPorts)
** Not done
* fix e-mail address for Thomas
** Done, password and user name ready.
* [fix bugs!|http://giellatekno.uit.no/bugzilla].
** Some work.

!!!3. Documentation

The xtdoc/sd part in cvs has a branch i18n-reform that has been i18n'ized.
What lacks is some kind of mechanism to be able to get access to a document
in another language, if available, at best by listing all other language
versions.

Also the work on getting the Sámi chars into the PDF output has been done here.
It works, but only with absolute paths. This isn't portable across computers,
which isn't acceptable. Thus needs more work.

TODO:
* finish i18n work by adding a list of available language versions to each
  document (__Børre__ with help from __Sjur__)
* make pdf set-up work using relative paths (__Børre__)
* Write both user and admin documentation (__Børre__, review: __Sjur, Thomas__)
** User documentation probably in several languages. This covers how to apply
   for an account, on what grounds one can apply, and pointers to documentation
   telling how to use the corpus.
** Admin documentation, telling how we set the permissions to the corpus files,
   and whatever other processes and tasks needed to set up a corpus account.


!!!4. Corpus gathering

Nothing has happened last week.


!!!5. Corpus infrastructure

!!General

Saara has made the makefile and written [documentation for the process
|/doc/ling/corpus_conversion_tech.html]:

{{{
cd /usr/local/share/corp
make LANGUAGE=sme GENRE=facta

or

make bound/sme/admin/sd/dc_1_04.doc.xml
}}}

The default language is "sme" if not given in command line. GENRE can be omitted
(in theory - for now, some of the MinAigi filenames contain characters that make
cannot handle (%), so GENRE has to be specified explicitely (and not as "news");
we are working with the troublesome filenames).

__TODO:__
* remove headers and footers from antiword documents, other improvements
  as needed, including PDF conversion (__Saara__)
* fix Min Áigi filenames (__Saara__)


!!User accounts and access

For details, see a [previous meeting memo|Meeting_2006-06-19], as well as the
memo from a [dedicated
meeting|http://divvun.no/doc/infra/corpus_policy.html].

!Shell access

TODO:
* export to {{/opt}} (with cron) tools that the project team members find in 
  their cvs tree (the bound users do not have a cvs tree, and therefore need 
  these tools in {{/opt}} in order to conduct linguistic analyses) (__Tomi__)
** ccat
*** discussion started in news by __Saara__, please reply and follow-up
    (__all__)
* make shell script wrappers for the most common commands for user friendliness
  (we must think of what commands they are) (__Trond__)
** (first version of first script, teaksta.sh, was checked in, but it is still
   not working (the problem is a simple handling of input-output, some shell
   script literates should have a look at it)
* write user account form, probably ask for copy of existing ones from the IT
  centre (__Trond__ and __Sjur__)
* possibly deploy the user account form as an HTML form (__Børre__)
* write documentation for our {{bound}} users, with pointers to the ordinary 
  documentation (__Børre, Trond__)
* write documentation for how to apply for a user account (where's the form, to
  whom do I send the form, who needs it, etc.) (__Børre__)
* make our own guidelines for the user application processing (__Børre__)
* make a test user (__Børre__)
* test corpus access as test user (__Trond__)


!Web browser access

TODO:
* discuss with Oslo (__Trond__)
* delay other tasks until we are ready to go public?
* user management for access to bound texts
* short user guide needed before going public (either write one or take whatever
  has been made in Oslo (__Trond__)


!!More texts to the graphical corpus interface:

TODO:
* add text to the server (__Lars__)


!!Aligner

The aligner aligns fine, better than its competitors. Unfortunately it is slow,
and dependent upon manual input. It is a Java application, working only on
single files at a time. It can be downloaded - ask Trond for the link. Example
text needs to be in a certain XML format (TEI, it seems) - again, ask Trond for
a sample.

TODO:
* contact Bergen to discuss these issues, ask for a non-manual interface, etc.
  (__Trond__)
** discussions with Bergen. The aligner now works automatically, but still
   requires manual input / interaction in the GUI of the application. The output
   is very good, though.


!!Language recognition

Saara has worked on the issue. Short paragraph (e.g. phone numbers) are
problematic, and paragraphs under 50 words are now given the same language as
the document's main language. We will also need more {{sma}} text. There is
still room for improvement here.

TODO:
* Get more text of the poorly covered languages (__Trond, Børre__)
* study the paragraphs of 50 or less words, where the errors will be (__Trond__)
* study the mistakes our recogniser makes today (__Trond__)
* what about paragraphs with mixed content? Needs more investigation (__Trond__)


!!Corpus summary

The time-based statistics is still missing.

TODO:
* add time-based display as a feature request to Bugzilla (__Sjur__)
** done


!!!6. Infrastructure

!!Xerox tools wrapped as servers

To improve throughput and response time on heavy loads, it would really be nice
to have the Xerox tools wrapped up as servers.

__TODO:__
* decide the programming language to use (__Saara__)
* find some (almost-)ready-to-use code to build on (__Saara__)
* implement it (__Saara__)
** nothing so far


!!Hyphenator

TODO:
* correct the treatment of hyphenation of word boundaries and exceptions (fst
  gymnastics) (__Sjur, Trond__)
** analyser output now without hyphenation marks. The real mark-up of word
   boundaries and exceptions still to be done.
* Update the sma hyphenator rule set with the insights gained from smj updates
  (__Trond__ during weekends)


!!Automatic Bugzilla reminder for untouched bugs

TODO:
* give mail reminders a second try; ask Thor-Øivind for help if needed
  (__Børre__)


!!M4

Setup and infra finished. Now we are ready to start using M4.
* What can we use M4 for? (programmers)
** Select and/or exclude different parts of the twol files.
** Specialised make-targets that depend on a profiled twol (=M4-processed)
* What do we want to use M4 for? (linguists)
** Hyphenation
** Regional diphthong simplification (oahpahe(a)ddjiid)
** shortening in 3-part compounds
** explicit output of G3 mark ' and of allophones e2, o2 etc. for text-to-speech applications
** more?

__TODO:__
* finish the work, and check it in (__Saara__)
** Done.
* make speller and hyphenator make targets that utilise M4 to produce normative
  and hyphenation transducers (__Tomi__)


!!!7. Linguistics

!!Derivation and spellers like Aspell

To make it easier to extract all derived stems, we should enhance the tags used
for derivations in {{sme}} to make them easier to grep. The most straightforward
solution is to make the tags follow the same pattern as for {{smj}},
{{+Der/NNN}}. Presently {{sme}} is only using the {{NNN}} part as a tag, where
{{NNN}} represents the derivational suffix. That is, there is no single pattern
to match against for {{sme}}.

__Problematic issue:__ the disamb output will presently give information only
about non-lexicalised derivations. This can potentially give false data on the
frequency of derivational affixes. To get (more) precise data, there should be
an option in {{lookup2cg}} that favours derived analyses over lexicalised ones,
everything else being equal. A similar option for compounds can also be useful
in certain contexts. That is, the present behaviour should be partly turned
upside-down ("select the analysis/-es with the fewest compounds and derivations
available"). The best alternative would probably be to select the next to least
complex analysis, that is, to allow only one compound border or one derivation.

Only regard derivations for now. Two ways:
# Remove complex verbs and nouns from the lexicon files.
##    search stalla
# Turn the lookup2cg evaluation upside down.

Output from our transducer as it is now found below, showing that strategy
__Nº1__ above might not be that easy. In fact, it can only be used if such words
are explicitly marked, and we now that those words will also be analysed without
the lexicalised variants.
{{{
albmástallat
albmástallat    albmástallat+V+IV+Inf
albmástallat    albmástallat+V+IV+Ind+Prs+Pl1

Why not:
almmái+N+Der/stalla+V...
attástallat
No derivations N->V, only v->V
jeagoheapmi+A+Der/huvva+V jeagohuvvat
jeagoheapmi+A+Der/huhtti+V jeagohuhttit
jeagoheapmi+A+Der/hudda+V jeagohuddat
     ^^^^^^
-> disappears. Does this happen to all A's on -heapmi? With all derivations?
yes to derivations. When A's compound, only with attr. form.

muorahisvuohta
goikkis > goikebiergu vs. gievra > gievrras olmmái

muorra+N+Der/huvvat+V muorahuvvat
muorra+N+Der/heapmi+A muoraheapmi

jeagohuvvat
jeagohuvvat     jeagohuvvat+V+IV+Inf
jeagohuvvat     jeagohuvvat+V+IV+Ind+Prs+Pl1
jeagohuvvat     jeagoheapme+A+Der/huvva+V+IV+Inf
jeagohuvvat     jeagoheapme+A+Der/huvva+V+IV+Ind+Prs+Pl1
jeagohuvvat     jeagoheapmi+A+Der/huvva+V+IV+Inf
jeagohuvvat     jeagoheapmi+A+Der/huvva+V+IV+Ind+Prs+Pl1

attástallat
attástallat     attistit+V+TV+Der/alla+Inf
attástallat     attistit+V+TV+Der/alla+Ind+Prs+Pl1
attástallat     attestit+V+TV+Der/alla+Inf
attástallat     attestit+V+TV+Der/alla+Ind+Prs+Pl1
attástallat     attástallat+V+TV+Inf
attástallat     attástallat+V+TV+Ind+Prs+Pl1
attástallat     addit+V+TV+Der/st+Der/alla+Inf
attástallat     addit+V+TV+Der/st+Der/alla+Ind+Prs+Pl1

"<attástallat>"
         "attistit" V TV Der/alla Ind Prs Pl1
         "addit" V TV Der/st Der/alla Inf
         "attástallat" V TV Inf
         "attestit" V TV Der/alla Ind Prs Pl1
         "attestit" V TV Der/alla Inf
         "addit" V TV Der/st Der/alla Ind Prs Pl1
         "attástallat" V TV Ind Prs Pl1
         "attistit" V TV Der/alla Inf

bisuhit IV>TV      
      we have N+Dim+N+Dim+N  goađázaš
}}}
TODO:
* change tagging of derived stems in the disamb output, to facilitate much
  easier extraction of these verbs (__Trond, Tomi__)
** Done.
* find and study all derived verbs in our corpus (__Thomas__)
* suggest which derivations could be generated (__Thomas__)
** see source code above, but also consider overgeneration problems, as well as
   input from the statistical results in the previous point
* lexicalise the rest (__Thomas__)
* consider the problems of lexicalised derivations schewing the analyses
  (__Trond, Sjur__)


!!Semantic double-tagging of names

The policy needs documentation. Thus:

TODO:
* Make a section under {{gt/doc/lang/smi/}}, add a chapter
  __Common linguistic resources__ under the __Languages__ section in
  {{site-frag.xml}}, and add the document below to that section (__Børre__)
* write guidelines for annotators wrt. to name tagging and put them under 
  {{gt/doc/lang/smi}}. A short version of the
  guidelines goes as follows: \\Systematic
  ambiguity (the __Trosterud__ (plc/sur) and __Aftenposten__ (org/obj) types)
  should not be doubletagged, but given primary tags (plc, org) only.
  Doubletagging should be confined to exceptional  cases (__Eira__
  (Fem/Sur/Plc), __Janne__ (Fem/Mal), such things. These guidelines should be
  added to our documentation file. (__Trond__)
* make sure all linguists is aware of the guidelines (__Trond, Sjur__)
* write disamb rules to implement the system above (__Trond, Linda__)


!!North Sámi

Nothing this week, but see above re: derivations.

!!Lule Sámi

TODO:
* refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
  (__Thomas, Trond__)
* convert roughly 100 smj names from that file (lines 740-843) to XML
  (__Saara__)

!!!8. Name lexicon infrastructure

Decided in Tromsø:
* add smj proper noun lexicon file to the output
* remove {{{^ # 0}} from the center ID
** done
* replace spaces with underscores in all IDs
** done
* remove occurence indicator from language IDs: Agalin_1 (the center/concept ID)
  => Agalin (the language ID), and thus the two Agalin's should become one
  language entry (but two different concept entries)
** done
* store a redundant copy of the center-file semantic information in the
  language-specific files, for processing speed
** done 
* add logging facilities
** added empty log element during conversion to XML
* add option to download local copies of the lexicon files directly from the db
* batch editing (change all entries in the found set), should later be enhanced
  to allow selection of exceptions (the found set minus deselected items)
* all names in all languages by default
* tag for excluding/including a name from certain applications
* hide / display {{^}} during browsing
** done
* future epxansion: choose what info to display in the single language browser
* search by (single) language
** done
* display existing language entries when adding a new language to a record
* make searches behave predictable (the hits should be the expected ones)
** done
* add editor to change single, existing entries
** started

Details can be found in [the meeting
memo.|/doc/admin/physical_meetings/tromso-2006-08-propnoun.html]

Names found containing double inflection definitions:
{{{
Genova  adding multiple infl classes.
Guttorm adding multiple infl classes.
Heddy   adding multiple infl classes.
Heimo   adding multiple infl classes.
J?vreg?ddi      adding multiple infl classes.
Klaipeda        adding multiple infl classes.
Territory       adding multiple infl classes.
}}}
These are all wrong, and should be corrected. There should be no names with two
different inflection lexicons for the same name.

TODO:
* improve lexc2xml conversion (__Saara__)
** add default {{smj}} entries
** exclude {{^ # 0}} from the center ID
** add an empty <log/> element to all entries (center and lanuage files)
** add a {{last-update}} attribute to the root element of all files
*** all done
* finish refactoring for multiple collections in the search interfarce
  (__Sjur__)
** waiting for a bug fix (__Tomi__ is investigating it)
* develop the needed XQueries and UI (__Sjur, Tomi__)
** done some
* data synchronisation between risten.no and the cvs repo (__Tomi__)
** discussion started on eXist-list, nothing useful came up. We need to
   reformulate the question from our perspective, and bring it up again
   (__Sjur__)
* fix multiple inflection for identical names (__Trond and Thomas__)
* add eXist and the proper noun interface to the G5 using Tomcat
  (__Sjur and Børre__)


!!!9. Tromsø meeting round-up

TODO:
* check in meeting memos (__Sjur__)
** done
* Polderland questions. __Thomas__ did already send requested info.
** done. Send more even-syllable VNAs to cover all stem types, with derivations
* speller development - see the meeting memo. Separate follow-up next week.
* Lule Sámi linguist - Sjur has tried to call a possible candidate, but no
  answer so far. He will use e-mail instead. (__Sjur__)
** the response was negative, we need to consider other candidates
* order AirPort Express to Tromsø (__Sjur__)
** done


!!!10. Other


!!Bug fixing

__43__ open Divvun/Disamb bugs (two down!), and __25__ risten.no bugs

Guess: 1/3 of the bugs are fixed already (?)

Please help __Saara__ with [bug
279|http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=279]
 (Perl locale). Not much help...
__Saara__ will contact __Roy__ on this issue.

!!Gobby

TODO:
* install Gobby (__Trond, Sjur__)
** Done by __Sjur__


!! Compilation on victorio

{{{
...
DeverbalVerbsVUORDIL...4, DeverbalVerbsALIST...5, 
DeverbalVerbsSUOTNJAL...4, DeverbalVerbsBOTNJAS...2, 
DeverbalVerbsLASSAN...1, DeverbalVerbsCOASKKIT...4, 
DeverbalVerbsARVIL...3, K...14, K-son...2, ENDLEX...1

Reading from 'sme/src/noun-sme-lex.txt'
AspellAffix...21, GuessNoun...1, NounSecond...9, NounRoot...
*** Warning:  Ignoring info strings.
10000...20000...
22682

Reading from 'sme/src/verb-sme-lex.txt'
Negativeverb...1, negmood...3, negind...9, negimp...9, negsup...9, 
Copula...2, Finitecop...15, Prscop...10, Prtcop...11, Impcop...11, 
Infinitecop...11, STRAYFORMS...1, VerbRoot...10000...14271

Reading from 'sme/src/adv-sme-lex.txt'
Adverb...3005, gadv...2, adv...1, adv-comp...1, adv-sup...1, 
IHTTAS...10, DABBELAS...2, DABBELACCA-...11, COMPDIRADV...3

Reading from 'sme/src/closed-sme-lex.txt'
input in flex scanner failed
make: *** [sme/bin/sme.save] Error 2
gt$em sme/src/closed-sme-lex.txt.
}}}

Earlier fix: Specify right Xerox tools in makefile, and do make clean.
The earlier fix doesn't work now.

Hypothesis: closed-sme-lex.txt is broken
Problem: The file compiles on the other computers, hence it is not easy to
see what to eventually look for in the closed file.

TODO:
* Fix compilation on Victorio (__Tomi, Trond__)


!!Meetings and Marratech

Now that Tomi has moved to Helsinki, Maaren is back from her sick leave, and we
are trying to get more people to the project, we are growing out of iChat's
4-way video conferencing. We can still use audio-only conferencing, but unless
SD has upgraded their firewall, we still won't be able to include Maaren.

There are two choices: going back to the old phone conference calls (how stone
age-ish!), or try to use the Marratech solution provided by SD. I suggest we try
the last option first. A new version of [Marratech
|http://www.marratech.com/download/] is available, with improved performeance.

__TODO:__
* download and install newest Marratech
  (__Børre, Maaren, Saara, Thomas, Tomi, Trond__)


!!Task lists as iCal entries

TODO:
* update all forrest installations to r430284 (__Børre__)


{{{
cd $FORREST_HOME
svn up -r430284
}}}

!!!11. Next meeting, closing

Next meeting 11.9.2006 at 9:30.

Closed at 11:52.

!!!Appendix - task lists for the next week

!! Børre
[iCal|/doc/admin/weekly/2006/Tasks_2006-09-04_Boerre.ics]
* corpus collection:
** send out contracts with accompanying letter
** Gather public texts, preferrably also parallel ones
** Send out letters to the rest of the Iđut authors
** contact __Ája__ (Kåfjord), talk to __Lene__
** send contracts to __Čálliid Lágádus__
** contact __Richard Valkepää__ at NSI about older Min Áigi and Áššu files
** contact __Bård Eriksen__ again
* corpus conversion:
** convert nob and nno bible texts to be used as part of a parallel corpus
** convert fin, swe to paratext or directly to our XML
** review the paratext2xml converter
** Move norwegian documents in Min Áigi from sme to nob
* corpus access:
** possibly deploy the user account form as an HTML form
** make a test user
** Write both user and admin documentation (__Børre__, review: __Sjur, Thomas__)
*** User documentation probably in several languages. This covers how to apply
    for an account, on what grounds one can apply, and pointers to documentation
    telling how to use the corpus.
*** Admin documentation, telling how we set the permissions to the corpus files,
   and whatever other processes and tasks needed to set up a corpus account.
* set up Bugzilla automatic reminders for open issues
* create document & document entry for semantic double-tagging of names (for
  __Trond__)
* update all Forrest installations to svn version r430284
* finish Forrest i18n and Sámi in PDF work
* Get more {{sma, smj}} texts to improve language recognition
* set up Tomcat for use with eXist and the propnouns db on the G5
* download and install latest Marratech
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Maaren
* On sick leave
* download and install latest Marratech


!! Saara
[iCal|/doc/admin/weekly/2006/Tasks_2006-09-04_Saara.ics]
* Create a parallel corpora of the new testaments
* add more texts to the graphical corpus interface
* grammatical searchability in the graphical corpus interface
* Implement parallel corpus upload in web upload script
* remove headers and footers from pdf documents
* Implement server of the analysis tools.
* Add more languages to the lexc2xml propernoun conversion.
* Refine the namelex output
* convert roughly 100 {{smj}} names from gt/smj/propernoun-smj-lex.txt (lines
  740-843) to XML using {{namelex2xml.pl}}
* download and install latest Marratech
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Sjur
[iCal|/doc/admin/weekly/2006/Tasks_2006-09-04_Sjur.ics]
* fst gymnastics to add hyphenation and word boundary marks to hyphenation
  transducer
* name lexicon:
** implement editing functions
** finalise refactoring for multiple collections
** implement improvements decided upon in Tromsø
* review user and admin documentation for corpus access
* write user account form, probably ask for copy of existing ones from the IT
  centre (with Trond)
* start hiring process of linguist and programmer
* help __Børre__ finish i18n work of Forrest with a language override menu
* consider the problems of lexicalised derivations schewing analyses of
  derivation patterns
* install eXist and our local copy of risten.no and propnouns on the G5
* speller follo-up from the Tromsø meeting
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Thomas
[iCal|/doc/admin/weekly/2006/Tasks_2006-09-04_Thomas.ics]
* send more even-syllable VNAs to cover all stem types, with derivations
* fix multiple inflection for identical name
* sme G3 issue
* bug-fixing!
* refine smj proper noun lexica, cf. the propernoun-smj-lex.txt 
* review user documentation for corpus access
* find and study all derived verbs in our corpus (depends on __Trond__)
* download and install latest Marratech
* suggest which derivations could be generated (depends on __Trond__)


!! Tomi
[iCal|/doc/admin/weekly/2006/Tasks_2006-09-04_Tomi.ics]
* new proper name lexicon
** data synchronisation of proper nouns between risten.no and CVS
** XQuery refactoring and code development for our proper noun editor
** new version of xml2lexc (based on ccat), should handle complex names correct:
   construct entries like we have now from the different parts of a complex name
   entry
** implement improvements decided upon in Tromsø
* read aligner docu, install, provide feedback
* Set up the mechanism for the hash-mark transducer package
* test the new xml output of the xml-tagged analyses
* export corpus tools to {{/opt}} (with cron)
* make speller and hyphenator make targets using M4
* fix compilation on Victorio
* download and install latest Marratech
* [fix bugs!|http://giellatekno.uit.no/bugzilla]


!! Trond
[iCal|/doc/admin/weekly/2006/Tasks_2006-09-04_Trond.ics]
* better smj NT text
* get fin, swe, nob and nno NT and OT in __paratext__ format
* Continue discussion with __Bergen__ about aligner issues
* fst gymnastics to add hyphenation and word boundary marks to hyphenation
  transducer
* make shell script wrappers for the most common commands for user friendlyness
* write user account form, probably ask for copy of existing ones from the IT
  centre (with Sjur)
* write documentation for our {{bound}} users, with pointers to the ordinary 
  documentation  
* write documentation on semantic double-tagging of names
* refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
* discuss web-only user access management with Oslo
* write short user guide for the corpus web interface
* install Gobby
* Get more {{sma, smj}} texts to improve language recognition
* study corpus for language recognition errors, as well as paragraphs with mixed
  content
* consider the problems of lexicalised derivations schewing analyses of
  derivation patterns
* fix compilation on Victorio
* download and install latest Marratech
* [fix bugs!|http://giellatekno.uit.no/bugzilla].