!!!Meeting setup

* Date: 18.10.2010
* Time: 09.30 Norw. time
* Place: Internet
* Tools: SubEthaEdit, iChat

!!!Agenda

* Sma issues
* Forthcoming finsme dictionary
* Info topics

Cf. one of the following, depending on context:
* the upper bar of the SEE window (provided you use the JSPWiki syntax mode)
* the TOC in Forrest-rendered output, like HTML and PDF

!!!Opening, agenda review, participants

* Opened at 10:02. 
* Present: __Børre, Ciprian, Lene, Maja, Sjur, Tomi, Trond, Biret Ánne__
* Absent: __Thomas__

!!!Sma issues

!!Need to correct dictionary against speller

Are there wrong dictionary forms in the speller? The smanob dictionary contains
misspellings, and the same misspellings are often found in the speller as well.
We need to check that misspellings in the dictionary are not replicated in the
speller.

The fst and the dictionary show different Sg3 forms (the Sg3 forms, marked
{{p3p}}, are taken from the Hemnes dictionary). The p3p forms missing in the fst
have now been manually marked with XXX in the dictionary.

The XXX marks can mean:
* there is a difference between the analyser and the p3p or word class info
  (this often comes from smaswe)
* the analyser gives several forms, one of which can correspond to
  the p3p; that means we have to mark more than one verb class
  in smanob (if both are correct) - how do we do that?
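
The comparison behind the XXX marks could be automated roughly as follows. This is only a minimal sketch: the function name, the two input mappings and the status strings are assumptions made for illustration, and the lemma and form strings are placeholders, not real sma data. How the p3p forms are extracted from the dictionary and how the Sg3 forms are generated by the analyser is left open.

```python
def classify_p3p(dictionary_p3p, analyser_forms):
    """Compare dictionary p3p forms with the Sg3 forms from the analyser.

    dictionary_p3p: dict mapping lemma -> p3p form found in the dictionary
    analyser_forms: dict mapping lemma -> set of Sg3 forms from the analyser
    Returns a dict mapping lemma -> status string (hypothetical labels).
    """
    report = {}
    for lemma, p3p in dictionary_p3p.items():
        forms = analyser_forms.get(lemma, set())
        if p3p not in forms:
            # The dictionary form is not generated by the analyser at all.
            report[lemma] = "XXX: p3p not generated"
        elif len(forms) > 1:
            # One analyser form matches, but there are competing forms:
            # possibly more than one verb class.
            report[lemma] = "XXX: several forms, one matches p3p"
        else:
            report[lemma] = "ok"
    return report


# Placeholder data only; real input would come from smanob and the fst.
example = classify_p3p(
    {"lemmaA": "formA", "lemmaB": "formB", "lemmaC": "formC"},
    {"lemmaA": {"formA"}, "lemmaB": {"formB", "formB2"}, "lemmaC": {"other"}},
)
```

The two XXX statuses correspond to the two cases listed above, so the manual marks could be checked against such a report.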

!!Status quo for normativity

The letter on loan words was sent to SGM on Friday. We also sent a letter
in June-July last year. Are there any unresolved normativity issues now? Not
that we are aware of.

!!How shall Divvun and Gt cooperate on the sma work during what is left of this year?

Lene has a list of dictionary verbs not in the analyser. During this work some
issues have turned up:

* missing verbs (and other words)
* misspellings and errors in both the transducer and smanob (cf above)

Way of work:
# Read through smanob and look for errors (adj, noun)
# Check whether these errors are in the normative analyser
# Mark problematic words as non-includes
# Reverse-sort the fst lexicon and look at the continuation lexicons
# Unify lemma forms for all fst entries (i.e. multiple entries for the same word
  should have the same baseform/lemma)
# Sort and unique the fst entries - double forms in the sma lexicon shall be unified
# Learn typical error patterns from the dictionary correction work
  (-e vs -ie as stems, -a- vs -aa- or -ie- vs. -ïe- as root vowel),
  grep similar patterns out of the fst code, and proofread them
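
The reverse sorting and the "sort and unique" steps above can be sketched as below. This assumes the fst entries have already been read into (lemma, continuation lexicon) pairs; that representation, the function names and the sample strings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def reverse_sorted_unique(entries):
    """Remove exact duplicates and sort by the reversed lemma, so that
    lemmas with the same ending end up next to each other."""
    unique = set(entries)
    return sorted(unique, key=lambda e: (e[0][::-1], e[1]))


def lemmas_with_several_lexicons(entries):
    """Return lemmas that occur with more than one continuation lexicon."""
    lexicons = defaultdict(set)
    for lemma, contlex in entries:
        lexicons[lemma].add(contlex)
    return {lemma: sorted(c) for lemma, c in lexicons.items() if len(c) > 1}


# Placeholder entries, not real sma lexicon data.
entries = [
    ("abcie", "LEX1"),
    ("xyze", "LEX2"),
    ("abcie", "LEX1"),  # exact duplicate, removed by the set
    ("qrsie", "LEX1"),
    ("xyze", "LEX3"),   # same lemma, different continuation lexicon
]

for lemma, contlex in reverse_sorted_unique(entries):
    print(lemma, contlex)
print(lemmas_with_several_lexicons(entries))
```

In the reverse-sorted list, entries ending in -ie and -e are grouped together, which makes it easier to spot an entry whose continuation lexicon differs from that of its neighbours.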

The three i-s of sma - is this true?

# i as i 
# ï as i and ï (klinïgke / klinigke) 
# ï shall not be i (but is made so via spellrelax)

Before we implement it we shall find out whether the second category
exists.
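
One way to look for candidates for the second category is to group word forms that differ only in i vs. ï. The sketch below assumes a plain list of word forms as input (for instance lemmas taken from the fst lexicon or the dictionary); the function name is hypothetical. It only finds candidates; whether both spellings are really accepted still has to be decided by the linguists.

```python
import unicodedata
from collections import defaultdict


def i_variant_groups(words):
    """Group words that become identical when every ï is replaced by i.

    Only groups with more than one spelling are returned; these are the
    candidates for "ï as i and ï".
    """
    groups = defaultdict(set)
    for word in words:
        # Normalise to NFC so that ï is a single code point.
        word = unicodedata.normalize("NFC", word)
        groups[word.replace("ï", "i")].add(word)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# "klinïgke" / "klinigke" is the example pair from these notes;
# "abc" is a placeholder without any variant.
print(i_variant_groups(["klinïgke", "klinigke", "abc"]))
```

If the result is empty for the full word list, the second category probably does not exist in the data at hand.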

__TODO:__
* sort and unique nav entries in sma (__Trond__)
* discuss the possible third i in sma (__Maja, Thomas__)
* go through the XXX in the dictionary verbs
  (__Maja, Marit, Sissel, Thomas, Toini__)
* sma linguist meeting - date & time to be decided
  (__Lene, Maja, Marit, Sissel, Thomas, Toini, Trond__)

!!!Forthcoming finsme dictionary

We do not quite know when it comes, but what shall we do with it when it comes?
The dictionary is made by KOTUS (fin-sme-fin).

If there is money, could Johanna Ijäs do something with this?

!!!Info topics

!!A new programmer in the sma-oahpa project

Ryan for 2 months.

!!Cip's journey to Iceland

Here are my very minimal notes on tools:
* corpus_search tool (try it!) -
  [http://corpussearch.sourceforge.net/]
* DATR (vs. XFST try it!) -
  [http://www.informatics.sussex.ac.uk/research/groups/nlp/datr/]
* GLOSSA (Anders Nøklestad - GLOSSA is FLOSS, use it)
** [http://www.hf.uio.no/iln/tjenester/sprak/glossa/index.html]
** [http://tekstlab.uio.no/glossa//html/GLOSSA_manual.html]
* CLAN (Carnegie Mellon University) (try it) -
  [http://childes.psy.cmu.edu/clan/]
* MOLTO (open-source MT, Aarne Ranta, based on GF) -
  [http://www.molto-project.eu/]
* CoRD (corpora of historical English) -
  [http://www.helsinki.fi/varieng/CoRD/index.html]
* TVÄRSLÅ (the Scandinavian lexicon is accessible for single-word
  queries on the net, but not available to developers) -
  [http://ordbok.nada.kth.se/]

The amount of work for installing GLOSSA here is noticeable, yes, but
it will make data updating easier. Also, we might more easily be able to
tailor the interface to our needs.

The OBT (Oslo-Bergen Tagger) uses the Oslo fullform list for morphological
analysis, with a separate component for compound analysis. For syntax, they
use CG.

In Bergen they are going to implement a separate corpus interface,
Corpuscle.

!!Summing up the sma-oahpa week

* Aajege was in Tromsø with 3 people 11-15.10
* they have collected 2500 words for the Oahpa lexicon from textbooks - now
  unified with smanob
* this resulted in 900 new lemmas in the smanob dictionary for nouns, verbs and
  adjectives (the other word classes have not been counted)
* Aajege works with the Oahpa lemmas directly in smanob: they improve the
  translations (which were taken straight from the textbooks, in no particular
  order), and add semantic sets and, where relevant, info on direction in
  Leksa - to some extent they replace the original smanob translation
* we must find out at which level we add translations to other languages in
  the same lexicon (sme, swe). The consequence will be that we will have a sma
  file instead of smanob, smaswe etc., and a nob file instead of nobsma, nobsme...
* we are starting to become friends with XMLmind
* alpha-MorfaC, beta-MorfaS and beta-Leksa shall be finished during
  the coming weeks, and will be tested on adult course participants 19-21.11 -
  by then a new version of ÅD should also have been compiled
* some new functions are being added to Leksa -> they will also be included in
  smeOahpa:
** pronouns
** translation comments
** web dictionary included in the Oahpa interface
* the next work meeting is in Røros 22.11 (Trond and Lene will go there); in
  addition we use iChat meetings - Thursdays after lunch

Whitespace and empty element diffs:
{{{
-              <book name="s4" />
+               <book name="s4"/>
-      <stem class="trisyllabic"></stem>
+      <stem class="trisyllabic"/>
}}}
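
To check whether two versions of an entry differ only in this kind of serialisation, they can be compared in canonical form. The following is a minimal sketch using {{canonicalize}} from the Python standard library (Python 3.8 or later); the helper name {{same_xml}} is an assumption, and the snippets are the ones from the diff above. It is not a fix for the editor settings themselves.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(a, b):
    """Return True if two XML snippets are identical in canonical form.

    Canonical XML writes every empty element as a start/end tag pair and,
    with strip_text=True, ignores leading and trailing whitespace in text.
    """
    return canonicalize(a, strip_text=True) == canonicalize(b, strip_text=True)


# The two pairs from the diff compare as equal.
print(same_xml('<book name="s4" />', '<book name="s4"/>'))
print(same_xml('<stem class="trisyllabic"></stem>', '<stem class="trisyllabic"/>'))
```

A real change in an attribute value or in the text content still shows up as a difference, so only the noise from the editor is filtered out.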

__TODO:__
* compile new version of ÅD (__Ciprian__)
* fix the XMLMind XML Editor whitespace issue (__Sjur__)