''Korp'' is a corpus tool and ''Karp'' a lexicon tool from the Swedish [Språkbanken|http://sprakbanken.gu.se]. We want to install them locally.

!!!Links

The Korp code:

* [Korp Backend|http://spraakbanken.gu.se/swe/forskning/infrastruktur/korp/distribution/backend]
* [Korp Frontend|http://spraakbanken.gu.se/swe/forskning/infrastruktur/korp/distribution/frontend]
* [Corpus pipeline|http://spraakbanken.gu.se/swe/forskning/infrastruktur/korp/distribution/corpuspipeline]

Links to the Karp code are forthcoming.

!!!Work plan

* Download Korp code
* Install at gtweb
* Install corpora
* Make interface

!!!Corpora available

* Free
** skuvlahistorja1-6
** fad
* Bound
** news
** ficti
** NT

!!!Corpus mixes

* smesme: news + ficti
* nob2sme: fad + skuvlahistorja1-6
* smedep: news + ficti + facta/skuvlahistorja1-6 + bibel/newtestament

!!!Interface

Menu:

# search for sme wordforms (kwic-snt in corpus ccat) – corpus: smesme
# search for sme lemmas (kwic-snt? in analysed corpus syn) – corpus choices: smesme, nob2sme
# search for sme and nob in translations (lemma search in sentence aligned sentences) – corpus: nob2sme
# deepdict sme (lemma search -> dependency daughters in corpus dep) – corpus: smedep

!!!Lemgram

!!Definitions

* __lexeme__ = member of an open lexical category, having meaning and form but being neither
* __lemma__ = wordform used as representative for lexeme
* __grammatical word__ = pair of lemma+grammatical properties and wordform
* __paradigm__ = set of grammatical words realising a lemma
* __lemgram__ = set of wordforms in paradigm

!!Generation

Generation of lemgrams from lexc: Use ''dict-isme-norm.fst'' or ''generator-dict-gt-norm.xfst'' or ''generator-dict-gt-norm.hfst''.

We remove the tags v1, v2, ... from the fst. It is better for the user that all variants of the same paradigm are in the same lemgram.

Many fst lemmas have more than one entry in lexc, so the lemma list should be made unique before generating forms.
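The two preparation steps above (dropping the variant tags and making the lemma list unique) could be sketched as a small shell filter. This is only a sketch under assumptions: it supposes that each lexc entry has the layout {{{lemma+tags:stem CONTLEX ;}}}, that the variant tags are written {{{+v1}}}, {{{+v2}}} and so on, and that every entry line contains a colon. The function name {{{unique_lemmas}}} and the output file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reduce lexc entry lines to a unique, sorted lemma list.
# Assumed entry layout (not confirmed by the source): lemma+tags:stem CONTLEX ;
unique_lemmas() {
    grep -E -v '^(!|LEXICON|$)' |   # drop comment lines, LEXICON headers, empty lines
    cut -d: -f1 |                   # keep the upper side (lemma+tags) of each entry
    sed -E 's/\+v[0-9]+//g' |       # remove variant tags such as +v1, +v2 (assumed spelling)
    sort -u                         # one line per lemma+tags combination
}

# Possible usage (noun-lemmas.txt is a hypothetical output name):
#   unique_lemmas < noun-sme-lex.txt > noun-lemmas.txt
```

The filter reads from standard input, so it can be appended to the per-file filter commands rather than replacing them. Whether the resulting list can be fed to the generator as it stands depends on the real lexc layout, which should be checked first.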
I suggest that we start with the files listed below. For nouns, we pick three different lists: the ordinary nouns, the actors (NomAg), and the G3-marked nouns. For the other parts of speech, one command is enough.

Commands to filter (ir)relevant forms:

!noun-sme-lex.txt:

* Ordinary words: {{{ egrep -v "(G3|ACTOR|CmpN/Only|ShCmp|RCmpnd|\+V\+|^\!)" }}}
* ACTOR: {{{ grep N+NomAg }}}
* G3: {{{ grep N+G3 }}}

!verb-sme-lex.txt:

{{{ egrep -v "(ENDLEX|\+V|^\!)" }}}

!adj-sme-lex.txt:

{{{ egrep -v "(LEXICON|Der| Rreal | R |^\!)" }}}

!adv-sme-lex.txt:

{{{ egrep -v "(LEXICON| K |^\!)" }}}

!!!Meetings

* 2013: [9.4.|meetings/130409.html] // [4.12.|meetings/131204.html]
* 2014: [8.1.|meetings/140108.html]