TODO:
 - update the documentation for korp_v7
 - action points for tomorrow (20181205) for @ciprian and @chiara:
   - debug the presentation of the search results
   - install the CWB version required by the korp_v7 backend documentation
   - feed in some VRT files, adjust the configuration accordingly and debug the
     presentation of the search results again

Information page for the Korp application:
http://spraakbanken.gu.se/swe/korp-info

''Korp'' is a corpus tool and ''Karp'' a lexicon tool, both from the Swedish
[Språkbanken|http://sprakbanken.gu.se]. We want to install them locally.

Installing/Updating the Korp interface:
1. get the latest version of the code
git clone http://spraakbanken.gu.se/pub/korp.git

2. build the project, following this short and easy guide:
https://spraakbanken.gu.se/eng/research/infrastructure/korp/frontend-developers-guide

Open the application locally
>grunt serve

Build the application for distribution
>grunt build

 ==> the built application is written to the dist directory

3. go to the dist dir and adapt the files to
   Giellatekno's needs
 - all files that need changes are in this SVN repository,
   e.g. for the main interface in frontend,
   for Kven/Meänkieli in f_korp, for Uralic languages in u_korp, etc.

4. copy the whole dist dir to the required place on the web server

   
Documentation on the CQP query syntax used in advanced search:
http://cwb.sourceforge.net/documentation.php
http://cwb.sourceforge.net/files/CQP_Tutorial
http://cwb.sourceforge.net/files/CQP_Tutorial.pdf 


1. Backend
 ==> installed

2. Frontend 
 ==> installed

3. Pipeline
  3.1 input ==> dep-files

  3.2 processing steps
  3.2.1 dep2xml: converts the dep-files into XML whose structure contains all the information
        needed to generate both the structural and the positional attributes of the CWB input
        - script: dep2xml.pl
        - parameter: dir containing the dep-files as input
        - command: PERL_UNICODE=SAD perl dep2xml.pl   
  3.2.2 script(s) to clean up noisy input such as 1- and 2-word sentences and word forms longer than
        50 characters ==> todo
  3.2.3 xml2vrt.xsl: XSLT script to convert the XML into the CWB input format (VRT) ==> todo
  3.2.4 xml2sql: script to generate the SQL tables needed for Word Picture ==> todo

  3.3 output ==> VRT files for CWB, MySQL tables for Word Picture
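Step 3.2.2 is still open. A minimal Python sketch of such a filter, assuming each sentence has already been read into a list of word-form strings; the function and constant names are hypothetical and not part of the existing pipeline. It applies the thresholds from the text: sentences with fewer than three tokens, and sentences containing a word form longer than 50 characters, are dropped.

```python
"""Hypothetical noise filter for step 3.2.2 (not an existing pipeline script).

Assumes every sentence has already been read into a list of word-form strings.
"""
from typing import Iterable, Iterator, List

MIN_SENTENCE_LENGTH = 3   # sentences with 1 or 2 tokens are dropped
MAX_WORDFORM_LENGTH = 50  # word forms longer than this are treated as noise


def is_clean(sentence: List[str]) -> bool:
    """Return True if the sentence passes both noise checks."""
    if len(sentence) < MIN_SENTENCE_LENGTH:
        return False
    # Reject the whole sentence instead of deleting the single token,
    # so that token positions (e.g. dependency heads) stay consistent.
    return all(len(word) <= MAX_WORDFORM_LENGTH for word in sentence)


def clean_sentences(sentences: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield only the sentences that pass the noise filter."""
    return (s for s in sentences if is_clean(s))


if __name__ == "__main__":
    sample = [
        ["Mun", "lean", "ruovttus", "."],  # kept
        ["Bures", "!"],                    # dropped: only 2 tokens
        ["Dát", "lea", "x" * 60, "."],     # dropped: word form > 50 characters
    ]
    for kept in clean_sentences(sample):
        print(" ".join(kept))
```

Rejecting the whole sentence, rather than deleting only the over-long token, is a choice made in this sketch: it keeps token positions, and therefore dependency heads, consistent. Deleting just the token would require renumbering the heads.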

TODO:
  - update the documentation of the pipeline for GT-data
  - document the generation of timespan data

@cip 20181204:
 - steps 3.1-3.2 are deprecated
 - the conversion from analysed XML to VRT is now done with a Python script:
   ./hfst_pipeline/dep2xml.py
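For orientation, a minimal Python sketch of the VRT output that the conversion has to produce: one token per line with tab-separated positional attributes, and structural attributes as XML-style tags on lines of their own. This is not the code of ./hfst_pipeline/dep2xml.py. The input structure, the helper names (`write_vrt`, `vrt_value`, `attr_string`), the attribute set, the `_` placeholder for empty values and the sample tokens are all assumptions; the positional attributes must match those declared when the corpus is encoded with CWB and configured in the Korp backend.

```python
"""Minimal sketch of writing tokens in CWB's VRT (verticalized text) format.

Hypothetical, not taken from ./hfst_pipeline/dep2xml.py. Assumptions:
- input: a list of texts, each a dict with "meta" (text attributes) and
  "sentences" (lists of token dicts);
- positional attributes: word, lemma, pos, ref, dephead, deprel.
"""
import io
from typing import Dict, List, TextIO
from xml.sax.saxutils import escape

# Column order of the positional attributes; must match the corpus set-up.
P_ATTRS = ["word", "lemma", "pos", "ref", "dephead", "deprel"]
QUOTE_ENTITY = {'"': "&quot;"}


def vrt_value(value: str) -> str:
    """Escape &, < and >; write '_' for an empty or missing value."""
    return escape(value) if value else "_"


def attr_string(attrs: Dict[str, str]) -> str:
    """Render text-level metadata as XML attributes (quotes escaped too)."""
    return "".join(f' {key}="{escape(val, QUOTE_ENTITY)}"' for key, val in attrs.items())


def write_vrt(texts: List[dict], out: TextIO) -> None:
    """One token per line with tab-separated positional attributes;
    structural attributes as XML-style tags on lines of their own."""
    for text in texts:
        out.write(f"<text{attr_string(text['meta'])}>\n")
        for sentence in text["sentences"]:
            out.write("<sentence>\n")
            for token in sentence:
                columns = (vrt_value(token.get(attr, "")) for attr in P_ATTRS)
                out.write("\t".join(columns) + "\n")
            out.write("</sentence>\n")
        out.write("</text>\n")


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [{
        "meta": {"title": "Sample", "lang": "sme"},
        "sentences": [[
            {"word": "Mun", "lemma": "mun", "pos": "Pron",
             "ref": "1", "dephead": "2", "deprel": "SUBJ"},
            {"word": "lean", "lemma": "leat", "pos": "V",
             "ref": "2", "dephead": "0", "deprel": "ROOT"},
        ]],
    }]
    buffer = io.StringIO()
    write_vrt(sample, buffer)
    print(buffer.getvalue(), end="")
```

Escaping `&`, `<` and `>` in token values assumes the corpus is encoded in a mode that decodes XML entities; if it is not, the escaping step should be dropped.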


NOTE:
 - the current Korp version uses the old parallel_mode.js because of
   incompatible changes to the parallel data structure (word links)
 - this mode should be updated to the latest version, too