TODO:
- update the documentation for korp_v7
- action points for tomorrow (20181205) for @ciprian and @chiara:
  - debug the presentation of the search results
  - install the CWB version specified in the korp_v7 backend documentation
  - feed some vrt files, change the configuration accordingly and debug the presentation of the search results anew

dir for the korp application

http://spraakbanken.gu.se/swe/korp-info

''Korp'' is a corpus tool and ''Karp'' a lexicon tool from the Swedish [Språkbanken|http://sprakbanken.gu.se]. We want to install them locally.

Installing/Updating the Korp interface:
1. get the latest version of the code:
   git clone http://spraakbanken.gu.se/pub/korp.git
2. succinct and easy documentation on how to build the project:
   https://spraakbanken.gu.se/eng/research/infrastructure/korp/frontend-developers-guide
   Open the application locally:
   >grunt serve
   Build and distribute the application:
   >grunt build
   ==> the built application is generated into the dir dist
3. go to the dist dir and adapt the files to Giellatekno's needs
   - all files that need changes are in this svn repository, for instance: for the main interface in frontend, for Kven/Meänkieli in f_korp, for Uralic languages in u_korp, etc.
4. copy the whole dist dir to the required place on the web server

Some URLs with documents on CQP query syntax for advanced search:
http://cwb.sourceforge.net/documentation.php
http://cwb.sourceforge.net/files/CQP_Tutorial
http://cwb.sourceforge.net/files/CQP_Tutorial.pdf

1. Backend ==> installed
2. Frontend ==> installed
3. Pipeline
  3.1 input ==> dep-files
  3.2 processing steps
    3.2.1 dep2xml: the xml structure contains all the information needed to generate both the structural and the positional attributes for the CWB input
      - script: dep2xml.pl
      - parameter: dir containing the dep-files as input
      - command: PERL_UNICODE=SAD perl dep2xml.pl
    3.2.2 script(s) to clean up noisy input such as 1- and 2-word sentences and word forms longer than 50 characters ==> todo
    3.2.3 xml2vrt.xsl: script to convert xml into the CWB format ==> todo
    3.2.4 xml2sql: script to generate the SQL tables needed for Word Picture ==> todo
  3.3 output ==> vrt-files for CWB, MySQL tables for Word Picture

TODO:
- update the documentation of the pipeline for GT-data
- document the generation of timespan data

@cip 20181204:
- all steps from 3.1-3.2 are deprecated
- the conversion from analysed xml to vrt is done with a python script: ./hfst_pipeline/dep2xml.py

NOTE:
- the current Korp version uses the old parallel_mode.js due to incompatible updates of the parallel data structure (word links)
- this mode should be updated to the latest version, too
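Pipeline step 3.2.2 (cleaning up noisy input) is still marked as todo. The filter it describes could look roughly like the following minimal Python sketch. It assumes each sentence is already available as a list of word forms, and it assumes that a sentence containing an over-long word form is dropped as a whole rather than having only that token removed; the function and constant names are hypothetical.

```python
from typing import Iterable, Iterator, List

# Thresholds taken from step 3.2.2. Dropping the whole sentence (instead of
# only the offending token) is an assumption of this sketch.
MIN_SENTENCE_LENGTH = 3   # 1- and 2-word sentences are treated as noise
MAX_WORDFORM_LENGTH = 50  # word forms longer than this are treated as noise


def clean_sentences(sentences: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield only the sentences that pass both noise filters.

    Each sentence is a list of word forms (hypothetical input shape).
    """
    for sentence in sentences:
        if len(sentence) < MIN_SENTENCE_LENGTH:
            continue  # too short: a 1- or 2-word sentence
        if any(len(word) > MAX_WORDFORM_LENGTH for word in sentence):
            continue  # contains an over-long word form
        yield sentence


if __name__ == "__main__":
    sample = [
        ["Hei"],
        ["Mun", "lean"],
        ["Mun", "lean", "studeanta", "."],
        ["Dát", "a" * 60, "."],
    ]
    for kept in clean_sentences(sample):
        print(" ".join(kept))
```

Being a generator, the filter can sit between reading the input and writing the vrt output without holding the whole corpus in memory.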
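For pipeline step 3.2.3, the target is CWB's verticalized text (VRT): structural attributes such as <text> and <sentence> appear as XML-like tags on lines of their own, and each token is one line of tab-separated positional attributes. The sketch below writes that format from already-parsed tokens in Python instead of XSLT as xml2vrt.xsl would. The input shape, the attribute names word/lemma/pos and the "_" placeholder for missing values are assumptions; they have to match the attributes declared when the corpus is encoded with cwb-encode.

```python
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr

# Positional attributes written per token, in column order (assumed names;
# they must match the p-attributes declared for cwb-encode).
P_ATTRS = ["word", "lemma", "pos"]

Token = Dict[str, str]


def to_vrt(text_attrs: Dict[str, str], sentences: List[List[Token]]) -> str:
    """Render one text as CWB verticalized text (VRT).

    Structural attributes (<text>, <sentence>) go on their own lines;
    every token becomes one line of tab-separated positional attributes.
    """
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in text_attrs.items())
    lines = [f"<text{attrs}>"]
    for sentence in sentences:
        lines.append("<sentence>")
        for token in sentence:
            # Escape &, < and > so that a token line is never mistaken for a tag;
            # "_" is a hypothetical placeholder for a missing attribute value.
            lines.append("\t".join(escape(token.get(a, "_")) for a in P_ATTRS))
        lines.append("</sentence>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tokens = [{"word": "Hei", "lemma": "hei", "pos": "Interj"}, {"word": "!", "lemma": "!", "pos": "CLB"}]
    print(to_vrt({"title": "Sample"}, [tokens]), end="")
```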
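Pipeline step 3.2.4 has to produce relation data for Word Picture. The core of such a script is counting head/relation/dependent triples in the dependency analysis, which the minimal Python sketch below illustrates. The token fields (id, lemma, head with 0 for the root, deprel) are hypothetical, and the actual table layout the Korp backend expects for Word Picture is not covered here, so the tab-separated rows are only an intermediate format.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical token shape: {"id": int, "lemma": str, "head": int, "deprel": str}
Token = Dict[str, object]


def count_relations(sentences: Iterable[List[Token]]) -> Counter:
    """Count (head lemma, relation, dependent lemma) triples."""
    counts: Counter = Counter()
    for sentence in sentences:
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            head = by_id.get(tok["head"])
            if head is None:
                continue  # root token (head 0) or a dangling head reference
            counts[(head["lemma"], tok["deprel"], tok["lemma"])] += 1
    return counts


def to_rows(counts: Counter) -> List[str]:
    """Tab-separated rows (head, relation, dependent, frequency), most frequent first."""
    return [f"{h}\t{rel}\t{d}\t{n}" for (h, rel, d), n in counts.most_common()]
```

Keeping the counting separate from the output format means only to_rows would have to change once the target table layout is fixed.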