!!!Infrastructure and unix intro

* Sjur N. Moshagen

!!!Presentation overview

* Learn unix command line basics
* Get to know the newinfra setup
** directory structure
* Learn version control
** version control concepts
** svn tools - command line + Cornerstone
* Learn work routines
** Testing (write yaml tests to help the twolc development)
* standards and standardisation (esp. of tags, build system)

!!!Unix intro

Some important commands:

{{{
cd   - Change Directory
pwd  - Print Working Directory
open - (Mac-only command) opens the specified document in the default application
see  - open the specified document in SubEthaEdit (command must be installed via the SubEthaEdit preferences)
grep - extract lines matching a regular expression
less - read a text file, one screen-full at a time
cut  - extract parts of lines separated by TABs or some other, specified character
make - use the make tool to build your transducers, as described in the Makefile
svn  - SubVersioN, the version control system and the command line tool used by us
ssh  - Secure SHell, remote connection to other systems
}}}

* unix command line vs graphical interface - we want both! They both have their strengths and weaknesses. Use the right tool for the job at hand.

More commands are described at: [http://giellalt.uit.no/tools/docu-unix.html]

!!!Version control

* Version control is essential when working in teams.
* It allows everybody to work on the same code at the same time
* the only restriction is that you should not work on the same ''line'' of text
* works best with text documents

!!Version control concepts

* central repository
* everyone has their own local copy, the ''working copy''
* everyone checks in their local changes to the central repository - a ''commit''
** ...
thereby making their changes available to the rest
* everyone ''updates'' their working copy with all newer changes from the repository - a ''merge''
** merges are automatic and without ''conflicts'' as long as the changes are not on the same line of text
* sometimes conflicts happen, they must then be ''resolved''

!!Svn tools - command line + Cornerstone

* We use both a graphical client (Cornerstone) and the command line client.
* The concepts and operations are the same, but the interface is very different.
* Each has its own strengths.

!Command line tool

* view logs
* simple local diffs
* simple commits

!Cornerstone

* resolve conflicts - easy together with FileMerge
* browse file history (''blame'')
* commits - you get the list of modified files, can easily exclude some of them, and can see the diff as you write the commit message

!!!The (new) infrastructure setup

* Directory structure
* Basic commands

!!Directory structure

Two parts:

* the Giellatekno core
* the languages - the actual source code for each language

These are found in the dirs:

{{{
gtcore/
langs/
}}}

!GT Core

* shared resources
* schemas
* common scripts
* templates

{{{
gtcore/
├── gtshared              # resources used by all languages
│   └── src
│       ├── filters       # filters common to all languages
│       └── orthography
├── schemas
├── scripts
│   └── xsl
└── templates
    ├── plxtools
    ├── smi
    ├── und
    └── urj-Cyrl
}}}

!Templates for shared resources

Example:

{{{
gtcore/templates/urj-Cyrl/
└── src
    └── morphology
        └── stems
}}}

# This template contains proper names typically found in most or all Uralic languages written in the Cyrillic alphabet.
# The idea is that these names, with small and regular changes to the stems, can be reused or shared among all these languages
# by editing them in one place and using them "everywhere", we greatly reduce the editing work, and at the same time ensure consistent coverage of this category of words
# There's a similar one for the Sámi languages, the {{smi}} template

!Languages

{{{
sma/
├── am-shared              # build instructions - don't touch!
├── doc                    # documentation
├── m4                     # configuration macros - don't touch!
├── src                    # the real meat
│   ├── filters
│   ├── hyphenation
│   ├── morphology
│   │   ├── affixes
│   │   └── stems
│   ├── orthography        # capitalisation handling
│   ├── phonetics          # conversion to IPA
│   ├── phonology          # twolc etc
│   ├── syntax             # disambiguation, dependency structure
│   └── transcriptions     # conversion between different
│                          # orthographical representations
├── test                   # well, tests...
│   ├── data
│   ├── src
│   │   ├── morphology     # only one used so far
│   │   ├── phonology
│   │   └── syntax
│   └── tools
│       └── spellcheckers
└── tools
    └── spellcheckers
        ├── hfst           # here we build spellers for LibreOffice
        └── hunspell
}}}

!!Basic commands

Some important commands used in our infrastructure:

{{{
./autogen.sh              # initialise the build environment
./configure --with-hfst   # configure the build environment for your system, with hfst
make                      # build all transducers
make check                # run all defined tests
make doc                  # soon, not yet functional
}}}

The first two should in principle only be needed once per language, but in practice they have to be rerun from time to time when changes are made to the build environment.

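
As a rough illustration of how these commands fit together, here is a minimal shell sketch. The names {{run}}, {{build_lang}}, {{DRY_RUN}} and {{REINIT}} are hypothetical and not part of the infrastructure. The sketch assumes that a missing {{Makefile}} means the language directory has not been configured yet; it cannot detect changes to the build environment by itself, so {{REINIT=1}} is there to force the first two steps.

{{{
#!/bin/sh
# Hypothetical helper, not part of the infrastructure.

# run: print a command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# build_lang DIR: build and test the language found in DIR.
# Runs in a subshell, so your current directory is left unchanged.
build_lang() {
    (
        cd "$1" || exit 1
        # Assumption: no Makefile means the directory is not configured yet.
        if [ ! -f Makefile ] || [ "${REINIT:-0}" = "1" ]; then
            run ./autogen.sh || exit 1
            run ./configure --with-hfst || exit 1
        fi
        run make || exit 1
        run make check
    )
}
}}}

Assuming the language directories are checked out under {{langs/}}, {{build_lang langs/sma}} would build and test South Sámi. Set {{DRY_RUN=1}} first to only print the commands that would be run.
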
!!!Work routines

* version control routines
* coding style
* testing routines

!!Version control

# commit often
# always update before commit
## resolve conflicts ASAP - they must be solved before committing
# always check your changes before committing
# write clear log messages - you might need to find that commit one year from now

Cornerstone is a very good tool for checking diffs while committing and for resolving conflicts.

!!Coding style

* keep documentation close to the code
* keep test cases close to the code

From this week on, there will be support for writing your yaml test cases and lexicon documentation as part of the lexc files. See examples in the Finnish sources. (Not yet ready as of today, we'll return to this later in the week.)

!Coding style - visual layout

Which is easier to read?

{{{
LEXICON N_ODD
+N+Sg: N_ODD_SG ;
+N+Pl: N_ODD_PL ;
+N:N_ODD_ESS ;
+N+SgNomCmp:e%^DISIMP R ;
+N+SgGenCmp:e%>%^DISIMPn R ;
}}}

or

{{{
LEXICON N_ODD
        +N+Sg:             N_ODD_SG  ;
        +N+Pl:             N_ODD_PL  ;
           +N:             N_ODD_ESS ;
  +N+SgNomCmp:e%^DISIMP    R         ;
  +N+SgGenCmp:e%>%^DISIMPn R         ;
}}}

!Coding style - visual layout (cont.)

Use whitespace (blanks and empty lines, avoid tabs) generously to add visual structure to your code. It:

* helps you and others read the code
* helps you spot bugs (or avoid bugs in the first place)
* helps make inconsistencies stand out
* makes the code easier to maintain

!!Testing

* make sure you have tests for all word classes / stem types
* always run {{make check}} after each editing pass
* fix bugs or inconsistencies as soon as they are discovered
* keep the test results clean - strive to make sure all tests pass, this will make it easier to spot new bugs or regressions
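
The version control and testing routines above could be combined on the command line roughly as follows. This is only a sketch: {{run}}, {{safe_commit}} and {{DRY_RUN}} are hypothetical names, and it assumes you are standing in a working copy where {{make check}} works. Only the svn subcommands ({{update}}, {{status}}, {{diff}}, {{commit}}) are real.

{{{
#!/bin/sh
# Hypothetical helper, not part of the infrastructure.

# run: print a command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# safe_commit "log message": update, test, review, then commit.
safe_commit() {
    msg=$1
    if [ -z "$msg" ]; then
        echo "usage: safe_commit \"clear log message\"" >&2
        return 2
    fi
    # Always update before commit.
    run svn update || return 1
    # 'svn status' marks files with text conflicts with a C in the first column.
    if [ "${DRY_RUN:-0}" != "1" ] && svn status | grep -q '^C'; then
        echo "conflicts found - resolve them before committing" >&2
        return 1
    fi
    # Keep the test results clean: stop if any test fails.
    run make check || return 1
    # Check your changes before committing.
    run svn diff || return 1
    run svn commit -m "$msg"
}
}}}

A call such as {{safe_commit "Aligned the N_ODD lexicon entries"}} stops at the first failing step, so nothing is committed while conflicts or failing tests remain. The diff is only printed here; for a proper review before the commit, a graphical client like Cornerstone is more comfortable.
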