Text is preprocessed and split into words and sentences. Sentence splitting requires handling abbreviations, since a period may either end a sentence or belong to an abbreviation. The linguistic side of the issue is covered [in this document|/lang/sme/docu-sme-preprocessor.html]; there is also [more specific documentation on the linguistic reasoning|../../ling/preprocessor.html]. See also the [Preprocessor Specification|../../proof/gramcheck/doc/PreprocessorSpecification.html] for the pmatch fst behind the hfst method.

Here we look at how to compile and use the preprocessor that deals with the abbreviations.

!!!Abbreviation handling with hfst

This is the recommended approach. Compile and test with the following settings (here with sme as an example):

{{{
./configure --with-hfst --enable-tokenisers
make
echo "dr. Watson."|hfst-tokenise $GTHOME/langs/sme/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
}}}

The result should treat the first period as part of the abbreviation "dr.", and the second as a separate token, split off from the word it was attached to.

!!!Abbreviation handling with xfst

This method is not actively maintained, but it is documented here in case you have not installed hfst.

Standing in the directory {{$GTHOME/langs/$LANG}}, check whether there is a file {{abbr.txt}} in the folder {{tools/tokenisers}}. If there is, you can run:

{{{
echo "dr. Watson."|preprocess --abbr=tools/tokenisers/abbr.txt
}}}

The result should be as above.

If you don't have this file, you can compile it. In the {{$LANG}} directory (the directory of your language), configure and compile as follows:

{{{
./configure --enable-abbr
cd tools/tokenisers
make abbr
}}}

The result should be a file {{abbr.txt}} in {{tools/tokenisers}}, which you can test from the language directory with the {{preprocess}} command given above.
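
The check described for the xfst method (is there an abbr.txt in tools/tokenisers, and if not, how is it built?) can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name ensure_abbr_hint is hypothetical and not part of the infrastructure, and the function only reports the state of the file and prints the build commands quoted above, without running them.

```shell
#!/bin/sh
# Minimal sketch, not part of the infrastructure.
# ensure_abbr_hint is a hypothetical helper name chosen for this example.
# Argument: the language directory, e.g. "$GTHOME/langs/sme".
ensure_abbr_hint() {
    lang_dir=$1
    abbr_file="$lang_dir/tools/tokenisers/abbr.txt"
    if [ -f "$abbr_file" ]; then
        # The file exists, so preprocess --abbr=... can be used directly.
        echo "found: $abbr_file"
    else
        # The file is missing: print the build steps from this page.
        echo "missing: $abbr_file"
        echo "build with: ./configure --enable-abbr && cd tools/tokenisers && make abbr"
        return 1
    fi
}
```

It could be called as ensure_abbr_hint "$GTHOME/langs/sme". A non-zero return status then signals that abbr.txt still has to be compiled before the preprocess test will work.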