The parser uses the standard Xerox tools, but each of them must be run in UTF-8 mode by passing the -utf8 flag. The tools are invoked as follows:
$xfst -utf8
$lexc -utf8
$cat myfile.utf8 | tokenize tokenizer.fst -utf8 | ...
$lookup -flags mb my.fst -utf8
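When these tools are called from a script, it is easy to forget the flag on one of them. The following is a minimal sketch, not part of the parser itself, of how the flag could be added in one place. The helper names utf8_command and run_pipeline are hypothetical, the tools are assumed to be on the PATH, and lookup is assumed to be the last stage of the pipeline. The stages hidden behind the "..." above are not filled in here.

```python
import subprocess


def utf8_command(tool, *args):
    """Build an argument list for a tool, with -utf8 appended last.

    Hypothetical helper: it only mirrors the flag placement used in
    the commands above (tool, its arguments, then -utf8).
    """
    return [tool, *args, "-utf8"]


def run_pipeline(stages, data):
    """Pass `data` (bytes) through each stage in turn.

    The output of one stage becomes the input of the next, which has
    the same effect as joining the commands with a shell pipe.
    check=True raises CalledProcessError if a stage exits with an error.
    """
    for argv in stages:
        result = subprocess.run(
            argv, input=data, stdout=subprocess.PIPE, check=True
        )
        data = result.stdout
    return data


# Only the stages named in the text; any stages elided by "..." would
# have to be inserted between these two.
stages = [
    utf8_command("tokenize", "tokenizer.fst"),
    utf8_command("lookup", "-flags", "mb", "my.fst"),
]
```

With the tools installed, the bytes of myfile.utf8 could then be passed in with run_pipeline(stages, open("myfile.utf8", "rb").read()). The data is handled as bytes throughout so that the script does not re-encode the UTF-8 text on the way.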