Okapi Components - InfoString Filter

- Overview
- Filter Properties
- Processing Details
- Parameters - Options Tab
- Parameters - Inline Codes Tab

Overview

The InfoString Filter is an Okapi component that implements the Okapi Filter Interface for InfoString files.

InfoString files are proprietary files. The following is an example of such a file:

STRING_FILE_TABLE_START
IDActionText1 "The XZ infusion rate of #!{infusionRate} L exceeds the limit." 255 255 255 0x0000 50 182 710 560 18 Arial 1
ID2MessageLine "XZ infusion rate entered exceeded recommended maximum rate." 255 247 57 0x0011 0 0 800 25 18 Arial 1
XZText "XZ" 0 0 0 0x0010 215 173 70 0 20 Arial 1
STRING_FILE_TABLE_END

The strings may contain \n, \r, and \t codes, which must be treated as item breaks.
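A minimal sketch of how the item breaks could be detected, in Python. It assumes that \n, \r, and \t appear in the file as literal two-character escape sequences (a backslash followed by a letter), and it drops empty pieces between adjacent codes, which is a choice of the sketch rather than documented filter behavior. The name split_items is hypothetical and not part of any Okapi API.

```python
import re

# Assumption: the break codes are literal two-character sequences in the file,
# i.e. a backslash followed by n, r, or t.
ITEM_BREAK = re.compile(r"\\[nrt]")


def split_items(text: str) -> list[str]:
    """Split the text of one string into items at the \\n, \\r, and \\t codes."""
    # This sketch drops empty pieces, e.g. between two adjacent break codes.
    return [part for part in ITEM_BREAK.split(text) if part]


print(split_items(r"First line\nSecond line\tThird part"))
# ['First line', 'Second line', 'Third part']
```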

Filter Properties

The properties for the InfoString Filter are the following:

Property	This Filter
INPUTFILE	Yes
INPUTSTRING	No
BILINGUALINPUT	No
TEXTBASED	Yes
OUTPUTFILE	Yes
OUTPUTSTRING	No
ANCILLARYOUTPUT	No
XMLOUTPUT	No
RTFOUTPUT	Yes
USEKEY	No
ISINDEMOMODE	No

Processing Details

Input Encoding

There is no encoding identifier in the InfoString format, so the filter decides which encoding to use for the input file with the following logic:

If the file has a Unicode Byte-Order-Mark:
- The corresponding encoding (e.g. UTF-8, UTF-16) is used.
Otherwise, the input encoding of the common parameters is used.

Note that these files are usually in UTF-8.
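The encoding decision above can be sketched as follows. This is an illustration in Python, not the filter's code: detect_input_encoding and its default_encoding argument (standing in for the input encoding of the common parameters) are hypothetical names, and the sketch also checks UTF-32 Byte-Order-Marks, which the description above does not explicitly mention.

```python
import codecs

# Byte-Order-Marks, longest first so that a UTF-32 LE mark (FF FE 00 00)
# is not mistaken for a UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_input_encoding(data: bytes, default_encoding: str) -> tuple[str, int]:
    """Return (encoding, BOM length) for the raw bytes of an input file."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: fall back to the (hypothetical) default from the parameters.
    return default_encoding, 0


raw = codecs.BOM_UTF8 + "STRING_FILE_TABLE_START\n".encode("utf-8")
encoding, bom_length = detect_input_encoding(raw, "windows-1252")
text = raw[bom_length:].decode(encoding)
print(encoding, repr(text))  # utf-8 'STRING_FILE_TABLE_START\n'
```

Because the BOM bytes are skipped before decoding, the byte-order-specific codec names (utf-16-le rather than utf-16) are used so the decoder does not expect a second mark.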

Output Encoding

The output encoding used is the one specified by the user.

Extracted Text

Strings are extracted, with \n, \r, and \t used as item separators. The translate flag (the last number on the line) can be set to 0 or 1: 1 indicates that the string is translatable and should be extracted, 0 that it is not translatable.
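The following sketch shows how a table row might be split into its ID, its string, and the remaining fields, with the last field read as the translate flag. The row layout is inferred from the example file; the regular expression, the Row type, and parse_row are hypothetical, and the sketch assumes the quoted string contains no embedded double quotes.

```python
import re
from typing import NamedTuple, Optional

# Assumed row layout (inferred from the example file): an ID, a double-quoted
# string without embedded double quotes, then whitespace-separated fields.
ROW = re.compile(r'^(\S+)\s+"([^"]*)"\s+(.*\S)\s*$')


class Row(NamedTuple):
    id: str
    text: str
    fields: list[str]

    @property
    def translatable(self) -> bool:
        # The translate flag is the last field: "1" = extract, "0" = skip.
        return self.fields[-1] == "1"


def parse_row(line: str) -> Optional[Row]:
    match = ROW.match(line)
    if match is None:
        return None  # not a string row, e.g. STRING_FILE_TABLE_START
    return Row(match.group(1), match.group(2), match.group(3).split())


row = parse_row('XZText "XZ" 0 0 0 0x0010 215 173 70 0 20 Arial 1')
print(row.id, row.text, row.translatable)  # XZText XZ True
```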

If the option Pre-segment the strings for sentences is set, each string is segmented into sentences and each segment is sent as a separate item.

The ID of the full original string is assigned to the item holding the last segment of that string. That item also has a property called "lastSeg" set to "last", which lets utilities see where an original string ends. Each item also has a property "paraID" holding the identifier of the original string it belongs to.
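The item layout described above can be illustrated with a small sketch. It is hedged on two points: the sentence splitter is a naive regular expression, not the segmentation rules the filter actually uses, and the description does not say which ID the non-final segments receive, so the sketch leaves it unset (None). segment_items and the dictionary layout are hypothetical.

```python
import re

# Naive splitter for illustration only: breaks after ".", "!" or "?" followed by
# whitespace. It does not reproduce the filter's own segmentation rules.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment_items(string_id: str, text: str) -> list[dict]:
    """Turn one original string into one item per sentence segment."""
    segments = [s for s in SENTENCE_END.split(text) if s]
    items = []
    for index, segment in enumerate(segments):
        # Every item records the original string it belongs to. The ID of
        # non-final segments is not specified, so it is left as None here.
        item = {"id": None, "text": segment, "properties": {"paraID": string_id}}
        if index == len(segments) - 1:
            # Only the last segment carries the ID of the full original string.
            item["id"] = string_id
            item["properties"]["lastSeg"] = "last"
        items.append(item)
    return items


for item in segment_items("ID2MessageLine", "First sentence. Second sentence."):
    print(item)
```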

Parameters - Options Tab

Pre-segment the strings for sentences -- Set this option to segment each string into sentences. Each segment is sent as a separate item.

Parameters - Inline Codes Tab

Mark as inline codes the text parts matching this regular expression -- Set this option to apply the specified regular expression to the text of the extracted items. Any match is converted to an inline code. By default the expression is:

((((%(([-0+#]?)[-0+#]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn])|(\\a|\\b|\\f|\\v)|(\{\d.*?\}|%%|#\!\{.*?\}|#13#)))|(#\!\[.*?\]))

This matches the default inline codes found in InfoString strings, including variables such as #!{infusionRate}.
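To see what the default expression captures, the sketch below compiles it with Python's re module (this particular pattern is also valid Python syntax) and splits a string into text runs and code runs. The pattern is written as two adjacent raw string literals, which Python concatenates into the single expression shown above. The to_runs helper and the ("text", ...)/("code", ...) tuples are a hypothetical representation, not Okapi's inline-code model.

```python
import re

# The default expression, split over two raw string literals only for line
# length; Python joins them into one pattern.
DEFAULT_CODES = re.compile(
    r"((((%(([-0+#]?)[-0+#]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn])"
    r"|(\\a|\\b|\\f|\\v)|(\{\d.*?\}|%%|#\!\{.*?\}|#13#)))|(#\!\[.*?\]))"
)


def to_runs(text: str) -> list[tuple[str, str]]:
    """Split text into ("text", ...) and ("code", ...) runs."""
    runs, pos = [], 0
    for match in DEFAULT_CODES.finditer(text):
        if match.start() > pos:
            runs.append(("text", text[pos:match.start()]))
        runs.append(("code", match.group(0)))  # the matched span becomes a code
        pos = match.end()
    if pos < len(text):
        runs.append(("text", text[pos:]))
    return runs


print(to_runs("The XZ infusion rate of #!{infusionRate} L exceeds the limit."))
# [('text', 'The XZ infusion rate of '), ('code', '#!{infusionRate}'), ('text', ' L exceeds the limit.')]
```

Every alternative in the expression consumes at least one character, so finditer never yields an empty match and the loop always advances.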

Edit Expression -- Click this button to edit the regular expression and its options. This button opens the Inline Codes Rules dialog box.

See the Regular Expressions section for more information about the syntax and rules for building regular expression patterns.