- Overview
- Filter Properties
- Processing Details
- Parameters - Rules Tab
- Rule Regular Expression
- Parameters - Strings Tab
- Parameters - Characters Tab
- Parameters - Options Tab

Overview

The Script Filter is an Okapi component that implements the Okapi Filter Interface for script-type files where the translatable text can be identified using regular expressions and a minimal set of pre-defined behaviors. The files processed with this filter must be text-based.

The parts to be extracted are specified through a set of user-defined rules. A default rule specifies what to do with the parts that are not matched by any other rule. For example, if you have the following rules defined:

<Default>: action: Do not extract
    Rule1: start: string\s*?\(.*?\)\s*?=\s*?{
           content: .*?
           end: \}
           action: Extract

The extractable text in the following input file is the text between the braces ("Text of string 1" and "Text of string 2"):

string(str01) = {Text of string 1}
string (str02)={Text of string 2}

You can also detect information associated with the text to extract by using special named groups in your regular expressions. For example, to identify the text between the parentheses in the example above as the resname attribute of the text, you can change the expression for the start to:

Rule1: start: string\s*?\((?<N>.*?)\)\s*?=\s*?{

This gives you access to the resname values ("str01" and "str02") along with the text ("Text of string 1" and "Text of string 2"):

string(str01) = {Text of string 1}
string (str02)={Text of string 2}
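
As a rough illustration of how Rule1 behaves, the following Python sketch joins the start, content, and end expressions into a single pattern and lists the resname and text of each match. Joining the three expressions, the group name "content", and the function name extract are assumptions made for this example; the filter itself may evaluate the expressions differently. Python writes named groups as (?P<N>...) where the expression above uses (?<N>...).

```python
import re

# Assumption: the start, content, and end expressions of Rule1 are simply
# concatenated. The N group carries the resname, as in the rule above.
START = r"string\s*?\((?P<N>.*?)\)\s*?=\s*?\{"
CONTENT = r"(?P<content>.*?)"   # hypothetical group name for the text
END = r"\}"
RULE1 = re.compile(START + CONTENT + END)

SAMPLE = (
    "string(str01) = {Text of string 1}\n"
    "string (str02)={Text of string 2}\n"
)


def extract(text):
    """Return (resname, extracted text) pairs for every match of Rule1."""
    return [(m.group("N"), m.group("content")) for m in RULE1.finditer(text)]


if __name__ == "__main__":
    for resname, value in extract(SAMPLE):
        print(resname, "->", value)
```

Anything outside the matches falls to the default rule, which in this example does not extract.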

Filter Properties

The properties for the Script Filter are the following:

Property          This Filter
INPUTFILE         Yes
INPUTSTRING       Yes
BILINGUALINPUT    No
TEXTBASED         Yes
OUTPUTFILE        Yes
OUTPUTSTRING      Yes
ANCILLARYOUTPUT   No
XMLOUTPUT         No
RTFOUTPUT         Yes
USEKEY            No
ISINDEMOMODE      No

Processing Details

Input Encoding

The filter uses the input encoding defined by the user. Unicode files with a Byte-Order-Mark are detected automatically, and the proper encoding is used regardless of the user-defined encoding.
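
The precedence described above (a Byte-Order-Mark wins over the user-defined encoding) can be sketched in Python as follows. This is only an illustration of the idea, not the filter's own code; the function name decode_input and the list of recognized marks are assumptions.

```python
import codecs

# Longer marks come first: the UTF-32 LE mark begins with the same two
# bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_input(data, user_encoding):
    """Decode bytes, letting a Byte-Order-Mark override user_encoding."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            # Drop the mark itself and decode with the detected encoding.
            return data[len(bom):].decode(name)
    # No mark found: fall back to the encoding chosen by the user.
    return data.decode(user_encoding)
```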

Output Encoding

The output file is generated in the encoding defined by the user.

Localization Directives

The filter supports localization directives. They are special comments you can use to override the default behavior of the filter regarding the parts to extract.

Directives can be in any part of the file that is matched by a rule with its action set to Do not extract and the option Treat as comment set as well.

The syntax and behavior of the directives are the same across all Okapi filters. See the Localization Directives pages for detailed information about what you can do with this mechanism.

Line-Breaks

The output uses the same type of line-break as the input.
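
One way to picture this behavior is to detect the line-break type of the input and re-apply it when writing the output. The Python sketch below assumes the input uses a single line-break type throughout; the function names are hypothetical and the filter may proceed differently.

```python
def detect_linebreak(text):
    """Return the line-break sequence used by text (defaults to LF)."""
    if "\r\n" in text:
        return "\r\n"   # Windows style
    if "\r" in text:
        return "\r"     # old Mac style
    return "\n"         # Unix style, also used when no line-break is found


def apply_linebreak(text, linebreak):
    """Rewrite every line-break in text as the given sequence."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", linebreak)
```

For example, apply_linebreak(output_text, detect_linebreak(input_text)) gives the output the same line-break type as the input.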

Parameters - Rules Tab

Add -- Click this button to add a new rule.

Remove -- Click this button to remove the current rule. You cannot remove the default (first) rule. The program asks for confirmation before deleting the rule and all its associated data.

Move Up -- Click this button to move the current rule upward. The first rule cannot be moved.

Move Down -- Click this button to move the current rule downward. The first rule cannot be moved.

Edit Expression -- Click this button to modify the regular expression associated with the current rule. This command opens the Rule Regular Expression dialog box.

Rename Rule -- Click this button to change the name of the current rule.

Extract the content -- Select this option to indicate the content of the blocks matched by the current rule should be extracted.

Extract the strings -- Select this option to indicate the strings within the content of the blocks matched by the current rule should be extracted. The options that define what a string is are set in the Strings tab.

Treat as group beginning -- Select this option to indicate that a start of group should be generated when the current rule is matched.

Treat as group ending -- Select this option to indicate that an end of group should be generated when the current rule is matched.

Do not extract -- Select this option to indicate that the matched block should be left as is.

Treat as comment -- Set this option to allow localization directives to be recognized in the block. The action must be set to Do not extract to use this option.

Unwrap text -- Set this option to replace any sequence of spaces, line-breaks, and tabs in the extracted text of a content with a single space.

Mark inline codes -- Set this option to process inline codes in the extracted text. Use the Edit Inline Codes button to define the inline codes.

Edit Inline Codes -- Click this button to add, edit, or remove inline code definitions. This button opens the Inline Codes Rules dialog box. The expressions for the inline codes are displayed below the button.

Restype -- Enter the resource type identifier to associate with the text extracted with the current rule. This is the value of the XLIFF restype attribute. You can leave it empty.
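
The effect of the Unwrap text option described above can be approximated with a single regular-expression substitution. The Python sketch below is only an illustration; the function name unwrap is an assumption, and whether the filter also trims leading and trailing spaces is not stated here, so the sketch does not trim.

```python
import re

# Any run of spaces, tabs, carriage returns, and line feeds.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def unwrap(text):
    """Collapse each run of spaces, tabs, and line-breaks into one space."""
    return _WHITESPACE_RUN.sub(" ", text)
```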

Rule options

These options apply to all rules.

Ignore case -- Set this option to ignore case when matching. For example, "bear" matches "Bearcat", "BEARCAT", and "bearcat".

Multiline -- Set this option to change the meaning of ^ and $ so that they match at the beginning and end of any line, not just the beginning and end of the whole string.

ECMAScript-compliant behavior -- Set this option to enable ECMAScript-compliant behavior for the expression.

Dot also matches line-feed -- Set this option to change the meaning of the period character '.' so that it matches every character instead of every character except \n.
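
To see what three of these rule options do, the Python sketch below maps them to the closest flags of Python's re module. The mapping and the function name build_flags are assumptions made for illustration; Python has no counterpart to the ECMAScript-compliant option, so it is left out.

```python
import re


def build_flags(ignore_case=False, multiline=False, dot_matches_linefeed=False):
    """Combine the rule options into a Python re flag value."""
    flags = 0
    if ignore_case:
        flags |= re.IGNORECASE   # "bear" also matches "BEARCAT"
    if multiline:
        flags |= re.MULTILINE    # ^ and $ match at every line
    if dot_matches_linefeed:
        flags |= re.DOTALL       # '.' also matches \n
    return flags


if __name__ == "__main__":
    flags = build_flags(ignore_case=True)
    print(bool(re.search("bear", "BEARCAT", flags)))
```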

Rule Regular Expression

Start -- Enter the expression for the start of the block for this rule.

Content -- Enter the expression for the content of the block for this rule. Most of the time you will use .*? which takes everything between the start and end expressions. Note that any text lying between the start and end matches but not covered by the content match is not processed by the default rule (unlike the text outside the start and end matches).

End -- Enter the expression for the end of the block for this rule.

Insert Pattern -- Click this button to insert a regular expression pattern into whichever of the Start, Content, and End edit boxes was last active. The inserted text replaces whatever text is currently selected.

Test -- Click this button to check the syntax of the current rule and to see the effect it has on the sample text. The results of the test are displayed in the Result box.

Sample -- Enter the text to process for testing the expressions.

Regular Expression Help -- Click this button to get detailed help on the syntax and the usage of regular expressions. The button opens the Regular Expression help section.

Parameters - Strings Tab

This tab lets you define what a "string" is for the file format you are dealing with. These options are used to detect strings when you select the action Extract the strings for a given rule.

Start character(s) -- Enter the character or characters that can be used to start a string.

End character(s) -- Enter the character or characters that close a string. You must have as many ending characters as you have starting characters. When processing, the closing character searched for is the one at the same position as the start character that started the parse.

Escape sequences

Escape by doubling the character -- Set this option to indicate that any doubled character is an escaped character.

Escape by adding the prefix -- Set this option to indicate that a character is escaped when the specified escape character prefixes it.

No-escape strings use the prefix -- Set this option to allow the use of the specified prefix to mark strings that do not support escape sequences. In such strings the escape mechanism is disabled. For example: @"The character \ is a back-slash."
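
The way these options interact can be sketched with a small scanner. The Python code below is a simplified illustration under several assumptions: the function name find_strings and its parameters are hypothetical, the escape prefix is a single character, doubling applies to the closing character, and both escape mechanisms are switched off inside no-escape strings. The closing character is looked up at the same position as the start character, as described for End character(s).

```python
def find_strings(text, starts="\"'", ends="\"'", escape_prefix="\\",
                 double_escape=False, no_escape_prefix="@"):
    """Return the contents of the strings found in text."""
    if len(starts) != len(ends):
        raise ValueError("need as many end characters as start characters")
    found = []
    p = len(no_escape_prefix)
    i = 0
    while i < len(text):
        pos = starts.find(text[i])
        if pos < 0:
            i += 1
            continue
        closer = ends[pos]  # end character paired by position
        # A no-escape string is one directly preceded by the prefix.
        raw = p > 0 and i >= p and text[i - p:i] == no_escape_prefix
        buf = []
        j = i + 1
        while j < len(text):
            ch = text[j]
            if (not raw and escape_prefix and ch == escape_prefix
                    and j + 1 < len(text)):
                buf.append(text[j:j + 2])  # keep the escape as written
                j += 2
                continue
            if ch == closer:
                if not raw and double_escape and text[j + 1:j + 2] == closer:
                    buf.append(closer * 2)  # doubled closer is an escape
                    j += 2
                    continue
                break
            buf.append(ch)
            j += 1
        if j < len(text):  # only keep strings that were closed
            found.append("".join(buf))
        i = j + 1
    return found
```

With the defaults, the input @"C:\" yields the string C:\ because the back-slash is not an escape there, whereas "C:\" without the prefix is treated as unterminated.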

Parameters - Characters Tab

This tab lets you define how the escape notations for special and extended characters are handled for the file format you are dealing with. The conversion from escaped form to raw characters is applied to both strings and content text.

Note that only extended characters are converted; escaped ASCII characters are left in their escaped form (and therefore remain in that same form in the output as well).

Input conversions

Convert Java-style notation -- Set this option to convert any sequence of extractable text in the form \uHHHH (where HHHH is the Unicode hexadecimal value of the character) into its raw character form.

Convert C-style notation -- Set this option to convert any sequence of extractable text in the form \xHH (where HH is the hexadecimal value of the character) into its raw character form.
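
The two input conversions, together with the rule that escaped ASCII characters stay escaped, can be illustrated with the Python sketch below. The function name unescape_extended and its parameters are assumptions, "extended" is taken to mean a code of 0x80 or above, and surrogate pairs written as two \uHHHH sequences are not combined.

```python
import re

# \uHHHH (Java-style) or \xHH (C-style), with hexadecimal digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\x([0-9A-Fa-f]{2})")


def unescape_extended(text, java_style=True, c_style=True):
    """Convert escaped extended characters to raw characters."""
    def replace(match):
        is_java = match.group(1) is not None
        if (is_java and not java_style) or (not is_java and not c_style):
            return match.group(0)   # this notation is switched off
        code = int(match.group(1) or match.group(2), 16)
        if code < 0x80:
            return match.group(0)   # ASCII stays in its escaped form
        return chr(code)
    return _ESCAPE.sub(replace, text)
```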

Output conversions

For all extended characters -- Select this option to convert into the selected escaped form all extended characters (supported or not in the output encoding).

For un-supported characters only -- Select this option to convert into the selected escaped form only the extended characters that are not supported in the output encoding.

No escape -- Select this option to leave all escapable characters as they are. By default, unsupported characters will be lost and replaced by a question mark ("?") or by a close ASCII character (for example: "a" for "à", "á", "â", "ã", etc.).

Java-style escape -- Select this option to write the escapable characters using the Java-style notation. For example, the character "à" would be written "\u00e0".

Decimal NCR -- Select this option to write the escapable characters using the decimal numeric character reference notation. For example, the character "à" would be written "&#224;".

Hexadecimal NCR -- Select this option to write the escapable characters using the hexadecimal numeric character reference notation. For example, the character "à" would be written "&#xe0;".

User-defined -- Select this option to write the escapable characters using the pattern of your choice. If this option is selected, you must enter in the following edit box the formatting pattern to use.

The pattern must be a valid formatting pattern as defined for the String class in C#. If the pattern is invalid you will get an error such as "Input string was not in a correct format" when processing the file. The pattern must be set for an integer parameter (the Unicode 16-bit code-point of the character).

Here are a few examples of valid patterns:

Pattern                   Result for "à"
{0}                       224
{0:000000}                000224
\u{0:X4}                  \u00E0
&#x{0:X4};                &#x00E0;
&#x{0:x4};                &#x00e0;
unsupported->{0:X4}<      unsupported->00E0<

The general syntax for the pattern is: "{0[,alignment][:formatString]}". The simplest pattern is "{0}". To specify a literal opening brace ("{") use "{{"; for a closing brace ("}") use "}}". For example, to get the output "${224}" (for an unsupported 'à') specify the pattern "${{{0}}}".

Note that if some characters of the pattern must be escaped in the output file, you must specify the escaped form in the pattern. For example: if you want to get the output ">00E0<" (for an unsupported 'à') in an HTML or XML file, specify the pattern "&gt;{0:X4}&lt;" and not ">{0:X4}<".
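
The output conversions can be pictured with the Python sketch below. It is an approximation, not the filter's implementation: the function name escape_output, its parameters, and the three pattern constants are assumptions, and the patterns use Python's str.format syntax rather than the C# syntax the filter expects. The two are close (braces are doubled the same way), but the C# format X4 is written 04X in Python. Python also passes the full code point, which can exceed 16 bits.

```python
# Python equivalents of the built-in notations (hypothetical constants).
JAVA_STYLE = "\\u{0:04x}"    # à -> \u00e0
DECIMAL_NCR = "&#{0};"       # à -> &#224;
HEX_NCR = "&#x{0:x};"        # à -> &#xe0;


def escape_output(text, encoding, only_unsupported=True, pattern=JAVA_STYLE):
    """Escape extended characters of text according to the options."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)           # ASCII is never escaped here
            continue
        if only_unsupported:
            try:
                ch.encode(encoding)  # supported by the output encoding?
                out.append(ch)
                continue
            except UnicodeEncodeError:
                pass                 # not supported: escape it below
        out.append(pattern.format(code))
    return "".join(out)


if __name__ == "__main__":
    print(escape_output("déjà", "ascii", pattern=HEX_NCR))
    print(escape_output("à", "ascii", pattern="${{{0}}}"))
```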

Parameters - Options Tab

Use localization directives when they are present -- Set this option to enable the filter to recognize localization directives. If this option is not set, any localization directive in the input file will be ignored.

Extract items outside the scope of localization directives -- Set this option to extract any translatable item that is not within the scope of a localization directive. Choosing whether or not to extract outside the scope of the directives lets you mark up fewer parts of the source document. This option is enabled only when the Use localization directives when they are present option is set.

See the Localization Directives section for more details on how the filter deals with directives.

Use Do-Not-Localize list if a DNL file is present -- Set this option to enable the filter to use any Do-Not-Localize list file found along with a given input file. The DNL file has the same path and name as the input file, with an additional .dnl extension. It contains a list of entries that should not be extracted. Each entry is made of the resname, restype, and text of a filter item. Use the DNL List Editing utility to create and maintain DNL files.

Datatype identifier -- Enter the identifier to use with the extracted text. This is the value that will be assigned to the XLIFF datatype attribute.
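
The naming convention for DNL files described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that "an additional .dnl extension" means the extension is appended to the full input file name.

```python
import os


def dnl_path_for(input_path):
    """Return the path where a DNL file for input_path would be expected."""
    # Assumption: the .dnl extension is appended, not substituted.
    return input_path + ".dnl"


def find_dnl(input_path):
    """Return the DNL file path if the file exists, otherwise None."""
    candidate = dnl_path_for(input_path)
    return candidate if os.path.isfile(candidate) else None
```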