Okapi Components - Text Rewriting Utility

Overview

- The utility set identifier for this utility is: oku_set01
- The utility identifier is: rewriting

The Text Rewriting utility lets you process input files through their respective filters and generate output files as they would be if you had extracted and then merged them. You can also modify the extractable text during the process, for example by pseudo-translating it.

Common Parameters

The common parameters are the options specified in the application calling the utility rather than in the utility's own options dialog box. For this utility, you need to specify the following common parameters:

Files of the first input list	- Needed (the files to re-write)
Root for the first input list	- Not Needed
Files of the second input list	- Not Needed
Root for the second input list	- Not Needed
Files of the third input list	- Not Needed
Root for the third input list	- Not Needed
Input language	- Needed
Output language	- Needed
Input default encoding	- Needed
Output default encoding	- Needed
Location and names for output files	- Needed

Options - Output Tab

Keep the original content -- Select this option to generate an output file with the same text as the input file. This shows how closely the output file can resemble the input file for the given filter. Some modifications may still occur: not all filters can merge back exactly the same text, because changes such as line unwrapping or white-space collapsing may be part of the process.

Remove the text, keep the inline codes -- Select this option to generate an output file where all the text is removed but all the codes are preserved. You can use this option to generate two text-free files and compare them to identify changes in the codes more easily. It can also help you spot hard-coded text instantly.
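The effect of this option on a single text unit can be pictured with a small Python sketch. It assumes a hypothetical model where a text unit is a list of plain-text runs and inline codes (the Code class below is illustrative, not the utility's internal API): the text runs are dropped and the codes are kept in their original order.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Code:
    """Hypothetical inline code, e.g. an HTML tag such as '<b>'."""
    data: str


# Hypothetical model of a text unit: plain-text runs interleaved with inline codes.
Fragment = List[Union[str, Code]]


def remove_text(fragment: Fragment) -> Fragment:
    # Keep only the inline codes, in their original order; drop all text runs.
    return [part for part in fragment if isinstance(part, Code)]


def render(fragment: Fragment) -> str:
    # Serialize the fragment back to a string (codes output verbatim).
    return "".join(p.data if isinstance(p, Code) else p for p in fragment)


frag = ["Click ", Code("<b>"), "Save", Code("</b>"), " now."]
print(render(remove_text(frag)))  # <b></b>
```

Comparing two such outputs shows only differences in the codes, which is what makes this option useful for code-only checks.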

Mask the text, keep the inline codes -- Select this option to generate an output file where all the extractable text is changed to generic letters: all uppercase letters are changed to 'X', all lowercase letters to 'x', and all digits to 'N'. All other characters and the content of inline codes remain untouched, so the text keeps its original length. You can use this option to detect hard-coded text: it stands out easily in the masked content, while the general layout and formatting help you locate it.
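The masking rule can be sketched in Python as follows. This is a minimal illustration, assuming the inline codes have already been separated from the text run, and assuming Python's Unicode-aware str.isupper, str.islower, and str.isdecimal approximate how the utility classifies characters.

```python
def mask_text(text: str) -> str:
    """Mask one plain-text run: uppercase -> 'X', lowercase -> 'x', digit -> 'N'."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdecimal():
            out.append("N")
        else:
            # Spaces, punctuation, and any other characters stay as they are.
            out.append(ch)
    return "".join(out)


print(mask_text("Page 12 of 30: Save As..."))  # Xxxx NN xx NN: Xxxx Xx...
```

Because every character maps to exactly one character, the masked text always has the same length as the original.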

Add identifiers to the text -- Select this option to generate an output file where all extractable text has an added prefix. The prefix is made of the Resname value of the item if one exists, or otherwise of an ID "iN", where N is the identifier of the item. In addition, if the option Use file identifier is set, the prefix starts with the marker "fN", where N is the number of the file (in base-32 notation) in the order it was processed.
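The way the prefix is assembled can be sketched in Python. This page does not specify the separators or the base-32 digit set, so the separators below (FILE_SEP, TEXT_SEP), the digit alphabet 0-9a-v, and the 1-based file numbering in the example are all hypothetical choices for illustration only.

```python
# Assumed base-32 digit set (0-9 then a-v); the utility's actual notation may differ.
DIGITS = "0123456789abcdefghijklmnopqrstuv"
FILE_SEP = "_"  # hypothetical separator between the file and item markers
TEXT_SEP = ":"  # hypothetical separator between the prefix and the text


def to_base32(n: int) -> str:
    """Convert a non-negative integer to base-32 using DIGITS."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, 32)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def make_prefix(item_id, resname=None, file_number=None) -> str:
    # Use the Resname value if present, otherwise "i" + the item identifier.
    item_part = resname if resname else f"i{item_id}"
    # With "Use file identifier" set, start with "f" + file number in base 32.
    if file_number is not None:
        return f"f{to_base32(file_number)}{FILE_SEP}{item_part}"
    return item_part


def add_identifier(text, item_id, resname=None, file_number=None) -> str:
    return make_prefix(item_id, resname, file_number) + TEXT_SEP + text


print(add_identifier("Save", 5))                  # i5:Save
print(add_identifier("Save", 5, resname="IDOK"))  # IDOK:Save
print(add_identifier("Save", 5, file_number=33))  # f11_i5:Save
```

Base 32 keeps the file marker short: file 33 becomes "11" and file 31 becomes "v".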

Pseudo translate the text, keep the inline codes -- Select this option to generate an output file where all the extractable text is pseudo-translated according to the rules set in the Pseudo-Translation Options section below.

Apply the modifications also to the text items already translated -- Set this option to overwrite any translation already existing in the input file. Some formats, such as PO, can contain both source text and target text; when this option is set, the text is replaced even if it already has a translation.

Mask coordinates -- Set this option to generate an output file where all extractable coordinates (x, y, width, and height) are set to the same value: 1. This is helpful when you want to produce files for a code-only comparison.

Use file identifier -- Set this option to start the text identifier with a file identifier when using the Add identifiers to the text option.

Pseudo-Translation Options

Use machine translation -- Set this option to use a machine translation result for the pseudo-translation. You need to be connected to the Internet to use this option.

Parameters -- Click this button to open a dialog box where you can specify the parameters needed to use the MT engine currently selected. Note that not all MT engines have parameters.

Note

Using this option may make processing very slow: the speed depends on your connection and on the server's traffic.

Note

Text with inline codes (for example XML or HTML) may not be rewritten correctly, because the MT system may re-arrange the order of the tags, for instance placing a closing tag before its corresponding opening tag. This depends on the sentence, the target language, and many other factors.

If the machine translation option is not used, the pseudo-translation is generated by replacing each character defined in Original with its counterpart defined in New, and by adding or removing a number of characters computed by applying the Ratio value to the length of the original text. Added characters are taken randomly from the characters listed in Seed.

Ratio -- Set the percentage of length change between the original text and the generated text. You can use values from -500% to +500%.

Original -- Enter the list of original characters that should be replaced in the generated file. You must specify as many characters as there are characters in the New entry.

New -- Enter the list of new characters that should be used instead of the original characters. You must specify as many characters as there are characters in the Original entry.

Seed -- Enter the characters to use as seed when increasing the length of the original text. Include more or fewer spaces depending on whether you want the expanded string to contain more or fewer spaces.

Prefix text -- Set this option to add one or more characters at the front of the original text.

Suffix text -- Set this option to add one or more characters at the end of the text. If the original text is expanded, the suffix text is added at the end of the expanded text. If the original text ends with white space or an inline code, the suffix text is added after it.
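The non-MT pseudo-translation rules described in this section can be sketched in Python. This is an illustration under stated assumptions, not the utility's own implementation: the default values are hypothetical, the length change is truncated toward zero, added seed characters are assumed to be appended at the end of the text, and shortening is assumed to cut characters from the end. The Seed characters are passed as seed_chars to avoid confusion with a random-number seed.

```python
import random


def pseudo_translate(text, original="aeiou", new="àéîõü", ratio=0.0,
                     seed_chars=" abcdefghijklmnopqrstuvwxyz",
                     prefix="", suffix="", rng=None):
    """Sketch of the Original/New, Ratio, Seed, Prefix and Suffix rules."""
    if not -500 <= ratio <= 500:
        raise ValueError("ratio must be between -500 and +500 (percent)")
    rng = rng or random.Random()
    # 1. Replace Original[i] with New[i]; str.maketrans raises ValueError
    #    if the two strings do not have the same length.
    result = text.translate(str.maketrans(original, new))
    # 2. Change the length by ratio percent of the ORIGINAL text length.
    delta = int(len(text) * ratio / 100)
    if delta > 0:
        if not seed_chars:
            raise ValueError("seed_chars must not be empty when expanding")
        # Assumption: random seed characters are appended at the end.
        result += "".join(rng.choice(seed_chars) for _ in range(delta))
    elif delta < 0:
        # Assumption: shortening cuts characters from the end (never below 0).
        result = result[:max(0, len(result) + delta)]
    # 3. The prefix goes in front; the suffix goes after the expanded text.
    return prefix + result + suffix


print(pseudo_translate("Hello world", ratio=30, prefix="[", suffix="]",
                       rng=random.Random(42)))
```

With a ratio of 30%, the 11-character "Hello world" becomes "Héllõ wõrld" followed by 3 random seed characters, all wrapped between the prefix and the suffix. Passing a seeded random.Random makes the output repeatable between runs.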