Okapi Components: Text Rewriting Utility
- The utility set identifier for this utility is: oku_set01
- The utility identifier is: rewriting
The Text Rewriting utility processes the input files through their respective filters and generates output files as they would be if you had extracted and then merged each file. You can also make various modifications to the extractable text during the process, such as pseudo-translation.
The common parameters are the options specified from the application calling the utility rather than in the options dialog box of the utility itself. For this utility the common parameters you need to specify are the following:
Files of the first input list | Needed (the files to rewrite)
Root for the first input list | Not needed
Files of the second input list | Not needed
Root for the second input list | Not needed
Files of the third input list | Not needed
Root for the third input list | Not needed
Input language | Needed
Output language | Needed
Input default encoding | Needed
Output default encoding | Needed
Location and names for output files | Needed
Keep the original content -- Select this option to generate an output file with the same text as the input file. This shows how closely the output file can resemble the input file for the given filter. Some modifications may still occur: not all filters can merge back exactly the same text, because changes such as line unwrapping or white space collapsing may be part of the process.
Remove the text, keep the inline codes -- Select this option to generate an output file where all the text is removed, but all the codes are preserved. You can use this option to generate two files without text, which makes changes in the codes easier to identify. It can also be used to spot hard-coded text quickly.
Mask the text, keep the inline codes -- Select this option to generate an output file where all the extractable text is changed to generic letters: all uppercase letters are changed to 'X', all lowercase letters are changed to 'x', and all digits are changed to 'N'. All other characters and the content of inline codes remain untouched. The length of the text stays the same as the original. This option can be used to detect hard-coded text: it easily stands out in the masked content, while the general layout and formatting help you locate it.
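The masking rule can be illustrated with a minimal Python sketch. This is not the utility's own code: the function name `mask_text` is hypothetical, and the sketch works on a plain string only, assuming that inline codes have already been separated from the text and are left alone.

```python
def mask_text(text: str) -> str:
    """Mask a plain string: uppercase -> 'X', lowercase -> 'x', digit -> 'N'.

    Illustration only. Inline codes are assumed to be handled elsewhere
    and are not modeled here.
    """
    masked = []
    for ch in text:
        if ch.isdigit():
            masked.append("N")
        elif ch.isupper():
            masked.append("X")
        elif ch.islower():
            masked.append("x")
        else:
            # Spaces, punctuation, and other characters stay untouched.
            masked.append(ch)
    return "".join(masked)


print(mask_text("Save 12 files to Disk-A."))
```

Because each character maps to exactly one character, the masked string always has the same length as the original, which matches the behavior described above.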
Add identifiers to the text -- Select this option to generate an output file where all extractable text has an added prefix. The prefix is made of the Resname value for the item if one exists, or an ID "iN" where N is the item identifier for the given item. In addition, if the option Use file identifier is set, the prefix starts with the marker "fN" where N is the number of the file (in base-32 notation) in the order it was processed.
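As a rough illustration of how such a prefix could be assembled, here is a short Python sketch. Several details are assumptions not stated in this page: the base-32 digit alphabet (0-9 then a-v), the separator placed between the file marker and the item marker, and the helper names `to_base32` and `make_prefix`, which are hypothetical.

```python
from typing import Optional

# Assumption: digits 0-9 followed by a-v. The utility's actual alphabet
# is not documented here.
_BASE32_DIGITS = "0123456789abcdefghijklmnopqrstuv"


def to_base32(number: int) -> str:
    """Convert a non-negative integer to a base-32 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 32)
        digits.append(_BASE32_DIGITS[remainder])
    return "".join(reversed(digits))


def make_prefix(item_id: str,
                resname: Optional[str] = None,
                file_number: Optional[int] = None,
                separator: str = "_") -> str:
    """Build the identifier prefix for one text item.

    `separator` is an assumption made for readability; the real output
    may join the parts differently.
    """
    parts = []
    if file_number is not None:
        # Corresponds to the "Use file identifier" option.
        parts.append("f" + to_base32(file_number))
    # Use the Resname value when one exists, otherwise "i" + item identifier.
    parts.append(resname if resname else "i" + item_id)
    return separator.join(parts)


print(make_prefix("7"))                      # item without a Resname
print(make_prefix("7", resname="IDS_OK"))    # item with a Resname
print(make_prefix("7", file_number=33))      # with the file marker
```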
Pseudo translate the text, keep the inline codes -- Select this option to generate an output file where all the extractable text is pseudo-translated according to the rules described below.
Apply the modifications also to the text items already translated -- Set this option to overwrite any translation that already exists in the input file. Some formats, such as PO, can hold both the source text and a target text. When this option is set, the text is replaced even if a translation already exists.
Mask coordinates -- Set this option to generate an output where all extractable coordinates (x, y, width, and height) have the same value: 1. This is helpful when you want to produce files for a code-only comparison.
Use file identifier -- Set this option to start the text identifier with a file identifier when using the Add identifiers to the text option.
Use machine translation -- Set this option to use a machine translation result for the pseudo-translation. You need to be connected to the Internet to use this option.
Parameters -- Click this button to open a dialog box where you can specify the parameters needed to use the MT engine currently selected. Note that not all MT engines have parameters.
Note
Using this option may result in very slow processing: the speed depends on the speed of your connection as well as on the server's traffic.
Note
Text with inline codes (XML, HTML for example) may have trouble being
rewritten because the MT system may re-arrange the order of the tags,
placing a closing tag before its corresponding opening tag. This depends on
the sentence, the target language, and many other factors.
If the machine translation option is not used, the pseudo-translation is generated by replacing any character defined in Original by its counterpart defined in New, and by adding or subtracting a number of characters computed from the length of the original text and the value defined in Ratio. Added characters are taken randomly from the characters listed in Seed.
Ratio -- Set the ratio of change in length between the original text and the one to generate. You can use values from +500% to -500%.
Original -- Enter the list of original characters that should be replaced in the generated file. You must specify as many characters as there are characters in the New entry.
New -- Enter the list of new characters that should be used instead of the original characters. You must specify as many characters as there are characters in the Original entry.
Seed -- Enter the characters to use as seed when increasing the length of the original text. Insert more or fewer spaces depending on whether you want the expanded string to contain more or fewer spaces.
Prefix text -- Set this option to add one or more characters at the front of the original text.
Suffix text -- Set this option to add one or more characters at the end of the string. If the original text is expanded, the suffix text is added at the end of the expanded text. If the last character of the original text was a white space or an inline code, the suffix text is added after it.
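The non-MT pseudo-translation rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the utility's implementation: the function name `pseudo_translate` is hypothetical, the length change is assumed to be the original length multiplied by Ratio / 100, added characters are assumed to be appended at the end of the text, and a negative ratio is assumed to truncate from the end (never below an empty string). Inline codes are not modeled.

```python
import random
from typing import Optional


def pseudo_translate(text: str,
                     original: str,
                     new: str,
                     ratio_percent: float,
                     seed_chars: str,
                     prefix: str = "",
                     suffix: str = "",
                     rng: Optional[random.Random] = None) -> str:
    """Pseudo-translate a plain string (illustrative sketch only)."""
    # Original and New must list the same number of characters.
    if len(original) != len(new):
        raise ValueError("Original and New must have the same length")
    rng = rng or random.Random(0)

    # Replace each character of Original by its counterpart in New.
    result = text.translate(str.maketrans(original, new))

    # Assumption: the number of characters to add or remove is the
    # original length scaled by the ratio.
    delta = round(len(text) * ratio_percent / 100)
    if delta > 0 and seed_chars:
        # Added characters are picked randomly from Seed.
        result += "".join(rng.choice(seed_chars) for _ in range(delta))
    elif delta < 0:
        # Assumption: shrink from the end, clamped at an empty string.
        result = result[:max(0, len(result) + delta)]

    # The prefix goes in front; the suffix goes after everything else.
    return prefix + result + suffix


print(pseudo_translate("Hello", "aeiou", "àéîöù", 40, "#",
                       prefix="[", suffix="]"))
```

With a single-character seed such as "#" the output is deterministic, which makes the effect of Ratio easy to check: a 5-character string with a ratio of 40% gains 2 characters.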