RTF Conversion Utility
- The utility set identifier for this utility is:
- The utility identifier is:
The RTF Conversion utility allows you to convert RTF files to text-based files.
Hidden text and deleted text are not written to the output file. You can use this utility to create a text file from an "un-clean" RTF file with Trados markers.
Important: If the output file has some statement indicating its encoding, and the output encoding selected is different from that statement, this utility does not modify the statement. For example, if the file is an XML document with an encoding declaration set as "encoding='windows-1252'" and the output encoding you select is UTF-8, this utility will not automatically change the encoding declaration to "encoding='utf-8'". The same goes for HTML files, RC files, and any other file format with an encoding declaration.
The common parameters are the options specified from the application calling the utility rather than in the options dialog box of the utility itself. For this utility the common parameters you need to specify are the following:
|Files of the first input list||- Needed (the RTF files to convert)|
|Root for the first input list||- Not Needed|
|Files of the second input list||- Not Needed|
|Root for the second input list||- Not Needed|
|Files of the third input list||- Not Needed|
|Root for the third input list||- Not Needed|
|Input language||- Not Needed|
|Output language||- Not Needed|
|Input default encoding||- Needed|
|Output default encoding||- Needed|
|Location and names for output files||- Needed|
Line-break -- Select the type of line-break to use for the output file.
Fix right single quotation marks (U+2019) used as apostrophes -- Set this option to replace any right single quotation mark character (U+2019) incorrectly used as an apostrophe with an apostrophe character (U+0027). A character is replaced only if the characters on each side are not white-spaces. For example, the U+2019 in "don’t" is replaced, giving "don't".
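The replacement rule for this option can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the utility's own code; the function name `fix_apostrophes` is hypothetical.

```python
import re

# A U+2019 that has a non-whitespace character on each side.
# (Assumption: "white-space" is taken to mean whatever \s matches.)
_APOSTROPHE = re.compile(r"(?<=\S)\u2019(?=\S)")


def fix_apostrophes(text: str) -> str:
    """Replace U+2019 with U+0027 when neither neighbour is whitespace."""
    return _APOSTROPHE.sub("'", text)
```

A U+2019 at the start or end of the text, or next to a space, is left alone because one of its sides is not a non-whitespace character.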
Warn if any extended character is not supported by the output encoding -- Set this option to get a warning message if any extended character in the file is not supported by the output encoding. Note that using this option (as well as providing an escape notation for un-supported extended characters) forces the program to write the output character by character, making it slower.
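To show why this check implies character-by-character processing, here is a minimal Python sketch that lists the characters a given encoding cannot represent. The function name `unsupported_characters` is hypothetical, and the sketch relies on Python's codecs rather than on whatever encoder the utility uses.

```python
def unsupported_characters(text: str, encoding: str) -> list:
    """Return the distinct characters of `text` that `encoding` cannot encode."""
    missing = []
    for ch in text:
        try:
            # Each character is tested on its own, which is what makes
            # this kind of check slower than a bulk conversion.
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing
```

A caller could issue one warning per returned character, or substitute an escape notation for each of them before writing the output.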
Override \fcharset0 and \ansi with input encoding -- Set this option to make \ansi, as well as any font set to \fcharset0, use the encoding associated with the input file instead of windows-1252 (the "normal" encoding).
Use this option to process RTF files generated by applications (like old versions of Word) that assume \fcharset0 corresponds to the current encoding of the machine. When this option is set: the text set to a font associated with \fcharset0 or \ansi is parsed with whatever input encoding is set.
When this option is not set: the same text is always parsed using windows-1252.
This option has no effect on text written in Unicode escape sequences.
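The effect of the override can be illustrated with RTF hexadecimal escapes (\'hh), which carry raw bytes whose meaning depends on the encoding used to parse them. The following Python sketch is a simplification under stated assumptions: it handles only \'hh escapes, ignores the font table and control words entirely, and the function name `decode_hex_escapes` and its `override_encoding` parameter are hypothetical.

```python
import re

# One or more consecutive \'hh escapes, decoded together so that
# multi-byte encodings see all of their bytes at once.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")
_HEX_BYTE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf_text: str, override_encoding=None) -> str:
    r"""Decode \'hh escapes with windows-1252, or with the override if given."""
    encoding = override_encoding or "windows-1252"

    def _decode(match):
        raw = bytes(int(h, 16) for h in _HEX_BYTE.findall(match.group(0)))
        return raw.decode(encoding, errors="replace")

    return _HEX_RUN.sub(_decode, rtf_text)
```

With no override, the byte E9 is read as "é"; with an input encoding of windows-1251, the same byte is read as the Cyrillic letter "й", which is the behaviour this option is meant to enable for files written on machines with a different default encoding.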
Update XML/HTML encoding declarations -- Set this option to automatically update the XML and HTML encoding declarations in the output so that they match the selected output encoding.
For XML: This option works only on XML files that have an XML declaration (i.e. "<?xml ...?>"), and only if that declaration is on a single line and that line is within the first 20 lines of the file. If there is an encoding declaration, it is updated; if there is no encoding declaration, one is inserted after the version declaration. Note that this utility does not know anything about XML comments.
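The XML rules above can be sketched as follows in Python. This is an approximation built from the description, not the utility's implementation; the function name `update_xml_encoding`, the regular expressions, and the choice of double quotes for an inserted declaration are all assumptions.

```python
import re

_XML_DECL = re.compile(r"<\?xml\b[^?]*\?>")
_ENCODING = re.compile(r"""(encoding\s*=\s*)(["'])[^"']*\2""")
_VERSION = re.compile(r"""(version\s*=\s*(["'])[^"']*\2)""")


def update_xml_encoding(text: str, encoding: str, max_lines: int = 20) -> str:
    """Update or insert the encoding in a single-line XML declaration."""
    lines = text.splitlines(keepends=True)
    # Only the first `max_lines` lines are inspected, line by line, so a
    # declaration split across lines is never matched.
    for i, line in enumerate(lines[:max_lines]):
        m = _XML_DECL.search(line)
        if not m:
            continue
        decl = m.group(0)
        if _ENCODING.search(decl):
            # Existing encoding declaration: replace its value, keep its quotes.
            new_decl = _ENCODING.sub(
                lambda e: f"{e.group(1)}{e.group(2)}{encoding}{e.group(2)}",
                decl, count=1)
        else:
            # No encoding declaration: insert one after the version declaration.
            new_decl = _VERSION.sub(
                lambda v: f'{v.group(1)} encoding="{encoding}"',
                decl, count=1)
        lines[i] = line[:m.start()] + new_decl + line[m.end():]
        break
    return "".join(lines)
```

Like the utility as described, the sketch has no notion of XML comments, so a declaration that appears inside a comment in the first 20 lines would be modified as well.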
For HTML: This option works only on HTML files that have an HTML start tag (i.e. "<html ...>"), and only if that tag is within the first 20 lines of the file. If there is a charset declaration, it is updated; if there is no charset declaration, nothing is added. Note that this utility does not know anything about HTML comments.
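A comparable Python sketch for the HTML case is shown below. The description does not say where the charset declaration is looked for, so this sketch assumes the first "charset=" found anywhere in the file is the one to update; the function name `update_html_charset` and the pattern used to recognise a charset value are also assumptions.

```python
import re

_HTML_START = re.compile(r"<html\b", re.IGNORECASE)
# "charset=" followed by an optional quote and an encoding name.
_CHARSET = re.compile(r"""(charset\s*=\s*["']?)[A-Za-z0-9._:-]+""", re.IGNORECASE)


def update_html_charset(text: str, encoding: str, max_lines: int = 20) -> str:
    """Update an existing charset declaration; never add a new one."""
    first_lines = text.splitlines()[:max_lines]
    if not any(_HTML_START.search(line) for line in first_lines):
        # No <html ...> start tag near the top: leave the file untouched.
        return text
    # With no charset declaration there is no match, so nothing is added.
    return _CHARSET.sub(lambda m: m.group(1) + encoding, text, count=1)
```

The sketch works for both the `http-equiv` form (`content="text/html; charset=..."`) and the shorter `<meta charset="...">` form, since both contain a "charset=" followed by the encoding name.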
Notation for non-breaking spaces -- Select the type of notation to use for the non-breaking space character (U+00A0). The choices are:
Be careful with the character entity notation in XML files, as the character entity must be defined.
Notation for un-supported extended characters -- Select the type of notation to use for characters not supported by the output encoding. The choices are: