- Overview
- Common Parameters
- Options - Options Tab

Overview

- The utility set identifier for this utility is: oku_set02
- The utility identifier is: rtfconversion

The RTF Conversion utility allows you to convert RTF files to text-based files.

Hidden text and deleted text are not written to the output file. You can use this utility to create a text file from an "un-clean" RTF file with Trados markers.

Important: If the output file contains a statement indicating its encoding, and the output encoding you select is different from that statement, this utility does not modify the statement by default. For example, if the file is an XML document with an encoding declaration set as "encoding='windows-1252'" and the output encoding you select is UTF-8, this utility will not automatically change the encoding declaration to "encoding='utf-8'". The same goes for HTML files, RC files, and any other file format with an encoding declaration. The "Update XML/HTML encoding declarations" option described in the Options Tab section applies to XML and HTML files only.

Common Parameters

The common parameters are the options specified from the application calling the utility rather than in the options dialog box of the utility itself. For this utility the common parameters you need to specify are the following:

Files of the first input list - Needed (the RTF files to convert)
Root for the first input list - Not Needed
Files of the second input list - Not Needed
Root for the second input list - Not Needed
Files of the third input list - Not Needed
Root for the third input list - Not Needed
Input language - Not Needed
Output language - Not Needed
Input default encoding - Needed
Output default encoding - Needed
Location and names for output files - Needed

Options - Options Tab

Line-break -- Select the type of line-break to use for the output file.

Fix right single quotation marks (U+2019) used as apostrophes -- Set this option to replace any right single quotation mark character (U+2019) incorrectly used as an apostrophe with an apostrophe character (U+0027). A character is replaced only if the characters on each side of it are not whitespace. For example, "L’histoire" is replaced by "L'histoire", but "Yves’ mistake." remains unchanged.
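The replacement rule can be sketched as follows. This is only an illustration of the rule as described here, not the utility's actual code: the function name fix_apostrophes is hypothetical, and leaving a mark at the very start or end of the text unchanged is an assumption.

```python
RIGHT_QUOTE = "\u2019"  # right single quotation mark
APOSTROPHE = "\u0027"   # plain apostrophe


def fix_apostrophes(text):
    """Replace U+2019 by U+0027 when both neighbours are non-whitespace.

    Assumption: a mark at the very start or end of the text has no
    neighbour on one side, so it is left unchanged.
    """
    out = []
    last = len(text) - 1
    for i, ch in enumerate(text):
        if (ch == RIGHT_QUOTE and 0 < i < last
                and not text[i - 1].isspace()
                and not text[i + 1].isspace()):
            out.append(APOSTROPHE)
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, the mark in "L’histoire" is replaced because it sits between two letters, while the one in "Yves’ mistake." is kept because it is followed by a space.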

Warn if any extended character is not supported by the output encoding -- Set this option to get a warning message if any extended character in the file is not supported by the output encoding. Note that using this option (as well as providing an escape notation for un-supported extended characters) forces the program to write the output character by character, which makes it slower.
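To check ahead of time which characters a given output encoding cannot represent, a per-character test along these lines may help. It is a minimal sketch using Python's standard codecs, not the utility's own check; find_unsupported is a hypothetical name.

```python
def find_unsupported(text, encoding):
    """Return the distinct characters of text that encoding cannot represent.

    Characters are tested one at a time, which mirrors the
    character-by-character cost mentioned above.
    """
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing
```

For example, calling find_unsupported with the text "café €" and the encoding "latin-1" reports the euro sign only, because ISO-8859-1 has "é" but no euro sign.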

Override \fcharset0 and \ansi with input encoding -- Set this option to make \ansi, as well as any font set to \fcharset0, use the encoding associated with the input file instead of windows-1252 (the "normal" \fcharset0). Use this option to process RTF files generated by applications (like old versions of Word) that assume \fcharset0 corresponds to the current encoding of the machine. When this option is set, text in a font associated with \fcharset0 or \ansi is parsed with whatever input encoding is set. When this option is not set, the same text is always parsed using windows-1252. This option has no effect on text written as Unicode escape sequences.
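The effect of the override can be illustrated on RTF hexadecimal escapes (\'hh), which carry raw bytes that must be decoded with some code page. The sketch below is a simplified assumption about the behavior, not the utility's parser: decode_hex_escapes and its parameters are hypothetical, and it ignores fonts, groups, and every other RTF construct.

```python
import re

# One or more consecutive \'hh escapes, decoded together so that
# multi-byte encodings keep their byte sequences intact.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_escapes(fragment, input_encoding, override=False):
    """Decode \\'hh escapes in an RTF fragment assumed to be \\fcharset0 text.

    override=False: bytes are read as windows-1252 (cp1252).
    override=True:  bytes are read with input_encoding instead.
    """
    codec = input_encoding if override else "cp1252"

    def repl(match):
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        return raw.decode(codec, errors="replace")

    return _HEX_RUN.sub(repl, fragment)
```

For instance, the byte E9 is "é" in windows-1252 but "й" in windows-1251, so the same fragment yields different text depending on whether the override is set.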

Update XML/HTML encoding declarations -- Set this option to automatically update the XML and HTML encoding declarations in the output so that they match the selected output encoding.

For XML: This option works only on XML files that have an XML declaration (i.e. "<?xml ...?>"), and only if that declaration is on a single line and that line is within the first 20 lines of the file. If there is an encoding declaration, it is updated. If there is no encoding declaration, one is inserted after the version declaration. Note that this utility does not know anything about XML comments.
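The XML rules above can be approximated with a few regular expressions. This is a rough sketch of the described behavior under stated assumptions, not the utility's implementation: update_xml_declaration is a hypothetical name, attribute values are assumed to be quoted, and an inserted encoding uses double quotes.

```python
import re


def update_xml_declaration(text, encoding, max_lines=20):
    """Update or insert the encoding in a single-line XML declaration.

    Only the first max_lines lines are inspected. Like the utility,
    this sketch knows nothing about XML comments.
    """
    lines = text.split("\n")
    for i, line in enumerate(lines[:max_lines]):
        m = re.search(r"<\?xml\b[^?]*\?>", line)
        if not m:
            continue
        decl = m.group(0)
        if re.search(r"encoding\s*=", decl):
            # Existing encoding: replace the value, keep the quote style.
            decl = re.sub(
                r"""(encoding\s*=\s*)(["'])[^"']*\2""",
                lambda mm: f"{mm.group(1)}{mm.group(2)}{encoding}{mm.group(2)}",
                decl, count=1)
        else:
            # No encoding: insert one right after the version declaration.
            decl = re.sub(
                r"""(version\s*=\s*(["'])[^"']*\2)""",
                lambda mm: f'{mm.group(1)} encoding="{encoding}"',
                decl, count=1)
        lines[i] = line[:m.start()] + decl + line[m.end():]
        break
    return "\n".join(lines)
```

A declaration that starts after the first 20 lines, or that is split across several lines, is left untouched by this sketch, which matches the limits described above.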

For HTML: This option works only on HTML files that have an HTML start tag (i.e. "<html ...>"), and only if that tag is within the first 20 lines of the file. If there is a charset declaration, it is updated. If there is no charset declaration, nothing is added. Note that this utility does not know anything about HTML comments.
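The HTML case can be sketched in the same spirit. Again, this is an assumption-laden illustration rather than the utility's code: update_html_charset is a hypothetical name, and the sketch assumes that only the first charset declaration in the file is updated, wherever it appears, once an "<html" start tag has been found in the first 20 lines.

```python
import re


def update_html_charset(text, encoding, max_lines=20):
    """Update the first charset declaration of an HTML file, if any.

    Nothing is changed when no <html ...> start tag is found in the
    first max_lines lines, and nothing is added when no charset exists.
    """
    head = "\n".join(text.split("\n")[:max_lines])
    if not re.search(r"<html\b", head, flags=re.IGNORECASE):
        return text
    return re.sub(
        r"""(charset\s*=\s*["']?)[\w.:-]+""",
        lambda m: m.group(1) + encoding,
        text, count=1, flags=re.IGNORECASE)
```

Both the older form (content="text/html; charset=windows-1252") and the shorter form (charset="windows-1252") are covered by the same pattern in this sketch.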

Notation for non-breaking spaces -- Select the type of notation to use for the non-breaking space character (U+00A0). The choices are:

Notation for un-supported extended characters -- Select the type of notation to use for characters not supported by the output encoding. The choices are: