Okapi Components: Wordfast Filter
- Overview
The Wordfast Filter is an Okapi component that implements the Okapi Filter Interface for Wordfast translation memory files. See the Wordfast Web site for more information on Wordfast.
The filter supports Wordfast TM version 5, as described in the Wordfast manual.
The following is an example of a very simple Wordfast TM file. In each entry, the source text (Item 1, Item 2) follows the source language code EN-US, and the target text (Article 1, Article 2) follows the target language code FR-FR. The marker <tab> represents a tabulation character.
%20051224~235959<tab>%User ID,YS,YS Yves,<tab>%TU=00000000<tab>%EN-US<tab>%Wordfast TM v 5.0/00<tab>%FR-FR<tab>%---45399785.
20051224~235959<tab>YS<tab>1<tab>EN-US<tab>Item 1<tab>FR-FR<tab>Article 1
20051224~235959<tab>YS<tab>1<tab>EN-US<tab>Item 2<tab>FR-FR<tab>Article 2
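As an illustration only, the sample above can be read by splitting each line on the tabulation character. This is a minimal Python sketch, not the filter's actual implementation. The function name `read_entries` is hypothetical, and the sketch assumes that the header is the line starting with `%` and that fields 4 to 7 of an entry hold the source language, source text, target language, and target text, as the sample suggests.

```python
# Sample TM from the text above, with <tab> written as a real tab character.
SAMPLE_TM = (
    "%20051224~235959\t%User ID,YS,YS Yves,\t%TU=00000000\t%EN-US\t"
    "%Wordfast TM v 5.0/00\t%FR-FR\t%---45399785.\n"
    "20051224~235959\tYS\t1\tEN-US\tItem 1\tFR-FR\tArticle 1\n"
    "20051224~235959\tYS\t1\tEN-US\tItem 2\tFR-FR\tArticle 2\n"
)


def read_entries(tm_text):
    """Return (source_lang, source, target_lang, target) for each entry line."""
    entries = []
    for line in tm_text.splitlines():
        # Assumption: the header is the line whose first field starts with '%'.
        if not line or line.startswith("%"):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not enough fields for a bilingual entry
        # Fields 4..7 (1-based) are at indexes 3..6 after the split.
        entries.append((fields[3], fields[4], fields[5], fields[6]))
    return entries


for entry in read_entries(SAMPLE_TM):
    print(entry)
```
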
The properties for the Wordfast Filter are the following:
Property | This Filter |
---|---|
INPUTFILE | Yes |
INPUTSTRING | No |
BILINGUALINPUT | Yes |
TEXTBASED | Yes |
OUTPUTFILE | Yes |
OUTPUTSTRING | No |
ANCILLARYOUTPUT | No |
XMLOUTPUT | No |
RTFOUTPUT | No |
USEKEY | No |
ISINDEMOMODE | No |
If UTF-8 or UTF-16 is auto-detected as the encoding of the input file, that encoding is used; otherwise, the encoding specified by the user is used. (Note that Wordfast files are normally never in UTF-8.)
Be careful when selecting the input encoding: Wordfast files are bilingual, and the correct encoding of the input file is often the one corresponding to the target/output language.
The encoding of the output is the one specified by the user, except when UTF-8 or UTF-16 has been auto-detected as the input encoding. In that case, the output encoding used is the same as the input encoding.
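The encoding rules above can be sketched as follows. This is a hedged Python illustration, not the filter's code: the text does not say how auto-detection is performed, so the sketch assumes it relies on a byte-order mark, and the names `detect_unicode_encoding` and `resolve_encodings` are hypothetical.

```python
import codecs

# Assumption: auto-detection is based on the byte-order mark (BOM).
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16"),
    (codecs.BOM_UTF16_BE, "UTF-16"),
]


def detect_unicode_encoding(head):
    """Return 'UTF-8' or 'UTF-16' if the first bytes carry a BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def resolve_encodings(head, user_input_enc, user_output_enc):
    """Return the (input, output) encodings to use for one file."""
    detected = detect_unicode_encoding(head)
    if detected is not None:
        # A detected Unicode encoding overrides both user settings.
        return detected, detected
    return user_input_enc, user_output_enc
```

For a file without a BOM, both user-specified values are returned unchanged, which is why choosing the right input encoding matters for Wordfast files.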
If one or more characters are not supported in the output encoding, the unsupported characters are escaped as \uHHHH (where HHHH is the hexadecimal Unicode value of the character) and a warning message is displayed when the output file is closed.
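The escaping behavior can be illustrated with a short Python sketch. It is an approximation under stated assumptions rather than the filter's implementation: the function name `escape_unsupported` is hypothetical, and the text does not describe what happens for characters above U+FFFF, so the sketch simply writes the full code point in that case.

```python
def escape_unsupported(text, encoding):
    """Escape characters that `encoding` cannot represent as \\uHHHH.

    Returns the escaped text and the number of escaped characters, so the
    caller can issue a single warning when the output is closed.
    """
    out = []
    escaped = 0
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # Uppercase hexadecimal, padded to at least four digits.
            out.append("\\u%04X" % ord(ch))
            escaped += 1
    return "".join(out), escaped


result, count = escape_unsupported("Article \u0153 1", "iso-8859-1")
if count:
    print("Warning: %d character(s) were escaped." % count)
```
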
TODO: line-break MAC support
The date and time information in Wordfast TMs is in local time. Ideally, the filter would convert the TM date/time to UTC. However, because the entries can fall in both daylight saving and non-daylight-saving periods, applying a single time offset may produce erroneous data anyway. In addition, as TMs circulate between translators, the entries may be in different local times. Overall, since there is no certainty that a conversion would be accurate, the filter does not perform one.
As a result, the filter simply assumes the date and time information is in UTC.
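In practice, this means the Wordfast time stamp is only reformatted, never shifted. The following Python sketch shows one way to turn a stamp such as 20051224~235959 into the "yyyyMMddTHHmmssZ" format used by the ChangeDate property. The function name `to_change_date` is hypothetical, and the sketch is not taken from the filter's source.

```python
from datetime import datetime


def to_change_date(wordfast_stamp):
    """Reformat a Wordfast 'yyyyMMdd~HHmmss' stamp as 'yyyyMMddTHHmmssZ'.

    No time-zone offset is applied: the value is treated as if it were
    already in UTC, as described above.
    """
    # strptime also validates that the stamp is a real date and time.
    parsed = datetime.strptime(wordfast_stamp, "%Y%m%d~%H%M%S")
    return parsed.strftime("%Y%m%dT%H%M%SZ")


print(to_change_date("20051224~235959"))  # prints 20051224T235959Z
```
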
This filter implements support for a few additional properties that give you access to all the information set for each entry. The properties are:
Property | Description |
---|---|
ChangeDate | This corresponds to field 1 of the entry. The property value is in the format "yyyyMMddTHHmmssZ". |
ChangeUser | This corresponds to field 2 of the entry. |
Flag | 1 if the entry is flagged, 0 otherwise. |
UsageCount | This corresponds to field 3 of the entry. |
@AAttr2 | This corresponds to field 8 of the entry. The @A prefix is used for compatibility with other filters. The character @ indicates a user-defined field, and the flag A indicates that it is an attribute with a possible pick-list (as opposed to a T flag, which indicates a text field). Note that there is no @AAttr1 attribute: in Wordfast the first attribute is the second field of the entry (UserID), and the filter maps it to the ChangeUser property. |
@AAttr3 | This corresponds to field 9 of the entry. |
@AAttr4 | This corresponds to field 10 of the entry. |
@AAttr5 | This corresponds to field 11 of the entry. |
Note that not all of these fields are necessarily available for every entry. The IFilterItem method GetProperty() returns null when the property does not exist. You can use the ListProperties() method to get a semicolon-delimited list of all properties in the current filter item.
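The mapping between entry fields and properties can be modeled with a small Python sketch. The class `EntryProperties` and its methods are hypothetical stand-ins that only mimic the behavior described for GetProperty() and ListProperties(); they are not the Okapi API. The sketch assumes that an empty or missing field means the property does not exist, and it leaves out the Flag property because the text does not say where the flag is stored.

```python
# Zero-based indexes in a tab-split entry for the 1-based field numbers
# listed in the property descriptions above.
FIELD_TO_PROPERTY = {
    0: "ChangeDate",
    1: "ChangeUser",
    2: "UsageCount",
    7: "@AAttr2",
    8: "@AAttr3",
    9: "@AAttr4",
    10: "@AAttr5",
}


class EntryProperties:
    """Hypothetical stand-in for the property access of a filter item."""

    def __init__(self, entry_line):
        fields = entry_line.split("\t")
        self._props = {}
        for index, name in FIELD_TO_PROPERTY.items():
            # Assumption: a missing or empty field means "no such property".
            if index < len(fields) and fields[index] != "":
                value = fields[index]
                if name == "ChangeDate":
                    # 'yyyyMMdd~HHmmss' becomes 'yyyyMMddTHHmmssZ'.
                    value = value.replace("~", "T") + "Z"
                self._props[name] = value

    def get_property(self, name):
        """Return the value, or None when the property does not exist."""
        return self._props.get(name)

    def list_properties(self):
        """Return the property names as a semicolon-delimited string."""
        return ";".join(self._props)


item = EntryProperties("20051224~235959\tYS\t1\tEN-US\tItem 1\tFR-FR\tArticle 1")
print(item.list_properties())
```

Because the sample entry has only seven fields, none of the @AAttr properties exist for it, and looking one up returns None.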
Special thanks to Yves Champollion for providing information on the format of the Wordfast TM.