IMPORTANT NOTICE - Apr-30-2009:
This .NET implementation of Okapi is no longer actively developed.
Instead, a NEW JAVA IMPLEMENTATION IS
AVAILABLE and is being actively developed.
The material on this Web site is kept for archival purposes. Some applications of the old .NET implementation
(e.g. Olifant) will be maintained to some degree until they have a replacement in the Java project.
Filter Interface Specification
Revision information: Version 1.1
Latest revision:
http://okapi.sourceforge.net/IFilter.html
- Document Status
- Overview
- Usage
- Defined Values
- Methods
This document is a stable specification. Feedback about the content of this document is encouraged. Send your comments to the Okapi Framework administrator, or post them in the Okapi Tools users group.
DISCLAIMER: Parts of this document are generated automatically from the source code documentation of Okapi's implementation of the specification. Because the generation tool is not yet fully working, this may result in incomplete or broken text or links in this document. See the source code documentation of the interface for the complete text.
The Okapi Filter Interface is an object model that allows any given input source to be parsed so its different localization-related parts are presented in a common manner to a given calling program.
Essentially, the Filter Interface allows you to abstract the intricacies of the original data source you are processing, and develop applications that act upon a single common and standardized type of input.
Other related specifications:
This interface provides a common way of reading the localizable information of an input. It also offers the option to re-write the input into a new output at the same time.
Just after creating an object that implements IFilter, you should call the IFilter.Initialize method. At this point you can query the filter to get more information about it: use IFilter.GetName to retrieve its name, IFilter.GetIdentifier to retrieve its identifier, or IFilter.GetDefaultDatatype to know what kind of data it processes.
Use the IFilter.LoadSettings method to specify which filter, variant and parameters file to use.
The IFilter.QueryProperty method allows you to detect whether or not the filter supports different types of features.
The following defined values are used with the IFilter interface:
The filter properties are used with the QueryProperty method to know if a filter supports a given property.
The properties currently defined are the following:
0 (INPUTFILE) | Constant value = 0. Queries whether the filter supports file input. |
1 (INPUTSTRING) | Constant value = 1. Queries whether the filter supports string input. |
2 (BILINGUALINPUT) | Constant value = 2. Queries whether the filter input may be bilingual. |
3 (TEXTBASED) | Constant value = 3. Queries whether the format supported by the filter is text-based. |
4 (OUTPUTFILE) | Constant value = 4. Queries whether the filter supports file output. |
5 (OUTPUTSTRING) | Constant value = 5. Queries whether the filter supports string output. |
6 (ANCILLARYOUTPUT) | Constant value = 6. Queries whether the filter supports ancillary output(s). |
7 (XMLOUTPUT) | Constant value = 7. Queries whether the filter supports XML and HTML output. |
8 (RTFOUTPUT) | Constant value = 8. Queries whether the filter supports RTF output. |
9 (USEKEY) | Constant value = 9. Queries whether the filter is currently using a key. |
10 (ISINDEMOMODE) | Constant value = 10. Queries whether the filter is currently in demonstration mode. |
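As a minimal sketch of how the property values above might be used, the following C# program lists which properties a filter reports as supported. The enum layout, the PropertyProbe class, and the delegate standing in for IFilter.QueryProperty are assumptions made for illustration; only the names and numeric values come from the table.

```csharp
using System;
using System.Collections.Generic;

// Names and values copied from the property table; the enum itself is illustrative.
enum FilterProperty
{
    INPUTFILE = 0, INPUTSTRING = 1, BILINGUALINPUT = 2, TEXTBASED = 3,
    OUTPUTFILE = 4, OUTPUTSTRING = 5, ANCILLARYOUTPUT = 6, XMLOUTPUT = 7,
    RTFOUTPUT = 8, USEKEY = 9, ISINDEMOMODE = 10
}

static class PropertyProbe
{
    // queryProperty stands in for IFilter.QueryProperty (hypothetical delegate).
    public static List<FilterProperty> Supported(Func<int, bool> queryProperty)
    {
        var result = new List<FilterProperty>();
        foreach (FilterProperty p in Enum.GetValues(typeof(FilterProperty)))
        {
            if (queryProperty((int)p)) result.Add(p);
        }
        return result;
    }

    static void Main()
    {
        // Invented stand-in filter that only supports file input and file output.
        Func<int, bool> fake = n => n == 0 || n == 4;
        foreach (FilterProperty p in Supported(fake)) Console.WriteLine(p);
    }
}
```

With the invented stand-in above, the program prints INPUTFILE and OUTPUTFILE on separate lines.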
The filter item types are used to identify what type of item is returned by the filter. The ReadItem method returns such a value, and an object that implements the IFilterItem interface is labeled with such a type as well.
0 (ERROR) | Constant value = 0. Error. An item of this type indicates that an error has occurred. The processing of the input should be stopped. |
1 (USERCANCEL) | Constant value = 1. User cancellation. An item of this type indicates that the user has cancelled the operation. The processing of the input should be stopped. |
2 (ENDINPUT) | Constant value = 2. End of the input. An item of this type indicates that the end of the input file or of the input string has been reached. The processing of the input should be stopped. |
3 (TEXT) | Constant value = 3. Text item. An item of this type contains text. Not all text items are necessarily to be translated. Use the IFilterItem.IsTranslatable method to see if the text is translatable. Use IFilterItem.IsTranslated to detect if the text item has a corresponding translation (in the case of bilingual files). |
4 (STARTGROUP) | Constant value = 4. Beginning of group. |
5 (ENDGROUP) | Constant value = 5. End of group. |
6 (BLOCK) | Constant value = 6. Block of data. An item of this type is a block of data not relevant for localization. |
7 (BINARY) | Constant value = 7. Standalone binary data (e.g. bitmap). An item of this type corresponds to an ancillary file. The text content is the path, and the mime-type must be set. |
8 (STARTBINARY) | Constant value = 8. Start of binary data (e.g. graphic with extracted text). An item of this type corresponds to an ancillary file with some additional data (possibly TEXT items). The text content is the path, and the mime-type must be set. |
9 (ENDBINARY) | Constant value = 9. End of binary data. |
10 (STARTSECONDARYFILTER) | Constant value = 10. Start of a part parsed with a secondary filter. |
11 (ENDSECONDARYFILTER) | Constant value = 11. End of a part parsed with a secondary filter. |
Any item with a type value of 0, 1, or 2 (so any possible value below 3) should result in stopping the process. A possible way to program the parsing of the input is something like this:
    int nRes;
    do {
        nRes = myFilter.ReadItem();
        switch ( nRes ) {
            case 0: // Fatal error
                myLog.Error("A fatal parsing error has occurred.");
                continue;
            case 1: // User interruption
                myLog.Message("Process interrupted by the user.");
                continue;
            case 2: // Normal end of input
                continue;
        }
    } while ( nRes > 2 );
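The stop condition described above (any value below 3 ends the processing) can also be isolated in a small helper. The sketch below is an assumption-laden illustration: the FilterItemType constants mirror a subset of the table, while ItemLoop, IsTerminal, CountTextItems, and the delegate standing in for IFilter.ReadItem are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Subset of the item type values listed in the table above.
static class FilterItemType
{
    public const int ERROR = 0;
    public const int USERCANCEL = 1;
    public const int ENDINPUT = 2;
    public const int TEXT = 3;
    public const int STARTGROUP = 4;
    public const int ENDGROUP = 5;
}

static class ItemLoop
{
    // ERROR, USERCANCEL and ENDINPUT are all below TEXT and all stop the loop.
    public static bool IsTerminal(int itemType)
    {
        return itemType < FilterItemType.TEXT;
    }

    // Counts TEXT items until a terminal item is read.
    // readItem stands in for IFilter.ReadItem (hypothetical delegate).
    public static int CountTextItems(Func<int> readItem)
    {
        int count = 0;
        int type;
        while (!IsTerminal(type = readItem()))
        {
            if (type == FilterItemType.TEXT) count++;
        }
        return count;
    }

    static void Main()
    {
        // Invented input: STARTGROUP, TEXT, TEXT, ENDGROUP, ENDINPUT.
        var fakeInput = new Queue<int>(new[] { 4, 3, 3, 5, 2 });
        Console.WriteLine(CountTextItems(() => fakeInput.Dequeue())); // prints 2
    }
}
```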
The Filter Interface provides the following methods:
IFilter.Initialize | Initializes the filter object. |
IFilter.GetInterfaceVersion | Gets the version of the IFilter interface the object implements. |
IFilter.GetIdentifier | Gets the identifier of the filter. |
IFilter.GetName | Gets the name of the filter. |
IFilter.GetCurrentEncoding | Gets the current encoding used to process the input file. |
IFilter.GetCurrentLanguage | Gets the current language of the processed input. |
IFilter.GetInputLanguage | Gets the language of the input. |
IFilter.GetOutputLanguage | Gets the language of the output. |
IFilter.GetOutputEncoding | Gets the name of the encoding used for the output. |
IFilter.GetItem | Gets the current filter item. |
IFilter.GetTranslatedItem | Gets the translation of the current filter item. |
IFilter.GetDefaultDatatype | Gets the default datatype identifier. |
IFilter.QueryProperty | Queries whether a specified property is supported by the filter. |
IFilter.GetVariantCount | Gets the number of variants supported by the filter. |
IFilter.GetVariantID | Gets the identifier for a specified variant of the filter. |
IFilter.GetVariantDescription | Gets the description of a specified variant of the filter. |
IFilter.LoadSettings | Loads the filter settings. |
IFilter.SaveSettingsAs | Saves the current settings to a specified path. |
IFilter.GetSettingsString | Gets the current settings string for the filter. |
IFilter.EditSettings | Edits a specified parameters file for the filter. |
IFilter.GetLocalizationDirectives | Gets the current localization directives of the filter. |
IFilter.SetLocalizationDirectives | Passes a localization directives context to the filter. |
IFilter.OpenInputFile | Opens the file to process. |
IFilter.OpenInputString | Sets a given string as the input to process. |
IFilter.ResetInput | Resets the input. |
IFilter.GetLastItemID | Gets the last ID used for an item. |
IFilter.SetLastItemID | Sets the last ID for an item. |
IFilter.ReadItem | Reads the next item of the input file or input string. |
IFilter.CloseInput | Closes the current input. |
IFilter.SetOutputOptions | Sets the options for the output. Call this method before calling IFilter.OpenOutputFile. |
IFilter.UseOutputLayer | Sets the layer information for the output. |
IFilter.GetOutputLayer | Gets the parameters of the current output layer. |
IFilter.OpenOutputFile | Creates the output file. Call IFilter.SetOutputOptions before calling this method. |
IFilter.OpenOutputString | Creates an output string. |
IFilter.SetAncillaryDirectory | Sets the directory for the ancillary data. |
IFilter.GenerateAncillaryData | Generates the ancillary data. |
IFilter.WriteItem | Writes the last item read. |
IFilter.CloseOutput | Closes the current output. |
Initializes the filter object.
Okapi.Library.Base.ILog | p_Log | The log object to use. |
This method must be called first after the creation of the object.
Gets the version of the IFilter interface the object implements.
The version of the IFilter interface.
Gets the identifier of the filter.
The string that identifies the filter.
Use this method to retrieve the unique identifier of the filter. The value is case-sensitive.
The identifier is used in the names of parameters files, along with other parts. For example, in okp_myfilter@Test.fprm, okp_myfilter is the filter identifier.
Gets the name of the filter.
The name of the filter.
Use this method to retrieve the localized name of the filter. For example: Okapi PO Filter.
Gets the current encoding used to process the input file.
The IANA charset name of the current input encoding.
As some formats may provide a mechanism for declaring their encoding, the encoding provided when calling IFilter.OpenInputFile or IFilter.OpenInputString may get modified. The IFilter.GetCurrentEncoding method allows you to retrieve the encoding actually used for the processing, after opening the input.
Gets the current language of the processed input.
The string that identifies the current language.
The value is a BCP47 language code (e.g. pt-BR). Note that the value is not case-sensitive.
Gets the language of the input.
The string that identifies the input language.
The value is a BCP47 language code (e.g. pt-BR). Note that the value is not case-sensitive.
Gets the language of the output.
The string that identifies the output language.
The value is a BCP47 language code (e.g. pt-BR). Note that the value is not case-sensitive.
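Because the language codes returned by these methods are not case-sensitive, callers should avoid ordinary case-sensitive string equality when comparing them. A minimal sketch, assuming a hypothetical helper named SameLanguage that is not part of the interface:

```csharp
using System;

static class LanguageCodes
{
    // Compares two language codes without regard to case (hypothetical helper).
    public static bool SameLanguage(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(SameLanguage("pt-BR", "pt-br")); // prints True
        Console.WriteLine(SameLanguage("pt-BR", "pt-PT")); // prints False
    }
}
```

This only normalizes case; it does not attempt any other kind of language tag matching.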
Gets the name of the encoding used for the output.
The name of the encoding. It is an IANA charset name. Returns null if no encoding is defined.
Gets the current filter item.
The IFilterItem interface to the current item.
You must call the IFilter.ReadItem method before using this method. Depending on the value of IFilterItem.GetItemType, the data available with this object are very different.
Gets the translation of the current filter item.
The IFilterItem interface to the translated text of the current item.
You must call the IFilter.ReadItem method before using this method. If the item returned is a text item it may have an existing translation (in the case of bilingual files). Use the IFilterItem.IsTranslated method on the item returned by IFilter.GetItem to detect if such a translation is available. If there is no translation available, the result of IFilter.GetTranslatedItem is null.
Gets the default datatype identifier.
The datatype identifier of the filter.
Queries whether a specified property is supported by the filter.
Integer | p_nProperty | Property to query. The value must be one of the Filter.FilterProperty values. |
True if the specified property is supported, false if it is not supported.
Gets the number of variants supported by the filter.
Number of variants supported by the filter.
Each filter must support at least one variant (the default one).
Gets the identifier for a specified variant of the filter.
Integer | p_nIndex | Index of the variant. The value must be between 0 and one less than the value returned by IFilter.GetVariantCount. |
The string identifying the specified variant.
Gets the description of a specified variant of the filter.
Integer | p_nIndex | Index of the variant. The value must be between 0 and one less than the value returned by IFilter.GetVariantCount. |
The short description of the specified variant.
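The three variant methods are typically used together to list what a filter offers. The following sketch assumes a reduced, hypothetical interface (IVariantSource) with return types inferred from the descriptions above, and a stand-in filter whose variant names are invented.

```csharp
using System;

// Hypothetical subset of IFilter, reduced to the three variant methods.
interface IVariantSource
{
    int GetVariantCount();
    string GetVariantID(int p_nIndex);
    string GetVariantDescription(int p_nIndex);
}

// Stand-in filter with two invented variants.
class FakeFilter : IVariantSource
{
    readonly string[] ids = { "default", "strict" };
    readonly string[] descriptions = { "Default variant", "Invented example variant" };

    public int GetVariantCount() { return ids.Length; }
    public string GetVariantID(int p_nIndex) { return ids[p_nIndex]; }
    public string GetVariantDescription(int p_nIndex) { return descriptions[p_nIndex]; }
}

static class VariantListing
{
    static void Main()
    {
        IVariantSource filter = new FakeFilter();
        // Valid indexes run from 0 to GetVariantCount() - 1.
        for (int i = 0; i < filter.GetVariantCount(); i++)
        {
            Console.WriteLine(filter.GetVariantID(i) + ": " + filter.GetVariantDescription(i));
        }
    }
}
```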
Loads the filter settings.
String | p_sFilterSettings | Filter settings string. |
System.Boolean | p_bIgnoreErrors | True if no error is generated when the file cannot be loaded. False if the method generates an error and returns false when the file cannot be loaded. |
True if the settings could be loaded (or set to their default values), false if an error occurred.
The filter settings string has the following syntax:
[<folder>]<filterID>[#<variantID>][@[<F>%]<parametersID>]
Where:
- <folder> is the optional folder where the parameters file is located. This folder must not be specified if the <F> component of the parameters file identifier is set. If <folder> and <F> are both omitted the default Okapi Parameters folder is assumed. Note that the folder portion of the filter settings string is not relevant if there is no <parametersID> since there is no physical file for the default settings.
- <filterID> is the filter identifier (the value returned by the IFilter.GetIdentifier method).
- <variantID> is an optional variant identifier for the given filter (one of the values returned by the IFilter.GetVariantID method).
- [<F>%]<parametersID> is the optional identifier of a parameters file. The optional <F> component is the folder identifier: S for the system folder, U for the user folder, and P for the project folder. If no folder identifier is specified the system folder is assumed.
The system folder is the sub-folder named Parameters in the Okapi Shared folder, so by default in Windows: C:\Program Files\Okapi\Shared\Parameters.
The user folder is the sub-folder named Okapi\Parameters in the Application data folder, so by default in Windows: C:\Documents and Settings\USERNAME\Application Data\Okapi\Parameters.
The project folder is the folder specified by the environment variable OKAPIPROJECTPARAMETERS. Each application is responsible for setting the variable. If the variable is not set or empty, the project
folder is assumed to be the current folder of the running application.
The following example loads the Okapi PO Filter with the parameters file named okf_po@myOptions.fprm located in the Okapi Parameters system folder. An error is generated if the file does not exist.
    LoadSettings("okf_po@myOptions", false);
The actual file loaded will be C:\Program Files\Okapi\Shared\Parameters\okf_po@myOptions.fprm assuming the application is installed with its default parameters file location. The parameter could also have been okf_po@S%myOptions.fprm.
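To make the settings string syntax more concrete, here is a minimal parsing sketch. It is not the Okapi implementation: the FilterSettings and SettingsParser names are hypothetical, and the parser assumes that the folder part ends at the last path separator and that identifiers contain no path separators, '#', '@' or '%'. The variant name "mono" in the usage line is invented.

```csharp
using System;

// Parsed parts of a filter settings string; null means "not specified".
class FilterSettings
{
    public string Folder, FilterID, VariantID, FolderType, ParametersID;
}

static class SettingsParser
{
    public static FilterSettings Parse(string settings)
    {
        var result = new FilterSettings();
        string rest = settings;

        // Assumption: <folder> is everything up to and including the last path separator.
        int sep = rest.LastIndexOfAny(new[] { '\\', '/' });
        if (sep >= 0)
        {
            result.Folder = rest.Substring(0, sep + 1);
            rest = rest.Substring(sep + 1);
        }

        // Optional "@[<F>%]<parametersID>" part.
        int at = rest.IndexOf('@');
        if (at >= 0)
        {
            string parameters = rest.Substring(at + 1);
            rest = rest.Substring(0, at);
            int pct = parameters.IndexOf('%');
            if (pct >= 0)
            {
                result.FolderType = parameters.Substring(0, pct); // S, U or P
                parameters = parameters.Substring(pct + 1);
            }
            result.ParametersID = parameters;
        }

        // Optional "#<variantID>" part; what remains is the filter identifier.
        int hash = rest.IndexOf('#');
        if (hash >= 0)
        {
            result.VariantID = rest.Substring(hash + 1);
            rest = rest.Substring(0, hash);
        }
        result.FilterID = rest;
        return result;
    }

    static void Main()
    {
        FilterSettings s = Parse("okf_po#mono@U%myOptions");
        Console.WriteLine(s.FilterID + " " + s.VariantID + " " + s.FolderType + " " + s.ParametersID);
    }
}
```

A real implementation would also need to reject strings that set both <folder> and <F>, since the text above says the two must not be combined.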
Saves the current settings to a specified path.
String | p_sPath | Full path of the file in which to save the settings. |
String | p_sPrefix | Optional prefix. If not null, it will be placed in front of any settings-related additional file(s) the filter saves along with the parameters file. |
True if no error occurred, false otherwise.
Gets the current settings string for the filter.
The Filter settings string.
Edits a specified parameters file for the filter.
String | p_sFilterSettings | Filter settings string. See IFilter.LoadSettings for the string format. |
System.Boolean | p_bNew | True if it is a new file, false otherwise. |
True if the edit was successful, false if the user cancels the edit or if an error occurs.
Gets the current localization directives of the filter.
The current localization directive object of the filter, or null if the filter does not support localization directives.
Passes a localization directives context to the filter.
Okapi.Library.Filter.ILocalizationDirectives | p_LD | Context to pass. |
True if the context was passed without error, false otherwise.
When the filter is used to process data extracted from inside another dataset (e.g. a script within an HTML file) you may need to pass the context of the outer filter.
Opens the file to process.
String | p_sPath | Full path of the input file. |
String | p_sLanguage | The code of the language to process. The code must be a BCP47 tag. |
String | p_sEncoding | The name of the encoding of the file. The name must be an IANA charset name. This parameter is used only if the filter cannot automatically detect the encoding of the file. |
True if the file is opened successfully, false if an error occurs.
Use this method to open the input file to process. You must call IFilter.CloseInput after the file has been processed. To process a string, use the IFilter.OpenInputString method.
See IFilter.ReadItem for an example.
Sets a given string as the input to process.
String | p_sInput | String to process. |
String | p_sLanguage | The code of the language to process. The code must be a BCP47 tag. |
String | p_sEncoding | The name of the encoding of the file. The name must be an IANA charset name. This parameter is used only if the filter cannot automatically detect the encoding of the file. |
Integer64 | p_lOffsetInFile | Position in an input file where the string starts. |
True if the string is set successfully, false if an error occurs.
Use this method to set the string to process. Call IFilter.CloseOutput after the string has been processed. To process a file, use the IFilter.OpenInputFile method.
Resets the input.
Use this method to reset the current input for re-processing, without calling IFilter.OpenInputFile or IFilter.OpenInputString again.
You do not need to call this method if you have just opened the input.
Gets the last ID used for an item.
The last item ID used.
Sets the last ID for an item.
Integer | p_nID | Value of the last item ID. |
Reads the next item of the input file or input string.
The type of the item processed. The value is one of the Filter.FilterItemType values.
Use this method to read the next item in the current input. You must call IFilter.OpenInputFile, IFilter.OpenInputString, or IFilter.ResetInput before calling this method for the first time.
Use the IFilter.GetItem method to access the different data that have been parsed from the input. The data available depend on the type of item returned (see Filter.FilterItemType for details).
    myLog.BeginProcess("Test extraction");
    myFilter.OpenInputFile("myFile.xyz", "en-us", "windows-1252");
    int nRes;
    do {
        nRes = myFilter.ReadItem();
        switch ( nRes ) {
            case FilterItemType.ERROR:
                myLog.Error("An error has occurred.");
                continue;
            case FilterItemType.USERCANCEL:
                myLog.Message("Operation cancelled.");
                continue;
            case FilterItemType.ENDINPUT:
                continue; // Normal end
            case FilterItemType.TEXT:
            case FilterItemType.STARTGROUP:
            case FilterItemType.ENDGROUP:
            case FilterItemType.BINARY:
                // Do something with the item ...
                break;
        }
    } while ( nRes > FilterItemType.ENDINPUT );
    myFilter.CloseInput();
    myLog.EndProcess(null);
Closes the current input.
If you are also using the output of the filter, you must call IFilter.CloseInput after IFilter.CloseOutput as some filters may need to have some input information in order to generate the output.
Sets the options for the output. Call this method before calling IFilter.OpenOutputFile.
String | p_sLanguage | The code of the language for the output. The code must be a BCP47 tag. |
String | p_sEncoding | The name of the encoding of the file. The name must be an IANA charset name. This parameter is used only if the filter cannot automatically detect the encoding of the file. |
True if the options are set successfully, false if an error occurs.
Use this method to specify the language and encoding of the output. If they are both the same as for the input, you do not have to call this method.
If you call this method, you must call it before calling IFilter.OpenOutputFile or IFilter.OpenOutputString.
Sets the layer information for the output.
Integer | p_nLayer | Type of layer to use in the output. The value must be one of the Filter.FilterOutputLayer values. |
String | p_sStartDocument | Codes to place at the start of the layered output. |
String | p_sEndDocument | Codes to place at the end of the layered output. |
String | p_sStartCode | Codes to place before runs of external codes. |
String | p_sEndCode | Codes to place after runs of external codes. |
String | p_sStartInline | Codes to place before runs of internal codes. |
String | p_sEndInline | Codes to place after runs of internal codes. |
String | p_sStartText | [Reserved for future use] |
String | p_sEndText | [Reserved for future use] |
True if the data are set successfully, false if an error occurs.
This method should be called before you call IFilter.OpenOutputFile or IFilter.OpenOutputString. You do not need to call this method if the output has no extra layer.
Gets the parameters of the current output layer.
Integer | p_nLayer | Type of layer to use in the output. |
String | p_sStartDocument | Codes to place at the start of the layered output. |
String | p_sEndDocument | Codes to place at the end of the layered output. |
String | p_sStartCode | Codes to place before runs of external codes. |
String | p_sEndCode | Codes to place after runs of external codes. |
String | p_sStartInline | Codes to place before runs of internal codes. |
String | p_sEndInline | Codes to place after runs of internal codes. |
String | p_sStartText | [Reserved for future use] |
String | p_sEndText | [Reserved for future use] |
Creates the output file. Call IFilter.SetOutputOptions before calling this method.
String | p_sPath | Full path of the file to create. |
True if the file was created successfully, false otherwise.
Use this method to open an output file. The path of the output file can be the same as the input file.
If you want to use an encoding or a language in the output that is different from the input encoding or language, you must call IFilter.SetOutputOptions before calling this method.
If you use a layered output, you must call IFilter.UseOutputLayer before calling this method (and after calling IFilter.SetOutputOptions).
Note that some filters may generate the actual final output only when IFilter.CloseOutput is called, at the end of the process.
See IFilter.WriteItem for an example.
Creates an output string.
Use this method to open an output string. You can open an output string only when working with an input string.
If you want to use an encoding or a language in the output that is different from the input encoding or language, you must call IFilter.SetOutputOptions before calling this method.
Sets the directory for the ancillary data.
String | p_sInputRoot | Root directory of the input files. |
String | p_sAncillaryRoot | Root directory for the ancillary output. |
Generates the ancillary data.
String | p_sId | (Output) [TBD] |
String | p_sType | (Output) [TBD] |
String | p_sPath | (Output) [TBD] |
Use this method to generate the ancillary data for a BINARY item.
Writes the last item read.
If you call this method, you must call it before the next call to the IFilter.ReadItem or IFilter.CloseInput methods.
    myFilter.OpenInputFile("myFile.xyz", "en-us", "windows-1252");
    myFilter.OpenOutputFile("myFile.out.xyz");
    int nRes;
    do {
        nRes = myFilter.ReadItem();
        switch ( nRes ) {
            case FilterItemType.ERROR:
                myLog.Error("An error has occurred.");
                continue;
            case FilterItemType.USERCANCEL:
                myLog.Message("Operation cancelled.");
                continue;
            case FilterItemType.ENDINPUT:
                continue; // Normal end
            case FilterItemType.TEXT:
            case FilterItemType.STARTGROUP:
            case FilterItemType.ENDGROUP:
            case FilterItemType.BINARY:
                // Do something with the item ...
                break;
        }
        myFilter.WriteItem();
    } while ( nRes > FilterItemType.ENDINPUT );
    myFilter.CloseOutput();
    myFilter.CloseInput();
Closes the current output.
The output string if the output was opened as a string. No output string is returned if the output was a file.
Use this method to close the last output you opened. The method returns the actual output string only if the output was opened as a string (using IFilter.OpenOutputString), not if it was opened as a file (using IFilter.OpenOutputFile).
You must call IFilter.CloseOutput before calling IFilter.CloseInput, as some filters may need to have some input information in order to generate the output.
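One way to make the required closing order hard to get wrong is to wrap the read/write loop in nested try/finally blocks. The sketch below is only an illustration under stated assumptions: IFilterIO is a hypothetical subset of IFilter with guessed return types, RecordingFilter is a stand-in that logs calls instead of processing data, and the empty string it returns from CloseOutput is a placeholder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of IFilter used only for this sketch.
interface IFilterIO
{
    bool OpenInputFile(string path, string language, string encoding);
    bool OpenOutputFile(string path);
    int ReadItem();
    void WriteItem();
    string CloseOutput();
    void CloseInput();
}

// Stand-in filter that records the order of calls and yields a fixed item sequence.
class RecordingFilter : IFilterIO
{
    public readonly List<string> Calls = new List<string>();
    readonly Queue<int> items = new Queue<int>(new[] { 3, 3, 2 }); // TEXT, TEXT, ENDINPUT

    public bool OpenInputFile(string path, string language, string encoding) { Calls.Add("OpenInputFile"); return true; }
    public bool OpenOutputFile(string path) { Calls.Add("OpenOutputFile"); return true; }
    public int ReadItem() { Calls.Add("ReadItem"); return items.Dequeue(); }
    public void WriteItem() { Calls.Add("WriteItem"); }
    public string CloseOutput() { Calls.Add("CloseOutput"); return ""; } // placeholder value
    public void CloseInput() { Calls.Add("CloseInput"); }
}

static class Rewriter
{
    public static void Rewrite(IFilterIO filter, string input, string output)
    {
        if (!filter.OpenInputFile(input, "en-us", "windows-1252")) return;
        try
        {
            if (!filter.OpenOutputFile(output)) return;
            try
            {
                // Item types 0, 1 and 2 end the loop; every other item is written back.
                while (filter.ReadItem() > 2) filter.WriteItem();
            }
            finally { filter.CloseOutput(); } // output is closed first
        }
        finally { filter.CloseInput(); }      // input is closed last
    }

    static void Main()
    {
        var filter = new RecordingFilter();
        Rewrite(filter, "myFile.xyz", "myFile.out.xyz");
        Console.WriteLine(string.Join(" ", filter.Calls));
    }
}
```

With the stand-in filter, the recorded sequence ends with CloseOutput followed by CloseInput, including when the loop exits through an exception.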