Lexer plugins API

Lexers are text processors. They can modify the text loaded from documents before it is processed by the word broker and matcher.

This type of plugin can be used to implement text decipherers, translators, and segmenters (for Chinese, for example).

Location

Lexers are usually installed in c:\program files\integra\plugins\lexers.

API

1. Plugin instance construction and initialization

 void* Constructor(void)

This procedure is called once per search engine session. Its main purpose is to load any necessary DLLs, read configuration files, and so on.

It returns a pointer to the plugin object, which is passed as the This argument in all subsequent calls.

2. Plugin instance destruction

void Destructor( void *This )

It frees the resources allocated by the plugin instance during the Constructor call.

This procedure is called when the search engine terminates.
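
The construction and destruction calls above can be sketched as follows. This is a minimal illustration, not the engine's reference code: the LexerPlugin structure, its fields, and the file name "lexer.cfg" are hypothetical, and the use of extern "C" is an assumption. Any export decoration or calling convention the engine requires is not stated in this document and is omitted here.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-session state. A real plugin would keep DLL handles,
// parsed configuration and similar resources here.
struct LexerPlugin {
    std::wstring config_path;  // assumed example field
    bool ready = false;        // assumed example field
};

// Called once per search engine session.
extern "C" void* Constructor(void) {
    LexerPlugin* self = new LexerPlugin();
    self->config_path = L"lexer.cfg";  // hypothetical configuration file name
    self->ready = true;
    return self;  // the engine passes this pointer back as the This argument
}

// Called on search engine termination.
extern "C" void Destructor(void* This) {
    LexerPlugin* self = static_cast<LexerPlugin*>(This);
    assert(self != nullptr && "expects the pointer returned by Constructor");
    delete self;  // releases everything Constructor allocated
}
```

Keeping all state inside the object returned by Constructor means the other API procedures need no global variables: they can recover the state by casting This back to the plugin type.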

3. Plugin features and options retrieval

const wchar_t* GetSolarixPluginProperty( void *This, int iProp, int iSub )

The search engine uses this procedure to determine the features and characteristics of the plugin. iProp and iSub are the ID and sub-ID of the requested property. The property value is returned as a string where possible. If the property is not supported, the procedure returns NULL.

A minimal set of properties is required:

iProp  iSub  Meaning
0            Must be "lexer_plugin" for lexer plugins
1            Human-readable plugin name, e.g. "Chinese segmentator"
2            Copyright string
3            Internal name for use in the -preprocess command
3 Internal name for usage in -preprocess command

Future versions of the search engine may request additional properties. Return NULL if you do not recognize the requested property.
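
A property handler along these lines might look like the sketch below. The name, copyright, and internal name strings are hypothetical placeholders, and the sketch assumes that iSub can be ignored for the four required properties, since no sub-ID values are listed for them. The LooksLikeLexerPlugin helper is not part of the API; it only illustrates how a caller could interpret property 0.

```cpp
#include <cwchar>

extern "C" const wchar_t* GetSolarixPluginProperty(void* This, int iProp, int iSub) {
    (void)This;  // this sketch keeps no per-instance state
    (void)iSub;  // assumption: sub-ID is not needed for the required properties
    switch (iProp) {
        case 0: return L"lexer_plugin";        // plugin type, required value
        case 1: return L"Example lexer";       // hypothetical human-readable name
        case 2: return L"(c) Example Author";  // hypothetical copyright string
        case 3: return L"example_lexer";       // hypothetical -preprocess name
        default: return nullptr;               // unknown property: return NULL
    }
}

// Illustrative helper (not part of the API): checks the plugin type string.
static bool LooksLikeLexerPlugin(void* plugin) {
    const wchar_t* type = GetSolarixPluginProperty(plugin, 0, 0);
    return type != nullptr && std::wcscmp(type, L"lexer_plugin") == 0;
}
```

Returning string literals keeps the pointers valid for the lifetime of the loaded plugin, so the caller never has to free them.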

4. Text processing

wchar_t* Process( void *This, IGrammarEngine *IGrammarEnginePtr, const wchar_t *OriginalText, const wchar_t *Options )

This procedure performs all necessary processing of the original text OriginalText, using the arbitrary options string Options. On success it returns a pointer to a new text buffer.

The search engine calls the Free API procedure to release this memory block.

IGrammarEnginePtr is a pointer to the Grammar Engine interface.

5. Memory block deallocation

void Free( void *This, wchar_t *Ptr )

The search engine calls this procedure to free the memory block returned by a Process call.
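
The pairing of Process and Free can be sketched as follows. The lowercasing transformation is only a stand-in for real lexer work, IGrammarEngine is forward-declared as an opaque type because its definition lives in the engine sources, and returning NULL on failure is an assumption (the text above only specifies the success case). The point of the sketch is that Free must release memory with the same allocator that Process used.

```cpp
#include <cstddef>
#include <cwchar>
#include <cwctype>

struct IGrammarEngine;  // opaque here; defined in the search engine sources

// Returns a newly allocated, lowercased copy of OriginalText.
extern "C" wchar_t* Process(void* This, IGrammarEngine* IGrammarEnginePtr,
                            const wchar_t* OriginalText, const wchar_t* Options) {
    (void)This;
    (void)IGrammarEnginePtr;  // unused in this sketch
    (void)Options;            // a real plugin would parse its options here
    if (OriginalText == nullptr) {
        return nullptr;  // assumption: NULL signals failure
    }
    const std::size_t len = std::wcslen(OriginalText);
    wchar_t* out = new wchar_t[len + 1];
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<wchar_t>(
            std::towlower(static_cast<std::wint_t>(OriginalText[i])));
    }
    out[len] = L'\0';
    return out;  // ownership passes to the caller until Free is called
}

// Releases a buffer returned by Process.
extern "C" void Free(void* This, wchar_t* Ptr) {
    (void)This;
    delete[] Ptr;  // must match the new[] used in Process
}
```

Routing deallocation through the plugin's own Free avoids mixing the allocator of the plugin DLL with that of the host process, which is presumably why the API exposes it instead of having the engine free the buffer directly.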

Samples

Source code for the plugins is available as part of the search engine source code.

LanguageFilter (\lem\demo\ai\solarix\search\lexer_plugin\LanguageFilter) removes improper characters from text, using the character set of the specified language to filter the content. The plugin also filters words by accessing the search engine dictionary through the IGrammarEngine interface.

