The search engine uses a stemmer as a simplified morphological analyzer. The Integra and FAIND distributions include a multilingual stemming engine that handles Russian, English, Spanish, Finnish, French, Italian and some other languages. You can easily replace it with your own stemmer plugin.
The API is extremely simple. A stemmer DLL must export the following C procedures:
1. Stemmer initialization and creation.
HSTEMMER sol_CreateStemmerForLanguage( const char *lang2 )
lang2 is a two-character language identifier: "en" for English, "ru" for Russian, "de" for German, and so on.
The return value is a stemmer object handle (pointer) that is passed to subsequent API calls.
2. Stemmer object cleanup and destruction.
void sol_DeleteStemmer( HSTEMMER hStemmer )
3. Stemming a word.
int sol_Stem( HSTEMMER hStemmer, wchar_t *Word )
Word is a Unicode (wide-character) string containing a single word.
If possible, the stemmer truncates the Word buffer to the word's stem and returns 0.
If an error occurs, it returns -1 or -2.
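The three procedures above could be implemented along the following lines. This is only a minimal sketch under several assumptions that the description does not confirm: HSTEMMER is treated as an opaque pointer, the ToyStemmer structure and the STEMMER_EXPORT macro are hypothetical names, a NULL handle is returned when creation fails, -1 is used for invalid arguments (the exact meaning of -1 and -2 is not specified), and 0 is returned even when the word is left unchanged. The stemming rule itself is a placeholder (it drops one trailing "s" from English words), not the algorithm of the bundled engine.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Assumption: HSTEMMER is an opaque pointer; the real header may differ. */
typedef void *HSTEMMER;

/* Hypothetical export macro: mark the functions as DLL exports on Windows. */
#ifdef _WIN32
#define STEMMER_EXPORT __declspec(dllexport)
#else
#define STEMMER_EXPORT
#endif

/* Hypothetical per-stemmer state: only the language code is stored. */
typedef struct {
    char lang[3];
} ToyStemmer;

/* 1. Create a stemmer for a two-character language code such as "en". */
STEMMER_EXPORT HSTEMMER sol_CreateStemmerForLanguage(const char *lang2)
{
    ToyStemmer *s;

    if (lang2 == NULL || strlen(lang2) != 2)
        return NULL;            /* assumption: NULL signals failure */

    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;

    memcpy(s->lang, lang2, 3);  /* two characters plus the terminator */
    return s;
}

/* 2. Release everything owned by the stemmer object. */
STEMMER_EXPORT void sol_DeleteStemmer(HSTEMMER hStemmer)
{
    free(hStemmer);             /* free(NULL) is a no-op */
}

/* 3. Truncate Word in place to its stem; return 0 on success. */
STEMMER_EXPORT int sol_Stem(HSTEMMER hStemmer, wchar_t *Word)
{
    const ToyStemmer *s = hStemmer;
    size_t len;

    if (s == NULL || Word == NULL)
        return -1;              /* assumption: -1 means invalid arguments */

    len = wcslen(Word);

    /* Placeholder rule: for "en", drop one trailing 's' from words
       longer than three characters. A real plugin goes here. */
    if (strcmp(s->lang, "en") == 0 && len > 3 && Word[len - 1] == L's')
        Word[len - 1] = L'\0';

    return 0;
}
```

A caller would typically create one handle per language, pass each word to sol_Stem in a writable buffer, and call sol_DeleteStemmer when the handle is no longer needed. Because the stem is written into the same buffer, the caller should copy the word first if the original form is still required.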
The stemmer is an optional dictionary module. The path to the module file is specified in the dictionary.xml file, which is usually stored in c:\program files\integra.
The XML entry <stemmer>...</stemmer> contains a relative path to the stemmer DLL, something like dictionary\empty\stemmer.dll. Replace it with the filename of your stemmer DLL and restart the search system.
Lemmatizator API (ru)
Grammatical dictionary API (ru)
Morphology analyzer API (ru)
Syntax analyzer API (ru)
© Mental Computing 2009