You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is an application generator for conflation algorithms in perl language.
This package includes the following:
- application generator for stemmers
* stemgen.pl - main perl routine
* apgen.pm - application generator class
* corpus.pm - corpus parsing class
* ngram.pm - utility class supporting n-gram specific operations
* porterd.pm - class implementing porter algorithm (developed version)
* sv.pm - utility class supporting successof variety specific operations
* util.pm - generic utlity class
* tempstem.pm - stemmer template used by the application generator
-o r - to run a stemmer
supported algorithms:
* porterd - developed porter
* and any other algorithm generated by using -g option
-w file with words to generate stem for
-s word to generate stem for
-d dictioary name (valid for dictionary type of stemmers)
-c corpus name (valid for n-gram, successor variety)
-t dice thrshold of ngram (valid for n-gram type of stemmers)
-u successor variety cutoff (valid for successor variety type of stemmers)
-o g - to generate a stemmer from a rule file
-k type of the stemmer:
n - n-gram type algorithm
s - successor variety type algorithm
a - affix removal type algorithm
d - dictionary based algorithm
-r rule_file with .rl extension
rule files in this package:
* porter
* lovins
* kstem
* sremoval
* paice
* succvar (successor variety)
-o p - to parse a corpus file
-p output file (valid for parsing a corpus)
Examples:
-- to generate a stemmer:
perl stemgen.pl -o g -a porter -k a
perl stemgen.pl -o g -a lovins -k a
perl stemgen.pl -o g -a sremoval -k a
perl stemgen.pl -o g -a paice -k a
perl stemgen.pl -o g -a kstem -k d
perl stemgen.pl -o g -a succvar -k s
-- to run a stemmer:
perl stemgen.pl -o r -a porterd -w w.txt
perl porter_gen.pl -o r -a porter -s greatings
perl paice_gen.pl -o r -a paice -w w.txt
perl lovins_gen.pl -o r -a lovins -w w.txt
perl sremoval_gen.pl -o r -a sremoval -w w.txt
perl kstem_gen.pl -o r -a kstem -d usd.txt -w w.txt
perl succvar_gen.pl -o r -a succvar -c corpus.txt -w -w.txt
-- to parse a corpus file:
perl stemgen.pl -o p -c unprocessed_corpus.txt -p c.txt
About
This is an application generator for conflation algorithms in Perl language.