How can I plot the frequency of words?
LindsayKelevra
Member | Posts: 5 | Newbie
Hello everyone!
I'm trying to use the Generate Gaussian operator to plot the frequency of words, but its output is very different from the results I calculated manually. I need this to decide which values to discard through pruning. What formula does RapidMiner use to create the Gaussian?
Thank you.
Answers
I am also not clear how conformity to a hypothetically pure statistical distribution affects pruning. You might be better off simply setting pruning thresholds by frequency or by percentage at a few different levels and seeing what words are dropped as a consequence. Typically having a lot of words with only a handful of occurrences does nothing at all for model performance but can lead to large datasets and long runtimes.
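To make the "try a few thresholds and see what drops" idea concrete, here is a rough sketch in plain Python rather than RapidMiner itself. The function names (`document_frequency`, `pruned_words`), the toy documents, and the threshold values are all made up for illustration, and the whitespace tokenization is an assumption that will not match what a real text-processing operator does.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each word, how many documents contain it (hypothetical helper)."""
    df = Counter()
    for doc in docs:
        # Naive lowercase + whitespace tokenization; set() counts each word once per document.
        df.update(set(doc.lower().split()))
    return df


def pruned_words(df, n_docs, min_fraction):
    """Return the words that occur in fewer than min_fraction of the documents."""
    return sorted(w for w, c in df.items() if c / n_docs < min_fraction)


# Toy corpus, purely illustrative.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a bird sang",
]

df = document_frequency(docs)

# Try a couple of percentage thresholds and inspect what each one removes.
for threshold in (0.3, 0.6):
    dropped = pruned_words(df, len(docs), threshold)
    print(f"below {threshold:.0%}: drops {len(dropped)} of {len(df)} words: {dropped}")
```

Looking at the dropped lists side by side at each level is usually enough to pick a cutoff, with no need to fit a distribution first.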
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts