{excerpt}TF-IDF (term frequency-inverse document frequency) is a weight often used in information
retrieval and text mining. It is a statistical measure of how important
a word is to a document in a collection or corpus. The importance increases
proportionally to the number of times the word appears in the document but is
offset by the frequency of the word in the corpus.{excerpt} In other words,
if a term appears often in a document but also appears often in the
corpus as a whole, it gets a lower score. Typical examples are
“the”, “and”, and “it”, but depending on your source material there may be
other words that are very common in the subject matter and score low for the same reason.
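
To make the weighting concrete, here is a minimal sketch in Python. It assumes one common variant of the formula (term frequency as count divided by document length, inverse document frequency as the natural log of the number of documents divided by the number of documents containing the term); other variants exist and libraries differ in the details. The function name `tf_idf` and the toy corpus are hypothetical, invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. Assumed variant:
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, made up for this example.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(docs)
```

In this toy corpus “the” occurs in every document, so its inverse document frequency is log(1) = 0 and its score is zero everywhere, while “mat”, which occurs in only one document, scores higher than “cat”, which occurs in two.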
See Also:
- http://en.wikipedia.org/wiki/Tf%E2%80%93idf
- http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html