TF-IDF - Term Frequency-Inverse Document Frequency

{excerpt}TF-IDF is a weight often used in information retrieval and text mining. It is a statistical measure of how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus.{excerpt} In other words, if a term appears often in a document but also appears often across the corpus as a whole, it will get a lower score. Examples are “the”, “and”, and “it”, but depending on your source material there may be other words that are very common in the subject matter.
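
To make the weighting concrete, here is a minimal Python sketch of one common TF-IDF variant. The exact formula is an assumption, since the description above does not fix one: term frequency is taken as the term's count divided by the document length, and inverse document frequency as the natural log of (number of documents / number of documents containing the term). The function name `tf_idf` and the sample corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    corpus is a list of documents, each a list of tokens.
    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(corpus)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in corpus:
        df.update(set(doc))

    weights = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical three-document corpus, tokenised on whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(corpus)
```

With this corpus, “the” occurs in every document, so its inverse document frequency is ln(3/3) = 0 and its weight is 0 everywhere, however often it appears. In the first document “mat” (found in one document) scores higher than “cat” (found in two), even though each occurs once there. Library implementations often apply smoothing or normalisation, so their numbers may differ from this sketch.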