TF-IDF (Term Frequency-Inverse Document Frequency) is a numerical statistic that reflects how important a word is to a document in a collection of documents.
TF (Term Frequency): How often a term appears in a document.
Formula: TF = (Number of times term appears in document) / (Total number of terms in document)
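A minimal Python sketch of the TF formula. It assumes each document has already been tokenized into a list of words. The tokenization and the function name term_frequency are illustrative and do not come from any particular library.

```python
from collections import Counter


def term_frequency(term: str, document: list[str]) -> float:
    """TF = (times term appears in document) / (total terms in document)."""
    if not document:
        return 0.0  # an empty document has no terms, so avoid dividing by zero
    counts = Counter(document)
    return counts[term] / len(document)


doc = ["the", "cat", "sat", "on", "the", "mat"]
print(term_frequency("the", doc))  # prints 0.3333333333333333 (2 of 6 terms)
```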
IDF (Inverse Document Frequency): How rare or common a term is across all documents. The fewer documents that contain a term, the higher its IDF.
Formula: IDF = log(Total number of documents / Number of documents containing the term)
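A minimal sketch of the IDF formula. It assumes the natural logarithm and a corpus given as a list of tokenized documents. The name inverse_document_frequency is a hypothetical helper. The formula is undefined for a term that appears in no document, so this sketch raises an error in that case. Libraries often use a smoothed variant instead. For example, scikit-learn's TfidfVectorizer by default computes ln((1 + N) / (1 + df)) + 1.

```python
import math


def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """IDF = log(total documents / documents containing the term), natural log assumed."""
    n_docs = len(corpus)
    n_containing = sum(1 for doc in corpus if term in doc)
    if n_containing == 0:
        # The plain formula would divide by zero here; smoothing is one common fix.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(n_docs / n_containing)


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "bird", "flew"],
]
print(inverse_document_frequency("the", corpus))  # 0.0, appears in every document
print(inverse_document_frequency("cat", corpus))  # log(3), roughly 1.0986
```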
TF-IDF: The product of TF and IDF. A term scores highest when it appears often in one document but in few documents overall.
Formula: TF-IDF = TF × IDF
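The following sketch puts the pieces together. It computes a TF-IDF score for every term in every document of a small, made-up corpus. It again assumes pre-tokenized input and the natural log, and tf_idf is a hypothetical name. Document frequencies are counted once up front, so the corpus is not rescanned for each term.

```python
import math
from collections import Counter


def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: TF × IDF} dict per document (natural log assumed)."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)  # an empty doc yields an empty dict, so no division occurs
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat"],
    ["the", "bird", "flew"],
]
scores = tf_idf(corpus)
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this toy corpus, "the" occurs in every document. Its IDF is therefore log(3/3) = 0, and its TF-IDF in the first document is 0 even though it appears twice. By contrast, "cat", "on", and "mat" each score (1/6) × log(3).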