A study on the evaluation of tokenizer performance in natural language processing
This study compares the performance of two tokenizers, Mecab-Ko and SentencePiece, in natural language processing for sentiment analysis. It takes a comparative approach, using five algorithms to evaluate each tokenizer: Naive Bayes (NB), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Artificial Neural Networks (ANN), and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). Performance was assessed with four widely used metrics: accuracy, precision, recall, and F1-score. The results indicated that SentencePiece performed better than Mecab-Ko. To ensure the validity of the results, paired t-tests were conducted on the evaluation outcomes.
The study concludes that SentencePiece demonstrated superior classification performance, especially with ANN and LSTM-RNN, when used to interpret customer sentiment from Korean online reviews. Furthermore, SentencePiece can assign specific meanings to short words or jargon that are commonly used in product evaluations but not defined beforehand.
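To make the evaluation procedure concrete, here is a minimal sketch of how the four metrics and a paired t-statistic could be computed with the Python standard library alone. It is an illustration, not the study's actual code: the function names `classification_metrics` and `paired_t_statistic` are hypothetical, the labels are assumed to be binary (1 = positive sentiment), and the numbers in the usage section are placeholders rather than results from the study.

```python
import math
from statistics import mean, stdev


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d holds the pairwise differences.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Placeholder labels and predictions, NOT data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))

    # Placeholder paired scores (e.g. one score per fold for each tokenizer).
    t, df = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
    print(f"t = {t:.4f}, df = {df}")
```

The t-statistic would then be compared against a Student's t distribution with `df` degrees of freedom to obtain a p-value. The sketch stops short of that step because the standard library has no t-distribution CDF; a statistics package would normally handle it.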