A study on the evaluation of tokenizer performance in natural language processing

Blog Article

The present study compares the performance of two tokenizers, Mecab-Ko and SentencePiece, in natural language processing for sentiment analysis. It takes a comparative approach, using five algorithms to evaluate each tokenizer: Naive Bayes (NB), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Artificial Neural Networks (ANN), and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). Performance was assessed with four widely used metrics: accuracy, precision, recall, and F1-score. The results indicated that SentencePiece performed better than Mecab-Ko. To check the validity of the results, paired t-tests were conducted on the evaluation outcomes.
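As a rough illustration of the evaluation described above, the sketch below computes the four metrics for a binary sentiment task and a paired t statistic over matched scores from two tokenizers. It is a minimal sketch using only the Python standard library, not the study's actual pipeline: the function names `binary_metrics` and `paired_t_statistic` are hypothetical, and all numbers in the usage example are made up for illustration and are not results from the study.

```python
import math
from statistics import mean, stdev


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Made-up labels and predictions (1 = positive review, 0 = negative).
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))

    # Made-up accuracy scores for the same five classifiers, one list per tokenizer.
    tokenizer_a = [0.80, 0.82, 0.85, 0.88, 0.90]
    tokenizer_b = [0.78, 0.81, 0.82, 0.84, 0.85]
    t, df = paired_t_statistic(tokenizer_a, tokenizer_b)
    print(f"t = {t:.3f}, df = {df}")
```

The t statistic would then be compared against a t distribution with the reported degrees of freedom to obtain a p-value; the pairing matters because each classifier is evaluated with both tokenizers, so the differences rather than the raw scores are tested.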

The study concludes that SentencePiece demonstrated superior classification performance, especially with ANN and LSTM-RNN, when used to interpret customer sentiment in Korean online reviews. Furthermore, SentencePiece can assign specific meanings to short words or jargon that are commonly used in product evaluations but not defined beforehand.
