Text Analytics on Customer Reviews

When the input data is text, such as customer reviews, we cannot apply quantitative techniques until the text has been coded into a numeric or categorical form.

To experiment with text analytics, you can use a dataset such as this one: https://www.kaggle.com/daishinkan002/men-women-shoes-reviews, which contains product reviews of shoes. In this dataset, the variable “reviews” contains the text of up to 10 reviews for each product, so not all reviews are included. They are combined into one string, joined by “||”, so if you are interested in each separate review you must split the string before your analysis.
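As a minimal sketch of that splitting step, the snippet below separates a combined string on “||”. The example string and the helper name split_reviews are made up for illustration; only the “||” separator comes from the dataset description.

```python
# Hypothetical example of a combined "reviews" value for one product.
combined = "Great fit and very comfortable||Sole broke after two weeks|| Good quality for the price"


def split_reviews(field, sep="||"):
    """Split a combined review string into a list of non-empty, trimmed reviews."""
    return [part.strip() for part in field.split(sep) if part.strip()]


reviews = split_reviews(combined)
for review in reviews:
    print(review)
```

Stripping whitespace and dropping empty parts guards against stray separators at the start or end of the string.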

There are many different approaches in the field called Natural Language Processing (NLP) aimed at “understanding” the text as a human language. This is very complex, but there are some relatively simple analytics we can apply to make use of the text in the reviews.

One simple way is to look for, and perhaps count, certain keywords. If, for example, you are interested in how product quality is highlighted in reviews, you could count the number of times words such as “quality”, “durability” or “broke” are mentioned. This gives you a numeric representation of the quality aspect of the reviews.
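Keyword counting could be sketched roughly like this. The review text is invented, and the function name count_keywords is just an illustrative choice. Note that it matches whole words only, so “broken” would not count as “broke” unless you add it to the keyword set.

```python
import re
from collections import Counter

# The keywords from the example above; extend the set to fit your own question.
QUALITY_KEYWORDS = {"quality", "durability", "broke"}


def count_keywords(text, keywords):
    """Count how often each keyword occurs as a whole word, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word in keywords)


# Made-up review text for illustration.
review = "Good quality, but the strap broke. Quality is not the same as durability."
counts = count_keywords(review, QUALITY_KEYWORDS)
quality_mentions = sum(counts.values())
print(counts, quality_mentions)
```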

Another useful technique is called sentiment analysis. This means associating certain words (or combinations of words) with either positive or negative sentiment. You can build such a scorer yourself by assigning scores (positive/negative) to words, or you can use a pre-trained classifier in your favourite toolkit. Pre-trained classifiers are often very good for generic texts, but they can fail you if you are analysing a language other than English, or a text with many domain-specific words. In Python you can check out NLTK: https://realpython.com/python-nltk-sentiment-analysis/#using-nltks-pre-trained-sentiment-analyzer
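To make the “assign scores to words” idea concrete, here is a toy sketch of a hand-made scorer. It is not NLTK's pre-trained analyser: the word list and its scores are made up for illustration, and a real lexicon would be far larger.

```python
import re

# Hand-picked, hypothetical word scores (+1 positive, -1 negative).
LEXICON = {
    "great": 1, "good": 1, "comfortable": 1, "love": 1,
    "bad": -1, "poor": -1, "broke": -1, "uncomfortable": -1,
}


def sentiment_score(text, lexicon=LEXICON):
    """Sum the scores of all known words; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def sentiment_label(score):
    """Turn a numeric score into a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["Great shoes, very comfortable", "The sole broke, poor quality", "Not good"]:
    print(text, "->", sentiment_label(sentiment_score(text)))
```

The last example shows the limits of pure word scoring: “Not good” comes out as positive because the scorer does not understand negation.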

The “Bag of words” technique counts how often each word occurs in the text and uses those counts as a multidimensional representation of the text. This frequency profile can be compared with those of other texts and used to classify texts, either into predefined categories (supervised) or through cluster analysis. Check out this tutorial: https://www.mygreatlearning.com/blog/bag-of-words/
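A bare-bones bag of words can be sketched with the standard library alone. The two texts are invented, and the function names are illustrative; libraries such as scikit-learn offer more complete implementations.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(texts):
    """Return a shared vocabulary and one word-count vector per text."""
    counts = [Counter(tokenize(text)) for text in texts]
    vocabulary = sorted(set().union(*counts))
    vectors = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, vectors


# Made-up texts for illustration.
texts = ["Good shoes, good price", "Bad shoes"]
vocabulary, vectors = bag_of_words(texts)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each text becomes a vector with one dimension per word in the vocabulary, which is the form that classifiers and clustering methods expect.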

In the above-mentioned dataset there are also other variables, such as price and rating. The outcome of the text analytics and processing can thus feed into a quantitative analysis, for example looking for links between ratings and certain types of reviews.
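One simple way to look for such a link is to compare the average rating of reviews that mention a word with the average of those that do not. The ratings and texts below are made up for illustration and are not taken from the dataset; the function names are hypothetical too.

```python
import re
from statistics import mean

# Made-up (rating, review text) pairs for illustration only.
rows = [
    (5, "Great quality and very comfortable"),
    (4, "Good quality for the price"),
    (2, "The sole broke after a week"),
    (1, "Broke quickly, would not buy again"),
]


def mentions(text, word):
    """True if the word occurs as a whole word in the text, ignoring case."""
    return word in re.findall(r"[a-z']+", text.lower())


def mean_rating_by_mention(rows, word):
    """Return (mean rating with the word, mean rating without it); None if a group is empty."""
    with_word = [rating for rating, text in rows if mentions(text, word)]
    without_word = [rating for rating, text in rows if not mentions(text, word)]
    return (
        mean(with_word) if with_word else None,
        mean(without_word) if without_word else None,
    )


print(mean_rating_by_mention(rows, "broke"))
```

A difference between the two averages is only a starting point. With real data you would also want to check how many reviews fall into each group.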

A visual and qualitative way to analyse text is the word cloud, which you have probably seen. Word clouds are visual representations of the words found in a text: the higher the frequency (the more occurrences), the larger the word. To make the cloud a good representation, it is normally a good idea to apply a “stemmer” (see NLTK in Python), which makes sure “car” and “cars” are not treated as two different words (if that is in line with your analysis). It is also good to remove “stop words” such as “a”, “the” and “it”. Finally, a cloud can probably only handle the top 20-50 words. Give it a try! There are many word cloud generators on the web. You can also use tools such as the Python package wordcloud.
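The preparation steps (remove stop words, merge word forms, keep the top words) could look roughly like this. The stop word list is a tiny made-up sample, and crude_stem is a deliberately naive stand-in for a real stemmer such as the ones in NLTK: it only strips a trailing “s”.

```python
import re
from collections import Counter

# A small illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "it", "and", "is", "are", "this", "my", "for", "in"}


def crude_stem(word):
    """Naive stand-in for a stemmer: strip a plural 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def top_words(text, n=50):
    """Return the n most frequent stems after removing stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    stems = [crude_stem(word) for word in words if word not in STOP_WORDS]
    return Counter(stems).most_common(n)


# Made-up text for illustration.
text = "The car is fast. My cars are fast and the shoes fit."
print(top_words(text, n=2))
```

The resulting word counts are the kind of input a word cloud tool needs to size the words.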

The human mind is great at understanding texts. But when we design a structured analysis, method or model, we must set aside all the interpretations we make when we read a text. Try explaining what to look for in the review texts to someone who doesn’t speak English. Conversely, what would you need to know to be able to analyse reviews in a language you do not speak? Have a look at these reviews in Swedish (and do not use Google Translate): https://www.clasohlson.com/se/Xiaomi-Roborock-S6-Pure,-robotdammsugare/p/44-4376-2#collapsible__product__reviews

That was a lot of text on the topic of text analytics. Now go and perform some sort of analysis on text! What ideas has this brief given you?
