We Asked An Analytics Consultant More About Processing Customer Reviews

Customer reviews are the best, aren’t they? They shed light on what businesses do well and what they could improve, and they signal to potential customers how well a business serves its customers. Or the exact opposite, because some customers are the worst.

Today, customer reviews are so abundant that vast amounts of unstructured streaming data need to be transported, integrated, stored, processed, analysed and visualised in real time to provide actionable sentiment insights.

This is exactly what Daniel Wrigley, Lead Search & Analytics Consultant at SHI GmbH, spoke about in his talk “Actionable Insights with Real-time Streaming Analytics of Customer Reviews” at Big Data Conference Europe 2019.

Drawing on his experience with search and big data applications and modern open-source projects, and as co-author of a book about Solr, he covered several requirements of a real-time streaming analytics platform and a reference architecture built from open-source components such as Beats, Kafka, Logstash, Elasticsearch, Spark, Zeppelin and Kibana.
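To make the flow of such a reference architecture concrete, here is a minimal in-process sketch in Python. Each function is a hypothetical stand-in for one component (a shipper like Beats, a buffer like Kafka, an enrichment step like Logstash, a document store like Elasticsearch); none of it uses the real APIs of those tools, and the `rating|text` input format is an assumption made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator

# Hypothetical stand-ins for the pipeline components; these are NOT the
# real Beats, Kafka, Logstash or Elasticsearch APIs.

def ship(raw_lines: Iterable[str]) -> Iterator[str]:
    """Beats-like shipper: forward raw review lines, skipping empty ones."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield line

def buffer(events: Iterable[str]) -> deque:
    """Kafka-like buffer: queue events so producers and consumers are decoupled."""
    return deque(events)

def enrich(queue: deque) -> Iterator[dict]:
    """Logstash-like step: parse assumed 'rating|text' lines into documents."""
    while queue:
        rating, _, text = queue.popleft().partition("|")
        yield {"rating": int(rating), "text": text.strip()}

def index(docs: Iterable[dict]) -> list:
    """Elasticsearch-like store: here simply an in-memory list of documents."""
    return list(docs)

raw = ["5|Great case, fits well", "", "1|Does not fit at all"]
store = index(enrich(buffer(ship(raw))))
# Processing and visualisation (Spark, Zeppelin, Kibana in the talk) would
# read from the store; a simple aggregate stands in for that here.
average_rating = sum(doc["rating"] for doc in store) / len(store)
print(store)
print(average_rating)  # 3.0
```

In a real deployment each stage runs as a separate service, which is what lets the buffer absorb bursts of reviews while downstream consumers catch up.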

You can check out the video of his talk for your reference.

Watching his talk left us with plenty of questions for him, which you can read below along with his answers.


[Iunera] You had plenty of reasons for choosing Beats, Kafka, Logstash and Elasticsearch for data transport, integration and storage. Have you considered other tools for these functions? What if other tools met the requirements you set out for getting insights from unstructured data?

[Daniel] Excellent question! Nowadays, there are a lot of awesome tools and frameworks to work with data and each one has its right to exist. We try to use the tools that are best suited to the task when implementing a business case.

Last year, I used the tools you mentioned to showcase how machine learning can be applied to text in a real-time streaming use case. This year I used a combination of open-source tools and Artificial Intelligence cloud services in my talk at the Big Data Conference because they were a better fit in this scenario.

Each tool comes with advantages and disadvantages. All of these need to be taken into account when choosing the tools for any data-driven project.

Oh look, we have an article which features a comparison between Kafka, Spark and Flink, tools for real-time streaming data processing.

[Iunera] How do you know if the reviews are truthful? What if there are some reviewers who give bad reviews to sabotage the reputation of their rival’s brand or give good reviews to boost the reputation of their friend’s brand? Can the method also be used to detect spammy comments?

[Daniel] A naïve way of making sure that reviews are honest is to compare them with other reviews. Simply put, if a dozen reviewers state that this phone case fits on a Samsung Galaxy S7 and one reviewer states that the product does not fit at all, then this may indicate a dishonest review.
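The consistency check described here can be sketched as a simple majority vote. This assumes a hypothetical upstream step has already turned each review's text into a yes/no stance on a single claim, such as "the case fits a Galaxy S7"; the 80 per cent agreement threshold is an arbitrary illustrative choice.

```python
from collections import Counter

def flag_outliers(stances: dict, min_agreement: float = 0.8) -> list:
    """Return reviewer IDs whose stance contradicts a strong majority.

    `stances` maps a reviewer ID to their stance on one claim
    (True = "the case fits", False = "it does not fit").
    """
    if not stances:
        return []
    majority, majority_count = Counter(stances.values()).most_common(1)[0]
    # Only flag when the consensus is strong; a split vote proves nothing.
    if majority_count / len(stances) < min_agreement:
        return []
    return [rid for rid, stance in stances.items() if stance != majority]

reviews = {f"r{i}": True for i in range(12)}  # a dozen "it fits" reviews
reviews["r12"] = False                         # one "does not fit" review
print(flag_outliers(reviews))  # ['r12']
```

A flagged review is only a lead for a human to check: the lone dissenter may simply have received a faulty item.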

You can also double-check reviews that give a positive numeric rating (e.g. 5 out of 5 stars) while the accompanying text has a negative tone. Machine learning can identify these mismatches.
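A rough sketch of this rating-versus-text check follows. The interview mentions machine learning; to keep the example self-contained, a tiny hand-written word list stands in for a trained sentiment model, so treat the word lists, the star thresholds and the function names as assumptions.

```python
import re

# Toy word lists standing in for a trained sentiment model (an assumption
# for illustration; a real system would use an ML classifier).
POSITIVE = {"great", "love", "excellent", "perfect", "good"}
NEGATIVE = {"bad", "broke", "terrible", "awful", "disappointed", "worst"}

def text_sentiment(text: str) -> int:
    """Crude sentiment score: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def rating_mismatch(stars: int, text: str) -> bool:
    """Flag high ratings with negative text and low ratings with positive text."""
    score = text_sentiment(text)
    return (stars >= 4 and score < 0) or (stars <= 2 and score > 0)

print(rating_mismatch(5, "Terrible, it broke after two days"))  # True
print(rating_mismatch(5, "Great case, love it"))                # False
```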

[Image: a bad rating for Sala] Does the rating match the text?

You could also match reviews against product features to identify whether the review actually covers the product.
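Matching a review against product features could start as simply as measuring vocabulary overlap. The feature list below is hypothetical; a production system could instead draw attributes from a product catalogue and add stemming or embeddings.

```python
import re

def feature_coverage(review: str, features: set) -> float:
    """Share of a product's feature terms that the review mentions."""
    words = set(re.findall(r"[a-z0-9]+", review.lower()))
    return len(words & features) / len(features) if features else 0.0

# Hypothetical feature vocabulary for a phone case listing.
case_features = {"case", "fit", "fits", "grip", "galaxy", "s7", "silicone"}

on_topic = "The silicone case fits my Galaxy S7 and the grip is nice"
off_topic = "Delivery was late and the courier was rude"
print(round(feature_coverage(on_topic, case_features), 2))  # 0.86
print(feature_coverage(off_topic, case_features))           # 0.0
```

A score of zero does not make a review worthless (a delivery complaint is still useful feedback), but it does suggest the review says nothing about the product itself.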

These are only a few examples of what can possibly be done to identify fake reviews and comments in your e-commerce platform.


[Iunera] Do you think brands pay for reviews of their own products without disclosing that these reviews are sponsored?

[Daniel] Unfortunately, this is being done to promote products. A recent Fakespot study indicates that up to 42 per cent of the reviews written on Amazon between March and September this year might be fake. In my opinion, this number is incredibly high. It clearly shows that there is a problem and that e-commerce managers need strategies to encourage honest reviews and identify fake ones. Machine learning can help with this task.


[Iunera] Do you think there’s a possibility that unhappy customers who want to give negative reviews are silenced by the brand through blackmail?

[Daniel] Well, my belief is that clever companies and brands take their customers’ feedback and use it to improve their products and services, no matter how good or bad that feedback might be. This kind of customer interaction is a great way of finding out what needs to be done to perform better. And you don’t even have to pay for it! Brands should not blackmail customers who give negative reviews. In my opinion, brands should reward customers who give feedback, be it good or bad.


[Iunera] Do you think that the brands actually listen to their customers and make improvements based on the reviews/feedback? Or do they just pretend to be open to feedback but not act on it?

[Daniel] Yes, I absolutely think that there are brands that use and leverage this feedback. In my opinion, listening to customers and acting on this feedback will be one of the key differentiators in digital commerce in the very near future.


[Iunera] For the e-commerce streaming analytics use cases in your presentation, what are “bad” reviewers and how can they be rewarded?

[Daniel] In my simplified demonstration, “bad” reviewers are people who rate the products they bought negatively.

How they can be rewarded depends on the situation. Gift cards or discounts are the most basic rewards virtually any online shop can offer.

If the review is well-written, you could also offer these reviewers a job as content producers to promote your products or services.

Although negative reviews may add to your credibility (no one believes your shop if there are only positive reviews), you do not want too many negative reviews.

Rewards should be used to help reduce the number of negative reviews. In general, though, bad reviews should be taken seriously: there is no clearer indication that something is wrong with the goods or services you offer.

[Image: a negative review of a niche vegan kitchen] A clever business would genuinely accept negative feedback like this and make the necessary improvements.

[Iunera] Solr and Elasticsearch have often been compared with one another. I noticed that you’ve written a book about Solr but used Elasticsearch in your streaming analytics demo. Surely you know the similarities and differences very well, so what are they from your experience?

[Daniel] One could write several hundred pages to compare Solr and Elasticsearch. That makes answering this question in an interview a tricky task!

Since both are based on Apache Lucene, they share the same technological foundation and they can both be used to build powerful applications and platforms.

One key difference is the ecosystem around them.

Elasticsearch is developed by the company Elastic. They provide tools around Elasticsearch that support its main use cases, such as log analytics, extremely well, since these tools are tailored to Elasticsearch as the central search engine.

Solr’s main use cases tend to be search-driven rather than analytics-driven: search in digital commerce or enterprise settings.

Since my use case wasn’t search-driven and tools from the Elastic Stack, namely Beats and Logstash, were a good fit for my demo, I chose Elasticsearch over Solr.


[Iunera] The main (and obvious) benefit of real-time customer analytics is that it cuts the time of gathering and analysing the data and we all know that time is money. Besides that, what are the lesser-known benefits of real-time customer analytics?

[Daniel] Doing analytics in real-time usually means acting on relatively small data very fast.

Some use cases, like fraud detection in credit card transactions or anomaly detection in your log data to prevent machines from going down, are pointless if the analysis cannot be done within a couple of seconds at most!
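For this kind of anomaly detection, one common lightweight technique (our choice for illustration, not necessarily what Daniel uses) is a rolling z-score: compare each new value with the mean and standard deviation of a small window of recent values. The metric here, negative reviews per minute, and the window and threshold values are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev

class RollingAnomalyDetector:
    """Flag values more than `threshold` standard deviations away from the
    mean of the previous `window` values (a rolling z-score)."""

    def __init__(self, window: int = 10, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # only the most recent values are kept
        self.threshold = threshold

    def update(self, value: float) -> bool:
        is_anomaly = False
        # Wait until the window is full so the baseline is meaningful.
        if len(self.values) == self.values.maxlen:
            mu, sigma = mean(self.values), pstdev(self.values)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly

# Hypothetical metric: negative reviews per minute, ending with a sudden spike.
negatives_per_minute = [2, 3, 2, 4, 3, 2, 3, 2, 3, 4, 25]
detector = RollingAnomalyDetector(window=10, threshold=3.0)
flags = [detector.update(v) for v in negatives_per_minute]
print(flags[-1])  # True: only the spike is flagged
```

The check only needs the last few values, which matches the idea of acting on relatively small data very fast. Note that an anomalous value is still added to the window; a stricter variant might exclude it so a single spike does not inflate the baseline.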

I would go a step further and say that real-time analysis is a requirement for certain use cases. If it isn’t a requirement, then the only real advantage is cutting time, but that is a big one.

Nowadays, you expect quick answers to questions and information needs so that you can act fast. Not acting fast means losing profit or missing opportunities, and this may be the key differentiator between you and your competition.


[Iunera] Why does the data need to be processed so fast? Are there special real-time business scenarios which can use the reviews of one customer directly for another customer in milliseconds?

[Daniel] Acting on customer reviews in real time lets you immediately publish the reviews that look good, which helps other users in their buying process. As the business owner, you can then focus on the reviews that do not look good, for whatever reason, and act on them very fast.
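The publish-or-escalate flow could look roughly like the sketch below. We read "look good" as passing automated quality checks rather than being positive, which is our interpretation; the two checks are hypothetical placeholders, and a plain list stands in for a streaming source such as a Kafka topic.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Review:
    review_id: str
    stars: int
    text: str

# A check returns True when something about the review looks wrong.
Check = Callable[[Review], bool]

def triage(stream: Iterable[Review], checks: List[Check]) -> Tuple[List[str], List[str]]:
    """Publish reviews that pass every check; route the rest to the owner."""
    published, for_owner = [], []
    for review in stream:
        if any(check(review) for check in checks):
            for_owner.append(review.review_id)
        else:
            published.append(review.review_id)
    return published, for_owner

# Hypothetical checks, for illustration only.
def too_short(review: Review) -> bool:
    return len(review.text.split()) < 3

def contains_link(review: Review) -> bool:
    return "http" in review.text.lower()

stream = [
    Review("a", 5, "Fits perfectly, great grip"),
    Review("b", 1, "Stopped working after a week"),
    Review("c", 5, "Buy cheap here http://spam.example"),
    Review("d", 4, "ok"),
]
print(triage(stream, [too_short, contains_link]))  # (['a', 'b'], ['c', 'd'])
```

The one-star review "b" is published because it passes both checks; only reviews that fail a check land with the business owner.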

The focus of my talk was to show what a real-time streaming architecture can look like and which components from the open-source world can help, especially when working with unstructured data. Taking this approach and applying it to mission-critical scenarios like fraud detection in the financial sector is absolutely possible.


[Iunera] What other applications can ratings and similar data be used for?

[Daniel] Personalisation, recommendations and tuning your search engine are three examples of use cases that may profit from ratings and customer reviews. In any of these use cases, I would not rely on ratings and reviews alone, since these explicit signals come with downsides. Overall, though, they are a valuable source of data for improving these use cases.
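One way to avoid relying on explicit ratings alone is to (a) shrink averages that rest on very few ratings toward a prior, a technique known as a Bayesian average, and (b) blend the result with an implicit signal such as click-through rate. The prior values, the 50/50 weighting and the click-through signal are assumptions made for this sketch, not a recipe from the interview.

```python
def bayesian_average(ratings, prior_mean=3.5, prior_weight=10):
    """Shrink a mean star rating toward prior_mean when there are few ratings.

    prior_weight acts like that many imaginary ratings of prior_mean.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))

def blended_score(ratings, clicks, impressions, weight=0.5):
    """Mix a normalised explicit rating with an implicit click-through rate."""
    rating = bayesian_average(ratings)   # on a 1..5 star scale
    rating_norm = (rating - 1) / 4       # map 1..5 onto 0..1
    ctr = clicks / impressions if impressions else 0.0
    return weight * rating_norm + (1 - weight) * ctr

# One perfect rating vs. fifty ratings averaging 4.6:
print(round(bayesian_average([5]), 2))         # 3.64
print(round(bayesian_average([4.6] * 50), 2))  # 4.42
print(blended_score([4.6] * 50, clicks=40, impressions=1000))
```

The single five-star rating no longer outranks fifty consistently good ones, which addresses one well-known downside of explicit ratings: small samples.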


[Iunera] Can there be concept drifts which render the models invalid, and if they exist, how can they be detected?

[Daniel] The machine learning model is trained on language. When language changes, and it changes constantly, models become out of date.

Just think about the word “cool” and the slang forms “c00l”, “k00l” or “kewl”. A couple of years ago no one used these alternative forms.

I don’t want to claim that everyone uses slang in reviews today. But language changes, and this happens in reviews as well.

Regularly training new, updated models on fresh data is necessary to keep them up to date.

Apart from that, the usual challenges of machine learning scenarios apply here as well, of course: beware of bias in your data, try to avoid overfitting your model, etc.

So yes, concept drift does apply to processing customer reviews.
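The answer above focuses on retraining. For detecting when retraining is due, one simple proxy (our suggestion, not from the talk) is to track how many tokens in incoming reviews the model never saw during training. This catches vocabulary drift such as new slang; shifts in meaning with unchanged vocabulary would need other checks, for example monitoring accuracy on freshly labelled reviews. The baseline and tolerance values are hypothetical.

```python
import re
from typing import Iterable, Set

def tokens(text: str) -> list:
    """Lower-case word tokens, keeping digits so forms like 'k00l' stay intact."""
    return re.findall(r"[a-z0-9']+", text.lower())

def oov_rate(texts: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens in `texts` that never appeared in the training vocabulary."""
    toks = [t for text in texts for t in tokens(text)]
    if not toks:
        return 0.0
    return sum(t not in vocabulary for t in toks) / len(toks)

def drift_alert(recent: Iterable[str], vocabulary: Set[str],
                baseline: float, tolerance: float = 0.10) -> bool:
    """Signal possible drift when the OOV rate exceeds baseline + tolerance."""
    return oov_rate(recent, vocabulary) > baseline + tolerance

training_texts = ["this case is cool", "cool phone case"]
vocab = {t for text in training_texts for t in tokens(text)}
recent = ["this case is kewl", "k00l phone case"]
print(round(oov_rate(recent, vocab), 2))          # 0.29
print(drift_alert(recent, vocab, baseline=0.05))  # True
```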

[Iunera] What other challenges have you stumbled upon in real-time customer analytics?

[Daniel] Reviews are only a very small part of digital customer interactions with an online platform. Thus, it is only one small part you can use to understand your customers and identify their intent at any given time.

Bringing all the data together and acting on it in one personalised connected experience across all touchpoints is the ultimate goal and there are always many challenges in the way.

Since any data-driven project relies on data, I would say that breaking up the silos in which the data lives and getting access to it are often the first and the main challenges we meet, since every following step depends on them.
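Once data from different silos is accessible, a first integration step is often to group records by a shared key. The sketch below assumes, hypothetically, that reviews, orders and support tickets already share a `customer_id`; in practice, establishing such a shared identifier is often part of the silo problem itself.

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems.
reviews = [{"customer_id": 1, "stars": 2}, {"customer_id": 2, "stars": 5}]
orders = [{"customer_id": 1, "total": 59.9}, {"customer_id": 1, "total": 12.5}]
tickets = [{"customer_id": 1, "topic": "sizing"}]

def build_profiles(**sources):
    """Group records from every source under their (assumed shared) customer ID."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            fields = {k: v for k, v in record.items() if k != "customer_id"}
            profiles[record["customer_id"]][source_name].append(fields)
    return profiles

profiles = build_profiles(reviews=reviews, orders=orders, tickets=tickets)
for customer_id, data in sorted(profiles.items()):
    print(customer_id, dict(data))
```

With the records grouped this way, customer 1's two-star review can be read alongside their orders and their support ticket about sizing.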
