NLP and Text Mining: A natural fit for business growth
NLP and text mining have grown together in recent years.
The value of NLP in combination with text mining for business growth has become too hard to ignore.
Today I'll explain why Natural Language Processing (NLP) has become so popular in the context of Text Mining and in what ways deploying it can grow your business.
Before we get started, let's define both terms:
Text Analysis (a.k.a. Text Mining) definition: it's the process of understanding and sorting text, making it easier to manage. Text analysis could well be the missing piece of the growth puzzle every business is trying to solve. After all, in the information-saturated era we live in, what could be more valuable than organising this information in a structured, meaningful way that we humans can understand?
Natural language processing (NLP) definition: it's a subfield of artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to understand, interpret and manipulate human language.
In this article, we will walk through a business case. We'll look at each solution in turn and compare them, so that you can see why NLP takes text mining to the next level.
NLP text mining: Understanding customer support tickets
Tom is the Head of Customer Support at a successful, mid-sized, product-based company. Tom works really hard to meet customer expectations and has managed to increase his NPS over the last quarter. His product enjoys a high rate of customer loyalty in a market full of capable competitors. From Tom's perspective, things are going well.
But suddenly, he notices a rise in the volume of support tickets. Tom is really worried because he can't read every ticket himself to find out what's caused the sudden spike.
He needs to understand the voice of his customer
At first, he goes the laborious route.
He decides to hire a data analyst. Over the next month, the analyst sifts through thousands of support tickets, manually tagging each one to try to identify trends across them.
After about a month of thorough research, the analyst delivers a final report identifying several grievances customers have with the product. Relying on this report, Tom goes to his product team and asks them to make these changes.
Afterwards, Tom sees an immediate decrease in the number of customer tickets. But the drop falls short of what Tom expected for the amount of money invested.
He also has the following concerns:
- The process was slow. The support data, and customer conversations across the internet, were ever-expanding and evolving. The core drivers of customer contact had surely changed in the time it took to do the analysis. Besides, some issues were ad hoc and needed solving much faster than the product team was able to manage. The delay meant more unhappy customers leaving for a competitor.
- The hired team wasn't capable of answering Tom's dynamic queries about the data. They could only present the insights they had already come up with; any further queries from Tom could only be taken into account when processing the next batch of data.
E.g. How many people dislike a particular new aspect of the product?
NLP text analytics versus manual human work
In search of alternative solutions, Tom begins looking for systems that can deliver more quickly and adapt to his changing needs and queries. It doesn't take long for Tom to realise that the solution he is looking for has to be technical. Only computational power could process hundreds of thousands of data points on a regular basis and generate the insights he's looking for in a short span of time.
Having realised that, Tom reaches out to a software consultancy company.
Thanks to technology, their solution:
- Measures topic volume: it counts how many times specific aspects (supplied by Tom) are mentioned.
- Does it rapidly: 100,000s of conversations can be analysed in minutes. This means that as each customer ticket comes in, it can be labelled with a detailed topic and categorised. If a sudden spike happens (for example, if a batch of food being delivered all had broken packaging), Tom can be notified instantly. This allows Tom to contact customers and offer an apology and a refund, getting ahead of negative sentiment and reducing resource requirements in his call centre.
- Visual representation: now that support chat analysis is done so quickly, Tom can log in to a dashboard and monitor the drivers of customer contact.
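To make the topic-volume and spike-alert idea concrete, here is a minimal Python sketch. It assumes each ticket has already been labelled with a topic upstream, and it uses one simple rule of thumb: flag a topic when today's count sits more than three standard deviations above its recent daily average. The function names, the threshold and the example counts are all hypothetical, not the consultancy's actual method.

```python
from collections import Counter
from statistics import mean, pstdev


def topic_counts(labelled_tickets):
    """Count tickets per topic. Each ticket is a (ticket_id, topic) pair,
    with the topic label assumed to be assigned upstream."""
    return Counter(topic for _, topic in labelled_tickets)


def spiking_topics(history, today, threshold=3.0):
    """Return topics whose count today is more than `threshold` standard
    deviations above their past daily counts.

    history: dict mapping topic -> list of past daily counts
    today:   Counter mapping topic -> today's count (missing topics count as 0)
    """
    alerts = []
    for topic, past in history.items():
        mu = mean(past)
        # Floor the spread at 1 ticket so a perfectly flat history does not
        # turn a single extra ticket into an alert.
        sigma = max(pstdev(past), 1.0)
        if (today[topic] - mu) / sigma > threshold:
            alerts.append(topic)
    return alerts


# Hypothetical week of daily counts, then today's labelled tickets.
history = {
    "broken packaging": [3, 5, 4, 6, 4, 5, 3],
    "late delivery": [20, 22, 19, 21, 23, 20, 22],
}
todays_tickets = (
    [(f"a{i}", "broken packaging") for i in range(27)]
    + [(f"b{i}", "late delivery") for i in range(21)]
)
today = topic_counts(todays_tickets)
print(spiking_topics(history, today))  # ['broken packaging']
```

Late deliveries stay at their usual level, so only the packaging problem would trigger a notification. A production system would likely use a more robust baseline (seasonality, weekday effects), but the principle of comparing live counts against history is the same.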
Text mining with NLP: Process behind the scenes
How text mining works
1) Word identification
Tom’s manual queries are treated as a problem of identifying a keyword in the text. So, for example, if Tom wants to find out how many times customers talk about the price of the product, the software firm writes a program to search each review or text sequence for the term “price”.
The main principle is that if a word appears in a text, that piece of text can be assumed to be “about” that word.
E.g. "I like the product but it comes at a high price."
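A minimal sketch of word identification in Python, assuming case-insensitive whole-word matching with the standard `re` module. The helper names and sample tickets are illustrative only.

```python
import re


def mentions(text, keyword):
    """True if `keyword` appears in `text` as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def keyword_volume(texts, keyword):
    """Count how many texts mention `keyword`; each such text is treated
    as being 'about' that keyword."""
    return sum(mentions(t, keyword) for t in texts)


tickets = [
    "I like the product but it comes at a high price.",
    "Delivery took two weeks.",
    "Price is fine, the size is too small.",
]
print(keyword_volume(tickets, "price"))  # 2
```

Because matching is on whole words, a ticket saying "prices" would not be counted as "price"; every such variant has to be handled explicitly.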
2) Rule creation
This approach is closely linked to the first one. Both operate on the principle of pattern identification, but only of predefined patterns.
More often than not, a text is about more than a single word. For instance, in the example above ("I like the product but it comes at a high price"), the customer's grievance is the high price they’re having to pay, which the phrase “high price” captures better than the word “price” alone.
So there is an inherent need to identify phrases in the text as they seem to be more representative of the central complaint. These phrases are what is referred to as rules.
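Rules like these can be sketched as hand-written phrase lists per aspect. In the hypothetical rule set below, each aspect is compiled into one case-insensitive regular expression, and a text is tagged with every aspect whose phrases it contains. The aspect names and phrases are made-up examples.

```python
import re

# Hypothetical rule set: each aspect is defined by hand-written key phrases.
RULES = {
    "high price": ["high price", "too expensive", "overpriced"],
    "product size": ["product size", "too small", "too big"],
}


def compile_rules(rules):
    """Pre-compile one case-insensitive, whole-phrase regex per aspect."""
    return {
        aspect: re.compile(
            r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b",
            re.IGNORECASE,
        )
        for aspect, phrases in rules.items()
    }


def tag(text, compiled):
    """Return every aspect whose rule matches the text."""
    return [aspect for aspect, pattern in compiled.items() if pattern.search(text)]


compiled = compile_rules(RULES)
print(tag("I like the product but it comes at a high price.", compiled))
# ['high price']
print(tag("Way too small for my kitchen and overpriced.", compiled))
# ['high price', 'product size']
```

Adding or tweaking an aspect only means editing the phrase lists, which is why rule-based systems are easy for non-engineers to reason about.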
Any system that uses these pattern rules to mine aspects from text is called a rule-based system, and such systems have the following benefits:
- Can be easily understood by humans - marketing teams can come up with rules and pass them on to the software team to implement them.
E.g. Tom's Head of Marketing wanted to understand any grievances about the size of the product, so “product size” was added as a key phrase to monitor in the incoming data.
- Rules are fairly simple to tweak, so changes can be made quickly.
These two principles have long been the go-to text analytics methods. Most services in this domain are based mainly on rule creation.
Rule creation has been a win for Tom:
- He gets insights from the huge volume of customer support data in a streamlined manner;
- He is able to monitor custom aspects that he believes to be affecting the product.
If text analytics is so good, why do we need NLP to be involved?
Like any good story, there's a catch. A few months down the line, Tom sees the same trend of increasing tickets. He doesn’t understand: he’s already made iterations to the product based on his monitoring of customer feedback on prices, product quality and every aspect his team deemed important.
Worried about the growth of his company, Tom seeks advice from an NLP scientist - Sarah. After a brief conversation with Sarah, Tom realises he’s been getting it all wrong...
Why Natural Language Processing and text analytics work better together
In the context of Tom’s company, the incoming flow of data was high in volume and its nature was changing rapidly.
Rule-based methods lacked the robustness and flexibility to cater to the changing nature of this data.
Sarah further explains that although Tom was monitoring the data with respect to aspects he considered to be red flags (like pricing, size etc.), the red flags in the data were constantly changing and it’s almost impossible to move at the pace of the changing data using handcrafted rules.
The problems with rule-based text analytics:
- The mention of a word didn’t always indicate the core topic of concern. The presence of “high price” doesn’t necessarily mean that the customer is complaining about it.
E.g. “Really love the product since it’s so cheap compared to the alternate options that come at such a high price”.
- Multiple ways of expressing the same meaning made it hard to create rules. People often express the same sentiment in many different ways.
E.g. Good price – Awesome discount – Value for Money
These phrases point towards the same sentiment but are merely different ways of expressing it. Taking all such variants into account is tedious, and failing to do so compromises the accuracy of the system.
- Maintaining the rule set was becoming harder. There are only so many aspects we can think of, and these might cover only 15-20% of all customer grievances. On top of that, the many ways of expressing the same meaning called for a comprehensive list of sub-rules for each aspect.
- The computational time taken to process each review was increasing as the rules multiplied. With 20 rules, each new review needs to be searched for all 20. As the rule set grows, the system becomes computationally more expensive and takes longer to generate insights.
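These problems are easy to reproduce. The sketch below assumes a single hypothetical “high price” complaint rule and a naive “good price” rule. It shows the praise quoted above being flagged as a complaint, two of the three equivalent phrasings slipping past, and the number of rule checks growing as reviews × rules.

```python
import re

# Hypothetical rule: tag a ticket as a price complaint whenever "high price" appears.
PRICE_COMPLAINT = re.compile(r"\bhigh price\b", re.IGNORECASE)


def is_price_complaint(text):
    return PRICE_COMPLAINT.search(text) is not None


# 1) False positive: the phrase appears, but the customer is praising the price.
praise = ("Really love the product since it's so cheap compared to "
          "the alternate options that come at such a high price")
print(is_price_complaint(praise))  # True, even though this is praise

# 2) Missed paraphrases: three wordings of the same positive sentiment.
GOOD_PRICE = re.compile(r"\bgood price\b", re.IGNORECASE)
phrasings = ["Good price", "Awesome discount", "Value for Money"]
matched = sum(GOOD_PRICE.search(p) is not None for p in phrasings)
print(f"{matched} of {len(phrasings)} phrasings matched")  # 1 of 3 phrasings matched


# 3) Cost: every review is checked against every rule.
def count_checks(reviews, rules):
    """Apply each rule to each review and return how many checks were made."""
    checks = 0
    for review in reviews:
        for rule in rules:
            rule.search(review)
            checks += 1
    return checks


print(count_checks(phrasings, [PRICE_COMPLAINT, GOOD_PRICE]))  # 6
```

Fixing case 2 means writing a sub-rule for every phrasing, which in turn makes case 3 worse: each extra rule adds another check for every incoming review.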
Tom realises he was only seeing what he wanted in the data. He wasn’t really seeing what the data had to show.
Sarah advises Tom to work with an NLP-powered Customer Experience Analytics company and explain his problems to them. And Tom does so.
How machine learning in text analytics works
A deep-tech AI company applies the power of Machine Learning and Statistics through NLP. The central ideas are:
- Supervised machine learning: an algorithm is shown examples that have already been categorised by hand (training data) and works out rules of its own (extracted feature models) for categorising new examples. Its beauty lies in the fact that we just feed it categorised examples and it learns the task on its own, just as a human would once the job has been explained to them.
- Highly efficient ways of representing words, wherein words aren’t treated as separate entities but as clouds of senses, hence addressing the problem of multiple meanings of words. Academic research reports very high accuracy on many text categorisation tasks using NLP. Deep Learning algorithms can be thought of as the next generation of machine learning algorithms: they learn richer patterns and can handle many tasks much better than earlier machine learning algorithms.
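To illustrate the supervised idea on a small scale, here is a multinomial Naive Bayes classifier in plain Python. Naive Bayes is a classic, simple supervised learner chosen here for brevity; it is not the deep-learning approach described above, nor any vendor's actual method, and it does not use word embeddings. The six labelled tickets are invented for illustration; a real system would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of examples
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of smoothed log P(word | label)
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled training data.
texts = [
    "the price is too high",
    "way too expensive for what it is",
    "costs far more than it should",
    "box arrived crushed and broken",
    "packaging was damaged on arrival",
    "the parcel was broken when it arrived",
]
labels = ["pricing"] * 3 + ["packaging"] * 3

model = NaiveBayes().fit(texts, labels)
print(model.predict("arrived broken again"))  # packaging
print(model.predict("much too expensive"))    # pricing
```

No rule was written for “broken” or “expensive”; the model picks up those associations from the labelled examples alone, which is the core advantage over handcrafted rules.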
The benefits of NLP text analytics
- Higher accuracy: across all automated tasks, ensuring the insights produced are reliable and actionable.
- Reduced effort: No handcrafted rules, hence breaking Tom free of the manual effort and required brainstorming.
- The power to understand everything: Tom can now understand trends and other data aspects coming in from all the channels important to his company, such as Zendesk, social media comments or NPS surveys. All he needs to do is ask for it.
- Improved depth in the understanding: with the power of curated data, Tom also sees granular insights which better represent the strengths and shortcomings of his product and service. These insights help him quickly understand how to act on them so that he can strengthen the pillars that support his product.
- Computational freedom: once trained, the models are lightweight and hence reduce the production load compared with rule-based approaches.
- Time: Tom can now finally focus on the things that matter, knowing that the voice of his customers reaches him transparently and not through his own tinted lens.
If there is one thing you can take away from Tom's story, it is that you should never settle for short-term, traditional solutions just because they seem like the safe approach. Being bold and trusting technology will pay off in both the short and the long term.
As most scientists would agree, the dataset is often more important than the algorithm itself. We at Sentisum have mastered deep learning models and the curation of your data to gain insights for our customers, and we do this not for one but for multiple tasks, such as Sentiment Analysis, Keyword Extraction and many others.