5 Trends in NLP to Watch

Natural Language Processing (NLP) is a subfield of Artificial Intelligence that deals with how computers understand human language, and it is one of the most researched and applied topics in AI. Several factors have contributed to NLP's success in recent years: the rising adoption of AI across industries, an exponential increase in the availability of text data, and growing research and success with Deep Learning algorithms.

The Association for Computational Linguistics (ACL), a leading research organization in the field of natural language processing, has seen a huge increase in the number of papers submitted to its annual conference (see chart below).

As the research community continues to advance the field of NLP through new breakthroughs, enterprise adoption of NLP has risen in parallel. Companies across all industry sectors have increasingly adopted NLP technologies such as chatbots and document parsers to improve their operational efficiency and customer experience.

While NLP is being adopted in areas such as information discovery and retrieval, report summarization, and document parsing, the area where NLP applications are growing fastest is customer service. Chatbots and smart assistants can easily handle trivial and repetitive requests, acting as the first layer of support and passing the more complex cases on to humans. This allows customer service to scale up while reducing operating costs and increasing customer satisfaction.
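This "first layer of support" pattern can be sketched in a few lines. Below is a minimal, purely illustrative keyword-intent router; the intents and canned answers are hypothetical, and a production chatbot would use a trained intent classifier rather than keyword matching:

```python
# A toy first-layer support bot: answer trivial, repetitive questions
# via simple keyword intents, and escalate everything else to a human.
# The intents and answers here are invented for illustration.

FAQ_INTENTS = {
    "reset password": "You can reset your password at Settings > Security.",
    "business hours": "Our support team is available 9am-5pm, Mon-Fri.",
}

def handle(message):
    text = message.lower()
    for keywords, answer in FAQ_INTENTS.items():
        # an intent matches only if every keyword appears in the message
        if all(word in text for word in keywords.split()):
            return ("bot", answer)
    # anything the bot cannot confidently answer goes to a person
    return ("human", "Routing you to a support agent...")

print(handle("How do I reset my password?"))
print(handle("My payment failed twice"))
```

Even this naive version shows why the pattern scales: the cheap path absorbs the high-volume questions, and only the long tail reaches human agents.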

As interest from the research community and across industries continues to grow, here are a few trends shaping the future of NLP:



1. Rise in data engineering pipelines for text data

The rise of social media has led to exponential growth in text data and in the NLP applications built on top of it. While more data is almost always preferred in artificial intelligence, huge volumes of data, especially from user-generated sources, are challenging to parse and extract information from. Successful industry applications of NLP will require pipelines that extract and process this huge volume of data, separate signal from noise, and surface meaningful insights. As interest in NLP grows, there will be a big need for tools to process text data. Plenty of startups are emerging in this space, and the big cloud providers, Microsoft, Amazon, and Google, offer end-to-end data pipelines for NLP tasks.
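To make the "separate signal from noise" step concrete, here is a minimal sketch of a text-preprocessing stage that such pipelines run at scale: normalize, tokenize, and filter out noise before any analysis. The stopword list is a tiny illustrative subset, and real pipelines add many more stages (language detection, deduplication, spam filtering, and so on):

```python
# A toy text-preprocessing stage: lowercase, strip punctuation/digits,
# tokenize on whitespace, and drop common stopwords.
import re

STOPWORDS = {"the", "a", "an", "is", "are", "to", "of"}  # illustrative subset

def preprocess(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # strip punctuation and digits
    tokens = text.split()                  # naive whitespace tokenization
    return [tok for tok in tokens if tok not in STOPWORDS]

print(preprocess("The price is $42, right?"))
```

Each stage is a simple function, which is exactly why this work composes well into the managed pipeline products the cloud providers now sell.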

2. Focus on solutions for data labeling and annotation

Today's successful NLP applications have mostly emerged where huge volumes of labeled data are available. However, collecting huge volumes of labeled data is an extremely hard and laborious process that most industries cannot afford. Two major trends are on the rise to solve this issue:

Combining humans and machines to crowdsource data annotation

The most common approach to the lack of data labels is to crowdsource the labeling task to humans and build machine learning systems on top of the results. These systems often include assistive labeling capabilities that guide human labelers and validate the quality of the labels they produce. Platforms such as DefinedCrowd and Amazon SageMaker Ground Truth provide these services in the market today.

Unsupervised and Semi-supervised learning techniques for data annotation

The other alternative, and a focus of the NLP research community, is leveraging unsupervised and semi-supervised learning techniques to improve data labeling and annotation. There is increased research interest in combining active learning, meta-learning, and semi-supervised learning to detect and separate good labels from bad ones, and, at the same time, in generating synthetic labeled data to augment the original labels.
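One of the simplest semi-supervised ideas, self-training (pseudo-labeling), fits in a few lines. The sketch below is a toy version on one-dimensional points with a nearest-centroid classifier; all data, the confidence margin, and the threshold value are invented for illustration, and real systems use proper models and calibrated confidence scores:

```python
# Toy self-training (pseudo-labeling): confidently classified unlabeled
# points are given their predicted label and folded into the training set.

def centroid(points):
    return sum(points) / len(points)

def self_train(labeled, unlabeled, threshold=0.3, rounds=3):
    """labeled: list of (x, label) pairs; unlabeled: list of x values."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        # one centroid per class, from all labels seen so far
        centroids = {
            lab: centroid([x for x, l in labeled if l == lab])
            for lab in {l for _, l in labeled}
        }
        newly_labeled, remaining = [], []
        for x in pool:
            dists = sorted((abs(x - c), lab) for lab, c in centroids.items())
            # "confident" = large margin between best and second-best class
            if len(dists) > 1 and dists[1][0] - dists[0][0] >= threshold:
                newly_labeled.append((x, dists[0][1]))
            else:
                remaining.append(x)
        if not newly_labeled:
            break  # nothing confident left; stop early
        labeled += newly_labeled
        pool = remaining
    return labeled, pool

seed = [(0.1, "neg"), (0.2, "neg"), (0.9, "pos")]
grown, leftover = self_train(seed, unlabeled=[0.15, 0.85, 0.5])
```

The ambiguous point (0.5) is deliberately left unlabeled rather than guessed at, which is the whole appeal of margin-based pseudo-labeling.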

3. Increased adoption of Pre-trained models

Given that NLP requires huge volumes of data, companies with an abundance of data have started releasing pre-trained models, such as BERT and Word2Vec from Google and FastText from Facebook, which can be fine-tuned for various industry applications. These pre-trained models have seen huge success, allowing companies to quickly build NLP applications without having to collect and process millions of rows of data. A particularly strong trend in pre-trained models has been word embeddings, which can form a foundation for almost any type of NLP application.
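To show how pre-trained embeddings are typically consumed, here is a toy sketch: look up one vector per word, then compare words by cosine similarity. The three-dimensional vectors below are invented for illustration; real pre-trained embeddings such as Word2Vec or FastText have hundreds of dimensions and millions of entries:

```python
# Toy word-embedding lookup plus cosine similarity.
# Related words should point in similar directions in vector space.
import math

embeddings = {                      # invented 3-D vectors
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

royal = cosine(embeddings["king"], embeddings["queen"])
fruit = cosine(embeddings["king"], embeddings["apple"])
```

The point of "pre-trained" is that someone else has already paid the training cost: an application only needs the lookup table and a similarity function to get useful semantics for free.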

4. Shift in solving higher-order NLP tasks 

Five years ago, NLP primarily solved simple tasks such as sentiment analysis and word-level semantics. Today, NLP can solve higher-order tasks such as question answering and dialogue systems, which must understand not just the semantics of individual words but also the context and meaning of whole sentences and paragraphs.

The focus of NLP today has turned to NLU (Natural Language Understanding) and NLG (Natural Language Generation) systems. The shift towards solving higher-order tasks has been made possible due to success in multiple areas:

  • Off-the-shelf pre-trained language models for simpler tasks

  • GPU technology acceleration allowing for faster computing

  • Increased size in deep learning models

  • Success in Deep Learning models for few-shot and one-shot learning

The latest breakthrough has been OpenAI's GPT-3, which made waves in the NLP community for its shockingly accurate results and strong grasp of text. As research teams continue to focus on these higher-order tasks, we will see more breakthroughs like GPT-3 that widen the scope of NLP applications.

5. Focus on ethical implications of NLP and AI

With the rising adoption of AI and NLP, there has been an increased focus on the ethical concerns surrounding them. Even state-of-the-art NLP models like OpenAI's GPT-3 show evidence of unintended bias. Adding to that, NLP has also been used in harmful contexts, such as generating fake news and misinformation. These issues have led the AI community to increase its focus on the transparency of AI models, allowing researchers and companies to better understand how these systems work and to make better decisions that will truly benefit society overall.

Conclusion

NLP has made huge strides in the last few years, and recent success in both the research community and industry has driven increased focus and investment in the field. There has been a shift in NLP and machine learning from simple applications toward higher-level tasks, such as extracting the context and meaning of language. Companies are investing in data and building end-to-end pipelines to ensure success with NLP. While there is still a lot of work needed around the societal and ethical implications of AI, recent advances show the promise of a brighter future with NLP.


About the authors


Ganesh Prasad is a manager in the Artificial Intelligence division of Mastercard, responsible for driving the adoption of AI across various parts of the company. Ganesh holds a bachelor's degree in Information Science from R.V. College of Engineering, India, and a master's degree specializing in Data Analytics from Carnegie Mellon University. Prior to his current role, Ganesh worked as a Software Engineer at Cisco and a Data Scientist at Mastercard. With this education and experience, Ganesh brings together the principles of software engineering, data science, and business knowledge to lead the design and build of AI products and services.


Aakriti Srikanth holds a BS and MS in Computer Science with a specialization in AI, and was one of the founding AI product managers at IBM Watson/Red Hat (acquired by IBM), where she worked for 5 years. She started her career at the Wall Street firm D. E. Shaw & Co. She has created a VC Executive Council with Microsoft for Startups and has been introducing startup founders to firms like General Catalyst, NEA, Lightspeed, Data Collective, Norwest Venture Partners, and a16z. In the past, she has advised AI startups such as Vectice (founded by Sequoia Capital and NEA entrepreneurs), H2O.ai ($146M funded by Goldman Sachs), Adeptmind, and CyCognito ($23M led by Lightspeed). She is on the Forbes 30 Under 30 list and was one of the 4 finalists for the "VentureBeat Women in AI Awards 2020."