Our workshop at the upcoming London R community event on 17th February takes a musical turn as we explore how computers infer meaning from human language in the context of lyrics.
Led by Ascent Data Scientists Andrew Little and Daniel de Bortoli, this practical, hands-on workshop demonstrates a full text analysis workflow in R, applied to song lyrics. Natural language processing (NLP) is a rapidly growing field in data science and, over the last decade, has become a key tool for driving insight in many sectors, such as healthcare and financial services. We caught up with Andrew ahead of the workshop to find out what to expect and why you should consider attending.
Hi Andrew! Tell us about yourself and the session you’re running.
I am a data scientist, coding primarily in R, which I’ve been doing for a few years now. Daniel and I delivered a workshop at Ascent’s EARL conference in September 2021 – ‘Web Scraping and Text Mining Lyrics in R’ – and this workshop is loosely based on the same theme. I really enjoy the statistics and machine learning parts of data science, as they reflect my maths background.
What is NLP?
NLP stands for Natural Language Processing: a subfield of linguistics, computer science and artificial intelligence. It connects computers and human language, giving machines the ability to parse, understand and infer meaning from text or speech. It’s a very wide field, with applications ranging from speech recognition to sentiment analysis.
Although this workshop is focussed on text-based NLP, a typical use case outside that scope is voice recognition, where audio is processed and automatically transcribed into text. This technology is part of what enables popular voice assistants such as Siri or Alexa.
Is there a growing trend for NLP applications?
The volume of language data that is processed and managed every day continues to grow rapidly. The NLP market is expected to grow at a compound annual growth rate of about 27% over the next five years, which compounds to about 230% total growth over the period as a whole. This represents a significant opportunity in terms of modelling for insights and decision-making.
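As a quick check on how those two figures relate, the arithmetic can be sketched in a few lines of R. The variable names are illustrative, and the inputs are simply the rounded numbers quoted above.

```r
# Compound annual growth: total growth = (1 + rate)^years - 1
cagr  <- 0.27  # roughly 27% per year, as quoted above
years <- 5

total_growth <- (1 + cagr)^years - 1

# Express as a percentage, rounded to the nearest whole number
round(total_growth * 100)  # 230
```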
Can you tell us some typical use cases of NLP in data science?
Common applications include chatbots that can enhance customer service without the need for human interaction, and sentiment analysis of customer reviews, feedback and survey responses, which NLP algorithms can carry out at scale. These algorithms can also highlight specific topics that occur frequently in the data. Another example is processing or analysing social media posts. Twitter is often used here due to the volume and specificity of tweets, and the immediate availability of the data.
Your workshop application is related to music. How might data scientists relate this to everyday use cases?
Our workshop focuses on lyrics. Lyrics present many of the same challenges as more standard text analysis applications, and therefore give a fair representation of the obstacles an analyst commonly faces. These include stop words (very common words such as ‘the’ that carry little meaning on their own), unstemmed words (variants such as ‘singing’ and ‘sings’ that share a root) and the need to generate numerical features from text data, all of which will be covered in the workshop.
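To make those three steps concrete, here is a minimal sketch of how they might look in R. It is not the workshop's own code: it assumes the dplyr, tidytext and SnowballC packages are installed, and the two "lyrics" are made-up lines rather than scraped data.

```r
library(dplyr)
library(tidytext)   # unnest_tokens(), stop_words, bind_tf_idf()
library(SnowballC)  # wordStem()

# Made-up lines standing in for scraped lyrics (illustrative only)
lyrics <- tibble(
  song = c("song_a", "song_b"),
  text = c("I am walking and walking in the rain",
           "She walks home singing in the rain")
)

tokens <- lyrics %>%
  unnest_tokens(word, text) %>%           # one lowercase word per row
  anti_join(stop_words, by = "word") %>%  # drop stop words such as "the"
  mutate(stem = wordStem(word))           # reduce each word to its stem

# Turn the stems into numerical features: counts and tf-idf weights per song
features <- tokens %>%
  count(song, stem) %>%
  bind_tf_idf(stem, song, n)

features
```

Stemming means that ‘walking’ and ‘walks’ are counted as the same term, and the resulting table of counts and tf-idf weights is the kind of numerical input a machine learning model can work with.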
What are the key takeaways for the workshop?
Our primary goals are to showcase:
The main concepts behind NLP and their implementation in R
Popular text analysis algorithms
Their application in a machine learning context using a web-scraped dataset, with a focus on best-practice coding
The basics of tidymodels, RStudio’s brand new machine learning framework in R
If this sounds like a workshop you would like to attend, it’s free and fully supported, with a vibrant R community to network with afterwards.
Find out more about LondonR and book your place.