Blog - 15 Feb 2022
Data Science & Analytics
R Community


LondonR is the UK’s premier R User group. Whether you’re R curious or an expert, we meet regularly to share and gain knowledge through workshops and presentations from the R community.

Workshop @LondonR - Tidy text classification in R: predicting musical genre from lyrics.



Our workshop at the upcoming LondonR community event on 17th February takes a musical turn as we explore how computers infer meaning from human language in the context of lyrics.

Led by Ascent Data Scientists Andrew Little and Daniel de Bortoli, this practical, hands-on workshop demonstrates a full text analysis workflow in R, in a musical context. NLP is a rapidly growing field in data science, and over the last decade it has become a key tool for driving insight in many sectors, such as healthcare and financial services. We caught up with Andrew ahead of the workshop to find out what to expect and why you should consider attending.

Hi Andrew! Tell us about yourself and the session you’re running.

I am a data scientist primarily coding in R, which I’ve been doing for a few years now. Daniel and I delivered a workshop at Ascent’s EARL conference in September 2021, ‘Web Scraping and Text Mining Lyrics in R’, and this workshop is loosely based on the same theme. I really enjoy the statistics and machine learning parts of data science, as they reflect my Maths background.

What is NLP?

NLP stands for Natural Language Processing: a subfield of linguistics, computer science and artificial intelligence. It connects computers and human language, giving machines the ability to parse, understand and infer meaning from text or speech. It’s a very wide field, with applications ranging from speech recognition to sentiment analysis.

Although this workshop is focussed on text-based NLP, it helps to add some further context. A typical use case outside text is voice recognition, where audio is processed and automatically transcribed into text. This technology is part of what enables popular voice assistants such as Siri or Alexa.

Is there a growing trend for NLP applications?

The volume of language data that is processed and managed on a daily basis continues to grow at an exponential rate. The NLP market is expected to grow at a compound annual growth rate of about 27% over the next five years, resulting in about 230% total growth over the period as a whole. This represents a significant opportunity in terms of modelling for insights and decision-making.
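As a quick check on how an annual rate compounds into a five-year total, the arithmetic can be sketched as follows. This is an illustrative calculation only, written in Python for brevity; the 27% rate and five-year horizon are the figures quoted above, and the variable names are our own.

```python
# Figures quoted in the interview; the formula is standard compound growth.
annual_rate = 0.27   # compound annual growth rate (about 27%)
years = 5

# Market size after five years, relative to today's size.
growth_multiple = (1 + annual_rate) ** years
# Total growth over the whole period, as a percentage.
total_growth_pct = (growth_multiple - 1) * 100

print(f"Multiple after {years} years: {growth_multiple:.2f}x")
print(f"Total growth: {total_growth_pct:.0f}%")
```

This prints a multiple of 3.30x and total growth of 230%, in line with the quoted figure.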

Can you tell us some typical use cases of NLP in data science?

Common applications include chatbots that can enhance customer service without the need for human interaction, and inferring the sentiment of customer reviews, feedback and survey responses, which can be done at scale with NLP algorithms. These algorithms can also highlight specific topics that occur frequently in the data. Another example is processing or analysing social media posts. Twitter is often used here due to the volume and specificity of tweets, and the immediate availability of the data.

Our workshop focuses on lyrics. Lyrics suffer from many of the same pitfalls as more standard text analysis applications, and therefore provide a fair representation of the common obstacles faced by an analyst. These include the presence of stop words and unstemmed words, and the need to generate numerical features from text data, all of which will be covered in the workshop.
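To make those three obstacles concrete, here is a minimal sketch of stop-word removal, stemming and turning text into numerical features. It is not the workshop's code (the workshop itself uses R); it is a dependency-free Python illustration, and the stop-word list, the toy suffix-stripping stemmer and the two example lyric lines are all assumptions invented for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "and", "in", "i", "you", "my", "is", "are", "on", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def naive_stem(word):
    """Strip a few common suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def featurize(text):
    """Drop stop words, stem what remains, and count each stem."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(naive_stem(t) for t in tokens)

# Made-up lyric lines, purely for illustration.
lyrics = {
    "song_a": "Dancing in the rain, I danced and you sang",
    "song_b": "The guitars are crying and my heart is breaking",
}

features = {title: featurize(text) for title, text in lyrics.items()}

# Shared vocabulary, then one row of term counts per song (a bag-of-words matrix).
vocabulary = sorted(set().union(*features.values()))
matrix = {
    title: [counts[term] for term in vocabulary]
    for title, counts in features.items()
}

print(vocabulary)
for title, row in matrix.items():
    print(title, row)
```

Each song becomes a row of counts over a shared vocabulary, which is the kind of numerical input a classifier needs. Note that "dancing" and "danced" collapse to the same stem, so they are counted as one feature rather than two.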

What are the key takeaways for the workshop?

Our primary goals are to showcase:

  • The main concepts behind NLP and their implementation in R

  • Popular text analysis algorithms

  • Their application in a machine learning context using a web-scraped dataset, with a focus on best-practice coding

  • The basics of tidymodels, RStudio’s brand new machine learning framework in R

If this sounds like a workshop you would like to attend, it’s free and fully supported with a vibrant R community for networking afterwards.

Find out more about LondonR and book your place.

Amanda Cleverly

Content Lead


Amanda leads Ascent’s content strategy as part of her senior marketing role. With a strong background in data science and a passion for technology and innovation, her flair for compelling communications enables her to explore human, technical and business subject matter across diverse industries.
