Process unstructured text data using Terrene's natural language processing toolkit

1

Acquire Training Data

For this guide, we will use the Scotch Whisky Reviews dataset from Kaggle. It contains a description of each Scotch, and we will use Terrene to extract features from those descriptions.

This guide assumes you already know how to upload data to Terrene, so it does not cover that step.

2

Calculate subjectivity and polarity of each review

Right-click the "description" column and, from the "Text Processing" submenu, pick "Calculate Sentiment". After a few seconds, Terrene calculates the sentiment (subjectivity and polarity) of each review.
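
To make the two sentiment measures concrete, here is a minimal Python sketch of a lexicon-based scorer. It is not Terrene's implementation, which this guide does not describe. The `LEXICON` table, its scores, and the `sentiment` function are illustrative assumptions. Polarity runs from -1 (negative) to 1 (positive), and subjectivity from 0 (objective) to 1 (subjective).

```python
import re

# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
# The scores are made up for illustration; they are not Terrene's values.
LEXICON = {
    "great": (0.75, 0.75),
    "smooth": (0.25, 0.5),
    "harsh": (-0.5, 0.75),
    "bland": (-0.5, 0.5),
    "smoky": (0.0, 0.25),
}

def sentiment(text):
    """Return (polarity, subjectivity) averaged over the lexicon words found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        # No known words: treat the text as neutral and objective.
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

polarity, subjectivity = sentiment("A great, smooth dram")
```

A real scorer would rely on a much larger lexicon or a trained model and would handle negation ("not smooth"), which this sketch ignores.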

3

Count the nouns and tokenize the 5 most-used adjectives

Right-click the "description" column and, from the "Text Processing" submenu, pick "Count Nouns". After a few seconds, Terrene calculates the number of nouns in each review.

Right-click the "description" column again and, from the "Text Processing" submenu, pick "Top 5 Adjectives". After a few seconds, Terrene tokenizes the 5 most frequently used adjectives as new columns.
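
The two operations above can be sketched in Python as follows. This is an assumption-laden illustration, not Terrene's code: the tiny `POS` lookup table stands in for a real part-of-speech tagger, and producing one count column per top adjective is only one plausible reading of "tokenize as new columns". All function and column names here are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini part-of-speech lexicon; a real pipeline would use a trained tagger.
POS = {
    "nose": "NOUN", "palate": "NOUN", "finish": "NOUN",
    "smoke": "NOUN", "honey": "NOUN", "oak": "NOUN",
    "sweet": "ADJ", "smoky": "ADJ", "rich": "ADJ",
    "dry": "ADJ", "long": "ADJ", "soft": "ADJ",
}

def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_nouns(text):
    """Count tokens that the lookup table tags as nouns."""
    return sum(1 for t in tokens(text) if POS.get(t) == "NOUN")

def top_adjectives(reviews, k=5):
    """Return the k most frequent adjectives across all reviews."""
    counts = Counter(
        t for review in reviews for t in tokens(review) if POS.get(t) == "ADJ"
    )
    return [word for word, _ in counts.most_common(k)]

def adjective_columns(reviews, k=5):
    """Build one row per review with a count column for each top adjective."""
    top = top_adjectives(reviews, k)
    return [{f"adj_{w}": tokens(review).count(w) for w in top} for review in reviews]

reviews = [
    "Sweet honey on the nose, rich and sweet palate.",
    "Smoky, dry finish with oak and smoke.",
]
noun_counts = [count_nouns(r) for r in reviews]
columns = adjective_columns(reviews, k=5)
```

Words missing from the lookup table are simply ignored, so a real tagger would find more nouns and adjectives than this sketch does.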

4

One-hot encode review categories

Right-click the "category" column and, from the "Text Processing" submenu, pick "One-Hot Encode". After a few seconds, Terrene one-hot encodes your categories, adding one indicator column per category.
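
For readers unfamiliar with one-hot encoding, the following Python sketch shows the idea: each distinct category becomes its own column, holding 1 for rows in that category and 0 otherwise. The `one_hot` function, the column-name prefix, and the sample category values are illustrative assumptions, not Terrene's output format.

```python
def one_hot(values, prefix="category"):
    """Return one dict per input value with a 0/1 indicator for each distinct category."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{c}": int(v == c) for c in categories}
        for v in values
    ]

# Illustrative category values only.
rows = one_hot(["Single Malt", "Blended", "Single Malt"])
```

Each resulting row contains exactly one 1, in the column that matches its original category.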

