{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sentiment Analysis with VADER" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sentiment analysis estimates whether a piece of text is negative, neutral or positive.\n", "There are two approaches to sentiment analysis: binary (polarity-based) and intensity (valence-based).\n", "The polarity-based approach only tells you whether a text is positive or negative. \n", "For example, 'good' and 'perfect' would score the same. \n", "In contrast, a valence-based sentiment analysis also takes the intensity of a word into account, and therefore gives a higher value to 'perfect' in the example above.\n", "\n", "One of the major applications of sentiment analysis is judging social media streams (Twitter, Facebook posts).\n", "For example, companies use it to identify negative feedback on social media and respond to it.\n", "\n", "\n", "Python provides the excellent [Natural Language Toolkit (NLTK)](http://www.nltk.org/) for automatic text processing. Among others, NLTK includes the sentiment analysis package [VADER (Valence Aware Dictionary and sEntiment Reasoner)](https://github.com/cjhutto/vaderSentiment). \n", "\n", "VADER uses a valence-based approach to sentiment analysis, building upon a lexicon of sentiment-related words.\n", "When VADER analyses a text, it looks up each word in its lexicon and adds up the negative or positive ratings of the words it finds.\n", "The final (compound) score is the normalized sum of these ratings.\n", "\n", "You can find out more about VADER in the article [Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). 
Ann Arbor, MI, June 2014.](http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get started with VADER, install NLTK (you might need sudo rights to do this).\n", "\n", "```\n", "pip install -U nltk\n", "```\n", "\n", "For further information see http://www.nltk.org/install.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first time you use NLTK and VADER, you need to download the VADER lexicon. To do so (don't do this if you are running on the IE Server):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# import nltk\n", "# nltk.download()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now either download \"all\" packages or just choose \"vader_lexicon\" in the category \"Models\".\n", "You only need to download the packages the first time you use them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having set everything up, you are good to go:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "import nltk\n", "from nltk.sentiment.vader import SentimentIntensityAnalyzer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can analyse your first sentence:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'compound': 0.4404, 'neg': 0.0, 'neu': 0.408, 'pos': 0.592}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = \"This was easy\"\n", "SentimentIntensityAnalyzer().polarity_scores(text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most important part of the output dictionary is the 'compound' value, which gives you the overall rating. 
\n", "Positive values indicate an overall positive rating, negative values an overall negative one. The 'compound' value can range from -1 to +1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To automatically process a batch of texts, run VADER in a loop (tricky sentences from the [NLTK VADER docs](http://www.nltk.org/howto/sentiment.html)):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "tricky_sentences = [\n", " \"Most automated sentiment analysis tools are shit.\",\n", " \"VADER sentiment analysis is the shit.\",\n", " \"Sentiment analysis has never been good.\",\n", " \"Sentiment analysis with VADER has never been this good.\",\n", " \"Warren Beatty has never been so entertaining.\",\n", " \"I won't say that the movie is astounding and I wouldn't claim that the movie is too banal either.\",\n", " \"I like to hate Michael Bay films, but I couldn't fault this one\",\n", " \"It's one thing to watch an Uwe Boll film, but another thing entirely to pay for it\",\n", " \"The movie was too good\",\n", " \"This movie was actually neither that funny, nor super witty.\",\n", " \"This movie doesn't care about cleverness, wit or any other kind of intelligent humor.\",\n", " \"Those who find ugly meanings in beautiful things are corrupt without being charming.\",\n", " \"There are slow and repetitive parts, BUT it has just enough spice to keep it interesting.\",\n", " \"The script is not fantastic, but the acting is decent and the cinematography is EXCELLENT!\",\n", " \"Roger Dodger is one of the most compelling variations on this theme.\",\n", " \"Roger Dodger is one of the least compelling variations on this theme.\",\n", " \"Roger Dodger is at least compelling as a variation on the theme.\",\n", " \"they fall in love with the product\",\n", " \"but then it breaks\",\n", " \"usually around the time the 90 day warranty expires\",\n", " \"the twin towers collapsed 
today\",\n", " \"However, Mr. Carter solemnly argues, his client carried out the kidnapping \\\n", " under orders and in the ''least offensive way possible.''\"\n", "]\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Most automated sentiment analysis tools are shit. - score: -0.5574\n", "VADER sentiment analysis is the shit. - score: 0.6124\n", "Sentiment analysis has never been good. - score: -0.3412\n", "Sentiment analysis with VADER has never been this good. - score: 0.5228\n", "Warren Beatty has never been so entertaining. - score: 0.5777\n", "I won't say that the movie is astounding and I wouldn't claim that the movie is too banal either. - score: 0.4215\n", "I like to hate Michael Bay films, but I couldn't fault this one - score: 0.3153\n", "It's one thing to watch an Uwe Boll film, but another thing entirely to pay for it - score: -0.2541\n", "The movie was too good - score: 0.4404\n", "This movie was actually neither that funny, nor super witty. - score: -0.6759\n", "This movie doesn't care about cleverness, wit or any other kind of intelligent humor. - score: -0.1338\n", "Those who find ugly meanings in beautiful things are corrupt without being charming. - score: -0.3553\n", "There are slow and repetitive parts, BUT it has just enough spice to keep it interesting. - score: 0.4678\n", "The script is not fantastic, but the acting is decent and the cinematography is EXCELLENT! - score: 0.7565\n", "Roger Dodger is one of the most compelling variations on this theme. - score: 0.2944\n", "Roger Dodger is one of the least compelling variations on this theme. - score: -0.1695\n", "Roger Dodger is at least compelling as a variation on the theme. 
- score: 0.2263\n", "they fall in love with the product - score: 0.6369\n", "but then it breaks - score: 0.0\n", "usually around the time the 90 day warranty expires - score: 0.0\n", "the twin towers collapsed today - score: -0.2732\n", "However, Mr. Carter solemnly argues, his client carried out the kidnapping under orders and in the ''least offensive way possible.'' - score: -0.5859\n" ] } ], "source": [ "# Create the analyzer once and reuse it for every sentence\n", "analyzer = SentimentIntensityAnalyzer()\n", "for sentence in tricky_sentences:\n", "    sentiment = analyzer.polarity_scores(sentence)['compound']\n", "    print(\"{} - score: {}\".format(sentence, sentiment))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'compound': 0.5106, 'neg': 0.0, 'neu': 0.233, 'pos': 0.767}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SentimentIntensityAnalyzer().polarity_scores(\"Have fun\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" }, "nikola": { "category": "", "date": "2019-11-27 11:11:18 UTC+01:00", "description": "", "link": "", "slug": "sentiment-analysis-with-vader", "tags": "", "title": "Sentiment Analysis with VADER", "type": "text" } }, "nbformat": 4, "nbformat_minor": 4 }