You have been hired as a developer at SentimentScope, a startup that is creating a sentiment analysis app for businesses. The software is meant for business owners who want to measure and understand customer sentiment from feedback and reviews. The app will serve as a prototype that allows your clients to test different sentiment analysis models with their own inputs.
The most relevant files in the project are the following:
File name | Description |
---|---|
app.py | The main application file. It contains the code for the application. |
templates/form.html | The template file for the application. It defines the structure and layout of the web form. |
static/style.css | The CSS stylesheet for the application. It defines the visual style of the web form. |
tests/test.py | The pytest unit test file. It contains test cases to ensure the functionality of the code. |
notebooks/Vader_Sentiment_Analysis.ipynb | A colab notebook that demonstrates how to use the VADER library for sentiment analysis. |
You will mostly work with the app.py
and form.html
files.
You will need to create and upload the following files:
File name | Description |
---|---|
notebooks/UCI_Sentiment_Analysis.ipynb | The notebook that you use to create your own model. You will create and upload this file. |
models/tokenizer.pickle | The tokenizer object used for predicting sentiment analysis. You will create and upload this file. |
models/uci_sentimentanalysis.h5 | Your keras sentiment analysis model. You will create and upload this file. |
- Fork this repository.
- Create a Codespace for your project.
- GitHub Codespaces will automatically create a fully-configured development environment for your project in the cloud. If you need to install the requirements manually run:
pip install -r requirements.txt
- To run the app, use the following command:
python app.py
orgunicorn app:app
- To run the tests, use the following command:
pytest tests/*.py
Tip: Review the Github Codespaces documentation if you are unsure on how to run and test your app from Codespaces.
Note: It's not required to pass all the unit tests to complete this assessment, but it is recommended that you use the tests as a guide to deliver a functional application and meet the rubric requirements.
The product manager has already created the user stories for SentimentScope - Sentiment Analyzer. Each of the user stories is listed below, and your product manager wants them to be implemented in the order in which they are listed. Another developer has already written the tests for the application so that you don't have to.
Each of the user stories is listed below. The user stories are to be implemented in the order in which they are listed. Find the TODO comments in the code and create the necessary functionality.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a popular open-source tool for sentiment analysis of text. It is specifically designed to analyze social media texts, but can be used for any form of text that needs sentiment analysis. VADER uses a combination of a sentiment lexicon (a list of words with assigned sentiment scores) and grammatical rules to determine the sentiment of a text, or whether it is positive, negative, or neutral. The sentiment scores generated by VADER can also reflect the intensity or strength of the sentiment. VADER is known for its effectiveness in handling noisy, informal, and contextualized text data.
Here is an example of how to get the sentiment analysis of a text with VADER:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
s = sid.polarity_scores("hey! this is bad!")
print(s)
Output:
{'neg': 0.577, 'neu': 0.423, 'pos': 0.0, 'compound': -0.6229}
The output returns a dictionary with the results. In this case, the string "hey! this is bad!" is thought to be 0.577 negative, 0.423 neutral, and 0.0 positive according to VADER.
Use this Colab notebook to get familiar with Vader.
Import and use VADER in your application so that it calculates the sentiment analysis scores for the input given by a user.
Tip: Use
text = request.form.get("user_text")
to get the text from the form.
Tip: Use
sentiment = analyzer.polarity_scores(text)
to calculate the scores.
Add the sentiment analysis results that are returned by VADER to form.html
.
Multiply the results by 100 so that it prints a percentage, such as{{ sentiment['pos'] * 100 }}
.
Overall, it should look something like the following code:
{% if sentiment %}
<p>Positive: {{ sentiment['pos'] * 100 }}%</p>
<!-- add the negative, neutral and compound scores here-->
{% endif %}
- The app uses the VADER analyzer to calculate results.
- Positive, negative, neutral, and compound scores are displayed after hitting the submit button.
To set your product apart from others using VADER, you can enhance it by developing tailor-made sentiment analysis models for your clients. It is important for your application to demonstrate that it can accomodate custom sentiment analysis models. In this step, you will construct a personalized sentiment analysis model using Keras.
You will use the UCI Labelled Sentences dataset for this step. The dataset consists of a collection of sentences that have been labeled with their respective sentiment or category. Each sentence in the dataset is labeled with a specific category or sentiment.
Create a new Colab notebook and proceed with the steps described below:
Tip: Feel free to experiment with the code snippets provided below to see how the accuracy of the model is affected.
Use the following commands to download and unzip the UCI Labelled Sentences dataset in your Colab notebook:
# download dataset from the UCI website
!curl -o uci-labelled-sentences.zip https://archive.ics.uci.edu/static/public/331/sentiment+labelled+sentences.zip
# unzip dataset in Colab
!unzip uci-labelled-sentences.zip
Import the necessary libraries for data manipulation and deep-learning modeling. Include the following code in your notebook:
import pandas as pd
import pickle
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.callbacks import EarlyStopping
The datasets include data from Yelp, Amazon, and IMDB, each with labeled sentences. Load the datasets using the following code:
df_list = []
# Yelp
df_yelp = pd.read_csv('sentiment labelled sentences/yelp_labelled.txt', names=['sentence', 'label'], sep='\t')
df_yelp['source'] = 'yelp'
df_list.append(df_yelp)
# Amazon
df_amazon = pd.read_csv('sentiment labelled sentences/amazon_cells_labelled.txt', names=['sentence', 'label'], sep='\t')
df_amazon['source'] = 'amazon'
df_list.append(df_amazon)
# IMDB
df_imdb = pd.read_csv('sentiment labelled sentences/imdb_labelled.txt', names=['sentence', 'label'], sep='\t')
df_imdb['source'] = 'imdb'
df_list.append(df_imdb)
# Concatenate the dataframes
df = pd.concat(df_list)
df.head()
Note: The code above creates a list called
df_list
to store multiple pandas dataframes. Each dataframe corresponds to a dataset from sources like Yelp, Amazon, and IMDB. The code reads the CSV files of these datasets and assigns column names ('sentence' and 'label') to the dataframe. It also adds a new column, 'source', to each dataframe to indicate the source of the data. Finally, the code concatenates all the dataframes indf_list
into a single dataframe calleddf
. This allows for a combined dataset from multiple sources.
Tokenize the sentences by converting them into sequences of numbers using the Tokenizer class. Include the code segment below:
max_features = 2000
tokenizer = Tokenizer(num_words=max_features, split=' ')
tokenizer.fit_on_texts(df['sentence'].values)
X = tokenizer.texts_to_sequences(df['sentence'].values)
X = pad_sequences(X)
y = df['label'].values
Split the dataset into training and test sets using the train_test_split
function from sklearn.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.12)
Create the sentiment analysis model using Keras. The model includes an embedding layer, LSTM layer, and dense layer. Here's an example of how to define the model:
def create_model():
model = Sequential()
model.add(Embedding(max_features, 64, input_length=X.shape[1]))
model.add(LSTM(16))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
model = create_model()
Train the model using the training set and evaluate its performance on the test set. Include the code below:
model.fit(X_train, y_train, epochs=6, batch_size=16, validation_data=(X_test, y_test), callbacks = [EarlyStopping(monitor='val_accuracy', min_delta=0.001, patience=2, verbose=1)])
Use the code below to save the trained model and tokenizer for future use.
model.save("uci_sentimentanalysis.h5")
with open('tokenizer.pickle', 'wb') as handle:
pickle.dump(tokenizer, handle, protocol=pickle.DEFAULT_PROTOCOL)
The tokenizer and the trained model are ready to be used in your app. Download the files to your computer.
- A notebook was created and used to produce the tokenizer and the model.
- The notebook is included in the
notebooks
folder.
You now have a model and a tokenizer that can be used in your app. You will need to test that the model and tokenizer files that you created work as expected before integrating them into your Flask application.
Use this Colab notebook to test that your model and tokenizer work as expected.
Upload the tokenizer.pickle
and uci_sentimentanalysis.h5
files to the Google Colab environment that you are working on.
Run the code in the notebook to verify that your model works. The output should look similar to the screenshot below.
- The model and the tokenizer work as expected.
Add the tokenizer and model in your models
folder.
Your imports should look similar to the code segment below.
import pickle
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import sequence
from flask import Flask, render_template, request
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
Use the code below to load the keras model and the tokenizer.
model = None
tokenizer = None
def load_keras_model():
global model
model = load_model('models/uci_sentimentanalysis.h5')
def load_tokenizer():
global tokenizer
with open('models/tokenizer.pickle', 'rb') as handle:
tokenizer = pickle.load(handle)
@app.before_first_request
def before_first_request():
load_keras_model()
load_tokenizer()
Tip: To avoid a memory leak, it's necessary to load the tokenizer and keras files exactly as shown in the snippet above. Loading models without using the
@app.before_first_request
decorator may lead to a memory leak.
The sentiment analysis method will calculate the sentiment analysis score. Preprocessing the text as you did in the notebook is necessary for the model to work correctly. The code snippet below returns the rounded result of the prediction.
def sentiment_analysis(input):
user_sequences = tokenizer.texts_to_sequences([input])
user_sequences_matrix = sequence.pad_sequences(user_sequences, maxlen=1225)
prediction = model.predict(user_sequences_matrix)
return round(float(prediction[0][0]),2)
Call the sentiment_analysis()
function to calculate the sentiment analysis score with the user input. Add the result to the sentiment
dictionary where you are already storing the VADER results.
sentiment = analyzer.polarity_scores(text) # VADER results
# create a new key in the dictionary to store the custom model sentiment analysis results
sentiment["custom model positive"] = sentiment_analysis(text)
Note: If you encounter errors while loading an H5 file into your application, it could indicate that the file got corrupted during the upload process or when it was generated. If you encounter an error of this nature, it is recommended that you generate a new H5 file. The simplest solution is to re-upload the H5 file, as this often resolves file corruption issues. By creating a fresh file, you can ensure the integrity of your data and minimize potential errors during file loading.
Display the result in form.html
:
{{ sentiment['custom model positive'] }}
- Your custom sentiment analysis model is used in the app.
- The model's prediction is displayed in the app.
The user interface design is completely up to you. Feel free to use ChatGPT or another AI assistant to build the UI.
Here are some examples of what your app may look like:
Tip: If you want to integrate plotly charts (as shown in the Example 2 screenshot), feel free to adapt this code in your app.
- The app displays the results of the sentiment analysis prediction.
- An original UI design is used in the app.
You made it! You finished the first version of your app. Now, it's time to deploy that app so that it can be tested by your clients.
Deploy the app on Render so that it becomes part of your portfolio. Use the following settings in Render to build and start the app:
-
Build command:
pip install --upgrade pip && pip install -r requirements.txt
-
Start command:
gunicorn app:app
Render uses an older version of Python as default but this can be easily changed by setting an environment variable. Inside the Environment tab, create a new environment variable called PYTHON_VERSION
and set its value to 3.10.12
. This will ensure that Render uses Python version 3.10.12 (the same version that you used in your notebook) for your app. Refer to the screenshot below for a visual guide.
Note: Render automatically spins down a free web service if it remains inactive for 15 minutes without any incoming traffic. When a request is received, Render promptly spins up the service. However, the spinning up process may take a few seconds or, in some cases, up to a couple of minutes. During this time, there might be a noticeable delay for incoming requests, resulting in brief moments of hanging or slower page loads in a browser.
- The project is deployed on Render.
Functionality:
- The app displays the sentiment analysis predictions using VADER.
- The app displays the sentiment analysis predictions using a custom sentiment analysis Keras model.
- A model was trained using the UCI sentiment labelled sentences dataset and the notebook is included in the repository.
- An original user interface was created for the app.
General code organization:
- Minimal code duplication
- Comments are used to describe the functions.
- Follow the order of the user stories.
- If you are stuck, take a careful look at the provided resources. If you are still stuck, ask a friend or a mentor for help.
- Read the user stories and tests carefully.