
PROPOSED METHOD

The proposed method scores each sentence based on the features extracted
from it. Once the features are extracted, the data is given a score, and based
on that score we can conclude whether the sentence leans towards the positive
or the negative side. If the score is above 0.5, the sentence is classified as
positive; if it is below 0.5, it is classified as negative.
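The decision rule above can be sketched in Python as follows. The function name classify is hypothetical, and treating a score of exactly 0.5 as neutral is an assumption, since the method only defines the cases above and below 0.5.

```python
def classify(score: float, threshold: float = 0.5) -> str:
    """Map a sentence score in [0, 1] to a sentiment label.

    Scores above the threshold are positive and scores below it are negative.
    Returning "neutral" for a score exactly at the threshold is an assumption;
    the method does not define that case.
    """
    if score > threshold:
        return "positive"
    if score < threshold:
        return "negative"
    return "neutral"


for s in (0.8, 0.5, 0.2):
    print(s, classify(s))
```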

1.   WORD TOKENIZING

Word tokenizing is the process of breaking a sentence into its individual
words, which are called tokens. In this way, the features in the data can be
analyzed.
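A minimal sketch of tokenizing in Python, using a regular expression from the standard library as a simplified stand-in for NLTK's nltk.word_tokenize (which additionally needs the punkt tokenizer models to be downloaded). The pattern and the tokenize helper are hypothetical and will differ from NLTK on cases such as contractions.

```python
import re

# Hypothetical pattern: a run of word characters (optionally with one internal
# apostrophe, e.g. "It's"), or a single character that is neither a word
# character nor whitespace (i.e. punctuation).
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("It is a windy day today."))
```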

3.   STOPWORDS

Stop words are the most common words occurring in the data, such as articles,
auxiliary verbs and prepositions (for example, “a”, “is” and “to”). They carry
little meaning on their own and are removed. A list of stop words can be
imported from the nltk.corpus package.
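Stop word removal can be sketched as a simple filter over the tokens. The STOP_WORDS set below is a small hand-picked subset chosen for illustration, not NLTK's list; with NLTK the full English list is returned by stopwords.words("english") after from nltk.corpus import stopwords and a one-time nltk.download("stopwords"). Dropping punctuation tokens in the same pass is an additional assumption.

```python
# Small hand-picked subset of English stop words, for illustration only.
STOP_WORDS = {"a", "an", "the", "it", "is", "are", "to", "of", "and", "in"}


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep alphabetic tokens that are not stop words (case-insensitive).

    The isalpha() check also drops punctuation tokens such as ".".
    """
    return [t for t in tokens if t.isalpha() and t.lower() not in STOP_WORDS]


print(remove_stop_words(["It", "is", "a", "windy", "day", "today", "."]))
print(remove_stop_words(["It", "is", "going", "to", "rain", "today", "."]))
```

Applied to the two example sentences used later for tf-idf, this keeps exactly the features listed there.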

 

3.   PART OF SPEECH TAGGING

This step tags each word with its part of speech, that is, whether the word
is a noun, verb, adjective, adverb, etc. The tags are needed when SentiWordNet
is applied, because the score SentiWordNet gives a word depends on its part of
speech.
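In NLTK, tagging is done with nltk.pos_tag, which returns Penn Treebank tags such as "NN", "VBG", "JJ" and "RB", whereas WordNet and SentiWordNet identify parts of speech with the single letters "n", "v", "a" and "r". The sketch below shows the mapping step between the two; the helper name to_wordnet_pos is hypothetical, and the tagged sentence is written by hand for illustration rather than taken from an actual tagger run.

```python
from typing import List, Optional, Tuple


def to_wordnet_pos(treebank_tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a WordNet POS letter, or None if there is none."""
    if treebank_tag.startswith("JJ"):  # JJ, JJR, JJS: adjectives
        return "a"
    if treebank_tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if treebank_tag.startswith("NN"):  # NN, NNS, NNP, NNPS: nouns
        return "n"
    if treebank_tag.startswith("RB"):  # RB, RBR, RBS: adverbs
        return "r"
    return None  # e.g. determiners, pronouns, prepositions, particles


# Hand-written tags for illustration, not actual nltk.pos_tag output.
tagged: List[Tuple[str, str]] = [("going", "VBG"), ("rain", "VB"), ("today", "NN")]
print([(word, to_wordnet_pos(tag)) for word, tag in tagged])
```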

4.   SENTIWORDNET

SentiWordNet is a lexical resource, built on WordNet, that is available as one
of the corpora of the Natural Language Toolkit (NLTK) and can be imported from
the nltk.corpus package. It assigns positive, negative and objective scores to
each synset, a set of words that share one meaning, and these scores are used
to find the score of each word. To look up a word's score, the word must first
be tagged with its part of speech.
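A sketch of how per-word scores could be combined into a single sentence score between 0 and 1. With NLTK, the real scores come from swn.senti_synsets(word, pos) after from nltk.corpus import sentiwordnet as swn (and downloading the sentiwordnet and wordnet corpora); each returned SentiSynset provides pos_score(), neg_score() and obj_score(). Because a word usually has several synsets, a common simplification is to use the first one. The lexicon values below are made up so that the example runs without any downloads, and the aggregation formula (total positive divided by total positive plus total negative, with 0.5 when no sentiment is found) is an assumption, since the method does not specify how the word scores are combined.

```python
from typing import Dict, List, Tuple

# Made-up (positive, negative) scores keyed by (word, WordNet POS letter),
# for illustration only; real values would come from SentiWordNet.
LEXICON: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("day", "n"): (0.0, 0.0),
}


def sentence_score(tagged: List[Tuple[str, str]]) -> float:
    """Combine per-word scores into a value in [0, 1].

    Assumed aggregation: total positive / (total positive + total negative).
    Returns 0.5 (neither side) when no word carries any sentiment.
    Words missing from the lexicon contribute nothing.
    """
    pos_total = neg_total = 0.0
    for word, pos in tagged:
        p, n = LEXICON.get((word.lower(), pos), (0.0, 0.0))
        pos_total += p
        neg_total += n
    if pos_total + neg_total == 0:
        return 0.5
    return pos_total / (pos_total + neg_total)


print(sentence_score([("good", "a"), ("day", "n")]))
print(sentence_score([("good", "a"), ("bad", "a")]))
```

With this formula a score above 0.5 means the positive scores outweigh the negative ones, which matches the 0.5 threshold used by the method.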

 

5.   ALGORITHM

The algorithm is used to obtain the sentiment score of each item in the
dataset.

 

The algorithm is used for
classification.

 

6.   TF-IDF VECTORIZATION

As the name suggests, tf-idf combines the number of times a word occurs in a
document (term frequency) with how rare the word is across all the documents
in the dataset (inverse document frequency). Term Frequency–Inverse Document
Frequency [13] also helps in retrieving information, and it is widely used in
text mining. The tf-idf value of a word increases with the number of times it
appears in a document, but it is offset by the number of documents in the
dataset that contain the word. It can be imported as TfidfVectorizer from the
sklearn.feature_extraction.text package of the scikit-learn library. Once the
features are extracted, they can be used for training the classifier. For
example, consider the following sentences:

It is a windy day today.

It is going to rain today.

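In both sentences, the stop words are removed and only the features are kept,
which are “windy”, “day” and “today” from the first sentence and “going”,
“rain” and “today” from the second. The term frequency, that is, the number of
times each term occurs, is then calculated and weighted by how relevant the
term is across the data set. “today” occurs two times, once in each sentence;
because it appears in every sentence, its inverse document frequency is the
lowest, so it receives a lower weight than the words that occur in only one
sentence.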
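A minimal pure-Python sketch of tf-idf on the two example sentences after stop word removal. It uses raw counts for term frequency and the smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing t, followed by L2 normalization of each document vector; this mirrors the defaults of scikit-learn's TfidfVectorizer, but the function name tfidf is hypothetical and this is not the library's code.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute L2-normalised tf-idf weights for already tokenised documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf, as in scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        # Raw term count times idf.
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Scale to unit length; an empty document yields an empty dict.
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


docs = [["windy", "day", "today"], ["going", "rain", "today"]]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

In both printed vectors, “today” gets a smaller weight than the other two words, because its document frequency is 2 and its idf is therefore exactly 1.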