Submitted in fulfillment of the requirements
For the degree of
Bachelor of Technology in Information Technology
PREKSHA B. PAREEK
DEPARTMENT OF INFORMATION TECHNOLOGY
This is to certify that the project/Seminar entitled
“TEXT MINING” submitted by SHAH BRIJESH (16BIT128), towards the partial
fulfillment of the requirements for the degree of Bachelor of Technology in Information
Technology of Nirma University is the record of work carried out by him/her
under my supervision and guidance. In my opinion, the submitted work has
reached a level required for being accepted for examination.
PREKSHA PAREEK
Dept. of Information Technology,
Institute of Technology, Nirma University

Dr.
Dept. of Computer Science & Engg.,
Institute of Technology, Nirma University
1.1 General
Text mining is the process of converting large volumes of text or documents
into useful numeric representations. It is used to extract meaningful
information from raw data by applying algorithms that detect patterns in the
data. Text mining applies to unstructured or semi-structured data such as
emails and text messages. For example, it is used to filter out spam in email
by identifying text that is common in such messages. Once information has been
retrieved from the documents, it is used in data mining projects
(clustering and factoring, graphics, predictive data mining).
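The idea of turning documents into numbers can be illustrated with a simple
word-count table. The following is a minimal sketch, not a method taken from
this report: the function names and the two sample documents are assumptions
made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, rows); each row holds one document's word counts."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct word seen in any document, sorted.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a document does not contain.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample documents.
docs = ["Free offer, claim your free prize", "Meeting moved to Monday"]
vocab, matrix = document_term_matrix(docs)
for doc, row in zip(docs, matrix):
    print(row, "<-", doc)
```

Each row is a numeric vector that clustering or predictive algorithms can work
with, which is why this kind of table is a common first step.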
Some common steps in text mining include removing certain words such as
“THE”, punctuation marks, etc. from the important data to improve search
quality. We will learn about this in the section on text preprocessing.
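The removal of common words and punctuation described above can be sketched as
follows. The stop-word list here is a short illustrative assumption; real
lists are much longer.

```python
import string

# Illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = text.lower().translate(table)
    return [word for word in cleaned.split() if word not in STOP_WORDS]


print(preprocess("The quality of the search, in the end, is important!"))
# prints ['quality', 'search', 'end', 'important']
```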
Applications of Text Mining
The main objective of text mining is to reduce the time spent on a task by
filtering out unnecessary data and keeping the main keywords or important
data. It is used to provide better services to users by giving proper
feedback. It is also used by businesses to analyze their consumer base and to
provide services accordingly by targeting potential customers.
Filtering based on IP address alone is not sufficient, so text mining
techniques are used to detect salting. Salting means adding certain content to
a message to make it look like original or official content. Email service
providers use text mining to separate spam and promotional messages from
important messages, thus saving users' time and resources. This can be
extended to filter messages according to the suitable age group.
Text mining is also used to provide protection against phishing and spamming.
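Identifying text that is common in spam can be illustrated with a simple
keyword score. This is only a sketch under stated assumptions: the term list
and the threshold are hypothetical, and it does not attempt to detect salting.
Production filters typically learn such terms from labelled mail rather than
using a fixed list.

```python
import re

# Hypothetical terms often seen in spam (an assumption for illustration).
SPAM_TERMS = {"free", "winner", "prize", "urgent", "click", "offer"}


def spam_score(message):
    """Return the fraction of the message's words that are in SPAM_TERMS."""
    words = re.findall(r"[a-z]+", message.lower())
    if not words:
        return 0.0
    return sum(word in SPAM_TERMS for word in words) / len(words)


def is_spam(message, threshold=0.3):
    """Flag the message when its spam score reaches the threshold."""
    return spam_score(message) >= threshold


print(is_spam("URGENT: click now to claim your FREE prize"))
print(is_spam("Please review the attached project report"))
```

The first message has four spam terms among its eight words, so it is flagged;
the second has none and is kept.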
Sentiment analysis is used to identify positive, negative or neutral reviews
about a subject. Consider deciding whether to watch a TV series based on the
reviews of viewers. The text of the reviews is analyzed and, according to the
keywords used, the emotion of the user is identified, which can be used for
marking the reviews of the show as positive or negative. It also focuses on
individual words and phrases to identify how negative or positive they are.
Consider this statement: “I LOVED THE NEW MOBILE. BUT IT IS VERY EXPENSIVE AND
DOES NOT HAVE GREAT BATTERY LIFE”.
According to the first sentence the customer seems impressed, but overall the
customer has a negative impression of the product.
Sentiment analysis is also used to give indications about products. For
example, while reading reviews about a hotel you may come across the word
ROTTEN, which gives a negative impression of the hotel.
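The keyword-based scoring described above can be sketched with a small word
lexicon. The lexicon, its weights, and the two-word negation window are all
assumptions made for illustration; they are not taken from this report or from
any published resource.

```python
import re

# Tiny illustrative lexicon: positive weights for positive words and
# negative weights for negative words (assumed values).
LEXICON = {"loved": 1, "great": 1, "good": 1,
           "expensive": -1, "bad": -1, "rotten": -2}
NEGATIONS = {"not", "no", "never"}


def sentiment_score(text, window=2):
    """Sum word weights, flipping the sign of any lexicon word that has a
    negation among the `window` tokens immediately before it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        preceding = tokens[max(0, i - window):i]
        negated = any(t in NEGATIONS for t in preceding)
        score += -LEXICON[token] if negated else LEXICON[token]
    return score


def label(score):
    """Map a numeric score to positive, negative or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ("I LOVED THE NEW MOBILE. BUT IT IS VERY EXPENSIVE AND DOES NOT "
          "HAVE GREAT BATTERY LIFE")
print(sentiment_score(review), label(sentiment_score(review)))
# prints -1 negative
```

In the sample review, LOVED contributes +1, EXPENSIVE contributes -1, and
GREAT is negated by NOT and so contributes -1, which gives an overall negative
result in line with the reading above.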
IN BIOMEDICAL DOMAINS
Year by year the number of research studies in medical fields is increasing
significantly, so the need for text mining is evident. Text mining is used
for quickly sorting out the necessary data from the medical records which are
available. In fields like cancer treatment, text mining means improving the
diagnosis, treatment, and prevention of cancer by mining databases.
Another important use of text mining is mining EHRs (Electronic Health
Records), which is used to search a patient's previous records of certain
diseases and medical history.
Text mining is also used for comparing gene markers with earlier ones
and identifying different patterns in genes for checking diseases.
SOCIAL MEDIA PLATFORMS
Social media is used for connecting people, i.e. for interactions and
conversations. Some of the well-known platforms are Twitter, Facebook, and
Orkut.