Predicting Customer Behavior Using Evolutionary Associative Clustering

Spatial Association Rule Mining (SAR) is an interesting area of spatial data mining that involves several steps and considerable complexity. We introduce a two-step algorithm in which the first step concentrates on optimizing SAR with a Hybrid Evolutionary Algorithm (HEA) that combines a Genetic Algorithm and Ant Colony Optimization (ACO). Since association rule mining with multiple objectives can be considered an NP-hard problem, we use a Multi-Objective Genetic Algorithm (MOGA) together with ACO. The results compare favourably with those of existing approaches.

In the second step we cluster the generated association rules so that they can be used for target group segmentation. We study the behavior of mobile phone customers based on their location.

Index Terms- SAR, MOGA, ACO, clustering, segmentation.

1. Introduction

The market is characterized by being global: products are indistinguishable and supply is huge. This leads to a customer-centric market rather than a product-centric one. Because of the size of the customer base, mass marketing is expensive and its returns are not guaranteed. This has led to research on targeting customers. The customers to be targeted can be identified using a model that predicts customer behavior.

Customer profiling is describing customers by their attributes. It can be used to prospect for new customers or to drop existing bad customers [1]. Customer profiling forms a basis for marketers to market to existing loyal customers, offer them better services and retain them. This can be achieved by manipulating the collected data. Depending on the need of the hour, one has to decide which profile will be useful at that time. Specialized data mining techniques such as spatial data mining can be used to predict customer behavior based on spatial attributes.

Spatial Association Rule mining (SAR) is about generating association rules about spatial data objects. Either the antecedent or the consequent of the rule must contain some spatial predicates (such as near) [2]. Spatial association rules are implications of one set of data by another, such as "the average monthly family income in Madurai for families living near Annanagar is Rs. 100,000". Because of the relationships among spatial components, one entity can affect the behavior of another. Spatial data items are naturally linked to neighbouring data elements (e.g., adjacent geographic positions), so these data elements are not statistically independent. This makes spatial data mining different from conventional transactional data mining.

The activities involved in SAR are computing the spatial relationships, generating the frequent sets and extracting the association rules. In this paper we concentrate on the second and third of these steps. Existing approaches use quantitative reasoning, which computes distance relationships during frequent set generation [3] [4]. These approaches deal only with points, consider only quantitative relationships and do not consider the non-spatial attributes of geographic data, which may be of fundamental importance for knowledge discovery. Qualitative spatial reasoning [5] [6] [7] considers distance and topological relationships between a reference geographic object type and a set of relevant feature types represented by any geometric primitive (e.g. points, lines, and polygons). [8] uses a qualitative spatial reasoning approach with prior knowledge and removes well-known patterns completely by early pruning of the input space and of the frequent itemsets.

We present a novel two-step refinement algorithm based on a hybrid evolutionary algorithm (HEA), which uses a genetic algorithm together with ant colony optimization to generate the spatial association rules, and then clusters the generated rules into the required groups. In the first step, HEA enhances the performance of the Multi-Objective Genetic Algorithm (MOGA) for multi-objective association rule mining by incorporating local search with Ant Colony Optimization (ACO). In the proposed HEA, MOGA is run to provide diversity among the associations; thereafter, ant colony optimization is performed to escape from local optima. The experimental results show that the proposed HEA performs better than the other existing algorithms. In the second step we group the generated rules by clustering in order to find the various target groups. Rules are grouped based on the consequent information of the rules generated in Step 1. Groups of rules are of the form $X_i \to Y$ for $i = 1, 2, \ldots, n$; that is, the different rule antecedents $X_i$ are collected into one group for the same rule consequent $Y$.

The paper is organized as follows: Section 2 deals with the concepts of SAR and their interestingness measures, Section 3 deals with MOGA and ACO as applied to optimizing rule generation, and Section 4 discusses the method of clustering the rules. Section 5 describes the approach followed in this paper. Section 6 discusses the results obtained and Section 7 concludes the paper.

2. Spatial Association Rule Mining

A spatial association rule is of the form $X \to Y$ between two disjoint itemsets, where $X$ is called the antecedent and $Y$ the consequent of the rule. The antecedent contains a set of predicates from the database being explored; the consequent represents only one predicate, which is not already included in the antecedent. The rule then reflects an existing relationship between the predicates in the antecedent and the consequent. Generated association rules are generally measured by two metrics called support and confidence. Support is the ratio of the number of transactions that contain both $X$ and $Y$ to the total number of transactions. Confidence is the ratio of the number of transactions containing all the items of the rule to the number of transactions containing the "if" items (the antecedent $X$). Another metric, lift (improvement), tells us how much better a rule is at predicting the result than simply assuming the result in the first place. It is defined as the ratio of the number of records that support the entire rule to the number that would be expected if there were no relationship between the items. Spatial association rules represent object/predicate relationships containing spatial predicates. For example, the following rules are spatial association rules.

Non-spatial consequent with spatial antecedent(s):

is_a(x, house) ∧ close_to(x, beach) → is_expensive(x)

Spatial consequent with non-spatial/spatial antecedent(s):

is_a(x, gas_station) → close_to(x, highway)
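As a minimal sketch, the support, confidence and lift defined above can be computed for the first example rule by treating each object as a transaction of predicates. The predicate strings and the five transactions are hypothetical toy data, not the study's dataset; lift is computed as supp(X ∪ Y) / (supp(X) · supp(Y)), which is the "observed versus expected under independence" ratio described in the text.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Transaction]) -> float:
    """supp(X together with Y) / supp(X)."""
    return support(x | y, transactions) / support(x, transactions)

def lift(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Observed support of the rule divided by the support expected if X and Y were independent."""
    return support(x | y, transactions) / (support(x, transactions) * support(y, transactions))

# Hypothetical predicate transactions, one per house.
transactions = [
    frozenset({"is_a(house)", "close_to(beach)", "is_expensive"}),
    frozenset({"is_a(house)", "close_to(beach)", "is_expensive"}),
    frozenset({"is_a(house)", "close_to(beach)"}),
    frozenset({"is_a(house)", "is_expensive"}),
    frozenset({"is_a(house)"}),
]
X = frozenset({"is_a(house)", "close_to(beach)"})
Y = frozenset({"is_expensive"})
print(support(X | Y, transactions))    # 0.4
print(confidence(X, Y, transactions))  # approximately 0.67
print(lift(X, Y, transactions))        # approximately 1.11
```

A lift above 1 means that being close to the beach raises the chance of a house being expensive compared with the base rate in this toy data.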

Various kinds of spatial predicates can be involved in spatial association rules [9].
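Before rules like these can be mined, spatial predicates such as close_to have to be materialized for every reference object. The sketch below assumes close_to means "Euclidean distance within a fixed threshold"; the threshold, the coordinates, the object names and the helper `to_transactions` are all hypothetical and are not taken from the original method, which may use other distance or topological relations.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

def close_to(p: Point, q: Point, threshold: float) -> bool:
    """Hypothetical close_to predicate: Euclidean distance at most `threshold`."""
    return math.dist(p, q) <= threshold

def to_transactions(objects: Dict[str, dict], features: Dict[str, Point],
                    threshold: float) -> Dict[str, frozenset]:
    """Turn each reference object into a set of non-spatial and spatial predicates."""
    result = {}
    for name, obj in objects.items():
        preds = {f"is_a({obj['type']})"} | set(obj.get("attrs", []))
        for feature, location in features.items():
            if close_to(obj["loc"], location, threshold):
                preds.add(f"close_to({feature})")
        result[name] = frozenset(preds)
    return result

# Hypothetical relevant features and reference objects.
features = {"beach": (0.0, 0.0), "highway": (10.0, 0.0)}
objects = {
    "h1": {"type": "house", "loc": (0.5, 0.5), "attrs": ["is_expensive"]},
    "h2": {"type": "house", "loc": (9.0, 1.0), "attrs": []},
    "g1": {"type": "gas_station", "loc": (10.0, 0.8), "attrs": []},
}
tx = to_transactions(objects, features, threshold=1.5)
for name, preds in tx.items():
    print(name, sorted(preds))
```

The resulting predicate sets are ordinary transactions, so the usual support, confidence and lift computations apply to them unchanged.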

3. Optimizing Rule Generation Using Evolutionary Computation Techniques

Existing algorithms for SAR measure the quality of a generated rule by considering only one evaluation criterion, but because of the growing need for knowledge from spatial data, the problem can be treated as a multi-objective one rather than a single-objective one. Multi-objective optimization deals with solving optimization problems that involve multiple objectives. Most real-world search and optimization problems involve multiple objectives (such as minimizing fabrication cost while maximizing product reliability) and should ideally be formulated and solved as multi-objective optimization problems.

Over the past decade, population-based evolutionary algorithms (EAs), such as genetic algorithms (GAs) and evolution strategies (ESs), have been found to be quite useful for solving multi-objective optimization problems, primarily because of their ability to find multiple optimal solutions in a single simulation run. In general, the main motivation for using genetic algorithms to discover high-level prediction rules is that they perform a global search and cope better with attribute interaction than the greedy rule induction algorithms often used in data mining [10].

Genetic algorithms for rule discovery can be divided into two broad approaches, the Michigan approach and the Pittsburgh approach [11]. The biggest distinguishing feature between the two is that in the Michigan approach (also referred to as Learning Classifier Systems) an individual is a single rule, whereas in the Pittsburgh approach each individual represents an entire set of rules [12]. In this paper we follow the first approach, i.e. the Michigan approach, for SAR.
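The paper does not spell out its chromosome encoding. One simple Michigan-style scheme, assumed here purely for illustration, gives each predicate one gene: 0 means the predicate is absent, 1 puts it in the antecedent and 2 puts it in the consequent. The predicate vocabulary below is hypothetical. Decoding rejects chromosomes that do not yield a rule with a non-empty antecedent and exactly one consequent predicate, matching the rule form described in Section 2.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical predicate vocabulary: one gene per predicate.
PREDICATES = ["close_to(bus_stand)", "close_to(college)", "provider=BSNL",
              "provider=Airtel", "plan=rate_cutter"]
ABSENT, ANTECEDENT, CONSEQUENT = 0, 1, 2

def decode(chromosome: Sequence[int],
           predicates: Sequence[str] = PREDICATES) -> Optional[Tuple[List[str], str]]:
    """Decode one individual (one rule, Michigan approach).

    Returns (antecedent predicates, consequent predicate), or None when the
    genes do not describe a valid rule X -> Y.
    """
    if len(chromosome) != len(predicates):
        raise ValueError("chromosome length must match the predicate vocabulary")
    antecedent = [p for p, g in zip(predicates, chromosome) if g == ANTECEDENT]
    consequent = [p for p, g in zip(predicates, chromosome) if g == CONSEQUENT]
    if not antecedent or len(consequent) != 1:
        return None
    return antecedent, consequent[0]

print(decode([1, 0, 1, 0, 2]))  # (['close_to(bus_stand)', 'provider=BSNL'], 'plan=rate_cutter')
print(decode([1, 0, 0, 0, 0]))  # None: no consequent
```

Because each individual is a single rule, a population of such chromosomes is a population of competing rules, which is what distinguishes this from a Pittsburgh-style encoding of whole rule sets.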

MOGA achieves the multiple objectives through Pareto-based multi-objective genetic search. Candidate rules are represented as chromosomes with a suitable encoding/decoding scheme, and elitism preserves the diversity of associations among the generated rules. To increase the efficiency of MOGA we use ACO, which keeps the algorithm from falling into a locally optimal solution.
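Pareto-based selection rests on the dominance relation: rule A dominates rule B if A is at least as good on every objective and strictly better on at least one. The text does not list the objectives MOGA optimizes; this sketch assumes support, confidence and lift, all to be maximized, and uses five hypothetical rule scores. The non-dominated rules form the first Pareto front.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]

def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` Pareto-dominates `b` (every objective is maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(scores: Sequence[Objectives]) -> List[int]:
    """Indices of the first Pareto front, by a simple O(n^2) pairwise check."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# (support, confidence, lift) for five hypothetical rules.
scores = [(0.40, 0.67, 1.11), (0.30, 0.90, 1.50), (0.20, 0.60, 1.00),
          (0.40, 0.50, 1.20), (0.10, 0.95, 1.40)]
print(non_dominated(scores))  # [0, 1, 3, 4]
```

Rule 2 is discarded because rule 0 is at least as good on all three objectives; the others each win on some objective, so no single-criterion ranking would keep all of them.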

ACO is a paradigm for designing metaheuristic algorithms for combinatorial optimization problems. The ACO algorithm was first introduced by Colorni, Dorigo and Maniezzo [13] [14], and the first Ant System (AS) was proposed by Dorigo in his Ph.D. thesis [15]. ACO is a metaheuristic inspired by the behavior of real ant colonies, which find a shortest path between a food source and the nest without using visual cues by exploiting pheromone information [16] [17] [18]. When ants search for food, they deposit a chemical substance called pheromone. The more ants walk along a path, the more pheromone is left on the ground. Each following ant then chooses a path with a probability proportional to the amount of pheromone on it. Eventually this positive feedback process builds a shortest path from the nest to the food source.

A characteristic of ACO algorithms is their explicit use of elements of previous solutions.

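Edge selection:

An ant moves from node i to node j with probability

$$p_{i,j} = \frac{\tau_{i,j}^{\alpha}\,\eta_{i,j}^{\beta}}{\sum_{l \in N_i} \tau_{i,l}^{\alpha}\,\eta_{i,l}^{\beta}}$$

where

$\tau_{i,j}$ is the amount of pheromone on edge (i, j)

$\alpha$ is a parameter to control the influence of $\tau_{i,j}$

$\eta_{i,j}$ is the desirability of edge (i, j) (a priori knowledge, typically $1/d_{i,j}$)

$\beta$ is a parameter to control the influence of $\eta_{i,j}$

and $N_i$ is the set of nodes the ant may still move to from node i.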

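Pheromone update:

$$\tau_{i,j} \leftarrow (1-\rho)\,\tau_{i,j} + \Delta\tau_{i,j}$$

where

$\tau_{i,j}$ is the amount of pheromone on a given edge (i, j)

$\rho$ is the rate of pheromone evaporation

and $\Delta\tau_{i,j}$ is the amount of pheromone deposited, typically given by

$$\Delta\tau_{i,j} = \sum_{k} \Delta\tau^{k}_{i,j}, \qquad \Delta\tau^{k}_{i,j} = \begin{cases} 1/L_k & \text{if ant } k \text{ uses edge } (i,j) \text{ in its tour} \\ 0 & \text{otherwise} \end{cases}$$

where $L_k$ is the cost of the k-th ant's tour (typically its length).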
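The two update rules above translate directly into code. This is a minimal sketch of the classic Ant System rules on a directed graph, not the paper's implementation; the function names, the dictionary representation of $\tau$ and $\eta$, and the default values of $\alpha$, $\beta$ and $\rho$ are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]

def transition_probabilities(i: int, allowed: Sequence[int],
                             tau: Dict[Edge, float], eta: Dict[Edge, float],
                             alpha: float = 1.0, beta: float = 2.0) -> Dict[int, float]:
    """p_ij = tau_ij^alpha * eta_ij^beta, normalized over the allowed next nodes."""
    weights = {j: (tau[(i, j)] ** alpha) * (eta[(i, j)] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next(i: int, allowed: Sequence[int], tau: Dict[Edge, float],
                eta: Dict[Edge, float], rng: random.Random) -> int:
    """Roulette-wheel choice of the next node according to p_ij."""
    probs = transition_probabilities(i, allowed, tau, eta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]

def update_pheromone(tau: Dict[Edge, float], tours: List[List[int]],
                     costs: List[float], rho: float = 0.5) -> None:
    """tau_ij <- (1 - rho) * tau_ij + sum over ants k of 1/L_k on the edges ant k used."""
    for edge in tau:
        tau[edge] *= 1.0 - rho                  # evaporation on every edge
    for tour, cost in zip(tours, costs):
        for a, b in zip(tour, tour[1:]):
            tau[(a, b)] += 1.0 / cost           # deposit Delta tau^k_ij = 1 / L_k

# Three-node example with distances d01 = 2, d02 = 4, d12 = 1 (eta = 1/d).
tau = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
eta = {(0, 1): 1 / 2.0, (0, 2): 1 / 4.0, (1, 2): 1 / 1.0}
p = transition_probabilities(0, [1, 2], tau, eta)   # {1: 0.8, 2: 0.2}
nxt = choose_next(0, [1, 2], tau, eta, random.Random(0))
update_pheromone(tau, tours=[[0, 1, 2]], costs=[3.0], rho=0.5)
print(p, nxt, tau)
```

With $\beta = 2$ the shorter edge (0, 1) gets weight 0.25 against 0.0625 for (0, 2), so it is chosen with probability 0.8. After the update, the edges on the single tour of cost 3 carry $0.5 + 1/3$ while the unused edge has only evaporated to 0.5.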

4. Clustering the Rules

Clustering is one meaningful way of grouping association rules into different clusters. When spatial association rules are generated in order to identify the target groups, we use a clustering approach. In [19], the authors selected highly ranked (by confidence) association rules one by one and formed a cluster of the objects covered by each rule until all the objects in the database were covered. The authors of [20] formed clusters of rules of the form $X_i \to Y$, that is, rules with different antecedents but the same consequent $Y$, and extracted representative rules for each cluster as knowledge about that cluster. In [21], the authors formed clusters of rules based on the structural distance between antecedents. The authors of [22] formed hierarchical clusters of rules based on different distance measures between rules. In [23], the authors discussed different ways of pruning redundant rules, including the rule cover method. Associative Classifiers (AC) such as CBA, CMAR [24], RMR [25] and MCAR [26] all generate clusters of rules called class association rules (CARs), with the class label as the common consequent, and use database (rule) cover to select candidate rules for building the AC classifier model. In most association rule mining (ARM) work, confidence is used to rank association rules; other measures such as chi-square and Laplace accuracy are also used to select highly ranked rules.

In this paper we use the classifier model, which groups rules by their consequent information: a cluster is formed from rules whose consequents have a similar pattern. We first group on the attributes; the groups may be homogeneous, such as urban core, suburbs and rural, or hierarchical, such as metropolitan area, major cities and neighbourhoods. These groups are then further divided according to the purpose, such as segmenting the population by consumer behavior. We use the algorithm proposed in [27].

The clustering algorithm forms groups of rules of the form $X_i \to Y$ for $i = 1, 2, \ldots, n$; that is, the different rule antecedents $X_i$ are collected into one group for the same rule consequent $Y$. The next step is to select a small set of representative rules from each group. Representative rules are selected based on rule instance cover as follows.

Let $R_Y = \{X_i \to Y \mid i = 1, 2, \ldots, n\}$ be a set of n rules for some itemset $Y$, and let $m(X_i \to Y)$ be the rule cover, i.e. the set of tuples/records in the dataset D covered by the rule $X_i \to Y$.

Let $C_Y$ be the cluster rule cover for the group (cluster) of rules $R_Y$, i.e.,

$$C_Y = m(R_Y) = \bigcup_{i=1}^{n} m(X_i \to Y)$$

From the cluster rule set $R_Y$, find a small set of k rules $r_Y$, called the representative rule set, such that $m(r_Y)$ is approximately equal to $m(R_Y)$, i.e.,

$$m(r_Y) \approx m(R_Y), \quad\text{or}\quad \bigcup_{i=1}^{k} m(X_i \to Y) \approx \bigcup_{i=1}^{n} m(X_i \to Y), \quad k \ll n$$

To find the representative rule set $r_Y$ from $R_Y$, we use the rule cover algorithm proposed in [20].
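The definitions of $R_Y$, $m(X_i \to Y)$ and $C_Y$ can be sketched in a few lines. Here a rule is a pair (antecedent set, consequent predicate), and the rule cover is assumed to be the set of record indices that contain both the antecedent and the consequent; the text does not define "covered" more precisely, so treat that as an assumption. The records and predicate names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent X_i, consequent Y)

def rule_cover(rule: Rule, dataset: List[FrozenSet[str]]) -> Set[int]:
    """m(X_i -> Y): ids of records containing X_i and Y (assumed definition of 'covered')."""
    x, y = rule
    return {k for k, record in enumerate(dataset) if x <= record and y in record}

def group_by_consequent(rules: List[Rule]) -> Dict[str, List[Rule]]:
    """Build the clusters R_Y = {X_i -> Y}: one group per consequent Y."""
    groups: Dict[str, List[Rule]] = defaultdict(list)
    for rule in rules:
        groups[rule[1]].append(rule)
    return dict(groups)

def cluster_cover(group: List[Rule], dataset: List[FrozenSet[str]]) -> Set[int]:
    """C_Y = m(R_Y): union of the rule covers of all rules in the group."""
    cover: Set[int] = set()
    for rule in group:
        cover |= rule_cover(rule, dataset)
    return cover

dataset = [
    frozenset({"area=north_street", "age>40", "plan=lifetime"}),
    frozenset({"area=north_street", "plan=lifetime"}),
    frozenset({"age>40", "plan=lifetime"}),
    frozenset({"student", "plan=students"}),
]
rules = [
    (frozenset({"area=north_street"}), "plan=lifetime"),
    (frozenset({"age>40"}), "plan=lifetime"),
    (frozenset({"student"}), "plan=students"),
]
groups = group_by_consequent(rules)
c_y = cluster_cover(groups["plan=lifetime"], dataset)
print(sorted(c_y))  # [0, 1, 2]
```

In this toy cluster, the single rule with antecedent area=north_street covers records {0, 1}, two of the three records in $C_Y$; the representative rule set aims to reach nearly all of $C_Y$ with as few rules as possible.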

5. Application of HEA to Spatial Association Rule Mining

The HEA procedure is as follows. First, MOGA searches the solution space and generates association lists that provide the initial population for ACO. Next, ACO is executed; when ACO terminates, the crossover and mutation operations of MOGA generate a new population. ACO and the GA thus search the solution space alternately and cooperatively. Finally, the rules are clustered by rule cover based on their consequent information.

Step 1: Pseudocode for optimizing rule generation

t = 0
while (t <= no_of_gen)
    M_Selection(Population(t))
    ACO_MetaHeuristic
        while (not_termination)
            generateSolutions()
            pheromoneUpdate()
            daemonActions()
        end while
    end ACO_MetaHeuristic
    M_Recombination_and_Mutation(Population(t))
    Evaluate Population(t) on each objective
    t = t + 1
end while
Decode the individuals with high fitness values obtained from the population.

Step 2: Pseudocode for clustering the generated rules

Input: the set of rules generated by HEA, R_Y = {X_i → Y | i = 1, 2, …, n}, and their rule covers.

Generate the cluster rule cover C_Y.
count = number of records in the cluster cover
r_Y = ∅
while (number of records in the cluster cover > 2% of count)
    Sort all the rules in R_Y in descending order of rule cover, counting only records still in the cluster cover.
    Take the first rule R, which has the highest rule cover.
    if (number of records in the rule cover of R <= 2% of count)
        exit the while loop
    end if
    r_Y = r_Y ∪ {R}; remove R from R_Y
    Delete the records covered by R from the cluster cover
end while

Output: the representative rule set r_Y.
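The Step 2 pseudocode is a greedy set cover. A minimal runnable sketch follows, assuming each rule is identified by a label and its rule cover is given as a set of record ids; the rule labels and covers in the example are hypothetical. Rule covers are measured only against records that are still uncovered, which is what keeps the loop from selecting the same rule again.

```python
from typing import Dict, List, Set

def representative_rules(covers: Dict[str, Set[int]], min_frac: float = 0.02) -> List[str]:
    """Greedily select the representative rule set r_Y from one cluster R_Y.

    `covers` maps each rule of the cluster to its rule cover m(X_i -> Y).
    Stops when the uncovered part of C_Y, or the best remaining gain, is at
    most `min_frac` (2% in the pseudocode) of |C_Y|.
    """
    cluster_cover: Set[int] = set().union(*covers.values())  # C_Y
    count = len(cluster_cover)
    remaining = set(cluster_cover)
    candidates = dict(covers)
    selected: List[str] = []
    while candidates and len(remaining) > min_frac * count:
        # Highest rule cover, restricted to the records still in the cluster cover.
        best = max(candidates, key=lambda r: len(candidates[r] & remaining))
        gain = candidates.pop(best) & remaining
        if len(gain) <= min_frac * count:
            break
        selected.append(best)
        remaining -= gain  # delete the covered records from the cluster cover
    return selected

covers = {
    "area=north_street -> plan=lifetime": {0, 1, 2, 3},
    "age>40 -> plan=lifetime": {2, 3, 4, 5, 6},
    "provider=BSNL -> plan=lifetime": {6},
    "area=anna_nagar -> plan=lifetime": {0, 1},
}
print(representative_rules(covers))
# ['age>40 -> plan=lifetime', 'area=north_street -> plan=lifetime']
```

Two of the four rules already cover all seven records of $C_Y$, so the remaining rules are dropped as redundant; raising `min_frac` trades coverage for an even smaller representative set.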

The representative rule set is used for segmentation according to the consequent.

6. Results and Discussion

We used a synthesized dataset for our research. The area of study is Madurai City.

Fig 1: Madurai City

Data were collected in and around the city of Madurai. The data collection focuses on mobile phone users: their service providers, their mode of usage and the amount of recharge done by the customers, on a location basis. The general procedure of data mining is: problem definition → data preparation (including data selection, data preprocessing and data transformation) → data arrangement → model building/data mining → result evaluation and explanation. Data preparation is the key step that determines the success of data mining, and the processing of spatial data is much more complex [28]. After preprocessing, we transformed the spatial data into an .xls file. Building on the Apriori association rule algorithm, we programmed the computation in the M-language of MATLAB. The specific procedure is as follows.

(1) Use the Import Wizard in MATLAB to import the data file. At this point the numeric fields and character fields are saved separately; for example, by default a matrix named "data_num" holds the numeric fields and a matrix named "textdata" holds the character fields.

(2) Run the Step 1 algorithm to generate the rules.

(3) Run the Step 2 algorithm, implemented in Java, to generate the target groups.
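The baseline in the comparisons below is the Apriori algorithm, which the authors implemented in MATLAB. For readers without that code, here is a minimal Python sketch of Apriori's level-wise frequent itemset search followed by rule generation at a 50% confidence threshold; the transactions are hypothetical, and rules are restricted to a single-predicate consequent as in Section 2.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

def apriori(transactions: List[FrozenSet[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise frequent itemset mining: returns {itemset: support}."""
    n = len(transactions)

    def supp(s: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if s <= t) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        current = {s: v for s in level if (v := supp(s)) >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that have an infrequent subset.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
    return frequent

def generate_rules(frequent: Dict[FrozenSet[str], float],
                   min_conf: float = 0.5) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Rules X -> Y with one consequent predicate and confidence >= min_conf."""
    out = []
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            x = itemset - {y}
            conf = s / frequent[x]  # every subset of a frequent itemset is frequent
            if conf >= min_conf:
                out.append((x, frozenset([y]), conf))
    return out

transactions = [
    frozenset({"BSNL", "rate_cutter"}),
    frozenset({"BSNL", "rate_cutter", "north_street"}),
    frozenset({"Airtel", "north_street"}),
    frozenset({"BSNL"}),
]
freq = apriori(transactions, min_support=0.5)
for x, y, conf in generate_rules(freq, min_conf=0.5):
    print(sorted(x), "->", sorted(y), round(conf, 2))
```

Raising `min_support` shrinks the frequent itemsets and therefore the number of rules, which is the trend examined in Fig 2.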

Keeping the confidence at 50%, we computed the results. In Fig 2 the number of rules generated is compared against the given support count for the Apriori algorithm, the Apriori algorithm optimized with MOGA, and the Apriori algorithm optimized with the HEA proposed in Step 1.

Fig 2: Comparison of the three algorithms based on the number of rules generated

From Fig 2 we can make the following observations:

1. As the support is increased, the number of rules generated decreases, and the use of HEA also makes a significant difference in the number of rules generated.

2. The performance of HEA is close to that of MOGA, but applying ACO reduces the number of rules that need to be generated.

In Fig 3 the lift ratio of the top 500 rules generated is compared against the given support count for the Apriori algorithm, the Apriori algorithm optimized with MOGA, and the Apriori algorithm optimized with the HEA proposed in Step 1.


The lift ratio tells us how much better a rule is at predicting the result than simply assuming the result in the first place. It is defined as the ratio of the number of records that support the entire rule to the number that would be expected if there were no relationship between the items. From Fig 3 we can make the following observation: the lift ratio for HEA is better than for the other two algorithms. This shows the efficiency of HEA in identifying rules that predict the result.

In Fig 4 the computational time is compared against the given support count for the Apriori algorithm, the Apriori algorithm optimized with MOGA, and the Apriori algorithm optimized with the HEA proposed in Step 1.

Fig 4: Comparison of the three algorithms based on computational time

Refining the rules generated by the HEA algorithm with the clustering step (Step 2) is useful for narrowing down the segmentation.

Segmentation was carried out to find the popular service provider in the various locations of Madurai, the mode of usage, and the amount of recharge done.



Based on the above analysis we can find the following facts:

1. The maximum usage is with the BSNL service provider.

2. In the North Street area, Airtel has the maximum usage.

Fig 7: Customer profiling based on area and mode of usage

Fig 8: Comparative analysis for area and mode of usage

Based on the above analysis we can find the following facts:

1. Rate cutters are preferred by the users.

2. The students' scheme is used mostly for its free SMS offer.

3. Lifetime cards are preferred in places where the users are above 40 years of age.


Fig 10: Comparative analysis for area and amount of usage

The recharge amounts were recorded as frequencies over one week. Based on the above analysis we can find the following facts:

1. Recharges of Rs. 50 are the most preferred.

2. In three areas, the amount of recharge is mostly around Rs. 100-200.

7. Conclusion

Predicting customer behavior for mobile communication in the study area gives us knowledge about the behavioral trends of the customers. The associative clustering algorithm optimized by the Hybrid Evolutionary Algorithm makes efficient use of the given data to form the segments. HEA reduces the time needed for prediction and increases the lift ratio of the generated rules. Clustering the association rules yields segments of preferred customer behavior. In future work, classification can be applied over the clustered rules.