Governments and firms may seek decision support systems (DSS) to help with setting strategies or making business-related decisions. The naive Bayesian classifier (NBC) provides an efficient and easily understood classification model for the data mining and machine learning needed to build a DSS. The NBC is one way companies (or governments) can better understand their customers (or citizens) through collected data. In practice, however, gathering sufficient datasets for building a DSS may be costly, resulting in unreliable knowledge and leading to poor decisions and large losses. Furthermore, the NBC is not without its drawbacks: (i) reduced classification accuracy when attributes exhibit non-independence, and (ii) an inability to handle non-parametric continuous attributes. In this thesis, the mega-trend-diffusion (MTD) technique is introduced to address the costliness of gathering sufficient samples, while proposed remedies to the problems of the NBC, namely structural improvement and discretisation, are also explored. The goal is to investigate the practicality of Bayesian statistics in the real world; more specifically, how a Bayesian approach can contribute significantly to better decision making in the marketing context.

Chapter 1

Introduction

Over the past 12 years, Bayesian statistics has seen increased use for marketing purposes. The versatility of the Bayesian method has seen it applied across a wide range of the marketing mix, from pricing decisions to promotional campaigns to diversification decisions. Although the concepts of Bayesian statistics have long been recognised since their emergence in 1763, they were deemed impractical in the marketing context up until the mid-1980s. This is because the 'class of models for which the posterior could be computed were no larger than the class of models for which exact sampling results were available' [Rossi and Allenby, 2003]. Furthermore, the Bayesian approach requires the evaluation of a prior distribution, which can take considerable effort, time and cost to establish. However, the recent decade has brought about an increase in computational power and modelling breakthroughs, which has led to a revival in the use of Bayesian methods, particularly in the marketing field. The Bayesian method's basic advantages make it an attractive tool, especially for easing decision making in marketing problems.

In this thesis, we aim to study a classification method that employs the Bayesian method, which provides a logical way of taking into account the available information about the sub-samples associated with the different classes within the overall sample. More specifically, we will look into a simple probabilistic classifier based on Bayes' theorem with a strong independence assumption, called the naive Bayesian classifier. This independent-feature model has seen wide application in marketing, for example as a procedure for analysing customer characteristics in the paper by Green [1964]. We will discuss in later chapters how this classifier is constructed and see how it can be applied in a marketing context for a pricing decision.


Firms and governments seek decision support systems (DSS) to assist in making decisions and setting strategies. To establish an effective DSS, also known as business intelligence, data mining techniques and machine learning are required as core functions. However, in rapidly shifting markets and with an ever-demanding population, decision makers are pressured to make decisions within very short periods of time. Classifiers in the machine learning process can only work well when sufficient data is collected for training. This leads to the problem of collecting adequate sample sets for an informative DSS. To address the issue of extra cost commonly attributed to evaluating a prior distribution in the Bayesian method, this thesis also introduces and studies a technique proposed by Li, Wu, Tsai, and Lina [2007a] called the mega-trend diffusion (MTD) technique. This technique, applied to small collected data sets, extracts more useful information that can then be used in the classification learning process. Related research using the MTD technique for classification learning has shown promising results regarding classification performance. We hope to learn more about this technique and how it can be used with the NBC in this thesis.

The methods studied in this paper are demonstrated using a case involving a decision to build additional day care centres in districts in the UK. This thesis begins by introducing the naive Bayesian classifier in Chapter 2, where a simple naive Bayes classifier example is presented as a segue to the formal definition of the NBC and the problems related to the classifier. Chapter 3 introduces the mega-trend diffusion technique to address the issue of insufficient sample data, and the experimental study applying the proposed methods to the case of building additional day care centres is described in Chapter 4. Lastly, concluding remarks are given in Chapter 5.


Chapter 2

Naive Bayesian Classifier

2.1 What are classifiers?

By definition, a class is a set or category of members that have properties or characteristics in common and are differentiated from others by these characteristics.

A classifier can be some set of rules or an algorithm that implements classification. It is a function that maps data (which exhibits attributes) to categories (or classes). A few examples of the countless classification methods include: simple linear classifiers, logistic regression, naive Bayes classifiers, nearest-neighbour classifiers, decision tree classifiers, etc.

Classifiers are extensively used in data mining, with applications in credit approval, fraud detection, medical diagnosis, marketing (which we will see in this paper), etc. For example, classifiers are used to help identify characteristics of profitable and unprofitable product lines [Amat, 2002], in product development decisions [Nadkarni and Shenoy, 2001] and in conglomerate diversification decisions [Li et al., 2009a], etc.

A graphical interpretation

Here is a simple explanation of the terminology and the idea of classification through these three points: profiling, segmentation and classification [Amat, 2002, Section 3].


• Profiling involves the identification of attributes pertaining to the category (or class).

  Attributes: 1) No. of Wheels: 4; 2) Color: Blue; 3) Windows: Yes; …

Figure 2.1: Profiling.

• Segmentation is the process of identifying classes in given data (Figure 2.2). In this example we are segmenting vehicles into categories; we map {Bicycle, Tricycle, Car} → {1, 2, 3} for ease of computation.

Figure 2.2: Segmentation.

• Classification is the process of assigning a given (new) input sample to one of the existing categories. A classifier is a function that does this. A crucial point is that classification assumes a segmentation already exists through a learning algorithm1. Here (Figure 2.3), a new data entry of a red bicycle with its key attribute of "2 wheels" is classified to Category 1.

1See Dietterich [1998] for more on learning algorithms.


Figure 2.3: Classification.

2.2 Bayesian statistics (a casual introduction)

Reverend Thomas Bayes (ca. 1702-1761) from London, England was a mathematician and Presbyterian minister, whose legacy lives on in the renowned theorem that bears his name: Bayes' theorem. It is believed to have stemmed from the probability paper, "Essay Towards Solving a Problem in the Doctrine of Chances", published in 1763 by Richard Price [Marin and Robert, 2007]. Today, Bayes' theorem serves as a central component of statistical inference and statistical modelling, with applications ranging from market analysis to genetic research.

Essentially, Bayesian statistics provides a rational method of updating beliefs (probabilities) in light of new evidence (observations). It coherently combines information from different sources using conditional probabilities through Bayes' rule:

P(Cj | A) = P(A ∩ Cj) / P(A)                                         (2.1)

          = P(Cj) P(A | Cj) / Σ_{i=1}^{n} P(A | Ci) P(Ci)            (2.2)

Here we have the posterior distribution P(Cj | A), the prior distribution P(Cj) and the likelihood P(A | Cj). Note that the denominator P(A) is invariant across classes (i.e. constant, so it does not depend on Cj), thus it can be dropped as it does not affect the end result of classification:

P(Cj | A) ∝ P(Cj) P(A | Cj)                                          (2.3)

Here Cj refers to class j (or category j) and A refers to the set of attributes



related to the class (i.e. it can refer to a vector of attributes A = {A1, …, Ak}). Then Bayes' rule allows us to update our initial beliefs (i.e. the prior P(Cj)) about class j by combining them with new information we gathered about the attributes A related to class j, resulting in the new belief about class j, expressed through the posterior P(Cj | A).
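This updating can be sketched numerically. Below is a minimal Python illustration (the thesis's own computations are done in R); the class labels, prior and likelihood values are invented purely for illustration:

```python
# Bayes' rule: posterior ∝ prior × likelihood, normalised over the classes.
# Hypothetical priors and likelihoods for two classes given one observed attribute.
priors = {"class1": 0.3, "class2": 0.7}         # P(Cj)
likelihoods = {"class1": 0.8, "class2": 0.2}    # P(A = a | Cj)

unnorm = {c: priors[c] * likelihoods[c] for c in priors}
evidence = sum(unnorm.values())                 # P(A) = sum over classes
posterior = {c: unnorm[c] / evidence for c in unnorm}

print(posterior)   # class1: 0.24/0.38 ≈ 0.632, class2: 0.14/0.38 ≈ 0.368
```

Observing the attribute shifts belief from the prior (0.3 vs 0.7) toward class 1, because the data is far more likely under class 1.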

2.3 Why Naive?

With the assumption of the features (attributes) being independent given their class, an NBC greatly simplifies the process of learning. Here it is explained very simply. Let C be a binary random variable where

C = 1 for Class 1, and C = 0 for Class 2,

and let A1, …, Ak be the set of predictor variables (i.e. attributes). The key point here is the simplistic (naive) assumption that the predictors are conditionally independent given C, such that the joint conditional probabilities can be written as

P(A1, …, Ak | C) = Π_{i=1}^{k} P(Ai | C)

then combining this with Bayes' theorem leads to the naive Bayesian classifier (here presented as odds):

P(C = 1 | A1, …, Ak) / P(C = 0 | A1, …, Ak) = [P(C = 1) / P(C = 0)] Π_{i=1}^{k} [f(ai | C = 1) / f(ai | C = 0)]

Taking logs gives

log [P(C = 1 | A1, …, Ak) / P(C = 0 | A1, …, Ak)] = log [P(C = 1) / P(C = 0)] + Σ_{i=1}^{k} log [f(ai | C = 1) / f(ai | C = 0)]

where f(ai | C) is the conditional density of Ai. Figure 2.4 visualises the structure of the NBC.

Naive Bayes classifiers are probabilistic, meaning the class with the highest probability given the (new) data is chosen. The NBC can also be understood as selecting the most likely class Class_{k,D}, given the attributes a1, …, ak, that best matches


Figure 2.4: Structure of the naive Bayes classifier (a class node with attribute nodes A1, A2, A3, …, Ak). Notice the absence of arcs between attribute nodes, signifying independence given the class.

the training set D2:

Class_{k,D} = arg max_{c ∈ Class} P(c) Π_{i} P(ai | c)               (2.4)

P(C = c) and P(Ai = ai | C = c) need to be estimated from training data3. The conditional (probability) densities can be estimated separately by means of non-parametric univariate density estimation; in this paper, m-estimation [Cussens, 1993; Cestnik, 1990] is implemented in the example below. This avoids joint density estimation, which is highly undesirable especially when the model has a large number of predictors. Furthermore, the fact that the densities can be estimated non-parametrically allows the NBC flexible and unrestricted modelling of the relationship between the attributes (Ai) and the class (C) [Larsen, 2005].

The likelihood P(ai | c) is generally a Bayesian estimate. m-estimation [Cussens, 1993; Cestnik, 1990] is used to estimate this probability, where the prior is constrained to a beta distribution with m as the spread, or variance, of the distribution.

P(ai | c) = (kc + m · P(ai)) / (n + m)                               (2.5)

2i.e. supervised learning [ref. 10] with data D such that a segmentation (page 4) exists. 3A set of data with known classes, which is normally provided.



where

n := number of training samples where Class = c
kc := number of training samples where Class = c and Ai = ai
P(ai) := prior estimate of P(ai | C = c)
m := an equivalent sample size (see Cussens [1993] for more on the m-estimate)

Here, the posterior distribution is then a beta distribution with the new variance parameter n + m [Cussens, 1993].
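The m-estimate above can be sketched in a few lines. A minimal Python illustration follows (the counts and prior in the usage lines are invented; the thesis's own computations use R):

```python
def m_estimate(k_c, n, prior, m):
    """m-estimate of P(a_i | c): (k_c + m * prior) / (n + m).

    k_c   : number of training samples with Class = c and A_i = a_i
    n     : number of training samples with Class = c
    prior : prior estimate of P(a_i | c)
    m     : equivalent sample size, controlling the pull toward the prior
    """
    return (k_c + m * prior) / (n + m)

# With no data (n = k_c = 0) the estimate falls back to the prior:
print(m_estimate(0, 0, 0.5, 2))    # 0.5
# With data it moves toward the observed frequency k_c / n = 2/5:
print(m_estimate(2, 5, 0.5, 2))    # (2 + 1) / 7 ≈ 0.4286
```

Note how m interpolates between the prior and the observed frequency: larger m keeps the estimate closer to the prior, which also prevents zero-probability estimates for unseen attribute values.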

2.3.1 A simple example

Say, for example, a motor insurance company is deciding whether to charge more or less premium to a customer owning a particular gray BMW MPV car, based on that class of car possibly being stolen or not. (Note that the data set does not contain such a car.)

i) The data set

Table 2.1: Example training data set for the NBC model. [Meisner, 2003]

ii) Using m-estimation

To compute the posterior (2.4) we need the conditional probabilities of the attributes Gray, BMW and MPV, each conditioned on the two classes. That is, P(Gray | Yes), P(BMW | Yes), P(MPV | Yes), P(Gray | No), P(BMW | No) and P(MPV | No). Then


            Attribute 1   Attribute 2   Attribute 3
            Brand         Color         Type          Class
            Toyota        Gray          Sports        …
            Toyota        Blue          Sports        …
            Toyota        Blue          MPV           …
            BMW           Gray          Sports        …
            BMW           Gray          Sports        …
            BMW           Gray          Sports        …
            Toyota        Gray          MPV           …
            Toyota        Blue          MPV           …
            BMW           Blue          MPV           …
            BMW           Blue          Sports        …

multiplying these conditional probabilities with P(Yes) and P(No) respectively, by Bayes' rule, gives the posterior probabilities.

From the data set, there are 5 entries where cj = Yes, and in 2 of these entries a1 = BMW. So for P(BMW | Yes) this implies n = 5 and kc = 2. We construct the following table for easy referencing:

2.4 Problems in creating an optimal NBC

Despite having the advantages of being fast to train and classify, robust to noisy data [Yang and Webb, 2002] and insensitive to irrelevant attributes, the NBC has its drawbacks: (i) an inability to handle non-parametric continuous attributes, and (ii) decreased classification accuracy when attributes exhibit non-independence.

In their proceedings paper, Martinez-Arroyo and Sucar [2006] proposed two methods which deal with these two issues. These methods, namely discretisation and structural improvement, reduce classification error and lead to an optimal NBC. In this paper, some of these methods are implemented later and combined with the mega-trend diffusion technique in the data analysis.

2.4.1 Discretisation

An attribute can take either a categorical (Yes or No) or a numerical (3.456, etc.) value. For a categorical (class) attribute, the values are discrete and can therefore be used to train classifiers without the need for any modification. On the other hand, values of numerical attributes can be either discrete or continuous [Yang and Webb, 2002; Johnson, 2009]. The numerical attributes are converted, or discretised, to categorical ones irrespective of whether the numerical attribute is discrete or continuous. This helps improve classification performance, as this preprocessing by discretisation avoids having to assume a normal distribution for the numerical attributes. For each numerical attribute Ai, a categorical attribute A*i is created, with each value of A*i corresponding to an interval (xi, yi] of Ai. Then A*i is used for training the classifier instead of Ai.

Data which are subject to continuous measurement can be placed into discrete classes for convenience [Green, 1964]. When attributes are discretised, the performance of the NBC is found on average to slightly outperform other classification algorithms such as C4.54 [Dougherty et al., 1995]. This is because discretised attributes maximise class prediction. Hence, a simpler classification model can be achieved and irrelevant attributes will reveal themselves (though the NBC still performs well with irrelevant attributes, their unnecessary presence merely takes up a little computational power).

A paper containing comparative studies of a number of discretisation methods for the NBC is by Yang and Webb [2002], in which nine discretisation methods are compared, such as fuzzy discretisation (FD), lazy discretisation (LD), weighted proportional k-interval discretisation (WPKID), etc.

4A statistical classifier that generates decision trees for classification. C5.0 is an improvement on C4.5, but it is commercial and has not yet been extensively used or compared with in much research, unlike C4.5.



Yang and Webb [2002] also propose a new discretisation method called weighted non-disjoint discretisation (WNDD), which is a combination of other discretisation methods that reduces classification error.

However, Yang and Webb [2002, Section 3] argue that discretisation methods of pure intervals might not be suitable for the NBC, because the NBC already assumes conditional independence between attributes and thus does not use attribute combinations as predictors. It is suggested that the categorical A*i be substituted for the numerical Ai to result in an accurate estimation of P(C = c | A = {a1, …, ak}). Keeping this in mind, this thesis adopts the discretisation method based on the Minimum Description Length (MDL) principle, as suggested and used by Martinez-Arroyo and Sucar [2006] in their paper. The MDL principle will be employed to preprocess the data used to construct a more optimal NBC. A brief introduction to the MDL principle

will be given here, while its usage in this paper can be found in the later chapters.

A simple explanation of the MDL principle can be taken from this quote: '[The MDL Principle] is based on the following insight: any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally.' [Grünwald, 2005]. Introduced in 1978 by Jorma Rissanen, the MDL principle is a strong method of inductive inference that forms an important concept in information theory, pattern classification and machine learning. Being especially applicable to prediction and estimation problems, particularly in situations where the considered models may be complex enough that overfitting of the data is a matter of concern, we will apply this method as a basis for the discretisation used in this research.

However, this paper will not go into great detail about the MDL principle, as it forms a topic of its own. Readers interested in a comprehensive derivation of this method can find it in the article by Grünwald [2005]. In essence, we score the data using a criterion based on the MDL principle, and use the result for building our NBC. Note that from now on, MDLP is synonymous with MDL principle.

In R, the package "discretization" contains such a function to implement MDLP, and it can be called using the following commands:

> install.packages("discretization")
> library(discretization)
> mdlp(data)

where data here is the dataset matrix to be discretised.

2.4.2 Structural Improvement

Although in this paper structural improvement will not be used to modify the NBC in the data analysis section, for reasons apparent later, it will be highlighted briefly to introduce the reader to the plausibility of constructing a classifier which deals with dependent attributes.

As highlighted earlier in Chapter 2, the NBC assumes the attributes to be independent from each other given the class. In reality, this may not always be true. There are two workarounds for this, as Martinez-Arroyo and Sucar [2006] point out. The first option is to connect the dependent attributes with directed arcs. This leads to an extension of the NBC, namely Bayesian network classifiers (BNC) [Baesens et al., 2004]. In the paper by Ong, Khoo, and Saw [2012], a similar method is proposed using structural learning with a hill-climbing algorithm, which was used to identify dependencies and causal relationships between variables. They also included results from accuracy tests of the modified classifier. Figure 2.5 is an illustration that visualises this workaround.

Figure 2.5: Modified naive Bayes classifier with an arc introduced between attributes A2 and A3 to signify dependency or causal relationship.

However, a disadvantage is that this results in the loss of the simplistic modelling of an NBC, in exchange for a more complicated BNC. The second method is to transform the structure while still maintaining an NBC-structured network. Three basic operations can be used here [Martinez-Arroyo and Sucar, 2006; Succar, 1991]: 1) attribute elimination, 2) merging two or more dependent attributes, and 3) introducing an additional attribute that makes two dependent attributes independent (as a hidden



node). These operations (except 3) are illustrated in Figures 2.6 and 2.7.

Figure 2.6: NBC with attribute A3 eliminated.

Figure 2.7: NBC with two attributes combined into one variable.

In this option, elimination of redundant attributes is done if those attributes are seen to be below a threshold and therefore do not provide any additional mutual information between the attribute and the class. Then the other attributes are examined for conditional mutual information (CMI) [Fleuret, 2004] given the class, for each pair of attributes. A high CMI value implies dependence, and one of these attributes is either eliminated or the pair is merged to form a single attribute [Martinez-Arroyo and Sucar, 2006]. This is repeated until no unnecessary or dependent attributes remain.

These two methods can be used after the preprocessing by discretisation. In this paper, the dataset obtained is assumed to have no dependent variables and therefore will not require structural improvement for the NBC used.



2.5 Formal definition

In the previous sections and subsections, the naive Bayes classifier was informally introduced. In this section we formally define the NBC, as in Rish [2001].

Let A = (A1, …, An) be a vector of observed random variables, called attributes. Each attribute takes values from its domain Di. The set of all attribute vectors is then Ω = D1 × … × Dn. Let the class be denoted by C, an unobserved random variable. C can take one of k values, i.e. c ∈ {0, …, k−1}. Let capital letters denote the variables, while lower-case letters denote the corresponding values (i.e. Ai = ai). Also, as per convention, bold letters denote vectors.

A classifier is defined as a function h(a): Ω → {0, …, k−1} that assigns a class c to given data with attribute vector a, where h(a) = c. Commonly, each class j is associated with a discriminant function fj(a), j = 0, …, k−1, and the classifier selects the class with the maximal discriminant function for a given sample, such that h(a) = arg max_{j ∈ {0, …, k−1}} fj(a).

The deviation from a normal classifier is that a Bayesian classifier uses the class posterior probabilities as the discriminant function; that is, if we denote a Bayesian classifier as h*(a), then f*j(a) = P(C = j | A = a). Using Bayes' rule we get that P(C = j | A = a) = P(A = a | C = j) P(C = j) / P(A = a), and as before, we can ignore P(A = a) as it is a constant, to give f*j(a) = P(A = a | C = j) P(C = j). This implies that the Bayesian classifier h*(a) = arg max_{j ∈ {0, …, k−1}} P(A = a | C = j) P(C = j) looks for the maximum a-posteriori probability hypothesis given the sample attributes a.

Unfortunately, as we move into higher-dimensional feature spaces, directly estimating the class-conditional probability distribution P(A = a | C = j) from a given sample set becomes tedious or difficult. Therefore, we assume for simplification that the features are independent given the class. This approximation for the Bayesian classifier yields the naive Bayesian classifier with discriminant function fNBj(a) = Π_{i=1}^{n} P(Ai = ai | C = j) P(C = j). Hence we have the naive Bayesian classifier that finds the maximum a-posteriori probability given the sample attributes a, under the assumption that the attributes are independent given the class:

hNB(a) = arg max_{j ∈ {0, …, k−1}} Π_{i=1}^{n} P(Ai = ai | C = j) P(C = j)        (2.6)
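Putting the pieces together, equation 2.6 can be sketched as a minimal naive Bayes classifier over categorical attributes. The sketch below is in Python for illustration (the thesis's computations use R); the m-estimate is used for the conditionals with a uniform prior, and the tiny data set is invented, not the thesis's Table 2.1:

```python
from collections import Counter, defaultdict

# Minimal NBC: h_NB(a) = argmax_j P(C=j) * prod_i P(A_i = a_i | C = j),
# with conditionals estimated by the m-estimate (kc + m*prior) / (nc + m).
def train_nbc(samples, labels, m=2.0):
    n_total = len(labels)
    class_counts = Counter(labels)
    n_attrs = len(samples[0])
    joint = defaultdict(Counter)                 # counts per (class, attr index)
    for a, c in zip(samples, labels):
        for i, ai in enumerate(a):
            joint[(c, i)][ai] += 1
    values_per_attr = [{a[i] for a in samples} for i in range(n_attrs)]

    def classify(a):
        best, best_score = None, -1.0
        for c, nc in class_counts.items():
            score = nc / n_total                 # prior P(C = c)
            for i, ai in enumerate(a):
                prior = 1.0 / len(values_per_attr[i])   # uniform prior P(a_i)
                score *= (joint[(c, i)][ai] + m * prior) / (nc + m)
            if score > best_score:
                best, best_score = c, score
        return best

    return classify

# Tiny invented data set: (Brand, Color) -> stolen Yes/No.
X = [("BMW", "Gray"), ("BMW", "Gray"), ("Toyota", "Blue"), ("Toyota", "Gray")]
y = ["Yes", "Yes", "No", "No"]
clf = train_nbc(X, y)
print(clf(("BMW", "Gray")))    # "Yes"
```

The classifier simply multiplies the prior by one m-estimated conditional per attribute and returns the class with the largest product, exactly as in equation 2.6.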



Chapter 3

Mega-Trend Diffusion

3.1 Introduction to the MTD

Insufficient data in business intelligence breeds uncertain knowledge, which can lead to poor decision making and potentially heavy losses for a company or an institution. To make matters worse, collecting sufficient data can incur large expenses for a company in terms of time and money. In some cases, obtaining enough real data is not always possible. Consequently, the obtainable data may very often be incomplete as a training set for classifiers (or other models, for that matter).

In this chapter, a method proposed by Li, Wu, Tsai, and Lina [2007a] called the mega-trend diffusion (MTD) technique is introduced as a tool to address the issue of insufficient data for training classification models; in our case, for training the naive Bayes classifier. In Li et al. [2007a] the MTD is used to generate artificial samples from small datasets to aid the learning of a modified back-propagation neural network (BPNN) for an early manufacturing system. The results obtained by Li et al. [2007a] were promising, as the artificial sample set, in addition to the available training set, significantly improved the learning accuracy in the simulation model, even with a very small dataset.

Li, Lin, and Huang [2009a] also explored the construction of a marketing decision support system (DSS) using the MTD in a case study of gas station diversification in Taiwan. In their case study, the MTD is used in conjunction with the BPNN and a Bayesian network (BN). The MTD uncovered additional hidden data-related information which was not explicitly available from the dataset itself. This allowed the construction of a flexible and informative DSS given a small dataset,


making it possible for marketing managers to have a better overview of the market, and to find potential niche markets they can venture into while avoiding unprofitable ones.

Two setbacks of having small datasets are the gaps of sparse data and the difficulty of identifying a trend. The MTD addresses these issues by filling the gaps and estimating the trend in the data. It does this by assuming the artificial samples selected are located within a certain range and have possibility indexes calculated from a membership function (MF) [Li et al., 2009b]. Simply put, this method extracts more information from available data by creating new relevant attributes. In this paper, we venture into the possibility of integrating the MTD with the NBC to form a classification model which has reduced estimation errors (and thus better forecasting precision) for cases where the collected data is insufficient.

3.2 Constructing the MTD

Figure 3.1: The mega-trend diffusion (MTD) technique (membership function over the range a < min < μset < max < b).

In the MTD technique, the expected population mean is assumed to be located between the minimum and maximum of the sample dataset. It is then natural to expect the true population mean to be located within a wider boundary (a, b), which is bigger than the boundary [min, max] of the collected data, as shown in Figure 3.1. Consequently, a sufficient sample set would have data points distributed within this larger range (a, b). The parameters a and b are calculated using the diffusion functions defined below.

3.2.1 Defining parameters a and b

In the first step of implementing the MTD, let min and max be defined as the minimum and maximum values of the dataset respectively. Then μset is the average of min and max. NL and NR refer to the number of data points smaller than and larger than (or equal to) μset respectively, implying the dataset has NL + NR data points. Then SkewL and


SkewR are the ratios of data points that are smaller than, and larger than or equal to, μset respectively. In the MTD technique, this is a simplified way of calculating the skewness of the data distribution. Furthermore, the collected datasets are usually small and a sample regression line is likely unreliable [Li et al., 2007a]. We then have the parameters a and b, the possible lower bound and the possible upper bound respectively, calculated as follows:

Equations 3.1 and 3.2 are known as the diffusion functions for setting a and b respectively. The natural logarithm ln(·) signifies the plausibility of the population mean being located within the boundary (a, b), while 10^-20 is a very small value implying how unlikely it is for the mean to be located at the boundaries [Li et al.,


2009a]. Note that using 1 would imply the most likely point, like μset; 0 cannot be used, as the resulting negative infinity would make the argument invalid. Another point to note is that as NL and NR tend to ∞ (i.e. implying that sufficient data has been collected), the value of a may exceed min while b may fall below max. Intuitively this is incorrect, as with sufficient data we would expect a larger diffused range (a, b). Hence, if this occurs, the values of a and b are assigned to min and max respectively, as in equations 3.1 and 3.2.
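As a sketch, the computation of the diffusion bounds can be written out as follows. This Python illustration reproduces the diffusion functions as given in Li et al. [2007a]; the use of the sample variance below is an assumption and should be checked against equations 3.1 and 3.2:

```python
import math
import statistics

def mtd_bounds(data):
    """Diffused lower/upper bounds (a, b) of the MTD technique.

    Follows the diffusion functions of Li et al. [2007a]; the variance term
    here is an assumption to verify against equations 3.1 and 3.2. Assumes
    data points fall on both sides of the midpoint.
    """
    lo, hi = min(data), max(data)
    u_set = (lo + hi) / 2.0                      # midpoint of the observed range
    n_l = sum(1 for x in data if x < u_set)      # points below u_set
    n_r = sum(1 for x in data if x >= u_set)     # points at or above u_set
    skew_l = n_l / (n_l + n_r)
    skew_r = n_r / (n_l + n_r)
    s2 = statistics.variance(data)               # sample variance
    spread = -2.0 * math.log(1e-20)              # ln(1e-20): near-zero possibility at the bounds
    a = u_set - skew_l * math.sqrt(spread * s2 / n_l)
    b = u_set + skew_r * math.sqrt(spread * s2 / n_r)
    # If diffusion fails to widen the observed range, fall back to min/max.
    a = min(a, lo)
    b = max(b, hi)
    return a, b

a, b = mtd_bounds([10.0, 12.0, 13.0, 15.0, 20.0])
print(a < 10.0, b > 20.0)    # the diffused range strictly contains the data range
```

The final two assignments implement the fallback described above: whenever the diffusion fails to widen the observed range, a and b revert to min and max.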

3.2.2 Defining the membership function

Figure 3.2: The triangular membership function. Here the membership value Mi w.r.t. the random number bi reflects the possibility of bi occurring.

The MTD technique assumes that real data is likely to be distributed in the (a, b) range. So the next step in the technique is to create artificial datasets by randomly sampling from this wider bounded range. This random sampling adds additional data, a random number bi, to the collected dataset to increase the sample size. This random number between a and b is drawn from a uniform distribution. A unique feature of the MTD technique is that a membership function (MF) is employed to make the additional data more useful. This MF produces the membership values that indicate the possibility of bi being the true mean. In this paper, for simplicity, we are using a triangle-shaped function, as shown in Figure 3.2; however, the MF can be any other uni-modal function that shows decreasing possibility [Li et al., 2009a]. Let the membership value Mi of the random number bi be defined as:

Mi = (bi − a) / (μset − a)   if a ≤ bi ≤ μset,
Mi = (b − bi) / (b − μset)   if μset < bi ≤ b.

A point to note is that each class j has its own MF for each attribute i. This is clearly represented in Figure 3.3, where Mij is the MF of class j with respect to attribute i. This then raises the question of overlap of the areas under the MFs of each class with respect to the same attribute. In this paper the issue of overlap is not addressed in our case study, because the additional steps make the processing more complicated; here we only explain very simply the meaning of high and low overlap. If the overlapping area of the MFs of two classes in attribute i is low, this implies that attribute i is an informative classification indicator [Li and Liu, 2012], as any data point with attribute i can easily be classified to the correct class. Conversely, if the overlapping area is high, the chance of classifying the data point to the correct class is decreased. Should the reader wish to learn more about this issue, the paper by Li and Liu [2012] explains the problems faced and proposes solutions for constructing attributes in the MTD technique, and also includes results of the modified MTD technique on different classifiers.


Returning to the MTD technique: after defining the wider range (a, b) and the MF for each class j with respect to attribute i, the virtual sets (synonymous with artificial sets) are constructed by generating more random numbers bi for each class j using their respective MFs, as explained above. The number of virtual samples needed may vary for different classification methods and requires further research (generally, 100 artificial samples suffice to train the NBC classifier). After constructing the virtual set and its corresponding membership values, the membership values of the training set are then computed. The virtual sets and training sets, along with their membership values, are then used to train the NBC.
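The construction of a virtual set for a single attribute of a single class can be sketched as follows. The function and column names are our own, and we assume the widened range (a, b) and the peak u have already been estimated as in Chapter 3.

```r
# Draw n virtual samples bi uniformly from the widened range (a, b)
# and attach their triangular membership values.
make.virtual.set <- function(n, a, b, u) {
  bi <- runif(n, min = a, max = b)
  mi <- ifelse(bi <= u,
               (bi - a) / (u - a),   # rising edge
               (b - bi) / (b - u))   # falling edge
  data.frame(value = bi, membership = mi)
}

set.seed(42)
vs <- make.virtual.set(100, a = 0, b = 10, u = 5)
head(vs)  # 100 rows: one virtual sample and its membership value per row
```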


[Figure 3.3: The membership functions Mi1 and Mi2 of classes 1 and 2, respectively, in attribute i.]

Chapter 4

The case of building additional Day Care Centres

4.1 Introduction to data

The data used in this study is from the module IB3A70 The Practice of Operational Research, and is provided by the County Councils Network (CCN), a Special Interest Group within the Local Government Association (LGA). The CCN is responsible for making sure that funds from the government are distributed fairly among the District Authorities to facilitate day-to-day services. They are particularly keen to address districts that are prone to high demand for the delivery of their services. These services may include transportation between homes and day care centres, day care services in patients' homes, and services in the centres themselves.

They wish to be able to identify districts that will potentially be in high demand for their day care services, from surveys and studies of the population of a particular district. The idea is to build additional day care centres, where required, in these potential districts so as to prevent a backlog when a surge of demand occurs, as a backlog would result in even higher cost, overworked workers and an unhappy population. The usual problem faced is that collecting sufficient data on every district in the country requires many man-hours and much effort. To help tackle this issue and aid their decision process, this thesis suggests the use of statistical methods such as the naive Bayes classifier and the mega-trend diffusion technique for their decision support system.

The data provides a sample of 72 districts in the UK, out of the total of 352 districts; we are interested in classifying these districts based on the attributes that correspond to the demand for day care services. To date, the 72 districts have been through surveys and customer reviews that led to them being categorised as districts with low, neutral or high demand for day care services. The CCN hopes to classify the other 281 districts with the limited amount of information they are able to collect in the limited amount of time they are allocated, so that immediate action can be taken, as building more day care centres can take months.

This thesis aims to apply what has been learnt in the previous chapters on naive Bayesian classifiers, discretisation using the minimum description length principle, and the mega-trend diffusion technique. In this chapter, Section 4.2 goes through the methodology used to implement the NBC, the MDL principle and MTD for the dataset. In Section 4.3, the computed results are presented and discussed. Lastly, some issues encountered during this research are highlighted in the final section, 4.4.

4.2 Methodology

This thesis has thus far proposed the naive Bayesian classifier as a classification model to be used as an aid in the marketing decision processes of companies and/or governments. We also introduced the MTD technique as a solution to the problem of limited sample sets, increasing sample sizes through virtual sets while also doubling the number of attributes by including a corresponding membership value for each attribute. The generated sets are in turn used to train the NBC. Additionally, we considered the discretisation of continuous variables using the MDL principle as an intermediate step in improving the NBC. The following steps were therefore taken in our data analysis:

1. Initialisation: (a) Import the data. (b) Discard redundant factors. (c) Load the NBC and MDL packages in R. (d) Define functions for MTD in R.

2. Raw evaluation: (a) Define the training set. (b) Use the training set to train the NBC. (c) Test the NBC on the full dataset and present the results.

3. Implement MDLP and evaluate: (a) Discretise the training set using MDLP. (b) Use the discretised training set to train the NBC. (c) Test the NBC on the full dataset and present the results.

4. Implement MTD and evaluate: (a) Create virtual samples from the training set using the MTD technique. (b) Use the virtual sets and the training set, with their membership values, to train the NBC. (c) Test the NBC on the full dataset and present the results.

5. Implement MTD then apply MDLP and evaluate: (a) Create virtual samples from the training set using the MTD technique. (b) Discretise both the virtual and training sets using MDLP. (c) Use the discretised virtual sets and training set, with their membership values, to train the NBC. (d) Test the NBC on the full dataset and present the results.

In the following sections, we describe the above steps in detail.

4.2.1 Initialisation

The first step is to import the dataset into R for building the classifier. In total, the collected data contains entries for 72 districts with 5 attributes and a corresponding class (excluding the district names). These 5 attributes are believed by the CCN to be the salient predictors of the demand of a district. We took 20 random data points as the training set for the classifier and the remaining data points as the test set to examine the effectiveness of the classifier.

The original data collected included important as well as redundant details for each district. Through rational analysis and discussions with the decision makers [French et al., 1997], redundant factors were identified and discarded; Table 4.1 lists the attributes believed to be the best tell-tale signs of a district potentially having high, neutral or low demand for day care services.

Attribute        Description
Demand (Class)   Indicator of the potential demand of a district
A1               Elderly population 65+
A2               Sum of radial distance
A3               Sum of nearest-neighbour distance
A4               Day care calls per day
A5               Potential clients need index

Table 4.1: List of the main factors found in the investigation which help identify the potential demand for day care services in each district.



An excerpt of the 20-point training set from the collected data for this case is shown in Table 4.2. The dataset contains 5 continuous numerical attributes, A1, ..., A5. The training set also includes a characteristic attribute "Service Demand" (or Demand) that assesses the potential demand for day care services of a particular district: High means high potential demand, Low means low potential demand, and Neutral means neither high nor low demand is expected. Since the CCN is paying attention to the potential demand of a district, we will define "Demand" as the classification variable, and the other attributes as the input or predictor variables.

In this thesis, the statistical program R is used throughout for most of the computation and training of the NBC, the MDL principle and MTD. Software packages for the naive Bayesian classifier as well as the minimum description length principle can be found online at http://cran.r-project.org/web/packages/e1071/index.html (e1071 package) and http://cran.r-project.org/web/packages/discretization/index.html (discretization package), respectively. Since they are readily available, it is unnecessary to code new functions in R for the NBC or the MDL principle, as the available functions are sufficient. As mentioned in Chapter 2, these packages are loaded using the following command lines in R:
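Assuming both packages have been installed from CRAN (via install.packages), the loading commands would be:

```r
library(e1071)           # provides naiveBayes() and its predict() method
library(discretization)  # provides mdlp()
```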

The mega-trend diffusion technique is relatively new and has yet to be widely used for training the NBC with small datasets, particularly for marketing decisions or purposes, and there are no freely available packages for R online at the moment. Hence, new functions have to be coded in R implementing the algorithms of the MTD technique (more in the next section).

Now that we have prepared the data, the training sets and the relevant functions required to run the NBC, MTD and the MDL principle in R, we can proceed to build the NBC using the training set. We study the effectiveness of the NBC on the full dataset when the NBC is trained using four different approaches: (1) The raw, unmodified training set is used to train the NBC. (2) Only MDLP is applied to discretise the training set, which is then used to train the NBC. (3) Only the MTD technique is used on the training set to create artificial sets and new attributes (membership values), which are then used to train the NBC. (4) Both MTD and MDLP are implemented: MTD is first used to create virtual sets and new attributes, then MDLP is applied to the resulting training set, which is then used to train the NBC.

4.2.2 (1) Raw evaluation using the NBC

In this step, the 20 randomly chosen training data points are used to train the NBC, without any manipulation of the data. The training data train.dcs.dat is input into the function naiveBayes, which computes the conditional posterior probabilities of the class variable (Service Demand) given the predictor variables (A1, ..., A5) using Bayes' rule (2.5). Table 4.2 shows the training set with its three class subsets: high, neutral, low. The function predict can then be called to predict the class of the data points in the training and test data in dcs.dat (which contains 72 entries). The output is a matrix displaying the number of data points predicted by the NBC in each class against the actual resulting class; the experimental results are presented in Section 4.3.
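As a self-contained sketch of this step, the data frame below is a hypothetical stand-in for dcs.dat, since the CCN data is not reproduced here; in the thesis, the training set is the 20-point subset train.dcs.dat.

```r
library(e1071)

# Hypothetical stand-in for the CCN data: 72 districts, 5 numeric
# attributes, 3 demand classes (only A1 is made informative here).
set.seed(1)
dcs.dat <- data.frame(
  Demand = factor(rep(c("High", "Neutral", "Low"), each = 24)),
  A1 = c(rnorm(24, 8), rnorm(24, 5), rnorm(24, 2)),
  A2 = rnorm(72), A3 = rnorm(72), A4 = rnorm(72), A5 = rnorm(72)
)

# A 20-point training subset covering all three classes.
train.dcs.dat <- dcs.dat[c(1:7, 25:31, 49:54), ]

# Train the NBC: Demand is the class, A1..A5 the predictors.
nbc <- naiveBayes(Demand ~ ., data = train.dcs.dat)

# Predict every district in the full dataset and tabulate
# predicted versus actual class.
pred <- predict(nbc, dcs.dat)
table(Predicted = pred, Actual = dcs.dat$Demand)
```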

4.2.3 (2) Implementing MDLP only

For this step, the training data is modified using the MDL principle. The idea of the MDL principle is that regularity found in the data is used to compress the data [Grünwald, 2005]. The MDLP simply states that the best hypothesis is the one with the minimum description length [Kotsiantis and Kanellopoulos, 2006]. One may find the entropy minimisation discretisation (EMD) method synonymous with the minimum description length principle, because the MDL principle is an entropy-based method that uses binary discretisation, determined by choosing the cut point for which the entropy is minimal. The method considers a large interval containing all the known values of an attribute, and binary discretisation is applied recursively to partition this interval into smaller sub-intervals, always selecting the cut point with minimal entropy, until a stopping criterion such as the MDLP is met (or an optimal number of intervals is achieved). The MDL measure we are using is incorporated in the function mdlp in R. This function is called to discretise the continuous attributes in the training data matrix "train.dcs.dat" using the entropy criterion, where the minimum description length criterion is used as the stopping rule to halt the discretisation process [Yang and Webb, 2002]. Table 4.3 shows the discretised continuous attributes in the training set. The resulting discretised training dataset is then used for classification learning of the NBC, as before.
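A minimal, self-contained sketch of this call follows; the two-column toy data frame stands in for train.dcs.dat, and mdlp expects the class label in the last column.

```r
library(discretization)

# Toy training set with one clearly bimodal continuous attribute;
# the class label must be the last column for mdlp().
set.seed(1)
toy.train <- data.frame(
  A1     = c(rnorm(10, mean = 2), rnorm(10, mean = 8)),
  Demand = factor(rep(c("Low", "High"), each = 10))
)

out <- mdlp(toy.train)
out$cutp       # cut point(s) chosen by the entropy/MDL criterion
out$Disc.data  # discretised data: A1 replaced by interval labels
```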

The availability of the mdlp function is convenient, as we do not need to write a new function to run the discretisation. However, the drawback of using a readily available software package is that we are limited in flexibility. The function does not allow users to specify the number of bins that the MDLP algorithm splits the data into; this may pose an issue in our study and will be highlighted in a later section. A workaround may be to write new functions that fit our needs. However, this may be time-consuming, and we assume that the available mdlp function is sufficient for this thesis. Further research may be required to devise new functions tailored to specific needs.

4.2.4 (3) Implementing MTD only

For this step, we implemented the MTD technique described in Chapter 3 on the training data, without discretisation. As in Chapter 3, we require the estimation of the domain range (a, b) of each attribute with respect to each class, so that ns samples can be randomly produced within this range to be defined as the virtual set. However, it is not right to train the NBC directly on these virtual sets, because they are randomly chosen (from a uniform distribution) and do not represent the complete information of the attribute's data distribution [Li et al., 2007a]. Hence, we require a membership function (or diffusion function) Mi which computes, for each of the random samples, a corresponding membership value reflecting the significance of each randomly chosen sample. In addition to the virtual sets, membership values are also calculated for the training data. The detailed steps for calculating the domain range (a, b) and the membership function Mi can be found in Chapter 3 and will not be repeated here.

As highlighted above, the MTD technique is relatively new, and therefore no freely available software package for R could be found that runs the algorithm of the MTD technique. Hence, we were required to code our own functions implementing the algorithm of the MTD technique, as explained in Chapter 3. The formulation of the main functions coded for use in this thesis can be found in Appendix A. These functions are:

> MTD.mem.for.training.dat.w.class(data)
> MTD.final(data, gen.num=1)

Here, data is the data.frame matrix to which we wish to apply the MTD technique, which will generate membership values for the corresponding attributes, while gen.num is the number ns of virtual data points that will be generated for each class in the data.frame, each with a corresponding membership value. The difference between the two functions is that the first does not generate additional virtual data points, whereas the second does.

Though further research is required, various published papers and our own experimentation suggest that roughly 100 virtual samples per class suffice to train the NBC. Accordingly, we generate 100 random artificial data points along with their corresponding membership values. The membership values of the training set are also calculated, and both the training set and the virtual sets are then input into the NBC function for classification learning. Once done, we produce a table of the prediction results and compare them with the actual classes, as before. An excerpt of the created virtual dataset can be seen in Table 4.4.
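Since the actual MTD.final code is listed in Appendix A, we give only a simplified re-implementation here to fix ideas. Everything in it is our own stand-in: the (a, b) estimate is a crude min/max widening rather than the full Chapter 3 formulae, and the class label is assumed to be in the last column.

```r
# Simplified MTD generation: for each class and attribute, widen the
# observed range, draw gen.num uniform samples, and attach triangular
# membership values peaking at the class-attribute mean.
MTD.sketch <- function(data, gen.num = 100) {
  class.col <- ncol(data)  # class label assumed in last column
  lapply(split(data, data[[class.col]]), function(sub) {
    virt <- lapply(names(sub)[-class.col], function(attr) {
      x <- sub[[attr]]
      u <- mean(x)
      a <- min(x) - sd(x)   # crude widened lower bound
      b <- max(x) + sd(x)   # crude widened upper bound
      bi <- runif(gen.num, a, b)
      mi <- ifelse(bi <= u, (bi - a) / (u - a), (b - bi) / (b - u))
      data.frame(value = bi, membership = mi)
    })
    names(virt) <- names(sub)[-class.col]
    virt
  })
}

set.seed(7)
toy  <- data.frame(A1 = rnorm(10, 5), Class = factor(rep(c("H", "L"), 5)))
virt <- MTD.sketch(toy, gen.num = 100)
nrow(virt$H$A1)  # 100 virtual samples for class H, attribute A1
```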

4.2.5 (4) Implementing MTD then applying MDLP

This step combines steps 3 and 4: applying the MTD technique on the training set and then using the MDL principle to discretise the data. The resulting modified data is then used for classification learning, as in step 2. Note that because the computed membership values MVAi lie in the range [0, 1], we do not apply discretisation to such a small interval, as discretisation would make these membership values redundant. In order to compute the desired results, we need to discretise only the continuous attributes Ai and leave the membership values MVAi unmodified. To compute this step, we use the help of the function zipFastener, found online [Appendix B], and write a function mdlp.dat.only.
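The attribute-only discretisation can be sketched as follows; this is our own simplified stand-in for mdlp.dat.only (the actual function, built with zipFastener, is in Appendix B). The column names and toy data are hypothetical.

```r
library(discretization)

# Discretise only the named continuous attribute columns; leave the
# membership-value columns untouched. The class column is passed by name.
mdlp.attrs.only <- function(data, attr.cols, class.col) {
  disc <- mdlp(data[, c(attr.cols, class.col)])$Disc.data
  data[, attr.cols] <- disc[, attr.cols]
  data
}

set.seed(1)
d <- data.frame(
  A1    = c(rnorm(10, 0), rnorm(10, 6)),  # continuous attribute
  MVA1  = runif(20),                      # membership values in [0, 1]
  Class = factor(rep(c("Low", "High"), each = 10))
)

d2 <- mdlp.attrs.only(d, "A1", "Class")
identical(d$MVA1, d2$MVA1)  # TRUE: membership values are unmodified
```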

4.3 Results

In the original dataset, there are 25 data points belonging to class High, 22 data points to class Low, and 25 data points to class Neutral. The results show a slight improvement in classification performance when discretisation of the variables is applied or the mega-trend diffusion technique is utilised. The average accuracy of the NBC without any modification of the attributes or additional virtual datasets is found to be 79.45%. The average accuracy once discretisation by MDLP is applied increases to 80.96%, while applying the MTD technique results in an average accuracy of 80.24%. Based on these experimental results, the CCN can use the NBC to classify other districts in the UK and identify areas where they should take immediate action to build more day care centres or provide additional staff, etc., before the demand escalates, leading to difficulties satisfying it. The MDLP technique seems to be useful when the data collected is numerous or complicated, as the MDLP compresses the data to a manageable scale for training the NBC. Not only does this speed up the model build time, it also helps improve the classification accuracy somewhat. Additionally, we have seen that the MTD technique improved the NBC compared to not applying MTD. The CCN will find the MTD technique a useful tool when it comes to making decisions based on limited sample data, in this and other problems.

Kaya [2008] studied popular discretisation techniques, one of which involves an entropy-based method similar to the MDLP used in this thesis. Like us, Kaya [2008] applied the discretisation techniques to the naive Bayes classifier and also found that the model build time and overall classification accuracy showed significant gains.

The paper by Li, Lin, and Huang [2009a] compared the classification accuracy of two learning tools, namely the back-propagation network (BPN) and Bayesian networks, from the original dataset to virtual sets of different sizes. Their results show an overall improvement in classification performance when these artificial sample sets are utilised. For their experiment on the BPN, going from the original dataset of 21 data points to virtual sets of size 84 led to an increase in average accuracy from 62% to 90%. For their test on the BN, the average accuracy of the classification process rose from 57% to 94%.

Based on the research in this and other related papers, we are inclined to agree that artificial sample sets do contribute to the improvement of the classification performance of the NBC. It is worth noting, though, that this additional data is not the same as the real data, but generated data drawn from the population estimate through the employment of the MTD technique.

However, we note from the last three columns of Table 4.5 that the experimental results for the final method, applying the MTD technique and then the MDLP on the training set, are rather poor. This result, and the suspected reasons for it, are further discussed in the following section, 4.4.

4.4 Issues encountered

As highlighted briefly in the previous section, the final method of applying the MTD technique and then the MDLP on the training sets returned predictions that were obviously unacceptable. Some class predictions were nearly 100% wrong, and others were not far behind. Unfortunately, many days of debugging and testing on different datasets yielded similarly incorrect results. Corrections and revisions of the functions were made extensively, but we can only attribute the cause of this prediction error to the MDLP function in R, as well as some possible unreliability issues in the MTD functions written for this thesis.

We mentioned previously that the function implementing the MDL principle is prewritten, and this led to the problem of flexibility, in terms of the user not being able to define the bin size required to compress the data during discretisation. Because the function chooses the number of bins itself, this could have led to an over-compression of the data. Over-compressing the dataset can cause a loss of critical information that is required both for the MTD technique and in the classification learning process. This is suspected to be the cause of the inconsistent results.

Furthermore, the MTD functions devised for this thesis had not been tested rigorously prior to their use in this study. Although they performed without problems on their own, it may be that the functions were not coded to handle the results from the MDLP as input properly. We therefore suggest that further study and proper evaluation of the functions is required, so that they perform properly when applying both the MDLP and MTD methods together.


Chapter 5

Conclusion and further research

In this thesis we studied how classifiers can be applied practically in the marketing context. Initially, we introduced the concept of a classifier and Bayes' theorem. We then showed how a normal classifier converts to a naive Bayesian classifier when posterior probabilities are used as the discriminant function, along with the important underlying assumption that the attributes are independent given the class. Although many studies agree that the NBC forms an efficient classification model, we also highlighted its drawbacks and looked into two methods of optimising the NBC suggested by Martinez-Arroyo and Sucar [2006], namely discretisation and structural improvement. For the discretisation process we learnt about the MDL principle, an entropy-based method of compressing a set of continuous data based on any regularity, which we applied to the experimental dataset of this thesis. We also briefly looked into methods for the structural improvement of the NBC, including the elimination, combination and modification of attributes to address any dependencies.

Sample sets are critical for the classification learning process (or machine learning, for that matter) in a DSS, so that organisations can make rational and correct decisions. Unfortunately, insufficient sample sets are a recurring problem, especially for small or new organisations. The reasons are usually the high cost and the time needed to gather sufficient data. This issue was addressed in this thesis with the proposed use of the mega-trend diffusion technique. The MTD technique is a systematic method of acquiring additional "hidden" relevant information which is not explicitly provided by the data itself. This thesis detailed the steps of establishing the membership function and the virtual sets required in the MTD technique.


In this thesis, the naive Bayes classifier, the proposed discretisation method and the MTD technique are applied to a real-world problem of deciding whether to build more day care centres in the districts of the UK. The main contribution of this thesis is to propose the integration of the MDLP discretisation method and (or) the MTD technique with the NBC to produce a simple classifier that works well with limited data. The experimental results show that applying discretisation and the MTD technique (independently) to the sample set yields promising classification results compared to an unmodified sample set used to train the NBC. In addition to the experimental results obtained in this thesis, we also featured results from other research papers of a similar nature.

We did, however, encounter unsatisfactory results (as shown in the last three columns of Table 4.5) when we used virtual sets generated by the method that combines both the MDLP and the MTD technique to train the classifier. The initial diagnosis suggests that the problem may be due to the function that implements the MDL principle. It is believed that the readily available MDLP function is inflexible, in the sense that limited input variables are allowed: we were only able to input the dataset to be discretised, but unable to define the bin size into which the algorithm splits the data during the discretisation process. Consequently, the function decides the number of bins it uses to split the data itself, which may have led to over-compression of the data and the loss of relevant information. Additionally, the functions for the MTD technique were only formulated for use in this research and have not had the opportunity to go through rigorous testing or debugging. The MTD functions were suspected to be incapable of handling the results from the MDLP properly, hence the inconsistency in the results.

This thesis has studied the implementation of the discretisation of attributes for building the NBC, as well as the employment of the MTD technique on limited sample sets for use in NBC learning. Further research is needed regarding the integration of both the discretisation and the MTD procedures. A replication of this research with improved algorithms and functions for the computation step may lead to better results. Alternatively, additional research is needed to examine the possibility of implementing the MTD technique for use with other structures of learning algorithms, apart from the Bayesian classifier. In the marketing domain, where the scarcity of data required for decision making poses an issue, the MTD technique might just be the answer that marketing managers and decision makers need. Hence the MTD technique itself forms an interesting topic for further study.
