Data mining is a computerized technology that uses different types of algorithms to find relationships and trends in large datasets in order to support decision making. Because the volume of data accumulated in various fields is increasing exponentially, data mining techniques that extract information from large amounts of data have become popular in commercial and scientific domains, including sales, marketing, customer relationship management, and quality management. The aim of this paper is to judge the performance of different data mining classification algorithms on various datasets. The performance analysis depends on many factors: the test mode, the differing nature of the datasets, and the size of the dataset.

Classification maps data into predefined classes and is often referred to as supervised learning, because the classes are determined before examining the data. During the evaluation, the input datasets and the number of classifiers used are varied to measure the performance of the data mining algorithms.

The datasets vary mainly in the type of class attribute, which is either nominal or numeric. I present the results for the performance of different classifiers based on characteristics such as accuracy and time requirement, in order to identify their characteristics in the widely known data mining tool WEKA.

## II. Related Work:

I studied various articles regarding performance evaluation of data mining algorithms on different tools; some of them are described here. Osama Abu Abbas worked on clustering algorithms, Abdullah compared various classifiers with different data mining tools, and Mahendra Tiwari & Yashpal Singh evaluated the performance of 4 clustering algorithms on different datasets in WEKA with 2 test modes. I present their results as well as the tools and datasets used in performing the evaluation.

Osama Abu Abbas, in his article "Comparison between data clustering algorithms", compared four different clustering algorithms (K-means, hierarchical, SOM, EM) according to the size of the dataset, the number of clusters, and the type of software. Osama tested all the algorithms in LNKnet, public domain software made available from MIT Lincoln Lab at www.li.mit.edu/ist/lnknet. For analyzing data from different datasets, he used software located at www.rana.lbl.gov/Eisensoftware.htm. The dataset used to test the clustering algorithms and compare them was obtained from the site www.kdnuggets.com/dataset. This dataset is stored in an ASCII file of 600 rows and 60 columns, with a single chart per line.

## Relationship between number of clusters and the performance of algorithm

## The effect of data type on algorithm

Mahendra Tiwari & Yashpal Singh, in their article "Performance Evaluation of Data Mining clustering algorithm in WEKA", evaluated the performance of 4 different clusterers (DBSCAN, EM, Hierarchical, K-means) on different datasets in WEKA with 2 test modes (full training data and percentage split). They used 4 datasets for evaluation with clustering in WEKA. Two of them, the Zoo dataset and the Letter Image Recognition dataset, are from the UCI data repository; the other two, the labor dataset and the supermarket dataset, are built into WEKA 3-6-6. The Zoo and Letter Image Recognition datasets are in CSV file format, and the labor and supermarket datasets are in ARFF file format.

## Details of Datasets:

## Evaluation of clusterer on Letter image recognition dataset

Evaluation of clusterer on Letter image recognition dataset with Full training data test mode.

Evaluation of clusterer on Letter image recognition dataset with Percentage split test mode.

## III. Data Mining Classification algorithm:

Classification maps data into predefined classes and is often referred to as supervised learning, because the classes are determined before examining the data. A classification algorithm uses a training dataset to build a model such that the model can be used to assign unclassified records to one of the defined classes. A test set is used to determine the accuracy of the model. Usually, the given dataset is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Various classifiers are efficient and scalable variations of decision tree classification. The decision tree model is built by recursively splitting the training dataset based on an optimal criterion until all records belonging to each of the partitions bear the same class label. Among the many classification methods, decision trees are particularly suited for data mining, since they are built relatively fast compared to other methods while obtaining similar or often better accuracy.
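As a rough illustration of what a splitting criterion can look like, the sketch below computes entropy-based information gain for one nominal attribute. This is an assumption made for illustration: the text does not state which criterion is used, and the toy records and the names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(records, attribute, class_key):
    """Entropy reduction obtained by partitioning the records on one attribute."""
    labels = [r[class_key] for r in records]
    partitions = defaultdict(list)
    for r in records:
        partitions[r[attribute]].append(r[class_key])
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(len(p) / len(records) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Hypothetical toy records, not taken from the datasets in this paper.
toy = [
    {"odor": "none", "edible": "Y"},
    {"odor": "none", "edible": "Y"},
    {"odor": "foul", "edible": "N"},
    {"odor": "foul", "edible": "N"},
]
gain = information_gain(toy, "odor", "edible")
```

A tree builder would evaluate such a score for every candidate attribute, split on the best one, and recurse on each partition until the partitions are pure.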

Bayesian classifiers are statistical classifiers based on Bayes' theorem; they predict the probability that a record belongs to a particular class. A simple Bayesian classifier, called the Naïve Bayesian classifier, is comparable in performance to decision trees and exhibits high accuracy and speed when applied to large databases.
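For reference, the rule such a classifier applies can be written as follows. The symbols $C$ (a class) and $x_1, \dots, x_n$ (the attribute values of a record) are notation introduced here for illustration and do not come from the text.

```latex
P(C \mid x_1,\dots,x_n) = \frac{P(C)\,P(x_1,\dots,x_n \mid C)}{P(x_1,\dots,x_n)},
\qquad
P(x_1,\dots,x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
\quad\text{(naive independence assumption)}.
```

The record is assigned to the class with the largest posterior probability; the denominator is the same for every class and can be ignored when comparing them.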

Rule-based classification algorithms generate if-then rules to perform classification. PART, OneR and ZeroR from the Rules group, IBk and KStar from the Lazy learners, and SMO from the Functions group are also used in the evaluation process.
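ZeroR is the simplest of these and serves as a baseline: it ignores all attributes and always predicts the most frequent class. The following is a minimal sketch of that idea, not WEKA's implementation. The function names are hypothetical, the labels `"majority"` and `"minority"` are placeholders, and the class counts (4208 of 8124) are those reported for ZeroR on EdibleMushrooms in Table 1. For simplicity the sketch scores the rule on the same data it was fitted on rather than cross-validating.

```python
from collections import Counter

def zero_r_fit(train_labels):
    """Return the most frequent class label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of positions where the predicted and actual labels agree."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Placeholder labels with the class counts reported for EdibleMushrooms.
labels = ["majority"] * 4208 + ["minority"] * (8124 - 4208)
majority = zero_r_fit(labels)
zero_r_accuracy = accuracy([majority] * len(labels), labels)
```

Any classifier worth using should beat this baseline, which is why ZeroR appears in every results table.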

## Evaluation Strategy/Methodology:

## H/W tools:

We conducted our evaluation on a Pentium(R) D processor platform with 1 GB of memory, the Windows XP Professional operating system, and 160 GB of secondary storage.

## S/W tool:

In all the experiments we used Weka 3-7-7 and looked at different characteristics of the applications, using classifiers to measure the accuracy on different datasets, the time taken to build models, etc.

The Weka toolkit is a widely used toolkit for machine learning and data mining that was originally developed at the University of Waikato in New Zealand. It contains a large collection of state-of-the-art machine learning and data mining algorithms written in Java. Weka contains tools for regression, classification, clustering, association rules, visualization, and data processing.

## Input Data sets:

Input data is an important part of data mining applications. The data used in my experiment is real-world data obtained from the UCI data repository. During the evaluation multiple data sizes were used. Each dataset is described by the types of its attributes, the type of its class (either nominal or numeric), and the number of instances stored within the dataset; the table also shows that all the selected datasets are used for the classification task. These datasets were chosen because they have different characteristics and address different areas.

## Experimental result and Discussion:

To evaluate the selected tool using the given datasets, several experiments are conducted. For evaluation purposes, two test modes are used: the k-fold cross-validation (k-fold cv) mode and the percentage split (holdout method) mode. The k-fold cv is a widely used experimental testing procedure in which the database is randomly divided into k disjoint blocks of objects; the data mining algorithm is then trained using k-1 blocks, and the remaining block is used to test the performance of the algorithm. This process is repeated k times, so that each block serves once as the test set. At the end, the recorded measures are averaged. It is common to choose k=10, or another value depending mainly on the size of the original dataset.
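The procedure can be sketched in a few lines. This is a plain, unstratified version written for illustration and may differ in detail from what WEKA does internally. The names `k_fold_indices` and `cross_validate` and the `evaluate` callback are hypothetical.

```python
import random

def k_fold_indices(n_instances, k, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)       # random assignment to blocks
    folds = [indices[i::k] for i in range(k)]  # k disjoint blocks
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_instances, k, evaluate):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_instances, k)]
    return sum(scores) / len(scores)

# Example: 10 folds over a dataset of 300 instances.
splits = list(k_fold_indices(300, 10))
```

Here `evaluate` stands for whatever trains a model on the training indices and returns a score on the test indices; every instance is used for testing exactly once.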

In percentage split, the database is randomly split into two disjoint datasets. The first set, from which the data mining system tries to extract knowledge, is called the training set. The extracted knowledge is then tested against the second set, which is called the test set. It is common to use 66% of the objects of the original database as the training set and the rest of the objects as the test set. Once the tests are carried out using the selected datasets and the available classification and test modes, the results are collected and an overall comparison is conducted.
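A minimal sketch of such a split is given below. It assumes that the training-set size is rounded to the nearest integer; the function name `percentage_split` and the `seed` parameter are hypothetical.

```python
import random

def percentage_split(n_instances, train_percent=66, seed=0):
    """Randomly split instance indices into disjoint training and test sets."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Assumption: the training size is rounded to the nearest integer.
    train_size = round(n_instances * train_percent / 100)
    return indices[:train_size], indices[train_size:]

train, test = percentage_split(8124)
```

Under this rounding assumption a 66% split leaves 2762 of 8124 instances and 102 of 300 instances for testing, which matches the test-set sizes reported in Tables 2 and 4.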

## Performance measures:

For each characteristic, I analyzed how the results vary when the test mode is changed. Our measures of interest cover the analysis of classifiers and clusterers on different datasets. The results are reported as correctly classified instances and incorrectly classified instances (for datasets with a nominal class value), correlation coefficient (for datasets with a numeric class value), mean absolute error, root mean squared error, relative absolute error, and root relative squared error after applying the cross-validation or percentage split method.
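The error measures listed above can be sketched for the numeric-class case as follows, using their usual textbook definitions. The relative measures compare the model against a baseline that always predicts the mean; here that mean is taken over the actual test values, which is an assumption and may differ in detail from the tool's own computation. The function name and the sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RAE, RRSE and the correlation coefficient."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for p, a in zip(predicted, actual))
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    # Baseline errors: always predict the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # A constant predictor has no variance; report a correlation of 0 for it.
    corr = cov / math.sqrt(var_p * base_sq) if var_p > 0 and base_sq > 0 else 0.0
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / base_abs,
        "rrse_percent": 100 * math.sqrt(sq_err / base_sq),
        "correlation": corr,
    }

# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
mean_only = regression_metrics(actual, [2.5, 2.5, 2.5, 2.5])
```

A predictor that always outputs the mean scores 100% on both relative measures, which is consistent with the 100% entries reported for ZeroR in the results tables.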

Different classifiers, namely rule-based (ZeroR, OneR, PART), tree-based (DecisionStump, J48, REPTree), function-based (SMO, MLP), Bayes (NaiveBayes), and lazy (IBk, KStar), are evaluated on four different datasets.

Two datasets (EdibleMushrooms and LandformIdentification) have a nominal class value, and the other two (CPUPerformance and RedWhiteWine) have a numeric class value.

Most algorithms can handle both types of datasets, with nominal and numeric class values. However, some algorithms can handle only one of the two: Bayes algorithms can classify only datasets with a nominal class, while linear regression and M5Rules can handle only datasets with a numeric class value.

For all datasets, the results with both test modes, i.e. k-fold cross-validation and percentage split, are almost the same.

## Details of datasets

All datasets have file type ARFF.

## 1. Dataset: EdibleMushrooms

Attributes: 22 nominal

Class: Nominal

Instances: 8124

This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. There is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for Poisonous Oak and Ivy.

Attribute Information:

1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s

2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s

3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y

4. bruises?: bruises=t, no=f

5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s

6. gill-attachment: attached=a, descending=d, free=f, notched=n

7. gill-spacing: close=c, crowded=w, distant=d

8. gill-size: broad=b, narrow=n

9. gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y

10. stalk-shape: enlarging=e, tapering=t

11. stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?

12. stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s

13. stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s

14. stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y

15. stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y

16. veil-type: partial=p, universal=u

17. veil-color: brown=n, orange=o, white=w, yellow=y

18. ring-number: none=n, one=o, two=t

19. ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z

20. spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y

21. population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y

22. habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d

Class: Edible = Y, N

## 2. Dataset: LandformIdentification

Attributes: 6 numeric

Class: Nominal – 15 values

Instances: 300

Missing values: None

This dataset contains satellite imaging data used for studying changes in terrain features on the Earth's surface. The goal is to correlate satellite measurements with terrain classification observations made by humans on the ground, so that changes in terrain can be tracked via satellite. The satellite data consists of numeric measurements of light intensity at six different wavelengths, which form the dataset attributes. The dataset contains 300 pixels of image data, which form the 300 instances in the dataset. For the LandformIdentification dataset, the terrain classifications are nominal values describing the 15 different terrain types listed below.

Attributes:

1. blue
2. green
3. red
4. nred
5. ir1
6. ir2

Classes:

1. Agriculture1
2. Agriculture2
3. Br_barren1
4. Br_barren2
5. Coniferous
6. Dark_barren
7. Deep water
8. Marsh
9. N_deciduous
10. S_deciduous
11. Shallow_water
12. Shrub_swamp
13. Turf_grass
14. Urban
15. Wooded_swamp

## 3. Dataset: CPUPerformance

Attributes: 6 numeric, 1 nominal

Class: Numeric

Instances: 209

Missing values: None

This dataset associates characteristics of CPU processor circuit boards with the processing performance of the boards.

Attributes:

1. Vendor: Nominal, 30 vendor names.

2. MYCT: Numeric, cycle time in nanoseconds, 17-1500.

3. MMIN: Numeric, main memory minimum in KB, 64-32,000.

4. MMAX: Numeric, main memory maximum in KB, 64-64,000.

5. CACH: Numeric, cache memory in KB, 0-256.

6. CHMIN: Numeric, channels minimum, 0-52.

7. CHMAX: Numeric, channels maximum, 0-176.

Class:

Performance: Numeric, relative processing power, 15-1238.

## 4. Dataset: RedWhiteWine

Attributes: 11 numeric, 1 nominal

Class: Numeric

Instances: 6497

Missing values: None

In its original form, this data came as two datasets, created using red and white wine samples. Here, these two datasets have been combined into one dataset. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

Attributes:

1 – fixed acidity, numeric

2 – volatile acidity, numeric

3 – citric acid, numeric

4 – residual sugar, numeric

5 – chlorides, numeric

6 – free sulfur dioxide, numeric

7 – total sulfur dioxide, numeric

8 – density, numeric

9 – pH, numeric

10 – sulphates, numeric

11 – alcohol, numeric

12 – R/W, nominal – R = red, W = white

Class:

quality (score between 0 and 10).

## IV. Evaluation of classifier on Data sets:

I evaluated the performance of various classifiers in two test modes, 10-fold cross-validation and percentage split, with different datasets in WEKA 3-7-7. The results of the evaluation are described here:

## Table 1: Evaluation of classifiers on EdibleMushrooms dataset with Cross-validation mode

Classifier model: Full training set.

| Classifier | Time taken to build model | Test mode | Correctly classified instances | Incorrectly classified instances | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0 seconds | Cross-validation | 4208/8124 (51.79%) | 3916/8124 (48.20%) | 0.4994 | 0.4997 | 100% | 100% |
| Rules-PART | 0.25 seconds | Cross-validation | 8124/8124 (100%) | 0/8124 (0%) | 0 | 0 | 0% | 0% |
| Decision table | 4.73 seconds | Cross-validation | 8124/8124 (100%) | 0/8124 (0%) | 0.0174 | 0.0296 | 3.4828% | 5.916% |
| Lazy-IBk | 0 seconds | Cross-validation | 8124/8124 (100%) | 0/8124 (0%) | 0 | 0 | 0.0029% | 0.003% |
| Bayes-NaiveBayes | 0.2 seconds | Cross-validation | 7785/8124 (95.82%) | 339/8124 (4.17%) | 0.0419 | 0.1757 | 8.3961% | 35.16% |
| Functions-SMO | 13.23 seconds | Cross-validation | 8124/8124 (100%) | 0/8124 (0%) | 0 | 0 | 0% | 0% |
| Trees-DecisionStump | 0.05 seconds | Cross-validation | 7204/8124 (88.67%) | 920/8124 (11.32%) | 0.1912 | 0.3092 | 38.29% | 61.88% |
| Trees-J48 | 0.06 seconds | Cross-validation | 8124/8124 (100%) | 0/8124 (0%) | 0 | 0 | 0% | 0% |

## Table 2: Evaluation of classifiers on EdibleMushrooms dataset with Percentage split mode

| Classifier | Time taken to build model | Test mode | Correctly classified instances | Incorrectly classified instances | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0.02 seconds | Percentage split | 1410/2762 (51.05%) | 1352/2762 (48.95%) | 0.4995 | 0.5 | 100% | 100% |
| Rules-PART | 0.13 seconds | Percentage split | 2762/2762 (100%) | 0/2762 (0%) | 0 | 0 | 0% | 0% |
| Decision table | 4.75 seconds | Percentage split | 2762/2762 (100%) | 0/2762 (0%) | 0.02 | 0.03 | 4.04% | 6.17% |
| Trees-DecisionStump | 0.02 seconds | Percentage split | 2464/2762 (89.21%) | 298/2762 (10.78%) | 0.1902 | 0.3033 | 38.07% | 60.66% |
| Trees-J48 | 0.06 seconds | Percentage split | 2762/2762 (100%) | 0/2762 (0%) | 0 | 0 | 0% | 0% |
| Functions-SMO | 13.42 seconds | Percentage split | 2762/2762 (100%) | 0/2762 (0%) | 0 | 0 | 0% | 0% |
| Bayes-NaiveBayes | 0.02 seconds | Percentage split | 2625/2762 (95.03%) | 137/2762 (4.96%) | 0.0485 | 0.1922 | 9.70% | 38.44% |
| Lazy-IBk | 0 seconds | Percentage split | 2762/2762 (100%) | 0/2762 (0%) | 0 | 0 | 0.005% | 0.006% |

## Table 3: Evaluation of classifiers on LandformIdentification dataset with Cross-validation

| Classifier | Time taken to build model | Test mode | Correctly classified instances | Incorrectly classified instances | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0 seconds | Cross-validation | 20/300 (6.66%) | 280/300 (93.33%) | 0.1244 | 0.2494 | 100% | 100% |
| Rules-OneR | 0.02 seconds | Cross-validation | 208/300 (69.33%) | 92/300 (30.66%) | 0.0409 | 0.2022 | 32.85% | 81.06% |
| Rules-PART | 0.06 seconds | Cross-validation | 285/300 (95%) | 15/300 (5%) | 0.007 | 0.0809 | 5.6391% | 32.45% |
| Trees-DecisionStump | 0.02 seconds | Cross-validation | 40/300 (13.33%) | 260/300 (86.66%) | 0.1157 | 0.2405 | 92.94% | 96.41% |
| Trees-J48 | 0 seconds | Cross-validation | 292/300 (97.33%) | 8/300 (2.66%) | 0.004 | 0.0596 | 3.17% | 23.90% |
| Functions-SMO | 1.8 seconds | Cross-validation | 273/300 (91%) | 27/300 (9%) | 0.1157 | 0.2349 | 92.97% | 94.16% |
| Lazy-IBk | 0 seconds | Cross-validation | 297/300 (99%) | 3/300 (1%) | 0.0077 | 0.0378 | 6.2099% | 15.17% |
| Bayes-NaiveBayes | 0 seconds | Cross-validation | 297/300 (99%) | 3/300 (1%) | 0.0015 | 0.0347 | 1.19% | 13.92% |

## Table 4: Evaluation of classifiers on LandformIdentification dataset with Percentage split

| Classifier | Time taken to build model | Test mode | Correctly classified instances | Incorrectly classified instances | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0 seconds | Percentage split | 2/102 (1.96%) | 100/102 (98.03%) | 0.1252 | 0.2512 | 100% | 100% |
| Rules-OneR | 0.02 seconds | Percentage split | 55/102 (53.92%) | 47/102 (46.08%) | 0.0614 | 0.2479 | 49.08% | 98.65% |
| Rules-PART | 0.03 seconds | Percentage split | 99/102 (97.05%) | 3/102 (2.94%) | 0.0039 | 0.0626 | 3.13% | 24.92% |
| Trees-DecisionStump | 0.02 seconds | Percentage split | 6/102 (5.88%) | 96/102 (94.11%) | 0.1172 | 0.2447 | 93.66% | 97.41% |
| Trees-J48 | 0 seconds | Percentage split | 96/102 (94.11%) | 6/102 (5.88%) | 0.0085 | 0.0888 | 6.7888% | 35.35% |
| Functions-SMO | 2.03 seconds | Percentage split | 66/102 (64.70%) | 36/102 (35.30%) | 0.1162 | 0.236 | 92.85% | 93.91% |
| Bayes-NaiveBayes | 0 seconds | Percentage split | 99/102 (97.05%) | 3/102 (2.94%) | 0.004 | 0.0603 | 3.22% | 24.01% |
| Lazy-IBk | 0 seconds | Percentage split | 101/102 (99.01%) | 1/102 (0.98%) | 0.0099 | 0.039 | 7.90% | 15.51% |

## Table 5: Evaluation of classifiers on CPUPerformance dataset with Cross-validation

| Classifier | Time taken to build model | Test mode | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0 seconds | Cross-validation | -0.2486 | 88.189 | 155.49 | 100% | 100% |
| Rules-M5Rules | 0.2 seconds | Cross-validation | 0.9839 | 13.081 | 27.6918 | 14.83% | 17.80% |
| Trees-REPTree | 0.02 seconds | Cross-validation | 0.9234 | 25.56 | 59.81 | 31.26% | 38.46% |
| Trees-DecisionStump | 0.02 seconds | Cross-validation | 0.6147 | 70.91 | 121.94 | 80.41% | 78.41% |
| Lazy-IBk | 0 seconds | Cross-validation | 0.9401 | 20.92 | 56.70 | 23.73% | 36.46% |
| Lazy-KStar | 0 seconds | Cross-validation | 0.9566 | 13.52 | 46.41 | 15.33% | 29.84% |
| Functions-MLP | 6.39 seconds | Cross-validation | 0.9925 | 6.576 | 19.13 | 7.45% | 12.30% |
| Functions-LinearRegression | 0.03 seconds | Cross-validation | 0.9337 | 34.79 | 55.26 | 39.44% | 35.54% |

## Table 6: Evaluation of classifiers on CPUPerformance dataset with Percentage Split

| Classifier | Time taken to build model | Test mode | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0 seconds | Percentage split | 0 | 83.39 | 113.04 | 100% | 100% |
| Rules-M5Rules | 0.14 seconds | Percentage split | 0.9841 | 12.30 | 30.89 | 14.75% | 27.32% |
| Trees-REPTree | 0 seconds | Percentage split | 0.9334 | 29.51 | 44.83 | 35.39% | 39.66% |
| Trees-DecisionStump | 0 seconds | Percentage split | 0 | 69.53 | 112.13 | 83.38% | 99.18% |
| Lazy-IBk | 0 seconds | Percentage split | 0.9038 | 22.19 | 52.02 | 26.61% | 46.02% |
| Lazy-KStar | 0 seconds | Percentage split | 0.9652 | 12.85 | 36.38 | 15.41% | 32.18% |
| Functions-MLP | 6.41 seconds | Percentage split | 0.9979 | 6.438 | 9.778 | 7.72% | 8.64% |
| Functions-LinearRegression | 0.03 seconds | Percentage split | 0.9642 | 32.71 | 41.10 | 39.22% | 36.36% |

## Table 7: Evaluation of classifiers on RedWhiteWine dataset with Cross-validation

| Classifier | Time taken to build model | Test mode | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0.02 seconds | Cross-validation | -0.0361 | 0.6856 | 0.8733 | 100% | 100% |
| Rules-M5Rules | 13.44 seconds | Cross-validation | 0.5802 | 0.5532 | 0.7115 | 80.68% | 81.47% |
| Trees-REPTree | 0.22 seconds | Cross-validation | 0.5566 | 0.5542 | 0.7322 | 80.82% | 83.83% |
| Trees-DecisionStump | 0.08 seconds | Cross-validation | 0.3963 | 0.6605 | 0.8017 | 96.33% | 91.79% |
| Trees-M5P | 4.28 seconds | Cross-validation | 0.5885 | 0.5467 | 0.7064 | 79.74% | 80.88% |
| Functions-MLP | 31.45 seconds | Cross-validation | 0.505 | 0.5993 | 0.7624 | 87.40% | 87.29% |
| Functions-LinearRegression | 0.09 seconds | Cross-validation | 0.5412 | 0.57 | 0.7343 | 83.12% | 84.07% |
| Lazy-IBk | 0 seconds | Cross-validation | 0.5994 | 0.4299 | 0.7731 | 62.69% | 88.52% |

## Table 8: Evaluation of classifiers on RedWhiteWine dataset with Percentage Split

| Classifier | Time taken to build model | Test mode | Correlation coefficient | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|---|---|
| Rules-ZeroR | 0 seconds | Percentage split | 0 | 0.6932 | 0.8863 | 100% | 100% |
| Rules-M5Rules | 13.56 seconds | Percentage split | 0.5486 | 0.5653 | 0.7414 | 81.54% | 83.64% |
| Trees-REPTree | 0.2 seconds | Percentage split | 0.5204 | 0.5771 | 0.7617 | 83.25% | 85.93% |
| Trees-DecisionStump | 0.09 seconds | Percentage split | 0.3962 | 0.6664 | 0.8138 | 96.13% | 91.81% |
| Trees-M5P | 4.49 seconds | Percentage split | 0.5821 | 0.5604 | 0.7212 | 80.85% | 81.36% |
| Functions-MLP | 31.16 seconds | Percentage split | 0.5775 | 0.6267 | 0.7971 | 90.40% | 89.93% |
| Functions-LinearRegression | 0.08 seconds | Percentage split | 0.5454 | 0.5722 | 0.743 | 82.54% | 83.82% |
| Lazy-IBk | 0 seconds | Percentage split | 0.5709 | 0.4599 | 0.8085 | 66.35% | 91.21% |