Weka Experiment Environment
Introduction
The Weka Experiment Environment enables the
user to create, run, modify, and analyse experiments
in a more convenient manner than is possible when processing the schemes
individually. For example, a user can create an experiment that runs several
schemes against a series of datasets and then analyse
the results to determine if one of the schemes is (statistically) better than
the other schemes.
To open the Experiment Environment GUI, start Weka and click on Experimenter in the Weka GUI Chooser window.
Defining an Experiment
When the Experimenter is started, the Setup window (actually a pane) is
displayed. Click New to initialize an
experiment. This causes default
parameters to be defined for the experiment.
To define the dataset to be processed by a scheme, first select “Use
relative paths” in the Datasets panel of the Setup window, click New, and then click
on “Add new …” to open a dialog window.
(The arff files can be found in c:\program
files\weka-3-4.)
Double click on the “data” folder to view the available datasets or navigate
to an alternate location. Select iris.arff and click Open to select the Iris dataset. The dataset name is now
displayed in the Datasets panel of the Setup window.
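For reference, the same dataset can also be loaded from Java code. The sketch below is a minimal example, assuming a Weka 3.x jar on the classpath; the file path is only an illustration and should be adjusted to your own installation.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;

public class LoadIris {
    public static void main(String[] args) throws Exception {
        // Path is illustrative; adjust it to your Weka installation.
        BufferedReader reader = new BufferedReader(
                new FileReader("c:/program files/weka-3-4/data/iris.arff"));
        Instances data = new Instances(reader);
        reader.close();
        // The class attribute is the last one in iris.arff.
        data.setClassIndex(data.numAttributes() - 1);
        System.out.println("Loaded " + data.numInstances() + " instances with "
                + data.numAttributes() + " attributes.");
    }
}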
Saving the Results of the Experiment
To identify a file to which the results are to be sent, click on the
“CSV file” entry in the Destination panel.
Type the name of the output file. (If you save results, do it on your student
account (Y:).)
Saving the Experiment Definition
The experiment definition can be saved at any time. Select “Save”. Type the dataset name with the extension “.exp” as the file name (or select that file if the experiment definition already exists). The experiment can be restored later by selecting Open.
Running an Experiment
First select the ZeroR algorithm under
Algorithms using “Add new ...”. To run the current
experiment, click the Run tab at the top of the Experiment Environment window. Select
the experiment type such that the experiment performs 10 randomized train and
test runs on the Iris dataset, using 66% of the patterns for training and 34%
for testing, and using the ZeroR scheme. Click Start
to run the experiment.
If the experiment was defined correctly, 3 messages will be displayed in
the Log panel. The results of the experiment are saved in the comma-separated
value file you selected earlier. Load it into Excel for analysis.
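For reference, the sketch below is a rough Java equivalent of the 10 randomized 66%/34% train and test runs with ZeroR that were just defined. The Experimenter does this internally in its own way, so the file path and the use of the run number as random seed are only illustrative assumptions.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;

public class RandomSplitRuns {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(
                new BufferedReader(new FileReader("data/iris.arff"))); // illustrative path
        data.setClassIndex(data.numAttributes() - 1);

        for (int run = 1; run <= 10; run++) {
            // Shuffle with a different seed on each run, then split 66%/34%.
            Instances shuffled = new Instances(data);
            shuffled.randomize(new Random(run));
            int trainSize = (int) Math.round(shuffled.numInstances() * 0.66);
            int testSize = shuffled.numInstances() - trainSize;
            Instances train = new Instances(shuffled, 0, trainSize);
            Instances test = new Instances(shuffled, trainSize, testSize);

            ZeroR zeroR = new ZeroR();
            zeroR.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(zeroR, test);
            System.out.println("Run " + run + ": " + eval.pctCorrect() + "% correct");
        }
    }
}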
Changing the Experiment Parameters
Select the classifier entry (ZeroR) and click “Edit selected…”. This scheme has no modifiable properties, but most other schemes do have properties that can be modified by the user. Click “Add new…” to add J48. See how you can edit its parameters and, if desired, modify them.
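The same properties can also be set in code through the scheme's option string. The sketch below assumes J48 lives in weka.classifiers.trees (in older Weka releases the class is listed as j48.J48) and uses the values -C 0.25 -M 2 purely as an illustration of the confidence factor and minimum leaf size options.

import weka.classifiers.trees.J48;
import weka.core.Utils;

public class ConfigureJ48 {
    public static void main(String[] args) throws Exception {
        J48 tree = new J48();
        // The same properties that "Edit selected" exposes in the GUI:
        // -C is the pruning confidence factor, -M the minimum instances per leaf.
        tree.setOptions(Utils.splitOptions("-C 0.25 -M 2"));
        System.out.println(Utils.joinOptions(tree.getOptions()));
    }
}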
Run the experiment and observe that results are generated for both
schemes.
To add additional schemes, repeat this process. To remove a scheme,
select the scheme by clicking on it and then click Delete. Run an experiment
with a number of data sets and a number of classifiers at the same time.
Adding Additional Datasets
The scheme(s) may be run on any number of datasets at a time. Additional datasets are added by clicking
“Add new …” in the Datasets panel.
Datasets are deleted from the experiment by selecting the dataset and
then clicking Delete Selected.
Experiment Analyser
Weka includes an experiment analyser
that can be used to analyse the results of
experiments. Set up an experiment that uses 3 schemes, ZeroR,
OneR, and J48, to classify the Iris data in an
experiment using 10 train and test runs, with 66% of the data used for training
and 34% used for testing.
After the experiment setup is complete, run the experiment. Then, to analyse the results, select the Analyse
tab at the top of the Experiment Environment window. Use “Experiment” (or, alternatively, “File...”) to analyse the results of the current experiment.
The number of result lines available (“Got 30 results”) is shown in the
Source panel. This experiment consisted of 10 runs, for 3 schemes, for 1
dataset, for a total of 30 result lines.
Select the Percent_correct attribute from the
Comparison field and click Perform test to generate a comparison of the 3
schemes.
The schemes used in the experiment are shown in the columns and the
datasets used are shown in the rows.
The percentage correct for each of the 3 schemes is shown in each data
set row. The annotation “v” or “*” indicates that a specific result is
statistically better (v) or worse (*) than the baseline scheme (in this case, ZeroR) at the significance level specified (currently
0.05). The results of both OneR and J48 are statistically better than the baseline
established by ZeroR.
At the bottom of each column after the first column is a count (xx/yy/zz) of the number of times that the scheme was better than (xx), the same as (yy), or worse than (zz) the baseline scheme on the
datasets used in the experiment. In this example, there was only one dataset
and OneR was better than ZeroR
once and never equivalent to or worse than ZeroR
(1/0/0); J48 was also better than ZeroR on the
dataset.
The value “(10)” at the beginning of the “iris” row defines the number
of runs of the experiment.
The standard deviation of the attribute being evaluated can be generated
by selecting the Show std. deviations check box.
Selecting Number_correct as the comparison
field and clicking Perform test generates the average number correct (out of a
maximum of 51 test patterns, which is 34% of 150 patterns in the Iris dataset).
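As a rough code-level mirror of these comparison fields (not what the Experimenter itself computes; the file path and seeds are placeholders), the sketch below runs the three schemes over the same kind of 10 random 66%/34% splits and prints the percent correct and number correct for each run.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.OneR;
import weka.classifiers.rules.ZeroR;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class ComparisonFieldsSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(
                new BufferedReader(new FileReader("data/iris.arff"))); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] schemes = {new ZeroR(), new OneR(), new J48()};
        String[] names = {"ZeroR", "OneR", "J48"};

        for (int s = 0; s < schemes.length; s++) {
            for (int run = 1; run <= 10; run++) {
                // Shuffle and split 66%/34%, as in the experiment defined above.
                Instances shuffled = new Instances(data);
                shuffled.randomize(new Random(run));
                int trainSize = (int) Math.round(shuffled.numInstances() * 0.66);
                Instances train = new Instances(shuffled, 0, trainSize);
                Instances test = new Instances(shuffled, trainSize,
                        shuffled.numInstances() - trainSize);

                schemes[s].buildClassifier(train);
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(schemes[s], test);
                // Percent_correct and Number_correct for this run.
                System.out.println(names[s] + " run " + run + ": "
                        + eval.pctCorrect() + "% correct, "
                        + eval.correct() + " of " + test.numInstances() + " correct");
            }
        }
    }
}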
Saving the Results
The information displayed in the Test output panel is controlled by the
currently-selected entry in the Result list panel. Clicking on an entry causes the results
corresponding to that entry to be displayed. The results shown in the Test
output panel can be saved to a file by clicking Save
output.
Changing the Baseline Scheme
The baseline scheme can be changed by clicking Select base… and then
selecting the desired scheme. Select the
OneR scheme. This causes the other schemes to be
compared individually with the OneR scheme.
Use the Percent_correct field with OneR as the base scheme. The system will indicate that
there is no statistical difference between the results for OneR
and J48. Is there a statistically significant difference between OneR and ZeroR?
Statistical Significance
The term “statistical significance” used in the previous section refers to the result of a pair-wise comparison of schemes using a “t-test”. As the significance level is decreased, the test becomes stricter, so greater confidence can be placed in any difference it still reports.
In the current experiment, there is not a statistically significant
difference between the OneR and J48 schemes. Play
with the significance level.
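To make the idea concrete, the sketch below computes an ordinary paired t statistic from two made-up lists of per-run accuracies. Note that the Experimenter uses a corrected resampled t-test rather than this plain version, so its numbers will differ; the values in the arrays are purely illustrative.

public class PairedTTestSketch {
    public static void main(String[] args) {
        // Hypothetical percent-correct values for two schemes over the same 10 runs.
        double[] a = {92, 94, 96, 90, 94, 92, 96, 94, 92, 94};
        double[] b = {94, 96, 94, 92, 96, 94, 98, 94, 94, 96};

        int n = a.length;
        double[] diff = new double[n];
        double mean = 0;
        for (int i = 0; i < n; i++) {
            diff[i] = b[i] - a[i];
            mean += diff[i];
        }
        mean /= n;

        double var = 0;
        for (int i = 0; i < n; i++) {
            var += (diff[i] - mean) * (diff[i] - mean);
        }
        var /= (n - 1);

        // Ordinary paired t statistic with n-1 degrees of freedom.
        double t = mean / Math.sqrt(var / n);
        System.out.println("t = " + t);
    }
}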
Summary Test
Select “Summary” as the Test base and perform a test. You will then see output (ignore the numbers inside the brackets) in which the first row “- 1 1” indicates that column “b” (OneR) is better than row “a” (ZeroR) and that column “c” (J48) is also better than row “a”. The remaining entries are 0 because there is no significant difference between OneR and J48 on the dataset that was used in the experiment.
Ranking Test
Select Ranking from Test base. The ranking test ranks the schemes
according to the total wins (“>”) and losses (“<”) against the other
schemes. The first column (“>-<”) is the difference between the number of
wins and the number of losses.
Cross-Validation
To change from random train and test experiments to cross-validation
experiments, choose in the setup tab the cross-validation experiment type
Set the number of iterations to 1 in the Setup window.
Analyse this experiment (there are
30 (1 run times 10 folds times 3 schemes) result lines).
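For comparison, a single 10-fold cross-validation of one scheme can be reproduced in code with Evaluation.crossValidateModel, as in the sketch below; the choice of J48, the seed, and the file path are placeholders.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class CrossValidationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(
                new BufferedReader(new FileReader("data/iris.arff"))); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        Evaluation eval = new Evaluation(data);
        // 10-fold cross-validation of J48; the seed 1 is arbitrary.
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString("10-fold cross-validation of J48", false));
    }
}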
Averaging Result Producer
An alternative to the CrossValidation
the Averaging Result. This
result producer takes the average of a set of runs (which are typically
cross-validation runs). This result
producer is identified by clicking advanced in the setup and then the Result
Generator panel and then selecting AveragingResultProducer
from the drop-down list.
Conduct an experiment in which the ZeroR, OneR, and j48.J48 schemes are run 10 times with 10-fold cross-validation. Each run of 10 cross-validation folds is then averaged, producing
one result line for each run (instead of one result line for each fold as in
the previous example using the cross-validation result producer) for a total of
30 result lines.
It should be noted that while the results generated by the averaging result
producer are slightly worse than those generated by the cross-validation result
producer, the standard deviations are significantly smaller with the averaging
result producer.
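A rough programmatic analogue of the averaging result producer is to repeat the cross-validation with different seeds and report one averaged figure per run, as sketched below; again the classifier, seeds, and path are only illustrative.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class RepeatedCVAverage {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(
                new BufferedReader(new FileReader("data/iris.arff"))); // illustrative path
        data.setClassIndex(data.numAttributes() - 1);

        // 10 runs of 10-fold cross-validation; each run contributes one averaged figure,
        // mirroring the one-result-line-per-run behaviour described above.
        for (int run = 1; run <= 10; run++) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(run));
            System.out.println("Run " + run + ": " + eval.pctCorrect()
                    + "% correct (averaged over 10 folds)");
        }
    }
}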
Build your own classifier
Take a look at "weka/classifiers/rules/ZeroR.java",
which is probably the simplest example there is. Make a copy of it and modify
to suit your tastes. Basically you need to edit buildClassifier
(that creates/trains the classifier), classifyInstance
(that classifies new test instances) and distributionForInstance
(which returns a class probability vector). If you're not making a distributionClassifier you don't need the last one.
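As a starting point, here is a minimal sketch of such a classifier. It assumes the Weka 3.4-style Classifier base class (newer releases use AbstractClassifier instead) and simply predicts the majority class of the training data, much like ZeroR.

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

// Minimal sketch: always predicts the majority class of the training data.
// In newer Weka releases, extend AbstractClassifier instead of Classifier.
public class MajoritySketch extends Classifier {

    private double majorityClass;   // index of the most frequent class
    private double[] distribution;  // class frequencies normalised to probabilities

    public void buildClassifier(Instances data) throws Exception {
        // Count how often each class value occurs in the training data.
        double[] counts = new double[data.numClasses()];
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            if (!inst.classIsMissing()) {
                counts[(int) inst.classValue()]++;
            }
        }
        // Pick the most frequent class and remember the class distribution.
        majorityClass = 0;
        double total = 0;
        for (int c = 0; c < counts.length; c++) {
            total += counts[c];
            if (counts[c] > counts[(int) majorityClass]) {
                majorityClass = c;
            }
        }
        distribution = new double[counts.length];
        for (int c = 0; c < counts.length; c++) {
            distribution[c] = (total > 0) ? counts[c] / total : 0;
        }
    }

    public double classifyInstance(Instance instance) {
        return majorityClass;
    }

    public double[] distributionForInstance(Instance instance) {
        return distribution.clone();
    }
}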
Other classifiers are used through those same calls from inside your own classifier. Just import the classifier you need, build it with a training set and call it to classify new instances. If you need more than one classifier, look at e.g. how Bagging.java does it.
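For instance, delegating to another scheme looks roughly like the sketch below; the choice of J48 as the inner classifier is arbitrary and only for illustration.

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;

// Sketch of a classifier that simply wraps another scheme; the inner J48 is an
// arbitrary choice. In newer Weka releases, extend AbstractClassifier instead.
public class WrapperSketch extends Classifier {

    private Classifier inner;

    public void buildClassifier(Instances data) throws Exception {
        inner = new J48();           // import and create the classifier you need
        inner.buildClassifier(data); // build it with the training set
    }

    public double classifyInstance(Instance instance) throws Exception {
        return inner.classifyInstance(instance); // delegate classification
    }
}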
Getting and setting the parameters of your classifier is messier and usually takes up most of the code lines. Take a look at Bagging.java for that too.
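The usual pattern looks roughly like the sketch below; the -S option and the seed field are made up for illustration, and a real classifier such as Bagging.java also implements listOptions.

import weka.core.Utils;

// Sketch of the usual setOptions/getOptions pattern; the -S option and the
// seed field are purely illustrative.
public class OptionSketch {

    private int seed = 1;

    public void setOptions(String[] options) throws Exception {
        String seedString = Utils.getOption('S', options);
        if (seedString.length() != 0) {
            seed = Integer.parseInt(seedString);
        }
    }

    public String[] getOptions() {
        return new String[] {"-S", "" + seed};
    }

    public static void main(String[] args) throws Exception {
        OptionSketch sketch = new OptionSketch();
        sketch.setOptions(Utils.splitOptions("-S 42"));
        System.out.println(Utils.joinOptions(sketch.getOptions()));
    }
}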