weka.experiment
Class ExplicitTestsetResultProducer

java.lang.Object
  extended by weka.experiment.ExplicitTestsetResultProducer
All Implemented Interfaces:
java.io.Serializable, AdditionalMeasureProducer, OptionHandler, RevisionHandler, ResultProducer

public class ExplicitTestsetResultProducer
extends java.lang.Object
implements ResultProducer, OptionHandler, AdditionalMeasureProducer, RevisionHandler

Loads the external test set and calls the appropriate SplitEvaluator to generate some results.
The filename of the test set is constructed as follows:
<dir> + / + <prefix> + <relation-name> + <suffix>
The relation name can be modified with a regular expression: each matching substring is replaced with the specified replacement string. To remove the string that the Weka filters append to the end of the relation name, use '-weka.*' as the regular expression to find.
The suffix determines the type of file to load, i.e., one is not restricted to ARFF files. As long as Weka recognizes the extension specified in the suffix, the data will be loaded with one of Weka's converters.
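A minimal sketch of the documented naming rule in plain Java, without Weka. The class and method names are hypothetical. The sketch applies the find/replace to all matches with `String.replaceAll` (the `-replace` option speaks of "all the matches") and joins the parts with `/` exactly as the pattern above shows. It also shows why the expression should start at `-weka`: filters append text beginning with `-weka` to the relation name, so `-weka.*` strips that tail, while `.*-weka` matches from the start of the name and removes the dataset name instead.

```java
// Hypothetical helper mirroring: <dir> + / + <prefix> + <relation-name> + <suffix>
final class TestsetFilenameSketch {

    private TestsetFilenameSketch() {}

    static String testsetPath(String dir, String prefix, String relationName,
                              String find, String replace, String suffix) {
        String name = relationName;
        // An empty find expression means the relation name is used unchanged.
        if (!find.isEmpty()) {
            name = name.replaceAll(find, replace);
        }
        return dir + "/" + prefix + name + suffix;
    }
}
```

For a relation renamed by a filter to `iris-weka.filters.unsupervised.attribute.Remove-R1`, directory `tests` and the default suffix, `-weka.*` yields `tests/iris_test.arff`.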

Valid options are:

 -D
  Save raw split evaluator output.
 -O <file/directory name/path>
  The filename where raw output will be stored.
  If a directory name is specified, then individual
  outputs will be gzipped; otherwise, all output will be
  zipped to the named file. Use in conjunction with -D.
  (default: splitEvalutorOut.zip)
 -W <class name>
  The full class name of a SplitEvaluator.
  e.g. weka.experiment.ClassifierSplitEvaluator
 -R
  Set when data is to be randomized.
 -dir <directory>
  The directory containing the test sets.
  (default: current directory)
 -prefix <string>
  An optional prefix for the test sets (before the relation name).
  (default: empty string)
 -suffix <string>
  The suffix to append to the test set.
  (default: _test.arff)
 -find <regular expression>
  The regular expression to search the relation name with.
  Not used if an empty string.
  (default: empty string)
 -replace <string>
  The replacement string for all the matches of '-find'.
  (default: empty string)
 
 Options specific to split evaluator weka.experiment.ClassifierSplitEvaluator:
 
 -W <class name>
  The full class name of the classifier.
  e.g. weka.classifiers.bayes.NaiveBayes
 -C <index>
  The index of the class for which IR statistics
  are to be output. (default 1)
 -I <index>
  The index of an attribute to output in the
  results. This attribute should identify an
  instance in order to know which instances are
  in the test set of a cross-validation. If 0,
  no attribute is output (default 0).
 -P
  Add target and prediction columns to the result
  for each fold.
 
 Options specific to classifier weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console.
All options after -- will be passed to the split evaluator.
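The `--` convention can be illustrated with a small stdlib-only sketch that splits an argument array at the first `--`. The class and method names are hypothetical; this is not Weka's own option parser.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the "--" convention: options before the
// first "--" belong to the result producer, options after it are passed on.
final class OptionSplitSketch {

    private OptionSplitSketch() {}

    /** Options before the first "--" (consumed by the result producer). */
    static String[] producerOptions(String[] args) {
        int i = indexOfSeparator(args);
        return i < 0 ? args.clone() : Arrays.copyOfRange(args, 0, i);
    }

    /** Options after the first "--" (handed on to the split evaluator). */
    static String[] evaluatorOptions(String[] args) {
        int i = indexOfSeparator(args);
        return i < 0 ? new String[0] : Arrays.copyOfRange(args, i + 1, args.length);
    }

    private static int indexOfSeparator(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--")) {
                return i;
            }
        }
        return -1;
    }
}
```

Because only the first `--` is consumed, any later `--` stays in the array handed to the split evaluator.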

Version:
$Revision: 5353 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
static java.lang.String DATASET_FIELD_NAME
          The name of the key field containing the dataset name.
static java.lang.String DEFAULT_SUFFIX
          The default suffix.
static java.lang.String RUN_FIELD_NAME
          The name of the key field containing the run number.
static java.lang.String TIMESTAMP_FIELD_NAME
          The name of the result field containing the timestamp.
 
Constructor Summary
ExplicitTestsetResultProducer()
           
 
Method Summary
 void doRun(int run)
          Gets the results for a specified run number.
 void doRunKeys(int run)
          Gets the keys for a specified run number.
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of any additional measure names that might be in the SplitEvaluator.
 java.lang.String getCompatibilityState()
          Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface).
 java.lang.String[] getKeyNames()
          Gets the names of the key columns produced for a single run.
 java.lang.Object[] getKeyTypes()
          Gets the data types of the key columns produced for a single run.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure.
 java.lang.String[] getOptions()
          Gets the current settings of the result producer.
 java.io.File getOutputFile()
          Gets the file or directory where raw split evaluator output is stored.
 boolean getRandomizeData()
          Gets whether the dataset is to be randomized.
 boolean getRawOutput()
          Gets whether raw split evaluator output is to be saved.
 java.lang.String getRelationFind()
          Returns the currently set regular expression to use on the relation name.
 java.lang.String getRelationReplace()
          Returns the currently set replacement string to use on the relation name.
 java.lang.String[] getResultNames()
          Gets the names of the result columns produced for a single run.
 java.lang.Object[] getResultTypes()
          Gets the data types of the result columns produced for a single run.
 java.lang.String getRevision()
          Returns the revision string.
 SplitEvaluator getSplitEvaluator()
          Gets the SplitEvaluator.
 java.io.File getTestsetDir()
          Returns the currently set directory for the test sets.
 java.lang.String getTestsetPrefix()
          Returns the currently set prefix.
 java.lang.String getTestsetSuffix()
          Returns the currently set suffix.
static java.lang.Double getTimestamp()
          Gets a Double representing the current date and time.
 java.lang.String globalInfo()
          Returns a string describing this result producer.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String outputFileTipText()
          Returns the tip text for this property.
 void postProcess()
          Perform any postprocessing.
 void preProcess()
          Prepare to generate results.
 java.lang.String randomizeDataTipText()
          Returns the tip text for this property.
 java.lang.String rawOutputTipText()
          Returns the tip text for this property.
 java.lang.String relationFindTipText()
          Returns the tip text for this property.
 java.lang.String relationReplaceTipText()
          Returns the tip text for this property.
 void setAdditionalMeasures(java.lang.String[] additionalMeasures)
          Set a list of method names for additional measures to look for in SplitEvaluators.
 void setInstances(Instances instances)
          Sets the dataset that results will be obtained for.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setOutputFile(java.io.File value)
          Sets the file or directory where raw split evaluator output is stored.
 void setRandomizeData(boolean value)
          Sets whether the dataset is to be randomized.
 void setRawOutput(boolean value)
          Sets whether raw split evaluator output is to be saved.
 void setRelationFind(java.lang.String value)
          Sets the regular expression to use on the relation name.
 void setRelationReplace(java.lang.String value)
          Sets the replacement string to use on the relation name.
 void setResultListener(ResultListener listener)
          Sets the object to send results of each run to.
 void setSplitEvaluator(SplitEvaluator value)
          Sets the SplitEvaluator.
 void setTestsetDir(java.io.File value)
          Sets the directory to use for the test sets.
 void setTestsetPrefix(java.lang.String value)
          Sets the prefix to use for the test sets.
 void setTestsetSuffix(java.lang.String value)
          Sets the suffix to use for the test sets.
 java.lang.String splitEvaluatorTipText()
          Returns the tip text for this property.
 java.lang.String testsetDirTipText()
          Returns the tip text for this property.
 java.lang.String testsetPrefixTipText()
          Returns the tip text for this property.
 java.lang.String testsetSuffixTipText()
          Returns the tip text for this property.
 java.lang.String toString()
          Gets a text description of the result producer.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_SUFFIX

public static final java.lang.String DEFAULT_SUFFIX
The default suffix (_test.arff).

See Also:
Constant Field Values

DATASET_FIELD_NAME

public static java.lang.String DATASET_FIELD_NAME
The name of the key field containing the dataset name.


RUN_FIELD_NAME

public static java.lang.String RUN_FIELD_NAME
The name of the key field containing the run number.


TIMESTAMP_FIELD_NAME

public static java.lang.String TIMESTAMP_FIELD_NAME
The name of the result field containing the timestamp.

Constructor Detail

ExplicitTestsetResultProducer

public ExplicitTestsetResultProducer()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this result producer.

Returns:
a description of the result producer suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are the same as those listed in the class description above.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the result producer.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

setInstances

public void setInstances(Instances instances)
Sets the dataset that results will be obtained for.

Specified by:
setInstances in interface ResultProducer
Parameters:
instances - the dataset (of type Instances) that results will be obtained for.

setAdditionalMeasures

public void setAdditionalMeasures(java.lang.String[] additionalMeasures)
Set a list of method names for additional measures to look for in SplitEvaluators. This could contain many measures (of which only a subset may be producible by the current SplitEvaluator) if the experiment iterates over a set of properties.

Specified by:
setAdditionalMeasures in interface ResultProducer
Parameters:
additionalMeasures - an array of measure names, null if none
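Since only a subset of the requested measures may be producible, a caller has to intersect the request with what the current evaluator supports. A minimal stdlib sketch of that filtering step; the class, method, and measure names are hypothetical and not taken from Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps the requested measure names that the current
// evaluator can actually produce, in request order.
final class MeasureFilterSketch {

    private MeasureFilterSketch() {}

    static List<String> supportedSubset(String[] requested, Set<String> supported) {
        List<String> result = new ArrayList<>();
        // A null request means "no additional measures", as documented above.
        if (requested == null) {
            return result;
        }
        for (String name : requested) {
            if (supported.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```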

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of any additional measure names that might be in the SplitEvaluator.

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure.

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported
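The contract of enumerateMeasures and getMeasure can be sketched with a plain map: names are listed via an Enumeration, and querying an unknown name throws IllegalArgumentException. This hypothetical class does not implement Weka's AdditionalMeasureProducer interface; it only mirrors the documented behavior.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in mirroring enumerateMeasures()/getMeasure(String).
final class MeasureTableSketch {

    private final Map<String, Double> values = new LinkedHashMap<>();

    void put(String name, double value) {
        values.put(name, value);
    }

    // Counterpart of enumerateMeasures(): the names that can be queried.
    Enumeration<String> enumerateMeasures() {
        return Collections.enumeration(values.keySet());
    }

    // Counterpart of getMeasure(String): unsupported names are rejected.
    double getMeasure(String additionalMeasureName) {
        Double v = values.get(additionalMeasureName);
        if (v == null) {
            throw new IllegalArgumentException(
                    "Measure not supported: " + additionalMeasureName);
        }
        return v;
    }
}
```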

setResultListener

public void setResultListener(ResultListener listener)
Sets the object to send results of each run to.

Specified by:
setResultListener in interface ResultProducer
Parameters:
listener - the ResultListener that receives the results of each run

getTimestamp

public static java.lang.Double getTimestamp()
Gets a Double representing the current date and time, e.g., 1:46 pm on 20 May 1999 -> 19990520.1346.

Returns:
the current date and time encoded as a Double of the form yyyymmdd.hhmm
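The encoding in the example can be reproduced with java.time: build the integer yyyyMMddHHmm and shift the hour and minute digits behind the decimal point. This is a sketch of the documented format, not Weka's implementation; the class name is hypothetical.

```java
import java.time.LocalDateTime;

// Hypothetical helper reproducing the documented yyyymmdd.hhmm encoding.
final class TimestampSketch {

    private TimestampSketch() {}

    static double encode(LocalDateTime t) {
        // Assemble the 12-digit integer yyyyMMddHHmm.
        long digits = t.getYear() * 100_000_000L
                + t.getMonthValue() * 1_000_000L
                + t.getDayOfMonth() * 10_000L
                + t.getHour() * 100L
                + t.getMinute();
        // Move HHmm behind the decimal point: yyyyMMdd.HHmm
        return digits / 10_000.0;
    }

    static double now() {
        return encode(LocalDateTime.now());
    }
}
```

For 1:46 pm on 20 May 1999 the integer is 199905201346, which becomes 19990520.1346.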

preProcess

public void preProcess()
                throws java.lang.Exception
Prepare to generate results.

Specified by:
preProcess in interface ResultProducer
Throws:
java.lang.Exception - if an error occurs during preprocessing.

postProcess

public void postProcess()
                 throws java.lang.Exception
Perform any postprocessing. When this method is called, it indicates that no more requests to generate results for the current experiment will be sent.

Specified by:
postProcess in interface ResultProducer
Throws:
java.lang.Exception - if an error occurs
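The call order implied by preProcess and postProcess (prepare once, generate results per run, then signal that no more requests will follow) can be sketched with a hypothetical interface and driver. Both names are illustrative assumptions; Weka's experiment classes may sequence calls differently, and doRunKeys could be driven the same way.

```java
// Hypothetical interface mirroring part of the documented ResultProducer lifecycle.
interface LifecycleSketch {
    void preProcess() throws Exception;
    void doRun(int run) throws Exception;
    void postProcess() throws Exception;
}

final class LifecycleDriverSketch {

    private LifecycleDriverSketch() {}

    static void runAll(LifecycleSketch p, int firstRun, int lastRun) throws Exception {
        // Prepare once before any results are requested.
        p.preProcess();
        for (int run = firstRun; run <= lastRun; run++) {
            p.doRun(run);
        }
        // No further requests will be sent for this experiment.
        p.postProcess();
    }
}
```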

doRunKeys

public void doRunKeys(int run)
               throws java.lang.Exception
Gets the keys for a specified run number. Different run numbers correspond to different randomizations of the data. Keys produced should be sent to the current ResultListener.

Specified by:
doRunKeys in interface ResultProducer
Parameters:
run - the run number to get keys for.
Throws:
java.lang.Exception - if a problem occurs while getting the keys

doRun

public void doRun(int run)
           throws java.lang.Exception
Gets the results for a specified run number. Different run numbers correspond to different randomizations of the data. Results produced should be sent to the current ResultListener.

Specified by:
doRun in interface ResultProducer
Parameters:
run - the run number to get results for.
Throws:
java.lang.Exception - if a problem occurs while getting the results
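One way to make each run number correspond to a reproducible randomization is to seed the random generator with the run number. The sketch below assumes that scheme (it is an illustration, not a statement of how this class seeds its generator) and shuffles a copy of the rows only when randomization is enabled. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: run-number-seeded shuffling, so repeating a run
// reproduces the same ordering.
final class RunShuffleSketch {

    private RunShuffleSketch() {}

    static <T> List<T> orderForRun(List<T> rows, int run, boolean randomize) {
        List<T> copy = new ArrayList<>(rows);
        if (randomize) {
            // Assumption: the run number is used directly as the seed.
            Collections.shuffle(copy, new Random(run));
        }
        return copy;
    }
}
```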

getKeyNames

public java.lang.String[] getKeyNames()
Gets the names of the key columns produced for a single run. This method should really be static.

Specified by:
getKeyNames in interface ResultProducer
Returns:
an array containing the name of each column

getKeyTypes

public java.lang.Object[] getKeyTypes()
Gets the data types of the key columns produced for a single run. This method should really be static.

Specified by:
getKeyTypes in interface ResultProducer
Returns:
an array containing objects of the type of each column. The objects should be Strings or Doubles.

getResultNames

public java.lang.String[] getResultNames()
Gets the names of the result columns produced for a single run. This method should really be static.

Specified by:
getResultNames in interface ResultProducer
Returns:
an array containing the name of each column

getResultTypes

public java.lang.Object[] getResultTypes()
Gets the data types of the result columns produced for a single run. This method should really be static.

Specified by:
getResultTypes in interface ResultProducer
Returns:
an array containing objects of the type of each column. The objects should be Strings or Doubles.
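The type arrays describe each column by example: the class of the object (String or Double), not its value, carries the type. A hypothetical checker for that contract and for matching a produced row against it; treating null as a missing value is an assumption of this sketch.

```java
// Hypothetical checker for the names/types contract described above.
final class ColumnSchemaSketch {

    private ColumnSchemaSketch() {}

    // One type object per column, each a String or a Double.
    static boolean isValid(String[] names, Object[] types) {
        if (names == null || types == null || names.length != types.length) {
            return false;
        }
        for (Object t : types) {
            if (!(t instanceof String) && !(t instanceof Double)) {
                return false;
            }
        }
        return true;
    }

    // A row matches when each non-null value has the declared column class
    // (null is treated as a missing value here, which is an assumption).
    static boolean rowMatches(Object[] types, Object[] row) {
        if (row.length != types.length) {
            return false;
        }
        for (int i = 0; i < row.length; i++) {
            if (row[i] != null && row[i].getClass() != types[i].getClass()) {
                return false;
            }
        }
        return true;
    }
}
```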

getCompatibilityState

public java.lang.String getCompatibilityState()
Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface). For example, a cross-validation ResultProducer may have a setting for the number of folds. For a given state, the results produced should be compatible. Typically, if a ResultProducer is an OptionHandler, this string will represent the command line arguments required to set the ResultProducer to that state.

Specified by:
getCompatibilityState in interface ResultProducer
Returns:
the description of the ResultProducer state, or null if no state is defined
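For an OptionHandler, a compatibility state can be as simple as the current options joined into one command line, so two producers with identical settings yield identical strings. A hypothetical stdlib sketch; it quotes empty arguments and arguments containing whitespace but does not escape embedded quotes.

```java
// Hypothetical helper: joins options into a single command-line string.
final class CompatibilityStateSketch {

    private CompatibilityStateSketch() {}

    static String join(String[] options) {
        StringBuilder sb = new StringBuilder();
        for (String opt : options) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Quote empty values (e.g. an empty -replace) and values with spaces.
            boolean quote = opt.isEmpty() || opt.chars().anyMatch(Character::isWhitespace);
            sb.append(quote ? "\"" + opt + "\"" : opt);
        }
        return sb.toString();
    }
}
```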

outputFileTipText

public java.lang.String outputFileTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getOutputFile

public java.io.File getOutputFile()
Gets the file or directory where raw split evaluator output is stored.

Returns:
the output file or directory

setOutputFile

public void setOutputFile(java.io.File value)
Sets the file or directory where raw split evaluator output is stored.

Parameters:
value - the output file or directory
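The -O option distinguishes two storage modes: a directory receives one gzipped file per output, while any other path is written as a single zip archive. A stdlib sketch of that decision; the class name, the `.gz` naming, and the map of output names to text are hypothetical, and the real entry names used by Weka are not specified here.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of the two raw-output modes described for -O.
final class RawOutputSketch {

    private RawOutputSketch() {}

    static void save(File target, Map<String, String> outputs) throws IOException {
        if (target.isDirectory()) {
            // Directory: one gzipped file per output.
            for (Map.Entry<String, String> e : outputs.entrySet()) {
                File gz = new File(target, e.getKey() + ".gz");
                try (OutputStream out = new GZIPOutputStream(new FileOutputStream(gz))) {
                    out.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                }
            }
        } else {
            // Otherwise: all outputs as entries of a single zip file.
            try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(target))) {
                for (Map.Entry<String, String> e : outputs.entrySet()) {
                    zip.putNextEntry(new ZipEntry(e.getKey()));
                    zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                    zip.closeEntry();
                }
            }
        }
    }
}
```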

randomizeDataTipText

public java.lang.String randomizeDataTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getRandomizeData

public boolean getRandomizeData()
Gets whether the dataset is to be randomized.

Returns:
true if the dataset is to be randomized

setRandomizeData

public void setRandomizeData(boolean value)
Sets whether the dataset is to be randomized.

Parameters:
value - true if dataset is to be randomized

rawOutputTipText

public java.lang.String rawOutputTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getRawOutput

public boolean getRawOutput()
Gets whether raw split evaluator output is to be saved.

Returns:
true if raw split evaluator output is to be saved

setRawOutput

public void setRawOutput(boolean value)
Sets whether raw split evaluator output is to be saved.

Parameters:
value - true if output is to be saved

splitEvaluatorTipText

public java.lang.String splitEvaluatorTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSplitEvaluator

public SplitEvaluator getSplitEvaluator()
Gets the SplitEvaluator.

Returns:
the SplitEvaluator.

setSplitEvaluator

public void setSplitEvaluator(SplitEvaluator value)
Sets the SplitEvaluator.

Parameters:
value - new SplitEvaluator to use.

testsetDirTipText

public java.lang.String testsetDirTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getTestsetDir

public java.io.File getTestsetDir()
Returns the currently set directory for the test sets.

Returns:
the directory

setTestsetDir

public void setTestsetDir(java.io.File value)
Sets the directory to use for the test sets.

Parameters:
value - the directory to use

testsetPrefixTipText

public java.lang.String testsetPrefixTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getTestsetPrefix

public java.lang.String getTestsetPrefix()
Returns the currently set prefix.

Returns:
the prefix

setTestsetPrefix

public void setTestsetPrefix(java.lang.String value)
Sets the prefix to use for the test sets.

Parameters:
value - the prefix

testsetSuffixTipText

public java.lang.String testsetSuffixTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getTestsetSuffix

public java.lang.String getTestsetSuffix()
Returns the currently set suffix.

Returns:
the suffix

setTestsetSuffix

public void setTestsetSuffix(java.lang.String value)
Sets the suffix to use for the test sets.

Parameters:
value - the suffix

relationFindTipText

public java.lang.String relationFindTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getRelationFind

public java.lang.String getRelationFind()
Returns the currently set regular expression to use on the relation name.

Returns:
the regular expression

setRelationFind

public void setRelationFind(java.lang.String value)
Sets the regular expression to use on the relation name.

Parameters:
value - the regular expression

relationReplaceTipText

public java.lang.String relationReplaceTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getRelationReplace

public java.lang.String getRelationReplace()
Returns the currently set replacement string to use on the relation name.

Returns:
the replacement string

setRelationReplace

public void setRelationReplace(java.lang.String value)
Sets the replacement string to use on the relation name.

Parameters:
value - the replacement string

toString

public java.lang.String toString()
Gets a text description of the result producer.

Overrides:
toString in class java.lang.Object
Returns:
a text description of the result producer.

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Returns:
the revision