java.lang.Object
weka.classifiers.AbstractClassifier
weka.classifiers.SingleClassifierEnhancer
weka.classifiers.RandomizableSingleClassifierEnhancer
weka.classifiers.meta.OneClassClassifier
public class OneClassClassifier
Performs one-class classification on a dataset.
The classifier reduces the class being classified to a single class and learns the data without using any information from other classes. The testing stage classifies each instance as 'target' or 'outlier', so in order to calculate the outlier pass rate the dataset must contain information from more than one class.
The output also varies depending on whether the label 'outlier' exists in the instances used to build the classifier. If it does, 'outlier' will be predicted; if not, the label will be considered missing when the prediction does not favour the target class. Instances of the 'outlier' class, if present in the dataset, will not be used to build the model. The label can simply be used as a flag; you do not need to relabel any classes.
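The labelling rule described above can be sketched in plain Java. This is only an illustration of the documented behaviour, not the class's actual code: OutlierLabelSketch and predictLabel are hypothetical names, and an empty Optional stands in for a missing prediction.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper illustrating the documented labelling rule; not part of Weka. */
final class OutlierLabelSketch {

    /** The outlier label named in the class description. */
    static final String OUTLIER_LABEL = "outlier";

    /**
     * @param favoursTarget  whether the prediction favours the target class
     * @param targetLabel    the target class label, e.g. "target"
     * @param trainingLabels class labels present in the data used to build the classifier
     * @return the predicted label, or an empty Optional for a missing prediction
     */
    static Optional<String> predictLabel(boolean favoursTarget,
                                         String targetLabel,
                                         List<String> trainingLabels) {
        if (favoursTarget) {
            return Optional.of(targetLabel);
        }
        // Not the target: report 'outlier' only if that label exists in the training data.
        if (trainingLabels.contains(OUTLIER_LABEL)) {
            return Optional.of(OUTLIER_LABEL);
        }
        // Otherwise the prediction is treated as missing.
        return Optional.empty();
    }
}
```

With training labels {target, other}, a rejected instance yields an empty result; adding the label 'outlier' to the training data turns the same rejection into an 'outlier' prediction.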
For more information, see:
Kathryn Hempstalk, Eibe Frank, Ian H. Witten: One-Class Classification by Combining Density and Class Probability Estimation. In: Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, ECMLPKDD2008, Berlin, 505--519, 2008.
@conference{Hempstalk2008,
   address = {Berlin},
   author = {Kathryn Hempstalk and Eibe Frank and Ian H. Witten},
   booktitle = {Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, ECMLPKDD2008},
   month = {September},
   pages = {505--519},
   publisher = {Springer},
   series = {Lecture Notes in Computer Science},
   title = {One-Class Classification by Combining Density and Class Probability Estimation},
   volume = {Vol. 5211},
   year = {2008},
   location = {Antwerp, Belgium}
}
Valid options are:
-trr <rate> Sets the target rejection rate (default: 0.1)
-tcl <label> Sets the target class label (default: 'target')
-cvr <rep> Sets the number of times to repeat cross validation to find the threshold (default: 10)
-P <prop> Sets the proportion of generated data (default: 0.5)
-cvf <perc> Sets the percentage of heldout data for each cross validation fold (default: 10)
-num <classname + options> Sets the numeric generator (default: weka.classifiers.meta.generators.GaussianGenerator)
-nom <classname + options> Sets the nominal generator (default: weka.classifiers.meta.generators.NominalGenerator)
-L Sets whether to correct the number of classes to two, if omitted no correction will be made.
-E Sets whether to exclusively use the density estimate.
-I Sets whether to use instance weights.
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.meta.Bagging)
Options specific to classifier weka.classifiers.meta.Bagging:
-P Size of each bag, as a percentage of the training set size. (default 100)
-O Calculate the out of bag error.
-S <num> Random number seed. (default 1)
-I <num> Number of iterations. (default 10)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.trees.REPTree)
Options specific to classifier weka.classifiers.trees.REPTree:
-M <minimum number of instances> Set minimum number of instances per leaf (default 2).
-V <minimum variance for split> Set minimum numeric class variance proportion of train variance for split (default 1e-3).
-N <number of folds> Number of folds for reduced error pruning (default 3).
-S <seed> Seed for random data shuffling (default 1).
-P No pruning.
-L Maximum tree depth (default -1, no maximum)
Options after -- are passed to the designated classifier.
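As a rough illustration of how the flags above fit together, the following sketch assembles an option array in plain Java. The flags and the base classifier name are taken from the list above; the class OneClassOptionsSketch and its buildOptions method are hypothetical helpers, not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles an option array from the flags listed above. */
final class OneClassOptionsSketch {

    static String[] buildOptions(double targetRejectionRate,
                                 String targetLabel,
                                 int cvRepeats,
                                 boolean densityOnly) {
        List<String> opts = new ArrayList<>();
        opts.add("-trr");
        opts.add(Double.toString(targetRejectionRate));
        opts.add("-tcl");
        opts.add(targetLabel);
        opts.add("-cvr");
        opts.add(Integer.toString(cvRepeats));
        if (densityOnly) {
            opts.add("-E"); // use the density estimate exclusively
        }
        // Base classifier, followed by "--" and the options passed on to it.
        opts.add("-W");
        opts.add("weka.classifiers.meta.Bagging");
        opts.add("--");
        opts.add("-I"); // Bagging: number of iterations
        opts.add("10");
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints: -trr 0.1 -tcl target -cvr 10 -W weka.classifiers.meta.Bagging -- -I 10
        System.out.println(String.join(" ", buildOptions(0.1, "target", 10, false)));
    }
}
```

An array of this shape is what setOptions(java.lang.String[] options) expects to parse; the sketch itself does not call into Weka.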
Field Summary | |
---|---|
static java.lang.String | OUTLIER_LABEL: The label for the outlier class. |
Constructor Summary | |
---|---|
OneClassClassifier() | Default constructor. |
Method Summary | |
---|---|
void | buildClassifier(Instances data): Builds the one-class classifier; any non-target data values are ignored. |
java.lang.String | densityOnlyTipText(): Returns the tip text for this property. |
double[] | distributionForInstance(Instance instance): Returns a probability distribution for a given instance. |
Capabilities | getCapabilities(): Returns default capabilities of the base classifier. |
boolean | getDensityOnly(): Gets whether only the density estimate should be used by the classifier. |
NominalAttributeGenerator | getNominalGenerator(): Gets the generator that will be used by default to generate nominal outlier data. |
NumericAttributeGenerator | getNumericGenerator(): Gets the generator that will be used by default to generate numeric outlier data. |
int | getNumRepeats(): Gets the number of repeats for (internal) cross validation. |
java.lang.String[] | getOptions(): Gets the current settings of the Classifier. |
double | getPercentageHeldout(): Gets the percentage of data that will be held out in each iteration of cross validation. |
double | getProportionGenerated(): Gets the proportion of data that will be generated compared to the target class label. |
java.lang.String | getRevision(): Returns the revision string. |
java.lang.String | getTargetClassLabel(): Gets the target class label - the class label to perform one class classification on. |
double | getTargetRejectionRate(): Gets the target rejection rate - the proportion of target class samples that will be rejected in order to build a threshold. |
TechnicalInformation | getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
boolean | getUseInstanceWeights(): Gets whether instance weighting will be performed. |
boolean | getUseLaplaceCorrection(): Gets whether a Laplace correction should be used. |
java.lang.String | globalInfo(): Returns a string describing this class's ability. |
java.util.Enumeration | listOptions(): Returns an enumeration describing the available options. |
static void | main(java.lang.String[] args): Main method for executing this classifier. |
java.lang.String | nominalGeneratorTipText(): Returns the tip text for this property. |
java.lang.String | numericGeneratorTipText(): Returns the tip text for this property. |
java.lang.String | numRepeatsTipText(): Returns the tip text for this property. |
java.lang.String | percentageHeldoutTipText(): Returns the tip text for this property. |
java.lang.String | proportionGeneratedTipText(): Returns the tip text for this property. |
void | setDensityOnly(boolean density): Sets whether the density estimate will be used by itself. |
void | setNominalGenerator(NominalAttributeGenerator agen): Sets the generator that will be used by default to generate nominal outlier data. |
void | setNumericGenerator(NumericAttributeGenerator agen): Sets the generator that will be used by default to generate numeric outlier data. |
void | setNumRepeats(int repeats): Sets the number of repeats for (internal) cross validation to a new value. |
void | setOptions(java.lang.String[] options): Parses a given list of options. |
void | setPercentageHeldout(double percent): Sets the percentage held out in each CV fold. |
void | setProportionGenerated(double prop): Sets the proportion of generated data to a new value. |
void | setTargetClassLabel(java.lang.String label): Sets the target class label to a new value. |
void | setTargetRejectionRate(double rate): Sets the target rejection rate. |
void | setUseInstanceWeights(boolean newuse): Sets whether to perform weighting on instances based on their prevalence in the data. |
void | setUseLaplaceCorrection(boolean newuse): Sets whether a Laplace correction should be used. |
java.lang.String | targetClassLabelTipText(): Returns the tip text for this property. |
java.lang.String | targetRejectionRateTipText(): Returns the tip text for this property. |
java.lang.String | toString(): Outputs a representation of this classifier. |
java.lang.String | useInstanceWeightsTipText(): Returns the tip text for this property. |
java.lang.String | useLaplaceCorrectionTipText(): Returns the tip text for this property. |
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer |
---|
getSeed, seedTipText, setSeed |
Methods inherited from class weka.classifiers.SingleClassifierEnhancer |
---|
classifierTipText, getClassifier, setClassifier |
Methods inherited from class weka.classifiers.AbstractClassifier |
---|
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String OUTLIER_LABEL
Constructor Detail |
---|
public OneClassClassifier()
Method Detail |
---|
public java.lang.String globalInfo()
public TechnicalInformation getTechnicalInformation()
getTechnicalInformation in interface TechnicalInformationHandler
public java.util.Enumeration listOptions()
listOptions in interface OptionHandler
listOptions in class RandomizableSingleClassifierEnhancer
public void setOptions(java.lang.String[] options) throws java.lang.Exception
setOptions in interface OptionHandler
setOptions in class RandomizableSingleClassifierEnhancer
options - The list of options as an array of strings.
java.lang.Exception - If an option is not supported.
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class RandomizableSingleClassifierEnhancer
public boolean getDensityOnly()
public void setDensityOnly(boolean density)
density - Whether to use the density estimate exclusively or not.
public java.lang.String densityOnlyTipText()
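The density-only switch, together with the title of the paper cited in the class description, indicates that a density estimate and a class probability estimate are normally combined. The sketch below shows one way such a combination can work, under an assumption not spelled out on this page: the data is viewed as a mixture of target data T and artificial data A drawn from a known reference density. Bayes' rule then gives P(X|T) = ((1 - P(T)) * P(T|X)) / (P(T) * (1 - P(T|X))) * P(X|A). The class and method names are hypothetical, and whether OneClassClassifier computes exactly this is not stated here.

```java
/** Hypothetical illustration of combining a reference density with a class probability estimate. */
final class DensityCombinationSketch {

    /**
     * Solves Bayes' rule for the target density under a two-component mixture assumption.
     *
     * @param priorTarget      P(T), assumed share of target data in the mixture, in (0, 1)
     * @param probTargetGivenX P(T|X), class probability estimate for the instance, in [0, 1)
     * @param referenceDensity P(X|A), density of the instance under the reference distribution
     * @return the implied target density P(X|T)
     */
    static double targetDensity(double priorTarget,
                                double probTargetGivenX,
                                double referenceDensity) {
        if (priorTarget <= 0.0 || priorTarget >= 1.0) {
            throw new IllegalArgumentException("priorTarget must lie strictly between 0 and 1");
        }
        if (probTargetGivenX < 0.0 || probTargetGivenX >= 1.0) {
            throw new IllegalArgumentException("probTargetGivenX must lie in [0, 1)");
        }
        double priorRatio = (1.0 - priorTarget) / priorTarget;       // (1 - P(T)) / P(T)
        double odds = probTargetGivenX / (1.0 - probTargetGivenX);   // P(T|X) / (1 - P(T|X))
        return priorRatio * odds * referenceDensity;
    }
}
```

For example, with equal shares of target and artificial data (P(T) = 0.5), a class probability estimate of 0.8 and a reference density of 0.1, the implied target density is about 0.4.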
public double getTargetRejectionRate()
public void setTargetRejectionRate(double rate)
rate - The new target rejection rate.
public java.lang.String targetRejectionRateTipText()
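The target rejection rate is described as the proportion of target class samples that will be rejected in order to build a threshold. A minimal sketch of that idea, assuming each held-out target instance has a numeric score where higher means more target-like: sort the scores and place the threshold so that the requested fraction falls below it. RejectionThresholdSketch and its methods are hypothetical names, and the sketch assumes distinct scores; it is not Weka's implementation.

```java
import java.util.Arrays;

/** Hypothetical illustration of turning a target rejection rate into a score threshold. */
final class RejectionThresholdSketch {

    /** Returns the lowest score that is still accepted when the given fraction is rejected. */
    static double thresholdFor(double[] targetScores, double rejectionRate) {
        if (targetScores.length == 0) {
            throw new IllegalArgumentException("at least one score is required");
        }
        if (rejectionRate < 0.0 || rejectionRate >= 1.0) {
            throw new IllegalArgumentException("rejectionRate must lie in [0, 1)");
        }
        double[] sorted = targetScores.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Number of lowest-scoring target instances to reject.
        int rejected = (int) Math.floor(rejectionRate * sorted.length);
        rejected = Math.min(rejected, sorted.length - 1);
        return sorted[rejected];
    }

    /** An instance is accepted as target when its score reaches the threshold. */
    static boolean accept(double score, double threshold) {
        return score >= threshold;
    }
}
```

With eight distinct scores and a rejection rate of 0.25, the two lowest scores fall below the threshold and the remaining six are accepted.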
public java.lang.String getTargetClassLabel()
public void setTargetClassLabel(java.lang.String label)
label - The target class label to classify for.
public java.lang.String targetClassLabelTipText()
public int getNumRepeats()
public void setNumRepeats(int repeats)
repeats - The new number of repeats for cross validation.
public java.lang.String numRepeatsTipText()
public void setProportionGenerated(double prop)
prop - The new proportion.
public double getProportionGenerated()
public java.lang.String proportionGeneratedTipText()
public void setPercentageHeldout(double percent)
percent - The new percentage of held-out data.
public double getPercentageHeldout()
public java.lang.String percentageHeldoutTipText()
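The number of repeats and the held-out percentage together describe a repeated hold-out procedure used to find the threshold. The sketch below shows one plausible reading of that procedure in plain Java: for each repeat, shuffle the instance indices with a seeded random number generator and set aside the given percentage. The class name, method name, and the exact splitting strategy are assumptions made for illustration; this page does not state how the class splits the data internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of repeated hold-out splits driven by a seed. */
final class RepeatedHoldoutSketch {

    /** Returns, for each repeat, the indices of the instances that are held out. */
    static List<List<Integer>> heldOutIndices(int numInstances,
                                              double percentHeldOut,
                                              int repeats,
                                              long seed) {
        if (numInstances < 0 || repeats < 0) {
            throw new IllegalArgumentException("numInstances and repeats must be non-negative");
        }
        if (percentHeldOut < 0.0 || percentHeldOut > 100.0) {
            throw new IllegalArgumentException("percentHeldOut must lie in [0, 100]");
        }
        int heldOut = (int) Math.round(numInstances * percentHeldOut / 100.0);
        Random random = new Random(seed); // same seed, same sequence of splits
        List<List<Integer>> result = new ArrayList<>();
        for (int r = 0; r < repeats; r++) {
            List<Integer> indices = new ArrayList<>();
            for (int i = 0; i < numInstances; i++) {
                indices.add(i);
            }
            Collections.shuffle(indices, random);
            // The first 'heldOut' shuffled indices form this repeat's held-out set.
            result.add(new ArrayList<>(indices.subList(0, heldOut)));
        }
        return result;
    }
}
```

With 50 instances, the default 10 percent held out, and the default 10 repeats, each repeat sets aside 5 instances, and rerunning with the same seed reproduces the same splits.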
public NumericAttributeGenerator getNumericGenerator()
public void setNumericGenerator(NumericAttributeGenerator agen)
agen - The new numeric data generator to use.
public java.lang.String numericGeneratorTipText()
public NominalAttributeGenerator getNominalGenerator()
public void setNominalGenerator(NominalAttributeGenerator agen)
agen - The new nominal data generator to use.
public java.lang.String nominalGeneratorTipText()
public boolean getUseLaplaceCorrection()
public void setUseLaplaceCorrection(boolean newuse)
newuse - Whether to use the Laplace correction (default: true).
public java.lang.String useLaplaceCorrectionTipText()
public void setUseInstanceWeights(boolean newuse)
newuse - Whether or not to use instance weighting.
public boolean getUseInstanceWeights()
public java.lang.String useInstanceWeightsTipText()
public Capabilities getCapabilities()
getCapabilities in interface Classifier
getCapabilities in interface CapabilitiesHandler
getCapabilities in class SingleClassifierEnhancer
Capabilities
public void buildClassifier(Instances data) throws java.lang.Exception
buildClassifier in interface Classifier
data - The training data.
java.lang.Exception - If the classifier could not be built successfully.
public double[] distributionForInstance(Instance instance) throws java.lang.Exception
distributionForInstance in interface Classifier
distributionForInstance in class AbstractClassifier
instance - The instance to calculate the probability distribution for.
java.lang.Exception - If the distribution could not be computed successfully.
public java.lang.String toString()
toString in class java.lang.Object
public java.lang.String getRevision()
getRevision in interface RevisionHandler
getRevision in class AbstractClassifier
public static void main(java.lang.String[] args)
args - Use -h to see all available options.