LBJ2.classify
Class TestDiscrete

java.lang.Object
  extended by LBJ2.classify.TestDiscrete

public class TestDiscrete
extends java.lang.Object

This class is a program that can evaluate any Classifier against an oracle Classifier on the objects returned from a Parser.

Usage:

java LBJ2.classify.TestDiscrete [-t <n>] <classifier> <oracle> <parser> <input file> [<null label> [<null label> ...]]

Options: The -t <n> option is similar to the LBJ compiler's command line option of the same name. When <n> is greater than 0, a time stamp is printed to STDOUT after every <n> examples are processed.

Input: The first three command line parameters are fully qualified class names, e.g. myPackage.myClassifier. Next, <input file> is passed (as a String) to the constructor of <parser>. Each optional <null label> parameter identifies one of the possible labels produced by <oracle> as representing "no classification"; these labels are used during the computation of overall precision, recall, and F1 scores. It is also assumed that <classifier> is discrete and that its discreteValue(Object) method is implemented.

Output: Timing information is presented first. The first time reported is the time taken to load the specified classifier's Java class into memory. If the classifier does not make use of the cachedin keyword, this reflects the time taken for LBJ to load the classifier's internal representation. Next, the time taken to evaluate the first example is reported. This figure is informative mainly when the classifier does make use of the cachedin keyword, in which case it reflects the time LBJ takes to load the classifier's internal representation more accurately than the first reported time does. Finally, the average time taken to execute the classifier's discreteValue(Object) method is reported.

After the timing information, an ASCII table is written to STDOUT reporting precision, recall, and F1 scores itemized by the values that either the classifier or the oracle produced during the test. The two rightmost columns are named "LCount" and "PCount" (standing for "labeled count" and "predicted count" respectively), and they report the number of times the oracle produced each label and the number of times the classifier predicted each label respectively. If a "null label" is specified, overall precision, recall, and F1 scores and a total count of non-null-labeled examples are reported at the bottom of the table. In the last row, whether a "null label" is specified or not, overall accuracy is reported in the precision column. In the count column, the total number of predictions (or labels, equivalently) is reported.


Field Summary
private static Classifier classifier
          References the classifier that is to be tested.
protected  java.util.HashMap correctHistogram
          The histogram of correct predictions.
protected  java.util.HashMap goldHistogram
          The histogram of correct labels.
protected  java.util.HashSet nullLabels
          The set of "null" labels whose statistics are not included in overall precision, recall, F1, or accuracy.
private static Classifier oracle
          References the oracle classifier to test against.
private static int outputGranularity
          The number of examples processed in between time stamp messages.
private static Parser parser
          References the parser supplying the testing objects.
protected  java.util.HashMap predictionHistogram
          The histogram of predictions.
 
Constructor Summary
TestDiscrete()
          Default constructor.
 
Method Summary
 void addNull(java.lang.String n)
          Adds a label to the set of "null" labels.
 java.lang.String[] getAllClasses()
          Returns the set of all classes reported as either predictions or labels.
 int getCorrect(java.lang.String p)
          Returns the number of times the requested prediction was reported correctly.
 double getF(double b, java.lang.String l)
          Returns the Fbeta score associated with the given label.
 double getF1(java.lang.String l)
          Returns the F1 score associated with the given label.
 int getLabeled(java.lang.String l)
          Returns the number of times the requested label was reported.
 java.lang.String[] getLabels()
          Returns the set of labels that have been reported so far.
 double[] getOverallStats()
          Computes the overall statistics: precision, recall, F1, and accuracy.
 double[] getOverallStats(double b)
          Computes the overall statistics: precision, recall, Fbeta, and accuracy.
 double getPrecision(java.lang.String p)
          Returns the precision associated with the given prediction.
 int getPredicted(java.lang.String p)
          Returns the number of times the requested prediction was reported.
 java.lang.String[] getPredictions()
          Returns the set of predictions that have been reported so far.
 double getRecall(java.lang.String l)
          Returns the recall associated with the given label.
 boolean hasNulls()
          Returns true iff there exist "null" labels.
protected  void histogramAdd(java.util.HashMap histogram, java.lang.String key, int amount)
          Takes a histogram implemented as a map and increments the count for the given key by the given amount.
protected  void histogramAddAll(java.util.HashMap h1, java.util.HashMap h2)
          Takes two histograms implemented as maps and adds the amounts found in the second histogram to the amounts found in the first.
protected  int histogramGet(java.util.HashMap histogram, java.lang.String key)
          Takes a histogram implemented as a map and retrieves the count for the given key.
private static TestDiscrete instantiate(java.lang.String[] args)
          Given command line parameters representing the fully qualified names of the classifier to be tested, the oracle classifier to test against, the parser supplying the testing objects, and the input parameter to the parser's constructor, this method instantiates all three objects.
 boolean isNull(java.lang.String n)
          Determines if a label is treated as a "null" label.
static void main(java.lang.String[] args)
          The entry point of this program.
 void printPerformance(java.io.PrintStream out)
          Performance results are written to the given stream in the form of precision, recall, and F1 statistics.
 void removeNull(java.lang.String n)
          Removes a label from the set of "null" labels.
 void reportAll(TestDiscrete t)
          Report all the predictions in the argument's histograms.
 void reportPrediction(java.lang.String p, java.lang.String l)
          Whenever a prediction is made, report that prediction and the correct label with this method.
static TestDiscrete testDiscrete(Classifier classifier, Classifier oracle, Parser parser)
          Tests the given discrete classifier against the given oracle using the given parser to provide the labeled testing data.
static TestDiscrete testDiscrete(TestDiscrete tester, Classifier classifier, Classifier oracle, Parser parser, boolean output, int outputGranularity)
          Tests the given discrete classifier against the given oracle using the given parser to provide the labeled testing data.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

classifier

private static Classifier classifier
References the classifier that is to be tested.


oracle

private static Classifier oracle
References the oracle classifier to test against.


parser

private static Parser parser
References the parser supplying the testing objects.


outputGranularity

private static int outputGranularity
The number of examples processed in between time stamp messages.


goldHistogram

protected java.util.HashMap goldHistogram
The histogram of correct labels.


predictionHistogram

protected java.util.HashMap predictionHistogram
The histogram of predictions.


correctHistogram

protected java.util.HashMap correctHistogram
The histogram of correct predictions.


nullLabels

protected java.util.HashSet nullLabels
The set of "null" labels whose statistics are not included in overall precision, recall, F1, or accuracy.

Constructor Detail

TestDiscrete

public TestDiscrete()
Default constructor.

Method Detail

main

public static void main(java.lang.String[] args)
The entry point of this program.

Parameters:
args - The command line parameters.

testDiscrete

public static TestDiscrete testDiscrete(Classifier classifier,
                                        Classifier oracle,
                                        Parser parser)
Tests the given discrete classifier against the given oracle using the given parser to provide the labeled testing data. This simplified interface to testDiscrete(TestDiscrete,Classifier,Classifier,Parser,boolean,int) assumes there are no null predictions and that output should not be generated on STDOUT.

Parameters:
classifier - The classifier to be tested.
oracle - The classifier to test against.
parser - The parser supplying the labeled example objects.

testDiscrete

public static TestDiscrete testDiscrete(TestDiscrete tester,
                                        Classifier classifier,
                                        Classifier oracle,
                                        Parser parser,
                                        boolean output,
                                        int outputGranularity)
Tests the given discrete classifier against the given oracle using the given parser to provide the labeled testing data.

Parameters:
tester - An object of this class that has already been told via addNull(String) which prediction values are considered to be null predictions.
classifier - The classifier to be tested.
oracle - The classifier to test against.
parser - The parser supplying the labeled example objects.
output - Whether or not to produce output on STDOUT.
outputGranularity - The number of examples processed in between time stamp messages.
Returns:
The same TestDiscrete object passed in the first argument, after being filled with statistics.
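
The evaluation loop described above can be pictured with a minimal, self-contained sketch. It does not use the LBJ2 classes: the classifier and oracle are modeled as hypothetical Function<String, String> stand-ins, the parser as an Iterable<String>, and the name EvaluationLoopSketch is invented for illustration. Only the general shape (predict, compare with the oracle, optionally print a time stamp every outputGranularity examples) follows the documentation.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of an evaluation loop; these names are stand-ins,
// not LBJ2 classes.
class EvaluationLoopSketch {
  /** Compares predictions with oracle labels and returns {correct, total}. */
  static int[] evaluate(Function<String, String> classifier,
                        Function<String, String> oracle,
                        Iterable<String> examples,
                        boolean output, int outputGranularity) {
    int correct = 0;
    int total = 0;
    for (String example : examples) {
      String prediction = classifier.apply(example);
      String label = oracle.apply(example);
      if (prediction.equals(label)) correct++;
      total++;
      // Time stamp after every outputGranularity examples, if enabled.
      if (output && outputGranularity > 0 && total % outputGranularity == 0) {
        System.out.println(total + " examples processed at " + new Date());
      }
    }
    return new int[] { correct, total };
  }

  public static void main(String[] args) {
    List<String> examples = Arrays.asList("apple", "bee", "ax", "zebra");
    int[] result = evaluate(
        s -> s.length() > 3 ? "long" : "short",     // toy classifier
        s -> s.startsWith("a") ? "long" : "short",  // toy oracle
        examples, true, 2);
    System.out.println(result[0] + " of " + result[1] + " correct");
  }
}
```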

instantiate

private static TestDiscrete instantiate(java.lang.String[] args)
Given command line parameters representing the fully qualified names of the classifier to be tested, the oracle classifier to test against, the parser supplying the testing objects, and the input parameter to the parser's constructor this method instantiates all three objects.

Parameters:
args - The command line.
Returns:
A new tester object containing the "null" labels.

reportPrediction

public void reportPrediction(java.lang.String p,
                             java.lang.String l)
Whenever a prediction is made, report that prediction and the correct label with this method.

Parameters:
p - The prediction.
l - The correct label.
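
As a rough illustration of what reporting a prediction involves, the sketch below keeps three histograms analogous to goldHistogram, predictionHistogram, and correctHistogram. The class name PredictionTallySketch, the generic map types, and the exact update rule are assumptions made for this example; the actual implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tallies labels, predictions, and correct predictions.
class PredictionTallySketch {
  final Map<String, Integer> gold = new HashMap<>();
  final Map<String, Integer> predicted = new HashMap<>();
  final Map<String, Integer> correct = new HashMap<>();

  /** Records one prediction p together with its correct label l. */
  void report(String p, String l) {
    predicted.merge(p, 1, Integer::sum);
    gold.merge(l, 1, Integer::sum);
    // Assumed rule: a prediction counts as correct when it equals the label.
    if (p.equals(l)) correct.merge(p, 1, Integer::sum);
  }

  public static void main(String[] args) {
    PredictionTallySketch tally = new PredictionTallySketch();
    tally.report("A", "A");
    tally.report("A", "B");
    tally.report("B", "B");
    System.out.println("gold=" + tally.gold
        + " predicted=" + tally.predicted + " correct=" + tally.correct);
  }
}
```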

reportAll

public void reportAll(TestDiscrete t)
Report all the predictions in the argument's histograms.

Parameters:
t - Another object of this class.

getLabels

public java.lang.String[] getLabels()
Returns the set of labels that have been reported so far.

Returns:
An array containing the labels that have been reported so far.

getPredictions

public java.lang.String[] getPredictions()
Returns the set of predictions that have been reported so far.

Returns:
An array containing the predictions that have been reported so far.

getAllClasses

public java.lang.String[] getAllClasses()
Returns the set of all classes reported as either predictions or labels.

Returns:
An array containing all classes reported as either predictions or labels.
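
One way to picture the result is as the union of the label set and the prediction set. The sketch below is only an illustration under assumptions: the name AllClassesSketch is hypothetical, and the sorted order (from TreeSet) is chosen here for determinism, since the documentation does not state an ordering.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: union of reported labels and reported predictions.
class AllClassesSketch {
  static String[] union(Set<String> labels, Set<String> predictions) {
    // TreeSet removes duplicates and sorts; the ordering is an assumption.
    Set<String> all = new TreeSet<>(labels);
    all.addAll(predictions);
    return all.toArray(new String[0]);
  }

  public static void main(String[] args) {
    Set<String> labels = new HashSet<>(Arrays.asList("A", "B"));
    Set<String> predictions = new HashSet<>(Arrays.asList("B", "C"));
    System.out.println(Arrays.toString(union(labels, predictions)));
  }
}
```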

addNull

public void addNull(java.lang.String n)
Adds a label to the set of "null" labels.

Parameters:
n - The label to add.

removeNull

public void removeNull(java.lang.String n)
Removes a label from the set of "null" labels.

Parameters:
n - The label to remove.

isNull

public boolean isNull(java.lang.String n)
Determines if a label is treated as a "null" label.

Parameters:
n - The label in question.
Returns:
true iff n is one of the "null" labels.

hasNulls

public boolean hasNulls()
Returns true iff there exist "null" labels.
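
The four "null" label methods above behave like operations on a set of strings. A minimal stand-in, assuming a plain HashSet<String> and the hypothetical class name NullLabelSketch, might look like this:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a "null" label registry backed by a HashSet.
class NullLabelSketch {
  private final Set<String> nullLabels = new HashSet<>();

  void addNull(String n) { nullLabels.add(n); }
  void removeNull(String n) { nullLabels.remove(n); }
  boolean isNull(String n) { return nullLabels.contains(n); }
  boolean hasNulls() { return !nullLabels.isEmpty(); }

  public static void main(String[] args) {
    NullLabelSketch labels = new NullLabelSketch();
    labels.addNull("O");
    System.out.println(labels.isNull("O") + " " + labels.hasNulls());
    labels.removeNull("O");
    System.out.println(labels.isNull("O") + " " + labels.hasNulls());
  }
}
```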


histogramAdd

protected void histogramAdd(java.util.HashMap histogram,
                            java.lang.String key,
                            int amount)
Takes a histogram implemented as a map and increments the count for the given key by the given amount.

Parameters:
histogram - The histogram.
key - The key whose count should be incremented.
amount - The amount by which to increment.

histogramGet

protected int histogramGet(java.util.HashMap histogram,
                           java.lang.String key)
Takes a histogram implemented as a map and retrieves the count for the given key.

Parameters:
histogram - The histogram.
key - The key whose count should be retrieved.
Returns:
The count of the specified key.
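
The two histogram helpers above (increment and lookup) can be sketched as follows. This is an illustration, not the class's code: it uses a generic Map<String, Integer> rather than a raw HashMap, the name HistogramSketch is hypothetical, and returning 0 for a key that was never added is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a histogram stored in a map from key to count.
class HistogramSketch {
  /** Increments the count for key by amount. */
  static void add(Map<String, Integer> histogram, String key, int amount) {
    histogram.put(key, get(histogram, key) + amount);
  }

  /** Returns the count for key; assumed to be 0 when the key is absent. */
  static int get(Map<String, Integer> histogram, String key) {
    Integer count = histogram.get(key);
    return count == null ? 0 : count;
  }

  public static void main(String[] args) {
    Map<String, Integer> histogram = new HashMap<>();
    add(histogram, "A", 1);
    add(histogram, "A", 2);
    System.out.println(get(histogram, "A") + " " + get(histogram, "B"));
  }
}
```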

histogramAddAll

protected void histogramAddAll(java.util.HashMap h1,
                               java.util.HashMap h2)
Takes two histograms implemented as maps and adds the amounts found in the second histogram to the amounts found in the first.

Parameters:
h1 - The first histogram, whose values will be modified.
h2 - The second histogram, whose values will be added into the first's.
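
Merging one histogram into another, as described above, amounts to adding each count of the second map into the first. The sketch below assumes generic Map<String, Integer> histograms and uses the hypothetical name HistogramMergeSketch; the second histogram is left unmodified.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: adds every count in h2 into h1.
class HistogramMergeSketch {
  static void addAll(Map<String, Integer> h1, Map<String, Integer> h2) {
    for (Map.Entry<String, Integer> entry : h2.entrySet()) {
      // Sum with the existing count, or insert the key if h1 lacks it.
      h1.merge(entry.getKey(), entry.getValue(), Integer::sum);
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> h1 = new HashMap<>();
    h1.put("A", 1);
    h1.put("B", 2);
    Map<String, Integer> h2 = new HashMap<>();
    h2.put("B", 3);
    h2.put("C", 4);
    addAll(h1, h2);
    System.out.println(h1);
  }
}
```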

getLabeled

public int getLabeled(java.lang.String l)
Returns the number of times the requested label was reported.

Parameters:
l - The label in question.
Returns:
The number of times l was reported.

getPredicted

public int getPredicted(java.lang.String p)
Returns the number of times the requested prediction was reported.

Parameters:
p - The prediction in question.
Returns:
The number of times p was reported.

getCorrect

public int getCorrect(java.lang.String p)
Returns the number of times the requested prediction was reported correctly.

Parameters:
p - The prediction in question.
Returns:
The number of times p was reported correctly.

getPrecision

public double getPrecision(java.lang.String p)
Returns the precision associated with the given prediction.

Parameters:
p - The given prediction.
Returns:
The precision associated with p.

getRecall

public double getRecall(java.lang.String l)
Returns the recall associated with the given label.

Parameters:
l - The given label.
Returns:
The recall associated with l.
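
Per-class precision and recall follow the standard definitions: correct predictions divided by the predicted count, and correct predictions divided by the labeled count. The sketch below computes both from raw counts. The class name PrecisionRecallSketch is hypothetical, and returning 0.0 when a denominator is zero is an assumption, not documented behavior.

```java
// Hypothetical sketch: standard precision and recall from raw counts.
class PrecisionRecallSketch {
  /** correct / predicted; assumed to be 0.0 when nothing was predicted. */
  static double precision(int correct, int predicted) {
    return predicted == 0 ? 0.0 : (double) correct / predicted;
  }

  /** correct / labeled; assumed to be 0.0 when nothing was labeled. */
  static double recall(int correct, int labeled) {
    return labeled == 0 ? 0.0 : (double) correct / labeled;
  }

  public static void main(String[] args) {
    // Example: a class predicted 4 times, labeled 3 times, correct 2 times.
    System.out.println("precision = " + precision(2, 4));
    System.out.println("recall    = " + recall(2, 3));
  }
}
```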

getF1

public double getF1(java.lang.String l)
Returns the F1 score associated with the given label.

Parameters:
l - The given label.
Returns:
The F1 score associated with l.

getF

public double getF(double b,
                   java.lang.String l)
Returns the Fbeta score associated with the given label. Fbeta is defined as:
Fbeta = (beta^2 + 1) * P * R / (beta^2 * P + R)

Parameters:
b - The value of beta.
l - The given label.
Returns:
The Fbeta score associated with l.
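
The Fbeta definition above translates directly into code. In this sketch P and R are passed in as plain numbers; the class name FBetaSketch is hypothetical, and returning 0.0 when the denominator is zero is an assumption made to avoid division by zero.

```java
// Hypothetical sketch: Fbeta = (beta^2 + 1) * P * R / (beta^2 * P + R).
class FBetaSketch {
  static double fBeta(double beta, double p, double r) {
    double betaSquared = beta * beta;
    double denominator = betaSquared * p + r;
    // Assumed convention: 0.0 when precision and recall are both zero.
    return denominator == 0.0 ? 0.0 : (betaSquared + 1) * p * r / denominator;
  }

  public static void main(String[] args) {
    // With beta = 1 the score reduces to the familiar F1.
    System.out.println("F1 = " + fBeta(1.0, 0.5, 1.0));
    // With beta = 2 recall is weighted more heavily than precision.
    System.out.println("F2 = " + fBeta(2.0, 0.5, 1.0));
  }
}
```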

getOverallStats

public double[] getOverallStats()
Computes the overall statistics: precision, recall, F1, and accuracy. Note that these statistics are all equivalent unless "null" labels have been added.

Returns:
An array in which the first element represents overall precision, the second represents overall recall, then F1, and finally accuracy.

getOverallStats

public double[] getOverallStats(double b)
Computes the overall statistics: precision, recall, Fbeta, and accuracy. Note that these statistics are all equivalent unless "null" labels have been added.

Parameters:
b - The value of beta.
Returns:
An array in which the first element represents overall precision, the second represents overall recall, then Fbeta, and finally accuracy.
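
The sketch below shows one plausible way such overall statistics could be computed, so that they coincide when there are no "null" labels. It is an assumption-laden illustration, not the class's implementation: precision and recall sum counts over non-null classes only, accuracy is taken over all examples, and the name OverallStatsSketch is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: overall {precision, recall, Fbeta, accuracy}.
class OverallStatsSketch {
  /** Sums the counts in a histogram, skipping the given keys. */
  static int sum(Map<String, Integer> histogram, Set<String> skip) {
    int total = 0;
    for (Map.Entry<String, Integer> e : histogram.entrySet()) {
      if (!skip.contains(e.getKey())) total += e.getValue();
    }
    return total;
  }

  static double[] overall(Map<String, Integer> gold, Map<String, Integer> predicted,
                          Map<String, Integer> correct, Set<String> nulls, double beta) {
    Set<String> none = Collections.emptySet();
    // Assumed: precision and recall ignore "null" classes.
    int c = sum(correct, nulls);
    int p = sum(predicted, nulls);
    int g = sum(gold, nulls);
    double precision = p == 0 ? 0.0 : (double) c / p;
    double recall = g == 0 ? 0.0 : (double) c / g;
    double b2 = beta * beta;
    double denominator = b2 * precision + recall;
    double f = denominator == 0.0 ? 0.0 : (b2 + 1) * precision * recall / denominator;
    // Assumed: accuracy counts every example, null-labeled or not.
    int all = sum(gold, none);
    double accuracy = all == 0 ? 0.0 : (double) sum(correct, none) / all;
    return new double[] { precision, recall, f, accuracy };
  }

  public static void main(String[] args) {
    Map<String, Integer> gold = new HashMap<>();
    gold.put("A", 3); gold.put("B", 2); gold.put("O", 5);
    Map<String, Integer> predicted = new HashMap<>();
    predicted.put("A", 4); predicted.put("B", 2); predicted.put("O", 4);
    Map<String, Integer> correct = new HashMap<>();
    correct.put("A", 2); correct.put("B", 1); correct.put("O", 3);
    double[] stats = overall(gold, predicted, correct, Collections.singleton("O"), 1.0);
    System.out.println(java.util.Arrays.toString(stats));
  }
}
```

Under these assumptions, with an empty set of "null" labels the predicted and labeled totals are equal, so precision, recall, F1, and accuracy come out identical, which matches the note above.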

printPerformance

public void printPerformance(java.io.PrintStream out)
Performance results are written to the given stream in the form of precision, recall, and F1 statistics.

Parameters:
out - The stream to write to.
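
To give a feel for this kind of report, the sketch below writes a small ASCII table to a PrintStream. Only the column names "LCount" and "PCount" come from the class description; the layout, number formatting, and the name PerformanceTableSketch are assumptions, and the real output may look different.

```java
import java.io.PrintStream;
import java.util.Locale;

// Hypothetical sketch: writes one header row and one row per class.
class PerformanceTableSketch {
  static void printHeader(PrintStream out) {
    out.printf(Locale.ROOT, "%-8s %9s %9s %9s %7s %7s%n",
        "Label", "Precision", "Recall", "F1", "LCount", "PCount");
  }

  static void printRow(PrintStream out, String label, double precision,
                       double recall, double f1, int lCount, int pCount) {
    // Locale.ROOT keeps the decimal separator a period on every machine.
    out.printf(Locale.ROOT, "%-8s %9.3f %9.3f %9.3f %7d %7d%n",
        label, precision, recall, f1, lCount, pCount);
  }

  public static void main(String[] args) {
    printHeader(System.out);
    // A class labeled 3 times, predicted 4 times, and correct 2 times.
    printRow(System.out, "A", 0.5, 2.0 / 3.0, 4.0 / 7.0, 3, 4);
  }
}
```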