LBJ2.nlp
Class Sentence

java.lang.Object
  extended by LBJ2.parse.LinkedChild
      extended by LBJ2.nlp.Sentence
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable

public class Sentence
extends LinkedChild

This representation of a sentence stores the entire text of the sentence in a single string. The text may include any newlines present in the input, depending on the parser (e.g., SentenceSplitter leaves them in). The class also provides methods that convert the string to other representations, such as wordSplit().

See Also:
Serialized Form

Field Summary
private  boolean[] inURL
          Indicates whether the corresponding index in the text has been determined to be part of a URL; used by partOfURL(int).
private static java.lang.String[] protocols
          URL prefixes; used by partOfURL(int).
 java.lang.String text
          The actual text of the sentence.
private static java.lang.String[] topLevelDomains
          Domain name suffixes; used by partOfURL(int).
 
Fields inherited from class LBJ2.parse.LinkedChild
end, next, parent, previous, start
 
Constructor Summary
Sentence(java.lang.String t)
          Constructs a sentence from its text.
Sentence(java.lang.String t, int s, int e)
          Constructs a sentence from its text and sets its character offsets.
 
Method Summary
private  void myAdd(java.util.LinkedList l, int i, java.lang.String description)
          Adds an item to a list; routing additions through this method gives a convenient place to insert print statements when debugging.
private  boolean partOfURL(int index)
          Does a simple check to determine if the symbol at the specified index in this sentence's text is likely to be part of a URL.
 java.lang.String toString()
          The string representation of a Sentence is just its text.
 LinkedVector wordSplit()
          Creates and returns a LinkedVector representation of this sentence in which every LinkedChild is a Word.
 
Methods inherited from class LBJ2.parse.LinkedChild
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

protocols

private static final java.lang.String[] protocols
URL prefixes; used by partOfURL(int). The values in this array must be sorted in decreasing order of length so that the regular expressions built from them work properly.


topLevelDomains

private static final java.lang.String[] topLevelDomains
Domain name suffixes; used by partOfURL(int). The values in this array must be sorted in decreasing order of length so that the regular expressions built from them work properly.
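
The ordering requirement on protocols and topLevelDomains follows from how java.util.regex evaluates alternation: alternatives are tried left to right and the first one that matches wins, so a shorter alternative listed first can shadow a longer one. The sketch below illustrates this with a hypothetical helper, firstMatch, and two made-up alternatives; it does not reproduce the actual array contents or the patterns this class builds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Hypothetical helper: joins the alternatives into one group, in array
    // order, and returns the first match found in the input (or null).
    static String firstMatch(String[] alternatives, String input) {
        StringBuilder group = new StringBuilder("(?:");
        for (int i = 0; i < alternatives.length; i++) {
            if (i > 0) group.append('|');
            group.append(Pattern.quote(alternatives[i]));
        }
        group.append(')');
        Matcher m = Pattern.compile(group.toString()).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "https://example.org";
        // Shorter alternative first: "http" matches and "https" is never tried.
        System.out.println(firstMatch(new String[] {"http", "https"}, input)); // prints "http"
        // Longest first: the full prefix is matched.
        System.out.println(firstMatch(new String[] {"https", "http"}, input)); // prints "https"
    }
}
```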


inURL

private boolean[] inURL
Indicates whether the corresponding index in the text has been determined to be part of a URL; used by partOfURL(int).


text

public java.lang.String text
The actual text of the sentence.

Constructor Detail

Sentence

public Sentence(java.lang.String t)
Constructs a sentence from its text.

Parameters:
t - The text of the sentence.

Sentence

public Sentence(java.lang.String t,
                int s,
                int e)
Constructs a sentence from its text and sets its character offsets.

Parameters:
t - The text of the sentence.
s - The offset at which this child starts.
e - The offset at which this child ends.

Method Detail

myAdd

private void myAdd(java.util.LinkedList l,
                   int i,
                   java.lang.String description)
Adds an item to a list. Routing additions through this method gives a convenient place to insert print statements when debugging.

Parameters:
l - The list to add to.
i - The item to add.
description - A string describing why the addition is happening.

wordSplit

public LinkedVector wordSplit()
Creates and returns a LinkedVector representation of this sentence in which every LinkedChild is a Word. Offset information is respected and propagated.

Returns:
A LinkedVector representation of this sentence.
See Also:
Word
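
To make "offset information is respected and propagated" concrete, the following standalone sketch splits a sentence on whitespace and shifts each token's position by the sentence's start offset. It is not the wordSplit() implementation, which this page does not show and which presumably also handles punctuation and URLs. Token and split are hypothetical names, and treating the end offset as the index of the token's last character (inclusive) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetSplitSketch {
    // Hypothetical stand-in for a word carrying absolute character offsets.
    static final class Token {
        final String form;
        final int start; // absolute offset of the first character
        final int end;   // absolute offset of the last character (assumed inclusive)

        Token(String form, int start, int end) {
            this.form = form;
            this.start = start;
            this.end = end;
        }
    }

    // Splits text on whitespace; sentenceStart is the sentence's own offset
    // in the enclosing document, added to every token position.
    static List<Token> split(String text, int sentenceStart) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int begin = i;
            // Consume one whitespace-free run.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > begin) {
                tokens.add(new Token(text.substring(begin, i),
                                     sentenceStart + begin,
                                     sentenceStart + i - 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : split("Hello big world", 100)) {
            System.out.println(t.form + " [" + t.start + ", " + t.end + "]");
        }
    }
}
```

With a sentence starting at offset 100, the token "big" begins at local index 6 and therefore receives the absolute offsets 106 and 108.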

partOfURL

private boolean partOfURL(int index)
Does a simple check to determine if the symbol at the specified index in this sentence's text is likely to be part of a URL. If the text contains any of the following strings before that symbol, with no whitespace between the two, the symbol is deemed likely to be part of a URL.

Parameters:
index - The index of the symbol in question.
Returns:
true if and only if the specified symbol appears to be part of a URL.
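
The check described above can be sketched as a standalone method. This is a minimal illustration under stated assumptions, not the private implementation: the PREFIXES list is hypothetical (the actual strings are not reproduced on this page), the sketch ignores topLevelDomains and the inURL cache, and it uses plain string search rather than regular expressions.

```java
class UrlHeuristicSketch {
    // Hypothetical prefix list, longest first; the real contents of the
    // protocols array are not shown in this documentation.
    static final String[] PREFIXES = {"https://", "http://", "ftp://"};

    // Returns true if some prefix ends at or before index and no whitespace
    // separates that prefix from the character at index.
    static boolean likelyPartOfUrl(String text, int index) {
        if (index < 0 || index >= text.length()
                || Character.isWhitespace(text.charAt(index))) {
            return false;
        }
        // Walk left to the start of the whitespace-free run containing index.
        int runStart = index;
        while (runStart > 0 && !Character.isWhitespace(text.charAt(runStart - 1))) {
            runStart--;
        }
        // A prefix inside the run that ends before index marks a likely URL.
        for (String prefix : PREFIXES) {
            int at = text.indexOf(prefix, runStart);
            if (at >= 0 && at + prefix.length() <= index) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "See http://example.org/a.b now.";
        System.out.println(likelyPartOfUrl(text, text.indexOf(".org")));  // prints "true"
        System.out.println(likelyPartOfUrl(text, text.length() - 1));     // prints "false"
    }
}
```

A check of this kind lets a tokenizer keep the periods and slashes inside a URL attached to it while still treating the sentence-final period as separate punctuation.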

toString

public java.lang.String toString()
The string representation of a Sentence is just its text.

Overrides:
toString in class java.lang.Object
Returns:
The text of this sentence.