uk.ac.man.entitytagger.doc
Class TaggedDocument
java.lang.Object
  
uk.ac.man.entitytagger.doc.TaggedDocument
public class TaggedDocument
- extends java.lang.Object
 
Class representing a tagged document. Contains the original document, and the tags found in that document by a matcher.
- Author:
 
  - Martin
 
 
 
 
 
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
original
private Document original
abs
private TaggedSection[] abs
body
private TaggedSection[] body
rawMatches
private java.util.List<Mention> rawMatches
rawContent
private java.lang.String rawContent
TaggedDocument
public TaggedDocument(Document original,
                      TaggedSection[] abs,
                      TaggedSection[] body,
                      java.util.List<Mention> rawMatches,
                      java.lang.String rawContent)
toHTML
public java.lang.StringBuffer toHTML(boolean link,
                                     Function<Pair<java.lang.String>> alternativeTagFunction)
 
getMatchTags
private static Pair<java.lang.String> getMatchTags(Mention m,
                                                   TaggedDocument.Format format,
                                                   boolean link)
 
toHTML
public static java.lang.StringBuffer toHTML(java.lang.String text,
                                            java.util.List<Mention> matches,
                                            TaggedDocument.Format format,
                                            boolean link,
                                            Function<Pair<java.lang.String>> alternativeTagFunction)
- Converts a document's text, with the given NER mentions, to a markup format such as HTML or XML by adding tags around the recognized mentions.
 An alternative, user-specific tag function may be supplied with the following signature: Pair Function.function(Object[] args),
 where the returned pair holds the opening and closing tags that enclose the mention, and args is the array [the mention, format, link].
- Parameters:
 text - the original text of the document
 matches - the recognized NER mentions
 format - the wanted format
 link - whether to construct NER linkouts or not
 alternativeTagFunction - potentially an alternative tagging function (may be null)
- Returns:
 - the text of the document, with formatting tags added around the specified mentions
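
The tag-wrapping idea described for toHTML can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: TagSketch, Span, and tag are hypothetical stand-ins invented for the example, because the members of Mention, Pair, and TaggedDocument.Format are not shown on this page. The sketch also assumes that mentions do not overlap and that their offsets refer to the untagged text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class TagSketch {
    /** Hypothetical stand-in for a recognized mention: character offsets plus an identifier. */
    static final class Span {
        final int start;   // inclusive offset into the text
        final int end;     // exclusive offset into the text
        final String id;   // e.g. a species identifier

        Span(int start, int end, String id) {
            this.start = start;
            this.end = end;
            this.id = id;
        }
    }

    /**
     * Wraps each span in the tag pair returned by tagFunction.
     * The function returns a two-element array: [opening tag, closing tag].
     * Assumes the spans do not overlap.
     */
    static StringBuffer tag(String text, List<Span> spans, Function<Span, String[]> tagFunction) {
        List<Span> sorted = new ArrayList<>(spans);
        // Work from the last span to the first so earlier offsets stay valid after each insertion.
        sorted.sort(Comparator.comparingInt((Span s) -> s.start).reversed());

        StringBuffer out = new StringBuffer(text);
        for (Span s : sorted) {
            String[] tags = tagFunction.apply(s);
            out.insert(s.end, tags[1]);    // closing tag first, so s.start is still correct
            out.insert(s.start, tags[0]);  // then the opening tag
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 12, "9606"));
        spans.add(new Span(17, 29, "10090"));

        // A caller-supplied tag function, playing the role of alternativeTagFunction.
        StringBuffer html = tag("Homo sapiens and Mus musculus", spans,
                s -> new String[] {"<span class=\"species\" id=\"" + s.id + "\">", "</span>"});
        System.out.println(html);
    }
}
```

Inserting from the end of the text backwards avoids having to track how much each inserted tag shifts the remaining offsets. In the real method, the caller-supplied function receives an Object[] of [the mention, format, link] instead of a single typed argument.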
 
 
 
getOriginal
public Document getOriginal()
- Returns:
 - the original document
 
 
getAbs
public TaggedSection[] getAbs()
- Returns:
 - the abs
 
 
getBody
public TaggedSection[] getBody()
- Returns:
 - the body
 
 
getRawMatches
public java.util.List<Mention> getRawMatches()
- Returns:
 - the list of raw matches
 
 
getAllMatches
public java.util.ArrayList<Mention> getAllMatches()
 
getAllMatchedSpecies
public java.util.HashSet<java.lang.String> getAllMatchedSpecies()
 
getContent
public java.lang.String getContent()