uk.ac.man.entitytagger.doc
Class TaggedDocument

java.lang.Object
  extended by uk.ac.man.entitytagger.doc.TaggedDocument

public class TaggedDocument
extends java.lang.Object

Class representing a tagged document. Contains the original document and the tags that a matcher found in that document.

Author:
Martin

Nested Class Summary
static class TaggedDocument.Format
           
 
Field Summary
private  TaggedSection[] abs
           
private  TaggedSection[] body
           
private  Document original
           
private  java.lang.String rawContent
           
private  java.util.List<Mention> rawMatches
           
 
Constructor Summary
TaggedDocument(Document original, TaggedSection[] abs, TaggedSection[] body, java.util.List<Mention> rawMatches, java.lang.String rawContent)
           
 
Method Summary
 TaggedSection[] getAbs()
           
 java.util.HashSet<java.lang.String> getAllMatchedSpecies()
           
 java.util.ArrayList<Mention> getAllMatches()
           
 TaggedSection[] getBody()
           
 java.lang.String getContent()
           
private static Pair<java.lang.String> getMatchTags(Mention m, TaggedDocument.Format format, boolean link)
           
 Document getOriginal()
           
 java.util.List<Mention> getRawMatches()
           
 java.lang.StringBuffer toHTML(boolean link, Function<Pair<java.lang.String>> alternativeTagFunction)
           
static java.lang.StringBuffer toHTML(java.lang.String text, java.util.List<Mention> matches, TaggedDocument.Format format, boolean link, Function<Pair<java.lang.String>> alternativeTagFunction)
          Converts a document text, with given NER mentions, to e.g. HTML or XML format (adding tags around the recognized mentions).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

original

private Document original

abs

private TaggedSection[] abs

body

private TaggedSection[] body

rawMatches

private java.util.List<Mention> rawMatches

rawContent

private java.lang.String rawContent

Constructor Detail

TaggedDocument

public TaggedDocument(Document original,
                      TaggedSection[] abs,
                      TaggedSection[] body,
                      java.util.List<Mention> rawMatches,
                      java.lang.String rawContent)

Method Detail

toHTML

public java.lang.StringBuffer toHTML(boolean link,
                                     Function<Pair<java.lang.String>> alternativeTagFunction)

getMatchTags

private static Pair<java.lang.String> getMatchTags(Mention m,
                                                   TaggedDocument.Format format,
                                                   boolean link)

toHTML

public static java.lang.StringBuffer toHTML(java.lang.String text,
                                            java.util.List<Mention> matches,
                                            TaggedDocument.Format format,
                                            boolean link,
                                            Function<Pair<java.lang.String>> alternativeTagFunction)
Converts a document text, with the given NER mentions, to a format such as HTML or XML by adding tags around the recognized mentions. When calling the method, an alternative user-specific function can be provided with the following signature: Pair Function.function(Object[] args). The returned pair holds the opening and closing tags that enclose the mention, and args is the array [the mention, format, link].

Parameters:
text - the original text of the document
matches - the recognized NER mentions
format - the wanted format
link - whether to construct NER linkouts or not
alternativeTagFunction - potentially an alternative tagging function (may be null)
Returns:
the text of the document, with formatting tags added around the specified mentions
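
The tagging step described above can be illustrated with a minimal, self-contained sketch. It does not use the library's own Mention, Pair, Function or TaggedDocument.Format types, whose members are not documented on this page. Instead, the names TagSketch, Span, defaultTags and tag are hypothetical stand-ins, java.util.function.Function replaces the library's Function, and a two-element String array replaces Pair. The sketch assumes that mentions do not overlap and that offsets are character positions in the text. It also performs no HTML escaping.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins only: none of these types belong to the library.
class TagSketch {
    /** A recognized span: [start, end) character offsets plus an entity id. */
    static final class Span {
        final int start;
        final int end;
        final String id;

        Span(int start, int end, String id) {
            this.start = start;
            this.end = end;
            this.id = id;
        }
    }

    /** Default tag pair {open, close}; the markup is an illustrative choice. */
    static String[] defaultTags(Span s) {
        return new String[] {
            "<span class=\"mention\" title=\"" + s.id + "\">", "</span>"
        };
    }

    /**
     * Wraps each span of the text in a tag pair. If alternative is non-null,
     * it supplies the pair; otherwise defaultTags is used.
     * Assumes the spans do not overlap and lie within the text.
     */
    static StringBuffer tag(String text, List<Span> spans,
                            Function<Span, String[]> alternative) {
        // Sort a copy by start offset so the text can be emitted left to right.
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));

        StringBuffer out = new StringBuffer();
        int pos = 0;
        for (Span s : sorted) {
            String[] tags = (alternative != null) ? alternative.apply(s) : defaultTags(s);
            out.append(text, pos, s.start);          // plain text before the mention
            out.append(tags[0]);                     // opening tag
            out.append(text, s.start, s.end);        // the mention itself
            out.append(tags[1]);                     // closing tag
            pos = s.end;
        }
        out.append(text, pos, text.length());        // plain text after the last mention
        return out;
    }
}
```

Passing null as the last argument selects the default tags, mirroring the "may be null" contract of alternativeTagFunction. Passing a function such as s -> new String[] {"[", "]"} replaces the tags for every mention.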

getOriginal

public Document getOriginal()
Returns:
the original document

getAbs

public TaggedSection[] getAbs()
Returns:
the tagged sections held in abs

getBody

public TaggedSection[] getBody()
Returns:
the tagged sections of the body

getRawMatches

public java.util.List<Mention> getRawMatches()
Returns:
the raw list of matches (rawMatches)

getAllMatches

public java.util.ArrayList<Mention> getAllMatches()

getAllMatchedSpecies

public java.util.HashSet<java.lang.String> getAllMatchedSpecies()

getContent

public java.lang.String getContent()