uk.ac.man.entitytagger.doc
Class TaggedDocument
java.lang.Object
uk.ac.man.entitytagger.doc.TaggedDocument
public class TaggedDocument
- extends java.lang.Object
Class representing a tagged document. It holds the original document together with the tags that a matcher found in it.
- Author:
- Martin
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
original
private Document original
abs
private TaggedSection[] abs
body
private TaggedSection[] body
rawMatches
private java.util.List<Mention> rawMatches
rawContent
private java.lang.String rawContent
TaggedDocument
public TaggedDocument(Document original,
TaggedSection[] abs,
TaggedSection[] body,
java.util.List<Mention> rawMatches,
java.lang.String rawContent)
toHTML
public java.lang.StringBuffer toHTML(boolean link,
Function<Pair<java.lang.String>> alternativeTagFunction)
getMatchTags
private static Pair<java.lang.String> getMatchTags(Mention m,
TaggedDocument.Format format,
boolean link)
toHTML
public static java.lang.StringBuffer toHTML(java.lang.String text,
java.util.List<Mention> matches,
TaggedDocument.Format format,
boolean link,
Function<Pair<java.lang.String>> alternativeTagFunction)
- Converts document text with the given NER mentions to a tagged format such as HTML or XML, adding tags around each recognized mention.
When calling the method, the caller can supply an alternative, user-specific tag function with the signature Pair Function.function(Object[] args),
where args is the array [the mention, format, link] and the returned pair holds the opening and closing tags to place around the mention.
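A minimal sketch of a custom tag function that follows the contract described above: it unpacks args as [mention, format, link] and returns the opening and closing tags. The real library types (Function, Pair, Mention, TaggedDocument.Format) are not reproduced here. SimpleMention, TagPair, TagFunction, HighlightTags, the Format values HTML and XML, and the link prefix are all hypothetical stand-ins chosen so that the example compiles on its own.

```java
/** Hypothetical stand-in for TaggedDocument.Format; the real constants may differ. */
enum Format { HTML, XML }

/** Hypothetical stand-in for a mention: character offsets [start, end) plus an identifier. */
record SimpleMention(int start, int end, String id) {}

/** Hypothetical stand-in for the returned pair of (opening tag, closing tag). */
record TagPair(String open, String close) {}

/** Mirrors the documented callback shape: Pair Function.function(Object[] args). */
interface TagFunction {
    TagPair function(Object[] args);
}

final class HighlightTags implements TagFunction {
    // Hypothetical linkout prefix, used only for illustration.
    static final String LINK_PREFIX = "https://example.org/entity/";

    @Override
    public TagPair function(Object[] args) {
        // args is documented as [the mention, format, link]
        SimpleMention mention = (SimpleMention) args[0];
        Format format = (Format) args[1];
        boolean link = (Boolean) args[2];

        if (format == Format.XML) {
            // XML output: an element carrying the mention identifier
            return new TagPair("<entity id=\"" + mention.id() + "\">", "</entity>");
        }
        if (link) {
            // HTML with linkouts: wrap the mention in a hyperlink
            return new TagPair("<a href=\"" + LINK_PREFIX + mention.id() + "\">", "</a>");
        }
        // Plain HTML highlighting without a link
        return new TagPair("<mark>", "</mark>");
    }
}
```

Because the arguments arrive as an untyped Object[], the callback has to cast each element; a mismatch between the documented order and the actual call would surface as a ClassCastException at runtime.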
- Parameters:
- text - the original text of the document
- matches - the recognized NER mentions
- format - the desired output format
- link - whether or not to construct NER linkouts
- alternativeTagFunction - an optional alternative tagging function (may be null)
- Returns:
- the text of the document, with formatting tags added around the specified mentions
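The conversion described above amounts to copying the text while splicing an opening and a closing tag around each mention's character span. The sketch below illustrates that technique under stated assumptions and is not the library's implementation: mentions are modeled by a hypothetical SimpleMention record with [start, end) offsets, tags by a hypothetical TagPair, the optional tag callback by java.util.function.Function, and mentions that overlap an already tagged one are simply skipped (the real method's handling of overlaps is not documented here).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a mention: character offsets [start, end) plus an identifier. */
record SimpleMention(int start, int end, String id) {}

/** Hypothetical stand-in for the (opening tag, closing tag) pair. */
record TagPair(String open, String close) {}

final class MentionTagger {
    /** Default tagging: wrap each mention in an XML-style element carrying its identifier. */
    static TagPair defaultTags(SimpleMention m) {
        return new TagPair("<entity id=\"" + m.id() + "\">", "</entity>");
    }

    /**
     * Copies text into a StringBuffer, inserting tags around each mention.
     * A null tagFunction falls back to defaultTags. Mentions are processed in
     * order of start offset; one that overlaps an already tagged mention is skipped.
     */
    static StringBuffer tag(String text, List<SimpleMention> mentions,
                            Function<SimpleMention, TagPair> tagFunction) {
        Function<SimpleMention, TagPair> tags =
                tagFunction != null ? tagFunction : MentionTagger::defaultTags;
        List<SimpleMention> sorted = new ArrayList<>(mentions);
        sorted.sort(Comparator.comparingInt(SimpleMention::start));

        StringBuffer out = new StringBuffer(text.length() + 32 * sorted.size());
        int pos = 0; // end of the portion of text already copied to out
        for (SimpleMention m : sorted) {
            if (m.start() < pos) {
                continue; // overlaps the previously tagged mention
            }
            TagPair t = tags.apply(m);
            out.append(text, pos, m.start())      // untagged text before the mention
               .append(t.open())
               .append(text, m.start(), m.end())  // the mention itself
               .append(t.close());
            pos = m.end();
        }
        out.append(text, pos, text.length());     // trailing untagged text
        return out;
    }
}
```

For example, tagging "Mice and humans" with mentions at [0, 4) and [9, 15) and no custom function yields `<entity id="10090">Mice</entity> and <entity id="9606">humans</entity>` when the identifiers are "10090" and "9606". Sorting once and then making a single left-to-right pass keeps offsets valid, since tags are appended to a separate buffer rather than inserted into the original string.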
getOriginal
public Document getOriginal()
- Returns:
- the original
getAbs
public TaggedSection[] getAbs()
- Returns:
- the abs
getBody
public TaggedSection[] getBody()
- Returns:
- the body
getRawMatches
public java.util.List<Mention> getRawMatches()
- Returns:
- the rawMatches
getAllMatches
public java.util.ArrayList<Mention> getAllMatches()
getAllMatchedSpecies
public java.util.HashSet<java.lang.String> getAllMatchedSpecies()
getContent
public java.lang.String getContent()