VariantDictionaryMatcher

uk.ac.man.entitytagger.matching.matchers
Class VariantDictionaryMatcher

java.lang.Object
  uk.ac.man.entitytagger.matching.Matcher
      uk.ac.man.entitytagger.matching.matchers.VariantDictionaryMatcher

All Implemented Interfaces:: Sizeable

public class VariantDictionaryMatcher
extends Matcher
implements Sizeable

Class for performing NER dictionary matching against text. The dictionaries should contain a list of all possible variations of the strings that one would like to match. Class objects are created using database details and, optionally, an identifier (tag); on first use, the matcher loads the dictionary terms and identifiers for that tag from the database.

Author:: Martin Gerner

Nested Class Summary

Nested classes/interfaces inherited from class uk.ac.man.entitytagger.matching.Matcher
`Matcher.Disambiguation`

Field Summary
`private java.sql.Connection`	`conn` SQL database connection and table names, from which dictionaries should be loaded initially
`private boolean`	`ignoreCase`
`private long`	`size` very rough estimate of the dictionary memory footprint, in bytes
`private java.lang.String[]`	`tableNames`
`private java.lang.String`	`tag` the identifier for this particular matcher; dictionary terms will only be loaded from the database where this tag matches a tag column
`private java.lang.String[]`	`terms` array containing all terms in the dictionary
`private java.lang.String[][]`	`termToIdsMap` array mapping terms to dictionary identifiers: termToIdsMap[i] contains all IDs for the term terms[i]
`private java.util.regex.Pattern`	`tokenizationPattern`

Constructor Summary
`VariantDictionaryMatcher(java.sql.Connection conn, java.lang.String[] tableNames, java.lang.String tag, boolean ignoreCase)`
`VariantDictionaryMatcher(java.lang.String[][] termToIdsMap, java.lang.String[] terms, boolean ignoreCase)`

Method Summary
`private java.util.List<java.lang.Integer>`	`getMatchIds(java.util.List<Pair<java.lang.Integer>> tokenLocations, int i, java.lang.String matchText)` Performs dictionary matching on the text, finding terms that start with the token at index i.
`private void`	`init()` Will load the dictionary terms and identifiers from the database, sort the terms and set up the proper mappings
`static VariantDictionaryMatcher`	`load(java.io.File inFile, boolean ignoreCase)`
`static VariantDictionaryMatcher`	`load(java.io.InputStream stream, boolean ignoreCase)`
`private static java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.util.Set<java.lang.String>>>`	`loadFileSeparated(java.io.File inFile, boolean ignoreCase)`
`private java.util.Map<java.lang.String,java.util.Set<java.lang.String>>`	`loadFromDB()` Loads a dictionary from a database
`static java.util.Map<java.lang.String,Matcher>`	`loadSeparated(java.io.File[] inFiles, boolean ignoreCase)`
`static java.util.Map<java.lang.String,Matcher>`	`loadSeparatedFromDB(java.sql.Connection conn, java.lang.String[] tableNames, boolean ignoreCase)`
`static CacheMap<java.lang.String,VariantDictionaryMatcher>`	`loadSeparatedFromDBCached(java.sql.Connection conn, java.lang.String[] tableNames, boolean ignoreCase, long maxSize, java.util.logging.Logger logger)` Creates a Map from tag identifiers to VariantDictionaryMatcher objects.
`private static java.util.Map<java.lang.String,java.util.Set<java.lang.String>>`	`loadStream(java.io.InputStream inputStream, boolean ignoreCase)`
`java.util.List<Mention>`	`match(java.lang.String text, Document doc)` Search a given text for mentions
`int`	`size()`
`long`	`sizeof()` Gives a rough estimate of the memory consumption of this object.

Methods inherited from class uk.ac.man.entitytagger.matching.Matcher
`combineMatches, detectEnumerations, disambiguate, isValidMatch, match, match, match, performAcronymResolution`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

terms

private java.lang.String[] terms

array containing all terms in the dictionary

termToIdsMap

private java.lang.String[][] termToIdsMap

array mapping terms to dictionary identifiers: termToIdsMap[i] contains all IDs for the term terms[i]
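
The relationship between the two arrays can be illustrated with a small stand-alone sketch. It assumes that `terms` is kept sorted (the `init()` description mentions sorting) so that a binary search can locate a term. The class name `ParallelArrayDictionary`, the method `idsFor` and the placeholder IDs are hypothetical and not part of the library.

```java
import java.util.Arrays;

// Hypothetical illustration of two parallel arrays; not the library's code.
final class ParallelArrayDictionary {
    private final String[] terms;          // assumed to be sorted
    private final String[][] termToIdsMap; // termToIdsMap[i] holds the IDs for terms[i]

    ParallelArrayDictionary(String[] terms, String[][] termToIdsMap) {
        if (terms.length != termToIdsMap.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.terms = terms;
        this.termToIdsMap = termToIdsMap;
    }

    // Returns the IDs for a term, or an empty array if the term is unknown.
    String[] idsFor(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? termToIdsMap[i] : new String[0];
    }
}
```

Keeping the IDs in a second array indexed by term position avoids one map object per term, which matters for large dictionaries.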

conn

private java.sql.Connection conn

SQL database connection and table names, from which dictionaries should be loaded initially

tableNames

private java.lang.String[] tableNames

tag

private java.lang.String tag

the identifier for this particular matcher; dictionary terms will only be loaded from the database where this tag matches a tag column

tokenizationPattern

private final java.util.regex.Pattern tokenizationPattern

ignoreCase

private boolean ignoreCase

size

private long size

very rough estimate of the dictionary memory footprint, in bytes

Constructor Detail

VariantDictionaryMatcher

public VariantDictionaryMatcher(java.lang.String[][] termToIdsMap,
                                java.lang.String[] terms,
                                boolean ignoreCase)

VariantDictionaryMatcher

public VariantDictionaryMatcher(java.sql.Connection conn,
                                java.lang.String[] tableNames,
                                java.lang.String tag,
                                boolean ignoreCase)

Parameters:: conn - Connection to the database from which the dictionary should be loaded; tableNames - Names of the table(s) from which terms should be loaded; tag - identifier (for example, a species identifier) specifying which part of the tables to load; ignoreCase - whether to ignore case when matching

Method Detail

size

public int size()

Overrides:: size in class Matcher

load

public static VariantDictionaryMatcher load(java.io.File inFile,
                                            boolean ignoreCase)

load

public static VariantDictionaryMatcher load(java.io.InputStream stream,
                                            boolean ignoreCase)

init

private void init()

Will load the dictionary terms and identifiers from the database, sort the terms and set up the proper mappings
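
The "sort the terms and set up the proper mappings" step could look roughly like the sketch below. It assumes the input is a term-to-IDs map of the shape returned by `loadFromDB()`; the class name `DictionaryArrays` is hypothetical, and the actual `init()` implementation is not shown in this documentation.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: converts a term -> IDs map into sorted parallel arrays.
final class DictionaryArrays {
    final String[] terms;
    final String[][] termToIdsMap;

    DictionaryArrays(Map<String, Set<String>> termToIds) {
        // A TreeMap iterates its keys in sorted order.
        Map<String, Set<String>> sorted = new TreeMap<>(termToIds);
        terms = new String[sorted.size()];
        termToIdsMap = new String[sorted.size()][];
        int i = 0;
        for (Map.Entry<String, Set<String>> entry : sorted.entrySet()) {
            terms[i] = entry.getKey();
            String[] ids = entry.getValue().toArray(new String[0]);
            Arrays.sort(ids); // deterministic ID order (a choice made for this sketch)
            termToIdsMap[i] = ids;
            i++;
        }
    }
}
```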

loadSeparatedFromDB

public static java.util.Map<java.lang.String,Matcher> loadSeparatedFromDB(java.sql.Connection conn,
                                                                          java.lang.String[] tableNames,
                                                                          boolean ignoreCase)

loadSeparatedFromDBCached

public static CacheMap<java.lang.String,VariantDictionaryMatcher> loadSeparatedFromDBCached(java.sql.Connection conn,
                                                                                            java.lang.String[] tableNames,
                                                                                            boolean ignoreCase,
                                                                                            long maxSize,
                                                                                            java.util.logging.Logger logger)

Creates a Map from tag identifiers to VariantDictionaryMatcher objects. Using the map, dictionaries for particular tags can be retrieved and used for gene NER matching for that particular species. The map is cached: the first time that a particular tag is retrieved, its dictionary is loaded from the database; subsequent accesses use the pre-loaded dictionary. If the total size of the dictionaries (the sum of the values returned by their sizeof() methods) exceeds maxSize, rarely used dictionaries are unloaded from memory. Accessing an unloaded dictionary causes it to be loaded from the database again.

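Parameters:: conn - Connection to the database from which the dictionaries should be loaded; tableNames - The tables from which the dictionaries should be loaded; ignoreCase - whether to ignore case when matching; maxSize - The maximum size that the user would like the dictionaries to occupy, in bytes. The map adheres to this only approximately, and the size estimates of the dictionaries are themselves very rough.; logger -
Returns:: a cached map from tag identifiers to VariantDictionaryMatcher objects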
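
The caching behaviour described above can be sketched with a size-bounded map that loads values on first access and unloads entries once a size budget is exceeded. This is an assumption-laden illustration: the library's `CacheMap` is not documented here, and the sketch uses least-recently-used eviction as a stand-in for "rarely used". The class `SizeBoundedCache` and its members are hypothetical names.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

// Hypothetical sketch; the library's CacheMap may behave differently.
final class SizeBoundedCache<K, V> {
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final Function<K, V> loader;    // e.g. loads a dictionary for a tag
    private final ToLongFunction<V> sizeOf; // e.g. a rough sizeof() estimate
    private final long maxSize;
    private long totalSize = 0;

    SizeBoundedCache(Function<K, V> loader, ToLongFunction<V> sizeOf, long maxSize) {
        this.loader = loader;
        this.sizeOf = sizeOf;
        this.maxSize = maxSize;
    }

    V get(K key) {
        V value = entries.get(key); // a hit also marks the entry as recently used
        if (value == null) {
            value = loader.apply(key); // first access, or reload after eviction
            entries.put(key, value);
            totalSize += sizeOf.applyAsLong(value);
            evictExcept(key);
        }
        return value;
    }

    // Unloads least recently used entries until the budget is met,
    // never removing the entry that was just loaded.
    private void evictExcept(K justLoaded) {
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (totalSize > maxSize && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            if (eldest.getKey().equals(justLoaded)) {
                continue;
            }
            totalSize -= sizeOf.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    boolean isLoaded(K key) {
        return entries.containsKey(key);
    }
}
```

Because the just-loaded entry is never evicted, a single dictionary larger than the budget still stays in memory, which matches the "roughly" in the description of maxSize.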

loadSeparated

public static java.util.Map<java.lang.String,Matcher> loadSeparated(java.io.File[] inFiles,
                                                                    boolean ignoreCase)

loadFileSeparated

private static java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.util.Set<java.lang.String>>> loadFileSeparated(java.io.File inFile,
                                                                                                                                 boolean ignoreCase)

loadStream

private static java.util.Map<java.lang.String,java.util.Set<java.lang.String>> loadStream(java.io.InputStream inputStream,
                                                                                          boolean ignoreCase)

loadFromDB

private java.util.Map<java.lang.String,java.util.Set<java.lang.String>> loadFromDB()

Loads a dictionary from a database

Returns:

match

public java.util.List<Mention> match(java.lang.String text,
                                     Document doc)

Description copied from class: Matcher

Search a given text for mentions

Specified by:: match in class Matcher

Parameters:: text - the text to search for mentions; doc - object containing the document ID for the mentions
Returns:: a list of gene mentions found in the text using the loaded dictionary

getMatchIds

private java.util.List<java.lang.Integer> getMatchIds(java.util.List<Pair<java.lang.Integer>> tokenLocations,
                                                      int i,
                                                      java.lang.String matchText)

Performs dictionary matching on the text, finding terms that start with the token at index i.

Parameters:: tokenLocations - a list of all token coordinates in the text; i - index of the token at which the scan starts; matchText - the text that we would like to scan
Returns:
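
A minimal sketch of token-anchored matching follows, under several assumptions: token coordinates are start/end character offsets, tokens are runs of word characters, and the returned integers are indices into a sorted term array. None of this is confirmed by the documentation, and `TokenScanSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of matching dictionary terms that begin at a given token.
final class TokenScanSketch {
    // Assumed tokenization: each token is a run of word characters.
    private static final Pattern TOKEN = Pattern.compile("\\w+");

    // Returns {start, end} character offsets for every token in the text.
    static List<int[]> tokenize(String text) {
        List<int[]> locations = new ArrayList<>();
        java.util.regex.Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            locations.add(new int[] {m.start(), m.end()});
        }
        return locations;
    }

    // Returns the indices (into sortedTerms) of all terms that start at token i.
    static List<Integer> matchesStartingAt(String[] sortedTerms, List<int[]> tokens,
                                           int i, String text) {
        List<Integer> hits = new ArrayList<>();
        int start = tokens.get(i)[0];
        // Extend the candidate one token at a time and look each candidate up.
        // This simple version scans to the end of the text without stopping early.
        for (int j = i; j < tokens.size(); j++) {
            String candidate = text.substring(start, tokens.get(j)[1]);
            int index = Arrays.binarySearch(sortedTerms, candidate);
            if (index >= 0) {
                hits.add(index);
            }
        }
        return hits;
    }
}
```

Calling `matchesStartingAt` once per token index would yield every dictionary term occurring in the text, including overlapping ones.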

sizeof

public long sizeof()

Gives a rough estimate of the memory consumption of this object.

Specified by:: sizeof in interface Sizeable
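
One way such a rough estimate could be computed is sketched below. The per-string overhead and bytes-per-character constants are assumed ballpark figures chosen for illustration, not measured values, and the class `RoughSizeEstimate` is hypothetical; the library's own calculation is not documented here.

```java
// Hypothetical sketch of a rough memory estimate for a dictionary.
final class RoughSizeEstimate {
    static final long ASSUMED_STRING_OVERHEAD = 40; // assumed bytes per String object
    static final long ASSUMED_BYTES_PER_CHAR = 2;   // assumes UTF-16 character storage

    // Sums an approximate size over all terms and all identifier strings.
    static long estimate(String[] terms, String[][] termToIdsMap) {
        long total = 0;
        for (String term : terms) {
            total += stringSize(term);
        }
        for (String[] ids : termToIdsMap) {
            for (String id : ids) {
                total += stringSize(id);
            }
        }
        return total;
    }

    private static long stringSize(String s) {
        return ASSUMED_STRING_OVERHEAD + ASSUMED_BYTES_PER_CHAR * s.length();
    }
}
```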
