uk.ac.man.entitytagger.generate
Class GenerateAutomatons

java.lang.Object
  extended by uk.ac.man.entitytagger.generate.GenerateAutomatons

 class GenerateAutomatons
extends java.lang.Object

Provides functions for generating automatons used for efficient regular expression matching.

Author:
Martin

Nested Class Summary
private  class GenerateAutomatons.ProcessProblem
          Class used to join several automatons together and minimize the result (put into a class to enable concurrent computations)
private  class GenerateAutomatons.ToAutomatonProblem
           
private  class GenerateAutomatons.ToAutomatonProblemIterator
           
 
Constructor Summary
GenerateAutomatons()
           
 
Method Summary
(package private) static Tuple<java.util.ArrayList<dk.brics.automaton.Automaton>,java.lang.Boolean> loadArray(java.io.File file)
           
(package private) static CustomRunAutomaton[] loadRArray(java.io.File file, java.util.logging.Logger logger)
           
(package private)  java.util.ArrayList<dk.brics.automaton.Automaton> process(java.util.ArrayList<dk.brics.automaton.Automaton> automatons, int multiJoin, boolean minimize, boolean showNumStates, int numThreads, java.util.logging.Logger logger)
          Takes a list of automatons and joins them together in groups of size multiJoin.
(package private) static void storeArray(java.io.File file, java.util.ArrayList<dk.brics.automaton.Automaton> l, boolean ignoreCase)
           
(package private) static void storeRArray(java.util.ArrayList<dk.brics.automaton.Automaton> list, boolean ignoreCase, boolean tableize, java.io.File file, java.util.logging.Logger logger)
           
static void storeVariants(java.io.File file, java.sql.PreparedStatement pstmt, java.util.List<dk.brics.automaton.Automaton> automatons, java.util.logging.Logger logger, int report)
           
(package private)  java.util.ArrayList<dk.brics.automaton.Automaton> toAutomatons(java.util.ArrayList<DictionaryEntry> dictionaryEntries, int numThreads, java.lang.Integer report, boolean ignoreCase, java.util.logging.Logger logger)
          Converts a list of dictionary entries to their corresponding automatons.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenerateAutomatons

GenerateAutomatons()
Method Detail

process

java.util.ArrayList<dk.brics.automaton.Automaton> process(java.util.ArrayList<dk.brics.automaton.Automaton> automatons,
                                                          int multiJoin,
                                                          boolean minimize,
                                                          boolean showNumStates,
                                                          int numThreads,
                                                          java.util.logging.Logger logger)
Takes a list of automatons and joins them together in groups of size multiJoin (e.g. an input of 12 automatons with multiJoin=3 gives an output of 4 automatons).

Parameters:
automatons - the list of automatons to be joined together
multiJoin - the number of automatons that should be joined at a time
minimize - whether to perform automaton minimization afterwards (will produce smaller automatons requiring less memory, but requires more time to perform)
showNumStates - whether to print some statistics at the end
numThreads - the number of concurrent joins to perform (note that multiple threads will increase memory requirements)
logger -
Returns:
a list of joined automatons of size (automatons.size() / multiJoin).
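
The grouping behaviour described above can be illustrated with a minimal, library-free sketch. It is not the implementation of process: the class GroupJoinSketch, the method joinInGroups and the string-based "union" are hypothetical stand-ins, and the sketch assumes that a trailing partial group (when the input size is not a multiple of the group size) is kept as its own output element. In the real class the combining step would presumably be an automaton union, optionally followed by minimization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical illustration only; strings stand in for automatons.
class GroupJoinSketch {

    /** Joins consecutive groups of up to groupSize items with the given union operation. */
    static <T> List<T> joinInGroups(List<T> items, int groupSize, BinaryOperator<T> union) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be >= 1");
        }
        List<T> joined = new ArrayList<>();
        for (int start = 0; start < items.size(); start += groupSize) {
            // Assumption: a final, shorter group is joined on its own.
            int end = Math.min(start + groupSize, items.size());
            T acc = items.get(start);
            for (int i = start + 1; i < end; i++) {
                acc = union.apply(acc, items.get(i));
            }
            joined.add(acc);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> patterns = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            patterns.add("p" + i);
        }
        // 12 inputs joined 3 at a time give 4 outputs.
        List<String> joined = joinInGroups(patterns, 3, (a, b) -> a + "|" + b);
        System.out.println(joined.size());  // prints 4
        System.out.println(joined.get(0));  // prints p1|p2|p3
    }
}
```

Each group is independent of the others, which is what makes it possible to run several joins concurrently, at the cost of holding more intermediate results in memory at once.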

loadRArray

static CustomRunAutomaton[] loadRArray(java.io.File file,
                                       java.util.logging.Logger logger)

storeRArray

static void storeRArray(java.util.ArrayList<dk.brics.automaton.Automaton> list,
                        boolean ignoreCase,
                        boolean tableize,
                        java.io.File file,
                        java.util.logging.Logger logger)

toAutomatons

java.util.ArrayList<dk.brics.automaton.Automaton> toAutomatons(java.util.ArrayList<DictionaryEntry> dictionaryEntries,
                                                               int numThreads,
                                                               java.lang.Integer report,
                                                               boolean ignoreCase,
                                                               java.util.logging.Logger logger)
Converts a list of dictionary entries to their corresponding automatons.

Parameters:
dictionaryEntries - the list of dictionary entries
numThreads - the number of concurrent threads to use for conversion
report - null if the function should not output progress; otherwise progress is printed after every report-th conversion
ignoreCase -
logger -
Returns:
the list of automatons representing the list of dictionary entries
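
The combination of numThreads and report described above can be sketched with a fixed-size thread pool. This is only an assumption about how such a conversion might be organised, not the code behind toAutomatons: the class ConcurrentConvertSketch, the method convertAll and the generic converter function are hypothetical, and a plain Function stands in for the conversion of a DictionaryEntry to an automaton.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical illustration only; a generic converter stands in for the real conversion.
class ConcurrentConvertSketch {

    /** Converts every entry on a pool of numThreads threads, preserving input order. */
    static <I, O> List<O> convertAll(List<I> entries, int numThreads, Integer report,
                                     Function<I, O> converter, Logger logger)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        AtomicInteger done = new AtomicInteger();
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I entry : entries) {
                futures.add(pool.submit(() -> {
                    O result = converter.apply(entry);
                    int n = done.incrementAndGet();
                    // A null report disables progress output entirely.
                    if (report != null && report > 0 && n % report == 0) {
                        logger.info("Converted " + n + " entries");
                    }
                    return result;
                }));
            }
            // Collecting the futures in submission order keeps results aligned with inputs.
            List<O> results = new ArrayList<>(entries.size());
            for (Future<O> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> entries = Arrays.asList("a", "bb", "ccc", "dddd");
        List<Integer> lengths = convertAll(entries, 2, 2, String::length,
                Logger.getLogger("sketch"));
        System.out.println(lengths);  // prints [1, 2, 3, 4]
    }
}
```

Using a nullable Integer for report, as the signature above does, lets a caller switch progress output off without a separate flag.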

loadArray

static Tuple<java.util.ArrayList<dk.brics.automaton.Automaton>,java.lang.Boolean> loadArray(java.io.File file)

storeArray

static void storeArray(java.io.File file,
                       java.util.ArrayList<dk.brics.automaton.Automaton> l,
                       boolean ignoreCase)

storeVariants

public static void storeVariants(java.io.File file,
                                 java.sql.PreparedStatement pstmt,
                                 java.util.List<dk.brics.automaton.Automaton> automatons,
                                 java.util.logging.Logger logger,
                                 int report)