$Id: TODO,v 1.6 2001/05/21 00:20:27 dfs Exp $

o Optimize/improve Unicode character classes.

o Fix any pending bugs listed in the BUGS file.

o Update org.apache.oro.text.regex and org.apache.oro.text.perl syntax
  to the latest version of Perl, currently version 5.6.  This will
  require a lot of work.

o Pattern cache implementations are probably not very efficient.
  Should revisit and reimplement.

o Look for ways to avoid creating unnecessary String instances and
  potential cases of redundant String/char[] conversions.

o The MatchAction, MatchActionInfo, and MatchActionProcessor classes
  were just a bad idea because that sort of thing is very inefficient
  in Java.  These classes should probably be removed.

o Reduce the memory overhead of case-insensitive matching in
  Perl5Matcher.

o Measure performance of HotSpot iterating through match input via an
  interface's virtual function versus direct character array indexing.
  If HotSpot dynamically inlines the functions and achieves comparable
  performance, we could create a generic interface for representing
  input, provided the documentation clearly warns that performance
  could be reduced on earlier JDK versions.  Input array indexing
  could be replaced with the generic interface, PatternMatcherInput
  could be made to implement the interface, and stream matching could
  be reintroduced.

  Reintroduced stream matching could include a callback mechanism in
  the interface to report when a "contains" match has been found, to
  allow the input encapsulator to trim its buffer.  Strong warnings
  must go into the documentation referencing the ACM paper and noting
  that for many streams it will be more efficient to read the entire
  stream into a buffer first rather than try to match incrementally,
  because many regular expressions will cause the whole stream to be
  read in anyway.  For situations where that is not the case we want
  to be able to trim the buffer (there have been people who used
  OROMatcher to search gigabyte-length files!).  Additional methods
  could be added to regulate buffer growth behavior, whether to save
  all of it for reuse in a future pass, etc.

o Make separate src and bin distributions.  The current distribution
  is getting big on account of 1.2 MB of API docs.  A src-only
  distribution should be half the size of the bin distribution for
  quicker download.
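
A rough sketch of the generic input interface idea from the HotSpot
item above, for discussion only.  Every name here (MatchInput,
CharArrayInput, ReaderInput, LiteralScanner) is hypothetical and none
of it exists in the ORO API.  A naive literal search stands in for the
regex engine, purely to show input being read through the interface
and the "match found" callback letting a stream-backed input trim its
buffer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical input abstraction; not part of the ORO API. */
interface MatchInput {
    /** Returns the char at an absolute offset, or -1 past end of input. */
    int charAt(int offset);

    /** Callback: a match ended at endOffset (exclusive); earlier input may be dropped. */
    void matchFound(int endOffset);
}

/** Input backed directly by a char array; there is nothing to trim. */
final class CharArrayInput implements MatchInput {
    private final char[] data;

    CharArrayInput(char[] data) { this.data = data; }

    public int charAt(int offset) { return offset < data.length ? data[offset] : -1; }

    public void matchFound(int endOffset) { /* whole array stays in memory */ }
}

/** Input that reads lazily from a Reader and trims its buffer after each match. */
final class ReaderInput implements MatchInput {
    private final Reader reader;
    private final StringBuilder buffer = new StringBuilder();
    private int base = 0;        // absolute offset of buffer.charAt(0)
    private boolean eof = false;

    ReaderInput(Reader reader) { this.reader = reader; }

    public int charAt(int offset) {
        if (offset < base) {
            throw new IllegalStateException("offset " + offset + " was already trimmed");
        }
        // Pull characters from the stream until the requested offset is buffered.
        while (!eof && offset - base >= buffer.length()) {
            try {
                int c = reader.read();
                if (c < 0) eof = true; else buffer.append((char) c);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        int i = offset - base;
        return i < buffer.length() ? buffer.charAt(i) : -1;
    }

    public void matchFound(int endOffset) {
        // Discard everything before the end of the match.
        int n = Math.min(endOffset - base, buffer.length());
        if (n > 0) {
            buffer.delete(0, n);
            base += n;
        }
    }

    /** Number of characters currently held in memory. */
    int bufferedChars() { return buffer.length(); }
}

/** Stand-in for a matcher: naive literal search done only through MatchInput. */
final class LiteralScanner {
    /** Returns the start offset of the first occurrence at or after from, or -1. */
    static int find(MatchInput in, String literal, int from) {
        for (int start = from; in.charAt(start) >= 0; start++) {
            int k = 0;
            while (k < literal.length() && in.charAt(start + k) == literal.charAt(k)) {
                k++;
            }
            if (k == literal.length()) {
                in.matchFound(start + k);   // lets the input trim its buffer
                return start;
            }
        }
        return -1;
    }
}
```

In this sketch the scanner never sees a char[] directly, so the same
loop runs over an array or a stream.  Whether HotSpot inlines the
charAt() calls well enough to match direct array indexing is exactly
what the measurement item above would have to establish.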