TestRandomChains/TestAllAnalyzersHaveFactories are missing types in the test setup #1126
Task description
With respect to Lucene 4.8.1, we are missing types from both TestAllAnalyzersHaveFactories and TestRandomChains.
TestAllAnalyzersHaveFactories uses TestRandomChains.getClassesForPackage() in Java to load the types from both referenced and external .jar files based on the classpath. So, in Java these tests both get their types from the same method. However, in .NET we are currently only considering the types that are available in Lucene.Net.Analysis.Common and not any other assemblies that may contain types from the same namespace. We don't have a common method to retrieve the types.
I did a comparison with TestAllAnalyzersHaveFactories and TestRandomChains to get a list of all of the missing Tokenizers, Token Filters, and Char Filters as well as checking the reverse to see if we have any that don't exist in Lucene 4.8.1.
TestRandomChains
Tokenizers Missing
Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer
Lucene.Net.Analysis.Icu.Segmentation.ICUTokenizer
Lucene.Net.Analysis.Ja.JapaneseTokenizer
Lucene.Net.Analysis.MockTokenizer
Lucene.Net.Analysis.OpenNlp.OpenNLPTokenizer
Lucene.Net.Analysis.Th.ThaiTokenizer
TokenFilters Missing
Lucene.Net.Analysis.CachingTokenFilter
Lucene.Net.Analysis.Icu.ICUFoldingFilter
Lucene.Net.Analysis.Icu.ICUNormalizer2Filter
Lucene.Net.Analysis.Icu.ICUTransformFilter
Lucene.Net.Analysis.Ja.JapaneseBaseFormFilter
Lucene.Net.Analysis.Ja.JapaneseKatakanaStemFilter
Lucene.Net.Analysis.Ja.JapanesePartOfSpeechStopFilter
Lucene.Net.Analysis.Ja.JapaneseReadingFormFilter
Lucene.Net.Analysis.MockFixedLengthPayloadFilter
Lucene.Net.Analysis.MockGraphTokenFilter
Lucene.Net.Analysis.MockHoleInjectingTokenFilter
Lucene.Net.Analysis.MockRandomLookaheadTokenFilter
Lucene.Net.Analysis.MockTokenFilter
Lucene.Net.Analysis.MockVariableLengthPayloadFilter
Lucene.Net.Analysis.Morfologik.MorfologikFilter
Lucene.Net.Analysis.OpenNlp.OpenNLPChunkerFilter
Lucene.Net.Analysis.OpenNlp.OpenNLPLemmatizerFilter
Lucene.Net.Analysis.OpenNlp.OpenNLPPOSFilter
Lucene.Net.Analysis.Phonetic.BeiderMorseFilter
Lucene.Net.Analysis.Phonetic.DoubleMetaphoneFilter
Lucene.Net.Analysis.Phonetic.PhoneticFilter
Lucene.Net.Analysis.Stempel.StempelFilter
Lucene.Net.Analysis.TrivialLookaheadFilter
Lucene.Net.Analysis.ValidatingTokenFilter
Lucene.Net.TestFramework.Analysis.CrankyTokenFilter
CharFilters Missing
Lucene.Net.Analysis.Icu.ICUNormalizer2CharFilter
Lucene.Net.Analysis.Ja.JapaneseIterationMarkCharFilter
Lucene.Net.Analysis.MockCharFilter
Tokenizers Extra
TokenFilters Extra
Lucene.Net.Analysis.Fa.PersianStemFilter - This was contributed by the Lucene.NET community.
Lucene.Net.Analysis.Miscellaneous.TypeAsSynonymFilter - This was added from Lucene 8.2.0 because the opennlp module calls it out in the documentation.
CharFilters Extra
Lucene.Net.Analysis.Util.BufferedCharFilter - This was created to add BufferedReader support to CharFilter for specific cases that require buffering.
TestAllAnalyzersHaveFactories
Tokenizers Missing
Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer
Lucene.Net.Analysis.Icu.Segmentation.ICUTokenizer
Lucene.Net.Analysis.Ja.JapaneseTokenizer
Lucene.Net.Analysis.OpenNlp.OpenNLPTokenizer
Lucene.Net.Analysis.Th.ThaiTokenizer
TokenFilters Missing
Lucene.Net.Analysis.Icu.ICUFoldingFilter
Lucene.Net.Analysis.Icu.ICUNormalizer2Filter
Lucene.Net.Analysis.Icu.ICUTransformFilter
Lucene.Net.Analysis.Ja.JapaneseBaseFormFilter
Lucene.Net.Analysis.Ja.JapaneseKatakanaStemFilter
Lucene.Net.Analysis.Ja.JapanesePartOfSpeechStopFilter
Lucene.Net.Analysis.Ja.JapaneseReadingFormFilter
Lucene.Net.Analysis.Morfologik.MorfologikFilter
Lucene.Net.Analysis.OpenNlp.OpenNLPChunkerFilter
Lucene.Net.Analysis.OpenNlp.OpenNLPLemmatizerFilter
Lucene.Net.Analysis.OpenNlp.OpenNLPPOSFilter
Lucene.Net.Analysis.Phonetic.BeiderMorseFilter
Lucene.Net.Analysis.Phonetic.DoubleMetaphoneFilter
Lucene.Net.Analysis.Phonetic.PhoneticFilter
Lucene.Net.Analysis.Stempel.StempelFilter
Lucene.Net.Analysis.TrivialLookaheadFilter
CharFilters Missing
Lucene.Net.Analysis.Icu.ICUNormalizer2CharFilter
Lucene.Net.Analysis.Ja.JapaneseIterationMarkCharFilter
Tokenizers Extra
TokenFilters Extra
Lucene.Net.Analysis.Fa.PersianStemFilter
Lucene.Net.Analysis.Miscellaneous.TypeAsSynonymFilter
CharFilters Extra
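The missing and extra lists above come from comparing two sets of type names in both directions. As a rough illustration only (not the code actually used for this comparison), a minimal Java sketch of that two-way set difference could look like the following. The class name TypeListDiff, the method names, and the Example.Shared.CommonFilter entry are hypothetical; the sample inputs are small placeholder subsets, not the real lists.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene or Lucene.NET.
class TypeListDiff {

    /** Returns the names in {@code reference} that do not appear in {@code candidate}, sorted. */
    static Set<String> missing(Set<String> reference, Set<String> candidate) {
        Set<String> result = new TreeSet<>(reference); // copy so the inputs stay untouched
        result.removeAll(candidate);
        return result;
    }

    /** Returns the names in {@code candidate} that do not appear in {@code reference}, sorted. */
    static Set<String> extra(Set<String> reference, Set<String> candidate) {
        // "Extra" is the same difference taken in the opposite direction.
        return missing(candidate, reference);
    }

    public static void main(String[] args) {
        // Placeholder subset standing in for the types the Lucene 4.8.1 test sees.
        Set<String> fromLucene = new TreeSet<>(Arrays.asList(
                "Example.Shared.CommonFilter", // hypothetical entry present on both sides
                "Lucene.Net.Analysis.Th.ThaiTokenizer"));
        // Placeholder subset standing in for the types the .NET test currently sees.
        Set<String> fromDotNetTest = new TreeSet<>(Arrays.asList(
                "Example.Shared.CommonFilter",
                "Lucene.Net.Analysis.Fa.PersianStemFilter"));

        System.out.println("Missing: " + missing(fromLucene, fromDotNetTest));
        System.out.println("Extra: " + extra(fromLucene, fromDotNetTest));
    }
}
```

Sorted sets keep the output stable, which makes it easier to re-run the comparison and spot changes as types are added to the test setup.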
A few ways we could address this:
In Java, both tests fail on Lucene 8.8.1 (using jdk 1.8.0_202) and Lucene 4.8.1 (using jdk 1.8.0_302). There are problems both with using Reflection on the constructors and with loading resources. I suspected that security patches in recent versions of Java 8 had invalidated the old way of loading these types, but I checked with Java SE Development Kit 8u25, and it isn't working there either.
I was able to get TestRandomChains running with the following code in the loop of the beforeClass() method:
However, it still tends to crash when running tests with any of the other components that are in non-referenced packages. I suspect it is due to a failure when loading resources.