TestRandomChains/TestAllAnalyzersHaveFactories are missing types in the test setup #1126

Open
1 task done
NightOwl888 opened this issue Jan 28, 2025 · 0 comments
Labels
is:bug, is:task (A chore to be done), pri:normal, testability, up-for-grabs (This issue is open to be worked on by anyone)

Is there an existing issue for this?

  • I have searched the existing issues

Task description

Compared with Lucene 4.8.1, the test setup of both TestAllAnalyzersHaveFactories and TestRandomChains is missing types.

In Java, TestAllAnalyzersHaveFactories calls TestRandomChains.getClassesForPackage() to load types from both referenced and external .jars based on the classpath, so both tests get their types from the same method. In .NET, however, we currently consider only the types in Lucene.Net.Analysis.Common and ignore other assemblies that may contain types from the same namespace. We also have no common method that both tests use to retrieve the types.
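
For readers less familiar with the Java side, classpath-based discovery generally starts by asking the class loader for every location that provides a given package. The sketch below is not Lucene's getClassesForPackage(). The PackageLocations class and its locate method are hypothetical names, and only the lookup step is shown, using the standard ClassLoader.getResources call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class PackageLocations {
    /**
     * Returns the classpath locations that expose the given package.
     * Hypothetical helper for illustration only.
     */
    public static List<URL> locate(String packageName) {
        // Packages map to resource paths: "a.b.c" -> "a/b/c".
        String path = packageName.replace('.', '/');
        ClassLoader loader = PackageLocations.class.getClassLoader();
        try {
            // Each returned URL points at one classpath entry that exposes the path.
            return Collections.list(loader.getResources(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (URL url : locate("org.apache.lucene.analysis")) {
            System.out.println(url);
        }
    }
}
```

Because the lookup spans every classpath entry, a package split across several .jars yields several locations. The .NET tests would presumably need a comparable step that spans several assemblies rather than only Lucene.Net.Analysis.Common.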

I compared the types covered by TestAllAnalyzersHaveFactories and TestRandomChains against Lucene 4.8.1 to list all of the missing tokenizers, token filters, and char filters. I also checked the reverse direction to see whether we have any that don't exist in Lucene 4.8.1.
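
A comparison like this amounts to two set differences. The following is a minimal sketch of that idea, not the tooling actually used for this issue. The TypeListDiff class is hypothetical, and it assumes types are matched by simple class name, since the Java packages and .NET namespaces differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class TypeListDiff {
    /** Names present in {@code expected} but absent from {@code actual}, sorted. */
    public static Set<String> missing(Set<String> expected, Set<String> actual) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(actual);
        return result;
    }

    /** Names present in {@code actual} but absent from {@code expected}, sorted. */
    public static Set<String> extra(Set<String> expected, Set<String> actual) {
        return missing(actual, expected);
    }

    public static void main(String[] args) {
        // Illustrative inputs only; the real lists come from the two test setups.
        Set<String> lucene = new HashSet<>(Arrays.asList("LowerCaseFilter", "StempelFilter"));
        Set<String> luceneNet = new HashSet<>(Arrays.asList("LowerCaseFilter", "PersianStemFilter"));

        System.out.println("Missing: " + missing(lucene, luceneNet));
        System.out.println("Extra:   " + extra(lucene, luceneNet));
    }
}
```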

TestRandomChains

Tokenizers Missing

  • Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer
  • Lucene.Net.Analysis.Icu.Segmentation.ICUTokenizer
  • Lucene.Net.Analysis.Ja.JapaneseTokenizer
  • Lucene.Net.Analysis.MockTokenizer
  • Lucene.Net.Analysis.OpenNlp.OpenNLPTokenizer
  • Lucene.Net.Analysis.Th.ThaiTokenizer

TokenFilters Missing

  • Lucene.Net.Analysis.CachingTokenFilter
  • Lucene.Net.Analysis.Icu.ICUFoldingFilter
  • Lucene.Net.Analysis.Icu.ICUNormalizer2Filter
  • Lucene.Net.Analysis.Icu.ICUTransformFilter
  • Lucene.Net.Analysis.Ja.JapaneseBaseFormFilter
  • Lucene.Net.Analysis.Ja.JapaneseKatakanaStemFilter
  • Lucene.Net.Analysis.Ja.JapanesePartOfSpeechStopFilter
  • Lucene.Net.Analysis.Ja.JapaneseReadingFormFilter
  • Lucene.Net.Analysis.MockFixedLengthPayloadFilter
  • Lucene.Net.Analysis.MockGraphTokenFilter
  • Lucene.Net.Analysis.MockHoleInjectingTokenFilter
  • Lucene.Net.Analysis.MockRandomLookaheadTokenFilter
  • Lucene.Net.Analysis.MockTokenFilter
  • Lucene.Net.Analysis.MockVariableLengthPayloadFilter
  • Lucene.Net.Analysis.Morfologik.MorfologikFilter
  • Lucene.Net.Analysis.OpenNlp.OpenNLPChunkerFilter
  • Lucene.Net.Analysis.OpenNlp.OpenNLPLemmatizerFilter
  • Lucene.Net.Analysis.OpenNlp.OpenNLPPOSFilter
  • Lucene.Net.Analysis.Phonetic.BeiderMorseFilter
  • Lucene.Net.Analysis.Phonetic.DoubleMetaphoneFilter
  • Lucene.Net.Analysis.Phonetic.PhoneticFilter
  • Lucene.Net.Analysis.Stempel.StempelFilter
  • Lucene.Net.Analysis.TrivialLookaheadFilter
  • Lucene.Net.Analysis.ValidatingTokenFilter
  • Lucene.Net.TestFramework.Analysis.CrankyTokenFilter

CharFilters Missing

  • Lucene.Net.Analysis.Icu.ICUNormalizer2CharFilter
  • Lucene.Net.Analysis.Ja.JapaneseIterationMarkCharFilter
  • Lucene.Net.Analysis.MockCharFilter

Tokenizers Extra

  • (No entries)

TokenFilters Extra

  • Lucene.Net.Analysis.Fa.PersianStemFilter - This was contributed by the Lucene.NET community.
  • Lucene.Net.Analysis.Miscellaneous.TypeAsSynonymFilter - This was added from Lucene 8.2.0 because the opennlp module calls it out in the documentation.

CharFilters Extra

  • Lucene.Net.Analysis.Util.BufferedCharFilter - This was created to add BufferedReader support to CharFilter for specific cases that require buffering.

TestAllAnalyzersHaveFactories

Tokenizers Missing

  • Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer
  • Lucene.Net.Analysis.Icu.Segmentation.ICUTokenizer
  • Lucene.Net.Analysis.Ja.JapaneseTokenizer
  • Lucene.Net.Analysis.OpenNlp.OpenNLPTokenizer
  • Lucene.Net.Analysis.Th.ThaiTokenizer

TokenFilters Missing

  • Lucene.Net.Analysis.Icu.ICUFoldingFilter
  • Lucene.Net.Analysis.Icu.ICUNormalizer2Filter
  • Lucene.Net.Analysis.Icu.ICUTransformFilter
  • Lucene.Net.Analysis.Ja.JapaneseBaseFormFilter
  • Lucene.Net.Analysis.Ja.JapaneseKatakanaStemFilter
  • Lucene.Net.Analysis.Ja.JapanesePartOfSpeechStopFilter
  • Lucene.Net.Analysis.Ja.JapaneseReadingFormFilter
  • Lucene.Net.Analysis.Morfologik.MorfologikFilter
  • Lucene.Net.Analysis.OpenNlp.OpenNLPChunkerFilter
  • Lucene.Net.Analysis.OpenNlp.OpenNLPLemmatizerFilter
  • Lucene.Net.Analysis.OpenNlp.OpenNLPPOSFilter
  • Lucene.Net.Analysis.Phonetic.BeiderMorseFilter
  • Lucene.Net.Analysis.Phonetic.DoubleMetaphoneFilter
  • Lucene.Net.Analysis.Phonetic.PhoneticFilter
  • Lucene.Net.Analysis.Stempel.StempelFilter
  • Lucene.Net.Analysis.TrivialLookaheadFilter

CharFilters Missing

  • Lucene.Net.Analysis.Icu.ICUNormalizer2CharFilter
  • Lucene.Net.Analysis.Ja.JapaneseIterationMarkCharFilter

Tokenizers Extra

  • (No entries)

TokenFilters Extra

  • Lucene.Net.Analysis.Fa.PersianStemFilter
  • Lucene.Net.Analysis.Miscellaneous.TypeAsSynonymFilter

CharFilters Extra

  • (No entries)

A few ways we could address this:

  1. Add the references to the other projects that contain the above types.
  2. Load the assemblies for the above types programmatically in some way.
  3. Port the system that was created for Lucene 9.1.0 in https://issues.apache.org/jira/browse/LUCENE-10352.

In Java, both tests fail on Lucene 8.8.1 (using JDK 1.8.0_202) and on Lucene 4.8.1 (using JDK 1.8.0_302). There are problems both with using reflection on the constructors and with loading resources. I suspected that security patches in recent versions of Java 8 had invalidated the old way of loading these types, but I also checked with Java SE Development Kit 8u25, and it doesn't work there either.

I was able to get TestRandomChains running with the following code in the loop of the beforeClass() method:

      String name = c.getName();
      // Constructors don't resolve
      if (name.equals("org.apache.lucene.analysis.icu.ICUNormalizer2CharFilter")
          || name.equals("org.apache.lucene.analysis.icu.segmentation.ICUTokenizer")
          || name.equals("org.apache.lucene.analysis.icu.ICUNormalizer2Filter")
          || name.equals("org.apache.lucene.analysis.icu.ICUTransformFilter")
          || name.equals("org.apache.lucene.analysis.ja.JapaneseTokenizer")
          || name.equals("org.apache.lucene.analysis.phonetic.BeiderMorseFilter")
          || name.equals("org.apache.lucene.analysis.phonetic.PhoneticFilter")
          || name.equals("org.apache.lucene.analysis.stempel.StempelFilter")
          || name.equals("org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer")
          || name.equals("org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer")
          
          // Resources don't resolve
          || name.equals("org.apache.lucene.analysis.morfologik.MorfologikFilter")
          ) {
        continue;
      }
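
The same skip list could also be kept in a set, which may be easier to maintain as entries are added or removed. This is only a sketch of an equivalent check. The SkippedComponents class and shouldSkip method are hypothetical names and are not part of Lucene.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SkippedComponents {
    // Same class names as the chained name.equals(...) checks above.
    private static final Set<String> SKIPPED = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
        // Constructors don't resolve
        "org.apache.lucene.analysis.icu.ICUNormalizer2CharFilter",
        "org.apache.lucene.analysis.icu.segmentation.ICUTokenizer",
        "org.apache.lucene.analysis.icu.ICUNormalizer2Filter",
        "org.apache.lucene.analysis.icu.ICUTransformFilter",
        "org.apache.lucene.analysis.ja.JapaneseTokenizer",
        "org.apache.lucene.analysis.phonetic.BeiderMorseFilter",
        "org.apache.lucene.analysis.phonetic.PhoneticFilter",
        "org.apache.lucene.analysis.stempel.StempelFilter",
        "org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer",
        "org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer",
        // Resources don't resolve
        "org.apache.lucene.analysis.morfologik.MorfologikFilter")));

    /** Returns true if the class with this fully qualified name should be skipped. */
    public static boolean shouldSkip(String className) {
        return SKIPPED.contains(className);
    }

    public static void main(String[] args) {
        System.out.println(shouldSkip("org.apache.lucene.analysis.ja.JapaneseTokenizer"));
    }
}
```

Inside the loop, the check would then reduce to `if (SkippedComponents.shouldSkip(c.getName())) continue;`.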

However, it still tends to crash when running tests with any of the other components that are in non-referenced packages. I suspect it is due to a failure when loading resources.

NightOwl888 added the is:bug, is:task, pri:normal, testability, and up-for-grabs labels Jan 28, 2025
NightOwl888 added this to the 4.8.0-beta00018 milestone Jan 28, 2025