[Feature] need an example of using doc_quality plugin with installed pypi packages #575
Comments
I might be misunderstanding something, but if the request is to include the bad-word file in the PyPI package, that sounds odd to me.
No need to publish the 'bad word files' to PyPI.
@sujee I am not sure whether you are just asking a question or if you want @dtsuzuku-ibm to make any changes in his code. If it is the former, e.g., you want to use this transform in a Colab notebook and you have no access to the local directory, you can specify the filepath as a parameter and use what we have in the ldnoobw directory of our repo. The files in this directory are all publicly available, i.e., they are open source. Downloading them from our repo or other open-source URLs doesn't make a difference. If you are suggesting a code change, can you be more specific? Thanks.
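For a Colab-style environment with no source tree, the suggestion above (fetch the publicly available bad-words file and pass its path as a parameter) might look roughly like the sketch below. This is an illustration under assumptions, not the project's documented usage: the helper name `fetch_bad_words_file`, the default filename, and the `bad_word_filepath` key in the usage comment are hypothetical, and the URL has to be supplied by the caller (for example, the raw URL of a file under the repo's ldnoobw directory).

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_bad_words_file(
    url: str,
    dest_dir: Optional[str] = None,
    filename: str = "bad_words.txt",
) -> str:
    """Download a bad-words list from `url` and return the local file path.

    Hypothetical helper: it only copies the bytes at `url` into a local
    file so that the path can be handed to the transform as a parameter.
    """
    # Use the caller's directory if given, otherwise a fresh temp directory.
    target_dir = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp(prefix="ldnoobw_"))
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename

    # Stream the response to disk without loading it all into memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)

    return str(target)


# Usage sketch (assumptions: the URL is one you have verified yourself, and
# the parameter key matches what your installed doc_quality version expects):
#
#   local_path = fetch_bad_words_file("<raw URL of the bad-words file>")
#   params = {"bad_word_filepath": local_path}
```

Keeping the download separate from the transform configuration means the same path can be reused across runs, and the notebook does not depend on being launched from the source tree.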
no code change necessary, just to be clear :-) I will work on an example showcasing:
For (1), are there published 'bad words files' we can access?
Search before asking
Component
Transforms/Other
What happened + What you expected to happen
The current sample code looks for bad_word_filepath in the project directory (it assumes the code is run from the source tree).
Currently this file is in:
transforms/language/doc_quality/ray/ldnoobw/en/
We need an example showing how to use this transform with the installed PyPI packages.
I have the following packages installed
Reproduction script
https://github.com/IBM/data-prep-kit/blob/dev/examples/notebooks/rag/rag_1A_dpk_process_ray.ipynb
Step 7
Anything else
No response
OS
Ubuntu
Python
3.11.x
Are you willing to submit a PR?