[Bug] Unable to access quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest #995

Open
2 tasks done
touma-I opened this issue Jan 29, 2025 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@touma-I
Collaborator

touma-I commented Jan 29, 2025

Search before asking

  • I searched the issues and found no similar issues.

Component

Other

What happened + What you expected to happen

When trying to pull the doc_chunk-ray:latest image from quay.io, the user receives an error.

Reproduction script

docker pull quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Trying to pull quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest...
Error: initializing source docker://quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest: reading manifest latest in quay.io/dataprep1/data-prep-kit/doc_chunk-ray: unauthorized: access to the requested resource is not authorized
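
For anyone triaging a similar failure, here is a minimal sketch of a shell helper that checks captured pull output for the `unauthorized` marker shown above. The function name `classify_pull_output` is hypothetical and not part of docker, podman, or data-prep-kit. It only pattern-matches the podman messages quoted in this thread, so native Docker output may need different patterns.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): classify captured pull output.
# Prints "unauthorized" when the registry denied access, "ok" when podman
# reports that the manifest was written, and "unknown" for anything else.
classify_pull_output() {
  local output="$1"
  case "$output" in
    *"unauthorized: access to the requested resource is not authorized"*)
      echo "unauthorized" ;;
    *"Writing manifest to image destination"*)
      echo "ok" ;;
    *)
      echo "unknown" ;;
  esac
}
```

It could be used as `classify_pull_output "$(docker pull quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest 2>&1)"`. An `unauthorized` result on a repository that is meant to be public points at the repository's visibility settings rather than at the client.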

Anything else

No response

OS

Ubuntu

Python

Other

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
touma-I added the bug label Jan 29, 2025
touma-I self-assigned this Jan 29, 2025
@touma-I
Collaborator Author

touma-I commented Jan 29, 2025

This issue was resolved by changing the repository permissions for the image. Anyone with public access to quay.io can now pull the image as follows:

docker pull quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Trying to pull quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest...
Getting image source signatures
Copying blob deb3674b2ec0 done   |
Copying blob 7478e0ac0f23 done   |
Copying blob f12b67e28f22 done   |
Copying blob 852ed58bb221 done   |
Copying blob d4afbec88200 done   |
Copying blob 4f4fb700ef54 done   |
Copying blob bb79cac925ec done   |
Copying blob 61596c7a543d done   |
Copying blob e12f2907e37f done   |
Copying blob fcda38bac12a done   |
Copying blob 71222f4ca89a done   |
Copying blob 90d7fa14f40c done   |
Copying blob 3f58a5109f2c done   |
Copying blob eca9a524d4c4 done   |
Copying blob 71dd65287a60 done   |
Copying blob f436ca417ad8 done   |
Copying config f4fbd6d403 done   |
Writing manifest to image destination
f4fbd6d403d1cff2df67301d312f36b53bdbbec93fceb6327f1910397096d0a9

@touma-I
Collaborator Author

touma-I commented Jan 29, 2025

The entry point for the new Docker image has changed. The current entry point is as follows:

docker run -it quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest python -m dpk_doc_chunk.ray.transform -h
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
04:21:42 INFO - Launching doc_chunk transform
usage: transform.py [-h] [--run_locally RUN_LOCALLY] [--doc_chunk_chunking_type {li_markdown,dl_json,li_token_text}]
[--doc_chunk_content_column_name DOC_CHUNK_CONTENT_COLUMN_NAME] [--doc_chunk_doc_id_column_name DOC_CHUNK_DOC_ID_COLUMN_NAME]
[--doc_chunk_output_chunk_column_name DOC_CHUNK_OUTPUT_CHUNK_COLUMN_NAME]
[--doc_chunk_output_source_doc_id_column_name DOC_CHUNK_OUTPUT_SOURCE_DOC_ID_COLUMN_NAME]
[--doc_chunk_output_jsonpath_column_name DOC_CHUNK_OUTPUT_JSONPATH_COLUMN_NAME]
[--doc_chunk_output_pageno_column_name DOC_CHUNK_OUTPUT_PAGENO_COLUMN_NAME]
[--doc_chunk_output_bbox_column_name DOC_CHUNK_OUTPUT_BBOX_COLUMN_NAME] [--doc_chunk_chunk_size_tokens DOC_CHUNK_CHUNK_SIZE_TOKENS]
[--doc_chunk_chunk_overlap_tokens DOC_CHUNK_CHUNK_OVERLAP_TOKENS] [--doc_chunk_dl_min_chunk_len DOC_CHUNK_DL_MIN_CHUNK_LEN]
[--data_s3_cred DATA_S3_CRED] [--data_s3_config DATA_S3_CONFIG] [--data_local_config DATA_LOCAL_CONFIG] [--data_max_files DATA_MAX_FILES]
[--data_checkpointing DATA_CHECKPOINTING] [--data_files_to_checkpoint DATA_FILES_TO_CHECKPOINT] [--data_data_sets DATA_DATA_SETS]
[--data_files_to_use DATA_FILES_TO_USE] [--data_num_samples DATA_NUM_SAMPLES] [--runtime_num_workers RUNTIME_NUM_WORKERS]
[--runtime_worker_options RUNTIME_WORKER_OPTIONS] [--runtime_creation_delay RUNTIME_CREATION_DELAY] [--runtime_pipeline_id RUNTIME_PIPELINE_ID]
[--runtime_job_id RUNTIME_JOB_ID] [--runtime_code_location RUNTIME_CODE_LOCATION]
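
As a rough sketch of how the new entry point might be wrapped, the helper below assembles a run command and validates the chunking type against the choices listed in the usage text above. The function name `build_doc_chunk_cmd` and the `IMAGE` variable are hypothetical. The helper only prints the command; a real run would presumably also need data access flags such as `--data_local_config`, whose value format is not shown in this thread.

```shell
#!/usr/bin/env bash
# Image name taken from this issue.
IMAGE="quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest"

# Hypothetical helper: print a docker run command for the doc_chunk transform.
# Accepts only the chunking types listed in the transform's usage output.
build_doc_chunk_cmd() {
  local chunking_type="$1"
  case "$chunking_type" in
    li_markdown|dl_json|li_token_text) ;;
    *)
      echo "unsupported chunking type: $chunking_type" >&2
      return 1 ;;
  esac
  printf 'docker run -it %s python -m dpk_doc_chunk.ray.transform --doc_chunk_chunking_type %s\n' \
    "$IMAGE" "$chunking_type"
}
```

For example, `build_doc_chunk_cmd dl_json` prints the command for review before running it, and an unrecognized type returns a non-zero status instead of starting a container.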
