
Modify download apis for minio mounted fs #117

Draft
wants to merge 5 commits into
base: master

@Vismayak Vismayak commented Jan 17, 2025

APIs for File Downloads with MinIO Mounted Directory

This update changes how an extractor obtains files when the environment variable MINIO_MOUNTED_PATH is set. Instead of downloading each file to the /tmp folder, the download API returns the file's path inside the s3fs-mounted directory. This avoids redundant downloads and gives the extractor direct access to the files stored in MinIO.
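
The intended behavior can be sketched roughly as below. This is an illustration only, not the code in this PR: `resolve_local_path` and the `fetch` callback are hypothetical names, and the sketch assumes that objects in the mounted bucket are named by file ID and that a missing file falls back to the regular download.

```python
import os
import tempfile
from typing import Callable, Optional


def resolve_local_path(
    file_id: str,
    fetch: Callable[[str, str], None],
    mounted_path: Optional[str] = None,
) -> str:
    """Return a local path for file_id, preferring the s3fs mount.

    fetch(file_id, dest) is a hypothetical stand-in for the existing
    HTTP download; it is only called when the mount cannot be used.
    """
    if mounted_path is None:
        mounted_path = os.environ.get("MINIO_MOUNTED_PATH")
    if mounted_path:
        # Assumption: objects in the bucket are named by file ID.
        candidate = os.path.join(os.path.expanduser(mounted_path), file_id)
        if os.path.isfile(candidate):
            return candidate
    # Fallback (an assumption of this sketch): download to a temporary file.
    fd, dest = tempfile.mkstemp(prefix=f"{file_id}_")
    os.close(fd)
    fetch(file_id, dest)
    return dest
```

The `os.path.expanduser` call is there because a value such as `~/clowderfs` is not expanded when it arrives through a `.env` file or a compose file rather than a shell assignment.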


Testing Steps

To test this functionality, set up an s3fs mount and configure the environment as described below.


1. Prerequisites

Ensure the following are in place before testing:

  • MinIO is set up and running.
  • s3fs is installed on your system. Follow the installation instructions in the s3fs-fuse GitHub repository.

2. Expose the minio-nginx Container

To expose MinIO at port 9000, add the following configuration to your docker-compose.yml file:

minio-nginx:
  image: nginx:1.19.2-alpine
  restart: unless-stopped
  hostname: nginx
  ports:
    - "9000:9000"
  networks:
    - clowder2
  volumes:
    - ./deployments/docker/minio-nginx.conf:/etc/nginx/nginx.conf:ro
  depends_on:
    - minio1
    - minio2
    - minio3
    - minio4

3. Set the MINIO_ENDPOINT Environment Variable

Set the MINIO_ENDPOINT environment variable to the URL of the exposed MinIO service. For local development, the value should be:

MINIO_ENDPOINT="http://localhost:9000"
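
Before mounting, it can help to confirm that the endpoint is actually reachable from the host. A minimal sketch, assuming the nginx proxy forwards MinIO's liveness path `/minio/health/live` unchanged; the helper names `health_url` and `minio_is_live` are made up for this example.

```python
import os
import urllib.error
import urllib.request


def health_url(endpoint: str) -> str:
    """Build the MinIO liveness URL for a given endpoint."""
    return endpoint.rstrip("/") + "/minio/health/live"


def minio_is_live(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if the liveness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    endpoint = os.environ.get("MINIO_ENDPOINT", "http://localhost:9000")
    print(endpoint, "reachable:", minio_is_live(endpoint))
```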

4. Create a .miniocred File

In your home directory (or another preferred location), create a file named .miniocred. Populate it with your MinIO credentials in the format:

ACCESS_KEY_ID:SECRET_ACCESS_KEY

With the default Docker credentials, use:

minioadmin:minioadmin

Ensure the file has secure permissions:

chmod 600 ~/.miniocred
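
The same step can be scripted. The sketch below is one possible way to do it, with hypothetical helper names; it creates the file with mode 600 from the start and checks that group and others have no access, which matters because s3fs generally refuses a credentials file that is readable by others.

```python
import os
import stat


def write_passwd_file(path: str, access_key: str, secret_key: str) -> None:
    """Write ACCESS_KEY_ID:SECRET_ACCESS_KEY to path with mode 600."""
    # Creating with 0o600 avoids a window where the file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{access_key}:{secret_key}\n")
    os.chmod(path, 0o600)  # also tightens a file that already existed


def has_secure_mode(path: str) -> bool:
    """True when group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For the default Docker setup this would be called as `write_passwd_file(os.path.expanduser("~/.miniocred"), "minioadmin", "minioadmin")`.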

5. Mount the MinIO Filesystem

  1. Create a directory to mount the MinIO filesystem. For example:
   mkdir ~/clowderfs
  2. Run the following command to mount the MinIO bucket named clowder on this directory:
   s3fs clowder ~/clowderfs \
     -o passwd_file="$HOME/.miniocred" \
     -o use_path_request_style \
     -o url="$MINIO_ENDPOINT" \
     -o allow_other
  3. Verify the mount by listing files in the directory:
   ls ~/clowderfs

You should see the Clowder files stored in the MinIO bucket, listed by file ID.
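
An empty listing can mean either an empty bucket or a directory with nothing mounted on it. The following sketch separates the two cases; `list_mount` is a hypothetical helper, not part of PyClowder.

```python
import os
from typing import List


def list_mount(path: str, limit: int = 10) -> List[str]:
    """Return up to `limit` entries of a mounted directory.

    Raises if the path is missing or if nothing is mounted on it.
    """
    path = os.path.expanduser(path)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"{path} is not a directory")
    if not os.path.ismount(path):
        raise RuntimeError(f"{path} exists but nothing is mounted on it")
    return sorted(os.listdir(path))[:limit]
```

Calling `list_mount("~/clowderfs")` after the s3fs command should return file IDs; a `RuntimeError` suggests the mount step did not succeed.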


6. Set the MINIO_MOUNTED_PATH Environment Variable

Set the MINIO_MOUNTED_PATH environment variable to the mounted directory:

MINIO_MOUNTED_PATH=~/clowderfs

Test the Download File API in PyClowder

Use the PyClowder API to test file downloads:

  1. Run the download_file API and specify the file path.
  2. The API should return a file path that points to the mounted directory (e.g., ~/clowderfs), confirming that files are being accessed directly from the mounted filesystem.
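
To make the check in step 2 mechanical, the returned path can be compared against the mount directory. A small sketch; `is_under_mount` is a hypothetical helper, and the path passed in is whatever the download call returned.

```python
import os


def is_under_mount(file_path: str, mounted_path: str) -> bool:
    """True if file_path resolves to a location inside mounted_path."""
    real_file = os.path.realpath(os.path.expanduser(file_path))
    real_mount = os.path.realpath(os.path.expanduser(mounted_path))
    # commonpath compares whole path components, so a sibling directory
    # such as /mnt/clowderfs2 does not match /mnt/clowderfs.
    return os.path.commonpath([real_file, real_mount]) == real_mount
```

A path under /tmp here would indicate that the extractor is still downloading the file instead of reading it from the mount.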

OR

Test with the Image-Classification-Dataset Extractor

  1. Clone the image-classification-dataset extractor repository.
  2. Switch to the minio-mounted-extractor branch:
   git checkout minio-mounted-extractor
  3. Run the extractor locally.
  4. Observe the logs during execution. The listed filenames should reference the MinIO mounted directory (e.g., ~/clowderfs), indicating that files are being accessed correctly.

@KastanDay

Just here to say: nice docs. Love this detail. Saw this in an email update and had to come say nice work 🎉

Development

Successfully merging this pull request may close these issues.

Modify download APIs to return file location in minio mounted FS