Describe the bug
I have a user who uploads TIFF and CSV files to a collection. Apparently, setting metadata goes wrong.
The collection element is shown with datatype data, but the actual dataset is shown as tiff. The inconsistency can also be seen in the hidden datasets corresponding to collection element number 16:
dataset 16 (HID?): the actual dataset, which is shown as tiff
dataset 32: to my understanding the collection element, which is shown as data
I can download the dataset and it is a TIFF file (the tifffile library can also read it).
Edit: Possibly relevant: I still use the SQLite backend for Celery tasks.
To add to the confusion, my Celery log file shows many tracebacks like the one below, indicating problems while parsing TIFF files, but the problem can only be reproduced for the CSV files (the log files also seem to indicate that the job is for a CSV file).
Should tracebacks from setting metadata be visible in the Celery logs at all? Also, the CSV sniffer should run before the TIFF sniffer (though it might have failed).
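To check by hand which bytes an uploaded file actually starts with, a minimal sketch along these lines might help. It only mirrors the byte-order lookup visible in the traceback (II, MM, EP) and is not Galaxy's sniffer logic; the function names header_is_tiff and file_looks_like_tiff are hypothetical.

```python
# Byte-order marks accepted by the tifffile lookup quoted in the traceback.
TIFF_BYTE_ORDER_MARKS = (b"II", b"MM", b"EP")


def header_is_tiff(header: bytes) -> bool:
    """Return True if the first two bytes match a known TIFF byte-order mark."""
    return header[:2] in TIFF_BYTE_ORDER_MARKS


def file_looks_like_tiff(path: str) -> bool:
    """Read the first four bytes of a file and apply the same header check."""
    with open(path, "rb") as handle:
        return header_is_tiff(handle.read(4))
```

A header of b'Loca', as reported in the TiffFileError, fails this check, which suggests that the file being parsed was text rather than a TIFF.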
It seems that Celery upload jobs that fail during metadata setting are considered successful (at least the job working directories of the upload jobs are gone, and my Galaxy is configured to keep them for failed jobs).
A general question is how Celery log files can be interpreted: how can one identify the log messages that originate from the same task? Since, in my understanding, Celery tasks run concurrently (the default seems to be concurrency: 2), consecutive lines probably cannot be assumed to come from the same task.
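One possible way to approach this, sketched under assumptions: the log prefix in my file has the shape [timestamp: LEVEL/process], so lines can at least be grouped by the process field, and task IDs can be pulled from lines of the form Task name[uuid]. The helper names group_by_process and find_task_ids are hypothetical. Attaching unprefixed lines (such as traceback lines) to the most recent prefixed line is only a heuristic and may be wrong when several processes write at the same time.

```python
import re
from collections import defaultdict

# Matches a prefix such as "[2025-02-10 13:39:25,486: INFO/main] message".
LOG_PREFIX = re.compile(
    r"^\[(?P<timestamp>.+?): (?P<level>[A-Z]+)/(?P<process>[^\]]+)\] (?P<message>.*)$"
)
# Matches "Task galaxy.setup_fetch_data[<uuid>]" as seen in the success lines.
TASK_ID = re.compile(r"Task (?P<name>\S+)\[(?P<task_id>[0-9a-f-]{36})\]")


def group_by_process(lines):
    """Group log lines by the process name found in the log prefix.

    Lines without a prefix are attached to the process of the most recent
    prefixed line (a heuristic, see the note above).
    """
    groups = defaultdict(list)
    current = "unknown"  # used for lines that appear before any prefix
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match:
            current = match.group("process")
        groups[current].append(line)
    return dict(groups)


def find_task_ids(lines):
    """Return (task name, task id) pairs mentioned anywhere in the lines."""
    found = []
    for line in lines:
        for match in TASK_ID.finditer(line):
            found.append((match.group("name"), match.group("task_id")))
    return found
```

In the excerpt below every line carries the same process name (main), so grouping by process alone would not separate concurrent tasks there; the job directory numbers (318120 vs. 318124) are the only other hint I can see.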
Galaxy Version and/or server at which you observed the bug
Galaxy Version: 24.0
To Reproduce
Unsure. Repeated uploads also show the problem, but for different collection elements.
Expected behavior
Datatypes should be consistent for the datasets and collection elements.
Additional context
[2025-02-10 13:39:25,486: INFO/main] Successfully executed Celery task setup_fetch_data setup_fetch_data (241.727 ms)
[2025-02-10 13:39:25,491: DEBUG/main] finish(): Moved /work/songalax/galaxy/database/jobs_directory/000/318/318120/outputs/dataset_0acce8bb-a60d-4921-82eb-53b228a325d6.dat to /gpfs1/data/galaxy_server/galaxy/database/files/000/691/dataset_691783.dat
[2025-02-10 13:39:25,516: DEBUG/main] unnamed outputs [{'__unnamed_outputs': [{'destination': {'type': 'hdas'}, 'elements': [{'name': 'microscope-20240429095347--B4-000-cropped_resized__SHAPES.csv', 'dbkey': '?', 'ext': 'tiff', 'link_data_only': False, 'sources': [], 'hashes': [], 'info': 'uploaded tiff file', 'state': 'ok', 'filename': '/work/songalax/galaxy/database/jobs_directory/000/318/318120/working/gxupload_0', 'object_id': 1033183}]}]}]
[2025-02-10 13:39:25,534: ERROR/main] Exception occured while setting metdata
Traceback (most recent call last):
File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.11/site-packages/tifffile/tifffile.py", line 4053, in __init__
byteorder = {b'II': '<', b'MM': '>', b'EP': '<'}[header[:2]]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: b'Lo'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/model/store/discover.py", line 261, in set_datasets_metadata
primary_data.set_meta()
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/model/__init__.py", line 4671, in set_meta
return self.datatype.set_meta(self, **kwd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/datatypes/images.py", line 113, in set_meta
with tifffile.TiffFile(dataset.get_file_name()) as tif:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.11/site-packages/tifffile/tifffile.py", line 4055, in __init__
raise TiffFileError(f'not a TIFF file {header!r}') from exc
tifffile.tifffile.TiffFileError: not a TIFF file b'Loca'
[2025-02-10 13:39:25,543: INFO/main] Task galaxy.setup_fetch_data[d37be958-0dab-4eef-944a-ce8c77e69621] succeeded in 0.29848467744886875s: ('/work/songalax/galaxy/database/jobs_directory/000/318/318124', '/work/songalax/galaxy/database/jobs_directory/000/318/318124/request.json', {'file_sources': [{...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}], 'config': {'symlink_allowlist': [...], 'fetch_url_allowlist': [...], 'library_import_dir': '/data/galaxy_server/galaxy/library_import/', 'user_library_import_dir': '/gpfs1/data/galaxy_server/library_import_user/', 'ftp_upload_dir': '/gpfs1/data/galaxy_server/library_import_user/', 'ftp_upload_purge': True}})
[2025-02-10 13:39:25,578: INFO/main] Collecting metrics for Job 318120 in /work/songalax/galaxy/database/jobs_directory/000/318/318120
[2025-02-10 13:39:25,621: DEBUG/main] job_wrapper.finish for job 318120 executed (195.082 ms)