The loader should include a normalization routine to handle dates in different formats.
Expected Behavior
Such a normalization routine would be called for each date field in the record, ensuring that the data fit the schema, e.g. "2017 Sep 1" -> "2017-09-01", "2017-Sep-1" -> "2017-09-01", "2017 Sep-Oct" -> "2017", "01.09.2017" -> "2017-09-01".
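For illustration, a minimal standard-library sketch of such a routine might try a list of `strptime` patterns in turn and emit an ISO 8601 string at the precision the input supports. The name `normalize_date` and the pattern list are hypothetical (not an existing hepcrawl API); the sketch assumes English month abbreviations (C locale) and reads "01.09.2017" as day-first, matching the examples above.

```python
import re
from datetime import datetime

# Hypothetical pattern table, not part of hepcrawl: each entry pairs a
# strptime input pattern with the ISO output precision to emit.
# %b assumes English month abbreviations (C locale).
_FORMATS = [
    ("%Y-%m-%d", "%Y-%m-%d"),  # "2017-09-01"
    ("%Y %b %d", "%Y-%m-%d"),  # "2017 Sep 1"
    ("%Y-%b-%d", "%Y-%m-%d"),  # "2017-Sep-1"
    ("%d.%m.%Y", "%Y-%m-%d"),  # "01.09.2017", assumed day-first
    ("%Y %b", "%Y-%m"),        # "2017 Sep"
    ("%Y-%m", "%Y-%m"),        # "2017-09"
    ("%Y", "%Y"),              # "2017"
]

# A year followed by a range of month abbreviations, e.g. "2017 Sep-Oct".
_MONTH_RANGE = re.compile(r"^(\d{4})\s+[A-Za-z]{3}\s*-\s*[A-Za-z]{3}$")


def normalize_date(raw):
    """Return an ISO 8601 (possibly partial) date string, or None."""
    text = " ".join(raw.split())  # collapse repeated whitespace
    match = _MONTH_RANGE.match(text)
    if match:
        # A month range is not a single month, so keep only the year.
        return match.group(1)
    for in_fmt, out_fmt in _FORMATS:
        try:
            return datetime.strptime(text, in_fmt).strftime(out_fmt)
        except ValueError:
            continue  # pattern did not match, try the next one
    return None
```

With this table, a spider could map every date field through `normalize_date` and treat a `None` result as a value that needs a new pattern or manual handling.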
Current Behavior
I have to admit that I do not know to what extent this is already implemented in hepcrawl. In the harvesting-kit, each publisher program has its own normalization code. At DESY we have a hand-written function which tries to catch most of the cases.
Context
We will have to write a lot of spiders. It would save time if we could just map the date fields without thinking about the format.
Date ranges are not supported yet. Are they a common occurrence? If so, we need to extend the utils to understand them. Also, the last case ("01.09.2017") is interpreted wrongly, but it is ambiguous (day-first vs. month-first), so we would need to make a choice here. Do you think your interpretation is more common?
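One possible way to handle both points, sketched under assumptions: a range could be reduced to the date components both ends share (so "2017-09" .. "2017-10" becomes "2017", consistent with the "2017 Sep-Oct" -> "2017" example), and the day-first vs. month-first choice for dotted dates could be an explicit parameter instead of a hidden default. Both helpers (`common_precision`, `parse_dotted`) are hypothetical and not existing utils.

```python
from datetime import date


def common_precision(start, end):
    """Reduce a range of ISO (partial) dates to the components both ends share.

    Hypothetical helper: returns None when even the years differ.
    """
    shared = []
    for a, b in zip(start.split("-"), end.split("-")):
        if a != b:
            break  # stop at the first component where the ends disagree
        shared.append(a)
    return "-".join(shared) or None


def parse_dotted(raw, day_first=True):
    """Parse "NN.NN.YYYY"; the caller chooses DD.MM.YYYY or MM.DD.YYYY."""
    first, second, year = (int(part) for part in raw.split("."))
    day, month = (first, second) if day_first else (second, first)
    return date(year, month, day).isoformat()  # raises ValueError if invalid
```

Making `day_first` a parameter lets each spider state the convention of its source, since the string alone cannot resolve the ambiguity.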