Non 14-digit datetime reports "at None" in 404 HTML body #286

Closed
machawk1 opened this issue Nov 22, 2017 · 8 comments

Comments

@machawk1
Member

  1. ipwb index ipwb/samples/warcs/5mementos.warc | ipwb replay
  2. http://localhost:5000/2016/memento.us/ in a browser
  3. Note, "at None" message in HTML response body.
@machawk1
Member Author

machawk1 commented Dec 1, 2017

Whoa, here's a mess. The links to the Link and CDXJ TimeMaps (added in #285) also do not properly extract the URI-R (e.g., the Link link goes to http://localhost:5000/timemap/link/)

[Image: screen shot 2017-11-30 at 11 37 59 pm]

@machawk1
Member Author

machawk1 commented Dec 1, 2017

A big problem here is that 2016 is treated as part of the URI-R...sort of, instead of being captured as part of the datetime (see #286). Maybe we should provide a sanity check on the .split('/')[0] value to ensure it's a valid hostname. Compare to a 14-digit fabricated datetime:

[Image: screen shot 2017-11-30 at 11 42 58 pm]

The partial culprit here is the newly created getCompleteURI().
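
The sanity check suggested above could look roughly like the following. This is only a sketch, not ipwb code: the names looks_like_hostname and first_segment_is_host are hypothetical, and it assumes the scheme has already been stripped from the path. A purely syntactic hostname check would accept "2016" as a valid label, so this heuristic additionally requires a dot and a non-numeric final label (which also rejects bare names such as "localhost").

```python
import re

# One DNS label: 1-63 letters, digits, or hyphens, with no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_hostname(segment):
    """Heuristic: True if `segment` plausibly names a host rather than a datetime."""
    host = segment.split(":", 1)[0]  # drop an optional port
    labels = host.split(".")
    if len(labels) < 2:
        return False  # bare tokens such as "2016" are rejected
    if labels[-1].isdigit():
        return False  # an all-digit final label is not a TLD
    return all(_LABEL.match(label) for label in labels)


def first_segment_is_host(path):
    """Apply the check to the value of path.split('/')[0]."""
    return looks_like_hostname(path.lstrip("/").split("/")[0])
```

With this heuristic, first_segment_is_host("2016/memento.us/") is False, so the leading 2016 would not be mistaken for the start of the URI-R, while first_segment_is_host("memento.us/") is True.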

machawk1 added a commit that referenced this issue Dec 1, 2017
Handle non-14-digit datetimes. Closes #286
@ibnesayeed
Member

  • I am not sure how getCompleteURI might be affecting it.
  • The current patch is very naive and may cause issues, as described below:
    • For example, "2006".ljust(14, '0') would yield "20060000000000", which means a zeroth month and a zeroth day. This is not how a 14-digit datetime string is interpreted (the first month and the first day should be 01).
    • Something like "20062".ljust(14, '0') would yield "20062000000000", but a 20th month is invalid in this context.
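
To make the two cases above concrete, here is a small check using only the Python standard library. The helper names zero_padded and parses are made up for this illustration and are not part of ipwb.

```python
from datetime import datetime


def zero_padded(dt):
    """Right-pad a partial datetime string with zeros to 14 characters."""
    return dt.ljust(14, "0")


def parses(dt14):
    """Return True if dt14 is a valid YYYYMMDDhhmmss datetime."""
    try:
        datetime.strptime(dt14, "%Y%m%d%H%M%S")
        return True
    except ValueError:
        return False


for partial in ("2006", "20062"):
    padded = zero_padded(partial)
    print(partial, "->", padded, "valid" if parses(padded) else "invalid")
```

Both padded values fail to parse, whereas "20060101000000", the conventional reading of "2006", parses without error.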

In MemGator, it is implemented as follows:

var regs = map[string]*regexp.Regexp{
	// some stuff
	"dttmstr": regexp.MustCompile(`^(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?$`),
	// some stuff
}

func paddedTime(dttmstr string) (dttm *time.Time, err error) {
	m := regs["dttmstr"].FindStringSubmatch(dttmstr)
	dts := m[1]
	dts += (m[2] + "01")[:2]
	dts += (m[3] + "01")[:2]
	dts += (m[4] + "00")[:2]
	dts += (m[5] + "00")[:2]
	dts += (m[6] + "00")[:2]
	var dtm time.Time
	dtm, err = time.Parse("20060102150405", dts)
	dttm = &dtm
	return
}
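
For ipwb, which is written in Python, the same padding idea might be sketched as below. This is a rough port for illustration, not the actual patch: padded_datetime and DTTM_RE are hypothetical names, and unlike the Go snippet this version raises ValueError when the input does not match the pattern (for example, an odd number of digits such as "20062").

```python
import re
from datetime import datetime

# YYYY[MM[DD[hh[mm[ss]]]]]; re.ASCII keeps \d to the digits 0-9.
DTTM_RE = re.compile(r"(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?", re.ASCII)


def padded_datetime(dttmstr):
    """Expand a partial datetime string and return it as a datetime object."""
    m = DTTM_RE.fullmatch(dttmstr)
    if m is None:
        raise ValueError("not a YYYY[MM[DD[hh[mm[ss]]]]] string: %r" % dttmstr)
    year, month, day, hour, minute, second = m.groups()
    # Missing month and day default to 01; missing time fields default to 00.
    dts = (year + (month or "01") + (day or "01")
           + (hour or "00") + (minute or "00") + (second or "00"))
    # strptime rejects impossible values such as month 13 or February 30.
    return datetime.strptime(dts, "%Y%m%d%H%M%S")


print(padded_datetime("2016").strftime("%Y%m%d%H%M%S"))
```

Formatting the result with strftime("%Y%m%d%H%M%S") gives back a 14-digit string, so a request for 2016 would be looked up as 20160101000000.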

@machawk1
Member Author

machawk1 commented Dec 1, 2017

That would be an improvement, @ibnesayeed. I was also mistaken in attributing the issue to getCompleteURI().

The more liberal date handler (1-14 digits instead of a hard 14) now handles the request instead of the general handler in replay. #301 would allow for more strategic specification of dates. I agree that 0-padding is not a correct assumption, but the issue in this ticket was to make the above display function correctly, providing a listing of the URI-Ms for a URI-R when a capture for the datetime (in this case, ill-specified) is not found, instead of a blank, unhelpful display.

The latter problem you mentioned might also be an issue, though I believe WARC/1.1 allows datetimes beyond 14 digits, so we may want to attempt to interpret a datetime > 14 digits as one beyond the conventional granularity.

@ibnesayeed
Member

> The latter problem you mentioned might also be an issue, though I believe WARC/1.1 allows support for datetimes beyond 14-digits, so we may want to attempt to interpret a datetime > 14 digits as one beyond the conventional granularity.

No, I did not mean more than 14 digits here. I was illustrating a side effect of 0-padding that turns month 2 into month 20.

Also, the MemGator code illustrated above accepts datetimes in the YYYY[MM[DD[hh[mm[ss]]]]] format.

@machawk1
Member Author

machawk1 commented Dec 1, 2017

Right, I understand this. I want to propose the >14-digit datetime as a third bullet to the two cases you mentioned above. They should all be handled in #301.

@ibnesayeed
Member

A datetime with a finer granularity than one second is good for WARC, but perhaps not very practical from a replay lookup perspective at this time (this may change in the future).

@machawk1
Member Author

machawk1 commented Dec 1, 2017

On a related note, a WARC containing other facets of the 1.1 spec would be good for testing.
