Problems identified testing v0.3 #25

Open
craig-willis opened this issue Jun 13, 2018 · 4 comments

Comments

@craig-willis
Collaborator

craig-willis commented Jun 13, 2018

  • The registry URL is not parameterized correctly (it now needs both the domain and the subdomain).
  • Running instances were migrated, but they are not actually running.
  • Running tales have the dev.wholetale.org domain: the DOMAIN env variable is missing from start-worker.sh.
  • After updating start-worker.sh on all staging nodes, I tried restarting, but now get "Tale could not be launched! TimeoutError: TimeoutError('The operation timed out.')" errors. We need operational documentation covering common errors, troubleshooting, and resolution.
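
A missing DOMAIN variable like the one above could be caught before a worker starts. The following is only a minimal sketch of such a pre-flight check: the helper name `missing_env` and the list of required variables are assumptions for illustration, not part of start-worker.sh or the wholetale plugin.

```python
import os


def missing_env(required, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Hypothetical list: DOMAIN is the only variable named in this issue.
REQUIRED_VARS = ["DOMAIN"]

# Possible use at worker start-up (commented out so the sketch has no side effects):
# missing = missing_env(REQUIRED_VARS)
# if missing:
#     raise SystemExit("missing environment variables: " + ", ".join(missing))
```

Failing early with the variable name in the message would point at the configuration gap directly, instead of surfacing later as tales created under the wrong domain.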
@craig-willis
Collaborator Author

The timeout error was caused by specifying the wrong queue when manually restarting celery_worker: I used worker, but it needed to be celery. Also, the actual error message was in the internal log file in the girder container (/root/.girder/logs).

Traceback (most recent call last):
  File "/girder/girder/api/rest.py", line 620, in endpointDecorator
    val = fun(self, args, kwargs)
  File "/girder/girder/api/rest.py", line 1204, in POST
    return self.handleRoute(method, path, params)
  File "/girder/girder/api/rest.py", line 947, in handleRoute
    val = handler(**kwargs)
  File "/girder/girder/api/access.py", line 63, in wrapped
    return fun(*args, **kwargs)
  File "/girder/girder/api/describe.py", line 702, in wrapped
    return fun(*args, **kwargs)
  File "/girder/plugins/wholetale/server/rest/instance.py", line 166, in createInstance
    save=True)
  File "/girder/plugins/wholetale/server/models/instance.py", line 147, in createInstance
    volume = volumeTask.get(timeout=TASK_TIMEOUT)
  File "/usr/local/lib/python3.5/dist-packages/celery/result.py", line 191, in get
    on_message=on_message,
  File "/usr/local/lib/python3.5/dist-packages/celery/backends/async.py", line 188, in wait_for_pending
    for _ in self._wait_for_pending(result, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/celery/backends/async.py", line 259, in _wait_for_pending
    raise TimeoutError('The operation timed out.')
celery.exceptions.TimeoutError: The operation timed out.

After restarting the workers with the correct queue, things are working.
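
For the troubleshooting docs, a queue mismatch like this could be detected by asking the workers which queues they consume. The sketch below assumes the reply shape of Celery's `app.control.inspect().active_queues()` (a dict mapping worker name to a list of queue dicts, or None when no worker answers). The helper name `workers_on_queue` is hypothetical.

```python
def workers_on_queue(active_queues, queue_name):
    """Return the sorted names of workers that consume `queue_name`.

    `active_queues` is expected to look like the reply of Celery's
    `app.control.inspect().active_queues()`:
        {"celery@host": [{"name": "celery", ...}, ...], ...}
    A None or empty reply (no worker answered) yields an empty list.
    """
    if not active_queues:
        return []
    return sorted(
        worker
        for worker, queues in active_queues.items()
        if any(queue.get("name") == queue_name for queue in queues)
    )


# Possible use against a live broker (`app` would be the Celery application,
# which is not shown in this issue):
# reply = app.control.inspect().active_queues()
# if not workers_on_queue(reply, "celery"):
#     print("no worker is consuming the 'celery' queue")
```

An empty result for the celery queue would explain why `volumeTask.get(timeout=TASK_TIMEOUT)` times out: the task is published, but no worker picks it up.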

@craig-willis
Collaborator Author

A separate issue now appears after launching the tale:

 docker service ps tmp-k8evq397tylo --no-trunc
ID                          NAME                     IMAGE                                                   NODE                DESIRED STATE       CURRENT STATE                 ERROR                                                                           PORTS
gx9lt9w7jvch838qiw7hsedbv   tmp-k8evq397tylo.1       registry.stage.wholetale.org/5964d96e1801c10001061e49   wt-stage-01         Ready               Rejected 2 seconds ago        "No such image: registry.stage.wholetale.org/5964d96e1801c10001061e49:latest"

Do we need to migrate the registry?

@Xarthisius
Contributor

Either migrate it or trigger a build for all images.

@Xarthisius
Contributor

Xarthisius commented Jun 13, 2018

...or make the plugin do it if the image is not there, although that would significantly increase the deployment time to staging each time.
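
Whichever option is chosen, checking whether an image is present comes down to asking the registry for its manifest. As a rough sketch, the Docker Registry HTTP API v2 serves manifests at `/v2/<name>/manifests/<reference>`, so a missing image shows up as a 404 on that URL. The helper name `manifest_url` is hypothetical, the sketch assumes plain `host/name[:tag]` references (no digests), and it leaves out the HTTP request and any authentication the registry may require.

```python
def manifest_url(image_ref, scheme="https"):
    """Build the Registry HTTP API v2 manifest URL for `host/name[:tag]`.

    The tag defaults to "latest" when the reference has none, matching
    how the Docker CLI resolves untagged references.
    """
    host, _, remainder = image_ref.partition("/")
    if not remainder:
        raise ValueError("expected '<registry-host>/<name>[:tag]'")
    name, sep, tag = remainder.rpartition(":")
    if not sep:
        # No ':' after the host part, so the whole remainder is the name.
        name, tag = remainder, "latest"
    return "{}://{}/v2/{}/manifests/{}".format(scheme, host, name, tag)
```

For the image in the `docker service ps` output above, this yields `https://registry.stage.wholetale.org/v2/5964d96e1801c10001061e49/manifests/latest`. A plugin or migration script could request that URL and trigger a build only when the registry answers 404.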
