Securing Aim Remote Tracking server using SSL key and certificate #3172

Open
JeroenVranken opened this issue Jun 19, 2024 · 9 comments
Labels
area / SDK-storage Issue area: SDK and storage related issues phase / shipped Issue phase: shipped type / enhancement Issue type: new feature or request

Comments

@JeroenVranken

Securing Aim Remote Tracking server using SSL key and certificate

Hi, first of all I appreciate all the work you've put into making Aim!

I am having some trouble securing the connection to the Aim Remote Tracking (RT) Server, and was wondering if you could help me out.

I recently set up a virtual machine on Azure that runs both the Aim RT Server and the Aim UI, using a docker-compose.yml that brings up both services. This works properly: I can log runs from another machine and see them appear in the UI.

However, now I want to secure the connection to the remote tracking server using SSL, as described here. I've created a self-signed key and certificate file using openssl, as described here.

When I bring up the server using this command, everything seems to be in working order and I do not get any errors:

aim server --repo ~/mycontainer/aim/ --ssl-keyfile ~/secrets/server.key --ssl-certfile ~/secrets/server.crt --host 0.0.0.0 --dev --port 53800

But then when I try to log a run from another machine, I get the following error on the client:

azureuser@ml-ci-jvranken-prd:~/cloudfiles/code/Users/jvranken/aim-tracking-server$ python aim_test.py 
Failed to connect to Aim Server. Have you forgot to run `aim server` command?
Traceback (most recent call last):
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 14, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 138, in connect
    response = requests.get(endpoint, headers=self.request_headers)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/ml-ci-jvranken-prd/code/Users/jvranken/aim-tracking-server/aim_test.py", line 7, in <module>
    run = Run(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 70, in wrapper
    _SafeModeConfig.exception_callback(e, func)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 47, in reraise_exception
    raise e
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 68, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 859, in __init__
    super().__init__(run_hash, repo=repo, read_only=read_only, experiment=experiment, force_resume=force_resume)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 272, in __init__
    super().__init__(run_hash, repo=repo, read_only=read_only, force_resume=force_resume)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/base_run.py", line 34, in __init__
    self.repo = get_repo(repo)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo_utils.py", line 26, in get_repo
    repo = Repo.from_path(repo)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 210, in from_path
    repo = Repo(path, read_only=read_only, init=init)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 121, in __init__
    self._client = Client(remote_path)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 50, in __init__
    self.connect()
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 18, in wrapper
    raise RuntimeError(error_message)
RuntimeError: Failed to connect to Aim Server. Have you forgot to run `aim server` command?

Do you have any idea why this is not working? Here are the docker-compose.yaml and the Python file I'm using:

services:
  ui:
    image: aimstack/aim:3.20.1
    container_name: aim_ui
    restart: unless-stopped
    command: up --host 0.0.0.0 --port 43800 --dev
    ports:
      - 80:43800
    volumes:
    - ~/mycontainer/aim:/opt/aim
    networks:
      - aim

  server:
    image: aimstack/aim:3.20.1
    container_name: aim_server
    restart: unless-stopped
    command: server --host 0.0.0.0 --dev --ssl-keyfile /opt/secrets/server.key --ssl-certfile /opt/secrets/server.crt
    ports:
      - 53800:53800
    volumes:
    - ~/mycontainer/aim:/opt/aim
    - ~/secrets:/opt/secrets
    networks:
      - aim

networks:
  aim:
    driver: bridge

from aim import Run

# AIM_REPO='/home/azureuser/mycontainer/aim'
AIM_REPO='aim://REDACTED:53800'
AIM_EXPERIMENT='SSL-server'

run = Run(
    repo=AIM_REPO,
    experiment=AIM_EXPERIMENT
)


hparams_dict = {
    'learning_rate': 0.001,
    'batch_size': 32,
}
run['hparams'] = hparams_dict


# log metric
for i in range(30):
    if i % 5 == 0:
        i = i * 0.347
    run.track(float(i), name='numbers')
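For anyone debugging a similar setup, a quick way to narrow down an error like the traceback above is to check what the server port actually does when a client attempts a TLS handshake. The following is a minimal sketch using only the Python standard library; `diagnose_tls` and its four result labels are hypothetical names made up for illustration and are not part of Aim.

```python
import socket
import ssl
from typing import Optional


def diagnose_tls(host: str, port: int, cafile: Optional[str] = None,
                 timeout: float = 3.0) -> str:
    """Classify what happens when a client attempts a verified TLS handshake.

    Returns one of (hypothetical labels):
      "unreachable"       - the TCP connection itself failed
      "no-tls"            - TCP connected, but the TLS handshake failed
      "cert-not-verified" - TLS is served, but the certificate was rejected
      "ok"                - TLS is served and the certificate verified
    """
    # With cafile set, only that file is trusted (e.g. a self-signed server.crt);
    # without it, the system's default CA store is used.
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        try:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
        except ssl.SSLCertVerificationError:
            # Untrusted (e.g. self-signed) certificate or hostname mismatch.
            return "cert-not-verified"
        except (ssl.SSLError, OSError):
            return "no-tls"


# Example call (placeholders taken from the script above):
# diagnose_tls("REDACTED", 53800, cafile="server.crt")
```

A "cert-not-verified" result without `cafile` that changes once the server's certificate file is passed would suggest the server side is fine and the client simply does not trust the self-signed certificate.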
@JeroenVranken JeroenVranken added the type / question Issue type: question label Jun 19, 2024
@SGevorg
Member

SGevorg commented Jun 21, 2024

@JeroenVranken thanks for the issue. This could be related to the auth token changes we added recently. @mihran113 @alberttorosyan what do you think?

@schauaib

schauaib commented Aug 9, 2024

Any update on this? I think I ran into a similar issue in #3206.

@schauaib

schauaib commented Aug 9, 2024

This error occurs with version 3.20.1, but everything works fine when I revert to version 3.17.4 of Aim.

@erikdao

erikdao commented Aug 27, 2024

This error occurs with version 3.20.1, but everything works fine when I revert to the 3.17.4 version of AIM

Have you tried the latest version, 3.23.0? I seem to be dealing with the same issue.

@merryHunter

@erikdao did you manage to resolve it? In general, it seems the Aim docs could be improved for production deployments. A docker-compose file is missing even from the repo, and a few Docker/SSL-related issues have been open for a long time.

@erikdao

erikdao commented Sep 22, 2024

@erikdao did you manage to resolve it? It seems in general Aim docs could be improved for making it into production. The docker-compose file is missing even in the repo, and there are a few docker / ssl related issues open for a long time..

It turned out that my problem was different. I didn't enable SSL when deploying Aim Server. My problem was related to networking on GCP.

@mihran113 mihran113 self-assigned this Sep 23, 2024
@mihran113 mihran113 added type / enhancement Issue type: new feature or request phase / review-needed Issue phase: issues that are done and needs review area / SDK-storage Issue area: SDK and storage related issues and removed type / question Issue type: question labels Sep 23, 2024
@mihran113
Contributor

mihran113 commented Sep 23, 2024

Hey folks! Sorry for the late response.

I've opened a PR which will add support for self-signed SSL certificates.
The problem here was that, by default, the requests package doesn't trust self-signed certificates and needs a custom cert file path to verify against. This defeats our protocol probe logic (choosing between http and https for the client), which then falls back to using http, resulting in the errors shared above.

This addition will let you specify the cert file path via an environment variable, so that the client flow works as expected.

The changes will be included in the upcoming 3.25.0 release.
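
To illustrate the failure mode described above, here is a minimal sketch of a probe that prefers https and silently falls back to http. It is not Aim's actual client code: `probe_scheme` is a hypothetical name, and it uses the standard library's `urllib` rather than requests so that it runs without extra dependencies.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def probe_scheme(host: str, port: int, cafile: Optional[str] = None,
                 timeout: float = 3.0) -> str:
    """Return "https" if a verified HTTPS request gets any HTTP response, else "http".

    cafile is an optional PEM file to trust, e.g. a self-signed server certificate.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # An empty ProxyHandler bypasses environment proxies so the probe
    # talks to the server directly.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPSHandler(context=ctx),
    )
    try:
        opener.open(f"https://{host}:{port}/", timeout=timeout).close()
        return "https"
    except urllib.error.HTTPError:
        # The server answered over TLS, just with an error status code.
        return "https"
    except OSError:
        # Covers URLError (including certificate verification failures),
        # ssl.SSLError, timeouts and refused connections.
        return "http"
```

In this sketch, a self-signed certificate with no `cafile` fails verification and lands in the last branch, so the caller ends up speaking plain http to a TLS-only server. That is consistent with the `RemoteDisconnected` error reported in the traceback.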

@merryHunter

@mihran113 amazing, thank you very much!!

@mihran113 mihran113 added phase / ready-to-go Issue phase: issues that are merged and will be included in the upcoming release phase / shipped Issue phase: shipped and removed phase / review-needed Issue phase: issues that are done and needs review phase / ready-to-go Issue phase: issues that are merged and will be included in the upcoming release labels Sep 26, 2024
@mihran113
Contributor

mihran113 commented Oct 2, 2024

Hey folks, the changes for this have been published with the newest release of Aim, v3.25.0.
Please check out the new section in the documentation for client-side setup (it works the same way as with versions 3.17.5 and older):
https://aimstack.readthedocs.io/en/latest/using/remote_tracking.html#ssl-support

Let me know if everything works as expected or not 🙌
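
For the self-signed setup from this thread, a client-side sketch might look like the following. The environment variable name `__AIM_CLIENT_SSL_CERTIFICATES_FILE__` is an assumption based on the SSL section of the linked documentation, so please confirm it there for your Aim version. `configure_client_cert` is a hypothetical helper, not part of Aim.

```python
import os
from pathlib import Path
from typing import MutableMapping

# ASSUMPTION: variable name as read from the linked SSL docs; verify before relying on it.
AIM_CERT_ENV_VAR = "__AIM_CLIENT_SSL_CERTIFICATES_FILE__"


def configure_client_cert(cert_path: str,
                          env: MutableMapping[str, str] = os.environ) -> str:
    """Point the client at a PEM certificate file via the environment.

    Fails early if the file is missing, instead of surfacing later as a
    generic connection error. Returns the absolute path that was set.
    """
    path = Path(cert_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"certificate file not found: {path}")
    env[AIM_CERT_ENV_VAR] = str(path)
    return str(path)


# Intended usage, with the variable set before the Run is created
# (paths and repo URL are the placeholders used earlier in this thread):
#
#   configure_client_cert("~/secrets/server.crt")
#   from aim import Run
#   run = Run(repo="aim://REDACTED:53800", experiment="SSL-server")
```

Exporting the variable in the shell before starting the script should work equally well. The helper only adds an early check that the file exists.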
