Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Activate task becomes very slow when you have a lot of deployed items #120

Open
vitch opened this issue Jul 4, 2022 · 6 comments · May be fixed by #121
Open

Activate task becomes very slow when you have a lot of deployed items #120

vitch opened this issue Jul 4, 2022 · 6 comments · May be fixed by #121

Comments

@vitch
Copy link
Contributor

vitch commented Jul 4, 2022

We've been happily using ember-cli-deploy and this plugin for over 6 years - thanks!

As a consequence of this, we've built up a large history of previously deployed versions of our apps (which are available via special URLs for testing and other purposes).

I noticed that our activation stage is taking a long time (we have a bunch of apps across a bunch of environments with a bunch of historical versions) and digging into it I found that a lot of this is due to this plugin needing to list the revisions available on S3 for various reasons.

In #119 I optimised some of the easy cases where it was possible to avoid some of this communication with S3 but we're still left with fetchInitialRevisions and fetchRevisions.

I was trying to figure out why these functions exist/ are called and whether we could optimise them in some way.

I found some relevant discussions:

From these, I wasn't too clear exactly how important these hooks might be. If they impose a high cost on some users (those with full buckets on S3) then could we change their behaviour or somehow speed them up?

It seems like the prime use-case for fetchInitialRevisions is to generate a changelog and indeed we have code which adds an entry to an audit log with the previously active revision and the new one. If other plugins only need to know the previously active revision then we can optimise by at least bailing out of the listObjectRecursively loop once we've found the active revision.

I'm interested in the appetite here for changes around this. If you think this is a problem that most people don't suffer from then we can fork the plugin and change the behaviour for our purposes. But if you think it's something that this plugin should handle then I'd love to talk about possible approaches and exactly what we need from fetchInitialRevisions and fetchRevisions

@vitch vitch mentioned this issue Jul 4, 2022
3 tasks
@lukemelia
Copy link
Contributor

@vitch I understand the issue and it seems from our discussions at the time that we had some concerns about this eventuality but decided the benefits outweighed the costs. The contract of the hooks is that they will fetch all the revisions so I am skeptical that general solutions that don't do this would be guaranteed to be backward compatible user's deployment configurations.

As a low-tech solution, have you considered deleting or archiving your 59,000 oldest revisions?

@vitch
Copy link
Contributor Author

vitch commented Jul 4, 2022

Yeah - I meant to mention that we'd considered that. The same folders are used to power our special historical URLs and I wanted to avoid adding special casing there...

I agree that it would be hard to do this in a backward compatible way which is why I split it out from #119.

I guess one other option would be to have a config flag to allow people to opt in to a new behaviour (where fetchRevisions returned nothing and fetchInitialRevisions returned only the currently active revision) if they were running into performance issues.

I'm essentially going to build that out anyway and test it in our environment to see if there are any unforeseen consequences. I can put up a PR for it and we can discuss there - if you don't think it's a good match for this plugin then I'm happy to maintain a forked version for our needs...

If you have any concrete examples of fetchRevisions or fetchInitialRevisions which need access to more than I describe above then let me know and it might help me to come up with a flexible solution...

@lukemelia
Copy link
Contributor

I don't see any harm in having a config flag for this plugin that allows you to opt into the behavior you're describing. Then you wouldn't have the hassle of maintaining a fork.

@vitch
Copy link
Contributor Author

vitch commented Jul 4, 2022

Cool. Any preferences on a name? optimisedFetchRevisions? fetchOnlyRelevantRevisions? speedOverAccuracy?

(naming things is hard - those are more in the vein of brainstorming than actual suggestions)

@vitch
Copy link
Contributor Author

vitch commented Jul 4, 2022

In fact, it looks like fetchRevisions is used by deploy:list as well - I guess we don't want to break it there...

Looking at the hooks documentation and re-reading the threads above, I'm still not too clear on why fetchRevisions is called after activate. Can you clarify the use-case for me?

(it looks like I can get another ~20% speed up by changing fetchInitialRevisions to only loop over s3 until it finds the active revision but that percentage will change depending on where the active revision lies and I will be able to get a much bigger speedup by noop-ing the fetchRevisions call but I can't figure out how to do that safely)

@vitch
Copy link
Contributor Author

vitch commented Jul 4, 2022

Another possible approach is to cache the result of fetchInitialRevisions and manually update it post-activation rather than looping over s3 again in fetchRevisions...

That (in combination with #119) means we're loading the list of all deployed versions once instead of three times when deploy:activate is called... I'd prefer an option for zero times but once would definitely be a step in the right direction ;)

vitch pushed a commit to CrowdStrike/ember-cli-deploy-s3-index that referenced this issue Jul 4, 2022
There are cases where we might want to page through the list of objects available
on S3 but we don't need to keep going once we've found what we are looking for.
This commit updates `listAllObjects` to accept an `until` argument - if this
is provided then when a new page of results is loaded we check if `some` of them
match the `until` - if so we don't bother loading another page.

A concrete use-case for this is in ember-cli-deploy#120 - sometimes we just want to
find the currently active revision so we only need to loop through the "pages"
on S3 until we do.
vitch pushed a commit to CrowdStrike/ember-cli-deploy-s3-index that referenced this issue Jul 4, 2022
If you set this option in your S3 config then we will only load enough revisions from
S3 to find the currently active version and then we will stop.
The assumption here is that the main use-case for `fetchInitialRevisions` is to be
able to do some sort of changelog or audit log from another plugin where we would
want to know the details of the previously active revision.
Since looping over revisions from S3 can be slow when you have too many of them
stored (see ember-cli-deploy#120) we provide an option to short circuit that.
vitch pushed a commit to CrowdStrike/ember-cli-deploy-s3-index that referenced this issue Jul 4, 2022
If you pass this flag then `fetchRevisions` doesn't attempt to fetch _any_ revisions
from S3. This is useful because fetching the revisions from S3 can take a long time
(see ember-cli-deploy#120) and if you don't have any reason to use them then this time is wasted.

In our use case I have hooked this flag up to an environment variable. e.g. - in `deploy.js`:

```
disableFetchRevisions: process.env.DISABLE_FETCH_REVISIONS
```

Then we can set that environment variable when we are calling the `activate`
command and leave it off at any other time (so e.g. `ember deploy:list` works as
expected - if you have the time to wait)
@vitch vitch linked a pull request Jul 4, 2022 that will close this issue
3 tasks
vitch pushed a commit to CrowdStrike/ember-cli-deploy-s3-index that referenced this issue Jul 4, 2022
If you set this option in your S3 config then we will only load enough revisions from
S3 to find the currently active version and then we will stop.
The assumption here is that the main use-case for `fetchInitialRevisions` is to be
able to do some sort of changelog or audit log from another plugin where we would
want to know the details of the previously active revision.
Since looping over revisions from S3 can be slow when you have too many of them
stored (see ember-cli-deploy#120) we provide an option to short circuit that.
vitch pushed a commit to CrowdStrike/ember-cli-deploy-s3-index that referenced this issue Jul 4, 2022
If you pass this flag then `fetchRevisions` doesn't attempt to fetch _any_ revisions
from S3. This is useful because fetching the revisions from S3 can take a long time
(see ember-cli-deploy#120) and if you don't have any reason to use them then this time is wasted.

In our use case I have hooked this flag up to an environment variable. e.g. - in `deploy.js`:

```
disableFetchRevisions: process.env.DISABLE_FETCH_REVISIONS
```

Then we can set that environment variable when we are calling the `activate`
command and leave it off at any other time (so e.g. `ember deploy:list` works as
expected - if you have the time to wait)
vitch pushed a commit to CrowdStrike/ember-cli-deploy-s3-index that referenced this issue Jul 8, 2022
There are cases where we might want to page through the list of objects available
on S3 but we don't need to keep going once we've found what we are looking for.
This commit updates `listAllObjects` to accept an `until` argument - if this
is provided then when a new page of results is loaded we check if `some` of them
match the `until` - if so we don't bother loading another page.

A concrete use-case for this is in ember-cli-deploy#120 - sometimes we just want to
find the currently active revision so we only need to loop through the "pages"
on S3 until we do.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants