Porter: "Installation not found" when preceding pipeline step fails #4281

Open
jonnyry opened this issue Jan 21, 2025 · 13 comments
Labels
bug Something isn't working

Comments

@jonnyry
Collaborator

jonnyry commented Jan 21, 2025

If a resource install fails in a pipeline step before Porter has had a chance to run, then attempting to Update the resource to fix the issue makes Porter fail with an "Installation not found" error, because the install never created any state.

E.g. the first step of this pipeline failed.

[Image: pipeline view with the first step failed]

When I attempted to recover by updating the resource (which, if I understand correctly, calls porter upgrade), I got the following error:

1ac69287-2099-40aa-a807-3620e5cadeda: Porter action failed with error = Error message: could not find installation /1ac69287-2099-40aa-a807-3620e5cadeda: Installation not found could not find installation /1ac69287-2099-40aa-a807-3620e5cadeda: Installation not found ; Command executed: porter upgrade "1ac69287-2099-40aa-a807-3620e5cadeda" --reference mytre.azurecr.io/tre-service-databricks:v1.0.10 --param address_space="10.1.6.0/24" --param arm_environment="public" --param arm_use_msi="true" --param id="1ac69287-2099-40aa-a807-3620e5cadeda" --param is_exposed_externally="False" --param tfstate_container_name="tfstate" --param tfstate_resource_group_name="rg-mytre-mgmt" --param tfstate_storage_account_name="mytremgmtstore" --param tre_id="mytre" --param workspace_id="e01fa8c3-83c7-4a13-9710-0ad3082e9523" --force --credential-set arm_auth --credential-set aad_auth Installation not found

As expected there was no state in the cosmos mongo DB, as porter install never ran in the first place.

jonnyry added the bug label Jan 21, 2025
@jonnyry
Collaborator Author

jonnyry commented Jan 21, 2025

(Manual workaround: update the resource's deploymentStatus in cosmos to deleted and deploy a new resource)

@marrobi
Member

marrobi commented Jan 21, 2025

Yes, I've started seeing this, I am sure porter used to create a new installation. Wonder if linked to a porter upgrade.

Suggest we try catch it and install instead?
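A minimal sketch of the "try, catch, install instead" idea, assuming the caller can run the Porter CLI through a function that returns the exit code and the combined output. The function name `upgrade_or_install` and the `run` callable are hypothetical, not existing RP code; the marker string is taken from the error quoted in this issue. It also assumes that the arguments passed to upgrade are acceptable to install, which would need checking.

```python
from typing import Callable, List, Tuple

# Text taken from the error message quoted in this issue.
NOT_FOUND_MARKER = "Installation not found"

# A runner takes the full argument list and returns (exit_code, combined_output).
Runner = Callable[[List[str]], Tuple[int, str]]


def upgrade_or_install(
    installation_id: str, extra_args: List[str], run: Runner
) -> Tuple[str, int, str]:
    """Try `porter upgrade`; if Porter reports no installation, run `porter install`.

    Returns the action that produced the final result, its exit code and output.
    """
    code, output = run(["porter", "upgrade", installation_id, *extra_args])
    if code != 0 and NOT_FOUND_MARKER in output:
        # No state exists for this installation, so fall back to a fresh install.
        code, output = run(["porter", "install", installation_id, *extra_args])
        return "install", code, output
    return "upgrade", code, output
```

Matching on the error text is brittle if Porter changes its wording, which is one argument for checking for the installation up front instead.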

@jonnyry
Collaborator Author

jonnyry commented Jan 21, 2025

Yep could do, or could check for the presence of an install with porter installations first.
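The check-first alternative could look roughly like this. It is a sketch under assumptions: `resolve_action`, `installation_exists` and `run_porter` are hypothetical names, and it assumes `porter installations show <name>` exits with a non-zero code when the installation does not exist.

```python
import subprocess
from typing import Callable, List


def run_porter(args: List[str]) -> int:
    """Run the porter CLI with the given arguments and return its exit code."""
    result = subprocess.run(["porter", *args], capture_output=True, text=True)
    return result.returncode


def installation_exists(
    installation_id: str, run: Callable[[List[str]], int] = run_porter
) -> bool:
    # Assumption: `porter installations show` exits non-zero for an unknown installation.
    return run(["installations", "show", installation_id]) == 0


def resolve_action(
    requested: str, installation_id: str, run: Callable[[List[str]], int] = run_porter
) -> str:
    """Swap an upgrade for an install when Porter has no record of the installation."""
    if requested == "upgrade" and not installation_exists(installation_id, run):
        return "install"
    return requested
```

Passing the runner in as a parameter keeps the decision logic testable without a Porter binary on the machine.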

@jonnyry
Collaborator Author

jonnyry commented Jan 28, 2025

Seeing this more frequently when trying to update after a resource installation fails during the terraform phase. I'm sure it used to be possible to run an update after a broken installation.

Noticed Porter has recently introduced a force-upgrade option; this might make it possible to recover: getporter/porter@6c56f35

@marrobi
Member

marrobi commented Jan 29, 2025

Agree, I am sure it used to work. Don't think it's that PR though, as the error "could not find installation" is issued and the code returns before opts.ForceUpgrade is evaluated.

	if err != nil {
		return span.Errorf("could not find installation %s/%s: %w", opts.Namespace, opts.Name, err)
	}

@marrobi
Member

marrobi commented Jan 29, 2025

Ideally the upgrade command at https://github.com/getporter/porter/blob/fe65874aa3a17d647f47a3163431257d7162911b/cmd/porter/installations.go#L297 would have an --install-if-not-exists flag, so that when it fails with a "could not find installation" error it calls install instead.

Alternatively we need to check for the installation first on the RP and, if it doesn't exist, install.

This is one of the times I think it would be easier without porter!

@jonnyry
Collaborator Author

jonnyry commented Jan 29, 2025

> Agree, I am sure it used to work. Don't think it's that PR though, as the error "could not find installation" is issued and the code returns before opts.ForceUpgrade is evaluated.
>
> 	if err != nil {
> 		return span.Errorf("could not find installation %s/%s: %w", opts.Namespace, opts.Name, err)
> 	}

Ah I think it's this change introduced in Porter 1.2.0:

Upgrade should not be allowed if installation is not installed #3213

The RP was using Porter 1.1.1 until recently, hence why we're only seeing this now.

@jonnyry
Collaborator Author

jonnyry commented Jan 29, 2025

@marrobi Suggest we just pass the --force-upgrade flag to make it behave like it did previously?

RP was running 1.1.1, now running 1.2.1

Added in 1.2.0:

	if !i.IsInstalled() {
		return span.Errorf("The installation cannot be upgraded, because it is not installed. Verify the installation name and namespace, and if correct, use porter install.")
	}

Amended in 1.2.1:

	if !i.IsInstalled() && !opts.ForceUpgrade {
		return span.Errorf("The installation cannot be upgraded, because it is not installed. Verify the installation name and namespace, and if correct, use porter install.")
	}
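To keep the two checks apart, here is a simplified Python model of the ordering described in this thread: the lookup runs first, and the "not installed" guard (with the 1.2.1 force-upgrade escape hatch) runs only after it. This is an illustration, not Porter's code; `Installation`, `upgrade_check` and the dict-based store are made up for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Installation:
    installed: bool  # stands in for Porter's i.IsInstalled()


def upgrade_check(
    store: Dict[str, Installation], name: str, force_upgrade: bool = False
) -> Optional[str]:
    """Return the error the upgrade would report, or None if it may proceed."""
    inst = store.get(name)
    if inst is None:
        # The lookup fails first, so force_upgrade is never consulted here.
        return "could not find installation"
    if not inst.installed and not force_upgrade:
        # Guard added in 1.2.0 and relaxed by the force flag in 1.2.1.
        return "cannot be upgraded, because it is not installed"
    return None
```

In this model the force flag only helps once a record exists: an empty store still yields "could not find installation" even with `force_upgrade=True`.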

@marrobi
Member

marrobi commented Jan 29, 2025

That error is "cannot be upgraded" rather than "could not find installation". Not sure the code gets that far before failing.

@jonnyry
Collaborator Author

jonnyry commented Jan 29, 2025

Oh dear yes you're right, thought that was too easy. I'll take another look

@jonnyry
Collaborator Author

jonnyry commented Jan 29, 2025

Ah so... I am also seeing the other error (this occurred earlier today: a deployment failed on an Azure SQL installation, then I tried to Update to recover it).

I think I may have conflated the two, hence my confusion above.

Error message: The installation cannot be upgraded, because it is not installed. Verify the installation name and namespace, and if correct, use porter install. The installation cannot be upgraded, because it is not installed. Verify the installation name and namespace, and if correct, use porter install. ; Command executed: porter upgrade "7ba7d3d6-86db-4dc7-9cf2-c68fddeaa6e0" --reference mytreacr.azurecr.io/tre-workspace-service-azuresql:v1.0.0 --param arm_environment="public" --param arm_use_msi="true" --param azuresql_identity="/subscriptions/XXXXXX/resourcegroups/rg-mytre-mgmt/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-azuresql-mytre" --param id="xxxxxx" --param sql_sku="S2 | 50 DTUs" --param storage_gb="5" --param tfstate_container_name="tfstate" --param tfstate_resource_group_name="rg-mytre-mgmt" --param tfstate_storage_account_name="mytremgmtstore" --param tre_id="mytree" --param workspace_id="6ded63ed-81c7-4758-bc63-61b0b7c86967" --force --credential-set arm_auth --credential-set aad_auth...

@jonnyry
Collaborator Author

jonnyry commented Jan 29, 2025

Created a separate issue for "The installation cannot be upgraded, because it is not installed" #4291

@marrobi
Member

marrobi commented Jan 29, 2025

Ah, yes, it's this one that's caused the issue: getporter/porter#3213

Ideally an upstream change to do #4281 (comment) would help, i.e. install if the installation does not exist, and we add the force-upgrade flag to resolve the other one.

Or we can check for the installation in the RP code and try to install if it does not exist.
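As a rough sketch of how the RP could tell the two failures apart and pick the matching remedy, the helper below maps the two error strings quoted in this thread to an action. The function name and the returned labels are hypothetical, and matching on message text assumes Porter keeps this wording.

```python
from typing import Optional

# Both strings are copied from the error messages quoted in this issue.
NOT_FOUND = "Installation not found"
NOT_INSTALLED = "cannot be upgraded, because it is not installed"


def remedy_for(error_text: str) -> Optional[str]:
    """Suggest a recovery step for a failed `porter upgrade`, or None if unknown."""
    if NOT_INSTALLED in error_text:
        # A record exists but the install never completed: retry with --force-upgrade.
        return "retry-with-force-upgrade"
    if NOT_FOUND in error_text:
        # Porter has no record at all: run install instead of upgrade.
        return "install"
    return None
```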
