Save job_id of bq operator in commandStatus #1808

Open
wants to merge 3 commits into
base: master

@hnarimiya hnarimiya commented May 17, 2023

issue

When the bq operator retries, it reuses the same job_id, so the retry does not work correctly.
Even if the failure is a temporary BigQuery problem, retrying does not resolve it.
For example, in the case below:

{
  "message" : "Error encountered during execution. Retrying may solve the problem.",
  "reason" : "backendError"
}

https://cloud.google.com/bigquery/docs/error-messages

resolve

So I saved the job_id under commandStatus.
This causes BaseOperator to remove it on each run, so each retry uses a new job_id.
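
A minimal sketch of the idea, not the actual change in this PR. It assumes the operator state can be modeled as a plain map with a nested commandStatus entry. The class `JobIdStateSketch` and the methods `jobId` and `prepareRetry` are hypothetical names for illustration and are not digdag's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model of operator state; not digdag's real classes.
public class JobIdStateSketch {
    static final String COMMAND_STATUS = "commandStatus"; // key name taken from the PR text
    static final String JOB_ID = "job_id";

    // Returns the job_id stored under commandStatus, generating one if absent.
    @SuppressWarnings("unchecked")
    static String jobId(Map<String, Object> state) {
        Map<String, Object> commandStatus = (Map<String, Object>)
                state.computeIfAbsent(COMMAND_STATUS, k -> new HashMap<String, Object>());
        return (String) commandStatus.computeIfAbsent(JOB_ID, k -> "job_" + UUID.randomUUID());
    }

    // Assumption: models the described behavior where the stored status
    // is dropped before the next attempt.
    static void prepareRetry(Map<String, Object> state) {
        state.remove(COMMAND_STATUS);
    }

    public static void main(String[] args) {
        Map<String, Object> state = new HashMap<>();
        String first = jobId(state);
        String polled = jobId(state);   // same attempt: the stored id is reused
        prepareRetry(state);
        String retried = jobId(state);  // next attempt: a fresh id is generated
        System.out.println("reused while polling: " + first.equals(polled));
        System.out.println("reused after retry:   " + first.equals(retried));
    }
}
```

Within one attempt the id stays stable, so status polling keeps pointing at the same job. Once the stored status is removed, the next attempt generates a different id, which matches the differing job ids in the "after fix" log.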

log

before fix

2023-05-17 19:48:54 +0900 [INFO] (0082@[0:new-project:1:1]+new+retry): Submitting BigQuery job: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:54 +0900 [INFO] (0082@[0:new-project:1:1]+new+retry): Checking BigQuery job status: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:55 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:55 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): {
  "location" : "query",
  "message" : "Unrecognized name: HOGE at [1:8]",
  "reason" : "invalidQuery"
}
2023-05-17 19:48:55 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): Task failed, retrying
io.digdag.spi.TaskExecutionException: io.digdag.spi.TaskExecutionException: BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
...
2023-05-17 19:48:55 +0900 [INFO] (0082@[0:new-project:1:1]+new+retry): Submitting BigQuery job: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:55 +0900 [INFO] (0082@[0:new-project:1:1]+new+retry): Checking BigQuery job status: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:55 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:55 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): {
  "location" : "query",
  "message" : "Unrecognized name: HOGE at [1:8]",
  "reason" : "invalidQuery"
}
2023-05-17 19:48:55 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): Task failed, retrying
io.digdag.spi.TaskExecutionException: io.digdag.spi.TaskExecutionException: BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
...
2023-05-17 19:48:55 +0900 [INFO] (0082@[0:new-project:1:1]+new+retry): Submitting BigQuery job: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:55 +0900 [INFO] (0082@[0:new-project:1:1]+new+retry): Checking BigQuery job status: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:56 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:56 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): {
  "location" : "query",
  "message" : "Unrecognized name: HOGE at [1:8]",
  "reason" : "invalidQuery"
}
2023-05-17 19:48:56 +0900 [ERROR] (0082@[0:new-project:1:1]+new+retry): Task +new+retry failed.
BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_a54c0454-d88f-4f83-bea4-fb01049f7302
2023-05-17 19:48:56 +0900 [INFO] (0082@[0:new-project:1:1]+new^failure-alert): type: notify

after fix

2023-05-17 19:58:46 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Submitting BigQuery job: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_e3bf09df-e9c8-4d96-b6d7-ea2c47501806
2023-05-17 19:58:46 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Checking BigQuery job status: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_e3bf09df-e9c8-4d96-b6d7-ea2c47501806
2023-05-17 19:58:46 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_e3bf09df-e9c8-4d96-b6d7-ea2c47501806
2023-05-17 19:58:46 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): {
  "location" : "query",
  "message" : "Unrecognized name: HOGE at [1:8]",
  "reason" : "invalidQuery"
}
2023-05-17 19:58:46 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): Task failed, retrying
io.digdag.spi.TaskExecutionException: io.digdag.spi.TaskExecutionException: BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_e3bf09df-e9c8-4d96-b6d7-ea2c47501806
...
2023-05-17 19:58:47 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Submitting BigQuery job: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_757ec40d-986e-4a7e-8d39-006d4d432f7e
2023-05-17 19:58:47 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Checking BigQuery job status: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_757ec40d-986e-4a7e-8d39-006d4d432f7e
2023-05-17 19:58:47 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_757ec40d-986e-4a7e-8d39-006d4d432f7e
2023-05-17 19:58:47 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): {
  "location" : "query",
  "message" : "Unrecognized name: HOGE at [1:8]",
  "reason" : "invalidQuery"
}
2023-05-17 19:58:47 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): Task failed, retrying
io.digdag.spi.TaskExecutionException: io.digdag.spi.TaskExecutionException: BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_757ec40d-986e-4a7e-8d39-006d4d432f7e
...
2023-05-17 19:58:47 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Submitting BigQuery job: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_2c601416-663d-4df8-a42b-bcc98ddaa9a7
2023-05-17 19:58:48 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Checking BigQuery job status: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_2c601416-663d-4df8-a42b-bcc98ddaa9a7
2023-05-17 19:58:48 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_2c601416-663d-4df8-a42b-bcc98ddaa9a7
2023-05-17 19:58:48 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): {
  "location" : "query",
  "message" : "Unrecognized name: HOGE at [1:8]",
  "reason" : "invalidQuery"
}
2023-05-17 19:58:48 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): Task failed, retrying
io.digdag.spi.TaskExecutionException: io.digdag.spi.TaskExecutionException: BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_2c601416-663d-4df8-a42b-bcc98ddaa9a7
...
2023-05-17 19:58:48 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Submitting BigQuery job: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_ad9cadbf-8f69-4638-97de-bf751f350a3b
2023-05-17 19:58:48 +0900 [INFO] (0079@[0:new-project:1:1]+new+retry): Checking BigQuery job status: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_ad9cadbf-8f69-4638-97de-bf751f350a3b
2023-05-17 19:58:49 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_ad9cadbf-8f69-4638-97de-bf751f350a3b
2023-05-17 19:58:49 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): {
  "location" : "query",
  "message" : "Unrecognized name: HOGE at [1:8]",
  "reason" : "invalidQuery"
}
2023-05-17 19:58:49 +0900 [ERROR] (0079@[0:new-project:1:1]+new+retry): Task +new+retry failed.
BigQuery job failed: xxx:digdag_s0_p_new-project_w_new_t_2_a_1_ad9cadbf-8f69-4638-97de-bf751f350a3b

@hnarimiya
Contributor Author

@yoyama @szyn
Could you please give me a review?

@yoyama yoyama requested a review from szyn September 15, 2023 07:17