DENG-4298 Added managed backfill issues to docs #5909

Open
wants to merge 3 commits into main

@wwyc (Contributor) commented Jul 11, 2024

https://mozilla-hub.atlassian.net/browse/DENG-4298

We have noticed users overwriting existing backfill entries and deleting tables in the staging dataset (`backfill_staging_derived`). This PR adds notes to the docs to raise awareness. Ideally, CI checks would also be implemented to enforce this.

Checklist for reviewer:

  • Commits should reference a bug or GitHub issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
  • If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and provide the <username>:<branch> of the fork as parameter. The parameter will also show up in the logs of the manual-trigger-required-for-fork CI task together with more detailed instructions.
  • If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
  • When adding a new derived dataset, ensure that data is not available already (fully or partially) and recommend extending an existing dataset in favor of creating new ones. Data can be available in the bigquery-etl repository, looker-hub or in looker-spoke-default.

For modifications to schemas in restricted namespaces (see CODEOWNERS):

Issue is synchronized with this Jira Task

@wwyc requested a review from @ANich on July 11, 2024 23:49

@ANich (Contributor) left a comment


LGTM, left some thoughts

docs/cookbooks/creating_a_derived_dataset.md (outdated, resolved)
@@ -284,6 +285,7 @@ For our example:

3. You will be notified when swapping is complete.

**Note**. Please announce in the #data-platform-infra-wg Slack channel before deleting any tables in the `backfill_staging_derived` dataset since it may cause issues in the workflow.
Contributor

I'm not sure what the best approach is here. Restricting delete access would be ideal. This could also be something for the person on triage to handle.

Contributor Author

@wwyc commented Jul 12, 2024

Yeah, I'm asking @whd about options to restrict delete access to `backfill_staging_derived`.

If the person on triage handles this, wouldn't that mean everyone on the triage rotation list needs permission to delete tables there?

Contributor

Yeah, maybe this just becomes a known fix for failed backfills during triage, depending on why the backfill failed. Some of this might make it into the DAG docs.

@@ -267,6 +267,7 @@ For our example:
```bash
bqetl backfill create <project>.<dataset>.<table> --start_date=<YYYY-MM-DD> --end_date=<YYYY-MM-DD>
```
**Note** Do not overwrite existing backfill entries since it will cause issues in the workflow.
Contributor

Suggested change
**Note** Do not overwrite existing backfill entries since it will cause issues in the workflow.
**Note** Do not overwrite existing backfill entries since it will cause issues in the workflow (e.g. duplicate processing).

Labels: None yet
Projects: None yet
2 participants