Add Bigeye checks for mozilla_org_derived datasets #6473

Open
wants to merge 1 commit into
base: main

@alekhyamoz (Contributor) commented Nov 12, 2024

Description

Related Tickets & Documents

  • DENG-XXXX
  • DSRE-XXXX

Reviewer, please follow this checklist

┆Issue is synchronized with this Jira Task

@dataops-ci-bot

Integration report for "Add Bigeye checks for mozilla_org_derived datasets"

sql.diff

Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blogs_goals_v2: bigconfig.yml
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/downloads_with_attribution_v2: bigconfig.yml
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_clients_v2: bigconfig.yml
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_sessions_v2: bigconfig.yml
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/www_site_hits_v2: bigconfig.yml
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blogs_goals_v2/bigconfig.yml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blogs_goals_v2/bigconfig.yml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blogs_goals_v2/bigconfig.yml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blogs_goals_v2/bigconfig.yml	2024-11-12 22:12:29.000000000 +0000
@@ -0,0 +1,71 @@
+type: BIGCONFIG_FILE
+
+row_creation_times:
+  column_selectors:
+  - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.blogs_goals_v2.date
+
+saved_metric_definitions:
+  metrics:
+  - saved_metric_id: COUNT_DUPLICATES
+    metric_type:
+      type: PREDEFINED
+      predefined_metric: COUNT_DUPLICATES
+    metric_name: Duplicates (#)
+    group_by:
+    - date
+    threshold:
+      type: CONSTANT
+      upper_bound: 0.0
+      lower_bound: 0.0
+    lookback:
+      lookback_window:
+        interval_type: DAYS
+        interval_value: -1
+      lookback_type: METRIC_TIME
+      bucket_size: DAY
+    rct_overrides:
+    - date
+    metric_schedule:
+      named_schedule:
+        name: Default Schedule - 13:00 UTC
+  - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
+    metric_type:
+      type: TEMPLATE
+      template_id: 947
+      aggregation_type: COUNT
+      template_name: visit_identifier_regex_check
+    metric_name: COUNT of visit_identifier_regex_check
+    group_by:
+    - date
+    threshold:
+      type: CONSTANT
+      upper_bound: 0.0
+      lower_bound: 0.0
+    parameters:
+    - key: column_name
+      string_value: visit_identifier
+    lookback:
+      lookback_window:
+        interval_type: DAYS
+        interval_value: -1
+      lookback_type: METRIC_TIME
+      bucket_size: DAY
+    rct_overrides:
+    - date
+    metric_schedule:
+      named_schedule:
+        name: Default Schedule - 13:00 UTC
+
+tag_deployments:
+- collection:
+    name: Google Analytics
+    description: All checks related to GA tables
+  deployments:
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.blogs_goals_v2.visit_identifier
+    metrics:
+    - saved_metric_id: COUNT_DUPLICATES
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.blogs_goals_v2
+    metrics:
+    - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/downloads_with_attribution_v2/bigconfig.yml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/downloads_with_attribution_v2/bigconfig.yml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/downloads_with_attribution_v2/bigconfig.yml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/downloads_with_attribution_v2/bigconfig.yml	2024-11-12 22:12:29.000000000 +0000
@@ -0,0 +1,38 @@
+type: BIGCONFIG_FILE
+
+row_creation_times:
+  column_selectors:
+  - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.downloads_with_attribution_v2.download_date
+
+saved_metric_definitions:
+  metrics:
+  - saved_metric_id: COUNT_ROWS
+    metric_type:
+      type: PREDEFINED
+      predefined_metric: COUNT_ROWS
+    metric_name: Row count (#)
+    group_by:
+    - download_date
+    threshold:
+      type: CONSTANT
+      lower_bound: 50000.0
+    lookback:
+      lookback_window:
+        interval_type: DAYS
+        interval_value: -1
+      lookback_type: METRIC_TIME
+      bucket_size: DAY
+    rct_overrides:
+    - download_date
+    metric_schedule:
+      named_schedule:
+        name: Default Schedule - 13:00 UTC
+tag_deployments:
+- collection:
+    name: Google Analytics
+    description: All checks related to GA tables
+  deployments:
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.downloads_with_attribution_v2.*
+    metrics:
+    - saved_metric_id: COUNT_ROWS
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_clients_v2/bigconfig.yml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_clients_v2/bigconfig.yml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_clients_v2/bigconfig.yml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_clients_v2/bigconfig.yml	2024-11-12 22:12:29.000000000 +0000
@@ -0,0 +1,61 @@
+type: BIGCONFIG_FILE
+
+row_creation_times:
+  column_selectors:
+  - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_clients_v2.first_seen_date
+
+saved_metric_definitions:
+  metrics:
+  - saved_metric_id: COUNT_DUPLICATES
+    metric_type:
+      type: PREDEFINED
+      predefined_metric: COUNT_DUPLICATES
+    metric_name: Duplicates (#)
+    threshold:
+      type: AUTO
+      sensitivity: MEDIUM
+      upper_bound_only: false
+      lower_bound_only: false
+    rct_overrides:
+    - bigeye-no-rct
+  - saved_metric_id: COUNT_ROWS
+    metric_type:
+      type: PREDEFINED
+      predefined_metric: COUNT_ROWS
+    metric_name: Row count (#)
+    conditions:
+    - "first_seen_date >= '2024-01-01'\n  and first_reported.country IN ('United States',\
+      \ 'Canada')"
+    group_by:
+    - first_seen_date
+    - first_reported.country
+    threshold:
+      type: AUTO
+      sensitivity: MEDIUM
+      upper_bound_only: false
+      lower_bound_only: false
+    lookback:
+      lookback_window:
+        interval_type: DAYS
+        interval_value: -1
+      lookback_type: METRIC_TIME
+      bucket_size: DAY
+    rct_overrides:
+    - first_seen_date
+    metric_schedule:
+      named_schedule:
+        name: Default Schedule - 13:00 UTC
+
+tag_deployments:
+- collection:
+    name: Google Analytics
+    description: All checks related to GA tables
+  deployments:
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_clients_v2.ga_client_id
+    metrics:
+    - saved_metric_id: COUNT_DUPLICATES
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_clients_v2.*
+    metrics:
+    - saved_metric_id: COUNT_ROWS
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_sessions_v2/bigconfig.yml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_sessions_v2/bigconfig.yml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_sessions_v2/bigconfig.yml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/ga_sessions_v2/bigconfig.yml	2024-11-12 22:12:29.000000000 +0000
@@ -0,0 +1,60 @@
+type: BIGCONFIG_FILE
+
+row_creation_times:
+  column_selectors:
+  - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.session_date
+
+saved_metric_definitions:
+  metrics:
+  - saved_metric_id: PERCENT_NULL
+    metric_type:
+      type: PREDEFINED
+      predefined_metric: PERCENT_NULL
+    metric_name: Null (%)
+    threshold:
+      type: CONSTANT
+      upper_bound: 0.0
+      lower_bound: 0.0
+    lookback:
+      lookback_window:
+        interval_type: DAYS
+        interval_value: -1
+      lookback_type: METRIC_TIME
+      bucket_size: DAY
+    rct_overrides:
+    - session_date
+    metric_schedule:
+      named_schedule:
+        name: Default Schedule - 13:00 UTC
+  - saved_metric_id: COUNT_ROWS
+    metric_type:
+      type: PREDEFINED
+      predefined_metric: COUNT_ROWS
+    metric_name: Row count (#)
+    group_by:
+    - ga_session_id
+    - ga_client_id
+    threshold:
+      type: CONSTANT
+      upper_bound: 1.0
+      lower_bound: 0.0
+    rct_overrides:
+    - bigeye-no-rct
+    metric_schedule:
+      named_schedule:
+        name: Default Schedule - 13:00 UTC
+tag_deployments:
+- collection:
+    name: Google Analytics
+    description: All checks related to GA tables
+  deployments:
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.session_date
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.ga_session_id
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.ga_client_id
+    metrics:
+    - saved_metric_id: PERCENT_NULL
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.*
+    metrics:
+    - saved_metric_id: COUNT_ROWS
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/www_site_hits_v2/bigconfig.yml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/www_site_hits_v2/bigconfig.yml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/www_site_hits_v2/bigconfig.yml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/www_site_hits_v2/bigconfig.yml	2024-11-12 22:12:29.000000000 +0000
@@ -0,0 +1,45 @@
+type: BIGCONFIG_FILE
+
+row_creation_times:
+  column_selectors:
+  - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.www_site_hits_v2.date
+
+saved_metric_definitions:
+  metrics:
+  - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
+    metric_type:
+      type: TEMPLATE
+      template_id: 947
+      aggregation_type: COUNT
+      template_name: visit_identifier_regex_check
+    metric_name: COUNT of visit_identifier_regex_check
+    group_by:
+    - date
+    threshold:
+      type: CONSTANT
+      upper_bound: 0.0
+      lower_bound: 0.0
+    parameters:
+    - key: column_name
+      string_value: visit_identifier
+    lookback:
+      lookback_window:
+        interval_type: DAYS
+        interval_value: -1
+      lookback_type: METRIC_TIME
+      bucket_size: DAY
+    rct_overrides:
+    - date
+    metric_schedule:
+      named_schedule:
+        name: Default Schedule - 13:00 UTC
+
+tag_deployments:
+- collection:
+    name: Google Analytics
+    description: All checks related to GA tables
+  deployments:
+  - column_selectors:
+    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.www_site_hits_v2
+    metrics:
+    - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
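
Each bigconfig.yml in the report above sits in a directory named after its table, and its column selectors spell out the dataset and table again. A quick consistency check between the two could look like the sketch below. It is not part of this PR or of the bigquery-etl tooling: the function name `mismatched_selectors` is hypothetical, the sample text is made up for illustration, and the `source.project.dataset.table[.column]` selector layout is an assumption inferred from the selectors shown in the diff.

```python
import re

# Matches list entries such as "- name: a.b.c.d.col". Values containing spaces
# (collection or schedule names) never match because of \S+.
SELECTOR_RE = re.compile(r"^\s*-\s*name:\s*(\S+)\s*$", re.MULTILINE)


def mismatched_selectors(config_text, dataset, table):
    """Return selectors whose dataset/table differ from the expected pair.

    Assumes selectors look like source.project.dataset.table[.column].
    """
    bad = []
    for selector in SELECTOR_RE.findall(config_text):
        parts = selector.split(".")
        if len(parts) < 4:
            # Not a fully qualified selector; skip rather than guess.
            continue
        if (parts[2], parts[3]) != (dataset, table):
            bad.append(selector)
    return bad


# Hypothetical sample: one selector points at a different table.
sample = """\
row_creation_times:
  column_selectors:
  - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.blogs_goals_v2.date
tag_deployments:
- collection:
    name: Google Analytics
  deployments:
  - column_selectors:
    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.www_site_hits_v2.*
"""

print(mismatched_selectors(sample, "mozilla_org_derived", "www_site_hits_v2"))
```

Run against each generated file with the dataset and table taken from its path, an empty list would mean every selector targets the table the file belongs to.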
