Merge remote-tracking branch 'origin/main' into feat/e2e-fabric-dataops-sample-v0-2
promisinganuj committed Feb 5, 2025
2 parents 89ef274 + 55a5f03 commit 227865f
Showing 44 changed files with 1,796 additions and 99 deletions.
@@ -0,0 +1,13 @@
# Data Generation

The data in the files below is generated using Python and the Faker library.

- parking_bay_data.json
- parking_sensor_data.json

The data consists of dummy/fake records for testing and development purposes. The latitude and longitude coordinates are confined to the approximate area of the Microsoft Redmond campus.

This data will be used for demonstrating:

- The ingestion, standardization, and transformation stages of data engineering pipelines.
- Writing unit test cases for Python and PySpark transformation code.
17 changes: 6 additions & 11 deletions e2e_samples/parking_sensors/README.md
@@ -224,7 +224,7 @@ Follow the setup prerequisites, permissions, and deployment environment options.
2. [Azure Account](https://azure.microsoft.com/en-us/free/): if you do not have one already, create an Azure account.
- *Permissions needed*: ability to create and deploy to an Azure [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview), create a [service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals), and grant the [Contributor role](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to the service principal over the resource group.
3. [Azure DevOps Project](https://azure.microsoft.com/en-us/products/devops/): follow the documentation to create a new project, or use an existing project you wish to deploy these resources to.
- *Permissions needed*: ability to create [service connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml), [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml) and [variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml).
- *Permissions needed*: ability to create [Service Connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml), [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml), and [variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml), plus *Manage Project Properties* rights as an [Endpoint Administrator](https://learn.microsoft.com/en-us/azure/devops/pipelines/policies/permissions?view=azure-devops#set-service-connection-security-in-azure-pipelines).

#### Deployment Options

@@ -294,16 +294,6 @@ Set up the environment variables as specified, fork the GitHub repository, and l
**Login and Cluster Configuration**

- Ensure that you have completed the configuration for the variables described in the previous section, titled **Configuration: Variables and Login**.

- This configuration will be used during the environment deployment process to facilitate login.
- Create a `cluster.config.json` Spark configuration from the [`cluster.config.template.json`](./databricks/config/cluster.config.template.json) file. For the "node_type_id" field, select a SKU that is available in your subscription; you can check availability with the following command:

```bash
az vm list-usage --location "<YOUR_REGION>" -o table
```


The repository provides an example, but you need to make sure that the SKU exists in your region and is available for your subscription.

2. **Deploy Azure resources**
- `cd` into the `e2e_samples/parking_sensors` folder of the repo.
@@ -442,6 +432,11 @@ The following lists some limitations of the solution and associated deployment s

- Azure DevOps Variable Groups linked to KeyVault can only be created via the UI; they cannot be created programmatically and were not incorporated in the automated deployment of the solution.
- **Workaround**: The deployment adds sensitive configuration as "secrets" in Variable Groups, with the downside of duplicated information. If you wish, you may manually link a second Variable Group to KeyVault to pull in the secrets. KeyVault secret names should line up with the required variables in the Azure DevOps pipelines. See [here](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml#link-secrets-from-an-azure-key-vault) for more information. A scripted sketch of this workaround appears after this list.
- Azure DevOps Service Connection removal: you may encounter an error like *"Cannot delete this service connection while federated credentials for app <app-id> exist in Entra tenant <tenant-id>. Please make sure federated credentials have been removed prior to deleting the service connection."* This occurs when you try to delete a Service Connection in the Azure DevOps (AzDo) portal but the Service Connection has federated credentials that must first be removed in the Azure Portal.
- **Workaround - Manually Deleting Federated Credentials:** Navigate to the Azure portal and locate your app registration under App Registrations. In the left navigation pane, select Certificates & Secrets and then the Federated Credentials tab. Delete the federated credential from this section. Once the credential is deleted, you can proceed to delete the app registration in the Azure Portal and the Service Connection in the AzDo portal. A CLI sketch of this cleanup appears after this list.
- Azure DevOps Environments and Approval Gates can only be managed via the UI; they cannot be managed programmatically and were not incorporated in the automated deployment of the solution.
- **Workaround**: Approval Gates can be easily configured manually. See [here](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/environments?view=azure-devops#approvals) for more information.
- ADF publishing through the CI/CD pipeline using the npm task still throws an error in the logs due to the missing publish_config.json file, but the pipeline completes successfully.
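
For reference, the secrets-as-variables workaround above can also be scripted with the Azure CLI `azure-devops` extension. This is a minimal sketch rather than part of the deployment scripts in this commit; the group name, variable name, and value are placeholders, and the `az pipelines variable-group` syntax is an assumption to verify against your CLI version:

```bash
# Create a plain (non-KeyVault-linked) Variable Group in the AzDo project.
az pipelines variable-group create \
    --name "mdw-dataops-secrets" \
    --variables placeholder=init \
    --project "$AZDO_PROJECT" \
    --organization "$AZDO_ORGANIZATION_URL"

# Add a secret variable to it (duplicating the value that also lives in KeyVault).
az pipelines variable-group variable create \
    --group-id "<variable-group-id>" \
    --name "sqlsrvrPassword" \
    --value "<secret-value>" \
    --secret true \
    --project "$AZDO_PROJECT" \
    --organization "$AZDO_ORGANIZATION_URL"
```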
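
The portal steps for removing federated credentials can likewise be done from the CLI, using the same `az ad app federated-credential` commands that the updated `common.sh` in this commit relies on. A minimal sketch, assuming you already know the object id of the app registration backing the Service Connection:

```bash
# Object id of the app registration (placeholder value).
APP_OBJECT_ID="<app-object-id>"

# List every federated credential on the app registration and delete each one.
# Afterwards the app registration and the Service Connection can be deleted.
az ad app federated-credential list --id "$APP_OBJECT_ID" --query "[].id" -o tsv |
while read -r cred_id; do
    az ad app federated-credential delete --id "$APP_OBJECT_ID" --federated-credential-id "$cred_id"
done
```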
@@ -1,8 +1,8 @@
{
"cluster_name": "ddo_cluster",
"autoscale": { "min_workers": 1, "max_workers": 2 },
"spark_version": "15.4.x-scala2.12",
"autotermination_minutes": 10,
"spark_version": "14.3.x-scala2.12",
"autotermination_minutes": 30,
"node_type_id": "Standard_D4as_v5",
"data_security_mode": "SINGLE_USER",
"runtime_engine": "PHOTON",
11 changes: 0 additions & 11 deletions e2e_samples/parking_sensors/infrastructure/main.bicep
@@ -70,10 +70,6 @@ module keyvault './modules/keyvault.bicep' = {
keyvault_owner_object_id: keyvault_owner_object_id
datafactory_principal_id: datafactory.outputs.datafactory_principal_id
}

dependsOn: [
datafactory
]
}


@@ -107,10 +103,6 @@ module diagnostic './modules/diagnostic_settings.bicep' = if (enable_monitoring)
loganalytics_workspace_name: loganalytics.outputs.loganalyticswsname
datafactory_name: datafactory.outputs.datafactory_name
}
dependsOn: [
loganalytics
datafactory
]
}


@@ -149,8 +141,6 @@ module alerts './modules/alerts.bicep' = if (enable_monitoring) {
}
dependsOn: [
loganalytics
datafactory
actiongroup
]
}

@@ -162,7 +152,6 @@ module data_quality_workbook './modules/data_quality_workbook.bicep' = if (enabl
}
dependsOn: [
loganalytics
appinsights
]
}

66 changes: 27 additions & 39 deletions e2e_samples/parking_sensors/infrastructure/modules/dashboard.bicep
@@ -46,17 +46,7 @@ resource dashboard 'Microsoft.Portal/dashboards@2022-12-01-preview' = {
{
name: 'options'
isOptional: true
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
settings: {
content: {
options: {
value: {
chart: {
metrics: [
{
@@ -96,8 +86,14 @@
}
}
}
}
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
}
}
{
position: {
@@ -111,17 +107,7 @@ resource dashboard 'Microsoft.Portal/dashboards@2022-12-01-preview' = {
{
name: 'options'
isOptional: true
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
settings: {
content: {
options: {
value: {
chart: {
metrics: [
{
@@ -161,8 +147,14 @@
}
}
}
}
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
}
}
{
position: {
@@ -176,17 +168,7 @@ resource dashboard 'Microsoft.Portal/dashboards@2022-12-01-preview' = {
{
name: 'options'
isOptional: true
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
settings: {
content: {
options: {
value: {
chart: {
metrics: [
{
@@ -236,8 +218,14 @@
}
}
}
}
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
}
}
]
}
37 changes: 33 additions & 4 deletions e2e_samples/parking_sensors/scripts/clean_up.sh
@@ -54,6 +54,14 @@ delete_all(){
az ad sp list -o tsv --show-mine --query "[?contains(appDisplayName,'$prefix') && contains(appDisplayName,'$DEPLOYMENT_ID')].displayName"
fi

log "\nENTRA APP REGISTRATIONS:\n"
if [[ -z $DEPLOYMENT_ID ]]
then
az ad app list -o tsv --show-mine --query "[?contains(displayName,'$prefix')].displayName"
else
az ad app list -o tsv --show-mine --query "[?contains(displayName,'$prefix') && contains(displayName,'$DEPLOYMENT_ID')].displayName"
fi

log "\nRESOURCE GROUPS:\n"
if [[ -z $DEPLOYMENT_ID ]]
then
@@ -79,25 +87,46 @@

log "Deleting service connections that start with '$prefix' in name..."
[[ -n $prefix ]] &&

sc_ids=($(az devops service-endpoint list --project "$AZDO_PROJECT" --organization "$AZDO_ORGANIZATION_URL" --query "[?contains(name, '$prefix')].id" -o tsv))
for sc_id in "${sc_ids[@]}"; do
log "Processing Service Connection ID: $sc_id"
cleanup_federated_credentials "$sc_id"
done
# Important: give the portal time to process the cleanup
wait_for_process
az devops service-endpoint list -o tsv --query "[?contains(name, '$prefix')].id" |
xargs -r -I % az devops service-endpoint delete --id % --yes

log "Finished cleaning up Service Connections"
if [[ -z $DEPLOYMENT_ID ]]
then
log "Deleting service principal that contain '$prefix' in name, created by yourself..."
log "Deleting service principals that contain '$prefix' in name, created by yourself..."
[[ -n $prefix ]] &&
az ad sp list --query "[?contains(appDisplayName,'$prefix')].appId" -o tsv --show-mine |
xargs -r -I % az ad sp delete --id %
else
log "Deleting service principal that contain '$prefix' and $DEPLOYMENT_ID in name, created by yourself..."
log "Deleting service principals that contain '$prefix' and $DEPLOYMENT_ID in name, created by yourself..."
[[ -n $prefix ]] &&
az ad sp list --query "[?contains(appDisplayName,'$prefix') && contains(appDisplayName,'$DEPLOYMENT_ID')].appId" -o tsv --show-mine |
xargs -r -I % az ad sp delete --id %
fi

if [[ -z $DEPLOYMENT_ID ]]
then
log "Deleting resource groups that comtain '$prefix' in name..."
log "Deleting app registrations that contain '$prefix' in name, created by yourself..."
[[ -n $prefix ]] &&
az ad app list --query "[?contains(displayName,'$prefix')].appId" -o tsv --show-mine |
xargs -r -I % az ad app delete --id %
else
log "Deleting app registrations that contain '$prefix' and $DEPLOYMENT_ID in name, created by yourself..."
[[ -n $prefix ]] &&
az ad app list --query "[?contains(displayName,'$prefix') && contains(displayName,'$DEPLOYMENT_ID')].appId" -o tsv --show-mine |
xargs -r -I % az ad app delete --id %
fi

if [[ -z $DEPLOYMENT_ID ]]
then
log "Deleting resource groups that contain '$prefix' in name..."
[[ -n $prefix ]] &&
az group list --query "[?contains(name,'$prefix') && ! contains(name,'dbw')].name" -o tsv |
xargs -I % az group delete --verbose --name % -y
49 changes: 49 additions & 0 deletions e2e_samples/parking_sensors/scripts/common.sh
@@ -127,3 +127,52 @@ create_adf_trigger () {
adfTUrl="${adfFactoryBaseUrl}/triggers/${name}?api-version=${apiVersion}"
az rest --method put --uri "$adfTUrl" --body @"${ADF_DIR}"/trigger/"${name}".json -o none
}

# Function to give time for the portal to process the cleanup
wait_for_process() {
local seconds=${1:-15}
log "Giving the portal $seconds seconds to process the information..."
sleep "$seconds"
}

cleanup_federated_credentials() {
# Function used in the clean_up.sh and deploy_azdo_service_connections_azure.sh scripts
local sc_id=$1
local spnAppObjId=$(az devops service-endpoint show --id "$sc_id" --org "$AZDO_ORGANIZATION_URL" -p "$AZDO_PROJECT" --query "data.appObjectId" -o tsv)
# if the Service connection does not have an associated Service Principal,
# then it means it won't have associated federated credentials
if [ -z "$spnAppObjId" ]; then
log "Service Principal Object ID not found for Service Connection ID: $sc_id. Skipping federated credential cleanup."
return
fi

local spnCredlist=$(az ad app federated-credential list --id "$spnAppObjId" --query "[].id" -o json)
log "Attempting to delete federated credentials."

# Sometimes the Azure Portal needs a little bit more time to process the information.
if [ -z "$spnCredlist" ]; then
log "It was not possible to list Federated credentials for Service Principal. Retrying once more.."
wait_for_process
spnCredlist=$(az ad app federated-credential list --id "$spnAppObjId" --query "[].id" -o json)
if [ -z "$spnCredlist" ]; then
log "It was not possible to list Federated credentials for specified Service Principal."
return
fi
fi

local credArray=($(echo "$spnCredlist" | jq -r '.[]'))
# Use && / || to log success or failure of each delete operation
for cred in "${credArray[@]}"; do
az ad app federated-credential delete --federated-credential-id "$cred" --id "$spnAppObjId" &&
log "Deleted federated credential: $cred" ||
log "Failed to delete federated credential: $cred"
done
# Refresh the list of federated credentials
spnCredlist=$(az ad app federated-credential list --id "$spnAppObjId" --query "[].id" -o json)
if [ "$(echo "$spnCredlist" | jq -e '. | length > 0')" = "true" ]; then
log "Failed to delete federated credentials" "danger"
exit 1
fi
log "Completed federated credential cleanup for the Service Principal: $spnAppObjId"
}
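
A brief usage sketch for the new helpers, mirroring how `clean_up.sh` in this same commit calls them (it assumes `AZDO_PROJECT`, `AZDO_ORGANIZATION_URL`, and `prefix` are already set):

```bash
. ./scripts/common.sh

# Clean up federated credentials for every service connection whose name contains the prefix,
# then give the portal time to process the deletions before removing the service connections.
sc_ids=($(az devops service-endpoint list --project "$AZDO_PROJECT" --organization "$AZDO_ORGANIZATION_URL" --query "[?contains(name, '$prefix')].id" -o tsv))
for sc_id in "${sc_ids[@]}"; do
    cleanup_federated_credentials "$sc_id"
done
wait_for_process
```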

38 changes: 38 additions & 0 deletions e2e_samples/parking_sensors/scripts/configure_databricks.sh
@@ -27,6 +27,7 @@ set -o nounset
# KEYVAULT_RESOURCE_ID
# KEYVAULT_DNS_NAME
# USER_NAME
# AZURE_LOCATION

. ./scripts/common.sh

@@ -54,6 +55,43 @@ databricks workspace import "$databricks_folder_name/01_explore.py" --file "./da
databricks workspace import "$databricks_folder_name/02_standardize.py" --file "./databricks/notebooks/02_standardize.py" --format SOURCE --language PYTHON --overwrite
databricks workspace import "$databricks_folder_name/03_transform.py" --file "./databricks/notebooks/03_transform.py" --format SOURCE --language PYTHON --overwrite

# Define suitable VM for DB cluster
file_path="./databricks/config/cluster.config.json"

# Get available VM sizes in the specified region
vm_sizes=$(az vm list-sizes --location "$AZURE_LOCATION" --output json)

# Get available Databricks node types using the list-node-types API
node_types=$(databricks clusters list-node-types --output json)

# Extract VM names into a temporary file
echo "$vm_sizes" | jq -r '.[] | .name' > vm_names.txt
# Filter the Databricks node types (from the list-node-types output above) to only those that support Photon
photon_node_types=$(echo "$node_types" | jq -r '.node_types[] | select(.photon_driver_capable == true) | .node_type_id')

# Find common VM sizes
common_vms=$(grep -Fwf <(echo "$photon_node_types") vm_names.txt)

# Find the VM with the least resources
least_resource_vm=$(echo "$vm_sizes" | jq --arg common_vms "$common_vms" '
map(select(.name == ($common_vms | split("\n")[]))) |
sort_by(.numberOfCores, .memoryInMB) |
.[0]
')
log "VM with the least resources:$least_resource_vm" "info"

# Update the JSON file with the least resource VM
if [ -n "$least_resource_vm" ]; then
node_type_id=$(echo "$least_resource_vm" | jq -r '.name')
jq --arg node_type_id "$node_type_id" '.node_type_id = $node_type_id' "$file_path" > tmp.$$.json && mv tmp.$$.json "$file_path"
log "The JSON file at '$file_path' has been updated with the node_type_id: $node_type_id"
else
log "No common VM options found between Azure and Databricks." "error"
fi

# Clean up temporary files
rm vm_names.txt

# Create initial cluster, if not yet exists
# cluster.config.json file needs to refer to one of the available SKUs in your region
# az vm list-skus --location <LOCATION> --all --output table