Merge remote-tracking branch 'origin/main' into feat/e2e-fabric-dataops-sample-v0-2
promisinganuj committed Feb 5, 2025
2 parents 89ef274 + 55a5f03 commit 227865f
Showing 44 changed files with 1,796 additions and 99 deletions.
@@ -0,0 +1,13 @@
# Data Generation

The data in the files below is generated using Python and the Faker library.

- parking_bay_data.json
- parking_sensor_data.json

The data consists of dummy/fake records for testing and development purposes. The latitude and longitude coordinates are confined to the approximate area of the Microsoft Redmond campus.

This data will be used for demonstrating:

- The ingestion, standardization, and transformation stages of data engineering pipelines.
- Writing unit test cases for Python and PySpark transformation code.
17 changes: 6 additions & 11 deletions e2e_samples/parking_sensors/README.md
@@ -224,7 +224,7 @@ Follow the setup prerequisites, permissions, and deployment environment options.
2. [Azure Account](https://azure.microsoft.com/en-us/free/): if you do not have one already, create an Azure account.
- *Permissions needed*: ability to create and deploy to an Azure [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview), create a [service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals), and grant the [Contributor role](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to the service principal over the resource group.
3. [Azure DevOps Project](https://azure.microsoft.com/en-us/products/devops/): follow the documentation to create a new project, or use an existing project you wish to deploy these resources to.
- *Permissions needed*: ability to create [service connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml), [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml) and [variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml).
- *Permissions needed*: ability to create [Service Connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml), [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml), and [variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml), plus *Manage Project Properties* rights as an [Endpoint Administrator](https://learn.microsoft.com/en-us/azure/devops/pipelines/policies/permissions?view=azure-devops#set-service-connection-security-in-azure-pipelines).

#### Deployment Options

@@ -294,16 +294,6 @@ Set up the environment variables as specified, fork the GitHub repository, and l
**Login and Cluster Configuration**

- Ensure that you have completed the configuration for the variables described in the previous section, titled **Configuration: Variables and Login**.

- This configuration will be used during the environment deployment process to facilitate login.
- Create a `cluster.config.json` Spark configuration from the [`cluster.config.template.json`](./databricks/config/cluster.config.template.json) file. For the "node_type_id" field, select a SKU that is available in your subscription; you can check availability with the following command:

```bash
az vm list-usage --location "<YOUR_REGION>" -o table
```


The repository provides an example, but you need to make sure that the SKU exists in your region and is available for your subscription.

2. **Deploy Azure resources**
- `cd` into the `e2e_samples/parking_sensors` folder of the repo.
@@ -442,6 +432,11 @@ The following lists some limitations of the solution and associated deployment s

- Azure DevOps Variable Groups linked to KeyVault can only be created via the UI; they cannot be created programmatically and were not incorporated in the automated deployment of the solution.
- **Workaround**: The deployment adds sensitive configuration as "secrets" in Variable Groups, with the downside of duplicated information. If you wish, you may manually link a second Variable Group to KeyVault to pull in the secrets. KeyVault secret names should line up with the required variables in the Azure DevOps pipelines. See [here](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml#link-secrets-from-an-azure-key-vault) for more information. A scripted sketch of this workaround appears after this list.
- Azure DevOps Service Connection removal: you may encounter an error like *"Cannot delete this service connection while federated credentials for app <app-id> exist in Entra tenant <tenant-id>. Please make sure federated credentials have been removed prior to deleting the service connection."* This occurs when you try to delete a Service Connection in the Azure DevOps (AzDo) portal but the Service Connection has federated credentials that must first be removed in the Azure Portal.
- **Workaround - Manually Deleting Federated Credentials:** Navigate to the Azure portal and locate your app registration under App Registrations. In the left navigation pane, select Certificates & Secrets and then the Federated Credentials tab. Delete the federated credential from this section. Once the credential is deleted, you can proceed to delete the app registration in the Azure Portal and the Service Connection in the AzDo portal. A CLI sketch of this cleanup appears after this list.
- Azure DevOps Environments and Approval Gates can only be managed via the UI; they cannot be managed programmatically and were not incorporated in the automated deployment of the solution.
- **Workaround**: Approval Gates can be easily configured manually. See [here](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/environments?view=azure-devops#approvals) for more information.
- ADF publishing through the CI/CD pipeline using the npm task still throws an error in the logs due to the missing publish_config.json file, but the pipeline completes successfully.
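
For reference, the secrets-as-variables workaround above can also be scripted with the Azure CLI `azure-devops` extension. This is a minimal sketch rather than part of the deployment scripts in this commit; the group name, variable name, and value are placeholders, and the `az pipelines variable-group` syntax is an assumption to verify against your CLI version:

```bash
# Create a plain (non-KeyVault-linked) Variable Group in the AzDo project.
az pipelines variable-group create \
    --name "mdw-dataops-secrets" \
    --variables placeholder=init \
    --project "$AZDO_PROJECT" \
    --organization "$AZDO_ORGANIZATION_URL"

# Add a secret variable to it (duplicating the value that also lives in KeyVault).
az pipelines variable-group variable create \
    --group-id "<variable-group-id>" \
    --name "sqlsrvrPassword" \
    --value "<secret-value>" \
    --secret true \
    --project "$AZDO_PROJECT" \
    --organization "$AZDO_ORGANIZATION_URL"
```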
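
The portal steps for removing federated credentials can likewise be done from the CLI, using the same `az ad app federated-credential` commands that the updated `common.sh` in this commit relies on. A minimal sketch, assuming you already know the object id of the app registration backing the Service Connection:

```bash
# Object id of the app registration (placeholder value).
APP_OBJECT_ID="<app-object-id>"

# List every federated credential on the app registration and delete each one.
# Afterwards the app registration and the Service Connection can be deleted.
az ad app federated-credential list --id "$APP_OBJECT_ID" --query "[].id" -o tsv |
while read -r cred_id; do
    az ad app federated-credential delete --id "$APP_OBJECT_ID" --federated-credential-id "$cred_id"
done
```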
@@ -1,8 +1,8 @@
{
"cluster_name": "ddo_cluster",
"autoscale": { "min_workers": 1, "max_workers": 2 },
"spark_version": "15.4.x-scala2.12",
"autotermination_minutes": 10,
"spark_version": "14.3.x-scala2.12",
"autotermination_minutes": 30,
"node_type_id": "Standard_D4as_v5",
"data_security_mode": "SINGLE_USER",
"runtime_engine": "PHOTON",
11 changes: 0 additions & 11 deletions e2e_samples/parking_sensors/infrastructure/main.bicep
@@ -70,10 +70,6 @@ module keyvault './modules/keyvault.bicep' = {
keyvault_owner_object_id: keyvault_owner_object_id
datafactory_principal_id: datafactory.outputs.datafactory_principal_id
}

dependsOn: [
datafactory
]
}


@@ -107,10 +103,6 @@ module diagnostic './modules/diagnostic_settings.bicep' = if (enable_monitoring)
loganalytics_workspace_name: loganalytics.outputs.loganalyticswsname
datafactory_name: datafactory.outputs.datafactory_name
}
dependsOn: [
loganalytics
datafactory
]
}


@@ -149,8 +141,6 @@ module alerts './modules/alerts.bicep' = if (enable_monitoring) {
}
dependsOn: [
loganalytics
datafactory
actiongroup
]
}

@@ -162,7 +152,6 @@ module data_quality_workbook './modules/data_quality_workbook.bicep' = if (enabl
}
dependsOn: [
loganalytics
appinsights
]
}

66 changes: 27 additions & 39 deletions e2e_samples/parking_sensors/infrastructure/modules/dashboard.bicep
@@ -46,17 +46,7 @@ resource dashboard 'Microsoft.Portal/dashboards@2022-12-01-preview' = {
{
name: 'options'
isOptional: true
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
settings: {
content: {
options: {
value: {
chart: {
metrics: [
{
@@ -96,8 +86,14 @@
}
}
}
}
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
}
}
{
position: {
@@ -111,17 +107,7 @@ resource dashboard 'Microsoft.Portal/dashboards@2022-12-01-preview' = {
{
name: 'options'
isOptional: true
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
settings: {
content: {
options: {
value: {
chart: {
metrics: [
{
@@ -161,8 +147,14 @@
}
}
}
}
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
}
}
{
position: {
@@ -176,17 +168,7 @@ resource dashboard 'Microsoft.Portal/dashboards@2022-12-01-preview' = {
{
name: 'options'
isOptional: true
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
settings: {
content: {
options: {
value: {
chart: {
metrics: [
{
@@ -236,8 +218,14 @@
}
}
}
}
}
{
name: 'sharedTimeRange'
isOptional: true
}
]
#disable-next-line BCP036
type: 'Extension/HubsExtension/PartType/MonitorChartPart'
}
}
]
}
37 changes: 33 additions & 4 deletions e2e_samples/parking_sensors/scripts/clean_up.sh
@@ -54,6 +54,14 @@ delete_all(){
az ad sp list -o tsv --show-mine --query "[?contains(appDisplayName,'$prefix') && contains(appDisplayName,'$DEPLOYMENT_ID')].displayName"
fi

log "\nENTRA APP REGISTRATIONS:\n"
if [[ -z $DEPLOYMENT_ID ]]
then
az ad app list -o tsv --show-mine --query "[?contains(displayName,'$prefix')].displayName"
else
az ad app list -o tsv --show-mine --query "[?contains(displayName,'$prefix') && contains(displayName,'$DEPLOYMENT_ID')].displayName"
fi

log "\nRESOURCE GROUPS:\n"
if [[ -z $DEPLOYMENT_ID ]]
then
@@ -79,25 +87,46 @@

log "Deleting service connections that start with '$prefix' in name..."
[[ -n $prefix ]] &&

sc_ids=($(az devops service-endpoint list --project "$AZDO_PROJECT" --organization "$AZDO_ORGANIZATION_URL" --query "[?contains(name, '$prefix')].id" -o tsv))
for sc_id in "${sc_ids[@]}"; do
log "Processing Service Connection ID: $sc_id"
cleanup_federated_credentials "$sc_id"
done
# Important: give the portal time to process the cleanup
wait_for_process
az devops service-endpoint list -o tsv --query "[?contains(name, '$prefix')].id" |
xargs -r -I % az devops service-endpoint delete --id % --yes

log "Finished cleaning up Service Connections"
if [[ -z $DEPLOYMENT_ID ]]
then
log "Deleting service principal that contain '$prefix' in name, created by yourself..."
log "Deleting service principals that contain '$prefix' in name, created by yourself..."
[[ -n $prefix ]] &&
az ad sp list --query "[?contains(appDisplayName,'$prefix')].appId" -o tsv --show-mine |
xargs -r -I % az ad sp delete --id %
else
log "Deleting service principal that contain '$prefix' and $DEPLOYMENT_ID in name, created by yourself..."
log "Deleting service principals that contain '$prefix' and $DEPLOYMENT_ID in name, created by yourself..."
[[ -n $prefix ]] &&
az ad sp list --query "[?contains(appDisplayName,'$prefix') && contains(appDisplayName,'$DEPLOYMENT_ID')].appId" -o tsv --show-mine |
xargs -r -I % az ad sp delete --id %
fi

if [[ -z $DEPLOYMENT_ID ]]
then
log "Deleting resource groups that comtain '$prefix' in name..."
log "Deleting app registrations that contain '$prefix' in name, created by yourself..."
[[ -n $prefix ]] &&
az ad app list --query "[?contains(displayName,'$prefix')].appId" -o tsv --show-mine |
xargs -r -I % az ad app delete --id %
else
log "Deleting app registrations that contain '$prefix' and $DEPLOYMENT_ID in name, created by yourself..."
[[ -n $prefix ]] &&
az ad app list --query "[?contains(displayName,'$prefix') && contains(displayName,'$DEPLOYMENT_ID')].appId" -o tsv --show-mine |
xargs -r -I % az ad app delete --id %
fi

if [[ -z $DEPLOYMENT_ID ]]
then
log "Deleting resource groups that contain '$prefix' in name..."
[[ -n $prefix ]] &&
az group list --query "[?contains(name,'$prefix') && ! contains(name,'dbw')].name" -o tsv |
xargs -I % az group delete --verbose --name % -y
49 changes: 49 additions & 0 deletions e2e_samples/parking_sensors/scripts/common.sh
@@ -127,3 +127,52 @@ create_adf_trigger () {
adfTUrl="${adfFactoryBaseUrl}/triggers/${name}?api-version=${apiVersion}"
az rest --method put --uri "$adfTUrl" --body @"${ADF_DIR}"/trigger/"${name}".json -o none
}

# Function to give time for the portal to process the cleanup
wait_for_process() {
local seconds=${1:-15}
log "Giving the portal $seconds seconds to process the information..."
sleep "$seconds"
}

cleanup_federated_credentials() {
# Function used in the clean_up.sh and deploy_azdo_service_connections_azure.sh scripts
local sc_id=$1
local spnAppObjId=$(az devops service-endpoint show --id "$sc_id" --org "$AZDO_ORGANIZATION_URL" -p "$AZDO_PROJECT" --query "data.appObjectId" -o tsv)
# if the Service connection does not have an associated Service Principal,
# then it means it won't have associated federated credentials
if [ -z "$spnAppObjId" ]; then
log "Service Principal Object ID not found for Service Connection ID: $sc_id. Skipping federated credential cleanup."
return
fi

local spnCredlist=$(az ad app federated-credential list --id "$spnAppObjId" --query "[].id" -o json)
log "Attempting to delete federated credentials."

# Sometimes the Azure Portal needs a little bit more time to process the information.
if [ -z "$spnCredlist" ]; then
log "It was not possible to list Federated credentials for Service Principal. Retrying once more.."
wait_for_process
spnCredlist=$(az ad app federated-credential list --id "$spnAppObjId" --query "[].id" -o json)
if [ -z "$spnCredlist" ]; then
log "It was not possible to list Federated credentials for specified Service Principal."
return
fi
fi

local credArray=($(echo "$spnCredlist" | jq -r '.[]'))
# Use && / || to log success or failure of each delete operation
for cred in "${credArray[@]}"; do
az ad app federated-credential delete --federated-credential-id "$cred" --id "$spnAppObjId" &&
log "Deleted federated credential: $cred" ||
log "Failed to delete federated credential: $cred"
done
# Refresh the list of federated credentials
spnCredlist=$(az ad app federated-credential list --id "$spnAppObjId" --query "[].id" -o json)
if [ "$(echo "$spnCredlist" | jq -e '. | length > 0')" = "true" ]; then
log "Failed to delete federated credentials" "danger"
exit 1
fi
log "Completed federated credential cleanup for the Service Principal: $spnAppObjId"
}
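
A brief usage sketch for the new helpers, mirroring how `clean_up.sh` in this same commit calls them (it assumes `AZDO_PROJECT`, `AZDO_ORGANIZATION_URL`, and `prefix` are already set):

```bash
. ./scripts/common.sh

# Clean up federated credentials for every service connection whose name contains the prefix,
# then give the portal time to process the deletions before removing the service connections.
sc_ids=($(az devops service-endpoint list --project "$AZDO_PROJECT" --organization "$AZDO_ORGANIZATION_URL" --query "[?contains(name, '$prefix')].id" -o tsv))
for sc_id in "${sc_ids[@]}"; do
    cleanup_federated_credentials "$sc_id"
done
wait_for_process
```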

38 changes: 38 additions & 0 deletions e2e_samples/parking_sensors/scripts/configure_databricks.sh
@@ -27,6 +27,7 @@ set -o nounset
# KEYVAULT_RESOURCE_ID
# KEYVAULT_DNS_NAME
# USER_NAME
# AZURE_LOCATION

. ./scripts/common.sh

@@ -54,6 +55,43 @@ databricks workspace import "$databricks_folder_name/01_explore.py" --file "./da
databricks workspace import "$databricks_folder_name/02_standardize.py" --file "./databricks/notebooks/02_standardize.py" --format SOURCE --language PYTHON --overwrite
databricks workspace import "$databricks_folder_name/03_transform.py" --file "./databricks/notebooks/03_transform.py" --format SOURCE --language PYTHON --overwrite

# Define suitable VM for DB cluster
file_path="./databricks/config/cluster.config.json"

# Get available VM sizes in the specified region
vm_sizes=$(az vm list-sizes --location "$AZURE_LOCATION" --output json)

# Get available Databricks node types using the list-node-types API
node_types=$(databricks clusters list-node-types --output json)

# Extract VM names into a temporary file
echo "$vm_sizes" | jq -r '.[] | .name' > vm_names.txt
# Filter the Databricks node types (from the list-node-types output above) to only those that support Photon
photon_node_types=$(echo "$node_types" | jq -r '.node_types[] | select(.photon_driver_capable == true) | .node_type_id')

# Find common VM sizes
common_vms=$(grep -Fwf <(echo "$photon_node_types") vm_names.txt)

# Find the VM with the least resources
least_resource_vm=$(echo "$vm_sizes" | jq --arg common_vms "$common_vms" '
map(select(.name == ($common_vms | split("\n")[]))) |
sort_by(.numberOfCores, .memoryInMB) |
.[0]
')
log "VM with the least resources:$least_resource_vm" "info"

# Update the JSON file with the least resource VM
if [ -n "$least_resource_vm" ]; then
node_type_id=$(echo "$least_resource_vm" | jq -r '.name')
jq --arg node_type_id "$node_type_id" '.node_type_id = $node_type_id' "$file_path" > tmp.$$.json && mv tmp.$$.json "$file_path"
log "The JSON file at '$file_path' has been updated with the node_type_id: $node_type_id"
else
log "No common VM options found between Azure and Databricks." "error"
fi

# Clean up temporary files
rm vm_names.txt

# Create initial cluster, if not yet exists
# cluster.config.json file needs to refer to one of the available SKUs in your region
# az vm list-skus --location <LOCATION> --all --output table