The purpose of this issue is to create a feature engineering script that may be run repeatedly on the occupancy permit dataset, as new entries or files (by year) are added.
This issue was originally posted in the dc_doh_hackathon repository, which can be found here: issue_10
Start with the Occupancy Permit data in the /Data Sets/Occupancy Permits/ folder in Dropbox.
Write a script that uses this data to produce a feature data table for the number of new occupancy permits issued in the last 4 weeks.
You can find the data format and examples on the Feature Dataset Format tab in this document.
Input:
CSV files with data for each given year
Output:
A script that produces a CSV file in the following format:
One row for each combination of occupancy permit type, week, year, and census block
The dataset should include the following columns:
feature_id: The ID for the feature, in this case "occupancy_permits_issued_last_4_weeks"
feature_type: Occupancy permit type, found in the EVENTTYPESCODEDESC column of the source data
feature_subtype: Left blank
year: The ISO-8601 year of the feature value
week: The ISO-8601 week number of the feature value
census_block_2010: The 2010 Census Block of the feature value
value: The value of the feature, i.e. the number of new occupancy permits of the specified type issued in the given census block during the previous 4 weeks, counting from the year and week above
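A minimal pandas sketch of the aggregation, assuming the permit records have already been loaded and assigned a `census_block_2010` value (see the command-line sketch further down). The `issue_date` column name is a placeholder for whatever the issuance-date field is called in the source files, and whether "the previous 4 weeks" includes the current week is left to the implementer; the rolling window below includes it:

```python
import pandas as pd


def count_last_4_weeks(permits: pd.DataFrame) -> pd.DataFrame:
    """Count permits issued in the trailing 4-week window for each
    permit type / census block / ISO week combination.

    Assumes `permits` has columns:
      EVENTTYPESCODEDESC  - permit type (from the source data)
      census_block_2010   - 2010 census block, assigned upstream
      issue_date          - hypothetical parsed datetime of issuance
    """
    iso = permits["issue_date"].dt.isocalendar()
    permits = permits.assign(year=iso["year"].astype(int),
                             week=iso["week"].astype(int))

    # Permits issued per type / block / ISO week.
    weekly = (
        permits.groupby(["EVENTTYPESCODEDESC", "census_block_2010", "year", "week"])
        .size()
        .rename("weekly_count")
        .reset_index()
        .sort_values(["EVENTTYPESCODEDESC", "census_block_2010", "year", "week"])
    )

    # Trailing 4-week sum within each type/block group.  Note: a rolling
    # window over observed rows skips weeks with zero permits; a fuller
    # solution would reindex onto a complete weekly calendar first.
    weekly["value"] = (
        weekly.groupby(["EVENTTYPESCODEDESC", "census_block_2010"])["weekly_count"]
        .transform(lambda s: s.rolling(4, min_periods=1).sum())
    )

    weekly["feature_id"] = "occupancy_permits_issued_last_4_weeks"
    weekly["feature_subtype"] = ""
    return weekly.rename(columns={"EVENTTYPESCODEDESC": "feature_type"})[
        ["feature_id", "feature_type", "feature_subtype",
         "year", "week", "census_block_2010", "value"]
    ]
```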
The final script must be runnable from the command line, taking three arguments (a sketch of the argument handling follows this list):
A folder with the occupancy permit data files (the script should concatenate and merge the files in the directory as appropriate)
The shapefile for census blocks
The output CSV filename
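A hedged sketch of the command-line wrapper, assuming the permit files carry LONGITUDE/LATITUDE columns (adjust to the actual field names) and that the shapefile's block identifier column is GEOID10, as in Census TIGER files; it reuses the count_last_4_weeks helper sketched above. The `predicate=` keyword requires geopandas 0.10+; older versions use `op=`.

```python
import argparse
import glob
import os

import geopandas as gpd
import pandas as pd


def main():
    parser = argparse.ArgumentParser(
        description="Extract occupancy-permit features.")
    parser.add_argument("input_dir",
                        help="Folder containing the yearly occupancy permit CSVs")
    parser.add_argument("census_blocks_shapefile",
                        help="Path to the 2010 census block shapefile")
    parser.add_argument("output_csv",
                        help="Path for the output feature CSV")
    args = parser.parse_args()

    # Concatenate every CSV in the input folder into one frame.
    files = sorted(glob.glob(os.path.join(args.input_dir, "*.csv")))
    permits = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

    # Assign each permit to a 2010 census block with a spatial join.
    blocks = gpd.read_file(args.census_blocks_shapefile)
    points = gpd.GeoDataFrame(
        permits,
        geometry=gpd.points_from_xy(permits["LONGITUDE"], permits["LATITUDE"]),
        crs="EPSG:4326",
    ).to_crs(blocks.crs)
    permits = gpd.sjoin(points, blocks, how="left", predicate="within")

    # The block identifier column name depends on the shapefile.
    permits = permits.rename(columns={"GEOID10": "census_block_2010"})

    features = count_last_4_weeks(permits)
    features.to_csv(args.output_csv, index=False)


if __name__ == "__main__":
    main()
```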
Please also provide a README.md that describes the script and how to run it.
For the command-line handling, you can model your solution on the files here or here.
Place all of your files in the codefordc/the-rat-hack repository under a new scripts/feature_engineering/extract_occupancy_permit_features/ folder
**Hints:**
The solution to Hackathon issue_3 may provide some helpful inspiration for the data cleaning steps.