
Extract Features from Occupancy Permit Data #11

Open
eclee25 opened this issue Jan 9, 2018 · 0 comments

eclee25 commented Jan 9, 2018

The purpose of this issue is to create a feature engineering script that may be run repeatedly on the occupancy permit dataset, as new entries or files (by year) are added.

This issue was originally posted in the dc_doh_hackathon repository as issue_10.

Start with the Occupancy Permit data in the /Data Sets/Occupancy Permits/ folder in Dropbox.
Write a script that uses this data to produce a feature data table for the number of new occupancy permits issued in the last 4 weeks.
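The core of this feature is a trailing 4-week count. A minimal standard-library sketch follows. It assumes each permit has already been reduced to a (permit type, census block, issue date) tuple, and it assumes the 4-week window includes the reference week plus the three weeks before it. The issue does not say whether the reference week itself is counted, so adjust the week offsets if needed. The function name `trailing_four_week_counts` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_monday(d):
    """Return the Monday of the ISO week containing date d."""
    return d - timedelta(days=d.weekday())


def trailing_four_week_counts(records, ref_weeks):
    """Count permits per (type, block) in the 4 ISO weeks ending at each reference week.

    records: iterable of (permit_type, census_block, issue_date) tuples.
    ref_weeks: iterable of (iso_year, iso_week) tuples to report on.
    Returns a dict mapping (permit_type, census_block, iso_year, iso_week) -> count.
    Assumption: the window is the reference week plus the 3 preceding weeks.
    """
    ref_weeks = list(ref_weeks)
    per_week = Counter()  # (type, block, week Monday) -> permits issued that week
    groups = set()
    for permit_type, block, issued in records:
        per_week[(permit_type, block, week_monday(issued))] += 1
        groups.add((permit_type, block))

    counts = {}
    for permit_type, block in groups:
        for iso_year, iso_week in ref_weeks:
            monday = date.fromisocalendar(iso_year, iso_week, 1)  # Python 3.8+
            # Counter returns 0 for weeks with no permits.
            counts[(permit_type, block, iso_year, iso_week)] = sum(
                per_week[(permit_type, block, monday - timedelta(weeks=k))]
                for k in range(4)
            )
    return counts
```

Bucketing by the Monday of each ISO week keeps the window arithmetic simple: stepping back one week is just subtracting seven days, even across year boundaries.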

You can find the data format and examples on the Feature Dataset Format tab in this document.

Input:
CSV files with data for each given year
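Because the input arrives as one CSV file per year, the script first has to combine the files. A minimal sketch, assuming every yearly file shares the same header row and sits directly in the input folder with a `.csv` extension. `load_permit_files` is a hypothetical name, and deduplicating permits that appear in more than one file is left out because the issue does not name a unique permit ID column.

```python
import csv
import glob
import os


def load_permit_files(folder):
    """Read every *.csv file in folder (sorted by filename) into one list of row dicts.

    Assumption: all files share the same header, so rows can simply be concatenated.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows
```

Sorting the paths makes the row order deterministic, which helps when the script is rerun as new yearly files are added.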

Output:
A script that produces a CSV file in the format below:

  • One row for each combination of occupancy permit type, year, week, and census block
  • The dataset should include the following columns:

feature_id: The ID for the feature, in this case, "occupancy_permits_issued_last_4_weeks"
feature_type: Occupancy permit type, found in the EVENTTYPESCODEDESC column of the source data
feature_subtype: Left blank
year: The ISO-8601 year of the feature value
week: The ISO-8601 week number of the feature value
census_block_2010: The 2010 Census Block of the feature value
value: The value of the feature, i.e. the number of new occupancy permits of the specified type issued in the given census block during the 4 weeks counting back from the year and week above.
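A sketch of writing rows in this layout with Python's `csv` module follows. It assumes the counts have already been computed as a dict keyed by (permit type, census block, ISO year, ISO week); `write_feature_csv` and `iso_year_week` are hypothetical helpers. `feature_subtype` is written as an empty string, per the spec.

```python
import csv
from datetime import date

FEATURE_ID = "occupancy_permits_issued_last_4_weeks"
COLUMNS = ["feature_id", "feature_type", "feature_subtype", "year", "week",
           "census_block_2010", "value"]


def iso_year_week(d: date):
    """Return (ISO-8601 year, ISO-8601 week number) for a date."""
    iso = d.isocalendar()
    return iso[0], iso[1]


def write_feature_csv(counts, out_file):
    """Write counts in the feature dataset layout.

    counts: dict mapping (permit_type, census_block, iso_year, iso_week) -> int.
    out_file: an open text file (open it with newline="" when writing to disk).
    """
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS)
    writer.writeheader()
    for (permit_type, block, year, week), value in sorted(counts.items()):
        writer.writerow({
            "feature_id": FEATURE_ID,
            "feature_type": permit_type,
            "feature_subtype": "",  # left blank per the spec
            "year": year,
            "week": week,
            "census_block_2010": block,
            "value": value,
        })
```

The `year` column should come from the ISO calendar rather than `date.year`: for example, `iso_year_week(date(2018, 12, 31))` returns `(2019, 1)`, because that Monday already belongs to ISO week 1 of 2019.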

The final script must be runnable from the command line and take three arguments:

  1. A folder with the occupancy permit data files (the script should concatenate and merge the files in the directory as appropriate)
  2. The shapefile for census blocks
  3. The output CSV filename
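The three positional arguments could be declared with `argparse`, as in the sketch below. The argument names are hypothetical, and reading the shapefile and assigning each permit to a census block (a spatial join) is not shown.

```python
import argparse


def parse_args(argv=None):
    """Parse the three command-line arguments; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Build the occupancy_permits_issued_last_4_weeks feature table."
    )
    parser.add_argument("permit_folder",
                        help="Folder containing the yearly occupancy permit CSV files")
    parser.add_argument("census_block_shapefile",
                        help="Shapefile of 2010 census blocks")
    parser.add_argument("output_csv",
                        help="Path of the feature CSV file to write")
    return parser.parse_args(argv)
```

Calling `parse_args()` with no arguments reads `sys.argv`, so it can be invoked from the script's `if __name__ == "__main__":` block, while tests can pass an explicit list.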

Please also provide a README.md that describes the script and how to run it.

You can model the command-line handling on the files here or here.

Place all of your files in the codefordc/the-rat-hack repository under a new scripts/feature_engineering/extract_occupancy_permit_features/ folder.

Hints:
The solution to Hackathon issue_3 may provide some helpful inspiration for the data cleaning steps.
