Releases: pritamdodeja/tft_tasks
v0.04
What's changed?
- TensorFlow Data Validation now generates the statistics that drive the schema
- Simplification of MLMetaData class to determine datatypes at runtime
- Updated README.md with more information about the design
- Updated comments in code to describe how to run this interactively using argparse
What's next?
- Further simplification before adding more features.
v0.03
Release Notes v0.03
What's changed?
- Cleaner namespaces and safer implementation by using data classes and frozen sets.
- Refactored functions to avoid repetition wherever possible.
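As an illustration of the data class and frozen set idea, here is a minimal sketch. The class name `FeatureGroups` and the feature names are hypothetical and are not taken from tft_tasks; the point is only that a frozen data class holding frozen sets cannot be reassigned or mutated by accident.

```python
from dataclasses import dataclass, FrozenInstanceError


# Hypothetical container; the real project may organize its features differently.
@dataclass(frozen=True)
class FeatureGroups:
    numeric: frozenset
    categorical: frozenset


groups = FeatureGroups(
    numeric=frozenset({"fare", "distance"}),
    categorical=frozenset({"day_of_week"}),
)

# Attributes of a frozen data class cannot be reassigned.
try:
    groups.numeric = frozenset()
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# frozenset has no mutating methods such as add(), so its contents are fixed too.
has_add = hasattr(groups.numeric, "add")
print(reassigned, has_add)
```

Grouping related names in one immutable object keeps them out of the module namespace and makes accidental modification fail loudly instead of silently.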
What's next?
- Better testing and documentation. A major reason for the refactoring was to make unit tests easier to write.
- I will try out this framework on some datasets to see if the structure/relationships can be simplified.
Full Changelog: v0.02...v0.03
v0.02
What's Changed
- Better documentation of functions in tft_tasks.py.
- Cleanup of dependencies in task functions.
- Fixed a bug related to argparse behavior caused by interaction with Apache Beam pipeline options.
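The release notes do not say how the argparse bug was fixed. One common way to keep a script's own flags from colliding with Apache Beam pipeline options is `parse_known_args`, sketched below using only the standard library. The `--task` flag and the function name `parse_task_args` are hypothetical.

```python
import argparse


def parse_task_args(argv):
    """Split argv into arguments this script knows and leftovers for Beam."""
    parser = argparse.ArgumentParser(description="hypothetical task runner")
    # --task is a made-up flag used only for this illustration.
    parser.add_argument("--task", default="train")
    # parse_known_args returns (namespace, list_of_unrecognized_arguments)
    # instead of exiting with an error on flags it does not know.
    known, pipeline_args = parser.parse_known_args(argv)
    return known, pipeline_args


known, pipeline_args = parse_task_args(
    ["--task", "transform", "--runner=DirectRunner"]
)
print(known.task)
print(pipeline_args)
```

The leftover list can then be handed to Beam's pipeline options object, so that each parser only sees the flags it understands.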
Future Plans
- Will be driven by feedback. I am thinking of extending this to include other components of TFX and to provide richer performance information. I appreciate your feedback about the design and implementation of tft_tasks. Thank you!
Full Changelog: v0.01...v0.02
v0.01
What's Changed
- First release. Supports the following functionality:
  - Sample implementation of an end-to-end ML pipeline in tft_tasks.py using TensorFlow Transform.
  - Instrumentation of the pipeline via TracePath in trace_path.py that captures and visualizes the interrelationships among the various functions.
  - An extensible task-based framework via task_dag in tft_tasks.py that captures the dependencies among the various functions that comprise the pipeline, and handles executing prerequisites.
  - Maintenance of task state via task_state_dictionary in tft_tasks.py so as not to repeat tasks that have already been performed.
  - Execution performance is captured, but not currently exposed directly to the user. Performance data is available in the MyTracePath object.
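The task framework described above can be pictured with a small sketch. This is not the code in tft_tasks.py: the names `example_dag`, `example_state`, `run_task`, and the three task names are hypothetical, and the real task_dag and task_state_dictionary may be shaped differently. It only shows the general pattern of running prerequisites first and skipping tasks that are already done.

```python
from typing import Dict, List

# Hypothetical dependency map: task name -> list of prerequisite task names.
example_dag: Dict[str, List[str]] = {
    "clean": [],
    "transform": ["clean"],
    "train": ["transform"],
}

# Hypothetical state map: task name -> whether it has already been performed.
example_state: Dict[str, bool] = {name: False for name in example_dag}

# Records the order in which tasks actually ran.
execution_log: List[str] = []


def run_task(name: str) -> None:
    """Run a task after its prerequisites, skipping anything already done."""
    if example_state[name]:
        return
    for prerequisite in example_dag[name]:
        run_task(prerequisite)
    execution_log.append(name)  # stand-in for the real work of the task
    example_state[name] = True


run_task("train")      # runs clean, transform, then train
run_task("transform")  # already performed, so nothing is repeated
print(execution_log)   # ['clean', 'transform', 'train']
```

The sketch assumes the dependency map has no cycles; a real implementation would need to detect or rule them out.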
Future Plans
- Will be driven by feedback. I am thinking of extending this to include other components of TFX and to provide richer performance information. I appreciate your feedback about the design and implementation of tft_tasks. Thank you!
Full Changelog: https://github.com/pritamdodeja/tft_tasks/commits/v0.01