09 Dec 15:08

pritamdodeja

v0.04 Latest

What's changed?

TensorFlow Data Validation now generates the statistics that drive the schema
Simplified the MLMetaData class so that it determines data types at runtime
Updated README.md with more information about the design
Updated comments in code to describe how to run this interactively using argparse
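
As a rough illustration of what "statistics that drive the schema" can mean, the sketch below uses only the Python standard library as a stand-in; it does not call TensorFlow Data Validation. The names `compute_statistics` and `infer_schema`, the type labels, and the sample data are all hypothetical and are not taken from tft_tasks.

```python
import csv
import io


def _kind(value: str) -> str:
    """Classify one raw CSV cell as INT, FLOAT, or BYTES."""
    try:
        int(value)
        return "INT"
    except ValueError:
        pass
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        return "BYTES"


def compute_statistics(csv_text: str) -> dict:
    """Count, per column, how many non-empty cells fall into each kind."""
    stats = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            counts = stats.setdefault(column, {"INT": 0, "FLOAT": 0, "BYTES": 0})
            if value:  # skip empty or missing cells
                counts[_kind(value)] += 1
    return stats


def infer_schema(stats: dict) -> dict:
    """Pick the narrowest type that can hold every observed value."""
    schema = {}
    for column, counts in stats.items():
        if counts["BYTES"]:
            schema[column] = "BYTES"
        elif counts["FLOAT"]:
            schema[column] = "FLOAT"
        else:
            schema[column] = "INT"  # also the fallback for all-empty columns
    return schema


# Hypothetical sample data; the data types are discovered at runtime.
SAMPLE = "fare,passengers,vendor\n12.5,1,A\n7,2,B\n"
schema = infer_schema(compute_statistics(SAMPLE))
print(schema)
```

The point of the sketch is only the direction of the dependency: the schema is derived from observed statistics instead of being written by hand.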

What's next?

Further simplification before adding more features.

17 Jun 22:35

pritamdodeja

v0.03

Release Notes v0.03

What's changed?

Cleaner namespaces and safer implementation by using data classes and frozen sets.
Refactored functions to avoid repetition wherever possible.
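
A minimal sketch of why data classes and frozen sets make for a safer implementation, assuming a settings container along these lines. `PipelineConfig` and its field names are hypothetical and are not taken from tft_tasks.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings container; field names are illustrative only."""

    working_directory: str
    batch_size: int = 64
    # frozenset is immutable, so it is safe to use as a default value.
    numeric_columns: frozenset = frozenset()


config = PipelineConfig(
    working_directory="/tmp/example",
    numeric_columns=frozenset({"fare", "distance"}),
)

# Accidental reassignment raises instead of silently changing shared state.
try:
    config.batch_size = 128
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

print(mutation_blocked, sorted(config.numeric_columns))
```

Grouping settings on one object keeps names out of the module namespace, and freezing both the object and its collections means a task cannot change configuration that another task relies on.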

What's next?

Better testing and documentation. A major reason for the refactoring was to make unit tests easier to write.
I will try out this framework on some datasets to see if the structure/relationships can be simplified.

Full Changelog: v0.02...v0.03

10 Jun 13:33

pritamdodeja

v0.02

What's Changed

Better documentation of functions in tft_tasks.py.
Cleanup of dependencies in task functions.
Fixed a bug in argparse behavior caused by an interaction with Apache Beam pipeline options.
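
These notes do not say how the bug was fixed. One common way to keep a script's own flags separate from flags meant for another consumer is `argparse`'s `parse_known_args`, sketched below. The `--task` flag and the `split_arguments` helper are hypothetical, and this is not necessarily the approach used in tft_tasks.

```python
import argparse


def split_arguments(argv):
    """Parse the script's own flags and return everything else untouched."""
    parser = argparse.ArgumentParser(description="Hypothetical task runner")
    parser.add_argument("--task", default="all", help="Name of the task to run")
    # parse_known_args does not fail on unrecognized flags; it returns them.
    known, remaining = parser.parse_known_args(argv)
    return known, remaining


known, remaining = split_arguments(["--task", "clean", "--runner=DirectRunner"])
print(known.task)
print(remaining)
# The leftover flags could then be handed to the pipeline's own option
# parser (for Apache Beam, for example, PipelineOptions); not done here.
```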

Future Plans

Future work will be driven by feedback. I am considering extending this to include other components of TFX and to provide richer performance information. I appreciate your feedback about the design and implementation of tft_tasks. Thank you!

Full Changelog: v0.01...v0.02

09 Jun 21:05

pritamdodeja

v0.01

What's Changed

First Release. Supports the following functionality:

Sample implementation of an end-to-end ML pipeline in tft_tasks.py using TensorFlow Transform.
Instrumentation of the pipeline via TracePath in trace_path.py that captures and visualizes the interrelationships amongst the various functions.
An extensible task-based framework via task_dag in tft_tasks.py that captures the dependencies among the functions that make up the pipeline and runs each task's prerequisites first.
Maintenance of task state via task_state_dictionary in tft_tasks.py so as not to repeat tasks that have already been performed.
Execution performance is captured, but not currently exposed directly to the user. Performance data is available in the MyTracePath object.
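
The task framework described above can be sketched roughly as follows. The names `task_dag` and `task_state_dictionary` come from these notes, but their structure here, the task names, the `run_task` helper, and the `timings` dictionary are assumptions made for illustration and may differ from what tft_tasks.py actually does.

```python
import time

# Assumed shape: each task name maps to the tasks it depends on.
task_dag = {
    "clean": [],
    "transform": ["clean"],
    "train": ["transform"],
}

# Assumed shape: True once a task has completed.
task_state_dictionary = {name: False for name in task_dag}

timings = {}          # hypothetical store for per-task durations
execution_order = []  # records which tasks actually ran, in order


def run_task(name):
    """Run a task after its prerequisites, skipping completed work."""
    if task_state_dictionary[name]:
        return
    for prerequisite in task_dag[name]:
        run_task(prerequisite)
    start = time.perf_counter()
    execution_order.append(name)  # stand-in for the task's real work
    timings[name] = time.perf_counter() - start
    task_state_dictionary[name] = True


run_task("train")      # runs clean, transform, then train
run_task("transform")  # already done, so nothing runs again
print(execution_order)
```

This sketch assumes the graph has no cycles; a cyclic dependency would recurse without end, so a real implementation would need to guard against that.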

Future Plans

Future work will be driven by feedback. I am considering extending this to include other components of TFX and to provide richer performance information. I appreciate your feedback about the design and implementation of tft_tasks. Thank you!

Full Changelog: https://github.com/pritamdodeja/tft_tasks/commits/v0.01
