Releases: pritamdodeja/tft_tasks

v0.04

09 Dec 15:08

What's changed?

  • TensorFlow Data Validation now generates the statistics that drive the schema.
  • Simplified the MLMetaData class so that it determines data types at runtime.
  • Updated README.md with more information about the design.
  • Updated code comments to describe how to run the code interactively using argparse.

What's next?

  • Further simplification before adding more features.

v0.03

17 Jun 22:35

Release Notes v0.03

What's changed?

  • Cleaner namespaces and safer implementation by using data classes and frozen sets.
  • Refactored functions to avoid repetition wherever possible.
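As a rough illustration of why data classes and frozen sets make for a safer implementation, here is a minimal sketch. The class name `FeatureGroups`, its fields, and the feature names are hypothetical and are not taken from tft_tasks; the point is only that a frozen data class rejects attribute reassignment and a frozenset has no mutating methods.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FeatureGroups:
    """Hypothetical container; the field names are illustrative only."""

    numeric: frozenset = frozenset()
    categorical: frozenset = frozenset()

    def all_features(self) -> frozenset:
        # Set union returns a new frozenset; the originals are untouched.
        return self.numeric | self.categorical


groups = FeatureGroups(
    numeric=frozenset({"fare", "distance"}),
    categorical=frozenset({"payment_type"}),
)

# Reassigning a field on a frozen data class raises FrozenInstanceError.
try:
    groups.numeric = frozenset()
except FrozenInstanceError:
    reassignment_blocked = True
else:
    reassignment_blocked = False
```

Grouping related names in one immutable object also keeps them out of the module namespace, which is one plausible reading of "cleaner namespaces" here.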

What's next?

  • Better testing and documentation. A major reason for the refactoring was to make unit tests easier to write.
  • I will try out this framework on some datasets to see if the structure/relationships can be simplified.

Full Changelog: v0.02...v0.03

v0.02

10 Jun 13:33

What's Changed

  1. Better documentation of functions in tft_tasks.py.
  2. Cleanup of dependencies in task functions.
  3. Fixed a bug related to argparse behavior caused by interaction with Apache Beam pipeline options.
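The release notes do not say what the argparse bug was or how it was fixed. One common way to keep a script's own flags from colliding with Apache Beam's is `parse_known_args`, sketched below using only the standard library. The `--task` flag and the `split_args` helper are hypothetical and not part of tft_tasks.

```python
import argparse


def split_args(argv):
    """Separate this script's flags from everything else on the command line."""
    parser = argparse.ArgumentParser(description="hypothetical task runner")
    parser.add_argument("--task", default="main")  # hypothetical flag
    # parse_known_args returns unrecognized arguments instead of exiting.
    known, remaining = parser.parse_known_args(argv)
    return known, remaining


known, remaining = split_args(["--task", "train", "--runner=DirectRunner"])
# `remaining` could then be handed to Beam, e.g. PipelineOptions(flags=remaining).
```

With this split, the script never sees Beam's options as errors, and Beam never sees the script's flags.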

Future Plans

  • Will be driven by feedback. I am considering extending this to include other components of TFX and providing richer performance information. I appreciate your feedback about the design and implementation of tft_tasks. Thank you!

Full Changelog: v0.01...v0.02

v0.01

09 Jun 21:05

What's Changed

  • First Release. Supports the following functionality:
  1. Sample implementation of an end-to-end ML pipeline in tft_tasks.py using TensorFlow Transform.
  2. Instrumentation of the pipeline via TracePath in trace_path.py that captures and visualizes the interrelationships amongst the various functions.
  3. An extensible task based framework via task_dag in tft_tasks.py that captures the dependencies among the various functions that comprise the pipeline, and handles executing pre-requisites.
  4. Maintenance of task state via task_state_dictionary in tft_tasks.py so as not to repeat tasks that have already been performed.
  5. Execution performance is captured, but not currently exposed directly to the user. Performance data is available in the MyTracePath object.
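The interplay between a task DAG and a task state dictionary (items 3 and 4) might look roughly like the sketch below. Only the names `task_dag` and `task_state_dictionary` come from the release notes; their structure here, the `run_task` helper, the `actions` mapping, and the task names are assumptions made for illustration, not the actual tft_tasks implementation.

```python
def run_task(name, task_dag, task_state_dictionary, actions):
    """Run a task's prerequisites first, then the task, skipping completed work."""
    if task_state_dictionary.get(name):
        return  # already performed; do not repeat it
    for prerequisite in task_dag.get(name, ()):
        run_task(prerequisite, task_dag, task_state_dictionary, actions)
    actions[name]()
    task_state_dictionary[name] = True


# Assumed shape: each task maps to the tasks it depends on (hypothetical names).
task_dag = {
    "clean": (),
    "transform": ("clean",),
    "train": ("transform",),
    "evaluate": ("train", "transform"),
}

execution_log = []
# Each action just records its own name so the execution order is visible.
actions = {name: (lambda name=name: execution_log.append(name)) for name in task_dag}

task_state_dictionary = {}
run_task("evaluate", task_dag, task_state_dictionary, actions)
run_task("train", task_dag, task_state_dictionary, actions)  # no-op: already done
```

This sketch assumes the graph is acyclic and does no cycle detection; a circular dependency would recurse until Python's recursion limit is hit.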

Future Plans

  • Will be driven by feedback. I am considering extending this to include other components of TFX and providing richer performance information. I appreciate your feedback about the design and implementation of tft_tasks. Thank you!

Full Changelog: https://github.com/pritamdodeja/tft_tasks/commits/v0.01