DataGristle is a Python toolbox of tough and flexible data connectors and analyzers. It is an interactive mix of ETL and data analysis, optimized for rapid analysis and manipulation of a wide variety of data.
It's neither an enterprise ETL tool, nor an enterprise analysis, reporting, or data mining tool. It's intended to be an easily adopted tool for technical analysts that combines the most useful subset of data transformation and analysis capabilities necessary to do 80% of the work. Its open source Python codebase allows it to be easily extended with custom code to handle that always-challenging last 20%.
Next Steps:
README markdown
attractive PDF output of gristle_determinator.py
metadata database population
Its objectives include:
multi-platform (Linux, Mac OS, Windows)
multi-language (primarily Python)
free - no cripple-licensing
primary audience is programming data analysts - not non-technical analysts
primary environment is the command line rather than windows, graphical desktop or Eclipse
extensible
allows bi-directional iteration between ETL and data analysis
can quickly perform initial data analysis prior to longer-duration, deeper analysis with heavier-weight tools
EXISTING UTILITIES:
gristle_determinator.py
Identifies file formats, generates metadata, prints file analysis report
gristle_diff.py
Shows differences between two files
gristle_file_converter.py
Converts a CSV file from one dialect to another. Can handle multi-character field delimiters as well as record delimiters.
gristle_filter.py
Applies simple filter logic to a file.
Very simplistic utility.
gristle_freq.py
Prints a frequency distribution of any column of an input file.
gristle_graphviz_generator.py
Generates a Graphviz dot file based upon an input file and command-line preferences.
gristle_scalar.py
Performs scalar operations (min, max, avg, count unique, etc.) on a file.
Very simplistic utility.
gristle_slicer.py
Used to extract a subset of columns and rows out of an input file.
gristle_viewer.py
Shows one record from a file at a time - formatted based on metadata.
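To give a feel for the kind of work these utilities automate, here is a minimal sketch of two of the ideas above: guessing a file's CSV dialect and printing a frequency distribution of one column. It uses only the Python standard library and is not DataGristle's own code; the sample data and the function names sniff_dialect and column_freq are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: a small pipe-delimited file held in memory.
SAMPLE = "id|state|amount\n1|CO|10\n2|NM|25\n3|CO|7\n4|CO|12\n"


def sniff_dialect(text):
    """Guess the CSV dialect of a text sample from a few candidate delimiters."""
    return csv.Sniffer().sniff(text, delimiters=",|\t;")


def column_freq(text, dialect, col, has_header=True):
    """Count how often each value occurs in column number col (0-based)."""
    rows = csv.reader(io.StringIO(text), dialect)
    if has_header:
        next(rows, None)  # skip the header row
    # Rows too short to contain the requested column are ignored.
    return Counter(row[col] for row in rows if len(row) > col)


dialect = sniff_dialect(SAMPLE)
freq = column_freq(SAMPLE, dialect, col=1)

# Print values from most to least common.
for value, count in freq.most_common():
    print(value, count)
```

The gristle utilities wrap this sort of logic behind a command-line interface, so an analyst does not have to rewrite it for every new file.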
Requirements:
· Python