Data exploration tool backed by DuckDB with publishing to Posit Connect
LookSee is a Python-based data investigation tool designed to help users explore datasets interactively. It leverages DuckDB for backend storage and querying, Streamlit for building an interactive UI, and Quarto for generating polished reports. LookSee is ideal for data analysts and developers who want a lightweight yet powerful tool for data exploration.
- Data Ingestion:
  - Supports multiple file formats (CSV, Parquet, JSON) using DuckDB.
  - Automatically extracts metadata from datasets.
- Column Explorer:
  - Provides summary statistics (e.g., min, max, mean, standard deviation).
  - Displays null counts and unique value counts for each column.
- Interactive Streamlit App:
  - Upload datasets or use demo datasets.
  - Explore metadata and column summaries interactively.
- Quarto Report Integration:
  - Generate polished HTML reports from `.qmd` templates.
  - Publish reports to Posit Connect directly from LookSee.
- Configurable via TOML:
  - Centralised configuration for file type mappings, logging, and settings.
- Python 3.11 or later.
- Install the following tools:
  - Quarto CLI: for rendering `.qmd` files.
  - Posit Connect CLI (`rsconnect-python`): for deploying Streamlit apps or Quarto reports to Posit Connect.
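To confirm the prerequisites before installing, a small script like the one below can help. It is a sketch, not part of LookSee: `check_prerequisites` is a hypothetical helper that only checks the interpreter version and whether the `quarto` and `rsconnect` executables are on `PATH`; it does not check tool versions.

```python
import shutil
import sys

# Minimum Python version from the prerequisites above.
MIN_PYTHON = (3, 11)

# Executables installed by the Quarto CLI and rsconnect-python.
REQUIRED_TOOLS = ("quarto", "rsconnect")


def check_prerequisites() -> dict[str, bool]:
    """Report whether each prerequisite appears to be satisfied (hypothetical helper)."""
    label = f"python>={MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
    results = {label: sys.version_info >= MIN_PYTHON}
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not found on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK' if ok else 'MISSING':8} {name}")
```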
- Clone this repository: `git clone https://github.com/your-repo/looksee.git && cd looksee`
- Install dependencies using `pip` or `uv`: `pip install -r requirements.txt` (or `uv pip install -r requirements.txt`)
- Ensure `looksee.toml` is present in the root directory.
Start the Streamlit app with `streamlit run streamlit_app.py` to explore datasets interactively. In the app you can:
- Upload your dataset (CSV, Parquet, JSON) or use demo datasets.
- View metadata such as column names, data types, null counts, and unique counts.
- Explore summary statistics for individual columns.
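LookSee computes the null and unique counts through DuckDB; its exact queries are not shown here. To make the two metrics concrete, the sketch below computes them with Python's standard `csv` module instead. `csv_metadata` is a hypothetical helper, and treating empty fields as nulls is an assumption of this sketch (DuckDB's CSV reader applies its own null handling). Unique counts exclude nulls, mirroring SQL `COUNT(DISTINCT ...)`.

```python
import csv
import io


def csv_metadata(text: str) -> dict[str, dict[str, int]]:
    """Return null and unique counts per column for CSV text (hypothetical helper).

    Empty strings and missing fields are counted as nulls.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    nulls = {col: 0 for col in columns}
    uniques: dict[str, set[str]] = {col: set() for col in columns}
    for row in reader:
        for col in columns:
            value = row[col]  # None if the row has fewer fields than the header
            if value is None or value == "":
                nulls[col] += 1
            else:
                uniques[col].add(value)
    return {
        col: {"null_count": nulls[col], "unique_count": len(uniques[col])}
        for col in columns
    }


sample = "name,age\nAda,36\nGrace,\nAda,36\n"
print(csv_metadata(sample))
# {'name': {'null_count': 0, 'unique_count': 2}, 'age': {'null_count': 1, 'unique_count': 1}}
```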
Generate a polished report from a `.qmd` template:
- Render the report locally: `quarto render report.qmd`
- Publish the report to Posit Connect: `quarto publish connect report.qmd --server https://your-connect-url/`
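To drive these commands from a Python script, one option is to build the argument lists and pass them to `subprocess.run`. This is a minimal sketch: `quarto_command` and `run` are hypothetical helpers, and only the `quarto render` and `quarto publish connect ... --server` invocations come from the commands above.

```python
import subprocess
from typing import Optional


def quarto_command(action: str, report: str, server: Optional[str] = None) -> list[str]:
    """Build the Quarto CLI argument list to render or publish a report (hypothetical helper)."""
    if action == "render":
        return ["quarto", "render", report]
    if action == "publish":
        cmd = ["quarto", "publish", "connect", report]
        if server:
            cmd += ["--server", server]
        return cmd
    raise ValueError(f"unknown action: {action!r}")


def run(cmd: list[str]) -> None:
    # check=True raises CalledProcessError if Quarto exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

For example, `run(quarto_command("render", "report.qmd"))` renders the report locally. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.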
You can also use LookSee programmatically in your Python projects:
from looksee import LookSee
# Initialise LookSee
looksee = LookSee()
# Ingest data
looksee.ingest_data("data.csv")
# Extract metadata
looksee.extract_metadata()
print(looksee.display_metadata())
# Get column summary
summary = looksee.column_summary("age")
print(summary)
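The implementation of `column_summary` is not shown in this README. As a sketch of the kind of aggregate query DuckDB could run for the Column Explorer statistics, the hypothetical helper below builds one SQL statement for a single column, assuming the table is named `dataset` (the `default_table_name` in `looksee.toml`). The mean and standard deviation only make sense for numeric columns.

```python
def column_summary_sql(column: str, table: str = "dataset") -> str:
    """Build a DuckDB query returning summary statistics for one column (hypothetical helper).

    Identifiers are double-quoted, with embedded quotes doubled, so names
    containing spaces or reserved words still parse.
    """
    def quote(name: str) -> str:
        return '"' + name.replace('"', '""') + '"'

    col, tbl = quote(column), quote(table)
    return (
        f"SELECT min({col}) AS min_value, max({col}) AS max_value, "
        f"avg({col}) AS mean, stddev_samp({col}) AS std_dev, "
        # count(*) counts all rows; count(col) skips NULLs, so the difference is the null count.
        f"count(*) - count({col}) AS null_count, "
        f"count(DISTINCT {col}) AS unique_count "
        f"FROM {tbl}"
    )


print(column_summary_sql("age"))
```

With an open DuckDB connection `con`, the query would run as `con.execute(column_summary_sql("age")).fetchone()`.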
The `looksee.toml` file contains all configuration options:
[read_functions]
csv = "read_csv_auto"
parquet = "read_parquet"
json = "read_json_auto"
[settings]
default_table_name = "dataset"
log_file = "looksee.log"
- Add support for new file formats by extending the `[read_functions]` section.
- Update logging settings in the `[settings]` section.
Run the tests with `pytest tests/` to validate LookSee's functionality. Ensure a sample dataset (e.g., `sample.csv`) is present in the `tests/` directory first.
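If `tests/sample.csv` is missing, a small helper can generate one. This is a sketch: `write_sample_csv` is hypothetical, and the columns (`id`, `name`, `age`) and rows are illustrative assumptions that should be matched to whatever `tests/test_looksee.py` expects.

```python
import csv
from pathlib import Path

# Hypothetical columns and rows; adjust to match the existing tests.
SAMPLE_ROWS = [
    {"id": "1", "name": "Ada", "age": "36"},
    {"id": "2", "name": "Grace", "age": ""},  # blank value to exercise null counts
    {"id": "3", "name": "Alan", "age": "41"},
]


def write_sample_csv(path: Path) -> Path:
    """Write a small CSV fixture, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "name", "age"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path
```

For example, `write_sample_csv(Path("tests") / "sample.csv")` creates the file before running `pytest tests/`.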
- Add your Posit Connect server and authenticate with your API key (`rsconnect-python` saves both under the nickname given by `--name`): `rsconnect add --server https://your-connect-url/ --name posit-server --api-key YOUR_API_KEY`
- Deploy the app (`--name` selects the saved server; `--title` sets the content title): `rsconnect deploy streamlit --name posit-server --entrypoint streamlit_app.py --title looksee-app .`
- Render and publish the report directly: `quarto publish connect report.qmd --server https://your-connect-url/`
looksee/
├── looksee.py # Core LookSee class implementation
├── streamlit_app.py # Streamlit app for interactive exploration
├── report.qmd # Quarto template for generating reports
├── looksee.toml # Configuration file (file mappings, settings)
├── requirements.txt # Python dependencies
├── tests/
│ ├── test_looksee.py # Pytest tests for LookSee class
│ └── sample.csv # Sample dataset for testing
└── README.md # Project documentation (this file)
- Add support for additional file formats (e.g., Excel, SQLite).
- Enable advanced filtering and querying directly in the Streamlit app.
- Automate deployment workflows using GitHub Actions or similar CI/CD tools.
- Add more visualisation options (e.g., histograms or sparklines).
Contributions are welcome! Please follow these steps:
- Fork this repository.
- Create a feature branch (`git checkout -b feature-name`).
- Commit your changes (`git commit -m "Add feature"`).
- Push to your branch (`git push origin feature-name`).
- Open a pull request.
This project is licensed under the MIT License.