Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a convenient way to visualize an sqlite3.Cursor in a terminal (and maybe also Jupyter) #128905

Open
jpivarski opened this issue Jan 16, 2025 · 1 comment
Labels
stdlib Python modules in the Lib dir topic-sqlite3 type-feature A feature request or enhancement

Comments

@jpivarski
Copy link

jpivarski commented Jan 16, 2025

Feature or enhancement

Proposal:

I think that the sqlite3 module is a hidden gem in the Python standard library: the functionality it provides is like a mini-Pandas, similar to the array module, which is a mini-NumPy. The only problem is that it's more programmatic than interactive, in that it's hard to look at the results of a query without either writing a loop to iterate over sqlite3.Cursor results or passing the query to Pandas (which instantiates all of the results in memory).

This came up in a problem I was working on and I didn't find any standard solutions for visualizing an SQLite query, so I wrote one on StackOverflow. Just it's just complex enough that variants of code like this shouldn't be floating around the web; a single version should be built into the sqlite3.Cursor object.

I'd like a method that you can use like this:

db.execute("SELECT * FROM compleat_works").print()

that prints a table like this:

title           | type      | characters | year_low | year_high
----------------+-----------+------------+----------+----------
'The Sonnets'   | 'poetry'  |       None |     1609 |      1609
'All’s Well tha | 'comedy'  |         23 |     1604 |      1605
'The Tragedy of | 'tragedy' |         42 |     1606 |      1606
'As You Like It | 'comedy'  |         27 |     1599 |      1600
'The Comedy of  | 'comedy'  |         18 |     1594 |      1594
...             | ...       |        ... |      ... |       ...
'A Lover’s Comp | 'poetry'  |       None |     1609 |      1609
'The Passionate | 'poetry'  |       None |     1599 |      1599
'The Phoenix an | 'poetry'  |       None |     1601 |      1601
'The Rape of Lu | 'poetry'  |          2 |     1594 |      1594
'Venus and Adon | 'poetry'  |          2 |     1593 |      1593
--- 44 rows ----+-----------+------------+----------+----------

Here's my sample implementation:

def print(
    self,
    *,
    first: int = 5,
    last: int = 5,
    show_header: bool = True,
    show_count: bool = True,
    total_width: int = 80,
    max_width: int = 15,
    stream = sys.stdout,
):
    # name of each column
    header = [x[0] for x in self.description]
    # width of the repr of each column
    widths = [min(len(x), max_width) for x in header]

    # iterate over the whole Cursor, but only keep `first + 2*last` rows in memory
    first_rows = []
    last_rows = []
    row_count = 0
    for row in self:
        if len(first_rows) < first + last:
            first_rows.append(row)
        last_rows.append(row)
        if len(last_rows) > last:
            last_rows = last_rows[1:]
        row_count += 1

    if row_count <= first + last:
        # if the number of rows <= `first + last`, show them all
        rows = first_rows
    else:
        # otherwise, show the `first`, an ellipsis row, and then `last`
        rows = first_rows[:first] + [[...] * len(header)] + last_rows

    # represent rows with mutable lists so that we can replace them with reprs
    rows = [list(x) for x in rows]

    align = [">"] * len(header)
    for row in rows:
        assert len(row) == len(header), f"{len(row)} columns != {len(header)} columns"
        for i, cell in enumerate(row):
            # if all values are str or bytes (ignoring None), left-align
            if cell != ... and isinstance(cell, (str, bytes)) and cell is not None:
                align[i] = "<"
            # replace data with their repr strings (except ellipsis)
            row[i] = "..." if cell == ... else repr(cell)
            # identify the maximum (string) width of each column, up to max_width
            widths[i] = min(max(widths[i], len(row[i])), max_width)

    # if the table is too wide, replace the last column with ellipsis
    if sum(widths) + (len(widths) - 1) * 2 > total_width:
        header[-1] = "..."
        widths[-1] = 3
        for row in rows:
            row[-1] = "..."

    # if the table is still too wide, remove columns
    while sum(widths) + (len(widths) - 1) * 2 > total_width and len(header) > 1:
        del header[-2]
        del widths[-2]
        for row in rows:
            del row[-2]

    # prepare a format string for each line of text
    formatter = " | ".join(f"{{:{a}{w}s}}" for a, w in zip(align, widths))
    # prepare the horizontal line between header and data
    header_separator = "-+-".join("-" * w for w in widths)

    if show_header:
        # print the table column names and a horizontal line under it
        stream.write(formatter.format(*[x[:w] for x, w in zip(header, widths)]) + "\n")
        stream.write(header_separator + "\n")
    for row in rows:
        # print each table row
        stream.write(formatter.format(*[x[:w] for x, w in zip(row, widths)]) + "\n")
    if show_count:
        # print the number of rows in another horizontal line
        count = f"--- {row_count} rows ---"
        stream.write(count + header_separator[len(count) :] + "\n")

And/or maybe a head method that only shows the beginning, rather than both the beginning and end (to avoid iterating over millions of rows). Maybe also a _repr_html_ method for IPython/Jupyter, but that's probably more specific than the standard model should be (only useful in a third-party library, albeit a major one).

What do you think? Is this going against the direction you want sqlite3 to go? (E.g. are you trying to reduce its presence, rather than increase it?) Is the idea sound, but the method names (print, head) are bad? Is it a problem that it "eats" the Cursor, and should instead be a method on sqlite3.Connection that takes a query: str?

I'm willing to write the PR; I just want to start by checking to see if such a PR would be welcome.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

@jpivarski jpivarski added the type-feature A feature request or enhancement label Jan 16, 2025
@erlend-aasland
Copy link
Contributor

Did you try the sqlite3 REPL? You can run it using python3 -m sqlite3 with Python 3.12 or newer (IIRC). When we added the REPL, we discussed the possibility of adding support for some of the SQLite shell commands, like .headers [on|off]. Take a look at Lib/sqlite3/main.py if you want to take a stab at it.

@picnixz picnixz added extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir and removed extension-modules C modules in the Modules dir labels Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-sqlite3 type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants