Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Horrible query performance since 8.2.0+ #3325

Open
francois-egner opened this issue Oct 2, 2024 · 12 comments
Open

Horrible query performance since 8.2.0+ #3325

francois-egner opened this issue Oct 2, 2024 · 12 comments

Comments

@francois-egner
Copy link

francois-egner commented Oct 2, 2024

I was trying out drizzle, using a PostgreSQL database (v15). I experience a huge performance issue just querying one row of a table with just one row. So I dig a little deeper inside the code and found out that it internally calls the query method of the Client class. I also tried to call the query method of the Client class with a plain query text/string which resulted in a way faster performance. I was able to reproduce everything using the plain pg module:

const pg = new Client({
   connectionString: configuration.get('DATABASE_URL')
});

pg.connect();
const config = {
  text: 'select count(*) as count from matcher.swipes where id = $1',
  rowMode: 'array'
}
const result = await pg.query(config, ['3a1ec2e5-41b9-4866-a210-f5102f66ca4d'])

This pg.query call takes 40-50ms!

const result = await pg.query("select count(*) as count from matcher.swipes where id = '3a1ec2e5-41b9-4866-a210-f5102f66ca4d';")

This pg.query call takes 2-3ms

By downgrading over and over again I found out that version 8.1.0 is the last version that produces the same timing results for both query calls. Now I wonder what I am missing here. I wonder what I am doing wrong and why it seems to only be me having this problem...

@Dimitris-Tzilopoylos
Copy link

Hi there,

I believe this issue might be a duplicate of #3098. It seems to cover similar concerns.

@charmander
Copy link
Collaborator

(8.1.0 → 8.2.0 is when the entire protocol message parsing/serialization was rewritten into pg-protocol.)

This probably isn’t related to the issue, but note that Client#connect() is asynchronous; to avoid involving the query queue, make sure to await pg.connect().

@francois-egner
Copy link
Author

(8.1.0 → 8.2.0 is when the entire protocol message parsing/serialization was rewritten into pg-protocol.)

This probably isn’t related to the issue, but note that Client#connect() is asynchronous; to avoid involving the query queue, make sure to await pg.connect().

Thanks for the hint. I verified that this isnt the cause for my problem. But indeed I should await this for sure :)

@francois-egner
Copy link
Author

Hi there,

I believe this issue might be a duplicate of #3098. It seems to cover similar concerns.

Very interesting. Using Node 16 also fixes the issue with the latest node-progres (8.13.0). Does downgrading to 8.1.0 solve your other problem?

@thedadow451
Copy link

thedadow451 commented Oct 29, 2024

Having the same problem, downgrading to 8.1.0 solves it. I'm using bun, so it's not node specific.

This is taking ~50ms on localhost, so I guess it's 100x slowe than it should

await client.connect();
const start = performance.now();

await client.query({
    text: 'SELECT $1',
    values: [1],
});

const end = performance.now();

@thedadow451
Copy link

thedadow451 commented Oct 31, 2024

@charmander @brianc I went down the rabbit hole of debugging the problem on network and postgres message protocol level. I also did a bisect and confirmed regression was introduced with new protocol 520bd35.

I managed to create repro using tcp protocol. (works both on postgres 9 and postgres 17, so I don't believe it's a new regression in Postgres).

Here's the repro (https://gist.github.com/thedadow451/7c184281253a826c7da374c4f0ce46b7).

It seems like it all comes down to batching (it's not about network overhead since it's localhost), but Postgres takes ~40ms to respond is all messages (parse, bind, execute, sync) are not send as part of single message.

#edit
Managed to fix it locally, I can open PR if you are interested.

@charmander
Copy link
Collaborator

@thedadow451 A PR would be great if you have it, yep! Thanks for investigating.

@thedadow451
Copy link

thedadow451 commented Nov 1, 2024

Sure, I opened the PR @charmander, let me know if there is anything missing!

There seems to be 1000x difference with and without the fix. 🤯
I noticed it as it was affecting almost all of queries in drizzle-orm (my CI tests are taking x20 longer without this fix).

#3340

@francois-egner
Copy link
Author

@thedadow451 Thank you very much for actually doing the digging and providing the potential fix!
I am still wondering why its only a few people having this issue and why the current version is fast with Nodejs 16. Can you guess why after your investigation? With the current information I am not able to make a good guess here.

Thanks again @thedadow451 for the digging :)

@thedadow451
Copy link

@francois-egner

I wanted to verify node16 issue since I was using bun. Funny thing, I'm no longer able to reproduce the issue with either node16, node18 or bun.

I did some changes recently to my system like switching from podman to rootless docker, upgrading psql.

Please feel free to test patch locally (if you still have problem). It's not fixing the root cause, but could be good enough to workaround.

@francois-egner
Copy link
Author

francois-egner commented Nov 3, 2024

@thedadow451 I can confirm that your patch fixes the issue. From 50-60ms down to 5ms.
Node16 still works fine with the latest version (8.13.1). Node18 brings up bad query times again, which the patch fixes too.

@charmander
Copy link
Collaborator

charmander commented Jan 18, 2025

I’m a little surprised that multiple small synchronous writes to a net.Socket stream would translate to multiple TCP packets (as some comments above suggest happens), even with TCP_NODELAY set (which it is), but maybe that’s what changed after Node 16? (Still need to look into it.) What’s more confusing is that pg 8.1.0 also appears to call write multiple times in the same situation (_send call with no more), so I don’t know why downgrading pg would solve it.

Node 16.20.2 also sends multiple TCP packets for multiple writes, so I don’t know why downgrading to Node 16 is helping either, but of course excessive write calls would be a good thing to fix anyway.

Update: pg 8.1.0 is older than I realized, and doesn’t set TCP_NODELAY, so that explains one part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants