Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change MySQL's number handling to be more permissive #2290

Merged
merged 25 commits into from
Jun 10, 2020

Conversation

weiznich
Copy link
Member

@weiznich weiznich commented Feb 4, 2020

MySQL is a bit more... lax with types than other backends. It's very
easy to accidentally get a 64 bit integer or decimal when other backends
would continue to give you a 32 bit integer. Right now we're relying on
conversion happening in libmysqlclient for query_by_index. If we
ever want to dump libmysqlclient, we will need to move those
conversions into Diesel.

However, I've opted to do this in such a way that it affects sql_query
as well. Right now there are many differences between what our query
builder says and what MySQL does. (32 bit addition/subtraction returns
64 bit, 32 bit multiplication/division/sum return decimal). Ideally you
should be able to have a struct that derives both Queryable and
QueryableByName, and have that work with the same query built using
the query builder or sql_query.

In order for that to happen, we can't do the conversions until we hit
FromSql, since for sql_query that is the first and only time we
learn what the expected type is.

This is basically a rebase of #1457. I've adjusted the code to now already existing MysqlValue type. Additionally we now also have a lifetime on that type, so that we can just use &[u8] as raw value there, instead of the unsafe pointer/cell stuff. I've opted into not having the ffi type enum in MysqlValue, because we want to expose that type as part of #2182 and that would restrict possible implementations of the mysql backend to such ones that use libmysqlclient.
Additionally I've added a #[non_exhaustive] (requires rust 1.40) to the MysqlType enum and removed some unneed unsafe code from the date/time type conversations.

@weiznich weiznich requested a review from a team February 4, 2020 23:14
@weiznich weiznich force-pushed the rebase/1457 branch 2 times, most recently from 6b6455d to 05d3913 Compare February 4, 2020 23:16
| MYSQL_TYPE_MEDIUM_BLOB
| MYSQL_TYPE_LONG_BLOB
| MYSQL_TYPE_VAR_STRING
| MYSQL_TYPE_GEOMETRY => unimplemented!(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is how do we want to handle this case? Implementing TryFrom would imply that we change the return types in Row and NamedRow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I think we need some actual test cases demonstrating that we even can get those types in the first place to really answer this. Everything ending with 2 we should never get. I suspect everything that just has some modifier like LONG or VAR we can treat identically to the alternatives. Not sure on types like geometry and json. Need to dig more to comment on your statement about From vs TryFrom

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/c-api-data-structures.html has at least some information about MYSQL_TYPE_GEOMETRY (and some of the other unhandled types)

@sgrif
Copy link
Member

sgrif commented Feb 5, 2020

Thank you for doing the work to rebase this. I promise I will make the time this week to look at it in depth. There's a few things that stood out on initial skim that I just need to refresh my memory on

@sgrif
Copy link
Member

sgrif commented Feb 5, 2020 via email

@weiznich weiznich added this to the 2.0 milestone Feb 11, 2020
@weiznich weiznich requested a review from sgrif February 17, 2020 22:04
@weiznich weiznich force-pushed the rebase/1457 branch 2 times, most recently from 6d201ef to 538be52 Compare February 19, 2020 22:32
@weiznich
Copy link
Member Author

I think this is blocked on #2248, as clippy complains about "needless fn main in doctest". In practice they are required because of #[macro_use] extern crate diesel;. Seems to be a clippy bug, but I'm not sure if it's worth reporting.

MYSQL_TYPE_GEOMETRY |
// json type, only available on newer mysql versions
// same encoding as string
MYSQL_TYPE_JSON => unimplemented!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these appear anywhere in the docs, and it's not clear if/how we ever receive these. My guess is that they're details of the wire protocol that are handled by the client

Copy link
Member Author

@weiznich weiznich Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've just played a bit with various column types:

CREATE TABLE type_test (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    some_enum ENUM ('a', 'b', 'c') NOT NULL,
    some_set SET('set_a', 'set_b', 'set_c') NOT NULL,
    jdoc JSON NOT NULL,
    geom GEOMETRY NOT NULL,
    some_bit BIT(1) NOT NULL,    
);

INSERT INTO(some_enum, some_set, jdoc, geom, some_bit) VALUES(
    'a',
    'set_b',
    '{"some_json": 42}',
     ST_GeomFromText('POINT(1 1)'),
     b'1',
);

Now using the code from this branch to execute the following sql queries:

    #[derive(Debug, QueryableByName)]
    struct SomeEnum {
        #[sql_type = "Text"]
        some_enum: String,
    }

    #[derive(Debug, QueryableByName)]
    struct SomeSet {
        #[sql_type = "Text"]
        some_set: String,
    }

    #[derive(Debug, QueryableByName)]
    struct Json {
        #[sql_type = "Text"]
        jdoc: String,
    }

    #[derive(Debug, QueryableByName)]
    struct Geometry {
        #[sql_type = "Text"]
        geom: String,
    }

    #[derive(Debug, QueryableByName)]
    struct Bit {
        #[sql_type = "Bool"]
        some_bit: bool,
    }

    let _ = dbg!(sql_query("SELECT some_enum FROM type_tests").get_result::<SomeEnum>(&connection));
    let _ = dbg!(sql_query("SELECT some_set FROM type_tests").get_result::<SomeSet>(&connection));
    let _ = dbg!(sql_query("SELECT jdoc FROM type_tests").get_result::<Json>(&connection));
    let _ = dbg!(sql_query("SELECT some_bit FROM type_tests").get_result::<Bit>(&connection));
    let _ = dbg!(sql_query("SELECT geom FROM type_tests").get_result::<Geometry>(&connection));

yields the following results:

[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_STRING
[diesel_tests/tests/types.rs:1430] sql_query("SELECT some_enum FROM type_tests").get_result::<SomeEnum>(&connection) = Ok(
    SomeEnum {
        some_enum: "b",
    },
)
[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_STRING
[diesel_tests/tests/types.rs:1431] sql_query("SELECT some_set FROM type_tests").get_result::<SomeSet>(&connection) = Ok(
    SomeSet {
        some_set: "set_c",
    },
)
[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_BLOB
[diesel_tests/tests/types.rs:1432] sql_query("SELECT jdoc FROM type_tests").get_result::<Json>(&connection) = Ok(
    Json {
        jdoc: "{\"abc\": 42, \"def\": {\"ghj\": \"foo\"}, \"baz\": []}",
    },
)
[diesel/src/mysql/connection/bind.rs:251] tpe = MYSQL_TYPE_BIT
thread 'main' panicked at 'not yet implemented: Hit currently unsupported type, those variants should probably be variants of MysqlType or at least be mapped to one of the existing types', diesel/src/mysql/connection/bind.rs:288:32
…
[diesel_tests/tests/types.rs:1434] sql_query("SELECT geom FROM type_tests").get_result::<Geometry>(&connection) = Err(
    DatabaseError(
        __Unknown,
        "Using unsupported buffer type: 255  (parameter: 1)",
    ),
)

This means at least the MYSQL_TYPE_BIT variant occurs on client side. As for the geometry error message: It seems like my mysqlclient implementation itself does not support that type, probably related to this bugreport. On the other hand the client does receive that type value, it just cannot handle it. (For example the native rust implementation here seems to handle that type.)

On the other hand doing something like this:

  table! {
        type_tests(id) {
            id -> Integer,
            some_enum -> Text,
            some_set -> Text,
            jdoc -> Text,
            geom -> Blob,
            some_bit ->Text,
        }
    }

 let dsl_res = dbg!(type_tests::table
        .select((
            type_tests::some_enum,
            type_tests::some_set,
            type_tests::jdoc,
            type_tests::geom,
            type_tests::some_bit,
        ))
        .get_result(&connection));

seems to work just fine:

[diesel_tests/tests/types.rs:1422] type_tests::table.select((type_tests::some_enum, type_tests::some_set,
                          type_tests::jdoc, type_tests::geom,
                          type_tests::some_bit)).get_result(&connection) = Ok(
    (
        "b",
        "set_c",
        "{\"abc\": 42, \"def\": {\"ghj\": \"foo\"}, \"baz\": []}",
        [
            0,
            0,
            0,
            0,
            1,
            1,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            240,
            63,
            0,
            0,
            0,
            0,
            0,
            0,
            240,
            63,
        ],
        "\u{1}",
    ),
)

Tested with:

  • mysql Ver 15.1 Distrib 10.3.22-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
  • libmysqlclient 8.0.19-0ubuntu0.19.10.3

Double(x) => Ok(x as Self),
Decimal(bytes) => {
let string = str::from_utf8(bytes)?;
let integer_portion = string.split('.').nth(0).unwrap_or_default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be enforcing that any fractional portion is zero?

@Razican
Copy link
Member

Razican commented Apr 12, 2020

Hi, is there something we can help with?

@weiznich
Copy link
Member Author

@Razican Would be great if someone could do some more research on which of the types are now actually returned by the mysql client API. The best case would be to have an example query/schema for each type variant and some pointer to a official mysql/mariadb documentation page/source code part stating when this type is returned.

@Razican
Copy link
Member

Razican commented Apr 14, 2020

@Razican Would be great if someone could do some more research on which of the types are now actually returned by the mysql client API. The best case would be to have an example query/schema for each type variant and some pointer to a official mysql/mariadb documentation page/source code part stating when this type is returned.

Hmm I have never worked on these kind of things. Do you have documentation on where to start?

@weiznich
Copy link
Member Author

@Razican Finding documentation is basically the problem here. Everything I've already found is linked already linked in the comments above, but that does not explain/mention all type variants at all. Otherwise it's probably just experimenting by using this branch, a bunch of debug statements, a mysql installation and trying to get back different kinds of answers as sketched out in this comment. Having concrete test cases for all that cases would be surely helpful.

@weiznich
Copy link
Member Author

For anyone interested in helping out with this: I've pushed some more experimental changes here with a lot of test cases. Trying to run and fix them on different client/server combinations may be useful.

Copy link
Member

@pksunkara pksunkara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pretty thorough review. I might have missed some issues because I don't have a lot of code context yet. I just have one small minor comment. It looks like we are trying to do decimal parsing instead of reusing already existing libs.

use crate::mysql::{Mysql, MysqlValue};
use crate::sql_types::{BigInt, Binary, Double, Float, Integer, SmallInt, Text};

fn decimal_to_integer<T>(bytes: &[u8]) -> deserialize::Result<T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be offloading part of this function to BigDecimal::from_bytes which should check all the possible errors for us. And then we should retrieve the T from it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bigdecimal is currently an optional dependency. In my opinion that should stay that way if possible.

@weiznich
Copy link
Member Author

I've pushed a quite large change adding some test cases and trying to fix those on at least some database versions. See this (weiznich#5) PR for details of tested database versions. Another independent test of those assumptions would be great (especially @diesel-rs/contributors )

(The current state is open for a fundamental review, but I will probably add some more changes and quite a bit of cleanup as soon as I've time to continue working on this)

}
}

impl ToSql<sql_types::Json, MysqlValue> for serde_json::Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been meaning to send a PR. I want to do impl ToSql<Json> for String also for Pg. Can we do that here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our rules for adding ToSql imps in diesel itself are roughly :

  1. Its a commonly used type mapping
  2. It cannot fail because of invalid values. All (or almost all) valid values of the rust type must be representable by the corresponding sql type.

I cannot answer the second point with: "Yes that assumption is true" as an arbitrary string is not a valid json value for the general case. Therefore such impl does not exist yet and I would prefer to keep it that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add more support for the Json types which is why I was looking at this. I am just concerned with the performance because the deserialization and serialization happening all the time. Since I was dealing with non-user provided JSON, I wanted to map it to string with an unwrap. I understand your point though, but I would still say that we should add String conversions (thought heavily discouraged in Docs) with unwrap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example of what you are exactly trying to achieve in the end. In theory diesel already exposes some helper impls (although a bit more low level) than this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to use the postgres @> operator to compare json without me needing to understand them. Let's say I have a table rules with json column rule. I want to query the following:

select * from rules where '{"a": "read", "b": 2}'::jsonb @> rule;

Where the {"a": "read", "b": 2} is user provided but validated externally and confirmed as JSON by a faster json checker (and not deserialized because I don't need the data).

So, 2 things here:

  1. I would never want to store rule as serde::Value in my code.
  2. I would never want to pass a certain json value as serde::Value since I do not want to serialize/deserialize it.

I hope I was able to explain what I mean here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general note: Diesel cannot reason about which things are externally validated. In such cases we generally choose to enforce an internal validation as well as we cannot assume that such an external validation has really happened. For this case this means I'm quite unwilling to add a ToSql<Json, DB> for String impl here as that will clearly blow up for some use cases.
That written for a third party crate there are several possibilities to do something like that without touching serde::Value:

  • Implement a custom query node for inserting the provided json as into the query sql
  • Use a new type wrapper around String and implement ToSql<Json, DB> for this
  • Implement a cast query node, send the value as plain String/Text and cast to jsonb afterwards (technically that's what's already done in your example query)
  • Introduce a new sql type that is basically a crate local version Json and implement

Beside of that diesel does currently not implement support for json operators. I think it's quite straight forward to define them via diesel::infix_operator! in a third party crate, but adding support for a common set of such operators is definitively something I would like to see as part of diesel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't know about the insert query node and cast query node. Still working on understanding diesel more. Your answer addresses my problem.

I will definitely add support for the json operators in diesel later. Just trying to get the full overview first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is anything unclear about some internals, just keep asking questions. Unfortunately not everything is documented quite well…

About the operators: See this implementation of the postgres array operators implementation as example for built in support in diesel. Something like that also works in a third party crate.

sgrif and others added 7 commits June 3, 2020 11:54
MySQL is a bit more... lax with types than other backends. It's very
easy to accidentally get a 64 bit integer or decimal when other backends
would continue to give you a 32 bit integer. Right now we're relying on
conversion happening in `libmysqlclient` for `query_by_index`. If we
ever want to dump `libmysqlclient`, we will need to move those
conversions into Diesel.

However, I've opted to do this in such a way that it affects `sql_query`
as well. Right now there are many differences between what our query
builder says and what MySQL does. (32 bit addition/subtraction returns
64 bit, 32 bit multiplication/division/sum return decimal). Ideally you
should be able to have a struct that derives both `Queryable` and
`QueryableByName`, and have that work with the same query built using
the query builder or `sql_query`.

In order for that to happen, we can't do the conversions until we hit
`FromSql`, since for `sql_query` that is the first and only time we
learn what the expected type is.
Also add some links to relevant mysql/mariadb documentation there
Clippy (rightful) complained that we read from a potential unaligned pointer,
which would be undefined behaviour.
The problem is: Clippy continues to complain about this, even if we
use `read_unaligned()` here. This seems to be a clippy bug, see
rust-lang/rust-clippy#2881
@pksunkara
Copy link
Member

Is the only thing left is to get the CI working?

@weiznich
Copy link
Member Author

weiznich commented Jun 3, 2020

At least:

  • Getting the ci right
  • Double checking that the tests work wirh different mysql/mariadb versions
  • Writing/improving a changelog entry
  • minor style fixes

This change fixes a out of buffer write in newer versions of
libmysqlclient. That write was caused by a change in libmysqlclient
around version 8.0.19 that added another field to `MYSQL_TIME` and by
diesel using a statically generated binding (mysqlclient-sys). As
consequence diesel would allocate a to small buffer for the struct
while libmysql writes more bytes that expected, which results in a out
of bound write. Interestingly this does not happen with a recent
libmysqlclient version based on the sources in the mariadb source
code.

As a fix I move an updated version of `MYSQL_TIME` to diesel itself, as I do not have any
control over `mysqlclient-sys`. This change is compatible with older
libmysqlclient versions, because for them we just allocate a larger
buffer than they actually use, which is fine.
@weiznich
Copy link
Member Author

weiznich commented Jun 5, 2020

@diesel-rs/reviewers Assuming that the CI is passing I think this is ready for a final review

Copy link
Member

@pksunkara pksunkara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

diesel_tests/tests/types_roundtrip.rs Outdated Show resolved Hide resolved
diesel/src/sql_types/mod.rs Show resolved Hide resolved
fn arbitrary_serde<G: quickcheck::Gen>(g: &mut G, depth: usize) -> serde_json::Value {
use rand::distributions::Alphanumeric;
use rand::Rng;
match g.gen_range(0, if depth > 0 { 4 } else { 6 }) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the 4 and 6 be reverse? Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's correct this way, because:

  • It works
  • It is used to prevent endless recursion, so that we just generate objects and lists for the first level, not for anything else. That means if depth > 0, i.e. we've already recursed at least once we only generate variants that cannot recurse again.

(The second constrain can be lifted to allow some more nesting)

@weiznich weiznich merged commit 9b76d54 into diesel-rs:master Jun 10, 2020
@gpit2286
Copy link

@weiznich I don't mind that you closed #1995, but these fixes don't seem to address the issue unless I'm misreading something. Is the fix to make bool read/write as text from here: #2290 (comment)

If so, there should be a note in the docs about it, yes?

@weiznich
Copy link
Member Author

@gpit2286 This PR changes the underlying behaviour in such a way that now correctly the Bit type is used for bit columns at protocol level. This does not yet change the inferred column types by diesel_cli (that would require a follow up PR) nor does it provide a Bit sql level type as I'm not sure which type would be supported mappings on rust side. (Probably Vec<bool>?)
I'm open to any followup changes here, but I think the fundamental breaking change at protocol level listed in #1995 is done here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants