This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Enable *int64 to write as optional timestamp, *int32 to write as optional date. #472

Open
wants to merge 58 commits into
base: main
from


DuanWeiFan

At the moment, the schema does not allow a *int32 field to be mapped to a date, or a *int64 field to a timestamp.

We would like to add a check that unwraps the type to its Elem() whenever it encounters a reflect.Ptr.

This change was made in the past, but for some reason it was later reverted (here).
This PR adds similar logic back and updates schema_test.go.

@DuanWeiFan
Author

Hi, could someone take a look at this PR?
The main purpose is to support pointer fields when deriving a Parquet schema.

DuanWeiFan changed the title from "Allow *int64 to write as optional timestamp, *int32 to write as optional date." to "Enable *int64 to write as optional timestamp, *int32 to write as optional date." Apr 18, 2023
@DuanWeiFan
Author

Hi @achille-roussel, could you take a look at this PR?
At the moment, parquet.SchemaOf does not work for some pointer types. I would like to add this change so that it works for all pointer types. Thank you.

@achille-roussel

Hello @DuanWeiFan, I left Twilio so I'm currently not part of the maintainer team anymore for parquet-go.

Maybe someone else from the team can pick up the review? @kevinburkesegment @bartleyg

kevinburkesegment and others added 25 commits July 12, 2023 10:07
These aren't available outside of the segmentio organization.
The ARM64 runners were part of Segment's cloud infrastructure, which we no
longer have access to. For now, just remove the tests.
Dependabot flagged an issue; while we are at it, update all of the
dependencies.

This will fail until we rewrite all of the references in this repo to
match the new import scheme.
I noticed that the checksum of testdata/issue368.parquet was different, meaning
we were working with the wrong file.

There is no need to investigate further why we ended up with a different
file.
use interface{} instead of any
fixes segmentio#2

When reconstructing the schema for nested structs, we call the function `fieldByIndex`
to get a pointer to the underlying field value into which we assign the
values we read.

This is the doc comment on the fieldByIndex function:

```go
// fieldByIndex is like reflect.Value.FieldByIndex but returns the zero-value of
// reflect.Value if one of the fields was a nil pointer instead of panicking.
```

In theory, if we have a field of type `*Foo`, a call to fieldByIndex should do
the following when this field is not initialized yet (i.e. is nil):

- Create a zero value of `Foo`: `new(Foo)`
- Assign the zero value created to the field.

Before this change, this is what fieldByIndex was doing:

```go
			if v.IsNil() {
				v = reflect.Value{}
				break
```

`reflect.Value{}` IS NOT A ZERO VALUE of anything; it is an invalid value.

This commit ensures that we correctly initialize zero values of nested
pointer fields.
…field

fix: zero value of nested field pointer
gernest and others added 28 commits July 15, 2023 22:38
This commit uses sync.Pool and other techniques to reduce allocations and
speed up `GenericReader`.
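The general `sync.Pool` pattern can be sketched as follows. The commit does not say which objects `GenericReader` pools, so `bufferPool` and `sumInBatches` are hypothetical. The pool stores a `*[]int64` rather than a bare slice so that `Put` does not allocate when boxing the value into an interface.

```go
package main

import (
	"fmt"
	"sync"
)

// bufferPool is a hypothetical pool of reusable scratch buffers.
var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]int64, 0, 1024)
		return &b
	},
}

// sumInBatches copies values into a pooled scratch buffer batch by batch,
// standing in for a reader that decodes rows into reusable memory.
func sumInBatches(values []int64, batch int) int64 {
	bp := bufferPool.Get().(*[]int64)
	defer bufferPool.Put(bp)

	var total int64
	for start := 0; start < len(values); start += batch {
		end := start + batch
		if end > len(values) {
			end = len(values)
		}
		buf := append((*bp)[:0], values[start:end]...)
		for _, v := range buf {
			total += v
		}
		*bp = buf // keep any grown capacity for the next user of the pool
	}
	return total
}

func main() {
	fmt.Println(sumInBatches([]int64{1, 2, 3, 4, 5}, 2)) // 15
}
```

After warm-up, repeated calls reuse the same backing array instead of allocating a new one per call, which is the kind of effect reflected in the B/op and allocs/op columns of the benchmarks.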

Here are benchmark results

```
goos: darwin
goarch: amd64
pkg: github.com/parquet-go/parquet-go
cpu: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
                                               │    old.txt    │                new.txt                │
                                               │    sec/op     │    sec/op     vs base                 │
GenericReader/benchmarkRowType/go1.17-8           220.3µ ± ∞ ¹   170.5µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/benchmarkRowType/go1.18-8           154.4µ ± ∞ ¹   106.5µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.17-8              151.6µ ± ∞ ¹   116.6µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.18-8             102.10µ ± ∞ ¹   75.89µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.17-8                141.9µ ± ∞ ¹   107.7µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.18-8                90.40µ ± ∞ ¹   64.77µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.17-8                143.2µ ± ∞ ¹   108.5µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.18-8                92.92µ ± ∞ ¹   64.91µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.17-8                143.1µ ± ∞ ¹   109.2µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.18-8                91.16µ ± ∞ ¹   67.32µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.17-8               142.8µ ± ∞ ¹   109.1µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.18-8               91.87µ ± ∞ ¹   63.51µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.17-8            190.1µ ± ∞ ¹   149.6µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.18-8           135.16µ ± ∞ ¹   98.65µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.17-8    153.1µ ± ∞ ¹   118.3µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.18-8   106.51µ ± ∞ ¹   70.64µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.17-8               176.6µ ± ∞ ¹   145.6µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.18-8              121.36µ ± ∞ ¹   92.26µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.17-8        177.4µ ± ∞ ¹   145.4µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.18-8       123.77µ ± ∞ ¹   92.53µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.17-8                 151.1µ ± ∞ ¹   116.8µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.18-8                 99.38µ ± ∞ ¹   70.54µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.17-8                 217.4µ ± ∞ ¹   191.0µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.18-8                 157.1µ ± ∞ ¹   138.1µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.17-8         215.3µ ± ∞ ¹   192.1µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.18-8         157.1µ ± ∞ ¹   138.7µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.17-8                  2.067m ± ∞ ¹   2.036m ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.18-8                  1.720m ± ∞ ¹   1.686m ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.17-8              145.2µ ± ∞ ¹   111.4µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.18-8              94.19µ ± ∞ ¹   71.14µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/contact/go1.17-8                    273.4µ ± ∞ ¹   230.0µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/contact/go1.18-8                    201.6µ ± ∞ ¹   158.1µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.17-8        152.5µ ± ∞ ¹   115.5µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.18-8       100.67µ ± ∞ ¹   75.21µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.17-8        149.0µ ± ∞ ¹   115.4µ ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.18-8        96.61µ ± ∞ ¹   69.62µ ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                           161.5µ         125.5µ        -22.27%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                               │   old.txt    │                new.txt                 │
                                               │    row/s     │     row/s      vs base                 │
GenericReader/benchmarkRowType/go1.17-8          4.540M ± ∞ ¹    5.864M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/benchmarkRowType/go1.18-8          5.889M ± ∞ ¹    8.535M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.17-8             6.595M ± ∞ ¹    8.576M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.18-8             8.904M ± ∞ ¹   11.980M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.17-8               7.048M ± ∞ ¹    9.281M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.18-8               10.06M ± ∞ ¹    14.04M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.17-8               6.982M ± ∞ ¹    9.220M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.18-8               9.784M ± ∞ ¹   14.007M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.17-8               6.988M ± ∞ ¹    9.157M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.18-8               9.972M ± ∞ ¹   13.505M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.17-8              7.005M ± ∞ ¹    9.163M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.18-8              9.896M ± ∞ ¹   14.315M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.17-8           5.262M ± ∞ ¹    6.684M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.18-8           6.726M ± ∞ ¹    9.216M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.17-8   6.533M ± ∞ ¹    8.454M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.18-8   8.535M ± ∞ ¹   12.870M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.17-8              5.661M ± ∞ ¹    6.869M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.18-8              7.492M ± ∞ ¹    9.854M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.17-8       5.636M ± ∞ ¹    6.876M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.18-8       7.345M ± ∞ ¹    9.826M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.17-8                6.616M ± ∞ ¹    8.559M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.18-8                9.149M ± ∞ ¹   12.889M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.17-8                4.599M ± ∞ ¹    5.235M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.18-8                5.787M ± ∞ ¹    6.585M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.17-8        4.645M ± ∞ ¹    5.205M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.18-8        5.787M ± ∞ ¹    6.553M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.17-8                 483.7k ± ∞ ¹    491.2k ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.18-8                 528.9k ± ∞ ¹    539.8k ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.17-8             6.888M ± ∞ ¹    8.976M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.18-8             9.652M ± ∞ ¹   12.779M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/contact/go1.17-8                   3.658M ± ∞ ¹    4.348M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/contact/go1.18-8                   4.508M ± ∞ ¹    5.750M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.17-8       6.556M ± ∞ ¹    8.656M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.18-8       9.031M ± ∞ ¹   12.088M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.17-8       6.711M ± ∞ ¹    8.662M ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.18-8       9.411M ± ∞ ¹   13.058M ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                          5.904M          7.596M        +28.65%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                               │     old.txt     │                new.txt                 │
                                               │      B/op       │     B/op       vs base                 │
GenericReader/benchmarkRowType/go1.17-8          48000.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/benchmarkRowType/go1.18-8           43656.00 ± ∞ ¹     13.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.17-8             24000.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.18-8             21825.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.17-8               23999.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.18-8               21825.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.17-8               23999.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.18-8               21825.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.17-8               23999.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.18-8               21824.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.17-8              23999.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.18-8              21823.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.17-8            28.851Ki ± ∞ ¹   5.419Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.18-8            26.235Ki ± ∞ ¹   4.930Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.17-8   24000.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.18-8   21825.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.17-8                39.06Ki ± ∞ ¹   15.64Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.18-8                35.52Ki ± ∞ ¹   14.22Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.17-8         39.06Ki ± ∞ ¹   15.64Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.18-8         35.52Ki ± ∞ ¹   14.22Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.17-8                23999.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.18-8                21825.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.17-8                  46.87Ki ± ∞ ¹   23.45Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.18-8                  42.62Ki ± ∞ ¹   21.33Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.17-8          46.87Ki ± ∞ ¹   23.45Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.18-8          42.62Ki ± ∞ ¹   21.33Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.17-8                   728.9Ki ± ∞ ¹   682.0Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.18-8                   460.5Ki ± ∞ ¹   418.2Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.17-8             23999.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.18-8             21825.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/contact/go1.17-8                     78.12Ki ± ∞ ¹   31.28Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/contact/go1.18-8                     71.04Ki ± ∞ ¹   28.45Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.17-8       23999.000 ± ∞ ¹     2.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.18-8       21825.000 ± ∞ ¹     4.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.17-8       24003.000 ± ∞ ¹     5.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.18-8       21828.000 ± ∞ ¹     7.000 ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                            34.41Ki           109.8        -99.69%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                               │   old.txt    │              new.txt              │
                                               │  allocs/op   │  allocs/op    vs base             │
GenericReader/benchmarkRowType/go1.17-8          1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/benchmarkRowType/go1.18-8           909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.17-8             1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/booleanColumn/go1.18-8              909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.17-8               1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/int32Column/go1.18-8                909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.17-8               1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/int64Column/go1.18-8                909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.17-8               1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/floatColumn/go1.18-8                909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.17-8              1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/doubleColumn/go1.18-8               909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.17-8           1899.0 ± ∞ ¹    900.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/byteArrayColumn/go1.18-8           1727.0 ± ∞ ¹    818.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.17-8   1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/fixedLenByteArrayColumn/go1.18-8    909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.17-8              1.999k ± ∞ ¹   1.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/stringColumn/go1.18-8              1818.0 ± ∞ ¹    909.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.17-8       1.999k ± ∞ ¹   1.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/indexedStringColumn/go1.18-8       1818.0 ± ∞ ¹    909.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.17-8                1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/uuidColumn/go1.18-8                 909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.17-8                1.999k ± ∞ ¹   1.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/timeColumn/go1.18-8                1818.0 ± ∞ ¹    909.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.17-8        1.999k ± ∞ ¹   1.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/timeInMillisColumn/go1.18-8        1818.0 ± ∞ ¹    909.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.17-8                 16.98k ± ∞ ¹   15.98k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/mapColumn/go1.18-8                 15.37k ± ∞ ¹   14.46k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.17-8             1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/decimalColumn/go1.18-8              909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/contact/go1.17-8                   2.999k ± ∞ ¹   2.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/contact/go1.18-8                   2.727k ± ∞ ¹   1.818k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.17-8       1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/paddedBooleanColumn/go1.18-8        909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.17-8       1.000k ± ∞ ¹   0.000k ± ∞ ¹  ~ (p=1.000 n=1) ²
GenericReader/optionalInt32Column/go1.18-8        909.0 ± ∞ ¹      0.0 ± ∞ ¹  ~ (p=1.000 n=1) ²
geomean                                          1.434k                       ?               ³ ⁴
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
³ summaries must be >0 to compute geomean
⁴ ratios must be >0 to compute geomean
```
As far as we are concerned, once we have received the bytes we asked for, it is
none of our business whether the interface also returned an error.

The operation is deemed a success when all requested bytes are served.
fix error handling when reading from io.ReaderAt
Absolute performance improvement on GenericReader
fixes segmentio#23
closes segmentio#24

Multiple layers of buffering were causing the memory of active rows to be
overwritten, resulting in data corruption.

While merging rows, we buffer reads to speed up processing; however, when we
needed more data, we called the underlying `RowGroup` multiple times, which
overwrote buffers that were still in use.

Fixing `ReadRows` is a bit challenging. Without buffered reads, it becomes a
performance bottleneck.

Luckily, there is an interface, `RowWriterTo`, which gives us control over
the active buffers.

This commit implements `RowWriterTo` for the `MergeRowGroups` implementation.
This ensures that values not yet written are not corrupted.
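The hazard and the push-style fix can be illustrated in isolation. `Row`, `rowWriter`, `batchSource` and `collector` are simplified, hypothetical stand-ins loosely modeled on the RowWriter/RowWriterTo pattern named above; they are not parquet-go's definitions. A source that reuses one buffer corrupts earlier batches if the caller keeps them (pull style), whereas `WriteRowsTo` hands each batch to the writer before the buffer is reused.

```go
package main

import "fmt"

// Hypothetical stand-ins; not parquet-go's Row, RowWriter or RowWriterTo.
type Row []int64

type rowWriter interface {
	WriteRows(rows []Row) (int, error)
}

// batchSource reuses one internal buffer for every batch, like a buffered reader.
type batchSource struct {
	data [][]int64
	pos  int
	buf  []Row
}

func newBatchSource(data [][]int64) *batchSource {
	return &batchSource{data: data, buf: make([]Row, 0, 2)}
}

// next returns up to n rows; the result is only valid until the next call.
func (s *batchSource) next(n int) []Row {
	s.buf = s.buf[:0]
	for ; n > 0 && s.pos < len(s.data); n-- {
		s.buf = append(s.buf, Row(s.data[s.pos]))
		s.pos++
	}
	return s.buf
}

// WriteRowsTo pushes each batch to w before the buffer is reused.
func (s *batchSource) WriteRowsTo(w rowWriter) (int64, error) {
	var total int64
	for {
		batch := s.next(2)
		if len(batch) == 0 {
			return total, nil
		}
		n, err := w.WriteRows(batch)
		total += int64(n)
		if err != nil {
			return total, err
		}
	}
}

// collector copies every row it receives instead of retaining the slice.
type collector struct{ rows []Row }

func (c *collector) WriteRows(rows []Row) (int, error) {
	for _, r := range rows {
		c.rows = append(c.rows, append(Row(nil), r...))
	}
	return len(rows), nil
}

func main() {
	data := [][]int64{{1}, {2}, {3}, {4}}

	// Pull style: holding on to a batch aliases the shared buffer.
	src := newBatchSource(data)
	a := src.next(2)
	b := src.next(2)
	fmt.Println(a[0][0], b[0][0]) // 3 3: a was overwritten by the second read

	// Push style: each batch is consumed before the buffer is reused.
	c := &collector{}
	n, _ := newBatchSource(data).WriteRowsTo(c)
	fmt.Println(n, c.rows) // 4 [[1] [2] [3] [4]]
}
```

The design choice is to invert control: the source decides when a buffer may be reused, because it knows the writer has already consumed it.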
we already state go 1.20 in go.mod
move PR link after description
I'm neither employed nor paid by twilio to maintain this.
Co-authored-by: Filip Petkovski
Co-authored-by: Kevin Burke
Update README to state it's community maintained