TraceQL: support mixed-type attribute querying (int/float) #4391
Conversation
I apologize for taking so long to get to this. Your analysis is correct! We do generate predicates per column and, since we store integers and floats independently, we only scan one of the columns. Given how small int and float columns tend to be (compared to string columns), I think the performance hit of doing this is likely acceptable in exchange for the nicer behavior. What is the behavior in this case? I'm pretty sure this will work b/c the engine will request all values for the two attributes and do the work itself. I believe the engine layer will compare ints and floats correctly but I'm not 100% sure.
Tests should also be added here for the new behavior. These tests build a block and then search for a known trace using a large range of TraceQL queries. If you add tests here and they pass, it means that your changes work from the parquet file all the way up through the engine. This will also break the "allConditions" optimization if the user types any query with a number comparison: I would like to preserve the allConditions behavior in this case b/c it's such a nice optimization and number queries are common. I'm not quite sure why the
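For reference, a minimal, self-contained sketch of the int/float promotion the engine layer would need when one value comes back as an int and the other as a float. This is illustrative only and not Tempo's actual traceql implementation; the helper names are made up.

package main

import "fmt"

// numericEqual compares two numeric values, promoting ints to float64 so that
// an int stored in one column and a float operand in the query can still match.
// Purely illustrative; not Tempo's engine code.
func numericEqual(a, b any) bool {
	toFloat := func(v any) (float64, bool) {
		switch n := v.(type) {
		case int64:
			return float64(n), true
		case float64:
			return n, true
		}
		return 0, false
	}
	af, aok := toFloat(a)
	bf, bok := toFloat(b)
	return aok && bok && af == bf
}

func main() {
	fmt.Println(numericEqual(int64(500), 500.0)) // true: int column value vs. float operand
	fmt.Println(numericEqual(int64(500), 500.3)) // false
}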
Force-pushed from 3d8f31d to 812d768.
Thank you for confirming the approach and pointing out the
I verified that
Done. Let me know if I missed something.
Regarding the
Given my limited exposure to Tempo’s internals, I’d appreciate any guidance on whether these routes are viable or if there’s a simpler approach to preserve allConditions.
P.S. Do we care about comparisons with negative values? Should those also be covered?
Force-pushed from 812d768 to 171afab.
Force-pushed from 4970fbd to 50f5ae5.
This is a really cool change. I ran benchmarks and found no major regressions. Nice tests added in ./tempodb. We try to keep those as comprehensive as possible given the complexity of the language.
This case is covered in the ./pkg/traceql tests, so I wouldn't worry about it. It occurred to me that this case sends two "OpNone" conditions to the fetch layer and the condition itself is evaluated in the engine, so your changes will not impact it.
Nice improvements here. I like falling back to integer comparison (or nothing) based on whether the float has a fractional part.
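As an aside, the fractional-part fallback mentioned above reduces to a tiny check. This is only a sketch of the idea with invented names, not the PR's code.

package main

import (
	"fmt"
	"math"
)

// hasFraction reports whether f has a non-zero fractional part. If it does, no
// stored integer can ever equal it, so an equality predicate against the int
// column can be skipped entirely (and for "!=", every integer trivially matches).
func hasFraction(f float64) bool {
	return f != math.Trunc(f)
}

func main() {
	fmt.Println(hasFraction(500))   // false -> also build an int-column predicate
	fmt.Println(hasFraction(500.3)) // true  -> skip the int column for "="
}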
The right choice would be a
Also, if you're interested, plug your queries into this test and run it. It will dump the iterator structure and you can see how your changes have impacted the hierarchy.
Yes, are they not already? Reviewing your code, I think they would work fine. I think my primary ask at this point would be to keep the int and float switch cases symmetrical. Even though it's trivial, can you create a
I'm a bit impressed you're taking this on. I wouldn't have guessed someone outside of Grafana would have had the time and patience to find this.
benches
Force-pushed from b0ffcde to 10b04c5.
I'm not sure if it's worth it. I'd rather rely on your opinion here.
Actually, it turned out they didn't work correctly with negative values. I've updated the shifting logic to fix this. Also, another edge case raises questions: what happens if a float hits MaxInt/MinInt? In some cases, it might cause jumps between MaxInt and MinInt.
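One way to avoid the wrap-around at the extremes is to clamp before converting. The following is only a sketch of that idea, not the actual shifting logic in this PR.

package main

import (
	"fmt"
	"math"
)

// clampFloatToInt converts a float operand to int64 for the integer-column
// predicate, clamping at the extremes so values beyond the int64 range don't
// wrap between MaxInt and MinInt. Sketch only.
func clampFloatToInt(f float64) int64 {
	switch {
	case f >= float64(math.MaxInt64): // float64(MaxInt64) rounds up to 2^63, so >= catches all overflows
		return math.MaxInt64
	case f <= float64(math.MinInt64):
		return math.MinInt64
	default:
		return int64(f)
	}
}

func main() {
	fmt.Println(clampFloatToInt(1e300))  // 9223372036854775807
	fmt.Println(clampFloatToInt(-1e300)) // -9223372036854775808
	fmt.Println(clampFloatToInt(-500.7)) // -500 (truncates toward zero)
}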
Done! Let me know if this aligns with what you had in mind. Plus, I've added some tests in a separate commit. Feel free to let me know if they look odd or need adjustments.
Haha, thanks! Honestly, it's just curiosity. Tempo is a fascinating system, and I've wanted to dive into something challenging like this. It's fun to learn from real-world systems and see how they tackle performance and scalability. :)
Force-pushed from c05b608 to a679803.
We could try to get tricky here. Like if you do
Yup, I think this communicates better to a future reader what's going on. Thanks for the change. Ok, I was running your branch on Friday to test and we do have one final thing to figure out. This query does not work:
The reason is b/c we handle this special column here: tempo/tempodb/encoding/vparquet4/block_traceql.go, lines 1969 to 1986 at 14efba0.
All well known and dedicated columns are strings ... except this one, unfortunately. To do this correctly we have to scan both the well known column as well as the general float attribute column if the static value being compared against http status code is a float. To do this performantly I think we will need to build a
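To make the shape of that concrete, here is a rough sketch of scanning both places, reusing the iterator helpers that appear later in this thread (makeIter, columnPathSpanAttrKey, columnPathSpanAttrDouble, DefinitionLevelResourceSpansILSSpanAttrs). The columnPathSpanHTTPStatusCode constant and the intPred/floatPred predicates are assumptions, and the two iterators would still need to be OR'd together; this is not the PR's code.

// Sketch only: when the operand is a float, scan both the dedicated
// http.status_code column and the generic float attribute column.
// columnPathSpanHTTPStatusCode, intPred and floatPred are assumed names.
statusCodeIters := []parquetquery.Iterator{
	// well-known (int) column for span.http.status_code
	makeIter(columnPathSpanHTTPStatusCode, intPred, columnPathSpanHTTPStatusCode),
	// generic float attribute column, joined on the attribute key
	parquetquery.NewJoinIterator(
		DefinitionLevelResourceSpansILSSpanAttrs,
		[]parquetquery.Iterator{
			makeIter(columnPathSpanAttrKey, parquetquery.NewStringInPredicate([]string{"http.status_code"}), "key"),
			makeIter(columnPathSpanAttrDouble, floatPred, "float"),
		},
		&attributeCollector{},
		parquetquery.WithPool(pqAttrPool),
	),
}
// The two iterators would then be combined so a span matches if either column matches.
_ = statusCodeIters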
Sounds like a plan. Will do it later. :)
Oh, that's a nice catch! But before rushing into handling this case, I want to address one quick concern. If a user specifies
Anyway, if you see real value in covering this edge case, I'm happy to implement it. Let me know what you think!
P.S. I found out that I should convert
It's funny because I really want this PR in, but the only thing blocking it is handling http status code correctly. However, I'd really like to cut a vparquet5 that removes all well known columns (and other cleanup), which would unblock this PR.
I believe I've finally learned how to use
Force-pushed from 7222fa2 to 9ac7d86.
I wish we had time to work on this. It's an undefined cleanup pass on vParquet with a focus on reducing complexity, number of columns, and footer size. One of the things I'd like accomplished is removing the well known columns and instead relying on dedicated columns.
This comment was marked as outdated.
Force-pushed from 63aa82c to 69aa605.
Tested and it works! But there's definitely some cleanup to do.
but we don't need to scan the generic attribute column for an int. Int values are guaranteed to be stored in the dedicated column for this attribute name, so we only need to scan the generic column for a float. This should simplify the iterators to something like:
Unsure why you're seeing nils. I can dig into that a bit. We shouldn't need the filter-nil thing.
Force-pushed from 69aa605 to 7e2974d.
Oh my gosh! This is what happens when a review lasts too long. I started forgetting what I've been doing. :D Fixed.
If it looks good, there's one more step remaining - need to update
Awesome! It looks like we were able to get rid of those nil-filter shenanigans. I think this is very, very close. All functionality is accounted for. I did run some benchmarks and found a regression we should spend some time understanding. I do expect a bit of overhead due to this change, but one particular query is showing a 20% increase in CPU.
I can help dig into this.
These are the queries used in the benches. The regression occurred on traceOrMatch, which you can see below. As you can tell, they are crafted for internal data, but they can be rewritten for any block where they get some matches.
statuscode: { span.http.status_code = 200 }
traceOrMatch: { rootServiceName = `tempo-gateway` && (status = error || span.http.status_code = 500)}
complex: {resource.cluster=~"prod.*" && resource.namespace = "tempo-prod" && resource.container="query-frontend" && name = "HTTP GET - tempo_api_v2_search_tags" && span.http.status_code = 200 && duration > 1s}
benches
> benchstat before.txt after.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet4
cpu: Apple M3 Pro
│ before.txt │ after.txt │
│ sec/op │ sec/op vs base │
BackendBlockTraceQL/statuscode-11 64.28m ± 2% 64.81m ± 1% +0.83% (p=0.043 n=10)
BackendBlockTraceQL/traceOrMatch-11 249.8m ± 8% 303.7m ± 9% +21.59% (p=0.000 n=10)
BackendBlockTraceQL/complex-11 4.944m ± 5% 4.880m ± 1% ~ (p=0.190 n=10)
geomean 42.98m 45.80m +6.56%
│ before.txt │ after.txt │
│ B/s │ B/s vs base │
BackendBlockTraceQL/statuscode-11 341.4Mi ± 2% 338.8Mi ± 1% ~ (p=0.052 n=10)
BackendBlockTraceQL/traceOrMatch-11 6.695Mi ± 8% 5.541Mi ± 9% -17.24% (p=0.000 n=10)
BackendBlockTraceQL/complex-11 182.0Mi ± 5% 184.4Mi ± 1% ~ (p=0.190 n=10)
geomean 74.66Mi 70.22Mi -5.94%
│ before.txt │ after.txt │
│ MB_io/op │ MB_io/op vs base │
BackendBlockTraceQL/statuscode-11 23.01 ± 0% 23.02 ± 0% +0.04% (p=0.000 n=10)
BackendBlockTraceQL/traceOrMatch-11 1.753 ± 0% 1.766 ± 0% +0.74% (p=0.000 n=10)
BackendBlockTraceQL/complex-11 943.7m ± 0% 943.7m ± 0% ~ (p=1.000 n=10) ¹
geomean 3.364 3.373 +0.26%
¹ all samples are equal
│ before.txt │ after.txt │
│ B/op │ B/op vs base │
BackendBlockTraceQL/statuscode-11 31.19Mi ± 1% 31.29Mi ± 1% ~ (p=0.436 n=10)
BackendBlockTraceQL/traceOrMatch-11 10.597Mi ± 17% 9.867Mi ± 35% ~ (p=0.579 n=10)
BackendBlockTraceQL/complex-11 5.387Mi ± 4% 5.413Mi ± 2% ~ (p=0.631 n=10)
geomean 12.12Mi 11.87Mi -2.09%
│ before.txt │ after.txt │
│ allocs/op │ allocs/op vs base │
BackendBlockTraceQL/statuscode-11 378.4k ± 0% 378.6k ± 0% +0.04% (p=0.000 n=10)
BackendBlockTraceQL/traceOrMatch-11 86.49k ± 1% 86.57k ± 1% ~ (p=0.218 n=10)
BackendBlockTraceQL/complex-11 79.81k ± 0% 79.83k ± 0% +0.02% (p=0.000 n=10)
geomean 137.7k 137.8k +0.05%
I cannot reproduce it. Could you tell me how you generated traces? I scribbled such a Frankenstein monsterpackage vparquet4
import (
"bytes"
"context"
"io"
"math/rand"
"os"
"sort"
"testing"
"time"
"github.com/google/uuid"
"github.com/stretchr/testify/require"
"github.com/grafana/tempo/pkg/tempopb"
"github.com/grafana/tempo/pkg/traceql"
"github.com/grafana/tempo/pkg/util/test"
"github.com/grafana/tempo/tempodb/backend"
"github.com/grafana/tempo/tempodb/backend/local"
"github.com/grafana/tempo/tempodb/encoding/common"
v1_common "github.com/grafana/tempo/pkg/tempopb/common/v1"
v1_resource "github.com/grafana/tempo/pkg/tempopb/resource/v1"
v1_trace "github.com/grafana/tempo/pkg/tempopb/trace/v1"
)
type testTrace struct {
traceID common.ID
trace *tempopb.Trace
}
type testIterator2 struct {
traces []testTrace
}
func (i *testIterator2) Next(context.Context) (common.ID, *tempopb.Trace, error) {
if len(i.traces) == 0 {
return nil, nil, io.EOF
}
tr := i.traces[0]
i.traces = i.traces[1:]
return tr.traceID, tr.trace, nil
}
func (i *testIterator2) Close() {
}
func newTestTraces(traceCount int) []testTrace {
traces := make([]testTrace, 0, traceCount)
for i := 0; i < traceCount; i++ {
traceID := test.ValidTraceID(nil)
if i%2 == 0 {
trace := MakeTraceWithCustomTags(traceID, "tempo-gateway", int64(i), true, true)
traces = append(traces, testTrace{traceID: traceID, trace: trace})
} else {
trace := MakeTraceWithCustomTags(traceID, "megaservice", int64(i), false, false)
traces = append(traces, testTrace{traceID: traceID, trace: trace})
}
}
sort.Slice(traces, func(i, j int) bool {
return bytes.Compare(traces[i].traceID, traces[j].traceID) == -1
})
return traces
}
var (
blockID = uuid.MustParse("6757b4d9-8d6b-4984-a2d7-8ef6294ca503")
)
func TestGenerateBlocks(t *testing.T) {
const (
traceCount = 10000
)
blockDir, ok := os.LookupEnv("TRACEQL_BLOCKDIR")
require.True(t, ok, "TRACEQL_BLOCKDIR env var must be set")
rawR, rawW, _, err := local.New(&local.Config{
Path: blockDir,
})
require.NoError(t, err)
r := backend.NewReader(rawR)
w := backend.NewWriter(rawW)
ctx := context.Background()
cfg := &common.BlockConfig{
BloomFP: 0.01,
BloomShardSizeBytes: 100 * 1024,
}
traces := newTestTraces(traceCount)
iter := &testIterator2{traces: traces}
meta := backend.NewBlockMeta(tenantID, blockID, VersionString, backend.EncNone, "")
meta.TotalObjects = int64(len(iter.traces))
_, err = CreateBlock(ctx, cfg, meta, iter, r, w)
require.NoError(t, err)
}
func MakeTraceWithCustomTags(traceID []byte, service string, intValue int64, isError bool, setHTTP500 bool) *tempopb.Trace {
now := time.Now()
traceID = test.ValidTraceID(traceID)
trace := &tempopb.Trace{
ResourceSpans: make([]*v1_trace.ResourceSpans, 0),
}
var attributes []*v1_common.KeyValue
attributes = append(attributes,
&v1_common.KeyValue{
Key: "stringTag",
Value: &v1_common.AnyValue{
Value: &v1_common.AnyValue_StringValue{StringValue: "value1"},
},
},
&v1_common.KeyValue{
Key: "intTag",
Value: &v1_common.AnyValue{
Value: &v1_common.AnyValue_IntValue{IntValue: intValue},
},
},
)
if setHTTP500 {
attributes = append(attributes,
&v1_common.KeyValue{
Key: "http.status_code",
Value: &v1_common.AnyValue{
Value: &v1_common.AnyValue_IntValue{IntValue: 500},
},
},
)
}
statusCode := v1_trace.Status_STATUS_CODE_OK
statusMsg := "OK"
if isError {
statusCode = v1_trace.Status_STATUS_CODE_ERROR
statusMsg = "Internal Error"
}
trace.ResourceSpans = append(trace.ResourceSpans, &v1_trace.ResourceSpans{
Resource: &v1_resource.Resource{
Attributes: []*v1_common.KeyValue{
{
Key: "service.name",
Value: &v1_common.AnyValue{
Value: &v1_common.AnyValue_StringValue{
StringValue: service,
},
},
},
{
Key: "other",
Value: &v1_common.AnyValue{
Value: &v1_common.AnyValue_StringValue{
StringValue: "other-value",
},
},
},
},
},
ScopeSpans: []*v1_trace.ScopeSpans{
{
Spans: []*v1_trace.Span{
{
Name: "test",
TraceId: traceID,
SpanId: make([]byte, 8),
ParentSpanId: make([]byte, 8),
Kind: v1_trace.Span_SPAN_KIND_CLIENT,
Status: &v1_trace.Status{
Code: statusCode,
Message: statusMsg,
},
StartTimeUnixNano: uint64(now.UnixNano()),
EndTimeUnixNano: uint64(now.Add(time.Second).UnixNano()),
Attributes: attributes,
DroppedLinksCount: rand.Uint32(),
DroppedAttributesCount: rand.Uint32(),
},
},
},
},
})
return trace
}
func BenchmarkMixTraceQL(b *testing.B) {
const query = "{ rootServiceName = `tempo-gateway` && (status = error || span.http.status_code = 500)}"
blockDir, ok := os.LookupEnv("TRACEQL_BLOCKDIR")
require.True(b, ok, "TRACEQL_BLOCKDIR env var must be set")
ctx := context.TODO()
r, _, _, err := local.New(&local.Config{Path: blockDir})
require.NoError(b, err)
rr := backend.NewReader(r)
meta, err := rr.BlockMeta(ctx, blockID, tenantID)
require.NoError(b, err)
opts := common.DefaultSearchOptions()
opts.StartPage = 3
opts.TotalPages = 2
block := newBackendBlock(meta, rr)
_, _, err = block.openForSearch(ctx, opts)
require.NoError(b, err)
b.ResetTimer()
bytesRead := 0
for i := 0; i < b.N; i++ {
e := traceql.NewEngine()
resp, err := e.ExecuteSearch(ctx, &tempopb.SearchRequest{Query: query}, traceql.NewSpansetFetcherWrapper(func(ctx context.Context, req traceql.FetchSpansRequest) (traceql.FetchSpansResponse, error) {
return block.Fetch(ctx, req, opts)
}))
require.NoError(b, err)
require.NotNil(b, resp)
// Read first 20 results (if any)
bytesRead += int(resp.Metrics.InspectedBytes)
}
b.SetBytes(int64(bytesRead) / int64(b.N))
b.ReportMetric(float64(bytesRead)/float64(b.N)/1000.0/1000.0, "MB_io/op")
}
UPD: I'm wondering how to check if a block has dedicated columns at all.
Force-pushed from 8e1feb4 to 495c234.
we generally pull a block generated from internal tracing data which is why the benchmarks contain references to loki and tempo. these blocks generally cover a large range of organically created trace data. nice work generating a large block. likely some pattern of data internally at Grafana is causing the regression. maybe you should write some float value http status codes and see what happens?
the meta.json will list all dedicated columns in a block. i do think there's a bug with the current implementation. fixing it may also resolve the regression. not sure. the query
I believe it should be this:
I'm looking into the regression now.
Yeah, I have the same hypothesis. I've just been looking into how to correctly attach filtering by key. UPD: Done.
Force-pushed from 495c234 to ff2bc15.
Yep, I also think so. Roaming around the code base, I got the impression that dedicated columns aren't something that exists by default. I feel I'm missing something.
Dedicated columns need to be configured manually. They allow us to move data from the main attribute columns into their own "dedicated" column. The primary thing we use this for is to move very large attributes, like SQL queries or large JSON objects, out of the main columns. The secondary use is to isolate important columns for querying. The docs have some details on how we pick which columns to configure.

So I reran benchmarks and the regression is fairly intense and likely not something we can accept. I am still working on determining what is causing the regression and seeing if we can improve performance, but my heads-down time is limited. This is really cool work and it's so close to mergeable, but the
benches
I removed the previous comments because I discovered something promising. |
Force-pushed from 11f11e5 to e7828db.
Force-pushed from e7828db to fde9918.
Ok, I'm back! :) Since last time, I've learned a lot more about how Tempo stores traces in Parquet and how it fetches data. Here's a concise summary of what I found.

TL;DR
The dictionary load time is the bottleneck, even when the key isn't found. I'd love to see if your dataset behaves similarly (tons of NULL, a small set of real keys, yet a large dictionary overhead).

The dataset
I reproduced a very similar performance regression where scanning for a key (e.g.
Below is an example of my attribute-keys column (
Column 59
It would be really helpful if you could share a similar dump of your dataset. For example:

Getting parquet metadata
parquet-reader --only-metadata /Users/joe/testblock/1/030c8c4f-9d47-4916-aadc-26b90b1d2bc4/data.parquet > metadata.txt
parquet-reader --columns=59 --dump /Users/joe/testblock/1/030c8c4f-9d47-4916-aadc-26b90b1d2bc4/data.parquet > column_59.txt
parquet-reader --columns=63 --dump /Users/joe/testblock/1/030c8c4f-9d47-4916-aadc-26b90b1d2bc4/data.parquet > column_63.txt
parquet-reader --columns=92 --dump /Users/joe/testblock/1/030c8c4f-9d47-4916-aadc-26b90b1d2bc4/data.parquet > column_92.txt

That way we can see if you're also dealing with a large dictionary block for just a handful of real values (and lots of NULL).

The regression
In my case, I create an iterator to scan for

Looking up float `http.status_code`
subIters = append(subIters,
parquetquery.NewJoinIterator(
DefinitionLevelResourceSpansILSSpanAttrs,
[]parquetquery.Iterator{
makeIter(columnPathSpanAttrKey, parquetquery.NewStringInPredicate([]string{cond.Attribute.Name}), "key"),
makeIter(columnPathSpanAttrDouble, pred, "float"),
},
&attributeCollector{},
parquetquery.WithPool(pqAttrPool),
),
)

If we replace the real predicates with callback predicates that always return false, the slowdown goes away:

Both predicates are disabled
subIters = append(subIters,
parquetquery.NewJoinIterator(
DefinitionLevelResourceSpansILSSpanAttrs,
[]parquetquery.Iterator{
makeIter(columnPathSpanAttrKey, parquetquery.NewCallbackPredicate(func() bool { return false }), "key"),
makeIter(columnPathSpanAttrDouble, parquetquery.NewCallbackPredicate(func() bool { return false }), "float"),
},
&attributeCollector{},
parquetquery.WithPool(pqAttrPool),
),
)

But if we keep the key scanning, the slowdown remains:

Key's predicate is enabled, value's predicate is disabled
subIters = append(subIters,
parquetquery.NewJoinIterator(
DefinitionLevelResourceSpansILSSpanAttrs,
[]parquetquery.Iterator{
makeIter(columnPathSpanAttrKey, parquetquery.NewStringInPredicate([]string{cond.Attribute.Name}), "key"),
makeIter(columnPathSpanAttrDouble, parquetquery.NewCallbackPredicate(func() bool { return false }), "float"),
},
&attributeCollector{},
parquetquery.WithPool(pqAttrPool),
),
)

Even though the dataset doesn't have the key at all, the key predicate still has to inspect each column chunk's dictionary:

func (p *StringInPredicate) KeepColumnChunk(cc *ColumnChunkHelper) bool {
if d := cc.Dictionary(); d != nil {
return keepDictionary(d, p.KeepValue)
}
ci, err := cc.ColumnIndex()

Benchmarking indicates dictionary-loading overhead. Even though there's no matching key, opening and parsing the dictionary itself can be quite expensive. For example:

bench
Yes, dictionaries can be expensive. For low/medium cardinality columns we have found that they can be a massive performance improvement. Especially when searching for rarely occurring data. They also provide very nice compression.
In vParquet5 I am proposing a set of non-dictionary encoded dedicated columns to put high cardinality data to combat the first issue.
The key not being there is actually one of the fastest things that can occur. It allows us to skip the entire row group.
I believe these stats are for the entire column in that row group and not just the dictionary. Distinct values 0 is weird. I wonder if that's a parquet-go bug. I'm seeing quite similar values for our internal datasets. The reason for the large number of nulls is b/c of the way parquet nests values. Every time the structure iterates at a lower level than the column you are currently iterating, there is a "null" value for this column b/c it didn't have a value. These values are not actually encoded into the column's pages. They are encoded into the repetition and definition levels and reinserted by parquet-go when you read the values. I have been digging deeper into this b/c of the PR you submitted. I have tried a few approaches to not evaluating these nulls at all, but none of them come back with the expected perf improvements. While thinking about this I stumbled on this perf improvement, which also improves our benchmarks. Once this is merged I'm going to try again b/c I think I'm close to a nice improvement for any situation in which heavy null iteration occurs (like yours!).
Yes! But these are also not doing anything :). Your example is returning false from all these functions, which is basically telling the iterator to skip everything.
The dictionary must be opened in order to read the column. Even if we didn't have code that dealt with it explicitly, it would still happen behind the scenes in parquet-go. Great work here. You are pushing me into details of parquet-go I hadn't reviewed in a while, and I'm finding some nice improvements.
I've regenerated my dataset to make it more realistic. Then I ran a series of tests comparing different scenarios. Overall, it looks like your improvements not only speed up queries in general but also mitigate (somewhat) the regression introduced by my code.

Dataset Generation
k6 script
Probably you'll need this one to be able to generate a massive number of traces: grafana/xk6-client-tracing#32

ENDPOINT=127.0.0.1:4317 ./k6 run --iterations=100000 --vus=1000 ./template.js
import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';
import { sleep } from 'k6';
import tracing from 'k6/x/tracing';
export const options = {
vus: 1,
duration: "20m",
};
const endpoint = __ENV.ENDPOINT || "otel-collector:4317"
const orgid = __ENV.TEMPO_X_SCOPE_ORGID || "k6-test"
const client = new tracing.Client({
endpoint,
exporter: tracing.EXPORTER_OTLP,
tls: {
insecure: true,
},
headers: {
"X-Scope-Orgid": orgid
}
});
const traceDefaults = {
attributeSemantics: tracing.SEMANTICS_HTTP,
attributes: { "one": "three", "intAttr": 123, "floatAttr": 123.4 },
randomAttributes: { count: 2, cardinality: 5 },
randomEvents: { count: 0.1, exceptionCount: 0.2, randomAttributes: { count: 6, cardinality: 20 } },
}
const traceTemplates = [
{
defaults: traceDefaults,
spans: [
{ service: "shop-backend", name: "list-articles", duration: { min: 200, max: 900 }, attributes: { "http.status_code": 403 } },
{ service: "shop-backend", name: "authenticate", duration: { min: 50, max: 100 }, attributes: { "http.status_code": 412.0, "prettyFloat": 214.0 } },
{ service: "auth-service", name: "authenticate", attributes: { "http.status_code": 500 } },
{ service: "shop-backend", name: "fetch-articles", parentIdx: 0, attributes: { "http.status_code": 500.3, "zerovalue": 0.0 } },
{
service: "article-service",
name: "list-articles",
attributes: { "http.status_code": 200 },
links: [{ attributes: { "link-type": "parent-child" }, randomAttributes: { count: 2, cardinality: 5 } }]
},
{ service: "article-service", name: "select-articles", attributeSemantics: tracing.SEMANTICS_DB },
{ service: "postgres", name: "query-articles", attributeSemantics: tracing.SEMANTICS_DB, randomAttributes: { count: 5 } },
]
},
{
defaults: {
attributes: { "numbers": ["one", "two", "three"] },
attributeSemantics: tracing.SEMANTICS_HTTP,
randomEvents: { count: 2, randomAttributes: { count: 3, cardinality: 10 } },
},
spans: [
{ service: "shop-backend", name: "article-to-cart", duration: { min: 400, max: 1200 } },
{ service: "shop-backend", name: "authenticate", duration: { min: 70, max: 200 } },
{ service: "auth-service", name: "authenticate" },
{ service: "shop-backend", name: "get-article", parentIdx: 0 },
{ service: "article-service", name: "get-article" },
{ service: "article-service", name: "select-articles", attributeSemantics: tracing.SEMANTICS_DB },
{ service: "postgres", name: "query-articles", attributeSemantics: tracing.SEMANTICS_DB, randomAttributes: { count: 2 } },
{ service: "shop-backend", name: "place-articles", parentIdx: 0 },
{ service: "cart-service", name: "place-articles", attributes: { "article.count": 1, "http.status_code": 201 } },
{ service: "cart-service", name: "persist-cart" }
]
},
{
defaults: traceDefaults,
spans: [
{ service: "shop-backend", attributes: { "http.status_code": 403 } },
{ service: "shop-backend", name: "authenticate", attributes: { "http.request.header.accept": ["application/json"] } },
{
service: "auth-service",
name: "authenticate",
attributes: { "http.status_code": 403 },
randomEvents: { count: 0.5, exceptionCount: 2, randomAttributes: { count: 5, cardinality: 5 } }
},
]
},
{
defaults: traceDefaults,
spans: [
{ service: "shop-backend" },
{ service: "shop-backend", name: "authenticate", attributes: { "http.request.header.accept": ["application/json"] } },
{ service: "auth-service", name: "authenticate" },
{
service: "cart-service",
name: "checkout",
randomEvents: { count: 0.5, exceptionCount: 2, exceptionOnError: true, randomAttributes: { count: 5, cardinality: 5 } }
},
{
service: "billing-service",
name: "payment",
randomLinks: { count: 0.5, randomAttributes: { count: 3, cardinality: 10 } },
randomEvents: { exceptionOnError: true, randomAttributes: { count: 4 } }
}
]
},
]
export default function () {
const templateIndex = randomIntBetween(0, traceTemplates.length - 1)
const gen = new tracing.TemplatedGenerator(traceTemplates[templateIndex])
client.push(gen.traces())
sleep(randomIntBetween(1, 5));
}
export function teardown() {
client.shutdown();
}

This yields a dataset that looks a bit more diverse. The new columns are less “weird”:

Key column (#59)
Doubles column (#63)
Benchmark and Test
Here's the (truncated) test I used.

Benchmark code
I run it like:

go test -benchmem -count=10 -run=^$ -bench ^BenchmarkMix$ github.com/grafana/tempo/tempodb/encoding/vparquet4

func newQueryExecuter(t require.TestingT) func(query string) (*tempopb.SearchResponse, error) {
blockID := uuid.MustParse("ffd6c4e3-b711-4a4a-a46b-ea395dc54bf3")
ctx := context.TODO()
r, _, _, err := local.New(&local.Config{
Path: path.Join("/home/nd/develop/01/tempo-data/blocks"),
})
require.NoError(t, err)
rr := backend.NewReader(r)
meta, err := rr.BlockMeta(ctx, blockID, tenantID)
require.NoError(t, err)
opts := common.DefaultSearchOptions()
opts.StartPage = 0
opts.TotalPages = 1
block := newBackendBlock(meta, rr)
_, _, err = block.openForSearch(ctx, opts)
require.NoError(t, err)
e := traceql.NewEngine()
return func(query string) (*tempopb.SearchResponse, error) {
return e.ExecuteSearch(ctx, &tempopb.SearchRequest{Query: query}, traceql.NewSpansetFetcherWrapper(func(ctx context.Context, req traceql.FetchSpansRequest) (traceql.FetchSpansResponse, error) {
return block.Fetch(ctx, req, opts)
}))
}
}
func TestMix(t *testing.T) {
execute := newQueryExecuter(t)
{
resp, err := execute(`{span.http.status_code != 500}`)
require.NoError(t, err)
require.NotNil(t, resp)
require.Len(t, resp.Traces, 75392)
}
{
resp, err := execute(`{span.http.status_code = 500}`)
require.NoError(t, err)
require.NotNil(t, resp)
require.Len(t, resp.Traces, 18871)
}
}
func BenchmarkMix(b *testing.B) {
b.ResetTimer()
bytesRead := 0
b.StopTimer()
for i := 0; i < b.N; i++ {
execute := newQueryExecuter(b)
b.StartTimer()
resp, err := execute("{ span.http.status_code = 500 }")
b.StopTimer()
require.NoError(b, err)
require.NotNil(b, resp)
bytesRead += int(resp.Metrics.InspectedBytes)
}
b.SetBytes(int64(bytesRead) / int64(b.N))
b.ReportMetric(float64(bytesRead)/float64(b.N)/1000.0/1000.0, "MB_io/op")
}

Results
It's somewhat speculative, but from these numbers, your improvements consistently lower overall query time and also reduce the additional regression my code introduces by ~4-5m.
What this PR does:
Below is my understanding of the current limitations. Please feel free to correct me if I’ve misunderstood or overlooked something.
Attributes of the same type are stored in the same column. For example, integers are stored in one column and floats in another.
Querying operates in two stages:
The issue arises because predicates are generated based on the operand type. If an attribute is stored as a float but the operand is an integer, the predicate evaluates against the integers column instead of the floats column. This results in incorrect behavior.
Proposed Solution
The idea is to generate predicates for both integers and floats, allowing both columns to be scanned for the queried attribute.
In this PR, I’ve created a proof-of-concept by copying the existing createAttributeIterator function to createAttributeIterator2. This duplication is intentional, as the original function is used in multiple places, and I want to avoid introducing unintended side effects until the approach is validated.

WDYT? :)
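As a rough illustration of the proposed solution only: the helper names createIntPredicate/createFloatPredicate and the exact wiring below are assumptions, not necessarily what createAttributeIterator2 does.

// Sketch: for a numeric comparison, build predicates for both storage columns
// so the attribute is found whether it was written as an int or a float.
// Helper names are assumed for illustration.
func createNumericPredicates(cond traceql.Condition) (intPred, floatPred parquetquery.Predicate, err error) {
	// Integer-column predicate: still useful when the operand is an int, or a
	// float without a fractional part.
	intPred, err = createIntPredicate(cond.Op, cond.Operands)
	if err != nil {
		return nil, nil, err
	}
	// Float-column predicate: ensures values stored as floats are not skipped
	// when the operand happens to be an int.
	floatPred, err = createFloatPredicate(cond.Op, cond.Operands)
	if err != nil {
		return nil, nil, err
	}
	return intPred, floatPred, nil
}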
Which issue(s) this PR fixes:
Fixes #4332
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]