About rm and tm merge message threads #6087
PleaseGiveMeTheCoke started this conversation in General
Context
In pull request #6061, the rm and tm merge message threads were unified into one. Is this a safe, effective, and reasonable improvement? Here is the theoretical and empirical support for doing so.
Theoretical Support
Increase the number of merge messages sent at a time
With two threads, each thread only merges messages from a single client, either rm or tm. With one thread, messages from both rm and tm are merged into the same batch.
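To illustrate the point, here is a minimal sketch (hypothetical class and message names, not the actual Seata implementation): when rm and tm producers share one queue, a single merge thread drains messages from both clients into one batch per flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: with a single merge thread, rm and tm messages
// share one basket, so each flush can batch messages from both clients.
public class SingleMergeQueue {
    private final LinkedBlockingQueue<String> basket = new LinkedBlockingQueue<>();

    // Both RM and TM producers offer into the same basket.
    public void offer(String msg) {
        basket.offer(msg);
    }

    // The single merge thread drains everything currently queued into one batch.
    public List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        basket.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        SingleMergeQueue q = new SingleMergeQueue();
        q.offer("rm:branchRegister");
        q.offer("tm:globalBegin");
        q.offer("rm:branchReport");
        // A single flush now carries both rm and tm messages.
        System.out.println(q.drainBatch().size()); // prints 3
    }
}
```

With two separate threads, the rm batch and the tm batch above would have been sent as two smaller merged requests instead of one.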
Simpler and lighter design
The volume of tm messages is relatively small, and message merging is disabled by default, so the merge thread is often not started at all. From a design perspective, it is therefore more appropriate to make mergeSendExecutorService a static class attribute so that only a single copy is retained.
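The design point can be sketched as follows (the holder class name and thread name are hypothetical; only the identifier mergeSendExecutorService comes from the discussion): a static field guarantees one executor per JVM, shared by all client instances, instead of one per rm/tm client.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: holding the merge-send executor in a static field
// means every client instance shares one single-thread executor, rather
// than each rm/tm client creating its own.
public class MergeSendExecutorHolder {

    // One copy per JVM, shared by all client instances.
    private static final ExecutorService MERGE_SEND_EXECUTOR =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "mergeSendThread");
                t.setDaemon(true); // don't keep the JVM alive for this thread
                return t;
            });

    public static ExecutorService get() {
        return MERGE_SEND_EXECUTOR;
    }
}
```

Every call site then submits merge-send tasks to the same single thread, which is what makes the unified design "simpler and lighter".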
Reduce overhead
The merge threads mostly perform non-blocking asynchronous I/O, so a single thread can focus on dispatching asynchronous requests, eliminating the context-switching overhead of running two threads.
Reduced blocking and empty polling
Adding a double check reduces unnecessary empty polling: when the list of messages to be sent is not empty, they are sent immediately with priority; when the list is empty, the thread is allowed to sleep longer instead of spinning.
Data Support
Test Tools
Performance was tested with JMH; CPU occupancy was measured with arthas.
Test Code
Test Data
Before thread merging
Performance Testing Data
8 threads
Warmup Iteration 2: 12139329.543 ±(99.9%) 848561.019 ns/op
Warmup Iteration 3: 11963078.561 ±(99.9%) 960887.166 ns/op
Warmup Iteration 4: 12981660.479 ±(99.9%) 1189936.352 ns/op
Warmup Iteration 5: 12492056.457 ±(99.9%) 883896.984 ns/op
Iteration 1: 13048335.217 ±(99.9%) 1128320.673 ns/op
Iteration 2: 13633428.156 ±(99.9%) 1217889.924 ns/op
Iteration 3: 17895714.077 ±(99.9%) 906983.698 ns/op
Iteration 4: 14793398.420 ±(99.9%) 1110027.805 ns/op
Iteration 5: 13455755.155 ±(99.9%) 658577.878 ns/op
Result "io.seata.core.rpc.netty.v1.MergedThreadTest.sendRequest":
14565326.205 ±(99.9%) 7590835.199 ns/op [Average]
(min, avg, max) = (13048335.217, 14565326.205, 17895714.077), stdev = 1971315.795
CI (99.9%): [6974491.006, 22156161.404] (assumes normal distribution)
16 threads
Warmup Iteration 2: 19813489.277 ±(99.9%) 968142.331 ns/op
Warmup Iteration 3: 20198656.297 ±(99.9%) 749479.682 ns/op
Warmup Iteration 4: 20573418.230 ±(99.9%) 923624.110 ns/op
Warmup Iteration 5: 20253840.605 ±(99.9%) 1177409.617 ns/op
Iteration 1: 20197239.104 ±(99.9%) 888597.560 ns/op
Iteration 2: 20102682.120 ±(99.9%) 985867.184 ns/op
Iteration 3: 21468587.136 ±(99.9%) 664266.590 ns/op
Iteration 4: 20359510.680 ±(99.9%) 548540.336 ns/op
Iteration 5: 19561127.870 ±(99.9%) 980520.992 ns/op
Result "io.seata.core.rpc.netty.v1.MergedThreadTest.sendRequest":
20337829.382 ±(99.9%) 2693668.053 ns/op [Average]
(min, avg, max) = (19561127.870, 20337829.382, 21468587.136), stdev = 699537.039
CI (99.9%): [17644161.329, 23031497.435] (assumes normal distribution)
CPU Occupancy Testing Data
After thread merging
Performance Testing Data
8 threads
Warmup Iteration 2: 11032606.896 ±(99.9%) 510359.137 ns/op
Warmup Iteration 3: 11430048.542 ±(99.9%) 878519.234 ns/op
Warmup Iteration 4: 10928250.943 ±(99.9%) 862919.770 ns/op
Warmup Iteration 5: 10542404.005 ±(99.9%) 1162601.368 ns/op
Iteration 1: 10176809.317 ±(99.9%) 491029.968 ns/op
Iteration 2: 10377410.748 ±(99.9%) 704883.549 ns/op
Iteration 3: 9895530.796 ±(99.9%) 1226249.260 ns/op
Iteration 4: 11388840.466 ±(99.9%) 928430.523 ns/op
Iteration 5: 11275567.639 ±(99.9%) 929608.316 ns/op
Result "io.seata.core.rpc.netty.v1.MergedThreadTest.sendRequest":
10622831.793 ±(99.9%) 2583784.694 ns/op [Average]
(min, avg, max) = (9895530.796, 10622831.793, 11388840.466), stdev = 671000.680
CI (99.9%): [8039047.099, 13206616.487] (assumes normal distribution)
16 threads
Warmup Iteration 2: 16809090.598 ±(99.9%) 788713.429 ns/op
Warmup Iteration 3: 16926928.220 ±(99.9%) 1060596.141 ns/op
Warmup Iteration 4: 17194963.678 ±(99.9%) 1115132.284 ns/op
Warmup Iteration 5: 17232139.011 ±(99.9%) 1122411.667 ns/op
Iteration 1: 17220331.141 ±(99.9%) 678882.820 ns/op
Iteration 2: 16812748.717 ±(99.9%) 980338.756 ns/op
Iteration 3: 16119182.067 ±(99.9%) 758937.643 ns/op
Iteration 4: 15774596.784 ±(99.9%) 790845.394 ns/op
Iteration 5: 16058259.592 ±(99.9%) 972153.339 ns/op
Result "io.seata.core.rpc.netty.v1.MergedThreadTest.sendRequest":
16397023.660 ±(99.9%) 2302378.216 ns/op [Average]
(min, avg, max) = (15774596.784, 16397023.660, 17220331.141), stdev = 597920.311
CI (99.9%): [14094645.445, 18699401.876] (assumes normal distribution)
CPU Occupancy Testing Data
Test Conclusion
Thread merging has little effect on CPU utilization, but it has a more significant effect on request delivery time, reducing average request time by about 25%.
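The reduction can be recomputed from the reported JMH averages above (the class and method names here are illustrative):

```java
// Recomputing the latency reduction from the reported JMH averages (ns/op).
public class ReductionCheck {

    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        double r8  = reductionPercent(14565326.205, 10622831.793); // ~27.1% at 8 threads
        double r16 = reductionPercent(20337829.382, 16397023.660); // ~19.4% at 16 threads
        System.out.printf("8 threads: %.1f%%, 16 threads: %.1f%%%n", r8, r16);
    }
}
```

The two configurations give roughly 27% and 19% reductions respectively, which is consistent with the "about 25%" conclusion, though the gain narrows as the thread count grows.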