
Incremental Snapshot Feature #5

Open
NiyiOdumosu opened this issue Feb 10, 2022 · 6 comments

@NiyiOdumosu

Hello @jpechane and @gunnarmorling, I have been reading the Debezium documentation on the incremental snapshot feature for SQL Server. The documentation says that the feature is incubating. I wanted to know if either of you is working on that feature and, if so, what the expected timeline for delivery is. If you are not working on that feature, can you point me to the engineers who are developing it? Your help is greatly appreciated!

@jpechane
Contributor

@NiyiOdumosu Hi, the feature is already done. The incubating marker is more of a warning that the API or implementation might change, and at this point it is probably no longer needed.
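
For readers landing on this thread: incremental snapshots are triggered ad hoc by inserting a row into the connector's signaling table (the connector must be configured with `signal.data.collection` pointing at that table). Below is a minimal sketch in Python using pyodbc; the database name, table names, and connection details are placeholder assumptions, not values from this issue.

```python
# Sketch: trigger an ad-hoc incremental snapshot for the Debezium SQL Server
# connector by inserting a signal row. Assumes the connector config contains:
#   "signal.data.collection": "testDB.dbo.debezium_signal"
# All names and credentials below are placeholders.
import json
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost,1433;DATABASE=testDB;UID=sa;PWD=<password>"
)
cursor = conn.cursor()

signal_data = {
    # Fully-qualified names of the tables to snapshot incrementally.
    "data-collections": ["testDB.dbo.orders"]
}
cursor.execute(
    "INSERT INTO dbo.debezium_signal (id, type, data) VALUES (?, ?, ?)",
    "adhoc-snapshot-1",      # arbitrary unique id for this signal
    "execute-snapshot",      # signal type Debezium recognizes
    json.dumps(signal_data),
)
conn.commit()
```

The connector watches the signaling table through its own change stream, so the insert is picked up without restarting anything.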

@NiyiOdumosu
Author

Thanks @jpechane ! I will try to test this feature out in a POC. Mind if I reach out to you if I have any questions?

@gunnarmorling
Member

gunnarmorling commented Feb 11, 2022 via email

@NiyiOdumosu
Author

NiyiOdumosu commented Mar 9, 2022

Hello!
In the past, we have used the DBZ SQL Server connector to migrate large volumes of historical data. If a table had billions of records and the connector failed mid-migration, we would have to restart the connector, and it would produce data from the beginning of the table again. We hoped that the incremental snapshot feature would solve this by taking snapshots of the table and resuming production from the last committed offset.

Testing Scenario
We simulated a failure by deleting the connector in the middle of the migration, then restarted it, hoping it would pick up where it left off. Instead, it is producing rows from the beginning of the table again. Below are the stats; keep in mind there are 1,408,376 records in the table.

[screenshot attachment showing the record-count stats]

How can we eliminate or at least minimize the duplicates so that the connector doesn't reproduce all the data it produced before it failed? Are there any configurations that we can modify to assist with this?

Any help would be greatly appreciated!
@jpechane @gunnarmorling
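
One consumer-side mitigation while the offsets question is investigated: Debezium provides at-least-once delivery and keys each change event by the table's primary key, so a sink that upserts on that key makes re-produced rows overwrite rather than duplicate. A minimal sketch with kafka-python; the topic name, broker address, and the JSON envelope layout (JsonConverter with schemas enabled) are assumptions.

```python
# Sketch: idempotent consumption of a Debezium topic. Re-delivered rows carry
# the same primary-key message key, so upserting on it neutralizes duplicates.
# Topic/broker names are placeholders; the in-memory dict stands in for a
# real target store that supports upserts.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "server1.dbo.orders",                 # Debezium topic: <server>.<schema>.<table>
    bootstrap_servers="localhost:9092",
    group_id="orders-sink",
    auto_offset_reset="earliest",
    key_deserializer=lambda k: json.loads(k) if k else None,
    value_deserializer=lambda v: json.loads(v) if v else None,
)

store = {}  # stand-in for an upsert into the real target system
for msg in consumer:
    if msg.value is None:                 # tombstone record; nothing to upsert
        continue
    row = msg.value["payload"]["after"]   # post-image of the row
    pk = tuple(sorted(msg.key["payload"].items()))
    store[pk] = row                       # upsert by primary key
```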

@jpechane
Contributor

@NiyiOdumosu Hi, could you please move the issue to Jira or to the chat/mailing list? We do not use the GitHub issues. Thanks. Also, when you do, please include the logs and the offsets value from before the connector restart.
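
For anyone gathering the same diagnostics: in Kafka Connect distributed mode the source offsets Debezium has committed live in the offsets topic (`offset.storage.topic`; `connect-offsets` is a common default). A quick way to capture them before restarting, sketched with kafka-python; the topic name and broker address are assumptions about the deployment.

```python
# Sketch: dump the committed source offsets so they can be compared before
# and after a connector restart. "connect-offsets" and the broker address
# are placeholders; adjust to your Connect cluster's configuration.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "connect-offsets",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=5000,   # stop iterating once the topic is drained
)
for msg in consumer:
    key = msg.key.decode("utf-8") if msg.key else None
    value = msg.value.decode("utf-8") if msg.value else None
    # Keys look like ["<connector-name>", {...source partition...}]; for the
    # SQL Server connector the value holds the LSN position it will resume from.
    print(key, "=>", value)
```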

@NiyiOdumosu
Author

Hey @jpechane, I have posted this question on the Google group twice and have not received a response. I was not aware there was a Jira backlog I could post it to. Can you please send me the Jira link?
