
Incremental Snapshot Feature #5

Open
NiyiOdumosu opened this issue Feb 10, 2022 · 6 comments

@NiyiOdumosu

Hello @jpechane and @gunnarmorling, I have been reading the Debezium documentation on the incremental snapshot feature for SQL Server. The documentation says that the feature is incubating. I wanted to know if either of you is working on that feature and, if so, what the expected timeline for delivery is. If you are not working on that feature, can you point me to the engineers who are developing it? Your help is greatly appreciated!

@jpechane
Contributor

@NiyiOdumosu Hi, the feature is already done. The incubating marker is more of a warning that the API or implementation might change, and at this point it is probably no longer needed.
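
For readers landing on this thread: incremental snapshots are triggered ad hoc by inserting a row into the connector's signaling table (the connector must be configured with `signal.data.collection` pointing at that table). Below is a minimal sketch in Python using pyodbc; the database name, table names, and connection details are placeholder assumptions, not values from this issue.

```python
# Sketch: trigger an ad-hoc incremental snapshot for the Debezium SQL Server
# connector by inserting a signal row. Assumes the connector config contains:
#   "signal.data.collection": "testDB.dbo.debezium_signal"
# All names and credentials below are placeholders.
import json
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost,1433;DATABASE=testDB;UID=sa;PWD=<password>"
)
cursor = conn.cursor()

signal_data = {
    # Fully-qualified names of the tables to snapshot incrementally.
    "data-collections": ["testDB.dbo.orders"]
}
cursor.execute(
    "INSERT INTO dbo.debezium_signal (id, type, data) VALUES (?, ?, ?)",
    "adhoc-snapshot-1",      # arbitrary unique id for this signal
    "execute-snapshot",      # signal type Debezium recognizes
    json.dumps(signal_data),
)
conn.commit()
```

The connector watches the signaling table through its own change stream, so the insert is picked up without restarting anything.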

@NiyiOdumosu
Author

Thanks @jpechane ! I will try to test this feature out in a POC. Mind if I reach out to you if I have any questions?

@gunnarmorling
Member

gunnarmorling commented Feb 11, 2022 via email

@NiyiOdumosu
Author

NiyiOdumosu commented Mar 9, 2022

Hello!
In the past, we have used the DBZ SQL Server connector to migrate large volumes of historical data. If a table had billions of records and the connector failed mid-migration, we would have to restart the connector, and it would produce data from the beginning of the table again. We hoped that the incremental snapshot feature would solve this by taking snapshots of the table and resuming production from the last committed offset.

Testing Scenario
We simulated a failure by deleting the connector in the middle of the migration, then restarted it, hoping it would pick up where it left off. Instead, it is producing rows from the beginning of the table again. Below are the stats; keep in mind there are 1,408,376 records in the table.

[screenshot attachment showing the record-count stats]

How can we eliminate or at least minimize the duplicates so that the connector doesn't reproduce all the data it produced before it failed? Are there any configurations that we can modify to assist with this?

Any help would be greatly appreciated!
@jpechane @gunnarmorling
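
One consumer-side mitigation while the offsets question is investigated: Debezium provides at-least-once delivery and keys each change event by the table's primary key, so a sink that upserts on that key makes re-produced rows overwrite rather than duplicate. A minimal sketch with kafka-python; the topic name, broker address, and the JSON envelope layout (JsonConverter with schemas enabled) are assumptions.

```python
# Sketch: idempotent consumption of a Debezium topic. Re-delivered rows carry
# the same primary-key message key, so upserting on it neutralizes duplicates.
# Topic/broker names are placeholders; the in-memory dict stands in for a
# real target store that supports upserts.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "server1.dbo.orders",                 # Debezium topic: <server>.<schema>.<table>
    bootstrap_servers="localhost:9092",
    group_id="orders-sink",
    auto_offset_reset="earliest",
    key_deserializer=lambda k: json.loads(k) if k else None,
    value_deserializer=lambda v: json.loads(v) if v else None,
)

store = {}  # stand-in for an upsert into the real target system
for msg in consumer:
    if msg.value is None:                 # tombstone record; nothing to upsert
        continue
    row = msg.value["payload"]["after"]   # post-image of the row
    pk = tuple(sorted(msg.key["payload"].items()))
    store[pk] = row                       # upsert by primary key
```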

@jpechane
Contributor

@NiyiOdumosu Hi, could you please move the issue to Jira or to the chat/mailing list? We do not use the GitHub issues. Thanks. Also, when you do, please include the logs and the offsets value from before the connector restart.
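
For anyone gathering the same diagnostics: in Kafka Connect distributed mode the source offsets Debezium has committed live in the offsets topic (`offset.storage.topic`; `connect-offsets` is a common default). A quick way to capture them before restarting, sketched with kafka-python; the topic name and broker address are assumptions about the deployment.

```python
# Sketch: dump the committed source offsets so they can be compared before
# and after a connector restart. "connect-offsets" and the broker address
# are placeholders; adjust to your Connect cluster's configuration.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "connect-offsets",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=5000,   # stop iterating once the topic is drained
)
for msg in consumer:
    key = msg.key.decode("utf-8") if msg.key else None
    value = msg.value.decode("utf-8") if msg.value else None
    # Keys look like ["<connector-name>", {...source partition...}]; for the
    # SQL Server connector the value holds the LSN position it will resume from.
    print(key, "=>", value)
```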

@NiyiOdumosu
Author

Hey @jpechane, I have posted this question on the Google group twice and have not received a response. I was not aware there was a Jira backlog I could post it to. Can you please send me the Jira link?
