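In TCC mode with useTCCFence enabled and the MySQL transaction isolation level set to RR, if the prepare phase is suspended and the rollback phase is also suspended, the exception "Deadlock found when trying to get lock; try restarting transaction" is thrown #6679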
Comments
I believe this occurs only when resource suspension coincides with isolation level RR, so we should devise a solution based on these two factors:
1. Since this is resource suspension, the suspended resources will eventually be rolled back. Therefore, we should catch such exceptions and wrap them as Seata's unified exceptions, informing the user that these exceptions can be ignored, as Seata's rollback retry mechanism handles them. Simply notifying the user should suffice.
2. We can integrate with Spring's @Transactional annotation and set the isolation level during fence operations according to this annotation. This approach keeps the isolation level consistent with the business side and avoids introducing additional configuration for users.
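A minimal sketch of how suggestion 1 could recognize the deadlock before wrapping it. It relies on MySQL reporting deadlocks with vendor error code 1213 (SQLSTATE 40001). `RetryableFenceException` and `FenceDeadlockCheck` are hypothetical names for illustration, not existing Seata classes, and how Seata would actually surface the exception is left open.

```java
import java.sql.SQLException;

class FenceDeadlockCheck {

    /** MySQL vendor error code for ER_LOCK_DEADLOCK. */
    static final int MYSQL_DEADLOCK_ERROR_CODE = 1213;

    /** Upper bound on the cause-chain walk, guarding against cyclic chains. */
    static final int MAX_CAUSE_DEPTH = 32;

    /** Hypothetical wrapper type (assumption): signals "safe to ignore, rollback will be retried". */
    static class RetryableFenceException extends RuntimeException {
        RetryableFenceException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Returns true if any exception in the cause chain is a MySQL deadlock error. */
    static boolean isMysqlDeadlock(Throwable error) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (current instanceof SQLException
                    && ((SQLException) current).getErrorCode() == MYSQL_DEADLOCK_ERROR_CODE) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    /** Wraps deadlock failures in the hypothetical retryable type; returns other errors unchanged. */
    static RuntimeException translate(RuntimeException error) {
        if (isMysqlDeadlock(error)) {
            return new RetryableFenceException(
                    "Fence rollback hit a deadlock; the rollback retry is expected to resolve it", error);
        }
        return error;
    }

    public static void main(String[] args) {
        RuntimeException failure =
                new RuntimeException(new SQLException("Deadlock found when trying to get lock", "40001", 1213));
        System.out.println(translate(failure).getClass().getSimpleName());
    }
}
```

Walking the cause chain matters because data-access layers usually wrap the driver's `SQLException` in their own runtime exception before it reaches the fence handler.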
Ⅰ. Issue Description
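In TCC mode with useTCCFence enabled and the MySQL transaction isolation level set to RR, if the prepare phase is suspended and the rollback phase is also suspended, the exception "Deadlock found when trying to get lock; try restarting transaction" is thrown.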
Ⅱ. Describe what happened
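In TCC mode with the MySQL isolation level set to RR, suppose the prepare phase is suspended and the rollback phase is suspended as well. The rollback method `org.apache.seata.rm.fence.SpringFenceHandler#rollbackFence` is retried, so once the rollback suspension clears (while the prepare suspension has not yet cleared, or prepare runs one step behind rollback), several requests may execute the rollback method at the same time. These requests open separate local transactions, and each local transaction runs one `select ... for update` query. Because the prepare phase is still suspended, the table `tcc_fence_log` does not yet contain a fence record for this branch transaction. Since that record does not exist, the `select ... for update` query degrades from a row lock to a gap lock. Different transactions can hold a gap lock on the same range at the same time, so none of these rollback requests is blocked and they all go on to execute the `insert`. When inserting, each of them has to wait for the others' gap locks, and a deadlock occurs!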
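The locking sequence described above can be illustrated with a small in-memory model. This is a deliberately simplified sketch, not InnoDB's implementation: it assumes only that gap locks on the same range are mutually compatible and that an insert into a gap must wait for every other transaction holding a gap lock there. The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class GapLockDeadlockModel {

    /** Gap name -> transactions currently holding a gap lock on it. */
    private final Map<String, Set<String>> gapHolders = new HashMap<>();

    /** Transaction -> transactions it is waiting for. */
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    /** Locking read on a missing row: takes a gap lock and never blocks on other gap locks. */
    void selectForUpdateMissingRow(String tx, String gap) {
        gapHolders.computeIfAbsent(gap, g -> new LinkedHashSet<>()).add(tx);
    }

    /** Insert into the gap: returns true if it proceeds, false if it must wait for other holders. */
    boolean insert(String tx, String gap) {
        Set<String> blockers =
                new LinkedHashSet<>(gapHolders.getOrDefault(gap, Collections.<String>emptySet()));
        blockers.remove(tx); // a transaction is not blocked by its own gap lock
        if (blockers.isEmpty()) {
            return true;
        }
        waitsFor.computeIfAbsent(tx, t -> new LinkedHashSet<>()).addAll(blockers);
        return false;
    }

    /** Returns true if the wait-for graph contains a cycle, i.e. a deadlock. */
    boolean hasDeadlock() {
        for (String start : waitsFor.keySet()) {
            if (reaches(start, start, new HashSet<>())) {
                return true;
            }
        }
        return false;
    }

    /** Depth-first search: can `target` be reached by following wait-for edges from `from`? */
    private boolean reaches(String from, String target, Set<String> seen) {
        for (String next : waitsFor.getOrDefault(from, Collections.<String>emptySet())) {
            if (next.equals(target)) {
                return true;
            }
            if (seen.add(next) && reaches(next, target, seen)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        GapLockDeadlockModel model = new GapLockDeadlockModel();
        // Two retried rollback requests, each in its own local transaction.
        model.selectForUpdateMissingRow("rollback-1", "missing-fence-row");
        model.selectForUpdateMissingRow("rollback-2", "missing-fence-row");
        model.insert("rollback-1", "missing-fence-row");
        model.insert("rollback-2", "missing-fence-row");
        System.out.println("deadlock: " + model.hasDeadlock());
    }
}
```

With a single rollback request the insert proceeds immediately, which matches the observation that the problem only appears when retried rollbacks overlap.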
Ⅲ. Describe what you expected to happen
When several rollback requests run concurrently and conflict, a duplicate key exception should be raised instead of a deadlock!
Ⅳ. Anything else we need to know?
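I have 4 ideas (options):
1. In the rollback method, swap the order of the `select ... for update` and `insert` operations: run the insert first and the for update second, which avoids the deadlock caused by gap locks. (However, the TCC fence insert normally happens in the prepare phase; cases where a suspended prepare pushes the insert into the rollback method are rare. Swapping the two operations would make every rollback execute two SQL statements and lower performance, so this is not recommended!)
2. Use middleware such as Redis to implement a distributed lock, and require the `prepareFence`, `commitFence`, and `rollbackFence` operations of `org.apache.seata.rm.fence.SpringFenceHandler` to acquire that lock before they run. This also avoids the deadlock. (However, each of these three operations would then need two network I/O round trips, which also lowers performance, so this is not recommended either!)
3. The deadlock is caused by gap locks at the RR level. If the transaction isolation level is lowered to RC, InnoDB takes no gap lock for this locking read, so the deadlock cannot occur! Hence this option: the DB operations of `prepareFence`, `commitFence`, and `rollbackFence` in `org.apache.seata.rm.fence.SpringFenceHandler` are executed through `org.springframework.transaction.support.TransactionTemplate#execute`, so only these three methods need to change. While they run, temporarily switch the isolation level of the `TransactionTemplate` to RC, then switch back to the default isolation level once they finish.
4. (Recommended) The reasoning is the same as in option 3: lowering the isolation level to RC removes the gap lock and with it the deadlock. Hence this option: add an isolation level property `isolationLevel` to the `io.seata.rm.tcc.config.TCCFenceConfig` configuration, so that users can customize the TCC fence isolation level through `seata.tcc.fence.isolationLevel`. In `org.apache.seata.rm.fence.SpringFenceConfig#afterPropertiesSet`, check whether the user has customized the isolation level. If not, use the default isolation level; if so, replace the isolation level of the `TransactionTemplate` with the customized one. This extension point then resolves the deadlock at the RR level.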
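Options 3 and 4 can be sketched roughly as follows. To stay self-contained, the sketch uses plain JDBC (`Connection#setTransactionIsolation`) rather than Spring; in Seata the equivalent step would presumably go through `TransactionTemplate#setIsolationLevel`. `FenceIsolation`, `resolve`, `runWithIsolation`, and the accepted property values are assumptions made for illustration, not existing Seata code.

```java
import java.sql.Connection;
import java.util.Locale;
import java.util.concurrent.Callable;

class FenceIsolation {

    /** Marker meaning "no custom level configured; keep the connection's current level". */
    static final int USE_DEFAULT = -1;

    /** Option 4: map a configured value (hypothetical format) to a JDBC isolation constant. */
    static int resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return USE_DEFAULT;
        }
        switch (configured.trim().toUpperCase(Locale.ROOT)) {
            case "READ_UNCOMMITTED":
                return Connection.TRANSACTION_READ_UNCOMMITTED;
            case "READ_COMMITTED":
                return Connection.TRANSACTION_READ_COMMITTED;
            case "REPEATABLE_READ":
                return Connection.TRANSACTION_REPEATABLE_READ;
            case "SERIALIZABLE":
                return Connection.TRANSACTION_SERIALIZABLE;
            default:
                throw new IllegalArgumentException("Unknown isolation level: " + configured);
        }
    }

    /**
     * Option 3: run the fence work at the given level, then restore the previous level.
     * Must be called before the transaction starts; changing the level mid-transaction
     * is implementation-defined in JDBC.
     */
    static <T> T runWithIsolation(Connection connection, int level, Callable<T> work) throws Exception {
        if (level == USE_DEFAULT) {
            return work.call();
        }
        int previous = connection.getTransactionIsolation();
        connection.setTransactionIsolation(level);
        try {
            return work.call();
        } finally {
            // Restore even if the fence work throws.
            connection.setTransactionIsolation(previous);
        }
    }
}
```

One design consideration: a `TransactionTemplate` is typically a shared bean, so changing and restoring its isolation level around each call (option 3) would need care under concurrency, whereas setting it once from configuration at startup (option 4) avoids that.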
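With option 3 or option 4 applied, when the prepare phase and the rollback phase are both suspended, the error reported is as follows, and the deadlock is avoided:
If this is confirmed to be a bug or an optimization, I can try to submit a PR with the change.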
Ⅵ. Environment:
JDK version(e.g. java -version): 11
Seata client/server version: 1.8.0
Database version: 8.0.29