<p>Idempotence (a formal definition is available on <a href="https://en.wikipedia.org/wiki/Idempotence">Wikipedia</a>), in the context of messaging, means that a message redelivery can be handled without ending up in an unintended state.</p>
<p>Before we talk about idempotency, let's talk about the delivery of messages on the consumer side.</p>
<p>Because CAP does not use MS DTC or any other 2PC distributed transaction mechanism, it cannot guarantee that a message is delivered strictly once; a message may be delivered more than once. Specifically, in a message-based system, there are three possible delivery guarantees:</p>
<ul>
<li>Exactly Once(*) </li>
<li>At Most Once </li>
<li>At Least Once </li>
</ul>
<p>Exactly once has a (*) next to it, because in the general case, it is simply not possible.</p>
<h3 id="at-most-once">At Most Once<a class="headerlink" href="#at-most-once" title="Permanent link">¶</a></h3>
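<p>This delivery guarantee covers the case where you receive each message either once or not at all. It arises when the message is removed from the queue in step (1), before the work transaction is started, the message is handled, and the transaction is committed; if handling fails, the work transaction is rolled back and step (4)/(2) puts the message back into the queue.</p>
<p>In the sunshine scenario, this is all well and good: your messages will be received, work transactions will be committed, and you will be happy.</p>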
<p>However, the sun does not always shine, and stuff tends to fail, especially if you do enough stuff. Consider what would happen if anything fails after step (1) has been performed and then, when you try to execute step (4)/(2) (i.e. put the message back into the queue), the network is temporarily unavailable, or the message broker has restarted, or the host machine has decided to reboot because it installed an update. In any of these cases the message is lost.</p>
<p>This can be OK if it's what you want, but most things in CAP revolve around the concept of DURABLE messages, i.e. messages whose contents are just as important as the data in your database.</p>
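<p>To make the failure mode concrete, here is a minimal Python sketch of the at-most-once ordering. It is only an illustration: <code class="codehilite">FlakyQueue</code>, <code class="codehilite">consume_at_most_once</code>, and the outcome strings are hypothetical names, not part of CAP or of any broker API.</p>

```python
class FlakyQueue:
    """Hypothetical in-memory queue whose put-back can fail, like a broker outage."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.broker_up = True

    def remove(self):
        # Step 1: the message leaves the queue before any work is done.
        return self.messages.pop(0)

    def put_back(self, message):
        if not self.broker_up:
            raise ConnectionError("broker unavailable")
        self.messages.insert(0, message)


def consume_at_most_once(queue, handle):
    message = queue.remove()
    try:
        handle(message)              # steps 2-3: work transaction and handler
        return "handled"
    except Exception:
        try:
            queue.put_back(message)  # step 4.2: this can fail as well
            return "requeued"
        except ConnectionError:
            return "lost"            # nobody holds the message any more


def failing_handler(message):
    raise RuntimeError("handler crashed")


queue = FlakyQueue(["order-1"])
queue.broker_up = False              # the broker is unreachable when we try to put it back
outcome = consume_at_most_once(queue, failing_handler)
```

<p>With the handler failing and the broker unreachable, the message ends up neither handled nor in the queue, which is exactly the loss that durable messaging is meant to prevent.</p>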
<h3 id="at-least-once">At Least Once<a class="headerlink" href="#at-least-once" title="Permanent link">¶</a></h3>
<p>This delivery guarantee covers the case when you are guaranteed to receive all messages either once, or maybe more times if something has failed.</p>
<p>It requires a slight change to the order we are executing our steps in, and it requires that the message queue system supports transactions, either in the form of the traditional begin-commit-rollback protocol (MSMQ does this), or in the form of a receive-ack-nack protocol (RabbitMQ, Azure Service Bus, etc. do this).</p>
<p>Check this out – if we do this:</p>
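<div class="codehilite"><pre><span></span>1. Grab lease on message in queue
2. Start work transaction
3. Handle message (your code)
4. Success?
    Yes:
        1. Commit work transaction
        2. Delete message from queue
    No:
        1. Roll back work transaction
        2. Release lease on message
</pre></div>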
<p>and the "lease" we grabbed on the message in step (1) is associated with an appropriate timeout, then we are guaranteed that no matter how wrong things go, we will only actually remove the message from the queue (i.e. execute step (4)/(2)) if we have successfully committed our "work transaction".</p>
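<p>The receive-ack-nack ordering can be sketched in a few lines of Python. This is a simplified model under stated assumptions: <code class="codehilite">LeasingQueue</code> and <code class="codehilite">consume_at_least_once</code> are hypothetical names, the lease has no real timeout, and no actual broker client is involved.</p>

```python
class LeasingQueue:
    """Hypothetical in-memory queue with receive/ack/nack semantics."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.leased = set()

    def receive(self):
        # Step 1: lease the first unleased message; it stays in the queue.
        for message in self.messages:
            if message not in self.leased:
                self.leased.add(message)
                return message
        return None

    def ack(self, message):
        # Success path: only now is the message deleted from the queue.
        self.leased.discard(message)
        self.messages.remove(message)

    def nack(self, message):
        # Failure path: release the lease so the message is delivered again.
        self.leased.discard(message)


def consume_at_least_once(queue, handle):
    message = queue.receive()
    if message is None:
        return False
    try:
        handle(message)              # work transaction and handler
    except Exception:
        queue.nack(message)
        return False
    queue.ack(message)
    return True


attempts = []


def flaky_handler(message):
    attempts.append(message)
    if len(attempts) == 1:
        raise RuntimeError("first attempt fails")


queue = LeasingQueue(["order-1"])
first = consume_at_least_once(queue, flaky_handler)
second = consume_at_least_once(queue, flaky_handler)
```

<p>The message is never lost, but the handler sees it twice. That second delivery is why the consumer has to be idempotent.</p>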
<h3 id="what-is-a-work-transaction">What is a "work transaction"?<a class="headerlink" href="#what-is-a-work-transaction" title="Permanent link">¶</a></h3>
<p>It depends on what you're doing 😄 Maybe it's a transaction in a relational database (which traditionally has pretty good support in this regard), maybe it's a transaction in a document database that happens to support transactions (like RavenDB or Postgres), or maybe it's a conceptual transaction in the form of whichever work you happen to carry out as a consequence of handling a message, e.g. updating a bunch of documents in MongoDB, moving some files around in the file system, or mutating some obscure in-memory data structure.</p>
<p>The fact that the "work transaction" is just a conceptual thing is what makes it impossible to support the aforementioned Exactly Once delivery guarantee – it's just not generally possible to commit or roll back a "work transaction" and a "queue transaction" (which is what we could call the protocol carried out with the message queue systems) atomically and consistently.</p>
<h2 id="idempotence-at-cap">Idempotence in CAP<a class="headerlink" href="#idempotence-at-cap" title="Permanent link">¶</a></h2>
<p>In CAP, the delivery guarantee we use is <strong>At Least Once</strong>.</p>
<p>Since we have a temporary storage medium (a database table), we might be able to offer At Most Once, but in order to strictly guarantee that messages are not lost, we do not provide any related functions or configuration.</p>
<h3 id="why-are-we-not-providingachieving-idempotency">Why are we not providing (achieving) idempotency?<a class="headerlink" href="#why-are-we-not-providingachieving-idempotency" title="Permanent link">¶</a></h3>
<ol>
<li>
<p>The message was successfully written, but the execution of the Consumer method failed. </p>
<p>There are many reasons why a Consumer method can fail, and the framework cannot know, for a specific scenario, whether blindly retrying or not retrying is the wrong choice.
For example, suppose the consumer is a debit service. If the debit itself succeeds but writing the debit log fails, CAP will judge that the consumer failed and try again. If the consumer does not guarantee idempotency, the retry will debit the account more than once, with serious consequences.</p>
</li>
<li>
<p>The Consumer method executed successfully, but the same message was received again. </p>
<p>This scenario is also possible. If the Consumer executed successfully the first time but, for some reason such as a Broker recovery, the same message is delivered again, CAP will treat what it receives from the Broker as a new message, and the Consumer will execute it again. Because CAP sees it as a new message, it cannot make the processing idempotent at that point.</p>
</li>
<li>
<p>The current data storage model cannot support idempotency. </p>
<p>CAP deletes successfully consumed messages from its message table after 1 hour, so idempotency cannot be achieved for historical messages. "Historical" here refers to messages that are redelivered after the broker has undergone maintenance, or that are manually reprocessed for some reason.</p>
</li>
<li>
<p>Industry practices.</p>
<p>Many event-driven frameworks, such as ENode and RocketMQ, require users to ensure that their operations are idempotent.</p>
</li>
</ol>
<p>From an implementation point of view, CAP could provide a weaker form of idempotency, but it cannot guarantee strict idempotency.</p>
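<p>The gap between weak and strict idempotency can be illustrated with a toy model of a deduplication table whose rows expire. This is not how CAP is implemented; <code class="codehilite">ExpiringDedupTable</code> and <code class="codehilite">RETENTION_SECONDS</code> are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.</p>

```python
RETENTION_SECONDS = 3600  # assumption: mirrors the 1-hour cleanup described above


class ExpiringDedupTable:
    """Hypothetical model of a message table that forgets old entries."""

    def __init__(self):
        self.seen_at = {}

    def is_duplicate(self, message_id, now):
        # Drop records older than the retention window, as a cleanup job would.
        self.seen_at = {
            mid: ts
            for mid, ts in self.seen_at.items()
            if now - ts < RETENTION_SECONDS
        }
        if message_id in self.seen_at:
            return True
        self.seen_at[message_id] = now
        return False


table = ExpiringDedupTable()
first = table.is_duplicate("msg-1", now=0)         # first delivery
soon = table.is_duplicate("msg-1", now=60)         # redelivered after a minute
late = table.is_duplicate("msg-1", now=2 * 3600)   # redelivered after two hours
```

<p>The redelivery after one minute is recognised, but the one after two hours looks like a brand new message. A table with a retention window can therefore only ever give a best-effort guarantee.</p>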
<p>One way of making message processing idempotent is to track the IDs of processed messages explicitly and make your code handle a redelivery.</p>
<p>Assuming that you keep track of message IDs with an <code class="codehilite">IMessageTracker</code> that uses the same transactional data store as the rest of your work, your handler first checks whether the message ID has already been processed and returns early if so; otherwise it does the work and then records the ID as processed.</p>
<p>As for the implementation of <code class="codehilite">IMessageTracker</code>, you can use a store such as Redis or a database to record each message ID together with its processing state.</p>
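<p>The explicit tracking approach might be sketched as follows. The sketch is in Python rather than C#, and every name in it (<code class="codehilite">InMemoryMessageTracker</code>, <code class="codehilite">has_processed</code>, <code class="codehilite">mark_as_processed</code>, <code class="codehilite">DebitHandler</code>) is an assumption made for illustration; a real tracker would persist IDs in Redis or a database and commit them in the same transaction as the work.</p>

```python
class InMemoryMessageTracker:
    """Hypothetical stand-in for IMessageTracker, backed by a set."""

    def __init__(self):
        self._processed = set()

    def has_processed(self, message_id):
        return message_id in self._processed

    def mark_as_processed(self, message_id):
        self._processed.add(message_id)


class DebitHandler:
    """Hypothetical consumer that debits an account exactly once per message ID."""

    def __init__(self, tracker, balances):
        self.tracker = tracker
        self.balances = balances

    def handle(self, message):
        if self.tracker.has_processed(message["id"]):
            return  # redelivery: the work was already done
        self.balances[message["account"]] -= message["amount"]
        # In a real system this write must commit together with the debit.
        self.tracker.mark_as_processed(message["id"])


balances = {"alice": 100}
handler = DebitHandler(InMemoryMessageTracker(), balances)
message = {"id": "msg-42", "account": "alice", "amount": 30}
handler.handle(message)
handler.handle(message)  # simulated redelivery of the same message
```

<p>The second call returns early, so the account is debited only once. Note that the guarantee is only as strong as the atomicity between the debit and the tracker write.</p>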
<div class="footnote">
<hr/>
<ol>
<li id="fn:1">
<p>This chapter draws on the <a href="https://github.com/rebus-org/Rebus/wiki/Delivery-guarantees">Delivery guarantees</a> page of the Rebus wiki, which I think describes the topic very well. <a class="footnote-backref" href="#fnref:1" rev="footnote" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>
<p>CAP does not directly provide out-of-the-box distributed transactions based on MS DTC or 2PC. Instead, it provides a solution to the problems encountered in distributed transactions.</p>
<p>In a distributed environment, distributed transactions based on 2PC or DTC can be very expensive because of the communication overhead involved, and performance suffers accordingly. In addition, since such transactions are also subject to the <strong>CAP theorem</strong>, they have to give up availability (the A in CAP) when a network partition occurs.</p>
<blockquote>
<p>A distributed transaction is a very complex process with a lot of moving parts that can fail. Also, if these parts run on different machines or even in different data centers, the process of committing a transaction could become very long and unreliable.</p>
<p>This could seriously affect the user experience and overall system bandwidth. So <strong>one of the best ways to solve the problem of distributed transactions is to avoid them completely</strong>.<sup id="fnref2:1"><a class="footnote-ref" href="#fn:1" rel="footnote">1</a></sup></p>
</blockquote>
<p>For the processing of distributed transactions, CAP uses the "Eventual Consistency and Compensation" scheme.</p>
<h3 id="eventual-consistency-and-compensation-1">Eventual Consistency and Compensation <sup id="fnref:1"><a class="footnote-ref" href="#fn:1" rel="footnote">1</a></sup><a class="headerlink" href="#eventual-consistency-and-compensation-1" title="Permanent link">¶</a></h3>
<p>By far, one of the most feasible models of handling consistency across microservices is <a href="https://en.wikipedia.org/wiki/Eventual_consistency">eventual consistency</a>.</p>
<p>This model doesn’t enforce distributed ACID transactions across microservices. Instead, it proposes to use some mechanisms of ensuring that the system would be eventually consistent at some point in the future.</p>
<h4 id="a-case-for-eventual-consistency">A Case for Eventual Consistency<a class="headerlink" href="#a-case-for-eventual-consistency" title="Permanent link">¶</a></h4>
<p>For example, suppose we need to solve the following task:</p>
<ul>
<li>register a user profile </li>
<li>do some automated background check that the user can actually access the system</li>
</ul>
<p>The second task is to ensure, for example, that this user wasn’t banned from our servers for some reason.</p>
<p>But it could take time, and we’d like to extract it to a separate microservice. It wouldn’t be reasonable to keep the user waiting for so long just to know that she was registered successfully.</p>
<p><strong>One way to solve it would be with a message-driven approach including compensation</strong>. Let’s consider the following architecture:</p>
<ul>
<li>the user microservice tasked with registering a user profile </li>
<li>the validation microservice tasked with doing a background check </li>
<li>the messaging platform that supports persistent queues </li>
</ul>
<p>The messaging platform could ensure that the messages sent by the microservices are persisted. Then they would be delivered at a later time if the receiver weren’t currently available.</p>
<p>In this architecture, a happy scenario would be:</p>
<ul>
<li>the user microservice registers a user, saving information about her in its local database</li>
<li>the user microservice marks this user with a flag. It could signify that this user hasn’t yet been validated and doesn’t have access to full system functionality</li>
<li>a confirmation of registration is sent to the user with a warning that not all functionality of the system is accessible right away</li>
<li>the user microservice sends a message to the validation microservice to do the background check of a user</li>
<li>the validation microservice runs the background check and sends a message to the user microservice with the results of the check</li>
<li>if the results are positive, the user microservice unblocks the user</li>
<li>if the results are negative, the user microservice deletes the user account</li>
</ul>
<p>After we’ve gone through all these steps, the system should be in a consistent state. However, for some period of time, the user entity appeared to be in an incomplete state.</p>
<p>The last step, when the user microservice removes the invalid account, is a compensation phase.</p>
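<p>The happy scenario and its compensation phase can be sketched in Python. This is a toy model under stated assumptions: <code class="codehilite">UserService</code>, <code class="codehilite">validation_service</code>, and the plain list used as a queue are hypothetical stand-ins for the two microservices and the messaging platform.</p>

```python
class UserService:
    """Hypothetical user microservice with its own local 'database'."""

    def __init__(self, outbox):
        self.users = {}
        self.outbox = outbox

    def register(self, user_id):
        # Save the user flagged as not yet validated, then request the check.
        self.users[user_id] = {"validated": False}
        self.outbox.append({"type": "validate", "user_id": user_id})

    def on_validation_result(self, user_id, passed):
        if user_id not in self.users:
            return  # the account was already removed
        if passed:
            self.users[user_id]["validated"] = True  # unblock the user
        else:
            del self.users[user_id]                  # compensation phase


def validation_service(message, banned):
    """Hypothetical validation microservice: a user passes unless banned."""
    user_id = message["user_id"]
    return {"user_id": user_id, "passed": user_id not in banned}


queue = []  # stands in for the persistent message queue
users = UserService(outbox=queue)
users.register("alice")
users.register("mallory")

for request in list(queue):
    result = validation_service(request, banned={"mallory"})
    users.on_validation_result(result["user_id"], result["passed"])
```

<p>Between <code class="codehilite">register</code> and the arrival of the result, both users exist in an unvalidated state. Once the results are applied, the valid account is unblocked and the invalid one is deleted.</p>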
<p>Now consider some failure scenarios:</p>
<ul>
<li>if the validation microservice is not accessible, then the messaging platform with its persistent queue functionality ensures that the validation microservice would receive this message at some later time</li>
<li>suppose the messaging platform fails, then the user microservice tries to send the message again at some later time, for example, by scheduled batch-processing of all users that were not yet validated</li>
<li>if the validation microservice receives the message, validates the user but can’t send the answer back due to the messaging platform failure, the validation microservice also retries sending the message at some later time</li>
<li>if one of the messages got lost, or some other failure happened, the user microservice finds all non-validated users by scheduled batch-processing and sends requests for validation again</li>
</ul>
<p>Even if some of the messages were issued multiple times, this wouldn’t affect the consistency of the data in the microservices’ databases.</p>
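<p>Why duplicates are harmless here can be shown with a small Python sketch. The function names <code class="codehilite">apply_validation_result</code> and <code class="codehilite">pending_validation_requests</code> are hypothetical, and a dictionary stands in for the user microservice's database.</p>

```python
def apply_validation_result(users, user_id, passed):
    """Apply a result so that repeating it leaves the data unchanged."""
    user = users.get(user_id)
    if user is None:
        return  # account already removed by an earlier result
    if passed:
        user["validated"] = True  # setting the flag twice is harmless
    else:
        del users[user_id]


def pending_validation_requests(users):
    """What a scheduled batch job would resend: every user not yet validated."""
    return [uid for uid, user in users.items() if not user["validated"]]


users = {"alice": {"validated": False}, "bob": {"validated": False}}
apply_validation_result(users, "alice", passed=True)
apply_validation_result(users, "alice", passed=True)  # duplicate message
resend = pending_validation_requests(users)
```

<p>Applying the same result twice leaves the data as it was after the first application, and the batch job only re-requests validation for users whose result never arrived.</p>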
<p><strong>By carefully considering all possible failure scenarios, we can ensure that our system would satisfy the conditions of eventual consistency. At the same time, we wouldn’t need to deal with the costly distributed transactions.</strong></p>
<p>But we have to be aware that ensuring eventual consistency is a complex task. It doesn’t have a single solution for all cases.</p>
<div class="footnote">
<hr/>
<ol>
<li id="fn:1">
<p>This chapter is quoted from: <a href="https://www.baeldung.com/transactions-across-microservices">https://www.baeldung.com/transactions-across-microservices</a> <a class="footnote-backref" href="#fnref:1" rev="footnote" title="Jump back to footnote 1 in the text">↩</a><a class="footnote-backref" href="#fnref2:1" rev="footnote" title="Jump back to footnote 1 in the text">↩</a></p>