- 20 Mar, 2019 1 commit
Jason Yellick authored
The config update detection code in Raft is a re-implementation of the Fabric config checking mechanisms. This should probably be changed in the long term, but one case it currently does not handle is a config update that references the consenters in its write set without modifying them. This can occur in particular when adding a new organization to the orderer group. This CR adds the version checks necessary to detect and ignore these sorts of updates.
Change-Id: Ib35a97e1cdbd557705f11da183c8547f9f85539a
Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
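A minimal sketch of the version check described above. It assumes a simplified, hypothetical `configValue` type keyed by name (using "ConsensusType" as the key that holds the consenters is also an assumption). The idea is that a value appearing in the write set at the same version as in the current config is only referenced, not modified:

```go
package main

// configValue is a hypothetical, simplified stand-in for a versioned config
// value; Fabric's real ConfigValue protos carry more fields.
type configValue struct {
	Version uint64
	Value   []byte
}

// consentersModified reports whether a config update's write set actually
// modifies the value at key. A value present in the write set at the same
// version as in the current config is only referenced and should be ignored
// by consenter-change detection.
func consentersModified(current, writeSet map[string]configValue, key string) bool {
	written, ok := writeSet[key]
	if !ok {
		return false // not touched by this update at all
	}
	existing, ok := current[key]
	if !ok {
		return true // the value did not exist before
	}
	return written.Version != existing.Version
}
```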
-
- 16 Mar, 2019 1 commit
Jay Guo authored
- MaxInflightMsgs is internal to etcd/raft and should be exposed to users under a more appropriate name: MaxInflightBlocks.
- MaxSizePerMsg is also internal to etcd/raft. It defaults to PreferredMaxBytes in BatchSize, so that a big block is sent in its own etcd/raft message instead of being batched with other blocks. This parameter takes effect when a batch of entries is sent to a lagged node; during normal replication, each block is sent in its own message. It is not necessary to expose this config option to users.
- SnapInterval is renamed to SnapshotIntervalSize.
FAB-14593 #done
Change-Id: Icaf2848a41c5f0f0a02f4b0b4a80ba852fddd584
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
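The mapping described above can be sketched as follows. The `Options` and `raftInternals` types here are hypothetical stand-ins. etcd/raft's own `Config` does have `MaxInflightMsgs` and `MaxSizePerMsg` fields, but this sketch does not import it:

```go
package main

// Options is a hypothetical user-facing etcdraft options struct holding only
// the renamed, exposed settings.
type Options struct {
	MaxInflightBlocks    uint32
	SnapshotIntervalSize uint32
}

// raftInternals mirrors the two etcd/raft settings discussed above.
type raftInternals struct {
	MaxInflightMsgs int
	MaxSizePerMsg   uint64
}

// toRaftInternals derives the internal etcd/raft settings. MaxSizePerMsg is
// not user-configurable: it defaults to BatchSize.PreferredMaxBytes so that a
// large block travels alone in its own message.
func toRaftInternals(opts Options, preferredMaxBytes uint32) raftInternals {
	return raftInternals{
		MaxInflightMsgs: int(opts.MaxInflightBlocks),
		MaxSizePerMsg:   uint64(preferredMaxBytes),
	}
}
```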
-
- 15 Mar, 2019 1 commit
Jason Yellick authored
Presently, the block metadata encodes the TLS certificates of all of the Raft consenters in the system for each block. Because these TLS certs are non-trivial in size and there may be a large set of consenters, this wastes a significant amount of space on the filesystem. As a small optimization, this CR modifies the block metadata to store only the nodeIDs instead of the full set of consenter info. It then correlates the consenter slice found in the channel config with this slice of nodeIDs to build the mapping between the two (which was previously persisted).
Change-Id: Iaa66dacbcc48a041318c8a718099a873b9626240
Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
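A minimal sketch of correlating the two slices. It assumes, hypothetically, that the node IDs in block metadata and the consenters in the channel config are listed in the same order. The `Consenter` type is a simplified stand-in for the real proto:

```go
package main

import "fmt"

// Consenter is a hypothetical stand-in for a consenter entry in the channel
// config (the real proto also carries TLS certificates).
type Consenter struct {
	Host string
	Port uint32
}

// consentersByID pairs node IDs from block metadata with consenters from the
// channel config by position, assuming both slices share the same order.
func consentersByID(nodeIDs []uint64, consenters []Consenter) (map[uint64]Consenter, error) {
	if len(nodeIDs) != len(consenters) {
		return nil, fmt.Errorf("%d node IDs but %d consenters", len(nodeIDs), len(consenters))
	}
	m := make(map[uint64]Consenter, len(nodeIDs))
	for i, id := range nodeIDs {
		m[id] = consenters[i]
	}
	return m, nil
}
```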
-
- 14 Mar, 2019 1 commit
Jason Yellick authored
There are presently two etcdraft protos around metadata. One is the metadata stored in the config, named 'Metadata'; the other is the metadata stored in each block, named 'RaftMetadata'. This causes confusion when reading the code. This CR renames them as follows:
Metadata -> ConfigMetadata
RaftMetadata -> BlockMetadata
Change-Id: Ia0394ebe78f5541996c010c3c67d760f336f75d8
Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
-
- 11 Mar, 2019 1 commit
yacovm authored
This change set adds a basic validity check that prevents admins from adding the same consenter twice.
Change-Id: I0e5efbd78a77a060b060c20b447609e604749815
Signed-off-by: yacovm <yacovm@il.ibm.com>
(cherry picked from commit afc680484e503a7a4366177993da195906fddc73)
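A minimal sketch of such a duplicate check. It uses a hypothetical, simplified `consenter` struct whose fields are all comparable, so the struct itself can serve as a map key. Treating host, port and both certificates as the identity of a consenter is an assumption:

```go
package main

import "fmt"

// consenter is a hypothetical, simplified consenter entry; certificates are
// kept as strings so the whole struct is comparable.
type consenter struct {
	Host          string
	Port          uint32
	ClientTLSCert string
	ServerTLSCert string
}

// checkNoDuplicates rejects a consenter set in which the same consenter
// appears more than once.
func checkNoDuplicates(set []consenter) error {
	seen := make(map[consenter]struct{}, len(set))
	for _, c := range set {
		if _, dup := seen[c]; dup {
			return fmt.Errorf("duplicate consenter %s:%d", c.Host, c.Port)
		}
		seen[c] = struct{}{}
	}
	return nil
}
```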
-
- 09 Mar, 2019 2 commits
Jay Guo authored
If a single config tx adds one cert to and removes one cert from the consenter set, it is considered a cert rotation, and therefore should be allowed and supported.
Change-Id: Id2e78ae294cfb21501d1344e61ee1430088a0a68
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
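The rule above can be sketched as a set difference. As a simplifying assumption, this sketch identifies consenters by a certificate string. It allows at most one addition and at most one removal, which covers adding one node, removing one node, or rotating one cert:

```go
package main

import "fmt"

// consenterChange counts how many consenters (identified by cert, a
// simplifying assumption) are added and removed between two sets.
func consenterChange(oldSet, newSet []string) (added, removed int) {
	old := make(map[string]bool, len(oldSet))
	for _, c := range oldSet {
		old[c] = true
	}
	cur := make(map[string]bool, len(newSet))
	for _, c := range newSet {
		cur[c] = true
		if !old[c] {
			added++
		}
	}
	for _, c := range oldSet {
		if !cur[c] {
			removed++
		}
	}
	return added, removed
}

// validateChange allows adding one, removing one, or swapping exactly one
// cert for another (cert rotation); anything larger is rejected.
func validateChange(oldSet, newSet []string) error {
	added, removed := consenterChange(oldSet, newSet)
	if added <= 1 && removed <= 1 {
		return nil
	}
	return fmt.Errorf("update adds %d and removes %d consenters; at most one of each is allowed", added, removed)
}
```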
-
Jay Guo authored
If the tick interval and election timeout are fairly large (500ms and 10 ticks by default), it takes a newly created channel 5-10 seconds to elect a leader and start serving requests. If there is a split vote, the leaderless period can be prolonged further. This CR starts a proactive campaign instead of passively waiting for the timeout.
Change-Id: Ife0e5b8bd7e2b52c3fde8ba887ffe99a5e1ff0ab
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
- 08 Mar, 2019 1 commit
Adarsh Saraf authored
This CR adds the following metrics for etcdraft:
- data_persist_duration
- # normal_proposals_received
- # config_proposals_received
Change-Id: I34b9bdcb3ba7eb0bb074eb7abc93c2f5b5b5c5ad
Signed-off-by: Adarsh Saraf <adarshsaraf123@gmail.com>
(cherry picked from commit cca81e5581073a1fad036146b64d9321950c4f3d)
-
- 07 Mar, 2019 4 commits
Jay Guo authored
If a node crashes while writing to the WAL, the last WAL file is likely to be broken, which causes an UnexpectedEOF when the node restarts and tries to load the WAL. This CR fixes this by leveraging the etcd/wal.Repair util, which essentially truncates the last WAL file so that it can be properly decoded.
Change-Id: I0bd037578b74e8d51e30ba178b94a39102f3bae5
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
If a lagged node is not aware of newly added node(s), it simply cannot communicate with them. When it suspects its own eviction, it inspects the consenter certificates and finds itself still in the consenter set. If the leader happens to be among those new nodes, we have a deadlock. To break this, when a node suspects its own eviction, it also triggers a catch-up to actually pull blocks and commit them, so that it eventually recognizes the new nodes.
Change-Id: Id2e9a423221e4a3bebeeeaf7da5f120396f342dd
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
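The decision described above can be sketched as a two-way branch. The names `evictionAction`, `haltChain` and `catchUp` are hypothetical, and consenters are identified by plain strings for simplicity:

```go
package main

// evictionAction is what a node does after it starts suspecting eviction.
type evictionAction int

const (
	haltChain evictionAction = iota // no longer a consenter
	catchUp                         // still a consenter: pull and commit blocks
)

// onSuspectedEviction decides how to react once the node has inspected the
// latest consenter set it could obtain. If it is still listed, it may simply
// be unaware of newly added nodes (possibly including the leader), so it
// catches up by pulling blocks instead of waiting.
func onSuspectedEviction(self string, consenters []string) evictionAction {
	for _, c := range consenters {
		if c == self {
			return catchUp
		}
	}
	return haltChain
}
```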
-
Jay Guo authored
Two nodes can exchange messages only if `configurator.Configure` is called on both sides. Currently, the etcdraft UT network mock assumes that a node can accept messages from any node as long as it is connected. This does not reflect the actual behavior of the network, and because of this the UT failed to capture bugs such as FAB-14413. This CR makes the network mock more realistic by adding `links` to it. A link is open only if the nodes on both sides have called `Configure`. This CR also fixes another minor problem in UT, where `support` should be prepared *prior to* the start of a new chain.
Change-Id: I08b0cd84e9e0774dd130a70293d7c6b328ffb94c
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
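The link rule can be sketched as a mock network that delivers a message from a to b only when each side has configured the other. The `mockNetwork` type and its methods are hypothetical, not the actual etcdraft test code:

```go
package main

import "sync"

// mockNetwork models the requirement that Configure be called on both sides
// before two nodes can exchange messages.
type mockNetwork struct {
	mu         sync.Mutex
	configured map[uint64]map[uint64]bool // node -> peers it has configured
}

func newMockNetwork() *mockNetwork {
	return &mockNetwork{configured: make(map[uint64]map[uint64]bool)}
}

// configure records that node has called Configure with the given peers,
// replacing any previous peer set.
func (n *mockNetwork) configure(node uint64, peers ...uint64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	set := make(map[uint64]bool, len(peers))
	for _, p := range peers {
		set[p] = true
	}
	n.configured[node] = set
}

// linkOpen reports whether a and b can exchange messages.
func (n *mockNetwork) linkOpen(a, b uint64) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.configured[a][b] && n.configured[b][a]
}
```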
-
Jay Guo authored
If two config txs that modify the consenter set are sent back-to-back, the second one is revalidated because the config sequence number has advanced. In that case we should also perform the consenter set check to make sure it only updates the set by 1.
Change-Id: I9f7e54a3c0854d1f130d4a061466edd398e35bde
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
- 05 Mar, 2019 5 commits
Jay Guo authored
If revalidation fails, it should have no impact on the block-cutting timer (it should neither start nor stop it).
Change-Id: I9f8cb6bdffa6f95319fe14cdb66e4cfcac9851aa
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
If a channel creation config contains only a subset of the orderers in the system channel, etcdraft should simply create a new channel with the specified set of orderers instead of reconfiguring the system channel. This CR fixes config block commitment in etcdraft to properly handle channel creation configs.
Change-Id: I408e707ae580ce041e8fc98e76311226437720ae
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This value is enlarged because the leader is sometimes ticked repeatedly to trigger a desired heartbeat to a new node, which may cause the leader to step down if the value is too small. This does not prolong UT execution time, because we do not rely on ticks to trigger elections.
Change-Id: I4957808fa73275e1932345f1f7e9c7930d6f696d
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
When the chain is started from the latest snapshot, it should also load the ConfState from that snapshot, so that it is correctly persisted in subsequent snapshots.
Change-Id: I0d4ae3419e7ab7f4d4dfae6251b88ec4b051885b
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set adds logic in the etcdraft chain that detects that the node has been evicted from the channel, even if it was disconnected from the cluster while it was evicted. If a node fails to send a consensus message to other nodes, it starts suspecting that something is amiss. It then tries to pull the latest config block and checks whether it is still in the channel. If it is not, it:
1) Halts the chain
2) Pulls all blocks up to the block that evicts the node.
Change-Id: Ic3526d834fbef515119bb899fe62d30fdcf53267
Signed-off-by: yacovm <yacovm@il.ibm.com>
-
- 04 Mar, 2019 4 commits
Jay Guo authored
UT: lagged node can catch up using snapshot
Even if the snapshot interval is 1 byte, it is NOT guaranteed that a snapshot is taken for EVERY block, because snapshots are taken asynchronously on one goroutine (for the exact purpose of not taking snapshots excessively, i.e. for every block). As a result, this is non-deterministic, and our assertions should be more lenient.
Change-Id: Iab9305482d683b00c1331e85df1a6dc78868d06d
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This uses a specific revision instead of a release because we need commit 23731b to fix FAB-13920: long leader failover when `CheckQuorum` and `PreVote` are enabled, due to the lease check in raft. At the time of upgrading, that commit was not yet included in any etcd release.
Change-Id: I0467352d35180f45a9931d7afd66f82d6c10990d
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
`Propose` and `ProposeConfChange` block when the node is leaderless. However, they are currently invoked by the same goroutine that consumes data from applyC, which processes SoftState to reflect leader changes. This may lead to a deadlock:
- a block is created on the leader and is about to be `Propose`d
- the leader steps down due to loss of quorum
- `Propose` is called and blocks
- SoftState is passed on applyC
- `serveRequest` is not able to consume applyC
- deadlock
Change-Id: If4dc048f82983862b5f253231dafd513b442bf53
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Yoav Tock authored
This is the fourth of four (4/4) sub-tasks that focus on the "green" path of consensus-type migration from Kafka to Raft. By "green" we mean that there are no failures or aborts along the way. The 4 sub-tasks are staged in a way that minimizes dependencies between them. In this sub-task we introduce changes to the etcd/raft-based OSNs so that they can restart from a ledger that was started as Kafka, migrated, and restarted. This change concludes all the changes needed to implement the green path on the "Raft" side. See the respective JIRA item for further details.
Change-Id: I5b408e1cfcb8cf42c39bed4df6c5496792175ef0
Signed-off-by: Yoav Tock <tock@il.ibm.com>
-
- 03 Mar, 2019 15 commits
Adarsh Saraf authored
This CR adds the following raft-specific metrics:
- cluster_size
- is_leader
- committed_block_number
- snapshot_block_number
- # leader_changes
- # proposal_failures
Change-Id: Ie35f2984bc8eaecdd5826b1e1aa21e6c759fdd83
Signed-off-by: Adarsh Saraf <adarshsaraf123@gmail.com>
-
Jay Guo authored
After reconnecting the leader to the network, we tick it to resend previous data. However, if it is ticked excessively, it may step down to follower because `CheckQuorum` is enabled, which causes the test to fail. Instead of ticking it, this CR changes the test to simply enqueue one more tx to trigger a resend of previous data.
Change-Id: I211c7bf59dc6322509336ed8b120d869ea1f42f6
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Instead of taking a snapshot every N blocks, this CR changes it to take a snapshot every N bytes. It also sets the default SnapshotInterval to 100MB if it is unset; otherwise data in memory would never be compacted and would grow until OOM. Meanwhile, DefaultSnapshotCatchUpEntries is shrunk so that preserving extra entries every time a snapshot is taken does not take too much space. Slow nodes catch up using the BlockPuller, which is also efficient.
Change-Id: I79cfeb8652fcbafdeb5793bf4f06267b95a858d6
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
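A size-based trigger can be sketched as a byte counter that fires and resets once the interval is reached. The `sizeTrigger` type is hypothetical. Reading "100MB" as 100 MiB is also an assumption:

```go
package main

// defaultSnapshotIntervalSize is used when no interval is configured
// (interpreting 100MB as 100 MiB is an assumption).
const defaultSnapshotIntervalSize uint64 = 100 * 1024 * 1024

// sizeTrigger decides when to snapshot based on bytes accumulated since the
// last snapshot rather than on a block count.
type sizeTrigger struct {
	interval    uint64
	accumulated uint64
}

func newSizeTrigger(interval uint64) *sizeTrigger {
	if interval == 0 {
		interval = defaultSnapshotIntervalSize // never leave memory uncompacted
	}
	return &sizeTrigger{interval: interval}
}

// addBlock records a block of the given size and reports whether a snapshot
// should be taken now.
func (t *sizeTrigger) addBlock(size uint64) bool {
	t.accumulated += size
	if t.accumulated >= t.interval {
		t.accumulated = 0
		return true
	}
	return false
}
```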
-
Jay Guo authored
When a snapshot is taken, stale etcdraft WAL files should be purged to free disk space, as should old snapshot files. However, we still keep several snapshot files around: in case the latest file is corrupted, etcdraft automatically loads an older one, until there are none left.
Change-Id: I2b8168dbc0c3e5bd56a081c104dd7dc9defbcd92
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
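The retention policy can be sketched as "delete all but the newest `keep` snapshot files". This assumes file names sort chronologically, for example zero-padded hex term-index names like those in the test. The helper and its `keep` parameter are hypothetical:

```go
package main

import "sort"

// snapshotsToPurge returns the snapshot files to delete so that only the
// newest keep files remain, assuming names sort in chronological order.
func snapshotsToPurge(names []string, keep int) []string {
	if len(names) <= keep {
		return nil
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[:len(sorted)-keep]
}
```

Keeping several files is what allows falling back to an older snapshot when the newest one is corrupted.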
-
yacovm authored
This change set removes the Step RPC from the cluster protobuf, renames the Submit stream to a Step stream, and makes both transaction forwarding and consensus messages use the new Step stream. It also gives both egress Send() and Recv() a maximum timeout (the RPC timeout in the config). A Send or Recv used to send a consensus message, or to send a transaction (or receive its status), now aborts prematurely when that timeout expires, in order to protect against any liveness issue on the remote node and to return an answer to clients in a timely manner.
Change-Id: Id942b248212f5c324e12af34fce48f96fdbb6aea
Signed-off-by: yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
If something in AfterEach hangs until the test times out, the resulting core dump might cover up the actual assertion failure in the test body. This CR fixes this in etcdraft UT. It also adds LongEventualTimeout in some places in the etcdraft chain_test to prevent flakes due to slow WAL sync.
Change-Id: I585e59e5eb587f9e9eb082c5eb8f681141b16e55
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
A BlockPuller is created with the latest certificates, so it should be created on demand to guarantee its validity.
Change-Id: I327275da495a85126feb58c84b460bed98f7b860
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This CR hopefully makes occasional UT timeouts more debuggable. It also fixes a goroutine leak in UT.
Change-Id: Ia0ac63b2394061dd13a570d71eae6b4139fb73b0
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
When CheckQuorum is enabled, the leader steps down if it cannot reach a quorum of the network, so that clients have a chance to disconnect and try other nodes.
Change-Id: I901c0e3009f9d354a2b504fe16174432345055b3
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
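The CheckQuorum rule reduces to simple majority arithmetic. In this sketch with hypothetical helpers, a leader that has heard from fewer than a majority of nodes (counting itself) during an election timeout steps down:

```go
package main

// quorum returns the number of nodes, leader included, that form a majority
// of an n-node cluster.
func quorum(n int) int { return n/2 + 1 }

// shouldStepDown reports whether a leader that has recently heard from
// activeIncludingSelf nodes has lost quorum.
func shouldStepDown(activeIncludingSelf, clusterSize int) bool {
	return activeIncludingSelf < quorum(clusterSize)
}
```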
-
Jay Guo authored
In etcdraft UT, we often need to deterministically elect a leader. This was done by ticking ONLY one node in the network, so that it is the only node that starts a campaign. HOWEVER, there are several problems with this approach:
1. It is slow. We need a real time interval between ticks due to the way the fake clock is implemented: it drops ticks on the floor in case of a slow consumer.
2. There is a random factor in the election timeout of etcd/raft. It is calculated as follows:
```
randomElectionTimeout = electionTimeout + rand.Intn(electionTimeout)
```
In other words, sending electionTimeout ticks is not guaranteed to trigger a leader election.
3. If CheckQuorum is enabled, a lease is imposed on follower nodes, which gets expired if electionTimeout <= elapsedTicks < randomElectionTimeout (if elapsedTicks is greater than randomElectionTimeout, it is reset to 0 and the node starts a campaign).
In this CR, we send an artificial MsgTimeoutNow to the node to be elected. This message reliably triggers a campaign and skips the lease check. This CR also fixes several potential data races and flakes in tests.
Change-Id: I3c8e0bcadbb8cfa1ae3393de2ea711fdd0d8b7aa
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
If MaxInflightMsgs blocks have been proposed but not yet committed, the chain blocks further incoming requests.
Change-Id: I58c84e23c882ccc152e5c9a248434e466a8b5266
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
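The back-pressure rule can be sketched with a buffered channel used as a counting semaphore. A slot is acquired before proposing a block and released when it is committed. The `inflightLimiter` type is hypothetical:

```go
package main

// inflightLimiter bounds the number of blocks proposed but not yet committed.
type inflightLimiter struct{ slots chan struct{} }

func newInflightLimiter(max int) *inflightLimiter {
	return &inflightLimiter{slots: make(chan struct{}, max)}
}

// acquire blocks while max blocks are in flight; call it before proposing.
func (l *inflightLimiter) acquire() { l.slots <- struct{}{} }

// tryAcquire is a non-blocking variant that reports whether a slot was free.
func (l *inflightLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot; call it when a proposed block is committed.
func (l *inflightLimiter) release() { <-l.slots }
```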
-
Jay Guo authored
This CR changes Errored to return a channel that is closed when the node becomes a candidate.
Change-Id: Ibd0ece763b9d93c4da93825d1b302ecc55a9b32e
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
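A minimal sketch of such an `Errored()` channel. It is closed when the node becomes a candidate and replaced with a fresh, open channel once a leader is known again. The `errorState` type and its `becameCandidate`/`leaderElected` methods are hypothetical; reopening on leader election is an assumption about how the channel is reused:

```go
package main

import "sync"

// errorState manages the channel returned by Errored().
type errorState struct {
	mu     sync.Mutex
	errorC chan struct{}
}

func newErrorState() *errorState { return &errorState{errorC: make(chan struct{})} }

// Errored returns the current channel; it is closed while there is no leader.
func (s *errorState) Errored() <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.errorC
}

// becameCandidate closes the channel (idempotently).
func (s *errorState) becameCandidate() {
	s.mu.Lock()
	defer s.mu.Unlock()
	select {
	case <-s.errorC: // already closed
	default:
		close(s.errorC)
	}
}

// leaderElected swaps in a fresh open channel if the current one is closed.
func (s *errorState) leaderElected() {
	s.mu.Lock()
	defer s.mu.Unlock()
	select {
	case <-s.errorC:
		s.errorC = make(chan struct{})
	default:
	}
}
```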
-
Jay Guo authored
Store the raft SoftState in the raft chain so that it returns an error while an election is ongoing. This prevents a disconnected follower from returning success on the Broadcast API.
Change-Id: Ib6619b230938f0d6c10240b8cd8e34e346056145
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This CR changes the type of the etcdraft observe channel from uint64 to raft.SoftState, so that chain_test can assert not only the leader ID but also the state of the node.
Change-Id: Ia0c5f8c9060c234ceb84133e0c5598ed064dd1ee
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Add a lock to guard manipulation of `StepStub`.
Change-Id: Icaadb1f5aea0cb7f266f24ed6756c4f6541768bd
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-
- 27 Feb, 2019 4 commits
Artem Barger authored
Currently a single ledger instance is shared among all chain mock instances in unit tests. This commit introduces a ledger instance per chain.
Change-Id: I333fa2819490c995931a7e0d241eb6428e67c87e
Signed-off-by: Artem Barger <bartem@il.ibm.com>
-
Artem Barger authored
Change-Id: Ib77c866a30ed5108ad53908b0ca25a60a89e9a7c
Signed-off-by: Artem Barger <bartem@il.ibm.com>
-
Jay Guo authored
This CR rewrites BlockCreator so that it never returns a nil block.
BEFORE: blockcreator holds a channel of created blocks, buffered with size createdBlocksBuffersize (default 20). It also stores the hash and number of the latest block. When requested to create a new block, blockcreator assembles a block based on that hash and number and enqueues it to the buffered channel. If the channel is full, nil is returned. When a block is committed, blockcreator drains the channel. If there is nothing in the channel, it implies that the blockcreator is being used by a raft follower, so blockcreator simply updates its hash and number.
NOW: what we need is actually as simple as this: a blockcreator holds the hash and number of the latest block. When it is requested to create a block, it just uses that hash and number to assemble one. ONLY the raft leader holds a blockcreator. Followers blindly commit whatever comes from consensus. When a follower is elected as the new leader, it simply looks up the ledger, finds the hash and number of the latest block, and creates a new blockcreator.
Change-Id: I226ee34d666fbb1e8d034dc22ea6800df993f7a4
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
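The simplified design can be sketched as follows. The `Block` type and `headerHash` are hypothetical stand-ins: Fabric's real header hash is computed differently, and SHA-256 over number, previous hash and data is used here only for illustration:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// Block is a hypothetical, simplified block.
type Block struct {
	Number   uint64
	PrevHash []byte
	Data     []byte
}

// headerHash is a stand-in hash over the block's number, previous hash and data.
func headerHash(b Block) []byte {
	h := sha256.New()
	var num [8]byte
	binary.BigEndian.PutUint64(num[:], b.Number)
	h.Write(num[:])
	h.Write(b.PrevHash)
	h.Write(b.Data)
	return h.Sum(nil)
}

// blockCreator holds only the hash and number of the latest block; a newly
// elected leader builds one from the ledger tip.
type blockCreator struct {
	hash   []byte
	number uint64
}

func newBlockCreator(latest Block) *blockCreator {
	return &blockCreator{hash: headerHash(latest), number: latest.Number}
}

// createNextBlock chains a new block onto the latest one; it never returns nil.
func (bc *blockCreator) createNextBlock(data []byte) Block {
	b := Block{Number: bc.number + 1, PrevHash: bc.hash, Data: data}
	bc.hash = headerHash(b)
	bc.number = b.Number
	return b
}
```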
-
Jay Guo authored
Change-Id: I9fe663c4efa46e5644571d238f4d7ea8f4e51626
Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
-