1. 03 Mar, 2019 9 commits
    • [FAB-13455] Initialize BlockPuller on demand. · e4060ed3
      Jay Guo authored
      The BlockPuller is created with the latest certificates, so it
      should be created on demand to guarantee its validity.
      Change-Id: I327275da495a85126feb58c84b460bed98f7b860
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13199] Start etcdraft chain sequentially in UT · 8c445197
      Jay Guo authored
      This CR should make the occasional UT timeouts easier to debug.
      It also fixes a goroutine leak in the UT.
      Change-Id: Ia0ac63b2394061dd13a570d71eae6b4139fb73b0
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12709] Enable CheckQuorum · 50a09fd0
      Jay Guo authored
      When CheckQuorum is enabled, the leader steps down if it cannot reach
      a quorum of the network, so that clients have a chance to disconnect
      and try other nodes.
      Change-Id: I901c0e3009f9d354a2b504fe16174432345055b3
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
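      The following is a minimal sketch of the step-down rule above. It assumes a
      leader that records which peers it heard from during each
      election-timeout interval. The `leader` type and its methods are
      hypothetical. In etcd/raft the behavior is enabled through the
      `CheckQuorum` field of `raft.Config` and is implemented internally.

```go
package main

import "fmt"

// leader is a hypothetical, simplified raft leader that tracks which peers
// it has heard from since the last election-timeout interval.
type leader struct {
	self   uint64
	peers  []uint64        // all voting members, including self
	active map[uint64]bool // peers heard from in the current interval
	isLead bool
}

// recordActive marks a peer as reachable (e.g. on receiving a response from it).
func (l *leader) recordActive(id uint64) { l.active[id] = true }

// checkQuorum runs once per election timeout. If fewer than a majority of
// members were reachable, the leader steps down so that clients connected
// to it can fail over. The active set is reset for the next interval.
func (l *leader) checkQuorum() bool {
	reachable := 0
	for _, id := range l.peers {
		if id == l.self || l.active[id] {
			reachable++
		}
	}
	l.active = map[uint64]bool{}
	if reachable < len(l.peers)/2+1 {
		l.isLead = false
	}
	return l.isLead
}

func main() {
	l := &leader{self: 1, peers: []uint64{1, 2, 3}, active: map[uint64]bool{}, isLead: true}
	l.recordActive(2)
	fmt.Println(l.checkQuorum()) // true: self + node 2 form a quorum of 3
	fmt.Println(l.checkQuorum()) // false: no peer heard from, step down
}
```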
    • [FAB-12709] Use another way to elect leader in UT · 5c3e2fce
      Jay Guo authored
      In etcdraft UT, we often need to deterministically elect a leader.
      This was done by ticking ONLY one node in the network, so that it is
      the only node that starts campaigning.
      HOWEVER, there are several problems with this approach:
      1. It's slow. We need a real time interval between ticks because of
         the way the fake clock is implemented: it drops ticks on the floor
         when the consumer is slow.
      2. There is a random factor in the election timeout of etcd/raft. It
         is calculated as follows:
             randomElectionTimeout = electionTimeout + rand.Intn(electionTimeout)
         In other words, sending electionTimeout ticks is not guaranteed
         to trigger a leader election.
      3. If CheckQuorum is enabled, a lease is imposed on follower nodes,
         and it is expired while
             electionTimeout <= elapsedTicks < randomElectionTimeout
         (once elapsedTicks reaches randomElectionTimeout, it is reset to 0
         and the node starts campaigning).
      In this CR, we send an artificial MsgTimeoutNow to the node to be
      elected. This message reliably triggers a campaign and skips the lease.
      This CR also fixes several potential data races and flakes in tests.
      Change-Id: I3c8e0bcadbb8cfa1ae3393de2ea711fdd0d8b7aa
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13178] Use MaxInflightMsgs to throttle requests · 9b78a9d8
      Jay Guo authored
      If MaxInflightMsgs blocks have been proposed but not yet
      committed, the chain blocks further incoming requests.
      Change-Id: I58c84e23c882ccc152e5c9a248434e466a8b5266
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
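      Here is a minimal sketch of the throttling rule, assuming a counting
      semaphore built from a buffered channel. `inflightLimiter`,
      `propose` and `commit` are hypothetical names, and the real chain may
      implement the wait differently.

```go
package main

import "fmt"

// inflightLimiter admits at most maxInflight blocks that have been proposed
// but not yet committed. Each occupied slot in the buffered channel is one
// in-flight block.
type inflightLimiter struct {
	slots chan struct{}
}

func newInflightLimiter(maxInflight int) *inflightLimiter {
	return &inflightLimiter{slots: make(chan struct{}, maxInflight)}
}

// propose blocks the caller while maxInflight blocks are already in flight.
func (l *inflightLimiter) propose() { l.slots <- struct{}{} }

// tryPropose is a non-blocking variant that reports whether a slot was free.
func (l *inflightLimiter) tryPropose() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// commit frees one slot when a proposed block is committed. It must only be
// called for a previously admitted block, otherwise it blocks.
func (l *inflightLimiter) commit() { <-l.slots }

func main() {
	l := newInflightLimiter(2)
	l.propose()
	l.propose()
	fmt.Println(l.tryPropose()) // false: two blocks are already in flight

	done := make(chan struct{})
	go func() {
		l.propose() // waits until a commit frees a slot
		close(done)
	}()
	l.commit()
	<-done
	fmt.Println(len(l.slots)) // 2
}
```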
    • [FAB-13438] Errored should reflect correct state · 0276480c
      Jay Guo authored
      This CR changes Errored to return a channel that is
      closed when the node becomes a candidate.
      Change-Id: Ibd0ece763b9d93c4da93825d1b302ecc55a9b32e
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
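      A minimal sketch of this channel pattern follows, assuming a hypothetical
      `chain` type guarded by a mutex. `Errored` hands out the current
      channel. The channel is closed when the node becomes a candidate, and
      a fresh open one is installed once a leader is known again. The method
      names `becomeCandidate` and `leaderKnown` are illustrative, not
      Fabric's.

```go
package main

import (
	"fmt"
	"sync"
)

type chain struct {
	mu     sync.Mutex
	errorC chan struct{} // closed while the node is a candidate
}

func newChain() *chain { return &chain{errorC: make(chan struct{})} }

// Errored returns a channel that is closed while the chain is in error.
func (c *chain) Errored() <-chan struct{} {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.errorC
}

// becomeCandidate closes the current channel, waking up all watchers.
func (c *chain) becomeCandidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	select {
	case <-c.errorC: // already closed; closing twice would panic
	default:
		close(c.errorC)
	}
}

// leaderKnown installs a fresh, open channel if the previous one was closed.
func (c *chain) leaderKnown() {
	c.mu.Lock()
	defer c.mu.Unlock()
	select {
	case <-c.errorC:
		c.errorC = make(chan struct{})
	default:
	}
}

func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	c := newChain()
	before := c.Errored()
	c.becomeCandidate()
	fmt.Println(isClosed(before)) // true
	c.leaderKnown()
	fmt.Println(isClosed(c.Errored())) // false
}
```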
    • [FAB-13438] Store raft SoftState · 21a49bad
      Jay Guo authored
      Store the raft SoftState in the raft chain so that it returns an
      error while an election is ongoing. This prevents a disconnected
      follower from returning success on the Broadcast API.
      Change-Id: Ib6619b230938f0d6c10240b8cd8e34e346056145
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
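      The sketch below assumes a local `softState` struct that mirrors the
      two fields of etcd/raft's `SoftState` (`Lead` and `RaftState`). The
      `chain` type, `setSoftState` and `checkBroadcast` are hypothetical,
      and Fabric's actual error message may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// stateType and softState mirror the shape of etcd/raft's StateType and SoftState.
type stateType int

const (
	stateFollower stateType = iota
	stateCandidate
	stateLeader
)

type softState struct {
	Lead      uint64 // 0 means no known leader
	RaftState stateType
}

type chain struct {
	mu sync.RWMutex
	ss softState
}

// setSoftState stores the latest SoftState reported by raft.
func (c *chain) setSoftState(ss softState) {
	c.mu.Lock()
	c.ss = ss
	c.mu.Unlock()
}

// checkBroadcast rejects envelopes while an election is in progress, so a
// disconnected follower cannot report success to a Broadcast client.
func (c *chain) checkBroadcast() error {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.ss.RaftState == stateCandidate || c.ss.Lead == 0 {
		return errors.New("no raft leader: election in progress")
	}
	return nil
}

func main() {
	c := &chain{}
	c.setSoftState(softState{Lead: 2, RaftState: stateFollower})
	fmt.Println(c.checkBroadcast()) // <nil>
	c.setSoftState(softState{Lead: 0, RaftState: stateCandidate})
	fmt.Println(c.checkBroadcast() != nil) // true
}
```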
    • [FAB-13438] pass SoftState on observe channel · 657b8095
      Jay Guo authored
      This CR changes the type of the etcdraft observe channel from uint64
      to raft.SoftState, so that chain_test can assert not only the leader
      ID but also the state of the node.
      Change-Id: Ia0c5f8c9060c234ceb84133e0c5598ed064dd1ee
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13613] Fix race in etcdraft chain UT · 5dadb3a5
      Jay Guo authored
      Add a lock to guard manipulation of `StepStub`.
      Change-Id: Icaadb1f5aea0cb7f266f24ed6756c4f6541768bd
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
  2. 27 Feb, 2019 15 commits
    • FAB-12986: ledger per chain for raft chain_test.go · c7db89e0
      Artem Barger authored
      Currently, a single ledger instance is shared among the chain mock
      instances in unit tests. This commit introduces a ledger instance per chain.
      Change-Id: I333fa2819490c995931a7e0d241eb6428e67c87e
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-12945] add raft reconfiguration unit-tests · 07b7309c
      Artem Barger authored
      Change-Id: Ib77c866a30ed5108ad53908b0ca25a60a89e9a7c
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13178] A dumb version of etcdraft BlockCreator · ff843afd
      Jay Guo authored
      This CR rewrites BlockCreator so that it never returns a nil block.
      Previously, the blockcreator held a channel of created blocks,
      buffered with size createdBlocksBuffersize (default 20). It also
      stored the hash and number of the latest block.
      When requested to create a new block, the blockcreator assembled a
      block based on that hash and number and enqueued it to the buffered
      channel. If the channel was full, nil was returned.
      When a block was committed, the blockcreator drained the channel. If
      there was nothing in the channel, this implied the blockcreator was
      being driven by a raft follower, so it simply updated the hash and
      number.
      What we actually need is much simpler: a blockcreator holds the
      hash and number of the latest block. When it is requested to create
      a block, it just uses that hash and number to assemble one.
      And ONLY the raft leader holds a blockcreator. Followers blindly
      commit whatever comes from consensus. When a follower is elected
      as the new leader, it simply looks up the ledger, finds the hash and
      number of the latest block, and creates a new blockcreator.
      Change-Id: I226ee34d666fbb1e8d034dc22ea6800df993f7a4
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13456] Fix race in etcdraft test · e8514271
      Jay Guo authored
      Change-Id: I9fe663c4efa46e5644571d238f4d7ea8f4e51626
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13456] Use empty peer list to join raft cluster · 2755580a
      Jay Guo authored
      When joining a fresh node to an existing etcdraft cluster, it
      should call `StartNode` with an empty peer list.
      Change-Id: Ib6acf6fd9b2956680c99d5d7370ce439228d3bfa
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13178] do not accept new env when conf in flight · 06ce7476
      Jay Guo authored
      When a new leader performs the failover check and finds a leftover
      ConfChange, it should re-propose it AND not accept new envelopes
      until the ConfChange is applied.
      This CR also moves the type B config failover check from the
      `serveRaft` goroutine to the `serveRequest` goroutine. `serveRaft`
      now only takes care of raft-related logic, which makes the code
      easier to reason about.
      Change-Id: I4e32cb155abff52e7d1c0fbe4c6f2aa5e5ef1605
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13349] Add more assertion to etcdraft UT. · 81c2e195
      Jay Guo authored
      Change-Id: I85cb5b5cc633f27fa4f317bd2b0f6d947fd97a6d
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13350] Fix etcdraft flaky test · 2077c063
      Jay Guo authored
      Change-Id: Icd364101a350ab6fa2959f196f346c717a16337f
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13360] Fix an etcdraft flaky UT · 45d6f1b9
      Jay Guo authored
      This CR removes a redundant assertion in the etcdraft UT
      to avoid flakiness.
      The assertion being removed checks that the expected number
      of MsgApp messages are dropped while the node is disconnected
      from the network. However, when a raft candidate is elected
      leader, it broadcasts a MsgApp containing empty data to
      followers, and this is sometimes counted among the dropped
      MsgApps, which breaks the expectation.
      We could be more precise by inspecting MsgApp on the wire
      and ignoring empty messages, so that we could still perform
      this assertion. But it is a redundant check anyway, and
      therefore can be safely removed.
      This CR also further reduces the total test time.
      Change-Id: I74af9fecfebe20e44c6736a644352f8b67b624e3
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
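      The more precise alternative mentioned above, which this CR did not
      adopt, would count only dropped MsgApp messages that actually carry
      data. A sketch follows, assuming hypothetical local `message` and
      `entry` types rather than etcd/raft's `raftpb` types. It assumes the
      new leader's MsgApp carries an entry with empty data.

```go
package main

import "fmt"

type msgType int

const (
	msgApp msgType = iota
	msgHeartbeat
)

type entry struct{ Data []byte }

type message struct {
	Type    msgType
	Entries []entry
}

// carriesData reports whether any entry in the message has a payload, so a
// MsgApp holding only the new leader's empty entry is ignored.
func carriesData(m message) bool {
	for _, e := range m.Entries {
		if len(e.Data) > 0 {
			return true
		}
	}
	return false
}

// countDroppedMsgApp counts dropped MsgApp messages that carry real data.
func countDroppedMsgApp(dropped []message) int {
	n := 0
	for _, m := range dropped {
		if m.Type == msgApp && carriesData(m) {
			n++
		}
	}
	return n
}

func main() {
	dropped := []message{
		{Type: msgApp, Entries: []entry{{Data: nil}}},           // new leader's empty entry
		{Type: msgApp, Entries: []entry{{Data: []byte("blk")}}}, // real block
		{Type: msgHeartbeat},
	}
	fmt.Println(countDroppedMsgApp(dropped)) // 1
}
```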
    • [FAB-13367] Fix flaky etcdraft UT · bebfee73
      Jay Guo authored
      In raft, to commit a message, the leader first broadcasts MsgApp
      to replicate the data, and once it receives MsgAppResp from a
      quorum, it
      - applies the message to the state machine
      - broadcasts another MsgApp to instruct followers to commit
      and these two steps are done in parallel.
      To emulate the situation where a node is disconnected *after* the
      config block is committed and *before* the node add/remove in
      that config block is proposed to raft, the `WriteConfigBlock`
      stub on the leader was overloaded to disconnect the node from the
      network, so that the ConfChange is proposed but dropped.
      However, this becomes racy when we want to assert that the config
      block is committed on followers as well, since we might
      disconnect the leader too early and cause the second round of
      MsgApp to be dropped, so that the config block is not committed
      on followers. This is fine in a real deployment because the new
      leader will continue the effort to commit this block.
      To circumvent this flakiness, this CR changes the UT to overload
      `StepStub` instead of `WriteConfigBlockStub`, so that we can be
      more precise about when to disconnect a node from the network.
      Change-Id: Ic1b7d28c043e779c7cc258c2e08bfaa3578bc429
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
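      The "MsgAppResp from a quorum" step works as follows: the leader
      derives the highest index stored on a majority from every member's
      match index. This is a simplified sketch of that idea with a
      hypothetical `quorumIndex` helper. etcd/raft implements this
      internally with additional rules, such as only committing entries
      from the leader's current term.

```go
package main

import (
	"fmt"
	"sort"
)

// quorumIndex returns the highest log index stored on a majority of the
// cluster, given each member's match index (including the leader's own).
// match must be non-empty.
func quorumIndex(match []uint64) uint64 {
	s := append([]uint64(nil), match...) // copy so the caller's slice is untouched
	sort.Slice(s, func(i, j int) bool { return s[i] > s[j] })
	// Sorted in descending order, the value at position n/2 is held by at
	// least n/2+1 members, i.e. a majority.
	return s[len(s)/2]
}

func main() {
	// Leader at 7, followers at 7 and 3: index 7 is on 2 of 3 members.
	fmt.Println(quorumIndex([]uint64{7, 7, 3})) // 7
	// Four members need 3 for a majority, so only index 4 qualifies.
	fmt.Println(quorumIndex([]uint64{9, 4, 4, 1})) // 4
}
```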
    • [FAB-13053] Add an UT to assert retransmission. · 20579e8b
      Jay Guo authored
      This CR adds a UT to assert retransmission
      of etcd/raft MsgApp.
      Change-Id: Ic06003ae7da9dc1dcc991103e6748b55a47f04dc
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-11996] Fix failed UT · d12f1e49
      Jay Guo authored
      The UT fixed in this CR submits a malformed config envelope,
      which crashes the test. It used to pass only because it never
      waited for the block to be committed and shut down the test early.
      Change-Id: I40311ccbf03d8ffb73d4467e5695a28b4834d61e
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12949] Fix etcdraft reconfiguration UT · c5cb8b22
      Jay Guo authored
      Change-Id: I64c232fe6433b9f6ca0dadbabfcc771f3f2c623f
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13199] Reduce etcdraft test time. · dd399991
      Jay Guo authored
      Change-Id: Idc65a77259ae7a153aa4d077f9e72d1e2961ab46
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12949] finish reconfiguration after restart · eb5aef1d
      Artem Barger authored
      This commit handles the case where a config transaction of
      type B is submitted and the Raft cluster quorum fails or shuts down.
      During startup, the leader compares the last known configuration with
      the Raft config state and, if there is a difference, finalizes the
      reconfiguration by proposing the delta.
      Change-Id: I3cdf03533602489cb56c503f1d6651f27a5fc6a1
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
  3. 05 Dec, 2018 1 commit
  4. 30 Nov, 2018 2 commits
    • [FAB-12354] Optimistic chain creation in etcd/raft · c0f21330
      Adarsh Saraf authored
      This CR enables optimistic creation of a chain of blocks in etcd/raft to
      pipeline block creation and consensus on the created blocks. We cannot
      do the pipelining for config blocks since all messages need to be
      revalidated upon a config change.
      Change-Id: Iabf1d4c75584afe8f641a18153d5e1b4b94f6bcc
      Signed-off-by: Adarsh Saraf <adarshsaraf123@gmail.com>
    • [FAB-12576] failover while handling tx type B · f98f7c4e
      Artem Barger authored
      Raft cluster reconfiguration consists of two parts: first, the leader
      has to reach consensus on the configuration block; next, the leader has
      to extract the new cluster configuration and propose raft configuration
      changes. However, the leader might fail between the first and second
      parts, so a newly elected leader should be able to detect that
      there is an unfinished reconfiguration and finish it.
      This commit adds logic to manage leadership failover, where the new
      leader checks whether the last committed block is a configuration block
      and whether there are pending configuration changes, and completes the
      reconfiguration by proposing raft cluster configuration changes.
      Change-Id: I05dc1f60c9ab692521887b50f726d96ea47878dc
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
  5. 19 Nov, 2018 5 commits
  6. 08 Nov, 2018 2 commits
  7. 29 Oct, 2018 2 commits
  8. 18 Oct, 2018 1 commit
  9. 17 Oct, 2018 1 commit
  10. 16 Oct, 2018 1 commit
  11. 14 Oct, 2018 1 commit