1. 20 Mar, 2019 1 commit
    • FAB-14735 Ignore unchanged consenters in update · 59c3387d
      Jason Yellick authored
      
      
      The config update detection code in Raft is a re-implementation of the
      fabric config checking mechanisms.  This should probably be changed in
      the long term, but one of its currently unhandled cases is when a
      config update references the consenters in the write set, but does not
      modify them.  This can occur especially when adding a new organization
      to the orderer group.  This CR adds the additional version checks
      necessary to detect and ignore these sorts of updates.
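
A minimal sketch of the kind of version check described above. It assumes an element counts as modified only when its version in the write set differs from its version in the read set. The `versionedValue` type, the `consentersModified` helper, and the `"ConsensusType"` key are illustrative names, not the code from this CR.

```go
package main

import "fmt"

// versionedValue is a hypothetical stand-in for a config value with a version.
type versionedValue struct {
	Version uint64
}

// consentersModified reports whether the update really changes the value under
// key. Assumption: a value that appears in the write set with the same version
// as in the read set is merely referenced, not modified.
func consentersModified(readSet, writeSet map[string]versionedValue, key string) bool {
	w, inWrite := writeSet[key]
	if !inWrite {
		return false // not referenced at all
	}
	r, inRead := readSet[key]
	if !inRead {
		return true // newly introduced value
	}
	return w.Version != r.Version
}

func main() {
	read := map[string]versionedValue{"ConsensusType": {Version: 3}}
	referenced := map[string]versionedValue{"ConsensusType": {Version: 3}}
	changed := map[string]versionedValue{"ConsensusType": {Version: 4}}

	fmt.Println(consentersModified(read, referenced, "ConsensusType")) // false
	fmt.Println(consentersModified(read, changed, "ConsensusType"))    // true
}
```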
      
      Change-Id: Ib35a97e1cdbd557705f11da183c8547f9f85539a
      Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
  2. 16 Mar, 2019 1 commit
    • FAB-14593 Refine etcdraft parameters · 2a4e15e9
      Jay Guo authored
      
      
      - MaxInflightMsgs is internal to etcd/raft and should be exposed
      to users with a more appropriate name: MaxInflightBlocks
      
      - MaxSizePerMsg is also internal to etcd/raft, and it defaults
      to PreferredMaxBytes in BatchSize, so that if a big block is created,
      it is sent in its own etcd/raft message instead of being batched
      with other blocks. This parameter takes effect when a batch of entries
      is sent to a lagged node. During normal replication, each block is
      sent in its own message.
      It's not necessary to expose this config option to users.
      
      - SnapInterval is renamed to SnapshotIntervalSize
      
      FAB-14593 #done
      
      Change-Id: Icaf2848a41c5f0f0a02f4b0b4a80ba852fddd584
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
  3. 15 Mar, 2019 1 commit
    • FAB-14618 Store only nodeIDs in metadata · 9a124d65
      Jason Yellick authored
      
      
      Presently, the block metadata encodes the TLS certificates of all of the
      Raft consenters in the system for each block.  Because these TLS certs
      are non-trivial in size, and there may be a large set of consenters,
      this actually creates a significant amount of waste on the filesystem.
      
      As a small optimization, this CR modifies the block metadata to only
      store the nodeIDs instead of the full set of consenter info.  It then
      correlates the consenter slice found in the channel config data with
      this slice of nodeIDs to build a mapping between the two (which was
      previously persisted).
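
The correlation step can be sketched as follows. This assumes the two slices are paired by position, which the message does not state explicitly. The `consenter` type and `mapConsenters` helper are hypothetical and much smaller than the real protobuf types.

```go
package main

import (
	"errors"
	"fmt"
)

// consenter is a hypothetical, trimmed-down consenter record.
type consenter struct {
	Host string
	Port uint32
}

// mapConsenters pairs ids[i] with consenters[i] (assumed positional pairing)
// and rebuilds the nodeID -> consenter mapping that is no longer persisted.
func mapConsenters(ids []uint64, consenters []consenter) (map[uint64]consenter, error) {
	if len(ids) != len(consenters) {
		return nil, errors.New("node ID count does not match consenter count")
	}
	m := make(map[uint64]consenter, len(ids))
	for i, id := range ids {
		if _, dup := m[id]; dup {
			return nil, fmt.Errorf("duplicate node ID %d", id)
		}
		m[id] = consenters[i]
	}
	return m, nil
}

func main() {
	ids := []uint64{1, 2, 3}
	cs := []consenter{
		{Host: "orderer0", Port: 7050},
		{Host: "orderer1", Port: 7050},
		{Host: "orderer2", Port: 7050},
	}
	m, err := mapConsenters(ids, cs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m[2].Host) // orderer1
}
```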
      
      Change-Id: Iaa66dacbcc48a041318c8a718099a873b9626240
      Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
  4. 14 Mar, 2019 1 commit
    • FAB-14619 Rename Raft metadata protos · d645c833
      Jason Yellick authored
      
      
      There are presently two etcdraft protos around metadata.  One is the
      metadata stored in the config and it is named 'Metadata', the other is
      the metadata stored in each block, this is named 'RaftMetadata'.  This
      causes confusion when reading the code.  This CR transforms those names
      to be:
      
       Metadata -> ConfigMetadata
       RaftMetadata -> BlockMetadata
      
      Change-Id: Ia0394ebe78f5541996c010c3c67d760f336f75d8
      Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
  5. 11 Mar, 2019 1 commit
  6. 09 Mar, 2019 2 commits
    • FAB-14539 Support cert rotation in single config tx · 095dfc1b
      Jay Guo authored
      
      
      If a single config tx adds and removes 1 cert in consenter set,
      it is considered to be cert rotation, therefore should be allowed
      and supported.
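
One way to picture the rule is to diff the old and new certificate sets and classify the result. The sketch below is an illustration under that assumption; `classify` and its string labels are made up for this example and do not mirror the actual etcdraft code.

```go
package main

import "fmt"

// classify compares the old and new consenter certificate sets and labels the
// change. Anything beyond one addition and/or one removal is rejected.
func classify(oldCerts, newCerts []string) string {
	oldSet := make(map[string]bool, len(oldCerts))
	for _, c := range oldCerts {
		oldSet[c] = true
	}
	newSet := make(map[string]bool, len(newCerts))
	for _, c := range newCerts {
		newSet[c] = true
	}

	added, removed := 0, 0
	for c := range newSet {
		if !oldSet[c] {
			added++
		}
	}
	for c := range oldSet {
		if !newSet[c] {
			removed++
		}
	}

	switch {
	case added == 0 && removed == 0:
		return "unchanged"
	case added == 1 && removed == 0:
		return "add"
	case added == 0 && removed == 1:
		return "remove"
	case added == 1 && removed == 1:
		return "rotate"
	default:
		return "rejected"
	}
}

func main() {
	fmt.Println(classify([]string{"A", "B", "C"}, []string{"A", "B", "D"})) // rotate
	fmt.Println(classify([]string{"A", "B", "C"}, []string{"A", "D", "E"})) // rejected
}
```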
      
      Change-Id: Id2e78ae294cfb21501d1344e61ee1430088a0a68
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14025] Proactive campaign · bd4a7524
      Jay Guo authored
      
      
      If tick interval and election timeout are fairly large (500ms and
      10 by default), it would take a newly created channel 5-10 seconds
      to elect a leader and start serving requests. If there's a split vote,
      leaderless period can be prolonged further.
      
      This CR starts a proactive campaign instead of passively waiting for the timeout.
      
      Change-Id: Ife0e5b8bd7e2b52c3fde8ba887ffe99a5e1ff0ab
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
  7. 08 Mar, 2019 1 commit
  8. 07 Mar, 2019 4 commits
    • FAB-14454 Fix broken wal file · 1ff8d184
      Jay Guo authored
      
      
      If a node crashes while writing to wal, it's likely that the last
      wal file is broken, and causes UnexpectedEOF when node is restarted
      and tries to load wal.
      
      This CR fixes this by leveraging etcd/wal.Repair util, which essentially
      truncates last wal file so it can be properly decoded.
      
      Change-Id: I0bd037578b74e8d51e30ba178b94a39102f3bae5
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14467] Catch up after eviction suspicion. · 487b9a90
      Jay Guo authored
      
      
      If a lagged node is not aware of newly added node(s), it simply
      cannot communicate with them. When it suspects its own eviction
      and inspects the consenter certificates, it finds itself still in
      the consenter set. And if the leader happens to be among those new
      nodes, we have a deadlock.
      
      To break this, when a node suspects its own eviction, it also
      triggers a catchup to actually pull blocks and commit them, so
      that it can eventually recognize the new nodes.
      
      Change-Id: Id2e9a423221e4a3bebeeeaf7da5f120396f342dd
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14466] More realistic etcdraft UT network mock · d50bd8d7
      Jay Guo authored
      
      
      Two nodes can exchange messages only if `configurator.Configure`
      is called on both sides. Currently, etcdraft UT network mock assumes
      nodes can accept messages from any node as long as it's connected,
      however this does not reflect the actual behavior of network and
      UT failed to capture bugs such as FAB-14413 because of this.
      
      This CR makes network mock more realistic by adding `links` to it.
      A link is open if nodes on both sides have called `Configure`.
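
The link rule can be sketched with a small mock. The `network` type and its methods are hypothetical and only illustrate the "both sides must have called `Configure`" condition; they are not the test code from this CR.

```go
package main

import "fmt"

// network is a hypothetical mock. configured[a][b] is true once node a has
// been configured to know about node b.
type network struct {
	configured map[uint64]map[uint64]bool
}

func newNetwork() *network {
	return &network{configured: make(map[uint64]map[uint64]bool)}
}

// configure replaces the set of peers that node self knows about.
func (n *network) configure(self uint64, peers ...uint64) {
	set := make(map[uint64]bool, len(peers))
	for _, p := range peers {
		set[p] = true
	}
	n.configured[self] = set
}

// linkOpen reports whether a and b have each been configured with the other.
func (n *network) linkOpen(a, b uint64) bool {
	return n.configured[a][b] && n.configured[b][a]
}

func main() {
	n := newNetwork()
	n.configure(1, 2)
	fmt.Println(n.linkOpen(1, 2)) // false: node 2 has not configured node 1 yet
	n.configure(2, 1)
	fmt.Println(n.linkOpen(1, 2)) // true
}
```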
      
      Also, this CR fixes another minor problem in UT, where `support`
      should get prepared *prior to* the start of new chain.
      
      Change-Id: I08b0cd84e9e0774dd130a70293d7c6b328ffb94c
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14380] Check consenter set during revalidation · f4b6fe4c
      Jay Guo authored
      
      
      If two config txs that modify the consenter set are sent back-to-back,
      the second one is revalidated because the sequence number has advanced.
      We should then perform the consenter set check to make sure it only updates
      the set by 1.
      
      Change-Id: I9f7e54a3c0854d1f130d4a061466edd398e35bde
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
  9. 05 Mar, 2019 5 commits
    • [FAB-14441] Failed revalidation should have no impact · 66de7ea9
      Jay Guo authored
      
      
      If revalidation fails, it should have no impact on block cutting
      timer (not start/stop it).
      
      Change-Id: I9f8cb6bdffa6f95319fe14cdb66e4cfcac9851aa
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14346] 1/2 Fix channel creation in etcdraft · 7e692383
      Jay Guo authored
      
      
      If channel creation config contains only a subset of orderers
      in system channel, it should just create a new channel with
      specified set of orderers, instead of reconfiguring system
      channel. This CR fixes config block commitment in etcdraft
      to properly handle channel creation config.
      
      Change-Id: I408e707ae580ce041e8fc98e76311226437720ae
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14381] Enlarge ELECTION_TIMEOUT in etcdraft UT · 09f8250b
      Jay Guo authored
      
      
      This is enlarged because the leader is sometimes ticked repeatedly
      to trigger a desired heartbeat to a new node, which may lead to the
      leader stepping down if the timeout is too small.
      
      This does not prolong UT execution time because we do not rely
      on ticks to trigger election.
      
      Change-Id: I4957808fa73275e1932345f1f7e9c7930d6f696d
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14274] loads raft ConfState when start the chain. · 2c8a8ac2
      Jay Guo authored
      
      
      When the chain is started from the latest snapshot, it should also load
      the ConfState from that snapshot, so that it is persisted correctly
      in subsequent snapshots.
      
      Change-Id: I0d4ae3419e7ab7f4d4dfae6251b88ec4b051885b
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13750] Detect eviction from channel and halt · 23a6eed3
      yacovm authored
      
      
      This change set adds logic in the etcdraft chain that detects
      that the node is evicted from the channel, even if it was
      disconnected from the cluster while it was evicted.
      
      If a node fails to send a consensus message to other nodes,
      it starts suspecting that something is amiss.
      
      It then tries to pull the latest config block and checks
      whether it is still in the channel.
      
      If it is not, it:
      
      1) Halts the chain
      2) Pulls all blocks until the block that evicts
         the node.
      
      Change-Id: Ic3526d834fbef515119bb899fe62d30fdcf53267
      Signed-off-by: yacovm <yacovm@il.ibm.com>
  10. 04 Mar, 2019 4 commits
    • [FAB-14278] Fix etcdraft flaky UT · 3c7ed978
      Jay Guo authored
      
      
      UT: lagged node can catch up using snapshot
      
      Even if the snapshot interval is 1 byte, it's NOT guaranteed
      that a snapshot is taken for EVERY block, because snapshots
      are taken asynchronously on one goroutine (for the exact purpose
      of not taking snapshots too often, i.e. for every block).
      
      This is non-deterministic, so our assertions should be more
      lenient.
      
      Change-Id: Iab9305482d683b00c1331e85df1a6dc78868d06d
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14004] Bump etcd/raft lib version · d9a204dd
      Jay Guo authored
      
      
      This uses a specific revision instead of a release because
      we need commit 23731b to fix FAB-13920: long leader failover
      when `CheckQuorum` and `PreVote` are enabled, due to the lease
      check in raft. At the time of upgrading, that commit is not
      included in any etcd release yet.
      
      Change-Id: I0467352d35180f45a9931d7afd66f82d6c10990d
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14192] Fix deadlock in etcdraft chain · a7654a5b
      Jay Guo authored
      
      
      `Propose` and `ProposeConfChange` block when the node is leaderless.
      However, they are currently invoked by the same goroutine that
      consumes data from applyC, which processes SoftState to reflect
      leader changes.
      
      This may lead to deadlock:
      - block is created on leader and about to be `Propose`ed,
      - leader steps down due to loss of quorum
      - `Propose` is called and blocks
      - SoftState is passed on applyC
      - `serveRequest` is not able to consume applyC
      - deadlock
      
      Change-Id: If4dc048f82983862b5f253231dafd513b442bf53
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • FAB-13669 consensus migration: kafka2raft green path #4 · 2213f40e
      Yoav Tock authored
      
      
      This is the fourth of four (4/4) sub-tasks that focus on
      the "green" path of consensus-type migration from Kafka to Raft.
      
      By "green" we mean that there are no failures or aborts along the
      way. The 4 sub-tasks are staged in a way that minimizes dependencies
      between them.
      
      In this sub-task we introduce changes to the etcd/raft-based OSNs such
      that they can restart from a ledger that was started as Kafka, migrated,
      and restarted. This change concludes all the changes needed to implement
      the green path on the "Raft" side.
      
      See respective JIRA item for further details.
      
      Change-Id: I5b408e1cfcb8cf42c39bed4df6c5496792175ef0
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
  11. 03 Mar, 2019 15 commits
    • [FAB-11937] Provide Raft-specific metrics · ca922401
      Adarsh Saraf authored
      
      
      This CR adds the following raft-specific metrics:
      	- cluster_size
      	- is_leader
      	- committed_block_number
      	- snapshot_block_number
      	- # leader_changes
      	- # proposal_failures
      
      Change-Id: Ie35f2984bc8eaecdd5826b1e1aa21e6c759fdd83
      Signed-off-by: Adarsh Saraf <adarshsaraf123@gmail.com>
    • [FAB-14031] Fix flake in etcdraft UT · c44a50bb
      Jay Guo authored
      
      
      After reconnecting the leader to the network, we tick it to resend
      previous data. However, if it's ticked too many times, it
      may step down to follower because `CheckQuorum` is enabled.
      This causes the test to fail.
      
      Instead of ticking it, this CR changes one test to simply
      enqueue one more tx to trigger a resend of previous data.
      
      Change-Id: I211c7bf59dc6322509336ed8b120d869ea1f42f6
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13656] Size-based snapshotting · 566562e7
      Jay Guo authored
      
      
      Instead of taking snapshot every N blocks, this CR
      changes it to taking snapshot every N bytes.
      
      This also sets default SnapshotInterval to 100MB, if
      it's unset. Otherwise data in memory is never compacted
      till OOM.
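
A minimal sketch of a size-based trigger, assuming the chain simply accumulates block sizes and resets the counter once the interval is reached. The `snapshotter` type is hypothetical and is not the implementation in this CR.

```go
package main

import "fmt"

// snapshotter is a hypothetical helper that decides when to snapshot based on
// the number of bytes committed since the last snapshot.
type snapshotter struct {
	intervalBytes uint64 // snapshot once this many bytes have accumulated
	accumulated   uint64 // bytes committed since the last snapshot
}

// record accounts for one committed block and reports whether a snapshot
// should be taken now. The counter is reset when it returns true.
func (s *snapshotter) record(blockBytes uint64) bool {
	s.accumulated += blockBytes
	if s.accumulated < s.intervalBytes {
		return false
	}
	s.accumulated = 0
	return true
}

func main() {
	s := &snapshotter{intervalBytes: 100}
	fmt.Println(s.record(60)) // false: 60 bytes so far
	fmt.Println(s.record(60)) // true: 120 bytes reached the interval
	fmt.Println(s.record(10)) // false: counter restarted at 10
}
```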
      
      Meanwhile, DefaultSnapshotCatchUpEntries is shrunk so
      it does not take too much space to preserve extra entries
      every time a snapshot is taken. Slow nodes are catching up
      using blockpuller, which is also efficient.
      
      Change-Id: I79cfeb8652fcbafdeb5793bf4f06267b95a858d6
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13059] Purge etcdraft WAL and Snapshot files · 942762e5
      Jay Guo authored
      
      
      When a snapshot is taken, stale etcdraft WAL files should be
      purged to free disk space, as well as old snapshot files.
      
      However, we still keep several snapshot files around: if
      the latest file is corrupted, etcdraft automatically loads
      an older one, until there are none left.
      
      Change-Id: I2b8168dbc0c3e5bd56a081c104dd7dc9defbcd92
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13805] Unify Step and Submit into a stream · 38c1515c
      yacovm authored
      
      
      This change set removes Step RPC from the cluster protobuf,
      and renames Submit stream to a Step stream, and makes both
      transaction forwarding and consensus messages use the
      new Step stream.
      
      It also makes both egress Send() and Recv() have a maximum
      timeout (the RPC timeout in the config).
      A Send or Recv that is used to send a consensus message,
      or send (receive) a transaction (status) will now abort prematurely
      in order to protect against any liveness issue on the remote node,
      and also to return an answer to clients within a timely manner.
      
      Change-Id: Id942b248212f5c324e12af34fce48f96fdbb6aea
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13967] Polling, instead of waiting in AfterEach · 577301d9
      Jay Guo authored
      
      
      If something in AfterEach hangs till test timeout, the
      coredump produced might cover up the actual assertion
      failure in test body. This CR fixes this in etcdraft UT.
      
      This CR also adds LongEventualTimeout to some places in
      etcdraft chain_test to prevent flakes due to slow wal sync.
      
      Change-Id: I585e59e5eb587f9e9eb082c5eb8f681141b16e55
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13455] Initialize BlockPuller on demand. · e4060ed3
      Jay Guo authored
      
      
      The creation of BlockPuller takes latest certificates, therefore
      should be done on-demand to guarantee its validity.
      
      Change-Id: I327275da495a85126feb58c84b460bed98f7b860
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13199] Start etcdraft chain sequentially in UT · 8c445197
      Jay Guo authored
      
      
      This CR hopefully makes occasional UT timeout more debuggable.
      
      Also, it fixes a goroutine leak in UT.
      
      Change-Id: Ia0ac63b2394061dd13a570d71eae6b4139fb73b0
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12709] Enable CheckQuorum · 50a09fd0
      Jay Guo authored
      
      
      When CheckQuorum is enabled, leader steps down if it cannot reach
      the quorum of network, so that clients have a chance to disconnect
      and try other nodes.
      
      Change-Id: I901c0e3009f9d354a2b504fe16174432345055b3
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12709] Use another way to elect leader in UT · 5c3e2fce
      Jay Guo authored
      
      
      In etcdraft UT, we often need to deterministically elect a leader.
      This was done by ticking ONLY one node in the network, so it is
      the only node that starts a campaign.
      
      HOWEVER, there are several problems with this approach:
      1. it's slow. We need real time interval between ticks due to the
         way fake clock is implemented: it drops tick on the floor in
         case of slow consumer.
      2. there is a random factor in the election timeout of etcd/raft. It is
         calculated as follows:
      ```
      randomElectionTimeout = electionTimeout + rand.Intn(electionTimeout)
      ```
         in other words, if we send electionTimeout ticks, it's not
         guaranteed to trigger a leader election
      3. if CheckQuorum is enabled, a lease is imposed on follower nodes
         which gets expired if
            electionTimeout <= elapsedTicks < randomElectionTimeout
         (if it's greater than randomElectionTimeout, it's reset to 0 and
         node starts campaign)
      
      In this CR, we send an artificial MsgTimeoutNow to the node to be
      elected. This message reliably triggers a campaign and skips the lease
      check.
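
The three tick ranges from problems 2 and 3 can be written down as a small classifier. This is an illustration of the intervals described in this message, not etcd/raft code. `followerState` is a hypothetical name, and a tick count equal to the randomized timeout is treated as a campaign, following the half-open interval given above.

```go
package main

import "fmt"

// followerState classifies a follower by how many ticks have elapsed since it
// last heard from a leader, given the base and randomized election timeouts.
func followerState(elapsed, electionTimeout, randomized int) string {
	switch {
	case elapsed >= randomized:
		return "campaign"
	case elapsed >= electionTimeout:
		return "lease expired, no campaign yet"
	default:
		return "in lease"
	}
}

func main() {
	// Example: electionTimeout = 10 and the random part happened to be 4.
	const electionTimeout, randomized = 10, 14
	for _, ticks := range []int{9, 10, 13, 14} {
		fmt.Println(ticks, followerState(ticks, electionTimeout, randomized))
	}
}
```

With these numbers, sending exactly 10 ticks expires the lease but does not start an election, which is the unreliability the message describes.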
      
      This CR also fixes several potential data races and flakes in tests.
      
      Change-Id: I3c8e0bcadbb8cfa1ae3393de2ea711fdd0d8b7aa
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13178] Use MaxInflightMsgs to throttle requests · 9b78a9d8
      Jay Guo authored
      
      
      If there are MaxInflightMsgs blocks proposed but not
      committed, chain blocks further incoming requests.
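
A common way to express this kind of throttle in Go is a counting semaphore built on a buffered channel. The sketch below is an assumption about the general technique, not the code in this CR. `inflight`, `tryPropose`, and `committed` are illustrative names, and the non-blocking `tryPropose` is used only to keep the example deterministic (the chain itself is described as blocking).

```go
package main

import "fmt"

// inflight is a counting semaphore: each element in the channel stands for one
// block that has been proposed but not yet committed.
type inflight chan struct{}

func newInflight(max int) inflight { return make(inflight, max) }

// tryPropose reserves a slot and returns false when the limit is reached.
func (f inflight) tryPropose() bool {
	select {
	case f <- struct{}{}:
		return true
	default:
		return false
	}
}

// committed releases one slot, if any is held.
func (f inflight) committed() {
	select {
	case <-f:
	default:
	}
}

func main() {
	f := newInflight(2)
	fmt.Println(f.tryPropose()) // true
	fmt.Println(f.tryPropose()) // true
	fmt.Println(f.tryPropose()) // false: two blocks are already in flight
	f.committed()
	fmt.Println(f.tryPropose()) // true: a slot was freed by the commit
}
```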
      
      Change-Id: I58c84e23c882ccc152e5c9a248434e466a8b5266
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13438] Errored should reflect correct state · 0276480c
      Jay Guo authored
      
      
      This CR changes Errored to return a channel that is
      closed when node becomes candidate.
      
      Change-Id: Ibd0ece763b9d93c4da93825d1b302ecc55a9b32e
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13438] Store raft SoftState · 21a49bad
      Jay Guo authored
      
      
      Store raft SoftState in raft chain so it returns error
      while election is ongoing. This prevents a disconnected
      follower from returning success on Broadcast API.
      
      Change-Id: Ib6619b230938f0d6c10240b8cd8e34e346056145
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13438] pass SoftState on observe channel · 657b8095
      Jay Guo authored
      
      
      This CR changes type of etcdraft observe channel from uint64
      to raft.SoftState, so that chain_test can assert not only leader
      id, but also the state of node.
      
      Change-Id: Ia0c5f8c9060c234ceb84133e0c5598ed064dd1ee
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13613] Fix race in etcdraft chain UT · 5dadb3a5
      Jay Guo authored
      
      
      Add a lock to guard manipulation of `StepStub`.
      
      Change-Id: Icaadb1f5aea0cb7f266f24ed6756c4f6541768bd
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
  12. 27 Feb, 2019 4 commits
    • FAB-12986: ledger per chain for raft chain_test.go · c7db89e0
      Artem Barger authored
      
      
      Currently a single ledger instance is shared between the instances
      of the chain mock in unit tests. This commit introduces a ledger instance per
      chain.
      
      Change-Id: I333fa2819490c995931a7e0d241eb6428e67c87e
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-12945] add raft reconfiguration unit-tests · 07b7309c
      Artem Barger authored
      
      
      Change-Id: Ib77c866a30ed5108ad53908b0ca25a60a89e9a7c
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13178] A dumb version of etcdraft BlockCreator · ff843afd
      Jay Guo authored
      
      
      This CR rewrites BlockCreator so that it doesn't return nil block.
      
      BEFORE:
      blockcreator holds a channel of created blocks, which is buffered
      with size of createdBlocksBuffersize (default 20). It also stores
      the hash and number of latest block.
      
      When requested to create a new block, blockcreator does so
      by assembling a block based on that hash and number, and enqueues the
      block to the buffered channel. If the channel is full, nil is returned.
      
      When committing a block, it drains the channel. If there's nothing in
      the channel, it implies the blockcreator is manipulated by a raft
      follower, therefore blockcreator simply updates the hash and number.
      
      NOW:
      what we need is actually as simple as: a blockcreator holds the
      hash and number of latest block. When it is requested to create
      a block, it just uses that hash and number to assemble one.
      And ONLY raft leader holds a blockcreator. Followers blindly
      commit whatever comes from consensus. When a follower is elected
      as new leader, it simply looks up the ledger, finds the hash and number
      of latest block, and creates a new blockcreator.
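
The simplified design can be sketched in a few lines. The `block` type and its hashing scheme here are stand-ins chosen for the example (SHA-256 over number, previous hash, and data), not Fabric's actual block header hash, and `blockCreator` is a hypothetical reduction of the real type.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// block is a hypothetical, minimal block: a number, a link to the previous
// block, and an opaque payload.
type block struct {
	Number       uint64
	PreviousHash [32]byte
	Data         []byte
}

// hash is an illustrative hash over the whole block, not Fabric's header hash.
func (b block) hash() [32]byte {
	buf := make([]byte, 8, 8+len(b.PreviousHash)+len(b.Data))
	binary.BigEndian.PutUint64(buf, b.Number)
	buf = append(buf, b.PreviousHash[:]...)
	buf = append(buf, b.Data...)
	return sha256.Sum256(buf)
}

// blockCreator holds only the hash and number of the latest block.
type blockCreator struct {
	hash   [32]byte
	number uint64
}

// createNextBlock assembles the next block from the stored hash and number,
// then advances the creator so that consecutive calls chain together.
func (bc *blockCreator) createNextBlock(data []byte) block {
	b := block{Number: bc.number + 1, PreviousHash: bc.hash, Data: data}
	bc.hash = b.hash()
	bc.number = b.Number
	return b
}

func main() {
	// A new leader would seed this from the ledger; zero values are used here.
	bc := &blockCreator{}
	b1 := bc.createNextBlock([]byte("tx1"))
	b2 := bc.createNextBlock([]byte("tx2"))
	fmt.Println(b1.Number, b2.Number, b2.PreviousHash == b1.hash()) // 1 2 true
}
```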
      
      Change-Id: I226ee34d666fbb1e8d034dc22ea6800df993f7a4
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13456] Fix race in etcdraft test · e8514271
      Jay Guo authored
      
      
      Change-Id: I9fe663c4efa46e5644571d238f4d7ea8f4e51626
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>