1. 20 Mar, 2019 2 commits
    • FAB-14656 Respect snapshot interval when node restarts · ade6dc48
      Jay Guo authored
      
      When a node is restarted, it loads the WAL data written since the last
      snapshot into memory. However, this data is currently not counted as
      part of the accumulated data size that is compared against the
      snapshot interval.
      
      This CR fixes this. It also changes some log levels to improve
      serviceability.
      
      FAB-14656 #done
      
      Change-Id: If152071e64fd8268d20362c593d24af4ab2be355
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
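A minimal sketch of the accounting this commit describes. It assumes a hypothetical `SnapshotTracker` that measures committed data in bytes. On restart, the counter is seeded with the size of the WAL entries replayed since the last snapshot instead of starting from zero. The class and method names are illustrative only and are not Fabric's API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SnapshotTracker:
    """Hypothetical tracker that decides when a Raft snapshot is due."""

    interval_bytes: int   # plays the role of the snapshot interval, in bytes
    accumulated: int = 0  # bytes applied since the last snapshot

    @classmethod
    def on_restart(cls, interval_bytes: int, wal_entries: Iterable[bytes]) -> "SnapshotTracker":
        # Count the data replayed from the WAL since the last snapshot, so a
        # restart does not silently reset progress toward the next snapshot.
        tracker = cls(interval_bytes)
        tracker.accumulated = sum(len(e) for e in wal_entries)
        return tracker

    def apply(self, block: bytes) -> bool:
        """Account for a committed block; return True if a snapshot is due."""
        self.accumulated += len(block)
        if self.accumulated >= self.interval_bytes:
            self.accumulated = 0  # taking a snapshot resets the counter
            return True
        return False
```

Take an interval of 100 bytes with 90 bytes replayed from the WAL. The next 20-byte block triggers a snapshot. Without seeding, another 100 bytes would have to arrive first.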
    • FAB-14540 transfer leader if cert of it is rotated · 7e440c73
      Jay Guo authored
      
      When the leader's certificate is rotated, the leader is certain to be
      disconnected after communication is reconfigured. Instead of waiting
      for ElectionTimeout to elect a new leader, the old leader should be
      more cooperative and transfer its leadership to another node.
      
      Note that proposals sent during this transition are automatically
      dropped by etcd/raft; however, the transition should be fairly short.
      
      Change-Id: Iabd005d00864afe09b4738f1ed36b939b1d83eed
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
  2. 16 Mar, 2019 1 commit
    • FAB-14593 Refine etcdraft parameters · 2a4e15e9
      Jay Guo authored
      
      - MaxInflightMsgs is internal to etcd/raft and should be exposed
      to users with a more appropriate name: MaxInflightBlocks
      
      - MaxSizePerMsg is also internal to etcd/raft, and it defaults
      to PreferredMaxBytes in BatchSize, so that if a big block is created,
      it is sent in its own etcd/raft message instead of being batched
      with other blocks. This parameter takes effect when a batch of entries
      is sent to a lagging node. During normal replication, each block is
      sent in its own message.
      It's not necessary to expose this config option to users.
      
      - SnapInterval is renamed to SnapshotIntervalSize
      
      FAB-14593 #done
      
      Change-Id: Icaf2848a41c5f0f0a02f4b0b4a80ba852fddd584
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
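The batching behavior described for MaxSizePerMsg can be modelled as below. This is a simplified sketch in the spirit of etcd/raft's size limiting. It assumes that an entry's size is just its payload length. A batch always contains at least one entry, so a block larger than the limit is still sent, alone, in its own message. The function names are hypothetical.

```python
from typing import List


def limit_size(entries: List[bytes], max_size: int) -> List[bytes]:
    """Return the longest prefix of entries whose total size is <= max_size,
    but never fewer than one entry (hypothetical helper)."""
    if not entries:
        return []
    total = len(entries[0])
    count = 1
    while count < len(entries):
        total += len(entries[count])
        if total > max_size:
            break
        count += 1
    return entries[:count]


def split_into_messages(entries: List[bytes], max_size: int) -> List[List[bytes]]:
    """Split the entries sent to a lagging follower into size-capped messages."""
    messages = []
    while entries:
        batch = limit_size(entries, max_size)
        messages.append(batch)
        entries = entries[len(batch):]
    return messages
```

Suppose MaxSizePerMsg is set to a PreferredMaxBytes of 20. A 50-byte block then goes out in a message of its own, and the smaller blocks around it are batched together.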
  3. 15 Mar, 2019 1 commit
    • FAB-14618 Store only nodeIDs in metadata · 9a124d65
      Jason Yellick authored
      
      Presently, the block metadata encodes the TLS certificates of all of the
      Raft consenters in the system for each block.  Because these TLS certs
      are non-trivial in size, and there may be a large set of consenters,
      this actually creates a significant amount of waste on the filesystem.
      
      As a small optimization, this CR modifies the block metadata to only
      store the nodeIDs instead of the full set of consenter info.  It then
      correlates the consenter slice found in the channel config data with
      this slice of nodeIDs to build a mapping between the two (which was
      previously persisted).
      
      Change-Id: Iaa66dacbcc48a041318c8a718099a873b9626240
      Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
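A minimal sketch of the correlation step. It assumes that the nodeID slice in the block metadata and the consenter slice in the channel config are aligned by position; this alignment is an assumption made for illustration. Consenters are shown as plain strings rather than protobuf messages, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def consenters_by_id(node_ids: Sequence[int], consenters: Sequence[str]) -> Dict[int, str]:
    """Rebuild the nodeID -> consenter mapping from block metadata (IDs only)
    and the channel config (full consenter info), matched by position."""
    if len(node_ids) != len(consenters):
        raise ValueError(
            f"metadata has {len(node_ids)} node IDs but config has {len(consenters)} consenters"
        )
    return dict(zip(node_ids, consenters))
```

Each block then stores a few small integers instead of one TLS certificate per consenter. The full mapping is recomputed from the config whenever it is needed.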
  4. 14 Mar, 2019 2 commits
    • FAB-14620 Refactor detectConfChange · a2323d98
      Jason Yellick authored
      
      detectConfChange presently returns several values, not all of which
      are always non-nil, and not all of which cause fatal results when nil.
      This is a preamble to converting the consenter ID -> TLS cert map to a
      slice of IDs.
      
      Change-Id: I02d01137e5cb0ce1250c166539170c44575e4fd4
      Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
    • FAB-14619 Rename Raft metadata protos · d645c833
      Jason Yellick authored
      
      There are presently two etcdraft protos around metadata.  One is the
      metadata stored in the config and it is named 'Metadata', the other is
      the metadata stored in each block, this is named 'RaftMetadata'.  This
      causes confusion when reading the code.  This CR transforms those names
      to be:
      
       Metadata -> ConfigMetadata
       RaftMetadata -> BlockMetadata
      
      Change-Id: Ia0394ebe78f5541996c010c3c67d760f336f75d8
      Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
  5. 13 Mar, 2019 1 commit
    • [FAB-14634] Write raw blocks if evicted · 7356c663
      yacovm authored
      
      This change set adds Append() to the consenter support,
      and changes the eviction logic to use Append() instead
      of WriteBlock, so that the blocks that are pulled from OSNs
      are not changed and are written as is into the ledger.
      
      Change-Id: I76abe64990f1855b53dadb0655c5830169ef7ed1
      Signed-off-by: yacovm <yacovm@il.ibm.com>
  6. 11 Mar, 2019 1 commit
  7. 09 Mar, 2019 1 commit
  8. 08 Mar, 2019 1 commit
  9. 07 Mar, 2019 3 commits
    • [FAB-14467] Catch up after eviction suspicion. · 487b9a90
      Jay Guo authored
      
      If a lagging node is not aware of newly added node(s), it simply
      cannot communicate with them. When it suspects its own eviction,
      it inspects the consenter certificates and finds itself still
      among the consenter set. If the leader happens to be among those
      new nodes, we have a deadlock.
      
      To break this, when a node suspects its own eviction, it also
      triggers a catch-up to actually pull blocks and commit them, so
      that it can eventually recognize the new nodes.
      
      Change-Id: Id2e9a423221e4a3bebeeeaf7da5f120396f342dd
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14466] More realistic etcdraft UT network mock · d50bd8d7
      Jay Guo authored
      
      Two nodes can exchange messages only if `configurator.Configure`
      is called on both sides. Currently, the etcdraft UT network mock assumes
      nodes can accept messages from any node as long as it's connected;
      however, this does not reflect the actual behavior of the network, and
      as a result the UTs failed to catch bugs such as FAB-14413.
      
      This CR makes the network mock more realistic by adding `links` to it.
      A link is open if the nodes on both sides have called `Configure`.
      
      This CR also fixes another minor problem in the UTs, where `support`
      should be prepared *prior to* the start of the new chain.
      
      Change-Id: I08b0cd84e9e0774dd130a70293d7c6b328ffb94c
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
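The `links` idea can be sketched as follows. `MockNetwork`, `configure` and `send` are hypothetical names and not the actual test helpers. The point is that a message is delivered only when both endpoints have configured each other.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class MockNetwork:
    """Hypothetical network mock: a link is open only if both ends called configure()."""

    def __init__(self) -> None:
        self.peers: Dict[int, Set[int]] = defaultdict(set)
        self.delivered: List[Tuple[int, int, str]] = []

    def configure(self, node: int, peers: Iterable[int]) -> None:
        # Stands in for a node calling Configure with its current consenter set.
        self.peers[node] = set(peers)

    def link_open(self, a: int, b: int) -> bool:
        return b in self.peers[a] and a in self.peers[b]

    def send(self, src: int, dst: int, msg: str) -> bool:
        if not self.link_open(src, dst):
            return False  # dropped, like a real network without a connection
        self.delivered.append((src, dst, msg))
        return True
```

Consider a mock in which connectivity alone decides delivery. There, node 1 could reach node 2 before node 2 knows about node 1, and that is exactly the kind of false success that hides bugs in unit tests.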
    • [FAB-14380] Check consenter set during revalidation · f4b6fe4c
      Jay Guo authored
      
      If two config transactions that modify the consenter set are sent
      back-to-back, the second one is revalidated because the config sequence
      number has advanced. The consenter set check should therefore also be
      performed during revalidation, to make sure the update changes the set
      by at most one node.
      
      Change-Id: I9f7e54a3c0854d1f130d4a061466edd398e35bde
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
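The scenario can be illustrated with a hypothetical check that treats consenters as opaque identifiers. Each of two back-to-back transactions is valid against the original set. The second one stops being valid once it is revalidated against the set produced by the first.

```python
from typing import AbstractSet


def valid_consenter_update(current: AbstractSet[str], proposed: AbstractSet[str]) -> bool:
    """Hypothetical check: accept an update only if it adds or removes
    at most one consenter relative to the current set."""
    changes = len(proposed - current) + len(current - proposed)
    return changes <= 1
```

After the first transaction adds `o4`, the second transaction (which was built to add `o5` to the original set) would effectively remove `o4` and add `o5`. That is two changes, so revalidation rejects it.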
  10. 05 Mar, 2019 7 commits
  11. 04 Mar, 2019 6 commits
    • [FAB-14136] Always Deliver if cluster smaller than 3 · 8b87f05a
      yacovm authored
      
      When a single-node etcdraft cluster is expanded by adding a new node,
      the new node needs to pull blocks from the existing node.
      
      However, the existing node loses leadership and then rejects deliver requests.
      
      This change set makes a Raft node not reject deliver requests, even after
      losing leadership, if the cluster has fewer than 3 members in it.
      
      Change-Id: I75bd028d5a46fcb6ae81dc29012e3e839149c319
      Signed-off-by: yacovm <yacovm@il.ibm.com>
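A sketch of the resulting rule. It assumes that deliver requests are otherwise rejected while the node does not know of a leader. `deliver_allowed` is a hypothetical helper, not a Fabric function.

```python
def deliver_allowed(has_leader: bool, cluster_size: int) -> bool:
    """Hypothetical rule: should a Raft node serve a deliver (block pull) request?"""
    if cluster_size < 3:
        # In a 1- or 2-node cluster, a newly added node may have to pull
        # blocks from the existing node before quorum can be restored, so
        # rejecting deliver here could leave the cluster stuck.
        return True
    return has_leader
```

Clusters with 3 or more nodes keep the stricter behavior, because a newly added node there can still pull blocks while the remaining nodes form a quorum.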
    • [FAB-14004] Bump etcd/raft lib version · d9a204dd
      Jay Guo authored
      
      This uses a specific revision instead of a release because
      we need commit 23731b to fix FAB-13920: long leader failover
      when `CheckQuorum` and `PreVote` are enabled, due to the lease
      check in raft. At the time of this upgrade, that commit was not
      yet included in any etcd release.
      
      Change-Id: I0467352d35180f45a9931d7afd66f82d6c10990d
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-14240] Do not use `support.Height` in chain · f7f806fd
      Jay Guo authored
      
      Blocks are committed asynchronously in the blockwriter, therefore
      `support.Height` may not correctly reflect the number of the
      latest block. Decisions made based on `support.Height` while the
      chain is running may lead to the same block being replayed.
      
      This CR fixes this by keeping a reference to the last block,
      which is initialized by reading the tail of the ledger at start.
      
      Change-Id: I6c5e5fed4c1464c459603f4484a44b8b91b017b6
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
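A minimal sketch of the approach, using a hypothetical `Block` type and an illustrative hash scheme. The chain reads the ledger tail once at start. From then on it derives block numbers and previous hashes from its own reference to the last block, never from the asynchronously updated ledger height.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    prev_hash: bytes
    data: bytes

    def header_hash(self) -> bytes:
        # Illustrative hash over the fields; not Fabric's block encoding.
        return hashlib.sha256(
            self.number.to_bytes(8, "big") + self.prev_hash + self.data
        ).digest()


class Chain:
    """Hypothetical chain that tracks its own last block."""

    def __init__(self, ledger_tail: Block) -> None:
        # Initialized once from the tail of the ledger when the chain starts.
        self.last_block = ledger_tail

    def create_next_block(self, data: bytes) -> Block:
        block = Block(self.last_block.number + 1, self.last_block.header_hash(), data)
        # Advance immediately, even though the block writer commits asynchronously.
        self.last_block = block
        return block
```

Because `last_block` advances as soon as a block is created, two consecutive blocks never get the same number. That could happen if the number were read from a ledger height that lags behind pending writes.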
    • [FAB-14192] Fix deadlock in etcdraft chain · a7654a5b
      Jay Guo authored
      
      `Propose` and `ProposeConfChange` block when the node is leaderless.
      However, they are currently invoked by the same goroutine that
      consumes data from applyC, which processes SoftState to reflect
      leader changes.
      
      This may lead to deadlock:
      - block is created on leader and about to be `Propose`ed,
      - leader steps down due to loss of quorum
      - `Propose` is called and blocks
      - SoftState is passed on applyC
      - `serveRequest` is not able to consume applyC
      - deadlock
      
      Change-Id: If4dc048f82983862b5f253231dafd513b442bf53
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • FAB-13669 consensus migration: kafka2raft green path #4 · 2213f40e
      Yoav Tock authored
      
      This is the fourth of four (4/4) sub-tasks that focus on
      the "green" path of consensus-type migration from Kafka to Raft.
      
      By "green" we mean that there are no failures or aborts along the
      way. The 4 sub-tasks are staged in a way that minimizes dependencies
      between them.
      
      In this sub-task we introduce changes to the etcd/raft-based OSNs such
      that they can restart from a ledger that was started as Kafka, migrated,
      and restarted. This change concludes all the changes needed to implement
      the green path on the "Raft" side.
      
      See respective JIRA item for further details.
      
      Change-Id: I5b408e1cfcb8cf42c39bed4df6c5496792175ef0
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
    • FAB-13666 consensus migration: kafka2raft green path #3 · e1f10434
      Yoav Tock authored
      
      This is the third of four (3/4) sub-tasks that focus on
      the "green" path of consensus-type migration from Kafka to Raft.
      
      By "green" we mean that there are no failures or aborts along
      the way. The 4 sub-tasks are staged in a way that minimizes
      dependencies between them.
      
      In this sub-task we introduce changes to the Kafka-based OSNs
      such that they implement the entire green path, until they commit
      migration and are ready to be restarted. This
      change concludes all the changes needed to implement the green
      path on the "Kafka" side.
      
      See respective JIRA item for further details.
      
      Note: some of the mocks had to be regenerated with counterfeiter to make the
      unit tests build.
      
      Change-Id: I2747f7d8017c344e9ff3bdd9dd98cbaa1480083f
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
  12. 03 Mar, 2019 14 commits
    • [FAB-11937] Provide Raft-specific metrics · ca922401
      Adarsh Saraf authored
      
      This CR adds the following raft-specific metrics:
      	- cluster_size
      	- is_leader
      	- committed_block_number
      	- snapshot_block_number
      	- # leader_changes
      	- # proposal_failures
      
      Change-Id: Ie35f2984bc8eaecdd5826b1e1aa21e6c759fdd83
      Signed-off-by: Adarsh Saraf <adarshsaraf123@gmail.com>
    • [FAB-14129] Add more logs to etcdraft chain · fb59b1bb
      Jay Guo authored
      
      This CR adds more debug logs to etcdraft chain to facilitate debugging.
      
      Change-Id: I2d70869bc8823babb3ab50782bd4472637ed5820
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13656] Size-based snapshotting · 566562e7
      Jay Guo authored
      
      Instead of taking a snapshot every N blocks, this CR
      takes a snapshot every N bytes.
      
      It also sets the default SnapshotInterval to 100MB if
      it's unset; otherwise, data in memory would never be compacted
      until the node runs out of memory (OOM).
      
      Meanwhile, DefaultSnapshotCatchUpEntries is reduced so that
      preserving extra entries every time a snapshot is taken does
      not take too much space. Slow nodes catch up using the
      blockpuller, which is also efficient.
      
      Change-Id: I79cfeb8652fcbafdeb5793bf4f06267b95a858d6
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13805] Unify Step and Submit into a stream · 38c1515c
      yacovm authored
      
      This change set removes Step RPC from the cluster protobuf,
      and renames Submit stream to a Step stream, and makes both
      transaction forwarding and consensus messages use the
      new Step stream.
      
      It also gives both egress Send() and Recv() a maximum
      timeout (the RPC timeout in the config).
      A Send or Recv used to send a consensus message, or to
      send a transaction (and receive its status), now aborts prematurely
      to protect against any liveness issue on the remote node,
      and also to return an answer to clients in a timely manner.
      
      Change-Id: Id942b248212f5c324e12af34fce48f96fdbb6aea
      Signed-off-by: yacovm <yacovm@il.ibm.com>
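The bounded-wait idea behind these timeouts can be sketched with a standard-library queue standing in for the gRPC stream. This is an assumption made for illustration: Fabric's egress stream is Go code with its own API, and `recv_with_timeout` is a hypothetical helper.

```python
import queue
from typing import Any


def recv_with_timeout(inbox: "queue.Queue[Any]", timeout: float) -> Any:
    """Hypothetical receive that gives up after `timeout` seconds instead of
    blocking forever on an unresponsive remote node."""
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no message received within {timeout}s") from None
```

Surfacing a timeout as an error lets the caller report a failure to the client promptly instead of hanging on a stalled peer.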
    • [FAB-13455] Initialize BlockPuller on demand. · e4060ed3
      Jay Guo authored
      
      The BlockPuller is created with the latest certificates, so its
      creation should be done on demand to guarantee their validity.
      
      Change-Id: I327275da495a85126feb58c84b460bed98f7b860
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-11863] Clean orderer network failure logs · 082a9102
      Jay Guo authored
      
      Every failed attempt to send Step request is logged at ERROR level,
      which pollutes etcdraft orderer leader logs when a follower is down.
      
      This CR changes it to log at DEBUG level, except for the first failed
      attempt and the first successful delivery after failure(s).
      
      Test done: manually ran the CFT integration test and inspected the logs.
      
      Change-Id: I1dd3468de1f6745f658c15e83a5e644e0b0492d6
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
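The logging rule can be sketched as a small per-destination state machine. Two things here are assumptions: the log texts, and the INFO level for the recovery message. The commit only says that the first failure and the first success after failures are not logged at DEBUG. `StepFailureLogger` is a hypothetical name.

```python
import logging
from typing import Dict


class StepFailureLogger:
    """Hypothetical helper: log a send failure at ERROR only the first time,
    note the first success after a run of failures, and log the rest at DEBUG."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger
        self.failing: Dict[int, bool] = {}  # destination node ID -> in a failure run?

    def on_failure(self, dest: int, err: str) -> None:
        if not self.failing.get(dest, False):
            self.failing[dest] = True
            self.logger.error("Failed to send StepRequest to %d: %s", dest, err)
        else:
            self.logger.debug("Failed to send StepRequest to %d: %s", dest, err)

    def on_success(self, dest: int) -> None:
        if self.failing.get(dest, False):
            self.failing[dest] = False
            self.logger.info("Sent StepRequest to %d after failed attempts", dest)
        else:
            self.logger.debug("Sent StepRequest to %d", dest)
```

With a follower down, the leader logs one ERROR line per outage instead of one per attempt.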
    • [FAB-13059] put raft snapshotting in go routine · 2ad9d9da
      Jay Guo authored
      
      This CR puts Raft snapshotting into a goroutine to avoid
      excessive snapshotting due to an extremely small SnapshotInterval.
      
      This is also preparation for WAL file pruning.
      
      Change-Id: Ib43a2197c533bdc224a4bc52ff6cb418b62a0c33
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12709] Enable CheckQuorum · 50a09fd0
      Jay Guo authored
      
      When CheckQuorum is enabled, the leader steps down if it cannot reach
      a quorum of the network, so that clients have a chance to disconnect
      and try other nodes.
      
      Change-Id: I901c0e3009f9d354a2b504fe16174432345055b3
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
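The quorum test behind CheckQuorum can be sketched as follows. This is a simplified model of the etcd/raft behavior; the function and parameter names are hypothetical.

```python
from typing import Set


def should_step_down(active_followers: Set[int], cluster_size: int) -> bool:
    """Hypothetical CheckQuorum-style test, run by a leader once per election
    timeout: step down if the leader plus the followers it heard from during
    that period do not form a majority of the cluster."""
    quorum = cluster_size // 2 + 1
    return len(active_followers) + 1 < quorum
```

In a 5-node cluster, a leader that has heard from only one follower counts 2 active nodes against a quorum of 3. It steps down, and clients connected to it can move elsewhere.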
    • [FAB-13178] Move `SendSubmit` out of serveRequest · 100e1ad7
      Jay Guo authored
      
      When the gRPC buffer of the `Submit` stream is full, `SendSubmit`
      blocks, which freezes the `serveRequest` goroutine. This CR moves
      the call out of that goroutine, so that clients instead block while
      waiting for room in the buffer.
      
      Change-Id: I62cd261b9419bd8df3fa1bfaeff14551168d2e65
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13178] Use MaxInflightMsgs to throttle requests · 9b78a9d8
      Jay Guo authored
      
      If MaxInflightMsgs blocks have been proposed but not yet
      committed, the chain blocks further incoming requests.
      
      Change-Id: I58c84e23c882ccc152e5c9a248434e466a8b5266
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
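A minimal sketch of this throttling with a condition variable. It assumes a hypothetical `InflightLimiter` that the chain consults before proposing a block. The optional timeout exists only to make the example testable; the commit describes plain blocking.

```python
import threading
from typing import Optional


class InflightLimiter:
    """Hypothetical limiter: block new proposals while `max_inflight` blocks
    are proposed but not yet committed."""

    def __init__(self, max_inflight: int) -> None:
        self.max_inflight = max_inflight
        self.inflight = 0
        self.cond = threading.Condition()

    def acquire(self, timeout: Optional[float] = None) -> bool:
        """Wait for room, then count one more in-flight block.
        Returns False only if `timeout` expires first."""
        with self.cond:
            if not self.cond.wait_for(lambda: self.inflight < self.max_inflight, timeout):
                return False
            self.inflight += 1
            return True

    def release(self) -> None:
        """Call when a proposed block is committed."""
        with self.cond:
            self.inflight -= 1
            self.cond.notify_all()
```

Blocking the caller rather than dropping the request applies back-pressure to clients, which matches the behavior the commit describes.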
    • [FAB-13438] Errored should reflect correct state · 0276480c
      Jay Guo authored
      
      This CR changes Errored to return a channel that is
      closed when the node becomes a candidate.
      
      Change-Id: Ibd0ece763b9d93c4da93825d1b302ecc55a9b32e
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13438] Store raft SoftState · 21a49bad
      Jay Guo authored
      
      Store the raft SoftState in the raft chain so that it returns an error
      while an election is ongoing. This prevents a disconnected
      follower from returning success on the Broadcast API.
      
      Change-Id: Ib6619b230938f0d6c10240b8cd8e34e346056145
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13438] pass SoftState on observe channel · 657b8095
      Jay Guo authored
      
      This CR changes the type of the etcdraft observe channel from uint64
      to raft.SoftState, so that chain_test can assert not only the leader
      ID but also the state of the node.
      
      Change-Id: Ia0c5f8c9060c234ceb84133e0c5598ed064dd1ee
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13447] new leader should wait for in flight msg · 0d247c1d
      Jay Guo authored
      
      A newly elected Raft leader should wait for in-flight blocks
      to be committed before accepting new envelopes and creating
      new blocks. Otherwise, all the blocks it creates in the meantime
      would be uncle blocks, which Fabric does not permit.
      
      Change-Id: Ia5adac185263735eace1fc805ebea0f5c98b2fb1
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
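A sketch of the gating logic. It assumes that "in flight" means entries already in the new leader's Raft log but not yet written to the ledger. The class and index names are hypothetical.

```python
class NewLeaderGate:
    """Hypothetical gate for a newly elected leader: accept new envelopes only
    after every entry already in the Raft log at election time is applied."""

    def __init__(self, last_log_index: int, applied_index: int) -> None:
        self.last_log_index = last_log_index  # last entry appended before the election
        self.applied_index = applied_index    # last entry written to the ledger

    def on_applied(self, index: int) -> None:
        self.applied_index = max(self.applied_index, index)

    def ready(self) -> bool:
        return self.applied_index >= self.last_log_index
```

Suppose the leader cut new blocks while entries 9 and 10 were still pending. Those new blocks would be built on block 8's successor, competing with the pending ones: the uncle blocks the commit rules out.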