1. 03 Mar, 2019 31 commits
    • [FAB-14111] Remove unused function IsMembershipUpdate · 89e67f35
      yacovm authored
      The function is no longer used.
      Change-Id: Ic19762fd1cfa0d691816220e74977ba9efcce366
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-14045] Send messages asynchronously in clusters · 5c2e2122
      yacovm authored
      This change set makes consensus messages be sent asynchronously, over
      a buffered channel with a size of 10.
      Consensus messages are dropped when the buffer overflows,
      while Submit messages block on the buffer.
      Without this change set, sending large messages takes milliseconds,
      while with the change set it takes microseconds.
      Change-Id: Id60b05b96eed6d9d04f89b8967945b18ddfbef94
      Signed-off-by: yacovm <yacovm@il.ibm.com>
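The two send behaviours described in FAB-14045 can be illustrated with a minimal Go sketch. It is not the Fabric implementation: the `sender` type and its method names are hypothetical, and only the buffer size of 10 and the drop-versus-block behaviour come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// errOverflow is returned when a consensus message is dropped (hypothetical name).
var errOverflow = errors.New("send buffer full, consensus message dropped")

// sender is a hypothetical stand-in for a per-destination egress queue.
type sender struct {
	queue chan string
}

func newSender(size int) *sender {
	return &sender{queue: make(chan string, size)}
}

// sendConsensus never blocks: if the buffer is full, the message is dropped.
func (s *sender) sendConsensus(msg string) error {
	select {
	case s.queue <- msg:
		return nil
	default:
		return errOverflow
	}
}

// sendSubmit blocks until there is room in the buffer.
func (s *sender) sendSubmit(msg string) {
	s.queue <- msg
}

func main() {
	s := newSender(10) // buffer size taken from the commit message
	dropped := 0
	for i := 0; i < 12; i++ {
		if err := s.sendConsensus(fmt.Sprintf("consensus-%d", i)); err != nil {
			dropped++
		}
	}
	fmt.Println("queued:", len(s.queue), "dropped:", dropped)
}
```

In a real sender, a separate goroutine would drain `queue` and write to the network, which is what keeps the caller's send time short.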
    • [FAB-14041] Validate boot block is system channel block · fc87b4ff
      yacovm authored
      This change set makes the orderer validate that the bootstrap block
      it is spawned with (or created from configtx.yaml) contains
      a ConsortiumsConfig.
      Change-Id: I26abf8ac8719fbb472351b036137debc7a911665
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13934] Add GinkgoRecover to integration tests. · 9e9000a2
      Jay Guo authored
      This makes assertion failure more debuggable.
      Change-Id: I66f8ac8c9b755eaab37f89a10a39c3bfa44ef39a
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13059] Purge etcdraft WAL and Snapshot files · 942762e5
      Jay Guo authored
      When a snapshot is taken, stale etcdraft WAL files and old snapshot
      files should be purged to free disk space.
      However, we still keep several snapshot files around: if the latest
      file is corrupted, etcdraft automatically loads an older one, until
      there are none left.
      Change-Id: I2b8168dbc0c3e5bd56a081c104dd7dc9defbcd92
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
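The retention rule in FAB-13059 (purge old snapshot files but keep a few recent ones as fallbacks) might be sketched as follows. The function name, the `keep` parameter, and the example file names are assumptions for illustration; the sketch also assumes that file names sort from oldest to newest lexicographically.

```go
package main

import (
	"fmt"
	"sort"
)

// staleSnapshots returns the snapshot file names that may be purged while
// keeping the newest `keep` files. It assumes names sort by age
// lexicographically (hypothetical zero-padded term-index names below).
func staleSnapshots(names []string, keep int) []string {
	if keep < 0 {
		keep = 0
	}
	sorted := append([]string(nil), names...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	if len(sorted) <= keep {
		return nil
	}
	return sorted[:len(sorted)-keep]
}

func main() {
	snaps := []string{
		"0000000000000002-0000000000000030.snap",
		"0000000000000001-0000000000000010.snap",
		"0000000000000002-0000000000000020.snap",
		"0000000000000003-0000000000000040.snap",
	}
	// Keep the two newest files; the remaining ones are candidates for deletion.
	for _, name := range staleSnapshots(snaps, 2) {
		fmt.Println("purge:", name)
	}
}
```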
    • [FAB-13805] Unify Step and Submit into a stream · 38c1515c
      yacovm authored
      This change set removes Step RPC from the cluster protobuf,
      and renames Submit stream to a Step stream, and makes both
      transaction forwarding and consensus messages use the
      new Step stream.
      It also makes both egress Send() and Recv() have a maximum
      timeout (the RPC timeout in the config).
      A Send or Recv that is used to send a consensus message,
      or to send (receive) a transaction (status), will now abort early
      to protect against any liveness issue on the remote node,
      and also to return an answer to clients in a timely manner.
      Change-Id: Id942b248212f5c324e12af34fce48f96fdbb6aea
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13618] Fix test flake in OSN eviction test · 06671310
      yacovm authored
      The integration test that checks that an orderer is evicted from a channel
      and stops its service for the channel has broken due to:
      1) A log message that it used as an indicator was removed
         in a parallel CR.
      2) In another parallel CR, the communication layer now puts messages into
         the log asynchronously and doesn't block - and as a result -
         a node might be evicted from the channel but the other nodes will
         close the connection to it before it has a chance of obtaining the block
         that evicts it from the channel.
      For (1) - the message that no longer exists was removed from the test.
      For (2) - the node that is removed is now always the leader, and this way
                it always gets the block update (because it sends it in the first
      Change-Id: Ib67d1a448447ef44d9b41f52c8ee8bddb6b064ce
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-14010] Integration test- remove OSN from cluster · e1b2171d
      yacovm authored
      This change set adds an integration test that removes an OSN
      from an application channel and system channel and ensures
      that the OSN gracefully shuts down for these channels.
      Change-Id: Idcdad8083f5881c6194185ad5f623c9c64323a02
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13967] Polling, instead of waiting in AfterEach · 577301d9
      Jay Guo authored
      If something in AfterEach hangs until the test timeout, the
      coredump produced might cover up the actual assertion
      failure in the test body. This CR fixes this in etcdraft UT.
      This CR also adds LongEventualTimeout to some places in
      etcdraft chain_test to prevent flakes due to slow wal sync.
      Change-Id: I585e59e5eb587f9e9eb082c5eb8f681141b16e55
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • FAB-13264 consensus migration: kafka2raft green path #1 · 316db769
      Yoav Tock authored
      This is the first of four (1/4) sub-tasks that focus on the
      "green" path of consensus-type migration from Kafka to Raft. 
      By "green" we mean that there are no failures or aborts along
      the way. The flow of the green path and the changes made in
      these 4 tasks are described below. The 4 sub-tasks are staged
      in a way that minimizes dependencies between them.
      In this sub-task we introduce changes to the 
      orderer/common/bootstrap package (see details below).
      In essence, just before the last config block of the system
      channel (COMMIT) is written to the ledger, the bootstrap file
      (a.k.a. "genesis.block", not to be confused with the first block of
      the ledger) is swapped with the last block of the system
      channel. This sub-task extends package orderer/common/bootstrap
      to support this functionality.
      See respective JIRA item for further details.
      The "green" path for migration is the following:
      1. Start with a Kafka-based ordering service
      2. Send a config update tx (START-TX) on the system channel that:
       - Has ConsensusType.MigrationState=START
       - This will disable the creation of new channels
       - This will disable the processing of normal (standard channel) transactions
      3. Wait until the START-TX is committed and get the block height H of that tx
      4. Send a config update tx (CONTEXT-TX) on each of the standard channels that:
       - Has ConsensusType.MigrationState=CONTEXT
       - Has ConsensusType.MigrationContext=H
       - Has ConsensusType.Type="etcdraft"
       - Has ConsensusType.Metadata=<a marshaled etcdraft metadata: Consenters,
         Options, etc>
      5. Send a config update tx (COMMIT-TX) on the system channel that:
       - Has ConsensusType.MigrationState=COMMIT
       - Has ConsensusType.MigrationContext=H
       - Has ConsensusType.Type="etcdraft"
       - Has ConsensusType.Metadata=<a marshaled etcdraft metadata: Consenters,
         Options, etc>
       - The metadata should be the same as for the standard channels, with the same
       - If committed successfully, no further configuration will be possible
      6. Restart each orderer
       - The orderer will bootstrap into an etcdraft mode
       - Each channel will form a cluster
       - Normal transactions can resume now
      7. In order to configure the channels (system or standard), make sure that
         the first config update tx (on any given channel) after migration:
       - Has ConsensusType.MigrationState=NONE
       - Has ConsensusType.MigrationS=NONE
       - In addition to other changes to the channel's config.
      Change-Id: Iccd146bb7260bafa4e4d8c4ee457d2ac19f5a642
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
    • [FAB-13455] Initialize BlockPuller on demand. · e4060ed3
      Jay Guo authored
      The creation of BlockPuller takes latest certificates, therefore
      should be done on-demand to guarantee its validity.
      Change-Id: I327275da495a85126feb58c84b460bed98f7b860
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-11863] Clean orderer network failure logs · 082a9102
      Jay Guo authored
      Every failed attempt to send a Step request is logged at ERROR level,
      which pollutes the etcdraft orderer leader's logs when a follower is down.
      This CR changes it to log at DEBUG level, except for the first failed
      attempt and the first successful delivery after failure(s).
      Test done: manually ran the cft integration test and inspected the logs.
      Change-Id: I1dd3468de1f6745f658c15e83a5e644e0b0492d6
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13059] put raft snapshotting in go routine · 2ad9d9da
      Jay Guo authored
      This CR puts Raft snapshotting into a goroutine to avoid
      excessive snapshotting due to an extremely small SnapshotInterval.
      This is also preparation for WAL file pruning.
      Change-Id: Ib43a2197c533bdc224a4bc52ff6cb418b62a0c33
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13199] Start etcdraft chain sequentially in UT · 8c445197
      Jay Guo authored
      This CR hopefully makes the occasional UT timeout more debuggable.
      It also fixes a goroutine leak in UT.
      Change-Id: Ia0ac63b2394061dd13a570d71eae6b4139fb73b0
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12709] Add integration test for CheckQuorum · bd6bd0ec
      Jay Guo authored
      Change-Id: Ie80ff2f11de59a216a94fc61330f9d625ed16e59
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12709] Enable CheckQuorum · 50a09fd0
      Jay Guo authored
      When CheckQuorum is enabled, the leader steps down if it cannot reach
      a quorum of the network, so that clients have a chance to disconnect
      and try other nodes.
      Change-Id: I901c0e3009f9d354a2b504fe16174432345055b3
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12709] Use another way to elect leader in UT · 5c3e2fce
      Jay Guo authored
      In etcdraft UT, we often need to deterministically elect a leader.
      This was done by ticking ONLY one node in the network, so that it is
      the only node that starts a campaign.
      HOWEVER, there are several problems with this approach:
      1. It is slow. We need a real time interval between ticks due to the
         way the fake clock is implemented: it drops ticks on the floor in
         case of a slow consumer.
      2. There is a random factor in the election timeout of etcd/raft. It is
         calculated as follows:
            randomElectionTimeout = electionTimeout + rand.Intn(electionTimeout)
         In other words, if we send electionTimeout ticks, it is not
         guaranteed to trigger a leader election.
      3. If CheckQuorum is enabled, a lease is imposed on follower nodes,
         which expires if
            electionTimeout <= elapsedTicks < randomElectionTimeout
         (if it is greater than randomElectionTimeout, it is reset to 0 and
         the node starts a campaign).
      In this CR, we send an artificial MsgTimeoutNow to the node to be
      elected. This message reliably triggers a campaign and skips the lease
      This CR also fixes several potential data races and flakes in tests.
      Change-Id: I3c8e0bcadbb8cfa1ae3393de2ea711fdd0d8b7aa
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13848] Fix flaky integration test in raft cft · 20dc27fc
      Jay Guo authored
      Test may query chaincode too fast after invocation, before block
      is actually committed.
      Change-Id: I4159fb2dfb31310eccfd64fcb9a9a99ceef54db0
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13845] Increase default raft tick interval · ebd9127c
      Jay Guo authored
      Increase default etcdraft tick interval to 500ms, for several reasons:
      - in a WAN/Cloud environment, this is more realistic
      - WAL sync in CI often exceeds 1s, which prevents heartbeats from being
        sent in time. Increasing the election timeout can decrease the chance of
        unexpected leader failover.
      This CR also increases default timeout of peer cmd, because now
      it takes 5~10s to elect a leader for a newly created channel, and
      `peer channel create` can only retrieve genesis block of that channel
      when leader exists (Deliver API returns error if leaderless).
      Note that this value is still configurable by users.
      Change-Id: I94fbbc750fa096cce6ef9e2d65eb981c6202b675
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • FAB-13705 refine Bundle.validateNew · 2979b8cc
      Yoav Tock authored
      This task addresses two issues:
      1) In common/multichannel/Bundle.ValidateNew() it is possible
         to identify the system channel using:
             _, isSys := b.ConsortiumsConfig()
         this can be used to refine validateMigrationStep() such that
         it deals more accurately with the migration-state transitions
         on the system vs. standard channels.
      2) In addition, prevent user from adding ConsortiumsConfig() to
         standard channels. This will protect multichannel.Registrar
         from blowing up on next initialization. Explanation:
         - Looking at the code in multichannel.Registrar, we see that
             _, ok := ledgerResources.ConsortiumsConfig()
         is used to identify the system channel. If two system channels
         are identified, the code panics.
         - Now, in Bundle.ValidateNew(), there is currently no mechanism
         to prevent a user (orderer admin) from adding a ConsortiumsConfig()
         to a standard channel. If a user does that, multichannel.Registrar
         will blow up in the next initialization.
      Change-Id: Ia7551cbd27389a9988757af0224abdc0d1bfef5b
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
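The second check described in FAB-13705 (reject a config update that adds a ConsortiumsConfig to a standard channel) might look roughly like the sketch below. The `bundle` struct is a hypothetical, stripped-down stand-in; only the `_, ok := ...ConsortiumsConfig()` idiom comes from the commit message, and the real Fabric types and return values differ.

```go
package main

import (
	"errors"
	"fmt"
)

// bundle is a hypothetical stand-in for a channel config bundle.
type bundle struct {
	consortiums map[string]struct{} // nil on standard channels
}

// ConsortiumsConfig mirrors the (value, ok) shape used in the commit message.
func (b *bundle) ConsortiumsConfig() (map[string]struct{}, bool) {
	return b.consortiums, b.consortiums != nil
}

// errConsortiumsAdded is a hypothetical error for the rejected update.
var errConsortiumsAdded = errors.New("cannot add ConsortiumsConfig to a standard channel")

// validateNew rejects a new config that would make a standard channel look
// like a system channel.
func validateNew(current, next *bundle) error {
	_, isSys := current.ConsortiumsConfig()
	_, nextHas := next.ConsortiumsConfig()
	if !isSys && nextHas {
		return errConsortiumsAdded
	}
	return nil
}

func main() {
	standard := &bundle{}
	system := &bundle{consortiums: map[string]struct{}{"SampleConsortium": {}}}

	fmt.Println(validateNew(standard, system))   // rejected
	fmt.Println(validateNew(system, system))     // allowed
	fmt.Println(validateNew(standard, standard)) // allowed
}
```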
    • FAB-13704 Update doc of ConsensusType proto · 5af9c275
      Yoav Tock authored
      Update documentation of ConsensusType in protos/orderer/configuration.proto
      to reflect implementation.
       - spell out permitted type strings: "solo" / "kafka" / "etcdraft"
       - update migration_state for which messages are permitted on system / standard channels
       - update migration_context for what is required on each migration_state
       - make protos
      Change-Id: Ia27d9cd162fe6656fd2bd56ceaf5aae8a6fe5222
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
    • [FAB-13808] Address code review comments for FAB-13363 · 6ebade28
      yacovm authored
      This change set addresses code review comments for FAB-13363.
      - Deletion of the redundant logger instance in the server main.go
      Change-Id: I0ee0db21894a352c7d1679efc27524162723a895
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13363] Block verification for onboarding · b6dc844a
      yacovm authored
      This change set connects the block verification infrastructure
      for onboarding to the production code.
      Now, whenever an orderer onboards a channel - it also verifies the blocks
      of the application channels, by:
      1) Creating a bundle from the genesis block, which is derived from
         the system channel (which is verified using backward hash chain validation).
      2) Verifying blocks using the bundle.
      3) Replacing the bundle with a new bundle whenever a config block is pulled.
      It also adds a check in the integration test, that ensures that no errors
      are reported in the log of the onboarded OSN.
      Change-Id: I3c5714f9d4491cdfd78e4e47407925136906d413
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13178] Move `SendSubmit` out of serveRequest · 100e1ad7
      Jay Guo authored
      When the gRPC buffer of the `Submit` stream is full, `SendSubmit` would
      block, which freezes the `serveRequest` goroutine. This CR moves
      the call out of that goroutine, so that clients block while waiting
      for room in the buffer.
      Change-Id: I62cd261b9419bd8df3fa1bfaeff14551168d2e65
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13716] Block verifier book-keeping for onboarding · e0e3ddbb
      yacovm authored
      This change set adds the following supporting structs for adding
      support for verifying blocks pulled by onboarding in future CRs:
      - Ledger interceptor: intercepts a commit of a block, and invokes
        a callback.
      - VerificationRegistry: tracks commit of config blocks, and builds
        channelconfig bundles from them, in order to support verification
        of blocks pulled.
      - BlockVerifierAssembler and BlockValidationPolicyVerifier: together
        they build block verifiers out of config blocks.
      - verifierLoader: Loads a mapping of chainID->cluster.BlockVerifier,
        which is to be used at OSN startup to preload the existing verifiers.
        It is needed in cases we recover from a crash, or if we do
        dynamic onboarding and the previous config blocks have been committed
        to the ledger before the OSN was started.
      In the next CR, I will wire all these into the onboarding infrastructure
      itself, and they will be used to hold the latest bundle per channel
      in order to verify block signatures.
      Change-Id: Ic9fc99243baa5c2cef97103d001180207414d98a
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13178] Use MaxInflightMsgs to throttle requests · 9b78a9d8
      Jay Guo authored
      If there are MaxInflightMsgs blocks proposed but not yet
      committed, the chain blocks further incoming requests.
      Change-Id: I58c84e23c882ccc152e5c9a248434e466a8b5266
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
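The throttling idea in FAB-13178 (stop accepting requests once MaxInflightMsgs blocks are proposed but uncommitted) can be sketched with a buffered channel used as a counting semaphore. The `inflight` type and its methods are hypothetical and are not taken from the etcdraft chain code; the limit of 2 is an arbitrary illustrative value.

```go
package main

import "fmt"

// inflight is a hypothetical counting semaphore for proposed-but-uncommitted blocks.
type inflight struct {
	slots chan struct{}
}

func newInflight(max int) *inflight {
	return &inflight{slots: make(chan struct{}, max)}
}

// propose blocks while the maximum number of blocks is already in flight.
func (f *inflight) propose() {
	f.slots <- struct{}{}
}

// tryPropose is a non-blocking variant that reports whether a slot was free.
func (f *inflight) tryPropose() bool {
	select {
	case f.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// commit frees one slot; it is a no-op if nothing is in flight.
func (f *inflight) commit() {
	select {
	case <-f.slots:
	default:
	}
}

func main() {
	f := newInflight(2) // stands in for MaxInflightMsgs
	f.propose()
	f.propose()
	fmt.Println("accepted third proposal:", f.tryPropose())
	f.commit()
	fmt.Println("accepted after a commit:", f.tryPropose())
}
```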
    • [FAB-13438] Errored should reflect correct state · 0276480c
      Jay Guo authored
      This CR changes Errored to return a channel that is
      closed when node becomes candidate.
      Change-Id: Ibd0ece763b9d93c4da93825d1b302ecc55a9b32e
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13438] Store raft SoftState · 21a49bad
      Jay Guo authored
      Store raft SoftState in raft chain so it returns error
      while election is ongoing. This prevents a disconnected
      follower from returning success on Broadcast API.
      Change-Id: Ib6619b230938f0d6c10240b8cd8e34e346056145
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13438] pass SoftState on observe channel · 657b8095
      Jay Guo authored
      This CR changes type of etcdraft observe channel from uint64
      to raft.SoftState, so that chain_test can assert not only leader
      id, but also the state of node.
      Change-Id: Ia0c5f8c9060c234ceb84133e0c5598ed064dd1ee
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13613] Fix race in etcdraft chain UT · 5dadb3a5
      Jay Guo authored
      Add a lock to guard manipulation of `StepStub`.
      Change-Id: Icaadb1f5aea0cb7f266f24ed6756c4f6541768bd
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13447] new leader should wait for in flight msg · 0d247c1d
      Jay Guo authored
      Newly elected raft leader should wait for in flight blocks
      to be committed, before accepting new envelopes and creating
      new blocks. Otherwise all those blocks created would be uncle
      blocks and we don't permit this situation in Fabric.
      Change-Id: Ia5adac185263735eace1fc805ebea0f5c98b2fb1
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
  2. 02 Mar, 2019 1 commit
  3. 01 Mar, 2019 7 commits
    • Jason Yellick's avatar
    • Merge changes I67612917,I226ee34d,If5e785e0,Ibefe74a3,I4e32cb15, ... into release-1.4 · 160a228c
      Yacov Manevich authored
      * changes:
        [FAB-13447] Streamline the code
        [FAB-13178] A dumb version of etcdraft BlockCreator
        [FAB-13178] Remove global leader var in etcdraft chain
        [FAB-13178] Move raft logic to its own file
        [FAB-13178] do not accept new env when conf in flight
        [FAB-13178] Refactor etcdraft chain to avoid sync
        [FAB-13694] Move LastConfigBlock to orderer common
        [FAB-13698] disable flaky test TestReconnect
        [FAB-13643] Leader crash and failover integration test
        FAB-13265 migration status in channelconfig
        FAB-12984 consensus migration protos
        [FAB-13633] Make Step RPC failures non blocking
        [FAB-13178] Simplify the proposition of config block
        [FAB-11996] Fix failed UT
        [FAB-13481] Make onboarding code more idiomatic
        [FAB-13495] Activate onboarding max retries
        FAB-12983 capability V2_0 for Kafka2RaftMigration
    • Merge changes I12f42470,I3c2a84e6,I9fe663c4,Ib6acf6fd,I3331f2ab, ... into release-1.4 · c4c0ce0c
      Yacov Manevich authored
      * changes:
        [FAB-13465] Max retry attempts for orderer replication
        [FAB-13180] Orderer: auto-join existing inactive chains
        [FAB-13456] Fix race in etcdraft test
        [FAB-13456] Use empty peer list to join raft cluster
        [FAB-13444] Prepare onboarding to multi-time use
        [FAB-13362] Pulling not servicing chains in onboarding
        [FAB-13441] Properly capture OSN output
        [FAB-13428] Make TestReplicateChainsFailures robust
        [FAB-13427] Make replication tests not depend on time
        [FAB-13360] Fix an etcdraft flaky UT
        [FAB-13415] DRY up UpdateConsensusMetadata in nwo
        [FAB-13367] Fix flaky etcdraft UT
        [FAB-1337] Raft: Commit genesis blocks for non-members
        [FAB-13208] Raft Reconfig&Onboarding integration test
        [FAB-13333] Orderer config update to use orderer creds
    • Merge changes I3aa68e4b,Idf10bff7,I5db2adbd,If1ce27b2,Ica00d5e6, ... into release-1.4 · 4445fa12
      Yacov Manevich authored
      * changes:
        [FAB-13331] Refactor metadata updates in nwo
        [FAB-13298] Fix test flake on MacOS
        [FAB-13332] Add cryptogen extend to integration tests
        [FAB-13334] Onboarding: Allow empty channels
        [FAB-13330] Rename GetConfigBlock to GetConfig in nwo
        [FAB-13349] Add more assertion to etcdraft UT.
        [FAB-13095] fix UT flake RPC timeout
        [FAB-13350] Fix etcdraft flaky test
        [FAB-13298] Fix TestConfigureClusterListener in MacOS
        [FAB-13299] Onboarding: Skip committing existing blocks
        [FAB-12579] Separate TLS listener for intra-cluster
        [FAB-13262] typo in configblock.go
        [FAB-13053] Add an UT to assert retransmission.
        [FAB-12949] Fix etcdraft reconfiguration UT
        [FAB-12729] Support subset of system channel OSNs
        [FAB-13150] Re-enable etcdraft for v2.0 development
        [FAB-13225] address code review comments
        [FAB-13057] Remove applied index check in storage
        [FAB-13199] Reduce etcdraft test time.
        [FAB-12949] finish reconfiguration after restart
    • Yacov Manevich's avatar
    • Kostas Christidis's avatar
    • Kostas Christidis's avatar
  4. 28 Feb, 2019 1 commit
    • Allow statically configured root CAs for TLS · 86f1c990
      Gari Singh authored
      When peers communicate with other peers or
      orderers, the list of trusted CAs for TLS
      communication is derived from the channel
      configs. For the peer, the list is the
      aggregate of all roots across all channels.
      For the orderer, the list is per channel.
      This CR adds the option to specify a static
      list of CAs via peer.tls.serverRootCAs.files
      in core.yaml and a flag
      peer.deliveryclient.staticRootsEnabled for the
      deliveryclient to use.
      Note: the properties are intentionally not
      being added to the sample config because they
      should not be used in most situations.
      Fixes FAB-14420
      Change-Id: Ic381dc99bbb6dc5f7ceafd93738b34c5e24fe60c
      Signed-off-by: Gari Singh <gari.r.singh@gmail.com>