1. 27 Feb, 2019 40 commits
    • FAB-13265 migration status in channelconfig · 7c384c52
      Yoav Tock authored
      Change the channelconfig of an Orderer to reflect the extension of ConsensusType:
      - MigrationState
      - MigrationContext
      Add a method to the bundle to validate the migration steps of
      a new versus an old config.
      Add test cases to bundle_test.go to unit-test this method.
      Improve the wording of comments.
      Regenerate and update the mocks in the 'common' and 'blockcutter'
      packages so that the unit tests build correctly.
      Change-Id: If060c05bcb9a0e0ca81b1f754a2b0e69a7f6c896
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
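The commit adds a bundle method that validates the migration steps of a new config against the old one. The following is a minimal, stdlib-only sketch of that kind of old-versus-new check. The state names, the `allowedNext` transition table, the non-zero-context rule and `validateMigrationStep` are all hypothetical illustrations, not Fabric's actual migration rules or API.

```go
package main

import "fmt"

// MigrationState is a hypothetical stand-in for the per-channel migration
// state that this commit adds to the orderer's ConsensusType config.
type MigrationState int

const (
	MigNone MigrationState = iota
	MigStart
	MigCommit
	MigAbort
)

// allowedNext lists the transitions this sketch accepts. These rules are an
// assumption for illustration, not Fabric's actual migration rules.
var allowedNext = map[MigrationState][]MigrationState{
	MigNone:   {MigNone, MigStart},
	MigStart:  {MigStart, MigCommit, MigAbort},
	MigCommit: {MigCommit},
	MigAbort:  {MigAbort, MigNone},
}

// validateMigrationStep compares the old and new config values and rejects
// transitions that are not listed in allowedNext.
func validateMigrationStep(oldState, newState MigrationState, newContext uint64) error {
	legal := false
	for _, s := range allowedNext[oldState] {
		if s == newState {
			legal = true
			break
		}
	}
	if !legal {
		return fmt.Errorf("illegal migration transition %d -> %d", oldState, newState)
	}
	// Assumed rule: starting a migration requires a non-zero context, which
	// later correlates the standard-channel config updates with this one.
	if oldState != MigStart && newState == MigStart && newContext == 0 {
		return fmt.Errorf("starting a migration requires a non-zero context")
	}
	return nil
}

func main() {
	fmt.Println(validateMigrationStep(MigNone, MigStart, 42)) // <nil>
	fmt.Println(validateMigrationStep(MigNone, MigCommit, 0)) // illegal migration transition 0 -> 2
}
```

Keeping the rules in a table makes the legal transitions easy to review and to cover with table-driven unit tests, in the spirit of the bundle_test.go cases the commit mentions.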
    • FAB-12984 consensus migration protos · aa35c9f8
      Yoav Tock authored
      Allow a Kafka-based orderer to receive and process config transactions
      that change the consensus type (broadcast phase).
      In the orderer configuration protobuf definition,
      extend ConsensusType to include:
      - enum "Migration State": to command and record the state of
        the migration per channel
      - uint64 "Migration Context": to correlate the system-channel
        config-update-tx with the subsequent standard-channel config-update-tx(s)
      Change-Id: I121496499b3e4b6355a43843b49d3e039a65a987
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
    • [FAB-13633] Make Step RPC failures non blocking · fba0b4e2
      yacovm authored
      Per the gRPC documentation:
      If an RPC is issued but the channel is in TRANSIENT_FAILURE or
      SHUTDOWN states, the RPC is unable to be transmitted promptly.
      By default, gRPC implementations SHOULD fail such RPCs immediately.
      This is known as "fail fast," but usage of the term is historical.
      RPCs SHOULD NOT fail as a result of the channel being in other states.
      Therefore, if gRPC takes a long time to move from the CONNECTING
      state to TRANSIENT_FAILURE (e.g., due to packet drops or a DNS
      lookup failure), it slows down the entire Raft FSM.
      This change set makes Step RPCs inspect the state of the underlying
      gRPC connection before being invoked.
      If the connection is in the CONNECTING state, the RPC fails fast.
      Change-Id: I50df1f758a00fc99bed54ff1a2056f83f53efdf7
      Signed-off-by: yacovm <yacovm@il.ibm.com>
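A minimal sketch of the pre-send check, assuming a connection abstraction that reports a gRPC-style state. In real code the state would come from `(*grpc.ClientConn).GetState()`, which returns a `connectivity.State` from `google.golang.org/grpc/connectivity`; the local `connState` enum, the `stateReporter` interface, `step` and `fakeConn` below are hypothetical stand-ins so the example runs without gRPC.

```go
package main

import (
	"errors"
	"fmt"
)

// connState mirrors gRPC's connectivity states. It is a local stand-in so
// the sketch runs without google.golang.org/grpc.
type connState int

const (
	Idle connState = iota
	Connecting
	Ready
	TransientFailure
	Shutdown
)

// stateReporter is a hypothetical abstraction over a client connection.
type stateReporter interface {
	State() connState
}

var errConnecting = errors.New("connection is CONNECTING, failing fast")

// step checks the connection state before sending, so a peer stuck in
// CONNECTING cannot block the caller (for example, the Raft FSM).
func step(conn stateReporter, send func() error) error {
	if conn.State() == Connecting {
		return errConnecting
	}
	return send()
}

// fakeConn reports a fixed state; it exists only for the demo and tests.
type fakeConn connState

func (f fakeConn) State() connState { return connState(f) }

func main() {
	send := func() error { return nil }
	fmt.Println(step(fakeConn(Ready), send))      // <nil>
	fmt.Println(step(fakeConn(Connecting), send)) // connection is CONNECTING, failing fast
}
```

Only CONNECTING is checked, matching the commit message: per the quoted documentation, gRPC already fails RPCs fast in TRANSIENT_FAILURE and SHUTDOWN.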
    • [FAB-13178] Simplify the proposition of config block · fb6ffe8f
      Jay Guo authored
      Change-Id: Ie755722f7db1df5efca5dea43d9912fdf36b6d25
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-11996] Fix failed UT · d12f1e49
      Jay Guo authored
      The UT fixed in this CR submits a malformed config envelope,
      which crashes the test. It used to pass only because it never
      waited for the block to be committed and shut down the test early.
      Change-Id: I40311ccbf03d8ffb73d4467e5695a28b4834d61e
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13481] Make onboarding code more idiomatic · 532b5382
      yacovm authored
      This change set addresses code review comments from
      https://gerrit.hyperledger.org/r/#/c/28391/ and from
      in an attempt to make the orderer code more idiomatic.
      Change-Id: I04ac7bc21ee8fc1ccda4e76d8afa53fe527f7f5e
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13495] Activate onboarding max retries · 235bb3ac
      yacovm authored
      This change set enables the max retry logic for onboarding:
      - Adds a new configuration parameter to orderer.yaml.
      - Adds the appropriate configuration to the production code.
      - Adds a cross-package unit test that simulates the scenario
        the retry logic was designed for: an application channel
        is listed in the system channel, but every attempt to pull it
        fails until the retry count is exhausted.
        We nevertheless commit the genesis block for that channel
        and proceed with the replication.
      Change-Id: I28204f3c1ec0f99dd4d510ed7c9f4ae94759cba2
      Signed-off-by: yacovm <yacovm@il.ibm.com>
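The retry-then-continue behavior described above can be sketched as follows. `pullChain`, its `maxRetries` parameter and the `commitGenesis` callback are hypothetical names for illustration; they are not the orderer's actual API, and `maxRetries` is not the name of the new orderer.yaml key.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnavailable = errors.New("service unavailable")

// pullChain tries to pull a channel at most maxRetries times. If every
// attempt fails, it still commits the channel's genesis block and reports
// false, so replication of the other channels can proceed.
func pullChain(maxRetries int, pull func() error, commitGenesis func()) bool {
	for attempt := 1; attempt <= maxRetries; attempt++ {
		err := pull()
		if err == nil {
			return true
		}
		fmt.Printf("attempt %d/%d failed: %v\n", attempt, maxRetries, err)
	}
	commitGenesis()
	return false
}

func main() {
	attempts, genesisCommitted := 0, false
	alwaysFail := func() error { attempts++; return errUnavailable }
	ok := pullChain(3, alwaysFail, func() { genesisCommitted = true })
	fmt.Println(ok, attempts, genesisCommitted) // false 3 true
}
```

Bounding the retries keeps one unavailable application channel from stalling onboarding of all the others.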
    • FAB-12983 capability V2_0 for Kafka2RaftMigration · 3984ca15
      Yoav Tock authored
      Add orderer capability V2_0 for Kafka2RaftMigration.
      This capability defines whether the orderer supports a Kafka-to-Raft
      migration. A Kafka-based Ordering Service Node requires
      this in order to receive and process a config update with
      consensus-type migration commands. Migration is supported from
      Kafka to Raft only. If the capability is not present, these config
      updates will be rejected.
      Change-Id: I3b56dec21f0893d0b1df5db30973b4762aab5575
      Signed-off-by: Yoav Tock <tock@il.ibm.com>
    • [FAB-13465] Max retry attempts for orderer replication · 0da0ecee
      yacovm authored
      This change set adds an option to configure the block puller
      used for replication with a maximum number of retry attempts.
      It is needed because during onboarding, a specific application channel
      might become unavailable, and now that we have dynamic periodic onboarding
      for channels we were unable to join, it shouldn't block onboarding.
      Change-Id: I12f4247040c258809885f0e5fdc07d60914a56e2
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13180] Orderer: auto-join existing inactive chains · 4bc13c8e
      yacovm authored
      This change set makes cluster-type OSNs autonomously detect channels
      that exist and that they should be part of (the channel configuration
      lists their public credentials as a consenter for the channel),
      but for which they neither run chains nor have the blocks in their ledger.
      This can happen for several reasons:
      - The OSN is added to an existing chain, and since it hasn't participated
        in the chain so far, it didn't receive the blocks that tell it that it
        is now part of the channel.
      - The OSN tried to detect whether it is part of a channel, but it
        wasn't able to, because all OSNs of the system channel returned
        service-unavailable. This can happen if:
        - a leader election takes place
        - the network is acting up so the leadership was lost
        - a channel has been deserted (all OSNs left it).
      To take care of such use cases, all OSNs now:
      - Track inactive chains that they know of but do not participate in
      - Periodically(*) probe the system channel OSNs to see if they are now
        part of these chains or not.
      - If so, then they replicate the chains, and create instances of them,
        and replace the instances of the inactive chains in the registrar
        with the new instances of type etcdraft.
      (*) 10 seconds after boot, then after 20 seconds,
          then after 40 seconds, and so on, eventually every 5 minutes.
      Change-Id: I3c2a84e6f4f402e011e7a895345b3d3982247083
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
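The probing schedule in the footnote (10 seconds, then 20, then 40, eventually every 5 minutes) is an exponential backoff with a cap. A minimal sketch that computes the waits between probes; `probeIntervals`, `firstProbe` and `maxProbe` are hypothetical names, not the orderer's code.

```go
package main

import (
	"fmt"
	"time"
)

const (
	firstProbe = 10 * time.Second
	maxProbe   = 5 * time.Minute
)

// probeIntervals returns the first n waits between probes: start at
// initial, double after every probe, and never exceed maxWait.
func probeIntervals(initial, maxWait time.Duration, n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	d := initial
	for i := 0; i < n; i++ {
		if d > maxWait {
			d = maxWait
		}
		out = append(out, d)
		d *= 2
	}
	return out
}

func main() {
	fmt.Println(probeIntervals(firstProbe, maxProbe, 7))
	// [10s 20s 40s 1m20s 2m40s 5m0s 5m0s]
}
```

Doubling from 10 seconds passes the 5-minute cap after the fifth probe (160s doubled is 320s), so every later wait is clamped to 5 minutes.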
    • [FAB-13456] Fix race in etcdraft test · e8514271
      Jay Guo authored
      Change-Id: I9fe663c4efa46e5644571d238f4d7ea8f4e51626
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13456] Use empty peer list to join raft cluster · 2755580a
      Jay Guo authored
      When a fresh node joins an existing etcdraft cluster, it
      should call `StartNode` with an empty peer list.
      Change-Id: Ib6acf6fd9b2956680c99d5d7370ce439228d3bfa
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13444] Prepare onboarding to multi-time use · 48bd0ee9
      yacovm authored
      This change set refactors the onboarding infrastructure so that it
      can be used multiple times, in contrast to the current logic, which
      can only be used once.
      Namely, it re-uses the current logic and introduces a new method
      to the replicationInitiator:
      replicateChains(lastConfigBlock *common.Block, chains []string)
      which forces replication of the given chains, using the given
      last config block.
      The chains slice is passed to the cluster.Replicator as a filter
      that prevents pulling chains that aren't in the chains slice.
      Change-Id: I3331f2abb6a2879876644b2f5ef4ee48c4eb43fa
      Signed-off-by: yacovm <yacovm@il.ibm.com>
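The chains slice acts as a filter on which channels the replicator pulls. A minimal sketch of such a filter; `chainFilter` and `chainsToPull` are hypothetical helpers and do not reflect the actual fields or methods of cluster.Replicator.

```go
package main

import "fmt"

// chainFilter builds a predicate that admits only the given chain names.
func chainFilter(chains []string) func(string) bool {
	allowed := make(map[string]struct{}, len(chains))
	for _, c := range chains {
		allowed[c] = struct{}{}
	}
	return func(name string) bool {
		_, ok := allowed[name]
		return ok
	}
}

// chainsToPull applies the filter to the chains listed in the system
// channel, preserving their order.
func chainsToPull(listed []string, filter func(string) bool) []string {
	var out []string
	for _, name := range listed {
		if filter(name) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	listed := []string{"mychannel", "otherchannel", "thirdchannel"}
	fmt.Println(chainsToPull(listed, chainFilter([]string{"thirdchannel", "mychannel"})))
	// [mychannel thirdchannel]
}
```

Building a set once makes each membership check constant-time, and iterating over the listed chains keeps the replication order independent of the order of the filter slice.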
    • [FAB-13362] Pulling not servicing chains in onboarding · 12948f81
      yacovm authored
      An orderer might not have permission to probe whether it belongs
      to a certain application channel.
      In addition, since the OSNs of an application channel might be a subset
      of the system channel OSNs, they may be unreachable at the time of
      onboarding, so all we get from the other OSNs is "service unavailable".
      This change set addresses this as follows: if we try to pull blocks
      in order to see whether we belong to the channel (by pulling the latest block)
      and we get only bad responses from all OSNs, such as unauthorized or
      unavailable, we don't panic. Instead, we just skip pulling the chain.
      If some orderer returns unauthorized, and the rest either return nothing
      or return bad request, service unavailable, etc., we report
      that we are unauthorized.
      If some orderer returns service unavailable, and the rest return
      anything that is not unauthorized, we classify it as service unavailable.
      If no orderer returns unauthorized or unavailable,
      and all orderers return something bad or nothing at all,
      we still panic as before, because it means we probably misconfigured the
      node, or we are in a network partition, so we don't want to
      skip pulling blocks.
      This change set also enhances the reconfiguration integration test
      to include a third channel for which the onboarded OSN is not authorized.
      Change-Id: I6f9b0cfe3671794ef1c036b432e77e2ac55b1efd
      Signed-off-by: yacovm <yacovm@il.ibm.com>
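The classification rules above have a clear precedence: any unauthorized answer wins, then any service-unavailable answer, and otherwise the node panics. A minimal sketch of that decision; the `response` and `outcome` types and `classify` are hypothetical, the sketch returns a `fatal` outcome instead of panicking so it can be tested, and it assumes that any successful answer means the chain is pulled, which the commit message does not spell out.

```go
package main

import "fmt"

// response is a hypothetical summary of what one OSN answered.
type response int

const (
	noResponse response = iota
	success
	badRequest
	unauthorized
	unavailable
)

// outcome is what onboarding does with the channel.
type outcome string

const (
	pull             outcome = "pull"
	skipUnauthorized outcome = "skip: unauthorized"
	skipUnavailable  outcome = "skip: service unavailable"
	fatal            outcome = "panic: misconfiguration or network partition"
)

// classify applies the precedence described in the commit message.
func classify(responses []response) outcome {
	sawUnauthorized, sawUnavailable := false, false
	for _, r := range responses {
		switch r {
		case success:
			return pull
		case unauthorized:
			sawUnauthorized = true
		case unavailable:
			sawUnavailable = true
		}
	}
	switch {
	case sawUnauthorized:
		return skipUnauthorized
	case sawUnavailable:
		return skipUnavailable
	default:
		// The real code panics here; the sketch returns a value instead.
		return fatal
	}
}

func main() {
	fmt.Println(classify([]response{unauthorized, unavailable, noResponse}))
	fmt.Println(classify([]response{unavailable, badRequest}))
	fmt.Println(classify([]response{badRequest, noResponse}))
}
```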
    • [FAB-13441] Properly capture OSN output · c73870ff
      yacovm authored
      The reconfiguration and onboarding integration tests ensure
      that the OSNs stop logging errors by the end of the test,
      to make sure that no faults caused by reconfiguration or
      onboarding went unnoticed.
      The function used the wrong method to obtain the buffer
      from which the process's output is read.
      Change-Id: Ieadae1bb083454b195cbfe52b41582dc9dbbf80a
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13428] Make TestReplicateChainsFailures robust · c68313e0
      yacovm authored
      The test sent a redundant block, which occasionally got the client
      and the mock server out of sync.
      After the fix, the test was run 2,000 times without failing.
      Change-Id: I09631632a16d3ee42fc51fbb809f3027e50a0973
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13427] Make replication tests not depend on time · f7d59804
      yacovm authored
      Sometimes the machine running the test stalls for too long,
      which makes the client side time out, so the server-side mock
      gets out of sync with the tested client side (the replication code).
      The fix is to specify a very large timeout (1 hour), and in cases
      where we expect the server not to respond, to make it send an EOF
      downstream instead of letting it time out, which indicates a
      failure to the client.
      Change-Id: I06ed5bd4a645ae8ace90542fe56d19254bfa42b7
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13360] Fix an etcdraft flaky UT · 45d6f1b9
      Jay Guo authored
      This CR removes a redundant assertion in an etcdraft UT
      to avoid flakiness.
      The assertion being removed checks that the expected number
      of MsgApp messages are dropped while the node is disconnected from
      the network. However, when a raft candidate is elected
      leader, it broadcasts a MsgApp containing empty data to
      followers, and this is sometimes counted as one of the
      dropped MsgApps, which breaks the expectation.
      We could be more precise by inspecting MsgApps on the wire
      and ignoring empty messages, so that we could still perform this
      assertion. But it is a redundant check anyway, and therefore
      can be safely removed.
      This CR also further reduces the total test time.
      Change-Id: I74af9fecfebe20e44c6736a644352f8b67b624e3
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13415] DRY up UpdateConsensusMetadata in nwo · 86cb2d8c
      yacovm authored
      This change set makes AddConsenter and RemoveConsenter use
      a consensus-specific method, UpdateEtcdRaftMetadata, instead
      of the generic UpdateConsensusMetadata, to remove
      code duplication.
      It also addresses a few nits in etcdraft_reconfig_test.
      Change-Id: I86d50fd80d4985df77474c054ce916f0d2fb62e7
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13367] Fix flaky etcdraft UT · bebfee73
      Jay Guo authored
      In raft, to commit a message, the leader first broadcasts
      MsgApp to replicate the data, and once it receives MsgAppResp
      from a quorum, it
      - applies the message to the state machine
      - broadcasts another MsgApp to instruct followers to commit
      and these are done in parallel.
      To emulate a situation where a node is disconnected *after* the
      config block is committed and *before* the node add/remove in
      that config block is proposed to raft, the `WriteConfigBlock`
      stub on the leader was overloaded to disconnect the node from the
      network, so that the ConfChange is proposed but dropped.
      However, this becomes racy when we want to assert that the config
      block is committed on followers as well, since we might
      disconnect the leader too fast and cause the second round of
      MsgApp to be dropped, so that the config block is not committed
      on followers. This is fine in a real deployment because the new
      leader will continue the effort to commit this block.
      To avoid this flakiness, this CR changes the UT to overload
      `StepStub` instead of `WriteConfigBlockStub`, so that we can
      be more precise about when to disconnect a node from the network.
      Change-Id: Ic1b7d28c043e779c7cc258c2e08bfaa3578bc429
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-1337] Raft: Commit genesis blocks for non-members · c9996d64
      yacovm authored
      Currently (before this CR), an onboarded OSN doesn't pull any blocks
      for channels it doesn't participate in.
      As a result, when the OSN starts up after onboarding, it doesn't have these
      channels in its registrar, and therefore may classify channel
      creation transactions differently than its fellow OSNs that do have
      the channels it didn't pull.
      In order to avoid a state fork, this change set makes the OSN
      commit the genesis block for channels it doesn't participate in.
      This is *NOT* done by pulling the genesis blocks, since the OSN may not have
      permissions to do that in the first place; instead, it creates
      a genesis block from the system channel block that contains the channel
      creation transaction.
      This change set also adjusts the integration test for onboarding to
      the changes, namely, it ensures the OSN committed the genesis
      block for a channel it doesn't participate in, and that upon Broadcast
      it returns an answer stating it doesn't participate in the channel.
      It also reduces the run time of the integration test to 50s.
      Change-Id: Icf5754df6cedb7725c4d7091c7366ce0b17ff1b7
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13208] Raft Reconfig&Onboarding integration test · a46a55d5
      yacovm authored
      This change set adds an integration test for etcdraft orderers which:
      1) Spawns 3 OSNs of type etcdraft.
      2) Rotates their TLS certificates.
      3) Spawns a fourth OSN of type etcdraft.
      4) Gives it the last config block of the system channel.
      5) Ensures it syncs with the channels it needs.
      6) Ensures it doesn't sync with the channels it doesn't need.
      7) Ensures no errors are written to the logs of the orderers.
      Change-Id: I7f4cb1b6d841f51aae9f091da80797d1bac3df99
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13333] Orderer config update to use orderer creds · d434416f
      yacovm authored
      This change set makes orderer config updates use orderer credentials.
      This is needed when we want to update the system channel,
      since we cannot pull blocks from the system channel
      with peer credentials.
      Change-Id: Ic5f5749f7ec3e5ee7012b7a9f1d764826608b7d4
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13331] Refactor metadata updates in nwo · 4f802d51
      yacovm authored
      This change set refactors metadata updates by making them
      use a function that dictates how to handle consensus metadata.
      Change-Id: I3aa68e4b268a24887e4cba891e02ebce1a2ec65d
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13298] Fix test flake on MacOS · 567981aa
      yacovm authored
      A problem on macOS was fixed, but it seems that the error string
      returned from the operating system's system call
      differs between Linux and macOS.
      This change set addresses this by making the panic error
      comparison look for a substring instead of doing a full comparison.
      Change-Id: Idf10bff7b4dde6009ce01bb83b7bd576be4df2b4
      Signed-off-by: yacovm <yacovm@il.ibm.com>
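A minimal sketch of the substring comparison, assuming a test helper that recovers the panic. `panicMessageMatches` and the panic text are hypothetical illustrations, not the test's actual helper or expected string.

```go
package main

import (
	"fmt"
	"strings"
)

// panicMessageMatches runs f and reports whether it panicked with a message
// that contains want. Matching a substring tolerates OS-specific wording
// around the part of the message the test actually cares about.
func panicMessageMatches(f func(), want string) (matched bool) {
	defer func() {
		if r := recover(); r != nil {
			matched = strings.Contains(fmt.Sprint(r), want)
		}
	}()
	f()
	return false
}

func main() {
	bind := func() { panic("listen tcp 127.0.0.1:5000: bind: address already in use") }
	fmt.Println(panicMessageMatches(bind, "address already in use")) // true
}
```

The trade-off is a slightly weaker assertion: the test checks that the stable part of the message is present rather than pinning the exact text on every platform.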
    • [FAB-13332] Add cryptogen extend to integration tests · 6e34e329
      yacovm authored
      This change set adds the ability to call "cryptogen extend"
      in integration tests.
      Change-Id: I5db2adbdb1260bf47da33ad1b5df9022a8fb1c95
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13334] Onboarding: Allow empty channels · 45316e37
      yacovm authored
      This change set removes the expectation that channels have
      at least 1 block other than the genesis block.
      Change-Id: If1ce27b2bb703bd308ae356ea6fe6d6736a2dc99
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13330] Rename GetConfigBlock to GetConfig in nwo · 4ba5d615
      yacovm authored
      This change set renames GetConfigBlock to GetConfig, to match
      what it actually returns.
      Change-Id: Ica00d5e6dab91852767c1c4fd1d8af0454bd1bd5
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13349] Add more assertion to etcdraft UT. · 81c2e195
      Jay Guo authored
      Change-Id: I85cb5b5cc633f27fa4f317bd2b0f6d947fd97a6d
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13095] fix UT flake RPC timeout · 11567c1a
      yacovm authored
      Sometimes, 100ms isn't enough for the RPC call to reach
      the server, and therefore the request times out before
      it invokes the mock.
      Increased the timeout to 1 second, and tested this
      in CI for more than 1,000 iterations in
      a custom CR: https://gerrit.hyperledger.org/r/#/c/28258/
      Change-Id: I01976373dcb66f652016f8916c5c86027e21f0d4
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13350] Fix etcdraft flaky test · 2077c063
      Jay Guo authored
      Change-Id: Icd364101a350ab6fa2959f196f346c717a16337f
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-13298] Fix TestConfigureClusterListener in MacOS · 325618b3
      yacovm authored
      On some macOS machines, port 5000 is already in use.
      When their owners try to run the unit tests locally,
      TestConfigureClusterListener fails because the port is already
      in use: this changes the actual error returned, so the error
      string doesn't match what the test expects.
      Fixed by selecting a random unused port.
      Change-Id: I687ae38b8c0200e8fb899a519611b50e19cafe08
      Signed-off-by: yacovm <yacovm@il.ibm.com>
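The commit message doesn't say how the random port is chosen. One common Go idiom, shown here as an assumption rather than the test's actual code, is to bind to port 0 and let the operating system pick a free port; `freeListener` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
)

// freeListener asks the OS for an unused TCP port by binding to port 0,
// rather than hard-coding a port such as 5000 that may already be taken.
func freeListener() (net.Listener, int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return l, l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	l, port, err := freeListener()
	if err != nil {
		panic(err)
	}
	defer l.Close()
	fmt.Println("listening on port", port)
}
```

Handing the open listener to the code under test avoids a race; closing it and reusing only the port number leaves a small window in which another process could grab the port.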
    • [FAB-13299] Onboarding: Skip committing existing blocks · 39926d49
      yacovm authored
      If we onboard a new OSN and it crashes midway, it restarts with a ledger
      that already has some blocks in some channels.
      The onboarding phase has to fetch all blocks in order to verify the hash
      chain and the blocks themselves, but it should skip committing blocks
      that were already committed.
      This change set implements that and adjusts the tests accordingly.
      Change-Id: I9ffc56cd121e11ddfe675894ae9378fd52157dbe
      Signed-off-by: yacovm <yacovm@il.ibm.com>
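A minimal sketch of "verify everything, commit only what is missing". The `block` type, `replicate` and `makeChain` are hypothetical simplifications: real Fabric blocks carry headers and signatures that onboarding also verifies, whereas this sketch checks only the hash chain.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// block is a hypothetical, simplified block: number, previous hash, data.
type block struct {
	Number   uint64
	PrevHash [32]byte
	Data     []byte
}

// hash digests the block's number, previous hash and data.
func (b block) hash() [32]byte {
	buf := make([]byte, 8, 8+len(b.PrevHash)+len(b.Data))
	binary.BigEndian.PutUint64(buf, b.Number)
	buf = append(buf, b.PrevHash[:]...)
	buf = append(buf, b.Data...)
	return sha256.Sum256(buf)
}

// replicate first verifies the hash chain of all fetched blocks, then
// commits only blocks at or above the ledger height; lower blocks were
// already committed before the crash.
func replicate(blocks []block, ledgerHeight uint64, commit func(block)) error {
	for i := 1; i < len(blocks); i++ {
		if blocks[i].PrevHash != blocks[i-1].hash() {
			return fmt.Errorf("block %d: hash chain broken", blocks[i].Number)
		}
	}
	for _, b := range blocks {
		if b.Number < ledgerHeight {
			continue // already in the ledger, skip committing
		}
		commit(b)
	}
	return nil
}

// makeChain builds n linked blocks for the demo and tests.
func makeChain(n int) []block {
	var chain []block
	var prev [32]byte
	for i := 0; i < n; i++ {
		b := block{Number: uint64(i), PrevHash: prev, Data: []byte(fmt.Sprintf("tx-%d", i))}
		chain = append(chain, b)
		prev = b.hash()
	}
	return chain
}

func main() {
	var committed []uint64
	err := replicate(makeChain(5), 3, func(b block) { committed = append(committed, b.Number) })
	fmt.Println(committed, err) // [3 4] <nil>
}
```

Verifying the whole chain before committing anything means a tampered or inconsistent block aborts the run without partially extending the ledger.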
    • [FAB-12579] Separate TLS listener for intra-cluster · bd0a99d7
      yacovm authored
      This change set adds an option for a separate TLS listener
      for intra-cluster communication.
      Change-Id: I059e4d45ddeaf066017c758b83a3e7422783a403
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13262] typo in configblock.go · f8934536
      yacovm authored
      whena --> when
      Change-Id: I24dd0fb741849293e13db5c1bce9cce3c27934ef
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13053] Add an UT to assert retransmission. · 20579e8b
      Jay Guo authored
      This CR adds a UT to assert retransmission
      of etcd/raft MsgApp.
      Change-Id: Ic06003ae7da9dc1dcc991103e6748b55a47f04dc
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12949] Fix etcdraft reconfiguration UT · c5cb8b22
      Jay Guo authored
      Change-Id: I64c232fe6433b9f6ca0dadbabfcc771f3f2c623f
      Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
    • [FAB-12729] Support subset of system channel OSNs · fba8cb02
      yacovm authored
      This change set changes the etcdraft orderer node logic so that a node
      can participate in only a subset of the channels in the network.
      Previously, when an orderer couldn't find its certificate in the
      channel config, it panicked.
      This is not desired behavior for application channels,
      as we might want only a subset of the nodes that belong to the
      system channel to participate in a specific application channel.
      After this change set, if an OSN finds it doesn't need to participate
      in a channel, it simply presents an "inactive channel" to clients.
      This is needed because channel creation transactions are classified
      by whether the channel is known and registered in the registrar;
      thus, if we were not to instantiate any channel object, some nodes
      would classify the target channel differently than other nodes
      at the time of dispatching the transactions.
      Change-Id: I254d270135fb4d9a105d04de893ae8ccdd13f0d7
      Signed-off-by: yacovm <yacovm@il.ibm.com>
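A minimal sketch of the membership decision: register a real chain when the node's certificate appears among the channel's consenters, and an inactive placeholder otherwise, rather than panicking. The `consenter` struct, `chainKind` values and `chainFor` are hypothetical simplifications of the channel config and registrar.

```go
package main

import (
	"bytes"
	"fmt"
)

// consenter is a hypothetical, simplified consenter entry from a channel config.
type consenter struct {
	Host          string
	ClientTLSCert []byte
}

// chainKind says what the registrar holds for a channel.
type chainKind string

const (
	activeChain   chainKind = "etcdraft"
	inactiveChain chainKind = "inactive"
)

// chainFor returns an active chain if our certificate is among the
// consenters, and an inactive placeholder otherwise, instead of panicking.
func chainFor(ownCert []byte, consenters []consenter) chainKind {
	for _, c := range consenters {
		if bytes.Equal(c.ClientTLSCert, ownCert) {
			return activeChain
		}
	}
	return inactiveChain
}

func main() {
	consenters := []consenter{
		{Host: "osn1", ClientTLSCert: []byte("osn1-cert")},
		{Host: "osn2", ClientTLSCert: []byte("osn2-cert")},
	}
	// Every channel gets a registrar entry, so all nodes classify
	// channel creation transactions the same way.
	registrar := map[string]chainKind{
		"mychannel": chainFor([]byte("osn3-cert"), consenters),
	}
	fmt.Println(registrar["mychannel"]) // inactive
}
```

The key design point, per the commit message, is that the registrar always holds an entry for the channel, active or not, so transaction classification is identical across nodes.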
    • [FAB-13150] Re-enable etcdraft for v2.0 development · 3713cc30
      yacovm authored
      This change set re-enables etcdraft to resume v2.0 development
      after the v1.4 branch is cut.
      Change-Id: I384247a9de2763a207bbde4fa8e519d703241ad5
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      Signed-off-by: Artem Barger <bartem@il.ibm.com>
    • [FAB-13225] address code review comments · 52296172
      Artem Barger authored
      This commit addresses code review comments left on the review of the CR
      related to FAB-12552.
      Change-Id: I04635d36d7076cf89d43f42aa5351692b8d1782e
      Signed-off-by: Artem Barger <bartem@il.ibm.com>