- 03 Mar, 2019 16 commits
-
-
Jay Guo authored
When CheckQuorum is enabled, the leader steps down if it cannot reach a quorum of the network, so that clients have a chance to disconnect and try other nodes. Change-Id: I901c0e3009f9d354a2b504fe16174432345055b3 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
In etcdraft unit tests, we often need to elect a leader deterministically. This was done by ticking ONLY one node in the network, so that it is the only node that starts a campaign. However, there are several problems with this approach:

1. It is slow. We need a real time interval between ticks because of the way the fake clock is implemented: it drops a tick on the floor if the consumer is slow.
2. There is a random factor in the election timeout of etcd/raft. It is calculated as follows:
   ```
   randomElectionTimeout = electionTimeout + rand.Intn(electionTimeout)
   ```
   In other words, sending electionTimeout ticks is not guaranteed to trigger a leader election.
3. If CheckQuorum is enabled, a lease is imposed on follower nodes, which expires if electionTimeout <= elapsedTicks < randomElectionTimeout (once elapsedTicks reaches randomElectionTimeout, it is reset to 0 and the node starts a campaign).

In this CR, we send an artificial MsgTimeoutNow to the node to be elected. This message reliably triggers a campaign and skips the lease check.

This CR also fixes several potential data races and flakes in tests.

Change-Id: I3c8e0bcadbb8cfa1ae3393de2ea711fdd0d8b7aa Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
The test may query the chaincode too soon after invocation, before the block is actually committed. Change-Id: I4159fb2dfb31310eccfd64fcb9a9a99ceef54db0 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Increase the default etcdraft tick interval to 500ms, for several reasons:
- in a WAN/Cloud environment, this is more realistic
- WAL sync in CI often exceeds 1s, which prevents heartbeats from being sent on time. Increasing the election timeout can decrease the chance of unexpected leader failover.

This CR also increases the default timeout of the peer cmd, because it now takes 5~10s to elect a leader for a newly created channel, and `peer channel create` can only retrieve the genesis block of that channel when a leader exists (the Deliver API returns an error if the channel is leaderless). Note that this value is still configurable by users.

Change-Id: I94fbbc750fa096cce6ef9e2d65eb981c6202b675 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Yoav Tock authored
This task addresses two issues:

1) In common/multichannel/Bundle.ValidateNew() it is possible to identify the system channel using:
   _, isSys := b.ConsortiumsConfig()
   This can be used to refine validateMigrationStep() so that it deals more accurately with the migration-state transitions on the system vs. standard channels.
2) In addition, prevent the user from adding ConsortiumsConfig() to standard channels. This protects multichannel.Registrar from blowing up on the next initialization.

Explanation:
- Looking at the code in multichannel.Registrar, we see that
  _, ok := ledgerResources.ConsortiumsConfig()
  is used to identify the system channel. If two system channels are identified, the code panics.
- In Bundle.ValidateNew(), there is currently no mechanism to prevent a user (orderer admin) from adding a ConsortiumsConfig() to a standard channel. If a user does that, multichannel.Registrar will blow up on the next initialization.

Change-Id: Ia7551cbd27389a9988757af0224abdc0d1bfef5b Signed-off-by:
Yoav Tock <tock@il.ibm.com>
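The second rule in the commit message above (a standard channel must not gain a ConsortiumsConfig) can be sketched as follows. This is not the Fabric implementation: `bundle` and `validateNew` are hypothetical stand-ins, and only the `(value, ok)` shape of `ConsortiumsConfig()` is taken from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// bundle is a hypothetical stand-in for a channel config bundle. Only the
// presence or absence of a consortiums config matters for this sketch.
type bundle struct {
	hasConsortiums bool
}

// ConsortiumsConfig mimics the (value, ok) shape used in the commit message;
// ok is true only for the system channel.
func (b bundle) ConsortiumsConfig() (struct{}, bool) {
	return struct{}{}, b.hasConsortiums
}

// validateNew rejects a config update that would turn a standard channel
// into something that looks like a second system channel.
func validateNew(current, proposed bundle) error {
	_, isSys := current.ConsortiumsConfig()
	_, willBeSys := proposed.ConsortiumsConfig()
	if !isSys && willBeSys {
		return errors.New("cannot add a consortiums config to a standard channel")
	}
	return nil
}

func main() {
	standard := bundle{}
	system := bundle{hasConsortiums: true}

	fmt.Println(validateNew(standard, system))   // rejected
	fmt.Println(validateNew(system, system))     // accepted
	fmt.Println(validateNew(standard, standard)) // accepted
}
```

Rejecting the update at validation time keeps the invalid state out of the ledger, so the registrar never encounters two channels that both look like the system channel.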
-
Yoav Tock authored
Update documentation of ConsensusType in protos/orderer/configuration.proto to reflect the implementation.
- spell out permitted type strings: "solo" / "kafka" / "etcdraft"
- update migration_state for which messages are permitted on system / standard channels
- update migration_context for what is required on each migration_state
- make protos

Change-Id: Ia27d9cd162fe6656fd2bd56ceaf5aae8a6fe5222 Signed-off-by:
Yoav Tock <tock@il.ibm.com>
-
yacovm authored
This change set addresses code review comments for FAB-13363. Namely:
- Deletion of the redundant logger instance in the server main.go

Change-Id: I0ee0db21894a352c7d1679efc27524162723a895 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set connects the block verification infrastructure for onboarding to the production code. Now, whenever an orderer onboards a channel, it also verifies the blocks of the application channels, by:
1) Creating a bundle from the genesis block, which is derived from the system channel (which is verified using backward hash chain validation).
2) Verifying blocks using the bundle.
3) Replacing the bundle with a new bundle whenever a config block is pulled.

It also adds a check in the integration test that ensures that no errors are reported in the log of the onboarded OSN.

Change-Id: I3c5714f9d4491cdfd78e4e47407925136906d413 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
Jay Guo authored
When the gRPC buffer of the `Submit` stream is full, `SendSubmit` blocks, which freezes the `serveRequest` goroutine. This CR moves the call out of the goroutine, so that clients block while waiting for room in the buffer. Change-Id: I62cd261b9419bd8df3fa1bfaeff14551168d2e65 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set adds the following supporting structs for verifying blocks pulled by onboarding in future CRs:
- Ledger interceptor: intercepts a commit of a block, and invokes a callback.
- VerificationRegistry: tracks commits of config blocks, and builds channelconfig bundles from them, in order to support verification of pulled blocks.
- BlockVerifierAssembler and BlockValidationPolicyVerifier: together they build block verifiers out of config blocks.
- verifierLoader: loads a mapping of chainID->cluster.BlockVerifier, which is to be used at OSN startup to preload the existing verifiers. It is needed when we recover from a crash, or when we do dynamic onboarding and the previous config blocks were committed to the ledger before the OSN was started.

In the next CR, I will wire all of these into the onboarding infrastructure itself, and they will be used to hold the latest bundle per channel in order to verify block signatures.

Change-Id: Ic9fc99243baa5c2cef97103d001180207414d98a Signed-off-by:
yacovm <yacovm@il.ibm.com>
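The ledger interceptor idea from the commit message above (intercept a commit and invoke a callback) can be sketched with a small wrapper type. All names here (`block`, `ledgerWriter`, `memLedger`, `ledgerInterceptor`) are hypothetical, and invoking the callback after a successful append is an assumption about ordering, not something the commit message states.

```go
package main

import "fmt"

// block is a hypothetical, minimal block representation.
type block struct {
	Number   uint64
	IsConfig bool
}

// ledgerWriter is the hypothetical interface being wrapped.
type ledgerWriter interface {
	Append(b block) error
}

// memLedger is an in-memory ledger used only for this sketch.
type memLedger struct {
	blocks []block
}

func (l *memLedger) Append(b block) error {
	l.blocks = append(l.blocks, b)
	return nil
}

// ledgerInterceptor wraps a ledgerWriter and calls onCommit for every block
// that was appended successfully.
type ledgerInterceptor struct {
	ledgerWriter
	onCommit func(b block)
}

func (i *ledgerInterceptor) Append(b block) error {
	if err := i.ledgerWriter.Append(b); err != nil {
		return err
	}
	i.onCommit(b)
	return nil
}

func main() {
	var configBlocks []uint64
	ledger := &ledgerInterceptor{
		ledgerWriter: &memLedger{},
		// Track config blocks, in the spirit of a verification registry.
		onCommit: func(b block) {
			if b.IsConfig {
				configBlocks = append(configBlocks, b.Number)
			}
		},
	}

	for _, b := range []block{{0, true}, {1, false}, {2, true}} {
		if err := ledger.Append(b); err != nil {
			fmt.Println("append failed:", err)
		}
	}
	fmt.Println(configBlocks)
}
```

Because the wrapper satisfies the same interface as the ledger it wraps, the code that commits blocks does not need to know that a callback is attached.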
-
Jay Guo authored
If there are MaxInflightMsgs blocks proposed but not yet committed, the chain blocks further incoming requests. Change-Id: I58c84e23c882ccc152e5c9a248434e466a8b5266 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
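One common way to express this kind of back-pressure in Go is a counting semaphore built on a buffered channel. The sketch below is only an illustration of that pattern under stated assumptions; `inflight`, `tryPropose`, and `commit` are hypothetical names, and the limit of 2 is arbitrary.

```go
package main

import "fmt"

// inflight limits the number of proposed-but-uncommitted blocks.
type inflight struct {
	slots chan struct{}
}

func newInflight(max int) *inflight {
	return &inflight{slots: make(chan struct{}, max)}
}

// propose blocks the caller while the in-flight limit is reached.
func (f *inflight) propose() {
	f.slots <- struct{}{}
}

// tryPropose is a non-blocking variant: it reports whether a slot was free.
func (f *inflight) tryPropose() bool {
	select {
	case f.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// commit frees one slot. It is a no-op if nothing is in flight.
func (f *inflight) commit() {
	select {
	case <-f.slots:
	default:
	}
}

func main() {
	f := newInflight(2) // hypothetical limit
	fmt.Println(f.tryPropose(), f.tryPropose(), f.tryPropose())
	f.commit()
	fmt.Println(f.tryPropose())
}
```

A real chain would call the blocking `propose` on the request path, so that senders wait until a commit frees a slot.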
-
Jay Guo authored
This CR changes Errored to return a channel that is closed when the node becomes a candidate. Change-Id: Ibd0ece763b9d93c4da93825d1b302ecc55a9b32e Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
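Signalling a state change by closing a channel is a standard Go idiom, because every receiver observes a closed channel. The sketch below illustrates only that idiom; `chain`, `becomeCandidate`, and `isClosed` are hypothetical names and do not reflect the actual etcdraft chain code.

```go
package main

import (
	"fmt"
	"sync"
)

// chain is a hypothetical, minimal stand-in for a consensus chain.
type chain struct {
	mu     sync.Mutex
	errorC chan struct{}
	closed bool
}

func newChain() *chain {
	return &chain{errorC: make(chan struct{})}
}

// Errored returns a channel that is closed once the node becomes a candidate.
func (c *chain) Errored() <-chan struct{} {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.errorC
}

// becomeCandidate closes the error channel exactly once.
func (c *chain) becomeCandidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.closed {
		close(c.errorC)
		c.closed = true
	}
}

// isClosed reports, without blocking, whether ch has been closed.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	c := newChain()
	fmt.Println(isClosed(c.Errored())) // false
	c.becomeCandidate()
	fmt.Println(isClosed(c.Errored())) // true
}
```

Callers that select on `Errored()` alongside their normal work are woken up as soon as the node loses contact with a leader.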
-
Jay Guo authored
Store raft SoftState in raft chain so it returns error while election is ongoing. This prevents a disconnected follower from returning success on Broadcast API. Change-Id: Ib6619b230938f0d6c10240b8cd8e34e346056145 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This CR changes type of etcdraft observe channel from uint64 to raft.SoftState, so that chain_test can assert not only leader id, but also the state of node. Change-Id: Ia0c5f8c9060c234ceb84133e0c5598ed064dd1ee Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Add a lock to guard manipulation of `StepStub`. Change-Id: Icaadb1f5aea0cb7f266f24ed6756c4f6541768bd Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
A newly elected raft leader should wait for in-flight blocks to be committed before accepting new envelopes and creating new blocks. Otherwise, all the blocks it creates would be uncle blocks, and we don't permit this situation in Fabric. Change-Id: Ia5adac185263735eace1fc805ebea0f5c98b2fb1 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
- 02 Mar, 2019 1 commit
-
-
David Enyeart authored
Add a hint about root cause - application capability V1_3 required. Change-Id: I4c72c4972f31732dff2e6aadd303853d5a1c79e7 Signed-off-by:
David Enyeart <enyeart@us.ibm.com>
-
- 01 Mar, 2019 7 commits
-
-
Jason Yellick authored
-
Yacov Manevich authored
* changes:
  [FAB-13447] Streamline the code
  [FAB-13178] A dumb version of etcdraft BlockCreator
  [FAB-13178] Remove global leader var in etcdraft chain
  [FAB-13178] Move raft logic to its own file
  [FAB-13178] do not accept new env when conf in flight
  [FAB-13178] Refactor etcdraft chain to avoid sync
  [FAB-13694] Move LastConfigBlock to orderer common
  [FAB-13698] disable flaky test TestReconnect
  [FAB-13643] Leader crash and failover integration test
  FAB-13265 migration status in channelconfig
  FAB-12984 consensus migration protos
  [FAB-13633] Make Step RPC failures non blocking
  [FAB-13178] Simplify the proposition of config block
  [FAB-11996] Fix failed UT
  [FAB-13481] Make onboarding code more idiomatic
  [FAB-13495] Activate onboarding max retries
  FAB-12983 capability V2_0 for Kafka2RaftMigration
-
Yacov Manevich authored
* changes:
  [FAB-13465] Max retry attempts for orderer replication
  [FAB-13180] Orderer: auto-join existing inactive chains
  [FAB-13456] Fix race in etcdraft test
  [FAB-13456] Use empty peer list to join raft cluster
  [FAB-13444] Prepare onboarding to multi-time use
  [FAB-13362] Pulling not servicing chains in onboarding
  [FAB-13441] Properly capture OSN output
  [FAB-13428] Make TestReplicateChainsFailures robust
  [FAB-13427] Make replication tests not depend on time
  [FAB-13360] Fix an etcdraft flaky UT
  [FAB-13415] DRY up UpdateConsensusMetadata in nwo
  [FAB-13367] Fix flaky etcdraft UT
  [FAB-1337] Raft: Commit genesis blocks for non-members
  [FAB-13208] Raft Reconfig&Onboarding integration test
  [FAB-13333] Orderer config update to use orderer creds
-
Yacov Manevich authored
* changes:
  [FAB-13331] Refactor metadata updates in nwo
  [FAB-13298] Fix test flake on MacOS
  [FAB-13332] Add cryptogen extend to integration tests
  [FAB-13334] Onboarding: Allow empty channels
  [FAB-13330] Rename GetConfigBlock to GetConfig in nwo
  [FAB-13349] Add more assertion to etcdraft UT.
  [FAB-13095] fix UT flake RPC timeout
  [FAB-13350] Fix etcdraft flaky test
  [FAB-13298] Fix TestConfigureClusterListener in MacOS
  [FAB-13299] Onboarding: Skip committing existing blocks
  [FAB-12579] Separate TLS listener for intra-cluster
  [FAB-13262] typo in configblock.go
  [FAB-13053] Add an UT to assert retransmission.
  [FAB-12949] Fix etcdraft reconfiguration UT
  [FAB-12729] Support subset of system channel OSNs
  [FAB-13150] Re-enable etcdraft for v2.0 development
  [FAB-13225] address code review comments
  [FAB-13057] Remove applied index check in storage
  [FAB-13199] Reduce etcdraft test time.
  [FAB-12949] finish reconfiguration after restart
-
Yacov Manevich authored
-
Kostas Christidis authored
-
Kostas Christidis authored
-
- 28 Feb, 2019 5 commits
-
-
Gari Singh authored
When peers communicate with other peers or orderers, the list of trusted CAs for TLS communication is derived from the channel configs. For the peer, the list is the aggregate of all roots across all channels. For the orderer, the list is per channel. This CR adds the option to specify a static list of CAs via peer.tls.serverRootCAs.files in core.yaml and a flag peer.deliveryclient.staticRootsEnabled for the deliveryclient to use. Note: the properties are intentionally not being added to the sample config because they should not be used in most situations. Fixes FAB-14420 Change-Id: Ic381dc99bbb6dc5f7ceafd93738b34c5e24fe60c Signed-off-by:
Gari Singh <gari.r.singh@gmail.com>
-
wenjian3 authored
- V1_4_FABTOKEN_EXPERIMENTAL is not used in any release
- Make FabToken() always return false in v1.4

Change-Id: I19ed0dcf88163c7aba3cb7d72c7c8f6b72b6b8c3 Signed-off-by:
Wenjian Qiao <wenjianq@gmail.com>
-
Saad Karim authored
There is a delay between the call to stop the container and the container being removed from the docker network. Modified the test to poll the health check endpoint until the docker container is completely removed and the health check returns the expected result. Enables the Couch DB health check test [FAB-14333] Change-Id: If6da40793e63378e6fd79e1d446b66ffeb72af72 Signed-off-by:
Saad Karim <skarim@us.ibm.com>
-
Gari Singh authored
-
Gari Singh authored
-
- 27 Feb, 2019 11 commits
-
-
Jay Guo authored
Instead of returning status several levels up, several methods in etcdraft chain can just set member var to store current state. Change-Id: I67612917bf3bb3225f1507c8b7376d730b18e9f4 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set adds an option to configure the block puller used for replication with a maximum number of retry attempts. It is needed because, during onboarding, a specific application channel might become unavailable, but this shouldn't block onboarding now that we have dynamic periodic onboarding for channels we were unable to join. Change-Id: I12f4247040c258809885f0e5fdc07d60914a56e2 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set refactors metadata updates by making them use a function that dictates how to handle consensus metadata. Change-Id: I3aa68e4b268a24887e4cba891e02ebce1a2ec65d Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Artem Barger authored
Currently, a single ledger instance is shared among the instances of the chain mock in unit tests. This commit introduces a ledger instance per chain. Change-Id: I333fa2819490c995931a7e0d241eb6428e67c87e Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
Artem Barger authored
Change-Id: Ib77c866a30ed5108ad53908b0ca25a60a89e9a7c Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
Artem Barger authored
Change-Id: Ic17894d5eff66a195f93fcccacf2e3115587d7a5 Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
Jay Guo authored
This CR rewrites BlockCreator so that it doesn't return a nil block.

BEFORE: blockcreator holds a channel of created blocks, which is buffered with a size of createdBlocksBuffersize (default 20). It also stores the hash and number of the latest block. When requested to create a new block, blockcreator does so by assembling a block based on that hash and number, and enqueues the block to the buffered channel. If the channel is full, nil is returned. When committing a block, it drains the channel. If there is nothing in the channel, it implies that the blockcreator is being manipulated by a raft follower, so blockcreator simply updates the hash and number.

NOW: what we need is actually as simple as this: a blockcreator holds the hash and number of the latest block. When it is requested to create a block, it just uses that hash and number to assemble one. ONLY the raft leader holds a blockcreator. Followers blindly commit whatever comes from consensus. When a follower is elected as the new leader, it simply looks up the ledger, finds the hash and number of the latest block, and creates a new blockcreator.

Change-Id: I226ee34d666fbb1e8d034dc22ea6800df993f7a4 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
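The simplified design described under NOW can be sketched in a few lines. This is a toy model, not Fabric's BlockCreator: the `block` type, its `hash` method (SHA-256 over number, previous hash, and data), and the function names are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// block is a hypothetical, minimal block.
type block struct {
	Number   uint64
	PrevHash [32]byte
	Data     []byte
}

// hash is a toy block hash; real Fabric blocks are hashed differently.
func (b block) hash() [32]byte {
	buf := make([]byte, 8, 8+len(b.PrevHash)+len(b.Data))
	binary.BigEndian.PutUint64(buf, b.Number)
	buf = append(buf, b.PrevHash[:]...)
	buf = append(buf, b.Data...)
	return sha256.Sum256(buf)
}

// blockCreator holds only the hash and number of the latest block.
type blockCreator struct {
	hash   [32]byte
	number uint64
}

// newBlockCreator is what a newly elected leader would call after looking up
// the latest block in its ledger.
func newBlockCreator(latest block) *blockCreator {
	return &blockCreator{hash: latest.hash(), number: latest.Number}
}

// createNextBlock assembles the next block and advances the creator's state.
// It never returns nil because there is no buffer that could be full.
func (bc *blockCreator) createNextBlock(data []byte) block {
	b := block{Number: bc.number + 1, PrevHash: bc.hash, Data: data}
	bc.number = b.Number
	bc.hash = b.hash()
	return b
}

func main() {
	genesis := block{Number: 0}
	bc := newBlockCreator(genesis)
	for _, payload := range []string{"tx1", "tx2"} {
		b := bc.createNextBlock([]byte(payload))
		fmt.Printf("block %d prev=%x\n", b.Number, b.PrevHash[:4])
	}
}
```

Because the creator is rebuilt from the ledger on every leadership change, followers never need to keep one in sync.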
-
yacovm authored
This change set makes cluster-type OSNs autonomously detect channels that exist and that they should be part of (the channel configuration has their public credentials as a consenter for the channel), but for which they do not run chains or have the blocks in their ledger.

This can happen for several reasons:
- The OSN is added to an existing chain, and since it didn't participate in the chain so far, it didn't get the blocks that tell it that it is now part of the channel.
- The OSN tried to detect whether it is part of a channel, but it wasn't able to, because all OSNs of the system channel returned service-unavailable. This can happen if:
  - a leader election takes place
  - the network is acting up so the leadership was lost
  - a channel has been deserted (all OSNs left it).

To take care of such use cases, all OSNs now:
- Track inactive chains that they know of but do not participate in.
- Periodically(*) probe the system channel OSNs to see whether they are now part of these chains.
- If so, replicate the chains, create instances of them, and replace the instances of the inactive chains in the registrar with the new instances of type etcdraft.

(*) 10 seconds after boot, then after 20 seconds, then after 40 seconds, and so on, eventually every 5 minutes.

Change-Id: I3c2a84e6f4f402e011e7a895345b3d3982247083 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
yacovm authored
https://gerrit.hyperledger.org/r/#/c/28202/ fixed a problem on MacOS, but it seems that the error string returned by the operating system's system call differs between Linux and Mac. This change set addresses this by making the panic error comparison look for a substring instead of comparing the full string. Change-Id: Idf10bff7b4dde6009ce01bb83b7bd576be4df2b4 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
This CR removes the global leader var in the etcdraft chain because it is racy in the following case: several requests are about to be enqueued into submitC while the leader loses its leadership. This also removes the lock on rpc.SendSubmit because it is guarded by the channel. Change-Id: If5e785e05dcf9bfc60e403f2d5813baf769ee103 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Change-Id: I9fe663c4efa46e5644571d238f4d7ea8f4e51626 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-