- 27 Feb, 2019 40 commits
-
-
Yoav Tock authored
Change the channelconfig of an Orderer to reflect the extension of ConsensusType: - MigrationState - MigrationContext Add a method to the bundle to validate the migration steps of a new versus old config. Add test-cases to bundle_test.go to unit-test said method. Improve the language of comments. Needed to regenerate and update mocks in 'common' and 'blockcutter' packages for unit tests to build correctly. Change-Id: If060c05bcb9a0e0ca81b1f754a2b0e69a7f6c896 Signed-off-by:
Yoav Tock <tock@il.ibm.com>
-
Yoav Tock authored
Allow a Kafka-based orderer to receive & process config transactions that change consensus-type - broadcast phase. In the orderer configuration protobuf definition, extend ConsensusType to include: enum "Migration State" - to command & record the state of the migration per channel uint64 "Migration Context" - to correlate the system-channel config-update-tx with the following standard-channel config-update-tx(s) Change-Id: I121496499b3e4b6355a43843b49d3e039a65a987 Signed-off-by:
Yoav Tock <tock@il.ibm.com>
-
yacovm authored
Per the gRPC documentation: If an RPC is issued but the channel is in TRANSIENT_FAILURE or SHUTDOWN states, the RPC is unable to be transmitted promptly. By default, gRPC implementations SHOULD fail such RPCs immediately. This is known as "fail fast," but usage of the term is historical. However... RPCs SHOULD NOT fail as a result of the channel being in other states (CONNECTING, READY, or IDLE). Therefore, if it takes too much time for gRPC to move from a state of CONNECTING to TRANSIENT_FAILURE (e.g. packet drop or DNS lookup failure), it will slow down the entire Raft FSM. This change set makes Step RPCs inspect the underlying gRPC connection state prior to being invoked. If the connection is in the CONNECTING state, the RPC fails fast. Change-Id: I50df1f758a00fc99bed54ff1a2056f83f53efdf7 Signed-off-by:
yacovm <yacovm@il.ibm.com>
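The fail-fast guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Fabric code: `connState`, `step`, and `errNotReady` are hypothetical names, and the state type is a local stand-in for the connectivity state that grpc-go exposes, so that the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// connState is a hypothetical stand-in for a gRPC connectivity state.
type connState int

const (
	idle connState = iota
	connecting
	ready
	transientFailure
	shutdown
)

// errNotReady is returned instead of blocking on a connection that is
// still being established.
var errNotReady = errors.New("connection is still connecting; failing fast")

// step invokes send only if the connection is not in the connecting state.
// All other states are left to the normal gRPC behaviour in this sketch.
func step(state connState, send func() error) error {
	if state == connecting {
		return errNotReady
	}
	return send()
}

func main() {
	sent := 0
	send := func() error { sent++; return nil }

	err := step(connecting, send)
	fmt.Println("connecting:", err, "sent:", sent)

	err = step(ready, send)
	fmt.Println("ready:", err, "sent:", sent)
}
```

The point of the design is that a slow transition out of CONNECTING costs the caller nothing: the message is dropped immediately and Raft's own retransmission takes over.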
-
Jay Guo authored
Change-Id: Ie755722f7db1df5efca5dea43d9912fdf36b6d25 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
The UT being fixed in this CR submits a malformed config env, which crashes the test. It passed because we never waited for the block to be committed and shut down the test early. Change-Id: I40311ccbf03d8ffb73d4467e5695a28b4834d61e Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set addresses code review comments from https://gerrit.hyperledger.org/r/#/c/28391/ and from https://gerrit.hyperledger.org/r/#/c/28385/ , in an attempt to make the orderer code more idiomatic. Change-Id: I04ac7bc21ee8fc1ccda4e76d8afa53fe527f7f5e Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
yacovm authored
This change set enables the max retry logic for onboarding: - Adds a new configuration parameter to orderer.yaml. - Adds the appropriate configuration to the production code. - Adds a cross-package unit test that simulates the scenario for which the retry logic was made: An application channel is listed in the system channel, but as we try to pull it, we fail until we exhaust our retry count. We nevertheless commit the genesis block for that channel, and proceed with the replication. Change-Id: I28204f3c1ec0f99dd4d510ed7c9f4ae94759cba2 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Yoav Tock authored
Add orderer capability V2_0 for Kafka2RaftMigration This capability defines whether the orderer supports a kafka to raft migration. A kafka-based Ordering Service Node requires this in order to receive and process a config update with consensus-type migration commands. Migration is supported from Kafka to Raft only. If not present, these config updates will be rejected. Change-Id: I3b56dec21f0893d0b1df5db30973b4762aab5575 Signed-off-by:
Yoav Tock <tock@il.ibm.com>
-
yacovm authored
This change set adds an option to configure the block puller used for the replication with a maximum number of retry attempts. It is needed because during onboarding, a specific application channel might become unavailable, but it shouldn't block onboarding now that we have dynamic periodic onboarding for channels we were unable to join. Change-Id: I12f4247040c258809885f0e5fdc07d60914a56e2 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set makes cluster type OSNs autonomously detect channels that exist and that they should be part of (the channel configuration has their public credentials as a consenter for the channel), but that they do not run chains for, or have the blocks in their ledger. This can happen for several reasons: - The OSN is added to an existing chain, and since it didn't participate in the chain so far, it didn't get the blocks that tell it that it is now part of the channel. - The OSN tried to detect whether it is part of a channel, but it wasn't able to, because all OSNs of the system channel returned service-unavailable. This can happen if: - a leader election takes place - the network is acting up so the leadership was lost - a channel has been deserted (all OSNs left it). To take care of such use cases, all OSNs now: - Track inactive chains that they know of, but do not participate in - Periodically(*) probe the system channel OSNs to see if they are now part of these chains or not. - If so, then they replicate the chains, create instances of them, and replace the instances of the inactive chains in the registrar with the new instances of type etcdraft. (*) - 10 seconds after boot, then after 20 seconds, then after 40 seconds, and so on, eventually every 5 minutes. Change-Id: I3c2a84e6f4f402e011e7a895345b3d3982247083 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
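The probing schedule in the footnote above (10 seconds, 20 seconds, 40 seconds, eventually every 5 minutes) is a capped exponential backoff. A minimal sketch, assuming the delay simply doubles until it reaches the cap; `probeDelay` and the constant names are hypothetical and not taken from the Fabric source:

```go
package main

import (
	"fmt"
	"time"
)

const (
	initialDelay = 10 * time.Second // first probe after boot
	maxDelay     = 5 * time.Minute  // steady-state probing interval
)

// probeDelay returns the wait before the probe with the given 0-based
// attempt number: the delay doubles on each attempt and is capped at maxDelay.
func probeDelay(attempt int) time.Duration {
	d := initialDelay
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Printf("probe %d after %v\n", attempt, probeDelay(attempt))
	}
}
```

Checking the cap inside the loop keeps the duration from overflowing for large attempt numbers.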
-
Jay Guo authored
Change-Id: I9fe663c4efa46e5644571d238f4d7ea8f4e51626 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
When joining a fresh node to an existing etcdraft cluster, it should call `StartNode` with an empty peer list. Change-Id: Ib6acf6fd9b2956680c99d5d7370ce439228d3bfa Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set refactors the onboarding infrastructure to support multiple use of it, in contrast to the current logic which can only be used once. Namely, it re-uses the current logic and introduces a new method to the replicationInitiator: replicateChains(lastConfigBlock *common.Block, chains []string) Which forces replication of the given chain names, with the given last config block. The chains slice is passed to the cluster.Replicator as a filter which prevents pulling chains that aren't among the chains slice. Change-Id: I3331f2abb6a2879876644b2f5ef4ee48c4eb43fa Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
An orderer might not have permission to probe whether it belongs to a certain application channel. In addition, since the OSNs of an application channel might be a subset of the system channel OSNs, they may be unreachable at the time of onboarding, so all we will get from other OSNs is "service unavailable". This change set addresses this: if we try to pull blocks in order to see whether we belong to the channel (by pulling the latest block) and we only get bad responses from all OSNs (unauthorized or not available), we don't panic. Instead we just skip pulling the chain. If some orderer returns unauthorized, and the rest either return nothing or return a bad request, unavailable, etc., we return that we are unauthorized. If some orderer returns service unavailable, and the rest return anything that is not unauthorized, then we classify it as service unavailable. If no orderer returns unauthorized/unavailable, and all orderers either return something bad or return nothing at all, we now panic as before, because it means we probably misconfigured the node, or we are in a network partition, so we don't want to skip pulling blocks. This change set also enhances the reconfiguration integration test to include a third channel for which the onboarded OSN is not authorized. Change-Id: I6f9b0cfe3671794ef1c036b432e77e2ac55b1efd Signed-off-by:
yacovm <yacovm@il.ibm.com>
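The classification rules in this commit message amount to a priority order: unauthorized wins over service unavailable, which wins over everything else. A hedged sketch of that decision, using hypothetical types (`response`, `verdict`) and assuming none of the OSNs returned a usable block:

```go
package main

import "fmt"

// response is a hypothetical summary of what one OSN answered.
type response int

const (
	noResponse response = iota
	badRequest
	serviceUnavailable
	unauthorized
)

// verdict is the overall outcome of probing a channel.
type verdict string

const (
	skipUnauthorized verdict = "unauthorized: skip pulling the chain"
	skipUnavailable  verdict = "service unavailable: skip pulling the chain"
	fatal            verdict = "no usable answer: panic as before"
)

// classify applies the priority order from the commit message to a set of
// unsuccessful responses.
func classify(responses []response) verdict {
	sawUnavailable := false
	for _, r := range responses {
		switch r {
		case unauthorized:
			// Any unauthorized answer decides the outcome.
			return skipUnauthorized
		case serviceUnavailable:
			sawUnavailable = true
		}
	}
	if sawUnavailable {
		return skipUnavailable
	}
	return fatal
}

func main() {
	fmt.Println(classify([]response{badRequest, unauthorized, serviceUnavailable}))
	fmt.Println(classify([]response{noResponse, serviceUnavailable}))
	fmt.Println(classify([]response{noResponse, badRequest}))
}
```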
-
yacovm authored
The reconfiguration and onboarding integration tests ensure that the OSNs stop logging errors at the end of the test, in order to ensure there aren't any unnoticed faults that occurred due to reconfiguration/onboarding. The function used the wrong method to obtain a buffer that is used to read the process's output. Change-Id: Ieadae1bb083454b195cbfe52b41582dc9dbbf80a Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
There was a redundant sending of a block, which got the client and the mock server out of sync occasionally. Ran the test 2,000 times and it didn't fail afterwards. Change-Id: I09631632a16d3ee42fc51fbb809f3027e50a0973 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
Sometimes the machine that runs the test is stuck for too long, which makes the client side time out, so the server side mock gets out of sync with the tested client side (the replication code). A way to fix it is to specify a really big timeout (1 hour), and in cases where we expect the server not to respond, instead of making it time out, we make it send an EOF downstream, which indicates a failure to the client. Change-Id: I06ed5bd4a645ae8ace90542fe56d19254bfa42b7 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
This CR removes a redundant assertion in etcdraft UT to avoid flakiness. The assertion being removed checks that expected number of MsgApp are dropped since node is disconnected from network. However, when a raft candidate is elected as leader, it broadcasts MsgApp containing empty data to followers, and sometimes this is being counted as part of dropped MsgApp, which causes unmatched expectation. We could be more precise by inspecting MsgApp on wire, and neglect empty messages, so we still perform this assertion. But it is anyway a redundant check, therefore can be safely removed. This CR also further reduces total test time. Change-Id: I74af9fecfebe20e44c6736a644352f8b67b624e3 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set makes AddConsenter and RemoveConsenter use a consensus specific method UpdateEtcdRaftMetadata instead of the generic UpdateConsensusMetadata one, to remove code duplication. It also addresses a few nits in etcdraft_reconfig_test. Change-Id: I86d50fd80d4985df77474c054ce916f0d2fb62e7 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
In raft, to commit a message, leader firstly broadcasts MsgApp to replicate data, and once it receives MsgAppResp from quorum, it - applies message to state machine - broadcasts another MsgApp to instruct followers to commit and these are done in parallel. To emulate situation where a node is disconnected *after* config block is committed and *before* node add/remove in that config block is proposed to raft, `WriteConfigBlock` stub on leader was overloaded to disconnect node from the network, so that ConfChange is proposed but dropped. However this becomes racy when we want to assert the config block is committed on followers as well, since we might disconnect leader too fast, and cause the second round of MsgApp to be dropped, so that config block is not committed on followers. This is fine in real case because new leader will continue the effort to commit this block. To circumvent this flakiness, this CR changes UT to overload `StepStub` instead of `WriteConfigBlockStub`, so that we can be more precise on when to disconnect a node from network. Change-Id: Ic1b7d28c043e779c7cc258c2e08bfaa3578bc429 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
Currently (before this CR), an onboarded OSN doesn't pull any blocks for channels it doesn't participate in. As a result, when the OSN starts up after onboarding, it doesn't have these channels in its registrar, and therefore may classify channel creation transactions differently than its fellow OSNs that do have the channels it didn't pull. In order to avoid a state fork, this change set makes the OSN commit the genesis block for channels it doesn't participate in. This is *NOT* done by pulling the genesis blocks, since the OSN may not have permissions to do that in the first place; instead, it creates a genesis block from the system channel block that has the channel creation transaction. This change set also changes the integration test for onboarding to adjust to the changes, namely, it ensures the OSN committed the genesis block for a channel it doesn't participate in, and upon Broadcast, returns an answer stating it doesn't participate in the channel. Also, it reduces the run time for the integration test to 50s. Change-Id: Icf5754df6cedb7725c4d7091c7366ce0b17ff1b7 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
yacovm authored
This change set adds an integration test for etcdraft orderers which: 1) Spawns 3 OSNs of type etcdraft. 2) Rotates their TLS certificates. 3) Spawns a fourth OSN of type etcdraft. 4) Gives it the last config block of the system channel. 5) Ensures it syncs with the channels it needs. 6) Ensures it doesn't sync with the channels it doesn't need. 7) Ensures it doesn't log errors to the logs of the orderers. Change-Id: I7f4cb1b6d841f51aae9f091da80797d1bac3df99 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set makes orderer config updates use orderer credentials. This is needed when we want to update the system channel, since we cannot pull blocks from the system channel with peer credentials. Change-Id: Ic5f5749f7ec3e5ee7012b7a9f1d764826608b7d4 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
yacovm authored
This change set refactors metadata updates by making them use a function that dictates how to handle consensus metadata. Change-Id: I3aa68e4b268a24887e4cba891e02ebce1a2ec65d Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
https://gerrit.hyperledger.org/r/#/c/28202/ fixed a problem on macOS, but it seems that the error string returned from the operating system's system call differs between Linux and macOS. This change set addresses this by making the panic error comparison look for a substring instead of comparing the full string. Change-Id: Idf10bff7b4dde6009ce01bb83b7bd576be4df2b4 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set adds an ability to call "cryptogen extend" in integration tests. Change-Id: I5db2adbdb1260bf47da33ad1b5df9022a8fb1c95 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set removes the need for channels to be expected to have at least 1 block other than the genesis block. Change-Id: If1ce27b2bb703bd308ae356ea6fe6d6736a2dc99 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set renames GetConfigBlock to GetConfig, to fit what it actually returns. Change-Id: Ica00d5e6dab91852767c1c4fd1d8af0454bd1bd5 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
Change-Id: I85cb5b5cc633f27fa4f317bd2b0f6d947fd97a6d Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
Sometimes, 100ms isn't enough for the RPC call to reach the server, and therefore the request times out before it invokes the mock. Increased the timeout to 1 second, and tested this in CI for more than 1,000 iterations in a custom CR: https://gerrit.hyperledger.org/r/#/c/28258/ Change-Id: I01976373dcb66f652016f8916c5c86027e21f0d4 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
Change-Id: Icd364101a350ab6fa2959f196f346c717a16337f Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
Some people have port 5000 used in MacOS. When they try to run unit tests locally, the test TestConfigureClusterListener fails due to the port being already in use, which changes the actual error returned, and as a result - the test fails because the error string doesn't match what the test expects it to be. Fixed by selecting a random un-used port. Change-Id: I687ae38b8c0200e8fb899a519611b50e19cafe08 Signed-off-by:
yacovm <yacovm@il.ibm.com>
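One common way to pick a random unused port in Go is to listen on port 0 and let the operating system choose. The sketch below shows that technique with a hypothetical helper `freePort`; whether the test in this commit does exactly this is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the OS for an unused TCP port on the loopback interface by
// listening on port 0, then releases the listener and returns the port.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("could not obtain a port:", err)
		return
	}
	fmt.Println("selected port:", port)
}
```

Note that the port is released before the caller uses it, so another process could in principle grab it in between; for unit tests this window is usually acceptable.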
-
yacovm authored
If we onboard a new OSN and crash in the middle, we start with a ledger that has some blocks, in some channels. The onboarding phase has to fetch all blocks in order to verify the hash chain and verify the blocks, but we should skip committing blocks that we already committed before. This change set implements that, and adjusts the tests accordingly. Change-Id: I9ffc56cd121e11ddfe675894ae9378fd52157dbe Signed-off-by:
yacovm <yacovm@il.ibm.com>
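The idea of verifying the whole chain while committing only the missing suffix can be illustrated as follows. Everything here is a simplified assumption: `block`, `replay`, and `buildChain` are hypothetical, and the hash covers the previous hash plus the data rather than a real Fabric block header.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// block is a simplified, hypothetical stand-in for a ledger block.
type block struct {
	number   uint64
	prevHash []byte
	data     []byte
}

// hash covers the previous hash and the data (a simplification).
func (b block) hash() []byte {
	h := sha256.Sum256(append(append([]byte{}, b.prevHash...), b.data...))
	return h[:]
}

// replay verifies the hash chain over all pulled blocks, then commits only
// the blocks at or above ledgerHeight, i.e. those not committed before a crash.
func replay(blocks []block, ledgerHeight uint64, commit func(block)) error {
	for i := 1; i < len(blocks); i++ {
		if !bytes.Equal(blocks[i].prevHash, blocks[i-1].hash()) {
			return errors.New("hash chain is broken")
		}
	}
	for _, b := range blocks {
		if b.number < ledgerHeight {
			continue // already in the ledger
		}
		commit(b)
	}
	return nil
}

// buildChain creates n correctly linked blocks for the demonstration.
func buildChain(n int) []block {
	chain := make([]block, 0, n)
	var prev []byte
	for i := 0; i < n; i++ {
		b := block{number: uint64(i), prevHash: prev, data: []byte(fmt.Sprintf("block-%d", i))}
		chain = append(chain, b)
		prev = b.hash()
	}
	return chain
}

func main() {
	chain := buildChain(5)
	// Pretend blocks 0 and 1 were committed before the crash.
	err := replay(chain, 2, func(b block) { fmt.Println("committing block", b.number) })
	fmt.Println("error:", err)
}
```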
-
yacovm authored
This change set adds an option for a separate TLS listener for intra-cluster communication. Change-Id: I059e4d45ddeaf066017c758b83a3e7422783a403 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
yacovm authored
whena --> when Change-Id: I24dd0fb741849293e13db5c1bce9cce3c27934ef Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
This CR adds an UT to attest retransmission of etcd/raft MsgApp. Change-Id: Ic06003ae7da9dc1dcc991103e6748b55a47f04dc Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Change-Id: I64c232fe6433b9f6ca0dadbabfcc771f3f2c623f Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set changes etcdraft orderer node logic to participate in a subset of the channels in the network. Before this change set, when an orderer couldn't find its certificate in the channel config, it panicked. This is not wanted behavior for application channels, as we might want only a subset of the nodes that belong to the system channel to participate in a specific application channel. After this change set, if an OSN finds it doesn't need to participate in a channel, it simply presents an "inactive channel" to clients. This is needed because channel creation transactions are classified by whether the channel is known and registered in the registrar. Thus, if we did not instantiate any channel object, some nodes would act differently than other nodes with respect to classification of the target channel at the time of dispatching the transactions. Change-Id: I254d270135fb4d9a105d04de893ae8ccdd13f0d7 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set re-enables etcdraft to resume v2.0 development after v1.4 branch is cut. Change-Id: I384247a9de2763a207bbde4fa8e519d703241ad5 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
Artem Barger authored
This commit addresses code review comments left from the review of the CR related to FAB-12552. Change-Id: I04635d36d7076cf89d43f42aa5351692b8d1782e Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-