- 03 Mar, 2019 37 commits
-
-
Jay Guo authored
`checkPeers` should be supplied to `nwo.InstantiateChaincode` if it is called separately, so that it checks that the instantiation has completed. Otherwise, an immediate query/invoke that follows would fail, because the deploy tx may not be committed yet. Change-Id: I8e870b183c279aca53961745031fdee7085efe18 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This CR adds more debug logs to etcdraft chain to facilitate debugging. Change-Id: I2d70869bc8823babb3ab50782bd4472637ed5820 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set fixes a nil pointer panic that happens when onboarding a node when the system channel has no application channels and only 2 blocks. The panic happens because the LastConfigBlock index is 1 and the previous block (index 0) was never pulled, so the previous block passed into the VerifyBlocks method is nil. Although the genesis block of the system channel cannot, in general, be verified by the block puller, it can be verified by the block puller that is used for listing the channels, because that puller doesn't perform signature checks on system channel blocks and instead uses backward hash chain verification from the bootstrap block. Change-Id: I5aaaffa79da637463da1689b1c6167e586f64f44 Signed-off-by:
yacovm <yacovm@il.ibm.com>
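The backward hash chain verification mentioned above can be sketched as follows. This is a minimal illustration, not the orderer's code: `Block`, its `Hash` method and `verifyBackward` are hypothetical, and the simplified digest below stands in for Fabric's own encoding of the block header. Starting from a trusted block (such as the bootstrap block), each earlier block is accepted only if its hash equals the PrevHash recorded by its successor, so no signatures are needed.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Block is a hypothetical, simplified block (not Fabric's common.Block).
type Block struct {
	Number   uint64
	PrevHash []byte
	Data     []byte
}

// Hash returns a simplified digest of the block; Fabric hashes an encoded header instead.
func (b Block) Hash() []byte {
	h := sha256.New()
	fmt.Fprintf(h, "%d", b.Number)
	h.Write(b.PrevHash)
	h.Write(b.Data)
	return h.Sum(nil)
}

// verifyBackward walks from the trusted block back towards genesis and checks
// that every earlier block hashes to the PrevHash stored in its successor.
func verifyBackward(trusted Block, earlier []Block) error {
	next := trusted
	for i := len(earlier) - 1; i >= 0; i-- {
		b := earlier[i]
		if b.Number+1 != next.Number {
			return fmt.Errorf("block %d is not the predecessor of block %d", b.Number, next.Number)
		}
		if !bytes.Equal(b.Hash(), next.PrevHash) {
			return fmt.Errorf("hash of block %d does not match PrevHash of block %d", b.Number, next.Number)
		}
		next = b
	}
	return nil
}

func main() {
	genesis := Block{Number: 0, Data: []byte("genesis")}
	bootstrap := Block{Number: 1, PrevHash: genesis.Hash(), Data: []byte("config")}
	fmt.Println(verifyBackward(bootstrap, []Block{genesis})) // <nil>

	tampered := genesis
	tampered.Data = []byte("forged")
	fmt.Println(verifyBackward(bootstrap, []Block{tampered}) != nil) // true
}
```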
-
Yoav Tock authored
This is the second of four (2/4) sub-tasks that focus on the "green" path of consensus-type migration from Kafka to Raft. By "green" we mean that there are no failures or aborts along the way. The 4 sub-tasks are staged in a way that minimizes dependencies between them. In this sub-task we introduce a new package: orderer/consensus/migration. In this package we introduce the migration Status, which holds the state and context of each chain, and the migration Stepper, which enforces the migration state machine in the Kafka chain, post-ordering (i.e. after messages are consumed from Kafka). It also defines the interface of the migration Controller, which is implemented by the Registrar. See the respective JIRA item for further details. Change-Id: Ifaad95bcf3bc8fb889ab5f65e817bfe1ebdfa771 Signed-off-by:
Yoav Tock <tock@il.ibm.com>
-
Jay Guo authored
After reconnecting the leader to the network, we tick it to resend previous data. However, if it's ticked excessively, it may step down to follower because `CheckQuorum` is enabled, which causes the test to fail. Instead of ticking it, this CR changes one test to simply enqueue one more tx to trigger a resend of previous data. Change-Id: I211c7bf59dc6322509336ed8b120d869ea1f42f6 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Instead of taking a snapshot every N blocks, this CR changes it to taking a snapshot every N bytes. It also sets the default SnapshotInterval to 100MB if it's unset; otherwise data in memory is never compacted until OOM. Meanwhile, DefaultSnapshotCatchUpEntries is reduced so that preserving extra entries every time a snapshot is taken does not take too much space. Slow nodes catch up using the BlockPuller, which is also efficient. Change-Id: I79cfeb8652fcbafdeb5793bf4f06267b95a858d6 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
The function is no longer used. Change-Id: Ic19762fd1cfa0d691816220e74977ba9efcce366 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set makes consensus messages be sent asynchronously, over a buffered channel with a size of 10. Consensus messages are dropped when the buffer overflows, and Submit messages block on the buffer. Without this change set, sending large messages takes milliseconds; with it, it takes microseconds. Change-Id: Id60b05b96eed6d9d04f89b8967945b18ddfbef94 Signed-off-by:
yacovm <yacovm@il.ibm.com>
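The two send policies described above can be illustrated with a minimal Go sketch. `asyncStream`, `message` and `errOverflow` are hypothetical names, and the goroutine that would drain the buffer onto the gRPC stream is omitted. A non-blocking `select` with a `default` case drops consensus messages when the buffer is full, while Submit messages use a plain channel send and therefore wait for room.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a hypothetical stand-in for a cluster request.
type message struct {
	consensus bool // true for consensus traffic, false for a forwarded Submit
	payload   string
}

var errOverflow = errors.New("send buffer overflow, consensus message dropped")

// asyncStream buffers outgoing messages; a separate goroutine (not shown)
// would drain buf onto the network stream.
type asyncStream struct {
	buf chan message
}

func newAsyncStream(size int) *asyncStream {
	return &asyncStream{buf: make(chan message, size)}
}

// send enqueues m: Submit messages block while the buffer is full,
// consensus messages are dropped instead.
func (s *asyncStream) send(m message) error {
	if !m.consensus {
		s.buf <- m // blocks until there is room
		return nil
	}
	select {
	case s.buf <- m:
		return nil
	default:
		return errOverflow
	}
}

func main() {
	s := newAsyncStream(10) // buffer size from the commit message
	dropped := 0
	for i := 0; i < 12; i++ {
		if err := s.send(message{consensus: true, payload: fmt.Sprint(i)}); err != nil {
			dropped++
		}
	}
	fmt.Println(len(s.buf), dropped) // 10 2
}
```

Dropping is acceptable for consensus traffic because Raft retransmits lost messages, whereas a dropped Submit would lose a client transaction.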
-
yacovm authored
This change set makes the orderer validate that the bootstrap block it is spawned with (or that is created from configtx.yaml) contains a ConsortiumsConfig. Change-Id: I26abf8ac8719fbb472351b036137debc7a911665 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
This makes assertion failure more debuggable. Change-Id: I66f8ac8c9b755eaab37f89a10a39c3bfa44ef39a Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
When a snapshot is taken, stale etcdraft WAL files should be purged to free disk space, as well as old snapshot files. However, we still keep several snapshot files around: in case the latest file is corrupted, etcdraft automatically loads an older one, until there's none left. Change-Id: I2b8168dbc0c3e5bd56a081c104dd7dc9defbcd92 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set removes the Step RPC from the cluster protobuf, renames the Submit stream to a Step stream, and makes both transaction forwarding and consensus messages use the new Step stream. It also gives both egress Send() and Recv() a maximum timeout (the RPC timeout in the config). A Send or Recv that is used to send a consensus message, or to send a transaction (or receive its status), now aborts when that timeout expires, in order to protect against any liveness issue on the remote node and to return an answer to clients in a timely manner. Change-Id: Id942b248212f5c324e12af34fce48f96fdbb6aea Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
The integration test that checks that an orderer is evicted from a channel and stops its service for the channel broke due to: 1) A log message that it used as an indicator was removed in a parallel CR. 2) In another parallel CR, the communication layer was changed to put messages into the log asynchronously without blocking; as a result, a node might be evicted from the channel, but the other nodes may close the connection to it before it has a chance to obtain the block that evicts it from the channel. For (1), the check for the message that no longer exists was removed from the test. For (2), the node that is removed is now always the leader; this way it always gets the block update (because it sends it in the first place). Change-Id: Ib67d1a448447ef44d9b41f52c8ee8bddb6b064ce Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set adds an integration test that removes an OSN from an application channel and system channel and ensures that the OSN gracefully shuts down for these channels. Change-Id: Idcdad8083f5881c6194185ad5f623c9c64323a02 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
If something in AfterEach hangs until the test times out, the resulting coredump might cover up the actual assertion failure in the test body. This CR fixes this in the etcdraft UT. It also adds LongEventualTimeout in some places in etcdraft chain_test to prevent flakes due to slow WAL sync. Change-Id: I585e59e5eb587f9e9eb082c5eb8f681141b16e55 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Yoav Tock authored
This is the first of four (1/4) sub-tasks that focus on the "green" path of consensus-type migration from Kafka to Raft. By "green" we mean that there are no failures or aborts along the way. The flow of the green path and the changes made in these 4 tasks are described below. The 4 sub-tasks are staged in a way that minimizes dependencies between them.
In this sub-task we introduce changes to the orderer/common/bootstrap package (see details below). In essence, just before the last config block of the system channel (COMMIT) is written to the ledger, the bootstrap file (a.k.a. "genesis.block", not to be confused with the first block of the ledger) is swapped with the last block of the system channel. This sub-task extends package orderer/common/bootstrap to support this functionality. See the respective JIRA item for further details.
The "green" path for migration is the following:
1. Start with a Kafka-based ordering service.
2. Send a config update tx (START-TX) on the system channel that:
   - Has ConsensusType.MigrationState=START
   - This will disable the creation of new channels
   - This will disable the processing of normal (standard channel) transactions
3. Wait until the START-TX is committed and get the block height H of that tx.
4. Send a config update tx (CONTEXT-TX) on each of the standard channels that:
   - Has ConsensusType.MigrationState=CONTEXT
   - Has ConsensusType.MigrationContext=H
   - Has ConsensusType.Type="etcdraft"
   - Has ConsensusType.Metadata=<a marshaled etcdraft metadata: Consenters, Options, etc>
5. Send a config update tx (COMMIT-TX) on the system channel that:
   - Has ConsensusType.MigrationState=COMMIT
   - Has ConsensusType.MigrationContext=H
   - Has ConsensusType.Type="etcdraft"
   - Has ConsensusType.Metadata=<a marshaled etcdraft metadata: Consenters, Options, etc>
   - The metadata should be the same as for the standard channels, with the same precautions.
   - If committed successfully, no further configuration will be possible
6. Restart each orderer:
   - The orderer will bootstrap into an etcdraft mode
   - Each channel will form a cluster
   - Normal transactions can resume now
7. In order to configure the channels (system or standard), make sure that the first config update tx (on any given channel) after migration:
   - Has ConsensusType.MigrationState=NONE
   - Has ConsensusType.MigrationS=NONE
   - In addition to other changes to the channel's config.
Change-Id: Iccd146bb7260bafa4e4d8c4ee457d2ac19f5a642 Signed-off-by:
Yoav Tock <tock@il.ibm.com>
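The green-path transitions listed above can be summarized as a small state machine. The sketch below is hypothetical and is not the orderer's validation code: it only encodes the transitions named in steps 2, 4, 5 and 7, separately for the system channel and standard channels, treats ordinary NONE to NONE config updates as allowed, and does not model the requirement that the step 7 update happen only after the restart into etcdraft mode.

```go
package main

import "fmt"

type migrationState int

const (
	stateNone migrationState = iota
	stateStart
	stateContext
	stateCommit
)

type transition struct{ from, to migrationState }

// Green-path transitions on the system channel.
var systemGreenPath = map[transition]bool{
	{stateNone, stateNone}:    true, // ordinary config update, no migration
	{stateNone, stateStart}:   true, // step 2: START-TX
	{stateStart, stateCommit}: true, // step 5: COMMIT-TX
	{stateCommit, stateNone}:  true, // step 7: first config update after the restart
}

// Green-path transitions on a standard channel.
var standardGreenPath = map[transition]bool{
	{stateNone, stateNone}:    true, // ordinary config update, no migration
	{stateNone, stateContext}: true, // step 4: CONTEXT-TX
	{stateContext, stateNone}: true, // step 7: first config update after the restart
}

// allowed reports whether a config update moving from -> to is on the green path.
func allowed(isSystem bool, from, to migrationState) bool {
	if isSystem {
		return systemGreenPath[transition{from, to}]
	}
	return standardGreenPath[transition{from, to}]
}

func main() {
	fmt.Println(allowed(true, stateNone, stateStart))    // true
	fmt.Println(allowed(false, stateNone, stateStart))   // false
	fmt.Println(allowed(false, stateNone, stateContext)) // true
}
```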
-
Jay Guo authored
The creation of BlockPuller takes latest certificates, therefore should be done on-demand to guarantee its validity. Change-Id: I327275da495a85126feb58c84b460bed98f7b860 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Every failed attempt to send a Step request is logged at ERROR level, which pollutes the etcdraft leader's logs when a follower is down. This CR changes it to log at DEBUG level, except for the first failed attempt and the first successful delivery after failure(s). Test done: manually ran the CFT integration test and inspected the logs. Change-Id: I1dd3468de1f6745f658c15e83a5e644e0b0492d6 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This CR moves Raft snapshotting into a goroutine to avoid excessive snapshotting caused by an extremely small SnapshotInterval. This is also preparation for WAL file pruning. Change-Id: Ib43a2197c533bdc224a4bc52ff6cb418b62a0c33 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This CR hopefully makes occasional UT timeouts more debuggable. It also fixes a goroutine leak in UT. Change-Id: Ia0ac63b2394061dd13a570d71eae6b4139fb73b0 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Change-Id: Ie80ff2f11de59a216a94fc61330f9d625ed16e59 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
When CheckQuorum is enabled, the leader steps down if it cannot reach a quorum of the network, so that clients have a chance to disconnect and try other nodes. Change-Id: I901c0e3009f9d354a2b504fe16174432345055b3 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
In etcdraft UT, we often need to deterministically elect a leader. This was done by ticking ONLY one node in the network, so that it is the only node that starts a campaign. HOWEVER, there are several problems with this approach:
1. It's slow. We need a real time interval between ticks due to the way the fake clock is implemented: it drops ticks on the floor in case of a slow consumer.
2. There is a random factor in the election timeout of etcd/raft. It is calculated as follows:
```
randomElectionTimeout = electionTimeout + rand.Intn(electionTimeout)
```
In other words, sending electionTimeout ticks is not guaranteed to trigger a leader election.
3. If CheckQuorum is enabled, a lease is imposed on follower nodes, which expires if electionTimeout <= elapsedTicks < randomElectionTimeout (once elapsedTicks reaches randomElectionTimeout, it's reset to 0 and the node starts a campaign).
In this CR, we send an artificial MsgTimeoutNow to the node to be elected. This message reliably triggers a campaign and skips the lease check. This CR also fixes several potential data races and flakes in tests. Change-Id: I3c8e0bcadbb8cfa1ae3393de2ea711fdd0d8b7aa Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Tests may query the chaincode too soon after invocation, before the block is actually committed. Change-Id: I4159fb2dfb31310eccfd64fcb9a9a99ceef54db0 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Increase the default etcdraft tick interval to 500ms, for several reasons:
- in a WAN/cloud environment, this is more realistic
- WAL sync in CI often exceeds 1s, which prevents heartbeats from being sent in time. Increasing the election timeout decreases the chance of an unexpected leader failover.
This CR also increases the default timeout of the peer command, because it now takes 5~10s to elect a leader for a newly created channel, and `peer channel create` can only retrieve the genesis block of that channel when a leader exists (the Deliver API returns an error if there is no leader). Note that this value is still configurable by users. Change-Id: I94fbbc750fa096cce6ef9e2d65eb981c6202b675 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Yoav Tock authored
This task addresses two issues:
1) In common/multichannel/Bundle.ValidateNew() it is possible to identify the system channel using:
   _, isSys := b.ConsortiumsConfig()
   This can be used to refine validateMigrationStep() so that it deals more accurately with the migration-state transitions on the system vs. standard channels.
2) In addition, prevent users from adding ConsortiumsConfig() to standard channels. This protects multichannel.Registrar from blowing up on the next initialization.
Explanation:
- In multichannel.Registrar, _, ok := ledgerResources.ConsortiumsConfig() is used to identify the system channel. If two system channels are identified, the code panics.
- In Bundle.ValidateNew(), there is currently no mechanism to prevent a user (orderer admin) from adding a ConsortiumsConfig() to a standard channel. If a user does that, multichannel.Registrar will blow up on the next initialization.
Change-Id: Ia7551cbd27389a9988757af0224abdc0d1bfef5b Signed-off-by:
Yoav Tock <tock@il.ibm.com>
-
Yoav Tock authored
Update documentation of ConsensusType in protos/orderer/configuration.proto to reflect the implementation:
- spell out permitted type strings: "solo" / "kafka" / "etcdraft"
- update migration_state for which messages are permitted on system / standard channels
- update migration_context for what is required on each migration_state
- make protos
Change-Id: Ia27d9cd162fe6656fd2bd56ceaf5aae8a6fe5222 Signed-off-by:
Yoav Tock <tock@il.ibm.com>
-
yacovm authored
This change set addresses code review comments for FAB-13363. Namely: - Deletion of the redundant logger instance in the server main.go Change-Id: I0ee0db21894a352c7d1679efc27524162723a895 Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
yacovm authored
This change set connects the block verification infrastructure for onboarding to the production code. Now, whenever an orderer onboards a channel, it also verifies the blocks of the application channels by:
1) Creating a bundle from the genesis block, which is derived from the system channel (which is verified using backward hash chain validation).
2) Verifying blocks using the bundle.
3) Replacing the bundle with a new bundle whenever a config block is pulled.
It also adds a check to the integration test that ensures no errors are reported in the log of the onboarded OSN.
Change-Id: I3c5714f9d4491cdfd78e4e47407925136906d413 Signed-off-by:
yacovm <yacovm@il.ibm.com> Signed-off-by:
Artem Barger <bartem@il.ibm.com>
-
Jay Guo authored
When the gRPC buffer of the `Submit` stream is full, `SendSubmit` blocks, which freezes the `serveRequest` goroutine. This CR moves the call out of that goroutine, so that clients are blocked while waiting for room in the buffer instead. Change-Id: I62cd261b9419bd8df3fa1bfaeff14551168d2e65 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
yacovm authored
This change set adds the following supporting structs for verifying blocks pulled during onboarding in future CRs:
- Ledger interceptor: intercepts the commit of a block and invokes a callback.
- VerificationRegistry: tracks commits of config blocks and builds channelconfig bundles from them, in order to support verification of pulled blocks.
- BlockVerifierAssembler and BlockValidationPolicyVerifier: together they build block verifiers out of config blocks.
- verifierLoader: loads a mapping of chainID->cluster.BlockVerifier, which is to be used at OSN startup to preload the existing verifiers. It is needed in cases where we recover from a crash, or where we do dynamic onboarding and the previous config blocks were committed to the ledger before the OSN was started.
In the next CR, I will wire all these into the onboarding infrastructure itself, and they will be used to hold the latest bundle per channel in order to verify block signatures.
Change-Id: Ic9fc99243baa5c2cef97103d001180207414d98a Signed-off-by:
yacovm <yacovm@il.ibm.com>
-
Jay Guo authored
If there are MaxInflightMsgs blocks proposed but not yet committed, the chain blocks further incoming requests. Change-Id: I58c84e23c882ccc152e5c9a248434e466a8b5266 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
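The in-flight limit described above behaves like a counting semaphore, which the following minimal sketch models with a buffered channel. `inflightLimiter` and its methods are hypothetical, and the actual chain may track in-flight blocks differently. `propose` blocks once the limit is reached; `tryPropose` is a non-blocking variant used here to make the behavior observable.

```go
package main

import "fmt"

// inflightLimiter allows at most cap(slots) proposed-but-uncommitted blocks.
type inflightLimiter struct {
	slots chan struct{}
}

func newInflightLimiter(max int) *inflightLimiter {
	return &inflightLimiter{slots: make(chan struct{}, max)}
}

// propose blocks while the maximum number of blocks is in flight.
func (l *inflightLimiter) propose() { l.slots <- struct{}{} }

// tryPropose reserves a slot if one is free and reports whether it succeeded.
func (l *inflightLimiter) tryPropose() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// committed releases the slot held by one in-flight block.
func (l *inflightLimiter) committed() { <-l.slots }

// inflight returns the number of proposed but uncommitted blocks.
func (l *inflightLimiter) inflight() int { return len(l.slots) }

func main() {
	const maxInflight = 3 // stands in for MaxInflightMsgs
	l := newInflightLimiter(maxInflight)
	for i := 0; i < 5; i++ {
		fmt.Printf("block %d proposed: %v\n", i, l.tryPropose())
	}
	l.committed()
	fmt.Println("after one commit:", l.tryPropose())
}
```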
-
Jay Guo authored
This CR changes Errored to return a channel that is closed when the node becomes a candidate. Change-Id: Ibd0ece763b9d93c4da93825d1b302ecc55a9b32e Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
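A channel that is closed when the node becomes a candidate can be sketched as below. `raftChain`, `becomeCandidate` and `becomeFollowerOrLeader` are hypothetical; reopening the channel once a leader is known is an assumed policy that the commit message does not describe. Closing a channel wakes every goroutine waiting on it, which lets all Deliver handlers notice the state change at once.

```go
package main

import (
	"fmt"
	"sync"
)

// raftChain holds only the state needed to illustrate Errored().
type raftChain struct {
	mu     sync.Mutex
	errorC chan struct{} // closed while the node is a candidate
}

func newChain() *raftChain {
	return &raftChain{errorC: make(chan struct{})}
}

// Errored returns a channel that is closed once the node becomes a candidate.
func (c *raftChain) Errored() <-chan struct{} {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.errorC
}

// becomeCandidate closes errorC; the select makes repeated calls safe.
func (c *raftChain) becomeCandidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	select {
	case <-c.errorC: // already closed
	default:
		close(c.errorC)
	}
}

// becomeFollowerOrLeader replaces a closed errorC with a fresh, open one.
func (c *raftChain) becomeFollowerOrLeader() {
	c.mu.Lock()
	defer c.mu.Unlock()
	select {
	case <-c.errorC:
		c.errorC = make(chan struct{})
	default:
	}
}

func main() {
	c := newChain()
	errC := c.Errored()
	go c.becomeCandidate() // e.g. the leader was lost and an election starts
	<-errC                 // a Deliver handler waiting here would stop serving
	fmt.Println("chain errored: node is campaigning")
	c.becomeFollowerOrLeader()
}
```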
-
Jay Guo authored
Store the raft SoftState in the raft chain so that it returns an error while an election is ongoing. This prevents a disconnected follower from returning success on the Broadcast API. Change-Id: Ib6619b230938f0d6c10240b8cd8e34e346056145 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
This CR changes the type of the etcdraft observe channel from uint64 to raft.SoftState, so that chain_test can assert not only the leader ID but also the state of the node. Change-Id: Ia0c5f8c9060c234ceb84133e0c5598ed064dd1ee Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
Add a lock to guard manipulation of `StepStub`. Change-Id: Icaadb1f5aea0cb7f266f24ed6756c4f6541768bd Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
Jay Guo authored
A newly elected raft leader should wait for in-flight blocks to be committed before accepting new envelopes and creating new blocks. Otherwise, all blocks created in the meantime would be uncle blocks, which Fabric does not permit. Change-Id: Ia5adac185263735eace1fc805ebea0f5c98b2fb1 Signed-off-by:
Jay Guo <guojiannan1101@gmail.com>
-
- 02 Mar, 2019 1 commit
-
-
David Enyeart authored
Add a hint about root cause - application capability V1_3 required. Change-Id: I4c72c4972f31732dff2e6aadd303853d5a1c79e7 Signed-off-by:
David Enyeart <enyeart@us.ibm.com>
-
- 01 Mar, 2019 2 commits
-
-
Jason Yellick authored
-
Yacov Manevich authored
* changes:
[FAB-13447] Streamline the code
[FAB-13178] A dumb version of etcdraft BlockCreator
[FAB-13178] Remove global leader var in etcdraft chain
[FAB-13178] Move raft logic to its own file
[FAB-13178] do not accept new env when conf in flight
[FAB-13178] Refactor etcdraft chain to avoid sync
[FAB-13694] Move LastConfigBlock to orderer common
[FAB-13698] disable flaky test TestReconnect
[FAB-13643] Leader crash and failover integration test
FAB-13265 migration status in channelconfig
FAB-12984 consensus migration protos
[FAB-13633] Make Step RPC failures non blocking
[FAB-13178] Simplify the proposition of config block
[FAB-11996] Fix failed UT
[FAB-13481] Make onboarding code more idiomatic
[FAB-13495] Activate onboarding max retries
FAB-12983 capability V2_0 for Kafka2RaftMigration
-