Paper title
Decentralized Validation for Non-malicious Arbitrary Fault Tolerance in Paxos
Paper authors
Abstract
Fault-tolerant distributed systems offer high reliability because they do not exhibit erroneous behavior even when faults occur in their components. Depending on the fault model adopted, hardware and software errors that do not cause a process to crash are usually not tolerated. The usual way to tolerate these rather common failures is to adopt a stronger fault model, such as the arbitrary or Byzantine fault model. Algorithms designed for this fault model, however, are considerably more complex and require more system resources than those developed for less strict fault models. One approach to reaching a middle ground is the non-malicious arbitrary fault model. This model assumes that faults not created with malicious intent can be detected and filtered with a given probability, which allows them to be isolated and mapped to benign faults. In this paper we describe how we extended an implementation of active replication in the non-malicious fault model with a basic type of distributed validation, in which a deviation from the expected algorithm behavior makes a process crash. We experimentally evaluate this implementation using a fault injection framework, showing that it is feasible to extend the concept of non-malicious failures beyond hardware failures.
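The idea of mapping non-malicious arbitrary faults to benign ones can be illustrated with a minimal sketch. This is not the paper's implementation: the class names, the CRC-based checksum, and the single state invariant are all assumptions chosen for illustration. A Paxos-style acceptor drops messages that fail an integrity check (turning corruption into an omission) and halts itself when its own state deviates from what the algorithm allows (turning state corruption into a crash).

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple


class ProcessCrash(Exception):
    """Models a process halting itself, i.e. a benign crash fault."""


def _digest(ballot: int, value: bytes) -> int:
    # Hypothetical integrity check: a CRC over the ballot and the value.
    return zlib.crc32(str(ballot).encode() + b"|" + value)


@dataclass(frozen=True)
class Accept:
    """Phase 2a-style message; the checksum field is an assumption of this sketch."""
    ballot: int
    value: bytes
    checksum: int


def make_accept(ballot: int, value: bytes) -> Accept:
    """Build a message whose checksum covers the ballot and the value."""
    return Accept(ballot, value, _digest(ballot, value))


class Acceptor:
    def __init__(self) -> None:
        self.promised = 0  # highest ballot seen so far
        self.accepted: Optional[Tuple[int, bytes]] = None  # last accepted (ballot, value)

    def _validate_state(self) -> None:
        # Assumed invariant: an accepted ballot never exceeds the promised ballot.
        # A violation means local state was corrupted, so the process crashes.
        if self.accepted is not None and self.accepted[0] > self.promised:
            raise ProcessCrash("accepted ballot exceeds promised ballot")

    def on_accept(self, msg: Accept) -> bool:
        self._validate_state()
        # A corrupted message is filtered out, which maps it to an omission.
        if _digest(msg.ballot, msg.value) != msg.checksum:
            return False
        # A stale ballot is rejected, as in ordinary Paxos.
        if msg.ballot < self.promised:
            return False
        self.promised = msg.ballot
        self.accepted = (msg.ballot, msg.value)
        self._validate_state()
        return True
```

In this sketch, `Acceptor().on_accept(make_accept(1, b"x"))` accepts the value, a message with a tampered checksum is silently dropped, and an acceptor whose state has been corrupted raises `ProcessCrash` on its next step. A CRC only catches accidental corruption, which matches the non-malicious assumption; it offers no protection against an adversary.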