论文标题
欺骗哈希的技巧:countssketch的鲁棒性的紧密下限以自适应输入
Tricking the Hashing Trick: A Tight Lower Bound on the Robustness of CountSketch to Adaptive Inputs
论文作者
论文摘要
Countsketch和功能哈希(“ Hashing Trick”)是流行的随机尺寸缩小方法,可支持$ \ ell_2 $ -2 $ -HEAVY击球手的恢复(键$ i $ whene $ v_i^2>ε\ | \ boldsymbol {v} \ | _2^2^$)和近似内部产品。当输入为{\ em不自适应}(不依赖于先前的输出)时,应用于尺寸$ o(\ ell/ε)$草图的经典估计器对于在$ \ ell $中指数的许多查询中都是准确的。但是,当输入是自适应的时,可以在$ o(\ ell)$ QUERIES中使用经典估算器进行构建对抗输入,而最知名的稳健估计器仅支持$ \ tilde {o}(\ ell^2)$查询。在这项工作中,我们表明,这种二次依赖性在某种意义上是固有的:我们设计了一种攻击,$ O(\ ell^2)$ Queries会产生一个对抗性输入向量,其草图具有很大的偏差。我们的攻击使用“天然”非自适应输入(仅选择最终的对抗输入),并普遍适用任何正确的估计器,包括攻击者未知的估计器。在此,我们暴露了这种基本方法的固有脆弱性。
CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $\ell_2$-heavy hitters (keys $i$ where $v_i^2 > ε\|\boldsymbol{v}\|_2^2$) and approximate inner products. When the inputs are {\em not adaptive} (do not depend on prior outputs), classic estimators applied to a sketch of size $O(\ell/ε)$ are accurate for a number of queries that is exponential in $\ell$. When inputs are adaptive, however, an adversarial input can be constructed after $O(\ell)$ queries with the classic estimator and the best known robust estimator only supports $\tilde{O}(\ell^2)$ queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after $O(\ell^2)$ queries produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive inputs (only the final adversarial input is chosen adaptively) and universally applies with any correct estimator, including one that is unknown to the attacker. In that, we expose inherent vulnerability of this fundamental method.