Paper Title
BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models
Paper Authors
Paper Abstract
Two linearly uncorrelated binary variables must also be independent, because non-linear dependence cannot manifest with only two possible states. This inherent linearity is the atom of dependency constituting any complex form of relationship. Inspired by this observation, we develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome. Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models, yielding convenient theoretical insight and striking Gaussian parallels. With BELIEF, one may study generalized linear models (GLMs) through transparent linear models, providing insight into how the choice of link affects modeling. For example, setting a GLM interaction coefficient to zero does not necessarily lead to the kind of no-interaction assumption understood under its linear model counterpart. Furthermore, for a binary response, maximum likelihood estimation for GLMs paradoxically fails under complete separation, when the data are most discriminative, whereas BELIEF estimation automatically reveals the perfect predictor in the data that is responsible for complete separation. We explore these phenomena and provide related theoretical results. We also provide a preliminary empirical demonstration of some of the theoretical results.
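The opening claim of the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration of that one claim, not the BELIEF estimator itself; the function name `binary_dependence` and the example probability tables are hypothetical. For a 2x2 joint distribution, each cell's deviation from the product of the marginals equals plus or minus the covariance, so zero covariance forces every cell to match the independence table.

```python
from fractions import Fraction


def binary_dependence(joint):
    """Return (covariance, deviations) for two binary variables.

    joint[x][y] is P(X = x, Y = y) for x, y in {0, 1}.
    deviations[x][y] is P(X = x, Y = y) - P(X = x) * P(Y = y).
    The name and interface are hypothetical, chosen for this illustration.
    """
    p = joint[1][0] + joint[1][1]  # P(X = 1)
    q = joint[0][1] + joint[1][1]  # P(Y = 1)
    cov = joint[1][1] - p * q      # Cov(X, Y) = E[XY] - E[X] E[Y]
    px = (1 - p, p)
    qy = (1 - q, q)
    deviations = [
        [joint[x][y] - px[x] * qy[y] for y in (0, 1)]
        for x in (0, 1)
    ]
    return cov, deviations


# A dependent table (assumed values): every deviation is +cov or -cov.
dependent = [
    [Fraction(3, 10), Fraction(2, 10)],
    [Fraction(1, 10), Fraction(4, 10)],
]
cov_dep, dev_dep = binary_dependence(dependent)

# An uncorrelated table (assumed values): all deviations vanish,
# so the joint distribution factorizes and X, Y are independent.
uncorrelated = [
    [Fraction(1, 5), Fraction(3, 10)],
    [Fraction(1, 5), Fraction(3, 10)],
]
cov_unc, dev_unc = binary_dependence(uncorrelated)
```

Exact fractions are used instead of floats so that the zero-covariance case compares equal to zero without rounding error.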