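Topic: Optimal decorrelated score subsampling for generalized linear models with massive data
Time: May 11, 2023, 10:00-11:30
Venue: Tencent Meeting: 748-682-688
Host: Professor Jiang Rong

About the speaker:
Wang Lei is an associate research fellow and doctoral supervisor at the School of Statistics and Data Science, Nankai University. His research interests are complex data analysis and statistical learning. He has published more than 50 papers in statistics journals including Biometrika, SCIENCE CHINA Mathematics, Bernoulli, and Statistica Sinica, and has led three projects funded by the National Natural Science Foundation of China and one funded by the Tianjin Natural Science Foundation.

Abstract:
In this paper, we consider unified optimal subsampling estimation and inference for a low-dimensional parameter of main interest in the presence of a nuisance parameter, for low- and high-dimensional generalized linear models (GLMs) with massive data. We first present a general subsampling decorrelated score function to reduce the influence of the less accurate nuisance parameter estimate, which has a slow convergence rate. The consistency and asymptotic normality of the resulting subsample estimator from a general decorrelated score subsampling algorithm are established, and two optimal subsampling probabilities are derived under the A- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposed optimal subsampling probabilities provably improve asymptotic efficiency over the subsampling schemes in low-dimensional GLMs and perform better than the uniform subsampling scheme in high-dimensional GLMs. A two-step algorithm is further proposed for implementation, and the asymptotic properties of the corresponding estimators are also given. Simulations show satisfactory performance of the proposed estimators, and two applications, to the census income and Fashion-MNIST datasets, also demonstrate their practical applicability.

Notes on earning "Second Classroom credit" for this event (online):
① Tencent Meeting: after joining the meeting, change your display name to your student ID + name.
② After the lecture begins, staff will record attendance information at two arbitrary points during the session and cross-check it. Attendees who are successfully matched will be awarded Second Classroom credit points.
③ Please attend the entire lecture and do not repeatedly leave and rejoin. While listening, make sure your display name follows the required format; otherwise the final review will not pass and you will not receive Second Classroom credit points.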