About xtdpdsys and xtabond2

Keywords: Stata xtabond, xtabond2, Stata xtabond2, Stata xtabond2 command

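The Arellano–Bond test (estat abond) checks whether the first-differenced errors show first-order and second-order autocorrelation, which is what the consistency of the GMM estimator depends on. First-order autocorrelation in the differenced errors is generally expected, because consecutive differences share a common error term. If there is no second-order or higher-order autocorrelation, however, the null hypothesis of "no autocorrelation in the errors" is not rejected.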
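Why first-order autocorrelation is expected while second-order autocorrelation is not can be seen in a short derivation. This is a sketch under the assumption (not stated in the original) that the level errors \varepsilon_{it} are serially uncorrelated with constant variance \sigma^2:

```latex
\[
\begin{aligned}
\Delta\varepsilon_{it} &= \varepsilon_{it} - \varepsilon_{i,t-1} \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\ \varepsilon_{i,t-1} - \varepsilon_{i,t-2})
   = -\sigma^2 \neq 0 \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-2})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\ \varepsilon_{i,t-2} - \varepsilon_{i,t-3})
   = 0
\end{aligned}
\]
```

A significant AR(1) statistic is therefore unremarkable; a significant AR(2) statistic is the one that signals serial correlation in the level errors.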

Description

Linear dynamic panel-data models include p lags of the dependent variable as covariates and contain unobserved panel-level effects, fixed or random. By construction, the unobserved panel-level effects are correlated with the lagged dependent variables, making standard estimators inconsistent. Arellano and Bond (1991) derived a consistent generalized method-of-moments (GMM) estimator for the parameters of this model; xtabond implements this estimator.

This estimator is designed for datasets with many panels and few periods, and it requires that there be no autocorrelation in the idiosyncratic errors. For a related estimator that uses additional moment conditions, but still requires no autocorrelation in the idiosyncratic errors, see [XT] xtdpdsys. For estimators that allow for some autocorrelation in the idiosyncratic errors, at the cost of a more complicated syntax, see [XT] xtdpd.

1. xtdpdsys is an official command, available since Stata 10, with a more concise syntax; xtabond2 is a user-written command published by Roodman (2009), with a more elaborate syntax.

2. xtdpdsys declares predetermined variables with the pre() option and endogenous variables with the endogenous() option. xtabond2 has no separate options for the two: both kinds of variable go in gmm(), and the lag range decides the treatment, with lag(1 .) (the default) for a predetermined variable and lag(2 .) for an endogenous one.

3. xtdpdsys does not print the Sargan and AR(2) statistics in its estimation output; run estat sargan and estat abond afterwards to obtain them. xtabond2 reports both automatically, along with the Hansen statistic.

For system GMM, xtdpdsys and xtdpd are a more concise way to write the code, but the estimator is basically the same as the one in xtabond2.
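The contrast between the two commands can be sketched on the abdata example dataset. This is an illustration, not a claim that the two calls give identical numbers: the choice of w as predetermined and k as endogenous is an assumption made only for the example, and xtabond2 has to be installed first.

```stata
* Sketch only. Assumes xtabond2 is installed: ssc install xtabond2
webuse abdata, clear

* Official command: w treated as predetermined, k as endogenous (illustrative choice)
xtdpdsys n yr1980-yr1984, lags(1) pre(w) endogenous(k) twostep vce(robust)
estat abond                      // AR tests must be requested after estimation

* xtabond2: the lag range inside gmm() encodes the same treatment
xtabond2 n L.n w k yr1980-yr1984, gmm(L.n) gmm(w, lag(1 .)) gmm(k, lag(2 .)) ///
    iv(yr1980-yr1984, eq(level)) twostep robust
```

The xtabond2 output already contains the AR(1)/AR(2) tests and the overidentification tests, so no postestimation call is needed there.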

The Stata website describes the differences between these commands as follows:
http://www.stata-press.com/manuals/stata10/xtintro.pdf

b. New estimation command xtdpdsys fits dynamic panel-data models by using the Arellano–Bover/Blundell–Bond system estimator. xtdpdsys is an extension of xtabond and produces estimates with smaller bias when the AR process is too persistent. xtdpdsys is also more efficient than xtabond. Whereas xtabond uses moment conditions based on the differenced errors in producing results, xtdpdsys uses moment conditions based on differences and levels. See [XT] xtdpdsys.

c. New estimation command xtdpd fits dynamic panel-data models extending the Arellano–Bond or the Arellano–Bover/Blundell–Bond system estimator and allows a richer syntax for specifying models and so will fit a broader class of models than either xtabond or xtdpdsys. xtdpd can be used to fit models with serially correlated idiosyncratic errors, whereas xtdpdsys and xtabond assume no serial correlation. xtdpd can be used with models where the structure of the predetermined variables is more complicated than that assumed by xtdpdsys or xtabond. See [XT] xtdpd.

d. New postestimation command estat abond tests for serial correlation in the first-differenced errors. See [XT] xtabond postestimation, [XT] xtdpdsys postestimation, and [XT] xtdpd postestimation.

e. New postestimation command estat sargan performs the Sargan test of overidentifying restrictions. See [XT] xtabond postestimation, [XT] xtdpdsys postestimation, and [XT] xtdpd postestimation.

clear

set more off

infile exp wks occ ind south smsa ms fem union ed blk lwage ///
    using "D:\软件培训资料\动态面板\aa.txt"

drop in 1

describe

summarize

generate person=group(595)

bysort person: generate period=group(7)

* panel data definition

xtset person period

xtdes

xtsum

generate exp2=exp^2

local x1 exp exp2 wks occ ind south smsa ms union

local x2 ed blk fem

* panel data regression: y=lwage

* x1=[1 exp exp2 wks occ ind south smsa ms union],

* x2=[ed blk fem] (time-invariant regressors)

xtdpdsys lwage occ ind south smsa, lags(1) maxldep(3) vce(robust) ///
    endogenous(ms union,lag(0,2)) pre(wks,lag(1,2)) twostep

estimates store ABB1

xtdpdsys lwage occ ind south smsa, lags(2) maxldep(3) vce(robust) ///
    endogenous(ms union,lag(0,2)) pre(wks,lag(1,2)) twostep

estimates store ABB2

xtdpdsys lwage occ ind south smsa, lags(3) maxldep(3) vce(robust) ///
    endogenous(ms union,lag(0,2)) pre(wks,lag(1,2)) twostep

estimates store ABB3

estimates table ABB1 ABB2 ABB3, b se t p

* hypothesis testing

quietly xtdpdsys lwage occ ind south smsa, lags(2) maxldep(3) ///
    endogenous(ms union,lag(0,2)) pre(wks,lag(1,2)) twostep artest(4)

estat abond // test for autocorrelation

estat sargan // test for IV overidentification

* xtabond2 call from a different application: its variables are not in the wage data above
xtabond2 df age age2 ed12 nwe12 perd2 perd3 perd4 lnrtb3 ///
    dna dnk dms dhrsw dhrsh dyu2, ///
    gmm(L.(lnrtb3 dms dna dnk dfu dyu2 dhrsh dhrsw), lag(3) collapse) ///
    iv(age age2 edCol edColp ednoHS) twostep robust ///
    noconstant small orthogonal art(3)

* Examples copied directly from the xtabond2 help file

use http://www.stata-press.com/data/r7/abdata.dta

xtabond2 n l.n l(0/1).(w k) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984, passthru) noleveleq small

xtabond2 n l.n l(0/1).(w k) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984, mz) robust twostep small h(2)

xtabond2 n l(1/2).n l(0/1).w l(0/2).(k ys) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984) robust twostep small

* Next two are equivalent, assuming id is the panel identifier

ivreg2 n cap (w = k ys rec) [pw=_n], cluster(ind) orthog(rec)

xtabond2 n w cap [pw=_n], iv(cap k ys, eq(level)) iv(rec, eq(level)) cluster(ind) h(1)

* Same for next two

regress n w k

xtabond2 n w k, iv(w k, eq(level)) small h(1)

* And next two, assuming xtabond updated since May 2004 with update command.

xtabond n yr*, lags(1) pre(w, lags(1,.)) pre(k, endog) robust small noconstant

xtabond2 n L.n w L.w k yr*, gmm(L.(w n k)) iv(yr*) noleveleq robust small

* And next two

xtdpd n L.n L(0/1).(w k) yr1978-yr1984, dgmm(w k n) lgmm(w k n) liv(yr1978-yr1984) vce(robust) two hascons

xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n)) iv(yr1978-yr1984, eq(level)) h(2) robust twostep

* Three ways to reduce the instrument count

xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n)) iv(yr1978-yr1984, eq(level)) h(2) robust twostep pca

xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n), collapse) iv(yr1978-yr1984, eq(level)) h(2) robust twostep

xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n), lag(1 1)) iv(yr1978-yr1984, eq(level)) h(2) robust twostep
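How much collapsing or limiting the lags reduces the instrument count can be worked out by hand. The sketch below assumes a balanced panel with T periods, a single variable instrumented with lags 2 and deeper, and the difference equation only (the level equation of system GMM adds further instruments):

```latex
\[
\begin{aligned}
\text{uncollapsed: } & \sum_{t=3}^{T} (t-2) = \frac{(T-2)(T-1)}{2}
  && \text{(one column per period and lag)} \\
\text{collapsed: } & T-2
  && \text{(one column per lag distance)} \\
\text{single lag only: } & T-2
  && \text{(one column per period)}
\end{aligned}
\]
```

With T = 9, for example, this gives 28 instruments uncollapsed against 7 in either restricted form, per instrumented variable.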

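Generalized Method of Moments (GMM)
I. Testing the regressors for endogeneity
First test whether the regressors are endogenous, using the Hausman test. The instrumental-variables (IV) method presupposes that there is an endogenous regressor. The null hypothesis of the Hausman test is that all regressors are exogenous. If it is rejected, at least one regressor is treated as endogenous and IV should be used; if it is not rejected, there is no evidence of endogeneity and OLS should be used.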
reg ldi lofdi
estimates store ols
xtivreg ldi (lofdi=l.lofdi ldep lexr)
estimates store iv
hausman iv ols
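The statistic behind the hausman call above can be written as follows. This is the standard textbook form, given here as a sketch; b and V denote the coefficient vectors and covariance matrices of the two stored estimates, and q is the number of coefficients being compared:

```latex
\[
H = (b_{IV} - b_{OLS})' \left[ V_{IV} - V_{OLS} \right]^{-1} (b_{IV} - b_{OLS})
\ \overset{a}{\sim}\ \chi^2(q)
\]
```

A large H means the two sets of estimates differ by more than sampling noise, which is evidence against the null that all regressors are exogenous.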
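(For instrumental variables with panel data, Stata provides the following command for 2SLS: xtivreg depvar [varlist_1] (varlist_2 = varlist_iv). Options include fe, re, and others, for fixed effects, random effects, and so on; see help xtivreg.)
If there are endogenous regressors, instruments should be used, and there must be at least as many instruments as endogenous regressors in the equation. In the exactly identified case, use 2SLS. In essence, 2SLS splits each endogenous regressor into two parts: the exogenous variation induced by the instruments, and the remainder, which is correlated with the error term. The dependent variable is then regressed on the exogenous part, which satisfies the OLS requirement that the regressors be predetermined and therefore yields a consistent estimator.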
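The two stages described above can be written compactly. In this sketch the notation is an assumption of the example: X collects all regressors, Z collects all instruments (including the exogenous regressors, which instrument themselves), and P_Z is the projection onto the columns of Z:

```latex
\[
\begin{aligned}
\text{Stage 1: } & \hat{X} = P_Z X, \qquad P_Z = Z (Z'Z)^{-1} Z' \\
\text{Stage 2: } & \hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1} \hat{X}' y
                  = (X' P_Z X)^{-1} X' P_Z y
\end{aligned}
\]
```

The fitted values \hat{X} are the "exogenous part" of the regressors; the first-stage residuals X - \hat{X} are the part that may be correlated with the error term and is discarded.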
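II. Testing for heteroskedasticity and autocorrelation
Under the assumption of spherical errors, 2SLS is the most efficient estimator. The tests below check whether that assumption holds.
Panel heteroskedasticity test:
xtgls enc invs exp imp esc mrl,igls panel(het)
estimates store hetero
xtgls enc invs exp imp esc mrl,igls
estimates store homo
local df = e(N_g) - 1
lrtest hetero homo, df(`df')
Panel autocorrelation test: xtserial enc invs exp imp esc mrl
If the errors are heteroskedastic or autocorrelated, a more efficient method exists, namely GMM. In a sense, GMM is to 2SLS what GLS is to OLS. In the exactly identified case, GMM reduces to the ordinary IV estimator; in the overidentified case the traditional method of moments breaks down, and only then is GMM needed. Overidentification test (J test): estat overid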
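The relationship between GMM and 2SLS described here can be made explicit with the linear GMM estimator. This is a sketch in standard notation (X regressors, Z instruments, W a weighting matrix, n observations), none of which appears in the original:

```latex
\[
\begin{aligned}
\hat{\beta}_{GMM} &= \arg\min_{\beta}\ (y - X\beta)' Z\, W\, Z' (y - X\beta)
                  = (X'Z\, W\, Z'X)^{-1} X'Z\, W\, Z'y \\
W = (Z'Z)^{-1} &\ \Rightarrow\ \hat{\beta}_{GMM} = \hat{\beta}_{2SLS} \\
W = \hat{S}^{-1},\quad \hat{S} = \tfrac{1}{n} \textstyle\sum_i \hat{\varepsilon}_i^2 z_i z_i'
  &\ \Rightarrow\ \text{efficient two-step GMM under heteroskedasticity}
\end{aligned}
\]
```

When the model is exactly identified, Z'X is square and invertible, the weighting matrix cancels, and the estimator collapses to (Z'X)^{-1} Z'y, the ordinary IV estimator.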
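III. Checking the instruments
Instruments: an instrument must be correlated with the endogenous regressor, yet uncorrelated with the error term of the equation. Because these two requirements often conflict, finding suitable instruments is hard in practice and takes considerable imagination and creativity. Lagged variables are a common choice.
Tests to run:
Testing the validity of the instruments:
(1) Test the correlation between the instruments and the endogenous regressors
If the instrument z is completely uncorrelated with the endogenous regressor, the IV method cannot be used. If the two are only weakly correlated, z is called a "weak instrument", and the consequences resemble those of too small a sample. A rule of thumb for detecting weak instruments: if the F statistic in the first-stage regression exceeds 10, weak instruments need not be a concern. Stata command: estat firststage (displays the first-stage regression statistics)
(2) Test the exogeneity of the instruments (not rejecting the null is the good outcome)
In the exactly identified case, there is no way to test whether the instruments are correlated with the error term. In the overidentified case (more instruments than endogenous variables), an overidentification test can be run on the null hypothesis H0 that all instruments are exogenous. If the null is rejected, at least one instrument is deemed not exogenous, that is, correlated with the error term.
Sargan statistic; Stata command: estat overid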
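The two checks above can be tried on one of Stata's shipped example datasets. This is a cross-sectional sketch rather than a panel one, and the dataset and variable names (hsng2, rent, hsngval, and so on) come from the Stata example data, not from this post:

```stata
* Sketch on Stata's hsng2 example data (cross-section, not a panel)
webuse hsng2, clear

* 2SLS with one endogenous regressor and several instruments (overidentified)
ivregress 2sls rent pcturban (hsngval = faminc i.region)
estat firststage      // first-stage statistics, including the F statistic
estat overid          // Sargan and Basmann overidentification tests

* Two-step GMM on the same model
ivregress gmm rent pcturban (hsngval = faminc i.region)
estat overid          // Hansen's J statistic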
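IV. GMM procedure
Entering the following commands in Stata runs a GMM estimation on panel data.
. ssc install ivreg2 (installs ivreg2)
. ssc install ranktest (installs ranktest, a helper program that ivreg2 needs at run time)
. use "traffic.dta" (opens the panel dataset)
. xtset panelvar timevar (declares the panel variable and the time variable)
. ivreg2 y x1 (x2=z1 z2),gmm2s (runs the GMM estimation; "2s" stands for two-step GMM)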

The Sargan test is a statistical test used to check for overidentifying restrictions in a statistical model. It is closely related to the Hansen test, or J test, of overidentifying restrictions. The Sargan test is based on the observation that the residuals should be uncorrelated with the set of exogenous variables if the instruments are truly exogenous. The Sargan test statistic can be calculated as TR² (the number of observations multiplied by the coefficient of determination) from the OLS regression of the residuals (from IV estimation) onto the set of exogenous variables. This statistic will be asymptotically chi-squared with m − k degrees of freedom (where m is the number of instruments and k is the number of endogenous variables) under the null that the error term is uncorrelated with the instruments.
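The calculation described above can be written in one line. In this sketch \hat{u} denotes the IV residuals, Z the full set of exogenous variables, P_Z the projection onto Z, and T the number of observations; R_u^2 is the uncentered R² of the auxiliary regression:

```latex
\[
S = T \cdot R_u^2
  = T \cdot \frac{\hat{u}' P_Z\, \hat{u}}{\hat{u}'\hat{u}}
  \ \overset{a}{\sim}\ \chi^2(m - k),
\qquad P_Z = Z (Z'Z)^{-1} Z'
\]
```

When m = k the model is exactly identified, the residuals are orthogonal to Z by construction, and the statistic is identically zero, which is why the test is only available in the overidentified case.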
