What to do when the R-squared is negative in a 2SLS regression? (Something people rarely tell you)
Xiaozhu (Econometrics Circle)
International Economics and Trade Research Group, Econometrics Circle, June 7

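The International Economics and Trade Research Group answers as follows:
In one sentence: according to Sribney et al. (2005), R² can be negative in 2SLS, that is, RSS > TSS, and they show by simulation that this has no bearing on judging whether the model is good or bad. For a fuller explanation, see the English material below.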
Background
Two-stage least-squares (2SLS) estimates, or instrumental variables (IV) estimates, are obtained in Stata using the ivregress command.
ivregress sometimes reports no R² and returns a negative value for the model sum of squares in e(mss).
Three-stage least-squares (3SLS) estimates are obtained using reg3. reg3 sometimes reports a negative R² and model sum of squares. The discussion below focuses on 2SLS/IV; the issues for 3SLS are the same.
The short answer
Missing R²s, negative R²s, and negative model sums of squares are all the same issue.
Stata’s ivregress command suppresses the printing of an R² on 2SLS/IV if the R² is negative, which is to say, if the model sum of squares is negative.
Whether a negative R² should be reported or simply suppressed is a matter of taste. At any rate, the R² really has no statistical meaning in the context of 2SLS/IV.
If it makes you feel better, you can compute the R² yourself from the returned results (see the An example section of the FAQ).
For two-stage least squares, some of the regressors enter the model as instruments when the parameters are estimated. However, since our goal is to estimate the structural model, the actual values, not the instruments for the endogenous right-hand-side variables, are used to determine the model sum of squares (MSS). The model’s residuals are computed over a set of regressors different from those used to fit the model. This means a constant-only model of the dependent variable is not nested within the two-stage least-squares model, even though the two-stage model estimates an intercept, and the residual sum of squares (RSS) is no longer constrained to be smaller than the total sum of squares (TSS). When RSS exceeds TSS, the MSS and the R² will be negative.
The long answer: how can an R² be negative?
The formula for R-squared is
R² = MSS/TSS
where
MSS = model sum of squares = TSS − RSS
TSS = total sum of squares = sum of (y − ybar)²
RSS = residual (error) sum of squares = sum of (y − Xb)²
For your model, MSS is negative, so R² would be negative.
MSS is negative because RSS is greater than TSS. RSS is greater than TSS because ybar is a better predictor of y (in the sum-of-squares sense) than Xb!
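To make these definitions concrete, here is a minimal Python sketch. It is not part of the original FAQ; the function name r_squared and the toy numbers are illustrative assumptions. It computes R² as MSS/TSS for any vector of fitted values, and shows that fitted values which predict y worse than ybar give a negative MSS and therefore a negative R².

```python
def r_squared(y, fitted):
    """Return (TSS, RSS, MSS, R2) using R2 = MSS/TSS with MSS = TSS - RSS."""
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    mss = tss - rss                                         # model sum of squares
    return tss, rss, mss, mss / tss


# Hypothetical toy data: the fitted values run in the opposite direction to y,
# so they predict y worse than its mean does.
y = [1.0, 2.0, 3.0]
bad_fit = [3.0, 2.0, 1.0]

tss, rss, mss, r2 = r_squared(y, bad_fit)
print(tss, rss, mss, r2)  # 2.0 8.0 -6.0 -3.0
```

Nothing in the formula itself bounds R² below by zero; that bound only holds when the fitted values come from a least-squares fit that nests the constant-only model.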
How can Xb be worse than ybar, especially when the model includes the constant term? At first glance, this seems impossible. But it is possible with the 2SLS/IV model.
Here are the background essentials:
Let Z be the matrix of instruments (say, z1, z2, z3, z4).
Let X be the matrix of regressors (say, y2, y3, z3, z4, where y2 and y3 are endogenous and z3 and z4 are exogenous).
Let y be the endogenous variable of interest. That is, we want to estimate b, where
y = Xb + error
Let P = Z (Z'Z)⁻¹ Z' be the projection matrix onto the space spanned by Z.
2SLS/IV gives point estimates
b = ((PX)' PX)⁻¹ (PX)' y
The coefficients are simply those from an ordinary regression but with the predictors in the columns of PX (the projection of X into Z space).
Let’s assume you have two endogenous right-hand-side variables (y1 and y2), two exogenous variables (x1 and x2), and two instruments not in the structural equation (z1 and z2). This makes your structural equation
y = (Y)B1 + (X)B2 + e
or
y = b1*y1 + b2*y2 + b3*x1 + b4*x2 + e
(where B1 and B2 are components of the vector of coefficients, b). If you run the following,
. regress y1 x1 x2 z1 z2
. predict yhat1
. regress y2 x1 x2 z1 z2
. predict yhat2
. regress y yhat1 yhat2 x1 x2
you will get exactly the coefficients of the 2SLS/IV model (but you will get different standard errors):
. ivregress 2sls y (y1 y2 = z1 z2) x1 x2
Now if we computed residuals after
. regress y yhat1 yhat2 x1 x2
the residuals would be
r = y − (PX)b
Because this is an ordinary least-squares regression that includes a constant, the sum of squares of these residuals can never exceed the total sum of squares.
But these are not the right residuals for 2SLS/IV. Because we are fitting a structural model, we are interested in the residuals using the actual values of the endogenous variables.
The correct two-stage least-squares residuals are
e = y − Xb
Here there is no guarantee that the sum of these squared residuals is less than the total sum of squares. These residuals do not come from a model that nests a constant-only model of y.
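The difference between the two sets of residuals can be checked by hand in a small case. The Python sketch below is an illustration only and does not come from the FAQ: it assumes a single endogenous regressor x, a single instrument z, and four made-up observations chosen so the arithmetic is exact. It runs the two stages as two simple regressions, then compares the second-stage residuals r = y − (PX)b with the structural residuals e = y − Xb.

```python
def mean(v):
    return sum(v) / len(v)


def ols(x, y):
    """Simple regression of y on x with a constant; returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def sum_sq(v):
    return sum(vi ** 2 for vi in v)


# Hypothetical data: z is the instrument, x the endogenous regressor.
z = [-1.0, -1.0, 1.0, 1.0]
x = [-2.0, 0.0, 0.0, 2.0]
y = [-3.0, 4.0, -4.0, 3.0]

# Stage 1: project x on the instrument to get xhat (the "PX" column).
a1, b1 = ols(z, x)
xhat = [a1 + b1 * zi for zi in z]

# Stage 2: regress y on xhat; the coefficients are the 2SLS estimates.
a, b = ols(xhat, y)

tss = sum_sq([yi - mean(y) for yi in y])
# Residuals from the second-stage regression, r = y - (PX)b.
rss_stage2 = sum_sq([yi - a - b * xh for yi, xh in zip(y, xhat)])
# Structural residuals, e = y - Xb, using the actual x.
rss_struct = sum_sq([yi - a - b * xi for yi, xi in zip(y, x)])

print(b, tss, rss_stage2, rss_struct)  # -0.5 50.0 49.0 64.0
print((tss - rss_struct) / tss)        # -0.28
```

In this toy sample the second-stage residuals respect RSS ≤ TSS (49 against 50), while the structural residuals do not (64 against 50), so the R² computed from them is −0.28.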
An example
Let’s take a simple, and admittedly silly, example from our favorite dataset—auto.dta.
. sysuse auto, clear
(1978 Automobile Data)

. ivregress 2sls price (mpg = foreign) headroom

Instrumental variables (2SLS) regression          Number of obs =     74
                                                  Wald chi2(2)  =   1.15
                                                  Prob > chi2   = 0.5619
                                                  R-squared     =      .
                                                  Root MSE      = 3363.6

       price |    Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
-------------+-----------------------------------------------------------
         mpg | 154.4941   239.2968   0.65   0.519    -314.519    623.5072
    headroom | 836.4137   821.6528   1.02   0.309   -773.9962    2446.824
       _cons |   371.36   7268.765   0.05   0.959   -13875.16    14617.88

Instrumented:  mpg
Instruments:   headroom foreign

. display "MSS: " %15.0f e(mss)
MSS:      -202135715
There is your negative model sum of squares (−202135715). The model sum of squares is just the improvement over the sum of squares about the mean given by the full model. In this example, the sum of squared residuals from the model predictions is 837201111, whereas the sum of squared residuals about the mean of price is 635065396. By computing the model sum of squares as
. display "MSS: " %15.0f 635065396 - 837201111
MSS:      -202135715
we can see that our model actually performs worse than the mean of price. Why didn’t our constant keep this from happening? The coefficients are estimated using an instrument for mpg. Thus the constant need not provide an intercept that minimizes the sum of squared residuals when the actual values of the endogenous variables are used.
Just to be sure, let’s perform the sum-of-squares computations by hand.
To get the sum of squared residuals for our model, type
. predict double errs, residuals
. gen double errs2 = errs*errs
. summarize errs2

    Variable |  Obs      Mean   Std. Dev.      Min       Max
-------------+----------------------------------------------
       errs2 |   74  1.13e+07   2.01e+07    3017.3  9.57e+07

. display "ESS: " %15.0f r(sum)
ESS:       837201111
which agrees exactly with the returned results from ivregress.
. display "ESS: " %15.0f e(rss)
ESS:       837201111
To get the total sum of squared residuals about the mean of price, type
. summarize price

    Variable |  Obs      Mean   Std. Dev.      Min       Max
-------------+----------------------------------------------
       price |   74  6165.257   2949.496      3291     15906

. gen double pbarErr2 = (price - r(mean))^2
. summarize pbarErr2

    Variable |  Obs      Mean   Std. Dev.      Min       Max
-------------+----------------------------------------------
    pbarErr2 |   74   8581965   1.69e+07   .065924  9.49e+07

. display "TSS: " %15.0f r(sum)
TSS:       635065396
So, our “hand” computations also give a model sum of squares of −202135715 and agree with the value returned by ivregress.
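For readers who want the R² that ivregress left blank, the arithmetic can be repeated from the two sums of squares reported in the example. The short Python sketch below is an addition, not part of the FAQ; the value of roughly −0.32 is simply MSS/TSS computed from those reported figures.

```python
# Sums of squares reported in the auto.dta example.
tss = 635065396   # sum of squared deviations of price about its mean
rss = 837201111   # sum of squared 2SLS residuals, e(rss)

mss = tss - rss   # model sum of squares
r2 = mss / tss    # R-squared = MSS/TSS

print(mss)           # -202135715
print(round(r2, 4))  # -0.3183
```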
Is a negative R² a problem?
What does it mean when RSS is greater than TSS? Does this mean our parameter estimates are no good? Not really. You can easily develop simulations where the parameter estimates from two-stage least squares are quite good while the MSS is negative. Remember why we fit two-stage models. We are interested in the parameters of the structural equation: the elasticity of demand, the marginal propensity to consume, etc. If our two-stage model produces estimates of these parameters with acceptable standard errors, we should be happy, regardless of MSS or R². If we were interested strictly in projections of the dependent variable, we should probably consider the reduced form of the model.
Another way of stating this point is that there are models in which the distribution of the 2SLS estimates of the parameters is well approximated by its theoretical distribution but the R² computed from some samples is negative. There are several ways of illustrating this point. Perhaps the most accessible is via simulation.
We simulate data from the model
(1) y = 1 − 0.1*x + e1 + e2
(2) x = w + z + c1 + 0.5*e1
(3) z = 1.5*c1 + e3
where e1, e2, w, and c1 are all independent normal random variables. The c1 term in (2) and (3) provides the correlation between x and z. The e1 term in (1) and (2) is the source of the correlation between x and the error term (e1 + e2) for y. The coefficient of −0.1 is the parameter that we are trying to estimate. We are going to estimate this parameter with 2SLS using ivregress with y as the dependent variable, x as the endogenous variable, and z as the instrument for x. For each simulated sample, we construct y, x, and z using independent draws of the standard normal variables e1, e2, w, and c1 and (1)–(3). Then we use
. ivregress 2sls y (x = z)
to estimate the coefficient −0.1. For each simulated sample, we record the following statistics:
b1        estimate of the coefficient (−.1)
p         p-value for the null hypothesis that b1 = −.1
reject    1 if p < .05 and 0 otherwise
r2        computed R² (missing if mss < 0)
mss       value of the model sum of squares
rho_x1e   correlation between x1 and e = e1 + e2
rho_x1z1  correlation between x1 and z1
fsf       first-stage F statistic
p_fsf     p-value from the first-stage F statistic
The Stata code for drawing 2,000 simulations of this model, estimating the coefficient −0.1, computing the statistics of interest, and finally, summarizing the results, is saved in the file negr2.do. Each simulated sample contains 1,000 observations, so the results should not be attributed to a small sample size.
Here are the results we obtained with the summarize command:
. summarize

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
          b1 |  2000   -.0981982   .0541345  -.2771809   .0765793
           p |  2000    .4945649   .2884685   .0002706   .9995125
      reject |  2000       .0485    .214874          0          1
          r2 |    64    .0068443   .0063426    .000051   .0264567
         mss |  2000    -78.4407   49.08486  -273.4773   47.94914
-------------+----------------------------------------------------
     rho_x1e |  2000     .235859   .0300348   .1194255   .3460462
    rho_x1z1 |  2000    .5556971   .0216154   .4764362   .6183904
         fsf |  2000     448.584   50.32493   293.0595   617.9501
       p_fsf |  2000    2.62e-34   7.49e-33          0   3.29e-31
The results for rho_x1e, rho_x1z1, fsf, and p_fsf indicate the correlations between the endogenous variable and the error term and between the endogenous variable and its instrument are reasonable and there is no weak-instrument problem. The results for b1, p, and reject indicate that the mean estimate of the coefficient on x is very close to its true value of −0.1 and that there is no size distortion of the test that the coefficient on x equals −0.1. In short, the distribution of the estimates, b1, is very well approximated by its theoretical asymptotic distribution. Together, these results imply that the 2SLS estimator is performing according to the theory in these simulations.
There are only 64 observations on r2 because there are 1,936 observations in which mss < 0.
. count if mss < 0
  1936
Thus the results illustrate that there is at least one model for which the distribution of the 2SLS estimates of the parameters is very well approximated by its asymptotic distribution but that the R² will be negative in most of the individual samples. To obtain more models that produce the same qualitative results, simply change the coefficient −0.1 by a small amount. As one would expect, increasing the coefficient −0.1 reduces the fraction of simulated samples that produce a negative R².
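The qualitative result can be reproduced without Stata. The Python sketch below is not negr2.do, and its numbers should not be expected to match the table above. It is a literal reading of equations (1) to (3) under stated assumptions: every disturbance, including e3 (whose distribution the text does not give), is drawn as an independent standard normal; only 200 samples of 1,000 observations are drawn; and the seed is arbitrary. With one endogenous regressor and one instrument, the 2SLS slope reduces to cov(z, y)/cov(z, x), which is what the code computes.

```python
import random
import statistics


def simulate_once(rng, n=1000, beta=-0.1):
    """Draw one sample from (1)-(3); return the IV slope and the 2SLS MSS."""
    xs, ys, zs = [], [], []
    for _ in range(n):
        # Assumption: all five disturbances are independent N(0, 1).
        e1, e2, e3, w, c1 = (rng.gauss(0.0, 1.0) for _ in range(5))
        z = 1.5 * c1 + e3               # equation (3)
        x = w + z + c1 + 0.5 * e1       # equation (2)
        y = 1.0 + beta * x + e1 + e2    # equation (1)
        xs.append(x)
        ys.append(y)
        zs.append(z)

    xbar, ybar, zbar = mean_of(xs), mean_of(ys), mean_of(zs)
    szy = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
    szx = sum((z - zbar) * (x - xbar) for z, x in zip(zs, xs))
    b = szy / szx                       # just-identified IV (2SLS) slope
    a = ybar - b * xbar                 # IV intercept

    tss = sum((y - ybar) ** 2 for y in ys)
    # Structural residuals use the actual x, not its first-stage projection.
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return b, tss - rss


def mean_of(values):
    return statistics.fmean(list(values))


rng = random.Random(12345)              # arbitrary seed, for repeatability
draws = [simulate_once(rng) for _ in range(200)]
mean_b = mean_of(b for b, _ in draws)
share_negative = sum(mss < 0 for _, mss in draws) / len(draws)
print(round(mean_b, 3), share_negative)
```

Under these assumptions the average estimate should sit close to −0.1 while most samples still have a negative MSS, which is the same pattern the FAQ reports.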
Reference:
https://www.stata.com/support/faqs/statistics/two-stage-least-squares/