Regression in STATA with Indicator Variables
Dummy Variable in Multiple Regression STATA Tutorial
Consider the specification:
LNWAGE = β0 + β1EDU + β2EX + β3EXSQ + ε
Does the return to experience differ by years of schooling? How would you allow the return to experience to differ by education level? Test whether the return to experience differs by education level.
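To test whether the returns to experience differ by years of schooling, we run the following regression, in which EDU enters as a set of indicator variables: EX = C + Σj βj*EDUj, where EDUj equals 1 if an individual has j years of schooling and 0 otherwise.
The i. prefix tells STATA to treat EDU as a categorical variable and to create one indicator for each distinct value, omitting the lowest value as the base category. The STATA command for the regression with indicator variables is as follows: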
regress EX i.EDU
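As a complement to reading each schooling coefficient separately, the indicators can also be tested jointly. The lines below are a minimal sketch, assuming the variables EX and EDU from the regression above are already loaded in memory. testparm is a standard STATA post-estimation command, but its use here is an addition and not part of the original analysis.

```stata
* Regress experience on one indicator per year of schooling
* (the lowest value of EDU is the omitted base category)
regress EX i.EDU

* Joint F-test that all schooling-indicator coefficients are zero
testparm i.EDU
```

A small p-value on the joint test would indicate that mean experience differs across at least some schooling levels.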
Interpretation of the STATA output for regression with indicator variables.
Based on the results obtained, the initial years of schooling (up to 4 years) significantly affect the return to experience. Beyond 4 years of schooling the effect is statistically insignificant, and beyond a certain number of years an additional year of schooling has a negative effect. We therefore conclude that the returns to experience differ by years of schooling, as indicated by the large change in coefficient values beyond 4 years of schooling.
This difference can be attributed to the fact that age increases with years of schooling, and there are diminishing returns to human capital because human capital depreciates with age.
Statistically, the conclusion rests on the t-statistics. The coefficients on the 3rd and 4th years of schooling are significant, while for every other year the t-statistic is below 2 in absolute value, so we cannot reject the null hypothesis that those coefficients equal zero at the 5% level of significance.
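A common alternative way to let the return to experience vary with schooling is to add an interaction between EDU and EX to the wage equation from the start of this section, giving LNWAGE = β0 + β1EDU + β2EX + β3EXSQ + β4(EDU × EX) + ε, and then to test β4 = 0. The sketch below is an assumption about how this could be set up, not the specification estimated above. It assumes that LNWAGE, EDU, EX and EXSQ exist in the dataset, that EXSQ is the square of EX, and that EDU and EX are treated as continuous.

```stata
* Wage equation with an education-by-experience interaction
* (c. marks both variables as continuous; # forms their product)
regress LNWAGE EDU EX EXSQ c.EDU#c.EX

* Test the null hypothesis that the interaction coefficient is zero
test c.EDU#c.EX
```

Under this specification the marginal return to an extra year of experience is β2 + 2β3EX + β4EDU, so a β4 that is significantly different from zero would indicate that the return to experience changes with years of schooling.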
DUMMY VARIABLES IN STATA WITH INDICATOR VARIABLES
Suppose we want to test whether there is a wage premium for union jobs.
To do so, we first estimate the following equation:
LNWAGE = C + β*UNIO
Here, UNIO is a dummy variable that takes the value 1 for people working in union jobs and 0 otherwise. The STATA command for this regression with an indicator variable is given below:
regress LNWAGE i.UNIO
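With a single dummy regressor, the estimates can be checked against simple group means: the constant should match the mean of LNWAGE for UNIO = 0, and the constant plus β should match the mean for UNIO = 1. The following is a minimal sketch of that check, assuming LNWAGE and UNIO are already in memory.

```stata
* Regression of log wage on the union indicator
regress LNWAGE i.UNIO

* Mean log wage and number of observations in each group, for comparison
tabstat LNWAGE, by(UNIO) statistics(mean n)
```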
Interpreting the STATA output for Dummy variable regression
The constant term, 2.0078, measures the mean log hourly wage of individuals in non-union jobs. The coefficient β, 0.2856, implies that people in union jobs earn a log hourly wage that is 0.2856 higher than those in non-union jobs. In other words, β measures the wage premium for union jobs. To test whether the wage premium is statistically significant, we run the following t-test in STATA:
ttest LNWAGE, by(UNIO)
Since the t-statistic is greater than 2 in absolute value, we reject the null hypothesis that the difference in wages between union and non-union jobs is zero at the 5% level of significance. We conclude that there is a statistically significant wage premium for union jobs.
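Because the dependent variable is in logs, the coefficient can be converted to a percentage premium with 100 × (exp(β) − 1); for β = 0.2856 this is roughly 33 percent. The sketch below shows that calculation, along with a version of the t-test that drops the equal-variance assumption. It assumes the same LNWAGE and UNIO variables, and the unequal-variance test is an added robustness check, not part of the original analysis.

```stata
* Percentage wage premium implied by the reported coefficient of 0.2856
display 100 * (exp(0.2856) - 1)

* The same calculation using the coefficient stored after estimation
regress LNWAGE i.UNIO
display 100 * (exp(_b[1.UNIO]) - 1)

* Two-sample t-test without assuming equal variances in the two groups
ttest LNWAGE, by(UNIO) unequal
```

By default ttest assumes equal variances in the two groups, in which case its t-statistic matches the one on UNIO in the regression in absolute value. The sign differs because ttest reports the mean of the UNIO = 0 group minus the mean of the UNIO = 1 group.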