# MAE256 T2 Sample Assignment

MAE256 T2 – ASSIGNMENT

Q.1 The descriptive statistics of house sale prices are presented in Table 1. The output was generated using Excel.

Table 1 Descriptive statistics of house sale prices (in thousands of dollars)

| Statistic | price |
|---|---|
| Mean | 804.8811 |
| Standard Error | 4.336317 |
| Median | 798.9556 |
| Mode | 842.4683 |
| Standard Deviation | 137.1264 |
| Sample Variance | 18803.64 |
| Kurtosis | -0.66476 |
| Skewness | 0.090698 |
| Range | 685.3633 |
| Minimum | 436.527 |
| Maximum | 1121.89 |
| Sum | 804881.1 |
| Count | 1000 |
| Confidence Level (95.0%) | 8.509334 |

From Table 1, the average house sale price is \$804.88 thousand and the median is \$798.96 thousand. The spread of sale prices can be assessed from the standard deviation, which is \$137.13 thousand, or about 17% of the mean; prices range from \$436.53 thousand to \$1,121.89 thousand. Therefore, the spread of the data is high. The skewness of the data is very low (0.09), so the distribution is nearly symmetrical. This is also reflected in the mean and median being nearly the same relative to the spread of the data: they differ by about \$5.93 thousand, less than 5% of one standard deviation. The kurtosis of -0.66 indicates somewhat lighter tails than a normal distribution, so the prices are approximately, rather than exactly, normally distributed. In summary, the spread of house prices is high, while their distribution is nearly symmetrical.
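As a minimal sketch, the Python below uses only the summary values in Table 1 to reproduce two of Excel's derived figures (the standard error and the 95% confidence level) and to put numbers on "high spread" and "nearly the same". The t critical value for 999 degrees of freedom is hard-coded as an assumption (about 1.9623) so that only the standard library is needed.

```python
import math

# Summary values copied from Table 1 (thousands of dollars)
mean = 804.8811
median = 798.9556
sd = 137.1264
n = 1000

# Standard error of the mean: s / sqrt(n)
se = sd / math.sqrt(n)

# Assumed two-sided 95% t critical value for n - 1 = 999 df (about 1.9623,
# slightly above the normal value 1.96); hard-coded to avoid extra libraries.
t_crit = 1.9623
ci_half_width = t_crit * se  # corresponds to Excel's "Confidence Level (95.0%)"

# Coefficient of variation: spread relative to the mean
cv = sd / mean

# Gap between mean and median, measured in standard deviations
gap_in_sd = (mean - median) / sd

print(f"SE = {se:.4f}, CI half-width = {ci_half_width:.4f}")
print(f"CV = {cv:.1%}, (mean - median)/sd = {gap_in_sd:.3f}")
```

With the raw price column available, the same figures could be computed directly from the data rather than from the rounded summary.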

Q2. The mean sale price of a house is \$804.88 thousand and the standard deviation is \$137.13 thousand. The range of prices within one standard deviation of the mean is:

$$\bar{x} \pm s = [804.88 - 137.13,\ 804.88 + 137.13] = [667.75,\ 942.01]$$

If house prices are approximately normally distributed, as Q.1 suggests, about 68% of observations lie within one standard deviation of the mean. Hence, approximately 68% of the prices are between \$667.75 thousand and \$942.01 thousand.
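A short Python sketch of the calculation, assuming the rounded mean and standard deviation from Table 1. The "about 68%" figure is the share of a normal distribution within one standard deviation of its mean, so it holds for the house prices only to the extent that they are approximately normal.

```python
from statistics import NormalDist

mean, sd = 804.88, 137.13  # Table 1, rounded (thousands of dollars)

# Interval one standard deviation either side of the mean
lower, upper = mean - sd, mean + sd

# Share of a normal distribution within one standard deviation of its mean
std_normal = NormalDist()
share_within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)

print(f"[{lower:.2f}, {upper:.2f}]  normal share = {share_within_1sd:.4f}")
```

With the raw data, the actual share of prices inside the interval could be counted and compared with this normal benchmark.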

Q.3 Figure 1 presents the scatter plot of price against size, generated using Excel, with size on the x-axis and price on the y-axis.

Figure 1 Scatter plot between price and size

As can be seen from the figure, there is a positive relationship between house sale price and house size: as the size of a house in square meters increases, its sale price also tends to increase.

Figure 2 presents the scatter plot of price against proximity. Proximity is a dummy variable indicating whether a house is located near a major business district: it takes the value 1 if the house is near a major business district and 0 otherwise. Proximity is shown on the x-axis and sale price on the y-axis.

Figure 2 Scatter plot between price and proximity

It can be seen from the figure that houses near a major business district (proximity = 1) tend to have higher sale prices than houses that are not (proximity = 0). Therefore, proximity to a business district also appears to affect sale prices positively. In conclusion, both size and proximity are positively related to house sale prices.

Q.4 The following linear regression model is estimated using Excel:

$$price = \beta_0 + \beta_1\, size + \beta_2\, age + \beta_3\, proximity + u$$

In the above model, the dependent variable is price, which is regressed on the explanatory variables size, age and proximity. Note that proximity is a dummy variable. The Excel output for this model is presented in Table 2.

Table 2 Regression Output

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.93093204 |
| R Square | 0.866634463 |
| Adjusted R Square | 0.86623276 |
| Standard Error | 50.15287793 |
| Observations | 1000 |

ANOVA

| | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 3 | 16279587.45 | 5426529 | 2157.399 |
| Residual | 996 | 2505249.92 | 2515.311 | |
| Total | 999 | 18784837.37 | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 25.59282925 | 14.00464211 | 1.827453 | 0.067931 |
| size | 3.890134007 | 0.078081854 | 49.82123 | 1E-272 |
| age | -0.601560515 | 0.168471364 | -3.5707 | 0.000373 |
| proximity | 195.8438204 | 3.176667127 | 61.65072 | 0 |

The estimated model is therefore:

$$\widehat{price} = 25.593 + 3.890\, size - 0.602\, age + 195.844\, proximity$$

The R² of the model is 0.8666 and the adjusted R² is 0.8662, so the model explains about 86.7% of the variation in sale prices, which indicates a good fit. The intercept (25.59) is the predicted sale price when size, age and proximity are all zero; since a house of zero size does not exist, it has no meaningful economic interpretation here and mainly anchors the regression. The estimated slope coefficients of size, age and proximity are 3.8901, -0.6016 and 195.8438 respectively. The size coefficient means that the sale price increases by \$3.89 thousand for each additional square meter, holding age and proximity constant. Similarly, each additional year of age reduces the sale price by \$0.6016 thousand (about \$602), holding size and proximity constant. Because proximity equals 1 for houses near a major business district, those houses sell for \$195.84 thousand more on average than comparable houses that are not near a business district, holding the other factors constant. These results are as expected: larger houses and houses near business districts cost more, while older houses sell for less. At the 5% level of significance, the size, age and proximity coefficients are all significant, whereas the intercept (p-value 0.068) is not.
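To make the interpretation concrete, the Python sketch below plugs the Table 2 coefficients into the fitted equation for one hypothetical house (150 square meters, 10 years old, near a business district; these inputs are illustrative assumptions, not observations from the data) and confirms that each t statistic in Table 2 is the coefficient divided by its standard error. The function name `predict_price` is hypothetical.

```python
# Coefficients and standard errors from Table 2 (price in thousands of dollars)
coef = {
    "intercept": 25.59282925,
    "size": 3.890134007,
    "age": -0.601560515,
    "proximity": 195.8438204,
}
std_err = {
    "intercept": 14.00464211,
    "size": 0.078081854,
    "age": 0.168471364,
    "proximity": 3.176667127,
}


def predict_price(size, age, proximity):
    """Fitted price (thousands of dollars) from the Q.4 model."""
    return (coef["intercept"] + coef["size"] * size
            + coef["age"] * age + coef["proximity"] * proximity)


# Hypothetical house: 150 m^2, 10 years old, near a business district
example = predict_price(size=150, age=10, proximity=1)

# Each t statistic is coefficient / standard error
t_stats = {name: coef[name] / std_err[name] for name in coef}

print(f"predicted price: {example:.2f} thousand")
for name, t in t_stats.items():
    print(f"{name:>9}: t = {t:.3f}")
```

Changing `proximity` from 1 to 0 lowers the prediction by exactly the proximity coefficient, which is what "holding other factors constant" means for a dummy variable.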

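Q.5 After adding two more variables, swimming pool and fireplace, to the previous model, the following model is obtained:

$$price = \beta_0 + \beta_1\, size + \beta_2\, age + \beta_3\, proximity + \beta_4\, pool + \beta_5\, fireplace + u$$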

Both of the newly introduced variables are dummy variables. Pool takes the value 1 if the house has a swimming pool and 0 otherwise. Similarly, fireplace takes the value 1 if the house has a fireplace and 0 otherwise. The regression was performed using Excel, and the output is presented in Table 3.

The R² of the model was 0.8666 before the pool and fireplace variables were introduced and 0.8686 afterwards, so R² increased only slightly. Because R² never decreases when variables are added, it is better to also consider the adjusted R², which penalizes the model for additional variables. The adjusted R² was 0.8662 before and 0.8679 after the new variables were introduced. Therefore, adding the two variables has improved the goodness of fit, but only by a very modest amount. In Table 3, the pool coefficient is significant (p-value 0.0003), whereas the fireplace coefficient is not significant at conventional levels (p-value 0.152).
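The adjusted R² values can be reproduced from R² with the standard formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, where $k$ is the number of slope coefficients. A minimal Python sketch, using the R² values from Tables 2 and 3:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k slope coefficients (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 1000
before = adjusted_r2(0.866634463, n, k=3)  # Table 2: size, age, proximity
after = adjusted_r2(0.868573859, n, k=5)   # Table 3: adds pool and fireplace

print(f"adjusted R^2 before = {before:.6f}, after = {after:.6f}")
```

The penalty factor $(n-1)/(n-k-1)$ is close to 1 here because $n = 1000$ is large relative to $k$, which is why R² and adjusted R² are so similar in both models.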

Table 3 Regression output

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.9319731 |
| R Square | 0.868573859 |
| Adjusted R Square | 0.867912761 |
| Standard Error | 49.83694429 |
| Observations | 1000 |

ANOVA

| | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 5 | 16316018.68 | 3263204 | 1313.837 |
| Residual | 994 | 2468818.691 | 2483.721 | |
| Total | 999 | 18784837.37 | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 22.46361047 | 13.94043772 | 1.611399 | 0.10741 |
| size | 3.880094147 | 0.077978005 | 49.75883 | 4.3E-272 |
| age | -0.627220786 | 0.1675913 | -3.74256 | 0.000193 |
| proximity | 195.6377575 | 3.157476777 | 61.96016 | 0 |
| pool | 14.1458518 | 3.917097047 | 3.61131 | 0.00032 |
| fireplace | 4.546132296 | 3.174622828 | 1.432023 | 0.152452 |

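Q6. After transforming the price and size variables into natural logarithms, the following model is obtained:

$$\ln(price) = \beta_0 + \beta_1 \ln(size) + \beta_2\, age + \beta_3\, proximity + \beta_4\, pool + \beta_5\, fireplace + u$$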

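The regression was performed using Excel, and the output is presented in Table 4. The estimated model is:

$$\widehat{\ln(price)} = 2.1750 + 0.8472 \ln(size) - 0.000873\, age + 0.2475\, proximity + 0.0186\, pool + 0.0070\, fireplace$$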

Table 4 Regression output

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.928131596 |
| R Square | 0.86142826 |
| Adjusted R Square | 0.860731219 |
| Standard Error | 0.064934719 |
| Observations | 1000 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 26.05462137 | 5.210924 | 1235.836 | 0 |
| Residual | 994 | 4.191218689 | 0.004217 | | |
| Total | 999 | 30.24584006 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 2.175009803 | 0.090650516 | 23.99335 | 1E-100 |
| log(size) | 0.847233288 | 0.017580409 | 48.1919 | 2.5E-262 |
| age | -0.000873122 | 0.000218377 | -3.99823 | 6.85E-05 |
| proximity | 0.247529504 | 0.004113783 | 60.17077 | 0 |
| pool | 0.018570706 | 0.00510377 | 3.638625 | 0.000288 |
| fireplace | 0.007021951 | 0.004136736 | 1.697462 | 0.089922 |

The slope coefficient of log(size) is an elasticity: it gives the percentage change in price associated with a one percent change in size. Therefore, a 1% increase in house size is associated with an increase in sale price of approximately 0.847%, or nearly 0.85%, holding the other variables constant. The pool coefficient (0.0186) approximates the proportional difference in sale price between houses with and without a swimming pool, holding size, proximity, age and fireplace constant. Multiplying by 100, houses with a swimming pool sell for approximately 1.86% more than houses without one.
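The Python sketch below, using the Table 4 coefficients, contrasts the quick "multiply by 100" readings with the exact percentage effects implied by a log model: for a dummy regressor the exact difference is $100(e^{b}-1)$, and for a 1% change in size it is $100(1.01^{b}-1)$. These are standard log-model results rather than anything stated in the original output; the helper name `exact_pct` is hypothetical.

```python
import math

# Coefficients from Table 4 (dependent variable: ln(price))
b_log_size = 0.847233288
b_age = -0.000873122
b_proximity = 0.247529504
b_pool = 0.018570706


def exact_pct(b):
    """Exact % change in price when a regressor rises by one unit in a log-level model."""
    return 100 * (math.exp(b) - 1)


# Elasticity: a 1% rise in size multiplies price by 1.01 ** b
pct_size_exact = 100 * (1.01 ** b_log_size - 1)

pool_approx, pool_exact = 100 * b_pool, exact_pct(b_pool)
prox_approx, prox_exact = 100 * b_proximity, exact_pct(b_proximity)
age_pct_per_year = exact_pct(b_age)

print(f"1% larger size -> {pct_size_exact:.3f}% higher price")
print(f"pool: {pool_approx:.2f}% (approx) vs {pool_exact:.2f}% (exact)")
print(f"proximity: {prox_approx:.2f}% (approx) vs {prox_exact:.2f}% (exact)")
print(f"one extra year of age: {age_pct_per_year:.3f}%")
```

For small coefficients such as pool and age the approximation is very close; for the larger proximity coefficient the exact effect (about 28%) is noticeably above the 100b reading (about 24.8%).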

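Q7. Null hypothesis, H0: a fireplace does not influence house prices, i.e. $H_0: \beta_5 = 0$, where $\beta_5$ is the fireplace coefficient in the Q6 model.

Alternative hypothesis, H1: a fireplace influences house prices, i.e. $H_1: \beta_5 \neq 0$.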

It can be seen from Table 4 that the p-value of the fireplace coefficient is 0.0899 (about 0.09), which is well above the 0.01 level of significance. Therefore, at the 1% level of significance we cannot reject the null hypothesis, and the same conclusion holds at the 5% level. Hence, at these levels there is not enough evidence to reject the claim that a fireplace does not influence house prices. However, at the 10% level of significance the p-value is smaller than 0.10, so there is enough evidence to reject the null hypothesis, and it can be concluded at the 10% level that a fireplace does influence house sale prices.
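As a sketch of where the Table 4 p-value comes from, the Python below forms the t statistic for the fireplace coefficient and computes a two-sided p-value. It uses the normal approximation to the t distribution (an assumption made to stay within the standard library); with 994 residual degrees of freedom this gives about 0.0896, close to Excel's exact t-based value of 0.089922, and it leads to the same decisions at each significance level.

```python
from statistics import NormalDist

# Fireplace coefficient and standard error from Table 4
coef, se = 0.007021951, 0.004136736
t_stat = coef / se

# Two-sided p-value, normal approximation to the t distribution with 994 df
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# True means "reject H0: beta_5 = 0" at that significance level
decisions = {alpha: p_approx < alpha for alpha in (0.01, 0.05, 0.10)}

print(f"t = {t_stat:.4f}, p = {p_approx:.4f}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}: {'reject' if reject else 'do not reject'} H0")
```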

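Q.8 Null hypothesis: all five slope coefficients in the Q6 model are zero, i.e. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$, so the explanatory variables are jointly insignificant.

Alternative hypothesis: at least one slope coefficient is different from zero.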

According to the output in Table 4, the F statistic is 1235.84 and its p-value (Excel's Significance F) is reported as 0, i.e. effectively zero and far lower than the 0.05 level of significance. Hence, there is sufficient evidence to reject the null hypothesis, and it can be concluded that the variables are jointly significant: at least one of size, proximity, age, pool and fireplace affects house sale prices.
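The overall F statistic can also be recovered from R² alone, since $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$. A minimal Python sketch, applied to Table 4 and, as a cross-check, to Table 2; the helper name `overall_f` is hypothetical.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero, computed from R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


f_table4 = overall_f(0.86142826, n=1000, k=5)   # log model (Q6)
f_table2 = overall_f(0.866634463, n=1000, k=3)  # linear model (Q4)

print(f"F (Table 4) = {f_table4:.2f}")
print(f"F (Table 2) = {f_table2:.2f}")
```

For comparison, the 5% critical value of F with (5, 994) degrees of freedom is roughly 2.2, so an F statistic above 1,200 lies far in the rejection region, consistent with Excel reporting a Significance F of 0.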
