MAE256 T2 Sample Assignment

Answers

Q.1 The descriptive statistics of house sale prices are presented in Table 1. The output was generated using Excel.

Table 1 Descriptive statistics of house sale price in thousands of dollars.

| Statistic | price |
|---|---|
| Mean | 804.8811 |
| Standard Error | 4.336317 |
| Median | 798.9556 |
| Mode | 842.4683 |
| Standard Deviation | 137.1264 |
| Sample Variance | 18803.64 |
| Kurtosis | -0.66476 |
| Skewness | 0.090698 |
| Range | 685.3633 |
| Minimum | 436.527 |
| Maximum | 1121.89 |
| Sum | 804881.1 |
| Count | 1000 |
| Confidence Level (95.0%) | 8.509334 |

Table 1 shows that the average house sale price is $804.88 thousand and the median is $798.96 thousand. The standard deviation is $137.13 thousand, about 17% of the mean, and prices range from $436.53 thousand to $1,121.89 thousand, so the spread of the data is high. The skewness is very low (0.09), and the mean and median are close relative to the scale of the data, so the distribution is approximately symmetric. Low skewness alone does not establish normality, and the negative excess kurtosis (-0.66) indicates somewhat lighter tails than a normal distribution, but a normal approximation is reasonable for a rough description. In summary, the spread of house prices is high and their distribution is nearly symmetric.
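Several rows of Table 1 are linked by standard formulas, which gives a quick consistency check on the output. The following is a minimal sketch in Python (the original analysis used Excel); the variable names are illustrative, and the critical value is backed out of the table rather than looked up.

```python
import math

# Values copied from Table 1 (thousands of dollars)
mean = 804.8811
std_dev = 137.1264
n = 1000
reported_se = 4.336317
reported_margin = 8.509334  # the "Confidence Level (95.0%)" row

# Standard error of the mean: s / sqrt(n)
standard_error = std_dev / math.sqrt(n)

# Sample variance is the square of the sample standard deviation
variance = std_dev ** 2

# Coefficient of variation: spread relative to the mean
coeff_variation = std_dev / mean

# The 95% margin equals (critical t value) * (standard error),
# so the critical value implied by the table is their ratio.
implied_t = reported_margin / reported_se

# 95% confidence interval for the mean price
ci_low = mean - reported_margin
ci_high = mean + reported_margin

print(f"standard error  = {standard_error:.4f}")
print(f"variance        = {variance:.2f}")
print(f"CV              = {coeff_variation:.3f}")
print(f"implied t value = {implied_t:.4f}")
print(f"95% CI for mean = [{ci_low:.2f}, {ci_high:.2f}]")
```

The recomputed standard error and variance agree with the reported rows up to rounding, and the implied critical value is close to 1.96, as expected for a sample of 1000.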

Q2. The mean sale price is 804.88 and the standard deviation is 137.13 (both in thousands of dollars, from Table 1). The prices within one standard deviation of the mean are:

µ ± σ = [804.88 - 137.13, 804.88 + 137.13] = [667.75, 942.01]. For an approximately normal distribution, about 68% of observations lie within one standard deviation of the mean. Hence, roughly 68% of the prices are expected to lie between $667.75 thousand and $942.01 thousand.
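The one-standard-deviation interval can be reproduced with a few lines of arithmetic. The sketch below uses the rounded mean and standard deviation reported in Table 1 and assumes a normal distribution when computing the share of observations inside the interval; the variable names are illustrative.

```python
import math

# Rounded values from Table 1 (thousands of dollars)
mean = 804.88
std_dev = 137.13

# Interval one standard deviation either side of the mean
lower = mean - std_dev
upper = mean + std_dev

# Under a normal distribution, the share of observations within
# k standard deviations of the mean is erf(k / sqrt(2)).
share_within_one_sd = math.erf(1 / math.sqrt(2))

print(f"interval = [{lower:.2f}, {upper:.2f}]")
print(f"normal share within one sd = {share_within_one_sd:.4f}")
```

The normal share is only an approximation for this data; the exact proportion would have to be counted from the 1000 observations themselves.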

Q.3 Figure 1 presents the scatter plot of price against size, generated using Excel, with size on the x-axis and price on the y-axis.

Figure 1 Scatter plot between price and size

The figure shows a positive relationship between house sale price and house size: as the size of the house in square meters increases, the sale price also tends to increase.

Figure 2 presents the scatter plot of price against proximity. Proximity is a dummy variable that indicates whether the house is located near a major business district: it takes the value 1 if the house is near a major business district and 0 otherwise. Proximity is on the x-axis and sale price is on the y-axis.

Figure 2 Scatter plot between price and proximity

The figure shows that houses located near a major business district tend to sell at higher prices than houses that are not. Therefore, proximity is also positively related to sale price. In conclusion, both size and proximity are positively related to house sale prices.

Q.4 The following linear regression model was estimated using Excel:

$$\text{price} = \beta_0 + \beta_1\,\text{size} + \beta_2\,\text{age} + \beta_3\,\text{proximity} + \varepsilon$$

In the above model, the dependent variable is price, which is regressed on the explanatory variables size, age and proximity. Note that proximity is a dummy variable. The Excel output for this model is presented in Table 2.

Table 2 Regression Output

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.93093204 |
| R Square | 0.866634463 |
| Adjusted R Square | 0.86623276 |
| Standard Error | 50.15287793 |
| Observations | 1000 |

ANOVA

| | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 3 | 16279587.45 | 5426529 | 2157.399 |
| Residual | 996 | 2505249.92 | 2515.311 | |
| Total | 999 | 18784837.37 | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 25.59282925 | 14.00464211 | 1.827453 | 0.067931 |
| size | 3.890134007 | 0.078081854 | 49.82123 | 1E-272 |
| age | -0.601560515 | 0.168471364 | -3.5707 | 0.000373 |
| proximity | 195.8438204 | 3.176667127 | 61.65072 | 0 |

Therefore, our estimated model is:

$$\widehat{\text{price}} = 25.5928 + 3.8901\,\text{size} - 0.6016\,\text{age} + 195.8438\,\text{proximity}$$

The adjusted R2 of the model is 0.866, so the model explains about 87% of the variation in sale prices after adjusting for the number of explanatory variables; this is a measure of goodness of fit. The intercept, 25.5928, is the predicted sale price (in thousands of dollars) when size, age and proximity are all zero. No house has zero size, so the intercept has no practical interpretation on its own. The estimated slope coefficients of size, age and proximity are 3.8901, -0.6016 and 195.8438 respectively. The coefficient of size means that the sale price increases by about $3.89 thousand, on average, when size increases by 1 square meter, keeping other variables constant. Similarly, when the age of the house increases by 1 year, the sale price decreases by about $0.60 thousand, keeping size and proximity constant. Proximity takes the value 1 if the house is near a major business district, so houses near a business district sell for about $195.84 thousand more, on average, than houses that are not, keeping other factors constant. These results are as expected: larger houses and houses near business districts cost more, and older houses sell for less. At the 5% level of significance, the estimates of size, age and proximity are significant, whereas the intercept is not (p-value 0.068).
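To make the interpretation of the coefficients concrete, the sketch below evaluates the fitted equation from Table 2 in Python. The function name `predict_price` and the example house (200 square meters, 10 years old) are hypothetical and chosen only for illustration; they are not taken from the data set.

```python
# Coefficient estimates and standard errors copied from Table 2
COEF = {
    "intercept": 25.59282925,
    "size": 3.890134007,
    "age": -0.601560515,
    "proximity": 195.8438204,
}
STD_ERR = {
    "intercept": 14.00464211,
    "size": 0.078081854,
    "age": 0.168471364,
    "proximity": 3.176667127,
}


def predict_price(size, age, proximity):
    """Predicted sale price in thousands of dollars.

    size: square meters, age: years, proximity: 1 if near a major
    business district, 0 otherwise.
    """
    return (COEF["intercept"]
            + COEF["size"] * size
            + COEF["age"] * age
            + COEF["proximity"] * proximity)


# Hypothetical house, used only to illustrate the calculation
near = predict_price(size=200, age=10, proximity=1)
far = predict_price(size=200, age=10, proximity=0)
print(f"near business district: {near:.2f}")
print(f"not near:               {far:.2f}")
print(f"difference:             {near - far:.2f}")

# Each t statistic is the coefficient divided by its standard error
t_stats = {name: COEF[name] / STD_ERR[name] for name in COEF}
for name, t in t_stats.items():
    print(f"t({name}) = {t:.4f}")
```

The difference between the two predictions equals the proximity coefficient, which is what "keeping other factors constant" means in practice. The recomputed t statistics match the t Stat column of Table 2.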

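Q.5 After adding two more variables, swimming pool and fireplace, to the previous model, the following model is obtained:

$$\text{price} = \beta_0 + \beta_1\,\text{size} + \beta_2\,\text{age} + \beta_3\,\text{proximity} + \beta_4\,\text{pool} + \beta_5\,\text{fireplace} + \varepsilon$$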

Both of the new variables are dummy variables. Pool takes the value 1 if the house has a swimming pool and 0 otherwise. Similarly, fireplace takes the value 1 if the house has a fireplace and 0 otherwise. The regression was performed using Excel, and the output is presented in Table 3.

The R2 of the model before the introduction of the pool and fireplace variables was 0.8666, and after their introduction it is 0.8686, so R2 increased only slightly. Because R2 never decreases when variables are added, it is better to consider adjusted R2, which penalizes the model for each additional variable. Adjusted R2 was 0.8662 before the new variables were introduced and is 0.8679 afterwards. Therefore, adding the two variables improved the goodness of fit of the model, but only by a very modest amount.
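The penalty that adjusted R2 applies can be seen by computing it directly from R2, the number of observations and the number of explanatory variables. The sketch below uses the standard formula and the R2 values reported in Tables 2 and 3; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables
    (the intercept is not counted in k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 1000

# Model with size, age and proximity (Table 2)
r2_before = 0.866634463
adj_before = adjusted_r2(r2_before, n, k=3)

# Model with pool and fireplace added (Table 3)
r2_after = 0.868573859
adj_after = adjusted_r2(r2_after, n, k=5)

print(f"before: R2 = {r2_before:.4f}, adjusted R2 = {adj_before:.6f}")
print(f"after:  R2 = {r2_after:.4f}, adjusted R2 = {adj_after:.6f}")
print(f"gain in adjusted R2 = {adj_after - adj_before:.4f}")
```

Both recomputed values match the Adjusted R Square rows in the Excel output, and the gain of roughly 0.002 confirms that the improvement is small.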

Table 3 Regression output

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.9319731 |
| R Square | 0.868573859 |
| Adjusted R Square | 0.867912761 |
| Standard Error | 49.83694429 |
| Observations | 1000 |

ANOVA

| | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 5 | 16316018.68 | 3263204 | 1313.837 |
| Residual | 994 | 2468818.691 | 2483.721 | |
| Total | 999 | 18784837.37 | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 22.46361047 | 13.94043772 | 1.611399 | 0.10741 |
| size | 3.880094147 | 0.077978005 | 49.75883 | 4.3E-272 |
| age | -0.627220786 | 0.1675913 | -3.74256 | 0.000193 |
| proximity | 195.6377575 | 3.157476777 | 61.96016 | 0 |
| pool | 14.1458518 | 3.917097047 | 3.61131 | 0.00032 |
| fireplace | 4.546132296 | 3.174622828 | 1.432023 | 0.152452 |

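Q6. After transforming the price and size variables into natural logarithms, the following model is obtained:

$$\log(\text{price}) = \beta_0 + \beta_1 \log(\text{size}) + \beta_2\,\text{age} + \beta_3\,\text{proximity} + \beta_4\,\text{pool} + \beta_5\,\text{fireplace} + \varepsilon$$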

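The regression was performed using Excel and the output is presented in Table 4. The estimated model is:

$$\widehat{\log(\text{price})} = 2.1750 + 0.8472 \log(\text{size}) - 0.0009\,\text{age} + 0.2475\,\text{proximity} + 0.0186\,\text{pool} + 0.0070\,\text{fireplace}$$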

Table 4 Regression output

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.928131596 |
| R Square | 0.86142826 |
| Adjusted R Square | 0.860731219 |
| Standard Error | 0.064934719 |
| Observations | 1000 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 26.05462137 | 5.210924 | 1235.836 | 0 |
| Residual | 994 | 4.191218689 | 0.004217 | | |
| Total | 999 | 30.24584006 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 2.175009803 | 0.090650516 | 23.99335 | 1E-100 |
| log(size) | 0.847233288 | 0.017580409 | 48.1919 | 2.5E-262 |
| age | -0.000873122 | 0.000218377 | -3.99823 | 6.85E-05 |
| proximity | 0.247529504 | 0.004113783 | 60.17077 | 0 |
| pool | 0.018570706 | 0.00510377 | 3.638625 | 0.000288 |
| fireplace | 0.007021951 | 0.004136736 | 1.697462 | 0.089922 |

The slope coefficient of log(size) is an elasticity: it represents the percentage change in price associated with a one percent change in size. Therefore, the sale price increases by about 0.85% when the size of the house increases by one percent, keeping other variables constant. The estimated coefficient of pool, 0.0186, represents the approximate proportional difference in sale price for houses with a swimming pool, keeping size, proximity, age and fireplace constant. Therefore, houses with a swimming pool sell for nearly 1.86% more than comparable houses without one.
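The percentage interpretations of the log-model coefficients are approximations, and the exact effects implied by the fitted equation can be computed directly. The sketch below does this in Python with the estimates from Table 4; the variable names are illustrative.

```python
import math

# Estimates copied from Table 4
b_log_size = 0.847233288
b_pool = 0.018570706

# Log-log term: a 1% increase in size multiplies price by 1.01 ** b,
# so the exact percentage change in price is:
size_effect_pct = (1.01 ** b_log_size - 1) * 100

# Log-level dummy: the common approximation is 100 * b ...
pool_effect_approx_pct = 100 * b_pool
# ... and the exact percentage difference is 100 * (exp(b) - 1)
pool_effect_exact_pct = 100 * (math.exp(b_pool) - 1)

print(f"1% larger house -> price up by {size_effect_pct:.3f}%")
print(f"pool, approximate effect: {pool_effect_approx_pct:.3f}%")
print(f"pool, exact effect:       {pool_effect_exact_pct:.3f}%")
```

For coefficients this small the approximate and exact figures are very close, so reading the pool coefficient as roughly a 1.86% premium is reasonable.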

Q7. Null hypothesis: fireplace does not influence house prices, H0: $\beta_5 = 0$, where $\beta_5$ is the coefficient of fireplace in the log model.

Alternative hypothesis: fireplace influences house prices, Ha: $\beta_5 \neq 0$.

Table 4 shows that the p-value of the fireplace estimate is 0.09 (0.0899), which is greater than the 0.01 level of significance. Therefore, at the 1% level of significance we cannot reject the null hypothesis: there is not enough evidence to conclude that fireplace influences house prices. However, the p-value is smaller than 0.10, so at the 10% level of significance there is enough evidence to reject the null hypothesis. Hence, at the 10% level of significance it can be stated that fireplace does influence the sale price of houses.
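The decision rule used here (reject the null hypothesis when the p-value is below the chosen significance level) can be written out explicitly. The sketch below applies it to the fireplace estimate from Table 4 at three common levels; the helper name `decide` is hypothetical.

```python
# Fireplace row of Table 4
coefficient = 0.007021951
std_error = 0.004136736
p_value = 0.089922  # two-sided p-value reported by Excel

# The t statistic is the estimate divided by its standard error
t_stat = coefficient / std_error


def decide(p_value, alpha):
    """Apply the p-value decision rule at significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


decisions = {alpha: decide(p_value, alpha) for alpha in (0.01, 0.05, 0.10)}

print(f"t statistic = {t_stat:.4f}")
for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision}")
```

The conclusion depends on the chosen level: the null hypothesis is retained at 1% and 5% but rejected at 10%.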

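Q.8 Null hypothesis: none of the explanatory variables is related to price, that is, all slope coefficients are zero.

Alternative hypothesis: at least one slope coefficient is different from zero.

$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \quad \text{vs.} \quad H_a: \text{at least one } \beta_j \neq 0,\ j = 1, \dots, 5$$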

According to the output in Table 4, the p-value of the F statistic (Significance F) is reported as 0, which is lower than the 0.05 level of significance. Hence, there is enough evidence to reject the null hypothesis, and it can be concluded that the variables are jointly significant: at least one of size, proximity, age, pool and fireplace affects the sale price of houses.
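The overall F statistic can be recovered from the ANOVA sums of squares or, equivalently, from R2. The sketch below computes it both ways from the values in Table 4; the function name `overall_f` is illustrative, and the p-value itself is not recomputed here.

```python
def overall_f(r2, n, k):
    """Overall F statistic for a regression with k explanatory
    variables and n observations, computed from R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k = 1000, 5

# From R Square in Table 4
f_from_r2 = overall_f(0.86142826, n, k)

# From the ANOVA block: mean square regression / mean square residual
ss_regression = 26.05462137
ss_residual = 4.191218689
f_from_ss = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"F from R2             = {f_from_r2:.3f}")
print(f"F from sums of squares = {f_from_ss:.3f}")
```

Both routes reproduce the F value of about 1235.8 shown in the Excel output, with 5 and 994 degrees of freedom.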