Computer applications assignment problem

COMPUTER APPLICATIONS ASSIGNMENT PROBLEM

Question# 01

The data has been extracted using the provided link for the variables under consideration. The data source is Australian Bureau of Statistics. Screen Shots of first and last ten rows of data is as follows (Abs.gov.au, 2018).

Question#02 a

Using “DATA ANALYSIS” tool pack of MS Excel, Descriptive statistics summary output has been generated and is listed below.

Retail turnover per capita - Current Prices ; Total (State) ;

FINAL CONSUMPTION EXPENDITURE: Current prices ;

Mean

1971.332143

Mean

121985.5143

Standard Error

65.3066091

Standard Error

5970.069937

Median

1878.8

Median

102630

Mode

#N/A

Mode

#N/A

Standard Deviation

772.7182196

Standard Deviation

70638.82012

Sample Variance

597093.4469

Sample Variance

4989842908

Kurtosis

-1.224812391

Kurtosis

-1.026668164

Skewness

0.152557367

Skewness

0.519449274

Range

2809.3

Range

241940

Minimum

733.6

Minimum

29251

Maximum

3542.9

Maximum

271191

Sum

275986.5

Sum

17077972

Count

140

Count

140

Confidence Level (95.0%)

129.1227708

Confidence Level (95.0%)

11803.88912

According to the descriptive statistics summary generated through MS Office, the measures of central tendency for the provided data can be explained as the arithmetic mean for final consumption expenditures is found to be equal to 121985.5143 $ while the Mean Retail turnover per capita is found to be equal to 1971.332143 $. The median is found to be equal to $102630 final consumption expense while it is found to be equal to $ 1878.8 for per capita retail turn over. Data is found to be positively skewed. The variation of data as shown by output is found to be very high for Final Consumption Expenditure while relatively low for Per Capita Retail Turn Over showing that there is high diversion in data points from the average value obtained. This can also be seen through Histogram.

Figure 1: Histogram Final Consumption Expenditure

This is the Histogram for Final Consumption Expenditure showing skewness of the data while the mean value is found to be in the smallest class of the distribution showing that the data is highly dispersed from the average calculated (Altman, 1980).

Figure 2: Histogram Retail turnover per Capita

It can be seen that the data is highly dispersed and is positively skewed while the variation from mean is also evident to be high as the mean calculated in the fourth class while the data is highly dispersed around this value.

Question#02 b

In order to see the association between the variables under consideration, a scatter plot has been generated using MS Excel. It is given below.

The relationship between the considered variable is assessed while considering the outcome impact that can be generated if Per Capita Retail Turn Over is changed therefore it is considered to be the Independent Variable or “X” while the “Total Consumption Expenditure” is considered to be the Dependent Variable or “Y”. The Scatter plot is drawn to assess the nature of association which is found to be positive as an increase in Retail Turnover results in an increase in Total Consumption Expenditures. Here the “Scatter Plot” has provided the estimated model for the association of variables.

This equation also shows the β to be positive further reinforcing that both variables are positively correlated.

Question 3a

In order to perform simple linear regression, “Data Analysis” tool has been employed of MS Excel. The output for regression is as follows.

Regression Statistics

Multiple R

0.972728226

R Square

0.946200201

Adjusted R Square

0.945810347

Standard Error

16443.77857

Observations

140

This is the first table generated and it shows the correlation coefficient termed ere as “Multiple R” which is found to be equal to 0.9727 showing a highly positive relationship between the variables under study. The coefficient of determination represented in the output as R2 (R squared) is found to be 0.9462 depicting that the model is a good fit and almost 95% values fit in the model and almost 95% variations in Final Consumption are caused by the Per Capita Retail Turn Over.

ANOVA

df

SS

MS

F

Significance F

Regression

1

6.56273E+11

6.56E+11

2427.065345

1.84889E-89

Residual

138

37314903827

2.7E+08

Total

139

6.93588E+11

The next table generated by MS Excel is ANOVA. The ANOVA tests the significance of association under the following hypothesis

Null Hypothesis or H0: β≠0

Alternate Hypothesis or Ha: β=0

The test statistic in this case is F which is found to be 2427.06534 and the chance of incurring a value greater or equal to 2427.064 is less than 0 approximately. It can be concluded that coefficient is greater than zero.

Coefficients

Standard Error

t Stat

P-value

Lower 95%

Upper 95%

Lower 95.0%

Upper 95.0%

Intercept

-53311.13118

3819.993082

-13.9558

3.7113E-28

-60864.41692

-45757.8

-60864.4

-45757.8

X Variable 1

88.92293777

1.804982799

49.26525

1.84889E-89

85.35393894

92.49194

85.35394

92.49194

The third table tests the significance of Y intercept and slope of the line using t statistic. The tests have been conducted and obtained p values are found to be extremely smaller than α (level of significance). The regression coefficient is found to be 88.92 which mean that one unit change in X will change Y by 88.92 times keeping other things constant.

Queston#2b

The residuals depict the difference between actual values of Y for certain X value and the calculated Y values for that certain point using the linear equation. Using the “Data Analysis” tool pack, the residuals have been calculated and the first rows are shown in the picture below.

Furthermore a scatter plot has been generated using residuals and the independent variable using Data Analysis to assess the fitness of the model.

Figure 3: Residual plot

The residual plot illustrates a reasonably non random pattern as initial and the last values are positive while the middle values are negative. This u-shaped graph depicts that the simple linear model utilized is providing an unreasonable fit to the given data.

Autocorrelation refers to statistical illustration of the level of resemblance among a specified time series and a delayed account of itself over succeeding intervals of time (Golberg & Cho, 2004). Since the data employed is time series, there are much chances of autocorrelation to exist, therefore, Durbin Watson test has been conducted using MS Excel. The results obtained show positive correlation to exist amongst the residuals. The Durbin Watson test statistic is found to be equal to 0.9354.

Durbin-Watson Calculations

Sum of Squared Difference of Errors

130.0191

Sum of Squared Errors

139.0000

Durbin-Watson Statistic

0.9354

4. Hypothesis Testing

In order to ascertain the significance of association between the variables under consideration, hypothesis testing is performed using the sample correlation coefficient denoted by “r”. This will help in concluding if the sample coefficient is a significant predictor of population coefficient of correlation denoted by “ρ”.

Hypothesis

The following null and alternated hypothesis are drawn for the purpose

Null hypothesis: H0: ρ = 0

Alternate hypothesis: Ha: ρ ≠ 0

Significance Level

In order to test our quoted hypothesis, 0.05 level of significance or α has been selected.

Decision Rule

In order to reject the hypothesis, we need to have a p-value that is fairly greater than the level of significance chosen.

Test Statistic

In order to check the significance t test has been conducted using the given formula.

In the formula N shows the sample size and r is the sample’s correlation coefficient

The t statistic is found to be equal to 49.2669. Now in order to conclude we need to get the p value associated with the calculated t statistic. The p value has been found to be equal to 0.00001 fairly lesser than our significance level.

Conclusion

Since the obtained p value is lesser than the selected significance level, we cannot reject our H0 and thus it can be concluded that that the final consumption expenditure is positively correlated with the per capita retail turnover of the whole population.

Using the regression outputs we will assess the fitness of the model.

R Square 0.946200201

Adjusted R Square 0.945810347

The R square and the R adjusted values are somewhat similar in the given case and are strongly showing that the model is a good fit to the provided data. As approximately 94% of the variations in Final Consumption expenditure is explained through this model.

References

Abs.gov.au. (2018). [Online] Available at: http://www.abs.gov.au/AUSSTATS/abs@.nsf/viewcontent?readform&view=ProductsbyCatalogue&Action=expandwithheader&Num=1 [Accessed 24 Sep. 2018].

Altman, D. G. (1980). “Statistics and ethics in medical research. VI - Presentation of results”, British Medical Journal, 281(6254), pp.1542-1544.

Golberg, M. A. & Cho, H. A. (2004). Introduction to Regression Analysis.WIT Press.