Computer applications assignment problem

COMPUTER APPLICATIONS ASSIGNMENT PROBLEM

Question# 01

The data has been extracted using the provided link for the variables under consideration. The data source is Australian Bureau of Statistics. Screen Shots of first and last ten rows of data is as follows (Abs.gov.au, 2018).

Question#02 a

Using “DATA ANALYSIS” tool pack of MS Excel, Descriptive statistics summary output has been generated and is listed below.

Retail turnover per capita - Current Prices ; Total (State) ;		FINAL CONSUMPTION EXPENDITURE: Current prices ;

Mean	1971.332143	Mean	121985.5143
Standard Error	65.3066091	Standard Error	5970.069937
Median	1878.8	Median	102630
Mode	#N/A	Mode	#N/A
Standard Deviation	772.7182196	Standard Deviation	70638.82012
Sample Variance	597093.4469	Sample Variance	4989842908
Kurtosis	-1.224812391	Kurtosis	-1.026668164
Skewness	0.152557367	Skewness	0.519449274
Range	2809.3	Range	241940
Minimum	733.6	Minimum	29251
Maximum	3542.9	Maximum	271191
Sum	275986.5	Sum	17077972
Count	140	Count	140
Confidence Level (95.0%)	129.1227708	Confidence Level (95.0%)	11803.88912

According to the descriptive statistics summary generated through MS Office, the measures of central tendency for the provided data can be explained as the arithmetic mean for final consumption expenditures is found to be equal to 121985.5143 $ while the Mean Retail turnover per capita is found to be equal to 1971.332143 $. The median is found to be equal to $102630 final consumption expense while it is found to be equal to $ 1878.8 for per capita retail turn over. Data is found to be positively skewed. The variation of data as shown by output is found to be very high for Final Consumption Expenditure while relatively low for Per Capita Retail Turn Over showing that there is high diversion in data points from the average value obtained. This can also be seen through Histogram.

Figure 1: Histogram Final Consumption Expenditure

This is the Histogram for Final Consumption Expenditure showing skewness of the data while the mean value is found to be in the smallest class of the distribution showing that the data is highly dispersed from the average calculated (Altman, 1980).

Figure 2: Histogram Retail turnover per Capita

It can be seen that the data is highly dispersed and is positively skewed while the variation from mean is also evident to be high as the mean calculated in the fourth class while the data is highly dispersed around this value.

Question#02 b

In order to see the association between the variables under consideration, a scatter plot has been generated using MS Excel. It is given below.

The relationship between the considered variable is assessed while considering the outcome impact that can be generated if Per Capita Retail Turn Over is changed therefore it is considered to be the Independent Variable or “X” while the “Total Consumption Expenditure” is considered to be the Dependent Variable or “Y”. The Scatter plot is drawn to assess the nature of association which is found to be positive as an increase in Retail Turnover results in an increase in Total Consumption Expenditures. Here the “Scatter Plot” has provided the estimated model for the association of variables.

This equation also shows the β to be positive further reinforcing that both variables are positively correlated.

Question 3a

In order to perform simple linear regression, “Data Analysis” tool has been employed of MS Excel. The output for regression is as follows.

Regression Statistics
Multiple R	0.972728226
R Square	0.946200201
Adjusted R Square	0.945810347
Standard Error	16443.77857
Observations	140

This is the first table generated and it shows the correlation coefficient termed ere as “Multiple R” which is found to be equal to 0.9727 showing a highly positive relationship between the variables under study. The coefficient of determination represented in the output as R²(R squared) is found to be 0.9462 depicting that the model is a good fit and almost 95% values fit in the model and almost 95% variations in Final Consumption are caused by the Per Capita Retail Turn Over.

ANOVA
	df	SS	MS	F	Significance F
Regression	1	6.56273E+11	6.56E+11	2427.065345	1.84889E-89
Residual	138	37314903827	2.7E+08
Total	139	6.93588E+11

The next table generated by MS Excel is ANOVA. The ANOVA tests the significance of association under the following hypothesis

Null Hypothesis or H0: β≠0

Alternate Hypothesis or Ha: β=0

The test statistic in this case is F which is found to be 2427.06534 and the chance of incurring a value greater or equal to 2427.064 is less than 0 approximately. It can be concluded that coefficient is greater than zero.

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-53311.13118	3819.993082	-13.9558	3.7113E-28	-60864.41692	-45757.8	-60864.4	-45757.8
X Variable 1	88.92293777	1.804982799	49.26525	1.84889E-89	85.35393894	92.49194	85.35394	92.49194

The third table tests the significance of Y intercept and slope of the line using t statistic. The tests have been conducted and obtained p values are found to be extremely smaller than α (level of significance). The regression coefficient is found to be 88.92 which mean that one unit change in X will change Y by 88.92 times keeping other things constant.

Queston#2b

The residuals depict the difference between actual values of Y for certain X value and the calculated Y values for that certain point using the linear equation. Using the “Data Analysis” tool pack, the residuals have been calculated and the first rows are shown in the picture below.

Furthermore a scatter plot has been generated using residuals and the independent variable using Data Analysis to assess the fitness of the model.

Figure 3: Residual plot

The residual plot illustrates a reasonably non random pattern as initial and the last values are positive while the middle values are negative. This u-shaped graph depicts that the simple linear model utilized is providing an unreasonable fit to the given data.

Autocorrelation refers to statistical illustration of the level of resemblance among a specified time series and a delayed account of itself over succeeding intervals of time (Golberg & Cho, 2004). Since the data employed is time series, there are much chances of autocorrelation to exist, therefore, Durbin Watson test has been conducted using MS Excel. The results obtained show positive correlation to exist amongst the residuals. The Durbin Watson test statistic is found to be equal to 0.9354.

Durbin-Watson Calculations
Sum of Squared Difference of Errors	130.0191
Sum of Squared Errors	139.0000
Durbin-Watson Statistic	0.9354

4. Hypothesis Testing

In order to ascertain the significance of association between the variables under consideration, hypothesis testing is performed using the sample correlation coefficient denoted by “r”. This will help in concluding if the sample coefficient is a significant predictor of population coefficient of correlation denoted by “ρ”.

Hypothesis

The following null and alternated hypothesis are drawn for the purpose

Null hypothesis: H0: ρ = 0

Alternate hypothesis: Ha: ρ ≠ 0

Significance Level

In order to test our quoted hypothesis, 0.05 level of significance or α has been selected.

Decision Rule

In order to reject the hypothesis, we need to have a p-value that is fairly greater than the level of significance chosen.

Test Statistic

In order to check the significance t test has been conducted using the given formula.

In the formula N shows the sample size and r is the sample’s correlation coefficient

The t statistic is found to be equal to 49.2669. Now in order to conclude we need to get the p value associated with the calculated t statistic. The p value has been found to be equal to 0.00001 fairly lesser than our significance level.

Conclusion

Since the obtained p value is lesser than the selected significance level, we cannot reject our H₀ and thus it can be concluded that that the final consumption expenditure is positively correlated with the per capita retail turnover of the whole population.

Using the regression outputs we will assess the fitness of the model.

R Square 0.946200201

Adjusted R Square 0.945810347

The R square and the R adjusted values are somewhat similar in the given case and are strongly showing that the model is a good fit to the provided data. As approximately 94% of the variations in Final Consumption expenditure is explained through this model.

References

Abs.gov.au. (2018). [Online] Available at: http://www.abs.gov.au/AUSSTATS/abs@.nsf/viewcontent?readform&view=ProductsbyCatalogue&Action=expandwithheader&Num=1 [Accessed 24 Sep. 2018].

Altman, D. G. (1980). “Statistics and ethics in medical research. VI - Presentation of results”, British Medical Journal, 281(6254), pp.1542-1544.

Golberg, M. A. & Cho, H. A. (2004). Introduction to Regression Analysis.WIT Press.