Computer applications assignment problem
COMPUTER APPLICATIONS ASSIGNMENT PROBLEM
Question# 01
The data has been extracted using the provided link for the variables under consideration. The data source is Australian Bureau of Statistics. Screen Shots of first and last ten rows of data is as follows (Abs.gov.au, 2018).
Question#02 a
Using “DATA ANALYSIS” tool pack of MS Excel, Descriptive statistics summary output has been generated and is listed below.
Retail turnover per capita - Current Prices ; Total (State) ; |
FINAL CONSUMPTION EXPENDITURE: Current prices ; | ||
Mean |
1971.332143 |
Mean |
121985.5143 |
Standard Error |
65.3066091 |
Standard Error |
5970.069937 |
Median |
1878.8 |
Median |
102630 |
Mode |
#N/A |
Mode |
#N/A |
Standard Deviation |
772.7182196 |
Standard Deviation |
70638.82012 |
Sample Variance |
597093.4469 |
Sample Variance |
4989842908 |
Kurtosis |
-1.224812391 |
Kurtosis |
-1.026668164 |
Skewness |
0.152557367 |
Skewness |
0.519449274 |
Range |
2809.3 |
Range |
241940 |
Minimum |
733.6 |
Minimum |
29251 |
Maximum |
3542.9 |
Maximum |
271191 |
Sum |
275986.5 |
Sum |
17077972 |
Count |
140 |
Count |
140 |
Confidence Level (95.0%) |
129.1227708 |
Confidence Level (95.0%) |
11803.88912 |
According to the descriptive statistics summary generated through MS Office, the measures of central tendency for the provided data can be explained as the arithmetic mean for final consumption expenditures is found to be equal to 121985.5143 $ while the Mean Retail turnover per capita is found to be equal to 1971.332143 $. The median is found to be equal to $102630 final consumption expense while it is found to be equal to $ 1878.8 for per capita retail turn over. Data is found to be positively skewed. The variation of data as shown by output is found to be very high for Final Consumption Expenditure while relatively low for Per Capita Retail Turn Over showing that there is high diversion in data points from the average value obtained. This can also be seen through Histogram.
Figure 1: Histogram Final Consumption Expenditure
This is the Histogram for Final Consumption Expenditure showing skewness of the data while the mean value is found to be in the smallest class of the distribution showing that the data is highly dispersed from the average calculated (Altman, 1980).
Figure 2: Histogram Retail turnover per Capita
It can be seen that the data is highly dispersed and is positively skewed while the variation from mean is also evident to be high as the mean calculated in the fourth class while the data is highly dispersed around this value.
Question#02 b
In order to see the association between the variables under consideration, a scatter plot has been generated using MS Excel. It is given below.
The relationship between the considered variable is assessed while considering the outcome impact that can be generated if Per Capita Retail Turn Over is changed therefore it is considered to be the Independent Variable or “X” while the “Total Consumption Expenditure” is considered to be the Dependent Variable or “Y”. The Scatter plot is drawn to assess the nature of association which is found to be positive as an increase in Retail Turnover results in an increase in Total Consumption Expenditures. Here the “Scatter Plot” has provided the estimated model for the association of variables.
This equation also shows the β to be positive further reinforcing that both variables are positively correlated.
Question 3a
In order to perform simple linear regression, “Data Analysis” tool has been employed of MS Excel. The output for regression is as follows.
Regression Statistics | |
Multiple R |
0.972728226 |
R Square |
0.946200201 |
Adjusted R Square |
0.945810347 |
Standard Error |
16443.77857 |
Observations |
140 |
This is the first table generated and it shows the correlation coefficient termed ere as “Multiple R” which is found to be equal to 0.9727 showing a highly positive relationship between the variables under study. The coefficient of determination represented in the output as R2 (R squared) is found to be 0.9462 depicting that the model is a good fit and almost 95% values fit in the model and almost 95% variations in Final Consumption are caused by the Per Capita Retail Turn Over.
ANOVA | |||||
|
df |
SS |
MS |
F |
Significance F |
Regression |
1 |
6.56273E+11 |
6.56E+11 |
2427.065345 |
1.84889E-89 |
Residual |
138 |
37314903827 |
2.7E+08 | ||
Total |
139 |
6.93588E+11 |
The next table generated by MS Excel is ANOVA. The ANOVA tests the significance of association under the following hypothesis
Null Hypothesis or H0: β≠0
Alternate Hypothesis or Ha: β=0
The test statistic in this case is F which is found to be 2427.06534 and the chance of incurring a value greater or equal to 2427.064 is less than 0 approximately. It can be concluded that coefficient is greater than zero.
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
Lower 95.0% |
Upper 95.0% | |
Intercept |
-53311.13118 |
3819.993082 |
-13.9558 |
3.7113E-28 |
-60864.41692 |
-45757.8 |
-60864.4 |
-45757.8 |
X Variable 1 |
88.92293777 |
1.804982799 |
49.26525 |
1.84889E-89 |
85.35393894 |
92.49194 |
85.35394 |
92.49194 |
The third table tests the significance of Y intercept and slope of the line using t statistic. The tests have been conducted and obtained p values are found to be extremely smaller than α (level of significance). The regression coefficient is found to be 88.92 which mean that one unit change in X will change Y by 88.92 times keeping other things constant.
Queston#2b
The residuals depict the difference between actual values of Y for certain X value and the calculated Y values for that certain point using the linear equation. Using the “Data Analysis” tool pack, the residuals have been calculated and the first rows are shown in the picture below.
Furthermore a scatter plot has been generated using residuals and the independent variable using Data Analysis to assess the fitness of the model.
Figure 3: Residual plot
The residual plot illustrates a reasonably non random pattern as initial and the last values are positive while the middle values are negative. This u-shaped graph depicts that the simple linear model utilized is providing an unreasonable fit to the given data.
Autocorrelation refers to statistical illustration of the level of resemblance among a specified time series and a delayed account of itself over succeeding intervals of time (Golberg & Cho, 2004). Since the data employed is time series, there are much chances of autocorrelation to exist, therefore, Durbin Watson test has been conducted using MS Excel. The results obtained show positive correlation to exist amongst the residuals. The Durbin Watson test statistic is found to be equal to 0.9354.
Durbin-Watson Calculations | |
Sum of Squared Difference of Errors |
130.0191 |
Sum of Squared Errors |
139.0000 |
Durbin-Watson Statistic |
0.9354 |
4. Hypothesis Testing
In order to ascertain the significance of association between the variables under consideration, hypothesis testing is performed using the sample correlation coefficient denoted by “r”. This will help in concluding if the sample coefficient is a significant predictor of population coefficient of correlation denoted by “ρ”.
Hypothesis
The following null and alternated hypothesis are drawn for the purpose
Null hypothesis: H0: ρ = 0
Alternate hypothesis: Ha: ρ ≠ 0
Significance Level
In order to test our quoted hypothesis, 0.05 level of significance or α has been selected.
Decision Rule
In order to reject the hypothesis, we need to have a p-value that is fairly greater than the level of significance chosen.
Test Statistic
In order to check the significance t test has been conducted using the given formula.
In the formula N shows the sample size and r is the sample’s correlation coefficient
The t statistic is found to be equal to 49.2669. Now in order to conclude we need to get the p value associated with the calculated t statistic. The p value has been found to be equal to 0.00001 fairly lesser than our significance level.
Conclusion
Since the obtained p value is lesser than the selected significance level, we cannot reject our H0 and thus it can be concluded that that the final consumption expenditure is positively correlated with the per capita retail turnover of the whole population.
Using the regression outputs we will assess the fitness of the model.
R Square 0.946200201
Adjusted R Square 0.945810347
The R square and the R adjusted values are somewhat similar in the given case and are strongly showing that the model is a good fit to the provided data. As approximately 94% of the variations in Final Consumption expenditure is explained through this model.
References
Abs.gov.au. (2018). [Online] Available at: http://www.abs.gov.au/AUSSTATS/abs@.nsf/viewcontent?readform&view=ProductsbyCatalogue&Action=expandwithheader&Num=1 [Accessed 24 Sep. 2018].
Altman, D. G. (1980). “Statistics and ethics in medical research. VI - Presentation of results”, British Medical Journal, 281(6254), pp.1542-1544.
Golberg, M. A. & Cho, H. A. (2004). Introduction to Regression Analysis.WIT Press.