ECON7300: Statistical Project

ECON7300: Statistical Project Assignment, Semester 2, 2016

Instructions for Dataset 2: Simple Regression Analysis (30 marks)

The growing interest in and use of the internet has forced many companies into considering ways to sell their products on the web. Therefore, such companies are interested in determining who is using the web. A statistician undertook a study to determine how education and internet use are connected. She took a random sample of 146 adults (20 years of age and older) and asked each to report the years of education they had completed and the number of hours of internet use in the previous week.

The variables in the dataset are:

Education (X, in years)

Internet (Y, in hours)

The dependent variable for your analysis is Internet.

Answer the following questions using dataset 2.

(a) Estimate a regression model using X to predict Y (state the simple linear regression equation).

(b) Interpret the meaning of the slope.

(c) Predict Y when X = 12.

(d) Compute the coefficient of determination and interpret its meaning.

(e) Compute the standard error of the estimate and interpret its meaning. Judge the magnitude of the standard error of the estimate.

(f) Perform a residual analysis (plot the residuals) and evaluate whether the assumptions of regression have been violated.

(g) Test the significance of the slope using the t test (follow all the necessary steps). Assume a 5% level of significance.

(h) Test the significance of the slope using the F test (follow all the necessary steps). Assume a 5% level of significance.

(i) Test the significance of the correlation coefficient (follow all the necessary steps). Assume a 5% level of significance.

(j) Compute a 95% confidence interval estimate of the mean Y for all adults when X = 12 and interpret its meaning.

(k) Compute a 95% prediction interval of Y for an individual adult when X = 12 and interpret its meaning.
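
Parts (a) to (k) all build on the same few sums of squares. As a rough sketch only, the Python function below shows one way these quantities could be computed with NumPy and SciPy. The function name `simple_regression`, its return keys, and the demo data are hypothetical and not part of the assignment. The demo data is randomly generated placeholder data, not dataset 2, so its output says nothing about the real answers.

```python
import numpy as np
from scipy import stats


def simple_regression(x, y, x0, alpha=0.05):
    """Least-squares fit of y = b0 + b1*x plus the usual inference quantities."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()

    ssx = np.sum((x - x_bar) ** 2)                    # sum of squares of X
    b1 = np.sum((x - x_bar) * (y - y_bar)) / ssx      # slope
    b0 = y_bar - b1 * x_bar                           # intercept

    fitted = b0 + b1 * x
    residuals = y - fitted
    sse = np.sum(residuals ** 2)                      # unexplained variation
    sst = np.sum((y - y_bar) ** 2)                    # total variation
    ssr = sst - sse                                   # explained variation

    r2 = ssr / sst                                    # coefficient of determination
    r = np.sign(b1) * np.sqrt(r2)                     # correlation coefficient
    s_yx = np.sqrt(sse / (n - 2))                     # standard error of the estimate
    s_b1 = s_yx / np.sqrt(ssx)                        # standard error of the slope

    t_stat = b1 / s_b1                                # t test for the slope, n - 2 df
    f_stat = ssr / (sse / (n - 2))                    # F test, 1 and n - 2 df
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)     # two-tailed critical value

    y_hat0 = b0 + b1 * x0                             # point prediction at x0
    h0 = 1 / n + (x0 - x_bar) ** 2 / ssx
    ci_half = t_crit * s_yx * np.sqrt(h0)             # half-width, mean response
    pi_half = t_crit * s_yx * np.sqrt(1 + h0)         # half-width, individual response

    return {
        "b0": b0, "b1": b1, "r2": r2, "r": r, "s_yx": s_yx,
        "t_stat": t_stat, "f_stat": f_stat, "t_crit": t_crit,
        "y_hat0": y_hat0,
        "ci": (y_hat0 - ci_half, y_hat0 + ci_half),
        "pi": (y_hat0 - pi_half, y_hat0 + pi_half),
        "fitted": fitted, "residuals": residuals,
    }


# Placeholder data only (NOT dataset 2): 146 synthetic (education, internet) pairs.
rng = np.random.default_rng(0)
education = rng.integers(8, 21, size=146).astype(float)
internet = 1.0 + 0.6 * education + rng.normal(0.0, 3.0, size=146)

result = simple_regression(education, internet, x0=12)
for key in ("b0", "b1", "r2", "s_yx", "t_stat", "f_stat", "y_hat0", "ci", "pi"):
    print(key, result[key])
```

In simple regression the F statistic equals the square of the slope's t statistic, and the t statistic for the correlation coefficient equals the one for the slope, so parts (g), (h) and (i) should lead to the same conclusion. The returned `residuals` can be plotted against X or against `fitted` for part (f).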

Instructions for Dataset 5: Multiple Regression Analysis (45 marks)

Absenteeism is a serious employment problem in most countries. Two economists launched a research project to learn more about the problem. They randomly selected 100 organisations to participate in a one-year study. For each organisation, they recorded the average number of days absent per employee, the percentage of part-time employees, the percentage of unionised employees and the availability of shift work.

The variables in the dataset are:

Days absent (Y)

Percentage PT (X1, % of part-time employees in each organisation)

Percentage U (X2, % of unionised employees in each organisation)

Shift work (X3, availability of shift work: coded 1 if yes and 0 if no)

The dependent variable for your analysis is Days absent.

Answer the following questions using dataset 5.

(a) Estimate a regression model using X1 and X2 to predict Y (state the multiple regression equation).

(b) Interpret the meaning of the slopes.

(c) Predict Y when X1 = 15 and X2 = 40.

(d) Compute a 95% confidence interval estimate of the mean Y for all organisations when X1 = 15 and X2 = 40 and interpret its meaning.

(e) Compute a 95% prediction interval of Y for a single organisation when X1 = 15 and X2 = 40 and interpret its meaning.

(f) Plot the residuals to test the assumptions of the regression model. Is there any evidence of violation of the regression assumptions? Explain.

(g) Determine the variance inflation factor (VIF) for each independent variable (X1 and X2) in the model. Is there reason to suspect the existence of collinearity?

(h) At the 0.05 level of significance, determine whether each independent variable (X1 and X2) makes a significant contribution to the regression model (use t tests and follow all the necessary steps). On the basis of these results, indicate the independent variables to include in the model.

(i) Test for the significance of the overall multiple regression model at the 5% level of significance.

(j) Determine whether there is a significant relationship between Y and each independent variable (X1 and X2) at the 5% level of significance (hint: testing portions of the multiple regression model using the partial F test).

(k) Compute the coefficients of partial determination and interpret their meaning.

(l) Estimate a regression model using X1, X2 and X3 to predict Y (state the multiple regression equation, the regression equation for availability of shift work, the regression equation for non-availability of shift work) and interpret the coefficient for X3.

(m) Estimate a regression model using X1, X2, X3, an interaction between X1 and X2, an interaction between X1 and X3, and an interaction between X2 and X3 to predict Y.

(n) Test whether the three interactions significantly improve the regression model. Assume a 5% level of significance (hint: test the joint significance of the three interaction terms using the partial F test. If you reject the null hypothesis, test the contribution of each interaction separately, again using the partial F test, in order to determine which interaction terms to include in the model).
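
Several of the parts above (VIF, partial F tests, coefficients of partial determination, the joint test of the interactions) reduce to comparing the error sum of squares of a full model against that of a nested reduced model. The sketch below illustrates that idea with NumPy and SciPy. The helper names `ols`, `vif` and `partial_f` are hypothetical, and the demo data is randomly generated placeholder data, not dataset 5.

```python
import numpy as np
from scipy import stats


def ols(X, y):
    """Fit y on the columns of X plus an intercept; return (beta, SSE, residual df)."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return beta, float(resid @ resid), len(y) - design.shape[1]


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other X's."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        _, sse, _ = ols(np.delete(X, j, axis=1), target)
        sst = np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (sse / sst))       # sse / sst equals 1 - R_j^2
    return factors


def partial_f(X_full, X_reduced, y, alpha=0.05):
    """Partial F test; the columns of X_reduced must be a subset of those in X_full."""
    _, sse_full, df_full = ols(X_full, y)
    _, sse_red, df_red = ols(X_reduced, y)
    q = df_red - df_full                         # number of terms being tested
    extra = sse_red - sse_full                   # SSR(tested terms | remaining terms)
    f_stat = (extra / q) / (sse_full / df_full)
    return {
        "f_stat": f_stat,
        "f_crit": stats.f.ppf(1 - alpha, q, df_full),
        "p_value": stats.f.sf(f_stat, q, df_full),
        "df_num": q,
        "df_den": df_full,
        # With q = 1 this is the coefficient of partial determination.
        "partial_r2": extra / sse_red,
    }


# Placeholder data only (NOT dataset 5): 100 synthetic organisations.
rng = np.random.default_rng(1)
n = 100
x1 = rng.uniform(0, 40, n)                       # % part-time employees
x2 = rng.uniform(10, 80, n)                      # % unionised employees
x3 = rng.integers(0, 2, n).astype(float)         # shift work dummy (1 = yes, 0 = no)
y = 3 + 0.05 * x1 + 0.03 * x2 + 1.0 * x3 + rng.normal(0, 1, n)

main = np.column_stack([x1, x2, x3])
full = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])

print("VIF for X1, X2:", vif(np.column_stack([x1, x2])))
print("X1 given X2:", partial_f(np.column_stack([x1, x2]), x2.reshape(-1, 1), y))
print("Three interactions jointly:", partial_f(full, main, y))
```

With only two predictors the two VIF values are identical, because each depends only on the correlation between X1 and X2. Reject the null hypothesis of a partial F test when `f_stat` exceeds `f_crit`, or equivalently when `p_value` is below the chosen significance level.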
