Date: 2023-01-07

STAT6086 Sampling Techniques

Assignment 2022-23

You must submit one electronic copy of your report in PDF by 11.59pm on Tuesday 10th January

2023. You must submit your electronic copy (a single file) via the STAT6093 Blackboard website using

TurnItIn (in the Assignments folder). Your Student ID Number must appear in your electronic copy.

Make sure that your assignment fits in a single PDF document. A scanned handwritten document is not

allowed. Note that the file has to be smaller than 10 MB. If your Word file is larger than this, try

converting it to a readable PDF or save your images (graphs, plots, …) in JPEG (instead of BMP).

It is the policy of the Department of Social Statistics that coursework should be anonymous; therefore

only your Student ID Number should appear in your Word or PDF document. To maintain anonymity please

do not put your name on any part of your submission. You must put your Student ID Number on the

first page of your coursework.

Note that it is not acceptable that you read and gain ideas for your coursework from another student’s

finished work. It is very important that you read carefully the Section “Academic Integrity and

Referencing” from the module outline (available on Blackboard).

Make sure that you have 4 sections called Task 1, Task 2, Task 3 and Task 4. Each subsection should

also be clearly labelled: 1a), 1b), ..., 2a), 2b), 2c), ...

The maximum number of words is 6000.

Information about coursework submission, penalty for late submission, policy for over-length work,

procedure for coursework extensions, feedback and academic integrity and referencing can be found in the

module outline (available on Blackboard). It is very important that you read carefully the module

outline.

ASSIGNMENT

The target population consists of 1653 farms in Australia (file “OzFarm_Frame.xls” on Blackboard).

For each farm, you have: (i) ID for each farm, (ii) variable STATE, (iii) variable ZONE, (iv) variable

REGION, (v) variable INDUSTRY and (vi) variable DSE. The descriptions of these variables are:

STATE:

1 New South Wales

2 Victoria

3 Queensland

4 South Australia

5 Western Australia

6 Tasmania

7 Northern Territory

ZONE:

1 Pastoral zone (inland)

2 Wheat-sheep zone (hinterland)

3 High rainfall zone (coastal)

REGION: Subdivision of State x Zone indicating a more homogeneous (in terms of climate, soil type

etc.) farming area within a State. Three-digit code, with first digit = state, second digit = zone and third

digit denoting region.

INDUSTRY:

1 crops specialist farm

2 mixed livestock/crops farm

3 sheep farm

4 beef farm

5 sheep-beef farm

DSE: A measure of size of a farm in terms of its productive capacity. DSE stands for "Dry Sheep

Equivalent" and is a linear combination of the reported numbers of sheep and beef cattle and hectares of

crops area reported by the farm at the previous Agricultural Census.

TASK 1: (30%)

For this task, you need to use the size variable DSE to create strata.

1a) Create 4 strata using two different methods:

(i) the Dalenius and Hodges method (with classes of size 5000),

(ii) the cum( ) rule,

where the variable is the variable DSE. For each method, present the details of your calculation and

any analytic expressions needed.

[10%]
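
As an illustration of method (i), the sketch below applies the cum(√f) rule of Dalenius and Hodges to a list of size values. It is a minimal Python sketch under assumptions: the function name `cum_sqrt_f_boundaries`, the convention that classes start at 0, and the choice of the class limit closest to each equally spaced target are illustrative and are not prescribed by the assignment.

```python
import math

def cum_sqrt_f_boundaries(values, class_width, n_strata):
    """Approximate stratum boundaries from the cum(sqrt(f)) rule.

    Assumption: classes are [0, w), [w, 2w), ... with w = class_width.
    """
    # Frequency f of each class.
    n_classes = int(max(values) // class_width) + 1
    freq = [0] * n_classes
    for v in values:
        freq[int(v // class_width)] += 1

    # Cumulate sqrt(f) over the classes.
    cum = []
    total = 0.0
    for f in freq:
        total += math.sqrt(f)
        cum.append(total)

    # Equally spaced targets on the cum(sqrt(f)) scale.
    step = total / n_strata
    boundaries = []
    for h in range(1, n_strata):
        target = h * step
        # Pick the class whose cumulated value is closest to the target;
        # the stratum boundary is the upper limit of that class.
        k = min(range(n_classes), key=lambda i: abs(cum[i] - target))
        boundaries.append((k + 1) * class_width)
    return boundaries
```

With the frame loaded into a hypothetical list `dse`, the call for this task would look like `cum_sqrt_f_boundaries(dse, 5000, 4)`, which returns three boundaries defining four strata.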

1b) Suppose you want to select a sample of size . What would be the optimal allocation

(according to the variable DSE) for the 2 methods of stratification (i) and (ii) described in 1a)? Provide

the details of your calculation and any analytic expressions needed. [5%]
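
For reference, optimal (Neyman) allocation under equal unit costs sets n_h proportional to N_h S_h, where N_h is the stratum size and S_h the stratum standard deviation of DSE. The sketch below is a hedged illustration only: the function name is hypothetical, the total sample size `n` is passed as an argument, and the result is left unrounded.

```python
def neyman_allocation(N_h, S_h, n):
    """Unrounded Neyman allocation: n_h = n * N_h * S_h / sum_k(N_k * S_k)."""
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]
```

In practice the allocations are rounded to integers, and any stratum with n_h larger than N_h is usually taken completely before the remainder is re-allocated.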

1c) Compute the variances of the mean of the variable DSE under the 2 methods of stratification (i) and

(ii), when and under optimal allocation. Provide the formulae and details of your calculation.

Which stratification method would you recommend? [5%]
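
The design variance of the stratified mean under simple random sampling without replacement within strata is the sum over strata of W_h² (1 − n_h/N_h) S_h²/n_h, with W_h = N_h/N. A minimal sketch, assuming the stratum sizes, standard deviations and allocations are already available as lists (the function name is illustrative):

```python
def stratified_mean_variance(N_h, S_h, n_h):
    """Variance of the stratified mean under SRSWOR within each stratum."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * (1 - nh / Nh) * Sh ** 2 / nh
        for Nh, Sh, nh in zip(N_h, S_h, n_h)
    )
```

Evaluating this function for the two stratifications with their respective optimal allocations gives the two variances to be compared.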

1d) What would be the minimal sample size needed to achieve a coefficient of variation (CV) of 5% for

the mean of the variable DSE, for the stratification obtained with the cum( ) rule and optimal

allocation? What would be the minimal sample size if you use proportional allocation instead of optimal

allocation? [10%]
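
One way to obtain the sample size is to set the variance equal to (CV × population mean)² and solve for n. Under optimal allocation the variance is (Σ W_h S_h)²/n − Σ W_h S_h²/N, and under proportional allocation it is (1/n − 1/N) Σ W_h S_h². The sketch below encodes both cases; the function name and argument names are assumptions, and it ignores the constraint n_h ≤ N_h.

```python
import math

def min_sample_size(N_h, S_h, pop_mean, cv, allocation="optimal"):
    """Smallest n reaching the target CV for the stratified mean."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    target_var = (cv * pop_mean) ** 2
    # Sum of W_h * S_h^2, which appears in both variance formulae.
    within = sum(w * s ** 2 for w, s in zip(W, S_h))
    if allocation == "optimal":
        numerator = sum(w * s for w, s in zip(W, S_h)) ** 2
    else:  # proportional allocation
        numerator = within
    return math.ceil(numerator / (target_var + within / N))
```

Because (Σ W_h S_h)² can never exceed Σ W_h S_h², the optimal allocation never requires a larger sample than the proportional one.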

TASK 2: (40%)

A stratified sample of units has been selected from the 1653 farms in Australia. The

stratification is given by the variable ZONE (3 strata). The sample data can be found in the file

“OzFarm_Sample.xls” on Blackboard. You will see that the sample data contains additional variables:

TCC Total Cash Costs of farm over financial year

TCR Total Cash Receipts of farm over financial year (A$)

EQUITY Value of farm assets less farm debt at end of financial year (A$)

DEBT Farm debt at end of financial year (A$)

2a) Which allocation has been used? Explain your answer. [3%]

2b) Estimate the population mean of TCC. Provide the variance estimates and 95% confidence

intervals. Provide the formulae used and the details of your calculation. [5%]
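
A hedged sketch of the standard stratified estimator of a mean, its variance estimator and a normal-approximation confidence interval is given below. The data layout (one list of sample values per stratum) and the function name are assumptions made for illustration.

```python
import math
from statistics import mean, variance

def stratified_mean(samples, N_h, z=1.96):
    """Stratified mean, variance estimate and confidence interval.

    samples: list of per-stratum lists of observed values.
    N_h:     list of population stratum sizes, in the same order.
    """
    N = sum(N_h)
    est = 0.0
    var = 0.0
    for y, Nh in zip(samples, N_h):
        nh = len(y)
        W = Nh / N
        est += W * mean(y)
        # variance() is the sample variance with divisor n_h - 1.
        var += W ** 2 * (1 - nh / Nh) * variance(y) / nh
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)
```

The same function applies to a proportion when the values are replaced by 0/1 indicators, since the sample variance of an indicator equals n_h p_h (1 − p_h)/(n_h − 1).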

2c) Estimate the population proportion of farms with DEBT < EQUITY. Provide the variance estimates

and 95% confidence intervals. Provide the formulae used and the details of your calculation.

[5%]

2d) Estimate the population mean of TCC using the separate ratio estimator, with DSE as auxiliary

variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and

the details of your calculation. [7%]
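
The separate ratio estimator fits one ratio per stratum and needs the population mean of the auxiliary variable in every stratum, which here can be computed from the frame. The sketch below uses the usual linearisation variance estimator based on the residuals y − R̂_h x; names and data layout are illustrative assumptions.

```python
def separate_ratio_mean(y_s, x_s, N_h, Xbar_h):
    """Separate ratio estimator of a mean and its linearised variance.

    y_s, x_s: per-stratum lists of sample values of y and x.
    Xbar_h:   population means of x in each stratum (from the frame).
    """
    N = sum(N_h)
    est = 0.0
    var = 0.0
    for y, x, Nh, Xbar in zip(y_s, x_s, N_h, Xbar_h):
        nh = len(y)
        R = sum(y) / sum(x)  # stratum-specific ratio
        resid = [yi - R * xi for yi, xi in zip(y, x)]
        # The residuals sum to zero, so no centring is needed.
        s2 = sum(e * e for e in resid) / (nh - 1)
        W = Nh / N
        est += W * R * Xbar
        var += W ** 2 * (1 - nh / Nh) * s2 / nh
    return est, var
```

This estimator can be noticeably biased when the stratum sample sizes are small, because the bias of each stratum ratio accumulates over strata.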

2e) Estimate the population mean of TCC using the combined ratio estimator, with DSE as auxiliary

variable. Provide the variance estimates and 95% confidence intervals. Provide the formulae used and

the details of your calculation. Compare your results with 2d). [10%]
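
The combined ratio estimator uses a single ratio of the stratified means of y and x, multiplied by the overall population mean of x. A minimal sketch under the same assumed data layout as a per-stratum list of values (function and argument names are hypothetical):

```python
from statistics import mean, variance

def combined_ratio_mean(y_s, x_s, N_h, Xbar):
    """Combined ratio estimator of a mean and its linearised variance.

    Xbar: overall population mean of the auxiliary variable x.
    """
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    y_st = sum(w * mean(y) for w, y in zip(W, y_s))
    x_st = sum(w * mean(x) for w, x in zip(W, x_s))
    R = y_st / x_st  # one ratio shared by all strata
    var = 0.0
    for w, y, x, Nh in zip(W, y_s, x_s, N_h):
        nh = len(y)
        resid = [yi - R * xi for yi, xi in zip(y, x)]
        # Within-stratum sample variance of the residuals
        # (centred at the stratum mean of the residuals).
        var += w ** 2 * (1 - nh / Nh) * variance(resid) / nh
    return R * Xbar, var
```

Unlike the separate version, the residuals here do not sum to zero within a stratum, which is why the centred sample variance is used.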

2f) Estimate the population domain mean of TCR for the 5 types of INDUSTRY. Explain why you

should use the combined ratio estimator. Compute the variance estimates and 95% confidence intervals.

Provide the formulae used and the details of your calculation, for the first industry. Your final estimates,

variance estimates and confidence intervals should be given in a table. [10%]
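
A domain mean can be written as the ratio of an estimated domain total to an estimated domain size, both obtained by stratified expansion. The sketch below follows that route with a linearisation variance estimator; the 0/1 domain indicator lists and the function name are assumptions for illustration.

```python
from statistics import variance

def domain_mean(y_s, d_s, N_h):
    """Domain mean as a ratio of two stratified totals.

    y_s: per-stratum lists of sample values.
    d_s: per-stratum lists of 0/1 indicators of domain membership.
    """
    # Expansion estimators of the domain total and the domain size.
    t_y = sum(
        Nh / len(y) * sum(yi * di for yi, di in zip(y, d))
        for y, d, Nh in zip(y_s, d_s, N_h)
    )
    n_d = sum(Nh / len(d) * sum(d) for d, Nh in zip(d_s, N_h))
    est = t_y / n_d
    var = 0.0
    for y, d, Nh in zip(y_s, d_s, N_h):
        nh = len(y)
        # Linearised variable: zero outside the domain.
        z = [di * (yi - est) / n_d for yi, di in zip(y, d)]
        var += Nh ** 2 * (1 - nh / Nh) * variance(z) / nh
    return est, var
```

Calling the function once per INDUSTRY code, with the indicator set to 1 for farms of that type, produces the five rows of the requested table.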

TASK 3: (15%)

The sample dataset “Labor.xls” contains the following variables:

Cluster: cluster number

Person: person number

age: age of person

agecat: age category

1: 19 years and under

2: 20-24

3: 25-34

4: 35-64

5: 65 years and over

race: 1 for non-black and 2 for black

sex: 1 for male and 2 for female

HourPerWk: usual number of hours worked per week

Wkly Wage: usual amount of weekly wages (in 1976 US$)

We suppose that these sample data have been selected with a two-stage sampling design. For both

stages, simple random sampling has been used. The file “ClusterSize.xls” contains the (population)

sizes of the clusters. We suppose that we have 2 000 000 individuals in the population, and that we have

30 000 clusters in the population. An electronic copy of “Labor.xls” and “ClusterSize.xls” is

available on the module Blackboard site.

3a) Estimate the population mean of weekly wage (per individual). [5%]

3b) Compute the 95% confidence interval of the estimate found in 3a). [10%]

For 3a) and 3b): Provide details of your calculation. You should describe and justify the approach you

used. You should provide the analytic expressions of the estimator and variance estimator used. You

should also describe the key steps of your calculation.
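
One possible approach, sketched below, is the unbiased two-stage estimator: estimate each sampled cluster total by N_i ȳ_i, expand by M/m to the population total, and divide by the known number of individuals K. This is an illustrative sketch, not the required method; a ratio-type estimator is an alternative. The function name and data layout are assumptions, and every sampled cluster is assumed to contain at least two sampled persons.

```python
import math
from statistics import mean, variance

def two_stage_mean(cluster_samples, cluster_sizes, M, K, z=1.96):
    """Two-stage (SRS at both stages) estimator of a per-individual mean.

    cluster_samples: list of lists of sampled values, one list per cluster.
    cluster_sizes:   population sizes N_i of the sampled clusters.
    M: number of clusters in the population; K: number of individuals.
    """
    m = len(cluster_samples)
    # Estimated total of each sampled cluster.
    totals = [Ni * mean(y) for y, Ni in zip(cluster_samples, cluster_sizes)]
    t_hat = M / m * sum(totals)
    # First-stage (between-cluster) component.
    between = M ** 2 * (1 - m / M) * variance(totals) / m
    # Second-stage (within-cluster) component.
    within = M / m * sum(
        Ni ** 2 * (1 - len(y) / Ni) * variance(y) / len(y)
        for y, Ni in zip(cluster_samples, cluster_sizes)
    )
    est = t_hat / K
    var = (between + within) / K ** 2
    half = z * math.sqrt(var)
    return est, var, (est - half, est + half)
```

For this task the call would use M = 30000 and K = 2000000, with the cluster sizes taken from “ClusterSize.xls”.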

TASK 4: (15%)

Suppose that we use a sampling design without replacement to select a sample of size from a

population of size N. Let and denote the first and

second-order inclusion probabilities of the sampling design used. We suppose that and

for all and . Let denote the value of a variable of interest for the individual of the sample .

Consider the following estimator:

For which population parameter is this estimator an unbiased estimator? Justify your answer.

[15%]

Dr. Yves Berger December 2022

