
MATH5826 Statistical Methods in Epidemiology

Lecture 2: Measures of Disease Occurrence

Jake Olivier

Term 1, 2021


Background

Prevalence and Incidence

Delta Method

Estimation and inference for event rates

Standardized Rates


Background

Epidemiology originated as the study of epidemics, but its current focus is much broader, including the study of chronic and acute disease, mental health and injury.

The primary focus in epidemiology is often on the relationship (or association) between an exposure E and a disease D.

For example, an epidemiologist may observe that higher levels of exposure to lead are associated with an increased incidence of deficient brain development, compared with little exposure to lead.

Before making comparisons across, say, different levels of exposure, we must first introduce measures of disease occurrence. These will be used later when we introduce measures of association.


Disease Occurrence

There are many measures of disease occurrence, and the choice of which one to use depends on the study design, the population under study and the available data.

Some issues to consider are:

• threats to study validity: bias or limitations arising from the study design or the method of data collection
• extraneous factors: the analysis needs to account for them in order to separate the effect of exposure on disease from the effects of other factors


Disease Occurrence

Broadly speaking, disease occurrence can be quantified as a ratio, a proportion, a rate or an odds.

Each of these measures is obtained by dividing one quantity by another (i.e., M = a/b).

The numerator and denominator of a ratio are separate quantities: neither contains the other. For example, the ratio

$$\frac{\text{number of doctors working at a hospital}}{\text{number of beds in the hospital}}$$

is composed of separate quantities and can assist in determining resource allocation at a hospital.


Disease Occurrence: Proportions

For proportions, the denominator quantity (b) contains the numerator quantity (a). That is, the denominator can be written as b = a + a′, where a′ = b − a.

A proportion is a number between 0 and 1, and proportions are often interpreted as probabilities in epidemiology.

For example, the number of deaths in a specified time interval out of the number alive at the start of the interval is a proportion, and can be regarded as an estimate of the probability of dying in the interval.


Disease Occurrence: Rates

A rate is a measure of change in a quantity per unit of another quantity.

Mortality and disease rates are almost always expressed per unit of time, which might be calendar time, age, or follow-up time.

Example

Lowres et al.¹ screened n = 1000 patients aged 65 years and older for atrial fibrillation (AF), finding 15 new AF cases. The sum of the ages of those screened was t = 76,302 years.

The estimated proportion of new AF diagnoses was 15/1000 = 1.5%, and the rate per person-year was 15/76,302, or 1 new AF case per 5086.8 years lived.

¹ Feasibility and cost-effectiveness of stroke prevention through community screening for atrial fibrillation using iPhone ECG in pharmacies. Thrombosis & Haemostasis 2014;111:1167-76.
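As a quick check of the arithmetic in this example, the sketch below recomputes the proportion and the rate in Python. The variable names are illustrative, and, as in the example, the summed ages are used as the person-time denominator.

```python
new_cases = 15          # new AF diagnoses
screened = 1000         # patients screened
person_years = 76302    # sum of ages of those screened, used as person-time in the example

proportion = new_cases / screened          # proportion of new diagnoses
rate = new_cases / person_years            # new cases per person-year
years_per_case = person_years / new_cases  # person-years lived per new case

print(f"proportion = {proportion:.1%}")
print(f"rate = {rate:.6f} per person-year (1 case per {years_per_case:.1f} years)")
```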


Disease Occurrence: Rates

From differential calculus, the instantaneous rate of change in a quantity y with respect to x is dy/dx. In epidemiology, rates are commonly expressed relative to the size of the quantity in the numerator.

So, the rate r for the change in population y at time t, relative to y(t), the population size at time t, is

$$r = \frac{dy/dt}{y(t)} = \frac{dy}{y(t)\,dt}$$

For example, if the population is diminishing only due to deaths, then dy/dt < 0 and −r (the magnitude of r) is a mortality rate, i.e., the number of deaths per person per unit of time.


Disease Occurrence: Rates

Populations change over time, and rates can be estimated over a time interval. The average rate in the interval (t, t + Δt) is

$$\bar r = \frac{\Delta y}{\int_t^{t+\Delta t} y(u)\,du}$$

where Δy is the change in y from t to t + Δt.

For example, if $y(x) = \ell_x$ is the population at risk at age x, then $\Delta y = \ell_x - \ell_{x+\Delta x} = d_x$ is the number of deaths between ages x and x + Δx, and the average rate is

$$\frac{d_x}{\int_x^{x+\Delta x} \ell_u\,du},$$

the number of deaths per total time at risk.
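The average rate can be checked numerically. The sketch below assumes a hypothetical survival curve $\ell_u$ (100,000 people at age 60, declining with a constant mortality rate of 0.02 per year) and approximates the integral in the denominator with the trapezoidal rule; `average_rate` and the curve are illustrative, not from the notes. With a constant underlying rate, the average rate over any interval should recover that constant.

```python
import math


def average_rate(ell, x, dx, n=10_000):
    """Average rate over (x, x + dx): d_x divided by the person-time at risk.

    ell is a callable giving the population at risk at age u. The integral of
    ell over (x, x + dx) is approximated with the trapezoidal rule on n panels.
    """
    deaths = ell(x) - ell(x + dx)  # d_x = l_x - l_{x+dx}
    h = dx / n
    interior = sum(ell(x + i * h) for i in range(1, n))
    person_time = h * (0.5 * ell(x) + interior + 0.5 * ell(x + dx))
    return deaths / person_time


# Hypothetical survival curve: 100,000 people at age 60, constant mortality 0.02 per year
mu = 0.02


def ell(u):
    return 100_000 * math.exp(-mu * (u - 60))


print(average_rate(ell, 60, 5))  # close to mu = 0.02
```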


Disease Occurrence: Odds

The odds of occurrence of an event A is the probability of occurrence relative to the probability of non-occurrence:

$$\text{odds}(A) = \frac{P(A)}{1 - P(A)}$$

The odds can also be computed as the ratio of the frequency of occurrence to the frequency of non-occurrence of an event.
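A minimal sketch of the two ways of computing the odds, using hypothetical numbers (20 of 100 people experience the event); the function names are illustrative.

```python
def odds_from_probability(p):
    """Odds of an event with probability p (0 <= p < 1)."""
    return p / (1 - p)


def odds_from_counts(occurred, not_occurred):
    """Odds as the frequency of occurrence relative to non-occurrence."""
    return occurred / not_occurred


# Hypothetical example: 20 of 100 people develop the disease
print(odds_from_probability(20 / 100))  # 0.2 / 0.8
print(odds_from_counts(20, 80))         # the same odds, 0.25
```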


Prevalence

The number of disease cases in a population can be measured in several ways. Methods may differ based on the time frame and on whether new or existing cases are counted.

Prevalence is the number of existing cases of disease in a population at a point in time. Two types of prevalence are:

• Point prevalence proportion: the proportion of a population with the disease at a specified point in time
• Period prevalence proportion: the proportion of a population with the disease over a specified period of time

For example, a researcher may be interested in the prevalence of active COVID-19 cases. On 25 Jan 2021 in Australia, the point prevalence was 135/25,744,519, while the period prevalence for 2020 was 28,381/25,499,884.²

² Population estimates at 25/1/2021 and 30/6/2020.
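The two prevalence proportions quoted above are easier to compare when expressed per 100,000 population; the scaling is a presentation choice added here, and the inputs are the figures in the example.

```python
# Figures quoted above: COVID-19 cases and population estimates for Australia
point_cases, point_pop = 135, 25_744_519       # 25 Jan 2021
period_cases, period_pop = 28_381, 25_499_884  # calendar year 2020

point_prev = point_cases / point_pop
period_prev = period_cases / period_pop

print(f"point prevalence:  {point_prev * 1e5:.2f} per 100,000")
print(f"period prevalence: {period_prev * 1e5:.1f} per 100,000")
```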

Incidence

Incidence is the number of new cases of disease occurring in a specified time period in a population. Two types of incidence are:

• Incidence proportion: the number of new cases of disease over a period of time, divided by the number of people at risk for the disease at the start of the period.
• Incidence rate: the number of new cases of disease over a period of time, divided by the total time at risk (for all individuals in the population) over the period.


Prevalence and Incidence

The numerator of the point prevalence proportion includes all those who have the disease at that date, regardless of when the disease was contracted. So, diseases of long duration tend to have higher prevalence than those of short duration, even if the incidence is similar.

If the incidence and the average duration of a disease are constant over time (and the prevalence is small), then the prevalence P is approximately

$$P \approx I \times D$$

where I is the incidence rate and D is the average duration.
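A small sketch of the steady-state relation, assuming hypothetical values I = 0.002 per person-year and D = 5 years. It also shows the standard steady-state relation for the prevalence odds, P/(1 − P) = I × D, which explains why P ≈ I × D is adequate only when the prevalence is small; the function names are illustrative.

```python
def prevalence_approx(incidence_rate, mean_duration):
    """Steady-state approximation P ~= I x D (adequate when P is small)."""
    return incidence_rate * mean_duration


def prevalence_steady_state(incidence_rate, mean_duration):
    """Steady-state prevalence from the prevalence-odds relation P / (1 - P) = I x D."""
    x = incidence_rate * mean_duration
    return x / (1 + x)


# Hypothetical disease: 2 new cases per 1,000 person-years, average duration 5 years
I, D = 0.002, 5.0
print(prevalence_approx(I, D))        # about 0.01
print(prevalence_steady_state(I, D))  # slightly below 0.01
```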


Epidemiologist’s Bathtub³

The epidemiologist’s bathtub can be useful in understanding the difference between prevalence and incidence. The existing water level represents prevalence, while the amount of new water flowing into the tub is the incidence. Note that the water exiting the tub can be either mortality or recovery.

³ Source: https://www.publichealth.hscni.net/node/5277


Example

The following figure presents some hypothetical data on a population of five individuals observed from t = 0 to t = 5. The line segments represent time alive, circles represent deaths, and crosses represent occurrences of disease. Note that the disease is chronic in the sense that no recovery is possible.

[Figure: follow-up of patients 1 to 5 (vertical axis) over time t = 0 to 5 (horizontal axis); legend: death, disease]

• Point prevalence proportion at t = 0: 0/5 = 0
• Point prevalence proportion at t = 5: 1/2 = 0.5
• Incidence proportion from t = 0 to t = 5: 3/5 = 0.6
• Incidence rate from t = 0 to t = 5: 3/(5 + 1 + 4 + 3 + 1) = 3/14 ≈ 0.21 per unit time
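The four summary measures can be recomputed from the quantities read off the figure. The per-patient times at risk are taken from the terms of the sum in the incidence-rate calculation; which term belongs to which patient is an assumption and does not affect the result.

```python
# Summary quantities read off the figure (time measured in units of t)
n_at_start = 5
diseased_at_start = 0
alive_at_end, diseased_at_end = 2, 1
new_cases = 3
time_at_risk = [5, 1, 4, 3, 1]  # disease-free follow-up, one term per patient (assignment assumed)

point_prev_start = diseased_at_start / n_at_start
point_prev_end = diseased_at_end / alive_at_end
incidence_proportion = new_cases / n_at_start
incidence_rate = new_cases / sum(time_at_risk)  # cases per unit of time at risk

print(point_prev_start, point_prev_end, incidence_proportion, round(incidence_rate, 2))
```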


Review of the Delta Method

It is often useful to derive the asymptotic variances of measures of disease occurrence and association, and this section briefly reviews the delta method for obtaining the covariance matrix of a transformation of a parameter vector.

Let $\hat\theta$ be a $p \times 1$ vector of parameter estimates with covariance matrix $\mathrm{var}(\hat\theta)$, and let $\phi = g(\theta)$ be a transformation of $\theta$ to a $q \times 1$ parameter vector. The first-order Taylor series expansion of $g(\hat\theta)$ about $\theta$ is

$$g(\hat\theta) \approx g(\theta) + \left[\frac{\partial g_i(\theta)}{\partial \theta_j}\right](\hat\theta - \theta)$$

where $\left[\partial g_i(\theta)/\partial \theta_j\right]$ is the Jacobian matrix of $g$ whose $(i, j)$ element is $\partial g_i(\theta)/\partial \theta_j$.


Review of the Delta Method

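Taking the variance of both sides of the equation gives

$$\mathrm{var}(\hat\phi) = \mathrm{var}(g(\hat\theta)) \approx \left[\frac{\partial g_i(\theta)}{\partial \theta_j}\right] \mathrm{var}(\hat\theta) \left[\frac{\partial g_i(\theta)}{\partial \theta_j}\right]^T$$

Evaluating the derivatives at the estimated value $\theta = \hat\theta$ gives the estimated covariance matrix

$$\widehat{\mathrm{var}}(\hat\phi) = \widehat{\mathrm{var}}(g(\hat\theta)) \approx \left[\frac{\partial g_i(\hat\theta)}{\partial \theta_j}\right] \widehat{\mathrm{var}}(\hat\theta) \left[\frac{\partial g_i(\hat\theta)}{\partial \theta_j}\right]^T$$

For the univariate case, i.e., $p = q = 1$,

$$\widehat{\mathrm{var}}(\hat\phi) \approx \left(\frac{dg(\hat\theta)}{d\theta}\right)^2 \widehat{\mathrm{var}}(\hat\theta)$$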
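The delta method can be sketched in a few lines of dependency-free Python. This sketch approximates the Jacobian by central differences (a numerical shortcut chosen here; analytic derivatives would normally be used), and `numerical_jacobian` and `delta_method_cov` are hypothetical helper names. The check uses $g(\theta) = \log(\theta_1/\theta_2)$ for two independent, hypothetical estimates, where the univariate formula applied term by term gives $\mathrm{var}(\hat\theta_1)/\theta_1^2 + \mathrm{var}(\hat\theta_2)/\theta_2^2$.

```python
import math


def numerical_jacobian(g, theta, rel_step=1e-5):
    """q x p Jacobian [dg_i/dtheta_j] of g at theta, by central differences."""
    base = g(theta)
    J = [[0.0] * len(theta) for _ in base]
    for j, value in enumerate(theta):
        step = rel_step * (abs(value) if value != 0 else 1.0)
        up, down = list(theta), list(theta)
        up[j] += step
        down[j] -= step
        g_up, g_down = g(up), g(down)
        for i in range(len(base)):
            J[i][j] = (g_up[i] - g_down[i]) / (2 * step)
    return J


def delta_method_cov(g, theta_hat, cov):
    """Delta-method covariance of g(theta_hat): J V J^T with J evaluated at theta_hat."""
    J = numerical_jacobian(g, theta_hat)
    p = len(theta_hat)
    JV = [[sum(row[k] * cov[k][c] for k in range(p)) for c in range(p)] for row in J]
    return [[sum(JV[i][k] * J[m][k] for k in range(p)) for m in range(len(J))]
            for i in range(len(J))]


# Hypothetical example: log ratio of two independent rate estimates
theta_hat = [0.002, 0.001]
cov = [[1e-8, 0.0], [0.0, 4e-9]]


def g(t):
    return [math.log(t[0] / t[1])]


v = delta_method_cov(g, theta_hat, cov)[0][0]
analytic = 1e-8 / 0.002**2 + 4e-9 / 0.001**2  # univariate formula applied to each term
print(v, analytic)
```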


Estimation and inference for event rates

In cohort studies, we are often interested in the risk of (possibly recurrent) events during some period of exposure.

Different individuals may be at risk of the event for different exposure periods, but the overall event frequency is usually summarised as a rate: the number of events occurring in a specified time period divided by the sum of the individual exposure periods.


Poisson Process Model

We assume that the sequence of events and the aggregate count for each individual are generated by a Poisson process, where λ(s) is the instantaneous rate (intensity) of events at time s.

Let N(s) denote the counting process at time s, which is the cumulative number of events in the period (0, s]. If the random variable D represents the number of events in (s, s + t], then the probability of d events in this interval is

$$P(D = d) = P\big(N(s+t) - N(s) = d\big) = \frac{e^{-\Lambda(s,\,s+t)}\,\big(\Lambda(s,\,s+t)\big)^d}{d!}$$

where s, t > 0 and the rate parameter

$$\Lambda(s,\,s+t) = \int_s^{s+t} \lambda(u)\,du$$

is the cumulative intensity over the interval.
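For a non-constant intensity, the cumulative intensity Λ can be computed numerically and plugged into the Poisson probability. The sketch below assumes a hypothetical linearly increasing intensity λ(u) = 0.1 + 0.02u and uses the midpoint rule, which is exact for a linear integrand up to rounding; the helper names are illustrative.

```python
import math


def cumulative_intensity(intensity, s, t, n=1_000):
    """Lambda(s, s + t): integral of intensity(u) over (s, s + t], midpoint rule with n panels."""
    h = t / n
    return h * sum(intensity(s + (i + 0.5) * h) for i in range(n))


def prob_d_events(d, intensity, s, t):
    """P(N(s + t) - N(s) = d) for a Poisson process with the given intensity function."""
    cum = cumulative_intensity(intensity, s, t)
    return math.exp(-cum) * cum**d / math.factorial(d)


# Hypothetical intensity that increases linearly with time: lambda(u) = 0.1 + 0.02 u
def lam(u):
    return 0.1 + 0.02 * u


print(cumulative_intensity(lam, 0, 5))  # approximately 0.75 (= 0.1*5 + 0.01*25)
print([round(prob_d_events(d, lam, 0, 5), 4) for d in range(4)])
```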


Poisson Process Model

If we assume constant intensity over the time period of interest, i.e., λ(u) = λ for all u, then we have a homogeneous Poisson process, and for d ≥ 0, t > 0 the number of events D in an interval of length t follows a Poisson distribution with parameter λt:

$$P(D = d;\, t, \lambda) = \frac{e^{-\lambda t}(\lambda t)^d}{d!}$$

If we further assume that all individuals in the population experience the same constant intensity, then we have a doubly homogeneous Poisson process, and the likelihood function for a sample of n independent observations is

$$L(\lambda) = \prod_{j=1}^{n} \frac{e^{-\lambda t_j}(\lambda t_j)^{d_j}}{d_j!}$$

where $t_j$ and $d_j$ are the exposure time and number of events, respectively, for individual j. This likelihood conditions on the exposure times $t_1, \ldots, t_n$ (i.e., the $t_j$ are treated as fixed constants).


Poisson Process Model

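The log-likelihood is

$$\log L(\lambda) = -\lambda \sum_{j=1}^{n} t_j + \sum_{j=1}^{n} d_j \log(\lambda) + \sum_{j=1}^{n} d_j \log(t_j) - \sum_{j=1}^{n} \log(d_j!)$$

and the score equation is

$$\frac{\partial \log L(\lambda)}{\partial \lambda} = -\sum_{j=1}^{n} t_j + \frac{1}{\lambda} \sum_{j=1}^{n} d_j$$

Equating the score to 0 and solving for λ gives the maximum likelihood estimator

$$\hat\lambda = \frac{\sum_{j=1}^{n} d_j}{\sum_{j=1}^{n} t_j} = \frac{d}{T}$$

where d and T are the total number of events and the total exposure, respectively.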
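A short sketch confirming that d/T solves the score equation and maximises the log-likelihood, using a small hypothetical cohort of five individuals; `math.lgamma(d + 1)` is used for log(d!), and the function names are illustrative.

```python
import math


def poisson_log_likelihood(lam, d, t):
    """log L(lambda) for independent counts d_j over exposure times t_j (t_j treated as fixed)."""
    return sum(-lam * tj + dj * math.log(lam) + dj * math.log(tj) - math.lgamma(dj + 1)
               for dj, tj in zip(d, t))


def score(lam, d, t):
    """Derivative of the log-likelihood: -sum(t_j) + sum(d_j) / lambda."""
    return -sum(t) + sum(d) / lam


# Hypothetical cohort: events and years of follow-up for five individuals
d = [0, 2, 1, 0, 3]
t = [1.5, 4.0, 2.5, 0.5, 3.5]
lam_hat = sum(d) / sum(t)  # d / T = 6 / 12
print(lam_hat, score(lam_hat, d, t))
```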


Poisson Process Model

The previous ML estimator is known as the (crude) event rate. It can also be equivalently expressed as a weighted mean rate

$$\hat\lambda = \frac{\sum_{j=1}^{n} d_j}{\sum_{j=1}^{n} t_j} = \frac{\sum_{j=1}^{n} t_j (d_j/t_j)}{\sum_{j=1}^{n} t_j} = \frac{\sum_{j=1}^{n} t_j r_j}{\sum_{j=1}^{n} t_j}$$

where $r_j = d_j/t_j$ is the event rate for person j.
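The weighted-mean form can be verified on hypothetical data. The sketch also computes the unweighted mean of the individual rates, which in general differs from the crude rate; that contrast is an addition for illustration.

```python
import math

# Hypothetical individual data (events d_j, exposure t_j)
d = [0, 2, 1, 0, 3]
t = [1.5, 4.0, 2.5, 0.5, 3.5]

crude = sum(d) / sum(t)
person_rates = [dj / tj for dj, tj in zip(d, t)]  # r_j = d_j / t_j
weighted_mean = sum(tj * rj for tj, rj in zip(t, person_rates)) / sum(t)
unweighted_mean = sum(person_rates) / len(person_rates)  # not equal to the crude rate in general

print(crude, weighted_mean, unweighted_mean)
```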


Poisson Process Model

The variance of the rate can be obtained as the inverse of the observed or expected Fisher information. The observed information is

$$J(\lambda) = -\frac{\partial^2 \log L(\lambda)}{\partial \lambda^2} = \frac{d}{\lambda^2}$$

and the expected information is $I(\lambda) = E(D)/\lambda^2$, where D now represents the total number of events in the sample.

Therefore, the variance estimator based on the observed information is $\lambda^2/d$, and the one based on the expected information is $\lambda^2/E(D) = \lambda/T$.

This follows because D is a Poisson random variable with $E(D) = \lambda T$ (exercise). Thus, for a given λ, the variance is inversely proportional to the total exposure, and also to the (expected) number of events.
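The variance result λ/T can be checked by simulation. This sketch, an illustration rather than part of the notes, generates homogeneous Poisson processes from exponential inter-arrival times using the standard library's `random.expovariate`, for a hypothetical true rate of 0.5 per year and three people followed for 2, 3 and 5 years (T = 10). The empirical variance of d/T should then be close to λ/T = 0.05.

```python
import random
import statistics


def simulate_count(rate, exposure, rng):
    """Events in (0, exposure] of a homogeneous Poisson process, built from exponential gaps."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= exposure:
        count += 1
        clock += rng.expovariate(rate)
    return count


def simulate_rate_estimates(rate, exposures, reps, seed=1):
    """Repeatedly simulate the cohort and return the crude rate d / T from each replicate."""
    rng = random.Random(seed)
    T = sum(exposures)
    return [sum(simulate_count(rate, t, rng) for t in exposures) / T for _ in range(reps)]


# Hypothetical setting: true rate 0.5 per year, follow-up of 2, 3 and 5 years (T = 10)
estimates = simulate_rate_estimates(0.5, [2.0, 3.0, 5.0], reps=20_000)
print(statistics.mean(estimates))      # near the true rate, 0.5
print(statistics.variance(estimates))  # near lambda / T = 0.05
```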


Poisson Process Model

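The variance of the rate can be estimated as

$$\widehat{\mathrm{var}}(\hat\lambda) = \frac{\hat\lambda}{T} = \frac{d}{T^2} = \frac{\hat\lambda^2}{d}$$

Large-sample hypothesis tests and confidence intervals can be based on this result. For example, a 95% confidence interval for λ would be

$$\hat\lambda \pm 1.96\,\mathrm{se}(\hat\lambda) = \hat\lambda \pm 1.96\,\hat\lambda/\sqrt{d}$$

Alternatively, since λ is positive, a more accurate approximation may be obtained by calculating a confidence interval for log(λ) using the delta method and transforming back. With $g(\lambda) = \log\lambda$ and $g'(\lambda) = 1/\lambda$, the delta method gives $\mathrm{var}(\log\hat\lambda) \approx (1/\hat\lambda)^2\,\hat\lambda^2/d = 1/d$, so a 95% confidence interval for λ is

$$\left(\hat\lambda\, e^{-1.96/\sqrt{d}},\ \hat\lambda\, e^{+1.96/\sqrt{d}}\right)$$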
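To see why the log scale helps, the sketch below compares the two intervals for hypothetical sparse data (2 events in 100 person-years). The Wald interval has a negative lower limit, whereas the back-transformed interval stays positive and is symmetric on the log scale (its limits multiply to $\hat\lambda^2$). Function names are illustrative.

```python
import math


def wald_ci(d, T, z=1.96):
    """lambda_hat +/- z * lambda_hat / sqrt(d)."""
    lam = d / T
    half = z * lam / math.sqrt(d)
    return lam - half, lam + half


def log_ci(d, T, z=1.96):
    """Back-transformed interval from log(lambda_hat) +/- z / sqrt(d)."""
    lam = d / T
    factor = math.exp(z / math.sqrt(d))
    return lam / factor, lam * factor


# Hypothetical sparse data: 2 events in 100 person-years
print(wald_ci(2, 100))  # lower limit is negative, which is impossible for a rate
print(log_ci(2, 100))   # both limits positive
```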


Example

According to the New York State Cancer Registry, there were 524 cancer deaths amongst males aged 45-49 in New York State in 2000. The corresponding mid-year population for this age group is 649,533, and we take this to be an approximation to T, the total number of years of exposure for year 2000. The estimated mortality rate is then $\hat\lambda = d/T = 524/649{,}533 = 0.000807$, or about 80.7 per 100,000 per year.

A 95% confidence interval for λ (per 100,000 per year), assuming a normal distribution for $\hat\lambda$, is

$$\hat\lambda \pm 1.96\,\hat\lambda/\sqrt{d} = 80.7 \pm 1.96 \times 80.7/\sqrt{524} = (73.8,\ 87.6)$$

Using a log transformation, a 95% confidence interval for λ is

$$\left(\hat\lambda\, e^{-1.96/\sqrt{d}},\ \hat\lambda\, e^{+1.96/\sqrt{d}}\right) = \left(80.7\, e^{-1.96/\sqrt{524}},\ 80.7\, e^{+1.96/\sqrt{524}}\right) = (74.1,\ 87.9)$$
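The numbers in this example can be reproduced as follows; rates are scaled to per 100,000 person-years and rounded to one decimal place, as in the text. The variable names are illustrative.

```python
import math

d, T = 524, 649_533  # cancer deaths and mid-year population (males aged 45-49, New York, 2000)
per = 100_000        # report per 100,000 person-years

lam = d / T
se = lam / math.sqrt(d)
wald = (lam - 1.96 * se, lam + 1.96 * se)
log_scale = (lam * math.exp(-1.96 / math.sqrt(d)), lam * math.exp(1.96 / math.sqrt(d)))

print(f"rate: {lam * per:.1f} per 100,000 per year")
print("Wald:", tuple(round(x * per, 1) for x in wald))
print("log: ", tuple(round(x * per, 1) for x in log_scale))
```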


Age-specific rates and crude rates

Since mortality and disease rates usually show considerable variation by age, we are often interested in looking at a series of rates, one for each age or age group. These are termed age-specific rates.

• In the above example, the age-specific rate for the 45-49 year age group is 80.7 per 100,000 per year.
• The age-specific rates for this population vary considerably, from 2.2 per 100,000 per year in the 5-9 and 10-14 year age groups to 2585.5 per 100,000 per year in the 85+ age group.
• For similar reasons, it is also useful to estimate rates separately for males and females.


Age-specific rates and crude rates

The crude rate is the total number of deaths (in all age groups) divided by the total exposure.

Crude rates reflect not only the level of risk in the population, but also its age distribution. So, comparisons of crude rates amongst different populations can be confounded by age. For example, a higher crude rate might simply reflect an older population, rather than a real difference in risk.

Valid comparisons can be made by comparing series of age-specific rates, but sometimes a single summary measure is required that allows comparison between populations. Such a summary measure is referred to as an age-adjusted or standardized rate.
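A minimal illustration of confounding by age, using hypothetical data: two populations share identical age-specific rates but have opposite age structures, so their crude rates differ only because of the age distribution.

```python
def crude_rate(rates, exposures):
    """Crude rate: total events / total exposure, with events = rate x exposure in each age group."""
    events = sum(r * t for r, t in zip(rates, exposures))
    return events / sum(exposures)


# Hypothetical: identical age-specific rates (young, old) in both populations
rates = [0.001, 0.010]
young_pop = [9_000, 1_000]  # person-years by age group, mostly young
old_pop = [1_000, 9_000]    # mostly old

print(crude_rate(rates, young_pop))  # about 0.0019
print(crude_rate(rates, old_pop))    # about 0.0091
```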


Direct Standardization

Let $\hat\lambda_k = d_k/t_k$ be the age-specific rate for age group $k = 1, \ldots, K$, where $d_k$ is the number of events and $t_k$ is the total exposure time for age group k.

Then, similar to the weighted mean rate, the crude rate can be written as a weighted average of the age-specific death rates

$$\hat\lambda_c = \frac{\sum_k t_k \hat\lambda_k}{\sum_k t_k} = \sum_k \frac{t_k}{T}\,\hat\lambda_k$$

where $T = \sum_k t_k$ and the weights are equal to the fraction of exposure in each age group.

The idea of direct standardization is to replace the exposure proportions obtained from the study population of interest with the corresponding proportions obtained from some standard population.


Direct Standardization

For example, if we wanted to compare two different populations, we would use the same standard population for each, thereby removing effects due merely to differing age structures.

So, the directly standardized rate is obtained by applying the study population age-specific rates to the standard population exposure proportions

$$\hat\lambda_{\mathrm{dir}} = \sum_k \frac{t_{sk}}{T_s}\,\hat\lambda_k$$

where $t_{sk}$ and $T_s$ are the exposure in age group k and the total exposure, respectively, for the standard population.


Direct Standardization

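The variance of the directly standardized rate can be estimated as

$$\widehat{\mathrm{var}}(\hat\lambda_{\mathrm{dir}}) = \sum_k \left(\frac{t_{sk}}{T_s}\right)^2 \widehat{\mathrm{var}}(\hat\lambda_k) = \sum_k \left(\frac{t_{sk}}{T_s}\right)^2 \frac{\hat\lambda_k^2}{d_k}$$

Following direct standardization of rates for two populations A and B, comparisons can be made using the standardized rate ratio

$$\mathrm{SRR} = \hat\lambda_A / \hat\lambda_B$$

where $\hat\lambda_A$ and $\hat\lambda_B$ are the directly standardized rates for the two populations. Applying the delta method to log SRR gives the variance estimate (for independent populations A and B)

$$\widehat{\mathrm{var}}(\log \mathrm{SRR}) = \widehat{\mathrm{var}}(\log\hat\lambda_A) + \widehat{\mathrm{var}}(\log\hat\lambda_B) = \frac{\widehat{\mathrm{var}}(\hat\lambda_A)}{\hat\lambda_A^2} + \frac{\widehat{\mathrm{var}}(\hat\lambda_B)}{\hat\lambda_B^2}$$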
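A sketch of direct standardization, its variance, and the SRR, assuming hypothetical two-age-group data and a standard population with exposure split 60:40. The confidence interval for the SRR is formed on the log scale and back-transformed, by analogy with the interval for a single rate; that step, the independence of A and B, and the function name `direct_standardized` are assumptions of this sketch.

```python
import math


def direct_standardized(d, t, std_exposure):
    """Directly standardized rate and its variance.

    d, t: events and exposure by age group in the study population.
    std_exposure: exposure by age group in the standard population.
    Uses var(lambda_k) = lambda_k^2 / d_k, so every d_k must be positive.
    """
    Ts = sum(std_exposure)
    w = [ts / Ts for ts in std_exposure]
    rates = [dk / tk for dk, tk in zip(d, t)]
    rate = sum(wk * rk for wk, rk in zip(w, rates))
    var = sum(wk**2 * rk**2 / dk for wk, rk, dk in zip(w, rates, d))
    return rate, var


# Hypothetical two-age-group data
std = [60_000, 40_000]  # standard population exposure
rate_a, var_a = direct_standardized([10, 40], [10_000, 8_000], std)
rate_b, var_b = direct_standardized([30, 20], [20_000, 2_000], std)

srr = rate_a / rate_b
se_log_srr = math.sqrt(var_a / rate_a**2 + var_b / rate_b**2)  # assumes A and B independent
ci = (srr * math.exp(-1.96 * se_log_srr), srr * math.exp(1.96 * se_log_srr))
print(rate_a, rate_b, srr, ci)
```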

Indirect Standardization

In contrast to direct standardization, indirect standardization applies the age-specific rates for some standard population to the exposure in each age group of the study population. This gives an “expected” number of deaths in the study population

$$E = \sum_k t_k \lambda_{sk}$$

where $\lambda_{sk}$ are the age-specific rates for the standard population.

The standardized mortality ratio is then calculated as the ratio of observed to expected deaths

$$\mathrm{SMR} = \frac{d}{E} = \frac{\sum_k t_k \hat\lambda_k}{\sum_k t_k \lambda_{sk}}$$


Indirect Standardization

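An indirectly standardized rate can be obtained by multiplying the crude rate in the standard population by the SMR, although in practice we are often primarily interested in the SMR itself.

If we can regard the expected number of deaths as a constant, then the variance of the SMR can be estimated as

$$\widehat{\mathrm{var}}(\mathrm{SMR}) = \frac{\widehat{\mathrm{var}}(D)}{E^2} = \frac{d}{E^2} = \frac{\mathrm{SMR}^2}{d}$$

Exact confidence intervals can also be constructed from the chi-square distribution:

$$\left[\frac{\chi^2_{2D,\,\alpha/2}}{2E},\ \frac{\chi^2_{2(D+1),\,1-\alpha/2}}{2E}\right]$$

where D is the observed number of deaths and $\chi^2_{\nu,\,p}$ denotes the p-th quantile of the chi-square distribution with ν degrees of freedom (the lower limit is taken to be 0 when D = 0).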
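The sketch below computes E, the SMR, its variance and the exact interval without external libraries. Rather than calling a chi-square quantile function, it finds the exact Poisson limits by bisection on the Poisson distribution function, which gives the same values as the chi-square formula (the standard Poisson and chi-square relationship). All names and the example data (two age groups, 30 observed deaths) are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), mu > 0, summing terms computed on the log scale."""
    if k < 0:
        return 0.0
    log_mu = math.log(mu)
    return sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))


def _bisect_increasing(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_limits(d, alpha=0.05):
    """Exact limits for a Poisson mean given d observed events.

    These equal chi2_{2d, alpha/2} / 2 and chi2_{2(d+1), 1-alpha/2} / 2.
    """
    hi = d + 20 * math.sqrt(d + 1) + 20
    lower = 0.0 if d == 0 else _bisect_increasing(
        lambda mu: (1 - poisson_cdf(d - 1, mu)) - alpha / 2, 0.0, hi)
    upper = _bisect_increasing(lambda mu: alpha / 2 - poisson_cdf(d, mu), 0.0, hi)
    return lower, upper


def smr_summary(d, t, std_rates, alpha=0.05):
    """SMR = d / E with E = sum_k t_k * lambda_sk, its variance d / E^2, and an exact CI."""
    E = sum(tk * lk for tk, lk in zip(t, std_rates))
    lo, hi = exact_poisson_limits(d, alpha)
    return d / E, d / E**2, (lo / E, hi / E)


# Hypothetical study population: two age groups
d_obs = 30
exposure = [2_000, 3_000]        # person-years by age group
standard_rates = [0.004, 0.006]  # standard population rates per person-year
smr, var_smr, (lo, hi) = smr_summary(d_obs, exposure, standard_rates)
print(f"SMR = {smr:.3f}, SE = {math.sqrt(var_smr):.3f}, exact 95% CI = ({lo:.3f}, {hi:.3f})")
```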


Example

In the US state of Michigan from 1950 to 1964, 731,177 babies were first-born to their mothers, and of these, 412 were affected by Down’s syndrome. In the same period, 442,811 babies were fifth-born or later to their mothers, and of these, 740 were affected by Down’s syndrome.

The crude “rates” (strictly speaking, prevalence proportions) of Down’s syndrome are:

• 412 × 100,000/731,177 = 56.3 per 100,000 births for first-borns, and
• 740 × 100,000/442,811 = 167.1 per 100,000 births for fifth and later-borns.

This is not a fair comparison, however, because the risk of Down’s syndrome is known to increase with maternal age, and mothers of fifth and later-borns tend to be older than mothers of first-borns. That is, maternal age is a confounder in the association between birth order and Down’s syndrome.


Example: Down’s Syndrome

The following table shows the maternal-age-specific proportions of births and prevalence proportions (per 100,000 births), separately for the whole state of Michigan (the standard population) and for first-borns and fifth and later-borns (the study populations), for the period 1950-1964. Fifth+ denotes fifth or later-born.

               Proportion of births in age range     Age-specific rate (per 100,000 births)
Maternal age   Michigan   First-born   Fifth+        Michigan   First-born   Fifth+
Under 20       0.113      0.315        0.001         42.5       46.5         0.0
20-24          0.330      0.451        0.069         42.5       42.8         26.1
25-29          0.278      0.157        0.279         52.3       52.2         51.0
30-34          0.173      0.054        0.339         87.7       101.3        74.7
35-39          0.084      0.019        0.235         264.0      274.5        251.7
40+            0.022      0.004        0.078         864.4      819.1        857.8
Crude rate                                           89.5       56.3         167.1


Example: Down’s Syndrome

The directly standardized rates are computed in the following table by applying the study population prevalence proportions to the standard population birth proportions. Each entry is the Michigan birth proportion multiplied by the study population's age-specific rate.

Contribution to the directly standardized rate (per 100,000 births):

Maternal age                 First-born   Fifth or later-born
Under 20                          5.255                 0.000
20-24                            14.124                 8.613
25-29                            14.512                14.178
30-34                            17.525                12.923
35-39                            23.058                21.143
40+                              18.020                18.872
Directly standardized rate         92.5                  75.7
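The crude and directly standardized rates for the Michigan example can be reproduced from the table values. Rates are per 100,000 births; the variable and function names are illustrative.

```python
# Michigan (standard) birth proportions and age-specific rates per 100,000 births, from the table
michigan_props = [0.113, 0.330, 0.278, 0.173, 0.084, 0.022]
rates_first = [46.5, 42.8, 52.2, 101.3, 274.5, 819.1]
rates_fifth = [0.0, 26.1, 51.0, 74.7, 251.7, 857.8]


def direct_rate(std_props, rates):
    """Directly standardized rate: sum over age groups of standard proportion x study rate."""
    return sum(p * r for p, r in zip(std_props, rates))


crude_first = 412 * 100_000 / 731_177
crude_fifth = 740 * 100_000 / 442_811
print(round(crude_first, 1), round(crude_fifth, 1))
print(round(direct_rate(michigan_props, rates_first), 1),
      round(direct_rate(michigan_props, rates_fifth), 1))
```

After standardization the ordering of the two groups reverses relative to the crude rates, consistent with maternal age acting as a confounder.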


Example: Down’s Syndrome

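This table illustrates the calculation of the SMRs and the indirectly standardized rates. The expected cases are per 100,000 births, obtained by applying the Michigan age-specific rates to the study population birth proportions. The SMR is the crude rate divided by the expected cases per 100,000 births, and the indirectly standardized rate is the SMR multiplied by the Michigan crude rate of 89.5.

Expected cases (per 100,000 births):

Maternal age                 First-born   Fifth or later-born
Under 20                         13.388                 0.043
20-24                            19.168                 2.933
25-29                             8.211                14.592
30-34                             4.736                29.730
35-39                             5.016                62.040
40+                               3.458                67.423
Expected cases                   53.976               176.760
SMR                               1.044                 0.945
Indirectly standardized rate       93.4                  84.6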
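Similarly, the SMRs and indirectly standardized rates can be reproduced. Because the tabulated expected cases are per 100,000 births, the sketch multiplies by the number of births to obtain expected counts, and it reports an approximate standard error SMR/√d from the variance formula SMR²/d given earlier (treating E as fixed). The function name `indirect` is illustrative.

```python
import math

# Michigan (standard population) age-specific rates per 100,000 births and crude rate, from the table
michigan_rates = [42.5, 42.5, 52.3, 87.7, 264.0, 864.4]
michigan_crude = 89.5
# Birth proportions by maternal age in each study population
props_first = [0.315, 0.451, 0.157, 0.054, 0.019, 0.004]
props_fifth = [0.001, 0.069, 0.279, 0.339, 0.235, 0.078]


def indirect(observed, births, props, std_rates, std_crude):
    """Expected count, SMR, indirectly standardized rate and approximate SE of the SMR."""
    expected_per_100k = sum(p * r for p, r in zip(props, std_rates))
    expected = births * expected_per_100k / 100_000  # expected number of cases
    smr = observed / expected
    se = smr / math.sqrt(observed)  # from var(SMR) = SMR^2 / d, E treated as fixed
    return expected, smr, smr * std_crude, se


for label, obs, births, props in [("first-born", 412, 731_177, props_first),
                                  ("fifth or later-born", 740, 442_811, props_fifth)]:
    expected, smr, rate, se = indirect(obs, births, props, michigan_rates, michigan_crude)
    print(f"{label}: E = {expected:.1f}, SMR = {smr:.3f} (SE {se:.3f}), "
          f"indirectly standardized rate = {rate:.1f} per 100,000")
```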

40/40

学霸联盟

Lecture 2: Measures of Disease Occurrence

Jake Olivier

Term 1, 2021

1/40

Background

Prevalence and Incidence

Delta Method

Estimation and inference for event rates

Standardized Rates

2/40

Background

Epidemiology originated as the study of epidemics, yet the current focus is much broader

including the study of chronic and acute disease, mental health and injury

The primary focus in epidemiology is often on the relationship (or association) between

an exposure E and disease D

For example, an epidemiologist may observe that those with higher levels of exposure to

lead is associated with an increase in the incidence of deficient brain development,

compared to those with little exposure to lead

Prior to making comparisons for, say, different levels of exposure, we must first introduce

measures of disease occurrence. These will be used later when we introduce measures of

association.

3/40

Disease Occurrence

There are many measures of disease occurrence and the choice of which one to use

depends on the study design, the population under study and the available data

Some issues to consider are:

• threats to study validity – this could be due to bias or limitations that arise due to

the study design or method of data collection

• extraneous factors – the analysis needs to account for them to untangle differing

effects of exposure on disease

4/40

Disease Occurrence

Broadly speaking, disease occurrence can be quantified as either a ratio, proportion, rate

or odds

Each of these are measures obtained by dividing one quantity versus another (i.e.,

M = a/b)

The numerator and denominator of a ratio are separate quantities: one does not contain

the other. For example, the ratio

number of doctors working at a hospital

number of beds in the hospital

are comprised of separate quantities and can assist in determining resource allocation at

a hospital

5/40

Disease Occurrence: Proportions

For proportions, the denominator quantity (b) contains the numerator quantity (a).

That is, the denominator can be written as b = a + a′ where a′ = b − a.

A proportion is a number between 0 and 1, and proportions are often interpreted as

probabilities in epidemiology.

For example, the number of deaths in a specified time interval out of the number alive

at the start of the interval is a proportion, and can be regarded as an estimate of the

probability of dying in the interval.

6/40

Disease Occurrence: Rates

A rate is a measure of change in a quantity per unit of another quantity.

Mortality and disease rates are almost always expressed per unit of time which might be

calendar time, age, or follow-up time.

Example

Lowres et al1 screened n = 1000 patients 65 years and older for atrial fibrulation (AF),

of which there were 15 new AF cases. The sum of the ages of those screened was

t = 76302 years.

The estimated proportion of new AF diagnosis was 15/1000 = 1.5%, and the rate per

person-years was 15/76302 or 1 new AF case per 5086.8 years lived.

1Feasibility and cost-effectiveness of stroke prevention through community screening for atrial fibrillation

using iPhone ECG in pharmacies. Thrombosis & Haemostasis 2014;111:1167-76.

7/40

Disease Occurrence: Rates

From differential calculus, the instantaneous rate of change in a quantity y is dy/dx . In

epidemiology, rates are commonly expressed relative to the size of the quantity in the

numerator.

So, the rate r for the change in population y at time t relative to y(t), the population

size at time t, is

r = dydt

/

y(t)

= dyy(t)dt

For example, if the population is diminishing only due to deaths, then r is a mortality

rate, i.e., the number of deaths per person per unit of time.

8/40

Disease Occurrence: Rates

Populations change over time and rates can be estimated over a time interval

The average rate in the interval (t, t + ∆t) is

r¯ = ∆y∫ t+∆t

t y(u)du

where ∆y is the change in y from t to t + ∆t.

For example, if y(x) = `x is the population at risk at age x , ∆y = `x − `x+∆x = dx , the

number of deaths between ages x and x + ∆x , and the average rate is

dx∫ x+∆x

x `udu

the number of deaths per total time at risk.

9/40

Disease Occurrence: Odds

The odds of occurrence of an event A is the probability of occurrence relative to the

probability of non-occurrence

odds(A) = P(A)1− P(A)

The odds can also be computed by frequency of occurrence to non-occurrence of an

event

10/40

Background

Prevalence and Incidence

Delta Method

Estimation and inference for event rates

Standardized Rates

11/40

Prevalence

The number of disease cases in a population can be measured in several ways. Methods

may differ based on the time frame and whether new or existing cases are counted.

Prevalence is the number of existing cases of disease in a population at a point in time.

Two types of prevalance are:

• Point prevalence proportion: the proportion of a population with the disease at a

specified point in time

• Period prevalence proportion: the proportion of a population with the disease over

a specified period of time

For example, a researcher may be interested in the prevalance of active COVID-19 cases.

On 25 Jan 2021 in Australia, the point prevalance was 135/25,744,519, while the period

prevalance for 2020 was 28,381/25,499,884.2

2Population estimates at 25/1/2021 & 30/6/2020 12/40

Incidence

Incidence is the number of new cases of disease occurring in a specified time period in

a population.

Two types of incidence are:

• Incidence proportion: the number of new cases of disease over a period of time,

divided by the number of people at risk for the disease at the start of the period.

• Incidence rate: the number of new cases of disease over a period of time, divided by

the total time at risk (for all individuals in the population) over the period.

13/40

Prevalence and Incidence

The numerator of the point prevalence proportion includes all those who have the

disease at that date, regardless of when the disease was contracted.

So, diseases of long duration tend to have higher prevalence than those of short

duration, even if the incidence is similar.

If the incidence and the average duration of a disease are constant over time, then the

prevalance P is

P = I × D

where I is incidence and D is average duration.

14/40

Epidemiologist’s Bathtub3

The epidemiologist’s bathtub can

be useful in understanding the

difference between prevalence and

incidence

The existing water level represents

prevalence while the amount of

new water flowing into the tub is

the incidence

Note the water exiting the tub can

be either mortality or recovery

3source: https://www.publichealth.hscni.net/node/5277

15/40

Example

The following table presents some

hypothetical data on a population of five

individuals observed from t = 0 to t = 5.

The line segments represent time alive,

circles represent deaths, and crosses

represent occurrences of disease. Note the

disease is chronic in the sense that no

recovery is possible.

1

2

3

4

5

0 1 2 3 4 5

Time

Pa

tie

nt

Legend

Death

Disease

Point prevalence proportion at t = 0 0/5 = 0

Point prevalence proportion at t = 5 1/2 = 0.5

Incidence proportion from t = 0 to t = 5 3/5 = 0.6

Incidence rate from t = 0 to t = 5 3/(5 + 1 + 4 + 3 + 1) = 0.21

16/40

Background

Prevalence and Incidence

Delta Method

Estimation and inference for event rates

Standardized Rates

17/40

Review of the Delta Method

It is often useful to derive the asymptotic variances of measures of disease occurrence

and association, and this section briefly reviews the delta method for obtaining the

covariance matrix of a transformation of a parameter vector.

Let θˆ be a p × 1 vector of parameter estimates with covariance matrix var(θˆ), and let

ϕ = g(θ) be a transformation of θ to a q × 1 parameter vector. The first-order Taylor

series expansion of g(θˆ) about θ is

g(θˆ) ≈ g(θ) +

[

∂gi(θ)

∂θj

]

(θˆ − θ)

where

[

∂gi (θ)

∂θj

]

is the Jacobian matrix of g whose (i , j) element is ∂gi (θ)∂θj

18/40

Review of the Delta Method

Taking the variance of both sides of the equation

var(ϕˆ) = var(g(θˆ)) ≈

[

∂gi(θ)

∂θj

]

var(θˆ)

[

∂gi(θ)

∂θj

]T

Evaluating the derivatives at the estimated value θ = θˆ gives the estimated covariance

matrix

v̂ar(ϕˆ) = v̂ar(g(θˆ)) ≈

[

∂gi(θˆ)

∂θj

]

v̂ar(θˆ)

[

∂gi(θˆ)

∂θj

]T

For the univariate case, i.e., p = q = 1,

v̂ar(ϕˆ) ≈

(

dg(θˆ)

dθ

)2

v̂ar(θˆ)

19/40

Background

Prevalence and Incidence

Delta Method

Estimation and inference for event rates

Standardized Rates

20/40

Estimation and inference for event rates

In cohort studies, we are often interested in the risk of (possibly recurrent) events during

some period of exposure.

Different individuals may be at risk of the event for different exposure periods, but the

overall event frequency is usually summarised as a rate. That is, the number of events

occurring in a specified time period divided by the total of the exposure periods for each

individual.

21/40

Poisson Process Model

We assume that the sequence of events and the aggregate count for each individual is

generated by a Poisson process where λ(s) is the instantaneous rate of events at time s

Let N(s) denote the counting process at time s, which is the cumulative number of

events in the period (0, s]. If the random variable D represents the number of events in

(s, s + t], then the probability of d events in this interval is

P(D = d) = P (N(s + t)− N(s) = d) = e

−Λ(s,s+t)(Λ(s,s+t))d

d!

where s, t > 0 and the rate parameter

Λ(s,s+t) =

∫ s+t

s

λ(u) du

is the cumulative intensity over the interval

22/40

Poisson Process Model

If we assume constant intensity over the time period of interest, i.e., λ(u) = λ for all u,

then for d ≥ 0, t > 0 we have a homogeneous Poisson process and the number of events

D in an interval of length t is distributed as a Poisson with parameter λt

P(D = d ; t, λ) = e

−λt(λt)d

d!

If we further assume that all individuals in the population experience the same constant

intensity, then we have a doubly homogeneous Poisson process and the likelihood

function for a sample of n independent observations is

L(λ) =

n∏

j=1

e−λtj (λtj)dj

dj !

where tj and dj are the exposure time and number of events, respectively, for individual j .

This likelihood conditions on the exposure times t1, . . . , tn (i.e., tj are fixed constants)

23/40

Poisson Process Model

The log-likelihood is

logL(λ) = −λ

n∑

j=1

tj +

n∑

j=1

dj log(λ) +

n∑

j=1

dj log(tj)−

n∑

j=1

log(dj !)

and the score equation is

∂ logL(λ)

∂λ

= −

n∑

j=1

tj +

1

λ

n∑

j=1

dj

Equating the score to 0 and solving for λ gives the maximum likelihood estimator

λˆ =

∑n

j=1 dj∑n

j=1 tj

= dT

where d and T are the total number of events and exposure, respectively

24/40

Poisson Process Model

The previous ML estimator is known as the (crude) event rate. It can also be

equivalently expressed as a weighted mean rate

λˆ =

∑n

j=1 dj∑n

j=1 tj

=

∑n

j=1 tj(dj/tj)∑n

j=1 tj

=

∑n

j=1 tj rj∑n

j=1 tj

where rj = dj/tj is the event rate for person j

25/40

Poisson Process Model

The variance of the rate can be obtained as the inverse of the the observed or expected

Fisher information. The observed information is

J (λ) = −∂

2 logL(λ)

∂λ2

= d

λ2

and the expected information is I(λ) = E (D)/λ2, where D now represents the total

number of events in the sample.

Therefore, the variance estimator based on the observed information is λ2/d , and based

on the expected information it is λ2/E (D) = λ/T .

This follows because D is a Poisson random variable with E (D) = λT (exercise). Thus

for a given λ, the variance is inversely proportional to the total exposure, and also to the

(expected) number of events.

26/40

Poisson Process Model

The variance of the rate can be estimated as

v̂ar(λˆ) = λˆT =

d

T 2 =

λˆ2

d

Large-sample hypothesis tests and confidence intervals can be based on this result. For example, a 95% confidence interval for $\lambda$ would be

$$\hat\lambda \pm 1.96\,\mathrm{se}(\hat\lambda) = \hat\lambda \pm 1.96\,\hat\lambda/\sqrt{d}$$

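Alternatively, since $\lambda$ is positive, a more accurate approximation may be obtained by calculating a confidence interval for $\log(\lambda)$ using the delta method and transforming back. The delta method gives $\mathrm{var}(\log\hat\lambda) \approx \mathrm{var}(\hat\lambda)/\hat\lambda^2 = 1/d$, so a 95% confidence interval for $\lambda$ is

$$\left(\hat\lambda\, e^{-1.96/\sqrt{d}},\ \hat\lambda\, e^{+1.96/\sqrt{d}}\right)$$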
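Both intervals can be sketched in a few lines of Python. The function name `rate_ci` is ours. The normal quantile comes from the standard library's `statistics.NormalDist().inv_cdf`, which gives about 1.95996 rather than the rounded 1.96. With very few events, the Wald interval can extend below zero, but the log-scale interval cannot.

```python
import math
from statistics import NormalDist


def rate_ci(d, T, level=0.95):
    """Wald and log-scale confidence intervals for a Poisson rate d / T.

    Assumes d > 0: both intervals rely on se(lambda_hat) = lambda_hat / sqrt(d).
    """
    if d <= 0:
        raise ValueError("at least one event is needed for these intervals")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lam_hat = d / T
    se = lam_hat / math.sqrt(d)
    wald = (lam_hat - z * se, lam_hat + z * se)
    # Delta method: var(log lambda_hat) is about 1/d, so back-transform
    # log(lambda_hat) +/- z / sqrt(d) with the exponential.
    half_width = z / math.sqrt(d)
    log_scale = (lam_hat * math.exp(-half_width), lam_hat * math.exp(half_width))
    return lam_hat, wald, log_scale


# With only 2 events the Wald interval dips below zero; the log-scale one cannot.
lam_hat, wald, log_scale = rate_ci(2, 100)
print(lam_hat, wald, log_scale)
```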


Example

According to the New York State Cancer Registry, there were 524 cancer deaths amongst males aged 45-49 in New York State in 2000. The corresponding mid-year population for this age group is 649,533, and we take this to be an approximation to $T$, the total number of person-years of exposure in 2000. The estimated mortality rate is then $\hat\lambda = d/T = 524/649{,}533 = 0.000807$, or about 80.7 per 100,000 per year.

A 95% confidence interval for $\lambda$ (per 100,000 per year), assuming a normal distribution for $\hat\lambda$, is

$$\hat\lambda \pm 1.96\,\hat\lambda/\sqrt{d} = 80.7 \pm 1.96 \times 80.7/\sqrt{524} = (73.8,\ 87.6)$$

Using a log transformation, a 95% confidence interval for $\lambda$ is

$$\left(\hat\lambda\, e^{-1.96/\sqrt{d}},\ \hat\lambda\, e^{+1.96/\sqrt{d}}\right) = \left(80.7\, e^{-1.96/\sqrt{524}},\ 80.7\, e^{+1.96/\sqrt{524}}\right) = (74.1,\ 87.9)$$
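The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch, and the variable names are ours. It uses the rounded multiplier 1.96, as in the text, and reports results per 100,000 person-years.

```python
import math

d = 524          # cancer deaths, males aged 45-49, New York State, 2000
T = 649_533      # mid-year population, used as person-years of exposure
per = 100_000    # report rates per 100,000 person-years

lam_hat = d / T

# Normal approximation on the rate scale: lambda_hat +/- 1.96 * lambda_hat / sqrt(d)
half_wald = 1.96 * lam_hat / math.sqrt(d)
wald = (lam_hat - half_wald, lam_hat + half_wald)

# Log scale: multiply and divide lambda_hat by exp(1.96 / sqrt(d))
factor = math.exp(1.96 / math.sqrt(d))
log_ci = (lam_hat / factor, lam_hat * factor)

print(f"rate: {per * lam_hat:.1f} per 100,000 per year")
print("Wald CI:", tuple(round(per * x, 1) for x in wald))
print("log CI:", tuple(round(per * x, 1) for x in log_ci))
```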

Standardized Rates

Age-specific rates and crude rates

Since mortality and disease rates usually show considerable variation by age, we are often interested in looking at a series of rates, one for each age or age group. These are termed age-specific rates.

• In the above example, the age-specific rate for the 45-49 year age group is 80.7 per 100,000 per year.
• The age-specific rates for this population vary considerably, from 2.2 per 100,000 per year in the 5-9 and 10-14 year age groups to 2585.5 per 100,000 per year in the 85+ age group.
• For similar reasons, it is also useful to estimate rates separately for males and females.


The crude rate is the total number of deaths (in all age groups) divided by the total exposure.

Crude rates reflect not only the level of risk in the population but also its age distribution. So, comparison of crude rates amongst different populations can be confounded by age. For example, a higher crude rate might simply reflect an older population distribution, rather than a real difference in risk.

Valid comparisons can be made by comparing series of age-specific rates, but sometimes a single summary measure is required that allows comparison between populations. Such a summary measure is referred to as an age-adjusted or standardized rate.


Direct Standardization

Let $\hat\lambda_k = d_k/t_k$ be the age-specific rate for age group $k = 1, \ldots, K$, where $d_k$ is the number of events and $t_k$ is the total exposure time for age group $k$.

Then, similar to the weighted mean rate, the crude rate can be written as a weighted average of the age-specific death rates

$$\hat\lambda_c = \frac{\sum_k t_k \hat\lambda_k}{\sum_k t_k} = \sum_k \frac{t_k}{T}\, \hat\lambda_k$$

where $T = \sum_k t_k$ is the total exposure, so the weights are equal to the fraction of exposure in each age group.

The idea of direct standardization is to replace the exposure proportions obtained from the study population of interest with the corresponding proportions obtained from some standard population.


For example, if we wanted to compare two different populations, we would use the same standard population for each, thereby removing effects merely due to differing age structures.

So, the directly standardized rate is obtained by applying the study population age-specific rates to the standard population exposure proportions

$$\hat\lambda_{\mathrm{dir}} = \sum_k \frac{t_{sk}}{T_s}\, \hat\lambda_k$$

where $t_{sk}$ and $T_s$ are the exposure in age group $k$ and the total exposure, respectively, for the standard population.
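Here is a minimal sketch of direct standardization. It uses hypothetical counts and exposures for three age groups; none of these come from the New York data, and the function names are illustrative. The sketch also checks a point from the previous section: weighting the study population's own age-specific rates by its own exposure proportions reproduces the crude rate.

```python
def weighted_rate(rates, exposure):
    """Average of age-specific rates weighted by exposure proportions t_k / T."""
    total = sum(exposure)
    return sum(tk / total * rk for tk, rk in zip(exposure, rates))


def direct_standardized_rate(d, t, t_std):
    """Apply the study age-specific rates d_k / t_k to the standard exposure t_std."""
    rates = [dk / tk for dk, tk in zip(d, t)]
    return weighted_rate(rates, t_std)


# Hypothetical study population that is older than the standard population
d = [5, 20, 60]                    # events in age groups 1..3
t = [10_000, 20_000, 30_000]       # person-years in age groups 1..3
t_std = [40_000, 40_000, 20_000]   # standard population person-years

age_rates = [dk / tk for dk, tk in zip(d, t)]
crude = sum(d) / sum(t)
print(crude, weighted_rate(age_rates, t), direct_standardized_rate(d, t, t_std))
```

In this hypothetical study population, most person-years fall in the oldest group. Its directly standardized rate (0.001) is therefore lower than its crude rate (about 0.00142).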


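Treating the age-specific rates as independent and the standard weights as fixed, the variance of the directly standardized rate can be estimated as

$$\widehat{\mathrm{var}}(\hat\lambda_{\mathrm{dir}}) = \sum_k \left(\frac{t_{sk}}{T_s}\right)^2 \widehat{\mathrm{var}}(\hat\lambda_k) = \sum_k \left(\frac{t_{sk}}{T_s}\right)^2 \frac{\hat\lambda_k^2}{d_k}$$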

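Following direct standardization of rates for two populations A and B, comparisons can be made using the standardized rate ratio

$$\mathrm{SRR} = \hat\lambda_A / \hat\lambda_B$$

where $\hat\lambda_A$ and $\hat\lambda_B$ are the directly standardized rates computed with the same standard population. Applying the delta method to $\log \mathrm{SRR}$, and assuming the two populations are independent, gives the variance estimate

$$\widehat{\mathrm{var}}(\log \mathrm{SRR}) = \widehat{\mathrm{var}}(\log\hat\lambda_A) + \widehat{\mathrm{var}}(\log\hat\lambda_B) = \frac{\widehat{\mathrm{var}}(\hat\lambda_A)}{\hat\lambda_A^2} + \frac{\widehat{\mathrm{var}}(\hat\lambda_B)}{\hat\lambda_B^2}$$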
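These variance formulas combine into a sketch for the standardized rate ratio and a log-scale 95% interval. The data are hypothetical and the function names are illustrative. Every age group must have $d_k > 0$ for the variance formula to apply.

```python
import math


def direct_rate_and_var(d, t, t_std):
    """Directly standardized rate and its variance estimate.

    Uses var(lambda_dir) = sum_k (t_sk / T_s)^2 * lambda_k^2 / d_k, which
    requires d_k > 0 in every age group.
    """
    T_s = sum(t_std)
    rate, var = 0.0, 0.0
    for dk, tk, sk in zip(d, t, t_std):
        w = sk / T_s
        lam_k = dk / tk
        rate += w * lam_k
        var += w ** 2 * lam_k ** 2 / dk
    return rate, var


def srr_with_ci(pop_a, pop_b, t_std, z=1.96):
    """SRR of population A vs B with a delta-method interval on the log scale."""
    rate_a, var_a = direct_rate_and_var(*pop_a, t_std)
    rate_b, var_b = direct_rate_and_var(*pop_b, t_std)
    srr = rate_a / rate_b
    se_log = math.sqrt(var_a / rate_a ** 2 + var_b / rate_b ** 2)
    return srr, (srr * math.exp(-z * se_log), srr * math.exp(z * se_log))


# Hypothetical (events, person-years) by age group for two populations
pop_a = ([5, 20, 60], [10_000, 20_000, 30_000])
pop_b = ([4, 10, 15], [20_000, 20_000, 10_000])
t_std = [40_000, 40_000, 20_000]

srr, (lo, hi) = srr_with_ci(pop_a, pop_b, t_std)
print(round(srr, 3), round(lo, 3), round(hi, 3))
```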

Indirect Standardization

In contrast to direct standardization, indirect standardization applies the age-specific rates of some standard population to the age-specific exposures $t_k$ of the study population. This gives an “expected” number of deaths in the study population

$$E = \sum_k t_k \lambda_{sk}$$

where $\lambda_{sk}$ are the age-specific rates for the standard population.

The standardized mortality ratio is then calculated as the ratio of observed to expected deaths

$$\mathrm{SMR} = \frac{d}{E} = \frac{\sum_k t_k \hat\lambda_k}{\sum_k t_k \lambda_{sk}}$$


An indirectly standardized rate can be obtained by multiplying the crude rate in the standard population by the SMR, although in practice we are often primarily interested in the SMR itself.

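If we can regard the expected number of deaths $E$ as a constant, then, since $D$ is Poisson with variance estimated by $d$, the variance of the SMR can be estimated as

$$\widehat{\mathrm{var}}(\mathrm{SMR}) = \frac{\widehat{\mathrm{var}}(D)}{E^2} = \frac{d}{E^2} = \frac{\mathrm{SMR}^2}{d}$$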

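Exact $100(1-\alpha)\%$ confidence intervals can also be constructed from the chi-square distribution:

$$\left[\frac{\chi^2_{2D,\,\alpha/2}}{2E},\ \frac{\chi^2_{2(D+1),\,1-\alpha/2}}{2E}\right]$$

where $\chi^2_{\nu,\,p}$ denotes the $p$ quantile of the chi-square distribution with $\nu$ degrees of freedom, and $D$ is set to the observed number of deaths $d$.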
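This is a standard-library-only sketch of both SMR intervals. Instead of calling a chi-square quantile function, it uses the standard equivalence between these chi-square limits and exact Poisson limits, with $X \sim \text{Poisson}(\mu)$:

- $\chi^2_{2d,\alpha/2}/2$ is the mean $\mu$ at which $P(X \ge d) = \alpha/2$.
- $\chi^2_{2(d+1),1-\alpha/2}/2$ is the $\mu$ at which $P(X \le d) = \alpha/2$.

Both limits are found by bisection and then divided by $E$. The observed and expected counts in the usage line are hypothetical, and the function names are ours.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summing pmf terms computed in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1)) for i in range(k + 1))


def _bisect_increasing(f, lo, hi, iters=100):
    """Root of an increasing function f on [lo, hi], with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_limits(d, alpha=0.05):
    """Exact limits for a Poisson mean given d observed events.

    Equivalent to chi2 quantile(alpha/2, 2d) / 2 and
    chi2 quantile(1 - alpha/2, 2(d+1)) / 2, found via Poisson tail equations.
    """
    top = d + 20 * math.sqrt(d + 1) + 20  # generous upper bracket for the search
    if d == 0:
        lower = 0.0
    else:
        # P(X >= d | mu) increases with mu; solve P(X >= d) = alpha / 2
        lower = _bisect_increasing(lambda mu: 1 - poisson_cdf(d - 1, mu) - alpha / 2, 0.0, top)
    # P(X <= d | mu) decreases with mu; solve P(X <= d) = alpha / 2
    upper = _bisect_increasing(lambda mu: alpha / 2 - poisson_cdf(d, mu), 0.0, top)
    return lower, upper


def smr_summary(d, E, alpha=0.05, z=1.96):
    """SMR = d / E with approximate and exact intervals; assumes d > 0 and E fixed."""
    smr = d / E
    se = smr / math.sqrt(d)  # var(SMR) = d / E**2 = SMR**2 / d
    lo, hi = exact_poisson_limits(d, alpha)
    return smr, (smr - z * se, smr + z * se), (lo / E, hi / E)


# Hypothetical: 10 observed deaths where 8.0 were expected
smr, approx_ci, exact_ci = smr_summary(10, 8.0)
print(smr, approx_ci, exact_ci)
```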


Example

In the US state of Michigan from 1950 to 1964, 731,177 babies were first-born to their mothers, and of these, 412 were affected by Down’s syndrome. In the same period, 442,811 babies were the fifth-born or later to their mothers, and of these, 740 were affected by Down’s syndrome.

The crude “rates” (strictly speaking, prevalence proportions) of Down’s syndrome are:

• 412 × 100,000/731,177 = 56.3 per 100,000 births for first-borns, and
• 740 × 100,000/442,811 = 167.1 per 100,000 births for fifth and later-borns.

This is not a fair comparison, however, because the incidence of Down’s syndrome is known to increase with maternal age, and mothers of fifth and later-borns will tend to be older than mothers of first-borns. That is, maternal age is a confounder in the association between birth order and Down’s syndrome.


Example: Down’s Syndrome

The following table shows the maternal-age-specific proportions of births and prevalence proportions (per 100,000 births), separately for the whole state of Michigan (the standard population) and for first-borns and fifth and later-borns (the study populations), for the period 1950-1964.

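Proportion of births in each maternal age range, and age-specific rates (per 100,000 births):

| Maternal age | Proportion: Michigan | Proportion: first-born | Proportion: fifth or later-born | Rate: Michigan | Rate: first-born | Rate: fifth or later-born |
|---|---|---|---|---|---|---|
| Under 20 | 0.113 | 0.315 | 0.001 | 42.5 | 46.5 | 0.0 |
| 20-24 | 0.330 | 0.451 | 0.069 | 42.5 | 42.8 | 26.1 |
| 25-29 | 0.278 | 0.157 | 0.279 | 52.3 | 52.2 | 51.0 |
| 30-34 | 0.173 | 0.054 | 0.339 | 87.7 | 101.3 | 74.7 |
| 35-39 | 0.084 | 0.019 | 0.235 | 264.0 | 274.5 | 251.7 |
| 40+ | 0.022 | 0.004 | 0.078 | 864.4 | 819.1 | 857.8 |
| Crude rate | | | | 89.5 | 56.3 | 167.1 |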


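The directly standardized rates are computed in the following table by applying the study population prevalence proportions to the standard population birth proportions. Each entry is the Michigan proportion of births in that age range multiplied by the study population’s age-specific rate, and each column sums to the directly standardized rate (per 100,000 births).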

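Contributions to the directly standardized rate (per 100,000 births):

| Maternal age | First-born | Fifth or later-born |
|---|---|---|
| Under 20 | 5.255 | 0.000 |
| 20-24 | 14.124 | 8.613 |
| 25-29 | 14.512 | 14.178 |
| 30-34 | 17.525 | 12.923 |
| 35-39 | 23.058 | 21.143 |
| 40+ | 18.020 | 18.872 |
| Direct standardized rate | 92.5 | 75.7 |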
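The direct standardization table can be reproduced from the rounded proportions and rates in the previous table. In this short sketch the variable and function names are ours.

```python
# Maternal age groups: Under 20, 20-24, 25-29, 30-34, 35-39, 40+
michigan_props = [0.113, 0.330, 0.278, 0.173, 0.084, 0.022]  # standard birth proportions

study_rates = {  # age-specific prevalence per 100,000 births
    "first-born": [46.5, 42.8, 52.2, 101.3, 274.5, 819.1],
    "fifth or later-born": [0.0, 26.1, 51.0, 74.7, 251.7, 857.8],
}


def direct_standardize(std_props, rates):
    """Per-age contributions w_k * rate_k and their sum, the standardized rate."""
    contributions = [w * r for w, r in zip(std_props, rates)]
    return contributions, sum(contributions)


standardized = {}
for group, rates in study_rates.items():
    contributions, total = direct_standardize(michigan_props, rates)
    standardized[group] = total
    print(group, [f"{c:.3f}" for c in contributions], f"{total:.1f}")
```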


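This table illustrates the calculation of the SMRs and the indirectly standardized rates. Each entry is the study population’s proportion of births in that age range multiplied by the Michigan age-specific rate, so the column totals are expected cases per 100,000 births. The SMR is the observed crude rate divided by this total, and the indirectly standardized rate is the SMR multiplied by the Michigan crude rate of 89.5.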

Expected cases (per 100,000 births):

| Maternal age | First-born | Fifth or later-born |
|---|---|---|
| Under 20 | 13.388 | 0.043 |
| 20-24 | 19.168 | 2.933 |
| 25-29 | 8.211 | 14.592 |
| 30-34 | 4.736 | 29.730 |
| 35-39 | 5.016 | 62.040 |
| 40+ | 3.458 | 67.423 |
| Expected cases | 53.976 | 176.760 |
| SMR | 1.044 | 0.945 |
| Indirect standardized rate | 93.4 | 84.6 |
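Likewise, this sketch reproduces the indirect standardization. The observed rates are recomputed from the case and birth counts given earlier (412/731,177 and 740/442,811) rather than from the rounded 56.3 and 167.1. This reproduces the tabulated SMR of 1.044 for first-borns; the rounded 56.3 would give 1.043. The Michigan crude rate of 89.5 is taken from the table, and the function name is ours.

```python
# Maternal age groups: Under 20, 20-24, 25-29, 30-34, 35-39, 40+
michigan_rates = [42.5, 42.5, 52.3, 87.7, 264.0, 864.4]  # per 100,000 births
michigan_crude = 89.5                                     # per 100,000 births

# (birth proportions by maternal age, cases, births) for each study population
study = {
    "first-born": ([0.315, 0.451, 0.157, 0.054, 0.019, 0.004], 412, 731_177),
    "fifth or later-born": ([0.001, 0.069, 0.279, 0.339, 0.235, 0.078], 740, 442_811),
}


def indirect_standardize(props, std_rates, observed_rate, std_crude):
    """Expected rate, SMR and indirectly standardized rate.

    Expected cases per 100,000 births = sum_k (study proportion_k * standard rate_k);
    SMR = observed / expected; indirect rate = SMR * standard crude rate.
    """
    expected = sum(p * r for p, r in zip(props, std_rates))
    smr = observed_rate / expected
    return expected, smr, smr * std_crude


results = {}
for group, (props, cases, births) in study.items():
    observed = cases * 100_000 / births
    results[group] = indirect_standardize(props, michigan_rates, observed, michigan_crude)
    expected, smr, indirect = results[group]
    print(f"{group}: expected {expected:.3f}, SMR {smr:.3f}, indirect rate {indirect:.1f}")
```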
