BU52018 – Applied Business Statistics
Workshop 5
1 Introduction
This workshop contains a problem that will help you practise specifying and estimating
linear regression models, as well as interpreting the results from such models. Your work on
this workshop will be assessed. Please submit your report through Turnitin, as explained
during the lectures and on My Dundee under ‘Workshop 5’, no later than 11:45am on Friday the
5th of March.
The dataset associated with this workshop is available on the module’s website
in the Excel spreadsheet file named “NL FoodExpenditure.xlsx”.
A short guide to get you started with RStudio is available on the module’s
website under ‘Workshop 4’.
2 Problem Statement and Data
The amount of money households spend on food products is an important indicator of the cost
of living in a region or country and, not surprisingly, policy makers pay close attention to this
measure. Additionally, expenditure on food products provides indirect information on food
consumption and nutrition. Of course, depending on the prices of food and non-food products,
as well as on socio-demographic characteristics, different households may spend considerably
different shares of their income on food products.
Your objective for this workshop is to construct and estimate a linear model that can be
used to uncover the effects of prices and socio-demographic characteristics on consumption
patterns, especially in relation to expenditure on food products.
The data in NL FoodExpenditure.xlsx are part of the dataset used by Adang and Melenberg
(1995) and contain information on a set of household-level variables for 45 households
located in the Netherlands, each observed for a period of 42 months,
from April 1984 until September 1987. The variables included in the dataset are described in
the table below.
Variable Name: Description
HouseholID: A unique number in the dataset that identifies the household to which the data correspond
Month: The month of observation: 1 is April 1984, 2 is May 1984, . . . , 42 is September 1987
FoodExp: Amount of money spent by the household during the relevant month on food products (Dutch Guilders)
OtherExp: Amount of money spent by the household during the relevant month on products other than food (Dutch Guilders)
FoodPrice: A price index for food products, April 1984=100
OtherPrice: A price index for products other than food, April 1984=100
Heads: Number of heads in the household (not children)
Children 0 6: Number of children in the household between 0 and 6 years old
Children 7 11: Number of children in the household between 7 and 11 years old
Children 12 17: Number of children in the household between 12 and 17 years old
Children 18: Number of children in the household older than 18 years old (these are usually children of the household head(s) who still live with their parents)
HSize: Household size (total number of members)
Province: A variable with codes from 1 to 13, indicating the province in which the household lives:
    – 1: Groningen
    – 2: Friesland
    – 3: Drenthe
    – 4: Overijssel
    – 5: Gelderland
    – 6: Utrecht
    – 7: Noord Holland
    – 8: Zuid Holland
    – 9: Zeeland
    – 10: Noord Brabant
    – 11: Limburg
    – 12: Amsterdam, Rotterdam, The Hague
    – 13: Flevoland
Class: Social class of the household coded as:
    – 1: lower class
    – 2: lower middle class
    – 3: middle class
    – 4: upper middle class
    – 5: upper class
Urbanization: A variable with codes from 1 to 13, with the values of the variable increasing with the population density of the area in which the household lives
3 Assignment
You are asked to develop a statistical model that can be used to explain how expenditure on
food products is determined as a function of prices and household characteristics. You are
free to use any specification you find appropriate (for example you could model expenditure at
the household level or expenditure per household member, you can choose which independent
variables you include in the model, whether you include squared independent variables, specify
the model in logarithm(s) or not, etc.) but you should always provide a justification for your
choices. That is, before you run any models you need to explain why each variable you include
in the model is important for the analysis and provide a rationale for its positive or negative
effect on the dependent variable. Also, briefly mention why you do not include some of the
variables available in the dataset. That is, I am asking you to use the confirmatory approach
when you are building your model.
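To make the idea of a "specification" concrete, here is one possible shape such a model could take, written as a worked equation. It is only an illustration of the notation, not a recommended model: the choice of regressors, the log transformation, and the subscripts i (household) and t (month) are assumptions made for this sketch, and any specification you actually use still needs its own justification.

```latex
\ln(\mathit{FoodExp}_{it}) = \beta_0
  + \beta_1 \ln(\mathit{FoodPrice}_{it})
  + \beta_2 \ln(\mathit{OtherPrice}_{it})
  + \beta_3 \mathit{HSize}_{it}
  + \varepsilon_{it}
```

In a specification of this form, \beta_1 would be read as an elasticity: a 1% increase in the food price index is associated with a \beta_1% change in food expenditure, holding the other regressors fixed. Because HSize enters in levels, one additional household member is associated with a change of approximately 100\beta_3% in food expenditure.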
After estimating a model you should check for any problems with the data or the model
(multicollinearity, non-linearity, etc.). Once you have chosen an appropriate specification, present
your model in standard form (you may also include the estimation results in a table as presented
by the software you use for the estimation), interpret the values of your estimates, and
comment on their statistical significance. Also interpret the R² and the F statistic of your
model.
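The checks described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the data are synthetic (generated with made-up coefficients, not taken from the workshop dataset), the variable names `log_food_price`, `log_other_price` and `hsize` are hypothetical, and `vif_manual` is a helper written for this sketch rather than a function from any package. It computes variance inflation factors as 1 / (1 - R_j²), where R_j² comes from regressing each predictor on the others.

```r
# Synthetic data only: every number here is made up for illustration
# and has nothing to do with the workshop dataset.
set.seed(1)
n <- 200
log_food_price  <- rnorm(n, mean = 4.6, sd = 0.05)
# Built to be correlated with log_food_price, so the VIF has something to detect.
log_other_price <- 0.8 * log_food_price + rnorm(n, sd = 0.03)
hsize           <- sample(1:6, n, replace = TRUE)
log_food_exp    <- 1 + 0.5 * log_food_price + 0.3 * log_other_price +
                   0.2 * hsize + rnorm(n, sd = 0.3)
d <- data.frame(log_food_exp, log_food_price, log_other_price, hsize)

# Estimate the model by ordinary least squares.
fit <- lm(log_food_exp ~ log_food_price + log_other_price + hsize, data = d)
s   <- summary(fit)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared
# from regressing predictor j on all the other predictors.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux    <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}
vifs <- vif_manual(d, c("log_food_price", "log_other_price", "hsize"))

# Quantities to interpret in the report.
coef_table <- s$coefficients   # estimates, std. errors, t values, p-values
r_squared  <- s$r.squared      # share of variance explained
f_stat     <- s$fstatistic     # named vector: value, numdf, dendf
f_pvalue   <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
                 lower.tail = FALSE)

# A residuals-versus-fitted plot is a common visual check for non-linearity:
# plot(fitted(fit), resid(fit))
```

Printing `vifs`, `coef_table`, `r_squared` and `f_pvalue` gives the numbers to discuss. A common rule of thumb treats large VIF values (often above 5 or 10) as a sign of problematic multicollinearity, but the threshold is a convention rather than a formal test.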
Please keep the report short (3-5 pages). Do not include in it any tables or figures that
you do not interpret in the text.[1] That is, include only results that can help you make a point
against or in favour of a model and the parameter estimates that you interpret. If you run
multiple models before you decide which one is the best, there is no need to include complete
sets of results of the intermediate models; a few sentences on what models you tried and why
you decided to discard them should be enough.
[1] I hate that I have to do this, but because I have seen very extreme cases in the past, where students simply
present a few tables and figures and let me do the interpretation of the results, I will subtract 5 out of 100 points
for every table or figure that is included in the report and not discussed/interpreted in the text.
References
Adang, P. and Melenberg, B. (1995). Nonnegativity constraints and intratemporal uncertainty
in a multi-good life-cycle model. Journal of Applied Econometrics, 10(1):1–15.