Communities and Crime Data
ST404
January 2021
This document describes the Communities and Crime dataset, which will be used for the
first two assignments. These data are taken from the dataset available on the UCI web site
maintained by Blake and Merz (1998) [1]. The original creator and donor of the full dataset is
Redmond (2011) [2], and the full data set can be found at UCI [3].
The original dataset combines the 1990 US Census with data from a law
enforcement survey and crime data from the FBI. Only a subset of the variables
has been extracted for these assignments; the details of these variables are listed
below.
The data are available on Moodle as data file USACrime.Rda which you can load into R
using the load() function. This will create a data frame USACrime in your R workspace.
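The loading step can be sketched as follows. This is a minimal sketch, assuming USACrime.Rda has been downloaded from Moodle into the current R working directory; the helper name load_crime is hypothetical and not part of the course materials.

```r
# Hypothetical helper: load USACrime.Rda and return the USACrime data frame.
# Assumes the file sits in the working directory unless a path is given.
load_crime <- function(path = "USACrime.Rda") {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  env <- new.env()
  # load() returns the names of the objects it restored
  loaded <- load(path, envir = env)
  if (!"USACrime" %in% loaded) {
    stop("Expected an object named 'USACrime' in ", path)
  }
  get("USACrime", envir = env)
}

crime <- load_crime()
if (!is.null(crime)) {
  print(dim(crime))  # number of communities and number of variables
  str(crime)         # variable names and types
}
```

Calling load("USACrime.Rda") directly, as described above, has the same effect of creating USACrime in the workspace; the wrapper only adds a check that the file and the expected object exist.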
The outcomes of interest in these data are the number of violent crimes per 100K
population (violentPerPop) and the number of non-violent crimes per 100K population
(nonViolPerPop).
The original dataset had 2215 observations with 147 explanatory variables. In the version
of the data that we present to you, we have reduced the number of potential explanatory
variables to 22. Each observation represents one community, and the data refer to demographic,
educational and financial aspects of the community. In addition, the state and census region
within the USA that the community belongs to are also recorded.
• violentPerPop total number of violent crimes per 100K population
• nonViolPerPop total number of non-violent crimes per 100K population
• State US state (by 2 letter postal abbreviation) (factor)
• region Classification of states into census regions (factor)
• pctUrban percentage of people living in areas classified as urban
• pctWdiv percentage of households with investment / rent income
• medIncome median household income
• pctLowEdu percentage of people 25 and over with less than a 9th grade education
[1] Blake, C.L. and Merz, C.J. (1998). UCI Repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.
[2] Michael Redmond (2009); Computer Science; La Salle University; Philadelphia, PA, 19141, USA
[3] https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized
• PctNotHSGrad percentage of people 25 and over that are not high school graduates
• PctCollGrad percentage of people 25 and over with a bachelor's degree or higher education
• PctUnemployed percentage of people 16 and over, in the labor force, and unemployed
• PctEmploy percentage of people 16 and over who are employed
• PctKids2Par percentage of kids in family housing with two parents
• PctKidsBornNeverMar percentage of kids born to never-married parents
• PctHousOccup percent of housing occupied
• pctHousOwnerOccup percent of households owner occupied
• PctVacantBoarded percent of vacant housing that is boarded up
• PctVacMore6Mos percent of vacant housing that has been vacant more than 6 months
• ownHousMed owner occupied housing - median value
• ownHousQrange owner occupied housing - difference between upper quartile and lower
quartile values
• rentMed rental housing - median rent
• RentQrange rental housing - difference between upper quartile and lower quartile rent
• PopDensity population density in persons per square mile
• PctForeignBorn percent of people foreign born
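The variable names above mix lower-case and upper-case prefixes (for example pctUrban but PctNotHSGrad), and R is case-sensitive, so it may be worth checking the names in the loaded data frame against this list. The following is a minimal sketch: check_vars is a hypothetical helper, the toy data frame is made up purely for illustration, and the names in the actual data file may differ from the spellings in this document.

```r
# Variable names as spelled in the list above (2 outcomes + 22 explanatory)
expected_vars <- c(
  "violentPerPop", "nonViolPerPop", "State", "region",
  "pctUrban", "pctWdiv", "medIncome", "pctLowEdu",
  "PctNotHSGrad", "PctCollGrad", "PctUnemployed", "PctEmploy",
  "PctKids2Par", "PctKidsBornNeverMar", "PctHousOccup",
  "pctHousOwnerOccup", "PctVacantBoarded", "PctVacMore6Mos",
  "ownHousMed", "ownHousQrange", "rentMed", "RentQrange",
  "PopDensity", "PctForeignBorn"
)

# Hypothetical helper: report names missing from, or unexpected in, a data frame
check_vars <- function(df, expected = expected_vars) {
  list(
    missing    = setdiff(expected, names(df)),
    unexpected = setdiff(names(df), expected)
  )
}

# Toy data frame (made-up values) standing in for USACrime
toy <- data.frame(violentPerPop = 120.5, State = "NJ", extra = 0)
result <- check_vars(toy)
print(result$unexpected)  # "extra" is not in the documented list
```

With the real data, check_vars(USACrime) should return two empty vectors if the data frame matches the list exactly.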