Introduction to Stata
EC 507 Statistics for Economics
Fall 2017*

I. Interface
II. 4 components of your program
III. Practice program
IV. Other basic commands and syntax
V. Stata commands for Descriptive Statistics
VI. Stata command syntax

Commands are set in bold italics. For more details on any command, look up its
name in the Stata manuals or in Stata Help on the toolbar.

I. Interface
The window labeled Command is where you type your commands. Stata then shows the
results in the larger window immediately above, appropriately called Results.
Your command is added to a list in the window labeled Review on the left, so you can
keep track of the commands you have used. The window labeled Variables, on the top
right, lists the variables in your dataset. The Properties window immediately below that,
introduced in version 12, displays properties of your variables and dataset.

* This handout combines the original version written by Professor Ivan
Fernandez-Val with excerpts from Germán Rodríguez’s online Stata Tutorial
(http://data.princeton.edu/stata). It was modified by Shuang Wang (shuangw@bu.edu) and
Xiaoxi Zhao (xiaoxiz@bu.edu).

II. 4 Components of Your Stata Program

a. Data files
i. Stata-format datasets are saved in files that end with .dta.
ii. Let’s say that the name of our dataset is “filename.dta”. In your
program (.do file), you tell Stata to use filename.dta by typing

use filename, clear

“clear” tells Stata to discard any dataset currently in memory.
iii. Stata also reads non-.dta formats: infile reads plain text files and
insheet reads comma- or tab-delimited files, such as a spreadsheet saved
as .csv. In Stata 13 and later, import delimited supersedes insheet, and
import excel reads Excel workbooks directly.
iv. You can also create Stata-format datasets using the data editor and
save them

edit
enter the data and rename the variables (double-click on the
names)
close the data editor
save name_dataset, replace

“replace” tells Stata to write over the previous dta file. If you have
a dta file with this name and do not add replace, you will get an
error message.
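
As a minimal sketch of the load-and-save cycle described above, the following do-file fragment builds a tiny dataset with input, saves it, and reads it back. The file names demo_data.dta and demo_data.csv are hypothetical, the two rows are copied from the practice dataset shown in Section III, and the import delimited and export delimited commands assume Stata 13 or later.

```stata
clear
/* type in two observations by hand; double keeps full precision */
input country double(GDP population) year
1 4.65e+11 1.48e+08 1990
1 4.08e+11 1.50e+08 1991
end

save demo_data, replace
/* writes demo_data.dta, overwriting any earlier copy */

use demo_data, clear
/* reads the .dta file back into memory */

export delimited using demo_data.csv, replace
/* writes the same data as a comma-delimited text file */

import delimited using demo_data.csv, clear case(preserve)
/* reads the text file back; case(preserve) keeps GDP in upper case,
because import delimited converts names to lower case by default */

describe
```

Without replace, the save and export delimited lines would stop with an error the second time the fragment is run, because the files would already exist.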

b. Source codes
i. Source code is saved in do-files in Stata. Do-files let you save
your commands for later use.
ii. You can write your programs in WordPad, the Stata program
editor, or your favorite text editor. Just be sure to save your files
with the .do extension.
iii. You can also type and run commands one at a time in the Stata
command window. This is a good way to try commands and see
what they do. However, for problem sets, you will have to
write .do files.
iv. If you have a .do file written, you can run it either by clicking the
do button or by typing the following in the Stata command window:

do filename.do

c. Log files
i. Log files are output files: they record your commands and their
results. If you are running many commands at once, the log is also
a good way to review what each command did.
ii. At the beginning of your .do file, type

log using filename.log, replace

“replace” tells Stata to write over the previous log file when you
run the program anew.
iii. At the end of the .do file, type

log close

iv. You can open your .log file with WordPad or your favorite text
editor.

d. Help files
i. PDF documentation is available from the “Help” menu.
ii. To get the help file for a specific command, type
help command
where command is the name of that command.

III. Practice Program

a. Let’s run a practice program and then discuss the syntax.
i. The dataset we are using is called sample.dta
ii. It is a dataset of GDP and population indicators for different
countries in several years.
iii. Variable names are in bold. Countries are identified by numerical
IDs.

country   GDP        population   year
1         4.65e+11   1.48e+08     1990
1         4.08e+11   1.50e+08     1991
1         3.91e+11   1.53e+08     1992
1         4.38e+11   1.55e+08     1993
1         5.46e+11   1.57e+08     1994
1         7.04e+11   1.59e+08     1995

iv. In the data set above it is important to note the difference between
observations and variables. Each observation is a country-year,
with many variables. It is very important to be clear about the
structure of the dataset when you work with it.
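
To make the distinction concrete, the sketch below types in the six rows shown above and asks Stata how many observations and variables it holds. It assumes nothing beyond the table itself; _N and c(k) are Stata's built-in counts of observations and variables.

```stata
clear
input country double(GDP population) year
1 4.65e+11 1.48e+08 1990
1 4.08e+11 1.50e+08 1991
1 3.91e+11 1.53e+08 1992
1 4.38e+11 1.55e+08 1993
1 5.46e+11 1.57e+08 1994
1 7.04e+11 1.59e+08 1995
end

display "observations (country-years): " _N
/* one observation per row of the table */

display "variables: " c(k)
/* one variable per column of the table */
```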

b. The text enclosed in /* */ documents what each command does. This
will help you remember what you were trying to do when you come back
to a program after not seeing it for a while.

clear
/*this clears the memory so that a new dataset can be loaded*/

capture log close
/*this closes any log files that are still open*/

cd c:/z/EC507/Spring2008/Stata_practice
/*choose the path of the folder where you have the data and you want to
save the results*/

log using practice.log, replace
/*this opens my log file*/

use sample, clear
/*this tells Stata to use the dataset called “sample”*/

describe
/*this lists all the variable names and their labels*/

list
/*this lists all the observations (do not use it if you have a large data set)*/

sum
/*this gives basic summary statistics for all the variables. You can also
type “sum GDP population” so that only statistics for those two variables
are produced */

sum, detail
/* produces additional statistics including skewness, kurtosis, and various
percentiles.*/

tab country
/* this lists the Frequency Table for country */

tab1 country year
/* tab1 produces a separate frequency table for each variable listed */

sort country
/*sort the data according to country. The data must be sorted before you
can use the by command*/

by country: sum GDP population
/* this gives the mean and std deviation for the GDP and population of
each country.*/

gen GDPPC=GDP/population
/*this generates a new variable called GDPPC that is the GDP per
capita*/

label var GDPPC "GDP per capita"
/*attaches a descriptive label to the variable*/

gen poor=0
/*creates a new variable called poor and assigns it a 0 value*/

replace poor=1 if GDPPC<400
/*assigns the value 1 to poor for every observation where GDP per
capita<400*/
/* “poor” is a dichotomous (dummy) variable*/

sum GDPPC poor

twoway (scatter GDP population), ///
    title("Scatter Diagram of GDP and Population")
/* Plots a Scatter Diagram for GDP and population */

graph save scatter_gdp_pop,replace
/* Saves the Scatter Diagram in the file scatter_gdp_pop. This file can be
opened and printed from Stata (File + Open Graph) */

corr GDP population
/* computes the correlation between GDP and population*/

corr GDP population, cov
/*computes the variances and covariance*/

log close
/*closes the log file*/

c. Open your log file in WordPad to view the results.
d. As you can tell, Stata needs no special character to end a command or
to end the program: each command ends at the end of its line.
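
Because a line break ends a command, a command that is too long for one line needs a continuation marker in a do-file. A minimal sketch, using the auto dataset that ships with Stata (an assumption, chosen only so the fragment runs on its own): /// at the end of a line tells Stata that the command continues on the next line.

```stata
sysuse auto, clear

/* a graph command with a long title, written across two lines */
twoway (scatter price mpg), ///
    title("Scatter Diagram of Price and Mileage")

/* the same idea for a command with an if condition */
summarize price mpg ///
    if foreign == 1
```

The /// marker works in do-files but not in the Command window, where each command has to be typed on a single line.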

IV. Other Basic Commands and Syntax

Stata has an easy-to-use help system that you can access from the toolbar. For
more details about any of these commands, type its name in the search box.

Now we consider two useful commands for selecting observations and variables: keep
and drop. Keep in mind that keep (drop) followed by a list of variables acts on
variables, whereas keep (drop) followed by if or in acts on observations. Open
the data editor after each command to see how it changed the data.

keep GDP population
/* throws out all the variables except GDP and population for all observations*/

use sample, clear
/*reads in the full original dataset */

keep if GDP>10000
/*throws out observations that have GDP<=10000*/

use sample, clear
/*reads in the full original dataset */

keep in 1/10
/* keeps the first 10 observations */

use sample, clear
/*reads in the full original dataset */

drop GDP
/*drops the variable GDP for all observations*/

use sample, clear
/*reads in the full original dataset */

drop if population <=900
/*drops the observations which have population<=900*/

use sample, clear
/*reads in the full original dataset */

drop in 1/10
/* drops the first 10 observations */

The if and in qualifiers can also be used to restrict other commands in a similar way:

use sample, clear
/* reads in the full original dataset */

sum GDP if country == 1
/* Mean and Std. Deviation for country coded as 1 */

sum GDP in 11/20
/* Mean and Std. Deviation for the observations 11 to 20 */
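
One caveat when combining if with comparisons: Stata treats a missing numeric value (.) as larger than any number, so a condition such as GDP>10000 is true for observations where GDP is missing. The sketch below uses three made-up observations (hypothetical values, not taken from sample.dta) to show the difference that adding !missing() makes.

```stata
clear
input id double GDP
1 5000
2 20000
3 .
end

count if GDP > 10000
/* counts 2: observation 2 and the missing value in observation 3 */

count if GDP > 10000 & !missing(GDP)
/* counts 1: only observation 2 */
```

The same logic applies to keep if, drop if, and replace with an if condition.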

V. Stata Commands for Descriptive Statistics

tab varname
tab1 varlist
/* Frequency table for a variable or for a list of variables */

sort varname
by varname: egen afvarname = count(varname)
/* Absolute frequency for the nominal or ordinal variable varname */

gen rfvarname = afvarname/_N
/* Relative frequency for varname */
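
A worked sketch of the two frequency commands above, on six made-up values of a hypothetical variable grade. The names afgrade and rfgrade simply follow the af/rf naming pattern used in this section.

```stata
clear
input grade
1
1
2
3
3
3
end

sort grade
by grade: egen afgrade = count(grade)
/* absolute frequency: 2 for grade 1, 1 for grade 2, 3 for grade 3 */

gen rfgrade = afgrade/_N
/* relative frequency: absolute frequency divided by the 6 observations */

list, sepby(grade)
```

Every observation with the same value of grade carries the same frequency, so the new variables repeat within each group.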

twoway (bar afvarname varname), yscale(range(0 )) title(Bar Graph of Varname)
graph save graphname, replace
/* Absolute Frequency Bar graph for Varname */

twoway (bar rfvarname varname), yscale(range(0 1)) title(Bar Graph of Varname)
graph save graphname, replace
/* Relative Frequency Bar graph for Varname */

twoway (dropline afvarname varname), yscale(range(0 )) ///
    title(Line Graph of Varname)
graph save graphname, replace
/* Absolute Frequency Line graph for Varname */

twoway (dropline rfvarname varname), yscale(range(0 1)) ///
    title(Line Graph of Varname)
graph save graphname, replace
/* Relative Frequency Line graph for Varname */

graph pie, over(varname) title(Pie Chart of Varname)
graph save graphname, replace
/* Pie Chart for Varname */

sort varname
twoway (connected afvarname varname), yscale(range(0 )) ///
    title(Frequency Polygon of Varname)
graph save graphname, replace
/* Absolute Frequency Polygon for Varname */

sort varname
twoway (connected rfvarname varname), yscale(range(0 1)) ///
    title(Frequency Polygon of Varname)
graph save graphname, replace
/* Relative Frequency Polygon for Varname */

cumul varname, gen(crfvarname) eq
/* Cumulative Relative Frequencies for Varname */

sort varname
twoway (connected crfvarname varname), yscale(range(0 1)) ///
    title(Ogive of Varname)
graph save graphname, replace
/* Ogive for Varname */
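
As a small illustration of what cumul produces with the equal option (written eq in the template above), the sketch below uses six made-up values of a hypothetical variable grade. Tied values receive the same cumulative relative frequency, which is what an ogive needs.

```stata
clear
input grade
1
1
2
3
3
3
end

cumul grade, gen(crfgrade) equal
/* cumulative relative frequency, with ties sharing one value */

sort grade
list grade crfgrade, sepby(grade)
/* crfgrade is 2/6 for grade 1, 3/6 for grade 2, and 1 for grade 3 */
```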

histogram varname, bin(#) width(#) start(#) frequency
graph save graphname, replace
/* Absolute Frequency Histogram for Varname. bin(#) sets the number of bins
and width(#) sets the bin width; specify one or the other, not both */

histogram varname, bin(#) width(#) start(#) fraction
graph save graphname, replace
/* Relative Frequency Histogram for Varname */

stem varname
/* Stem and Leaf Plot for Varname */

sum varlist
/* Mean, Standard Deviation, Maximum, and Minimum for Varlist */

sum varlist, detail
/* Mean, Standard Deviation, Maximum, Minimum, and various percentiles for
Varlist */

centile varname, centile(25 50 75)
/* Quartiles for Varname */

graph box varname, title(Box Plot of Varname)
graph save graphname, replace
/* Box Plot for Varname */

tab varname1 varname2
/* Two Way table for Varname1 and Varname2 */

spearman varlist
/* Spearman Rank Order Coefficients for Varlist */

ktau varlist
/* Kendall Rank Order Coefficients for Varlist */

twoway (scatter varname1 varname2), ///
    title(Scatter Diagram of Varname1 and Varname2)
graph save graphname, replace
/* Scatter Diagram for Varname1 and Varname2 */

corr varlist, cov
/* Variances and Covariances for Varlist */

corr varlist
/* Pearson Correlation Coefficients for Varlist */
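
The two corr commands are linked by the usual identity: the Pearson correlation equals the covariance divided by the product of the two standard deviations. A hedged sketch that checks this on the auto dataset shipped with Stata (an assumption; any two numeric variables would do), using the results that corr leaves in r():

```stata
sysuse auto, clear

corr price mpg, cov
/* with the cov option, r(cov_12), r(Var_1) and r(Var_2) hold the
covariance and the two variances */
scalar rho_from_cov = r(cov_12) / sqrt(r(Var_1) * r(Var_2))

corr price mpg
/* without cov, r(rho) holds the correlation coefficient */
display "from covariance: " rho_from_cov "   from corr: " r(rho)
```

The two displayed numbers should agree up to rounding error.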

VI. Stata Command Syntax

Having used a few Stata commands, it may be time to comment briefly on their structure,
which usually follows this pattern, where bold indicates keywords and square
brackets indicate optional elements:
[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [using filename]
[,options]
command:
The only required element is the command itself, which is usually (but not always) an
action verb.
varlist:
The command is often followed by the names of one or more variables, for
example describe lexp or regress lexp loggnppc.
=exp:
Commands used to generate new variables, such as generate log_gnp = log(gnp),
include an arithmetic expression.
if exp and in range:
As we have seen, a command's action can be restricted to a subset of the data by
specifying a logical condition that evaluates to true or false, such as lexp < 55,
or a range of observation numbers, such as 1/10.
weight:
Some commands allow the use of weights; type help weights to learn more.
using filename:
The keyword using introduces a file name; this can be a file on your computer, on the
network, or on the internet.
options:
Most commands have options that are specified following a comma. To obtain a list of
the options available with a command, type help command, where command is the actual
command name.
by varlist:
A very powerful feature, it instructs Stata to repeat the command for each group of
observations defined by distinct values of the variables in the list.
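
To see several of these elements in one place, here is a minimal sketch on the auto dataset that ships with Stata (an assumption made so the fragment runs on its own; the variables lexp and gnp mentioned above belong to a different dataset).

```stata
sysuse auto, clear

generate log_price = log(price)
/* command + new variable + =exp */

summarize price mpg if foreign == 1, detail
/* command + varlist + if exp + option */

list make price in 1/5
/* command + varlist + in range */

bysort foreign: summarize price
/* by varlist: repeats the command for each value of foreign;
bysort sorts the data first, which by requires */
```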