Date: 2023-09-07
DATA7001
Introduction to Data Science
2023 Sem1
Prac-1
Introduction to R
Materials:
Prac 1.ipynb
• Jupyter Notebook
• Directory
Prac1
├── datasets
└── Prac 1.ipynb
Goals:
• Start working with R to conduct data loading/cleaning tasks
• Get familiar with some common R programming methods
Tips and Tricks
• You will probably not finish the whole prac in the one-hour session. You
can keep working on it next week and in your own time
• We will provide some hints/pointers to useful functions and we are here to
help put you on the right track
• You may need to do some of your own research to find solutions
• There may be many solutions to each specific task
• Feel free to discuss with your classmates and tutors
Practical Assessment
Remember that Prac 1 and Prac 2 need to be submitted for assessment
- Submit via Blackboard
- Due end of week 6
- Refer to instructions inside the notebooks
Prac 1 and Prac 2 together make up 10% of your final mark.
Prac 3, Prac 4, and Prac 5 together make up another 10% of your final mark.
Practicals are assessed based on your solutions to each task.
Steps
1. Go to https://coursemgr.uqcloud.net/data7001
2. If you have not done so already, click “Create Zone”
3. After a few minutes your DNS will propagate, and you will be able to access
your zone directly via the web (click the link under “Web Address”).
You may need to append “/jupyter/” to your URL to get to the notebooks.
You should see a screen like this:
Zone Recap from last week
• Your zone is your personal environment for practical work during the semester
• Prac material is “magically” loaded in before the pracs begin each week
• You may also use the zones for your project work if you wish
Part 1 – Data Quality
Copying Variables: x<-y or x = y
Accessing data from a dataframe: df[rows, columns]
Accessing data from within a list (also works for extracting or iterating
lists in a dataframe):
x <- list(a = 1, b = 2, c = 3)  # list with 3 elements
x$b                             # use the "dollar" notation to access b
# [1] 2                         # output: element "b" contains the value 2
https://rpubs.com/tomhopper/brackets
https://stackoverflow.com/questions/42560090/what-is-the-meaning-of-the-dollar-sign-in-r-function
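A minimal sketch of the access patterns above. The data frame and its column names (df, name, age) are made up for illustration and are not taken from the prac datasets:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(name = c("Ann", "Bob", "Cy"),
                 age  = c(31, 8, 15),
                 stringsAsFactors = FALSE)

first_row <- df[1, ]          # row 1, all columns
ages      <- df[, "age"]      # all rows, the "age" column as a vector
one_cell  <- df[2, "name"]    # row 2, column "name"

x <- list(a = 1, b = 2, c = 3)
b_dollar  <- x$b              # access an element by name with $
b_double  <- x[["b"]]         # the same element with double brackets
b_sublist <- x["b"]           # single brackets return a list of length 1
```

The difference between x[["b"]] and x["b"] matters in practice: the first gives the value itself, the second gives a smaller list that still wraps the value.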
Part 1 – Data Quality (cont)
Removing missing values: Check out the “apply” function or the
“complete.cases” function:
https://ademos.people.uic.edu/Chapter4.html
https://www.statology.org/complete-cases-in-r/
The “rbind” function is useful for combining dataframes
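One possible way to combine two data frames and then drop rows with missing values is sketched below. The data frames df_a and df_b are invented for illustration; the prac datasets will have different columns:

```r
# Hypothetical data with missing values (NA), for illustration only
df_a <- data.frame(id = 1:3, score = c(10, NA, 30))
df_b <- data.frame(id = 4:5, score = c(NA, 50))

# rbind stacks data frames that share the same column names
combined <- rbind(df_a, df_b)

# complete.cases returns TRUE for rows with no NA in any column
keep  <- complete.cases(combined)
clean <- combined[keep, ]

# The same row-wise check written with apply (MARGIN = 1 means "per row")
keep_apply <- apply(combined, 1, function(row) !any(is.na(row)))
```

Both approaches give the same logical vector here; complete.cases is usually the shorter option, while apply generalises to other row-wise checks.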
Filtering data: You can use the “c” function to generate a vector of the
input arguments:
https://stackoverflow.com/questions/25268888/what-does-c-do-in-r
Filtering data: You can also select specific rows out of a df using standard
comparison operators:
newdf = olddf[ olddf$age > 10, ]
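The two filtering hints can be sketched as follows. The contents of olddf and the vector wanted are assumptions made up for this example:

```r
# Hypothetical data frame, for illustration only
olddf <- data.frame(name = c("Ann", "Bob", "Cy", "Dee"),
                    age  = c(31, 8, 15, 10),
                    stringsAsFactors = FALSE)

# Keep rows where age is greater than 10 (the trailing comma keeps all columns)
newdf <- olddf[olddf$age > 10, ]

# c() builds a vector; %in% tests membership in that vector
wanted <- c("Bob", "Dee")
subset_by_name <- olddf[olddf$name %in% wanted, ]

# c() can also select several columns by name
selected_cols <- olddf[, c("name", "age")]
```

If the column being compared contains NA, the comparison returns NA for those rows and the result includes rows filled with NA; wrapping the condition in which() is one way to avoid that.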
Part 2 – Factors
The following functions should help with this part:
factor, table, count (not base R; provided by packages such as dplyr), and as.vector
https://r4ds.had.co.nz/factors.html
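A short sketch of the base R functions listed above, using an invented vector of category labels (sizes is not part of the prac data):

```r
# Hypothetical survey responses, for illustration only
sizes <- c("small", "large", "medium", "small", "large", "small")

# Convert to a factor with an explicit level order
f <- factor(sizes, levels = c("small", "medium", "large"))

level_names <- levels(f)       # the levels, in the order given above
counts      <- table(f)        # frequency of each level
as_text     <- as.vector(f)    # back to a plain character vector
as_codes    <- as.integer(f)   # the underlying integer codes
```

Setting levels explicitly controls the order used by table and by plots; without it, R sorts the levels alphabetically.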
Part 3 – User Defined Functions
Check out the special “function” keyword in R to see
how functions are defined and can be called
Matrix determinants can be calculated using the built-in
function “det”
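As a rough illustration, a function definition and a determinant calculation might look like this. The function rect_area and the matrix m are hypothetical examples, not part of the prac tasks:

```r
# Hypothetical helper: area of a rectangle, with a default height
rect_area <- function(width, height = 1) {
  width * height   # the last evaluated expression is the return value
}

area_default <- rect_area(4)                # uses height = 1
area_named   <- rect_area(4, height = 2.5)  # argument passed by name

# A 2 x 2 matrix filled column by column: columns are (2, 1) and (1, 3)
m <- matrix(c(2, 1, 1, 3), nrow = 2)
d <- det(m)   # 2 * 3 - 1 * 1
```

An explicit return(...) can be used instead of relying on the last expression; both styles are common.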
Part 4 – Common Functions and Operations
Many basic arithmetic and summary functions are built in, such as
“sum” or “median”
There are many versions of “apply” which can help you:
https://www.guru99.com/r-apply-sapply-tapply.html#3
The aggregate function:
https://www.geeksforgeeks.org/how-to-use-aggregate-function-in-r/
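The functions mentioned in this part could be used along these lines. The marks data frame is invented for illustration:

```r
# Hypothetical marks table, for illustration only
marks <- data.frame(group = c("A", "A", "B", "B"),
                    quiz  = c(6, 8, 5, 9),
                    exam  = c(70, 80, 50, 90))

total_quiz  <- sum(marks$quiz)
median_exam <- median(marks$exam)

# sapply: apply a function to each column and simplify to a named vector
col_means <- sapply(marks[, c("quiz", "exam")], mean)

# tapply: apply a function to a vector split by a grouping variable
exam_by_group <- tapply(marks$exam, marks$group, mean)

# aggregate: a similar grouped summary, returned as a data frame
agg <- aggregate(exam ~ group, data = marks, FUN = mean)
```

tapply returns a named vector, while aggregate returns a data frame, which is often easier to merge or plot afterwards.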
Part 5 – Lists
Refer again to the lapply/tapply resource in the previous
slide, and to “Part 3” on user defined functions.
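A small sketch of applying a user-defined function across a list. The list scores and the function spread are assumptions made up for this example:

```r
# Hypothetical list of numeric vectors, for illustration only
scores <- list(ann = c(3, 5, 7), bob = c(10, 20), cy = c(4))

# A user-defined function: width of the range (max minus min)
spread <- function(v) {
  max(v) - min(v)
}

spreads_list <- lapply(scores, spread)   # returns a list
spreads_vec  <- sapply(scores, spread)   # returns a named numeric vector
lengths_vec  <- sapply(scores, length)   # number of values per element

# Lists can be extended by assigning to a new name
scores$dee <- c(1, 2, 3, 4)
```

lapply always returns a list, which is the safer choice when the function may return results of different lengths; sapply is convenient when each result is a single value.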
Part 6 – Loops
R supports both for and while loops.
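# for loop: x takes each value from 1 to 10 in turn
for (x in 1:10) {
  print(x)
}

# while loop: repeat until the condition is FALSE
x = 1
while (x <= 10) {
  print(x)
  x = x + 1
}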

