FMHU5002 Introductory Biostatistics – Data Analysis Assignment
Sydney School of Public Health
Semester 1, 2026

Data Summary and Analysis Assignment
FMHU5002 Introductory Biostatistics

Due Date and Time: Sunday, March 22nd, 2026 11:59 PM Sydney Time
Assignment Category: Submitted Work
Assignment Sub-category: Assignment
Weighting: 20%

Plagiarism and Academic Dishonesty Policy

You must complete your assignment alone. Submitting assignments that have been jointly completed is not acceptable. Copying someone else’s work or quoting from text without adequate attribution of the source is plagiarism and is not acceptable. All assignments will be verified by plagiarism detection software. Serious penalties apply for plagiarism, collusion, or contract cheating. Information about the University’s policy on academic honesty can be found at the following site: https://www.sydney.edu.au/students/academic-integrity.html

Late Penalties and Special Consideration

Unless you have an approved simple extension, special consideration or an academic plan, 1 mark (5% of 20) will be deducted from your assignment mark per day (or part thereof) until Wednesday, April 1st, 11:59 PM (Sydney Local Time). Assignments submitted past this date without approved special consideration or an academic plan will not be accepted and will be given a zero (0) mark. For students seeking simple extensions or special consideration, please use the following site: https://www.sydney.edu.au/students/special-consideration.html

Instructions to students

The data for the assignment has been derived from the online Introductory Biostatistics Questionnaire that students completed during Lecture 1 (and the week that followed). We had 192 responses – thank you! For this assignment each student has been allocated their own version of the dataset, comprising a random sample of respondents. A few modifications to the data have been made to make this data suitable for the Assignment.

Datasets are named BIOQA###.csv, where ### is a 3-digit number. Please use the dataset that is listed against your name in the Assignment Dataset Allocation file. For example, if your allocated dataset is BIOQA012, then the dataset you need to use is BIOQA012.csv. Please ensure you use your allocated dataset. All datasets are available within the FMHU5002 Assignment Datasets folder, under Assessment Resources.

Important Notes:
• This assignment paper (including cover page, instructions, and data dictionary) is six (6) pages in length. Please ensure you have all pages.
• Please ensure you use your allocated dataset. Because the datasets differ between students, the results will differ, and the conclusions drawn could also vary.
• The variable names and coding of the variables (i.e., the data dictionary) for your dataset are included at the end of this assignment on pages 5 and 6.
• There is not always just one correct way of handling data: you are sometimes required to use your own judgment. When this occurs, you should justify the decision you have made.
• Name your submission file with your student number (SID), unit of study code, and BIOQA dataset number (e.g., 311275249_FMHU5002_BIOQA012.pdf). Ensure all pages are numbered, and that your student number is included in the header or footer of the document.
• Assignments are marked anonymously, so please do NOT put your name anywhere on the assignment or submission title.
• Any jamovi output presented must be edited to comply with the recommendations for presenting results as covered in the Module 1 Notes.
• All tables and plots presented in your answers must conform to the recommended presentation guidelines outlined in the Module 1 notes. Penalties will apply where they do not conform.

Submit your assessment as a single file in .docx or .pdf format by 11:59 PM Sydney Local Time on Sunday 22nd March 2026 via Canvas (Assessments overview > Assessment 1: Data Summary and Analysis Assignment > Assignment Submission – Click Here to Submit Your Assignment > Select the file to upload and then click “Submit Assignment”). Do not attach a jamovi .omv or a .csv file with your submission.

If you have any administrative questions, please post them on the Canvas Discussion Board. Go to Discussions > Assessment 1: Data Summary and Analysis Assignment Discussion Board

Alternatively, contact the teaching team: fmhu5002@sydney.edu.au

If you have difficulties submitting the assignment around the due time, please email fmhu5002@sydney.edu.au directly with your assignment attached to avoid late penalties. The timestamp of your email will be used as evidence of the date and time of your assignment submission. Please note that responses to emails will only occur during business hours on standard working days.

Assignment Questions

In this assignment, you will be analysing data collected from the 2026 FMHU5002 Introductory Biostatistics student survey. Students were provided with a link during the Module 1 Lecture (live, online, and in the recording) and reminded via notifications on Canvas.

Question 1 (2 marks)

Data screening is an important first step in any data analysis. Using appropriate methods, examine the variables height and weight for possible erroneous values.

i) Describe any outlier (extreme), implausible, or impossible values within these variables by providing the ID, value, and clearly indicating whether it should be considered an outlier, implausible, or impossible value.

ii) Describe the corrective action, if any, that should be taken for each of the identified observations. Perform these corrective actions.
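
As an illustration only, one possible screening approach for Question 1 is to flag values that fall outside the Tukey fences (1.5 × IQR beyond the quartiles) and then judge each flagged value individually. The sketch below is written in Python rather than jamovi. The function names, the sample records, and the IDs are hypothetical and are not taken from any BIOQA dataset. The quartile method used here may differ slightly from the one jamovi uses for its box plots, and a flagged value still has to be classified as an outlier, implausible, or impossible value by judgment.

```python
from statistics import quantiles


def iqr_fences(values):
    """Return the (lower, upper) Tukey fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flag_values(records, variable):
    """List (id, value) pairs whose value lies outside the Tukey fences."""
    values = [r[variable] for r in records if r[variable] is not None]
    lower, upper = iqr_fences(values)
    return [
        (r["id"], r[variable])
        for r in records
        if r[variable] is not None and not lower <= r[variable] <= upper
    ]


# Hypothetical records for illustration (assumption: height recorded in cm).
sample = [
    {"id": "A001", "height": 158.0},
    {"id": "A002", "height": 165.0},
    {"id": "A003", "height": 170.0},
    {"id": "A004", "height": 172.0},
    {"id": "A005", "height": 180.0},
    {"id": "A006", "height": 1.75},  # possibly metres entered instead of centimetres
]

flagged = flag_values(sample, "height")
print(flagged)
```

A flag from this rule is only a prompt to look at the observation. Whether to correct, set to missing, or keep a value is a separate decision that should be justified.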

You should use your cleaned data (i.e., the data with any edits performed) for the remainder of the assignment. You can safely assume that the remaining variables are error-free and can be used as provided.

Question 2 (4 marks)

A person’s VO2max is a measure of the maximum amount of oxygen the body can use during intense exercise. It is a common fitness indicator which can be estimated from age and resting heart rate. For each student in your sample, calculate their VO2max (V02max) using the following formula:

$$\text{V02max} = 15.3 \times \frac{220 - \text{age}}{\text{resthr}}$$

Display the distribution of V02max using an appropriate plot. In no more than two sentences, describe the important features of this distribution including relevant summary statistics.

Question 3 (6 marks)

A daily step count of 10000 steps/day has often been considered an unofficial target for improved health; however, recent research has shown that just 7000 steps/day is associated with lower risk across many health outcomes (Ding et al., Lancet Public Health, 2025;10(8):E668-E681). Create a new variable called step_countCat which groups the variable step_count into three categories: “Less than 7000 steps/day”; “Between 7000 – 10000 steps/day”; or “More than 10000 steps/day”. Note, for any students in your sample who do not own a device which tracks step count (i.e., with step_device = 2), their value of step_countCat should be treated as missing.

i) Construct a frequency table using this newly created variable step_countCat. Include in the table the relative frequencies and, if appropriate, cumulative relative frequencies for each level of step_countCat. Among students in your sample who own a device which tracks step count, what proportion do fewer than 10,000 steps/day on average?

ii) Construct a two-way table using the newly created variable step_countCat and employment.
Include in the table the relative frequencies for step count category within each level of employment. In no more than three sentences, summarise what patterns are evident from the table.

Question 4 (3 marks)

Produce an appropriate plot which visually displays the relationship between a student’s self-reported sex (sex) and their smoking status (smoke). In no more than three sentences, summarise what your plot shows.

Question 5 (3 marks)

In nearly all quantitative research outputs (e.g., publications, reports), authors will typically provide a table that describes the key characteristics of the sample, which often appears as the first table (Table 1) of the document. Create a “Table 1” that provides appropriate descriptive statistics for the variables age, sex, degree, study_mode, and distance for the sample included in your dataset.

Note: Page 22 of the FMHU5002 Course Notes provides one example of a ‘Table 1’. You should also refer to the Tutorial 1 resources, and the published literature in your field of expertise for examples; most quantitative studies will provide a ‘Table 1’.

General formatting and presentation (2 marks)

A total of 2 marks is allocated to the general formatting and presentation of display items, such as tables and figures, including conforming to the recommended presentation guidelines outlined in the Module 1 notes, and providing only relevant information and output as part of the submission.

Total = 20 marks

This is the end of the assignment questions. The variable names and survey questions used for the data are on the following page.

Variable Names and Coding for the Introductory Biostatistics Class Survey 2026

| Survey Question | Variable Name | Levels (if appropriate) |
|---|---|---|
| Randomly generated ID (4 characters) | id | |
| What degree program are you enrolled in? | degree | 1 = Public Health; 2 = Global Health; 3 = Clinical Epidemiology; 4 = Surgery; 5 = Higher degree by Research (MPhil / PhD); 6 = Other |
| What mode of study are you enrolled in? | study_mode | 1 = Face to face; 2 = Online |
| What is your age (in years)? | age | |
| What is your sex? | sex | 1 = Female; 2 = Male; 3 = Another term |
| Approximately how far is your current residence from the University of Sydney (in kilometres)? | distance | |
| What is your primary mode of transport to the University of Sydney? i.e. What type of transport do you use most often? If you use multiple modes in a single trip, select the mode of transport that you spend the most time on. | transport | 1 = Walk; 2 = Cycle; 3 = Bus; 4 = Train / Light rail; 5 = Car; 6 = None; 7 = Other |
| What is your current employment status? | employment | 1 = Full time; 2 = Part time; 3 = Casual; 4 = Not currently employed |
| How tall are you without shoes (in centimetres)? | height | |
| How much do you weigh (in kilograms)? | weight | |
| What is your resting heart rate (in beats per minute; bpm)? | resthr | |
| Do you have a smartphone, wearable, or other device, which tracks your step count? | step_device | 1 = Yes; 2 = No |
| What is your average daily step count over the past week (in steps)? | step_count | |
| In the past week, how many serves of vegetables have you consumed? | vege | |
| In the past week, how many serves of fruit have you consumed? | fruit | |
| Which of the following best describes your current smoking/vaping habits? | smoke | 1 = Never smoked traditional cigarettes or e-cigarettes; 2 = Used to smoke traditional cigarettes and/or e-cigarettes, but do not currently use either; 3 = Currently smoke traditional cigarettes and/or e-cigarettes |
| On a scale of 0 (not at all) to 10 (extremely), how anxious are you about learning biostatistics? | conf1 | |
| On a scale of 0 (not at all) to 10 (extremely), how confident are you with learning new computer software? | conf2 | |
| On a scale of 0 (not at all) to 10 (extremely), how confident are you with using and understanding mathematical equations? | conf3 | |
| If given the choice (and it was not a requirement for your degree program), would you willingly take a biostatistics unit of study? | willing | 0 = No; 1 = Yes |
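
For reference, the two derived variables described in Questions 2 and 3 can be sketched in code using the variable names from the data dictionary. This is a minimal Python illustration, not the jamovi procedure expected for the assignment. It assumes the VO2max formula is 15.3 × (220 − age) / resthr, and it assumes that counts of exactly 7000 and exactly 10000 steps/day both belong to the middle category. Both function names and all input values are hypothetical.

```python
def vo2max(age, resthr):
    """Estimate VO2max from age (years) and resting heart rate (bpm).

    Assumed formula: 15.3 * (220 - age) / resthr.
    """
    return 15.3 * (220 - age) / resthr


def step_count_cat(step_count, step_device):
    """Group step_count into the three categories named in Question 3.

    Returns None (missing) when the respondent has no step-tracking device
    (step_device == 2) or when step_count itself is missing.
    Assumption: 7000 and 10000 are both placed in the middle category.
    """
    if step_device == 2 or step_count is None:
        return None
    if step_count < 7000:
        return "Less than 7000 steps/day"
    if step_count <= 10000:
        return "Between 7000 – 10000 steps/day"
    return "More than 10000 steps/day"


# Hypothetical respondent: 25 years old, resting heart rate 65 bpm.
print(round(vo2max(25, 65), 1))

# Hypothetical step counts, including a respondent without a device.
for steps, device in [(5200, 1), (7000, 1), (10000, 1), (12500, 1), (8000, 2)]:
    print(steps, device, step_count_cat(steps, device))
```

How the boundary values are handled changes the category counts, so the choice should be stated and justified alongside the resulting tables.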