
Activity Data Analysis: A Study of Step Patterns and Missing-Value Imputation

---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---

## Loading and preprocessing the data

```r
library(dplyr)
```

```
## 
## Attaching package: 'dplyr'
```

```
## The following objects are masked from 'package:stats':
## 
##     filter, lag
```

```
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
```

```r
library(ggplot2)
library(lubridate)
```

```
## 
## Attaching package: 'lubridate'
```

```
## The following object is masked from 'package:base':
## 
##     date
```

```r
library(xtable)
```

```
## Warning: package 'xtable' was built under R version 3.6.3
```

```r
# Read data
activity <- read.csv("activity.csv", header = TRUE)

# Convert date to Date class
activity$date <- as.Date(activity$date, format = "%Y-%m-%d")

# Convert interval (an HHMM-encoded integer, e.g. 835 = 08:35) to an "HH:MM" string
activity$time <- format(strptime(sprintf("%04d", activity$interval), format = "%H%M"),
                        format = "%H:%M")

# Create a new column for date-time
activity$datetime <- as.POSIXct(paste(activity$date, activity$time),
                                format = "%Y-%m-%d %H:%M")

# Remove rows with NA steps
activity_clean <- activity[!is.na(activity$steps), ]
```

## What is the mean total number of steps taken per day?

```r
# Calculate total steps per day
total_steps_per_day <- activity_clean %>%
  group_by(date) %>%
  summarise(total_steps = sum(steps))

# Plot histogram of total steps per day
ggplot(total_steps_per_day, aes(x = total_steps)) +
  geom_histogram(binwidth = 1000, fill = "blue", color = "black") +
  labs(title = "Histogram of Total Steps per Day", x = "Total Steps", y = "Frequency")
```

![](PA1_template_files/figure-/unnamed-chunk-2-1.png)

```r
# Calculate mean and median of total steps per day
mean_steps <- mean(total_steps_per_day$total_steps)
median_steps <- median(total_steps_per_day$total_steps)
```

Ignoring missing values, the mean total number of steps taken per day is 10766.19 and the median is 10765.

## What is the average daily activity pattern?
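The `interval` identifier encodes clock time as HHMM (the preprocessing step pads it with `sprintf("%04d", ...)` and parses it with `%H%M`), so using it directly as a numeric x axis leaves uneven gaps: 55 (00:55) is followed by 100 (01:00), a jump of 45 units for a 5-minute step. A minimal sketch, assuming an evenly spaced time axis is wanted, converts the identifier to minutes since midnight with integer arithmetic; the helper name `interval_to_minutes` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert an HHMM-encoded interval (e.g. 835 = 08:35)
# into minutes since midnight, so consecutive 5-minute intervals are 5 apart.
interval_to_minutes <- function(interval) {
  hours   <- interval %/% 100  # leading digits are the hour
  minutes <- interval %% 100   # last two digits are the minute
  hours * 60 + minutes
}

# The raw identifiers jump from 55 to 100, but the converted minutes do not
raw_ids <- c(45L, 50L, 55L, 100L, 105L, 835L, 2355L)
mins <- interval_to_minutes(raw_ids)
print(mins)
print(diff(mins))
```

In the time-series plot that follows, this would mean mapping `x = interval_to_minutes(interval)` instead of `x = interval`; interval 835 would then sit at minute 515.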
```r
# Calculate average steps per interval
average_steps_per_interval <- activity_clean %>%
  group_by(interval) %>%
  summarise(average_steps = mean(steps))

# Plot time series of average steps per interval
ggplot(average_steps_per_interval, aes(x = interval, y = average_steps)) +
  geom_line(color = "blue") +
  labs(title = "Average Daily Activity Pattern", x = "5-minute Interval", y = "Average Steps")
```

![](PA1_template_files/figure-/unnamed-chunk-3-1.png)

```r
# Find the interval with the maximum average steps (returns a single value)
max_interval <- average_steps_per_interval$interval[which.max(average_steps_per_interval$average_steps)]
```

The 5-minute interval that contains the maximum number of steps on average across all days is 835, i.e. the interval starting at 08:35.

## Imputing missing values

```r
# Calculate total number of missing values
total_na <- sum(is.na(activity$steps))

# Impute missing values using the mean for that 5-minute interval:
# match() looks up each missing row's interval in the table of interval means
activity_imputed <- activity
na_rows <- is.na(activity_imputed$steps)
activity_imputed$steps[na_rows] <- average_steps_per_interval$average_steps[
  match(activity_imputed$interval[na_rows], average_steps_per_interval$interval)]

# Calculate total steps per day for imputed data
total_steps_per_day_imputed <- activity_imputed %>%
  group_by(date) %>%
  summarise(total_steps = sum(steps))

# Plot histogram of total steps per day for imputed data
ggplot(total_steps_per_day_imputed, aes(x = total_steps)) +
  geom_histogram(binwidth = 1000, fill = "blue", color = "black") +
  labs(title = "Histogram of Total Steps per Day (Imputed Data)", x = "Total Steps", y = "Frequency")
```

![](PA1_template_files/figure-/unnamed-chunk-4-1.png)

```r
# Calculate mean and median of total steps per day for imputed data
mean_steps_imputed <- mean(total_steps_per_day_imputed$total_steps)
median_steps_imputed <- median(total_steps_per_day_imputed$total_steps)
```

The total number of missing values in the dataset is 2304.
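A day contains 24 × 60 / 5 = 288 five-minute intervals, so 2304 missing values is exactly eight days' worth (8 × 288 = 2304). Whether the gaps really fall on whole days, rather than being scattered, can be checked by counting missing values per date. The base-R sketch below runs on a small synthetic data frame (`toy`, with made-up values that are not taken from `activity.csv`) so that it is self-contained; the same `tapply()` call applies to `activity` unchanged.

```r
# Count missing step values per date and classify each date as complete,
# partly missing, or entirely missing.
# Synthetic toy data (NOT the real dataset): 3 days x 4 intervals.
toy <- data.frame(
  date     = rep(as.Date(c("2012-10-01", "2012-10-02", "2012-10-03")), each = 4),
  interval = rep(c(0L, 5L, 10L, 15L), times = 3),
  steps    = c(NA, NA, NA, NA,   0L, 12L, 3L, 0L,   5L, NA, 7L, 1L)
)

na_per_day        <- tapply(is.na(toy$steps), toy$date, sum)  # NAs per date
intervals_per_day <- tapply(toy$steps, toy$date, length)      # rows per date

na_pattern <- ifelse(na_per_day == 0, "complete",
              ifelse(na_per_day == intervals_per_day, "all missing", "partly missing"))
print(table(na_pattern))
```

On the real data, `tapply(is.na(activity$steps), activity$date, sum)` returning only the values 0 and 288 would confirm that every missing value belongs to an entirely unrecorded day.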
After imputing the missing values, the mean total number of steps taken per day is 10766.19 and the median is 10766.19. The mean is unchanged, while the median rises slightly, from 10765 to 10766.19, and now equals the mean. This is what one would expect if the missing values cover whole days (2304 = 8 × 288 five-minute intervals): each such day is filled with the interval-mean profile, whose total equals the original mean daily total, so adding these days leaves the mean unchanged, and because their identical totals sit in the middle of the distribution, one of them becomes the median.

## Are there differences in activity patterns between weekdays and weekends?

```r
# Create a factor variable for weekdays and weekends.
# lubridate::wday() numbers days 1 (Sunday) to 7 (Saturday) by default and,
# unlike weekdays(), does not depend on the system locale's day names.
activity_imputed$day_type <- factor(
  ifelse(wday(activity_imputed$date) %in% c(1, 7), "weekend", "weekday")
)

# Calculate average steps per interval for weekdays and weekends
average_steps_per_interval_day_type <- activity_imputed %>%
  group_by(interval, day_type) %>%
  summarise(average_steps = mean(steps))

# Plot time series of average steps per interval for weekdays and weekends
ggplot(average_steps_per_interval_day_type, aes(x = interval, y = average_steps, color = day_type)) +
  geom_line() +
  facet_wrap(~day_type, ncol = 1) +
  labs(title = "Average Daily Activity Pattern by Day Type", x = "5-minute Interval", y = "Average Steps") +
  theme(legend.position = "none")
```

![](PA1_template_files/figure-/unnamed-chunk-5-1.png)

The activity patterns differ between weekdays and weekends. On weekdays, activity peaks in the morning, whereas on weekends it is spread more evenly throughout the day.
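The visual comparison can be backed by a few numbers per day type: the interval with the highest average, the height of that peak, and the coefficient of variation (standard deviation divided by mean) across intervals, where a lower value indicates a flatter, more evenly spread profile. The base-R sketch below assumes a data frame laid out like `average_steps_per_interval_day_type` (columns `interval`, `day_type`, `average_steps`); the helper `summarise_profiles` and the `toy_profiles` numbers are hypothetical and made up for illustration, not results from the dataset.

```r
# Hypothetical helper: summarise each day type's average interval profile.
# Expects columns interval, day_type, average_steps.
summarise_profiles <- function(profiles) {
  parts <- split(profiles, profiles$day_type, drop = TRUE)
  do.call(rbind, lapply(parts, function(p) {
    data.frame(
      day_type      = as.character(p$day_type[1]),
      peak_interval = p$interval[which.max(p$average_steps)],
      peak_steps    = max(p$average_steps),
      # coefficient of variation: lower means a flatter, more even profile
      cv            = sd(p$average_steps) / mean(p$average_steps),
      stringsAsFactors = FALSE
    )
  }))
}

# Synthetic example (made-up numbers, not results from activity.csv)
toy_profiles <- data.frame(
  interval      = rep(c(800L, 835L, 1200L, 1800L), times = 2),
  day_type      = factor(rep(c("weekday", "weekend"), each = 4)),
  average_steps = c(50, 200, 40, 60,   60, 90, 80, 70)
)
print(summarise_profiles(toy_profiles))
```

For the real data, calling `summarise_profiles(as.data.frame(average_steps_per_interval_day_type))` would give one row per day type; the coefficient of variation is only one possible measure of evenness, chosen here because it is scale-free.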
