Activity Data Analysis: Step Patterns and Missing-Value Imputation
---
title: "Reproducible Research: Peer Assessment 1"
output:
  _document:
    keep_md: true
---
## Loading and preprocessing the data
```r
library(dplyr)
```
```
##
## Attaching package: 'dplyr'
```
```
## The following objects are masked from 'package:stats':
##
## filter, lag
```
```
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
```
```r
library(ggplot2)
library(lubridate)
```
```
##
## Attaching package: 'lubridate'
```
```
## The following object is masked from 'package:base':
##
## date
```
```r
library(xtable)
```
```
## Warning: package 'xtable' was built under R version 3.6.3
```
```r
# Read data
activity <- read.csv("activity.csv", header = TRUE)
# Convert date to date format
activity$date <- as.Date(activity$date, format = "%Y-%m-%d")
# Convert interval to time format
activity$time <- format(strptime(sprintf("%04d", activity$interval), format="%H%M"), format = "%H:%M")
# Create a new column for datetime
activity$datetime <- as.POSIXct(paste(activity$date, activity$time), format="%Y-%m-%d %H:%M")
# Remove rows with NA steps
activity_clean <- activity[!is.na(activity$steps),]
```
## What is mean total number of steps taken per day?
```r
# Calculate total steps per day
total_steps_per_day <- activity_clean %>%
  group_by(date) %>%
  summarise(total_steps = sum(steps))
# Plot histogram of total steps per day
ggplot(total_steps_per_day, aes(x = total_steps)) +
  geom_histogram(binwidth = 1000, fill = "blue", color = "black") +
  labs(title = "Histogram of Total Steps per Day", x = "Total Steps", y = "Frequency")
```

```r
# Calculate mean and median of total steps per day
mean_steps <- mean(total_steps_per_day$total_steps)
median_steps <- median(total_steps_per_day$total_steps)
```
The mean total number of steps taken per day is 10766.19 and the median is 10765.
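Because the totals are computed from `activity_clean`, a day whose intervals are all `NA` contributes no row at all, rather than a total of zero. The following minimal sketch uses a made-up `toy` data frame (the values and names are assumptions for illustration, not taken from the dataset) to show how that choice affects the mean compared with summing the raw data with `na.rm = TRUE`.

```r
# Toy data (hypothetical): day 1 fully observed, day 2 entirely missing
toy <- data.frame(
  date  = rep(c("2012-10-01", "2012-10-02"), each = 3),
  steps = c(10, 20, 30, NA, NA, NA)
)

# Approach used in this report: drop NA rows first, so day 2 has no total
toy_clean <- toy[!is.na(toy$steps), ]
totals_dropped <- tapply(toy_clean$steps, toy_clean$date, sum)

# Alternative: keep every row and sum with na.rm = TRUE, so day 2 totals 0
totals_zeroed <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

mean(totals_dropped)  # 60: only the observed day counts
mean(totals_zeroed)   # 30: the missing day is treated as a zero-step day
```

Dropping the rows first avoids pulling the mean and median toward zero with days that were simply not recorded.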
## What is the average daily activity pattern?
```r
# Calculate average steps per interval
average_steps_per_interval <- activity_clean %>%
  group_by(interval) %>%
  summarise(average_steps = mean(steps))
# Plot time series of average steps per interval
ggplot(average_steps_per_interval, aes(x = interval, y = average_steps)) +
  geom_line(color = "blue") +
  labs(title = "Average Daily Activity Pattern", x = "5-minute Interval", y = "Average Steps")
```

```r
# Find the interval with the maximum average steps
max_interval <- average_steps_per_interval[which.max(average_steps_per_interval$average_steps), "interval"]
```
The 5-minute interval that contains the maximum number of steps on average across all days is 835 (that is, 08:35).
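The interval identifier encodes a clock time as `HHMM` without leading zeros, so consecutive identifiers are not evenly spaced (55 is followed by 100). As a rough sketch, the two helper functions below, whose names are hypothetical and not part of the original analysis, convert an identifier into a clock label and into minutes since midnight.

```r
# Convert an interval identifier such as 835 into a clock label "08:35"
interval_to_clock <- function(interval) {
  padded <- sprintf("%04d", as.integer(interval))  # 835 -> "0835"
  paste0(substr(padded, 1, 2), ":", substr(padded, 3, 4))
}

# Convert an interval identifier into minutes since midnight,
# which are evenly spaced in steps of 5
interval_to_minutes <- function(interval) {
  (interval %/% 100) * 60 + interval %% 100
}

interval_to_clock(c(0, 55, 100, 835))    # "00:00" "00:55" "01:00" "08:35"
interval_to_minutes(c(0, 55, 100, 835))  # 0 55 60 515
```

Plotting against minutes since midnight instead of the raw identifier would remove the small horizontal jumps at each hour boundary.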
## Imputing missing values
```r
# Calculate total number of missing values
total_na <- sum(is.na(activity$steps))
# Impute missing values using the mean for that 5-minute interval
activity_imputed <- activity
na_rows <- is.na(activity_imputed$steps)
# Look up each missing row's interval in the table of interval means
activity_imputed$steps[na_rows] <- average_steps_per_interval$average_steps[
  match(activity_imputed$interval[na_rows], average_steps_per_interval$interval)
]
# Calculate total steps per day for imputed data
total_steps_per_day_imputed <- activity_imputed %>%
  group_by(date) %>%
  summarise(total_steps = sum(steps))
# Plot histogram of total steps per day for imputed data
ggplot(total_steps_per_day_imputed, aes(x = total_steps)) +
  geom_histogram(binwidth = 1000, fill = "blue", color = "black") +
  labs(title = "Histogram of Total Steps per Day (Imputed Data)", x = "Total Steps", y = "Frequency")
```

```r
# Calculate mean and median of total steps per day for imputed data
mean_steps_imputed <- mean(total_steps_per_day_imputed$total_steps)
median_steps_imputed <- median(total_steps_per_day_imputed$total_steps)
```
The total number of missing values in the dataset is 2304.
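After imputing missing data, the mean total number of steps taken per day is 10766.19 and the median is 10766.19. The mean is unchanged, while the median rises slightly from 10765 to 10766.19 and now equals the mean. The 2304 missing values equal 8 × 288 intervals, which is consistent with eight days missing in full. Each such day is filled with the interval means, so its total is the sum of those means, 10766.19. Adding several days at exactly the mean leaves the mean unchanged and places identical values at the center of the distribution, which is why the median lands on the same number.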
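The effect of the imputation on the mean and median can be reproduced on a small scale. The sketch below uses hypothetical daily totals (not values from the dataset) and assumes that missing data occur only as whole days, so that every imputed day receives a total equal to the mean of the observed daily totals.

```r
# Hypothetical daily totals for five fully observed days
observed_totals <- c(8000, 10000, 10500, 12000, 13500)
mean_observed   <- mean(observed_totals)    # 10800
median_observed <- median(observed_totals)  # 10500

# Three fully missing days, each imputed with the interval means,
# so each gets a total equal to the mean of the observed totals
imputed_totals <- c(observed_totals, rep(mean_observed, 3))

mean(imputed_totals)    # 10800: unchanged
median(imputed_totals)  # 10800: pulled onto the mean
```

The mean cannot move because every added value equals it, while the median shifts to the repeated value once enough copies sit in the middle of the sorted totals.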
## Are there differences in activity patterns between weekdays and weekends?
```r
# Create a new factor variable for weekdays and weekends
# wday() with week_start = 1 numbers Monday as 1, so 6 and 7 are Saturday and Sunday;
# unlike weekdays(), this does not depend on the system locale
activity_imputed$day_type <- ifelse(wday(activity_imputed$date, week_start = 1) >= 6, "weekend", "weekday")
activity_imputed$day_type <- as.factor(activity_imputed$day_type)
# Calculate average steps per interval for weekdays and weekends
average_steps_per_interval_day_type <- activity_imputed %>%
  group_by(interval, day_type) %>%
  summarise(average_steps = mean(steps))
# Plot time series of average steps per interval for weekdays and weekends
ggplot(average_steps_per_interval_day_type, aes(x = interval, y = average_steps, color = day_type)) +
  geom_line() +
  facet_wrap(~day_type, ncol = 1) +
  labs(title = "Average Daily Activity Pattern by Day Type", x = "5-minute Interval", y = "Average Steps") +
  theme(legend.position = "none")
```

Activity patterns differ between weekdays and weekends. Weekdays show a pronounced peak in the morning, while weekend activity is spread more evenly throughout the day.
