Data.table

data.table notes

Build the test data

examdf <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                     name = c("Jordan", "Kobe", "T-Mac", "Duncan", "Garnet", "Iverson", "Dwade", "CP3", "Bird"),
                     team = c("SG", "SG", "SF", "PF", "PF", "SG", "PG", "PG", "SF"),
                     num = c(23, 24, 1, 21, 21, 3, 3, 3, 33),
                     score = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                     stringsAsFactors = FALSE)
examdf
##   id    name team num score
## 1  1  Jordan   SG  23     1
## 2  2    Kobe   SG  24     2
## 3  3   T-Mac   SF   1     3
## 4  4  Duncan   PF  21     4
## 5  5  Garnet   PF  21     5
## 6  6 Iverson   SG   3     6
## 7  7   Dwade   PG   3     7
## 8  8     CP3   PG   3     8
## 9  9    Bird   SF  33     9

tidyverse、d

Filter rows at one column's per-group maximum with data.table

Goal

Extract, for each month, the row with the latest date.

Data structure

testdf <- data.frame(d = c("2017-01-01", "2017-01-30", "2017-02-02", "2017-02-10"),
                     v = c("A", "B", "C", "D"))
testdf$d <- as.Date(testdf$d)
testdf$month <- lubridate::month(testdf$d)
testdf
##            d v month
## 1 2017-01-01 A     1
## 2 2017-01-30 B     1
## 3 2017-02-02 C     2
## 4 2017-02-10 D     2

tidyverse

library(dplyr)  # provides %>% and group_by()
testdf %>% group_by(month) %>% dplyr::filter(d == max(d))
## # A tibble: 2 x 3
## # Groups:   month [2]
##   d          v     month
##   <date>     <fct> <dbl>
## 1 2017-01-30 B        1.
## 2 2017-02-10 D        2.

plyr::ddply

plyr::ddply(testdf, "month", subset, d == max(d))
##            d v month
## 1 2017-01-30 B     1
## 2 2017-02-10 D     2

data.table, method 1

library(data.table)  # provides setDT()
setDT(testdf)  # convert the data.frame to a data.table by reference
testdf[, .SD[d == max(d)], by = month]
## month d v
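The same per-group filter can also be written with row indices instead of `.SD`. The block below is a minimal sketch, not necessarily the author's own second method. It rebuilds the toy data so that it runs on its own; the names `testdt`, `idx` and `latest` are made up for illustration.

```r
library(data.table)

# Rebuild the toy data as a data.table (testdt is an illustrative name).
testdt <- data.table(
  d = as.Date(c("2017-01-01", "2017-01-30", "2017-02-02", "2017-02-10")),
  v = c("A", "B", "C", "D")
)

# data.table::month() extracts the month number from a Date.
testdt[, month := data.table::month(d)]

# .I holds each row's position in the full table; within every month keep
# only the positions where d equals that month's maximum.
idx <- testdt[, .I[d == max(d)], by = month]$V1

# Subset the original table once with the collected row numbers.
latest <- testdt[idx]
print(latest)
```

The `.I` form collects row numbers per group and subsets the table a single time, which typically avoids the cost of building `.SD` for every group when there are many groups. The column order of the result follows the original table rather than putting the grouping column first.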