Let's walk through, with examples, how to process data using the dplyr package in R.
- Filter rows – filter()
- Reorder rows – arrange()
- Select variables by name – select()
- Create new variables – mutate()
- Summarise – summarise()
DATA
> library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.2
✔ purrr 0.3.4
✔ tibble 3.0.3
✔ dplyr 1.0.2
✔ tidyr 1.1.1
✔ stringr 1.4.0
✔ readr 1.3.1
✔ forcats 0.5.0
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
...
Create a new data frame by moving the row names (the index) into a column:
> df <- cbind(model_name = rownames(mtcars), mtcars)
> rownames(df) <- NULL
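The same row-names-to-column step can also be sketched with `rownames_to_column()` from the tibble package, which the tidyverse attaches. The name `df_alt` is only illustrative and is not used elsewhere in this post.

```r
library(tibble)

# Move the row names of mtcars into a regular column called model_name.
# df_alt is a hypothetical name used only for this sketch.
df_alt <- rownames_to_column(mtcars, var = "model_name")

# The car names are now the first column and the row names are plain integers.
head(df_alt[, c("model_name", "mpg", "cyl")], 3)
```

Either approach gives a 12-column data frame; the `cbind()` version above is the one the rest of the post relies on.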
Let's look at the dataset's description:
> ?mtcars
mtcars package:datasets R Documentation
Motor Trend Car Road Tests
Description: The data was extracted from the 1974 _Motor Trend_ US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Usage: mtcars
Format: A data frame with 32 observations on 11 (numeric) variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
filter()
Let's filter for cars whose miles per gallon (mpg) is below 20 and whose gross horsepower (hp) is above 200.
> filter(df, mpg < 20, hp > 200)
model_name mpg cyl disp hp drat wt qsec vs am gear carb
1 Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
2 Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
3 Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
4 Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
5 Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
...
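Conditions separated by commas in `filter()` are combined with AND. As a minimal sketch, assuming `df` is built as earlier in the post, other logical operators such as `|` (OR) and `%in%` can be used inside a single condition. The result names (`strong_thirsty`, `either`, `small_engines`) are illustrative only.

```r
library(dplyr)

# Rebuild df as earlier in the post so this block runs on its own.
df <- cbind(model_name = rownames(mtcars), mtcars)
rownames(df) <- NULL

# Comma-separated conditions are ANDed, so this matches
# filter(df, mpg < 20, hp > 200).
strong_thirsty <- filter(df, mpg < 20 & hp > 200)

# OR: cars that are either very economical or very powerful.
either <- filter(df, mpg > 30 | hp > 250)

# %in%: keep only cars with 4 or 6 cylinders.
small_engines <- filter(df, cyl %in% c(4, 6))

# Number of cars matching the AND condition.
nrow(strong_thirsty)
```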
arrange()
> arrange(df, mpg, cyl)
model_name mpg cyl disp hp drat wt qsec vs am gear carb
1 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
2 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
3 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
4 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
5 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
To sort by a chosen variable in descending order:
> arrange(df, desc(hp))
model_name mpg cyl disp hp drat wt qsec vs am gear carb
1 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
2 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
3 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
4 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
5 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
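When several columns are passed to `arrange()`, each later column only breaks ties in the earlier ones, and `desc()` can be applied to individual columns. A small sketch, assuming `df` as built earlier; the name `by_cyl_mpg` is illustrative.

```r
library(dplyr)

# Rebuild df as earlier in the post so this block runs on its own.
df <- cbind(model_name = rownames(mtcars), mtcars)
rownames(df) <- NULL

# Sort by cylinder count (ascending); within each cylinder count,
# put the car with the highest mpg first.
by_cyl_mpg <- arrange(df, cyl, desc(mpg))

# Show only the columns involved in the ordering.
head(select(by_cyl_mpg, model_name, cyl, mpg), 3)
```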
select()
> select(df, cyl, hp, gear)
cyl hp gear
1 6 110 4
2 6 110 4
3 4 93 4
4 6 110 3
5 8 175 3
> select(df, cyl:drat)
cyl disp hp drat
1 6 160.0 110 3.90
2 6 160.0 110 3.90
3 4 108.0 93 3.85
4 6 258.0 110 3.08
5 8 360.0 175 3.15
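Besides listing names and `:` ranges, `select()` also accepts a minus sign to drop columns, helper functions such as `starts_with()`, and `new_name = old_name` to rename while selecting. The sketch below assumes `df` as built earlier; the result names and the column name `horsepower` are illustrative.

```r
library(dplyr)

# Rebuild df as earlier in the post so this block runs on its own.
df <- cbind(model_name = rownames(mtcars), mtcars)
rownames(df) <- NULL

# Drop a column with a minus sign.
no_name <- select(df, -model_name)

# Select by name pattern: columns whose names start with "d".
d_cols <- select(df, starts_with("d"))

# Rename while selecting, using new_name = old_name.
renamed <- select(df, model_name, horsepower = hp)

names(d_cols)
```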
mutate()
# Pull the variables to be mutated out into a new data frame
> data_mutate <- select(mtcars, mpg, wt)
# 1 mpg = 0.425144 km/l
# 1 lb = 0.453592 kg
> mutate(data_mutate,
kml = mpg * 0.425144,
kg = wt * 1000 * 0.453592
)
mpg wt kml kg
1 21.0 2.620 8.928024 1188.4110
2 21.0 2.875 8.928024 1304.0770
3 22.8 2.320 9.693283 1052.3334
4 21.4 3.215 9.098082 1458.2983
5 18.7 3.440 7.950193 1560.3565
# To show only the newly created variables, use the transmute() function.
> transmute(data_mutate,
kml = mpg * 0.425144,
kg = wt * 1000 * 0.453592
)
kml kg
1 8.928024 1188.4110
2 8.928024 1304.0770
3 9.693283 1052.3334
4 9.098082 1458.2983
5 7.950193 1560.3565
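Within a single `mutate()` call, a later expression can refer to a column created earlier in the same call. As a hedged sketch, the column `l_per_100km` (litres per 100 km) is an illustrative addition that is not part of the original post; it is derived from `kml` using the same conversion factor as above.

```r
library(dplyr)

# Rebuild df as earlier in the post so this block runs on its own.
df <- cbind(model_name = rownames(mtcars), mtcars)
rownames(df) <- NULL

converted <- mutate(df,
  kml = mpg * 0.425144,       # conversion factor used in the post
  l_per_100km = 100 / kml     # refers to the kml column defined just above
)

head(select(converted, model_name, mpg, kml, l_per_100km), 3)
```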
group_by() and summarise()
> group_df <- group_by(df, model_name)
> group_df
# A tibble: 32 x 12
# Groups: model_name [32]
model_name mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 Mazda RX4 … 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 Hornet 4 D… 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 Hornet Spo… 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
> summarise(df, mean_fuel_consumption = mean(mpg, na.rm = TRUE))
mean_fuel_consumption
1 20.09062
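Grouping by `model_name` produces one group per car, so a grouped summary on that column would simply repeat each row. A grouped summary is more informative on a column with repeated values. The sketch below groups by `cyl` instead; the names `by_cyl`, `n_cars` and `mean_mpg` are illustrative, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Rebuild df as earlier in the post so this block runs on its own.
df <- cbind(model_name = rownames(mtcars), mtcars)
rownames(df) <- NULL

# One output row per distinct cylinder count.
by_cyl <- summarise(
  group_by(df, cyl),
  n_cars   = n(),                     # number of cars in the group
  mean_mpg = mean(mpg, na.rm = TRUE), # average fuel economy in the group
  .groups  = "drop"                   # return an ungrouped result
)

by_cyl
```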
> ggplot(data = df, mapping = aes(x = wt, y = mpg)) +
geom_point(aes(size = hp), alpha = 1/3) +
geom_smooth(se = FALSE)
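The verbs covered in this post can also be chained with the pipe operator `%>%`, which dplyr re-exports from magrittr. The pipeline below is a minimal sketch rather than part of the original walkthrough: it assumes `df` as built earlier, and the names `manual_summary`, `n_cars` and `mean_kml` are illustrative.

```r
library(dplyr)

# Rebuild df as earlier in the post so this block runs on its own.
df <- cbind(model_name = rownames(mtcars), mtcars)
rownames(df) <- NULL

manual_summary <- df %>%
  filter(am == 1) %>%                    # manual transmission only
  mutate(kml = mpg * 0.425144) %>%       # same conversion factor as in the post
  group_by(cyl) %>%                      # one group per cylinder count
  summarise(n_cars = n(),
            mean_kml = mean(kml),
            .groups = "drop") %>%        # requires dplyr 1.0.0 or later
  arrange(desc(mean_kml))                # most economical group first

manual_summary
```

Each step receives the result of the previous one as its first argument, so the pipeline reads top to bottom in the order the operations are applied.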