3-3. Working with Data in R, Part 2

1) Using data from an open repository

- Text file example (autompg.txt)

Reference site: http://archive.ics.uci.edu/ml/datasets/Auto+MPG

2) Data usage 2

- autompg data

(1) Loading the data (txt)

car <- read.table(file="autompg.txt", na.strings=" ", header=TRUE)
head(car)
dim(car)   # shows the number of observations (rows) and the number of variables (columns)
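
To try the loading step without the original file, here is a minimal sketch that writes a tiny whitespace-delimited file to a temporary path and reads it back. The column values and the "?" missing-value marker are made up for illustration; they are assumptions, not the real contents of autompg.txt.

```r
# Hypothetical three-row sample; NOT the real autompg.txt contents
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "mpg cyl hp",
  "18 8 130",
  "25 4 ?",
  "22 6 95"
), tmp)

# header = TRUE : the first line holds the column names
# na.strings    : strings that should be read as missing values (NA)
sample_car <- read.table(file = tmp, header = TRUE, na.strings = "?")

head(sample_car)
dim(sample_car)   # rows (observations), columns (variables)
```

If the missing-value marker is not declared in na.strings, a column such as hp would be read as text rather than as numbers.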

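(2) Inspecting the overall structure of the data: str(data_name)

str(car)   # factor: a variable made up of character values
           # (from R 4.0 on, read.table() keeps such columns as chr unless stringsAsFactors=TRUE)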

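(3) Summarizing the data: summary(data_name)

summary(car)   # shows min, 25% value (1st quartile), median, mean, 75% value (3rd quartile), max
               # for factor variables, shows frequencies instead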
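
As a small illustration of what str() and summary() report, here is a sketch on a made-up four-row data frame (the names and values are hypothetical, not taken from autompg.txt).

```r
# Hypothetical mini data frame: one numeric column, one categorical column
df <- data.frame(
  mpg    = c(18, 25, 22, 31),
  origin = factor(c("US", "Japan", "US", "Europe"))
)

str(df)                      # column types: num for mpg, Factor for origin

s <- summary(df$mpg)         # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s
origin_freq <- summary(df$origin)   # for a factor: frequency of each level
origin_freq
```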

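(4) Summary statistics (frequency counts): table(variable_name)

attach(car)   # attach(data_name): for the current session, variable names are looked up as columns of that data
              # -> the data name does not have to be written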
table(origin)
table(year)
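
attach() can be masked by (or mask) other objects that share a name, so a cautious alternative is to reference the columns explicitly. The sketch below uses R's built-in mtcars dataset as a stand-in for car; that substitution is an assumption made only so the code runs without autompg.txt.

```r
# mtcars ships with R; used here only as a stand-in for car
freq_a <- table(mtcars$cyl)          # explicit data$column reference
freq_b <- with(mtcars, table(cyl))   # with() evaluates the expression inside the data

freq_a
freq_b
```

Both forms give the same counts without changing the search path for the rest of the session.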

(5) Summary statistics (mean, standard deviation)

mean(mpg)
mean(hp)
mean(wt)   # mean of an individual variable

apply(car[, 1:6], 2, mean)   # apply(variable_list, (1=row, 2=col), FUN): computes a summary statistic for several variables at once
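
If a column contains missing values, mean() returns NA. A minimal sketch of passing na.rm through apply(), and of getting the standard deviation the same way, on made-up data (the values are hypothetical):

```r
# Hypothetical numeric data with one missing value in hp
df <- data.frame(mpg = c(18, 25, 22), hp = c(130, NA, 95))

raw_means <- apply(df, 2, mean)                 # hp comes back as NA
col_means <- apply(df, 2, mean, na.rm = TRUE)   # extra arguments are passed on to FUN
col_sds   <- apply(df, 2, sd,   na.rm = TRUE)   # same pattern for standard deviation

raw_means
col_means
col_sds
colMeans(df, na.rm = TRUE)                      # built-in shortcut for column means
```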

(6) Bar chart: barplot(count)

freq_cyl<-table(cyl)
names(freq_cyl) <- c("3cyl", "4cyl", "5cyl", "6cyl", "8cyl")
barplot(freq_cyl)
barplot(freq_cyl, main="Cylinders Distribution")
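
Typing the bar labels by hand only works when the number of labels matches the number of levels. A hedged alternative is to build the labels from the table itself. The sketch uses the built-in mtcars data as a stand-in (an assumption), and pdf(NULL) only so that it runs without opening a window.

```r
pdf(NULL)   # null graphics device: nothing is written; drop this line when working interactively

freq_cyl <- table(mtcars$cyl)
# Build the labels from the levels that actually occur in the data
names(freq_cyl) <- paste0(names(freq_cyl), "cyl")

mids <- barplot(freq_cyl, main = "Cylinders Distribution")   # returns the bar midpoints

dev.off()
```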

(7) Histogram: hist(variable_name, main="title")

hist(mpg, main="Miles per gallon: 1970-1982", col="lightblue")
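
hist() also returns the binning it used, which can be handy for checking the plot. A minimal sketch with the built-in mtcars data standing in for car (an assumption):

```r
h <- hist(mtcars$mpg, plot = FALSE)   # compute the bins only, without drawing

h$breaks   # bin boundaries
h$counts   # number of observations in each bin
```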

(8) 3D scatter plot: scatterplot3d(variable_name, ..., main="title")

library(scatterplot3d)
scatterplot3d(wt,hp,mpg, type="h", highlight.3d=TRUE, angle=55, scale.y=0.7, pch=16, main="3dimensional plot for autompg data")

(9) Vectorized summaries: lapply(variable_list, FUN)

lapply (car[, 1:6], mean)

a1<-lapply (car[, 1:6], mean)
a2<-lapply (car[, 1:6], sd)
a3<-lapply (car[, 1:6], min)
a4<-lapply (car[, 1:6], max)
table1<-cbind(a1,a2,a3,a4)
colnames(table1) <- c("mean", "sd", "min", "max")
table1
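
lapply() returns lists, so table1 above is a matrix of list elements. If a plain numeric table is preferred, one possible variant uses sapply(), which simplifies each result to a named numeric vector. The sketch uses the built-in mtcars data as a stand-in for car (an assumption).

```r
vars <- mtcars[, 1:6]   # stand-in for car[, 1:6]

# Each sapply() call returns one named numeric vector; cbind() lines them up as columns
table2 <- cbind(
  mean = sapply(vars, mean),
  sd   = sapply(vars, sd),
  min  = sapply(vars, min),
  max  = sapply(vars, max)
)

round(table2, 2)
```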
