[DL] Principal Component Analysis

KimTory 2021. 12. 21. 00:01

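▶ Principal component analysis (PCA) is a dimensionality reduction algorithm that finds the directions along which the data varies the most; each such direction is called a principal component.

In general, the number of principal components kept is smaller than the number of features in the original data.

→ Dimensionality reduction is a type of unsupervised learning that transforms the original features into a smaller number of new features.

It saves storage space and makes the data easier to visualize. It can also improve the performance of other algorithms.

→ PCA is the scikit-learn class that performs principal component analysis.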
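
To make "direction of greatest variance" concrete, here is a minimal sketch that computes principal components with plain NumPy via the SVD. The toy data and all variable names are assumptions made for illustration; this is not how the post's code or scikit-learn is necessarily implemented internally.

```python
import numpy as np

# Toy data (hypothetical): 500 points stretched along the 45-degree diagonal
rng = np.random.default_rng(42)
t = rng.normal(scale=3.0, size=500)      # large spread along the diagonal
noise = rng.normal(scale=0.3, size=500)  # small spread across it
X = np.column_stack([t + noise, t - noise])

# Center the data, then take the SVD; the rows of Vt are the principal directions
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

first_pc = Vt[0]                          # direction of greatest variance
explained_variance = S**2 / (len(X) - 1)  # variance along each direction

print(first_pc)            # roughly +/-[0.707, 0.707], i.e. the diagonal
print(explained_variance)  # the first value is far larger than the second
```

Keeping only the first direction would reduce each 2-D point to a single number while preserving most of the spread in the data.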

[ Source code ]

!wget https://bit.ly/fruits_300_data -O fruits_300.npy

import numpy as np

fruits = np.load('fruits_300.npy')
fruits_2d = fruits.reshape(-1, 100*100) # flatten each 100x100 image into a single row

from sklearn.decomposition import PCA

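pca = PCA(n_components=50) # number of principal components to keep
pca.fit(fruits_2d) # fit; unsupervised, so there is no target
# attributes learned during fit end with an underscore (_)

print(pca.components_.shape) # shape of the learned components
# (50, 10000)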

import matplotlib.pyplot as plt

def draw_fruits(arr, ratio=1):
    n = len(arr)    # n is the number of samples
    # Draw 10 images per row; divide the sample count by 10 to get the number of rows
    rows = int(np.ceil(n/10))
    # With a single row, the column count is the sample count; otherwise it is 10
    cols = n if rows < 2 else 10
    fig, axs = plt.subplots(rows, cols, 
                            figsize=(cols*ratio, rows*ratio), squeeze=False)
    for i in range(rows):
        for j in range(cols):
            if i*10 + j < n:    # draw only up to n images
                axs[i, j].imshow(arr[i*10 + j], cmap='gray_r')
            axs[i, j].axis('off')
    plt.show()
    
draw_fruits(pca.components_.reshape(-1, 100, 100))
    
print(fruits_2d.shape)
fruits_pca = pca.transform(fruits_2d) # pca was fitted with 50 principal components
print(fruits_pca.shape) # the feature dimension shrinks from 10000 to 50
# (300, 50)
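
As a rough illustration of what `transform` does, the sketch below checks that it matches subtracting the training mean and projecting onto `components_`. The random stand-in data and the names `pca_demo`, `X_manual`, and `X_sklearn` are assumptions for illustration, not part of the original code; the equivalence holds for the default `whiten=False`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (hypothetical): 300 samples with 20 features, not the fruit images
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))

pca_demo = PCA(n_components=5)
pca_demo.fit(X)

# transform() centers with the training mean, then projects onto the components
X_manual = (X - pca_demo.mean_) @ pca_demo.components_.T
X_sklearn = pca_demo.transform(X)

print(X_sklearn.shape)                   # (300, 5)
print(np.allclose(X_manual, X_sklearn))  # True
```

Each new feature is therefore a weighted sum of the original features, with the weights given by one row of `components_`.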

target = np.array([0] * 100 + [1] * 100 + [2] * 100) # build the target labels by hand: 100 samples per class

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

lr = LogisticRegression()

scores = cross_validate(lr, fruits_2d, target)
print(np.mean(scores['test_score'])) # accuracy
print(np.mean(scores['fit_time'])) # training time (seconds)

# 0.9966666666666667
# 3.712025499343872

# With the 50 PCA features instead of 10,000 pixels, accuracy reaches 100% and training is far faster
scores = cross_validate(lr, fruits_pca, target)
print(np.mean(scores['test_score']))
print(np.mean(scores['fit_time']))

# 1.0
# 0.08389606475830078

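pca = PCA(n_components=0.5) # a float between 0 and 1 is the fraction of variance to explain, given instead of a component count
pca.fit(fruits_2d)

# PCA(n_components=0.5)

print(pca.n_components_) # number of components needed to explain 50% of the variance
# 2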
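
The following sketch illustrates how a fractional `n_components` relates to `explained_variance_ratio_`. The stand-in data, the 0.9 threshold, and the names `full`, `expected_k`, and `pca_ratio` are assumptions chosen for illustration; `svd_solver='full'` is passed explicitly because that is the solver scikit-learn documents for fractional values.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (hypothetical): 5 independent features with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])

# Fit with all components to see how much variance each one explains
full = PCA(svd_solver='full').fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
print(cumulative)

# Smallest number of components whose cumulative ratio exceeds the threshold
threshold = 0.9
expected_k = int(np.argmax(cumulative > threshold)) + 1

# Passing the same threshold as n_components should select that many components
pca_ratio = PCA(n_components=threshold, svd_solver='full').fit(X)
print(expected_k, pca_ratio.n_components_)
```

Read this way, the result of 2 above suggests that the first two principal components of the fruit images already account for more than half of the total variance.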

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()

from sklearn.cluster import KMeans

km = KMeans(n_clusters=3, random_state=42)
km.fit(fruits_pca)

# KMeans(n_clusters=3, random_state=42)

print(np.unique(km.labels_, return_counts=True))

# (array([0, 1, 2], dtype=int32), array([110,  99,  91]))

for label in range(0, 3):
    draw_fruits(fruits[km.labels_ == label])
    print("\n")

for label in range(0, 3):
    data = fruits_pca[km.labels_ == label]
    plt.scatter(data[:,0], data[:,1])
plt.legend(['apple', 'banana', 'pineapple'])
plt.show()
