But wait! Before inspecting the data closely, you should always create a test set and set it aside. In fact, the MNIST dataset is already split into a training set (the first 60,000 images) and a test set (the last 10,000 images):
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
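As a rough illustration of the slice-based split above, the following sketch uses a small random array in place of the real MNIST data. The names `X` and `y` follow the text; the random contents, the reduced width of 16 features (instead of 784), and the seed are assumptions made only so that the example runs without downloading anything.

```python
import numpy as np

# Stand-in for MNIST: 70,000 rows, 16 features instead of 784 to keep it light.
rng = np.random.default_rng(0)
X = rng.random((70000, 16))
y = rng.integers(0, 10, size=70000)

# The first 60,000 rows form the training set, the remaining 10,000 the test set.
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Because the split is positional, it relies on the dataset keeping its original order, so it should be done before any shuffling.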
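Let's shuffle the training set so that all cross-validation folds are similar (no fold should be missing any digit). Moreover, some learning algorithms are sensitive to the order of the training instances and perform poorly when many similar instances appear in a row. Shuffling the dataset prevents this: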
import numpy as np

shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]
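To make the motivation for shuffling concrete, here is a minimal sketch on toy data. It assumes labels that start out sorted by digit (the worst case for contiguous folds); the arrays `X_toy` and `y_toy`, the helper `digits_per_fold`, the three-fold split, and the seed are all hypothetical and not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator, assumed here for repeatability

# Toy labels sorted by digit: 600 samples each of 0..9.
y_toy = np.repeat(np.arange(10), 600)
# Toy features whose first column stores the label, so pairing can be checked.
X_toy = np.column_stack([y_toy, rng.random(y_toy.size)])

def digits_per_fold(labels, n_folds=3):
    """Return the number of distinct digits in each contiguous fold."""
    return [len(np.unique(fold)) for fold in np.array_split(labels, n_folds)]

before = digits_per_fold(y_toy)

# One permutation indexes both arrays, so each row keeps its own label.
shuffle_index = rng.permutation(y_toy.size)
X_shuf, y_shuf = X_toy[shuffle_index], y_toy[shuffle_index]

after = digits_per_fold(y_shuf)
pairs_intact = bool(np.all(X_shuf[:, 0] == y_shuf))

print("distinct digits per fold before:", before)
print("distinct digits per fold after: ", after)
print("rows still match labels:", pairs_intact)
```

With sorted labels, each contiguous fold sees only a few of the ten digits. After shuffling, every fold should contain all of them, and reusing the same index array for features and labels keeps each sample paired with its label.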
3.2 Training a Binary Classifier
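Let's simplify the problem and try to identify just one digit, for example the digit 5. This "5-detector" is an example of a binary classifier, capable of distinguishing between just two classes: 5 and not-5. For the classification task ...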
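One way to express the two classes described here, 5 and not-5, is a boolean target vector. The sketch below assumes the labels are stored as integers; the sample label values and the names `y_train_5` and `y_test_5` are illustrative and not taken from the text above.

```python
import numpy as np

# Hypothetical integer labels standing in for y_train and y_test.
y_train = np.array([5, 0, 4, 1, 9, 2, 5, 3])
y_test = np.array([7, 5, 1])

# True where the digit is 5, False for every other digit.
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

print(y_train_5)
```

If the labels were loaded as strings rather than integers, the comparison would need to be against `'5'`, or the array would need to be cast to an integer type first.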