(Previous post: "AlexNet: Architecture and Implementation") https://bedlocked.tistory.com/19
In the previous post, we looked at AlexNet's architecture and at why it outperformed the other models of its time. This time, we will train an AlexNet model on the CIFAR-10 dataset and use it to classify images.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CIFAR10
from tqdm import tqdm
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: ", device)
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),      # resize every image to 256x256
    transforms.RandomCrop(227),         # random 227x227 crop (the paper's 224 is a typo)
    transforms.RandomHorizontalFlip(),  # random horizontal flip
    transforms.ToTensor(),              # [0, 255] -> [0, 1]
    transforms.Normalize(               # normalize with the ImageNet mean/std
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
val_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.TenCrop(227),            # 5 crops + their horizontal flips
    transforms.Lambda(lambda crops: torch.stack([
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )(transforms.ToTensor()(crop))
        for crop in crops               # ToTensor/Normalize handle one image at a time,
    ]))                                 # so process the 10 crops individually, then restack
])
First, the images need preprocessing. Since the network's input must be 227x227x3, each image is resized to 256x256x3 and then randomly cropped and flipped. The AlexNet paper uses the same scheme: starting from 256x256x3 images, random 227x227 crops and horizontal flips multiply the effective amount of training data by a factor of thousands. For test data, after resizing, five 227x227 patches are extracted (the four corners plus the center) along with their horizontal flips, ten patches in total; the softmax predictions over these ten images are averaged to produce the final prediction.
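The ten-crop averaging step can be sketched in plain Python, without PyTorch. The ten probability vectors below are made-up softmax outputs for illustration, not values from the model:

```python
# Ten-crop test-time averaging, sketched with plain Python lists.
# Each inner list stands in for the softmax output of one of the ten
# crops of a single image (made-up values, 3 hypothetical classes).
crop_probs = [
    [0.1, 0.7, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.8, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.6, 0.2],
]

# Average the ten per-crop distributions class by class.
num_classes = len(crop_probs[0])
avg = [sum(p[c] for p in crop_probs) / len(crop_probs) for c in range(num_classes)]

# The final prediction is the argmax of the averaged distribution.
pred = max(range(num_classes), key=lambda c: avg[c])
print(pred)  # -> 1
```

This is the same computation the evaluation loop later performs with `output.view(B, NC, -1).mean(dim=1)`, just written out element by element.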
train_loader = DataLoader(
    CIFAR10(root='./data', train=True, transform=train_transform, download=True),
    batch_size=128,
    shuffle=True
)
test_loader = DataLoader(
    CIFAR10(root='./data', train=False, transform=val_transform, download=True),
    batch_size=128,
    shuffle=True
)
Load the CIFAR-10 training and test sets, applying the preprocessing defined above.
model = AlexNet(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

model.train()
for epoch in range(50):
    running_loss = 0.0
    correct = 0
    with tqdm(train_loader, unit="batch") as tepoch:
        for images, labels in tepoch:
            tepoch.set_description(f"Epoch #{epoch}")
            images = images.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()
            output = model(images)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * labels.size(0)
            preds = output.argmax(dim=1)
            correct += (preds == labels).sum().item()
    print(f"epoch {epoch}: train loss {running_loss / len(train_loader.dataset)}")
    print(f"epoch {epoch}: train accuracy {correct / len(train_loader.dataset)}")
CIFAR-10 has 10 classes, so num_classes is set to 10. The loss function is CrossEntropyLoss and the optimizer is SGD.
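As a sanity check on what CrossEntropyLoss actually computes, here is the softmax + negative log-likelihood calculation written out in plain Python for a single made-up logit vector (the values are illustrative, not from the model):

```python
import math

# Raw logits for one sample over 3 hypothetical classes (made-up values).
logits = [2.0, 1.0, 0.1]
target = 0  # index of the true class

# Softmax: exponentiate, then normalize to a probability distribution.
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Cross-entropy is the negative log-probability assigned to the true class.
loss = -math.log(probs[target])
print(loss)
```

Note that `nn.CrossEntropyLoss` takes raw logits, not softmax outputs: it fuses the softmax and the log into one numerically stable step, which is why the model's final layer has no softmax of its own.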
Epoch #0: 100%|██████████| 391/391 [03:09<00:00, 2.07batch/s]
epoch 0: train loss 2.3025748650360107
epoch 0: train accuracy 0.10318
Epoch #1: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 1: train loss 2.302471691055298
epoch 1: train accuracy 0.11104
Epoch #2: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 2: train loss 2.3023961328125
epoch 2: train accuracy 0.11948
Epoch #3: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 3: train loss 2.3023132469177248
epoch 3: train accuracy 0.11464
Epoch #4: 100%|██████████| 391/391 [03:09<00:00, 2.06batch/s]
epoch 4: train loss 2.3022110075378417
epoch 4: train accuracy 0.13944
Epoch #5: 100%|██████████| 391/391 [03:09<00:00, 2.07batch/s]
epoch 5: train loss 2.302078968963623
epoch 5: train accuracy 0.14272
Epoch #6: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 6: train loss 2.301888885650635
epoch 6: train accuracy 0.14406
Epoch #7: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 7: train loss 2.3016105324554443
epoch 7: train accuracy 0.16588
Epoch #8: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 8: train loss 2.3011589705657958
epoch 8: train accuracy 0.16082
Epoch #9: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 9: train loss 2.3003521165466307
epoch 9: train accuracy 0.17236
Epoch #10: 100%|██████████| 391/391 [03:07<00:00, 2.09batch/s]
epoch 10: train loss 2.29850362991333
epoch 10: train accuracy 0.18312
Epoch #11: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 11: train loss 2.2810532082366946
epoch 11: train accuracy 0.14732
Epoch #12: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 12: train loss 2.1821552590942384
epoch 12: train accuracy 0.1796
Epoch #13: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 13: train loss 2.1112220467376708
epoch 13: train accuracy 0.23972
Epoch #14: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 14: train loss 2.0582179355239867
epoch 14: train accuracy 0.25648
Epoch #15: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 15: train loss 2.021900921974182
epoch 15: train accuracy 0.26078
Epoch #16: 100%|██████████| 391/391 [03:06<00:00, 2.09batch/s]
epoch 16: train loss 1.9738730185699462
epoch 16: train accuracy 0.2647
Epoch #17: 100%|██████████| 391/391 [03:07<00:00, 2.09batch/s]
epoch 17: train loss 1.8935441204833985
epoch 17: train accuracy 0.28952
Epoch #18: 100%|██████████| 391/391 [03:06<00:00, 2.09batch/s]
epoch 18: train loss 1.8410831958389282
epoch 18: train accuracy 0.30516
Epoch #19: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 19: train loss 1.7923832113265992
epoch 19: train accuracy 0.32572
Epoch #20: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 20: train loss 1.7457027227783204
epoch 20: train accuracy 0.3512
Epoch #21: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 21: train loss 1.6902488793945312
epoch 21: train accuracy 0.3751
Epoch #22: 100%|██████████| 391/391 [03:11<00:00, 2.05batch/s]
epoch 22: train loss 1.6327673984146118
epoch 22: train accuracy 0.40214
Epoch #23: 100%|██████████| 391/391 [03:09<00:00, 2.06batch/s]
epoch 23: train loss 1.5799391551971436
epoch 23: train accuracy 0.42126
Epoch #24: 100%|██████████| 391/391 [03:09<00:00, 2.07batch/s]
epoch 24: train loss 1.5301413543319702
epoch 24: train accuracy 0.44142
Epoch #25: 100%|██████████| 391/391 [03:09<00:00, 2.06batch/s]
epoch 25: train loss 1.4802801627731323
epoch 25: train accuracy 0.45506
Epoch #26: 100%|██████████| 391/391 [03:13<00:00, 2.02batch/s]
epoch 26: train loss 1.4414562062835694
epoch 26: train accuracy 0.47278
Epoch #27: 100%|██████████| 391/391 [03:11<00:00, 2.04batch/s]
epoch 27: train loss 1.4108170933151245
epoch 27: train accuracy 0.48276
Epoch #28: 100%|██████████| 391/391 [03:10<00:00, 2.05batch/s]
epoch 28: train loss 1.3730968777084351
epoch 28: train accuracy 0.49822
Epoch #29: 100%|██████████| 391/391 [03:09<00:00, 2.06batch/s]
epoch 29: train loss 1.3446638483047486
epoch 29: train accuracy 0.5105
Epoch #30: 100%|██████████| 391/391 [03:10<00:00, 2.05batch/s]
epoch 30: train loss 1.314762866973877
epoch 30: train accuracy 0.52188
Epoch #31: 100%|██████████| 391/391 [03:10<00:00, 2.05batch/s]
epoch 31: train loss 1.2790468850326537
epoch 31: train accuracy 0.53656
Epoch #32: 100%|██████████| 391/391 [03:13<00:00, 2.03batch/s]
epoch 32: train loss 1.2483248308181762
epoch 32: train accuracy 0.55074
Epoch #33: 100%|██████████| 391/391 [03:10<00:00, 2.05batch/s]
epoch 33: train loss 1.2205873084259034
epoch 33: train accuracy 0.55768
Epoch #34: 100%|██████████| 391/391 [03:09<00:00, 2.06batch/s]
epoch 34: train loss 1.1906331009292603
epoch 34: train accuracy 0.57484
Epoch #35: 100%|██████████| 391/391 [03:09<00:00, 2.07batch/s]
epoch 35: train loss 1.166729497909546
epoch 35: train accuracy 0.5834
Epoch #36: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 36: train loss 1.1390789764785767
epoch 36: train accuracy 0.59276
Epoch #37: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 37: train loss 1.1200671800994872
epoch 37: train accuracy 0.60146
Epoch #38: 100%|██████████| 391/391 [03:07<00:00, 2.09batch/s]
epoch 38: train loss 1.096705994567871
epoch 38: train accuracy 0.60922
Epoch #39: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 39: train loss 1.074966111869812
epoch 39: train accuracy 0.61692
Epoch #40: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 40: train loss 1.0557942223358154
epoch 40: train accuracy 0.62428
Epoch #41: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 41: train loss 1.037175574054718
epoch 41: train accuracy 0.63138
Epoch #42: 100%|██████████| 391/391 [03:07<00:00, 2.08batch/s]
epoch 42: train loss 1.0174288223075867
epoch 42: train accuracy 0.63834
Epoch #43: 100%|██████████| 391/391 [03:16<00:00, 1.99batch/s]
epoch 43: train loss 1.001808921661377
epoch 43: train accuracy 0.64408
Epoch #44: 100%|██████████| 391/391 [03:16<00:00, 1.99batch/s]
epoch 44: train loss 0.9804404489135742
epoch 44: train accuracy 0.65104
Epoch #45: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 45: train loss 0.9655512557983399
epoch 45: train accuracy 0.6557
Epoch #46: 100%|██████████| 391/391 [03:08<00:00, 2.07batch/s]
epoch 46: train loss 0.946725559463501
epoch 46: train accuracy 0.66506
Epoch #47: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 47: train loss 0.9336943419647217
epoch 47: train accuracy 0.66842
Epoch #48: 100%|██████████| 391/391 [03:10<00:00, 2.05batch/s]
epoch 48: train loss 0.9135923287773132
epoch 48: train accuracy 0.67484
Epoch #49: 100%|██████████| 391/391 [03:08<00:00, 2.08batch/s]
epoch 49: train loss 0.9040963884925842
epoch 49: train accuracy 0.68108
Train accuracy rises steadily as training progresses. Training beyond 50 epochs would likely improve the results further.
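For reference, the AlexNet paper trained with SGD using momentum 0.9 and weight decay 0.0005, both of which the plain SGD above omits. The update rule can be sketched in plain Python for a single scalar parameter; the gradient function here is a made-up quadratic purely for illustration:

```python
# SGD with momentum and weight decay, sketched for one scalar parameter.
# grad(w) is a stand-in gradient (of the toy loss (w - 3)^2); the
# hyperparameters match the AlexNet paper's settings.
def grad(w):
    return 2.0 * (w - 3.0)

lr = 0.01
momentum = 0.9
weight_decay = 0.0005

w = 0.0  # parameter
v = 0.0  # velocity buffer

for _ in range(500):
    g = grad(w) + weight_decay * w  # weight decay adds an L2 penalty gradient
    v = momentum * v - lr * g       # accumulate velocity
    w = w + v                       # take the step

print(w)  # converges near the (slightly decayed) optimum of 3
```

In PyTorch this corresponds to `optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)`, which typically speeds up convergence noticeably over the plain SGD used above.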
model.eval()
running_loss = 0.0
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)

        # TenCrop yields [B, 10, C, H, W]; fold the crops into the batch dim
        B, NC, C, H, W = images.shape
        images = images.view(B * NC, C, H, W)

        output = model(images)
        output = output.view(B, NC, -1).mean(dim=1)  # average over the 10 crops

        loss = criterion(output, labels)
        running_loss += loss.item() * labels.size(0)
        preds = output.argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"test loss {running_loss / len(test_loader.dataset)}")
print(f"test accuracy {correct / len(test_loader.dataset)}")
test loss 1.0615737098693847 test accuracy 0.6212
Now use the trained model to predict on the test data.
Let's inspect a few predictions alongside their images.
cifar10_classes = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
]

model.eval()
images, labels = next(iter(test_loader))
images = images[:50]
labels = labels[:50]
images = images.to(device)
labels = labels.to(device)

B, NC, C, H, W = images.shape
images = images.view(B * NC, C, H, W)

with torch.no_grad():
    outputs = model(images)
    outputs = outputs.view(B, NC, -1).mean(dim=1)
    preds = outputs.argmax(dim=1)

# undo the normalization so the images display with natural colors
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1).to(device)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1).to(device)
images_vis = (images * std + mean).clamp(0, 1)

plt.figure(figsize=(12, 3))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    # images was flattened to [B*10, C, H, W]; index 10*i is the first crop of image i
    plt.imshow(images_vis[10 * i].permute(1, 2, 0).cpu())
    plt.title(f"P:{cifar10_classes[preds[i]]}\nT:{cifar10_classes[labels[i]]}")
    plt.axis("off")
plt.show()
