Published 2024. 12. 8. 15:44

[Deep Learning] Designing a Neural Network Model

Neural Network Model

1) Error computation step

2) Weight update step (Optimization)

3) Evaluation step

A model is built and refined by going through these three steps.

Output Layer

- Softmax layer: a function that turns the raw outputs into probabilities, so each value lies between 0 and 1 and all values sum to 1
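To make the softmax note concrete, here is a minimal sketch using `torch.softmax`. The logit values are made up for illustration and do not come from any model in this post.

```python
import torch

# Hypothetical raw scores (logits) for 3 classes; values chosen for illustration.
logits = torch.tensor([2.0, 1.0, 0.1])

# Softmax exponentiates each score and divides by the sum of the exponentials.
probs = torch.softmax(logits, dim=0)

print(probs)        # each entry lies strictly between 0 and 1
print(probs.sum())  # the entries sum to 1 (up to floating-point error)
```

The largest logit still gets the largest probability, so the predicted class is unchanged; softmax only rescales the scores.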

Parametric vs Non-parametric

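1) Parametric Layers

- Layers that learn through backpropagation

- Hidden layers, output layers, convolutional layers, etc.

- They determine the size (memory footprint) of the model

- The larger the model, the longer training takes, the more memory it needs, and the more GPU performance it demands

2) Non-parametric Layers

- Layers that do not learn and only perform simple computation or information extraction

- Activation layers, input layers, pooling layers

- They leave the model size unchanged; pooling even reduces the size of the tensors passed on to later layers

- Their main purpose is to pass the signal along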
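The difference can be checked by counting learnable values per layer. This is a rough sketch; `count_params` is a helper name I made up, and the layer sizes are arbitrary.

```python
import torch
from torch import nn

def count_params(layer: nn.Module) -> int:
    """Number of learnable scalars held by a layer (hypothetical helper)."""
    return sum(p.numel() for p in layer.parameters())

linear = nn.Linear(784, 512)   # parametric: weight (512x784) + bias (512)
relu = nn.ReLU()               # non-parametric: activation
pool = nn.MaxPool2d(2)         # non-parametric: pooling

print(count_params(linear))  # 401920
print(count_params(relu))    # 0
print(count_params(pool))    # 0

# Pooling adds no parameters but shrinks the tensor it passes on.
x = torch.randn(1, 3, 28, 28)
print(pool(x).shape)         # torch.Size([1, 3, 14, 14])
```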

Model Generation Process

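1) Define the layers

- Define which layers the model uses (Linear, ReLU, etc.)

- Each layer keeps track of its own parameters

- Non-parametric layers are sometimes not defined in advance

2) Define forward

- Arrange the actual layers along the flow of the tensor

- Non-parametric layers are sometimes called directly here

- Calling the model invokes forward() automatically

- forward() is not called directly

- By design, the call also runs background operations (such as registered hooks)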
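The two steps above can be sketched as follows. `TinyNet` and its layer sizes are hypothetical, and the forward hook is only one example of the background work that calling the model triggers.

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # hypothetical model name
    def __init__(self):
        super().__init__()
        # 1) Layer definition: parametric layers are registered here.
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        # 2) Forward definition: lay out the layers along the tensor flow.
        # ReLU has no parameters, so it is called here without being
        # defined in __init__.
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = TinyNet()
x = torch.randn(3, 4)

# Record every time the forward hook fires.
calls = []
handle = model.register_forward_hook(lambda mod, inp, out: calls.append("hook"))

out = model(x)     # goes through nn.Module.__call__: forward() plus hooks
model.forward(x)   # direct call: the hook is skipped
handle.remove()

print(out.shape)   # torch.Size([3, 2])
print(calls)       # ['hook'], only the model(x) call triggered it
```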

Model Layers

- In PyTorch, a layer is implemented as a module (nn.Module)

- Parametric and non-parametric layers are not distinguished; both are managed as module classes

- A single module can contain other modules

How to Build a Model

Sequential()

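- A container that passes a tensor through its layers in order

- Layers can be added directly in order, or declared through an OrderedDict of modules

- Sequential's forward() method feeds the input tensor through each layer in turn and returns the final result

- Sequential treats the whole combination of layers as a single module (model)

- Sequential vs torch.nn.ModuleList: a ModuleList is literally a list of modules with no forward() of its own, whereas the layers in a Sequential are chained together

- Used for relatively simple models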
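A small sketch of both ways to declare a Sequential, plus a ModuleList for comparison. The layer names (`fc1`, `act`, `fc2`) and sizes are arbitrary choices for this example.

```python
from collections import OrderedDict

import torch
from torch import nn

# Option 1: pass the layers in order.
seq_a = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# Option 2: pass an OrderedDict to give each layer a name.
seq_b = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(4, 8)),
    ("act", nn.ReLU()),
    ("fc2", nn.Linear(8, 2)),
]))

x = torch.randn(3, 4)
print(seq_a(x).shape)   # torch.Size([3, 2])
print(seq_b.fc1)        # named layers are reachable as attributes

# ModuleList only stores modules; the caller decides how to chain them.
layers = nn.ModuleList([nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)])
h = x
for layer in layers:
    h = layer(h)
print(h.shape)          # torch.Size([3, 2])
```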

Model in Model

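- Custom models (modules) can be connected in any order

- With Sequential, by contrast, the input can only pass through the layers one after another

- Building a model this way lets you manipulate layers in many different ways

- Used for relatively advanced models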
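As one possible illustration, a custom block with a skip connection can be nested inside a larger model. `ResidualBlock` and `Net` are names invented for this sketch, not something defined elsewhere in the post.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):  # hypothetical custom module
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        # The input is added back to the output: a non-sequential flow
        # that a plain nn.Sequential chain cannot express.
        return x + torch.relu(self.fc(x))

class Net(nn.Module):  # hypothetical outer model
    def __init__(self):
        super().__init__()
        self.stem = nn.Linear(4, 8)
        self.block = ResidualBlock(8)   # a module nested inside a module
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.block(self.stem(x)))

model = Net()
out = model(torch.randn(3, 4))
names = [name for name, _ in model.named_parameters()]

print(out.shape)  # torch.Size([3, 2])
print(names)      # nested parameters get dotted names such as 'block.fc.weight'
```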

Working with a Model

parameters()

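- Parameter is a Tensor subclass with a special property when used together with a module

- When assigned as a module attribute, it is automatically added to the module's parameter list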

# `model` is an nn.Module defined beforehand
params = model.parameters()

for param in params:
    print(param.shape)

# Output:
# torch.Size([512, 784])
# ...
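The automatic registration can be seen with a tiny module. `Scale` is a made-up module for this sketch: one attribute is an `nn.Parameter`, the other a plain tensor.

```python
import torch
from torch import nn

class Scale(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(3))  # registered automatically
        self.offset = torch.zeros(3)             # plain tensor: not registered

    def forward(self, x):
        return x * self.gain + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)  # ['gain']
```

Only `gain` shows up, so only `gain` would be handed to an optimizer built from `m.parameters()`.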

named_parameters()

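- Returns the parameters registered in the parameter list together with their names

- Modifying a value in place changes the model directly, since each entry is the model's own Parameter object (unlike state_dict(), whose tensors are detached from autograd)

- This is because the tensor is held by reference in memory rather than copied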

named_params = model.named_parameters()

for name, param in named_params:
    print("{}: {}".format(name, param.shape))

# Output:
# linear_relu_stack.0.weight: torch.Size([512, 784])
# linear_relu_stack.0.bias: torch.Size([512])
# ...
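A quick way to confirm that edits made through named_parameters() reach the model itself. The single `nn.Linear` layer here is a stand-in chosen for brevity.

```python
import torch
from torch import nn

model = nn.Linear(2, 1)
x = torch.ones(1, 2)

# named_parameters() yields the model's own Parameter objects.
with torch.no_grad():  # in-place edits should not be tracked by autograd
    for name, param in model.named_parameters():
        param.zero_()

y = model(x)
print(y)  # weight and bias are now all zeros, so the output is 0
```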

state_dict()

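- The basic functionality is the same as named_parameters(), except that registered buffers are included as well

- However, the return type differs (an OrderedDict), and unlike parameters() and named_parameters(), its tensors are detached from the autograd graph; they still share memory with the model, so take a copy (e.g., copy.deepcopy) when an independent snapshot is needed

- Because a state_dict object is a Python dictionary, it is easy to save, update, modify, and restore

- A state_dict exists for optimizers and for layers that have learnable parameters or registered buffers

- In particular, an optimizer's state_dict holds the optimizer's state and the hyperparameters used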

state_dict = model.state_dict()

for name, param in state_dict.items():
    print("{}: {}".format(name, param.shape))

# Output:
# linear_relu_stack.0.weight: torch.Size([512, 784])
# linear_relu_stack.0.bias: torch.Size([512])
# ...
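How state_dict() relates to the live parameters is easy to get wrong, so here is a small check. As far as I can tell this holds for the default `state_dict()` call in current PyTorch releases; the one-layer model is just a stand-in.

```python
import copy
from collections import OrderedDict

import torch
from torch import nn

model = nn.Linear(2, 1)
sd = model.state_dict()

print(isinstance(sd, OrderedDict))   # True: a dictionary, not a generator
print(sd["weight"].requires_grad)    # False: detached from autograd
print(model.weight.requires_grad)    # True

# The tensors still share storage with the model, so take a deep copy
# when an independent snapshot is needed.
snapshot = copy.deepcopy(sd)
with torch.no_grad():
    model.weight.fill_(5.0)

print(torch.equal(sd["weight"], model.weight))        # True: same storage
print(torch.equal(snapshot["weight"], model.weight))  # False: copy kept old values
```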

Saving & Loading Model Weights

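import copy

import torch
from torch.optim import SGD

# Save the model
# 1. Keep a copy of the current parameters (to check later).
#    state_dict() shares memory with the model, so deep-copy it.
old_model = copy.deepcopy(model.state_dict())

# 2. Save the model weights
torch.save(model.state_dict(), 'model.pth')

# Load the model
# 3. Load the saved weights into the model
model.load_state_dict(torch.load('model.pth'))
new_model = model.state_dict()

# Save the optimizer state together with the model
# (buffers are already part of model.state_dict())
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
optim_state_dict = optimizer.state_dict()

# A separate file name keeps the weights-only file above from being overwritten
torch.save({
    'model': model.state_dict(),
    'optim': optimizer.state_dict()
}, 'checkpoint.pth')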
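The snippet above stops at saving, so here is a sketch of the matching restore step. The temporary path, the tiny `nn.Linear` model, and the deliberately different `lr=0.1` are all assumptions made for this example.

```python
import os
import tempfile

import torch
from torch import nn
from torch.optim import SGD

model = nn.Linear(4, 2)
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

# Hypothetical checkpoint path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({"model": model.state_dict(), "optim": optimizer.state_dict()}, path)

# Rebuild the same architecture and optimizer, then restore both states.
new_model = nn.Linear(4, 2)
new_optimizer = SGD(new_model.parameters(), lr=0.1)  # lr is replaced on load

checkpoint = torch.load(path)
new_model.load_state_dict(checkpoint["model"])
new_optimizer.load_state_dict(checkpoint["optim"])

print(torch.equal(new_model.weight, model.weight))  # True
print(new_optimizer.param_groups[0]["lr"])          # 0.01
```

Note that the optimizer has to be constructed before its state can be loaded, because `load_state_dict` only fills in state for parameters the optimizer already holds.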

Training the Model

def train(model, train_loader, criterion, optimizer, device):
    model.train()  # Put the model in training mode (model.eval() is evaluation mode)

    running_loss = 0.0  # Accumulates the loss of each mini-batch

    for datas, labels in train_loader:
        # Move the mini-batch data and labels to the device
        datas, labels = datas.to(device), labels.to(device)

        # Forward pass
        outputs = model(datas)  # Runs through forward() and returns the outputs

        # Compute the loss
        loss = criterion(outputs, labels)

        # Reset the gradients
        optimizer.zero_grad()  # Needed because loss.backward() accumulates gradients

        # Backward pass
        loss.backward()

        # Update the parameters
        optimizer.step()

        # Accumulate the loss (for reporting)
        running_loss += loss.item()

    # Compute and return the mean loss for the current epoch
    return running_loss / len(train_loader)


# Forward pass -> compute loss -> (reset gradients) -> backward pass -> update parameters
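Why the gradient reset matters can be shown with a single scalar weight. This is a toy sketch with no optimizer involved; setting `w.grad = None` stands in for what `optimizer.zero_grad()` does across all parameters.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()    # tensor([3.])

# A second backward() adds to the stored gradient instead of replacing it.
(3 * w).sum().backward()
second = w.grad.clone()   # tensor([6.])

# After a reset, the next backward() starts from a clean gradient.
w.grad = None
(3 * w).sum().backward()
third = w.grad.clone()    # tensor([3.])

print(first, second, third)
```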

Evaluating the Model

def evaluate(model, test_loader, criterion, device):
    model.eval()  # Put the model in evaluation mode

    running_loss = 0.0

    with torch.no_grad():  # No gradients are computed during evaluation
        for datas, labels in test_loader:
            datas, labels = datas.to(device), labels.to(device)

            # Forward pass
            outputs = model(datas)

            # Compute the loss
            loss = criterion(outputs, labels)

            # Accumulate the loss
            running_loss += loss.item()

    # Compute and return the mean loss over the test set
    return running_loss / len(test_loader)
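The effect of `model.eval()` and `torch.no_grad()` is easiest to see on a layer that behaves differently between modes. The Dropout-based model below is an assumption for this sketch, not the model used in the post.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

model.eval()               # dropout becomes a no-op in evaluation mode
with torch.no_grad():      # no computation graph is recorded
    out1 = model(x)
    out2 = model(x)

print(torch.equal(out1, out2))  # True: deterministic in eval mode
print(out1.requires_grad)       # False: no gradient tracking

model.train()              # switch back before the next training epoch
print(model.training)      # True
```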
