10/4 Tue
Things learned:
Machine learning: training artificial intelligence from data
Deep learning: machine learning that uses neural networks
Key components of deep learning:
- data
- model
- loss
- algorithm
Timeline of deep learning:
2012 - AlexNet
2013 - DQN (the beginning of DeepMind)
2014 - Encoder/Decoder (encode the input, then decode it into the desired output)
2014 - Adam (an optimizer that reliably produces good results)
2015 - Generative Adversarial Network
2015 - ResNet(Residual Networks)
2017 - Transformer
2018 - BERT (Bidirectional Encoder Representations from Transformers)
2019 - Big Language Models (GPT-X)
2020 - Self Supervised Learning
Retrospective:
For the competition, NLP really does seem to be the answer..
10/5 Wed
Things learned:
Generalization: how well the learned model will behave on unseen data
- avoiding overfitting
cross-validation: cycling through folds of train data and selecting one as validation data
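A minimal NumPy sketch of this fold cycling (the helper name `k_fold_indices` and the fold count are my own, for illustration only):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index splits; each fold takes one turn as validation data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# example: 5-fold split over 20 samples
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(20, k=5)):
    print(f"fold {fold}: train={len(train_idx)} samples, val={len(val_idx)} samples")
```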
Bias and variance tradeoff: reducing bias tends to increase variance and vice versa
Bootstrapping: using random sampling with replacement
Bagging (Bootstrap aggregating): multiple models are trained and their predictions aggregated
Boosting: focus on samples that are hard to classify
- bringing together a set of learners in which the learner learns from the mistake of the previous weak learner
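A rough NumPy sketch of bootstrapping and bagging as described above, using the sample mean as a stand-in "model" (the dataset and model choice are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)        # toy dataset

def bootstrap_sample(data, rng):
    """Bootstrapping: random sampling with replacement."""
    idx = rng.integers(0, len(data), size=len(data))
    return data[idx]

# Bagging: train several models on bootstrap samples and aggregate their predictions
estimates = [bootstrap_sample(data, rng).mean() for _ in range(10)]
print("individual estimates:", np.round(estimates, 3))
print("bagged estimate:", round(float(np.mean(estimates)), 3))
```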
Using a small batch size tends to work better, but training takes longer
SGD: update the parameters by the gradient scaled by the learning rate
Momentum: like inertia, carries the previous batch's gradient over into the current update
Nesterov accelerated gradient: takes the gradient once more at the point the momentum step would land on, which keeps momentum from wandering around near the minimum
Adagrad: the more a parameter has changed so far, the smaller its learning rate; the less it has changed, the larger its learning rate
Adadelta: a variant of Adagrad with no learning rate
RMSprop: Exponential Moving Average를 이용한다
Adam(Adaptive Moment Estimation): momentum + adaptive learning rate
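The update rules above, written out as a minimal NumPy sketch on a one-dimensional quadratic loss (the hyperparameter values are illustrative, not from the lecture):

```python
import numpy as np

grad = lambda w: 2 * (w - 3.0)           # gradient of the loss (w - 3)^2, minimum at w = 3
lr = 0.1

# SGD: move by the gradient scaled by the learning rate
w = 0.0
for _ in range(100):
    w -= lr * grad(w)
print("SGD:", round(w, 4))

# Momentum: carry the previous update over like inertia
w, v, beta = 0.0, 0.0, 0.9
for _ in range(100):
    v = beta * v + grad(w)
    w -= lr * v
print("Momentum:", round(w, 4))

# Adam: momentum (first moment) + adaptive per-parameter learning rate (second moment)
w, m, s = 0.0, 0.0, 0.0
b1, b2, eps = 0.9, 0.999, 1e-8
for t in range(1, 101):
    g = grad(w)
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g * g
    m_hat, s_hat = m / (1 - b1**t), s / (1 - b2**t)
    w -= lr * m_hat / (np.sqrt(s_hat) + eps)
print("Adam:", round(w, 4))
```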
Regularization:
- Early stopping
- stopping training when validation errors start to increase
- Parameter Norm Penalty
- makes sure the parameter does not become too big, smooths function space
- Data Augmentation
- more data, the better
- rotating or flipping images to create data that is similar but not identical
- Noise Robustness
- add random noise to images
- Label Smoothing
- mix-up: mixing the inputs and outputs of two randomly selected training samples (a small sketch follows this list)
- cut-mix: similar to mix-up but replaces part of an image with a patch from another sample
- Dropout
- randomly set some neurons to zero
- Batch Normalization
- normalize the activations within each batch
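A minimal NumPy sketch of the mix-up idea referenced in the list above (the beta-distribution parameter, image size, and label format are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2, rng=rng):
    """Blend two training samples and their one-hot labels with the same random ratio."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# toy 4x4 "images" with one-hot labels for a 3-class problem
img_a, img_b = rng.random((4, 4)), rng.random((4, 4))
lab_a, lab_b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
mixed_x, mixed_y = mixup(img_a, lab_a, img_b, lab_b)
print("mixed (soft) label:", np.round(mixed_y, 3))
```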
Convolution: can blur, emboss, outline an image
CNN: consists of convolution layer, pooling layer, and fully connected layer
- the convolution and pooling layers are used for feature extraction
- Fully connected layer is used for decision making
Stride: how far the filter moves at each step (a larger stride skips pixels)
Padding: adding extra values around the border so the filter can properly cover the edge and corner pixels
Number of parameters: (width of filter * height of filter * depth of input channel + 1) * number of output channels, where the +1 is each filter's bias
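A quick worked example of that parameter count (the filter size and channel numbers are made up):

```python
# parameters of one convolution layer:
# (filter width * filter height * input channels + 1) * output channels, the +1 being the bias
filter_w, filter_h, in_ch, out_ch = 3, 3, 128, 256
params = (filter_w * filter_h * in_ch + 1) * out_ch
print(params)  # 295168
```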
1x1 convolution: dimension reduction to reduce the number of parameters while increasing the depth of CNN
Why AlexNet succeeded: ReLU, 2 GPUs, local response normalization, dropout, data augmentation, overlapping pooling
ReLU:
- preserves the property of a linear model
- easy to optimize
- good generalization
- overcome the vanishing gradient problem
VGGNet: only 3x3, dropout, 1x1 convolution
- using two 3x3 convolutions is better than one 5x5 because it covers the same receptive field with fewer parameters
GoogLeNet: uses inception blocks (to reduce the number of parameters)
- using 1x1 convolutions reduces the number of parameters
ResNet: tackled the problem that a deeper neural network (with an excessive number of parameters, prone to overfitting) is harder to train
- utilized skip connections (see the small sketch after this list)
- bottleneck architecture (using 1x1 convolutions)
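A minimal NumPy sketch of a skip connection, with a single toy layer `f` standing in for the convolution layers inside the block (shapes and weights are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # toy layer weights

def f(x):
    """Stand-in for the conv/batch-norm layers inside a residual block."""
    return np.maximum(0.0, W @ x)        # linear map + ReLU

def residual_block(x):
    return f(x) + x                      # skip connection: add the input back to the output

x = rng.normal(size=8)
print(residual_block(x))
```

Because the block only has to learn the residual f(x), gradients can also flow straight through the identity path, which is what makes very deep networks trainable.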
DenseNet: concatenate instead of addition
- transition block: reduces the number of parameters
Semantic segmentation: classify objects pixel by pixel
- fully convolutional network (no dense layer)
- this allows the network to output a heatmap
Deconvolution: inverse of convolution
Detection: creating a bounding box
- R-CNN: generate many region proposals, run each through a CNN (AlexNet), classify with an SVM
- SPPNet: the CNN runs only once per image
- Fast R-CNN: learns the bounding box regressor inside the network
- Faster R-CNN: region proposal network+Fast R-CNN
- YOLO: simultaneously predicts multiple bounding boxes and class probabilities
Sequential model: the length of the input is not known in advance
RNN(Recurrent Neural Network): feed the output of the past as input
- problem: short-term dependencies, vanishing/exploding gradient
LSTM (Long Short Term Memory): solves the problems of the RNN by introducing a cell state alongside the hidden state (a small sketch follows this list)
- forget gate: decide what to throw away
- input gate: decide what to store
- update cell
- output gate
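A minimal NumPy sketch of one LSTM step following the gate list above (the weight shapes, initialization, and toy dimensions are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                  # forget gate: decide what to throw away from the cell state
    i = sigmoid(i)                  # input gate: decide what new information to store
    g = np.tanh(g)                  # candidate values for the cell state
    o = sigmoid(o)                  # output gate: decide what part of the cell to expose
    c = f * c_prev + i * g          # update the cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

# toy run: input size 3, hidden size 4, sequence length 5
rng = np.random.default_rng(0)
hidden, inp = 4, 3
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inp))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, inp)):
    h, c = lstm_cell(x, h, c, W, b)
print(h)
```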
GRU (Gated Recurrent Unit): no separate cell state, only a hidden state (reset gate and update gate)
Retrospective:
We learned so much that I will need to review it all step by step.
10/6 Thu
Things learned:
Transformer: first sequence transduction model based entirely on attention
- structure: change a sequence to another sequence
- can encode all the words of a sequence at once, unlike an RNN
- stacks of encoder and decoder
Encoder: self-attention and feed-forward neural network
Steps for encoder:
- represent words with embedding vectors (each word gets a unique index that is mapped to a vector)
- transformer encodes each word to feature vectors with self-attention
- self-attention uses the information of other words while they are put into the encoder as inputs
- the feed-forward network is independent while the path of self-attention is dependent upon each other
- self-attention looks for relationships between words (ex. "it" in a sentence refers to "the animals")
- for each word vector, make three vectors, each with its own weight matrix
- queries (Q)
- keys (K)
- values (V)
- compute the score as the dot product of the word's own query with every key (see the attention sketch after this list)
- the score tells how much attention this word should pay to each of the other words
- divide the score by $\sqrt{d_x}$
- find softmax of score
- multiply the softmax with the values
- add all the values to get a final representation of the word
- $softmax \left(\frac{Q \times K^T}{\sqrt{d_x}} \right)\times V=Z$
- concatenate all the Z matrices
- multiply with the weight matrix to produce the outcome dimension that is the same as the input
- add positional encodings to make sure the order is taken into account
- apply layer normalization
- feed-forward
- repeat
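A minimal NumPy sketch of the scaled dot-product self-attention described in the steps above (dimensions and weights are made up; a real encoder adds multiple heads, residual connections, and layer norm on top):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """softmax(Q K^T / sqrt(d)) V, giving one encoded vector Z per word."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how much each word attends to every other word
    return softmax(scores) @ V                # weighted sum of the values

# toy setup: 4 words, model dimension 8
rng = np.random.default_rng(0)
n_words, d_model = 4, 8
X = rng.normal(size=(n_words, d_model))       # word embeddings (plus positional encodings)
Wq, Wk, Wv = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3)]
Z = self_attention(X, Wq, Wk, Wv)
print(Z.shape)                                # (4, 8): one encoded vector per word
```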
The time complexity of transformer: $O(N^2)$ because we need to iterate through all words for each word
With multi-head attention, $i$ heads each encode the input, producing $i$ attention outputs (encoded vectors)
key vector and value vector are sent to the decoder
In the decoder, a self-attention layer is only allowed to attend to earlier positions, which is enforced by masking
In encoder-decoder attention, the query matrix comes from the layer below, while the keys and values come from the encoder
The Vision Transformer applies the same idea to image classification
Retrospective:
We learned the hardest topics, GAN and Transformer; I will need to study them separately later.
10/7 Fri
Things learned:
Generative Model: learning a probability distribution $p(x)$
- Bernoulli distribution
- Categorical distribution
Modeling the variables as independent lowers the number of parameters, but it throws away the dependencies
- this can be addressed with the Markov assumption
Autoregressive models leverage this conditional independence through the Markov assumption
Autoregressive model: predicting next term based on previous terms
- needs ordering of random variables
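In symbols, once an ordering $x_1, \dots, x_n$ is fixed, the joint distribution factorizes by the chain rule, and the Markov assumption shortens each condition to just the previous variable:

$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1}) \;\xrightarrow{\text{Markov}}\; \prod_{i=1}^{n} p(x_i \mid x_{i-1})$$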
NADE(Neural Autoregressive Density Estimator): is an explicit model that can compute the density of given input
Summary of Autoregressive Model: easy to sample, easy to compute the probability, easy to be extended to continuous variables
Maximum likelihood learning: minimizing KL-divergence maximizes the expected log-likelihood
- approximate the expected log-likelihood with the empirical log-likelihood
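Spelled out, with $p_{data}$ the true distribution, $p_\theta$ the model, and $x^{(1)}, \dots, x^{(N)}$ the training data:

$$D_{KL}(p_{data} \,\|\, p_\theta) = \mathbb{E}_{x \sim p_{data}}[\log p_{data}(x)] - \mathbb{E}_{x \sim p_{data}}[\log p_\theta(x)]$$

The first term does not depend on $\theta$, so minimizing the KL-divergence is the same as maximizing the expected log-likelihood, which is approximated by the empirical average over the training data:

$$\arg\min_\theta D_{KL}(p_{data} \,\|\, p_\theta) = \arg\max_\theta \mathbb{E}_{x \sim p_{data}}[\log p_\theta(x)] \approx \arg\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x^{(i)})$$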
ERM (Empirical Risk Minimization): a commonly used method for maximum likelihood learning
- prone to overfitting
- reduce model space
Autoencoder is not a generative model
Variational Autoencoder aims to maximize $p(x)$
GAN(Generative Adversarial Networks): discriminator and generator
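The two-player objective behind this, from the original GAN formulation: the discriminator $D$ tries to tell real data from generated data, while the generator $G$ tries to fool it.

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$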
Diffusion Model: make an image from noise progressively
- diffusion process: inject noise
- reverse process: denoise the image
Retrospective:
This week has come to an end too. Let's keep up the hard work next week!