
Boostcamp Week 4 Study Log - Computer Vision Basics

2022. 10. 11. 23:48

Tech stacks used:

10/11 Tue

Things learned:

AI consists of cognition & perception, memory & inference, decision making, and reasoning

 

Using multi-modal association for perception

 

Vision is important because roughly 75% of the data we perceive comes through vision

 

Computer vision is the inverse of computer rendering

 

Using both the strengths and weaknesses of human visual perception to build a CV model that compensates for those imperfections

 

Classical machine learning relied on features extracted by hand, whereas deep learning does not require human feature engineering

 

CVPR is a top-5 publication across all of STEM, and it is getting a lot of attention from companies

 

Since we cannot memorize all the data in the world, we cannot rely on simple tools like k-nearest neighbors

 

A network with a single fully connected layer cannot generalize to new data

 

CNN: extracts features by looking at local parts of an image

- because parameters are shared, it is robust to changes in object location

 

Datasets are almost always biased

- there is always a gap between training datasets and real-world data

 

To fill this gap, a technique called data augmentation is used to increase the effective amount of training data

 

Applying various image transformations to the dataset: crop, brightness adjustment, rotate, flip, affine transformation, CutMix

 

A technique called RandAugment randomly samples transformations and searches for the combination that works best
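
A rough sketch of such a pipeline using torchvision transforms (the specific operations and magnitudes are arbitrary choices, not the course's):

```python
from torchvision import transforms

# Illustrative augmentation pipeline: crop, flip, brightness jitter, plus RandAugment,
# which applies `num_ops` randomly sampled transformations per image at a given magnitude.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2),
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
])
```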

 

Annotating data is very expensive, so using a pretrained model can mitigate the problem

- Knowledge from one dataset can be used for another dataset

 

Approaches to Transfer Learning:

  1. Freeze the other layers and train only the last fully connected layer
    1. preserves the knowledge from the pretrained data
  2. Set a low learning rate for the other layers and a high learning rate for the last fully connected layer (see the sketch after this list)
  3. Teacher-student learning: use a pretrained model as a teacher to train an untrained student model; this can also be done without labels (unsupervised)
    1. For labeled data, use soft labels (probabilities) from the teacher together with the ground-truth labels to form the student's loss
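
A minimal PyTorch sketch of approaches 1 and 2, assuming a torchvision ResNet-18 and a hypothetical 10-class target task:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone with a new head for the (hypothetical) 10-class task
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)

# Approach 1: freeze everything except the last fully connected layer
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

# Approach 2 (alternative): train all layers, but give the backbone a much
# smaller learning rate than the new head
optimizer = torch.optim.SGD(
    [
        {"params": [p for n, p in model.named_parameters() if not n.startswith("fc")], "lr": 1e-4},
        {"params": model.fc.parameters(), "lr": 1e-2},
    ],
    momentum=0.9,
)
```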

 

Softmax with temperature: allows for less extreme (softer) outputs

  • Normal softmax: $\frac{\exp(z_i)}{\sum_{j} \exp(z_j)}$
  • Softmax with temperature: $\frac{\exp(z_i/T)}{\sum_{j} \exp(z_j/T)}$
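
A tiny numeric illustration (the logits and T = 4 are arbitrary):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])
print(F.softmax(logits, dim=0))        # normal softmax: sharp, close to one-hot
print(F.softmax(logits / 4.0, dim=0))  # T = 4: softer, less extreme probabilities
```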

 

Retrospective:

I'll study Computer Vision hard and put it to good use later.


10/12 Wed

Things learned:

Deeper networks learn more powerful features

 

However, as networks get deeper, gradients vanish or explode, computation becomes more expensive, and performance degrades

 

GoogLeNet: introduces the Inception module, which applies convolutions with different filter sizes in parallel

- a 1x1 convolution is used to change the channel size
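
For example, a 1x1 convolution keeps the spatial size and only remaps the channel dimension (the channel counts here are arbitrary):

```python
import torch
import torch.nn as nn

reduce = nn.Conv2d(256, 64, kernel_size=1)         # 256 -> 64 channels
print(reduce(torch.randn(1, 256, 28, 28)).shape)   # torch.Size([1, 64, 28, 28])
```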

 

Auxiliary classifier: a classifier attached in the middle of the network to mitigate the vanishing/exploding gradient problem

 

Degradation problem: as network depth increases, accuracy gets saturated

 

The solution to the degradation problem: add the input x to the target function so that the identity mapping is preserved

- this is called a residual block, which uses a shortcut connection

- a stack of such blocks has $O(2^n)$ implicit paths
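
A minimal residual block sketch (the exact conv/BN layout is illustrative, not the exact ResNet block):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Output = F(x) + x: the identity shortcut lets gradients flow directly."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut connection: add the input back
```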

 

In the dense block, every output of each layer is concatenated along the channel axis to account for the vanishing gradient problem.
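
A minimal sketch of that dense connectivity (the 3x3 conv and growth rate of 12 are illustrative choices):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels, growth=12):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth, kernel_size=3, padding=1)

    def forward(self, x):
        # Concatenate the new features to everything seen so far along the channel axis
        return torch.cat([x, torch.relu(self.conv(x))], dim=1)
```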

 

Retrospective:

I learned so much that I should review it all step by step.


10/13 Thu

Things learned:

Semantic segmentation: classifying each pixel of an image into a category

- does not distinguish individual objects, only the semantic category

 

Fully Convolutional Networks (FCN): no fully connected layers

 

A fully connected layer outputs a fixed-dimensional vector and discards spatial coordinates

A fully convolutional layer outputs a classification map that retains spatial coordinates

 

A 1x1 convolution layer classifies every feature vector of the convolutional feature map

- to solve the problem of the predicted score map being low-resolution, it is upsampled to the size of the input image

 

Methods of upsampling (see the sketch after this list):

  1. Transposed convolution: roughly the inverse of the convolution operation
    1. produces checkerboard artifacts due to uneven overlapping
  2. Upsampling + convolution: interpolation followed by a convolution
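
A minimal PyTorch sketch of both options (channel counts are arbitrary):

```python
import torch.nn as nn

# 1. Transposed convolution: can leave checkerboard artifacts when kernel size
#    and stride overlap unevenly
up_transposed = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)

# 2. Upsampling + convolution: interpolate first, then apply a normal convolution
up_interp = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
)
```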

Adding a skip connection to the convolutional network can preserve higher spatial resolution

 

U-Net: an FCN that predicts a dense map by concatenating feature maps from the contracting path

- produces more precise segmentations

- repeatedly applies 2x2 up-convolutions

- as the expanding path grows, the corresponding feature map from the contracting path is concatenated to it

 

Conditional Random Fields post-process a segmentation so that it is refined to follow image boundaries

 

Dilated convolution: inflates the kernel by inserting spaces between the kernel elements

- enables exponential expansion of the receptive field
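
For instance (a minimal sketch; sizes are arbitrary), a 3x3 kernel with dilation=2 covers a 5x5 area using the same nine weights:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, dilation=2, padding=2)  # effective 5x5 receptive field
print(conv(torch.randn(1, 3, 32, 32)).shape)                   # torch.Size([1, 16, 32, 32])
```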

 

Depthwise separable convolution: depthwise convolution + pointwise convolution
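
A minimal sketch of that two-step factorization, using PyTorch's groups argument for the depthwise step (channel counts are placeholders):

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise: one filter per channel
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise: 1x1 conv to mix channels
    )
```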

 

Retrospective:

I learned about segmentation and also gathered teammates. Somehow I have a good feeling about this.

 


10/14 Fri

Things learned:

Instance segmentation: even if objects belong to the same class, each one is classified as a separate instance

 

Panoptic segmentation: semantic segmentation + instance segmentation

 

Object detection: classification + box localization

- useful for autonomous driving, OCR

 

Traditional method - Selective search: over-segmentation, iteratively merging similar regions, extracting candidate boxes 

 

Two-stage detector: region proposal + image classification

 

R-CNN: region proposal, warp each region, CNN, classify regions

 

Fast R-CNN: recycles a pre-computed feature map for detecting multiple objects

- compute a convolutional feature map from the original image, extract a feature map per region of interest, then predict a class and box for each RoI

 

Faster R-CNN: end-to-end object detection with a neural region proposal network

- uses a metric called Intersection over Union (IoU): area of overlap / area of union

- pipeline: feature map, neural region proposal, classification; overlapping boxes with IoU >= 0.5 are removed
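
A small illustrative implementation of IoU for axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143
```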

 

One-stage detector: no RoI pooling

 

You Only Look Once (YOLO): an S x S grid on the input, predicting a class probability map plus bounding boxes & confidences

 

Single Shot MultiBox Detector (SSD): uses multiple feature maps to model a diverse space of box shapes

 

Class imbalance problem: there are far more negative bounding boxes than positive ones

- solution: Focal loss, an improved cross-entropy loss

- gives less weight to easy examples
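
A minimal binary focal-loss sketch (gamma = 2 is a common choice; the class-weighting alpha term is omitted here):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma, so easy (confident) examples contribute less."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability assigned to the true class
    return ((1 - p_t) ** gamma * ce).mean()
```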

 

RetinaNet: feature pyramid network + class/box classification networks

 

DETR: transformer for object detection

 

Retrospective:

Another week is over. Let's keep pushing hard next week too!
