연두색연필
LimePencil's Log
Boostcamp Week 5 Study Log - Computer Vision Basics

2022. 10. 18. 09:45

Tech stack used:

10/18 (Tue)

Things learned:

CNN visualization aims to see what is inside a CNN (a black box)

- CNN visualization can be used for debugging models

 

Filter visualization: can be used to show the learned filters and the activation maps they produce for an image

- it is hard to visualize this way after the first convolution layer, since deeper filters have many channels

 

Two areas of focus: focusing on the data / focusing on the model

 

Nearest neighbors in feature space: can find clusters that are semantically similar, not just similar by pixel-wise comparison

- each image is located in a high-dimensional feature space

 

To reduce that high-dimensional space to an observable 2D space, a technique called t-SNE (t-distributed stochastic neighbor embedding) is used

 

Use a channel's activation in a layer to look for where the network is putting its attention

- crop the image around the max activation to make patches showing what the channel is focusing on
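The patch-cropping step above can be sketched in a few lines. This is a hypothetical numpy illustration, not the lecture's code: `crop_max_activation_patch`, the dummy image, and the activation map are invented for the example, and a real pipeline would first resize the channel's activation map to the image size.

```python
import numpy as np

def crop_max_activation_patch(image, activation, patch=4):
    """Crop a square patch of `image` centered on the peak of `activation`.

    `activation` is assumed to be one channel's activation map, already
    resized to the spatial size of `image`.
    """
    y, x = np.unravel_index(np.argmax(activation), activation.shape)
    half = patch // 2
    top = int(np.clip(y - half, 0, image.shape[0] - patch))
    left = int(np.clip(x - half, 0, image.shape[1] - patch))
    return image[top:top + patch, left:left + patch]

img = np.arange(100).reshape(10, 10)
act = np.zeros((10, 10))
act[7, 7] = 1.0  # pretend this channel fires strongest here
patch = crop_max_activation_patch(img, act, patch=4)
print(patch.shape)  # (4, 4)
```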

 

Class visualization: generating a synthetic image that maximally activates a target class

- use gradient ascent

- get the prediction score of a dummy image, backpropagate to the image to maximize the target class score, and update the image with the gradient; repeat until the image strongly triggers the class
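Stripped of the CNN, the update rule above is plain gradient ascent. A toy 1-D sketch, where a made-up quadratic `score` and its hand-derived gradient stand in for the class score and backpropagation:

```python
# Toy gradient ascent: treat `x` as a one-pixel "image" and
# score(x) = -(x - 3)^2 as the class score we want to maximize.
def score(x):
    return -(x - 3.0) ** 2

def grad(x):
    # d(score)/dx, derived by hand for this toy score
    return -2.0 * (x - 3.0)

x = 0.0                # start from a blank dummy image
lr = 0.1
for _ in range(100):
    x += lr * grad(x)  # ascend: move x along the gradient direction

print(round(x, 3))     # converges near 3, where the score is maximal
```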

 

Saliency by occlusion map: occlude part of the image, observe how the score changes with the location of the mask, and slide the mask across the image to get a heatmap showing which parts are important
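The sliding-mask procedure might look like this minimal numpy sketch; `occlusion_map` and the sum-based `score_fn` are invented stand-ins for a real classifier's class score:

```python
import numpy as np

def occlusion_map(image, score_fn, mask_size=2):
    """Slide a zero-mask over `image`; record the score drop at each spot.

    A larger drop means that region mattered more (higher saliency).
    """
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h - mask_size + 1, w - mask_size + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i:i + mask_size, j:j + mask_size] = 0.0
            heat[i, j] = base - score_fn(occluded)
    return heat

# Dummy "classifier": the score is just the sum of a bright region.
img = np.zeros((6, 6))
img[2:4, 2:4] = 1.0
heat = occlusion_map(img, score_fn=lambda x: x.sum(), mask_size=2)
print(np.unravel_index(np.argmax(heat), heat.shape))  # peak at (2, 2)
```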

 

Saliency by backpropagation: get the class score of the target image, backpropagate to the image, and visualize the gradient magnitude map

- deconvnet: when backpropagating through ReLU, apply ReLU to the gradient as well

- standard backprop: save the forward ReLU pattern and apply it backward

- guided backpropagation: combine the two methods above

 

CAM (class activation mapping): checks which part of the image contributes to the classification

- use global average pooling (GAP) instead of a fully connected layer

- can interpret why the network classified the input as that class; GAP enables localization without location supervision

- ResNet and GoogLeNet already have a GAP layer
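The CAM computation itself is just a weighted sum of the last conv layer's feature maps, using the FC weights of the target class. A hypothetical numpy sketch with random toy tensors (shapes and names are made up for the example):

```python
import numpy as np

def class_activation_map(features, fc_weights, cls):
    """CAM for class `cls`: weighted sum of the last conv feature maps.

    features:   (C, H, W) activations of the last conv layer
    fc_weights: (num_classes, C) weights of the FC layer after GAP
    """
    return np.tensordot(fc_weights[cls], features, axes=1)  # (H, W)

rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))   # toy feature maps
fc_w = rng.random((10, 8))         # toy classifier weights
cam = class_activation_map(features, fc_w, cls=3)
print(cam.shape)  # (7, 7): a coarse heatmap over the image
```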

 

Grad-CAM: use the unmodified model, backpropagate the class score to a convolutional layer, and apply global average pooling to the gradients to obtain the CAM weights

 

Guided Grad-CAM = Grad-CAM + guided backpropagation

 

GAN dissection: use interpretation not only for analysis but also for manipulating the generated images

 

Instance segmentation: semantic segmentation + distinguishing instances

 

Mask R-CNN: Faster R-CNN + mask branch

- uses RoIAlign instead of RoI pooling

- the mask branch predicts a binary mask for each class

 

YOLACT (You Only Look At CoefficienTs): one-stage instance segmentation

- uses Protonet to produce prototype masks, which are combined per instance into the final output

 

YolactEdge: extends YOLACT to video

 

Panoptic segmentation: stuff + instances of things

 

UPSNet: semantic and instance head → Panoptic head → panoptic logits

 

VPSNet: UPSNet for video

- fusion at pixel level

- track instances at the object level

 

 

Landmark localization: predicting the coordinates of key points

- Coordinate regression: inaccurate and biased

- Heatmap classification: better performance but computationally expensive

 

A landmark location can be converted to a Gaussian heatmap
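Rendering a landmark as a Gaussian heatmap can be sketched as below; `landmark_to_heatmap` and its `size`/`sigma` values are illustrative choices, not a specific paper's settings:

```python
import numpy as np

def landmark_to_heatmap(x0, y0, size=64, sigma=2.0):
    """Render a landmark (x0, y0) as a 2-D Gaussian heatmap target."""
    xs = np.arange(size)            # column coordinates
    ys = np.arange(size)[:, None]   # row coordinates (broadcast to 2-D)
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))

heat = landmark_to_heatmap(20, 30)
peak = np.unravel_index(np.argmax(heat), heat.shape)
print(peak)  # (30, 20): row = y, column = x
```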

 

Stacked hourglass modules allow repeated bottom-up and top-down inference, with each hourglass refining the output of the previous one

- similar to UNet, but the skip connections pass through a convolutional layer instead of being connected directly

 

 

UV map: a flattened representation of 3D geometry, invariant to motion

 

DensePose R-CNN: 3D landmark localization using Faster R-CNN and a 3D surface regression branch

 

RetinaFace: feature pyramid network + multi-task branches

 

Objects can also be detected from key points via landmark detection

 

CornerNet: use two corners (top-left, bottom-right) for the bounding box

 

CenterNet 1: add a center point to CornerNet

 

CenterNet 2: use the width, height, and center to find the bounding box
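The two keypoint parametrizations describe the same bounding box; a small illustrative sketch (the function names are made up):

```python
# Two ways key points can parametrize the same bounding box:
# CornerNet-style (two corners) vs. CenterNet-style (center + size).

def box_from_corners(top_left, bottom_right):
    (x1, y1), (x2, y2) = top_left, bottom_right
    return x1, y1, x2, y2

def box_from_center(center, width, height):
    cx, cy = center
    return cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2

a = box_from_corners((10, 20), (50, 60))
b = box_from_center((30, 40), 40, 40)
print(a == b)  # True: both describe the box (10, 20, 50, 60)
```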


10/19 (Wed)

Things learned:

 

Autograd: automatic gradient-calculation API

- the requires_grad argument makes a tensor store its gradient for backpropagation

- the retain_graph argument keeps intermediate buffers so gradients can be calculated multiple times

- hooks allow capturing the gradient while it is being calculated

- when writing a hook, do not modify the argument in place; return a new tensor instead
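To make the hook idea concrete without PyTorch, here is a toy scalar autograd (in the spirit of micrograd, heavily simplified, and not PyTorch's actual hook semantics): the hook receives each incoming gradient contribution and returns a new value rather than mutating it.

```python
class Value:
    """A one-operation toy autograd: tracks data, gradient, and a hook."""

    def __init__(self, data):
        self.data = data
        self.grad = 0.0
        self.hook = None          # optional: fn(grad) -> new grad
        self._backward_fn = None  # set when this Value is an op's output

    def __mul__(self, other):
        out = Value(self.data * other.data)
        def backward_fn(g):       # chain rule for z = a * b
            self._accumulate(g * other.data)
            other._accumulate(g * self.data)
        out._backward_fn = backward_fn
        return out

    def _accumulate(self, g):
        if self.hook is not None:
            g = self.hook(g)      # the hook returns a *new* gradient
        self.grad += g
        if self._backward_fn is not None:
            self._backward_fn(g)

    def backward(self):
        self._accumulate(1.0)

x = Value(3.0)
x.hook = lambda g: g * 2.0   # double each gradient contribution
y = x * x                    # y = x^2, so dy/dx = 2x = 6 without a hook
y.backward()
print(x.grad)                # 12.0: both contributions were doubled
```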

 

Conditional generative model: explicitly generates an image corresponding to a given condition

- can be used for image translation, super-resolution, etc.

 

Using regression produces a safe, average-looking image because of MAE/MSE losses, whereas a GAN loss implicitly checks whether it is seeing a fake or a real image
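The "safe average" effect of regression losses can be seen in one dimension: with two equally likely ground-truth modes, the MSE-optimal constant prediction is their mean, which resembles neither mode. A tiny illustrative scan:

```python
# Two plausible ground-truth outputs for the same input (two "modes").
modes = [0.0, 1.0]

def mse(pred):
    return sum((pred - m) ** 2 for m in modes) / len(modes)

# Scan candidate predictions; the minimizer lands at the mean (0.5).
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=mse)
print(best)  # 0.5: a blurry average, not a sharp sample from either mode
```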

 

Pix2Pix: translating an image into another style of image

- the GAN loss induces more realistic output, closer to the real distribution

- e.g., semantic map to photo, colorization

- needs paired data

 

CycleGAN: allows translation between domains with unpaired datasets

- loss: GAN loss (in both directions) + cycle-consistency loss

- the cycle-consistency loss is calculated by translating X to Y and back again and measuring the difference
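A toy sketch of the cycle-consistency idea with 1-D "domains" and made-up linear generators `G` and `F`:

```python
# G maps X -> Y, F maps Y -> X. If F inverts G, the cycle loss
# |F(G(x)) - x| is zero; otherwise it penalizes the mismatch.
G = lambda x: 2 * x + 1    # stand-in generator X -> Y
F = lambda y: (y - 1) / 2  # stand-in generator Y -> X (inverse of G here)

def cycle_loss(xs):
    return sum(abs(F(G(x)) - x) for x in xs) / len(xs)

print(cycle_loss([0.0, 1.0, 2.0]))  # 0.0: perfectly cycle-consistent

F_bad = lambda y: y / 2  # not the inverse: the cycle loss becomes nonzero
bad_loss = sum(abs(F_bad(G(x)) - x) for x in [0.0, 1.0, 2.0]) / 3
print(bad_loss)  # 0.5
```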

 

Perceptual loss: by utilizing a pretrained classifier, the loss can reflect perception much like a human's

- build a loss network from VGG to provide the loss for the image translation network

 


10/20 (Fri)

Multimodal: using multiple modalities of input to produce an output

 

Problems of multimodal learning:

- the shapes of the inputs differ across modalities

- there is an imbalance between the heterogeneous feature spaces

- models can become biased toward a specific modality

 

Text embedding: text is mapped to dense vectors

- learning dense representations allows generalization

 

Word2vec: skip-gram model

- learns to predict the n neighboring words, capturing the relationships between words
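Generating the skip-gram (center, context) training pairs can be sketched as follows; `skipgram_pairs` is an illustrative helper, not Word2vec's actual training code:

```python
# Skip-gram training pairs: for each center word, predict its neighbors
# within a window of n words on each side.
def skipgram_pairs(tokens, n=1):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - n), min(len(tokens), i + n + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, n=1)
print(pairs[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```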

 

Joint embedding: combine two modality-specific models, merging them in the last layer

- e.g., image tagging

 

Cross-modal translation: converting one modality into another

- image captioning: read the image with a CNN, attend to specific parts, and use that to generate text

- text-to-image: with a cGAN, both the generator network and the discriminator network take the text data as an additional input

 

Cross-modal reasoning: using multiple modalities to infer something

- e.g., visual question answering

 

Sound representation: use the Fourier transform to convert the waveform into a power spectrum, and stack the spectra along the time axis to make a spectrogram for learning
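The spectrogram construction (a windowed FFT per frame, stacked over time) can be sketched with numpy; the frame and hop sizes here are arbitrary illustrative values:

```python
import numpy as np

def spectrogram(signal, frame=64, hop=32):
    """Stack per-frame power spectra along time (a minimal STFT sketch)."""
    window = np.hanning(frame)
    frames = [
        signal[start:start + frame] * window
        for start in range(0, len(signal) - frame + 1, hop)
    ]
    # rfft per frame; magnitude squared gives the power spectrum
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (time, freq bins)

t = np.arange(1024)
sig = np.sin(2 * np.pi * t * 8 / 64)  # pure tone at FFT bin 8 for frame=64
spec = spectrogram(sig)
print(spec.shape)                 # (31, 33): time frames x frequency bins
print(spec.mean(axis=0).argmax()) # 8: energy concentrates in bin 8
```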

 

SoundNet: learn audio representation from synchronized RGB frames

- teacher-student manner: knowledge from the visual model is transferred to the sound model

 

Speech2Face: trained in a self-supervised manner for making features compatible

 

Image2Speech: image → CNN → attention → sub-word units → speech

 

Sound source localization: use the audio net and visual net with an attention net to visualize where the sound comes from

 

3D data is represented in many styles, such as mesh, volumetric, part assembly, and point cloud

 

3D object recognition: use 3D CNN

 

3D object detection: useful for autonomous driving

 

3D object segmentation: useful for neuroimaging

 

Transformer: captures long-term dependencies via attention

 

 

 

 

Other posts in the '잡다한 것들 > 부스트캠프 AI Tech 4기' category

CV 기초대회 final retrospective  2022.11.04
Week 6 study log - CV 기초대회  2022.10.24
Boostcamp Week 4 study log - Computer Vision Basics  2022.10.11
Boostcamp Week 3 study log - Deep Learning Basics  2022.10.03
Boostcamp Week 2 study log - Pytorch Basics  2022.09.26