LimePencil's Log

Miscellaneous / Boostcamp AI Tech (4th cohort)

Boostcamp Week 9 Study Log - Object Detection 1

2022. 11. 14. 13:53

Tech stacks used:

11/14 Mon

What I learned:

PR curve: precision and recall computed from cumulative TP and FP counts, with detections sorted by confidence score

 

Average precision (AP): the area under the PR curve, estimated with right rectangles

 

mAP (mean average precision): $\text{mAP} = \frac{1}{N}\sum_{c=1}^{N} AP_c$, the sum of per-class APs divided by the number of classes $N$

- mAP50 means that only detections with IoU over 0.5 are regarded as True Positives

 

IoU (Intersection over Union): $\frac{\text{overlapping region}}{\text{combined region}}$

 

FPS (frames per second) is an important measure for object detection on live video

 

FLOPs (floating-point operations): the number of floating-point operations performed, a measure of computational cost

 

MMDetection: an open-source object detection toolbox written in PyTorch

 

Detectron2: Meta AI Research's library for object detection and segmentation

 

YOLOv5: a well-maintained detector that ships COCO-pretrained models

 

EfficientDet: an object detection model from Google built on the EfficientNet backbone

 

R-CNN:

  1. Extract Region proposals
    1. Sliding Window: slide a fixed-size box across the image to generate candidate bounding boxes
    2. Selective search: start from an initial over-segmentation and merge similar regions into larger bounding boxes
      1. ~2000 RoIs
  2. Compute CNN features
    1. AlexNet
  3. Classify
  4. Adjust bounding box

R-CNN is not end-to-end

 

SPP-Net:

  1. Forward the whole image through ConvNet
  2. Extract ROI
  3. Spatial Pyramid Pooling Layer
    1. produce a fixed-size representation for each ROI by pooling instead of warping (see the sketch after this list)
  4. FC layer
  5. classify regions with SVM

 

Fast R-CNN:

  1. Forward the whole image through VGG16
  2. ROI projection to get ROI
    1. project the selective-search ROIs onto the VGG16 feature map
    2. one batch contains only the ROIs of a single image
  3. ROI pooling to get features with the same size
    1. a single pyramid level with a 7x7 grid
  4. FC layer
  5. Softmax classifier + bounding box regressor

R-CNN, SPP-Net, and Fast R-CNN are not end-to-end: region proposals still come from selective search

 

Faster R-CNN:

  1. Forward images through a network to get feature maps
  2. Use Region Proposal Network(RPN) to get ROI
    1. Replace selective search
    2. Anchor box
      1. divide the image into cells, each with anchor boxes of different sizes and aspect ratios
      2. RPN predicts whether an object is present in each cell and the transformation needed for the anchor boxes
      3. a 3x3 convolution produces a 512-channel intermediate feature
      4. a 1x1 convolution with 2 channels per anchor for binary classification of object existence
      5. a 1x1 convolution with 4 channels per anchor for bounding box regression
    3. NMS: remove redundant bounding boxes based on IoU and class score (see the sketch below)

11/15 Tue

What I learned:

mmdetection: fast and supports many frameworks

- an open-source library based on PyTorch

- Pipeline: Input, backbone, neck, dense prediction, prediction

- configured through config files

- inherit a base config and override only the parts that change (a sketch follows the config structure list below)

 

Config 기본 구조:

  • dataset: COCO, VOC, Cityscapes
  • model: faster_rcnn, RetinaNet, RPN
    • 2-stage models
      • type: type of model
      • backbone: a network that converts an image into feature maps
        • a custom backbone can be added
      • neck: connects backbone and head
      • rpn_head: region proposal network
      • RoI_head: region-of-interest head
  • schedule
  • default_runtime

 

Detectron2: supports algorithms other than object detection as well

- Pipeline: Setup config, setup trainer, start training

- the training workflow is similar to mmdetection

 

Neck: connects the Backbone and the RPN

- also using the backbone's intermediate features makes it possible to detect objects of various sizes better

- low-level features are semantically weak, so they need to exchange information with the semantically stronger high-level features

- features are mixed at each level

 

featurized image pyramid: resize the image to various scales and compute features from each

 

single feature map: pass the image through the network once and use only the final output as the feature

 

pyramidal feature hierarchy: pass through like a single feature map, but also use the features of the intermediate layers

 

feature pyramid network: give information from the high level to the low level by creating a top-down pathway

  • bottom-up and top-down features are added together: a 1x1 convolution forms the lateral connection, and 2x upsampling carries the top-down pathway

 

Path Aggregation Network (PANet): adds a bottom-up pathway after the top-down pathway so low-level features travel a shorter path through a deep CNN

  • RoI pooling is done over all the feature levels

 

DetectoRS: looking and thinking twice

  • recursive feature pyramid: FPN that is done recursively
  • ASPP (atrous spatial pyramid pooling): apply different dilation rates to enlarge the receptive field of the convolution

EfficientDet (BiFPN neck): a PANet variant that removes nodes that contribute little to feature fusion

  • weighted feature fusion: learn weights for the input features to balance low-level and high-level contributions
  • the lateral input features are also connected into the bottom-up pathway

NASFPN: finds the FPN architecture through neural architecture search

  • Not generalizable

AugFPN: addresses the loss of information in the highest-level feature map

  • Residual Feature Augmentation: give semantic information of a high level directly to the final pyramid
    • ratio-invariant adaptive pooling
  • Soft RoI selection: use all features to get RoI by using weights

1-stage detectors: localization and classification at the same time

- fast and easy design

- the whole image is taken into account as context

- YOLO, SSD, RetinaNet

 

You Only Look Once(YOLO): first 1-stage detector

- modified GoogLeNet

  1. divide the image into a grid (7x7 in YOLO v1)
  2. get B bounding boxes and a confidence score for each grid cell
    1. confidence score: Pr(Object) * IoU between ground truth and prediction
  3. get the class probability for each grid cell
    1. conditional class probability: Pr(Class|Object)
  4. The output contains 30 channels per cell
    1. 5 channels each for the 2 bboxes
      1. x coordinate of the bbox center (relative to the grid cell)
      2. y coordinate of the bbox center (relative to the grid cell)
      3. width of the bbox
      4. height of the bbox
      5. bbox confidence score
    2. 20 class probability maps
  5. multiply the bbox confidence score by the class maps to get the class-specific confidence of each bbox (sketched after this list)
  6. make the probability zero if under a certain threshold and sort in descending order
  7. use NMS to remove redundant bbox

SSD: addresses YOLO's weakness at detecting small objects and its use of only the last feature layer

- uses 6 feature maps of different scales: a big feature map predicts small objects while a small feature map predicts large objects

- uses only convolutional layers (no FC layers)

- use anchor box

- VGG-16 as the backbone

 

YOLO v2:

- higher resolution

- convolution with anchor boxes

- no FC layer

- batch normalization

- add early feature map to late feature map

- multi-scale training

- Darknet-19

- used WordTree, which combines ImageNet and COCO into a hierarchical dataset

 

YOLO v3:

- Darknet-53

- stride-2 convolutions instead of pooling

- use 3 different scales

- use Feature Pyramid Network

RetinaNet:

- addresses the class imbalance of 1-stage detectors: far too many easy negative samples

- uses a new loss function (Focal Loss): cross-entropy with a scaling factor that puts more weight on harder cases (a sketch follows below)

- Improvement in performance

 

 

 


11/16 Wed

What I learned:

Width scaling: used in small models; widening captures fine-grained details well

 

Depth scaling: used in many models to capture complex, rich features, but it suffers from vanishing gradients

 

Resolution scaling: higher input resolution captures fine details very well

 

EfficientDet: efficiently scales the model

- match the width, depth, and resolution balance to achieve great performance with low computational cost

- idea from EfficientNet

- efficiency is needed for real-time

  1. Efficient multi-scale feature fusion
    1. remove nodes that have only one input edge
    2. add an edge from the original input to the output at the same level
    3. use repeated blocks
    4. use a weighted sum of the various resolutions
      1. BiFPN: the weights pass through ReLU so they stay non-negative, and an epsilon keeps the denominator non-zero; basically a normalized weighted sum (see the sketch after this list)
  2. model scaling: compound scaling like EfficientNet

 

Cascade RCNN: explored what changes when the IoU threshold separating positive and negative samples is changed

- the higher the IoU of the input proposals, the better a model trained with a higher threshold performs

- a model trained with a higher threshold performs better when the AP IoU threshold used for evaluation is higher

- train multiple RoI heads with a different IoU threshold for each head; the bounding boxes from the previous head are fed to the next head

- Iterative + Integral = Cascade

 

DCN(Deformable Convolutional Networks):

  • a normal CNN is weak against geometric transformations
    • traditional remedies: geometric augmentation, geometrically invariant feature selection
  • when the convolution kernel is applied, a learned geometric offset is added in the middle of the convolutional operation
    • an offset field holds an offset vector for each kernel position
    • the model learns the offsets from the features
  • good performance in object detection and segmentation

The problem with ViT is that it has a high computational cost and needs a lot of data to train

 

DETR (End-to-End Object Detection with Transformers):

  • replaces the need for NMS
  • uses only a high-level feature map because transformer attention has a high computational cost
  • Pipeline
    • Input
    • CNN
    • encoder + positional encoding
    • decoder
    • Feed forward Network
    • N output
      • N > the number of objects in the image
      • the ground truth is padded with 'no object' entries to make up the difference between N and the number of objects
        • this lets the model output exactly the right number of objects

 

Swin Transformer: uses window-based attention to reduce the computational cost

  • No class embedding
  • Two attention layers per transformer block
  • the embeddings are partitioned into windows and attention is computed within each window, which lowers the model's computational cost
    • attention inside a window cannot see the rest of the image, so Shifted Window Multi-Head Attention corrects this by shifting the window partition between blocks
  • trains well with a small amount of data

 

 

 
