Boostcamp Week 12 Learning Log - Data Creation

Tech stack used:

12/5 Mon

What I learned:

Performance = architecture + data + optimization

Software 1.0

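Define the problem
Decompose the big problem into a set of smaller problems
Design an algorithm for each individual problem
Combine the solutions into a single system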

Software 2.0

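The neural network architecture sets the range of the search
Optimization finds the set of operations that best fits the goal set by a human
The path and the destination are determined by the data and the optimization method

These days, systems are designed so that the AI finds the solution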

12/6 Tue

What I learned:

Production Process of AI Model:

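Finalize model requirements
- Processing time
- Target accuracy
- Target QPS
- Serving method
- Hardware specs
Prepare the dataset
- Types
- Quantity
- Labels (ground truth)
Train and debug the model
- Data-related feedback
- Meeting the requirements
Deployment and maintenance
- Performance monitoring
- Issue resolution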

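Data-centric: improve model performance by modifying only the data

After a service launches, improving performance by adding data is cheaper and easier

Model-centric: improve model performance with the data held fixed

Why data is hard to work on in academia:

It is hard to collect a lot of good data
Labeling is expensive
The work takes a long time

There must be enough cleanly labeled data to offset the labeling noise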

Best case: small data/clean labels/data balance

Rare kinds of data are not encountered often, so the labeling noise for them is larger

Labeling is an iterative process

OCR: Optical Character Recognition

STR: Scene Text Recognition

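Multi-object detection of text regions: it only decides whether a region is text or not, so no class information is needed

Differences from general object detection:

The regions are long
The density is high

OCR pipeline:

Detector (text region detection) →

Recognizer (image to text) →

Serializer (2D text to 1D text) →

Parser (understanding the text)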
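As a rough illustration of the Serializer step, here is a minimal sketch that turns 2D word boxes into 1D text. It assumes axis-aligned boxes given as (text, x, y, width, height) and a simple heuristic of my own (group words into lines by vertical center, then sort each line left to right). The function name `serialize` and the `line_tol` parameter are hypothetical, not from the lecture.

```python
def serialize(words, line_tol=0.5):
    """Join recognized words into reading order.

    words: list of (text, x, y, width, height) axis-aligned boxes.
    line_tol: a word joins the current line if its vertical center is
              within line_tol * (average line height) of the line center.
    """
    # Visit words from top to bottom by vertical center.
    ordered = sorted(words, key=lambda w: w[2] + w[4] / 2)
    lines = []
    for word in ordered:
        center_y = word[2] + word[4] / 2
        if lines:
            current = lines[-1]
            line_center = sum(w[2] + w[4] / 2 for w in current) / len(current)
            line_height = sum(w[4] for w in current) / len(current)
            if abs(center_y - line_center) <= line_tol * line_height:
                current.append(word)
                continue
        lines.append([word])
    # Inside each line, read left to right.
    return "\n".join(
        " ".join(w[0] for w in sorted(line, key=lambda w: w[1]))
        for line in lines
    )


words = [("world", 60, 10, 50, 20), ("Hello", 0, 12, 50, 20), ("OCR", 0, 50, 30, 20)]
text = serialize(words)
```

With the sample input, `text` is "Hello world" on the first line and "OCR" on the second. Real documents with columns or rotated text would need something more careful than this heuristic.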

OCR services:

Copy text from an image
Search images by word
Move playlists to a new platform using screenshots
Translate

Rectangle types:

RECT: (x1, y1, width, height)
RBOX: (x1, y1, width, height, θ)
QUAD: four (x, y) points, starting with (x1, y1) at the top left and proceeding clockwise
Polygon: multiple points that fit arbitrarily shaped text
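The relationship between these formats can be sketched as a conversion from RECT and RBOX to QUAD. The notes do not say what the rotation pivot is or what unit θ uses, so this sketch assumes θ is in radians and rotates around the box center; the function names are hypothetical.

```python
import math


def rect_to_quad(x1, y1, width, height):
    """RECT -> QUAD: corners from the top left, clockwise (image coords, y down)."""
    return [
        (x1, y1),
        (x1 + width, y1),
        (x1 + width, y1 + height),
        (x1, y1 + height),
    ]


def rbox_to_quad(x1, y1, width, height, theta):
    """RBOX -> QUAD. Assumption: theta is in radians, rotation is about the box center."""
    cx, cy = x1 + width / 2, y1 + height / 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    quad = []
    for px, py in rect_to_quad(x1, y1, width, height):
        dx, dy = px - cx, py - cy
        # Standard 2D rotation of the corner offset around the center.
        quad.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return quad


quad = rbox_to_quad(0, 0, 4, 2, math.pi / 2)
```

A QUAD can represent any RBOX, but not the other way around, which is why QUAD needs 8 numbers while RBOX needs 5.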

Regression-based: predicts the bbox directly from the image using anchor boxes

Downside: does not work well with arbitrarily shaped text, and sometimes fails to capture all the characters because of the limits of the receptive field and the anchor boxes

Segmentation-based: takes the image as input and predicts, for each pixel, whether it lies in a text area, along with 8 further probabilities for the borders that separate those areas

Downside: post-processing is slow, and neighboring text areas can interfere with each other

Hybrid: gets an approximate bbox by regression, then uses segmentation to get pixel-wise information

EAST: An Efficient and Accurate Scene Text Detector

The network outputs two pieces of information for each pixel:
- whether it is in the center of a text area (score map)
  - binary map
  - the positive region is the ground truth bounding box shrunk by 30%
- if the pixel is in a text area, where the bbox is (geometry map)
  - RBOX
    - 5 channels
  - QUAD
    - 8 channels
Uses a U-Net-style structure
Consists of a feature extractor stem and a feature-merging branch
Uses locality-aware NMS, merging boxes from top to bottom, since nearby pixels tend to predict the same text instance
Uses class-balanced cross-entropy for the score map loss
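To make the class-balanced cross-entropy concrete, here is a minimal pure-Python sketch over a flattened score map. It follows the usual form L = -β·y·log(p) - (1-β)·(1-y)·log(1-p), with β set to the fraction of negative pixels so that the rare text pixels are not drowned out. Averaging over pixels and the `eps` clamp are my own choices for the sketch, and the function name is hypothetical.

```python
import math


def class_balanced_bce(pred, target, eps=1e-7):
    """Class-balanced cross-entropy over a flattened score map.

    pred:   list of predicted text probabilities in [0, 1].
    target: list of 0/1 ground truth labels (1 = text).
    """
    n = len(target)
    # beta = fraction of negative pixels; positives get weight beta.
    beta = 1.0 - sum(target) / n
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -beta * y * math.log(p)
        total += -(1.0 - beta) * (1 - y) * math.log(1.0 - p)
    return total / n


# One text pixel out of four: the single positive weighs as much as
# the three negatives combined.
loss = class_balanced_bce([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0])
```

In a real training loop this would be written with tensor operations; the list version is only meant to show the weighting.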

Public dataset:

Labeled images are easy to obtain
The data might not be what is needed
Limited amount of data

Synthetic image:

No manual labeling needed
Fast to acquire
Need to check that the data resembles real-world data

Crawled image:

Images can be collected quickly
Few high-quality images
Few samples
Copyright issues

Crowd-sourced image:

expensive
high-quality

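12/7 Wed

What I learned:

In data creation, writing a detailed guideline is important.

Images can be crawled through Google search

Data creation steps:

Write the guide
Train annotators on the guide
Labeling
Labeling review
Data review by the AI team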

Evaluation methods for text detection models:

How to decide whether two regions match
- one-to-one match
- one-to-many match (split)
  - more predictions than ground truths
- many-to-one match (merge)
  - more ground truths than predictions
How to compute similarity from the matching matrix
- IoU
  - Only allows one-to-one matching
- Area recall, precision
DetEval: computes area recall and precision for each cell in the matching matrix

One-to-one = 1
Many-to-one = 1
One-to-many = 0.8
Recall = average over the ground truths
Precision = average over the predictions
Final score: F1-score
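The scoring step can be sketched as follows, assuming the matches have already been decided from the matching matrix (the thresholding that decides them is not shown). Each match is a hypothetical (gt_ids, pred_ids) pair; unmatched boxes keep a score of 0. Applying the same 0.8 to both the ground truth and the predictions of a split is my reading of the rule above.

```python
def deteval_f1(num_gt, num_pred, matches, split_score=0.8):
    """Score pre-computed matches in the DetEval style.

    matches: list of (gt_ids, pred_ids) tuples.
      one-to-one  -> ([g], [p])        scores 1
      one-to-many -> ([g], [p1, p2])   scores split_score (split)
      many-to-one -> ([g1, g2], [p])   scores 1 (merge)
    """
    gt_scores = [0.0] * num_gt
    pred_scores = [0.0] * num_pred
    for gt_ids, pred_ids in matches:
        is_split = len(gt_ids) == 1 and len(pred_ids) > 1
        score = split_score if is_split else 1.0
        for g in gt_ids:
            gt_scores[g] = score
        for p in pred_ids:
            pred_scores[p] = score
    recall = sum(gt_scores) / num_gt if num_gt else 0.0
    precision = sum(pred_scores) / num_pred if num_pred else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return recall, precision, f1


# 4 ground truths, 4 predictions: one exact match, one split, one merge.
matches = [([0], [0]), ([1], [1, 2]), ([2, 3], [3])]
recall, precision, f1 = deteval_f1(4, 4, matches)
```

Here the recall is 0.95 (scores 1, 0.8, 1, 1 over four ground truths) and the precision is 0.9 (scores 1, 0.8, 0.8, 1 over four predictions).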

TIoU: adjusts the score for the area where the prediction exceeds or falls short of the ground truth

CLEval: scores by how many characters the bbox gets right

Annotation tool:

LabelMe
- based on the open-source tool from MIT
- easy to install
- cannot collaborate
CVAT
- made by Intel
- multi-user
- various annotation types
- the model inference is slow
Hasty Labeling tool
- multi-user
- not free
- cannot customize

DBNet: uses adaptive thresholding, applying a higher threshold near the border
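A minimal sketch of the idea, using the differentiable binarization form B = 1 / (1 + exp(-k(P - T))) where P is the probability map, T is the per-pixel threshold map, and k is an amplifying factor (50 here). The example values are made up for illustration, and the function name is hypothetical.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary map from per-pixel probability and threshold lists."""
    return [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prob, thresh)]


# Same probability (0.5) at two pixels, but the second one sits near a
# text border where the threshold is higher, so it is pushed toward 0.
prob = [0.5, 0.5]
thresh = [0.3, 0.7]
binary = differentiable_binarization(prob, thresh)
```

Because the step function is replaced by a steep sigmoid, the threshold map can be learned together with the probability map during training.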

MOST: an improved version of EAST

TFAM: uses deformable convolution to adjust the receptive field
PA-NMS (position-aware NMS): gives more weight to predictions made near the edge
Instance-wise IoU loss: IoU with normalization, which makes it scale-invariant

TextFuseNet: combines global-level features with character-level features

12/9 Thu

What I learned:

Synthetic data is cheap to create and has the correct label.

- depth estimation is used to place the synthetic text in a plausible location

SynthText3D: use 3D virtual world to make a synthetic image

It is helpful to use synthetic data as a pretraining dataset

Use data augmentation: geometric transformation + style transformation ...

- crop so that no bbox is cut, which helps the model learn better
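One simple way to get such a crop is rejection sampling: draw random windows until every box is either fully inside or fully outside. This is a sketch under my own assumptions (boxes as (x1, y1, x2, y2), a fixed crop size, and hypothetical function names), not the method from the lecture.

```python
import random


def box_is_cut(crop, box):
    """True if the crop window slices through the box."""
    cx1, cy1, cx2, cy2 = crop
    bx1, by1, bx2, by2 = box
    inside = cx1 <= bx1 and cy1 <= by1 and bx2 <= cx2 and by2 <= cy2
    outside = bx2 <= cx1 or bx1 >= cx2 or by2 <= cy1 or by1 >= cy2
    return not (inside or outside)


def sample_safe_crop(img_w, img_h, boxes, crop_w, crop_h, max_tries=100, rng=random):
    """Return (crop, kept_boxes) with kept boxes in crop coordinates, or None."""
    for _ in range(max_tries):
        x1 = rng.randint(0, img_w - crop_w)
        y1 = rng.randint(0, img_h - crop_h)
        crop = (x1, y1, x1 + crop_w, y1 + crop_h)
        if any(box_is_cut(crop, b) for b in boxes):
            continue  # this window cuts a box, try another one
        kept = [
            (b[0] - x1, b[1] - y1, b[2] - x1, b[3] - y1)
            for b in boxes
            if x1 <= b[0] and y1 <= b[1] and b[2] <= crop[2] and b[3] <= crop[3]
        ]
        return crop, kept
    return None  # no valid window found; the caller can fall back to the full image


boxes = [(10, 10, 30, 20), (60, 60, 90, 80)]
result = sample_safe_crop(100, 100, boxes, 50, 50, rng=random.Random(0))
```

For dense text images the rejection rate can be high, so `max_tries` and the fallback matter in practice.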

Multi-scale training can be helpful
