Paper Study

Fast R-CNN Summary (2)

by Candy Lee 2021. 1. 3.

This paper was written by Ross Girshick (Microsoft Research).




A record of my notes and summary of the paper's main text (this is a personal study post).




[2. Fast R-CNN architecture and training]

Fast R-CNN architecture


Fast R-CNN Procedure)


1. Input image -> conv & max pooling layers -> create a conv feature map


2. For each object proposal (ROI) on the feature map from step 1 -> ROI pooling layer -> extract a fixed-length feature vector


3. Feature vector -> fully connected (FC) layers -> branch into 2 sibling output layers


4.  First output : ROI classification (softmax probabilities over the K object classes plus a catch-all "background" class)


    Second output : bounding-box regression

    (4 real-valued numbers for each of the K object classes)

    ==> Refine the position of each bounding box
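To see what the two sibling outputs in step 4 look like for a single ROI, here is a minimal plain-Python sketch. The weights are random toy values, the helper names (`softmax`, `linear`, `detection_heads`) are hypothetical, and putting "background" at index 0 is an assumed convention; it only illustrates the output shapes (K + 1 probabilities and 4 numbers per object class), not the paper's actual implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Fully connected layer: `weights` holds one row of input weights per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def detection_heads(feature, cls_w, cls_b, box_w, box_b, num_classes):
    """Apply the two sibling output layers to one ROI feature vector.

    Returns (probs, boxes): probs has K + 1 entries (index 0 = background, an
    assumed convention), and boxes maps each object class k = 1..K to its
    4 regression outputs.
    """
    probs = softmax(linear(feature, cls_w, cls_b))  # K + 1 class probabilities
    offsets = linear(feature, box_w, box_b)         # 4 * K real-valued outputs
    boxes = {k: offsets[4 * (k - 1):4 * k] for k in range(1, num_classes + 1)}
    return probs, boxes


# Toy usage with random weights (K = 3 classes, 8-dim feature vector).
random.seed(0)
K, D = 3, 8
feature = [random.random() for _ in range(D)]
cls_w = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(K + 1)]
cls_b = [0.0] * (K + 1)
box_w = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(4 * K)]
box_b = [0.0] * (4 * K)
probs, boxes = detection_heads(feature, cls_w, cls_b, box_w, box_b, K)
print(len(probs), len(boxes))  # 4 3
```

In the real network both heads sit on top of the shared FC layers, so one forward pass per ROI produces the class scores and the box offsets together.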





[2.1 The ROI pooling layer]

Main Concept)

=> This layer uses max pooling to convert the features inside any valid ROI into

    a small feature map with a fixed spatial extent of H × W

    (H & W are layer hyper-parameters, independent of any particular ROI)


Structure of ROI)

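=> Each ROI is a rectangular window on the conv feature map, defined by a four-tuple (r, c, h, w)

(r, c) : coordinates of the top-left corner

(h, w) : height and width of the window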



ROI pooling Operation)


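h & w : height and width of the ROI window


H & W : number of rows and columns of the fixed output grid (one output cell per sub-window)


1. Divide the h × w ROI window into an H × W grid of sub-windows, each of approximate size h/H × w/W


2. Max-pool the values in each sub-window into the corresponding output grid cell (applied independently to each feature-map channel, as in standard max pooling)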
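The two steps above can be sketched for a single feature-map channel in plain Python. This is a minimal illustration under my own assumptions: the feature map is a list of rows, the ROI (r, c, h, w) is already in integer feature-map cells, and sub-window boundaries are rounded with floor at the start and ceil at the end (one common way to get the "approximate" h/H × w/W sizes, so neighbouring bins may overlap by one cell when h or w is not divisible by H or W). The function name `roi_max_pool` is hypothetical.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the ROI (r, c, h, w) of one 2-D channel into an out_h x out_w grid.

    feature_map: list of rows; roi: top-left (r, c) and size (h, w) in cells.
    The real layer repeats this for every channel independently.
    """
    r, c, h, w = roi
    bin_h = h / out_h  # approximate sub-window height h/H
    bin_w = w / out_w  # approximate sub-window width  w/W
    out = []
    for i in range(out_h):
        # Floor the start and ceil the end so every bin is non-empty and the bins cover the ROI.
        y0 = r + math.floor(i * bin_h)
        y1 = r + math.ceil((i + 1) * bin_h)
        row = []
        for j in range(out_w):
            x0 = c + math.floor(j * bin_w)
            x1 = c + math.ceil((j + 1) * bin_w)
            row.append(max(feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1)))
        out.append(row)
    return out


# 4 x 4 toy feature map whose value at (y, x) is 4*y + x.
fm = [[4 * y + x for x in range(4)] for y in range(4)]
print(roi_max_pool(fm, (0, 0, 4, 4), 2, 2))  # [[5, 7], [13, 15]]
print(roi_max_pool(fm, (1, 0, 3, 4), 2, 2))  # [[9, 11], [13, 15]]
```

Whatever the ROI size, the output is always H × W, which is exactly what lets ROIs of different shapes feed the same FC layers.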



ROI pooling operation picture



Figure source: www.researchgate.net/figure/Illustration-of-the-RoI-pooling-operation_fig4_333521857


Figure 6. Illustration of the RoI pooling operation.







[2.2 Initializing from pre-trained networks]

The experiments use three pre-trained ImageNet networks.


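Three transformations are applied when a pre-trained network initializes a Fast R-CNN network)


1. The last max pooling layer -> replaced by an ROI pooling layer

   (H & W are set to be compatible with the network's first FC layer, e.g. H = W = 7 for VGG16)


2. The last FC layer & softmax -> replaced by two sibling layers

   (FC layer + softmax over K + 1 categories, and category-specific bounding-box regressors)


3. Network inputs -> modified to take 2 data inputs

                                        (a list of images + a list of ROIs in those images)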
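As a quick check of transformations 1 and 2, here is the size arithmetic for VGG16 on PASCAL VOC. The facts used are that VGG16's last conv layer has 512 channels, the paper sets H = W = 7 for VGG16, and PASCAL VOC has K = 20 object classes; the helper `head_shapes` is a hypothetical name for this illustration.

```python
def head_shapes(channels, pooled_h, pooled_w, num_classes):
    """Sizes involved when a pre-trained classifier is converted to Fast R-CNN."""
    return {
        # Flattened ROI pooling output; must match the input size of the first FC layer.
        "roi_feature_len": channels * pooled_h * pooled_w,
        # Softmax over K object classes + 1 background class.
        "cls_outputs": num_classes + 1,
        # 4 box-regression outputs per object class.
        "bbox_outputs": 4 * num_classes,
    }


shapes = head_shapes(channels=512, pooled_h=7, pooled_w=7, num_classes=20)
print(shapes)  # {'roi_feature_len': 25088, 'cls_outputs': 21, 'bbox_outputs': 80}
```

Because 512 × 7 × 7 = 25088 is the same input size VGG16's first FC layer already expects, the pre-trained FC weights can be reused unchanged.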





[2.3 Fine-tuning for detection]

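SPPnet is unable to update the weights below its spatial pyramid pooling layer... Then why??


Root cause : back-propagation through the SPP layer is highly inefficient when each training sample (ROI) comes from a different image, which is how R-CNN and SPPnet are trained!



Why inefficient? Because the training inputs are large!

-> Each ROI can have a very large receptive field, often spanning the entire input image, and the forward pass must process that whole receptive field.


But Fast R-CNN training is different!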




Fast R-CNN training strategy)

SGD mini-batches are sampled hierarchically: first sample N images, then sample R/N ROIs from each image.

So multiple ROIs come from a single image, and they share computation and memory in the forward and backward passes.


Therefore, N should be kept small!

(Making N small decreases mini-batch computation.)


For example, with 2 images (N=2) and 128 ROIs (R=128),

training is roughly 64x faster than sampling one ROI from each of 128 different images (N=128), which is the R-CNN and SPPnet strategy.
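The hierarchical sampling and the 64x figure can be sketched as follows. This is a minimal illustration, assuming a hypothetical pool of 150 images with 100 candidate ROI ids each and a hypothetical helper `sample_minibatch`; it counts how many images the expensive conv layers must process per mini-batch, which is where the speedup comes from (the per-ROI FC cost is ignored, so 64x is the conv-work ratio, matching the paper's "roughly").

```python
import random


def sample_minibatch(candidates, num_images, rois_total, rng):
    """Hierarchical sampling: pick N images, then R/N ROIs from each of them.

    candidates: dict image_id -> list of candidate ROIs (a hypothetical format).
    Returns a list of (image_id, roi) pairs, R in total when N divides R.
    """
    per_image = rois_total // num_images                 # R / N
    images = rng.sample(sorted(candidates), num_images)  # N distinct images
    return [(img, roi) for img in images for roi in rng.sample(candidates[img], per_image)]


rng = random.Random(0)
# Hypothetical pool: 150 images, each with 100 candidate ROI ids.
candidates = {i: list(range(100)) for i in range(150)}

fast = sample_minibatch(candidates, num_images=2, rois_total=128, rng=rng)    # N=2,   R=128
slow = sample_minibatch(candidates, num_images=128, rois_total=128, rng=rng)  # N=128, R=128

# The conv layers run once per distinct image; all ROIs from that image share the result.
conv_fast = len({img for img, _ in fast})
conv_slow = len({img for img, _ in slow})
print(len(fast), len(slow), conv_slow / conv_fast)  # 128 128 64.0
```

Both mini-batches contain 128 ROIs, but the first needs only 2 conv forward/backward passes instead of 128.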



Image source : https://m.blog.naver.com/PostView.nhn?blogId=laonple&logNo=220776743537&proxyReferer=https:%2F%2Fwww.google.com%2F



Final summary)

Fast R-CNN uses a streamlined training process with one fine-tuning stage that jointly optimizes

a softmax classifier and bounding-box regressors, rather than training a softmax classifier, SVMs, and regressors in three separate stages.
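To make "jointly optimizes" concrete: the paper trains both sibling layers with a single multi-task loss L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v), where L_cls is the log loss of the true class u and L_loc is a smooth L1 loss over the four box offsets; background ROIs (u = 0) contribute no localization term. The sketch below computes this loss for one ROI in plain Python. The function names and the index-0-for-background convention are my own choices, and the default λ = 1 follows the paper's experiments.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear (less sensitive to outliers) beyond |x| = 1."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def multitask_loss(probs, u, t_u, v, lam=1.0):
    """L = L_cls + lam * [u >= 1] * L_loc for a single ROI.

    probs: softmax output over K + 1 classes (index 0 = background, assumed)
    u: ground-truth class; t_u: predicted 4 offsets for class u; v: 4 regression targets.
    """
    l_cls = -math.log(probs[u])  # log loss for the true class
    if u == 0:                   # background ROIs have no box target
        return l_cls
    l_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v))
    return l_cls + lam * l_loc


loss = multitask_loss([0.1, 0.7, 0.2], u=1, t_u=[0.5, 0.0, 2.0, -0.2], v=[0.0, 0.0, 0.0, 0.0])
print(round(loss, 4))  # 2.0017
```

Since one scalar loss covers both heads, a single back-propagation pass updates the classifier, the box regressors, and the shared layers below them at once.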




I plan to cover the rest of the paper in follow-up posts.


Thank you for reading today as well.


