prepro_images.py: preprocesses the images in the COCO dataset (val2014.zip and train2014.zip). The data are in /homework/datasets.
The weights of the pre-trained ResNet101 are in ./data/imagenet_weights.
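During training, the input x(t) is taken from the previous ground-truth token t(t-1). During testing, no ground truth is available, so x(t) is taken from the previous step, x(t-1), instead.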
The V bar in the top-down model is obtained by average pooling each feature map over its spatial positions (7x7x2048 → 1x2048).
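The last two notes can be sketched in a few lines of plain Python. This is only an illustration, not the repository's code: the function names mean_pool and next_input are hypothetical, the toy feature map has 4 channels instead of 2048, and the testing branch assumes that "x(t-1)" means whatever the previous step produced.

```python
# Illustrative sketch only; all names here are hypothetical.

def mean_pool(feature_map):
    """Average an H x W x C feature map over its spatial positions.

    Returns a list of C values (the 7x7x2048 -> 1x2048 step when H = W = 7).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                totals[c] += value
    return [total / (height * width) for total in totals]


def next_input(step, ground_truth, previous_output, training):
    """Pick what is fed to the decoder at `step` (step >= 1).

    Training: the previous ground-truth token.
    Testing (assumption): whatever the previous step produced.
    """
    if training:
        return ground_truth[step - 1]
    return previous_output


# Toy 7x7 map with 4 channels; every position holds [0.0, 1.0, 2.0, 3.0].
toy_map = [[[float(c) for c in range(4)] for _ in range(7)] for _ in range(7)]
v_bar = mean_pool(toy_map)  # one value per channel
```

In a real model the pooling would normally be a single tensor mean over the two spatial axes; the explicit loops are used here only to keep the sketch free of dependencies.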