The Group Normalization paper says that BatchNorm computes its statistics per channel, pooling over the batch and spatial dimensions. In this implementation, however, the forward pass of BN reduces only over the batch axis, ignoring the channel dimension:
self.x = inpt.copy()
self.mean = self.x.mean(axis=0) # shape = (w, h, c)
self.var = 1. / np.sqrt((self.x.var(axis=0)) + self.epsil) # shape = (w, h, c)
I think it should instead pool the statistics over the batch and spatial axes, e.g. (assuming NCHW layout):
mean = np.mean(X, axis=(0, 2, 3), keepdims=True)
variance = np.var(X, axis=(0, 2, 3), keepdims=True)
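For reference, here is a minimal sketch of a per-channel BatchNorm forward pass built on the suggested axes, assuming NCHW layout; the function name and signature are illustrative, not the repository's actual API:

```python
import numpy as np

def batchnorm_forward(X, gamma, beta, epsil=1e-5):
    # Illustrative per-channel BN, assuming X has NCHW layout.
    # Statistics are pooled over batch and spatial axes -> one value per channel.
    mean = np.mean(X, axis=(0, 2, 3), keepdims=True)     # shape (1, C, 1, 1)
    variance = np.var(X, axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    x_hat = (X - mean) / np.sqrt(variance + epsil)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=3.0, size=(8, 3, 4, 4))
gamma = np.ones((1, 3, 1, 1))
beta = np.zeros((1, 3, 1, 1))
out = batchnorm_forward(X, gamma, beta)
# With gamma=1, beta=0, each output channel should be ~zero-mean, ~unit-variance
print(np.allclose(out.mean(axis=(0, 2, 3)), 0.0, atol=1e-7))
```

With `keepdims=True` the statistics broadcast back against the input, so no reshaping is needed when normalizing.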