Use total damage plus a penalty for false alarms as the evaluation metric, and frame the problem as anomaly detection so the model can learn which frauds cause bigger damage
- Inspired by 2019 Edelman Finalist Microsoft : Prospective Dynamic Fraud Control for Optimal Profitability in e-Commerce
- The video discusses the importance of model management and of reducing the total damage from fraud
- Previous work : the credit fraud detection model was evaluated only on AUROC, not on the total damage amount from fraud
- Because only AUROC was considered, frauds that caused bigger damage carried no bigger penalty
- Because the model was trained as a novelty detection problem (the model only learns normal samples while being required to reject novelties in the test set), it had no chance to learn which frauds cause bigger damage
Reduce total damage while keeping AUROC
- Set the problem as anomaly detection, so that the model can learn which frauds have higher potential damage
- Metric for model evaluation
- Total damage from frauds that fail to be detected + penalty for false detections
- Loss function
- Minimize reconstruction error and KL divergence for normal samples
- (Model 2) Maximize reconstruction error for fraud samples
- (Model 3) Maximize reconstruction error × fraud amount for fraud samples
- Penalty for false detection
- Calculated as minimum wage per hour / (60 mins / minutes needed to make a verification call)
- Penalty for false detection in DE : €0.7658 per call
- Minimum wage in DE
- €9.19 per hour
- Penalty for false detection in UK : €0.7108 per call
- Minimum wage in UK
- 25 and over : £8.21
- 21 to 24 : £7.70
- 18 to 20 : £6.15
- Pound-to-euro rate (2019-10-19)
- £1 = €1.16
- Time required to make a call and check whether the transaction was made by the card owner : 5 minutes (arbitrarily set)
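The penalty arithmetic and the total-cost metric above can be sketched as follows. Function names are illustrative, not from the original project; note that the UK figure of €0.7108 matches the average of the three wage bands converted at £1 = €1.16, which is an inference from the numbers rather than something stated in the notes.

```python
CALL_MINUTES = 5  # arbitrarily set time for one verification call

def penalty_per_call(hourly_wage_eur, call_minutes=CALL_MINUTES):
    """Minimum wage per hour / (60 mins / minutes per verification call)."""
    return hourly_wage_eur / (60 / call_minutes)

# DE: minimum wage €9.19 per hour -> €0.7658 per call
penalty_de = penalty_per_call(9.19)

# UK: average of the three wage bands (£8.21, £7.70, £6.15),
# converted at £1 = €1.16 -- an inference from the figures.
uk_avg_wage_eur = (8.21 + 7.70 + 6.15) / 3 * 1.16
penalty_uk = penalty_per_call(uk_avg_wage_eur)

def total_cost(missed_fraud_amounts, n_false_alarms, penalty):
    """Evaluation metric: damage from undetected frauds + false-alarm penalty."""
    return sum(missed_fraud_amounts) + n_false_alarms * penalty

print(round(penalty_de, 4), round(penalty_uk, 4))  # 0.7658 0.7108
```

With this metric, a missed €100 fraud costs far more than one unnecessary verification call, which is exactly the asymmetry AUROC ignores.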
Fully Connected Variational Autoencoder
- Simple structure with 30 -> 10 -> 2 -> 10 -> 30
- In this project, 3 methods are tested with the same structure above
- Model 1 : Novelty Detection
- Trained by normal samples only
- Loss function
- Minimize reconstruction error for normal samples
- Minimize KL Divergence for normal sample
- Model 2 : Anomaly Detection
- Trained by normal samples + fraud samples
- Loss function
- Minimize reconstruction error for normal sample
- Minimize KL Divergence for normal sample
- Maximize reconstruction error for fraud sample
- Model 3 : Anomaly Detection with Fraud Amount Weighted Loss
- Trained by normal sample + fraud sample
- Loss function
- Minimize reconstruction error for normal sample
- Minimize KL Divergence for normal sample
- Maximize reconstruction error × fraud amount for fraud samples (assign bigger weight for bigger damage)
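The three loss variants above differ only in the fraud term. A minimal NumPy sketch over per-sample quantities, assuming the encoder/decoder has already produced reconstruction errors and KL divergences (the weight `w` on the fraud term is an added knob, not named in the original notes):

```python
import numpy as np

def vae_loss(recon_err, kld, is_fraud, amount, model=1, w=1.0):
    """Training loss for the three models described above.

    model 1: novelty detection   -- normal samples only
    model 2: anomaly detection   -- also maximize recon error on fraud
    model 3: amount-weighted     -- fraud term scaled by the fraud amount
    """
    normal = ~is_fraud
    # Minimize reconstruction error + KL divergence on normal samples.
    loss = recon_err[normal].sum() + kld[normal].sum()
    if model == 2:
        # Maximizing is expressed as subtracting the fraud term.
        loss -= w * recon_err[is_fraud].sum()
    elif model == 3:
        # Bigger fraud amount -> bigger reward for reconstructing it badly.
        loss -= w * (amount[is_fraud] * recon_err[is_fraud]).sum()
    return loss
```

The 30 → 10 → 2 → 10 → 30 network itself is omitted here; only the objective is sketched.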
Training the proposed method is unstable; only for some random seeds did it show better performance
- Each method was tested for 5 random seeds
- Used data from Kaggle : Credit Card Fraud Detection
- Data was split as below
- [Plot: training loss vs. epochs, log scale; random seed 4]
- Training process is stable
- [Plot: training loss vs. epochs, log scale; random seed 3]
- Minimize reconstruction error + KLD for normal samples
- Maximize reconstruction error for abnormal samples
- Because reconstruction error is maximized for abnormal samples, after several epochs the loss for normal samples also increases
- The weight on maximizing reconstruction error for fraud samples should be decreased, as it disturbs the model's learning of normal samples
- Maximizing reconstruction error for fraud samples proceeds too fast, so the model fails to learn the normal samples properly
- The plotted y axis is reconstruction error × (−1), so the fraud reconstruction error is actually increasing
- [Plot: training loss vs. epochs, log scale; random seed 3]
- It shows a similar result
- [Plot: training loss vs. epochs, log scale; random seed 0]
- For a different random seed, the model reached its lowest validation loss only after more epochs, unlike Method 1 (novelty detection), which needed a similar number of epochs across all random seeds
- Therefore the weight between minimizing the normal loss and maximizing the abnormal loss must be fine-tuned to train the model properly
- As the proposed method showed better performance for some random seeds, it appears to be valid
- But as the training history shows, while minimizing the normal error and maximizing the abnormal error, there is a concern that maximizing reconstruction error disturbs the model's learning of normal samples
- Therefore the weight between minimizing the normal error and maximizing the abnormal error must be chosen carefully to train the model properly; this weight between the two losses is a hyperparameter of the method and must be tuned
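Since that weight has to be tuned, a simple selection loop over candidate weights, averaging the total-cost metric over the random seeds, could look like this. This is only a sketch; `evaluate_total_cost` is a hypothetical callback that would train the VAE with the given fraud-loss weight and seed and return the validation total cost.

```python
def select_fraud_loss_weight(weights, seeds, evaluate_total_cost):
    """Pick the fraud-loss weight minimizing the total-cost metric,
    averaged over random seeds.

    evaluate_total_cost(w, seed) is a hypothetical callback: train the
    model with fraud-loss weight w and the given seed, then return the
    validation total cost (missed-fraud damage + false-alarm penalty).
    """
    best_w, best_cost = None, float("inf")
    for w in weights:
        avg_cost = sum(evaluate_total_cost(w, s) for s in seeds) / len(seeds)
        if avg_cost < best_cost:
            best_w, best_cost = w, avg_cost
    return best_w
```

Averaging over seeds matters here precisely because the training was observed to succeed only for some seeds; a weight that wins on a single seed may not be robust.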



















