[SYSTEMDS-3421] Add serialization support for UDF encoders#2397
[SYSTEMDS-3421] Add serialization support for UDF encoders#2397zohrehasadi00 wants to merge 2 commits intoapache:mainfrom
Conversation
Signed-off-by: Zohreh <zohrehasadi2018@yahoo.com>
Signed-off-by: Zohreh <zohrehasadi2018@yahoo.com>
|
@janniklinde This is my first time contributing to an open-source project voluntarily. Should I do anything from my side, or will someone review the PR automatically through the queue or sth? tnx. |
|
Thanks for the PR @zohrehasadi00. I'm not really familiar with this issue but I don't see how this fully addresses issue 3421? @Baunsgaard do you have any comments on this? |
Baunsgaard
left a comment
There was a problem hiding this comment.
Overall, i think it looks fine.
However, I think the PR currently is missing some tests for null handling of strings, because the federated site has some special handling of UDF functions that sometimes serialize null values. Could we add some more tests for different serialization scenarios?
There is some history on this issue, where I made a followup PR to merge another:
| return new ColumnEncoderDummycode(); | ||
| case FeatureHash: | ||
| return new ColumnEncoderFeatureHash(); | ||
| case PassThrough: | ||
| return new ColumnEncoderPassThrough(); | ||
| case Recode: | ||
| return new ColumnEncoderRecode(); | ||
| case WordEmbedding: | ||
| return new ColumnEncoderWordEmbedding(); | ||
| case BagOfWords: | ||
| return new ColumnEncoderBagOfWords(); | ||
| default: | ||
| throw new DMLRuntimeException("Unsupported encoder type: " + etype); | ||
| return new ColumnEncoderDummycode(); | ||
| case FeatureHash: | ||
| return new ColumnEncoderFeatureHash(); | ||
| case PassThrough: | ||
| return new ColumnEncoderPassThrough(); | ||
| case Recode: | ||
| return new ColumnEncoderRecode(); | ||
| case WordEmbedding: | ||
| return new ColumnEncoderWordEmbedding(); | ||
| case BagOfWords: | ||
| return new ColumnEncoderBagOfWords(); | ||
| case UDF: | ||
| return new ColumnEncoderUDF(); | ||
| default: | ||
| throw new DMLRuntimeException("Unsupported encoder type: " + etype); |
There was a problem hiding this comment.
Seems like the indentation is somehow changed. Should be fixed to highlight the new case you added.
Adds UDF encoder type to ColumnEncoder enum and EncoderFactory -> UDF encoders can be created/serialized.
Implements readExternal/writeExternal for ColumnEncoderUDF to persist function name and domain size.
Adds a regression test.