Skip to content

[SYSTEMDS-3421] Add serialization support for UDF encoders#2397

Open
zohrehasadi00 wants to merge 2 commits intoapache:mainfrom
zohrehasadi00:SYSTEMDS-3421-udf-serialization
Open

[SYSTEMDS-3421] Add serialization support for UDF encoders#2397
zohrehasadi00 wants to merge 2 commits intoapache:mainfrom
zohrehasadi00:SYSTEMDS-3421-udf-serialization

Conversation

@zohrehasadi00
Copy link
Contributor

Adds UDF encoder type to ColumnEncoder enum and EncoderFactory -> UDF encoders can be created/serialized.
Implements readExternal/writeExternal for ColumnEncoderUDF to persist function name and domain size.
Adds a regression test.

Signed-off-by: Zohreh <zohrehasadi2018@yahoo.com>
Signed-off-by: Zohreh <zohrehasadi2018@yahoo.com>
@zohrehasadi00
Copy link
Contributor Author

@janniklinde This is my first time contributing to an open-source project voluntarily. Should I do anything from my side, or will someone review the PR automatically through the queue or sth? tnx.

@janniklinde
Copy link
Contributor

Thanks for the PR @zohrehasadi00. I'm not really familiar with this issue but I don't see how this fully addresses issue 3421? @Baunsgaard do you have any comments on this?

Copy link
Contributor

@Baunsgaard Baunsgaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, i think it looks fine.

However, I think the PR currently is missing some tests for null handling of strings, because the federated site has some special handling of UDF functions that sometimes serialize null values. Could we add some more tests for different serialization scenarios?

There is some history on this issue, where I made a followup PR to merge another:

See:
#1716
and:
#1681

Comment on lines -279 to +295
return new ColumnEncoderDummycode();
case FeatureHash:
return new ColumnEncoderFeatureHash();
case PassThrough:
return new ColumnEncoderPassThrough();
case Recode:
return new ColumnEncoderRecode();
case WordEmbedding:
return new ColumnEncoderWordEmbedding();
case BagOfWords:
return new ColumnEncoderBagOfWords();
default:
throw new DMLRuntimeException("Unsupported encoder type: " + etype);
return new ColumnEncoderDummycode();
case FeatureHash:
return new ColumnEncoderFeatureHash();
case PassThrough:
return new ColumnEncoderPassThrough();
case Recode:
return new ColumnEncoderRecode();
case WordEmbedding:
return new ColumnEncoderWordEmbedding();
case BagOfWords:
return new ColumnEncoderBagOfWords();
case UDF:
return new ColumnEncoderUDF();
default:
throw new DMLRuntimeException("Unsupported encoder type: " + etype);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the indentation is somehow changed. Should be fixed to highlight the new case you added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

3 participants