serialization-deserialization bug #143

@patrickleonardy

Bug Report

After serializing and de-serializing a PreProcessor fitted with only continuous variables (it still has to be checked whether the same happens when categorical variables are present):

  1. the preprocessor object cannot be printed -> AttributeError
  2. when trying to transform data, the KBinsDiscretizer throws -> NotFittedError

Description

For the first point: The problem seems to be a mismatch between the names of the attributes and the names of the parameters in the function definition. self._get_param_names() returns "categorical_data_processor" but getattr() only knows "_categorical_data_processor".
Changing the naming resolves this problem, but is there no other way?
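
To illustrate the first point, here is a minimal sketch of the mechanism as I understand it. `ParamsMixin` is a simplified stand-in written for this example, not scikit-learn's actual `BaseEstimator`, and `Mismatched` and `WithProperty` are hypothetical classes, not cobra code. It assumes the parameter names come from the `__init__` signature and are then looked up with `getattr()`. A read-only property with the public name might be one alternative to renaming the attribute:

```python
import inspect


class ParamsMixin:
    """Simplified stand-in for get_params-style logic (not the real BaseEstimator)."""

    @classmethod
    def _get_param_names(cls):
        # Parameter names are read from the __init__ signature.
        signature = inspect.signature(cls.__init__)
        return sorted(name for name in signature.parameters if name != "self")

    def get_params(self):
        # Each parameter name is looked up as an attribute with the same name.
        return {name: getattr(self, name) for name in self._get_param_names()}

    def __repr__(self):
        return f"{type(self).__name__}({self.get_params()})"


class Mismatched(ParamsMixin):
    """Hypothetical class: parameter and attribute names differ."""

    def __init__(self, categorical_data_processor=None):
        self._categorical_data_processor = categorical_data_processor


class WithProperty(ParamsMixin):
    """Hypothetical class: keeps the private attribute, exposes the public name."""

    def __init__(self, categorical_data_processor=None):
        self._categorical_data_processor = categorical_data_processor

    @property
    def categorical_data_processor(self):
        return self._categorical_data_processor


# Printing the mismatched object fails because getattr() cannot find the name.
try:
    repr(Mismatched())
    mismatch_error = None
except AttributeError as exc:
    mismatch_error = exc

# With the property in place the lookup succeeds.
property_params = WithProperty("encoder").get_params()
```

A read-only property only covers reading the parameter. Anything that assigns to the public name with `setattr()` would additionally need a setter.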

For the second point: There is a problem when creating the pipeline_dictionary: some keywords seem to be empty even though they should have a value.
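
To narrow down the second point, a small helper like the one below could list which entries of a serialized pipeline dictionary are empty. This is only a debugging sketch: `find_empty_entries` is a hypothetical helper, and the keys in `example` are made up for illustration and are not the actual output of `serialize_pipeline()`.

```python
def find_empty_entries(node, path=""):
    """Return the dotted paths of all empty values in a nested dict."""
    empty = []
    for key, value in node.items():
        current = f"{path}.{key}" if path else str(key)
        if isinstance(value, dict) and value:
            # Non-empty dict: descend into it.
            empty.extend(find_empty_entries(value, current))
        elif value is None or (hasattr(value, "__len__") and len(value) == 0):
            # None, or an empty dict / list / string.
            empty.append(current)
    return empty


# Made-up structure, only to show how the helper is used.
example = {
    "step_a": {"n_bins": 10, "fitted_bins": {}},
    "step_b": {"weight": 0.0, "mapping": None},
}
empty_paths = find_empty_entries(example)
```

Running the helper on the real result of `preprocessor.serialize_pipeline()` should show which keywords are empty before `from_pipeline()` is called.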

Steps to Reproduce

  1. Load a dataset:
    import pandas as pd
    from sklearn.datasets import load_iris
    X, y = load_iris(return_X_y=True, as_frame=True)
    # y is a Series named "target"; join it to the features column-wise
    df = pd.concat([X, y], axis=1)
  2. Create a preprocessor and fit it:
    from cobra.preprocessing import PreProcessor
    preprocessor = PreProcessor.from_params()
    continuous_vars = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
    discrete_vars = []
    preprocessor.fit(df, continuous_vars=continuous_vars, discrete_vars=discrete_vars, target_column_name="target")
  3. Serialize the preprocessor:
    pipeline_serialized = preprocessor.serialize_pipeline()
  4. De-serialize:
    new_preprocessor = PreProcessor.from_pipeline(pipeline_serialized)
  5. See what happens when printing:
    new_preprocessor
  6. See what happens when transforming:
    new_preprocessor.transform(df, continuous_vars=continuous_vars, discrete_vars=discrete_vars)

Actual Results

I got ...

[Screenshot: MicrosoftTeams-image]
[Screenshot: MicrosoftTeams-image (1)]

Labels

bug, good first issue
