Skip to content

Pearson equivalence#171

Merged
Hess3l merged 14 commits into
developmentfrom
pearson-equivalence
May 8, 2026
Merged

Pearson equivalence#171
Hess3l merged 14 commits into
developmentfrom
pearson-equivalence

Conversation

@Hess3l
Copy link
Copy Markdown

@Hess3l Hess3l commented May 7, 2026

Description

Implements the pearson equivalence test from pgmpy in rust

Related Issues

implements #105 and #106

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Checklist

  • I ran cargo fmt --all (code is formatted)
  • I ran cargo clippy --workspace --all-targets -- -D warnings (no warnings)
  • I ran cargo test --workspace (all tests pass)
  • I added tests for new functionality
  • I updated relevant documentation

Screenshots (if applicable)

Questions for Reviewers

@Hess3l Hess3l marked this pull request as ready for review May 7, 2026 14:04
@Hess3l Hess3l requested a review from Hiddentale as a code owner May 7, 2026 14:04
@Hess3l Hess3l self-assigned this May 7, 2026
clippy::cast_precision_loss,
reason = "array length and number of variables most likely won't exceed 2^53"
)]
let std_error_factor = sqrt((n - s - 3) as f64);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if s+3 is bigger than or equal to n?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let std_error_factor = ((n - s - 3) as f64).sqrt();

For std version

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the program bail if this is the case, good find

Ok(wrap_result(
self.boolean,
p_value,
coefficient,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We return atanh(rho) instead of the raw coefficient, is that the right approach?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the approach used in pgmpy, so I assume it is

use crate::ci_tests::pearson_correlation::PearsonCorrelation;
use crate::strategy::{CITest, CITestDataType, TestResult};
use ndarray::{Array1, Array2};
use libm::{atanh, sqrt};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f64::sqrt()
f64::atanh()

both exist in the standard library, so we do not need another package

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it

statistic
};

let coefficient = atanh(rho);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let coefficient = rho.atanh();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's easier to read by:
let rho = if statistic <= -1.0 {-1.0 + 1e-12}
else if statistic >= 1.0 {1.0 - 1e-12}
else {statistic};

};

let coefficient = atanh(rho);
let z_delta = atanh(self.delta_threshold);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let z_delta = self.delta_threshold.atanh();

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it

import pandas as pd
import numpy as np

# registry = PyRegistry()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By course standards, no commented out code in production code.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about this file, used it mostly for some rudimentary testing. I will remove this from the pull request since the actual benchmarks are not yet done

// Expected: low p_value (<= 0.05), low |coefficient| (< 0.1)
// Note: p_value is opposite to other tests
#[test]
fn unconditional_independent_data_is_not_rejected() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can shorten the name fn uncond_independent_data_not_rejected()

Copy link
Copy Markdown

@ayca12111 ayca12111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the unit tests the names can be shortened.

Copy link
Copy Markdown
Contributor

@Hiddentale Hiddentale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now :)

@Hess3l Hess3l merged commit 1caea6e into development May 8, 2026
5 checks passed
@Hiddentale Hiddentale deleted the pearson-equivalence branch May 8, 2026 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Write unit tests for Pearson equivalence Re-implementation of Pearson equivalence in Rust

3 participants