Commits
53 commits
83e1531
Use the correct path to the cache directory for the task
PGijsbers Jan 20, 2026
f90036d
Push configuration of test server URL exclusively to config.py
PGijsbers Jan 21, 2026
3a257ab
Update the test to use a dataset which does not have a parquet file
PGijsbers Jan 28, 2026
3b79017
Replace hard-coded cache directory by configured one
PGijsbers Jan 28, 2026
f524d75
Update test to use dataset file that is already in cache
PGijsbers Jan 28, 2026
7ef12c2
Windows test
satvshr Jan 29, 2026
a5601e3
relax assumptions on local file structure
PGijsbers Jan 29, 2026
d862be2
Do not use static cache directory
PGijsbers Jan 29, 2026
16699e6
Update expected number to match initial server state
PGijsbers Jan 29, 2026
7c14c68
bug fixing
satvshr Jan 29, 2026
78b2038
merge main
satvshr Jan 29, 2026
16ceeaa
remove db refresh every test
satvshr Jan 29, 2026
015acf4
bug fixing
satvshr Jan 29, 2026
937fc77
bug fixing
satvshr Jan 29, 2026
30972f8
bug fixing
satvshr Jan 29, 2026
775dcf7
Add symlink to regular test cache directory
PGijsbers Jan 30, 2026
319cb35
Skip test for 1.8 since expected results differ too much
PGijsbers Jan 30, 2026
a680ebe
Simplify path to static cache directory
PGijsbers Jan 30, 2026
b161b3b
Update symbolic link to be relative
PGijsbers Jan 30, 2026
0b989d1
Fix typo
PGijsbers Jan 30, 2026
892ea6c
trying ot fix multiple threads issue
satvshr Jan 31, 2026
ae3befb
removed test file
satvshr Jan 31, 2026
5f396a0
removed unnecessary code (?)
satvshr Jan 31, 2026
8a319cd
Trigger Build
satvshr Jan 31, 2026
4ba4239
Clean up code
satvshr Feb 1, 2026
0292404
comment fixing
satvshr Feb 1, 2026
a7b5d76
attempted bug fixing
satvshr Feb 1, 2026
9b0f3d7
attempted bug fixing
satvshr Feb 1, 2026
630f240
attempted bug fixing
satvshr Feb 1, 2026
c61d410
attempted bug fixing reverts
satvshr Feb 1, 2026
1ab42b7
disabling parallel runs
satvshr Feb 1, 2026
06405c8
disabling parallel runs
satvshr Feb 2, 2026
e22b7ca
disabling windows CI
satvshr Feb 2, 2026
1b00a7f
removed docker from pytest default
satvshr Feb 6, 2026
cc6e673
change mysql port
satvshr Feb 6, 2026
c1bf558
Change order of ci flow
satvshr Feb 6, 2026
1a794fe
CI testing
satvshr Feb 11, 2026
dbe7782
CI testing
satvshr Feb 11, 2026
d8be5f1
CI testing
satvshr Feb 11, 2026
b204845
CI testing
satvshr Feb 11, 2026
54725fa
Windows CI bugfixing
satvshr Feb 11, 2026
abc44a5
merging 2 branches
satvshr Feb 11, 2026
b034687
merging 2 branches
satvshr Feb 11, 2026
b8826f5
merging 2 branches
satvshr Feb 11, 2026
445cbe8
merging 2 branches
satvshr Feb 11, 2026
295ef93
curl to verify server is running
satvshr Feb 11, 2026
488f409
path fix
satvshr Feb 11, 2026
93d7409
Merge branch 'update-tests-for-local' into i1614
satvshr Feb 11, 2026
45e7257
run all test server tests
satvshr Feb 11, 2026
7fcf039
fix 'Cleanup Docker setup'
satvshr Feb 11, 2026
37cfb2e
skipping windows given docker binaries do not match
satvshr Feb 11, 2026
9290010
testing out locally
satvshr Feb 12, 2026
bbfa193
replacing with 8080
satvshr Feb 12, 2026
60 changes: 47 additions & 13 deletions .github/workflows/test.yml
Collaborator:
There should be the following steps to set up the Docker services, though these should be skipped on Windows machines:

  1. clone the repo, which you are already doing
  2. run the docker services
  3. verify that localhost endpoints are live
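The third step, verifying that the localhost endpoints are live, could be sketched in Python as follows. This is only a sketch: the URL is an assumption mirroring the REST API endpoint used elsewhere in this PR, and `wait_for_api` is an illustrative helper, not part of the openml package.

```python
import time
import urllib.request


def wait_for_api(url: str, timeout: float = 30.0, interval: float = 2.0) -> bool:
    """Poll `url` until it responds, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url):
                return True
        except OSError:  # connection refused, DNS failure, HTTP error, ...
            time.sleep(interval)
    return False
```

For example, `wait_for_api("http://localhost:8080/api/v1/xml/data/1")` would play the same role as the curl retry loop added later in this workflow.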

Collaborator:
Add the env var:

OPENML_USE_LOCAL_SERVICES: "true"

@@ -101,6 +101,35 @@ jobs:
echo "BEFORE=$git_status" >> $GITHUB_ENV
echo "Repository status before tests: $git_status"

- name: Clone Services
if: matrix.os == 'ubuntu-latest'
run: |
git clone --depth 1 https://github.com/openml/services.git
cd services

git config user.email "ci@openml.org"
git config user.name "CI"

git fetch origin pull/13/head:pr-13
git merge pr-13 --no-edit

git fetch origin pull/15/head:pr-15
git merge pr-15 --no-edit

- name: Start Docker Services
if: matrix.os == 'ubuntu-latest'
working-directory: ./services
run: |
sudo systemctl stop mysql.service
docker compose --profile rest-api --profile minio --profile evaluation-engine up -d
Collaborator:
Why do we need to run the evaluation-engine?

docker wait openml-test-database-setup

- name: Verify API is Reachable
if: matrix.os == 'ubuntu-latest'
run: |
timeout 20s bash -c 'until curl -sSf http://localhost:8000/api/v1/xml/data/1 > /dev/null; do sleep 3; done'
curl -I http://localhost:8000/api/v1/task/1
Comment on lines 129 to 131
Collaborator:
Suggested change
run: |
timeout 180s bash -c 'until curl -sSf http://localhost:8000/api/v1/xml/data/1 > /dev/null; do
echo "Server still booting... retrying in 5s";
sleep 5;
done'
curl -I http://localhost:8000/api/v1/task/1
run: |
echo "Waiting for API to become available..."
timeout 30s bash -c 'until curl -sSf http://localhost:8080/api/v1/task/1 > /dev/null; do sleep 2; done'
echo "Response:"
curl http://localhost:8080/api/v1/task/1

Contributor Author:
Similar to what was there earlier; I changed it in the previous commit and was going to revert it after the CI runs.


- name: Show installed dependencies
run: python -m pip list

@@ -112,9 +141,9 @@ jobs:
fi

if [ "${{ matrix.sklearn-only }}" = "true" ]; then
marks="sklearn and not production and not uses_test_server"
marks="sklearn and not production"
else
marks="not production and not uses_test_server"
marks="not production"
fi

pytest -n 4 --durations=20 --dist load -sv $codecov -o log_cli=true -m "$marks"
@@ -127,9 +156,9 @@ jobs:
fi

if [ "${{ matrix.sklearn-only }}" = "true" ]; then
marks="sklearn and production and not uses_test_server"
marks="sklearn and production"
else
marks="production and not uses_test_server"
marks="production"
fi

pytest -n 4 --durations=20 --dist load -sv $codecov -o log_cli=true -m "$marks"
@@ -139,6 +168,20 @@ jobs:
run: | # we need a separate step because of the bash-specific if-statement in the previous one.
pytest -n 4 --durations=20 --dist load -sv --reruns 5 --reruns-delay 1 -m "not uses_test_server"

- name: Upload coverage
if: matrix.code-cov && always()
uses: codecov/codecov-action@v4
with:
files: coverage.xml
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: true
verbose: true

- name: Cleanup Docker setup
if: matrix.os == 'ubuntu-latest' && always()
run: |
sudo rm -rf services

- name: Check for files left behind by test
if: matrix.os != 'windows-latest' && always()
run: |
@@ -151,15 +194,6 @@ jobs:
exit 1
fi

- name: Upload coverage
if: matrix.code-cov && always()
uses: codecov/codecov-action@v4
with:
files: coverage.xml
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: true
verbose: true

dummy_windows_py_sk024:
name: (windows-latest, Py, sk0.24.*, sk-only:false)
runs-on: ubuntu-latest
2 changes: 1 addition & 1 deletion openml/cli.py
@@ -109,7 +109,7 @@ def check_server(server: str) -> str:

def replace_shorthand(server: str) -> str:
if server == "test":
return "https://test.openml.org/api/v1/xml"
return f"{config.TEST_SERVER_URL}/api/v1/xml"
if server == "production":
return "https://www.openml.org/api/v1/xml"
return server
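The change above can be exercised in isolation. A minimal sketch of the shorthand resolution, with `TEST_SERVER_URL` inlined as this PR defines it in `openml/config.py`:

```python
TEST_SERVER_URL = "http://localhost:8080"  # value this PR adds to openml/config.py


def replace_shorthand(server: str) -> str:
    """Expand the 'test' and 'production' shorthands to full XML API URLs."""
    if server == "test":
        return f"{TEST_SERVER_URL}/api/v1/xml"
    if server == "production":
        return "https://www.openml.org/api/v1/xml"
    return server  # anything else is assumed to already be a full URL
```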
4 changes: 3 additions & 1 deletion openml/config.py
@@ -27,6 +27,8 @@
OPENML_SKIP_PARQUET_ENV_VAR = "OPENML_SKIP_PARQUET"
_TEST_SERVER_NORMAL_USER_KEY = "normaluser"

TEST_SERVER_URL = "http://localhost:8080"


class _Config(TypedDict):
apikey: str
@@ -213,7 +215,7 @@ class ConfigurationForExamples:
_last_used_server = None
_last_used_key = None
_start_last_called = False
_test_server = "https://test.openml.org/api/v1/xml"
_test_server = f"{TEST_SERVER_URL}/api/v1/xml"
_test_apikey = _TEST_SERVER_NORMAL_USER_KEY

@classmethod
11 changes: 6 additions & 5 deletions openml/tasks/functions.py
@@ -415,9 +415,10 @@ def get_task(
if not isinstance(task_id, int):
raise TypeError(f"Task id should be integer, is {type(task_id)}")

cache_key_dir = openml.utils._create_cache_directory_for_id(TASKS_CACHE_DIR_NAME, task_id)
tid_cache_dir = cache_key_dir / str(task_id)
tid_cache_dir_existed = tid_cache_dir.exists()
task_cache_directory = openml.utils._create_cache_directory_for_id(
TASKS_CACHE_DIR_NAME, task_id
)
task_cache_directory_existed = task_cache_directory.exists()
try:
task = _get_task_description(task_id)
dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
@@ -431,8 +432,8 @@
if download_splits and isinstance(task, OpenMLSupervisedTask):
task.download_split()
except Exception as e:
if not tid_cache_dir_existed:
openml.utils._remove_cache_dir_for_id(TASKS_CACHE_DIR_NAME, tid_cache_dir)
if not task_cache_directory_existed:
openml.utils._remove_cache_dir_for_id(TASKS_CACHE_DIR_NAME, task_cache_directory)
raise e

return task
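The renamed variables above implement a rollback pattern: on failure, the cache directory is removed only when this call created it. A generic sketch of that pattern (the names here are illustrative, not part of the openml API):

```python
import shutil
from pathlib import Path


def fetch_with_cache_rollback(cache_dir: Path, fetch):
    """Call `fetch(cache_dir)`; on failure, delete the cache dir only if it is new."""
    cache_existed = cache_dir.exists()
    cache_dir.mkdir(parents=True, exist_ok=True)
    try:
        return fetch(cache_dir)
    except Exception:
        if not cache_existed:
            # Roll back: this call created the directory, so remove it again.
            shutil.rmtree(cache_dir, ignore_errors=True)
        raise
```

A pre-existing directory is left untouched even when the fetch fails, so a partial re-download never wipes a previously populated cache.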
2 changes: 1 addition & 1 deletion openml/testing.py
@@ -47,7 +47,7 @@ class TestBase(unittest.TestCase):
"user": [],
}
flow_name_tracker: ClassVar[list[str]] = []
test_server = "https://test.openml.org/api/v1/xml"
test_server = f"{openml.config.TEST_SERVER_URL}/api/v1/xml"
admin_key = "abc"
user_key = openml.config._TEST_SERVER_NORMAL_USER_KEY

4 changes: 2 additions & 2 deletions tests/conftest.py
@@ -277,7 +277,7 @@ def with_server(request):
openml.config.apikey = None
yield
return
openml.config.server = "https://test.openml.org/api/v1/xml"
openml.config.server = f"{openml.config.TEST_SERVER_URL}/api/v1/xml"
Collaborator:
Check the env variable OPENML_USE_LOCAL_SERVICES and update the server accordingly:

@pytest.fixture(autouse=True)
def with_server(request):
    if os.getenv("OPENML_USE_LOCAL_SERVICES") == "true":
        openml.config.TEST_SERVER_URL = "http://localhost:8080"

    if "production" in request.keywords:
        openml.config.server = "https://www.openml.org/api/v1/xml"
        openml.config.apikey = None
        yield
        return
    openml.config.server = f"{openml.config.TEST_SERVER_URL}/api/v1/xml"
    openml.config.apikey = TestBase.user_key
    yield

openml.config.apikey = TestBase.user_key
yield

@@ -295,8 +295,8 @@ def with_test_cache(test_files_directory, request):
openml.config.set_root_cache_directory(_root_cache_directory)
if tmp_cache.exists():
shutil.rmtree(tmp_cache)



@pytest.fixture
def static_cache_dir():
return Path(__file__).parent / "files"
37 changes: 13 additions & 24 deletions tests/test_datasets/test_dataset_functions.py
@@ -527,27 +527,20 @@ def test_deletion_of_cache_dir(self):
def test_deletion_of_cache_dir_faulty_download(self, patch):
patch.side_effect = Exception("Boom!")
self.assertRaisesRegex(Exception, "Boom!", openml.datasets.get_dataset, dataset_id=1)
datasets_cache_dir = os.path.join(self.workdir, "org", "openml", "test", "datasets")
datasets_cache_dir = os.path.join(openml.config.get_cache_directory(), "datasets")
assert len(os.listdir(datasets_cache_dir)) == 0

@pytest.mark.uses_test_server()
def test_publish_dataset(self):
# lazy loading not possible as we need the arff-file.
openml.datasets.get_dataset(3, download_data=True)
file_path = os.path.join(
openml.config.get_cache_directory(),
"datasets",
"3",
"dataset.arff",
)
arff_file_path = self.static_cache_dir / "org" / "openml" / "test" / "datasets" / "2" / "dataset.arff"
dataset = OpenMLDataset(
"anneal",
"test",
data_format="arff",
version=1,
licence="public",
default_target_attribute="class",
data_file=file_path,
data_file=arff_file_path,
)
dataset.publish()
TestBase._mark_entity_for_removal("data", dataset.dataset_id)
@@ -886,7 +879,7 @@ def test_create_invalid_dataset(self):

@pytest.mark.uses_test_server()
def test_get_online_dataset_arff(self):
dataset_id = 100 # Australian
dataset_id = 128 # iris -- one of the few datasets without parquet file
# lazy loading not used as arff file is checked.
dataset = openml.datasets.get_dataset(dataset_id, download_data=True)
decoder = arff.ArffDecoder()
@@ -1464,8 +1457,9 @@ def test_data_edit_critical_field(self):
raise e
time.sleep(10)
# Delete the cache dir to get the newer version of the dataset

shutil.rmtree(
os.path.join(self.workdir, "org", "openml", "test", "datasets", str(did)),
os.path.join(openml.config.get_cache_directory(), "datasets", str(did)),
)

@pytest.mark.uses_test_server()
@@ -1730,7 +1724,6 @@ def test_delete_dataset(self):

@mock.patch.object(requests.Session, "delete")
def test_delete_dataset_not_owned(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = (
test_files_directory / "mock_responses" / "datasets" / "data_delete_not_owned.xml"
)
Expand All @@ -1745,14 +1738,13 @@ def test_delete_dataset_not_owned(mock_delete, test_files_directory, test_api_ke
):
openml.datasets.delete_dataset(40_000)

dataset_url = "https://test.openml.org/api/v1/xml/data/40000"
dataset_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/data/40000"
assert dataset_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")


@mock.patch.object(requests.Session, "delete")
def test_delete_dataset_with_run(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = (
test_files_directory / "mock_responses" / "datasets" / "data_delete_has_tasks.xml"
)
Expand All @@ -1767,14 +1759,13 @@ def test_delete_dataset_with_run(mock_delete, test_files_directory, test_api_key
):
openml.datasets.delete_dataset(40_000)

dataset_url = "https://test.openml.org/api/v1/xml/data/40000"
dataset_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/data/40000"
assert dataset_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")


@mock.patch.object(requests.Session, "delete")
def test_delete_dataset_success(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = (
test_files_directory / "mock_responses" / "datasets" / "data_delete_successful.xml"
)
Expand All @@ -1786,14 +1777,13 @@ def test_delete_dataset_success(mock_delete, test_files_directory, test_api_key)
success = openml.datasets.delete_dataset(40000)
assert success

dataset_url = "https://test.openml.org/api/v1/xml/data/40000"
dataset_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/data/40000"
assert dataset_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")


@mock.patch.object(requests.Session, "delete")
def test_delete_unknown_dataset(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = (
test_files_directory / "mock_responses" / "datasets" / "data_delete_not_exist.xml"
)
Expand All @@ -1808,7 +1798,7 @@ def test_delete_unknown_dataset(mock_delete, test_files_directory, test_api_key)
):
openml.datasets.delete_dataset(9_999_999)

dataset_url = "https://test.openml.org/api/v1/xml/data/9999999"
dataset_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/data/9999999"
assert dataset_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")

@@ -1903,9 +1893,8 @@ def _dataset_features_is_downloaded(did: int):


def _dataset_data_file_is_downloaded(did: int):
parquet_present = _dataset_file_is_downloaded(did, "dataset.pq")
arff_present = _dataset_file_is_downloaded(did, "dataset.arff")
return parquet_present or arff_present
cache_directory = Path(openml.config.get_cache_directory()) / "datasets" / str(did)
return any(f.suffix in (".pq", ".arff") for f in cache_directory.iterdir())


def _assert_datasets_retrieved_successfully(
@@ -2010,7 +1999,7 @@ def test_get_dataset_parquet(requests_mock, test_files_directory):
test_files_directory / "mock_responses" / "datasets" / "data_description_61.xml"
)
# While the mocked example is from production, unit tests by default connect to the test server.
requests_mock.get("https://test.openml.org/api/v1/xml/data/61", text=content_file.read_text())
requests_mock.get(f"{openml.config.TEST_SERVER_URL}/api/v1/xml/data/61", text=content_file.read_text())
dataset = openml.datasets.get_dataset(61, download_data=True)
assert dataset._parquet_url is not None
assert dataset.parquet_file is not None
15 changes: 5 additions & 10 deletions tests/test_flows/test_flow_functions.py
@@ -453,7 +453,6 @@ def test_delete_flow(self):

@mock.patch.object(requests.Session, "delete")
def test_delete_flow_not_owned(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_not_owned.xml"
mock_delete.return_value = create_request_response(
status_code=412,
@@ -466,14 +465,13 @@ def test_delete_flow_not_owned(mock_delete, test_files_directory, test_api_key):
):
openml.flows.delete_flow(40_000)

flow_url = "https://test.openml.org/api/v1/xml/flow/40000"
flow_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/flow/40000"
assert flow_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")


@mock.patch.object(requests.Session, "delete")
def test_delete_flow_with_run(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_has_runs.xml"
mock_delete.return_value = create_request_response(
status_code=412,
Expand All @@ -486,14 +484,13 @@ def test_delete_flow_with_run(mock_delete, test_files_directory, test_api_key):
):
openml.flows.delete_flow(40_000)

flow_url = "https://test.openml.org/api/v1/xml/flow/40000"
flow_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/flow/40000"
assert flow_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")


@mock.patch.object(requests.Session, "delete")
def test_delete_subflow(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_is_subflow.xml"
mock_delete.return_value = create_request_response(
status_code=412,
Expand All @@ -506,14 +503,13 @@ def test_delete_subflow(mock_delete, test_files_directory, test_api_key):
):
openml.flows.delete_flow(40_000)

flow_url = "https://test.openml.org/api/v1/xml/flow/40000"
flow_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/flow/40000"
assert flow_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")


@mock.patch.object(requests.Session, "delete")
def test_delete_flow_success(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_successful.xml"
mock_delete.return_value = create_request_response(
status_code=200,
Expand All @@ -523,15 +519,14 @@ def test_delete_flow_success(mock_delete, test_files_directory, test_api_key):
success = openml.flows.delete_flow(33364)
assert success

flow_url = "https://test.openml.org/api/v1/xml/flow/33364"
flow_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/flow/33364"
assert flow_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")


@mock.patch.object(requests.Session, "delete")
@pytest.mark.xfail(reason="failures_issue_1544", strict=False)
def test_delete_unknown_flow(mock_delete, test_files_directory, test_api_key):
openml.config.start_using_configuration_for_example()
content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_not_exist.xml"
mock_delete.return_value = create_request_response(
status_code=412,
Expand All @@ -544,6 +539,6 @@ def test_delete_unknown_flow(mock_delete, test_files_directory, test_api_key):
):
openml.flows.delete_flow(9_999_999)

flow_url = "https://test.openml.org/api/v1/xml/flow/9999999"
flow_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/flow/9999999"
assert flow_url == mock_delete.call_args.args[0]
assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key")