Conversation
Locally, MinIO already has more parquet files than on the test server.
Note that the previous strategy no longer worked if the server returned a parquet file, which is the case for the new local setup.
This means it is not reliant on the evaluation engine processing the dataset. Interestingly, the database state seems to purposely keep the last task's dataset "in preparation" explicitly (by marking processing as done but having no dataset_status entry).
Codecov Report
❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1630      +/-   ##
==========================================
+ Coverage   52.04%   52.79%   +0.74%
==========================================
  Files          36       36
  Lines        4333     4336       +3
==========================================
+ Hits         2255     2289      +34
+ Misses       2078     2047      -31
```
geetu040 left a comment
Thanks for these fixes.
I hope the test-server can be fully replicated locally for the tests now.
Left a few questions, and the tests/files/localhost:8080 file needs to be revised.
```python
task_cache_directory = openml.utils._create_cache_directory_for_id(
    TASKS_CACHE_DIR_NAME, task_id
)
task_cache_directory_existed = task_cache_directory.exists()
```
`task_cache_directory` now points to the folder with all the tasks instead of the folder for the specific task ...
No, it does not. The `_create_cache_directory_for_id` function, as the name implies, returns the cache directory for the specific identifier, e.g., [...]/org/openml/test/tasks/1882.
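For illustration, the behavior described above can be sketched roughly as follows. This is a hypothetical re-implementation, not the actual `openml.utils._create_cache_directory_for_id`; the function name, argument order, and cache layout in openml-python may differ:

```python
import tempfile
from pathlib import Path


def create_cache_directory_for_id(cache_root: Path, key: str, id_: int) -> Path:
    """Return (and create) the cache directory for one specific id,
    e.g. <cache_root>/tasks/1882 -- not the shared <cache_root>/tasks folder."""
    directory = cache_root / key / str(id_)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Hypothetical cache root mirroring the [...]/org/openml/test layout.
cache_root = Path(tempfile.mkdtemp()) / "org" / "openml" / "test"
task_dir = create_cache_directory_for_id(cache_root, "tasks", 1882)
print(task_dir.relative_to(cache_root))  # tasks/1882
```

Under this reading, removing `task_dir` deletes only the one task's cache, not the whole `tasks` folder.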
```diff
-if not tid_cache_dir_existed:
-    openml.utils._remove_cache_dir_for_id(TASKS_CACHE_DIR_NAME, tid_cache_dir)
+if not task_cache_directory_existed:
+    openml.utils._remove_cache_dir_for_id(TASKS_CACHE_DIR_NAME, task_cache_directory)
```
... and therefore removes the complete directory instead of the specific task. Is that intentional?
```diff
@@ -0,0 +1 @@
+org/openml/test
```
Why is this symlink uploaded? Probably a mistake?
Or is it used for linking the remote-test-server cache with the local-test-server cache? If this was intentional, can I ask what its purpose is? Also, this path can't be tracked on Windows because of the `:`,
which leads to a CI failure on Windows: job/61983305389
> or is this used for linking remote-test-server cache with local-test-server cache.
Yes, it is. There is a cache folder checked into the repository here. Because cache lookups depend on the configured server, the cached files are not found if the server is configured to be localhost:8000; this symlink solves that.
I suppose we can update the cache directory resolution to drop the port number (and update the symlink file accordingly); then it should continue to work while Windows PCs can still check out the repository.
Update: done.
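To illustrate what the symlink achieves (the paths here are hypothetical; the actual checked-in cache layout and symlink target may differ), lookups for the local server resolve under a `localhost` directory, and a relative symlink redirects them to the checked-in `org/openml/test` cache:

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Checked-in cache for the remote test server (hypothetical contents).
remote_cache = root / "org" / "openml" / "test"
(remote_cache / "tasks" / "1882").mkdir(parents=True)

# With the port dropped from cache-path resolution, the local server
# resolves to a "localhost" directory; a relative symlink points it at
# the checked-in cache so the same files are found.
(root / "localhost").symlink_to(Path("org") / "openml" / "test")

print((root / "localhost" / "tasks" / "1882").exists())  # True
```

Note that creating symlinks like this requires a filesystem and OS that support them, which is exactly why the `:` in the previous `localhost:8080` target broke checkouts on Windows.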
Thanks for the update.
How safe is the symlink here? I was getting errors like "could not update file in this path", but I'll have a look again.
```diff
-TEST_SERVER_URL = "https://test.openml.org"
+TEST_SERVER_URL = "http://localhost:8000"
```
unintentional mistake here:

```diff
-TEST_SERVER_URL = "https://test.openml.org"
-TEST_SERVER_URL = "http://localhost:8000"
+TEST_SERVER_URL = "https://test.openml.org"
```
```diff
-reversed_url_suffix = os.sep.join(url_suffix.split(".")[::-1])  # noqa: PTH118
+url_parts = url_suffix.split(".")[::-1]
+url_parts_no_port = [part.split(":")[0] for part in url_parts]
+reversed_url_suffix = os.sep.join(url_parts_no_port)  # noqa: PTH118
```
This ignores the port completely; as a result, APIs running on different ports will lead to the same path, e.g. a php-api on port 8080 and a python-api on port 8082 would both resolve to ~/.cache/openml/localhost, which should not really be the case.
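The collision can be traced by running the quoted resolution logic in isolation (a standalone re-implementation of just this snippet, not the full openml-python cache code):

```python
import os


def cache_subdir(url_suffix: str) -> str:
    # Reverse the dot-separated host parts and strip any :port,
    # mirroring the snippet under discussion.
    url_parts = url_suffix.split(".")[::-1]
    url_parts_no_port = [part.split(":")[0] for part in url_parts]
    return os.sep.join(url_parts_no_port)


print(cache_subdir("test.openml.org"))  # org/openml/test (on POSIX)
print(cache_subdir("localhost:8080"))   # localhost
print(cache_subdir("localhost:8082"))   # localhost -- same path, port lost
```

Two servers on different ports of the same host therefore share one cache directory, which is the concern raised above.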
Update the tests to allow connecting to a local test server instead of a remote one (requires openml/services#13).

Running the tests locally:

1. Locally start the services (as defined in "Update to function as out-of-the-box test server", services#13) using `docker compose --profile "rest-api" --profile "evaluation-engine" up -d`. Startup can take a few minutes, as currently the PHP container still builds the ES indices from scratch. I noticed that the `start_period` for some services isn't sufficient on my M1 Mac, possibly due to some containers requiring Rosetta to run, slowing things down. You can recognize this by the services reporting "Error" while the container remains running. To avoid this, you can either increase the `start_period` of the services (mostly Elasticsearch and the PHP API), or simply run the command again (those services are then already in a healthy state, and the services that depended on them can start successfully). The following containers should run: openml-test-database, openml-php-rest-api, openml-nginx, openml-evaluation-engine, openml-elasticsearch, openml-minio.
2. Update the `TEST_SERVER_URL` variable in `openml/config.py` to `"http://localhost:8000"`.
3. Run the tests (`python -m pytest -m "not production" tests`).

This PR builds off unmerged PR #1620.