Update to function as out-of-the-box test server #13
Conversation
NGINX now also listens on port 8000 on the Docker network. This is an important step towards being able to start these `services` and have them function as a local test server for openml-python, among others.
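A quick way to check this once the stack is up (a sketch: the `all` profile name is taken from the compose snippet further down, and the root path is only a reachability check, not a specific API call):

```bash
# Start the services and confirm NGINX answers on port 8000
docker compose --profile all up -d
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8000/
```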
    # Update openml.expdb.dataset with the same url
    mysql -hdatabase -uroot -pok -e 'UPDATE openml_expdb.dataset DS, openml.file FL SET DS.url = FL.filepath WHERE DS.did = FL.id;'
These removed updates are now embedded in the state of the database on the new image
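If you want to verify that the URLs are indeed already populated in the new image, something along these lines should work (a sketch: the `database` service name and the root/ok credentials are taken from the removed script above, and the mysql client is assumed to be available in that container):

```bash
# Spot-check that dataset URLs already point at the file paths
docker compose exec database \
  mysql -uroot -pok -e "SELECT DS.did, DS.url FROM openml_expdb.dataset DS JOIN openml.file FL ON DS.did = FL.id LIMIT 5;"
```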
    sed -i -E 's/^(::1\t)localhost (.*)$/\1\2/g' /etc/hosts.new
    cat /etc/hosts.new > /etc/hosts
    rm /etc/hosts.new
For other containers, updating /etc/hosts through configuration was sufficient.
For this one, the pre-existing /etc/hosts took precedence, so it needed to be updated.
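To see the effect, you can inspect the hosts file inside a patched container (a sketch: `php-api` is only an example service name here; any container whose /etc/hosts was patched shows the same thing):

```bash
# localhost should now resolve to the NGINX address instead of ::1
docker compose exec php-api cat /etc/hosts
```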
| - "8000:8000" | ||
| networks: | ||
| default: | ||
| ipv4_address: 172.28.0.2 |
The static IP address is required so that we can add entries to the /etc/hosts file of other containers, so that they contact NGINX when they resolve localhost.
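For reference, roughly what that amounts to for an ad-hoc container on the same network (a sketch: the compose services presumably use `extra_hosts` for this; the `openml-services` network name is taken from the README snippet further down, and `alpine` is just a throwaway image to inspect the result; as noted above, for one container the pre-existing entry still took precedence, so this alone is not always enough):

```bash
# Add a hosts entry that points localhost at the NGINX container's static IP
docker run --rm --network openml-services \
  --add-host "localhost:172.28.0.2" \
  alpine cat /etc/hosts
```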
    @@ -1,4 +1,4 @@
    CONFIG=api_key=AD000000000000000000000000000000;server=http://php-api:80/
    CONFIG=api_key=abc;server=http://php-api:80/
I don't understand: here the api key is changed from AD000000000000000000000000000000 to abc ...
AD000000000000000000000000000000 was the api key in the old test database image, but this has been changed to abc to match the test server database.
The evaluation engine needs administrator access currently.
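If it helps to keep track of which key ends up where, a quick grep over the checked-in configs lists every occurrence (a sketch: the `config/` directory is inferred from the `config/python/config` path used in the README snippet below):

```bash
# List every configured key across the service configs
grep -RniE 'api_?key' config/
```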
    apikey=normaluser
    server=http://localhost:8000/api/v1/xml
... and here the api key is changed from AD000000000000000000000000000000 to normaluser.

So far, these were the keys for developers:

- php-api (v1) test-server: normaluser
- php-api (v1) local-server: AD000000000000000000000000000000

Has anything changed here? Also, what are the api keys for python-api (v2), now that it will also be added to the services with a frozen Docker image?
This configuration is just for when you spin up an openml-python container to use the Python API. It does not need administrator access, so I changed the key to normaluser, which is a normal read-write account.
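For completeness, this is what that mounted file boils down to (a sketch that just recreates it; the values come from the diff above and the path from the `docker run` in the README snippet below, so the file should already exist in the repo and this is only illustrative):

```bash
# Recreate the read-only config that the openml-python container mounts
mkdir -p config/python
cat > config/python/config <<'EOF'
apikey=normaluser
server=http://localhost:8000/api/v1/xml
EOF
```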
The Python-based REST API uses the keys that are in the database. The server is unaffected, but I will need to update the keys that are used in its tests.
josvandervelde left a comment:
Looking good! I encountered some problems when using Python to connect to the locally running containers.
    minio:
      profiles: ["all", "minio", "evaluation-engine"]
      image: openml/test-minio:v0.1.20241110
      image: openml/test-minio:v0.1.20260204
This minio contains most parquet files out of the box, but not all!

    bash-5.1# ls /data/datasets/0000/0001
    dataset_1.pq  phpFsFYVN
    bash-5.1# ls /data/datasets/0000/0128
    iris.arff

This is probably a mistake?

Also, it contains some weird files:

    bash-5.1# ls /data/datasets/0000
    0000  '0000?C=S;O=A'  '0000?C=D;O=A'  '0000?C=M;O=A'  '0000?C=N;O=D'  ...
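A quick way to list which dataset directories are missing their parquet file (a sketch, assuming a POSIX shell inside the minio container and the /data/datasets layout shown above). The `0000?C=S;O=A`-style names look like Apache directory-listing sort links that got saved while mirroring, so they are probably safe to delete:

```bash
# Report dataset directories that contain no .pq file
docker compose exec minio sh -c '
  for d in /data/datasets/*/*/; do
    ls "$d"*.pq >/dev/null 2>&1 || echo "no parquet in: $d"
  done
'
```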
    ## Prerequisites
    - Linux/MacOS with Intell processor (because of our old ES version, this project currently does not support `arm` architectures)
    - Linux/MacOS (For Mac with `arm` architectures, enable Rosetta for emulation. QEMU and Docker VMM do not work with the elastic search image)

    You can run the openml-python code on your own local server now!

    ```bash
    docker run --rm -it -v ./config/python/config:/root/.config/openml/config:ro --network openml-services openml/openml-python
This doesn't work anymore when using the new config/python/config, where localhost is used instead of nginx.
I see two options: overwriting the /etc/hosts of this container, or just using --network host. I used the latter.
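Concretely, the host-networking variant would be (a sketch of that second option; it is the same command as in the README snippet above, with only the network flag changed):

```bash
# Use host networking so "localhost" in the config reaches NGINX on port 8000
docker run --rm -it \
  -v ./config/python/config:/root/.config/openml/config:ro \
  --network host \
  openml/openml-python
```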
    @@ -0,0 +1,135 @@
    #!/bin/bash
    # This test assumes services are running locally:
Nice! All pass for me as well.
    my_task = openml.tasks.get_task(my_task.task_id)
    from sklearn import compose, ensemble, impute, neighbors, preprocessing, pipeline, tree
    clf = tree.DecisionTreeClassifier()
    run = openml.runs.run_model_on_task(clf, my_task)
I get errors here:

    OSError: Repetition level histogram size mismatch on
    Traceback (most recent call last):
      File "/openml/openml/datasets/dataset.py", line 593, in _parse_data_from_pq
        data = pd.read_parquet(data_file)

It seems to have something to do with the pyarrow version in openml-python. Maybe unrelated to this PR, but I haven't seen these problems before. Do you see these problems as well?
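One quick thing to compare between environments (a sketch; run it in whatever environment executes openml-python, e.g. inside the container or a local virtualenv):

```bash
# Print the parquet-related library versions in play
python -c "import pyarrow, pandas; print('pyarrow', pyarrow.__version__, '| pandas', pandas.__version__)"
```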
Updating routing and data of the images to allow an out-of-the-box test server on a local machine.
Currently, the updated configuration allows running the openml-python unit tests that require the test server (see openml/openml-python#1630).
I still have to cross-check that I didn't break other functionality in the process.
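For a quick end-to-end check against the local stack, something like this should be enough (a sketch: it assumes the services are up, openml-python is installed locally, and the server/key values from config/python/config shown above):

```bash
# Point openml-python at the local test server and list a few datasets
python -c "
import openml
openml.config.server = 'http://localhost:8000/api/v1/xml'
openml.config.apikey = 'normaluser'
print(openml.datasets.list_datasets(output_format='dataframe').head())
"
```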