Inference API v2 - design docs kick-off #2277

PawelPeczek-Roboflow wants to merge 1 commit into main
Conversation
| * `GET /v2/models` - discover loaded models | ||
| * `DELETE /v2/models` - unload all models | ||
| * `POST /v2/models/load` - load given model | ||
| * `POST /v2/models/unload` - unload given model |
| * `POST /v2/models/unload` - unload given model | |
| * `DELETE /v2/models/unload` - unload given model |
| * `POST /v2/models/infer` - predict from model | ||
| * `GET /v2/models/interface` - discover model interface | ||
| * `GET /v2/models/compatibility` - discover models compatible with current server configuration |
If `GET /v2/models` means "discover loaded models", then `GET /v2/models/compatibility` seems confusing, as it doesn't operate on the loaded models but, as I understand, returns a broader list. Maybe:

`GET /v2/models?state=loaded`
`GET /v2/models?state=compatible`
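If we went the query-param route, the client side could look like the sketch below (the `state` values `loaded`/`compatible` are just this comment's suggestion, and the helper name is hypothetical):

```python
# Build the discovery URL for the proposed `state` query param.
# Hypothetical helper - endpoint shape follows this comment's suggestion.
from urllib.parse import urlencode

def models_discovery_url(base_url: str, state: str) -> str:
    if state not in {"loaded", "compatible"}:
        raise ValueError(f"unsupported state: {state!r}")
    return f"{base_url}/v2/models?{urlencode({'state': state})}"

# e.g. models_discovery_url("http://localhost:9001", "loaded")
# -> "http://localhost:9001/v2/models?state=loaded"
```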
| ## Models endpoints | ||
| * `POST /v2/models/infer` - predict from model |
One broader design question: don't we see any value in separating model management endpoints from prediction endpoints? Similar to TorchServe, where the two live on different ports. This would probably only make sense in a self-hosted environment, where a company admin manages model loading and unloading and we have a flag like `SMART_MODEL_MANAGEMENT_ON_PREDICT=false` so that the model manager doesn't decide on loading/unloading models on predict requests.
| curl -X POST https://serverless.roboflow.com/v2/models/infer \ | ||
| --data-urlencode 'model_id=whatever/model-id/we?can?figure-out' \ | ||
| -F "image=@photo.jpg;type=image/jpeg" \ | ||
| -F 'inputs={"confidence": 0.5};type=application/json' |
This would probably also map nicely to params that can be strictly assigned to elements of the batch, assuming order needs to be kept:

```shell
-F 'inputs=[{"confidence": 0.5}, {"confidence": 0.4}];type=application/json' \
-F "image=@photo-1.jpg" \
-F "image=@photo-2.jpg"
```

Mixing scalar and batch:

```shell
-F 'inputs=[{"confidence": 0.5, "fuse_nms": true}, {"confidence": 0.4, "fuse_nms": true}];type=application/json' \
-F "image=@photo-1.jpg" \
-F "image=@photo-2.jpg"
```

Alternatively:

```shell
-F 'inputs=[{"confidence": 0.5}, {"confidence": 0.4}];type=application/json' \
-F 'defaults={"fuse_nms": true};type=application/json' \
-F "image=@photo-1.jpg" \
-F "image=@photo-2.jpg"
```

So we don't duplicate, but also separate batch inputs from scalars.
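If we went with a `defaults` part, the server-side merge could be as simple as the sketch below (the function name and the override order, with per-item values winning over defaults, are assumptions):

```python
# Merge the shared `defaults` part into each element of the `inputs`
# batch; per-item values override defaults. Sketch only - field names
# follow the multipart example above, merge semantics are an assumption.
def resolve_batch_inputs(inputs: list[dict], defaults: dict) -> list[dict]:
    return [{**defaults, **item} for item in inputs]

# resolve_batch_inputs([{"confidence": 0.5}, {"confidence": 0.4}],
#                      {"fuse_nms": True})
# -> [{"fuse_nms": True, "confidence": 0.5},
#     {"fuse_nms": True, "confidence": 0.4}]
```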
| ``` | ||
| > [!IMPORTANT] | ||
| > Since we have `inference-models` and one model may have multiple model-packages, `model_package_id` is a natural candidate for a structured query param - letting clients specify which exact model package they want, altering the default auto-loading choice. We can also decide that certain parameters of auto-loading should be possible to pass (although we need to decide on that relatively fast due to engineering work in progress and the impact on the model manager). |
Probably all relevant parameters - not sure if choosing only certain parameters makes sense. If the client is advanced enough to decide on those, they probably want the option of full control.
| ``` | ||
| > [!IMPORTANT] | ||
| > There is a security issue **embedded in accepting URLs as inputs - especially on the platform.** We have accepted the risk of being the middleman in a DDoS attack so far, and likely that will remain the case in the future (for user convenience), but it would be good for all parties involved in the discussion to recognize and acknowledge this risk - to avoid surprises in the future. |
I see we have the `ALLOW_URL_INPUT_WITHOUT_FQDN` and `ALLOW_NON_HTTPS_URL_INPUT` vars. This, plus timeouts and image size checks, is IMO OK.
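For reference, a minimal sketch of what those checks could look like. The env-var names come from this comment; the validation logic itself is an illustrative assumption, not inference's actual implementation:

```python
# Minimal URL-input policy check driven by the env vars mentioned above.
# Sketch only - the real server's logic may differ.
import os
from urllib.parse import urlparse

ALLOW_URL_INPUT_WITHOUT_FQDN = os.getenv("ALLOW_URL_INPUT_WITHOUT_FQDN", "False") == "True"
ALLOW_NON_HTTPS_URL_INPUT = os.getenv("ALLOW_NON_HTTPS_URL_INPUT", "False") == "True"

def validate_input_url(url: str) -> None:
    """Raise ValueError if the URL input violates server policy."""
    parsed = urlparse(url)
    if parsed.scheme != "https" and not ALLOW_NON_HTTPS_URL_INPUT:
        raise ValueError("non-HTTPS URL inputs are disabled")
    hostname = parsed.hostname or ""
    if "." not in hostname and not ALLOW_URL_INPUT_WITHOUT_FQDN:
        raise ValueError("URL inputs without a FQDN are disabled")
```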
| { | ||
| "type": "roboflow-classification-compact-v1", | ||
| "class_names": ["cat", "dog"], | ||
| "confidences": [0.6, 0.4], |
IMO `top_class` is a little bit misleading - one could assume that the top class is the highest-confidence class, irrespective of the confidence threshold. I would prefer to have the prediction part split from the decision more clearly:

```json
{
  "type": "roboflow-classification-compact-v1",
  "class_names": ["cat", "dog", "mouse"],
  "confidences": [0.3, 0.4, 0.3],
  "top_class_id": 1,
  "confidence_threshold": 0.5,
  "predicted_class_ids": []
}
```

Returning the `confidence_threshold`, given that we use default values, is IMO needed.

We can omit the `top_class_*` fields, assuming that if users are advanced enough to decide to use a value below the specified threshold, they are also savvy enough to take the argmax or do an ordering ;)
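The prediction/decision split above could be derived like this (a sketch; `top_class_id` and `predicted_class_ids` are the field names proposed in this comment, not an existing schema):

```python
# Derive the decision fields from raw confidences. Sketch only:
# `top_class_id` / `predicted_class_ids` follow this comment's proposal,
# and ties are resolved by the lowest index.
def classification_decision(confidences: list[float], threshold: float) -> dict:
    top_class_id = max(range(len(confidences)), key=confidences.__getitem__)
    return {
        "top_class_id": top_class_id,
        "confidence_threshold": threshold,
        "predicted_class_ids": [i for i, c in enumerate(confidences) if c >= threshold],
    }

# classification_decision([0.3, 0.4, 0.3], 0.5)
# -> {"top_class_id": 1, "confidence_threshold": 0.5, "predicted_class_ids": []}
```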
| "top": [{ "class_name": "cat", "class_id": 1, "confidence": 0.92 }] | ||
| } | ||
| ``` | ||
| _Bonus question - how should filtering work here - just discard **top**, or alter **candidates**?_ |
Filter the decision, leave the prediction. `candidates` should always be present, as this is what the model returned.
| "type": "roboflow-classification-compact-v1", | ||
| "class_names": ["cat", "dog"], | ||
| "confidences": [0.6, 0.4], | ||
| "detected_classes_ids": [0] |
`detected` sounds like this was object detection - IMO `predicted`, plus `confidence_threshold`, and we are aligned with the single-class (multi-class) case.
dkosowski87 left a comment:

3/3 without workflows - need to think about those separately.
| } | ||
| ``` | ||
| Proposed **_rich_** representation: |
What should the role of the rich representation be, in your opinion?
| [0, 1, 2, 3], | ||
| [0, 1, 2, 3] | ||
| ], | ||
| "class_id": [0, 1], |
`class_ids`, `confidences` - let's keep these plural, as in the classification responses.
| "class_id": [0, 1], | ||
| "confidence": [0.33, 0.64], | ||
| "tracker_id": [0, 1], | ||
| } |
This is a slippery slope, but since I recommended including the `confidence_threshold` in the classification response, here, to be consistent, we would probably need to provide:

```json
"confidence_threshold": 0.3,
"iou_threshold": 0.7,
"max_detections": 100
```

Now I'm thinking that this could be included in the rich representation. So the compact one gives only the minimum info, while the rich representation carries information about what directly changed the output prediction - useful for debugging and interpretation.
| ], | ||
| "class_id": [0, 1], | ||
| "confidence": [0.33, 0.64], | ||
| "tracker_id": [0, 1], |
Would `tracker_id` be optional?
| ```json | ||
| { | ||
| "type": "roboflow-semantic-segmentation-compact-v1", | ||
| "pixels_scores": [[0.3, 0.4, ...]], # maybe due to the size, available on demand only? |
C×H×W - this might be huge, C times larger than responses from depth estimation. I see a use case where someone doing active learning needs these pixel scores to identify uncertain areas, but in that case a dedicated workflow would be more appropriate.
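A quick back-of-envelope on the size concern (illustrative numbers, assuming float32 scores):

```python
# Estimate the uncompressed size of a C x H x W per-pixel score tensor.
def pixel_scores_bytes(num_classes: int, height: int, width: int,
                       bytes_per_score: int = 4) -> int:
    return num_classes * height * width * bytes_per_score

# 20 classes at 1080p with float32 scores: ~166 MB per image,
# before any JSON encoding overhead.
size_mb = pixel_scores_bytes(20, 1080, 1920) / 1e6
```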