Go package defining a common interface for generating text and image embeddings.
godoc is currently incomplete.
This is a simple abstraction library, written in Go, around a variety of services that produce vector embeddings. There are many such libraries and this one is ours. It tries to be the "simplest, dumbest" thing for the most common operations and data needs. These ideas are encapsulated in the EmbeddingsRequest and EmbeddingsResponse types.
type EmbeddingsRequest struct {
Id string `json:"id,omitempty"`
Model string `json:"model"`
Body []byte `json:"body"`
}
type EmbeddingsResponse[T Float] interface {
Id() string
Model() string
Embeddings() []T
Dimensions() int32
Precision() string
Created() int64
}
The default implementation of the EmbeddingsResponse interface is the CommonEmbeddingsResponse type:
type CommonEmbeddingsResponse[T Float] struct {
EmbeddingsResponse[T] `json:",omitempty"`
CommonId string `json:"id,omitempty"`
CommonEmbeddings []T `json:"embeddings"`
CommonModel string `json:"model"`
CommonCreated int64 `json:"created"`
CommonPrecision string `json:"precision"`
}
While not specific to SFO Museum, this package is targeted at the kinds of things SFO Museum needs today, which means it may be lacking features you need or want.
To account for the fact that most embeddings models still return float32 vector data, but an increasing number of models return float64 vectors, this package wraps both options in a Float interface.
type Float interface{ ~float32 | ~float64 }
That Float is then used as a generic value (for embeddings) in a common EmbeddingsResponse interface:
type EmbeddingsResponse[T Float] interface {
Id() string
Model() string
Embeddings() []T
Dimensions() int32
Precision() string
Created() int64
}
That interface is then used as the return value for an Embedder interface:
type Embedder[T Float] interface {
TextEmbeddings(context.Context, *EmbeddingsRequest) (EmbeddingsResponse[T], error)
ImageEmbeddings(context.Context, *EmbeddingsRequest) (EmbeddingsResponse[T], error)
}
This means that you need to specify the float type you want the interface to return when you instantiate that interface. For example:
ctx := context.Background()
uri32 := "ollama://?model=embeddinggemma"
uri64 := "encoderfile://"
cl32, _ := embeddings.NewEmbedder[float32](ctx, uri32)
cl64, _ := embeddings.NewEmbedder[float64](ctx, uri64)
There are also handy NewEmbedder32 and NewEmbedder64 methods which are little more than syntactic sugar. For example:
ctx := context.Background()
uri32 := "ollama://?model=embeddinggemma"
uri64 := "encoderfile://"
cl32, _ := embeddings.NewEmbedder32(ctx, uri32)
cl64, _ := embeddings.NewEmbedder64(ctx, uri64)
The NewEmbedder, NewEmbedder32 and NewEmbedder64 functions all have the same signature: a context.Context instance and a URI string used to configure and instantiate the underlying embeddings provider implementation. These are discussed in detail below.
Both the TextEmbeddings and ImageEmbeddings methods take the same input, an EmbeddingsRequest struct:
type EmbeddingsRequest struct {
Id string `json:"id,omitempty"`
Model string `json:"model"`
Body []byte `json:"body"`
}
As mentioned both methods return an EmbeddingsResponse[T] instance. The default implementation of the EmbeddingsResponse[T] interface used by this package is the CommonEmbeddingsResponse type. See response.go for details.
Error handling omitted for the sake of brevity.
package main

import (
"context"
"encoding/json"
"os"
"github.com/sfomuseum/go-embeddings"
)
func main() {
ctx := context.Background()
emb, _ := embeddings.NewEmbedder32(ctx, "ollama://?model=embeddinggemma")
req := &embeddings.EmbeddingsRequest{
Body: []byte("Hello world"),
}
rsp, _ := emb.TextEmbeddings(ctx, req)
enc := json.NewEncoder(os.Stdout)
enc.Encode(rsp)
}
Which would return the following:
{
"embeddings": [
-0.21400317549705505,
0.02651195414364338,
... more embeddings
-0.04678588733077049,
-0.042774248868227005
],
"model": "ollama/embeddinggemma",
"created": 1771985811,
"precision": "float32"
}
The convention for precision values is a string, for example "float32". Typically an embeddings service will return vector embeddings with a single precision, but the Embedder interface allows you to derive embeddings as either float32 or float64 values. In order to preserve the original precision information, if embeddings are requested in a precision other than the one generated by a service, the requested precision will be appended to the original value.
For example, if you request float64 values from a service that returns float32 values those data will be recast and the precision string will be updated to read "float32#as-float64".
Derive vector embeddings from an instance of the Mozilla encoderfile application, running as an HTTP server.
encoderfile://?{PARAMETERS}
| Name | Value | Required | Notes |
|---|---|---|---|
| client-uri | string | no | The URI for the encoderfile HTTP server endpoint. Default is http://localhost:8080. The gRPC server endpoint provided by encoderfile is not supported yet. |
Derive vector embeddings from an instance of the Mozilla llamafile application. Note that newer versions of llamafile no longer expose an interface for deriving embeddings, so this implementation will only work with older builds. See the encoderfile:// implementation for an alternative.
llamafile://?{PARAMETERS}
| Name | Value | Required | Notes |
|---|---|---|---|
| client-uri | string | no | The URI for the llamafile HTTP server endpoint. Default is http://localhost:8080. |
Derive vector embeddings from a Python script using the harperreed/mlx_clip library which emits JSON-encoded embeddings to STDOUT.
This option requires a device using an Apple Silicon chip and involves a non-trivial manual set-up process, discussed below.
mlxclip://{PATH_TO_EMBEDDINGS_DOT_PY}?{PARAMETERS}
Valid query parameters are:
| Name | Value | Required | Notes |
|---|---|---|---|
| python | string | no | The path to the Python runtime to use. For example one created by a Python virtual environment. |
As of this writing I am not sure I have working set-up instructions. Specifically, you want something like the included code in the mlxclip_py.txt file, which in turn loads the mlx_clip library. Nothing fancy, but since first getting this to work something has changed (?) that prevents Python from importing the mlx_clip package. This remains to be resolved.
Derive vector embeddings from the MobileCLIP models exposed via an instance of the sfomuseum/swift-mobileclip gRPC endpoint.
mobileclip://?{PARAMETERS}
| Name | Value | Required | Notes |
|---|---|---|---|
| client-uri | string | yes | The URI for the swift-mobileclip gRPC server endpoint. Default is grpc://localhost:8080. |
- https://github.com/apple/ml-mobileclip
- https://github.com/sfomuseum/swift-mobileclip
- https://github.com/sfomuseum/go-mobileclip
Derive null (empty) vector embeddings. This is a "placeholder" implementation that will always return a zero-length list of embeddings.
null://
Derive vector embeddings from an instance of the Ollama application.
ollama://?{PARAMETERS}
| Name | Value | Required | Notes |
|---|---|---|---|
| client-uri | string | no | Default is http://localhost:11434. |
| model | string | yes | The name of the model to use for generating embeddings. |
Derive vector embeddings from a web service exposing the OpenCLIP model and library.
This option involves a non-trivial manual set-up process, discussed below.
openclip://?{PARAMETERS}
| Name | Value | Required | Notes |
|---|---|---|---|
| client-uri | string | no | The URI of the HTTP endpoint exposing the OpenCLIP model functionality. Default is http://localhost:5000. |
Using this implementation requires running an HTTP service exposing the OpenCLIP functionality. The easiest way to do that is in a Python "virtual environment" configured as follows:
$> python -m venv openclip
$> cd openclip/
$> source bin/activate
$> bin/pip install flask
$> bin/pip install open_clip_torch
$> bin/pip install Pillow
Then, copy the included code in openclip_server.txt into a file called openclip_server.py and launch it as follows:
$> bin/flask --app openclip_server run
* Serving Flask app 'openclip_server'
* Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
Derive vector embeddings from a Python script using the Google SigLIP (2) models.
Set up is not yet automated so you'll need to do something like this:
$> cd /usr/local/src
$> python -m venv siglip
$> cd siglip/
$> source bin/activate
$> bin/pip install torch transformers pillow protobuf SentencePiece Flask
If you want to derive SigLIP embeddings from a simple command line tool, copy the included code in siglip_py.txt into a file called embeddings.py (or whatever you choose). Putting it all together, the URI to create a new Embedder instance would be:
siglip://{OPTIONAL_HOST}{PATH_TO_EMBEDDINGS_DOT_PY}?{PARAMETERS}
Valid query parameters are:
| Name | Value | Required | Notes |
|---|---|---|---|
| model | string | yes | The HuggingFace checkpoint URI of the model to use. For example "google/siglip-so400m-patch14-384" |
| python | string | no | The path to the Python runtime to use. For example one created by a Python virtual environment. |
For example:
siglip:///usr/local/src/siglip/embeddings.py?model=google/siglip-base-patch16-224&python=/usr/local/src/siglip/bin/python
Note how the Python runtime created in the virtual environment is specified in the ?python= query parameter.
And then, putting it all together with the bin/embeddings tool described above:
$> echo "Hello world" | ./bin/embeddings -client-uri 'siglip://venv/usr/local/src/siglip/embeddings.py?model=google/siglip-base-patch16-224&python=/usr/local/src/siglip/bin/python' text -
{"embeddings":[0.010030805,-0.02573614,0.029724538,... and so on
If you want to derive SigLIP embeddings from a long-running server instance, copy the included code in siglip_server_py.txt into a file called embeddings_server.py (or whatever you choose). This is a simple Flask application which can be launched as follows:
$> ./bin/flask --app embeddings_server run
Loading weights: 100%
* Serving Flask app 'embeddings_server'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
Note: As of this writing the included server code only supports a single SigLIP model. The default value is google/siglip-base-patch16-224. If you want to use a different model you will need to change it manually.
The URI to create a new Embedder instance with this server would be:
siglip-client://?{PARAMETERS}
Valid parameters are:
| Name | Value | Required | Notes |
|---|---|---|---|
| client-uri | string | no | The URI of the HTTP endpoint exposing the SigLIP model functionality. Default is http://localhost:5000. |
For example:
$> ./bin/embeddings -client-uri 'siglip-client://' image test.png
{"embeddings":[-0.017064538,0.00726526,-0.0042089703 ... and so on
- https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/README_siglip2.md
- https://huggingface.co/google/siglip-base-patch16-224
- https://huggingface.co/google/siglip-so400m-patch14-384
Because so many of the implementations above depend on the availability of external, third-party services, their tests require Go build tags to run. They are:
| Implementation | Build tag |
|---|---|
| encoderfile:// | encoderfile |
| llamafile:// | llamafile |
| mlxclip:// | mlxclip |
| mobileclip:// | mobileclip |
| ollama:// | ollama |
| openclip:// | openclip |
| siglip:// | siglip |