Feb 25, 2026

Intel Arc Pro B50 & OpenVino

Part 2: Getting the Intel Arc Pro B50 to serve an LLM (Text Inference) via API

Part 1 is here: Intel Arc Pro B50 & Jellyfin

This article goes through how I set-up OpenVino and the journey to get there. It was not as simple as setting up Ollama with Nvidia GPUs so this guide will hopefully help anyone else with an Intel GPU.

Setting up & deploying OpenVINO via Docker

Continuing from the previous Article, I already updated the Ubuntu Docker host's drivers and validated the GPU device's Group:

I had to of course stop Ollama (not that it could start-up) but saved the configuration just incase I wanted to rollback to my Nvidia GPU.

I followed the OpenVINO documentation and found that whilst running Docker commands worked, I found that converting the Docker command to a Docker Compose .yaml made my life a lot easier.

I was required to create the OpenVino models folder, e.g. I have mine in a specific directory /home/docker/openvino/models
I was then required to ensure the permissions to this directory was set correctly using the following two commands to update the
1. Add all permissions t sudo chmod 777 -R /home/docker/openvino/models

Assign user PUID:PGID to the directory sudo chown -R 1000:1000 /home/docker/openvino/models

I understand that using CHMOD 777 is a big no-no but I kept getting pull errors from OpenVino, I think you should probably start with CHMOD 755 instead

After that, I added the following compose.yaml using sudo vi compose.yaml in my respective directory.
1. I chose Qwen3-4B cause it has enough context for my requirements and fits onto the Intel Arc B50's VRAM
2. I'm using model_server:latest-gpu to ensure I can leverage GPU-based inference
3. model_repository_path is only required if you're pulling from
4. If you require a gated model (e.g. Llama), you'll need HF_Token=${HF_Token} in the environment: and .env file
5. My GPU's groupID is 993 you would need to update it to match yours.

services:
  openvino:
    image: openvino/model_server::2025.4.1-gpu
    user: "1000:1000"
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128
    group_add:
      - 993
    volumes:
      - /home/docker/openvino/models:/opt/models
    ports:
      - "8000:8000"
    environment:
      - LIBVA_DRIVER_NAME=iHD
    restart: unless-stopped
    command: >
      --source_model "OpenVINO/Qwen3-4B-int4-ov"
      --model_repository_path /opt/models/
      --model_name Qwen3-4B
      --rest_port 8000
      --target_device GPU
      --task text_generation

compose.yaml for OpenVINO

Open supports the following AI Models: https://docs.openvino.ai/2025/documentation/compatibility-and-support/supported-models.html

After the file is saved, do a sudo docker compose pull && sudo docker compose up
You should see the logs for Docker Compose running the required pulls, if there's errors, you'll need to troubleshoot.
1. If there's permission errors, check the directory paths and ownership (e.g. CHOWN / CHMOD)
2. If there's errors pulling, make sure you specific the docker volume bind mounts and model_repository_path correctly
Once the server is up and running, I recommend leaving the Docker Compose watch window and opening another terminal to test.

Openvino with logs to confirm it can Serve models

Testing OpenVino API endpoints

Since OpenVino seems like it is working, lets test the API endpoints locally before we add those APIs to an application like Karakeep.

To check the available model, run this command from your Docker Host (or remotely by replacing 'locahost' with the IP address)
curl http://localhost:8000/v1/config

This should be the response:

{
"Qwen3-4B" :
{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": "OK"
   }
  }
 ]
}
}%

If the model was not loaded yet, I was getting a blank {}% response

If this the model is available, you can now test it using a test text generation

curl http://10.0.1.15:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-4B",
    "messages": [{"role": "user", "content": "hello"}]
  }'

The response should look like this:

If you get no response, check the container's logs. It will tell you if the context cache is maxing out or just plain crashing. If it is crashing, it might be worth finding a new model.

You should be good to start testing with your other applications that use OpenAI API endpoints.

Updating Karakeep to use OpenVino

Now that testing is successful, we can now update the API endpoints in other other applications. Note that OpenVino uses /v3 instead of /v1 in its OpenAI API endpoint.

I updated my Karakeep compose.yaml which was originally configured for Ollama.
1. Add the following environment variables:
  1. OPENAI_BASE_URL=http://INSERT-YOUR-IP:8000/v3
  2. OPENAI_API_KEY=notused-this-is-required-can
  3. INFERENCE_TEXT_MODEL=Qwen3-4B

Karakeep requires an OPENAI_API_KEY to be set even if API auth is disabled in OpenVINO.

My Karakeep file now looks like this, the .env file has the above in it alongside other variables. Remember to update the IP Address in OPENAI_BASE_URL

services:
  kara-web:
    image: ghcr.io/karakeep-app/karakeep:latest
    restart: unless-stopped
    ports:
      - 3000:3000
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://kara-meilisearch:7700
      BROWSER_WEB_URL: http://kara-chrome:9222
      DATA_DIR: /data
      KARAKEEP_VERSION: release
      NEXTAUTH_URL: ${NEXTAUTH_URL}
      NEXTAUTH_SECRET: ${NEXTAUTH_SECRET}
      MEILI_MASTER_KEY: ${MEILI_MASTER_KEY}
      OPENAI_BASE_URL: http://localhost:8000/v3
      OPENAI_API_KEY: unused
      INFERENCE_TEXT_MODEL=Qwen3-4B
    volumes:
      - ./data:/data
  kara-chrome:
    image: gcr.io/zenika-hub/alpine-chrome:124
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  kara-meilisearch:
    image: getmeili/meilisearch:v1.13.3
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - ./meilisearch:/meili_data
networks: {}

My Karakeep compose.yaml file as an example. What is not shown is the .env file

I tested Karakeep's AI Summarisation and kept the <Thinking> tags but I'm fine with that, I only use Karakeep's AI Tagging.

Output.. its got thinking tags. Probably could use a non-thinking model for completion.

Logs generated during AI summarisation:

Cache usage with Qwen3:4b. After it finishes it the scheduled requests should be 0

I added some bookmarks and checked the AI Tagging, AI tagging can be more intensive as it sends all existing ai-tags

Conclusion

Getting OpenVino working was not straight forward for me, despite having working drivers, their documentation is definitely not as clear-cut as many other providers.

They had a lack of consistency in configurations (PORT vs REST_PORT), naming conventions and just general user guides made things quite unclear.

After 2 days of trial and error, I eventually found a model that did not crash despite my replacement GPU having more VRAM than its incumbent (16GB vs 8GB), there clearly is still a lot more to go with Intel and GPU-based hardware acceleration if they want others to use it.

I'm otherwise happy now that it is working, the Karakeep AI Tagging & AI summarisation is actually faster on the 'Intel Arc Pro B50 with Qwen3-4B' than the 'Nvidia RTX 3070Ti with gemma3:4b'

For now my software side of this SFF-downsize is done but I still need to replace the SATA 2.5" drives with some M2 NVMe drives since they're hanging mid-air from the case.

I've published my findings also on Karakeep's Github Discussions