
Hi, hello. How do I skip API validation and go straight to my locally deployed Qwen2-VL model? #381

Open
Mrguanglei opened this issue Feb 25, 2025 · 4 comments

Comments

@Mrguanglei

No description provided.

@dillonalaird
Member

Hello @Mrguanglei, I don't understand the question. Could you rephrase it and maybe provide an example of what you're trying to do?

@ld-xy

ld-xy commented Feb 28, 2025

> Hello @Mrguanglei, I don't understand the question. Could you rephrase it and maybe provide an example of what you are trying to do?

Hello, I don't want to use OpenAI's model; I want to use a locally deployed Qwen-VL model. How can I use it in this project? Could you provide a demo for reference?

@Mrguanglei
Author

@dillonalaird Here you use Ollama to call other models. I want to serve the local Qwen-VL-72B model with vLLM instead, but your config does not seem to support that. I used vLLM to expose the local model as an OpenAI-compatible API. However, the code hit many errors while running, even though the model itself was able to reply.
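For context, a minimal sketch of the request shape a vLLM OpenAI-compatible endpoint expects for a vision model. The endpoint URL, placeholder API key, and model name below are assumptions; the model name must match whatever `vllm serve` was launched with.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a data URL, the inline-image form the
    # OpenAI chat-completions schema (and vLLM's compatible server) accepts.
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def vision_message(prompt: str, image_bytes: bytes) -> dict:
    # One user turn mixing text and an image in the content-parts format.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
        ],
    }


# Against a local vLLM server (URL and model name are assumptions):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# client.chat.completions.create(
#     model="Qwen/Qwen2-VL-72B-Instruct",
#     messages=[vision_message("How many people are in this image?", img_bytes)],
# )
```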

(vision-agent) ubuntu@ubuntu-SYS-4028GR-TR:/apps/llms/vision-agent$ python generate.py
Step 10: None
Code to Execute:
============================== Code ==============================
image = load_image("OIP-C.jpg")
claude35_vqa('Please look at this image and count the number of people in it.', [image])
suggestion('How can I find the number of people in this image?', [image])
get_tool_for_task('Count the number of people in this image', image)
Code Execution Output (5.13s): ----- stdout -----

----- stderr -----

----- Error -----
Traceback (most recent call last):
  File "/apps/llms/vision-agent/vision_agent/utils/execute.py", line 651, in exec_cell
    self.nb_client.execute_cell(cell, len(self.nb.cells) - 1)
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/jupyter_core/utils/__init__.py", line 165, in wrapped
    return loop.run_until_complete(inner)
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/nbclient/client.py", line 1062, in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/nbclient/client.py", line 918, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:

import os
import numpy as np
import cv2
from typing import *
from vision_agent.tools import *
from vision_agent.tools.planner_tools import claude35_vqa, suggestion, get_tool_for_task
from pillow_heif import register_heif_opener
register_heif_opener()
import matplotlib.pyplot as plt

image = load_image("OIP-C.jpg")
claude35_vqa('Please look at this image and count the number of people in it.', [image])
suggestion('How can I find the number of people in this image?', [image])
get_tool_for_task('Count the number of people in this image', image)

----- stdout -----
[claude35_vqa output]
There are six people in the image.
[end of claude35_vqa output]


AttributeError                            Traceback (most recent call last)
Cell In[1], line 15
     13 claude35_vqa('Please look at this image and count the number of people in it.', [image])
     14 suggestion('How can I find the number of people in this image?', [image])
---> 15 get_tool_for_task('Count the number of people in this image', image)

File /apps/llms/vision-agent/vision_agent/tools/planner_tools.py:388, in get_tool_for_task(task, images, exclude_tools)
    383 with (
    384     tempfile.TemporaryDirectory() as tmpdirname,
    385     CodeInterpreterFactory.new_instance() as code_interpreter,
    386 ):
    387     image_paths = []
--> 388     for k in images.keys():
    389         for i, image in enumerate(images[k]):
    390             image_path = f"{tmpdirname}/{k}_{i}.png"

AttributeError: 'numpy.ndarray' object has no attribute 'keys'
^CTraceback (most recent call last):
  File "/apps/llms/vision-agent/generate.py", line 5, in <module>
code_context = agent.generate_code(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_coder_v2.py", line 404, in generate_code
plan_context = self.planner.generate_plan(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_planner_v2.py", line 561, in generate_plan
updated_chat = maybe_run_code(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_planner_v2.py", line 301, in maybe_run_code
execution, obs, code = execute_code_action(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_planner_v2.py", line 232, in execute_code_action
response = cast(str, model.chat([{"role": "user", "content": prompt}]))
File "/apps/llms/vision-agent/vision_agent/lmm/lmm.py", line 130, in chat
response = self.client.chat.completions.create(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_utils/_utils.py", line 279, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/resources/chat/completions/completions.py", line 879, in create
return self._post(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_base_client.py", line 1290, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_base_client.py", line 967, in request
return self._request(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_base_client.py", line 1003, in _request
response = self._client.send(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 926, in send
response = self._send_handling_auth(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 954, in _send_handling_auth
response = self._send_handling_redirects(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 991, in _send_handling_redirects
response = self._send_single_request(request)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 1027, in _send_single_request
response = transport.handle_request(request)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_transports/default.py", line 236, in handle_request
resp = self._pool.handle_request(req)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
data = self._network_stream.read(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_backends/sync.py", line 128, in read
return self._sock.recv(max_bytes)
KeyboardInterrupt
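The `AttributeError` in the log above comes from `get_tool_for_task` iterating `images.keys()` while a bare NumPy array was passed in. A sketch of a small adapter that coerces the argument into the mapping shape the loop in `planner_tools.py` expects; the helper name is invented here for illustration only.

```python
def as_image_mapping(images):
    # get_tool_for_task (per the traceback) loops over `images.keys()`,
    # so it expects a mapping of name -> list of images, not a bare array.
    if isinstance(images, dict):
        return images  # already the expected shape
    if isinstance(images, (list, tuple)):
        return {"image": list(images)}  # a list of images
    return {"image": [images]}  # a single image / ndarray


# e.g. get_tool_for_task('Count the number of people in this image',
#                        as_image_mapping(image))
```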

@Mrguanglei
Author

@dillonalaird This is the OpenAILMM class in the lmm module that I modified:

class OpenAILMM(LMM):
    r"""An LMM class for the OpenAI LMMs."""

    def __init__(
        self,
        model_name: str = "qwenVL2B",
        api_key: Optional[str] = None,
        max_tokens: int = 4096,
        json_mode: bool = False,
        image_size: int = 768,
        image_detail: str = "low",
        **kwargs: Any,
    ):
        # Point the client at the locally served vLLM endpoint; keep any
        # api_key that was given instead of discarding it.
        self.client = OpenAI(api_key=api_key, base_url="http://0.0.0.0:8000/v1")
        self.model_name = model_name
        self.image_size = image_size
        self.image_detail = image_detail
        # o1 does not use max_tokens
        if "max_tokens" not in kwargs and not (
            model_name.startswith("o1") or model_name.startswith("o3")
        ):
            kwargs["max_tokens"] = max_tokens
        if json_mode:
            kwargs["response_format"] = {"type": "json_object"}
        self.kwargs = kwargs

    def __call__(
        self,
        input: Union[str, Sequence[Message]],
        **kwargs: Any,
    ) -> Union[str, Iterator[Optional[str]]]:
        if isinstance(input, str):
            return self.generate(input, **kwargs)
        return self.chat(input, **kwargs)

    def chat(
        self,
        chat: Sequence[Message],
        **kwargs: Any,
    ) -> Union[str, Iterator[Optional[str]]]:
        """Chat with the LMM model.

        Parameters:
            chat (Sequence[Dict[str, str]]): A list of dictionaries containing the chat
                messages. The messages can be in the format:
                [{"role": "user", "content": "Hello!"}, ...]
                or if it contains media, it should be in the format:
                [{"role": "user", "content": "Hello!", "media": ["image1.jpg", ...]}, ...]
        """
        fixed_chat = []
        for c in chat:
            fixed_c = {"role": c["role"]}
            fixed_c["content"] = [{"type": "text", "text": c["content"]}]  # type: ignore
            if "media" in c:
                for media in c["media"]:
                    resize = kwargs["resize"] if "resize" in kwargs else self.image_size
                    image_detail = (
                        kwargs["image_detail"]
                        if "image_detail" in kwargs
                        else self.image_detail
                    )
                    encoded_media = encode_media(cast(str, media), resize=resize)

                    fixed_c["content"].append(  # type: ignore
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": (
                                    encoded_media
                                    if encoded_media.startswith(("http", "https"))
                                    or encoded_media.startswith("data:image/")
                                    else f"data:image/png;base64,{encoded_media}"
                                ),
                                "detail": image_detail,
                            },
                        },
                    )
            fixed_chat.append(fixed_c)

        # prefers kwargs from second dictionary over first
        tmp_kwargs = self.kwargs | kwargs
        response = self.client.chat.completions.create(
            model=self.model_name, messages=fixed_chat, **tmp_kwargs  # type: ignore
        )
        if "stream" in tmp_kwargs and tmp_kwargs["stream"]:

            def f() -> Iterator[Optional[str]]:
                for chunk in response:
                    chunk_message = chunk.choices[0].delta.content  # type: ignore
                    yield chunk_message

            return f()
        else:
            return cast(str, response.choices[0].message.content)

    def generate(
        self,
        prompt: str,
        media: Optional[Sequence[Union[str, Path]]] = None,
        **kwargs: Any,
    ) -> Union[str, Iterator[Optional[str]]]:
        message: List[Dict[str, Any]] = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        if media and len(media) > 0:
            for m in media:
                resize = kwargs["resize"] if "resize" in kwargs else None
                image_detail = (
                    kwargs["image_detail"]
                    if "image_detail" in kwargs
                    else self.image_detail
                )
                encoded_media = encode_media(m, resize=resize)
                message[0]["content"].append(
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": (
                                encoded_media
                                if encoded_media.startswith(("http", "https"))
                                or encoded_media.startswith("data:image/")
                                else f"data:image/png;base64,{encoded_media}"
                            ),
                            "detail": image_detail,
                        },
                    },
                )

        # prefers kwargs from second dictionary over first
        tmp_kwargs = self.kwargs | kwargs
        response = self.client.chat.completions.create(
            model=self.model_name, messages=message, **tmp_kwargs  # type: ignore
        )
        if "stream" in tmp_kwargs and tmp_kwargs["stream"]:

            def f() -> Iterator[Optional[str]]:
                for chunk in response:
                    chunk_message = chunk.choices[0].delta.content  # type: ignore
                    yield chunk_message

            return f()
        else:
            return cast(str, response.choices[0].message.content)
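A note on the `self.kwargs | kwargs` line in `chat` and `generate`: it relies on the Python 3.9+ dict union operator, where keys from the right-hand dict win on conflict. If the local server rejects OpenAI-only options such as `response_format`, filter out just those keys rather than blanking the whole dict, which would also silently drop `max_tokens` and `stream`. A small sketch (the helper name is invented for illustration):

```python
def merge_call_kwargs(defaults: dict, overrides: dict, unsupported=frozenset()) -> dict:
    # Right-hand operand of `|` wins on key conflicts (Python 3.9+ dict union),
    # then any options the local server rejects are filtered out.
    merged = defaults | overrides
    return {k: v for k, v in merged.items() if k not in unsupported}


merged = merge_call_kwargs(
    {"max_tokens": 4096, "temperature": 0.2, "response_format": {"type": "json_object"}},
    {"temperature": 0.7},
    unsupported={"response_format"},
)
# merged == {"max_tokens": 4096, "temperature": 0.7}
```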
