
Hi, hello. How do I skip API validation and go straight to my locally deployed Qwen2-VL model? #381

Open
Mrguanglei opened this issue Feb 25, 2025 · 4 comments

Comments

@Mrguanglei

No description provided.

@dillonalaird
Member

Hello @Mrguanglei, I don't understand the question. Could you rephrase it and maybe provide an example of what you're trying to do?

@ld-xy

ld-xy commented Feb 28, 2025

> Hello @Mrguanglei, I don't understand the question. Could you rephrase it and maybe provide an example of what you are trying to do?

Hello, I don't want to use OpenAI's model; I want to use a locally deployed Qwen-VL model. How can I use it in this project? Could you provide a demo for reference?

@Mrguanglei
Author

@dillonalaird Here you use Ollama to call other models. I want to serve the local Qwen-VL-72B model with vLLM instead, but your config does not seem to support that. I used vLLM to expose the local model as an OpenAI-compatible API. However, the code hit many errors while running, even though the model itself was able to reply.
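For context, a minimal sketch of the request shape a vLLM OpenAI-compatible endpoint expects for a vision model. The endpoint URL, placeholder API key, and model name below are assumptions; the model name must match whatever `vllm serve` was launched with.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a data URL, the inline-image form the
    # OpenAI chat-completions schema (and vLLM's compatible server) accepts.
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def vision_message(prompt: str, image_bytes: bytes) -> dict:
    # One user turn mixing text and an image in the content-parts format.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
        ],
    }


# Against a local vLLM server (URL and model name are assumptions):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# client.chat.completions.create(
#     model="Qwen/Qwen2-VL-72B-Instruct",
#     messages=[vision_message("How many people are in this image?", img_bytes)],
# )
```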

(vision-agent) ubuntu@ubuntu-SYS-4028GR-TR:/apps/llms/vision-agent$ python generate.py
Step 10: None
Code to Execute:
============================== Code ==============================
image = load_image("OIP-C.jpg")
claude35_vqa('Please look at this image and count the number of people in it.', [image])
suggestion('How can I find the number of people in this image?', [image])
get_tool_for_task('Count the number of people in this image', image)
Code Execution Output (5.13s): ----- stdout -----

----- stderr -----

----- Error -----
Traceback (most recent call last):
  File "/apps/llms/vision-agent/vision_agent/utils/execute.py", line 651, in exec_cell
    self.nb_client.execute_cell(cell, len(self.nb.cells) - 1)
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/jupyter_core/utils/__init__.py", line 165, in wrapped
    return loop.run_until_complete(inner)
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/nbclient/client.py", line 1062, in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
  File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/nbclient/client.py", line 918, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:

import os
import numpy as np
import cv2
from typing import *
from vision_agent.tools import *
from vision_agent.tools.planner_tools import claude35_vqa, suggestion, get_tool_for_task
from pillow_heif import register_heif_opener
register_heif_opener()
import matplotlib.pyplot as plt

image = load_image("OIP-C.jpg")
claude35_vqa('Please look at this image and count the number of people in it.', [image])
suggestion('How can I find the number of people in this image?', [image])
get_tool_for_task('Count the number of people in this image', image)

----- stdout -----
[claude35_vqa output]
There are six people in the image.
[end of claude35_vqa output]


AttributeError                            Traceback (most recent call last)
Cell In[1], line 15
     13 claude35_vqa('Please look at this image and count the number of people in it.', [image])
     14 suggestion('How can I find the number of people in this image?', [image])
---> 15 get_tool_for_task('Count the number of people in this image', image)

File /apps/llms/vision-agent/vision_agent/tools/planner_tools.py:388, in get_tool_for_task(task, images, exclude_tools)
    383 with (
    384     tempfile.TemporaryDirectory() as tmpdirname,
    385     CodeInterpreterFactory.new_instance() as code_interpreter,
    386 ):
    387     image_paths = []
--> 388     for k in images.keys():
    389         for i, image in enumerate(images[k]):
    390             image_path = f"{tmpdirname}/{k}_{i}.png"

AttributeError: 'numpy.ndarray' object has no attribute 'keys'
^CTraceback (most recent call last):
  File "/apps/llms/vision-agent/generate.py", line 5, in <module>
code_context = agent.generate_code(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_coder_v2.py", line 404, in generate_code
plan_context = self.planner.generate_plan(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_planner_v2.py", line 561, in generate_plan
updated_chat = maybe_run_code(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_planner_v2.py", line 301, in maybe_run_code
execution, obs, code = execute_code_action(
File "/apps/llms/vision-agent/vision_agent/agent/vision_agent_planner_v2.py", line 232, in execute_code_action
response = cast(str, model.chat([{"role": "user", "content": prompt}]))
File "/apps/llms/vision-agent/vision_agent/lmm/lmm.py", line 130, in chat
response = self.client.chat.completions.create(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_utils/_utils.py", line 279, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/resources/chat/completions/completions.py", line 879, in create
return self._post(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_base_client.py", line 1290, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_base_client.py", line 967, in request
return self._request(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/openai/_base_client.py", line 1003, in _request
response = self._client.send(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 926, in send
response = self._send_handling_auth(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 954, in _send_handling_auth
response = self._send_handling_redirects(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 991, in _send_handling_redirects
response = self._send_single_request(request)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_client.py", line 1027, in _send_single_request
response = transport.handle_request(request)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpx/_transports/default.py", line 236, in handle_request
resp = self._pool.handle_request(req)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 217, in _receive_event
data = self._network_stream.read(
File "/home/ubuntu/anaconda3/envs/vision-agent/lib/python3.9/site-packages/httpcore/_backends/sync.py", line 128, in read
return self._sock.recv(max_bytes)
KeyboardInterrupt
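The `AttributeError` in the log above comes from `get_tool_for_task` iterating `images.keys()` while a bare NumPy array was passed in. A sketch of a small adapter that coerces the argument into the mapping shape the loop in `planner_tools.py` expects; the helper name is invented here for illustration only.

```python
def as_image_mapping(images):
    # get_tool_for_task (per the traceback) loops over `images.keys()`,
    # so it expects a mapping of name -> list of images, not a bare array.
    if isinstance(images, dict):
        return images  # already the expected shape
    if isinstance(images, (list, tuple)):
        return {"image": list(images)}  # a list of images
    return {"image": [images]}  # a single image / ndarray


# e.g. get_tool_for_task('Count the number of people in this image',
#                        as_image_mapping(image))
```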

@Mrguanglei
Author

@dillonalaird This is the OpenAILMM class in the lmm module that I modified:

class OpenAILMM(LMM):
    r"""An LMM class for the OpenAI LMMs."""

    def __init__(
        self,
        model_name: str = "qwenVL2B",
        api_key: Optional[str] = None,
        max_tokens: int = 4096,
        json_mode: bool = False,
        image_size: int = 768,
        image_detail: str = "low",
        **kwargs: Any,
    ):
        # Point the client at the locally served vLLM endpoint; keep any
        # api_key that was given instead of discarding it.
        self.client = OpenAI(api_key=api_key, base_url="http://0.0.0.0:8000/v1")
        self.model_name = model_name
        self.image_size = image_size
        self.image_detail = image_detail
        # o1 does not use max_tokens
        if "max_tokens" not in kwargs and not (
            model_name.startswith("o1") or model_name.startswith("o3")
        ):
            kwargs["max_tokens"] = max_tokens
        if json_mode:
            kwargs["response_format"] = {"type": "json_object"}
        self.kwargs = kwargs

    def __call__(
        self,
        input: Union[str, Sequence[Message]],
        **kwargs: Any,
    ) -> Union[str, Iterator[Optional[str]]]:
        if isinstance(input, str):
            return self.generate(input, **kwargs)
        return self.chat(input, **kwargs)

    def chat(
        self,
        chat: Sequence[Message],
        **kwargs: Any,
    ) -> Union[str, Iterator[Optional[str]]]:
        """Chat with the LMM model.

        Parameters:
            chat (Sequence[Dict[str, str]]): A list of dictionaries containing the chat
                messages. The messages can be in the format:
                [{"role": "user", "content": "Hello!"}, ...]
                or if it contains media, it should be in the format:
                [{"role": "user", "content": "Hello!", "media": ["image1.jpg", ...]}, ...]
        """
        fixed_chat = []
        for c in chat:
            fixed_c = {"role": c["role"]}
            fixed_c["content"] = [{"type": "text", "text": c["content"]}]  # type: ignore
            if "media" in c:
                for media in c["media"]:
                    resize = kwargs["resize"] if "resize" in kwargs else self.image_size
                    image_detail = (
                        kwargs["image_detail"]
                        if "image_detail" in kwargs
                        else self.image_detail
                    )
                    encoded_media = encode_media(cast(str, media), resize=resize)

                    fixed_c["content"].append(  # type: ignore
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": (
                                    encoded_media
                                    if encoded_media.startswith(("http", "https"))
                                    or encoded_media.startswith("data:image/")
                                    else f"data:image/png;base64,{encoded_media}"
                                ),
                                "detail": image_detail,
                            },
                        },
                    )
            fixed_chat.append(fixed_c)

        # prefers kwargs from second dictionary over first
        tmp_kwargs = self.kwargs | kwargs
        response = self.client.chat.completions.create(
            model=self.model_name, messages=fixed_chat, **tmp_kwargs  # type: ignore
        )
        if "stream" in tmp_kwargs and tmp_kwargs["stream"]:

            def f() -> Iterator[Optional[str]]:
                for chunk in response:
                    chunk_message = chunk.choices[0].delta.content  # type: ignore
                    yield chunk_message

            return f()
        else:
            return cast(str, response.choices[0].message.content)

    def generate(
        self,
        prompt: str,
        media: Optional[Sequence[Union[str, Path]]] = None,
        **kwargs: Any,
    ) -> Union[str, Iterator[Optional[str]]]:
        message: List[Dict[str, Any]] = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        if media and len(media) > 0:
            for m in media:
                resize = kwargs["resize"] if "resize" in kwargs else None
                image_detail = (
                    kwargs["image_detail"]
                    if "image_detail" in kwargs
                    else self.image_detail
                )
                encoded_media = encode_media(m, resize=resize)
                message[0]["content"].append(
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": (
                                encoded_media
                                if encoded_media.startswith(("http", "https"))
                                or encoded_media.startswith("data:image/")
                                else f"data:image/png;base64,{encoded_media}"
                            ),
                            "detail": image_detail,
                        },
                    },
                )

        # prefers kwargs from second dictionary over first
        tmp_kwargs = self.kwargs | kwargs
        response = self.client.chat.completions.create(
            model=self.model_name, messages=message, **tmp_kwargs  # type: ignore
        )
        if "stream" in tmp_kwargs and tmp_kwargs["stream"]:

            def f() -> Iterator[Optional[str]]:
                for chunk in response:
                    chunk_message = chunk.choices[0].delta.content  # type: ignore
                    yield chunk_message

            return f()
        else:
            return cast(str, response.choices[0].message.content)
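A note on the `self.kwargs | kwargs` line in `chat` and `generate`: it relies on the Python 3.9+ dict union operator, where keys from the right-hand dict win on conflict. If the local server rejects OpenAI-only options such as `response_format`, filter out just those keys rather than blanking the whole dict, which would also silently drop `max_tokens` and `stream`. A small sketch (the helper name is invented for illustration):

```python
def merge_call_kwargs(defaults: dict, overrides: dict, unsupported=frozenset()) -> dict:
    # Right-hand operand of `|` wins on key conflicts (Python 3.9+ dict union),
    # then any options the local server rejects are filtered out.
    merged = defaults | overrides
    return {k: v for k, v in merged.items() if k not in unsupported}


merged = merge_call_kwargs(
    {"max_tokens": 4096, "temperature": 0.2, "response_format": {"type": "json_object"}},
    {"temperature": 0.7},
    unsupported={"response_format"},
)
# merged == {"max_tokens": 4096, "temperature": 0.7}
```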
