
making converter additionally support deepseek-coder dense hf model #167

Open · wants to merge 13 commits into main

Conversation

leo-liuzy

Changes made:

  • Adds support for converting the deepseek-coder dense model back and forth, on top of the existing PR.
  • Restructures RoPEScalingConfig to support linear scaling; previously the object only contained counterparts of Hugging Face's Llama-3 RoPE scaling, whereas deepseek-coder uses linear scaling (a sketch follows below).
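
As a rough sketch of that restructuring (the class shape and field names are assumptions, not the PR's exact API), the config gains a type discriminator so it can describe both variants:

from dataclasses import dataclass
from typing import Optional

# Illustrative sketch only -- names here are assumptions, not the PR's exact API.
@dataclass
class RoPEScalingConfig:
    # "llama3" mirrors Hugging Face's Llama-3 style RoPE scaling;
    # "linear" covers the style deepseek-coder uses.
    rope_type: str = "llama3"
    factor: float = 1.0
    # The remaining fields apply only to the llama3 variant.
    low_freq_factor: Optional[float] = None
    high_freq_factor: Optional[float] = None
    original_max_position_embeddings: Optional[int] = None

def apply_linear_scaling(position_ids: list, factor: float) -> list:
    """Linear RoPE scaling simply divides position ids by the factor."""
    return [p / factor for p in position_ids]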

@dirkgr

# HF_MODEL = "meta-llama/Llama-3.2-1B"
# HF_MODEL = "meta-llama/Llama-3.2-8B"
HF_MODEL = f"{os.environ['SHARE_RES_DIR']}/models/deepseek/deepseek-coder-1.3b-base"
# HF_MODEL = "/home/zliu/shared_resources/models/llama3/hf/Llama-3.2-1B"
Member

Please don't put your private paths into the code base.

Comment on lines +33 to +35
# TOKENIZER_CONFIG = TokenizerConfig.from_hf(HF_MODEL)
TOKENIZER_CONFIG = TokenizerConfig.from_hf("deepseek-ai/deepseek-coder-1.3b-base")
# TOKENIZER_CONFIG = TokenizerConfig.from_hf("meta-llama/Llama-3.2-1B")
Member

Same here. This is one-off stuff that breaks everybody else's workflow. Imagine someone who has zero context about this part of the work and just wants to use this tool to convert some checkpoint.
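
A minimal sketch of the kind of parameterization this asks for (the flag name and default are illustrative, not part of this PR): take the model location from the command line instead of hard-coding a private path.

import argparse

# Sketch: read the model location from the CLI rather than hard-coding a path.
parser = argparse.ArgumentParser(description="Convert a checkpoint to/from HF format.")
parser.add_argument(
    "--hf-model",
    default="meta-llama/Llama-3.2-1B",
    help="Hugging Face hub ID or local path of the model to convert.",
)
args = parser.parse_args()
HF_MODEL = args.hf_model  # works for hub IDs and local checkpoint dirs alike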

MODEL_CONFIG: TransformerConfig
-if HF_MODEL == "meta-llama/Llama-3.2-1B":
+if "Llama-3.2-1B" in HF_MODEL:
Member

Do we also need this for bigger Llama models?

Author

I changed it so that it works with a local cache dir. We are also encountering an additional weird bug related to conversion and will need some help from you; we'll submit a separate issue for it. Would you prefer we incorporate the fix for that issue into this PR, or should I keep it separate?
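
For illustration, a minimal example of why the substring check helps with local cache dirs (the path below is hypothetical):

# Hypothetical local path to a cached copy of the model.
HF_MODEL = "/home/user/.cache/models/Llama-3.2-1B"

# Old check: only matches the exact hub ID, so local paths fall through.
print(HF_MODEL == "meta-llama/Llama-3.2-1B")  # False

# New check: matches the hub ID and local paths alike.
print("Llama-3.2-1B" in HF_MODEL)  # True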

Member

Many small PRs are always better, if possible.

# Conflicts:
#	src/examples/huggingface/convert_checkpoint_to_hf.py
dirkgr (Member) commented Feb 19, 2025

@lingchensanwen, I merged current main into this. There were a fair number of conflicts, and I'm not sure I did it right. Can you check?
