[feature] support ep for deepseek v3 #6185

Merged
ver217 merged 6 commits into hpcaitech:main from feature/deepseek-v3 on Feb 11, 2025

@ver217 (Member) commented Feb 6, 2025

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs
  • I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots, diagrams, screenshots, or tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@ver217 requested a review from a team as a code owner February 6, 2025 08:53
@ver217 force-pushed the feature/deepseek-v3 branch from ab91a06 to 3f84584 February 6, 2025 09:22
@ver217 merged commit 2b415e5 into hpcaitech:main Feb 11, 2025
6 checks passed
@ver217 deleted the feature/deepseek-v3 branch February 11, 2025 08:11
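@xs1997zju

@ver217 great job. A question: for full-parameter bf16 training of V3-671B, how many machines did your setup use, and what is the maximum sequence length (in k tokens) that can be trained?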


4 participants