[No Merge Until Feb 25] - FastRTC Release Post #2698
fastrtc.md
Outdated
# FastRTC: The Real-Time Communication Library for Python

In the last six months, the AI audio space has exploded with model releases (for both open and closed source models) and investor and developer interest. To name a few milestones:
Suggested change:
Before: In the last six months, the AI audio space has exploded with model releases (for both open and closed source models) and investor and developer interest. To name a few milestones:
After: In the last few months, many new real-time speech models have been released and entire companies have been founded (around both open and closed source models). To name a few milestones:
fastrtc.md
Outdated
In the last six months, the AI audio space has exploded with model releases (for both open and closed source models) and investor and developer interest. To name a few milestones:

- OpenAI and Google released their live multimodal APIs for ChatGPT and Gemini. OpenAI even went so far as to release a 1-800-ChatGPT phone number!
- Kyutai released Moshi, a fully open-source audio-to-audio LLM. Alibaba released Qwen2-Audio and Fixie.ai released Ultravox - two open-source LLMs that natively understand audio.
Link to models on Hub
fastrtc.md
Outdated
- OpenAI and Google released their live multimodal APIs for ChatGPT and Gemini. OpenAI even went so far as to release a 1-800-ChatGPT phone number!
- Kyutai released Moshi, a fully open-source audio-to-audio LLM. Alibaba released Qwen2-Audio and Fixie.ai released Ultravox - two open-source LLMs that natively understand audio.
- EleveLabs raised $180m in their Series C.
Suggested change:
Before: - EleveLabs raised $180m in their Series C.
After: - ElevenLabs <a href="https://elevenlabs.io/blog/series-c" target="_blank">raised $180m in their Series C</a>.
fastrtc.md
Outdated
- Kyutai released Moshi, a fully open-source audio-to-audio LLM. Alibaba released Qwen2-Audio and Fixie.ai released Ultravox - two open-source LLMs that natively understand audio.
- EleveLabs raised $180m in their Series C.

Despite the explosion in the model and funding side, it's still difficult to build real-time AI applications, especially in Python.
Suggested change:
Before: Despite the explosion in the model and funding side, it's still difficult to build real-time AI applications, especially in Python.
After: Despite the explosion in the model and funding side, it's still difficult to build real-time AI applications that stream audio or video, especially in Python.
fastrtc.md
Outdated
Despite the explosion in the model and funding side, it's still difficult to build real-time AI applications, especially in Python.

- ML engineers may not have experience with the technologies needed to build real-time applications.
Suggested change:
Before: - ML engineers may not have experience with the technologies needed to build real-time applications.
After: - ML engineers may not have experience with the technologies needed to build real-time applications, such as WebRTC.
fastrtc.md
Outdated
- ML engineers may not have experience with the technologies needed to build real-time applications.
- Even code assistant tools like Cursor and Copilot struggle to write python code that supports real-time audio/video applications. I know from experience!

That's why we're excited to announce `FastRTC`, the real-time communication library for Python. The library is designed to make it super easy to build real-time audio and video AI applications entirely in Python! Let's dive in.
I would show an example code snippet here to illustrate how simple FastRTC is to use, before the Core Features.
Then it'll be more natural to talk about the `Stream` class.
I removed mentions of `Stream` and merged the core features with the introduction. I don't want them to go after the code because they motivate what the code will show about fastrtc.
fastrtc.md
Outdated
Let's break it down:
- The `ReplyOnPause` will handle the voice detection and turn taking for you. You just have to worry about the logic for responding to the user. Any generator that returns a tuple of audio, (represented as `(sample_rate, audio_data)`) will work.
- The `Stream` class will build a production-ready Gradio UI for you to quickly test out your stream (or deploy to prod!).
The "deploy to prod" part is not clear to me. I think I would add a third bullet point, something like, "once you have finished prototyping, you can deploy your Stream as a production-ready FastAPI app in a single line of code"
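To make the `ReplyOnPause` bullet concrete, here is a minimal sketch of the kind of generator it describes: one that yields `(sample_rate, audio_data)` tuples. The `echo` name, the `chunk_size` parameter, and the use of plain lists instead of audio arrays are assumptions made to keep the sketch dependency-free; they are not taken from the post.

```python
from typing import Iterator, List, Tuple

# An audio chunk in the shape the post describes: (sample_rate, audio_data).
# Assumption: plain Python lists stand in for whatever array type the
# library actually passes, so the sketch runs without extra packages.
AudioChunk = Tuple[int, List[int]]


def echo(audio: AudioChunk, chunk_size: int = 4) -> Iterator[AudioChunk]:
    """Hypothetical handler: replay the user's turn in fixed-size chunks."""
    sample_rate, samples = audio
    for start in range(0, len(samples), chunk_size):
        # Each yielded item keeps the (sample_rate, audio_data) shape.
        yield sample_rate, samples[start:start + chunk_size]


if __name__ == "__main__":
    user_turn = (16000, list(range(10)))
    for rate, data in echo(user_turn):
        print(rate, data)
```

Per the bullets above, a generator like this would be wrapped in `ReplyOnPause` and handed to `Stream`; that wiring is left out here because the exact constructor arguments are not shown in this thread.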
fastrtc.md
Outdated
```python
stream.ui.launch()
```

We're using the SambaNova API since it's fast. But you can use any LLM/text-to-speech/speech-to-text API. Bring the tools you love - `FastRTC` just handles the real-time communication layer.
`stt_model = get_stt_model()` and `tts_model = get_tts_model()` are doing some heavy lifting. I would mention what they are doing and that you can replace them with your own STT/TTS models or skip them altogether if you use a voice-to-voice model/API.
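One way to picture the heavy lifting mentioned above is a speech-to-text, LLM, text-to-speech chain. The sketch below uses hypothetical stand-ins (`fake_stt`, `fake_llm`, `fake_tts`, `make_handler`) rather than the real objects returned by `get_stt_model()` and `get_tts_model()`, whose methods are not shown in this thread; it only illustrates that each stage is swappable.

```python
from typing import Callable, Iterator, List, Tuple

AudioChunk = Tuple[int, List[int]]  # (sample_rate, audio_data)


def make_handler(
    stt: Callable[[AudioChunk], str],
    llm: Callable[[str], str],
    tts: Callable[[str], Iterator[AudioChunk]],
) -> Callable[[AudioChunk], Iterator[AudioChunk]]:
    """Compose three swappable stages into one audio-in, audio-out generator."""

    def handler(audio: AudioChunk) -> Iterator[AudioChunk]:
        prompt = stt(audio)    # speech -> text
        reply = llm(prompt)    # text -> text
        yield from tts(reply)  # text -> streamed audio chunks

    return handler


# Hypothetical stand-ins so the sketch runs without any model or API key.
def fake_stt(audio: AudioChunk) -> str:
    return f"{len(audio[1])} samples"


def fake_llm(prompt: str) -> str:
    return prompt.upper()


def fake_tts(text: str) -> Iterator[AudioChunk]:
    for word in text.split():
        yield 24000, [len(word)]  # one dummy chunk per word


handler = make_handler(fake_stt, fake_llm, fake_tts)
```

Swapping in a different STT or TTS model means passing a different callable; with a voice-to-voice model the three stages would collapse into one.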
Left a few comments @freddyaboulton but otherwise this is looking great ⚡!
4270dc7 to 43ea95e
yaml / structure look fine to me, will go through the content later :)
fastrtc.md
Outdated
Despite the explosion on the model and funding side, it's still difficult to build real-time AI applications that stream audio and video, especially in Python.

- ML engineers may not have experience with the technologies needed to build real-time applications, such as WebRTC.
- Even code assistant tools like Cursor and Copilot struggle to write python code that supports real-time audio/video applications. I know from experience!
Suggested change:
Before: - Even code assistant tools like Cursor and Copilot struggle to write python code that supports real-time audio/video applications. I know from experience!
After: - Even code assistant tools like Cursor and Copilot struggle to write Python code that supports real-time audio/video applications. I know from experience!
Beautiful, post looks great to me!
Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.
Preparing the Article
You're not quite done yet, though. Please make sure to follow this process (as documented here):
`md` file. You can also specify `guest` or `org` for the authors.
Here is an example of a complete PR: #2382
Getting a Review
Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.
Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.