Commit

Limit the number of retries on the server process
jakep-allenai committed Dec 2, 2024
1 parent b3ca86a commit 35502bc
Showing 1 changed file with 8 additions and 2 deletions.
pdelfin/beakerpipeline.py: 10 changes (8 additions, 2 deletions)
@@ -509,10 +509,17 @@ async def timeout_task():


 async def sglang_server_host(args, semaphore):
-    while True:
+    MAX_RETRIES = 5
+    retry = 0
+
+    while retry < MAX_RETRIES:
         await sglang_server_task(args, semaphore)
         logger.warning("SGLang server task ended")
+        retry += 1
+
+    logger.error(f"Ended up restarting the sglang server more than {retry} times, cancelling")
+    sys.exit(1)


 async def sglang_server_ready():
     max_attempts = 300
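The bounded-restart loop from this hunk can be pulled out into a standalone sketch. The helper name `run_with_restart_limit` and its `task_factory` parameter are hypothetical and do not come from the repository. The sketch returns the run count instead of calling `sys.exit(1)`, which makes it testable. As in the commit, an exception raised by the task is not caught and ends the loop immediately. Note also that when the `logger.error` line runs, `retry` always equals `MAX_RETRIES`, so with a cap of 5 the task has run exactly 5 times (one start plus four restarts).

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

MAX_RETRIES = 5  # same cap the commit hard-codes


async def run_with_restart_limit(
    task_factory: Callable[[], Awaitable[None]],
    max_retries: int = MAX_RETRIES,
) -> int:
    """Re-run a long-lived coroutine each time it returns, up to max_retries runs.

    Hypothetical helper mirroring the loop in sglang_server_host. Exceptions
    from the task are NOT caught; they propagate to the caller, as in the commit.
    Returns the number of runs so the caller can decide how to exit.
    """
    retry = 0
    while retry < max_retries:
        # A fresh coroutine per run: a coroutine object cannot be awaited twice.
        await task_factory()
        logger.warning("Server task ended")
        retry += 1

    # Here retry == max_retries, i.e. the task ran exactly max_retries times.
    logger.error("Server task ended %d times, giving up", retry)
    return retry


async def _demo() -> None:
    async def fake_server() -> None:
        await asyncio.sleep(0)  # stands in for sglang_server_task(args, semaphore)

    runs = await run_with_restart_limit(fake_server, max_retries=3)
    if runs >= 3:
        print(f"would call sys.exit(1) after {runs} runs")


if __name__ == "__main__":
    asyncio.run(_demo())
```

In the pipeline itself the factory would presumably be `lambda: sglang_server_task(args, semaphore)`. Passing a factory rather than a coroutine object is what allows each retry to start a new server task.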
@@ -883,7 +890,6 @@ async def main():
asyncio.run(main())

# TODO
# - Fix broken process pool, maybe we can just do better with a "spawn"?
# - Figure out simple repro case for new sglang livelock case with indexerrors
# - It seems another case of deadlocks is when many requests are sent/pending to sglang, ex. 3k+ or 4k+ requests, probably hitting some internal limit
# - aiohttp repro and bug report
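One TODO above notes that deadlocks seem to appear when thousands of requests (3k+ or 4k+) are pending against sglang. This commit does not address that. A common client-side mitigation is to cap the number of in-flight requests with `asyncio.Semaphore`, sketched below. The names `send_request`, `send_all` and `MAX_IN_FLIGHT` are hypothetical, and the cap value is an assumption rather than a number taken from the repository. The `asyncio.sleep` call stands in for the real HTTP request. Whether this actually avoids the sglang issue is untested.

```python
from __future__ import annotations

import asyncio

MAX_IN_FLIGHT = 500  # hypothetical cap, not a value from this repository


async def send_request(i: int, sem: asyncio.Semaphore, stats: dict) -> int:
    # Waits here whenever `limit` requests are already outstanding.
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            await asyncio.sleep(0.001)  # stands in for the request to the server
            return i
        finally:
            stats["in_flight"] -= 1


async def send_all(n: int, limit: int = MAX_IN_FLIGHT) -> tuple[list[int], int]:
    """Issue n requests with at most `limit` in flight; return results and peak."""
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(send_request(i, sem, stats) for i in range(n)))
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(send_all(4000, limit=100))
    print(f"{len(results)} requests completed, peak in flight: {peak}")
```

All 4000 tasks are still created up front, but only `limit` of them hold the semaphore at any moment. The server therefore never sees more than that many concurrent requests.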
