[Test] there is no enough k8s resources in buildkite to run job controller #4824

Open
aylei opened this issue Feb 26, 2025 · 0 comments
Collaborator

aylei commented Feb 26, 2025

https://buildkite.com/skypilot-1/smoke-tests/builds/219#019539c1-4d63-48f5-8116-24e597fe9d7e

I 02-24 21:00:25 optimizer.py:1321] No resource satisfying Kubernetes(cpus=4+, mem=8x, disk_size=50) on Kubernetes.
--
  | I 02-24 21:00:25 optimizer.py:1331] Try specifying a different CPU count, or add "+" to the end of the CPU count to allow for larger instances.
  | I 02-24 21:00:25 optimizer.py:1335] Try specifying a different memory size, or add "+" to the end of the memory size to allow for larger instances.
  | D 02-24 21:00:25 optimizer.py:301] #### Task<name=t-jobs-retry-logs-cc>(run='source ~/skypilot-ru...')
  | D 02-24 21:00:25 optimizer.py:301]   resources: Kubernetes(cpus=4+, mem=8x, disk_size=50) ####
  | D 02-24 21:00:25 optimizer.py:316] Defaulting the task's estimated time to 1 hour.
  | D 02-24 21:00:25 skypilot_config.py:155] Using config path: /home/buildkite/.sky/config.yaml
  | D 02-24 21:00:25 skypilot_config.py:160] Config loaded:
  | D 02-24 21:00:25 skypilot_config.py:160] {'azure': {'storage_account': 'buildkitestorage'}}
  | D 02-24 21:00:25 skypilot_config.py:172] Config syntax check passed.
  | D 02-24 21:00:25 sdk.py:1422] Got request with error: sky.jobs.launch
  | E 02-24 21:00:25 sdk.py:1434] === Traceback on SkyPilot API Server ===
  | E 02-24 21:00:25 sdk.py:1434] Traceback (most recent call last):
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/server/requests/executor.py", line 258, in _request_execution_wrapper
  | E 02-24 21:00:25 sdk.py:1434]     return_value = func(**request_body.to_kwargs())
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/utils/common_utils.py", line 454, in _record
  | E 02-24 21:00:25 sdk.py:1434]     return f(*args, **kwargs)
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/utils/common_utils.py", line 454, in _record
  | E 02-24 21:00:25 sdk.py:1434]     return f(*args, **kwargs)
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/jobs/server/core.py", line 185, in launch
  | E 02-24 21:00:25 sdk.py:1434]     return execution.launch(task=controller_task,
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/utils/common_utils.py", line 454, in _record
  | E 02-24 21:00:25 sdk.py:1434]     return f(*args, **kwargs)
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/utils/common_utils.py", line 454, in _record
  | E 02-24 21:00:25 sdk.py:1434]     return f(*args, **kwargs)
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/execution.py", line 532, in launch
  | E 02-24 21:00:25 sdk.py:1434]     return _execute(
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/execution.py", line 291, in _execute
  | E 02-24 21:00:25 sdk.py:1434]     dag = optimizer.Optimizer.optimize(dag,
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/utils/common_utils.py", line 454, in _record
  | E 02-24 21:00:25 sdk.py:1434]     return f(*args, **kwargs)
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/utils/common_utils.py", line 434, in _record
  | E 02-24 21:00:25 sdk.py:1434]     return f(*args, **kwargs)
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/optimizer.py", line 130, in optimize
  | E 02-24 21:00:25 sdk.py:1434]     unused_best_plan = Optimizer._optimize_dag(
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/optimizer.py", line 1061, in _optimize_dag
  | E 02-24 21:00:25 sdk.py:1434]     Optimizer._estimate_nodes_cost_or_time(local_topo_order,
  | E 02-24 21:00:25 sdk.py:1434]   File "/home/buildkite/.buildkite-agent/builds/kubernetes/d1a31de9756d-1/skypilot-1/smoke-tests/sky/optimizer.py", line 404, in _estimate_nodes_cost_or_time
  | E 02-24 21:00:25 sdk.py:1434]     raise exceptions.ResourcesUnavailableError(error_msg)
  | E 02-24 21:00:25 sdk.py:1434] sky.exceptions.ResourcesUnavailableError: Kubernetes cluster does not contain any instances satisfying the request: 1x Kubernetes(cpus=4+, mem=8x, disk_size=50).
  | E 02-24 21:00:25 sdk.py:1434] To fix: relax or change the resource requirements.
  | E 02-24 21:00:25 sdk.py:1434]
  | E 02-24 21:00:25 sdk.py:1434] Hint: sky show-gpus to list available accelerators.
  | E 02-24 21:00:25 sdk.py:1434]       sky check to check the enabled clouds.
  | E 02-24 21:00:25 sdk.py:1434]
  | D 02-24 21:00:25 sdk.py:77] To stream request logs: sky api logs 694ed670-fdd7-4ea0-b0e3-815429218d23
  | sky.exceptions.ResourcesUnavailableError: Kubernetes cluster does not contain any instances satisfying the request: 1x Kubernetes(cpus=4+, mem=8x, disk_size=50).
  | To fix: relax or change the resource requirements.

#4813 might address this, but it looks like we still have cases that need to run at the normal resource level, for example:

@pytest.mark.no_lowest_resource_mode # Do not run this test in lowest resource mode
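
For context, here is a minimal sketch of how a marker like this could opt a test out of a reduced-resource profile. It is not SkyPilot's actual implementation: the `LOWEST_RESOURCE_MODE` environment variable and the `resource_profile` helper are hypothetical names used only for illustration. Only the marker name comes from the snippet above.

```python
# conftest.py (hypothetical sketch; names other than the marker are assumptions)
import os

MARKER = "no_lowest_resource_mode"
# Assumed switch for the reduced-resource CI mode; the real one may differ.
ENV_FLAG = "LOWEST_RESOURCE_MODE"


def lowest_resource_mode_enabled(environ=os.environ):
    """Return True when the (assumed) env flag requests lowest resource mode."""
    return environ.get(ENV_FLAG, "0") == "1"


def resource_profile(marker_names, lowest_mode):
    """Pick a profile for one test.

    Tests carrying the marker always get the normal profile, so they still
    need a cluster large enough for the normal resource request.
    """
    if lowest_mode and MARKER not in marker_names:
        return "lowest"
    return "normal"


def pytest_configure(config):
    # Register the marker so `pytest --strict-markers` accepts it.
    config.addinivalue_line(
        "markers", f"{MARKER}: do not run this test in lowest resource mode")
```

Under this sketch, a marked test falls back to the normal profile even when the reduced mode is on, which is the situation where the Kubernetes cluster in Buildkite has to be able to satisfy the normal request.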

@aylei aylei changed the title [Test] there is enough k8s resources in buildkite to run job controller [Test] there is no enough k8s resources in buildkite to run job controller Feb 26, 2025