k3s cluster and nat gateway #454
Hi, do you see the server(s) attached to the …?
Please SSH into one of the servers attached to the network and run:
SUBNET="10.13.0.0/16"
SUBNET_PREFIX=$(echo $SUBNET | cut -d'/' -f1 | sed 's/\./\\./g' | sed 's/0$//')
echo $SUBNET_PREFIX
Does it return the correct prefix? Then run:
ip -4 addr show | grep -q "inet $SUBNET_PREFIX"
What does it return? (grep -q prints nothing, so check the exit status with echo $? right afterwards: 0 means a matching address was found, 1 means none was.)
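For reference, that prefix pipeline turns 10.13.0.0/16 into the regex 10\.13\.0\., so the check only matches addresses in 10.13.0.x even though the subnet is a /16; it is only exact for /24 subnets such as the 10.144.1.0/24 used later in this thread. Below is a minimal POSIX sh sketch of a real CIDR membership test. textual_prefix, ip_to_int and in_cidr are hypothetical helper names, not part of hetzner-k3s, and the sketch assumes plain dotted-quad IPv4 addresses without leading zeros.

```sh
#!/bin/sh
# Reproduces the prefix pipeline from the comment above (printf instead of echo).
textual_prefix() {
  printf '%s\n' "$1" | cut -d'/' -f1 | sed 's/\./\\./g' | sed 's/0$//'
}

# Converts a dotted-quad IPv4 address to a 32-bit integer (no validation).
ip_to_int() {
  a=${1%%.*}; rest=${1#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeeds if address $1 lies inside network $2, given as a.b.c.d/len.
in_cidr() {
  net=${2%/*}
  len=${2#*/}
  # Netmask with the top $len bits set, kept within 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, in_cidr 10.13.1.5 10.13.0.0/16 succeeds, while grep with the textual prefix would not match that address.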
My gut feeling is that there is something wrong with your … Attach a temporary server to the same network, SSH into it and, using /bin/sh rather than bash (the cloud-init script must work in a plain sh shell), try running your post-create commands to check that all of them work.
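A minimal sketch of that test, assuming the post-create commands have been copied one per line into a hypothetical file such as commands.txt on the temporary server. Each line runs in its own /bin/sh, so a bash-only construct or a failing step shows up with its line number. Multi-line entries (such as the heredoc blocks used later in this thread) do not fit a one-command-per-line file and need to be run as a separate script.

```sh
#!/bin/sh
# Run each line of file $1 in a fresh /bin/sh and report which ones fail.
# Hypothetical helper; returns non-zero if any command failed.
run_each() {
  failed=0
  n=0
  while IFS= read -r cmd; do
    n=$((n + 1))
    # Skip blank lines and comments.
    case $cmd in ''|'#'*) continue ;; esac
    # stdin comes from /dev/null so a command cannot swallow the rest of the file.
    if /bin/sh -c "$cmd" </dev/null; then
      printf 'OK   %s: %s\n' "$n" "$cmd"
    else
      printf 'FAIL %s: %s (exit %s)\n' "$n" "$cmd" "$?"
      failed=$((failed + 1))
    fi
  done < "$1"
  [ "$failed" -eq 0 ]
}
```

Usage: run_each commands.txt, then fix the lines reported as FAIL.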
What do you get with …?
Can you try …?
I changed the subnet and the problem disappeared (I don't know whether it had something to do with the way I split the subnets). I haven't added post_create_commands yet, of course, but now I get output like the following. Is this coming from SSH?
Yeah, that may be a problem with SSH, perhaps with the key. Can you try enabling the agent?
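With OpenSSH, loading the key into an agent on the machine that runs hetzner-k3s looks roughly like the sketch below. The default key path is an assumption (use the key from your configuration), load_key_into_agent is a hypothetical helper name, and the agent option still has to be enabled in the hetzner-k3s configuration.

```sh
#!/bin/sh
# Start an ssh-agent for the current shell and load one key into it.
# The default key path is an assumption; pass your own key as $1.
load_key_into_agent() {
  key=${1:-$HOME/.ssh/id_ed25519}
  # ssh-agent -s prints Bourne-shell commands that export SSH_AUTH_SOCK and SSH_AGENT_PID.
  eval "$(ssh-agent -s)" >/dev/null
  # Prompts for the passphrase if the key has one.
  ssh-add "$key"
  # List the loaded keys to confirm the agent holds it.
  ssh-add -l
}
```

Because the function runs in the current shell, source the file and call it from the same session that later runs hetzner-k3s, so SSH_AUTH_SOCK is still set.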
Are you talking about the …?
Another possibility is an issue with Debian caused by recent changes made to support the new way newer Ubuntu versions handle custom SSH ports. Can you try Ubuntu with the same configuration to see whether that's the problem?
It's not a warning :) It's just telling you that some pods were probably pending due to a lack of resources, so the cluster had to scale up. Did it add a new node?
What do you see in the autoscaler's logs?
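Both questions can be checked with kubectl. A sketch, assuming kubectl points at the cluster and assuming the autoscaler runs as a deployment named cluster-autoscaler in kube-system (confirm the real name and namespace with kubectl get deployments -A). The function names are hypothetical.

```sh
#!/bin/sh
# Pods stuck in Pending are what trigger a scale-up.
show_pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# Nodes, to see whether a new one joined after the scale-up.
show_nodes() {
  kubectl get nodes -o wide
}

# Recent autoscaler log lines; deployment name and namespace are assumptions.
show_autoscaler_logs() {
  kubectl --namespace kube-system logs deployment/cluster-autoscaler --tail="${1:-100}"
}
```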
I managed to get it working, and I think I'll write a short article about it on my blog. Thank you very much for your help. I just want to ask one small question:
Even when I configure it this way, why does the node receive an IP address from 10.14.1.0/24?
Can you share the solution for posterity? Can you also clarify the question? :p
The autoscaler stopped working even though I made no changes. I can see the machine being turned on in the Hetzner Cloud panel, but nothing happens after that. I'm stuck because I don't have SSH access, so I can't see the logs. It looks as if the installation steps never run, and the node never even joins the cluster. The machine has only a private IP address behind the NAT gateway. Routing is fully configured and works, so there is no problem there either. I set everything up according to the documentation. How can I debug this?
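If a shell on the node can be reached some other way (for example the server console in the Hetzner Cloud panel, assuming a root password is set), these are the logs that usually show where cloud-init stopped. A sketch: dump_boot_logs is a hypothetical helper, its optional root-directory argument exists only so it can be tried against copied log files, and /var/log/hetzner-k3s.log is the log written by the cloud-init script shown in this thread.

```sh
#!/bin/sh
# Print the logs that usually show where a cloud-init run stopped.
# $1 is an optional root directory (defaults to /).
dump_boot_logs() {
  root=${1:-/}
  # cloud-init's own status summary, only meaningful on the live node.
  if [ "$root" = / ] && command -v cloud-init >/dev/null 2>&1; then
    cloud-init status --long
  fi
  for f in var/log/cloud-init-output.log var/log/hetzner-k3s.log; do
    if [ -r "$root/$f" ]; then
      printf '==> /%s <==\n' "$f"
      tail -n 50 "$root/$f"
    else
      printf '==> /%s: not found\n' "$f"
    fi
  done
}
```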
I finally managed to solve the problem. Because there was no public IP, the installation steps were left incomplete due to both routing and DNS problems. I don't know exactly how this happened, but I solved it by manually editing the cloud-init config. On machines behind a NAT gateway, the route and DNS configuration must run before everything else. Even if we put these commands first in post_create_commands, they still end up at the bottom of the generated script (see hetzner-k3s/templates/cloud_init.yaml, line 35 in 60b862b).
I'm sorry, but I'm not following. Can you clarify what exactly fixed your problem and what changes you had to make to hetzner-k3s to solve it? I could make a new release with your fixes, or you could open a PR if you're up for it. :)
In a Kubernetes setup with no public network, the following needs to be done.
1- Configure the network settings so that traffic is routed through the NAT gateway:
# Add network interface to route through the NAT gateway
- |
  cat <<'EOF' >> /etc/systemd/network/10-enp7s0.network
  [Match]
  Name=enp7s0
  [Network]
  DHCP=yes
  Gateway=10.144.0.1
  EOF
# reload networkd
- systemctl restart systemd-networkd
# Configure systemd-resolved
- systemctl enable systemd-resolved
- systemctl start systemd-resolved
# Set DNS
- |
  cat <<'EOF' >> /etc/systemd/resolved.conf
  [Resolve]
  Cache=yes
  DNS=185.12.64.1 185.12.64.2
  FallbackDNS=1.1.1.1
  EOF
- systemctl daemon-reload
- systemctl restart systemd-resolved
2- Packages should not be installed with the packages: directive; they should be installed immediately after the cloud-init network settings have been applied. So, if public IPv4 and IPv6 are completely disabled, the cloud-init file has to look like this:
#cloud-config
preserve_hostname: true
write_files:
  - path: /etc/systemd/system/ssh.socket.d/listen.conf
    content: |
      [Socket]
      ListenStream=
      ListenStream=22
  - path: /etc/configure-ssh.sh
    permissions: '0755'
    content: |
      if systemctl is-active ssh.socket > /dev/null 2>&1
      then
        # OpenSSH is using socket activation
        systemctl disable ssh
        systemctl daemon-reload
        systemctl restart ssh.socket
        systemctl stop ssh
      else
        # OpenSSH is not using socket activation
        sed -i 's/^#*Port .*/Port 22/' /etc/ssh/sshd_config
      fi
      systemctl restart ssh
runcmd:
  - hostnamectl set-hostname $(curl http://169.254.169.254/hetzner/v1/metadata/hostname)
  - update-crypto-policies --set DEFAULT:SHA1 || true
  - /etc/configure-ssh.sh
  - |
    cat <<'EOF' >> /etc/systemd/network/10-enp7s0.network
    [Match]
    Name=enp7s0
    [Network]
    DHCP=yes
    Gateway=10.144.0.1
    EOF
  # reload networkd
  - systemctl restart systemd-networkd
  # Configure systemd-resolved
  - systemctl enable systemd-resolved
  - systemctl start systemd-resolved
  # Set DNS
  - |
    cat <<'EOF' >> /etc/systemd/resolved.conf
    [Resolve]
    Cache=yes
    DNS=185.12.64.1 185.12.64.2
    FallbackDNS=1.1.1.1
    EOF
  - systemctl daemon-reload
  - systemctl restart systemd-resolved
  # "&&" so the install only starts after the package index update has finished
  - apt-get update && apt-get install -y ifupdown net-tools
  - echo "nameserver 8.8.8.8" > /etc/k8s-resolv.conf
  - |
    touch /etc/initialized
    HOSTNAME=$(hostname -f)
    PUBLIC_IP=$(hostname -I | awk '{print $1}')
    if [ "true" = "true" ]; then
      echo "Using private network " > /var/log/hetzner-k3s.log
      SUBNET="10.144.1.0/24"
      SUBNET_PREFIX=$(echo $SUBNET | cut -d'/' -f1 | sed 's/\./\\./g' | sed 's/0$//')
      MAX_ATTEMPTS=30
      DELAY=10
      UP="false"
      for i in $(seq 1 $MAX_ATTEMPTS); do
        if ip -4 addr show | grep -q "inet $SUBNET_PREFIX"; then
          echo "Private network IP in subnet $SUBNET is up" 2>&1 | tee -a /var/log/hetzner-k3s.log
          UP="true"
          break
        fi
        echo "Waiting for private network IP in subnet $SUBNET to be available... (Attempt $i/$MAX_ATTEMPTS)" 2>&1 | tee -a /var/log/hetzner-k3s.log
        sleep $DELAY
      done
      if [ "$UP" = "false" ]; then
        echo "Timeout waiting for private network IP in subnet $SUBNET" 2>&1 | tee -a /var/log/hetzner-k3s.log
      fi
      PRIVATE_IP=$(ip route get 10.144.1.0 | awk -F"src " 'NR==1{split($2,a," ");print a[1]}')
      NETWORK_INTERFACE=" --flannel-iface=$(ip route get 10.144.1.0 | awk -F"dev " 'NR==1{split($2,a," ");print a[1]}') "
    else
      echo "Using public network " > /var/log/hetzner-k3s.log
      PRIVATE_IP="${PUBLIC_IP}"
      NETWORK_INTERFACE=" "
    fi
    mkdir -p /etc/rancher/k3s
    cat > /etc/rancher/k3s/registries.yaml <<EOF
    mirrors:
      "*":
    EOF
    curl -sfL https://get.k3s.io | K3S_TOKEN="REDACTED" INSTALL_K3S_VERSION="v1.31.1+k3s1" K3S_URL=https://10.144.1.16:6443 INSTALL_K3S_EXEC="agent \
    --node-name=$HOSTNAME --kubelet-arg "cloud-provider=external" --kubelet-arg "resolv-conf=/etc/k8s-resolv.conf" \
    --node-ip=$PRIVATE_IP \
    --node-external-ip=$PUBLIC_IP \
    $NETWORK_INTERFACE " sh -
    echo true > /etc/initialized
Unfortunately, I can't contribute code to the project because I don't know the language it is written in :)
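The ordering above applies the route and DNS settings first, but nothing checks that they actually work before apt-get and the k3s download run. A minimal sketch of a retry guard, in the same spirit as the MAX_ATTEMPTS/DELAY loop in the script. retry is a hypothetical helper that would need to be defined in the same runcmd block that uses it, and the host name in the usage comments is only an example.

```sh
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  max=$1
  delay=$2
  shift 2
  i=1
  while :; do
    "$@" && return 0
    # Give up once the last attempt has failed.
    [ "$i" -ge "$max" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example usage before any package installation:
#   retry 30 2 sh -c 'ip route show default | grep -q "via 10.144.0.1"'
#   retry 30 2 getent hosts deb.debian.org
#   apt-get update && apt-get install -y ifupdown net-tools
```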
Thanks for clarifying! I see what you mean now. I'll do some testing and see whether I can include changes that help with this kind of setup in the next release.
I can confirm that the solution works when public IP addresses are completely disabled. The process is slow, taking around 6-7 minutes to create a small cluster, but it works as expected. I tested this without modifying the cloud-init configuration, using only post_create_commands.
In the cloud-init configuration, I moved the NAT gateway rules to the front and the download steps to the end, so that the machine can come up without further delay. I use this in a project with 132 cloud servers, and with this arrangement I reduced machine startup to 3 minutes. I'm closing this issue to avoid any misunderstanding.
Hi @mertcangokgoz, I reopened this to remind myself to work on it. I plan to make some adjustments in an upcoming release to simplify the setup process for these scenarios.
I'd like to add one more thing: if all public IPs are removed and a NAT gateway is used, the … Otherwise, the network connection drops randomly and access to the node is interrupted. Just for your information.
Hi @mertcangokgoz, I tried a setup with the public interface disabled, and it worked well using the configuration outlined in https://github.com/vitobotta/hetzner-k3s/blob/main/docs/Private_clusters_with_public_network_interface_disabled.md. I didn't need anything else, at least not with Debian. I haven't tested it on other operating systems yet. Do you have a chance to try this configuration and see if it resolves your issues?
In the early installation stages, yes: following this document, the machine can be added to the cluster and run, albeit slowly. However, if speed matters, the method I described is necessary, so perhaps it could be added to the document. Nine minutes is a very long time for a node to come up; with my change applied, the node joins the cluster in 3-4 minutes.
Gotcha. I'll look into addressing it in an upcoming release.
I am currently using a NAT gateway in my project. I need k3s, and I want my cluster to communicate only over private IPs, without any public IP addresses. I am using the debian-12 image in the cluster.
With this configuration, I expect the machines to reach the internet and the pods to come up. However, during installation it produces output like the following, and I don't think the installation completes successfully.