Why Is My AWS EC2 Instance Unreachable After Rebooting?

You hit reboot on your AWS EC2 instance, expecting it to come back online in a minute or two. Instead, your SSH session hangs. Your website goes dark.

The dashboard shows a failed status check. Your heart sinks a little. Sound familiar? This is one of the most common headaches AWS users face, and the good news is that almost every cause has a clear fix.

An unreachable instance after a reboot rarely means your data is gone. More often, a small configuration change, a network setting, or a boot error is blocking access. In this guide, you will learn exactly why this happens and how to fix it step by step.

Key Takeaways

  • A reboot keeps your IP, but a stop and start does not. A true reboot preserves your public IP, private IP, and instance store data. If you used stop and start instead, your dynamic public IP changes, which often explains why your old address suddenly stops working.
  • Security groups and network ACLs are the top suspects for connection timeouts. A simple rule change or a missing inbound port can lock you out completely, even though the instance itself is running fine inside AWS.
  • Status checks tell you where the problem lives. A system status check failure points to AWS hardware, while an instance status check failure points to your operating system, network config, or a broken boot file.
  • A bad /etc/fstab entry is a classic reboot killer. When a mount fails at boot, Linux can hang and refuse all SSH connections. This single mistake causes a huge share of post reboot lockouts.
  • The root volume rescue method almost always works as a last resort. You detach the broken disk, attach it to a healthy instance, fix the file, and reattach it. No data loss required.
  • Prevention beats repair. Elastic IPs, the nofail mount option, and CloudWatch alarms stop most of these problems before they start.

Understanding What Actually Happens During a Reboot

Many problems begin with a misunderstanding of the word reboot. A reboot and a stop and start are not the same thing in AWS.

When you reboot an instance, AWS keeps it on the same physical host. Your public IP, private IP, Elastic IP, and any instance store data all stay the same. The machine simply restarts its operating system.

A stop and start works differently. AWS usually moves your instance to a new host. Your dynamic public IP address changes, and instance store data disappears. This single difference explains a large number of “unreachable” reports.

People stop and start an instance, then try the old IP address and assume the machine is broken. Always confirm which action you performed first. This one check often solves the mystery in seconds.

Checking Your EC2 Instance Status Checks First

Your first stop should be the EC2 console. Open the Instances page and look at the Status Checks column. AWS runs two separate checks, and knowing the difference saves you hours.

The system status check monitors the underlying AWS hardware and network. The instance status check monitors your operating system and configuration.

If the system status check fails, the problem is on the AWS side. You can often fix it with a stop and start, which moves your instance to healthy hardware. If the instance status check fails, the problem lives inside your instance.

An instance status check failure usually points to a boot error, a full disk, a network misconfiguration, or a bad fstab entry. Read the check that failed carefully. It is the single best clue you have for choosing the right repair path next.
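
If you prefer to script this triage, the decision can be sketched in a few lines of Python. The function below is only an illustration: it assumes a dictionary shaped like one entry of the InstanceStatuses list that the EC2 DescribeInstanceStatus API returns, and the name triage_status_checks and the wording of each suggestion are made up for this example.

```python
def triage_status_checks(status):
    """Map one status entry to a suggested next step (illustrative wording)."""
    system = status["SystemStatus"]["Status"]
    instance = status["InstanceStatus"]["Status"]
    if system == "impaired":
        return "system check failed: AWS-side problem, try a stop and start"
    if instance == "impaired":
        return "instance check failed: inspect the OS, disk, network config, and fstab"
    if "initializing" in (system, instance):
        return "checks still initializing: wait a minute and look again"
    if system == "ok" and instance == "ok":
        return "both checks pass: review security groups, ACLs, and the IP address"
    return "status unclear: re-check in the console"


# With boto3 installed and credentials configured (an assumption), the input
# could come from:
#   boto3.client("ec2").describe_instance_status(
#       InstanceIds=["i-..."], IncludeAllInstances=True)["InstanceStatuses"][0]
```

The system check is tested first because a hardware problem on the AWS side can also make the instance check fail.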

Reviewing Security Groups and Inbound Rules

A timeout error usually points to a network block, not a broken machine. Security groups act like a firewall around your instance. If a rule changed, or if your IP address shifted, the connection silently fails. Open the EC2 console, select your instance, and check the attached security group.

Make sure an inbound rule allows SSH on port 22 for Linux or RDP on port 3389 for Windows. Confirm the source matches your current IP address. Home and office IP addresses change often, so a rule that worked yesterday may block you today. If you host a website, also confirm ports 80 and 443 are open.

Pros: Security group edits apply instantly and require no reboot. Cons: Overly open rules like 0.0.0.0/0 create a serious security risk, so always scope rules to trusted IP ranges.
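
To check a rule set against your current address without clicking through the console, a small helper like the one below can do the comparison. Treat it as a sketch: it assumes rules shaped like the IpPermissions list from the EC2 DescribeSecurityGroups API, it only looks at IPv4 CIDR ranges (rules that reference other security groups or IPv6 ranges are ignored), and the function name and sample addresses are hypothetical.

```python
import ipaddress


def port_open_for(ip_permissions, client_ip, port):
    """Return True if any inbound rule allows TCP `port` from `client_ip`."""
    addr = ipaddress.ip_address(client_ip)
    for perm in ip_permissions:
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            port_ok = False
        if not port_ok:
            continue
        # Only IPv4 CIDR sources are checked in this sketch.
        for ip_range in perm.get("IpRanges", []):
            if addr in ipaddress.ip_network(ip_range["CidrIp"]):
                return True
    return False


# Example rule: SSH allowed only from a documentation-range office network.
rules = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
          "IpRanges": [{"CidrIp": "203.0.113.0/24"}]}]
print(port_open_for(rules, "203.0.113.7", 22))
print(port_open_for(rules, "198.51.100.9", 22))
```

If the second kind of result shows up for your own address, your IP has probably changed since the rule was written.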

Confirming Your Elastic IP and Public IP Address

This cause catches almost everyone at least once. A dynamic public IP is temporary by design. If you performed a stop and start instead of a reboot, AWS assigned a brand new public IP. Your old address now points to nothing, which produces a timeout that looks like a dead instance.

Open the EC2 console and check the current public IP under the instance details. Compare it against the address you are trying to reach. If they differ, simply connect using the new IP. To stop this from happening again, allocate an Elastic IP and associate it with your instance.

Pros: An Elastic IP stays the same across stops, starts, and reboots, giving you a permanent address. Cons: AWS charges an hourly fee for every public IPv4 address, including Elastic IPs, whether or not the address is attached to a running instance, so release any you no longer use.

Diagnosing a Broken /etc/fstab File

This is one of the most common reasons a Linux instance refuses to boot after a reboot. The fstab file tells Linux which drives to mount at startup. If you added a volume or a network file system entry and made a typo, the boot process can hang while it waits for a drive that never appears.

The symptom is classic. The instance shows a running state, but SSH returns “connection refused” or simply times out. The operating system never finishes booting, so no service responds. EFS and EBS mounts are frequent culprits here.

The fix is to add the nofail option to your fstab entries, and _netdev for network mounts. The nofail option tells Linux to continue booting even if a mount fails, and _netdev tells it to wait for the network before attempting the mount. You will need the rescue method below to edit the file if you are already locked out.
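
A quick way to spot risky entries is to lint the file before you reboot. The Python sketch below is a rough heuristic, not a full fstab parser: it assumes the standard whitespace-separated fstab layout, the set of network filesystem types is a guess you should adjust, and the device names and the fs-123 identifier in the sample are placeholders.

```python
NETWORK_FS = {"nfs", "nfs4", "efs", "cifs"}  # assumed list, extend as needed


def lint_fstab(text):
    """Return warnings for fstab entries that could hang the boot."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            warnings.append(f"line {lineno}: expected at least 4 fields, found {len(fields)}")
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3].split(",")
        if mount_point == "/" or fs_type == "swap":
            continue  # root must mount anyway; swap is out of scope here
        if "nofail" not in options:
            warnings.append(f"line {lineno}: {mount_point} lacks nofail")
        if fs_type in NETWORK_FS and "_netdev" not in options:
            warnings.append(f"line {lineno}: {mount_point} is a network mount without _netdev")
    return warnings


sample = """\
UUID=abc / xfs defaults 0 0
/dev/xvdf /data ext4 defaults 0 2
fs-123:/ /mnt/efs efs defaults,nofail 0 0
"""
for warning in lint_fstab(sample):
    print(warning)
```

On a real instance you could pass in the contents of /etc/fstab, or of the copy on a mounted rescue volume.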

Using the EC2 Serial Console to See the Boot Process

When SSH fails completely, the EC2 Serial Console becomes your window into the machine. It connects you directly to the instance serial port, so you can watch boot messages and even log in without a network connection. This tool is incredibly useful for diagnosing fstab errors, kernel panics, and services that fail to start.

First, enable the Serial Console at the account level in the EC2 settings. Confirm your IAM permissions allow serial console access. Then select your instance, choose Connect, and open the EC2 Serial Console tab. You will need a password set for a local user to log in.

Pros: You can fix many problems live without detaching any volumes, which saves a lot of time. Cons: It only works on Nitro based instances, and you must set up access and a user password in advance.

Reading the Instance System Log for Clues

Before you take any drastic action, read the system log. AWS captures the boot output of your instance, and this log often names the exact problem in plain text. You do not need access to the machine to read it, which makes it perfect for a locked out instance.

Select your instance in the console, choose Actions, then Monitor and troubleshoot, and click Get system log. Scroll through the output and look for repeated errors. Watch for fstab mount failures, kernel panics, out of memory messages, and filesystem check prompts. These lines tell you whether the issue is a disk, the OS, or memory.

Pros: The log is free, instant, and requires no special setup, making it the ideal first diagnostic step. Cons: The output can be long and technical, and it sometimes lags by a minute or two after a reboot, so refresh if it looks empty.
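
Long logs are easier to read with a filter. The sketch below counts lines that match a handful of failure patterns. The patterns are examples of messages Linux commonly prints and are not an exhaustive or official list, so treat a clean result as "nothing obvious" rather than "nothing wrong". The function name and sample log are invented for illustration.

```python
import re

# Illustrative patterns only; extend them for your distribution.
LOG_PATTERNS = {
    "disk full": re.compile(r"No space left on device", re.I),
    "kernel panic": re.compile(r"Kernel panic", re.I),
    "out of memory": re.compile(r"Out of memory|oom-killer", re.I),
    "mount failure": re.compile(r"Failed to mount|Dependency failed for", re.I),
    "filesystem check": re.compile(r"fsck", re.I),
}


def scan_system_log(text):
    """Count log lines matching each known failure pattern."""
    hits = {}
    for line in text.splitlines():
        for label, pattern in LOG_PATTERNS.items():
            if pattern.search(line):
                hits[label] = hits.get(label, 0) + 1
    return hits


sample_log = """\
[  OK  ] Started udev Kernel Device Manager.
[FAILED] Failed to mount /data.
[DEPEND] Dependency failed for Local File Systems.
"""
print(scan_system_log(sample_log))
```

You can paste the console log text into a file and feed it to the function, or fetch it through the EC2 GetConsoleOutput API if you script against AWS.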

Fixing a Full Disk That Blocks the Boot

A full root volume quietly breaks many instances. When the disk has no free space, services cannot write logs or temporary files, and SSH daemons often fail to start. The reboot exposes a problem that was building up for days. The system log usually shows “No space left on device” errors.

If you can still reach the Serial Console, log in and run df -h to check disk usage. Clear old logs in /var/log, remove unused packages, and delete large temporary files. If you are fully locked out, use the root volume rescue method to clean the disk from a healthy instance.

For a long term fix, increase the size of your EBS volume in the console and then expand the filesystem. Pros: A larger volume removes the bottleneck for good. Cons: Larger EBS volumes cost more per month, so size them to real need rather than guessing high.
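
To catch a filling disk before the next reboot, you can run a small check on a schedule. This sketch uses Python's standard shutil.disk_usage and reports roughly what df -h shows for one path. The 90 percent threshold is an arbitrary assumption, and the function names are made up for this example.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def is_nearly_full(path="/", threshold=90.0):
    """Return True when usage is at or above `threshold` percent."""
    return disk_usage_percent(path) >= threshold


if is_nearly_full("/"):
    print(f"root volume is {disk_usage_percent('/'):.0f}% full, clean up before rebooting")
```

The figure can differ slightly from df because df accounts for blocks reserved for the root user.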

Handling Memory Exhaustion and CPU Credit Problems

Sometimes the reboot is not the cause but the symptom. An instance that ran out of memory may have triggered an automatic recovery or a forced restart. After it comes back, the same workload can exhaust memory again, and SSH becomes unreachable within minutes. The system log often shows out of memory killer messages.

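Burstable instances add another twist. T series instances run on CPU credits. In standard mode, when those credits hit zero, AWS throttles the instance to its baseline performance, and under load it can slow to a crawl and stop responding. A reboot does not restore credits. It may help briefly by clearing runaway processes, but the problem returns under load.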

The fix is to stop the instance and change to a larger instance type or an instance family with more memory. You can also enable a swap file to absorb memory spikes. Pros: A right sized instance ends the cycle. Cons: Bigger instances cost more, so monitor usage with CloudWatch before you upgrade.

Rescuing Your Instance With the Root Volume Method

When nothing else works, this method nearly always saves the day. You move the broken disk to a healthy machine, fix the problem, and move it back. It sounds intense, but it follows a clear sequence and protects your data the whole time.

First, stop the broken instance and note its Availability Zone. Detach the root volume in the EBS section. Launch or pick a healthy rescue instance in the same Availability Zone. Attach the broken volume to it as a secondary disk, for example /dev/sdf.

Now SSH into the rescue instance, mount the secondary volume, and edit the broken file, such as /etc/fstab. Fix the typo or add the nofail option, then unmount it. Detach the volume, reattach it to the original instance as the root device, usually /dev/xvda or /dev/sda1, and start the instance.

Pros: It works for almost any boot or config failure with zero data loss. Cons: It involves several careful steps, and a mistake with the Availability Zone or device name causes delays.
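
The volume shuffle described above can be scripted. The sketch below assumes you pass in a boto3 EC2 client (boto3 installed and credentials configured are both assumptions), and the helper names, the /dev/sdf device, and the /dev/xvda default are illustrative choices. Check the root device name of your own instance before you use anything like this, and consider taking a snapshot first.

```python
def move_volume(ec2, volume_id, to_instance_id, device):
    """Detach an EBS volume, then attach it to another instance as `device`."""
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=to_instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


def start_rescue(ec2, broken_id, rescue_id, volume_id):
    """Stop the broken instance and hand its root volume to the rescue instance."""
    ec2.stop_instances(InstanceIds=[broken_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[broken_id])
    move_volume(ec2, volume_id, rescue_id, "/dev/sdf")


def finish_rescue(ec2, broken_id, volume_id, root_device="/dev/xvda"):
    """Return the repaired volume as the root device and start the instance.

    Unmount the volume on the rescue instance before calling this.
    """
    move_volume(ec2, volume_id, broken_id, root_device)
    ec2.start_instances(InstanceIds=[broken_id])


# Hypothetical usage, with placeholder IDs:
#   import boto3
#   ec2 = boto3.client("ec2")
#   start_rescue(ec2, "i-broken...", "i-rescue...", "vol-...")
#   ... SSH to the rescue instance, mount, fix /etc/fstab, unmount ...
#   finish_rescue(ec2, "i-broken...", "vol-...")
```

Both instances must be in the same Availability Zone, or the attach call will fail.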

Checking Network ACLs and Subnet Route Tables

If your security group looks correct but you still cannot connect, look one layer deeper. Network ACLs control traffic at the subnet level, and they work differently from security groups. They are stateless, which means you must allow both inbound and outbound traffic explicitly.

Open the VPC console and review the Network ACL attached to your instance subnet. Confirm inbound rules allow your port and that outbound rules allow the ephemeral port range, roughly 1024 to 65535. A missing return rule blocks the response even when the request gets through.

Also check the route table for the subnet. A public instance needs a route to an Internet Gateway for the 0.0.0.0/0 destination.

If someone removed that route, the instance becomes unreachable from the internet. Pros: These checks catch problems that security groups hide. Cons: ACL rules are easy to misorder, since AWS evaluates them by rule number from lowest to highest and applies the first rule that matches.
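
Rule ordering is easier to reason about with a tiny simulation. The Python sketch below walks ACL entries from the lowest rule number up and returns the action of the first match. It assumes entries shaped like the Entries list from the EC2 DescribeNetworkAcls API, handles only TCP and IPv4, and uses made-up addresses and a made-up function name.

```python
import ipaddress


def nacl_decision(entries, remote_ip, port, egress=False):
    """Return 'allow' or 'deny' for TCP traffic, first matching rule wins.

    For inbound checks, `port` is the port on your instance. For outbound
    checks, it is the destination port, such as the client's ephemeral port.
    """
    addr = ipaddress.ip_address(remote_ip)
    candidates = [e for e in entries if e["Egress"] == egress]
    for entry in sorted(candidates, key=lambda e: e["RuleNumber"]):
        if entry["Protocol"] not in ("6", "-1"):  # "6" is TCP, "-1" is all
            continue
        cidr = entry.get("CidrBlock")  # IPv6-only entries are skipped here
        if cidr is None or addr not in ipaddress.ip_network(cidr):
            continue
        port_range = entry.get("PortRange")
        if port_range and not (port_range["From"] <= port <= port_range["To"]):
            continue
        return entry["RuleAction"]
    return "deny"  # nothing matched, so traffic is dropped


acl = [
    {"RuleNumber": 100, "Protocol": "6", "RuleAction": "allow", "Egress": False,
     "CidrBlock": "0.0.0.0/0", "PortRange": {"From": 22, "To": 22}},
    {"RuleNumber": 90, "Protocol": "6", "RuleAction": "deny", "Egress": False,
     "CidrBlock": "198.51.100.0/24", "PortRange": {"From": 22, "To": 22}},
]
print(nacl_decision(acl, "198.51.100.9", 22))  # rule 90 is evaluated before rule 100
```

Because the ACL is stateless, run the check twice: once inbound for your service port, and once with egress=True for the ephemeral port the reply travels to.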

Verifying Auto Recovery and Automatic Reboots

Sometimes your instance rebooted on its own, and you never pressed a button. AWS uses automatic instance recovery to handle hardware failures, and it may restart your instance after a problem on the underlying host. This can surprise you and make the instance look unstable.

Check the CloudWatch alarms and the instance activity in the EC2 console. Look for recovery actions or scheduled events. AWS sometimes schedules maintenance that forces a reboot, and these events appear in the console and your email. Knowing this prevents you from chasing a problem that AWS already handled.

If recovery keeps triggering, the root cause is usually a failing host or a recurring OS issue. Use the system log to find the real trigger. Pros: Auto recovery keeps your instance available without manual work. Cons: Repeated recoveries point to a deeper fault that you still need to fix at the source.

How to Prevent Reboot Lockouts in the Future

Prevention is far easier than recovery, and a few habits remove most of these problems. Always test configuration changes before you reboot. If you edit fstab, run mount -a first to confirm there are no errors. This single command catches the most common boot killer before it strikes.

Attach an Elastic IP so your address never changes across stops and starts. Add the nofail option to every fstab mount, and _netdev for network drives. Set up CloudWatch alarms for CPU, memory, and status checks so you get warned before a small issue becomes a lockout. Note that EC2 does not publish memory metrics by default, so memory alarms require the CloudWatch agent on the instance.

Take regular EBS snapshots or full AMI backups so you always have a clean restore point. Pros: These steps cost little time and save hours of stress. Cons: Backups and Elastic IPs add small ongoing charges, but the protection they offer is well worth the price for any instance you care about.
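
As one example of the alarms mentioned above, the sketch below builds the settings for a status check alarm. The metric name, namespace, and parameter names follow the CloudWatch PutMetricAlarm API, while the alarm name, the two-minute evaluation window, and the SNS topic ARN are assumptions you would replace with your own values.

```python
def status_check_alarm(instance_id, sns_topic_arn):
    """Build PutMetricAlarm settings for a failed EC2 status check."""
    return {
        "AlarmName": f"status-check-failed-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                # seconds per data point
        "EvaluationPeriods": 2,      # alarm after two failing minutes
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# With boto3 installed and configured (an assumption), you could apply it with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **status_check_alarm("i-...", "arn:aws:sns:..."))
```

Building the settings as a plain dictionary keeps the alarm easy to review and reuse across instances.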

Frequently Asked Questions

Does rebooting an EC2 instance change its IP address?

No. A reboot keeps the same public IP, private IP, and Elastic IP. Only a stop and start changes a dynamic public IP address. This is the most common point of confusion, so always confirm which action you actually performed before assuming the IP changed.

How long should a normal EC2 reboot take?

A healthy instance usually comes back within one to three minutes. If it stays unreachable past five minutes, check your status checks and system log. A reboot that never finishes points to a boot error, a full disk, or a broken fstab entry that needs the rescue method.

Why does my SSH say connection refused after a reboot?

Connection refused means the instance is reachable but nothing is accepting connections on the SSH port, so the SSH service is most likely not running. This usually points to a boot failure, a full disk, or a hung mount. A timeout, by contrast, points to a network block like a security group or ACL.
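
You can tell the two cases apart without an SSH client. The sketch below opens a plain TCP connection using Python's standard socket module and reports whether the port is open, refused, or timing out. The function name, the five-second default, and the sample address are assumptions for illustration.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # host reachable, nothing listening on the port
    except socket.timeout:
        return "timeout"   # packets dropped somewhere, often a firewall rule
    except OSError as exc:
        return f"error: {exc}"


# Hypothetical usage with a documentation-range address:
#   print(probe_port("203.0.113.10", 22))
```

A "refused" result sends you toward the system log and the rescue method, while a "timeout" sends you back to security groups, ACLs, and the IP address.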

Can I fix a broken instance without losing my data?

Yes. The root volume rescue method lets you fix almost any boot problem with zero data loss. You detach the disk, repair it on a healthy instance, and reattach it. Your files stay safe on the EBS volume the entire time.

What is the difference between system and instance status checks?

A system status check monitors AWS hardware and network. An instance status check monitors your operating system and configuration. A system failure is fixed by AWS or by a stop and start, while an instance failure usually needs you to repair the OS or config yourself.

Should I reboot or stop and start to fix problems?

Try a reboot first, since it keeps your IP and data. If the system status check failed, use stop and start instead, because it moves your instance to healthy hardware. For OS level issues like a bad fstab file, neither helps, and you need the rescue method.
