How to Conduct a Server Health Check and 3 Tools to Make it Easier

by Helder Machado | Sep 1, 2021 | Managed IT Services

Originally published in 2021, this article on conducting server health checks has been updated for 2024.

Your server is the backbone of your network, yet it’s often overlooked when it comes to monitoring your business’s operational health. Often, it’s not until a server’s performance has noticeably degraded that anyone thinks to check. Sometimes a change in your server’s health warns of a possible hardware failure; other times, it points to an application flaw.

Considering recent events like the SolarWinds security breach, it’s clear that every type of business needs ongoing server health monitoring. Careful monitoring should identify unusual behavior, such as increased resource consumption, which can be an indicator of malicious activity. The sooner a potential compromise is identified, the sooner it can be contained. With the average time to detect a compromise at over 200 days, every business can benefit from proactive, early detection.

What is Server Health?

Server health refers to how well a server functions, but it is more than the performance of hard drives and power supplies. Health checks depend on the server. For example, web servers have different performance metrics than file servers. Email and application servers are not the same as database servers. Each implementation has its own set of evaluation criteria when it comes to server health.

A physical check may include CPU usage, memory availability, and disk capacity. Other checks may encompass connectivity, dependency, and anomaly evaluation.

The checks are designed to establish a baseline using historical data.

The baseline is then used to identify deviations that can be addressed to ensure optimum performance.
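As a rough illustration of the baseline idea, the following Python sketch summarizes historical samples and flags a reading that strays too far from them. The sample values, the function names, and the three-standard-deviation cutoff are assumptions made for this example, not a recommendation from any particular tool.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize historical samples as (mean, standard deviation)."""
    return mean(history), stdev(history)


def deviates(current, baseline, max_sigmas=3.0):
    """Return True when `current` is more than `max_sigmas` standard
    deviations away from the baseline mean."""
    average, spread = baseline
    if spread == 0:
        # Perfectly flat history: any different value counts as a deviation.
        return current != average
    return abs(current - average) / spread > max_sigmas


# Hypothetical CPU usage samples (percent) collected during normal operation.
cpu_history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0, 25.0, 23.0]
baseline = build_baseline(cpu_history)

print(deviates(24.5, baseline))  # a reading close to the historical mean
print(deviates(71.0, baseline))  # a reading far above the historical mean
```

In practice the history would come from stored monitoring data rather than a hard-coded list, and the cutoff would be tuned to each metric.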

Why Are Health Checks Important?

Understanding how servers are functioning is essential to business success. Without regular checks, servers can fail unexpectedly or perform erratically. Unexpected downtime because of a server failure can be costly. It’s estimated unplanned downtime can cost $100,000 per hour.

But what about the impact on customer experience?

How many times have you heard “Sorry, the computer is running slow today” as you wait in line or on the phone? Waiting for the server to respond degrades the customer experience. After all, no one likes to wait. Given that 84% of customers will move to a competitor after three bad experiences, you just can’t afford to have a customer wait because of server performance problems.

Customer experience is not the only concern. Underperforming servers or unpredictable behavior can also be a sign of unauthorized access and use. Cybersecurity threats continue to rise, with an attempt made every 11 seconds.

Deploying server health checking tools can result in faster detection and remediation.

Given that the average security breach takes over 200 days to detect, using a server health check tool and reviewing cyber incident reports can significantly reduce the time cybercriminals can roam freely through your network.

Proactive server health monitoring provides data that can be used to anticipate problems in the future. By monitoring your servers and comparing current data with historical data, companies can identify potential failures and address them before they impact the bottom line. They can also use monitoring data to make informed decisions about server replacement, performance optimization, server data security improvements, and operational adjustments.

Who Should Perform Server Health Checks?

Practically anyone can perform a server health check. It just takes time. Often, it takes lots of time.

How much time depends on the number of servers and their location. Server health check tools can help you automate the process, but interpreting the data needs a level of expertise. Putting together a comprehensive monitoring plan also requires experience to identify the critical metrics to evaluate.

FREE OFFER: If you have concerns or issues with your server, and would like help with a server health check, contact us for a FREE Server Health Check today. You’ll get a full report of your network and server health, along with an action plan for improvements.

How to Conduct a Server Health Check

How a health check is conducted depends on the server being tested. Certain physical checks apply no matter the server type; however, a SQL server health check uses different performance metrics than a check of an application server or a mail server. An infrastructure check that exercises server and network security and functionality should deliver the following:

Hardware metrics – Fans, power supply, disk drives, CPU, storage, memory, environmental conditions
Reports – Information on procurement, usage, and status to inform future purchases
Alarms – Notifications of changes in server health for faster resolution
Baselines – Historical metrics for setting thresholds for alerts
Visualization – Graphical representation instead of just reports to provide a quick server health assessment

After establishing the measurable metrics, thresholds are set using historical data to enable alarms to be triggered.

From the data, informed decisions can be made to improve network performance.
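To make the link between metrics, thresholds, and alarms concrete, here is a minimal Python sketch that measures disk usage and maps it to an alarm level. The 80% and 90% thresholds and the function names are placeholders assumed for illustration; as described above, real thresholds would be derived from historical data.

```python
import shutil


def disk_used_percent(path="."):
    """Return the percentage of disk space in use on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def alarm_level(value, warning, critical):
    """Map a metric value to "ok", "warning", or "critical"."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


# Assumed thresholds for illustration only.
used = disk_used_percent(".")
print(f"disk used: {used:.1f}% -> {alarm_level(used, warning=80.0, critical=90.0)}")
```

The same `alarm_level` helper could be reused for CPU, memory, or temperature readings, each with its own pair of thresholds.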

What Should Be Evaluated

Although the checks may vary, here are some essential server health assessments to conduct.

Uptime Checks

Servers are part of the network’s infrastructure, so their ability to connect is a critical metric to check. These checks may be performed using a load balancer or external monitoring agent. At a minimum, the tests should involve:

Confirming that the server is listening on the expected port and that new connections are being established
Performing HTTP requests to ensure the server responds within baseline parameters
Checking that basic statuses are being sent
Pinging the server as a simple test of whether the configuration is viable
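The first two tests in the list above can be sketched in Python with the standard library alone. This is a simplified example, not how a load balancer or commercial monitoring agent implements its probes; the function names and the two-second response limit are assumptions.

```python
import socket
import time
import urllib.error
import urllib.request


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url, max_seconds=2.0, timeout=5.0):
    """Send a GET request and return (status_code, elapsed_seconds, healthy).

    `healthy` is True only for a 200 response that arrives within
    `max_seconds`. `status_code` is None when no HTTP response was received.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # the server answered, but with an error status
    except OSError:
        return None, time.monotonic() - start, False
    elapsed = time.monotonic() - start
    return status, elapsed, status == 200 and elapsed <= max_seconds
```

A monitoring script might call `port_is_open("www.example.com", 443)` followed by `http_check("https://www.example.com/")`, where the host name is a stand-in for your own server and the response limit comes from your baseline.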

Local Health Checks

These health checks go beyond uptime checks. They verify that applications can operate on the server. Local health checks establish that resources are available to ensure application performance. These checks include:

Read and Write To Disk – Most applications write to disk for logging or error tracking. Assuming that disk access is not required can lead to fatal consequences when software attempts to access a resource that is not available.
Processes Functioning – Liveness checks may test proxy processes, but they may not check the proxy and application link. Local health checks go beyond the basic check to ensure that the processes are running and responding correctly.
Missing Processes – Ensure that support processes are operational. If monitoring doesn’t go deep enough to check support processes, organizations run the risk of having a service fail. Sometimes these failures are difficult to detect and take longer to remedy.

Performing checks local to the server ensures that the server is executing as it should.
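The read-and-write-to-disk item above lends itself to a short sketch. The Python example below writes a small temporary file, reads it back, and reports whether the contents match. It covers only the disk check, not the process checks, and the function name and payload are invented for this example.

```python
import os
import tempfile


def disk_read_write_ok(directory=None):
    """Write a small temporary file, read it back, and return True if the
    contents match. Returns False if the directory cannot be written to."""
    payload = b"health-check"
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out to the disk
            handle.seek(0)
            return handle.read() == payload
    except OSError:
        return False


# With no argument, the system's default temporary directory is tested.
print(disk_read_write_ok())
```

Passing the directory an application actually logs to, rather than the default temporary directory, makes the check more representative of what that application will experience.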

Dependency Checks

Dependency health checks inspect the interactions among servers. For example, an application may need to send data to the SQL server. If the two servers fail to interact, the application may fail. Dependency checks can catch expired credentials or misconfigured servers that prevent an application from interacting with a database server. Dependency checks may include:

Configuration or Metadata – Checking for misconfigurations can catch disconnects that can lead to unpredictable behavior. For example, automated updates may stop working on a dependency server without the server being able to determine why. Finding misconfigurations or missing metadata can ensure that servers continue to perform as needed.
Communication – When servers can’t communicate with other servers, the result may be difficult-to-detect discrepancies that lead to network instability.
Software Flaws – Faulty software applications can lead to memory leaks or data corruption that impacts server performance. Checking to ensure that server performance is being maintained reduces the chances of fatal errors.

As networks become more complex, the interdependency of servers becomes critical to successful operations. Ignoring that dependency can have ramifications that far exceed the server-specific error.
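One way to picture a dependency check is as a set of named probes that are all run and reported on, so a single failing dependency does not hide the others. The Python sketch below assumes each dependency can be tested with a simple TCP connection; the helper names are hypothetical, and a real check would often go further, for example by logging in with the application's credentials.

```python
import socket
from typing import Callable, Dict


def tcp_dependency(host: str, port: int, timeout: float = 3.0) -> Callable[[], None]:
    """Build a check that raises OSError if host:port cannot be reached."""
    def check() -> None:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return check


def run_dependency_checks(checks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run every check and map each dependency name to "ok" or a failure note."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            # Record the failure and keep going so every dependency is reported.
            results[name] = f"failed: {type(exc).__name__}: {exc}"
    return results
```

For instance, `run_dependency_checks({"database": tcp_dependency("db.internal.example", 1433)})` would report whether a database server at that placeholder address accepts connections on port 1433, the default port for Microsoft SQL Server.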

Anomaly Checks

Checking whether a server is behaving differently from its baseline or from similar servers in the network should be part of any server performance monitoring.

These checks can identify such anomalies as:

Clock Skewing – Many server and application functions depend on the server’s clock for executing code. If the clock is off, the system may fail, or the application may return an invalid response. For example, time limits on resetting passwords can result in user frustration if the clocks do not agree. In some instances, the difference may result in a system shutdown.
Outdated Software – Bringing a server online, especially one that has been disconnected for a while, can introduce errors. Confirming that the server is up to date may not catch every possible error, so checking for anomalies can help identify software that is still outdated.
Failures – Anomaly checking can be a last line of defense for problems that may impact performance. Although perfect performance is the ideal, hardware and software rarely reach that goal. As a result, it is always prudent to check for aberrant behavior.

Anomalies occur for multiple reasons, many of which may not even be defined. That’s why checking for unusual behavior is essential to server performance.
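Clock skew is one of the easier anomalies to illustrate. The Python sketch below compares a local timestamp with a reference time supplied in the format of an HTTP `Date` header. The five-second tolerance and the function names are assumptions for this example, and production systems typically rely on a time service such as NTP rather than a check like this.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(reference_date_header, local_now=None):
    """Return local time minus reference time, in seconds.

    `reference_date_header` is a timestamp in HTTP Date format, such as
    "Mon, 15 Jan 2024 12:00:00 GMT". A positive result means the local
    clock is ahead of the reference.
    """
    reference = parsedate_to_datetime(reference_date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - reference).total_seconds()


def skew_is_acceptable(skew_seconds, tolerance_seconds=5.0):
    """Return True when the clocks agree to within the tolerance."""
    return abs(skew_seconds) <= tolerance_seconds


# Fixed timestamps so the example is repeatable: the local clock is 42 seconds ahead.
local = datetime(2024, 1, 15, 12, 0, 42, tzinfo=timezone.utc)
skew = clock_skew_seconds("Mon, 15 Jan 2024 12:00:00 GMT", local)
print(skew, skew_is_acceptable(skew))
```

The reference timestamp could be taken from the `Date` header of a response from a trusted server, with the caveat that network delay adds a small error to the measurement.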

How Often to Check a Server’s Health

The short answer to how often to check a server’s health is: continuously. Unfortunately, 24/7 monitoring can absorb an entire IT support team’s resources. Easy-to-use server monitoring tools are available to help with monitoring and troubleshooting servers. Whether you run a web server or a file server, monitoring tools exist that can help you achieve optimum performance.

Tools for Server Health Checks

Monitoring a server’s health should be part of any maintenance plan. Without the details that come from monitoring and checking the infrastructure, companies are leaving themselves open for system failure or compromise. The following three tools are examples of available solutions.

PRTG

This solution is a network monitoring tool suitable for companies of all sizes with its ability to scale to meet the needs of an enterprise. PRTG does more than monitor a network’s infrastructure. It can check:

CPU load
Hard disk capacity
Overall performance
RAM usage
Bandwidth

Customizable dashboards and reports let administrators see their server environment in one place. Adding graphs and analytics displays makes it easy for IT personnel to respond to deviations.

Templates can speed the creation of dashboards and reports.

In addition to its health checks, the monitoring tool delivers the following:

Flexible alerts
Customizable user interfaces
Failover-tolerant monitoring
Distributed monitoring
Customizable mapping
Dynamic setup

The solution’s monitoring capabilities are designed to adjust to a company’s business requirements.

Datadog

Datadog provides monitoring, analytics, and security tools for developers, security engineers, IT departments, and cloud-based infrastructures. It combines and automates application performance monitoring, log management, and infrastructure monitoring. It offers dashboards, customizable alerts, and integrations.

The solution is a cloud-hosted model with the following features:

Customizable views
Aggregated metrics and events
Automation tools
Source control
Bug tracking
Common server components
Monitoring and instrumentation
Database monitoring

Datadog provides a server monitoring solution for development shops looking to incorporate source control and bug tracking into a single system.

Observium

Observium monitors network equipment and servers. Once configured, it detects network devices and can collect and display information on each port. The tool supports a long list of devices using the SNMP protocol. The solution requires its own server with a dedicated URL.

The graphical user interface offers statistical displays, diagrams, and graphs. It shows information on:

CPU
RAM
Storage
Power supply
Temperature

Data collection can extend to Apache, MySQL, BIND, and Postfix.

With its auto-discovery capabilities, it expedites the installation and configuration of networks as well as the addition of devices.

Navigating the world of server health checks can be overwhelming. Whether you perform them yourself or partner with an experienced IT service provider, health checks are a vital part of maintaining a safe and secure operating environment. Contact us to discuss how we can get server health checking tools in place.

Are you looking for a managed IT services company in Worcester? Contact us today!
