The biggest risk to uptime? Your staff

News Analysis

Oct 09, 2019

Data Center, Networking

Human error is the chief cause of downtime, a new study finds. Imagine that.

There was an old joke: “To err is human, but to really foul up you need a computer.” Now it seems the reverse is true. The reliability of data center equipment has vastly improved, but the humans running it have not kept up, and that is a threat to uptime.

The Uptime Institute has surveyed thousands of IT professionals throughout the year on outages and said the vast majority of data center failures, between 70 percent and 75 percent, are caused by human error.

And some of them are severe. It found more than 30 percent of IT service and data center operators experienced downtime that they called a “severe degradation of service” over the last year, with 10 percent of the 2019 respondents reporting that their most recent incident cost more than $1 million.

In Uptime’s April 2019 survey, 60 percent of respondents believed that their most recent significant downtime incident could have been prevented with better management, processes, or configuration. For outages that cost more than $1 million, this figure jumped to 74 percent.

However, the ultimate fault lies not necessarily with the staff, Uptime argues, but with management that has failed them.

“Perhaps there is simply a limit to what can be achieved in an industry that still relies heavily on people to perform many of the most basic and critical tasks and thus is subject to human error, which can never be completely eliminated,” wrote Kevin Heslin, chief editor of the Uptime Institute Journal, in a blog post.

“However, a quick survey of the issues suggests that management failure — not human error — is the main reason that outages persist. By under-investing in training, failing to enforce policies, allowing procedures to grow outdated, and underestimating the importance of qualified staff, management sets the stage for a cascade of circumstances that leads to downtime,” Heslin went on to say.

Uptime noted that the complexity of a company’s infrastructure, especially its distributed nature, can increase the risk that simple errors will cascade into a service outage, and said companies need to be aware that greater complexity brings greater risk.

On the staffing side, it cautioned against expanding critical IT capacity faster than the company can attract and apply the resources to manage that infrastructure, and advised companies to identify any staffing and skills shortages before they start to impair mission-critical operations.

by Andy Patrizio

Andy Patrizio is a freelance journalist based in southern California who has covered the computer industry for 20 years and has built every x86 PC he’s ever owned, laptops not included.

The opinions expressed in this blog are those of the author and do not necessarily represent those of ITworld, Network World, its parent, subsidiary or affiliated companies.
