Getting your enterprise ready for the real AI

Opinion
Sep 04, 2024 | 7 mins
Data Center

We’re in the infancy of in-house AI with many unknowns and plenty of questions. On the infrastructure front, enterprises agree that AI deployments should be designed as a new cluster with its own fast cluster network.

Credit: Shutterstock / SuPatMaN

If you’re a CxO or tech planner, I’d be willing to bet that your attitudes on AI have changed over just the last couple of months. You no longer carefully watch your GPU chips for signs of hostile intent, you don’t expect generative AI services offered by the giants to turn you into the email/text equivalent of a Hemingway, and you don’t expect those services to boost your overall profits and stock price. That doesn’t mean you’ve given up on AI, just that you’ve faced reality. But what, in terms of actions, does reality look like? What is the “real” AI?

For more and more enterprises, it’s an application you run in house. Of 292 enterprises who’ve commented to me on AI plans, 164 say that they believe their real AI benefits will accrue from self-hosting AI, not from public generative services. Of that group, only 105 even thought they knew what that would involve, and only 47 were confident. To say we’re in the infancy of in-house AI is clearly accurate. Why is this such a challenge? There are a lot of parts to an AI deployment, and every one of them is confusing.

Challenge notwithstanding, vendors seem to love the self-hosting option. Cisco and Juniper (whose acquisition by HPE seems on track) have both announced intentions to focus more on AI in the enterprise data center. The AI model providers (with one exception noted below) are also eager to promote licensing of their generative AI tools. All either group needs is for buyers to line up, but the confusion I just mentioned means most buyers don’t even know where to start.

Enterprises overall think that you start AI hosting plans with GPUs and data center equipment, but every one of the “confident” enterprises says that’s wrong. “You can’t buy hardware in anticipation of your application needs,” one CIO said. “You have to start with what you want AI to do, and then ask what AI software is needed. Then you can start doing data center planning.”

LLMs versus SLMs

The enterprises who think they have a handle on self-hosting AI say they start by assuming they need to host a private version of one of the big large language models (LLMs) behind the public AI services, and most start with ChatGPT. About one-third progress along that path, but two-thirds say they now believe self-hosted AI should be based on an “open source” model. However, most of those two-thirds now say that what you really need to look for is an AI model that gets broad support from firms that “specialize” the model to a specific mission.

Right now, AI chatbot projects are the most likely to be proposed for in-house implementation, and by far the most likely to succeed in making a business case. These are directed at presale and post-sale missions, meaning marketing/sales and customer support. The enterprises that see this set of applications as their primary one are the most likely to think in terms of public AI services or cloud hosting, and so they make up most of the one-third of enterprises that stay on the proprietary-model track.

Business analytics and intelligence is the next AI application area most likely to make a business case, and the one that leads most enterprises to believe they need to self-host AI in the first place. IBM accounts tend to rely on IBM’s watsonx strategy here, and of all the enterprises they show the most confidence in their approach to selecting a model. Meta’s Llama is now the favored strategy for other enterprises, surpassing the BLOOM and Falcon models. But the shift was fairly recent, so Llama is still a bit behind in deployment though ahead in planning.

Business users of chatbots in customer-facing missions, those in the healthcare vertical, and even many planning AI in business analytics are increasingly interested in small language models (SLMs) as opposed to LLMs. SLMs are smaller in terms of parameter count, and they’re trained for a specific mission on specialized data, even your own data. That narrower training scope radically reduces the risk of hallucinations and produces more useful results in specialized areas. Some SLMs are essentially LLMs adapted to special missions, so the best way to find one is to search for an LLM specialized for the mission you’re looking to support. If you have a vendor you trust in AI strategy, talking with them about mission-specific SLMs is a wise step. Enterprises who have used specialized SLMs (14 overall) agree that the SLM was a smart move, and one that can save you a lot of money in hosting.
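
To make the SLM idea concrete, here’s a rough sketch of what “training for a specific mission on your own data” can look like with the open-source Hugging Face stack. The base model, the support_tickets.txt file, and the training settings are placeholder assumptions, not recommendations; a real project would add evaluation and, in most cases, parameter-efficient tuning.

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "EleutherAI/pythia-410m"   # placeholder small open model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token     # causal LMs often ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(base)

    # "support_tickets.txt" stands in for your own mission-specific text
    raw = load_dataset("text", data_files={"train": "support_tickets.txt"})

    def tokenize(batch):
        return tok(batch["text"], truncation=True, max_length=512)

    train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="slm-support",
                               num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=train,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()
    trainer.save_model("slm-support")   # the mission-specific model you then serve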

GPUs and Ethernet networks

How about hosting? Enterprises tend to think of Nvidia GPUs, but they actually buy servers with GPUs included, so companies like Dell, HPE, and Supermicro may dictate GPU policy for enterprises. The number of GPUs enterprises commit to hosting has varied from about 50 to almost 600, but two-thirds of enterprises with fewer than 100 GPUs have reported adding more during early testing, and some with over 500 say they now believe they have too many. Most enterprise self-hosting planners expect to deploy between 200 and 400, and only two enterprises said they thought they’d use more than 450.
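
One way to ground those GPU counts is to inventory what each server actually exposes before committing to more hardware. A minimal sketch, assuming PyTorch is already installed on the host:

    import torch

    if not torch.cuda.is_available():
        print("No CUDA-capable GPUs visible on this host")
    else:
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            print(f"GPU {i}: {props.name}, "
                  f"{props.total_memory / 1024**3:.0f} GiB memory, "
                  f"{props.multi_processor_count} SMs")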

Enterprises are unlikely to try to install GPUs in computers themselves, and most aren’t in favor of buying GPU boards for standard servers, in part because they’ve realized you can’t put a Corvette engine into a stock 1958 Edsel and expect to win many races. Good GPUs need fast memory, a fast bus architecture, and fast I/O and network adapters around them.

Ah, networks. The old controversy over whether to use Ethernet or InfiniBand has been settled for the enterprises either using or planning for self-hosted AI. They agree that Ethernet is the answer, and they also agree it should be as fast as possible. Enterprises recommend 800G Ethernet with both Priority Flow Control (PFC) and Explicit Congestion Notification (ECN), and switches that support it are even available as white-box devices. Enterprises agree that AI hosts shouldn’t be mixed with standard servers, so think of AI deployment as a new cluster with its own fast cluster network. It’s also important to have a fast connection to the data center for access to company data, either for training or prompts, and to the VPN for user access.
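
PFC and ECN themselves are fabric settings configured on the switches and NICs, and the commands are vendor-specific, so no quick sketch covers them. What you can sanity-check from any Linux host in the cluster is whether a NIC is actually linking at the speed you paid for and carrying the MTU you expect; the interface name below is a placeholder:

    from pathlib import Path

    iface = "eth0"                      # placeholder; use your cluster-fabric NIC
    base = Path("/sys/class/net") / iface

    speed_mbps = int(base.joinpath("speed").read_text().strip())
    mtu = int(base.joinpath("mtu").read_text().strip())

    print(f"{iface}: link speed {speed_mbps / 1000:.0f} Gb/s, MTU {mtu}")
    if mtu < 9000:
        print("MTU below 9000: jumbo frames may not be enabled on this interface")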

If you expect to have multiple AI applications, you may need more than one AI cluster. It’s possible to load an SLM or LLM onto a cluster as needed, but more complicated to have multiple models running at the same time in the same cluster while protecting the data. Some enterprises had thought they might pick one LLM tool, train it for customer support, financial analysis, and other applications, and then use it for them all in parallel. The problem, they report, is the difficulty in keeping the responses isolated. Do you want your support chatbot to answer questions about your financial strategy? If not, it’s probably not smart to mix missions within a model.
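
One way to picture the isolation issue is the routing layer in front of the models. In the sketch below (the names and paths are hypothetical, not any particular product’s API), each mission gets its own model and its own document store, and a request is routed to exactly one of them, so the support chatbot simply cannot draw on finance data:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Mission:
        name: str
        model_path: str   # separately tuned model per mission
        doc_store: str    # separate data store per mission

    MISSIONS = {
        "support": Mission("support", "models/slm-support", "stores/support-docs"),
        "finance": Mission("finance", "models/slm-finance", "stores/finance-docs"),
    }

    def answer(mission_name: str, question: str) -> str:
        m = MISSIONS[mission_name]        # hard routing; no shared context
        # loading and serving calls are stand-ins for whatever inference stack you run
        return f"[{m.name}] would answer {question!r} using {m.model_path}"

    print(answer("support", "How do I reset my device?"))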

The final recommendation? Test…test…test. Take time assessing model options. Take time picking configurations, and test as often as possible, especially when you can kick the AI tires before you commit. Once you’ve gotten your AI strategy to work, keep on testing to ensure that you keep your model up to date with changes in your product, your business, and the tax and regulatory framework you operate in. AI isn’t out for your job, but like you, it needs refresher courses as things change. And we need a fresh look at AI.
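
One lightweight way to keep that testing from fading after launch is a small, versioned regression set that gets re-run whenever products, prices, or regulations change. The checks and the query_model hook below are placeholders for your own serving endpoint and your own business facts:

    CHECKS = [
        # (prompt, substring the answer must still contain)
        ("What is the standard warranty period?", "two years"),
        ("Which regulation governs our EU data handling?", "GDPR"),
    ]

    def query_model(prompt: str) -> str:
        raise NotImplementedError("call your self-hosted model's API here")

    def run_regression() -> list[tuple[str, str]]:
        failures = []
        for prompt, must_contain in CHECKS:
            reply = query_model(prompt)
            if must_contain.lower() not in reply.lower():
                failures.append((prompt, reply))
        return failures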

Tom Nolle is founder and principal analyst at Andover Intel, a unique consulting and analysis firm that looks at evolving technologies and applications first from the perspective of the buyer and the buyers’ needs. Tom is a programmer, software architect, and manager of large software and network products by background, and he has been providing consulting services and technology analysis for decades. He’s a regular author of articles on networking, software development, and cloud computing, as well as emerging technologies like IoT, AI, and the metaverse.
