Large language models: should you build or buy one?

The past year, with its explosion of large language models, can only be described as an 'AI summer'. None are more famous than OpenAI's GPT-3 and its newer, much-talked-about offspring, ChatGPT.

Companies of all shapes and sizes, across all industries, are scrambling to figure out how to embrace and extract value from this new technology. But OpenAI's business model has been just as transformative as its contribution to natural language processing. Unlike nearly all earlier flagship model releases, these models do not ship with open-source pre-trained weights, which means machine learning teams cannot simply download the model and fine-tune it for their own use case.

Instead, you have to pay to use the model as-is, or pay four times the base usage rate to fine-tune it and then use the result. Companies can, of course, also turn to comparable open-source models instead.

This raises a question that is old hat in enterprise software but entirely new to ML: is it better to buy or build this technology?

It's important to note that there is no one-size-fits-all answer to this question, and I'm not trying to provide one. This piece highlights the strengths and weaknesses of each route, offers a framework to help companies assess what works for them, and sketches several intermediate paths that try to combine components of both worlds.

Buying: fast, but with distinct pitfalls

Building looks attractive in the long run, but it requires leadership with a strong appetite for risk, and deep enough pockets to fund that appetite.

Let's start with buying. There are many model-as-a-service providers that offer custom models via APIs and charge per request. This approach is fast, reliable, and requires little to no upfront investment, effectively de-risking machine learning projects. In particular, companies entering this space need little in-house expertise beyond software engineers.
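To make this concrete, a "buy" integration is often little more than a single HTTP call per request. The sketch below assembles a request payload in the shape OpenAI's text-completion endpoint expects; the model name and parameter values are illustrative, and the actual network call is left commented out so the snippet runs without an API key.

```python
import json

# Pay-per-request completions endpoint (OpenAI's, as one example provider)
API_URL = "https://api.openai.com/v1/completions"

def build_completion_request(prompt: str,
                             model: str = "text-davinci-003",
                             max_tokens: int = 250) -> dict:
    """Assemble the JSON payload a model-as-a-service provider expects.

    Field names follow OpenAI's completions API; other providers use a
    similar shape (model id, prompt, a cap on billed output length).
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,  # caps the billed response length
        "temperature": 0.7,
    }

payload = build_completion_request("Summarize this support ticket:")
print(json.dumps(payload, indent=2))

# In production, you would POST this with an Authorization header:
# requests.post(API_URL,
#               headers={"Authorization": f"Bearer {API_KEY}"},
#               data=json.dumps(payload))
```

Note that the only in-house artifact here is glue code: everything that makes the model good (or not) lives on the provider's side.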

Given that a project can start without experienced machine learning personnel, and that the ML component is purchased with a set of guarantees about its output, model results are reasonably predictable.

Unfortunately, this approach has some very real pitfalls. When you buy a model that anyone else can buy and integrate into their systems, it's not much of a stretch to assume your competitors can reach product parity just as quickly and reliably. That holds unless you can create an upstream moat through non-reproducible data-collection techniques, or a downstream moat through integrations.

Moreover, for high-throughput solutions, this approach can become very expensive at scale. For context, OpenAI's DaVinci costs $0.02 per 1,000 tokens. Assuming a conservative prompt of 250 tokens per request and a similarly sized response, you're paying $0.01 per request. For a product with 100,000 requests per day, that works out to over $300,000 annually. Clearly, text-heavy applications (say, generating articles or chat) drive costs even higher.
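The back-of-the-envelope math above can be written out as a quick cost model; the price and volume figures are the example numbers from this article, not current rates:

```python
# Example figures from the text, not current pricing
PRICE_PER_1K_TOKENS = 0.02       # OpenAI DaVinci, per 1,000 tokens
TOKENS_PER_REQUEST = 250 + 250   # conservative prompt plus similar-sized response
REQUESTS_PER_DAY = 100_000

cost_per_request = PRICE_PER_1K_TOKENS * TOKENS_PER_REQUEST / 1000
annual_cost = cost_per_request * REQUESTS_PER_DAY * 365

print(f"${cost_per_request:.2f} per request")  # $0.01 per request
print(f"${annual_cost:,.0f} per year")         # $365,000 per year
```

Swapping in your own traffic and token sizes makes it easy to see how quickly text-heavy workloads cross into seven-figure annual bills.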

This approach also comes with real limits on flexibility. You either use the model as-is or pay significantly more to fine-tune it, and it's worth remembering that the latter involves an implicit "lock-in" period with the provider, since the fine-tuned model is held in the provider's digital custody, not yours.

Building: flexible and defensible, but expensive and risky

On the other hand, building your own technology avoids some of these challenges.
