While JSON mode improves model reliability for generating valid JSON outputs, it does not guarantee that the model’s response will conform to a particular schema.
而現在新的 model 可以了:
Today we’re introducing Structured Outputs in the API, a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers.
新的 model 代碼是 gpt-4o-2024-08-06 這組,而且又降價了:
By switching to the new gpt-4o-2024-08-06, developers save 50% on inputs ($2.50/1M input tokens) and 33% on outputs ($10.00/1M output tokens) compared to gpt-4o-2024-05-13.
從 NVIDIA 這邊的新聞稿列出來的則比較合理,是透過硬體的觀點提到這個 12B model 可以跑在一張 4090 上 (24GB VRAM):
Designed to fit on the memory of a single NVIDIA L40S, NVIDIA GeForce RTX 4090 or NVIDIA RTX 4500 GPU, the Mistral NeMo NIM offers high efficiency, low compute cost, and enhanced security and privacy.
不過即使可以這樣跑,目前比較有效率的跑法應該是應該都會找 quantization 版本來跑,通常 model 會變小不少,而且損失應該也還在能接受的範圍。
I'm sure that it's sheerly coincidental that in the ten days since I added one line of 4 point, white-on-white text to my resume, I've had four times more contacts from recruiters than in the preceding month.
— Gothic Charm School 🎃 (@CupcakeGoth) May 25, 2024
Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.