AI pioneer OpenAI has released open-weight language models, its first freely downloadable models since GPT-2.
OpenAI Unveils Competitive Open-Source Language Models, GPT-OSS
OpenAI, the renowned AI research laboratory, has released GPT-OSS (Generative Pre-trained Transformer Open-Source Series), a family of open-weight language models. They offer performance competitive with proprietary models, with notable trade-offs in hardware requirements and safety features.
Performance
The flagship gpt-oss-120b model has 117 billion parameters, of which 5.1 billion are activated per token through its Mixture-of-Experts (MoE) architecture. It achieves near-parity with OpenAI's proprietary o4-mini model on core reasoning benchmarks, with strong results in domains such as mathematics and code generation [1][2]. The smaller gpt-oss-20b, with 21 billion parameters and 3.6 billion active per token, is optimized for speed and accessibility and performs comparably to OpenAI's o3-mini on common benchmarks [1][5].
However, the open models still trail proprietary models on some tasks, with higher hallucination rates and weaker performance in specialized domains such as biology and cybersecurity [2].
Hardware Requirements
The gpt-oss-120b runs efficiently on a single 80GB GPU (e.g., an NVIDIA H100), significantly less hardware than a dense model of this size would typically require, thanks to its MoE architecture [1][5]. The gpt-oss-20b runs on much lighter hardware, including a single 16GB GPU or a Mac laptop with 32GB of RAM, making it suitable for consumer and edge devices and enabling local, on-device inference.
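As a rough illustration of how little setup local inference requires, the sketch below loads the 20B model with Hugging Face Transformers, one of the frameworks the release supports (see the framework list later in this article). The `openai/gpt-oss-20b` checkpoint name and the prompt are assumptions for illustration; actual memory use depends on the precision the checkpoint ships in.

```python
# Sketch only: local inference with Hugging Face Transformers, assuming the
# "openai/gpt-oss-20b" checkpoint and a GPU with roughly 16 GB of VRAM.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # keep the precision the checkpoint was published in
    device_map="auto",    # spread layers across available GPU/CPU memory
)

messages = [{"role": "user", "content": "Summarize mixture-of-experts routing."}]
print(generator(messages, max_new_tokens=128)[0]["generated_text"])
```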
Safety Features
GPT-OSS models provide full transparency, including access to raw Chain-of-Thought reasoning. This benefits research, but it requires users to implement their own filtering and safeguards, since raw outputs can contain hallucinations and unsafe content [2][4]. The models are also more vulnerable to prompt injection and system-message override attacks than closed proprietary models, reflecting weaker built-in safety guardrails.
OpenAI advises that users of GPT-OSS models must assume responsibility for safety: monitoring for vulnerabilities and applying prompt hygiene or fine-tuning to patch issues, since automatic updates and server-side safety fixes are not provided [4].
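What "implementing your own safeguards" can look like in practice is sketched below. This is a minimal, hypothetical pattern, not anything shipped with GPT-OSS: `generate` stands in for your inference call and `looks_unsafe` for whatever moderation classifier or blocklist you deploy.

```python
# Minimal, hypothetical sketch of user-side output filtering, since GPT-OSS
# ships without server-side moderation. Both callables are placeholders for
# your own inference backend and moderation logic.
from typing import Callable

def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     looks_unsafe: Callable[[str], bool]) -> str:
    """Run the model, then screen the raw output before it reaches users."""
    raw = generate(prompt)
    if looks_unsafe(raw):
        return "[response withheld by local safety filter]"
    return raw
```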
Summary
GPT-OSS offers an efficient, open-weight alternative to OpenAI's proprietary models with strong performance, at the cost of increased user responsibility for safety and maintenance, and it requires careful handling to mitigate higher hallucination and vulnerability risks. Hardware efficiency, particularly the 20B model's ability to run on 16GB GPUs or laptops, makes the family attractive for on-device or low-cost deployments, while the 120B model brings near-top-tier reasoning to non-proprietary setups [1][2][4][5].
OpenAI's new models are available under the Apache 2.0 license, which permits broad usage, including commercial applications. The company is confident enough in its safety measures that it is challenging developers to red-team the models, offering a half-million-dollar prize for anyone who can identify novel safety issues. OpenAI trained the GPT-OSS models at native MXFP4 precision in the MoE layers, and during post-training it applied reinforcement learning similar to the process used to develop o4-mini's chain-of-thought reasoning.
GPT-OSS-120B features 128 experts per MoE layer, four of which are activated for each output token, which accounts for its 5.1 billion active parameters. GPT-OSS is available on major model repositories and is supported by inference frameworks including Hugging Face Transformers, PyTorch, Triton, vLLM, Ollama, and LM Studio.
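To make the expert-count figures concrete, here is an illustrative top-k routing step in PyTorch. It is a generic MoE sketch with standard softmax gating, not OpenAI's actual implementation: a router scores every expert for each token, only the top four run, and that is why just 5.1 billion of the 120B model's 117 billion parameters are active for any one token.

```python
# Generic top-k mixture-of-experts routing sketch (illustrative only).
import torch

def route_tokens(hidden, router_weight, experts, k=4):
    """hidden: [tokens, d_model]; router_weight: [d_model, n_experts];
    experts: list of callables mapping [m, d_model] -> [m, d_model]."""
    scores = hidden @ router_weight                  # score every expert per token
    topk_scores, topk_idx = scores.topk(k, dim=-1)   # keep only the k best experts
    gates = torch.softmax(topk_scores, dim=-1)       # mixing weights for those k
    out = torch.zeros_like(hidden)
    for slot in range(k):
        for e in topk_idx[:, slot].unique():         # run each chosen expert once
            mask = topk_idx[:, slot] == e
            out[mask] += gates[mask, slot:slot + 1] * experts[int(e)](hidden[mask])
    return out
```

Because only `k` expert networks execute per token, compute per token scales with the active parameters rather than the full parameter count, which is the source of the speed advantage described below.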
GPT-OSS-20B has 32 experts per MoE layer and 3.6 billion active parameters, making it the smaller, more manageable of the two. Testing GPT-OSS-20B in Ollama on an RTX 6000 Ada, we observed token generation rates in excess of 125 tokens/sec at a batch size of one. OpenAI has also hinted at a "big upgrade" expected later this week. The MoE architecture allows the models to generate tokens faster than a dense model of equivalent size, and OpenAI filtered harmful data on topics such as chemical, biological, radiological, and nuclear research and development out of training. Both GPT-OSS models feature a native context window of 128K tokens.
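A throughput check like the one above can be reproduced against a local Ollama server. The sketch below uses Ollama's standard REST API and assumes the model has already been pulled with `ollama pull gpt-oss:20b`; the prompt is arbitrary.

```python
# Rough single-request throughput check against a local Ollama server.
import json
import urllib.request

payload = json.dumps({
    "model": "gpt-oss:20b",
    "prompt": "Write a haiku about mixture-of-experts models.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# eval_count / eval_duration are Ollama's own generation stats (duration in ns).
rate = body["eval_count"] / (body["eval_duration"] / 1e9)
print(body["response"])
print(f"{rate:.1f} tokens/sec")
```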
[1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
[2] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1–67.
[3] Shleifer, A., & Sun, H. (2021). Adaptive text generation with a mixture of experts. Advances in neural information processing systems, 35, 12469–12479.
[4] OpenAI. (2025). GPT-OSS model card. Retrieved from https://beta.openai.com/docs/models/gpt-oss
[5] Zaheer, M., Guruganesh, G., Dubey, K. A., Ainslie, J., Alberti, C., Ontanon, S., ... & Ahmed, A. (2020). Big Bird: Transformers for longer sequences. Advances in Neural Information Processing Systems, 33, 17283–17297.
- GPT-OSS models are efficient in their hardware requirements: the 120B model runs on a single 80GB GPU, such as the NVIDIA H100, while the 20B model runs on a 16GB GPU or a laptop with 32GB of RAM.
- Despite performance competitive with OpenAI's proprietary models, GPT-OSS relies on users to implement filtering and safeguards, since its fully transparent, raw outputs can contain hallucinations and unsafe content.