Microsoft PHI-3 Demystified: Everything You Need to Know for Seamless Operations

On Tuesday, Microsoft announced a new, freely available lightweight AI language model named Phi-3-mini, which is simpler and less expensive to operate than traditional large language models (LLMs) like OpenAI's GPT-4 Turbo. Its small size makes it well suited to running locally, which could bring an AI model of capability similar to the free version of ChatGPT to a smartphone, no Internet connection required.

The AI field typically measures AI language model size by parameter count. Parameters are numerical values in a neural network that determine how the language model processes and generates text. They are learned during training on large datasets and essentially encode the model's knowledge into quantified form. More parameters generally allow the model to capture more nuanced and complex language-generation capabilities but also require more computational resources to train and run.
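To make that concrete, here's a minimal sketch in PyTorch (a toy network of our own invention, not any real Phi or GPT model) showing that a model's parameter count is simply the total number of learned weights and biases:

```python
# Toy illustration of parameter counting (not any real Phi/GPT model).
# Requires PyTorch: pip install torch
import torch.nn as nn

# A tiny two-layer network: every weight and bias below is a "parameter"
# whose value is learned during training.
model = nn.Sequential(
    nn.Linear(512, 1024),  # 512*1024 weights + 1024 biases
    nn.ReLU(),
    nn.Linear(1024, 512),  # 1024*512 weights + 512 biases
)

total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # 1,050,112 here; Phi-3-mini has ~3.8 billion
```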

Some of the largest language models today, like Google's PaLM 2, have hundreds of billions of parameters. OpenAI's GPT-4 is rumored to have over a trillion parameters, spread across eight 220-billion-parameter models in a mixture-of-experts configuration. Both models require heavy-duty data center GPUs (and supporting systems) to run properly.

In contrast, Microsoft aimed small with Phi-3-mini, which contains only 3.8 billion parameters and was trained on 3.3 trillion tokens. That makes it well suited to run on the consumer GPUs or AI-acceleration hardware found in smartphones and laptops. It's a follow-up to two previous small language models from Microsoft: Phi-2, released in December, and Phi-1, released in June 2023.

A chart provided by Microsoft showing Phi-3 performance on various benchmarks.

Phi-3-mini features a 4,000-token context window, but Microsoft also introduced a 128K-token version called "phi-3-mini-128K." Microsoft has also created 7-billion and 14-billion parameter versions of Phi-3, which it plans to release later and which it claims are "significantly more capable" than Phi-3-mini.

Microsoft says that Phi-3 features overall performance that "rivals that of models such as Mixtral 8x7B and GPT-3.5," as detailed in a paper titled "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone." Mixtral 8x7B, from French AI company Mistral, utilizes a mixture-of-experts model, and GPT-3.5 powers the free version of ChatGPT.

"[Phi-3] looks like it's going to be a shockingly good small model if their benchmarks are reflective of what it can actually do," said AI researcher Simon Willison in an interview with Ars. Shortly after providing that quote, Willison downloaded Phi-3 to his Macbook laptop locally and said, "I got it working, and it's GOOD" in a text message sent to Ars.

A screenshot of Phi-3-mini running locally on Simon Willison's MacBook.

"Most models that run on a local device still need hefty hardware," says Willison. "Phi-3-mini runs comfortably with less than 8GB of RAM, and can churn out tokens at a reasonable speed even on just a regular CPU. It's licensed MIT and should work well on a $55 Raspberry Pi—and the quality of results I've seen from it so far are comparable to models 4x larger."

How did Microsoft cram a capability potentially similar to GPT-3.5, which has at least 175 billion parameters, into such a small model? Its researchers found the answer by using carefully curated, high-quality training data they initially pulled from textbooks. "The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data," writes Microsoft. "The model is also further aligned for robustness, safety, and chat format."

Much has been written about the potential environmental impact of AI models and datacenters themselves, including on Ars. With new techniques and research, it's possible that machine learning experts may continue to increase the capability of smaller AI models, replacing the need for larger ones—at least for everyday tasks. That would theoretically not only save money in the long run but also require far less energy in aggregate, dramatically decreasing AI's environmental footprint. AI models like Phi-3 may be a step toward that future if the benchmark results hold up to scrutiny.

Phi-3 is immediately available on Microsoft's cloud service platform Azure, as well as through partnerships with machine learning model platform Hugging Face and Ollama, a framework that allows models to run locally on Macs and PCs.
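For readers who want to try it, here's a minimal local-inference sketch using Hugging Face's transformers library. The model ID "microsoft/Phi-3-mini-4k-instruct" and the generation settings are assumptions based on Microsoft's Hugging Face listing, not details from this article; check the hub for the current name:

```python
# Minimal sketch: running Phi-3-mini locally via transformers.
# Requires: pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # place on GPU if available, else CPU
    trust_remote_code=True,  # early Phi-3 releases shipped custom model code
)

# The model is chat-tuned, so format the prompt with its chat template
messages = [{"role": "user", "content": "Explain tokens in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```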

On Monday at the OpenAI DevDay event, company CEO Sam Altman announced a major update to its GPT-4 language model called GPT-4 Turbo, which can process a much larger amount of text than GPT-4 and features a knowledge cutoff of April 2023. He also introduced APIs for DALL-E 3, GPT-4 Vision, and text-to-speech, and launched an "Assistants API" that makes it easier for developers to build assistive AI apps.

OpenAI hosted its first-ever developer event, DevDay, on November 6 in San Francisco. During the opening keynote, delivered in front of a small audience, Altman showcased the wider impact of the company's AI technology, including its use in tech accessibility. He also shared some stats: over 2 million developers are building apps using OpenAI's APIs, over 92 percent of Fortune 500 companies are building on its platform, and ChatGPT has over 100 million weekly active users.

At one point, Microsoft CEO Satya Nadella made a surprise appearance on the stage, talking with Altman about the deepening partnership between Microsoft and OpenAI and sharing some general thoughts about the future of the technology, which he thinks will empower people.

The OpenAI DevDay 2023 keynote from Sam Altman.

GPT-4 gets an upgrade

During the keynote, Altman dropped several major announcements, including "GPTs," which are custom, shareable, user-defined ChatGPT AI roles that we covered separately in another article. He also launched the aforementioned GPT-4 Turbo model, which is perhaps most notable for three properties: context length, more up-to-date knowledge, and price.

Large language models (LLMs) like GPT-4 rely on a context length, or "context window," that defines how much text they can process at once. That window is often measured in tokens, which are chunks of words. According to OpenAI, one token corresponds to roughly four characters of English text, or about three-quarters of a word. With its 128,000-token window, GPT-4 Turbo can consider around 96,000 words in one go, which is longer than many novels. A 128K context length also allows much longer conversations without the AI assistant losing its short-term memory of the topic at hand.
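The back-of-the-envelope math is simple, and OpenAI's tiktoken library lets you count tokens exactly (the sample sentence below is ours; the library maps "gpt-4" to its cl100k_base tokenizer):

```python
# Rough words-per-context-window estimate using OpenAI's rule of thumb
# (~0.75 words per token), plus an exact token count via tiktoken.
# Requires: pip install tiktoken
import tiktoken

context_tokens = 128_000
print(f"~{context_tokens * 0.75:,.0f} words")  # ~96,000 words

enc = tiktoken.encoding_for_model("gpt-4")  # GPT-4's tokenizer (cl100k_base)
sample = "Large language models process text as tokens, not words."
tokens = enc.encode(sample)
print(len(sample.split()), "words ->", len(tokens), "tokens")
```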

Previously, GPT-4 featured an 8,000-token context window, with a 32K model available through an API for some developers. Extended context windows aren't unique to GPT-4 Turbo, either: Anthropic announced a 100K-token version of its Claude language model in May, and Claude 2 continued that tradition.

For most of the past year, ChatGPT and GPT-4 only officially incorporated knowledge of events up to September 2021 (although judging by reports, OpenAI has been silently testing models with more recent cutoffs at various times). GPT-4 Turbo has knowledge of events up to April 2023, making it OpenAI's most up-to-date language model yet.

And regarding cost, running GPT-4 Turbo as an API reportedly costs one-third as much as GPT-4 for input tokens (at $0.01 per 1,000 tokens) and one-half as much for output tokens (at $0.03 per 1,000 tokens). Relatedly, OpenAI also dropped prices for its GPT-3.5 Turbo API models, and it announced that it is doubling the tokens-per-minute limit for all paying GPT-4 customers, who can request further rate-limit increases as well.
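As a worked example of those rates (the per-1,000-token prices come from the figures above; the request size is arbitrary and hypothetical), here's what a single large request might cost under each model:

```python
# Cost comparison at the per-1,000-token rates cited above.
GPT4_TURBO = {"input": 0.01, "output": 0.03}  # $ per 1K tokens
GPT4       = {"input": 0.03, "output": 0.06}  # $ per 1K tokens (prior GPT-4 pricing)

def cost(rates, input_tokens, output_tokens):
    """Dollar cost of one request at the given per-1K-token rates."""
    return (input_tokens / 1000) * rates["input"] + \
           (output_tokens / 1000) * rates["output"]

# Hypothetical request: 10K tokens in, 1K tokens out
print(f"GPT-4 Turbo: ${cost(GPT4_TURBO, 10_000, 1_000):.2f}")  # $0.13
print(f"GPT-4:       ${cost(GPT4, 10_000, 1_000):.2f}")        # $0.36
```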

More capabilities come to API

APIs, or application programming interfaces, are ways that programs can talk to each other. They let software developers integrate OpenAI's models into their apps. Starting Monday, OpenAI now offers access to APIs for: GPT-4 Turbo with vision, which can analyze images and use them in conversations; DALL-E 3, which can generate images using AI image synthesis; and OpenAI's text-to-speech model, which has made a splash in the ChatGPT app with its realistic voices.
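As an illustration, here's a minimal sketch of calling GPT-4 Turbo with vision through the OpenAI Python SDK. The model name "gpt-4-vision-preview" reflects the DevDay launch but is an assumption here, and the image URL is a placeholder; consult OpenAI's docs for current identifiers:

```python
# Sketch: asking GPT-4 Turbo with vision about an image (OpenAI Python
# SDK v1+). Requires: pip install openai, with an API key set in the
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name from the DevDay launch
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```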

OpenAI also debuted the "Assistants API," which can help developers build "agent-like experiences" within their own apps. It's similar to an API version of OpenAI's new "GPTs" product that allows for custom instructions and external tool use.

The key to the Assistants API, OpenAI says, is "persistent and infinitely long threads," which allow developers to forgo tracking conversation history themselves and manually managing context window limitations. Instead, developers simply add each new message to an existing thread. This threaded approach is often called "stateful" AI, in contrast to "stateless" AI, where the model approaches each chat session as a blank slate with no knowledge of previous interactions.
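In SDK terms, that threaded flow looks roughly like this (a sketch against the beta endpoints OpenAI shipped with the v1 Python SDK; the assistant's name, instructions, and the "gpt-4-1106-preview" model name are our assumptions):

```python
# Sketch of the Assistants API's stateful threads (OpenAI Python SDK v1+,
# beta endpoints). The server keeps the conversation state; the developer
# only appends messages instead of resending the full history.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Helper",                  # hypothetical assistant
    instructions="Answer concisely.",
    model="gpt-4-1106-preview",     # assumed GPT-4 Turbo model name
)

thread = client.beta.threads.create()  # a persistent conversation thread

# Each turn just appends to the thread; no manual context management.
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What is a context window?"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)
# Poll run.status until "completed", then read the thread's messages back.
```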

Odds and ends

Also on Monday, OpenAI introduced what it calls "Copyright Shield," which is the company's commitment to protect its enterprise and API customers from legal claims related to copyright infringement due to using its text or image generators. The shield does not apply to ChatGPT free or Plus users. And OpenAI announced the launch of version 3 of its open source Whisper model, which handles speech recognition.
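For the Whisper release, transcription with the open source package is a short sketch like the one below (the "large-v3" checkpoint name matches the version-3 release; the audio filename is a placeholder):

```python
# Sketch: transcribing audio with the open source Whisper large-v3 model.
# Requires: pip install openai-whisper (plus ffmpeg installed on the system)
import whisper

model = whisper.load_model("large-v3")      # the version-3 checkpoint
result = model.transcribe("interview.mp3")  # placeholder audio file
print(result["text"])
```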

While closing out his keynote address, Altman emphasized his company's iterative approach toward introducing AI features with more agency (referring to GPTs) and expressed optimism that AI will create abundance. "As intelligence is integrated everywhere, we will all have superpowers on demand," he said.

While inviting attendees to return to DevDay next year, Altman dropped a hint at what's to come: "What we launched today is going to look very quaint compared to what we're creating for you now."

