🚀GPT-4 in Trouble as Nvidia Enters Multi-Modal Race
🔍New OpenAI Funding, the Tool Behind Google Search, and Non-Transformer Foundation Models
Hello Tuners,
In a week dominated by AI shakeups, OpenAI made headlines with its record-breaking $6.6 billion funding round, valuing the company at $157 billion. This comes despite mounting scepticism and a wave of high-profile departures, including ex-CTO Mira Murati last week and co-founder Durk Kingma's recent move to rival Anthropic.
Elsewhere in the AI space, Google Cloud rolled out significant updates to its AlloyDB and Memorystore offerings, enhancing their capabilities for AI-driven applications. At the same time, new developments in non-transformer foundation models are catching keen observers' eyes.
OpenAI’s recent $6.6 billion funding round, valuing the company at $157 billion, highlights investor confidence despite a wave of executive departures and growing competition. Major players like Nvidia and Microsoft contributed, yet critics question the company’s reliance on ChatGPT subscriptions and point to high-profile exits such as former CTO Mira Murati. Meanwhile, competitors like Anthropic, which just hired ex-OpenAI co-founder Durk Kingma, are making strides, raising questions about why OpenAI continues to command such massive funding despite lagging on features compared to rivals like Anthropic and Meta’s Llama.
OpenAI’s track record of reclaiming the lead through updates like the o1 preview series may explain the investor faith as it continues to top performance benchmarks. However, with stiff competition from Anthropic, Google, and Meta, the pressure is on OpenAI to prove it can maintain its edge in the rapidly evolving AI landscape.
As generative AI technology advances, enterprises are demanding more than simple chatbot solutions, pushing cloud hyperscalers to enhance their tools for operational data deployment. Google Cloud has responded with significant updates to its database services, particularly AlloyDB, a fully managed PostgreSQL-compatible database. One of the key updates is the general availability of ScaNN (Scalable Nearest Neighbors) vector indexing, a technology also used in Google Search and YouTube. ScaNN enables faster index creation and vector queries while cutting memory usage significantly, making it ideal for handling large-scale AI workloads like real-time search and recommendation systems.
Vector databases are essential for supporting advanced AI applications by managing and processing vector embeddings used in similarity searches. While AlloyDB already supported vector search through pgvector with an HNSW algorithm, performance could suffer with massive vector workloads, leading to latency and high memory consumption. ScaNN, however, improves performance with up to four times faster queries and eight times quicker index builds. This advancement ensures AlloyDB can handle over a billion vectors efficiently, supporting enterprises as they build AI-driven, intelligent applications.
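The kind of query these indexes serve can be illustrated with a brute-force baseline. The sketch below (illustrative Python, not AlloyDB's API; all names are invented for the example) ranks stored embeddings by cosine similarity against a query vector. Approximate indexes like ScaNN and HNSW exist precisely because this exact scan becomes too slow and memory-hungry at billion-vector scale:

```python
import numpy as np

def top_k_cosine(query, vectors, k=3):
    """Brute-force cosine-similarity search: the exact computation that
    approximate indexes like ScaNN and HNSW trade a little accuracy to speed up."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                      # cosine similarity of every stored vector
    idx = np.argsort(-sims)[:k]      # indices of the k most similar vectors
    return idx, sims[idx]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))  # stand-in for stored document embeddings
query = rng.normal(size=64)
idx, sims = top_k_cosine(query, embeddings, k=3)
print(idx, sims)
```

At a thousand vectors this exact scan is instant; the value of ScaNN's reported speedups shows up when the same top-k query must run over hundreds of millions of embeddings.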
Liquid AI has introduced its new Liquid Foundation Models (LFMs), which break away from the traditional transformer architecture used by most generative AI models. Built from first principles rooted in dynamical systems and signal processing, the LFMs outperform transformer-based models like Meta’s Llama 3.1 and Microsoft’s Phi-3.5 while using much less memory. For instance, the LFM-3B requires only 16 GB of memory compared to Meta's 48 GB, making it highly efficient for real-world applications.
These models are designed for multimodal tasks, handling text, audio, video, and time-series data, and have already achieved superior results on benchmarks like Massive Multitask Language Understanding (MMLU). Available in three sizes, the LFMs are optimized for the finance, biotech, and consumer electronics industries. Liquid AI is inviting early testers ahead of the official October launch at MIT’s Kresge Auditorium.
Nvidia has disrupted the AI landscape by releasing its NVLM 1.0 family of multimodal large language models as open-source, directly competing with proprietary models from industry giants like OpenAI and Google. Leading the pack is the NVLM-D-72B, a 72 billion-parameter model that excels in vision and language tasks and has notable advancements in text-only benchmarks. By openly sharing the model weights and promising the release of its training code, Nvidia offers unprecedented access to cutting-edge AI technology. This bold move could significantly accelerate research and development across the field.
The NVLM-D-72B demonstrates impressive versatility, excelling at tasks ranging from meme interpretation to mathematical problem-solving while improving text performance after multimodal training, which is rarely seen in similar models. Nvidia's decision to open-source such a powerful model may level the playing field, enabling smaller teams and researchers to contribute significantly to AI development. However, this move also raises questions about the future of AI business models and the ethical challenges of making advanced AI widely available. Nvidia's bold step could redefine the industry, sparking new waves of innovation and collaboration.
Weekly Research Spotlight 🔍
Logic of Thought
Large Language Models (LLMs) have shown impressive capabilities across various tasks; however, their performance in complex logical reasoning remains inadequate. Although prompting techniques like Chain-of-Thought can enhance reasoning abilities to some extent, they often lead to inconsistencies where conclusions do not align with the reasoning presented. Some research has sought to improve LLMs' logical reasoning by incorporating propositional logic. Still, these methods can result in information loss due to potential omissions in extracting logical expressions, leading to incorrect outcomes.
This new research introduces Logic-of-Thought (LoT) prompting to tackle these challenges, which leverages propositional logic to expand logical information derived from the input context. This generated logical information is then used as an additional augmentation to the input prompts, significantly enhancing logical reasoning capabilities. The LoT approach complements existing prompting methods, allowing for seamless integration. Extensive experiments reveal that LoT markedly improves the performance of various prompting techniques across five logical reasoning tasks. Notably, LoT boosts Chain-of-Thought's performance on the ReClor dataset by 4.35%, enhances Chain-of-Thought with Self-Consistency on LogiQA by 5%, and elevates the performance of Tree-of-Thoughts on the ProofWriter dataset by 8%.
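The core idea behind LoT can be sketched in a few lines: extract simple "if A then B" propositions from the input, expand them with logical laws such as contraposition and transitivity, and append the derived statements to the prompt. The toy Python below is an illustration of that idea only, not the paper's implementation; all function names are invented for the example:

```python
def neg(p):
    """Negate a proposition, collapsing double negation so expansion terminates."""
    return p[4:] if p.startswith("not ") else "not " + p

def derive(implications):
    """Close a set of (A, B) 'if A then B' pairs under contraposition and transitivity."""
    facts = set(implications)
    changed = True
    while changed:
        changed = False
        new = set()
        for a, b in facts:
            new.add((neg(b), neg(a)))        # contraposition: A->B gives not B -> not A
            for c, d in facts:
                if b == c:
                    new.add((a, d))          # transitivity: A->B and B->D give A->D
        if not new <= facts:
            facts |= new
            changed = True
    return facts

def augment_prompt(question, implications):
    """Append the expanded logical information to the prompt, LoT-style."""
    logic = "; ".join(f"if {a} then {b}" for a, b in sorted(derive(implications)))
    return f"{question}\nKnown logical relations: {logic}"

rules = [("it rains", "the ground is wet"),
         ("the ground is wet", "the grass is slippery")]
print(augment_prompt("Is the grass slippery when it rains?", rules))
```

Because the derived relations are injected as extra context rather than replacing the question, this style of augmentation composes naturally with Chain-of-Thought and other prompting methods, which is how the paper reports its gains.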
LLM Of The Week
Molmo
In the current landscape of advanced multimodal models, most remain proprietary, creating a gap in foundational knowledge for building effective vision-language models (VLMs) from scratch. Enter Molmo, a new family of VLMs that sets a high standard for openness. Its key innovation is a meticulously collected image caption dataset created entirely by human annotators using speech-based descriptions.
Molmo also features a diverse fine-tuning dataset that includes in-the-wild question-and-answer pairs and innovative 2D pointing data to enhance user interactions. The Molmo 72B model outperforms other open-weight models and holds its own against proprietary systems like GPT-4o, Claude 3.5, and Gemini 1.5 on academic benchmarks and human evaluations. The developers plan to release all model weights, captioning and fine-tuning data, and source code soon, with select resources already available online.
Best Prompt of the Week 🎨
A dynamic food photography scene with a teal backdrop. A bowl of pho noodles, featuring sliced meat, fresh herbs, bean sprouts, and noodles, is in the foreground. One hand holds chopsticks with noodles, while another hand pours steaming broth from a cup into the bowl. Lime wedge, chili slice, and vibrant colors add to the playful composition. Bright, clean, and appetizing aesthetic.
Today's Goal: Try new things 🧪
Acting as a Product Development Planner
Prompt: I want you to act as a product development planner. You will create a structured daily plan specifically designed to help a developer build and launch a SaaS product. You will identify key milestones, develop strategies and action steps for product development, select the tools and technologies required, and outline any additional activities needed to ensure smooth development and a successful launch. My first suggestion request is: "I need help creating a daily activity plan for a developer who is planning to build a SaaS product from scratch."
This Week’s Must-Watch Gem 💎
This Week's Must-Read Gem 💎
How did you find today's email?