Google Research at I/O 2023

Wednesday May 10th was an exciting day for the Google Research community as we watched the results of months and years of our fundamental and applied work announced on the Google I/O stage. With the rapid pace of on-stage announcements, it can be difficult to convey the significant effort and unique innovations behind the technologies we present. So today we’re excited to reveal more about the research efforts behind many of the exciting announcements at this year’s I/O.

PaLM 2

Our next-generation large language model (LLM), PaLM 2, is built on a blend of compute-optimal scaling, scaled instruction fine-tuning, and improved dataset mixtures. By fine-tuning and instruction-tuning the model for different purposes, we’ve been able to integrate the latest capabilities into more than 25 Google products and features, where it’s already helping to inform, assist and delight users. For example:

  • Bard is an early experience that enables collaboration with generative AI and helps increase productivity, accelerate ideas and fuel curiosity. It builds on advances in deep learning performance and uses reinforcement learning from human feedback to provide more relevant responses and increase the model’s ability to follow instructions. Bard is now available in 180 countries where users can interact with it in English, Japanese and Korean, and thanks to the multilingual capabilities provided by PaLM 2, there will soon be support for 40 languages.
  • With the Search Generative Experience, we’re taking more of the work out of search, so you’ll be able to understand content faster, discover new perspectives and insights, and get things done more easily. As part of this experience, you’ll see a snapshot of key AI-powered information to consider, with links to dig deeper.
  • MakerSuite is an easy-to-use prototyping environment for the PaLM API, powered by PaLM 2. In fact, internal user engagement with early MakerSuite prototypes accelerated the development of the PaLM 2 model itself. MakerSuite grew out of research on prompting tools, that is, tools expressly designed for customizing and controlling LLMs. This line of research includes PromptMaker (a precursor to MakerSuite) as well as AI Chains and PromptChainer (among the first research efforts to demonstrate the utility of LLM chaining).
  • Project Tailwind also used early research prototypes of MakerSuite to develop features that help writers and researchers explore ideas and improve their prose. This AI-first notebook prototype used PaLM 2 to let users ask the model questions grounded in documents they define.
  • Med-PaLM 2 is our state-of-the-art medical LLM, built on PaLM 2. Med-PaLM 2 scored 86.5% on U.S. Medical Licensing Exam-style questions, demonstrating its exciting potential for healthcare. We are now exploring multimodal capabilities to synthesize inputs such as X-rays.
  • Codey is a version of PaLM 2 fine-tuned on source code to act as a developer assistant. It supports a broad set of code AI features, including code completion, code generation, code explanation, bug fixing, source code migration, error explanations, and more. Codey is available through our trusted tester program via IDEs (Colab, Android Studio, Duet AI for Cloud, Firebase) and a 3P-facing API.

Perhaps more exciting for developers, we’ve opened up the PaLM APIs and MakerSuite to enable the community to innovate using this breakthrough technology.
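For developers curious what that looks like in practice, below is a minimal sketch of a PaLM API call from Python. It assumes the google.generativeai client and the models/text-bison-001 model name from the initial release; the API key, prompt, and parameter values are placeholders, so check the current documentation before relying on exact names.

```python
# Minimal sketch of a PaLM API call, assuming the google.generativeai client
# and the "models/text-bison-001" model name from the initial release.
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # placeholder key

completion = palm.generate_text(
    model="models/text-bison-001",
    prompt="Explain what this Python snippet does:\n\nprint(sum(range(10)))",
    temperature=0.2,
    max_output_tokens=256,
)
print(completion.result)
```

MakerSuite is a convenient place to iterate on prompts like this one before exporting them to code.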

PaLM 2 has advanced coding capabilities that enable it to find code errors and make suggestions in a number of different languages.


Imagen

Our Imagen family of image generation and editing models builds on advances in large Transformer-based language models and diffusion models. This family of models is being incorporated into multiple Google products, including:

  • Image generation in Google Slides and generative AI wallpapers on Android are powered by our text-to-image features.
  • Vertex AI on Google Cloud enables image generation, image editing, image upscaling, and fine-tuning to help enterprise customers meet their business needs.
  • I/O Flip, a digital take on the classic card game, features Google developer mascots on cards that are entirely AI generated. The game also showcased a fine-tuning technique called DreamBooth for adapting pre-trained text-to-image models. Using just a handful of images as input for customization, DreamBooth lets users create personalized images in minutes and synthesize a subject in scenes, poses, views, and lighting conditions that do not appear in the reference images (a toy sketch of the underlying objective follows this list).
    I/O Flip features custom card decks designed using DreamBooth.
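To make the DreamBooth technique mentioned above more concrete, here is a toy sketch of its training objective: a reconstruction loss on a handful of subject images combined with a prior-preservation loss on generic class images. The linear “denoiser” and random tensors are stand-ins for a real text-to-image diffusion model, so treat this as an illustration of the loss structure rather than a working fine-tuning pipeline.

```python
# Toy sketch of a DreamBooth-style objective with stand-in tensors.
import torch
import torch.nn as nn

torch.manual_seed(0)
denoiser = nn.Linear(64, 64)              # stand-in for the real diffusion U-Net
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

instance_latents = torch.randn(4, 64)     # a handful of subject images ("a [V] dog")
class_latents = torch.randn(32, 64)       # generic class images ("a dog")
prior_weight = 1.0                        # weight of the prior-preservation term

for step in range(200):
    noisy_instance = instance_latents + torch.randn_like(instance_latents)
    noisy_class = class_latents + torch.randn_like(class_latents)
    # Reconstruction loss on subject images ties the rare identifier to the subject.
    loss_instance = ((denoiser(noisy_instance) - instance_latents) ** 2).mean()
    # Prior-preservation loss on class images discourages overfitting to the few subject photos.
    loss_prior = ((denoiser(noisy_class) - class_latents) ** 2).mean()
    loss = loss_instance + prior_weight * loss_prior
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))
```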

Phenaki

Phenaki, Google’s Transformer-based text-to-video generation model, was featured in the I/O pre-show. Phenaki can synthesize realistic videos from textual prompt sequences using two main components: an encoder-decoder model that compresses videos into discrete embeddings, and a transformer model that translates text embeddings into video tokens.
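As a rough illustration of the first stage, the sketch below performs a toy vector-quantization step in NumPy: each frame feature is mapped to its nearest codebook entry, yielding discrete video tokens. The features, codebook, and dimensions are invented for the example; the real model tokenizes spatio-temporal patches with a learned encoder rather than whole frames.

```python
import numpy as np

rng = np.random.default_rng(0)
num_frames, feat_dim, codebook_size = 16, 32, 512

frame_features = rng.normal(size=(num_frames, feat_dim))   # stand-in encoder outputs
codebook = rng.normal(size=(codebook_size, feat_dim))      # learned in the real model

# Stage 1: quantize each feature to its nearest codebook entry -> discrete video tokens.
dists = ((frame_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
video_tokens = dists.argmin(axis=1)

# Stage 2 (not shown): a transformer conditioned on text embeddings predicts token
# sequences like `video_tokens`, and the decoder half of stage 1 maps them back to pixels.
print(video_tokens[:8])
```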

ARCore and the Scene Semantics API

Among the new ARCore features announced by the AR team at I/O, the Scene Semantics API can recognize the semantics of pixels in an outdoor scene, helping users create custom AR experiences based on the features of the surrounding area. This API is powered by an outdoor semantic segmentation model built on our recent work on the DeepLab architecture and an egocentric outdoor scene understanding dataset. The latest ARCore release also includes an improved monocular depth model that provides greater accuracy in outdoor scenes.

The Scene Semantics API uses a DeepLab-based semantic segmentation model to provide pixel-accurate outdoor scene labels.
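For readers who want to see pixel-level semantics in action, the sketch below runs an off-the-shelf DeepLabV3 model from torchvision on a single outdoor photo. This is not the ARCore Scene Semantics model or its label set, just an illustration of the same family of architecture; the image path is a placeholder.

```python
# Illustrative only: per-pixel class labels from an off-the-shelf DeepLabV3 model.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("street_scene.jpg").convert("RGB")     # hypothetical outdoor photo
with torch.no_grad():
    out = model(preprocess(img).unsqueeze(0))["out"]    # shape (1, num_classes, H, W)

labels = out.argmax(dim=1)                              # per-pixel class ids
print(labels.shape, labels.unique())
```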

Chirp

Chirp is Google’s family of state-of-the-art Universal Speech Models, trained on 12 million hours of speech to enable automatic speech recognition (ASR) for 100+ languages. The models can perform ASR on under-resourced languages such as Amharic, Cebuano, and Assamese, in addition to widely spoken languages such as English and Mandarin. Chirp is able to cover such a wide range of languages by using self-supervised learning on an unlabeled multilingual dataset, followed by fine-tuning on a smaller set of labeled data. Chirp is now available in the Google Cloud Speech-to-Text API, which allows users to run inference on the model through a simple interface. You can get started with Chirp here.
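Below is a hedged sketch of transcribing an audio file with Chirp through the Speech-to-Text v2 client library, following the documented pattern of a regional endpoint and a recognizer path. The project ID, region, and file name are placeholders, and the exact client surface may evolve.

```python
from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

project_id = "your-project-id"  # placeholder

# Chirp is served from regional endpoints, e.g. us-central1.
client = SpeechClient(
    client_options=ClientOptions(api_endpoint="us-central1-speech.googleapis.com")
)

config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="chirp",
)

with open("audio.wav", "rb") as f:  # placeholder audio file
    audio_bytes = f.read()

request = cloud_speech.RecognizeRequest(
    recognizer=f"projects/{project_id}/locations/us-central1/recognizers/_",
    config=config,
    content=audio_bytes,
)

response = client.recognize(request=request)
for result in response.results:
    print(result.alternatives[0].transcript)
```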

MusicLM

At I/O, we launched MusicLM, a text-to-music model that generates 20 seconds of music from a text prompt. You can try it out for yourself in the AI Test Kitchen, or see it demoed at the I/O pre-show, where electronic musician and composer Dan Deacon used MusicLM in his performance.

MusicLM, which is powered by models including AudioLM and MuLan, can make music (from text, humming, images, or video) and musical accompaniments to singing. AudioLM generates high-quality audio with long-term consistency: it maps audio to a sequence of discrete tokens and casts audio generation as a language modeling task. To synthesize longer outputs efficiently, we developed a new approach called SoundStorm.
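The core framing, representing audio as discrete tokens and generating it with a language model, can be illustrated with a deliberately tiny example: uniformly quantize a waveform into token IDs and fit a bigram model over them. Real systems use neural codec tokens and Transformer language models, so this is a toy of the framing only, not of AudioLM or MusicLM.

```python
import numpy as np

rng = np.random.default_rng(0)
sr, vocab = 16000, 64
t = np.arange(2 * sr) / sr
waveform = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.05 * rng.normal(size=2 * sr)

# "Tokenize": map each sample to one of `vocab` discrete levels.
tokens = np.clip(((waveform + 1) / 2 * vocab).astype(int), 0, vocab - 1)

# "Language model": count bigram transitions between consecutive tokens (add-one smoothing).
counts = np.ones((vocab, vocab))
np.add.at(counts, (tokens[:-1], tokens[1:]), 1)
probs = counts / counts.sum(axis=1, keepdims=True)

# Generate a continuation token by token, then map tokens back to a waveform.
generated = [int(tokens[-1])]
for _ in range(sr // 2):
    generated.append(rng.choice(vocab, p=probs[generated[-1]]))
audio_out = np.array(generated) / vocab * 2 - 1
print(audio_out[:10])
```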

Universal Translator dubbing

Our dubbing efforts leverage dozens of ML technologies to translate the full expressive range of video content, making videos accessible to audiences around the world. These technologies have been used to dub videos across a variety of products and content types, including educational content, advertising campaigns, and creator content, with more to come. We use deep learning to achieve voice preservation and lip matching and to deliver high-quality video translation. We’ve built this product to include human review for quality, safeguards to prevent misuse, and access limited to authorized partners.

AI for global social good

We apply our AI technologies to help solve some of the biggest global challenges, such as mitigating climate change, adapting to a warming planet, and improving human health and well-being. For example:

  • Traffic engineers use our Green Light recommendations to reduce congestion at intersections and improve traffic flow in cities from Bangalore to Rio de Janeiro and Hamburg. Green Light models each intersection, analyzing traffic patterns to develop recommendations that make traffic lights more efficient, such as better synchronizing the timing between adjacent lights or adjusting the “green time” for a given street and direction.
  • We’ve also expanded Flood Hub’s global coverage to 80 countries as part of our efforts to predict river floods and warn affected people before disaster strikes. Our flood forecasting efforts rely on hydrologic models informed by satellite observations, weather forecasts, and in situ measurements.

Technologies for Inclusive and Equitable ML Applications

With our continued investment in AI technologies, we emphasize responsible AI development, aiming to make our models and tools useful and impactful, while ensuring fairness, safety and alignment with AI principles. Some of these efforts were highlighted at I/O, including:

  • We released the Monk Skin Tone Examples (MST-E) dataset to help practitioners gain a deeper understanding of the MST scale and to train human annotators for more consistent, inclusive, and meaningful skin tone annotations. This is a step forward in the open-source release of the Monk Skin Tone (MST) scale that we launched last year to enable developers to create products that are more inclusive and that better represent their diverse users. You can read more about these and other developments on our website.
  • A new Kaggle competition (open until August 10) challenges the ML community to build a model that can quickly and accurately recognize American Sign Language (ASL) fingerspelling, in which words are spelled out letter by letter with the hands rather than signed with a dedicated sign for the whole word, and translate it into written text. Learn more about the Kaggle fingerspelling contest, which features a song by deaf musician and rapper Sean Forbes. We also showcased at I/O how the winning algorithm from last year’s competition powers PopSign, an ASL learning app for parents of children who are deaf or hard of hearing, created by Georgia Tech and the Rochester Institute of Technology (RIT).

Building the future of AI together

It’s inspiring to be part of a community of so many talented individuals who are leading the way in developing cutting-edge technologies, responsible AI approaches, and exciting user experiences. We are in the midst of a period of incredible and transformative change for AI. Stay tuned for more updates on how the Google Research community is boldly exploring the boundaries of these technologies and using them responsibly to benefit people’s lives around the world. We hope you’re just as excited about the future of AI technologies, and we invite you to engage with our teams through the links, sites, and tools we’ve highlighted here.
