Natural Language Processing (NLP)
With the advent of Transformers, NLP is the hottest of the AI research areas, and I’m seeing a lot of our clients take roles as researchers or applied scientists on NLP teams.
Feryal Behbahani's team at DeepMind is doing some really exciting work on adaptive agents that perform hypothesis-driven exploration for tasks in 3D spaces. This feels like one of the hottest areas of research and a novel approach.
If agents completing tasks autonomously and with limited examples excites you as much as it excites me, I'd recommend checking out this paper and then reaching out to Feryal (or someone on her team) for a coffee chat. It’s worth noting the approach is heavily Reinforcement Learning-based, but good RL research is hard to come by, and papers like this are a great read.
The team will be presenting at ICML next month - catching their presentations would be an easy way to learn more about the team and their research (see the session schedule here).
Another company I am excited about in the 3D real-world task space is 1x, which is building a humanoid robot and was recently backed by OpenAI (the company is remote-friendly).
Additionally, the team at Aleph Alpha released an exciting paper called AtMan that adds explainability and sources to LLM outputs, helping expert workers (e.g. lawyers, accountants) use LLMs more easily. Since its release in early 2023, many LLM-based companies (e.g. in search and chat) seem to be moving towards a source-driven approach.
If you’re interested in Aleph Alpha, I'd recommend reaching out to Samuel Weinbach, the technical co-founder - at a company of fewer than 50 people, he's likely to still read your message.
Computer Vision
Aleksander Holynski (Google) and Tim Brooks (OpenAI) did some really interesting research on image editing. In many Stable Diffusion-based systems, adding a new prompt concept (e.g. Van Gogh's painting style) requires modifying the CLIP model, which connects images and text together in a shared vector space.
Traditionally, systems are trained on a huge number of images and associated text to build a reliable CLIP model, but Aleksander and Tim propose a conditional diffusion model that generalizes better. Some of the details are beyond me, but you can check out their work here.
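To make the "shared vector space" idea above concrete, here's a minimal toy sketch in Python. The hand-made vectors, names, and stand-in "encoders" below are my own illustrations - real CLIP uses a vision transformer and a text transformer trained on huge image/caption datasets - but the core retrieval mechanic is the same: matching images and captions land close together, and similarity is just a dot product of unit vectors.

```python
import numpy as np

def normalize(v):
    # Project onto the unit sphere so cosine similarity is a dot product.
    return v / np.linalg.norm(v)

# Pretend embeddings (3-dim for readability; real CLIP uses 512+ dims).
image_embeddings = {
    "photo_of_a_dog": normalize(np.array([0.9, 0.1, 0.0])),
    "starry_night_painting": normalize(np.array([0.1, 0.8, 0.5])),
}
text_embeddings = {
    "a dog": normalize(np.array([0.85, 0.15, 0.05])),
    "a Van Gogh-style painting": normalize(np.array([0.05, 0.75, 0.6])),
}

def best_caption(image_vec):
    # Pick the caption whose embedding is closest to the image embedding.
    return max(text_embeddings, key=lambda t: image_vec @ text_embeddings[t])

print(best_caption(image_embeddings["starry_night_painting"]))
# → a Van Gogh-style painting
```

The point of the sketch: once both modalities live in one space, "add a new prompt concept" means the model must embed that concept well, which is why retraining or modifying CLIP is the traditional bottleneck the conditional-diffusion approach tries to sidestep.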
Meta has been sharing a string of AI advancements, one of which is DINOv2 - a new method for training computer vision models. It's trained with self-supervision and produces general-purpose features that work across vision tasks without fine-tuning - you can think of it as a foundation model for computer vision. If you're excited by DINOv2, I'd recommend reaching out to Vasu Sharma to learn more about the research (and any potential openings on the team).
One last update in the computer vision space: I got pretty excited about SceneScape, a new text-driven approach to synthesizing long-term videos. I'd recommend connecting with Yoni Kasten (NVIDIA) to learn more, or Tali Dekel from Google Research to learn about the work they’re doing with SceneScape.
Tali will also be at ICML next month to present different research at one of the poster sessions.
Speech & Audio
I’m excited about AI advancements in speech and audio because it feels like one of the new paradigms for how users will communicate and interact with computers. The rise of LLMs gives so many new opportunities for speech and audio to be used across the application stack.
Petar Veličković is working on Graph Neural Networks at DeepMind. His work isn't tied to just speech and audio - it applies to any problem that can be represented as a graph. His paper helped me think better about how to frame many of the world's problems as graph problems. If NNs and graphs excite you too, I'd recommend reaching out to see if his team at DeepMind is a good fit.
He’ll also be at two poster sessions at ICML this year to present other work he's involved in, so you’ll have the chance to learn about that, too.
Music has been heating up as an area for research and AI startups. The elegant approach Meta offered in the MusicGen paper caught my eye: they use a single-stage transformer, which is a simpler, faster, and cheaper architecture.
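Part of what makes the single-stage design work, as I understand the paper, is a "delay" interleaving pattern: audio is represented as several parallel streams of codebook tokens, and codebook k is shifted right by k steps so one transformer can predict all codebooks at each time step. A rough sketch of that pattern (the function name and the padding token are my own illustrative choices):

```python
PAD = -1  # placeholder filling the gaps created by the shifts

def delay_pattern(codebooks):
    """codebooks: K equal-length token lists, one per codebook.

    Returns K rows of length T + K - 1, with codebook k delayed k steps,
    so that a single transformer can predict one token per codebook at
    every step instead of needing a separate model per codebook.
    """
    num_codebooks = len(codebooks)
    out = []
    for k, tokens in enumerate(codebooks):
        row = [PAD] * k + list(tokens) + [PAD] * (num_codebooks - 1 - k)
        out.append(row)
    return out

# Two codebooks of four tokens each:
print(delay_pattern([[1, 2, 3, 4], [5, 6, 7, 8]]))
# → [[1, 2, 3, 4, -1], [-1, 5, 6, 7, 8]]
```

This is the "simpler, faster, cheaper" trade-off in miniature: a small amount of extra sequence length buys you one model where cascaded approaches need several.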
If you are in Europe, I'd recommend reaching out to Jade Copet, a Research Engineering Manager at Meta who did some amazing work on MusicGen. If you are closer to a Pacific timezone, then David Kant at Meta is a good choice (fun fact: he did his doctorate in algorithmic music).
Another awesome speech and audio paper is Voicebox, which uses text to guide multilingual speech generation. Matthew Le at FAIR is a good person to contact to learn more. You can also catch him at ICML this year, where he’ll be presenting one of his works.
Conclusion
That's a wrap! A friendly reminder to be polite and personalized when you reach out to these researchers (or any, for that matter) - they've got full-time jobs, so be respectful of their time. If you want any help with how to reach out or how to structure the call, get in touch at hi@teamrora.com - we’re happy to chat!