🎙️ How AI breaks & shapes the audio industry, and top takeaways from AI trends

The image features a painting of a humanoid robot on the left, and God on the right side. They are both extending their arms about to touch their fingers in the style of the creation of Adam from Michelangelo. There is also a potent surge of golden energy appearing behind the area where they are about to touch their fingers as if God is about to give life to a new being. The image has a title at the top left: "The Practical Edge of AI", and Juan Delgadillo's branding at the bottom right. The image acts as the main Hero image for The Practical Edge of AI newsletter.

Welcome back and happy Wednesday! A new week brings fresh inspiration and innovative insights from the world of artificial intelligence (AI).

💬 In this week's letter:

  • 6 ways AI is reshaping the audio creation process.
  • Top digestible takeaways from AI trends in 2024

Was this email forwarded to you? Subscribe here!

How AI breaks & shapes the audio industry

Audio plays a fundamental role in many forms of media, from movies to podcasts, audiobooks, and video games. But producing quality audio is often challenging: it requires access to extensive sound libraries as well as deep domain expertise (sound engineering, foley, voice acting, etc.), resources and skills that not everyone has.

The Fundamental AI Research (FAIR) team at Meta shared Audiobox, its generative audio model, which aims to lower the barrier to audio creation and bring us closer to a future where anyone can become an audio content creator.

Let's explore Audiobox's six game-changing capabilities:

  • Your voice: Provide a recording of your voice or any other audio sample. The model will use it as a reference and generate speech in that vocal style.
  • Described voices: Describe the characteristics of the vocal style, as well as the acoustic environment (e.g. ‘in a large cathedral’) to generate speech with novel voice styles using text description.
  • Restyled voices: Describe how you want to modify the style of the voice. The model will use this new voice style to narrate your script. For example: a middle-aged person speaking with a relaxed, friendly voice.
  • Sound effects: Describe the characteristics of the sound you would like to create. The model will use this to generate your sound effect.
  • Magic eraser: Erase unwanted transient noise from voice recordings while leaving the speech intact.
  • Sound editing with generative infilling: You can crop an audio segment and regenerate it. By providing a text description, Audiobox can insert sound effects like “a dog barking” into an audio clip of the sound of rain.

Audiobox is an important step toward democratizing audio generation. The FAIR team envisions a future where everyone can more easily and efficiently create audio tailored to their use cases. They hope that the creativity sparked by advances in text and image generation will reach audio as well, for both professionals and hobbyists. Content creation, narration, sound editing, game development, and even AI chatbots can all benefit from the capabilities of audio generation models.

You can explore a series of interactive audio demos to help you understand the unique capabilities of Audiobox. You can experiment with each capability individually through this link.

What would you use a tool like Audiobox for?

Useful stories & ideas 💡

A decade ago, the best AI systems in the world were unable to classify objects in images at a human level. AI struggled with language comprehension and could not solve math problems.

Progress accelerated in 2023. New systems can generate fluent text in dozens of languages, process audio, and even explain memes. Even though no one can say exactly what the AI space will look like a few years from now, we can ask ourselves: what are the trends, and where are they likely to lead next?

The Stanford Institute for Human-Centered AI (HAI) shared an insightful, detailed snapshot of AI as it advances at an unprecedented rate and shows potential to revolutionize every field of human endeavor.

Here are the digestible takeaways:

1. AI makes workers more productive and leads to higher quality work

  • In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more quickly and to improve the quality of their output. They also demonstrated AI’s potential to bridge the skill gap between low- and high-skilled workers.

2. Generative AI investment skyrockets

  • Despite a decline in overall AI private investment last year, funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.

3. AI beats humans on some tasks, but not on all

  • AI has surpassed human performance on several benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it falls behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.

4. People across the globe are more aware of AI’s potential impact—and more nervous

  • A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests that 52% of Americans report feeling more concerned than excited about AI, rising from 38% in 2022.
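
A quick note on reading these survey numbers: a "percentage point" change is a simple difference between two percentages, which is not the same as relative growth. Here is a minimal Python sketch that works through the figures quoted above. The helper names `point_change` and `relative_change` are made up for illustration, and the only inputs are the numbers cited in this letter.

```python
def point_change(old_pct, new_pct):
    """Difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change(old_pct, new_pct):
    """Growth of the new share relative to the old share, in percent."""
    return 100 * (new_pct - old_pct) / old_pct


# Ipsos: share expecting AI to dramatically affect their lives (60% -> 66%)
ipsos_points = point_change(60, 66)
ipsos_relative = relative_change(60, 66)

# Ipsos: 52% nervous today after a 13-point rise implies this 2022 share
nervous_2022 = 52 - 13

# Pew: Americans more concerned than excited (38% -> 52%)
pew_points = point_change(38, 52)

print(f"Ipsos impact: +{ipsos_points} points ({ipsos_relative:.0f}% relative growth)")
print(f"Ipsos nervousness in 2022 (implied): {nervous_2022}%")
print(f"Pew concern: +{pew_points} points")
```

So the Ipsos move from 60% to 66% is a 6-point rise, while the share of people holding that view grew by about a tenth.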

Want to work together?

  • Work 1:1 with me – book a coaching or strategy session.
  • Advertise – showcase your brand, product, or service to our engaged audience of entrepreneurs, creators, and investors building a competitive advantage with AI.

Enjoy this newsletter?

Forward it to a friend. Sharing is caring.

Anything else? Hit reply to send us feedback or say hello. We don't bite!