Practical Android AI

First Edition · Android 13 · Kotlin 2.0 · Android Studio Otter

8. Building an Interactive App with Gemini Live
Written by Zahidur Rahman Faisal


Have you checked out the Gemini Live API? It’s a total game-changer for building real-time, interactive experiences in Android. Forget managing a whole backend just to stream audio or video to an LLM — Gemini Live makes it effortless.

Imagine building an app where the user can talk to a chatbot and get instant responses, just like a real conversation. That’s what the Live API enables.

What Makes an App ‘Interactive’?

When we talk about an interactive app in this context, especially with the Gemini Live API, we’re talking about an application that doesn’t just listen and reply — it actually acts on the user’s instructions. It goes beyond a simple question-and-answer chatbot.

Think of it this way:

  • Standard Chatbot App: You say, “What’s the weather like?” The model figures out the answer and replies. That’s a back-and-forth conversation.

  • Interactive App (with Function Calling): You say, “Please add coffee to my shopping list.”

    • The model doesn’t just say, “Okay, I’ve added coffee.”

    • It recognizes that “add to shopping list” is an action this app can perform.

    • It executes the function call that triggers the addListItem function in the actual Android code.

  • The app’s internal state (the shopping list) actually changes.

    • Then, the model gets confirmation and tells you: “Done. I’ve added coffee to your shopping list.”

The key is that the user’s voice prompt is translated directly into app-logic execution. The app is no longer just a passive interface; it’s an agent that can manipulate its own data and features based on a natural language command. It creates a seamless, hands-free experience where the AI is integrated directly into the core functionality of the app — that’s what makes it truly ‘interactive’ in the most powerful sense.
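To make this concrete, the shopping-list example above could be described to the model as a function declaration. This is a hypothetical sketch for illustration only: the `addListItem` name and its schema are assumptions, not part of the sample app. It mirrors the `FunctionDeclaration` API you'll use for real later in this chapter.

```kotlin
// Hypothetical declaration for the shopping-list example.
// The model sees this schema and can decide to call "addListItem"
// when the user says something like "add coffee to my shopping list".
val addListItemDeclaration = FunctionDeclaration(
  name = "addListItem",
  description = "Adds an item to the user's shopping list",
  parameters = mapOf(
    "item" to Schema.string(description = "The item to add, e.g. coffee")
  )
)
```

The declaration carries no app logic itself; it's purely a description that lets the model decide when, and with which arguments, to request the call.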

The Gemini Live API

When I first worked with the Gemini Live API, I realized it’s a major leap for mobile generative AI. Instead of the old request–response model, it now supports real-time, two-way streaming. That means the client and model can send and receive data simultaneously — creating a live conversation rather than a sequence of turns.

Hands-On With Gemini Live

Let’s extend the Firebase AI Logic app from the previous chapter with Gemini Live bidirectional streaming.

Project Setup and Dependencies

First things first, ensure you’re targeting Android API level 23+ and the app is connected to Firebase.
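If your module isn't already on API level 23 or higher, a minimal sketch of the relevant `build.gradle.kts` fragment might look like this (a config illustration, assuming the Kotlin DSL):

```kotlin
// Module-level build.gradle.kts
android {
  defaultConfig {
    minSdk = 23 // Gemini Live requires API level 23+
  }
}
```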

// Firebase AI Logic: Gemini Live dependency (module-level build.gradle.kts)
val firebaseAiLogicVersion = "17.6.0"
implementation("com.google.firebase:firebase-ai:$firebaseAiLogicVersion")

You'll also need the microphone permission declared in AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO"/>
[Screenshot: Before Initialization]

Model Initialization and Configuration

The first step in using Gemini Live is initializing the backend service and creating a LiveGenerativeModel instance. The Live API configuration is handled through the liveGenerationConfig object, which determines the model’s behavior and the nature of the streaming output.

// The core Gemini Live model instance.
private lateinit var liveModel: LiveGenerativeModel

// Mutable state flow holding the current state of the live session.
private val _liveSessionState = MutableStateFlow<LiveSessionState>(LiveSessionState.Unknown())
val liveSessionState = _liveSessionState.asStateFlow()
fun initializeGeminiLive(activity: Activity) {
  requestAudioPermissionIfNeeded(activity)

  coroutineScope.launch {
    try {
      val liveGenerationConfig = liveGenerationConfig {
        speechConfig = SpeechConfig(voice = Voice("FENRIR"))
        responseModality = ResponseModality.AUDIO
      }

      liveModel = Firebase.ai(backend = googleAI()).liveModel(
        modelName = "gemini-live-2.5-flash-preview",
        generationConfig = liveGenerationConfig,
      )

      _liveSessionState.value = LiveSessionState.Ready()
    } catch (e: Exception) {
      _liveSessionState.value = LiveSessionState.Error(message = e.localizedMessage)
    }
  }
}
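The `requestAudioPermissionIfNeeded` helper called in `initializeGeminiLive` isn't shown above. A minimal sketch, assuming the standard AndroidX permission APIs (the implementation details are an assumption, not the book's exact code), could look like this:

```kotlin
// Hypothetical helper: requests RECORD_AUDIO at runtime if not yet granted.
private fun requestAudioPermissionIfNeeded(activity: Activity) {
  val granted = ContextCompat.checkSelfPermission(
    activity, Manifest.permission.RECORD_AUDIO
  ) == PackageManager.PERMISSION_GRANTED

  if (!granted) {
    ActivityCompat.requestPermissions(
      activity, arrayOf(Manifest.permission.RECORD_AUDIO), 0 // arbitrary request code
    )
  }
}
```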
sealed interface LiveSessionState {
  data class Unknown(val message: String = "UNKNOWN: Gemini Live Not Initialized") : LiveSessionState

  data class Ready(val message: String = "READY: Ask Gemini Live") : LiveSessionState

  data class Running(val message: String = "RUNNING: Gemini Live Speaking...") : LiveSessionState

  data class Error(val message: String = "ERROR: Failed to initiate Gemini Live") : LiveSessionState
}
// In the ViewModel, delegate to LiveModelManager:
private val liveModelManager = LiveModelManager(
  context = application,
  coroutineScope = viewModelScope,
)
val liveSessionState = liveModelManager.liveSessionState

fun initializeGeminiLive(activity: Activity) {
  liveModelManager.initializeGeminiLive(activity)
}

// In MainActivity, kick off initialization:
viewModel.initializeGeminiLive(activity = this@MainActivity)
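One way the UI might surface this state is to collect `liveSessionState` in a composable and display each state's message. This is a sketch assuming Jetpack Compose; the composable name is illustrative, not from the sample app:

```kotlin
// Illustrative: shows the current live-session status on screen.
@Composable
fun LiveStatusLabel(viewModel: MainViewModel) {
  val state by viewModel.liveSessionState.collectAsState()

  val label = when (val s = state) {
    is LiveSessionState.Unknown -> s.message
    is LiveSessionState.Ready -> s.message
    is LiveSessionState.Running -> s.message
    is LiveSessionState.Error -> s.message
  }
  Text(text = label)
}
```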
[Screenshot: Audio Permission]

[Screenshot: Model Initialized]

Real-Time Connection: Starting The Live Session

At this point, the app can connect to Gemini and start the live session. You need to use LiveModelManager for that.

private var session: LiveSession? = null
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun startSessionFromText(catBreed: String) {
  val text = "Tell me about $catBreed cats in maximum 80 words."

  coroutineScope.launch(Dispatchers.IO) {
    try {
      // Start the conversation
      session = liveModel.connect()
      session?.send(text)
      session?.startAudioConversation()
        
      // Update State
      _liveSessionState.value = LiveSessionState.Running()
    } catch (e: Exception) {
      _liveSessionState.value = LiveSessionState.Error(message = e.localizedMessage)
    }
  }
}

Lifecycle Management: Toggling Session Start/Stop

You learned how to start a session, but you also need to know how to stop one. The session should be explicitly closed when the microphone is deactivated or when the user navigates away from the screen. Even when starting a fresh session, the right approach is to stop any ongoing session first.

fun stopSession() {
  session?.apply {
    stopAudioConversation()
    stopReceiving()
  }
  _liveSessionState.value = LiveSessionState.Ready()
}
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun askAbout(catBreed: String) {
  when (val state = liveSessionState.value) {
    is LiveSessionState.Ready -> {
      liveModelManager.startSessionFromText(catBreed)
    }

    is LiveSessionState.Running -> {
      liveModelManager.stopSession()
    }

    else -> {
      Log.d(TAG, "Live session state: $state")
    }
  }
}
[Screenshot: Model Running]

Function Calling: Making Gemini Your App’s Agent

Now you know how to turn your app into a voice assistant using the Gemini Live API. The next big step is Function Calling: the capability that lets the model actually interact with the logic and functionality of an Android app. It’s what makes the voice assistant an agent for your app.

Step 1: Define the App Function and its Declaration

First, you need the actual function in your app that you want the model to be able to call. In the sample app, you may want the user to ask for pictures of a specific cat breed, which means opening a Google Image search.

fun showPicture(catBreed: String) {
  coroutineScope.launch(Dispatchers.Default) {
    val query = Uri.encode("$catBreed cat pictures")
    val url = "https://www.google.com/search?q=$query&tbm=isch"

    val intent = Intent(Intent.ACTION_VIEW)
    intent.data = Uri.parse(url)
    intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)

    try {
      context.startActivity(intent)
    } catch (e: Exception) {
      Log.e(TAG, "Error opening Google Images", e)
    }
  }
}
// The FunctionDeclaration for the model
val showPictureFunctionDeclaration = FunctionDeclaration(
    name = "showPicture",
    description = "Function to show picture of cat breed",
    parameters = mapOf(
        "catBreed" to Schema.string(
            description = "A short string describing the cat breed to show picture"
        )
    )
)

Step 2: Pass the Tool to the LiveModel

The Gemini model needs to know what tools (functions) it has available before the conversation even starts. You need to package the FunctionDeclaration into a Tool object and pass it to the liveModel initialization.

// Packaging the declaration into a Tool
val functionHandlerTool = Tool.functionDeclarations(listOf(showPictureFunctionDeclaration))
// Initializing the LiveGenerativeModel
liveModel = Firebase.ai(backend = googleAI()).liveModel(
    modelName = "gemini-live-2.5-flash-preview",
    generationConfig = liveGenerationConfig,
    tools = listOf(functionHandlerTool), // Passing the tool here!
)

Step 3: Implement the Handler Function

When the user says something that triggers the function, the model sends a FunctionCallPart to the app. You need a special function, a handler, to intercept this call, execute the app logic, and send the result back to the model.

fun functionCallHandler(functionCall: FunctionCallPart): FunctionResponsePart {
  return when (functionCall.name) {
    "showPicture" -> {
      val catBreed = functionCall.args["catBreed"]!!.jsonPrimitive.content
      showPicture(catBreed = catBreed)
      val response = JsonObject(
        mapOf(
          "success" to JsonPrimitive(true),
          "message" to JsonPrimitive("Showing pictures of $catBreed")
        )
      )
      FunctionResponsePart(functionCall.name, response)
    }

    else -> {
      val response = JsonObject(
        mapOf(
          "error" to JsonPrimitive("Unknown function: ${functionCall.name}")
        )
      )
      FunctionResponsePart(functionCall.name, response)
    }
  }
}

Step 4: Start the Conversation with a Function Handler

Finally, when you start or continue the live session, pass the handler function to the startAudioConversation() call. This tells the LiveSession which function to invoke when the model decides to use a tool.

// Start the conversation
session = liveModel.connect()
session?.send(text)
session?.startAudioConversation(::functionCallHandler) // Pass the handler here!
[Screenshot: Function Calling]

Conclusion

To wrap this up, what you’ve done with the Gemini Live API and Function Calling isn’t just an evolutionary step; it’s a massive leap forward in how we build mobile AI experiences.

Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.
© 2026 Kodeco Inc.
