Google has recently started installing their Gemini Nano model as part of Chrome. And while having a local model sure is handy for privacy and client-side prompting, there have also been concerns around silently installing 4GB (!) of weights.

But since it has already been installed on my machine anyway…

// chrome://on-device-internals/
Foundational model state: Ready
Model Name: v3Nano
Version: 2025.06.30.1229

… let’s try it out.

Keep in mind that this is very new and experimental. The API documentation has disclaimers all over it.

We first check whether the API is available:

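// go through globalThis: a bare `LanguageModel?.` throws a ReferenceError where the global does not exist
const availability = await globalThis.LanguageModel?.availability({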
  expectedInputs: [{ type: 'text', languages: ['en']}],
  expectedOutputs: [{ type: 'text', languages: ['en']}],
})
// availability is now undefined, 'unavailable', 'downloadable', 'downloading' or 'available'
const available = availability === 'available'

Notice that we do not simply check general availability, but also whether our requested input and output types are supported. Gemini Nano is actually multi-modal for inputs and supports multiple languages. English, Japanese and Spanish are supported at the time of writing.

  • input types: text, image, audio
  • output types: text
  • languages: en, ja, es
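
Since the availability value can take five different forms, it can help to decide up front what each one means for the app. Here is a minimal sketch; `nextStep` and its return values are made up for illustration and are not part of the API. It assumes that calling create() while the model is 'downloadable' or 'downloading' starts or awaits the download, which is how the Chrome documentation describes it at the time of writing.

```javascript
// Hypothetical helper (not part of the Prompt API): maps an availability
// value to what the app should do next.
function nextStep(availability) {
  switch (availability) {
    case 'available':
      return 'create-session'
    case 'downloadable':
    case 'downloading':
      // assumption: create() starts or awaits the download in these states
      return 'wait-for-download'
    default:
      // undefined (API missing) or 'unavailable'
      return 'fallback'
  }
}
```

With the availability from above, `nextStep(availability)` returns 'fallback' in browsers without the API, so the rest of the page can hide the chat UI or fall back to a server-side model.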

Next we create our session, ideally using the same parameters for which we already checked the availability.

const session = await LanguageModel.create({
  expectedInputs: [{ type: 'text', languages: ['en']}],
  expectedOutputs: [{ type: 'text', languages: ['en']}],
})

session.contextUsage   // 0
session.contextWindow  // 9216

The session is your typical chat session where the model can access the full history. The size of the context window and the current usage, both counted in tokens, are exposed as properties on the session.
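
These two numbers are enough for a small token display. A minimal sketch, assuming only the `contextUsage` and `contextWindow` properties shown above; the helper name `contextStats` and the shape of its result are my own choice.

```javascript
// Hypothetical helper: works on any object that exposes the two numeric
// properties contextUsage and contextWindow.
function contextStats(session) {
  const used = session.contextUsage
  const total = session.contextWindow
  return {
    used,
    remaining: total - used,
    // percentage of the window in use, rounded to one decimal place
    percentUsed: Math.round((used / total) * 1000) / 10,
  }
}
```

Calling `contextStats(session)` after each reply gives the values to render next to the chat.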

Let’s say hi.

const result = await session.prompt('hi')

// Hi there! 😊 How can I help you today? 
//
// I'm ready for anything – answering questions, generating text, brainstorming ideas, or just chatting. Just let me know what you have in mind! ✨

session.contextUsage // 55

What a helpful model.

The prompt method is quite easy to use but I’ve found that it can take a while… especially since the model tends to output a short-form novel on each reply. So for many use cases you’d rather stream the response.

const stream = session.promptStreaming('hi');
for await (const chunk of stream) {
  console.log(chunk);
}

Logging every chunk on its own line makes the reply hard to read, but when we append the chunks to an element on the page instead, the text grows as the model generates it:

const output = document.getElementById('stream-output')
const stream = session.promptStreaming('hi');

output.textContent = ''
for await (const chunk of stream) {
  output.textContent += chunk
}
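
The streaming loop can also be wrapped in a function that accepts an AbortSignal, so a long reply can be cut short. This is a sketch under a few assumptions: the name `streamReply` and the `onText` callback are mine, the `signal` option is passed to promptStreaming as described in the API documentation, and an abort is assumed to surface as an error named 'AbortError'.

```javascript
// Streams a reply and reports the text collected so far via onText.
// `session` can be anything with a promptStreaming(text, { signal }) method.
async function streamReply(session, text, onText, signal) {
  let full = ''
  try {
    for await (const chunk of session.promptStreaming(text, { signal })) {
      full += chunk
      onText(full)
    }
  } catch (err) {
    // assumption: an aborted prompt throws an AbortError; keep the partial text
    if (err?.name !== 'AbortError') throw err
  }
  return full
}
```

In the page, `onText` could be `(t) => { output.textContent = t }`, and the signal would come from an `AbortController` whose `abort()` is called from a button.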


This may already be enough for many use cases, but we can add a few creature comforts to create our very own local assistant:

  • chat history of user and assistant messages
  • abort button and sending corrections (abort + new prompt)
  • session history - resumable and persisted in IndexedDB
  • token statistics
  • system prompt and message formatting

The result…


You can find the full code at Akatuoro/on-device-chat.

Displaying the chat history is easy: we simply keep each user message and its response in a messages array for rendering.

// on send
putMessage({ by: 'user', text: 'hi' })
putMessage({ by: 'assistant', text: '' })

// updating the message text
putMessage({...message, text: message.text + chunk })

// the messages array then looks like this
const messages = [{
  by: 'user',
  text: 'hi',
}, {
  by: 'assistant',
  text: 'Hi there! 😊 How can I help you today?'
}]
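
For illustration, here is one way such a putMessage could work. This is a sketch and not the code from the repository: it assumes every message carries an `id` field (which the snippets above do not show) and takes the current array as its first argument, returning a new array instead of mutating it.

```javascript
// Hypothetical stand-in for putMessage: appends the message, or replaces the
// existing message with the same id. Returns a new array for re-rendering.
function putMessage(messages, message) {
  const index = messages.findIndex((m) => m.id === message.id)
  if (index === -1) return [...messages, message]
  return messages.map((m, i) => (i === index ? message : m))
}
```

While streaming, each chunk then becomes `messages = putMessage(messages, { ...message, text: message.text + chunk })`, which replaces the assistant message instead of adding a new entry.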