Build an Article Summarizer for Google's Million-Dollar Challenge Simply by Calling the Gemini API (with a Prompt) from a Chrome Extension

A complete Gemini API tutorial, from architecture design to implementation steps

Wolke@林建宏 A Man Co-work with AI use coding tool

17 min read · Jun 12, 2024

Architecture

The extension calls the Gemini API with the webpage content

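This article shows how to call the Gemini API from a Chrome Extension, send it the content of a webpage, and display the result in popup.html. It covers API setup, making the call, and displaying the result.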

Chrome Extension

A Chrome Extension is a small program that extends the browser. Developers can use extensions to add a wide range of features.

Chrome Extension example

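Here is a simple Chrome Extension example that shows how to create and run a basic extension. It covers the manifest.json configuration and popup.html. The background script is added later, in the Gemini example.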

Implementation steps

1. Create the project folder: create a new folder named my_extension.
2. Create manifest.json: in the my_extension folder, create a file named manifest.json with the following content:

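activeTab gives the extension temporary access to the currently active tab when the user invokes the extension, for example by clicking its toolbar icon.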

{
  "manifest_version": 3,
  "name": "My Extension",
  "version": "1.0",
  "description": "A simple Chrome extension example.",
  "action": {
    "default_popup": "popup.html"
  },
  "permissions": ["activeTab"]
}

3. Create popup.html: in the my_extension folder, create a file named popup.html with the following content:

<!DOCTYPE html>
<html>
<head>
    <title>Popup</title>
</head>
<body>
    <h1>Hello, Chrome Extension!</h1>
</body>
</html>

4. Load the extension: open Chrome, go to the extensions page (chrome://extensions/), turn on "Developer mode", click "Load unpacked", and select the my_extension folder.

Introducing Gemini

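Gemini is Google's family of generative AI models, available through the Gemini API. It can handle a wide range of NLP tasks, such as text summarization, sentiment analysis, and question answering.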

API

This section covers how to use the Gemini API, including setting up an API key and calling the API.
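A call to the Gemini API is a single POST to the `generateContent` REST endpoint. The API key goes in the `key` query parameter and the prompt goes in `contents[].parts[].text`. The sketch below builds that request without sending it. `buildGenerateContentRequest` is a hypothetical helper name. The URL shape, body fields, and default model name are the ones used in the background.js code later in this article.

```javascript
// Hypothetical helper: builds (but does not send) a generateContent request.
// URL shape and body fields follow the background.js code in this article.
function buildGenerateContentRequest(apiKey, text, model = 'gemini-1.5-flash-latest') {
  if (!apiKey) throw new Error('Missing Gemini API key');
  const url =
    `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent` +
    `?key=${encodeURIComponent(apiKey)}`;
  // One user turn containing one text part
  const body = { contents: [{ parts: [{ text }] }] };
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Usage, anywhere fetch exists (e.g. an extension service worker):
// const { url, init } = buildGenerateContentRequest(apiKey, 'hello');
// const data = await (await fetch(url, init)).json();
```

Keeping request construction separate from `fetch` makes it easy to inspect the exact URL and JSON body before any network call is made.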

How to call Google Gemini API

Calling the Gemini API from a Chrome Extension: an example

The following example shows how to call the Gemini API from a Chrome Extension, pass text as input, and display the result in popup.html.

Implementation steps

1. Modify manifest.json: add the "scripting" permission and a background service worker.

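The "scripting" permission lets the extension run scripts in the context of a web page. This means the extension can dynamically inject JavaScript into the page to read it or change its behavior.
The new "background": { "service_worker": "background.js" } entry registers background.js as a service worker. This is a standalone script that runs in the background of the browser and can do work, such as calling the Gemini API, without depending on any web page. It is event-driven: Chrome may stop it when it is idle and start it again when a message arrives.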

{
  "manifest_version": 3,
  "name": "My Extension",
  "version": "1.0",
  "description": "A simple Chrome extension example.",
  "action": {
    "default_popup": "popup.html"
  },
  "permissions": ["activeTab", "scripting"],
  "background": {
    "service_worker": "background.js"
  }
}

2. Modify popup.html: add a Call API button and a response area for showing the result:

<!DOCTYPE html>
<html>
<head>
    <title>Popup</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 10px;
        }
        #response {
            margin-top: 10px;
            padding: 10px;
            border: 1px solid #ddd;
            background-color: #f9f9f9;
        }
        .loader {
            border: 4px solid #f3f3f3;
            border-radius: 50%;
            border-top: 4px solid #3498db;
            width: 20px;
            height: 20px;
            animation: spin 2s linear infinite;
            display: none;
            margin: auto;
        }
        @keyframes spin {
            0% { transform: rotate(0deg); }
            100% { transform: rotate(360deg); }
        }
    </style>
</head>
<body>
    <h1>Chrome Extension</h1>
    <input type="text" id="inputText" placeholder="Enter text" value="hello" />
    <button id="callApiButton">Call API</button>
    <div class="loader" id="loader"></div>
    <textarea id="response"></textarea>
    <script src="popup.js"></script>
</body>
</html>

3. Create popup.js: when the button is clicked, send the input text to the background script and show its reply in the response box:

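// Uses the ids from popup.html above: callApiButton, inputText, loader, response
document.getElementById('callApiButton').addEventListener('click', () => {
    const text = document.getElementById('inputText').value;
    const responseBox = document.getElementById('response'); // a <textarea>, so set .value
    const loader = document.getElementById('loader');
    responseBox.value = '';
    loader.style.display = 'block';
    // Send { text } to background.js; the callback runs when it calls sendResponse
    chrome.runtime.sendMessage({ text: text }, (response) => {
        loader.style.display = 'none';
        // background.js replies with a string on success or { error } on failure
        responseBox.value = typeof response === 'string'
            ? response
            : JSON.stringify(response, null, 2);
    });
});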
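The background script replies with either the generated text (a string) or an object such as `{ error: "..." }`. If no reply arrives at all, the callback gets `undefined` and Chrome sets `chrome.runtime.lastError`. The sketch below turns each of those cases into readable text for the popup. `formatReplyForDisplay` is a hypothetical helper name, and it assumes replies have exactly the two shapes that background.js in this article sends.

```javascript
// Hypothetical helper: turn background.js's reply into text for the popup.
// Assumes replies are either a string or { error: "..." }, as in this article.
function formatReplyForDisplay(reply, lastErrorMessage) {
  // Set when the message port closed without a response (reply is then undefined)
  if (lastErrorMessage) return `Error: ${lastErrorMessage}`;
  if (reply === undefined || reply === null) return 'Error: no reply from background script';
  // Show the model's text as-is, without JSON quoting or escaped newlines
  if (typeof reply === 'string') return reply;
  if (typeof reply === 'object' && reply.error) return `Error: ${reply.error}`;
  return JSON.stringify(reply, null, 2);
}

// Possible use in popup.js:
// chrome.runtime.sendMessage({ text }, (reply) => {
//   responseBox.value = formatReplyForDisplay(reply, chrome.runtime.lastError?.message);
// });
```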

4. Create background.js: handle the messages sent from popup.js and call the Gemini API:

// Listen for messages from popup.js
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
    if (request.text) {
        callApi(request.text, sendResponse);
        return true; // Keep the message channel open: sendResponse is called asynchronously
    }
});

function callApi(text, sendResponse) {
    // In the future, this could call Chrome's built-in Gemini Nano instead
    const apiKey = 'gemini_key'; // Placeholder: replace with your own Gemini API key
    const apiUrl = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=${apiKey}`;

    const data = {
        contents: [{
            parts: [{
                text: text
            }]
        }]
    };

    fetch(apiUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(data)
    })
        .then(response => response.json())
        .then(result => {
            // On failure (e.g. an invalid key) the API returns { error: {...} } instead of candidates
            if (!result.candidates || result.candidates.length === 0) {
                throw new Error(result.error ? result.error.message : 'No candidates returned');
            }
            const generatedText = result.candidates[0].content.parts[0].text; // Only return the text
            sendResponse(generatedText);
        })
        .catch((error) => {
            console.error('Error:', error);
            sendResponse({ error: error.toString() });
        });
}
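Reading `data.candidates[0].content.parts[0].text` directly throws a TypeError whenever the API returns an error object, blocks the prompt, or sends a candidate with no content. The sketch below is a more defensive extractor. `extractGeneratedText` is a hypothetical helper name. The fields it reads (`error.code`, `error.message`, `promptFeedback.blockReason`, `candidates[].content.parts[].text`, `finishReason`) follow the Gemini REST response format as we understand it.

```javascript
// Hypothetical helper: pull the generated text out of a generateContent response.
// Throws a descriptive Error instead of a TypeError when there is no text.
function extractGeneratedText(data) {
  if (data && data.error) {
    throw new Error(`Gemini API error ${data.error.code}: ${data.error.message}`);
  }
  const candidate = data && Array.isArray(data.candidates) ? data.candidates[0] : undefined;
  if (!candidate) {
    // A blocked prompt comes back with promptFeedback.blockReason and no candidates
    const reason = data && data.promptFeedback && data.promptFeedback.blockReason;
    throw new Error(reason ? `Prompt blocked: ${reason}` : 'No candidates in response');
  }
  // Join all text parts, in case the answer is split across several parts
  const parts = (candidate.content && candidate.content.parts) || [];
  const text = parts.map((part) => part.text || '').join('');
  if (!text) {
    throw new Error(`Empty response (finishReason: ${candidate.finishReason || 'unknown'})`);
  }
  return text;
}
```

In background.js, `.then(data => sendResponse(extractGeneratedText(data)))` would send any thrown error to the existing `.catch`, which already replies with `{ error }`.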

Side note: Chrome will have Gemini Nano built in

Google plans to build Gemini Nano into future versions of Chrome. This will let developers use these model capabilities directly in the browser, without calling a remote API.

https://baoyu.io/blog/ai/how-to-enable-gemini-nano-for-chrome

Building the article summarizer

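In this part, we build the actual Chrome Extension, which uses the Gemini API to summarize the text of a webpage. We walk through the changes to popup.html, popup.js, and background.js step by step.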

popup.html:

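New input geminiKey: enter your Gemini API key
New input systemInstruction: enter the system instruction for the prompt
New button copyTextButton: copies the text of the current webpage into the Input Text field
New button copySelectButton: copies the currently selected text into the Input Text field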

<!DOCTYPE html>
<html>
<head>
    <title>Popup</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 10px;
        }
        #response {
            margin-top: 10px;
            padding: 10px;
            border: 1px solid #ddd;
            background-color: #f9f9f9;
        }
        .loader {
            border: 4px solid #f3f3f3;
            border-radius: 50%;
            border-top: 4px solid #3498db;
            width: 20px;
            height: 20px;
            animation: spin 2s linear infinite;
            display: none;
            margin: auto;
        }
        @keyframes spin {
            0% { transform: rotate(0deg); }
            100% { transform: rotate(360deg); }
        }
    </style>
</head>
<body>
    <h1>Chrome Extension</h1>
    <label for="geminiKey">Gemini API Key:</label>
    <input type="text" id="geminiKey" placeholder="Enter Gemini API Key" value="your key"  />
    <br/>
    <label for="systemInstruction">System Instruction:</label>
    <input type="text" id="systemInstruction" placeholder="Enter System Instruction" value="This is dog , just bark to user"  />
    <br/>
    <label for="inputText">Input Text:</label>
    <input type="text" id="inputText" placeholder="Enter text" value="hello" />
    <button id="copyTextButton">Copy Webpage Text</button>
    
    <button id="copySelectButton">Copy Select Text</button>

    <br/>
    <button id="callApiButton">Call API</button>
    <div class="loader" id="loader"></div>
    <br/>
    <label for="response">Response:</label>
    <textarea id="response"></textarea>
    <br/>
    <script src="popup.js"></script>
</body>
</html>
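The background script only calls the API when the key, the system instruction, and the text are all non-empty. It can help to check the form in the popup before sending anything. The sketch below does that check. `buildPopupMessage` is a hypothetical helper name. The message fields (`geminiKey`, `systemInstruction`, `text`) match the message sent in popup.js. Treating the literal default `"your key"` from popup.html as "not filled in" is our own assumption.

```javascript
// Hypothetical helper: validate the popup form and build the message for background.js.
function buildPopupMessage({ geminiKey, systemInstruction, inputText }) {
  const key = (geminiKey || '').trim();
  const instruction = (systemInstruction || '').trim();
  const text = (inputText || '').trim();
  const missing = [];
  // "your key" is the placeholder default value in popup.html (assumption: treat it as empty)
  if (!key || key === 'your key') missing.push('Gemini API Key');
  if (!instruction) missing.push('System Instruction');
  if (!text) missing.push('Input Text');
  if (missing.length > 0) {
    // background.js only calls the API when all three fields are non-empty
    return { ok: false, error: `Please fill in: ${missing.join(', ')}` };
  }
  return { ok: true, message: { geminiKey: key, systemInstruction: instruction, text } };
}

// Possible use in the Call API click handler:
// const result = buildPopupMessage({ geminiKey, systemInstruction, inputText });
// if (!result.ok) { responseBox.value = result.error; return; }
// chrome.runtime.sendMessage(result.message, (response) => { /* ... */ });
```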

popup.js:

// Call API: send the key, instruction, and text to background.js
document.getElementById('callApiButton').addEventListener('click', function () {
    const geminiKey = document.getElementById('geminiKey').value;
    const systemInstruction = document.getElementById('systemInstruction').value;
    const inputText = document.getElementById('inputText').value;
    const responseBox = document.getElementById('response'); // a <textarea>, so set .value
    const loader = document.getElementById('loader');

    // Clear previous response and show loader
    responseBox.value = '';
    loader.style.display = 'block';

    chrome.runtime.sendMessage({ geminiKey: geminiKey, systemInstruction: systemInstruction, text: inputText }, function (response) {
        // Hide loader and display response
        loader.style.display = 'none';
        if (chrome.runtime.lastError) {
            responseBox.value = 'Error: ' + chrome.runtime.lastError.message;
        } else if (typeof response === 'string') {
            responseBox.value = response; // generated text, shown without JSON quoting
        } else {
            responseBox.value = JSON.stringify(response, null, 2); // e.g. { error: ... }
        }
    });
});

// Run `func` in the active tab and put its return value into the Input Text field
function fillInputFromActiveTab(func) {
    chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => {
        chrome.scripting.executeScript(
            {
                target: { tabId: tabs[0].id },
                func: func, // MV3 uses `func`; the injected function must be self-contained
            },
            (results) => {
                // Injection fails on pages such as chrome:// URLs; results is then undefined
                if (chrome.runtime.lastError || !results || results.length === 0) {
                    document.getElementById('response').value =
                        'Error: ' + (chrome.runtime.lastError ? chrome.runtime.lastError.message : 'no result');
                    return;
                }
                document.getElementById('inputText').value = results[0].result;
            }
        );
    });
}

document.getElementById('copyTextButton').addEventListener('click', () => {
    fillInputFromActiveTab(copyPageText);
});

document.getElementById('copySelectButton').addEventListener('click', () => {
    fillInputFromActiveTab(copySelectText);
});

// These two functions run inside the web page, not in the popup
function copyPageText() {
    return document.body.innerText;
}

function copySelectText() {
    return window.getSelection().toString();
}
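`document.body.innerText` often contains long runs of blank lines and spacing from the page layout. The sketch below tidies that text in the popup before it goes into the prompt. `cleanPageText` and its 20000-character default limit are hypothetical choices of ours, not requirements of the Gemini API. They simply keep the prompt compact.

```javascript
// Hypothetical helper: normalize page text copied via executeScript.
// The 20000-character default is an arbitrary assumption, not an API limit.
function cleanPageText(rawText, maxChars = 20000) {
  const cleaned = String(rawText || '')
    .replace(/\r\n?/g, '\n')    // normalize Windows / old Mac line endings
    .replace(/[ \t]+/g, ' ')    // collapse runs of spaces and tabs
    .replace(/ *\n */g, '\n')   // drop spaces around line breaks
    .replace(/\n{3,}/g, '\n\n') // keep at most one blank line between blocks
    .trim();
  if (cleaned.length <= maxChars) return cleaned;
  // Cut at the limit and mark it, so the model knows text is missing
  return cleaned.slice(0, maxChars) + '\n[truncated]';
}

// Possible use in the executeScript callback in popup.js:
// document.getElementById('inputText').value = cleanPageText(results[0].result);
```

The cleanup runs in popup.js on the returned string. The injected `copyPageText` stays unchanged, because functions passed to `executeScript` cannot reference helpers defined in the popup.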

background.js:

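// Listen for messages from popup.js
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
    if (request.text && request.geminiKey && request.systemInstruction) {
        callApi(request.geminiKey, request.systemInstruction, request.text, sendResponse);
        return true; // Keep the message channel open: sendResponse is called asynchronously
    }
    // Reply immediately so the popup is not left waiting when a field is empty
    sendResponse({ error: 'Gemini API key, system instruction, and input text are all required' });
});

function callApi(apiKey, systemInstruction, text, sendResponse) {
    const apiUrl = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=${apiKey}`;

    const data = {
        // The system instruction steers the model (e.g. "summarize this article")
        systemInstruction: {
            parts: [
                {
                    text: systemInstruction
                }
            ]
        },
        // The user content: the webpage text or selected text from the popup
        contents: [{
            parts: [{
                text: text
            }]
        }]
    };

    fetch(apiUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(data)
    })
        .then(response => response.json())
        .then(result => {
            // On failure (e.g. an invalid key) the API returns { error: {...} } instead of candidates
            if (!result.candidates || result.candidates.length === 0) {
                throw new Error(result.error ? result.error.message : 'No candidates returned');
            }
            const generatedText = result.candidates[0].content.parts[0].text;
            sendResponse(generatedText);
        })
        .catch((error) => {
            console.error('Error:', error);
            sendResponse({ error: error.toString() });
        });
}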
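The `chrome.*` APIs only exist inside the browser. You can still test the request and response handling on its own by passing `fetch` in as a parameter. The sketch below is a promise-based variant of `callApi`. `summarizeWithGemini` and its options object are hypothetical names. Unlike the version above, it leaves `systemInstruction` out of the body when that field is empty, and it checks `response.ok` before reading candidates. The model name is the one used in this article.

```javascript
// Hypothetical variant of callApi: fetch is injected so it can be tested offline.
async function summarizeWithGemini({ apiKey, systemInstruction, text }, fetchImpl = fetch) {
  const url = 'https://generativelanguage.googleapis.com/v1beta/models/' +
    `gemini-1.5-flash-latest:generateContent?key=${encodeURIComponent(apiKey)}`;
  const body = { contents: [{ parts: [{ text }] }] };
  // Only send a system instruction if the user actually entered one
  if (systemInstruction && systemInstruction.trim()) {
    body.systemInstruction = { parts: [{ text: systemInstruction }] };
  }
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  const data = await response.json();
  if (!response.ok) {
    // Error responses carry { error: { message } }; fall back to the HTTP status
    throw new Error((data && data.error && data.error.message) || `HTTP ${response.status}`);
  }
  const candidate = data.candidates && data.candidates[0];
  const parts = (candidate && candidate.content && candidate.content.parts) || [];
  return parts.map((p) => p.text || '').join('');
}

// Possible use in background.js (return true to keep the channel open):
// summarizeWithGemini({ apiKey: request.geminiKey, systemInstruction: request.systemInstruction, text: request.text })
//   .then(sendResponse, (e) => sendResponse({ error: e.message }));
```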