This HTML page is a chatbot interface where users can send both text and images. The structure starts with a container, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, a chatHistory section displays the conversation so far (messages sent and received).

For interacting with the bot, there's an inputSession section. It contains a field where users can type their message and an option to preview any image they select before sending it. The image upload button is represented by an icon, and users can select an image from their device. After the message or image is ready, they can send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.
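The page's HTML is not reproduced in this section, so the following is only a sketch of markup that would match the description, written as a JavaScript template string. The ids are the ones the script in this section looks up and the class names are the ones the stylesheet targets; the exact nesting, the camera icon, the placeholder text, and the styles.css and script.js file names are assumptions, and the Bootstrap link is left out because its URL is not given.

```javascript
// Hypothetical reconstruction of the page markup; nesting and file names are assumptions.
function chatbotMarkup() {
  return `
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>ChatBot</title>
  <!-- The Bootstrap stylesheet link from the original page would also go here -->
  <link rel="stylesheet" href="styles.css">
</head>
<body>
  <div class="container chatBot">
    <div class="header">ChatBot</div>
    <div class="chatHistory" id="chatHistory"></div>
    <div class="inputSession">
      <div class="imagePreview">
        <img id="previewImage" alt="Selected image preview">
      </div>
      <label for="imageInput">&#128247;</label>
      <input type="file" id="imageInput" accept="image/*" hidden>
      <input type="text" id="textInput" placeholder="Type a message">
      <button id="btnSend">Send</button>
    </div>
  </div>
  <div id="imageModal" class="modal">
    <span class="close">&times;</span>
    <img class="modal-content" id="modalImage" alt="Enlarged image">
  </div>
  <script src="script.js"></script>
</body>
</html>`.trim();
}
```

Keeping the file input hidden and clicking its label is one common way to show an icon instead of the browser's default file button.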

body {
    background-color: #f5f5f5;
    font-family: 'Arial', sans-serif;
    margin: 0;
    padding: 0;
    display: flex;
    justify-content: center;
    align-items: center;
    height: 100vh;
}

/* ChatBot container */
.container.chatBot {
    background-color: #ffffff;
    width: 50%;
    max-width: 600px;
    border-radius: 8px;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    padding: 20px;
    position: relative;
}

/* Header styling */
.header {
    font-size: 24px;
    color: #333;
    text-align: center;
    margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
    height: 300px;
    overflow-y: auto;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
    width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
    background-color: #ccc;
    border-radius: 4px;
}

/* Input session styling */
.inputSession {
    display: flex;
    align-items: center;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    justify-content: space-between;
}

/* Input field styling */
#textInput {
    width: 80%;
    padding: 10px;
    border: none;
    border-radius: 8px;
    background-color: #fff;
    box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
    margin-right: 10px;
    font-size: 16px;
    flex-grow: 1;
    border: 1px solid #ddd;
    border-radius: 4px;
    padding: 8px;
    margin-right: 10px;
    width: 100%;
}

/* Button for sending messages */
#btnSend {
    color: #fff;
    border: none;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    font-size: 20px;
    transition: background-color 0.3s;
}

#btnSend:hover {
    background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
    display: flex;
    align-items: center;
    flex-grow: 1;
    margin-bottom: 10px;
}

#previewImage {
    max-width: 80px;
    max-height: 80px;
    border-radius: 5px;
    margin-right: 10px;
    object-fit: cover;
    margin-right: 10px;
}

/* Label for file input */
label[for="imageInput"] {
    color: #fff;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    font-size: 20px;
    cursor: pointer;
    margin-right: 10px;
}

label[for="imageInput"]:hover {
    background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
    display: flex;
    align-items: flex-start;
    margin-bottom: 10px;
    padding: 10px;
    background-color: #e9ecef;
    border-radius: 8px;
    border: 1px solid #ddd;
    max-width: 100%;
}

/* Container for image and text */
.messageContent {
    display: flex;
    flex-direction: column;
    align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
    max-width: 100px;
    max-height: 100px;
    border-radius: 5px;
    margin-bottom: 5px;
    object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
    text-align: left;
}

/* Modal styling */
.modal {
    display: none;
    position: fixed;
    z-index: 1000;
    left: 0;
    top: 0;
    width: 100%;
    height: 100%;
    overflow: auto;
    background-color: rgb(0,0,0);
    background-color: rgba(0,0,0,0.8);
}

.modal-content {
    margin: auto;
    display: block;
    width: 80%;
    max-width: 700px;
}

.close {
    position: absolute;
    top: 15px;
    right: 35px;
    color: #f1f1f1;
    font-size: 40px;
    font-weight: bold;
}

.close:hover,
.close:focus {
    color: #bbb;
    text-decoration: none;
    cursor: pointer;
}

This CSS gives the chatbot interface a clean, responsive design. Flexbox lays out the page body, the container, and the input session. The chat history scrolls independently, and rounded corners, soft shadows, and a hover transition on the send button give the page a modern look.

4. Implementing Chatbot Functionalities

4.1 Frontend

Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Function to preview the image when selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });
      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }
      const data = await response.json();
      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

The rest of this subsection walks through the script piece by piece.
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.

const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.
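The listener above previews whatever file the user picks. As a sketch of a guard that could run before the FileReader is started, the helper below accepts only image files under a size limit. The function name and the 5 MB limit are hypothetical; the original code does not restrict file type or size.

```javascript
// Hypothetical limit; the original code does not restrict file size.
const MAX_PREVIEW_BYTES = 5 * 1024 * 1024;

// Returns true when `file` looks like an image small enough to preview.
// Works with a browser File object or any object that has `type` and `size`.
function isPreviewableImage(file, maxBytes = MAX_PREVIEW_BYTES) {
  if (!file) return false;
  const isImage = typeof file.type === "string" && file.type.startsWith("image/");
  return isImage && file.size <= maxBytes;
}
```

Inside the change listener, the call to reader.readAsDataURL(file) could then be wrapped in if (isPreviewableImage(file)) { ... }.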

btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });
      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }
      const data = await response.json();
      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
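The request-building part of that handler can also be pulled out into small functions, which makes it easier to try without touching the DOM. This is a sketch, not the article's code: the names buildUploadForm and sendToBot and the injectable fetchImpl parameter are assumptions, while the field names and the URL are the ones used in the handler.

```javascript
// Builds the multipart body. The field names "image" and "message"
// are the ones the click handler uses.
function buildUploadForm(file, text) {
  const formData = new FormData();
  if (file) formData.append("image", file);
  if (text) formData.append("message", text);
  return formData;
}

// Sends the form and resolves to the bot's reply text.
// `fetchImpl` is injectable so the function can be exercised without a server.
async function sendToBot(file, text, fetchImpl = fetch) {
  const response = await fetchImpl("http://localhost:3001/api/upload", {
    method: "POST",
    body: buildUploadForm(file, text),
  });
  if (!response.ok) {
    throw new Error(`Server responded with status ${response.status}`);
  }
  const data = await response.json();
  return data.reply;
}
```

Note that no Content-Type header is set by hand: when the body is a FormData object, fetch adds the multipart boundary itself.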

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

These event listeners manage the modal for viewing larger images. Clicking the close button or outside the modal hides it by setting modal.style.display to "none".
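The listeners shown here only close the modal; the code that opens it is not part of the script. A minimal sketch of the opening side, assuming that clicking an image in the chat should enlarge it (the function names openImageModal and enableImageZoom and the delegated-listener wiring are hypothetical):

```javascript
// Shows `src` in the modal. `modal` and `modalImage` are the elements
// selected at the top of the script.
function openImageModal(modal, modalImage, src) {
  modalImage.src = src;
  modal.style.display = "block";
}

// Hypothetical wiring: one delegated listener on the chat history covers
// every image that is added to the chat later.
function enableImageZoom(chatHistory, modal, modalImage) {
  chatHistory.addEventListener("click", (e) => {
    if (e.target.tagName === "IMG") {
      openImageModal(modal, modalImage, e.target.src);
    }
  });
}
```

Calling enableImageZoom(chatHistory, modal, modalImage) once after the elements are selected would be enough, since the listener sits on the container rather than on each image.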

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.

4.2 Backend (server)

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

These dependencies were introduced in Section 3.3, where we set up the environment. Refer back to it for what each one does and why it is used.

dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});
const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Where the files are stored (the folder must already exist)
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Rename the file to avoid duplicates
  }
});
const upload = multer({ storage: storage });

async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

const app = express();
const port = 3001;
app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

app.post("/api/upload", upload.single("image"), async (req, res, next) => {
  try {
    const { message } = req.body; // Accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // Uploaded image (if any)

    if (imagePath) {
      // Use the MIME type multer detected instead of assuming JPEG
      const files = [
        await uploadToGemini(imagePath, req.file.mimetype),
      ];
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: files[0].mimeType,
                  fileUri: files[0].uri,
                },
              },
            ],
          },
        ],
      });
      // Note: `message` is undefined when only an image is sent
      const result = await chatSession.sendMessage(message);
      // Return here so that only one response is sent per request
      return res.status(200).json({ reply: result.response.text() });
    }

    if (message) {
      const chatSession = model.startChat({
        generationConfig,
        history: [],
      });
      const result = await chatSession.sendMessage(message);
      const aiResponse = result.response.text();
      return res.status(200).json({ reply: aiResponse });
    }

    // Neither image nor text was provided
    return res.status(400).json({ error: "No image or text provided" });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
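The server reads the API key from process.env.GEMINI_API_KEY and passes it straight to the Gemini clients, so a missing .env entry only shows up later as a failed API call. A small startup check can make that failure obvious. This is a sketch; the helper name requireEnv is hypothetical and not part of the original server.

```javascript
// Returns the value of a required environment variable, or throws a
// descriptive error. `env` defaults to process.env and is a parameter only
// so the check is easy to exercise in isolation.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

With this in place, the key could be loaded as const apiKey = requireEnv("GEMINI_API_KEY"); right after dotenv.config().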

The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res, next) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, is used to handle file uploads. It processes a single file upload where the form field name is image.

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.
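The storage configuration builds each saved file's name from a timestamp and the original file name. A sketch of that naming step as a standalone function follows; stripping directory parts and replacing unusual characters is an added precaution and not part of the original code, and the function name uniqueFilename is hypothetical.

```javascript
const path = require("path");

// Builds "<timestamp>-<original name>", as the multer filename callback does.
// The basename/replace step is an extra safeguard so that a client-supplied
// name cannot contain directory separators or odd characters.
function uniqueFilename(originalName, now = Date.now()) {
  const safeName = path.basename(originalName).replace(/[^a-zA-Z0-9._-]/g, "_");
  return `${now}-${safeName}`;
}
```

In the multer configuration this would be used as cb(null, uniqueFilename(file.originalname)).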

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.
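The three cases described above (image, text only, nothing) can be summarised as one decision made before any response is sent, so that exactly one reply goes out per request. A sketch, with a hypothetical function name and return labels; unlike the handler, it also treats a whitespace-only message as empty:

```javascript
// Decides which branch a request takes: "image", "text", or "empty".
// `file` stands for req.file and `message` for req.body.message.
function classifyRequest({ file, message }) {
  if (file) return "image";
  if (typeof message === "string" && message.trim() !== "") return "text";
  return "empty";
}

// Maps each branch to the HTTP status the handler is described as returning.
function statusFor(kind) {
  return kind === "empty" ? 400 : 200;
}
```

A handler built on this would call classifyRequest once and then switch on the result, returning from each case.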

The complete server file:

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});
const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Where the files are stored (the folder must already exist)
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Rename the file to avoid duplicates
  }
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res, next) => {
  try {
    const { message } = req.body; // This is the accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // This is the image (if any)

    // Check if an image is provided and send it to the Gemini API
    if (imagePath) {
      // Use the MIME type multer detected instead of assuming JPEG
      const files = [
        await uploadToGemini(imagePath, req.file.mimetype),
      ];
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: files[0].mimeType,
                  fileUri: files[0].uri,
                },
              },
            ],
          },
        ],
      });
      // Note: `message` is undefined when only an image is sent
      const result = await chatSession.sendMessage(message);
      // Return here so that only one response is sent per request
      return res.status(200).json({ reply: result.response.text() });
    }

    // Handle the accompanying text and send it to the Gemini API
    if (message) {
      const chatSession = model.startChat({
        generationConfig,
        history: [],
      });
      const result = await chatSession.sendMessage(message);
      // Extract the AI's generated response
      const aiResponse = result.response.text();
      return res.status(200).json({ reply: aiResponse });
    }

    // If neither image nor text is provided, return an error
    return res.status(400).json({ error: "No image or text provided" });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});

5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application's robustness and user satisfaction.
While building the chatbot, I faced many errors that had to be debugged before it functioned well.

5.1 Text queries

5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties Faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially debugging responses from the API.

Another issue was that it was my first time working with the Gemini API. The unfamiliarity with the API led to a learning curve, causing delays in progress.

Another problem I faced was electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

Creating an Information Bot: A Beginner's Guide (HTML/CSS, JavaScript, Gemini API)

Published on 2024-11-09

Table of Contents

  • Introduction
    • What is a Chatbot?
    • Why build a chatbot?
  • Understanding the Problem
    • What Problems Does the Chatbot Solve?
    • What Should the Chatbot Do?
  • Setting Up the Development Environment
    • Tools and Technologies
    • Prerequisites
    • Setting Up the Environment
  • Implementing Chatbot Functionalities
    • Frontend
    • Backend (server)
  • Testing and Debugging
    • Text queries
    • Image queries
  • Problems faced and conclusions
    • Some difficulties Faced
    • Conclusion

1. Introduction

1.1 What is a Chatbot?

A chatbot is a type of software that mimics conversations with people. Most chatbots communicate through text, but some can also use voice. They use artificial intelligence (AI) to understand what users are asking and provide answers quickly. This makes chatbots useful for handling routine tasks and giving information efficiently.

The main job of a chatbot is to talk with users. It does this through a messaging platform, which can be as simple as answering straightforward questions or managing more complex conversations. By using natural language processing (NLP), chatbots can understand user questions and provide relevant responses, making interactions smoother and more effective.

1.2 Why build a chatbot?
Building an information chatbot helps people quickly find answers and details they need without waiting or searching for them manually. For example, if you’re looking for information on scholarships, a chatbot can instantly provide the details you need, saving you time and effort. It can handle many questions at once, is available 24/7, and can make finding the right information much easier for everyone.

2. Understanding the Problem

2.1 What Problems Does the Chatbot Solve?
A chatbot helps solve the problem of finding information by making it easier to get answers quickly. Instead of spending a lot of time searching online or waiting for help, users can ask the chatbot their questions and get instant responses. This means users don’t have to search through multiple websites or wait for office hours; the information is available anytime, making it more accessible and convenient for everyone.

2.2 What Should the Chatbot Do?

  • Display text queries on the UI
  • Display image queries on the UI
  • Provide text-based responses from text queries
  • Handle image-based queries
  • Preview image before and after sending to the UI

3. Setting Up the Development Environment

3.1 Tools and Technologies

  • HTML & CSS: Basic web design
  • JavaScript: Adding interactivity
  • API: Fetching information
  • Node.js & Express: Server-side handling

3.2 Prerequisites

  • Basic understanding of HTML, CSS, and JavaScript
  • A code editor (e.g., Visual Studio Code)
  • Web browser (for testing)

3.3 Setting up the environment

  1. Install Node.js and npm: Make sure Node.js is installed on your system. If it is not, download and install it from the official Node.js site (npm is bundled with it). Then verify the installation:
node -v
npm -v
  2. Create a project folder and open it in your code editor.

Creating an Information Bot: A Beginner

  3. Initialize a Node.js project. This creates a package.json file for managing dependencies:
npm init -y

  4. Install the required dependencies: You will need express, axios, dotenv, @google/generative-ai, multer, body-parser, and cors for this setup:
npm install express axios dotenv cors @google/generative-ai multer body-parser

  • express: This tool helps build a web server that listens for and responds to requests. For example, it manages everything from showing web pages to accepting images or text from users.

  • axios: This tool is used to make requests to other servers or APIs (like calling another website to get data). It sends and receives data over the internet, making it easy to connect your app to external services.

  • dotenv: This tool is used to store important secrets (like API keys or passwords) in a hidden file called .env. It helps keep sensitive information safe, so you don't accidentally share it with others.

  • @google/generative-ai: This package connects your app to Google’s Gemini models. It sends user inputs (like text) to the model and returns the AI-generated responses for your app to use. The server code imports GoogleGenerativeAI from it.

  • @google/generative-ai/server: This is not a separate package but an entry point of @google/generative-ai, so it needs no install of its own. It provides GoogleAIFileManager, which uploads files such as images to Google's AI so the model can process them.

  • multer: This tool is used to handle file uploads, like when users send images or other files to your server. It saves these files in a specific folder so your server can use them.

  • body-parser: This tool allows the server to easily understand data (like text or form data) sent from the user’s browser. It helps grab that data and make it usable in the code.

  • cors: This tool allows your server to accept requests from different websites or apps. Normally, browsers block certain requests for safety, but cors enables you to safely handle requests from other sites.

  5. Creating the API Key
  • What is an API key?
    An API key is like a special password that lets programs talk to each other. It keeps things secure by making sure only allowed users can access a service.
  • Why use an API key?
    It helps prevent misuse and keeps your information safe. It also lets the service provider see how much the service is being used, so they can manage it better.

  • Gemini API Key
    The Gemini API key is crucial for my chatbot project as it allows the bot to access advanced AI features. This key enables the chatbot to understand and generate responses based on user inputs and uploaded images. By using this API, I can enhance the chatbot's intelligence and provide a better experience for users seeking assistance.

  • How to create a Gemini API Key
    Go to Google AI Studio. If you don’t have an account, sign up for one.
    Scroll down until you see the section shown in the image below.

Click on Get your API key. It will lead you to this page

Once on this page, click the blue button labeled Create API key. This leads to another page where you can create an API key either in a new project or in an existing one.
I chose the new-project option, since this is a new project.

Once your API key is created, you can copy it and use it in your project.

Remember the tip here, to use your API key securely.
You should keep your API key secret because it acts like a password for your application. If someone finds it, they could misuse it to access your data or services, leading to security issues or extra costs. Keeping it private helps protect your project and ensures it runs smoothly.

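  6. Creating the Server and Hiding the API Key
  • Create a server.js file: This will contain your backend code to handle incoming requests from the chatbot, make calls to the Gemini API, and respond with the generated messages.
  • Create the .env file: In the root of your project, create a .env file to store the Gemini API key as GEMINI_API_KEY=<your key>, which is the name the server reads through process.env.GEMINI_API_KEY. Keep this file out of version control.
The final project then contains server.js and .env at the root, a public folder for the front-end files that Express serves, and an uploads folder where Multer saves uploaded images.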
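A small guard at startup can make a missing key obvious instead of letting it surface later as a failed API call. The sketch below is illustrative only: requireEnv is a hypothetical helper name, not part of dotenv or the Gemini SDK, and it assumes the variable is called GEMINI_API_KEY as in the server code.

```javascript
// Hypothetical helper: read a required variable from an env-like object.
// Defaults to process.env, which dotenv.config() populates from .env.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Example with an explicit object so the snippet runs without a .env file.
const demoEnv = { GEMINI_API_KEY: "  demo-key-123  " };
console.log(requireEnv("GEMINI_API_KEY", demoEnv)); // prints "demo-key-123"
```

In server.js this could be called as requireEnv("GEMINI_API_KEY") right after dotenv.config().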


4. Building the Chatbot

4.1 Designing the Chatbot Interface

  • Creating the HTML structure


    
    
    
    
[HTML listing: a page titled "ImageBot" whose header reads "ChatBot"]

This HTML page is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a div carrying the classes container and chatBot, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, there is a chatHistory section where all previous conversations (messages sent and received) are displayed.

For interacting with the bot, there's an inputSession section. It contains a field where users can type their message and an option to preview any image they select before sending it. The image upload button is represented by an icon, and users can select an image from their device. After the message or image is ready, they can send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.

  • Styling with CSS
body {
    background-color: #f5f5f5;
    font-family: 'Arial', sans-serif;
    margin: 0;
    padding: 0;
    display: flex;
    justify-content: center;
    align-items: center;
    height: 100vh;
}

/* ChatBot container */
.container.chatBot {
    background-color: #ffffff;
    width: 50%;
    max-width: 600px;
    border-radius: 8px;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    padding: 20px;
    position: relative;
}

/* Header styling */
.header {
    font-size: 24px;
    color: #333;
    text-align: center;
    margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
    height: 300px;
    overflow-y: auto;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
    width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
    background-color: #ccc;
    border-radius: 4px;
}

/* Input session styling */
.inputSession {
    display: flex;
    align-items: center;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    justify-content: space-between;
}

/* Input field styling */
#textInput {
    flex-grow: 1;
    width: 100%;
    padding: 8px;
    border: 1px solid #ddd;
    border-radius: 4px;
    background-color: #fff;
    box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
    margin-right: 10px;
    font-size: 16px;
}

/* Button for sending messages */
#btnSend {
    color: #fff;
    border: none;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    font-size: 20px;
    transition: background-color 0.3s;
}

#btnSend:hover {
    background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
    display: flex;
    align-items: center;
    flex-grow: 1;
    margin-bottom: 10px;
}

#previewImage {
    max-width: 80px;
    max-height: 80px;
    border-radius: 5px;
    margin-right: 10px;
    object-fit: cover;
}

/* Label for file input */
label[for="imageInput"] {
    color: #fff;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    font-size: 20px;
    cursor: pointer;
    margin-right: 10px;
}

label[for="imageInput"]:hover {
    background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
    display: flex;
    align-items: flex-start;
    margin-bottom: 10px;
    padding: 10px;
    background-color: #e9ecef;
    border-radius: 8px;
    border: 1px solid #ddd;
    max-width: 100%;
}

/* Container for image and text */
.messageContent {
    display: flex;
    flex-direction: column;
    align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
    max-width: 100px; 
    max-height: 100px; 
    border-radius: 5px;
    margin-bottom: 5px; 
    object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
    text-align: left;
}

/* Modal styling */
.modal {
    display: none; 
    position: fixed; 
    z-index: 1000; 
    left: 0;
    top: 0;
    width: 100%;
    height: 100%;
    overflow: auto; 
    background-color: rgb(0,0,0); 
    background-color: rgba(0,0,0,0.8); 
}

.modal-content {
    margin: auto;
    display: block;
    width: 80%;
    max-width: 700px;
}

.close {
    position: absolute;
    top: 15px;
    right: 35px;
    color: #f1f1f1;
    font-size: 40px;
    font-weight: bold;
}

.close:hover,
.close:focus {
    color: #bbb;
    text-decoration: none;
    cursor: pointer;
}

This CSS provides a clean, responsive design for a chatbot interface. It uses flexbox for layout, giving the container, header, and input session flexibility. The styling ensures a user-friendly experience with smooth transitions, scrollable chat history, and a modern look with rounded corners and shadows, enhancing visual appeal and usability.


4.2 Implementing Chatbot Functionalities

4.2.1 Frontend


Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Function to preview the image when selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }


});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

  • Element Selection
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.

  • Modal Elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

  • Image Preview
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.
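Before previewing, it can help to check that the selected file really is an image of a reasonable size. The following is a minimal sketch under stated assumptions: validateImageFile and the 5 MB limit are illustrative and do not appear in the original script, which accepts whatever the file input returns.

```javascript
// Assumed limit for illustration; the original code sets no size limit.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Returns an error message for an unsuitable file, or null if it looks fine.
// `file` only needs `type` and `size`, like a browser File object.
function validateImageFile(file, maxBytes = MAX_IMAGE_BYTES) {
  if (!file) return "No file selected.";
  if (typeof file.type !== "string" || !file.type.startsWith("image/")) {
    return "Please choose an image file.";
  }
  if (file.size > maxBytes) {
    return "The image is too large.";
  }
  return null;
}

console.log(validateImageFile({ type: "image/png", size: 1024 })); // prints null
console.log(validateImageFile({ type: "application/pdf", size: 1024 })); // prints "Please choose an image file."
```

Inside the change listener, a non-null result could be shown to the user instead of starting the FileReader.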

  • Send Image and Text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
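The payload-building step can be pulled out into a small function, which makes it easier to reason about what is sent. This is only a sketch: buildChatFormData is a hypothetical name, the field names "image" and "message" are the ones used in the handler above, and FormData is assumed to be available globally (browsers, or Node.js 18 and later).

```javascript
// Builds the multipart payload, or returns null when there is nothing to send.
// Field names "image" and "message" match what the click handler appends.
function buildChatFormData(text, file) {
  const trimmed = (text || "").trim();
  if (!file && !trimmed) return null;
  const formData = new FormData();
  if (file) formData.append("image", file);
  if (trimmed) formData.append("message", trimmed);
  return formData;
}

const payload = buildChatFormData("  Hello  ", null);
console.log(payload.get("message")); // prints "Hello"
console.log(buildChatFormData("   ", null)); // prints null
```

Because the body is a FormData object, fetch sets the multipart Content-Type header itself, which is why the handler does not set one.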

  • Add Message to Chat History
function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.

  • Modal Handling
closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

These event listeners manage the modal for viewing larger images. Clicking the close button or outside the modal hides it by setting modal.style.display to "none".
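The listeners above only close the modal; nothing in the script shown sets it to visible or uses modalImage. One possible way to open it when a chat image is clicked is sketched below. The function names are hypothetical, and the sketch assumes the modal becomes visible with display "block", which matches the .modal rule hiding it with display: none.

```javascript
// Show the modal with the given image source.
// `modal` and `modalImage` are the elements selected earlier in the script.
function openImageModal(modal, modalImage, src) {
  modalImage.src = src;
  modal.style.display = "block";
}

// Open the modal only when the click landed on an <img> inside the chat.
function handleChatImageClick(event, modal, modalImage) {
  const target = event.target;
  if (!target || target.tagName !== "IMG") return false;
  openImageModal(modal, modalImage, target.src);
  return true;
}

// Browser wiring; skipped where no DOM exists (for example under Node.js).
if (typeof document !== "undefined") {
  const chatEl = document.getElementById("chatHistory");
  const modalEl = document.getElementById("imageModal");
  const modalImgEl = document.getElementById("modalImage");
  if (chatEl && modalEl && modalImgEl) {
    chatEl.addEventListener("click", (e) => handleChatImageClick(e, modalEl, modalImgEl));
  }
}
```

Listening on the chat container rather than on each image means images added later by addMessageToChatHistory are covered without extra wiring.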

  • Send Message on Enter Key Press
textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.

4.2.2 Backend (server)

  • Importing dependencies
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

These dependencies were introduced in section 3.3, where we set up the environment. Refer back to it for what each one does and why we are using it.

  • Configuring Environment Variables
dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

  • dotenv.config(): Loads environment variables from the .env file.
  • apiKey: Retrieves the API key from environment variables to authenticate requests to the Google Generative AI API.
  • genAI: Initializes the GoogleGenerativeAI instance with the API key.
  • fileManager: Initializes the GoogleAIFileManager instance with the same API key for handling file uploads.

  • Setting Up AI Model and Configuration

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

  • model: Configures and initializes the generative model (Gemini 1.5 Pro) from Google AI, specifying which model to use for generating responses.
  • generationConfig: Defines parameters for generating responses, including temperature (controls creativity), topP and topK (control the diversity of responses), and maxOutputTokens (maximum length of the response).
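To get a feel for what temperature does, the sketch below applies it to a softmax over a few made-up scores. This is a general illustration of temperature in sampling, not a description of how Gemini works internally; the function name and the numbers are assumptions chosen for the example.

```javascript
// Convert raw scores (logits) into probabilities, scaled by temperature.
function softmaxWithTemperature(logits, temperature) {
  if (temperature <= 0) throw new RangeError("temperature must be positive");
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const logits = [2.0, 1.0, 0.1]; // made-up scores for three candidate tokens
const sharp = softmaxWithTemperature(logits, 0.5);
const flat = softmaxWithTemperature(logits, 2.0);
console.log(sharp[0] > flat[0]); // prints true: low temperature favors the top token
```

A lower temperature concentrates probability on the most likely option, while a higher one spreads it out, which is why higher values tend to give more varied answers.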

  • Configuring Multer for File Uploads

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Prefix the name with a timestamp to avoid duplicates
  }
});
const upload = multer({ storage: storage });

  • multer.diskStorage(): Configures how files are stored.
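  • destination: Specifies the directory (uploads/) where files should be saved. Multer does not create this folder when destination is given as a function, so create uploads/ before starting the server.
  • filename: Renames the file by prefixing the current timestamp to the original name to ensure uniqueness.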
  • upload: Creates a Multer instance with the defined storage configuration.
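Two small helpers can make this storage setup sturdier: one creates the upload folder if it is missing, and one builds a file name that cannot escape that folder. Both are hypothetical helpers written for illustration (the original code uses the raw originalname), using only Node.js built-ins.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Create the upload directory if it does not exist yet, then return it.
function ensureUploadDir(dir) {
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Build "<timestamp>-<safe name>": drop any directory part of the original
// name and replace characters outside a conservative allow-list.
function makeUploadName(originalName, now = Date.now()) {
  const base = path.basename(originalName).replace(/[^a-zA-Z0-9._-]/g, "_");
  return `${now}-${base}`;
}

console.log(makeUploadName("my photo.png", 1700000000000)); // prints "1700000000000-my_photo.png"
```

In server.js, ensureUploadDir("uploads") could run once at startup, and the filename callback could pass makeUploadName(file.originalname) to cb.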

  • Uploading Files to Gemini

async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

  • uploadToGemini(): A function to upload a file to the Google Gemini API.
  • fileManager.uploadFile(): Uploads the file to the API and logs the result.
  • file: Contains the details of the uploaded file returned from the API.
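The code above leaves every uploaded image in the uploads folder. Assuming the local copy is no longer needed once the upload to Gemini has finished, a wrapper like the following could delete it afterwards. withLocalCleanup is a hypothetical helper, not part of Multer or the Gemini SDK.

```javascript
const fsp = require("node:fs/promises");

// Run `task` with the file path, then delete the local file whether or not
// the task succeeded. Deletion errors are ignored on purpose.
async function withLocalCleanup(filePath, task) {
  try {
    return await task(filePath);
  } finally {
    await fsp.unlink(filePath).catch(() => {});
  }
}
```

In the endpoint this might be used as withLocalCleanup(imagePath, (p) => uploadToGemini(p, req.file.mimetype)), so the result of the upload is still returned to the caller.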

  • Configuring Express and Middleware

const app = express();
const port = 3001;

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

  • app: Initializes an Express application.
  • port: Sets the port on which the server will listen (3001).
  • app.use(cors()): Enables CORS for the server.
  • app.use(bodyParser.json()): Parses JSON bodies.
  • app.use(bodyParser.urlencoded({ extended: true })): Parses URL-encoded bodies.
  • app.use(express.static("public")): Serves static files like HTML, CSS, and JS from the public directory.

  • API Endpoint for Handling Image and Text

app.post("/api/upload", upload.single("image"), async (req, res, next) => {
  try {
    const { message } = req.body; // Accompanying text, if any
    const imagePath = req.file ? req.file.path : null; // Uploaded image, if any

    // Reject empty requests before calling the model
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    if (imagePath) {
      const files = [
        // Use the MIME type Multer detected rather than assuming JPEG
        await uploadToGemini(imagePath, req.file.mimetype),
      ];

      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: files[0].mimeType,
                  fileUri: files[0].uri,
                },
              },
            ],
          },
        ],
      });
      // Assumption: fall back to a generic prompt when an image arrives without text
      const result = await chatSession.sendMessage(message || "Describe this image.");
      // Return here so the request gets exactly one response
      return res.status(200).json({ reply: result.response.text() });
    }

    // Text-only request: start a chat with an empty history
    const chatSession = model.startChat({
      generationConfig,
      history: [],
    });
    const result = await chatSession.sendMessage(message);
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • app.post("/api/upload"): Defines a POST endpoint for handling file and text uploads.
  • upload.single("image"): Middleware to handle single file upload (named image).
  • req.body: Contains the text message.
  • req.file: Contains the uploaded image file.
  • uploadToGemini(): Uploads the image to the Gemini API.
  • model.startChat(): Starts a chat session with the model.
  • chatSession.sendMessage(message): Sends the message (and image if provided) to the model.
  • res.status(200).json({ reply: result.response.text() }): Sends the generated response back to the client.
  • res.status(400): Handles cases where neither image nor text is provided.
  • res.status(500): Handles server errors.

  • Starting the Server

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});

  • app.listen(port): Starts the server and listens on the specified port (3001).
  • console.log: Confirms that the server is running and accessible at http://localhost:3001.

Let’s go through the API endpoint for handling image and text in detail, from start to finish.

The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res, next) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, is used to handle file uploads. It processes a single file upload where the form field name is image.

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.
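The branching described in these paragraphs can be summarized as a small pure function that picks exactly one outcome per request. This is a sketch for illustration: planResponse is a hypothetical name, and the fallback prompt for an image sent without text is an assumption rather than something the description above specifies.

```javascript
// Decide how to answer a request. `imagePath` is the saved upload path or null,
// `message` is the text field or undefined.
function planResponse(imagePath, message) {
  const text = typeof message === "string" ? message.trim() : "";
  if (!imagePath && !text) {
    return { status: 400, kind: "empty" };
  }
  if (imagePath) {
    // Assumption: use a generic prompt when an image arrives without text.
    return { status: 200, kind: "image", prompt: text || "Describe this image." };
  }
  return { status: 200, kind: "text", prompt: text };
}

console.log(planResponse(null, "").status); // prints 400
console.log(planResponse("uploads/1-cat.png", "").kind); // prints "image"
console.log(planResponse(null, "Hi").kind); // prints "text"
```

Because every path returns a single plan, a handler built on it sends one response per request, which avoids the "headers already sent" error that Express raises when a handler responds twice.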

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Prefix a timestamp to the name to avoid duplicates
  }
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // This is the accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // This is the image (if any)

    // If neither image nor text is provided, return an error
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    // Image (with optional text): upload the file, then seed the chat history with it
    if (imagePath) {
      // Use the MIME type multer detected instead of assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);

      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // Assumed fallback prompt for image-only requests (not in the original code)
      const result = await chatSession.sendMessage(message || "Describe this image.");
      // Return here so the text-only branch below cannot send a second response
      return res.status(200).json({ reply: result.response.text() });
    }

    // Text only: start a fresh chat session and send the message
    const chatSession = model.startChat({
      generationConfig,
      history: [],
    });
    const result = await chatSession.sendMessage(message);

    // Send the AI's generated response back to the frontend
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});



// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
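
For context, the request this endpoint expects can be sketched from the client side as follows. This is a minimal illustration, not the article's front-end code: the function names buildUploadBody and askBot and the default base URL are assumptions, while the field names "image" and "message" come from upload.single("image") and req.body.message in the server code above.

```javascript
// Build the multipart body for the /api/upload route.
// Field names match upload.single("image") and req.body.message on the server.
function buildUploadBody(message, imageFile) {
  const body = new FormData();
  if (message) body.append("message", message);
  // A browser File carries its own name; a plain Blob gets a placeholder name.
  if (imageFile) body.append("image", imageFile, imageFile.name || "upload.jpg");
  return body;
}

// Send the request and return the bot's reply text.
// baseUrl is an assumption based on the port used by the server above.
async function askBot(message, imageFile, baseUrl = "http://localhost:3001") {
  const response = await fetch(`${baseUrl}/api/upload`, {
    method: "POST",
    // fetch sets the multipart Content-Type header (with boundary) itself
    body: buildUploadBody(message, imageFile),
  });
  const data = await response.json();
  if (!response.ok) {
    // The server sends { error: "..." } with its 400 and 500 responses
    throw new Error(data.error || `Request failed with status ${response.status}`);
  }
  return data.reply;
}
```

In a page, this could be called as askBot("What is in this picture?", fileInput.files[0]), where fileInput is a hypothetical file input element.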

  • Removing markdown
    In the chat application, I used the Marked library to convert Markdown text into HTML for bot messages by including its script in the page. When a message is rendered, the code checks the class name: for bot messages it applies textContainer.innerHTML = marked.parse(text); to render the Markdown as HTML, and for user messages it uses textContainer.textContent = text; to display plain text, so user input is never interpreted as markup.

  • Markdown Text

[Screenshot: bot reply displayed as raw Markdown text]

  • After markdown has been removed

[Screenshot: bot reply after Markdown rendering]
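
The rendering rule described under "Removing markdown" can be sketched as a small function. This is an illustration under assumptions: the name renderMessage, the "bot"/"user" labels (standing in for the class-name check), and passing the parser as an argument are not from the original code. In the page, parseMarkdown would be marked.parse once the Marked script is loaded.

```javascript
// Render one chat message into its container element.
// sender: "bot" or "user" (hypothetical labels standing in for the class-name check).
// parseMarkdown: a Markdown-to-HTML function, for example marked.parse.
function renderMessage(textContainer, text, sender, parseMarkdown) {
  if (sender === "bot") {
    // Bot replies arrive as Markdown, so convert them to HTML before display.
    textContainer.innerHTML = parseMarkdown(text);
  } else {
    // User input is shown verbatim; textContent never interprets it as HTML.
    textContainer.textContent = text;
  }
  return textContainer;
}
```

Marked does not sanitize its output, so if bot replies could ever contain untrusted HTML, it may be worth running the parsed result through a sanitizer before assigning it to innerHTML.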

5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application's robustness and user satisfaction.
While building the chatbot, I ran into many errors that had to be debugged before it functioned well.

5.1 Text queries

[Screenshot: text query example]

5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially debugging responses from the API.

Another issue was that it was my first time working with the Gemini API. The unfamiliarity with the API led to a learning curve, causing delays in progress.

I also faced electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

  • Gemini API Integration: I developed skills in API integration, particularly using the Gemini API for generating responses based on inputs.

  • Problem-solving: I learned how to systematically debug and troubleshoot issues, improving my resilience in overcoming project obstacles.

  • Time Management: The delays caused by power outages and bugs helped me practice time management and adaptability under pressure.

  • Collaborating for Solutions: Reaching out for help when needed and learning from others was an important takeaway in this project.

  • Practical Experience: The hands-on experience with API and front-end integration improved my proficiency in JavaScript.

Copyright notice: This article is reproduced from https://dev.to/kedjuprecious/creating-an-information-bot-a-beginners-guide-htmlcss-javascript-gemini-api-260f?1 If there is any infringement, please contact [email protected] for removal.