This HTML page is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a container, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, a chatHistory section displays all previous conversations (messages sent and received).

To interact with the bot, there's an inputSession section. It contains a field where users type their message and an area that previews any image they select before sending it. The image upload button is represented by an icon, and clicking it lets users pick an image from their device. When the message or image is ready, they send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.

```css
body {
  background-color: #f5f5f5;
  font-family: 'Arial', sans-serif;
  margin: 0;
  padding: 0;
  display: flex;
  justify-content: center;
  align-items: center;
  height: 100vh;
}

/* ChatBot container */
.container.chatBot {
  background-color: #ffffff;
  width: 50%;
  max-width: 600px;
  border-radius: 8px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
  display: flex;
  flex-direction: column;
  justify-content: space-between;
  padding: 20px;
  position: relative;
}

/* Header styling */
.header {
  font-size: 24px;
  color: #333;
  text-align: center;
  margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
  height: 300px;
  overflow-y: auto;
  padding: 10px;
  background-color: #f1f1f1;
  border-radius: 8px;
  border: 1px solid #ddd;
  margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
  width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
  background-color: #ccc;
  border-radius: 4px;
}

/* Input session styling */
.inputSession {
  display: flex;
  align-items: center;
  padding: 10px;
  background-color: #f1f1f1;
  border-radius: 8px;
  border: 1px solid #ddd;
  justify-content: space-between;
}

/* Input field styling */
#textInput {
  flex-grow: 1;
  width: 100%;
  padding: 8px;
  border: 1px solid #ddd;
  border-radius: 4px;
  background-color: #fff;
  box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
  margin-right: 10px;
  font-size: 16px;
}

/* Button for sending messages */
#btnSend {
  color: #fff;
  border: none;
  border-radius: 50%;
  width: 50px;
  height: 50px;
  display: flex;
  align-items: center;
  justify-content: center;
  cursor: pointer;
  font-size: 20px;
  transition: background-color 0.3s;
}

#btnSend:hover {
  background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
  display: flex;
  align-items: center;
  flex-grow: 1;
  margin-bottom: 10px;
}

#previewImage {
  max-width: 80px;
  max-height: 80px;
  border-radius: 5px;
  margin-right: 10px;
  object-fit: cover;
}

/* Label for file input */
label[for="imageInput"] {
  color: #fff;
  border-radius: 50%;
  width: 50px;
  height: 50px;
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 20px;
  cursor: pointer;
  margin-right: 10px;
}

label[for="imageInput"]:hover {
  background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
  display: flex;
  align-items: flex-start;
  margin-bottom: 10px;
  padding: 10px;
  background-color: #e9ecef;
  border-radius: 8px;
  border: 1px solid #ddd;
  max-width: 100%;
}

/* Container for image and text */
.messageContent {
  display: flex;
  flex-direction: column;
  align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
  max-width: 100px;
  max-height: 100px;
  border-radius: 5px;
  margin-bottom: 5px;
  object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
  text-align: left;
}

/* Modal styling */
.modal {
  display: none;
  position: fixed;
  z-index: 1000;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  overflow: auto;
  background-color: rgb(0, 0, 0); /* Fallback color */
  background-color: rgba(0, 0, 0, 0.8);
}

.modal-content {
  margin: auto;
  display: block;
  width: 80%;
  max-width: 700px;
}

.close {
  position: absolute;
  top: 15px;
  right: 35px;
  color: #f1f1f1;
  font-size: 40px;
  font-weight: bold;
}

.close:hover,
.close:focus {
  color: #bbb;
  text-decoration: none;
  cursor: pointer;
}
```

This CSS gives the chatbot interface a clean, modern design. Flexbox centers the container on the page and lays out the container and the input session, and the container's width is relative to the window (50%, capped at 600px). The chat history scrolls once it fills up, the buttons change color smoothly on hover, and rounded corners and soft shadows improve both visual appeal and usability.


4. Implementing Chatbot Functionalities

4.1 Frontend


Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

```javascript
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Function to preview the image when selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch("http://localhost:3001/api/upload", {
        method: "POST",
        body: formData,
      });
      if (!response.ok) {
        throw new Error("Failed to send image to the server.");
      }
      const data = await response.json();
      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error("Error sending image or text:", error);
      addMessageToChatHistory(null, "Error sending data. Please try again.", "errorMessage");
    }
  }
});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});
```

```javascript
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");
```

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.

```javascript
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");
```

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

```javascript
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});
```

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.
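Before previewing, it can help to reject files that are not images or are very large. The snippet below is a minimal sketch and not part of the original code: validateImageFile is a hypothetical helper, and the 5 MB limit is an arbitrary assumption. It only reads the type and size properties that browser File objects expose, so it can be tried outside the browser with plain objects.

```javascript
// Hypothetical size limit; the article does not define one.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

/**
 * Returns an error message, or null when the file looks like an acceptable image.
 * `file` only needs the `type` and `size` properties that browser File objects provide.
 */
function validateImageFile(file, maxBytes = MAX_IMAGE_BYTES) {
  if (!file) return "No file selected.";
  if (!file.type.startsWith("image/")) return "Please choose an image file.";
  if (file.size > maxBytes) return "Image is too large.";
  return null;
}

console.log(validateImageFile({ type: "image/png", size: 1024 }));       // null
console.log(validateImageFile({ type: "application/pdf", size: 1024 })); // Please choose an image file.
```

In the browser, it could be called at the top of the existing change listener, clearing imageInput.value and returning early whenever it reports an error.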

```javascript
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch("http://localhost:3001/api/upload", {
        method: "POST",
        body: formData,
      });
      if (!response.ok) {
        throw new Error("Failed to send image to the server.");
      }
      const data = await response.json();
      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error("Error sending image or text:", error);
      addMessageToChatHistory(null, "Error sending data. Please try again.", "errorMessage");
    }
  }
});
```

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
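To see exactly what the click handler sends, the FormData step can be pulled into a small function. This is a sketch with a hypothetical name, buildChatFormData; the field names image and message are the ones the article's code uses, and they have to match upload.single("image") and req.body.message on the server. FormData and Blob are built into browsers and into Node.js 18 and later, so the example also runs under Node.

```javascript
/**
 * Builds the multipart body the send button posts to /api/upload.
 * Either field may be missing: text-only and image-only messages are both allowed.
 */
function buildChatFormData(text, file) {
  const formData = new FormData();
  if (file) formData.append("image", file);   // read on the server by upload.single("image")
  if (text) formData.append("message", text); // read on the server as req.body.message
  return formData;
}

const fakeImage = new Blob(["not really an image"], { type: "image/png" });
const fd = buildChatFormData("What is in this picture?", fakeImage);
console.log([...fd.keys()]); // [ 'image', 'message' ]
```

When such an object is passed as the fetch body, the browser sets the multipart Content-Type header (including its boundary) by itself, which is why the original code does not set any headers.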

```javascript
function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}
```

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.

```javascript
closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});
```

These event listeners manage the modal for viewing larger images. Clicking the close button, or clicking the dark backdrop around the image (the modal element itself), hides the modal by setting modal.style.display to "none".
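The script selects modalImage but never shows the modal, so nothing in the listing actually opens it. Below is a minimal sketch of one way to wire it up. makeImageOpenModal is a hypothetical helper, and showing the modal with display "block" is an assumption based on the .modal rule starting out as display: none. The elements are passed in as parameters, so the function can be exercised with stand-ins outside the browser.

```javascript
/**
 * Opens the modal with a larger copy of the image when the chat image is clicked.
 * imageElement: the <img> created in addMessageToChatHistory
 * modal, modalImage: the #imageModal container and the #modalImage element
 */
function makeImageOpenModal(imageElement, modal, modalImage) {
  imageElement.addEventListener("click", () => {
    modalImage.src = imageElement.src;
    modal.style.display = "block"; // assumption: "block" is how the modal is shown
  });
}
```

In addMessageToChatHistory, it would be called right after imageElement is created, as makeImageOpenModal(imageElement, modal, modalImage); the existing close handlers then hide the modal again.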

```javascript
textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});
```

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.
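One refinement worth considering, which is not in the original: when users type with an input method editor (IME), for example to enter Chinese or Japanese, Enter is also used to confirm the composed characters, and the listener above would send a half-finished message. KeyboardEvent.isComposing is true during composition, so the check could be sketched as follows (shouldSendOnKeydown is a hypothetical name):

```javascript
/**
 * True when a keydown event should send the message:
 * the Enter key, but not while an IME composition is in progress.
 */
function shouldSendOnKeydown(event) {
  return event.key === "Enter" && !event.isComposing;
}

// Browser usage, replacing the body of the existing keydown listener:
// textInput.addEventListener("keydown", (e) => {
//   if (shouldSendOnKeydown(e)) {
//     e.preventDefault();
//     btnSend.click();
//   }
// });
```

Because the function only reads key and isComposing, it can be checked with plain objects instead of real keyboard events.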


4.2 Backend (server)

```javascript
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");
```

These dependencies were explained earlier, in the section on setting up the environment; refer back to it for what each one does and why we use it.
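The server reads GEMINI_API_KEY from process.env after calling dotenv.config(), so a missing or empty key only shows up later as a failed API call. A small check at startup makes the problem obvious. This is a sketch with a hypothetical helper name, getGeminiApiKey; the .env file would contain a line such as GEMINI_API_KEY=your-key-here, and it should stay out of version control (for example, by listing .env in .gitignore).

```javascript
/**
 * Returns the Gemini API key, or throws a descriptive error if it is missing.
 * `env` defaults to process.env, which dotenv.config() fills from the .env file.
 */
function getGeminiApiKey(env = process.env) {
  const key = (env.GEMINI_API_KEY || "").trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set. Add it to the .env file in the project folder.");
  }
  return key;
}

// In the server, after dotenv.config():
// const apiKey = getGeminiApiKey();
```

Passing env as a parameter keeps the check testable without touching the real environment.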

```javascript
dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

// Multer does not create the destination folder when it is given as a function
const fs = require("fs");
fs.mkdirSync("uploads", { recursive: true });

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, "uploads/"); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + "-" + file.originalname); // Rename the file to avoid duplicates
  },
});
const upload = multer({ storage: storage });

async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

const app = express();
const port = 3001;

app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

app.post("/api/upload", upload.single("image"), async (req, res, next) => {
  try {
    const { message } = req.body;
    const imagePath = req.file ? req.file.path : null;

    if (imagePath) {
      // Use the MIME type Multer detected instead of assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [{ fileData: { mimeType: file.mimeType, fileUri: file.uri } }],
          },
        ],
      });
      // sendMessage needs text; this fallback prompt for image-only uploads is an assumption
      const result = await chatSession.sendMessage(message || "Describe this image.");
      return res.status(200).json({ reply: result.response.text() });
    }

    if (message) {
      const chatSession = model.startChat({ generationConfig, history: [] });
      const result = await chatSession.sendMessage(message);
      return res.status(200).json({ reply: result.response.text() });
    }

    // Neither image nor text was provided; every path above sends exactly one response
    return res.status(400).json({ error: "No image or text provided" });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
```
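The multer filename callback builds names by concatenation, Date.now() + "-" + file.originalname, to keep them unique. Because originalname comes from the client, a slightly more defensive variant could strip directory parts and unusual characters. The sketch below is an assumption, not the article's code: buildStoredFilename is a hypothetical helper and the allowed character set is arbitrary. The timestamp is a parameter so the output is predictable in a test.

```javascript
const path = require("node:path");

/**
 * Builds a stored filename: a timestamp prefix plus a cleaned-up original name.
 * path.basename drops any directory components (e.g. "../"), and characters other
 * than letters, digits, ".", "_" and "-" are replaced with "_".
 */
function buildStoredFilename(originalName, now = Date.now()) {
  const safeName = path.basename(originalName).replace(/[^a-zA-Z0-9._-]/g, "_");
  return `${now}-${safeName}`;
}

console.log(buildStoredFilename("cat photo.jpg", 1700000000000)); // 1700000000000-cat_photo.jpg
```

Inside multer.diskStorage it would replace the concatenation: filename: (req, file, cb) => cb(null, buildStoredFilename(file.originalname)).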

Let's go through the API endpoint that handles image and text input in detail, from start to finish.

The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res, next) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, is used to handle file uploads. It processes a single file upload where the form field name is image.

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.
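The extraction step can be written as a small function, shown here as a sketch with a hypothetical name, readUploadRequest. It relies on the fields Multer documents for a disk-stored file (path and mimetype on req.file) and on req.body holding the other form fields. Trimming the message is an extra assumption, and the request objects in the example are hand-made stand-ins, not real Express requests.

```javascript
/**
 * Collects what the /api/upload handler needs from a request
 * that has already passed through upload.single("image").
 */
function readUploadRequest(req) {
  const rawMessage = req.body && typeof req.body.message === "string" ? req.body.message : "";
  return {
    message: rawMessage.trim(),                    // assumption: surrounding whitespace is dropped
    imagePath: req.file ? req.file.path : null,    // set by Multer's disk storage
    mimeType: req.file ? req.file.mimetype : null, // e.g. "image/png"
  };
}

console.log(readUploadRequest({ body: { message: " hi " } }));
// { message: 'hi', imagePath: null, mimeType: null }
```

Returning mimeType alongside the path makes it easy to pass the real file type to uploadToGemini instead of a fixed "image/jpeg".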

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file's details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.
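The chat history entry that attaches the uploaded image has a fixed shape, so it can be built by a small function. The sketch below (buildImageHistory is a hypothetical name) returns the same structure the article passes to model.startChat(): one user turn whose only part is a fileData reference made from the mimeType and uri of the uploaded file. The URI in the example is a placeholder, not a real Gemini file.

```javascript
/**
 * Builds the `history` array that attaches an uploaded file to a chat session.
 * `file` is the object returned by uploadToGemini (it carries mimeType and uri).
 */
function buildImageHistory(file) {
  return [
    {
      role: "user",
      parts: [
        {
          fileData: {
            mimeType: file.mimeType,
            fileUri: file.uri,
          },
        },
      ],
    },
  ];
}

const history = buildImageHistory({ mimeType: "image/png", uri: "https://example.invalid/files/placeholder" });
console.log(JSON.stringify(history));
```

In the handler, model.startChat({ generationConfig, history: buildImageHistory(file) }) would then be followed by chatSession.sendMessage(...) with the user's question about the image.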

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.
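The flow described above (image, else text, else 400, with 500 on any failure) can be checked without calling Gemini by passing the model calls in as functions. In this sketch, handleUpload, askWithImage, and askWithText are hypothetical names; the two ask functions stand in for the article's startChat and sendMessage calls and are assumed to resolve to the reply text. The default prompt for image-only uploads is also an assumption. The point of the structure is that every path produces exactly one { status, body } pair, so the route answers each request once.

```javascript
/**
 * Decides the single response for an /api/upload request.
 * input:   { message, imagePath, mimeType }
 * helpers: { askWithImage(imagePath, mimeType, prompt), askWithText(prompt) }, both async
 */
async function handleUpload(input, helpers) {
  const { message, imagePath, mimeType } = input;
  if (!imagePath && !message) {
    return { status: 400, body: { error: "No image or text provided" } };
  }
  try {
    // Assumed default prompt for image-only uploads, since sendMessage needs some text
    const prompt = message || "Describe this image.";
    const reply = imagePath
      ? await helpers.askWithImage(imagePath, mimeType, prompt)
      : await helpers.askWithText(prompt);
    return { status: 200, body: { reply } };
  } catch (error) {
    console.error("Error processing image or text:", error);
    return { status: 500, body: { error: "Internal Server Error" } };
  }
}
```

Inside the Express route, the result would be sent once with res.status(status).json(body), which avoids the "headers already sent" errors that come from calling res.json more than once.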

```javascript
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const fs = require("fs");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads (multer does not create the folder itself)
fs.mkdirSync("uploads", { recursive: true });
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, "uploads/"); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + "-" + file.originalname); // Rename the file to avoid duplicates
  },
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res, next) => {
  try {
    const { message } = req.body; // This is the accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // This is the image (if any)

    // If an image is provided, upload it to Gemini and ask about it
    if (imagePath) {
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // sendMessage needs text; this fallback prompt for image-only uploads is an assumption
      const result = await chatSession.sendMessage(message || "Describe this image.");
      return res.status(200).json({ reply: result.response.text() });
    }

    // Otherwise, handle the text on its own
    if (message) {
      const chatSession = model.startChat({
        generationConfig,
        history: [],
      });
      const result = await chatSession.sendMessage(message);
      const aiResponse = result.response.text(); // Extract the AI's generated response
      return res.status(200).json({ reply: aiResponse });
    }

    // If neither image nor text is provided, return an error
    return res.status(400).json({ error: "No image or text provided" });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
```


5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application\\'s robustness and user satisfaction.
While building the chatbot, I ran into many errors that had to be debugged before it worked well.

5.1 Text queries


5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties Faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially debugging responses from the API.

Another issue was that it was my first time working with the Gemini API. The unfamiliarity with the API led to a learning curve, causing delays in progress.

Another challenge was electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

Creating an Information Bot: A Beginner's Guide (HTML/CSS, JavaScript, Gemini API)

Published on 2024-11-09

Table of Contents

  • Introduction
    • What is a Chatbot?
    • Why build a chatbot?
  • Understanding the Problem
    • What Problems Does the Chatbot Solve?
    • What Should the Chatbot Do?
  • Setting Up the Development Environment
    • Tools and Technologies
    • Prerequisites
    • Setting Up the Environment
  • Implementing Chatbot Functionalities
    • Frontend
    • Backend (server)
  • Testing and Debugging
    • Text queries
    • Image queries
  • Problems faced and conclusions
    • Some difficulties Faced
    • Conclusion

1. Introduction

1.1 What is a Chatbot?

A chatbot is a type of software that mimics conversations with people. Most chatbots communicate through text, but some can also use voice. They use artificial intelligence (AI) to understand what users are asking and provide answers quickly. This makes chatbots useful for handling routine tasks and giving information efficiently.

The main job of a chatbot is to talk with users through a messaging interface. These conversations can range from answering straightforward questions to managing more complex exchanges. By using natural language processing (NLP), chatbots can understand user questions and provide relevant responses, making interactions smoother and more effective.

1.2 Why build a chatbot?
Building an information chatbot helps people quickly find answers and details they need without waiting or searching for them manually. For example, if you’re looking for information on scholarships, a chatbot can instantly provide the details you need, saving you time and effort. It can handle many questions at once, is available 24/7, and can make finding the right information much easier for everyone.

2. Understanding the Problem

2.1 What Problems Does the Chatbot Solve?
A chatbot helps solve the problem of finding information by making it easier to get answers quickly. Instead of spending a lot of time searching online or waiting for help, users can ask the chatbot their questions and get instant responses. This means users don’t have to search through multiple websites or wait for office hours; the information is available anytime, making it more accessible and convenient for everyone.

2.2 What Should the Chatbot Do?

  • Display text queries on the UI
  • Display image queries on the UI
  • Provide text-based responses from text queries
  • Handle image-based queries
  • Preview image before and after sending to the UI

3. Setting Up the Development Environment

3.1 Tools and Technologies

  • HTML & CSS: Basic web design
  • JavaScript: Adding interactivity
  • Gemini API: Generating responses to text and image queries
  • Node.js & Express: Server-side handling

3.2 Prerequisites

  • Basic understanding of HTML, CSS, and JavaScript
  • A code editor (e.g., Visual Studio Code)
  • Web browser (for testing)

3.3 Setting up the environment

  1. Install Node.js and npm: Make sure you have Node.js installed on your system. If not, download and install it from the official Node.js site; npm is installed along with it. Verify the installation:
node -v
npm -v
  2. Create a folder and open it in your code editor.


  3. Initialize a Node.js project: This creates a package.json file for managing dependencies.
npm init -y


  4. Install Required Dependencies: You will need express, axios, dotenv, cors, @google/generative-ai, multer, and body-parser for this setup. (@google/generative-ai/server, used later for file uploads, is part of the @google/generative-ai package rather than a separate install.)
npm install express axios dotenv cors @google/generative-ai multer body-parser

  • express: This tool helps build a web server that listens for and responds to requests. For example, it manages everything from showing web pages to accepting images or text from users.

  • axios: This tool is used to make requests to other servers or APIs (like calling another website to get data). It sends and receives data over the internet, making it easy to connect your app to external services.

  • dotenv: This tool is used to store important secrets (like API keys or passwords) in a hidden file called .env. It helps keep sensitive information safe, so you don't accidentally share it with others.

  • @google/generative-ai: Google's SDK for the Gemini API. It sends user inputs (like text) to Google's AI and gets back AI-generated responses for your app to use.

  • @google/generative-ai/server: A server-side entry point of the same @google/generative-ai package. Its GoogleAIFileManager class uploads files such as images to Google's AI for processing and analysis, so the model can return useful insights or responses about them.

  • multer: This tool is used to handle file uploads, like when users send images or other files to your server. It saves these files in a specific folder so your server can use them.

  • body-parser: This tool allows the server to easily understand data (like text or form data) sent from the user’s browser. It helps grab that data and make it usable in the code.

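  • cors: This tool lets your server accept requests from web pages served from a different origin (another domain or port). By default, browsers stop a page from reading responses from another origin for safety; the cors middleware adds the response headers that tell the browser which cross-origin requests are allowed.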

  5. Creating the API Key
  • What is an API key? An API key is a unique code, similar to a password, that lets one program identify itself when it talks to another program or service.
  • Why use an API key?
    It keeps things secure by making sure only authorized callers can access certain features or data, which helps prevent misuse. It also lets the service provider see how much the service is being used, so they can manage it better.

  • Gemini API Key
    The Gemini API key is crucial for my chatbot project as it allows the bot to access advanced AI features. This key enables the chatbot to understand and generate responses based on user inputs and uploaded images. By using this API, I can enhance the chatbot's intelligence and provide a better experience for users seeking assistance.

  • How to create a Gemini API Key
    Go to Google AI Studio. If you don’t have an account, sign up for one.
    Scroll down until you see the section shown in the screenshot below:

[Screenshot]
Click Get your API key. This takes you to the following page:

[Screenshot]
Once on this page, click the blue Create API key button. This takes you to another page, where you can create an API key in either a new project or an existing one.
I chose a new project, since I am working on a new project.

[Screenshot]

Once your API key is created, copy it and use it in your project.

[Screenshot]

Note the tip shown here about using your API key securely.
You should keep your API key secret because it acts like a password for your application. If someone finds it, they could misuse it to access your data or services, leading to security issues or extra costs. Keeping it private helps protect your project and ensures it runs smoothly.

  6. Creating the Server and Hiding the API Key: Create a server.js file. It will contain your backend code to handle incoming requests from the chatbot, make calls to the Gemini API, and respond with the generated messages. Then create a .env file in the root of your project to store the Gemini API key (the server reads it as GEMINI_API_KEY). This will be the final project structure:

[Screenshot of the final project structure]

4. Building the Chatbot

4.1 Designing the Chatbot Interface

  • Creating the HTML structure

[HTML listing: page title "ImageBot", header "ChatBot"]

This HTML page is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a container div, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, there is a chatHistory section where all previous conversations (messages sent and received) are displayed.

For interacting with the bot, there's an inputSession section. It contains a field where users can type their message and an option to preview any image they select before sending it. The image upload button is represented by an icon, and users can select an image from their device. After the message or image is ready, they can send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.

  • Styling with CSS
body {
    background-color: #f5f5f5;
    font-family: 'Arial', sans-serif;
    margin: 0;
    padding: 0;
    display: flex;
    justify-content: center;
    align-items: center;
    height: 100vh;
}

/* ChatBot container */
.container.chatBot {
    background-color: #ffffff;
    width: 50%;
    max-width: 600px;
    border-radius: 8px;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    padding: 20px;
    position: relative;
}

/* Header styling */
.header {
    font-size: 24px;
    color: #333;
    text-align: center;
    margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
    height: 300px;
    overflow-y: auto;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
    width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
    background-color: #ccc;
    border-radius: 4px;
}

/* Input session styling */
.inputSession {
    display: flex;
    align-items: center;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    justify-content: space-between;
}

/* Input field styling */
#textInput {
    flex-grow: 1;
    width: 100%;
    padding: 8px;
    border: 1px solid #ddd;
    border-radius: 4px;
    background-color: #fff;
    box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
    margin-right: 10px;
    font-size: 16px;
}

/* Button for sending messages */
#btnSend {
    color: #fff;
    border: none;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    font-size: 20px;
    transition: background-color 0.3s;
}

#btnSend:hover {
    background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
    display: flex;
    align-items: center;
    flex-grow: 1;
    margin-bottom: 10px;
}

#previewImage {
    max-width: 80px;
    max-height: 80px;
    border-radius: 5px;
    margin-right: 10px;
    object-fit: cover;
}

/* Label for file input */
label[for="imageInput"] {
    color: #fff;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    font-size: 20px;
    cursor: pointer;
    margin-right: 10px;
}

label[for="imageInput"]:hover {
    background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
    display: flex;
    align-items: flex-start;
    margin-bottom: 10px;
    padding: 10px;
    background-color: #e9ecef;
    border-radius: 8px;
    border: 1px solid #ddd;
    max-width: 100%;
}

/* Container for image and text */
.messageContent {
    display: flex;
    flex-direction: column;
    align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
    max-width: 100px; 
    max-height: 100px; 
    border-radius: 5px;
    margin-bottom: 5px; 
    object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
    text-align: left;
}

/* Modal styling */
.modal {
    display: none; 
    position: fixed; 
    z-index: 1000; 
    left: 0;
    top: 0;
    width: 100%;
    height: 100%;
    overflow: auto; 
    background-color: rgb(0,0,0); 
    background-color: rgba(0,0,0,0.8); 
}

.modal-content {
    margin: auto;
    display: block;
    width: 80%;
    max-width: 700px;
}

.close {
    position: absolute;
    top: 15px;
    right: 35px;
    color: #f1f1f1;
    font-size: 40px;
    font-weight: bold;
}

.close:hover,
.close:focus {
    color: #bbb;
    text-decoration: none;
    cursor: pointer;
}

This CSS provides a clean, responsive design for a chatbot interface. Flexbox centers the chatbot on the page and lays out the container and the input session, while the fixed-height chat history scrolls independently. Rounded corners, subtle shadows, and a hover transition on the send button give the interface a modern look and make it easier to use.


4.2 Implementing Chatbot Functionalities

4.2.1 Frontend

[Screenshot of the HTML markup]

Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Function to preview the image when selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }


});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

  • Element Selection
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.
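If any of these ids is missing from the HTML, getElementById returns null, and the first addEventListener call on that variable throws a TypeError. A minimal sketch of a startup check, using a hypothetical helper named findMissingIds; the lookup function is passed in so the logic can also run outside the browser:

```javascript
// Hypothetical helper: report which required ids are absent from the page.
// `lookup` is any function that behaves like document.getElementById.
function findMissingIds(lookup, ids) {
  return ids.filter((id) => lookup(id) === null);
}

// Ids the chatbot script relies on (taken from the element selection above).
const REQUIRED_IDS = [
  "btnSend",
  "imageInput",
  "textInput",
  "previewImage",
  "chatHistory",
  "imageModal",
  "modalImage",
];

// In the browser you would call it with the real DOM lookup:
// const missing = findMissingIds((id) => document.getElementById(id), REQUIRED_IDS);
// if (missing.length) console.error("Missing elements:", missing.join(", "));
```

Running this once at startup turns a confusing TypeError into a message that names the missing element.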

  • Modal Elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

  • Image Preview
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.
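The listener above previews whatever file the user picks. One possible refinement is to check the file before reading it: a browser File exposes type (a MIME string such as image/png) and size (in bytes). The sketch below assumes a 5 MB limit chosen only for illustration, and validateImageFile is a hypothetical name:

```javascript
// Illustrative size limit (5 MB); adjust to your own needs.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Hypothetical helper: returns an error message, or null if the file is acceptable.
// Works with any object that has `type` and `size`, such as a browser File.
function validateImageFile(file, maxBytes = MAX_IMAGE_BYTES) {
  if (!file) return "No file selected.";
  if (!file.type || !file.type.startsWith("image/")) {
    return "Please choose an image file.";
  }
  if (file.size > maxBytes) {
    return `Image is too large (max ${Math.round(maxBytes / (1024 * 1024))} MB).`;
  }
  return null;
}

// Possible use inside the "change" listener, before creating the FileReader:
// const problem = validateImageFile(imageInput.files[0]);
// if (problem) { alert(problem); imageInput.value = ""; return; }
```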

  • Send Image and Text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
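When the server answers with a 400 or 500, it includes a JSON body with an error field, but the handler above replaces that with a generic message. A minimal sketch that surfaces the server's own message instead; postToChatbot is a hypothetical name, and fetchImpl is injected (in the browser you would pass fetch) so the logic can be tried without a server:

```javascript
// Hypothetical helper: POST the form data and return the bot's reply text.
// `fetchImpl` is the fetch function (window.fetch in the browser).
async function postToChatbot(fetchImpl, url, formData) {
  const response = await fetchImpl(url, { method: "POST", body: formData });

  // The server replies with JSON: { reply } on success, { error } on failure.
  // Fall back to an empty object if the body is not valid JSON.
  const data = await response.json().catch(() => ({}));

  if (!response.ok) {
    throw new Error(data.error || `Request failed with status ${response.status}`);
  }
  return data.reply;
}

// Usage inside the click handler:
// const reply = await postToChatbot(fetch, "http://localhost:3001/api/upload", formData);
// addMessageToChatHistory(null, reply, "botMessage");
```

The existing catch block can then show error.message in the chat, so a "No image or text provided" reply reaches the user instead of a generic failure.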

  • Add Message to Chat History
function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.

  • Modal Handling
closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

These event listeners manage the modal for viewing larger images. Clicking the close button or outside the modal hides it by setting modal.style.display to "none".
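The listeners above only close the modal; the code shown never opens it, and modalImage is selected but not used. One way to wire that up, assuming the modal is meant to open when an image in the chat is clicked, is a small hypothetical helper that addMessageToChatHistory could call right after creating imageElement:

```javascript
// Hypothetical helper: clicking a chat image shows it enlarged in the modal.
function attachModalOpener(imageElement, modal, modalImage) {
  imageElement.addEventListener("click", () => {
    modalImage.src = imageElement.src; // Show the same image in the modal
    modal.style.display = "block"; // The CSS hides .modal with display: none
  });
}

// Possible call site inside addMessageToChatHistory:
// attachModalOpener(imageElement, modal, modalImage);
```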

  • Send Message on Enter Key Press
textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.
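Two cases this listener does not distinguish: Enter pressed while an input method editor (IME) is composing text, for example when typing Chinese or Japanese, where the key confirms the composition rather than the message; and Shift+Enter, which users often expect to mean "new line" if the field later becomes a textarea. KeyboardEvent exposes both as isComposing and shiftKey. A sketch with a hypothetical predicate:

```javascript
// Hypothetical predicate: send only on a plain Enter that is not confirming
// an IME composition (e.g. while typing Chinese or Japanese).
function shouldSendOnKey(event) {
  return event.key === "Enter" && !event.shiftKey && !event.isComposing;
}

// Drop-in replacement for the listener above:
// textInput.addEventListener("keydown", (e) => {
//   if (shouldSendOnKey(e)) {
//     e.preventDefault();
//     btnSend.click();
//   }
// });
```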


4.2.2 Backend (server)

  • Importing dependencies
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

These dependencies were explained in the environment setup section above; refer back to it for what each one does and why we use it.

  • Configuring Environment Variables
dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

  • dotenv.config(): Loads environment variables from the .env file.
  • apiKey: Retrieves the API key from environment variables to authenticate requests to the Google Generative AI API.
  • genAI: Initializes the GoogleGenerativeAI instance with the API key.
  • fileManager: Initializes the GoogleAIFileManager instance with the same API key for handling file uploads.
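If the .env file is missing or the variable name is misspelled, process.env.GEMINI_API_KEY is undefined and the problem only shows up later, when the first request to Gemini fails. A minimal sketch of failing fast at startup instead; requireEnv is a hypothetical helper, and it deliberately never prints the key itself:

```javascript
// Hypothetical helper: read a required environment variable or stop with a clear message.
function requireEnv(env, name) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`${name} is not set. Add it to your .env file, e.g. ${name}=your-key-here`);
  }
  return value.trim();
}

// In server.js, after dotenv.config():
// const apiKey = requireEnv(process.env, "GEMINI_API_KEY");
```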

  • Setting Up AI Model and Configuration

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

  • model: Configures and initializes the generative model (Gemini 1.5 Pro) from Google AI, specifying which model to use for generating responses.
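  • generationConfig: Defines parameters for generating responses: temperature (controls how random, or creative, the output is), topP and topK (limit which candidate tokens the model samples from, which controls the diversity of responses), maxOutputTokens (maximum length of the response), and responseMimeType (plain text here).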
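To make topK and topP more concrete, here is a toy illustration of the general idea, not Gemini's internal implementation: candidate next tokens are ranked by probability, only the topK most likely are kept, and of those only the smallest group whose probabilities add up to at least topP survives. Temperature then shapes how the final token is sampled from the survivors (not shown). The probabilities are made up:

```javascript
// Toy illustration of top-k followed by top-p (nucleus) filtering.
// `probs` maps candidate tokens to (made-up) probabilities.
function filterCandidates(probs, topK, topP) {
  const ranked = Object.entries(probs).sort((a, b) => b[1] - a[1]); // most likely first
  const kept = [];
  let cumulative = 0;
  for (const [token, p] of ranked.slice(0, topK)) {
    kept.push(token);
    cumulative += p;
    if (cumulative >= topP) break; // smallest set whose mass reaches topP
  }
  return kept;
}

const probs = { the: 0.5, a: 0.3, an: 0.15, this: 0.05 };
console.log(filterCandidates(probs, 64, 0.75)); // [ 'the', 'a' ]
console.log(filterCandidates(probs, 1, 0.95)); // [ 'the' ]
```

Lower values of either setting shrink the pool of candidates, which tends to make answers more predictable; higher values leave more room for variety.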

  • Configuring Multer for File Uploads

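const fs = require("fs");

// multer does not create the destination folder when it is given as a function,
// so create uploads/ up front (no error if it already exists)
fs.mkdirSync("uploads", { recursive: true });

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Prefix a timestamp to avoid duplicate names
  }
});
const upload = multer({ storage: storage });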

  • multer.diskStorage(): Configures how files are stored.
  • destination: Specifies the directory (uploads/) where files should be saved.
  • filename: Renames the file by prefixing the current timestamp to the original name to ensure uniqueness.
  • upload: Creates a Multer instance with the defined storage configuration.
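As configured, multer accepts any file of any size. Its options also take a fileFilter(req, file, cb) function and a limits object (for example limits.fileSize, in bytes). A sketch that restricts uploads to images, with a 5 MB cap chosen only for illustration; imageOnlyFilter is a hypothetical name. Note that an error passed to cb goes to Express's error handling, not to the route's try/catch:

```javascript
// Hypothetical filter: accept only files whose MIME type starts with "image/".
// multer calls it as fileFilter(req, file, cb); cb(null, true) accepts the file,
// cb(null, false) silently skips it, and cb(err) aborts the request with an error.
function imageOnlyFilter(req, file, cb) {
  if (file.mimetype && file.mimetype.startsWith("image/")) {
    cb(null, true);
  } else {
    cb(new Error("Only image uploads are allowed"));
  }
}

// Possible multer setup (storage as defined above):
// const upload = multer({
//   storage: storage,
//   fileFilter: imageOnlyFilter,
//   limits: { fileSize: 5 * 1024 * 1024 }, // illustrative 5 MB cap
// });
```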

  • Uploading Files to Gemini

async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

  • uploadToGemini(): A function to upload a file to the Google Gemini API.
  • fileManager.uploadFile(): Uploads the file to the API; uploadToGemini() then logs the result.
  • file: Contains the details of the uploaded file returned from the API.
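uploadToGemini needs a MIME type that matches the uploaded file, and not every upload is a JPEG. Multer records the type the browser reported in req.file.mimetype (the generic application/octet-stream when the browser does not know). A hypothetical helper that prefers that value and otherwise guesses from the file extension; the extension table is illustrative, not exhaustive:

```javascript
const path = require("path");

// Fallback table for when no specific MIME type was reported (illustrative).
const MIME_BY_EXTENSION = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

// Hypothetical helper: prefer multer's reported type, else guess from the extension.
function mimeTypeFor(file) {
  if (file.mimetype && file.mimetype !== "application/octet-stream") {
    return file.mimetype;
  }
  const ext = path.extname(file.originalname || "").toLowerCase();
  return MIME_BY_EXTENSION[ext] || "application/octet-stream";
}

// In the endpoint:
// await uploadToGemini(req.file.path, mimeTypeFor(req.file));
```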

  • Configuring Express and Middleware

const app = express();
const port = 3001;

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

  • app: Initializes an Express application.
  • port: Sets the port on which the server will listen (3001).
  • app.use(cors()): Enables CORS for the server.
  • app.use(bodyParser.json()): Parses JSON bodies.
  • app.use(bodyParser.urlencoded({ extended: true })): Parses URL-encoded bodies.
  • app.use(express.static("public")): Serves static files like HTML, CSS, and JS from the public directory.
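app.use(cors()) with no options lets a page on any origin call this API. Because express.static serves the frontend from the same server, same-origin requests do not need CORS at all; if you do need it (for example when opening the HTML through a separate dev server), the cors package accepts an origin option that can be a function(origin, callback). The origins listed below are placeholders:

```javascript
// Placeholder allowlist: replace with the origins that actually serve your page.
const ALLOWED_ORIGINS = ["http://localhost:3001", "http://127.0.0.1:5500"];

// Origin check in the shape the cors package expects: callback(error, allow).
// Requests without an Origin header (e.g. curl or server-to-server calls) are allowed.
function checkOrigin(origin, callback) {
  if (!origin || ALLOWED_ORIGINS.includes(origin)) {
    callback(null, true);
  } else {
    callback(new Error(`Origin ${origin} is not allowed by CORS`));
  }
}

// Usage in server.js instead of app.use(cors()):
// app.use(cors({ origin: checkOrigin }));
```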

  • API Endpoint for Handling Image and Text

app.post("/api/upload", upload.single("image"), async (req, res, next) => {
  try {
    const { message } = req.body;
    const imagePath = req.file ? req.file.path : null;

    // Reject requests that carry neither an image nor text
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    if (imagePath) {
      const files = [
        await uploadToGemini(imagePath, req.file.mimetype), // type recorded by multer
      ];

      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: files[0].mimeType,
                  fileUri: files[0].uri,
                },
              },
            ],
          },
        ],
      });
      // Image-only uploads get a default prompt (assumed wording) so sendMessage has text
      const result = await chatSession.sendMessage(message || "Describe this image.");
      // Return so the text-only branch below cannot send a second response
      return res.status(200).json({ reply: result.response.text() });
    }

    // Text-only message: start a fresh chat with no history
    const chatSession = model.startChat({
      generationConfig,
      history: [],
    });
    const result = await chatSession.sendMessage(message);
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • app.post("/api/upload"): Defines a POST endpoint for handling file and text uploads.
  • upload.single("image"): Middleware to handle single file upload (named image).
  • req.body: Contains the text message.
  • req.file: Contains the uploaded image file.
  • uploadToGemini(): Uploads the image to the Gemini API.
  • model.startChat(): Starts a chat session with the model.
  • chatSession.sendMessage(message): Sends the message (and image if provided) to the model.
  • res.status(200).json({ reply: result.response.text() }): Sends the generated response back to the client.
  • res.status(400): Handles cases where neither image nor text is provided.
  • res.status(500): Handles server errors.
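The endpoint places the image in the chat history and then sends the text as a separate message. An alternative, sketched here on the assumption that a single turn is enough, is to send both together: chatSession.sendMessage accepts an array mixing plain strings and Part objects such as { fileData: { mimeType, fileUri } } or { text }. buildParts and its default prompt are hypothetical:

```javascript
// Hypothetical helper: build the request parts for one user turn.
// `uploadedFile` is the object returned by uploadToGemini (it has mimeType and uri).
function buildParts(uploadedFile, message) {
  const parts = [];
  if (uploadedFile) {
    parts.push({
      fileData: { mimeType: uploadedFile.mimeType, fileUri: uploadedFile.uri },
    });
  }
  const text = typeof message === "string" ? message.trim() : "";
  if (text) {
    parts.push({ text });
  } else if (uploadedFile) {
    // Assumed default prompt for image-only uploads
    parts.push({ text: "Describe this image." });
  }
  return parts;
}

// Possible use in the endpoint:
// const chatSession = model.startChat({ generationConfig, history: [] });
// const result = await chatSession.sendMessage(buildParts(uploaded, message));
// res.status(200).json({ reply: result.response.text() });
```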

  • Starting the Server

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});

  • app.listen(port): Starts the server and listens on the specified port (3001).
  • console.log: Confirms that the server is running and accessible at http://localhost:3001.

Let’s walk through the API endpoint for handling image and text in detail, from start to finish.

The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res, next) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, is used to handle file uploads. It processes a single file upload where the form field name is image.

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.
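The code shown here never removes the copy that multer saves in uploads/, so the folder keeps growing with every image. A sketch of one way to clean up, assuming the local copy is no longer needed once the image has been uploaded to Gemini; withTempFile is a hypothetical helper built on Node's fs/promises:

```javascript
const fs = require("fs/promises");

// Hypothetical helper: run `work`, then delete the file whatever the outcome.
async function withTempFile(filePath, work) {
  try {
    return await work(filePath);
  } finally {
    if (filePath) {
      // Ignore "file already gone" errors; report anything else.
      await fs.unlink(filePath).catch((err) => {
        if (err.code !== "ENOENT") console.error("Could not delete upload:", err);
      });
    }
  }
}

// Possible use in the endpoint:
// const uploaded = await withTempFile(req.file.path, (p) => uploadToGemini(p, req.file.mimetype));
```

Because the deletion sits in finally, the file is removed even when the upload to Gemini throws, and the original error still reaches the route's catch block.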

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Prefix a timestamp to avoid duplicate names
  }
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // The accompanying text (if any)
    const file = req.file; // The uploaded image (if any), saved to disk by multer

    // If neither image nor text is provided, return an error before calling the API
    if (!file && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    let result;
    if (file) {
      // Upload the image using the MIME type multer reported instead of assuming JPEG
      const uploaded = await uploadToGemini(file.path, file.mimetype);

      // Put the image in the chat history, then send the user's text as the next turn
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: uploaded.mimeType,
                  fileUri: uploaded.uri,
                },
              },
            ],
          },
        ],
      });
      // Assumption: fall back to a generic prompt when an image arrives without text
      result = await chatSession.sendMessage(message || "Describe this image.");
    } else {
      // Text only: start a fresh chat session with an empty history
      const chatSession = model.startChat({ generationConfig, history: [] });
      result = await chatSession.sendMessage(message);
    }

    // Send exactly one response per request
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    return res.status(500).json({ error: "Internal Server Error" });
  }
});



// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
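Two details are left implicit in the server code. First, multer does not create the `uploads/` directory when `destination` is given as a function. Second, nothing stops a non-image file from reaching `uploadToGemini()`. The sketch below shows a filter with multer's `fileFilter(req, file, cb)` signature that accepts only `image/*` MIME types, plus a filename helper that replaces path-unsafe characters. Both function names are hypothetical. The check relies on the client-reported `mimetype`, so it is a convenience rather than a security guarantee.

```javascript
// Hypothetical multer fileFilter: accept only files whose reported MIME type is an image.
// multer calls it as fileFilter(req, file, cb); cb(null, true) keeps the file, cb(null, false) skips it.
function imageOnlyFilter(req, file, cb) {
  const isImage = typeof file.mimetype === "string" && file.mimetype.startsWith("image/");
  cb(null, isImage);
}

// Hypothetical filename helper: timestamp prefix (as in the article) plus a sanitized original name.
function storedFileName(originalname, now = Date.now()) {
  const safe = originalname.replace(/[^a-zA-Z0-9._-]/g, "_");
  return `${now}-${safe}`;
}

// Wiring sketch (requires multer and fs in the real server):
//   fs.mkdirSync("uploads", { recursive: true });
//   const upload = multer({ storage, fileFilter: imageOnlyFilter });

imageOnlyFilter({}, { mimetype: "image/png" }, (err, accept) => console.log("png accepted:", accept));
imageOnlyFilter({}, { mimetype: "application/pdf" }, (err, accept) => console.log("pdf accepted:", accept));
console.log(storedFileName("my cat/photo.jpg", 1700000000000)); // 1700000000000-my_cat_photo.jpg
```

When the filter skips a file, `req.file` is undefined, so the route falls through to the text-only branch or the 400 response.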

  • Removing markdown
    In the chat application, I used the Marked library, included in the page with a script tag, to convert the Markdown in bot messages into HTML. When a message is added, the code checks its class name. For bot messages it applies textContainer.innerHTML = marked.parse(text); so the Markdown is rendered as HTML. For user messages it uses textContainer.textContent = text; so the text is displayed exactly as typed.
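The rendering branch described above can be sketched as a single function. The class name `"bot"` is an assumption, because the article does not say which class it checks. The Markdown parser is passed in as `parse` so the sketch runs without a browser; in the page it would be `(md) => marked.parse(md)`. Marked does not sanitize its output, so if bot replies could contain untrusted HTML, it is worth passing the result through a sanitizer such as DOMPurify before assigning `innerHTML`.

```javascript
// Hypothetical helper: renders a chat message into its container element.
// Bot messages are treated as Markdown; user messages are shown as plain text.
// `parse` stands in for marked.parse so the function can be exercised without a browser.
function renderMessage(textContainer, text, className, parse) {
  if (className === "bot") {
    // Assumed class name for bot messages; converts Markdown to HTML
    textContainer.innerHTML = parse(text);
  } else {
    // textContent never interprets HTML, so user input is displayed exactly as typed
    textContainer.textContent = text;
  }
  return textContainer;
}

// Demo with plain objects in place of DOM elements and a toy parser that only handles **bold**
const toyParse = (md) => md.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
console.log(renderMessage({}, "**Hello** there", "bot", toyParse).innerHTML); // <strong>Hello</strong> there
console.log(renderMessage({}, "**Hello** there", "user", toyParse).textContent); // **Hello** there
```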

  • Screenshot: bot reply showing raw Markdown text

  • Screenshot: the same reply after the Markdown has been removed (rendered as HTML)

5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application's robustness and user satisfaction.
During development of the chatbot, I faced many errors that had to be debugged before it functioned well.
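One edge case worth covering in tests is a server failure, such as the 500 and 400 replies the route returns. The sketch below is a hypothetical client helper, `askBot`, that posts to `/api/upload` with the same `message` and `image` field names the server reads. It turns a non-OK response into an error value instead of failing silently. `fetchImpl` is injectable so the failure path can be tested without a running server. The sketch assumes a browser or Node 18+, where `fetch` and `FormData` are globals.

```javascript
// Hypothetical client helper for the /api/upload route.
// Returns { ok: true, reply } on success or { ok: false, error } on any failure.
async function askBot(text, imageFile, fetchImpl = fetch) {
  const form = new FormData();
  if (text) form.append("message", text); // read by the server as req.body.message
  if (imageFile) form.append("image", imageFile); // read by multer as req.file
  try {
    const response = await fetchImpl("/api/upload", { method: "POST", body: form });
    // The body may not be JSON (e.g. a proxy error page), so fall back to an empty object
    const data = await response.json().catch(() => ({}));
    if (!response.ok) {
      return { ok: false, error: data.error || `Server responded with ${response.status}` };
    }
    return { ok: true, reply: data.reply };
  } catch (networkError) {
    // fetch rejects on network failures, not on HTTP error statuses
    return { ok: false, error: `Network error: ${networkError.message}` };
  }
}

// Simulated server failure for testing
const failingFetch = async () => ({
  ok: false,
  status: 500,
  json: async () => ({ error: "Internal Server Error" }),
});
askBot("Hello", null, failingFetch).then((result) => console.log(result));
```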

5.1 Text queries


5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties Faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially those involving the responses returned by the API.

Another issue was that this was my first time working with the Gemini API, so there was a learning curve that slowed my progress.

I also faced electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.
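The article does not say what the fix for these intermittent failures was. One common mitigation, shown here only as an illustration, is to retry the call a few times with exponential backoff. `withRetry` is a hypothetical helper. In the server it could wrap a call such as `() => chatSession.sendMessage(message)`.

```javascript
// Hypothetical retry helper: calls `fn` up to `attempts` times, doubling the wait after each failure.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i; // 500 ms, 1000 ms, 2000 ms, ... with the defaults
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // every attempt failed
}

// Demo: a fake API call that fails twice, then succeeds
let calls = 0;
const flakyCall = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`temporary failure ${calls}`);
  return "response text";
};
withRetry(flakyCall, 3, 10).then((text) => console.log(text, "after", calls, "calls"));
```

Retrying suits transient errors such as timeouts or rate limits; an invalid API key or malformed request will fail on every attempt.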

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

  • Gemini API Integration: I developed skills in API integration, particularly using the Gemini API for generating responses based on inputs.

  • Problem-solving: I learned how to systematically debug and troubleshoot issues, improving my resilience in overcoming project obstacles.

  • Time Management: The delays caused by power outages and bugs helped me practice time management and adaptability under pressure.

  • Collaborating for Solutions: Reaching out for help when needed and learning from others was an important takeaway in this project.

  • Practical Experience: The hands-on experience with API and front-end integration improved my proficiency in JavaScript.

Reprint notice: this article was originally published at https://dev.to/kedjuprecious/creating-an-information-bot-a-beginners-guide-htmlcss-javascript-gemini-api-260f?1