This HTML page is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a container, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, a chatHistory section displays all previous messages, both sent and received.

For interacting with the bot, there's an inputSession section. It contains a field where users type their message and an area that previews any image they select before sending it. The image upload button is shown as an icon and lets users select an image from their device. Once the message or image is ready, they send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.

```css
body {
  background-color: #f5f5f5;
  font-family: 'Arial', sans-serif;
  margin: 0;
  padding: 0;
  display: flex;
  justify-content: center;
  align-items: center;
  height: 100vh;
}

/* ChatBot container */
.container.chatBot {
  background-color: #ffffff;
  width: 50%;
  max-width: 600px;
  border-radius: 8px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
  display: flex;
  flex-direction: column;
  justify-content: space-between;
  padding: 20px;
  position: relative;
}

/* Header styling */
.header {
  font-size: 24px;
  color: #333;
  text-align: center;
  margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
  height: 300px;
  overflow-y: auto;
  padding: 10px;
  background-color: #f1f1f1;
  border-radius: 8px;
  border: 1px solid #ddd;
  margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
  width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
  background-color: #ccc;
  border-radius: 4px;
}

/* Input session styling */
.inputSession {
  display: flex;
  align-items: center;
  padding: 10px;
  background-color: #f1f1f1;
  border-radius: 8px;
  border: 1px solid #ddd;
  justify-content: space-between;
}

/* Input field styling */
#textInput {
  flex-grow: 1;
  width: 100%;
  padding: 8px;
  border: 1px solid #ddd;
  border-radius: 4px;
  background-color: #fff;
  box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
  margin-right: 10px;
  font-size: 16px;
}

/* Button for sending messages */
#btnSend {
  color: #fff;
  border: none;
  border-radius: 50%;
  width: 50px;
  height: 50px;
  display: flex;
  align-items: center;
  justify-content: center;
  cursor: pointer;
  font-size: 20px;
  transition: background-color 0.3s;
}

#btnSend:hover {
  background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
  display: flex;
  align-items: center;
  flex-grow: 1;
  margin-bottom: 10px;
}

#previewImage {
  max-width: 80px;
  max-height: 80px;
  border-radius: 5px;
  margin-right: 10px;
  object-fit: cover;
}

/* Label for file input */
label[for="imageInput"] {
  color: #fff;
  border-radius: 50%;
  width: 50px;
  height: 50px;
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 20px;
  cursor: pointer;
  margin-right: 10px;
}

label[for="imageInput"]:hover {
  background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
  display: flex;
  align-items: flex-start;
  margin-bottom: 10px;
  padding: 10px;
  background-color: #e9ecef;
  border-radius: 8px;
  border: 1px solid #ddd;
  max-width: 100%;
}

/* Container for image and text */
.messageContent {
  display: flex;
  flex-direction: column;
  align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
  max-width: 100px;
  max-height: 100px;
  border-radius: 5px;
  margin-bottom: 5px;
  object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
  text-align: left;
}

/* Modal styling */
.modal {
  display: none;
  position: fixed;
  z-index: 1000;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  overflow: auto;
  background-color: rgb(0, 0, 0); /* Fallback for browsers without rgba() */
  background-color: rgba(0, 0, 0, 0.8);
}

.modal-content {
  margin: auto;
  display: block;
  width: 80%;
  max-width: 700px;
}

.close {
  position: absolute;
  top: 15px;
  right: 35px;
  color: #f1f1f1;
  font-size: 40px;
  font-weight: bold;
}

.close:hover,
.close:focus {
  color: #bbb;
  text-decoration: none;
  cursor: pointer;
}
```

This CSS provides a clean design for the chatbot interface. It uses flexbox to center the chatbot container on the page and to lay out the container, the input session, and the messages. Rounded corners, subtle shadows, a scrollable chat history, and a hover transition on the send button give the page a modern look and keep it easy to use.

4. Implementing Chatbot Functionalities

4.1 Frontend

Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

```javascript
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Preview the image when one is selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = "";

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch("http://localhost:3001/api/upload", {
        method: "POST",
        body: formData,
      });
      if (!response.ok) {
        throw new Error(`Server responded with status ${response.status}`);
      }
      const data = await response.json();
      // Display the bot's reply
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error("Error sending image or text:", error);
      addMessageToChatHistory(null, "Error sending data. Please try again.", "errorMessage");
    }
  }
});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});
```
```javascript
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");
```

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.
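If one of these IDs is missing or misspelled in the HTML, `getElementById` returns `null` and the script only fails later, when a listener is attached. A minimal fail-fast sketch follows; the helper name `getRequiredElements` is hypothetical, and the document object is passed in as `doc` (normally `document`) so the function does not depend on a browser.

```javascript
// Hypothetical helper: look up several elements by ID and fail fast if any is missing.
function getRequiredElements(doc, ids) {
  const elements = {};
  const missing = [];
  for (const id of ids) {
    const el = doc.getElementById(id);
    if (el === null) {
      missing.push(id);
    } else {
      elements[id] = el;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing element(s) with id: ${missing.join(", ")}`);
  }
  return elements;
}

// In the browser it would be called with the real document, e.g.:
// const { btnSend, imageInput, textInput, previewImage, chatHistory } =
//   getRequiredElements(document, ["btnSend", "imageInput", "textInput", "previewImage", "chatHistory"]);
```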

```javascript
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");
```

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

```javascript
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});
```

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.
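Nothing in this listener checks what kind of file was selected, so a user could pick a PDF and send it to a backend that treats every upload as an image. A small check before previewing can catch that. This is a sketch under assumptions: the name `validateImageFile` and the 5 MB limit are hypothetical, and `file` is expected to be a browser `File`, which exposes `type` (the MIME type) and `size` (in bytes).

```javascript
// Hypothetical limit: accept only image/* files up to 5 MB.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Returns null if the file is acceptable, otherwise a human-readable reason.
function validateImageFile(file, maxBytes = MAX_IMAGE_BYTES) {
  if (!file) return "No file selected.";
  if (typeof file.type !== "string" || !file.type.startsWith("image/")) {
    return "Please choose an image file.";
  }
  if (file.size > maxBytes) {
    return `Image is larger than ${Math.round(maxBytes / (1024 * 1024))} MB.`;
  }
  return null;
}
```

In the `change` listener, one could call `const problem = validateImageFile(file);` and, if it returns a message, show it and reset `imageInput.value = ""` instead of reading the file.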

```javascript
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = "";

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch("http://localhost:3001/api/upload", {
        method: "POST",
        body: formData,
      });
      if (!response.ok) {
        throw new Error(`Server responded with status ${response.status}`);
      }
      const data = await response.json();
      // Display the bot's reply
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error("Error sending image or text:", error);
      addMessageToChatHistory(null, "Error sending data. Please try again.", "errorMessage");
    }
  }
});
```

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
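The FormData step can be pulled out into a small pure function, which keeps the click handler short and makes it easy to check the field names against the server (`upload.single("image")` for the file and `req.body.message` for the text). A minimal sketch; the name `buildUploadForm` is hypothetical, and it relies on the standard `FormData` class available in browsers and in Node.js 18+.

```javascript
// Build the multipart body the backend expects, or return null when there is
// nothing to send. Field names must match the server: "image" and "message".
function buildUploadForm(text, file) {
  const trimmed = (text || "").trim();
  if (!file && !trimmed) return null;

  const formData = new FormData();
  if (file) formData.append("image", file);
  if (trimmed) formData.append("message", trimmed);
  return formData;
}
```

In the click handler, `const formData = buildUploadForm(text, file);` followed by `if (formData) { ... }` could replace the inline construction.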

```javascript
function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}
```

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.
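Always scrolling to the bottom can pull the view away from a user who has scrolled up to reread earlier messages. One common refinement, sketched here as an assumption rather than part of the original code, is to auto-scroll only when the user is already near the bottom. The function name and the 40px threshold are arbitrary choices.

```javascript
// Returns true when the scrollable element is within `threshold` pixels of the bottom.
// `el` only needs scrollTop, clientHeight and scrollHeight, as any DOM element has.
function isNearBottom(el, threshold = 40) {
  return el.scrollHeight - el.scrollTop - el.clientHeight <= threshold;
}
```

Inside `addMessageToChatHistory`, one would compute `const stick = isNearBottom(chatHistory);` before `appendChild`, and then only set `chatHistory.scrollTop = chatHistory.scrollHeight` when `stick` is true.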

```javascript
closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});
```

These event listeners manage the modal for viewing larger images. Clicking the close button or outside the modal hides it by setting modal.style.display to "none".
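As written, the script only hides the modal; nothing in it sets `modalImage.src` or makes the modal visible. A minimal sketch of the missing open step, assuming chat images should open in the modal when clicked. The name `createModalController` is hypothetical; `modal` and `modalImage` are the `#imageModal` and `#modalImage` elements selected earlier.

```javascript
// Hypothetical helper for showing and hiding the image modal.
function createModalController(modal, modalImage) {
  return {
    open(src) {
      modalImage.src = src;          // Show the clicked image in the modal
      modal.style.display = "block"; // .modal is display: none by default
    },
    close() {
      modal.style.display = "none";
    },
  };
}

// Possible browser wiring, using event delegation so that images added to
// the chat later are handled too:
// const modalCtl = createModalController(modal, modalImage);
// chatHistory.addEventListener("click", (e) => {
//   if (e.target.tagName === "IMG") modalCtl.open(e.target.src);
// });
```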

```javascript
textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});
```

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.
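Pressing Enter to confirm a candidate in an input method editor (used for Chinese, Japanese, Korean and similar input) also fires a `keydown` with `e.key === "Enter"`, which would send a half-typed message. A minimal sketch of a stricter check, using the standard `KeyboardEvent.isComposing` and `shiftKey` properties; the function name is hypothetical.

```javascript
// Decide whether a keydown event in the text field should send the message.
// Skips Shift+Enter and Enter presses that confirm an IME composition.
function shouldSendOnKeydown(e) {
  return e.key === "Enter" && !e.shiftKey && !e.isComposing;
}
```

Wiring it in would look like `if (shouldSendOnKeydown(e)) { e.preventDefault(); btnSend.click(); }` inside the existing listener. The Shift check only matters if the field is later changed to a multi-line `<textarea>`.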

4.2 Backend (server)

```javascript
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");
```

These dependencies are described in section 3.3 (Setting up the environment); refer back to it for what each one does.

```javascript
dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

// When destination is a function, Multer does not create the folder itself
const fs = require("fs");
fs.mkdirSync("uploads", { recursive: true });

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, "uploads/"); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + "-" + file.originalname); // Prefix a timestamp to avoid duplicates
  },
});
const upload = multer({ storage: storage });

async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

const app = express();
const port = 3001;
app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const message = (req.body?.message || "").trim();
    const imagePath = req.file ? req.file.path : null;

    // Send exactly one response per request: image first, then text, then the empty case
    if (imagePath) {
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [{ fileData: { mimeType: file.mimeType, fileUri: file.uri } }],
          },
        ],
      });
      // sendMessage needs some text; this fallback prompt for image-only requests is an assumption
      const result = await chatSession.sendMessage(message || "Describe this image.");
      return res.status(200).json({ reply: result.response.text() });
    }

    if (message) {
      const chatSession = model.startChat({ generationConfig, history: [] });
      const result = await chatSession.sendMessage(message);
      return res.status(200).json({ reply: result.response.text() });
    }

    return res.status(400).json({ error: "No image or text provided" });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
```

Let's go through the API endpoint that handles image and text, from start to finish.

The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, handles file uploads: it processes a single file sent in the form field named image.
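Because `upload.single("image")` accepts any file type, a PDF or text file would reach the handler and be forwarded to Gemini as if it were an image. Multer's `fileFilter` option can reject such uploads before they are saved. A sketch, assuming only `image/*` MIME types should be accepted; the function name `imageOnlyFilter` is hypothetical, while the `(req, file, cb)` signature and the `file.mimetype` field follow Multer's documented API. Note that `mimetype` is reported by the client, so this is a convenience check rather than a security boundary.

```javascript
// A Multer-style fileFilter: accept image/* uploads, reject everything else.
// Multer calls fileFilter(req, file, cb); cb(null, true) accepts the file,
// cb(null, false) silently skips it, and cb(err) aborts the request with an error.
function imageOnlyFilter(req, file, cb) {
  if (typeof file.mimetype === "string" && file.mimetype.startsWith("image/")) {
    cb(null, true);
  } else {
    cb(new Error("Only image uploads are allowed"));
  }
}

// Possible wiring (assumes the `storage` object defined earlier):
// const upload = multer({ storage, fileFilter: imageOnlyFilter, limits: { fileSize: 5 * 1024 * 1024 } });
```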

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.
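`file.originalname` comes from the client, so it can contain spaces, unusual characters or even path segments. A defensive sketch of a filename builder for the `diskStorage` `filename` callback; the name `safeFilename` and the allowed character set are assumptions, not part of Multer.

```javascript
// Build a disk filename from a client-supplied name.
function safeFilename(originalname, now = Date.now()) {
  // Drop any directory part the client may have sent (both / and \ separators).
  const base = String(originalname || "").split(/[\\/]/).pop() || "upload";
  // Replace anything other than letters, digits, dot, underscore and hyphen.
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, "_");
  // Prefix a timestamp, as the original storage config does, to avoid collisions.
  return `${now}-${cleaned}`;
}
```

It would plug into the storage config as `filename: (req, file, cb) => cb(null, safeFilename(file.originalname))`.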

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.
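These three cases must be mutually exclusive: Express can send only one response per request, and a second `res.json()` call for the same request fails with Node's "Cannot set headers after they are sent to the client" error. A minimal sketch of the decision as a pure function; the name `resolveRoute` and the default prompt for image-only requests are hypothetical.

```javascript
// Hypothetical helper: decide which single branch /api/upload should take.
// `file` is req.file (undefined when no image was sent), `message` is req.body.message.
function resolveRoute(file, message) {
  const text = typeof message === "string" ? message.trim() : "";
  if (file) {
    // Image, with or without text. The fallback prompt is an assumption:
    // the chat session still needs some text to send alongside the image.
    return { kind: "image", prompt: text || "Describe this image." };
  }
  if (text) {
    return { kind: "text", prompt: text };
  }
  return { kind: "empty", status: 400, error: "No image or text provided" };
}
```

In the handler, a `switch (route.kind)` with a `return res...` in each case guarantees that only one response is sent.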

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.
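The `try`/`catch` inside the route only covers the handler body. Errors raised earlier by `upload.single("image")` (for example, when the `uploads/` folder does not exist) are passed to `next(err)`, and without a custom handler Express answers with its default HTML error page. A minimal sketch of a JSON error handler, assuming Multer errors should map to 400; Express recognizes error handlers by their four parameters, so all four must be declared. The name `uploadErrorHandler` is hypothetical.

```javascript
// Express error-handling middleware: (err, req, res, next).
function uploadErrorHandler(err, req, res, next) {
  if (res.headersSent) {
    // A response has already started; let Express's default handler close it.
    return next(err);
  }
  console.error("Request failed:", err);
  // Multer reports problems such as an oversized file as MulterError (a client error).
  const isUploadError = err.name === "MulterError";
  res.status(isUploadError ? 400 : 500).json({
    error: isUploadError ? err.message : "Internal Server Error",
  });
}
```

It would be registered after the routes with `app.use(uploadErrorHandler);`.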

```javascript
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const fs = require("fs");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads (the uploads/ folder must exist)
fs.mkdirSync("uploads", { recursive: true });
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, "uploads/"); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + "-" + file.originalname); // Rename the file to avoid duplicates
  },
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const message = (req.body?.message || "").trim(); // Accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // Uploaded image (if any)

    // If an image is provided, upload it to Gemini and ask about it
    if (imagePath) {
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // sendMessage needs text; this fallback prompt for image-only requests is an assumption
      const result = await chatSession.sendMessage(message || "Describe this image.");
      return res.status(200).json({ reply: result.response.text() });
    }

    // Text only: send the message to the Gemini API
    if (message) {
      const chatSession = model.startChat({
        generationConfig,
        history: [],
      });
      const result = await chatSession.sendMessage(message);
      return res.status(200).json({ reply: result.response.text() });
    }

    // If neither image nor text is provided, return an error
    return res.status(400).json({ error: "No image or text provided" });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
```

5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended, including edge cases and error handling such as server failures or invalid submissions. Debugging focuses on identifying and fixing the issues that testing uncovers, so the application runs smoothly. Continuous testing and debugging keep the application robust and its users satisfied.
While building the chatbot, I faced many errors that had to be debugged before it worked well.
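A quick way to check the text path end to end is a small script that posts a message and verifies that the JSON response has a non-empty `reply`. This is a sketch, assuming the backend from section 4.2 is running on port 3001 and Node.js 18+ (for the global `fetch` and `FormData`); the function name `smokeTestTextQuery` is hypothetical, and the `fetchImpl` parameter exists only so the check can be exercised without a live server.

```javascript
// Hypothetical smoke test for the /api/upload text path.
async function smokeTestTextQuery(baseUrl, fetchImpl = fetch) {
  const form = new FormData();
  form.append("message", "Hello, bot!");

  const response = await fetchImpl(`${baseUrl}/api/upload`, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Expected a 2xx status, got ${response.status}`);
  }

  const data = await response.json();
  if (typeof data.reply !== "string" || data.reply.length === 0) {
    throw new Error("Response has no non-empty 'reply' string");
  }
  return data.reply;
}

// Against the running server:
// smokeTestTextQuery("http://localhost:3001").then(console.log, console.error);
```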

5.1 Text queries

5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties Faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially debugging responses from the API.

Another issue was that it was my first time working with the Gemini API. The unfamiliarity with the API led to a learning curve, causing delays in progress.

I also faced electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

Creating an Information Bot: A Beginner's Guide (HTML/CSS, JavaScript, Gemini API)

Published on 2024-11-09

Table of Contents

  • Introduction
    • What is a Chatbot?
    • Why build a chatbot?
  • Understanding the Problem
    • What Problems Does the Chatbot Solve?
    • What Should the Chatbot Do?
  • Setting Up the Development Environment
    • Tools and Technologies
    • Prerequisites
    • Setting Up the Environment
  • Implementing Chatbot Functionalities
    • Frontend
    • Backend (server)
  • Testing and Debugging
    • Text queries
    • Image queries
  • Problems faced and conclusions
    • Some difficulties Faced
    • Conclusion

1. Introduction

1.1 What is a Chatbot?

A chatbot is a type of software that mimics conversations with people. Most chatbots communicate through text, but some can also use voice. They use artificial intelligence (AI) to understand what users are asking and provide answers quickly. This makes chatbots useful for handling routine tasks and giving information efficiently.

The main job of a chatbot is to talk with users through a messaging platform. Its conversations can range from answering straightforward questions to managing more complex exchanges. By using natural language processing (NLP), chatbots can understand user questions and provide relevant responses, making interactions smoother and more effective.

1.2 Why build a chatbot?
Building an information chatbot helps people quickly find answers and details they need without waiting or searching for them manually. For example, if you’re looking for information on scholarships, a chatbot can instantly provide the details you need, saving you time and effort. It can handle many questions at once, is available 24/7, and can make finding the right information much easier for everyone.

2. Understanding the Problem

2.1 What Problems Does the Chatbot Solve?
A chatbot helps solve the problem of finding information by making it easier to get answers quickly. Instead of spending a lot of time searching online or waiting for help, users can ask the chatbot their questions and get instant responses. This means users don’t have to search through multiple websites or wait for office hours; the information is available anytime, making it more accessible and convenient for everyone.

2.2 What Should the Chatbot Do?

  • Display text queries on the UI
  • Display image queries on the UI
  • Provide text-based responses from text queries
  • Handle image-based queries
  • Preview image before and after sending to the UI

3. Setting Up the Development Environment

3.1 Tools and Technologies

  • HTML & CSS: Basic web design
  • JavaScript: Adding interactivity
  • API (Google Gemini): Generating responses to user queries
  • Node.js & Express: Server-side handling

3.2 Prerequisites

  • Basic understanding of HTML, CSS, and JavaScript
  • A code editor (e.g., Visual Studio Code)
  • Web browser (for testing)

3.3 Setting up the environment

  1. Install Node.js and npm: Make sure you have Node.js installed on your system. If not, download and install it from the official Node.js site (nodejs.org); npm is installed along with it. Verify the installation:
node -v
npm -v
  2. Create a folder for the project and open it in your code editor

Creating an Information Bot: A Beginner

  3. Initialize a Node.js project: This creates a package.json file for managing dependencies.
npm init -y

  4. Install Required Dependencies: You will need express, axios, dotenv, cors, @google/generative-ai, multer, and body-parser for this setup:
npm install express axios dotenv cors @google/generative-ai multer body-parser

  • express: This tool helps build a web server that listens for and responds to requests. In this project it serves the web page and accepts images or text from users.

  • axios: This tool is used to make HTTP requests to other servers or APIs (for example, calling another website to get data), making it easy to connect your app to external services.

  • dotenv: This tool loads settings such as API keys from a file called .env into process.env, so secrets stay out of your source code. Add .env to your .gitignore so you don't accidentally publish it.

  • @google/generative-ai: Google's SDK for the Gemini API. Its GoogleGenerativeAI class sends user inputs (text and images) to Gemini and returns AI-generated responses for your app to use.

  • @google/generative-ai/server: This is not a separate package but a part of @google/generative-ai, so it is installed with it. It provides GoogleAIFileManager, which uploads files such as images to Google's servers so Gemini can analyze them.

  • multer: This tool handles file uploads sent as multipart/form-data, like when users send images to your server. In this project it saves each uploaded file to a folder so the server can use it.

  • body-parser: This tool allows the server to read data (like JSON or form data) sent from the user's browser and makes it available in the code as req.body.

  • cors: By default, browsers block a web page from reading responses from a different origin (another domain or port). The cors middleware adds the response headers that tell the browser to allow these cross-origin requests.

  5. Creating the API Key
  • What is an API key? An API key is like a special password that lets programs talk to each other. It keeps things secure by making sure only allowed users can access a service.
  • Why use an API key?
    An API key identifies your app to the service, so only authorized callers can access certain features or data. This helps prevent misuse and keeps your information safe. It also lets the service provider track how much the service is being used, so they can manage it better.

  • Gemini API Key
    The Gemini API key is crucial for my chatbot project as it allows the bot to access advanced AI features. This key enables the chatbot to understand and generate responses based on user inputs and uploaded images. By using this API, I can enhance the chatbot's intelligence and provide a better experience for users seeking assistance.

  • How to create a Gemini API Key
    Go to Google AI Studio. If you don’t have an account, sign up for one.
    Scroll down until you see the Get your API key option.

Click Get your API key. This takes you to the API keys page.

Once on this page, click the blue Create API key button. The next page lets you create the API key either in a new project or in an existing one.
I chose a new project, since I am working on a new project.

Once your API key is created, copy it so you can use it in your project.

Remember the tip shown on that page: use your API key securely.
You should keep your API key secret because it acts like a password for your application. If someone finds it, they could misuse it to access your data or services, leading to security issues or extra costs. Keeping it private helps protect your project and ensures it runs smoothly.

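  6. Creating the Server and Hiding the API Key
  • Create a server.js file: This will contain your backend code to handle incoming requests from the chatbot, make calls to the Gemini API, and respond with the generated messages.
  • Create the .env file: In the root of your project, create a .env file to store the Gemini API key on a line such as GEMINI_API_KEY=your-api-key (server.js reads the key from GEMINI_API_KEY).
  • Final project structure: based on the code in this guide, the project root contains server.js, .env, package.json, a public folder for the HTML, CSS, and JavaScript files, and an uploads folder where the server saves received images.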

4. Building the Chatbot

4.1 Designing the Chatbot Interface

  • Creating the HTML structure

This HTML page, titled "ImageBot", is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a container div (with the classes container and chatBot), which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, there is a chatHistory section where all previous conversations (messages sent and received) are displayed.

For interacting with the bot, there's an inputSession section. It contains a field where users can type their message and an option to preview any image they select before sending it. The image upload button is represented by an icon, and users can select an image from their device. After the message or image is ready, they can send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.

  • Styling with CSS
body {
    background-color: #f5f5f5;
    font-family: 'Arial', sans-serif;
    margin: 0;
    padding: 0;
    display: flex;
    justify-content: center;
    align-items: center;
    height: 100vh;
}

/* ChatBot container */
.container.chatBot {
    background-color: #ffffff;
    width: 50%;
    max-width: 600px;
    border-radius: 8px;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    padding: 20px;
    position: relative;
}

/* Header styling */
.header {
    font-size: 24px;
    color: #333;
    text-align: center;
    margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
    height: 300px;
    overflow-y: auto;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
    width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
    background-color: #ccc;
    border-radius: 4px;
}

/* Input session styling */
.inputSession {
    display: flex;
    align-items: center;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    justify-content: space-between;
}

/* Input field styling */
#textInput {
    flex-grow: 1;
    width: 100%;
    padding: 8px;
    border: 1px solid #ddd;
    border-radius: 4px;
    background-color: #fff;
    box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
    margin-right: 10px;
    font-size: 16px;
}

/* Button for sending messages */
#btnSend {
    color: #fff;
    border: none;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    font-size: 20px;
    transition: background-color 0.3s;
}

#btnSend:hover {
    background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
    display: flex;
    align-items: center;
    flex-grow: 1;
    margin-bottom: 10px;
}

#previewImage {
    max-width: 80px;
    max-height: 80px;
    border-radius: 5px;
    margin-right: 10px;
    object-fit: cover;
}

/* Label for file input */
label[for="imageInput"] {
    color: #fff;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    font-size: 20px;
    cursor: pointer;
    margin-right: 10px;
}

label[for="imageInput"]:hover {
    background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
    display: flex;
    align-items: flex-start;
    margin-bottom: 10px;
    padding: 10px;
    background-color: #e9ecef;
    border-radius: 8px;
    border: 1px solid #ddd;
    max-width: 100%;
}

/* Container for image and text */
.messageContent {
    display: flex;
    flex-direction: column;
    align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
    max-width: 100px; 
    max-height: 100px; 
    border-radius: 5px;
    margin-bottom: 5px; 
    object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
    text-align: left;
}

/* Modal styling */
.modal {
    display: none; 
    position: fixed; 
    z-index: 1000; 
    left: 0;
    top: 0;
    width: 100%;
    height: 100%;
    overflow: auto; 
    background-color: rgb(0,0,0); 
    background-color: rgba(0,0,0,0.8); 
}

.modal-content {
    margin: auto;
    display: block;
    width: 80%;
    max-width: 700px;
}

.close {
    position: absolute;
    top: 15px;
    right: 35px;
    color: #f1f1f1;
    font-size: 40px;
    font-weight: bold;
}

.close:hover,
.close:focus {
    color: #bbb;
    text-decoration: none;
    cursor: pointer;
}

This CSS provides a clean design for the chatbot interface. It uses flexbox to center the container on the page, stack the header, chat history, and input session vertically, and lay out the input controls in a row. Hover transitions on the send button, a scrollable chat history, and rounded corners with soft shadows give the page a modern, easy-to-use look.

4.2 Implementing Chatbot Functionalities

4.2.1 Frontend

Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Function to preview the image when selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }


});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

  • Element Selection
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.

  • Modal Elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

  • Image Preview
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.
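
The preview listener accepts any file the user picks. A small check before reading the file keeps non-image files out of the chat. The sketch below shows one way to do it; validateImageFile and the 5 MB limit are hypothetical choices for illustration, not values defined by this project or by Gemini.

```javascript
// Hypothetical helper: checks a selected File before it is previewed or uploaded.
// MAX_IMAGE_BYTES (5 MB) is an assumed limit chosen for illustration.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

function validateImageFile(file, maxBytes = MAX_IMAGE_BYTES) {
  if (!file) {
    return { ok: false, reason: "No file selected." };
  }
  // File.type is the browser's MIME type for the file, e.g. "image/png"
  if (!file.type || !file.type.startsWith("image/")) {
    return { ok: false, reason: "Please choose an image file." };
  }
  // File.size is measured in bytes
  if (file.size > maxBytes) {
    const limitMb = Math.round(maxBytes / (1024 * 1024));
    return { ok: false, reason: `Image is larger than ${limitMb} MB.` };
  }
  return { ok: true, reason: "" };
}
```

In the change listener you could call validateImageFile(imageInput.files[0]) first and, when ok is false, show reason to the user and reset imageInput.value instead of previewing the file. Because the function reads only type and size, it can be tested with plain objects instead of real File instances.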

  • Send Image and Text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
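
The handler throws the same "Failed to send image to the server." error for every failure and ignores the { error } body that the server returns. One option is to move the request into its own function that surfaces the server's message. sendToBot is a hypothetical name; fetchFn defaults to the browser's fetch and can be replaced with a fake for testing.

```javascript
// Hypothetical helper: POSTs the chat payload and returns the bot's reply text.
// The { reply } and { error } body shapes match what server.js sends back.
async function sendToBot(formData, fetchFn = globalThis.fetch, url = "http://localhost:3001/api/upload") {
  const response = await fetchFn(url, { method: "POST", body: formData });

  // Read the JSON body even for error statuses, because the server puts its message there
  let data = null;
  try {
    data = await response.json();
  } catch {
    data = null; // empty or non-JSON body (e.g. an HTML error page)
  }

  if (!response.ok) {
    const reason = data && data.error ? data.error : `HTTP ${response.status}`;
    throw new Error(`Server error: ${reason}`);
  }
  if (!data || typeof data.reply !== "string") {
    throw new Error("Unexpected response from the server.");
  }
  return data.reply;
}
```

Inside the click handler, the try block could then become const reply = await sendToBot(formData); addMessageToChatHistory(null, reply, "botMessage");, and the catch block could display error.message so users see, for example, "Server error: No image or text provided" instead of a generic message.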

  • Add Message to Chat History
function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.

  • Modal Handling
closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

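These event listeners manage the modal for viewing larger images. Clicking the close button, or the dark backdrop around the image (the modal element itself), hides the modal by setting modal.style.display to "none". Note that this script never opens the modal: modalImage is selected but not used, so chat images need a click handler that sets modalImage.src and shows the modal.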
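
A single click listener on chatHistory (event delegation) can open the modal for every chat image, including images that addMessageToChatHistory adds later. This is a sketch. handleChatHistoryClick is a hypothetical name, and it relies on the previewed-image class that addMessageToChatHistory already puts on images.

```javascript
// Hypothetical handler: opens the image modal when a chat image is clicked.
// modal and modalImage are the elements the script already selects.
function handleChatHistoryClick(event, modal, modalImage) {
  const target = event.target;
  // React only to clicks on chat images, not on text bubbles
  if (!target || target.tagName !== "IMG" || !target.classList.contains("previewed-image")) {
    return false;
  }
  modalImage.src = target.src;     // show the same image, larger
  modal.style.display = "block";   // the CSS hides .modal with display: none
  return true;
}

// Browser usage, registered once after the element lookups:
// chatHistory.addEventListener("click", (e) => handleChatHistoryClick(e, modal, modalImage));
```

Delegation keeps this to one listener no matter how many messages the chat holds. The function only touches properties it is given, so it works with plain objects in tests.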

  • Send Message on Enter Key Press
textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.
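
When an input method editor (IME) is composing text, as with Chinese or Japanese input, pressing Enter normally confirms the composition and does not finish the message. A check on KeyboardEvent.isComposing keeps the listener from sending half-typed text. shouldSendOnKeydown is a hypothetical helper name.

```javascript
// Hypothetical helper: true when a keydown event should send the message.
// event.isComposing is true while an IME composition is still in progress.
function shouldSendOnKeydown(event) {
  return event.key === "Enter" && !event.isComposing;
}

// Browser usage, replacing the condition in the existing listener:
// textInput.addEventListener("keydown", (e) => {
//   if (shouldSendOnKeydown(e)) {
//     e.preventDefault();
//     btnSend.click();
//   }
// });
```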

4.2.2 Backend (server)

  • Importing dependencies
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

These dependencies were explained earlier, in the environment setup section (3.3). Scroll up for a description of each one.

  • Configuring Environment Variables
dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

  • dotenv.config(): Loads environment variables from the .env file.
  • apiKey: Retrieves the API key from environment variables to authenticate requests to the Google Generative AI API.
  • genAI: Initializes the GoogleGenerativeAI instance with the API key.
  • fileManager: Initializes the GoogleAIFileManager instance with the same API key for handling file uploads.
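
If GEMINI_API_KEY is missing from .env, process.env.GEMINI_API_KEY is undefined, and the problem will likely show up only later, when a request to Gemini fails. A small check at startup makes the mistake obvious. requireEnv is a hypothetical helper and is not part of dotenv.

```javascript
// Hypothetical helper: reads a required setting and fails fast if it is missing or blank.
// Call it after dotenv.config() so values from .env are already in process.env.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(
      `Missing environment variable ${name}. Add it to your .env file, e.g. ${name}=your-key-here`
    );
  }
  return value.trim();
}

// In server.js:
// dotenv.config();
// const apiKey = requireEnv("GEMINI_API_KEY");
```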

  • Setting Up AI Model and Configuration

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

  • model: Initializes the generative model from Google AI, specifying which model (Gemini 1.5 Pro) to use for generating responses.
  • generationConfig: Defines parameters for generating responses: temperature (controls randomness; higher values produce more varied output), topP and topK (limit which candidate tokens can be chosen, controlling the diversity of responses), maxOutputTokens (maximum length of the response), and responseMimeType ("text/plain" requests a plain-text reply).
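
The following toy model shows how topK, topP, and temperature can shape the choice of the next token. Google's documentation describes tokens being selected by topK first, then filtered by topP, with temperature applied in the final sampling step. This sketch follows that order on four made-up tokens and scores (logits). It illustrates the idea only and is not Gemini's actual implementation.

```javascript
// Toy illustration of topK / topP / temperature; NOT Gemini's real implementation.
// Softmax turns raw scores (logits) into probabilities; temperature rescales the logits first.
function softmax(logits, temperature = 1) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Returns the tokens that survive filtering, with their final sampling probabilities.
function candidateDistribution(tokens, logits, { temperature, topK, topP }) {
  const probs = softmax(logits);

  // 1. topK: keep only the K most probable tokens
  const ranked = tokens
    .map((token, i) => ({ token, logit: logits[i], p: probs[i] }))
    .sort((a, b) => b.p - a.p)
    .slice(0, topK);

  // 2. topP: keep the most probable tokens until their probabilities add up to topP
  const kept = [];
  let cumulative = 0;
  for (const candidate of ranked) {
    kept.push(candidate);
    cumulative += candidate.p;
    if (cumulative >= topP) break;
  }

  // 3. temperature: reshape the survivors' distribution (< 1 sharper, > 1 flatter)
  const finalProbs = softmax(kept.map((c) => c.logit), temperature);
  return kept.map((c, i) => ({ token: c.token, p: finalProbs[i] }));
}

const tokens = ["cat", "dog", "bird", "fish"];
const logits = [2, 1, 0, -1];
// cat ≈ 0.73, dog ≈ 0.27; bird is cut by topP and fish by topK
console.log(candidateDistribution(tokens, logits, { temperature: 1, topK: 3, topP: 0.8 }));
```

With this project's settings (temperature: 1, topP: 0.95, topK: 64), many more candidates can survive. This is one reason answers may vary between runs. Lowering temperature or topP tends to make replies more predictable.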

  • Configuring Multer for File Uploads

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Rename the file to avoid duplicates
  }
});
const upload = multer({ storage: storage });

  • multer.diskStorage(): Configures how files are stored.
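  • destination: Specifies the directory (uploads/) where files should be saved. Because destination is given as a function, Multer does not create this directory for you, so create the uploads folder before starting the server.
  • filename: Prefixes the original file name with the current timestamp (in milliseconds) so that two uploads with the same name do not overwrite each other.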
  • upload: Creates a Multer instance with the defined storage configuration.
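
Multer passes the client's original file name into the filename callback. As a defense-in-depth measure, you might sanitize that name before writing it to disk. makeUploadName is a hypothetical helper and is not part of Multer. It keeps the timestamp-prefix idea from the code above.

```javascript
const path = require("path");

// Hypothetical helper: builds a timestamped, filesystem-safe name for an uploaded file.
function makeUploadName(originalName, now = Date.now()) {
  // Keep only the base name, dropping any directory parts a client might send
  const base = path.basename(originalName || "upload");
  // Replace anything other than letters, digits, dot, dash, and underscore
  const safe = base.replace(/[^a-zA-Z0-9._-]/g, "_");
  return `${now}-${safe}`;
}
```

In storage, the filename callback would become cb(null, makeUploadName(file.originalname)). Two uploads with the same name in the same millisecond would still collide, so a random suffix is worth considering if that matters for your traffic.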

  • Uploading Files to Gemini

async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

  • uploadToGemini(): A function to upload a file to the Google Gemini API.
  • fileManager.uploadFile(): Uploads the file to the API and logs the result.
  • file: Contains the details of the uploaded file returned from the API.

  • Configuring Express and Middleware

const app = express();
const port = 3001;

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

  • app: Initializes an Express application.
  • port: Sets the port on which the server will listen (3001).
  • app.use(cors()): Enables CORS for the server.
  • app.use(bodyParser.json()): Parses JSON bodies.
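  • app.use(bodyParser.urlencoded({ extended: true })): Parses URL-encoded bodies, such as standard HTML form submissions. Multipart uploads that contain files are handled by Multer instead.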
  • app.use(express.static("public")): Serves static files like HTML, CSS, and JS from the public directory.

  • API Endpoint for Handling Image and Text

app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body;
    const imagePath = req.file ? req.file.path : null;

    // Reject requests that contain neither an image nor text
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    if (imagePath) {
      // Use the MIME type Multer recorded for the upload instead of assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);

      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // sendMessage needs text, so image-only uploads fall back to an assumed default prompt
      const result = await chatSession.sendMessage(message || "Describe this image.");
      // Return here so that only one response is sent per request
      return res.status(200).json({ reply: result.response.text() });
    }

    // Text-only request
    const chatSession = model.startChat({
      generationConfig,
      history: [],
    });
    const result = await chatSession.sendMessage(message);
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • app.post("/api/upload"): Defines a POST endpoint for handling file and text uploads.
  • upload.single("image"): Middleware to handle single file upload (named image).
  • req.body: Contains the text message.
  • req.file: Contains the uploaded image file.
  • uploadToGemini(): Uploads the image to the Gemini API.
  • model.startChat(): Starts a chat session with the model.
  • chatSession.sendMessage(message): Sends the message (and image if provided) to the model.
  • res.status(200).json({ reply: result.response.text() }): Sends the generated response back to the client.
  • res.status(400): Handles cases where neither image nor text is provided.
  • res.status(500): Handles server errors.
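
Two parts of the endpoint are easy to get wrong: deciding which image types to forward, and choosing what to send when the user uploads an image without any text. The sketch below handles both with plain functions. ALLOWED_IMAGE_TYPES, DEFAULT_IMAGE_PROMPT, isSupportedImage, and buildParts are hypothetical names. The allowlist mirrors the image formats listed in Gemini's documentation at the time of writing, so check the current list before relying on it. The part shapes ({ fileData: { mimeType, fileUri } } and { text }) are the same ones the endpoint already uses.

```javascript
// Hypothetical helpers for the /api/upload endpoint (names are illustrative, not from any library).
// Image formats Gemini's documentation lists as supported (check the current docs).
const ALLOWED_IMAGE_TYPES = new Set(["image/png", "image/jpeg", "image/webp", "image/heic", "image/heif"]);
// Assumed fallback prompt for uploads that arrive without any text.
const DEFAULT_IMAGE_PROMPT = "Describe this image.";

// True when Multer's reported MIME type (req.file.mimetype) is one we forward to Gemini.
function isSupportedImage(mimetype) {
  return ALLOWED_IMAGE_TYPES.has(mimetype);
}

// Builds the parts array for a Gemini request.
// uploadedFile is the object returned by uploadToGemini() (it has mimeType and uri), or null.
function buildParts(message, uploadedFile) {
  const text = (message || "").trim();
  const parts = [];
  if (uploadedFile) {
    parts.push({ fileData: { mimeType: uploadedFile.mimeType, fileUri: uploadedFile.uri } });
  }
  if (text) {
    parts.push({ text });
  } else if (uploadedFile) {
    parts.push({ text: DEFAULT_IMAGE_PROMPT });
  }
  return parts; // an empty array means there is nothing to send (answer with a 400)
}
```

In the endpoint, you could reject unsupported uploads early with if (req.file && !isSupportedImage(req.file.mimetype)) return res.status(400).json({ error: "Unsupported image type" });. After uploading, you could pass buildParts(message, file) to chatSession.sendMessage(), which in @google/generative-ai accepts an array of parts as well as a string.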

  • Starting the Server

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});

  • app.listen(port): Starts the server and listens on the specified port (3001).
  • console.log: Confirms that the server is running and accessible at http://localhost:3001.

The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res, next) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, is used to handle file uploads. It processes a single file upload where the form field name is image.

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches the error and responds with a 500 status code, indicating an internal server error. This way, the client always receives a response it can show to the user, whether the request succeeded or failed.

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads
const storage = multer.diskStorage({
  // Passing a string (not a function) lets multer create the folder if it is missing
  destination: "uploads/",
  filename: (req, file, cb) => {
    // Prefix with a timestamp so files with the same name do not overwrite each other
    cb(null, Date.now() + "-" + file.originalname);
  },
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // This is the accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // This is the image (if any)

    // If neither image nor text is provided, return an error
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    let chatSession;
    let prompt = message;

    if (imagePath) {
      // Upload the image to Gemini, using the MIME type multer detected
      // rather than assuming every upload is a JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);

      // Start a chat whose history already contains the uploaded image
      chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });

      // Assumption: sendMessage() needs some text, so fall back to a
      // generic prompt when an image arrives without a message
      prompt = message || "Describe this image.";
    } else {
      // Text only: start a new chat session with an empty history
      chatSession = model.startChat({
        generationConfig,
        history: [],
      });
    }

    // Send the model's reply back to the frontend (one response per request)
    const result = await chatSession.sendMessage(prompt);
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    return res.status(500).json({ error: "Internal Server Error" });
  }
});



// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
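On the client side, the route above expects a multipart request whose file field is named `image` (matching `upload.single("image")`) and whose text field is `message` (read from `req.body`). The sketch below shows one way to build and send that request with the built-in `FormData` and `fetch` (available in browsers and in Node 18+). The helper names `buildUploadForm` and `sendToBot`, the fallback filename, and the URL are assumptions for illustration, not the author's frontend code.

```javascript
// Build the multipart body the /api/upload route expects.
// Field names match the server: "image" for multer, "message" for req.body.
function buildUploadForm(message, imageFile) {
  const form = new FormData();
  if (message) form.append("message", message);
  // imageFile is a File or Blob, e.g. from an <input type="file"> element;
  // a plain Blob has no name, so a fallback filename is supplied (assumption)
  if (imageFile) form.append("image", imageFile, imageFile.name || "image");
  return form;
}

// Hypothetical helper: POST the form and return the bot's reply text.
async function sendToBot(message, imageFile, url = "http://localhost:3001/api/upload") {
  // Do not set Content-Type by hand; fetch adds the multipart boundary itself
  const res = await fetch(url, { method: "POST", body: buildUploadForm(message, imageFile) });
  const data = await res.json();
  if (!res.ok) throw new Error(data.error || `Request failed with status ${res.status}`);
  return data.reply;
}
```

Keeping the form-building step separate from the network call makes it easy to check which fields are sent, and the error branch surfaces the server's 400 and 500 messages to the caller.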

  • Removing markdown
    In the chat application, I used the Marked library to convert Markdown text in bot messages into HTML, loading it with a script tag on the page. When a message is added, the code checks its class name: for bot messages it applies textContainer.innerHTML = marked.parse(text); to render the Markdown as HTML, and for user messages it uses textContainer.textContent = text; to display plain text, which keeps the conversation clear.

  • Markdown text (screenshot)

  • After Markdown has been removed (screenshot)
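The class-based rendering rule described above can be sketched as a small function. This is an illustration under assumptions, not the author's code: the name `renderMessage`, the `"bot"` class name, and passing the Markdown parser in as an argument (so the function can be exercised without a browser) are all hypothetical. In the page, the parser argument would be `(md) => marked.parse(md)`.

```javascript
// Sketch of the rendering rule: Markdown for bot replies, plain text for users.
// Hypothetical: the function name, the "bot" class name, and the injected parser.
function renderMessage(textContainer, text, className, parseMarkdown) {
  if (className === "bot") {
    // Bot replies often contain Markdown (lists, bold text, code), so render it as HTML
    textContainer.innerHTML = parseMarkdown(text);
  } else {
    // User input is shown verbatim; textContent never interprets HTML tags
    textContainer.textContent = text;
  }
  return textContainer;
}

// Example in the browser, assuming the Marked script is loaded on the page:
// renderMessage(document.createElement("div"), reply, "bot", (md) => marked.parse(md));
```

Using `textContent` for user messages also means anything the user types, including HTML tags, is displayed as text rather than interpreted by the browser.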

5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application's robustness and user satisfaction.
While building the chatbot, I faced many errors that had to be debugged before it worked well.

5.1 Text queries

5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties faced
I encountered several bugs during development. Some were minor, but others, especially those involving the responses returned by the API, took significant time to resolve.

Another issue was that this was my first time working with the Gemini API. Being unfamiliar with it meant a learning curve that delayed my progress.

I also faced electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

  • Gemini API Integration: I developed skills in API integration, particularly using the Gemini API for generating responses based on inputs.

  • Problem-solving: I learned how to systematically debug and troubleshoot issues, improving my resilience in overcoming project obstacles.

  • Time Management: The delays caused by power outages and bugs helped me practice time management and adaptability under pressure.

  • Collaborating for Solutions: Reaching out for help when needed and learning from others was an important takeaway in this project.

  • Practical Experience: The hands-on experience with API and front-end integration improved my proficiency in JavaScript.

Copyright notice: This article is reprinted from https://dev.to/kedjuprecious/creating-an-information-bot-a-beginners-guide-htmlcss-javascript-gemini-api-260f?1. If it infringes any rights, please contact [email protected] for removal.
最新教程 更多>
  • 解决MySQL插入Emoji时出现的\\"字符串值错误\\"异常
    解决MySQL插入Emoji时出现的\\"字符串值错误\\"异常
    Resolving Incorrect String Value Exception When Inserting EmojiWhen attempting to insert a string containing emoji characters into a MySQL database us...
    编程 发布于2025-07-02
  • CSS强类型语言解析
    CSS强类型语言解析
    您可以通过其强度或弱输入的方式对编程语言进行分类的方式之一。在这里,“键入”意味着是否在编译时已知变量。一个例子是一个场景,将整数(1)添加到包含整数(“ 1”)的字符串: result = 1 "1";包含整数的字符串可能是由带有许多运动部件的复杂逻辑套件无意间生成的。它也可以是故意从单个真理...
    编程 发布于2025-07-02
  • 如何检查对象是否具有Python中的特定属性?
    如何检查对象是否具有Python中的特定属性?
    方法来确定对象属性存在寻求一种方法来验证对象中特定属性的存在。考虑以下示例,其中尝试访问不确定属性会引起错误: >>> a = someClass() >>> A.property Trackback(最近的最新电话): 文件“ ”,第1行, AttributeError: SomeClass...
    编程 发布于2025-07-02
  • PHP未来:适应与创新
    PHP未来:适应与创新
    PHP的未来将通过适应新技术趋势和引入创新特性来实现:1)适应云计算、容器化和微服务架构,支持Docker和Kubernetes;2)引入JIT编译器和枚举类型,提升性能和数据处理效率;3)持续优化性能和推广最佳实践。 引言在编程世界中,PHP一直是网页开发的中流砥柱。作为一个从1994年就开始发展...
    编程 发布于2025-07-02
  • Python环境变量的访问与管理方法
    Python环境变量的访问与管理方法
    Accessing Environment Variables in PythonTo access environment variables in Python, utilize the os.environ object, which represents a mapping of envir...
    编程 发布于2025-07-02
  • 如何有效地选择熊猫数据框中的列?
    如何有效地选择熊猫数据框中的列?
    在处理数据操作任务时,在Pandas DataFrames 中选择列时,选择特定列的必要条件是必要的。在Pandas中,选择列的各种选项。选项1:使用列名 如果已知列索引,请使用ILOC函数选择它们。请注意,python索引基于零。 df1 = df.iloc [:,0:2]#使用索引0和1 c...
    编程 发布于2025-07-02
  • 可以在纯CS中将多个粘性元素彼此堆叠在一起吗?
    可以在纯CS中将多个粘性元素彼此堆叠在一起吗?
    [2这里: https://webthemez.com/demo/sticky-multi-header-scroll/index.html </main> <section> { display:grid; grid-template-...
    编程 发布于2025-07-02
  • HTML格式标签
    HTML格式标签
    HTML 格式化元素 **HTML Formatting is a process of formatting text for better look and feel. HTML provides us ability to format text without us...
    编程 发布于2025-07-02
  • 版本5.6.5之前,使用current_timestamp与时间戳列的current_timestamp与时间戳列有什么限制?
    版本5.6.5之前,使用current_timestamp与时间戳列的current_timestamp与时间戳列有什么限制?
    在时间戳列上使用current_timestamp或MySQL版本中的current_timestamp或在5.6.5 此限制源于遗留实现的关注,这些限制需要对当前的_timestamp功能进行特定的实现。 创建表`foo`( `Productid` int(10)unsigned not n...
    编程 发布于2025-07-02
  • 为什么在我的Linux服务器上安装Archive_Zip后,我找不到“ class \” class \'ziparchive \'错误?
    为什么在我的Linux服务器上安装Archive_Zip后,我找不到“ class \” class \'ziparchive \'错误?
    class'ziparchive'在Linux Server上安装Archive_zip时找不到错误 commant in lin ins in cland ins in lin.11 on a lin.1 in a lin.11错误:致命错误:在... cass中找不到类z...
    编程 发布于2025-07-02
  • 查找当前执行JavaScript的脚本元素方法
    查找当前执行JavaScript的脚本元素方法
    如何引用当前执行脚本的脚本元素在某些方案中理解问题在某些方案中,开发人员可能需要将其他脚本动态加载其他脚本。但是,如果Head Element尚未完全渲染,则使用document.getElementsbytagname('head')[0] .appendChild(v)的常规方...
    编程 发布于2025-07-02
  • 如何使用替换指令在GO MOD中解析模块路径差异?
    如何使用替换指令在GO MOD中解析模块路径差异?
    在使用GO MOD时,在GO MOD 中克服模块路径差异时,可能会遇到冲突,其中3个Party Package将另一个PAXPANCE带有导入式套件之间的另一个软件包,并在导入式套件之间导入另一个软件包。如回声消息所证明的那样: go.etcd.io/bbolt [&&&&&&&&&&&&&&&&...
    编程 发布于2025-07-02
  • 为什么不使用CSS`content'属性显示图像?
    为什么不使用CSS`content'属性显示图像?
    在Firefox extemers属性为某些图像很大,&& && && &&华倍华倍[华氏华倍华氏度]很少见,却是某些浏览属性很少,尤其是特定于Firefox的某些浏览器未能在使用内容属性引用时未能显示图像的情况。这可以在提供的CSS类中看到:。googlepic { 内容:url(&#...
    编程 发布于2025-07-02
  • 在GO中构造SQL查询时,如何安全地加入文本和值?
    在GO中构造SQL查询时,如何安全地加入文本和值?
    在go中构造文本sql查询时,在go sql queries 中,在使用conting and contement和contement consem per时,尤其是在使用integer per当per当per时,per per per当per. [&​​&&&&&&&&&&&&&&&默元组方法在...
    编程 发布于2025-07-02
  • JavaScript计算两个日期之间天数的方法
    JavaScript计算两个日期之间天数的方法
    How to Calculate the Difference Between Dates in JavascriptAs you attempt to determine the difference between two dates in Javascript, consider this s...
    编程 发布于2025-07-02
