This HTML page is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a container, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, there is a chatHistory section where all previous conversations (messages sent and received) are displayed.

For interacting with the bot, there's an inputSession section. It contains a field where users can type their message and an option to preview any image they select before sending it. The image upload button is represented by an icon, and users can select an image from their device. After the message or image is ready, they can send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.

body {
    background-color: #f5f5f5;
    font-family: 'Arial', sans-serif;
    margin: 0;
    padding: 0;
    display: flex;
    justify-content: center;
    align-items: center;
    height: 100vh;
}

/* ChatBot container */
.container.chatBot {
    background-color: #ffffff;
    width: 50%;
    max-width: 600px;
    border-radius: 8px;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    padding: 20px;
    position: relative;
}

/* Header styling */
.header {
    font-size: 24px;
    color: #333;
    text-align: center;
    margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
    height: 300px;
    overflow-y: auto;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
    width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
    background-color: #ccc;
    border-radius: 4px;
}

/* Input session styling */
.inputSession {
    display: flex;
    align-items: center;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    justify-content: space-between;
}

/* Input field styling (repeated declarations merged; the later values are kept) */
#textInput {
    width: 100%;
    padding: 8px;
    border: 1px solid #ddd;
    border-radius: 4px;
    background-color: #fff;
    box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
    margin-right: 10px;
    font-size: 16px;
    flex-grow: 1;
}

/* Button for sending messages */
#btnSend {
    color: #fff;
    border: none;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    font-size: 20px;
    transition: background-color 0.3s;
}

#btnSend:hover {
    background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
    display: flex;
    align-items: center;
    flex-grow: 1;
    margin-bottom: 10px;
}

#previewImage {
    max-width: 80px;
    max-height: 80px;
    border-radius: 5px;
    margin-right: 10px;
    object-fit: cover;
}

/* Label for file input */
label[for="imageInput"] {
    color: #fff;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    font-size: 20px;
    cursor: pointer;
    margin-right: 10px;
}

label[for="imageInput"]:hover {
    background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
    display: flex;
    align-items: flex-start;
    margin-bottom: 10px;
    padding: 10px;
    background-color: #e9ecef;
    border-radius: 8px;
    border: 1px solid #ddd;
    max-width: 100%;
}

/* Container for image and text */
.messageContent {
    display: flex;
    flex-direction: column;
    align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
    max-width: 100px;
    max-height: 100px;
    border-radius: 5px;
    margin-bottom: 5px;
    object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
    text-align: left;
}

/* Modal styling */
.modal {
    display: none;
    position: fixed;
    z-index: 1000;
    left: 0;
    top: 0;
    width: 100%;
    height: 100%;
    overflow: auto;
    background-color: rgb(0,0,0);
    background-color: rgba(0,0,0,0.8);
}

.modal-content {
    margin: auto;
    display: block;
    width: 80%;
    max-width: 700px;
}

.close {
    position: absolute;
    top: 15px;
    right: 35px;
    color: #f1f1f1;
    font-size: 40px;
    font-weight: bold;
}

.close:hover,
.close:focus {
    color: #bbb;
    text-decoration: none;
    cursor: pointer;
}

This CSS provides a clean, responsive design for a chatbot interface. It uses flexbox for layout, giving the container, header, and input session flexibility. The styling ensures a user-friendly experience with smooth transitions, scrollable chat history, and a modern look with rounded corners and shadows, enhancing visual appeal and usability.

4. Implementing Chatbot Functionalities

4.1 Frontend

Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Function to preview the image when selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });
      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }
      const data = await response.json();
      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.

const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.

btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });
      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }
      const data = await response.json();
      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
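The payload-building step described above can be pulled out into a small helper, which makes it possible to check the logic without a browser. This is a minimal sketch, assuming a runtime with global FormData and Blob (modern browsers or Node.js 18+); buildPayload is a hypothetical name that does not appear in the original script.

```javascript
// Hypothetical helper (not part of the original script): builds the multipart
// payload that the send handler posts. Returns null when there is nothing to send.
function buildPayload(text, file) {
  const trimmed = (text || "").trim();
  if (!file && !trimmed) return null;

  const formData = new FormData();
  if (file) formData.append("image", file);          // field name read by upload.single("image")
  if (trimmed) formData.append("message", trimmed);  // available on the server as req.body.message
  return formData;
}

// Usage: a text-only message produces a payload with a single "message" field.
const payload = buildPayload("  Hello bot  ", null);
console.log(payload.get("message")); // "Hello bot"
```

Keeping this logic separate from the click handler means the empty-input case is decided in one place, before any network request is made.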

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

These event listeners manage the modal for viewing larger images. Clicking the close button or outside the modal hides it by setting modal.style.display to "none".

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.
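One possible refinement is to ignore Enter presses that are not meant to send, for example while an input method editor is still composing text or while Shift is held. The sketch below is an assumption about how that could look; shouldSendOnKey is a hypothetical helper, and the original listener checks only e.key.

```javascript
// Hypothetical predicate (not in the original script): decides whether a
// keydown event should trigger sending the message.
function shouldSendOnKey(event) {
  return event.key === "Enter" && !event.shiftKey && !event.isComposing;
}

// In the page it could be wired up like this (textInput and btnSend are the
// elements selected earlier in the script):
// textInput.addEventListener("keydown", (e) => {
//   if (shouldSendOnKey(e)) {
//     e.preventDefault();
//     btnSend.click();
//   }
// });

console.log(shouldSendOnKey({ key: "Enter", shiftKey: false, isComposing: false })); // true
```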


4.2 Backend (server)

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

These dependencies were explained above, in the section on setting up the environment. Refer back to it for a reminder of what each one is used for.

dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

// Setup multer for file uploads (the uploads/ folder must already exist)
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Rename the file to avoid duplicates
  }
});
const upload = multer({ storage: storage });

// Uploads the given file to Gemini and returns the uploaded file's metadata
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

const app = express();
const port = 3001;

app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // Accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // Uploaded image (if any)

    // If neither image nor text is provided, return an error
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    // Image (with optional text): upload the file, then ask the model about it
    if (imagePath) {
      // Use the MIME type detected by multer instead of assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // Fallback prompt for image-only requests (an assumption; the original passed message as-is)
      const result = await chatSession.sendMessage(message || "Describe this image.");
      // Return here so that only one response is sent per request
      return res.status(200).json({ reply: result.response.text() });
    }

    // Text only: start a fresh chat session and send the message
    const chatSession = model.startChat({
      generationConfig,
      history: [],
    });
    const result = await chatSession.sendMessage(message);
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
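The server reads GEMINI_API_KEY from the environment after calling dotenv.config(). If the variable is missing, the failure only surfaces later as an API error, so it can help to check for it at startup. A minimal sketch, assuming a hypothetical helper named requireEnv that is not part of the original server:

```javascript
// Hypothetical helper (not part of the original server): reads a required
// environment variable and fails fast with a clear message when it is missing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Possible usage right after dotenv.config():
// const apiKey = requireEnv("GEMINI_API_KEY");
```

The second parameter exists only so the helper can be exercised with a plain object instead of the real process environment.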

Let's go through the API endpoint for handling image and text in detail, from start to finish.

The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res, next) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, is used to handle file uploads. It processes a single file upload where the form field name is image.

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.
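The name under which an upload is saved comes from the filename callback in the Multer storage configuration. Because the original file name is supplied by the client, one option is to sanitize it before using it on disk. The following is a sketch under that assumption; makeUploadName is a hypothetical helper, and the character rules are illustrative rather than anything Multer requires.

```javascript
const path = require("path");

// Hypothetical helper (not part of the original server): builds a unique,
// filesystem-friendly name for an uploaded file.
function makeUploadName(originalName, now = Date.now()) {
  const base = path.basename(originalName);            // drop any directory components
  const safe = base.replace(/[^a-zA-Z0-9._-]/g, "_");  // replace unusual characters
  return `${now}-${safe}`;
}

// Possible usage inside multer.diskStorage:
// filename: (req, file, cb) => cb(null, makeUploadName(file.originalname)),

console.log(makeUploadName("my photo.png", 1700000000000)); // "1700000000000-my_photo.png"
```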

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.
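Because uploadToGemini() takes the MIME type as an argument, it can help to derive that value from the uploaded file and to reject formats the model may not accept. A minimal sketch, assuming Multer's file object (which exposes a mimetype field); the allow-list below is an assumption and should be checked against the current Gemini API documentation, and resolveImageMimeType is a hypothetical helper name.

```javascript
// Assumed allow-list of image formats; verify against the Gemini API docs.
const SUPPORTED_IMAGE_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/heic",
  "image/heif",
]);

// Hypothetical helper (not part of the original server): returns the MIME type
// to pass to uploadToGemini(), or null when the file should be rejected.
function resolveImageMimeType(file) {
  const type = file && typeof file.mimetype === "string" ? file.mimetype.toLowerCase() : "";
  return SUPPORTED_IMAGE_TYPES.has(type) ? type : null;
}

// Possible usage in the route handler:
// const mimeType = resolveImageMimeType(req.file);
// if (!mimeType) return res.status(400).json({ error: "Unsupported image type" });
```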

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.
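The branching described above (image, text only, or nothing at all) can be summarized as a small pure function. This is only an illustration of the decision order, with a hypothetical name, classifyRequest; the actual handler performs the Gemini calls inside each branch.

```javascript
// Hypothetical helper (not part of the original server): maps a request to the
// branch the handler would take and the status code it would normally return.
function classifyRequest({ file, message }) {
  const hasText = typeof message === "string" && message.trim() !== "";
  if (file) return { kind: "image", status: 200 };    // image, with or without text
  if (hasText) return { kind: "text", status: 200 };  // text only
  return { kind: "empty", status: 400 };              // nothing to process
}

console.log(classifyRequest({ file: null, message: "" })); // { kind: 'empty', status: 400 }
```

Checking the image case first mirrors the handler: when both an image and a message arrive, they are handled together in a single model call, and exactly one response is sent.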

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads (the uploads/ folder must already exist)
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Rename the file to avoid duplicates
  }
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // This is the accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // This is the image (if any)

    // If neither image nor text is provided, return an error
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    // Check if an image is provided and send it to the Gemini API
    if (imagePath) {
      // Use the MIME type detected by multer instead of assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // Fallback prompt for image-only requests (an assumption; the original passed message as-is)
      const result = await chatSession.sendMessage(message || "Describe this image.");
      // Return here so that only one response is sent per request
      return res.status(200).json({ reply: result.response.text() });
    }

    // Handle a text-only message and send it to the Gemini API
    const chatSession = model.startChat({
      generationConfig,
      history: [],
    });
    const result = await chatSession.sendMessage(message);
    // Extract the AI's generated response and send it back to the frontend
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});

5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application's robustness and user satisfaction.
While building the chatbot, I faced many errors that had to be debugged before it functioned well.
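A quick way to exercise the text path without the UI is a small script that posts a message to the endpoint and prints the reply. The sketch below assumes Node.js 18+ (global fetch and FormData) and the /api/upload route on port 3001 used by the server; askBot is a hypothetical helper, and fetchImpl is injectable so the function can also be checked against a stub.

```javascript
// Hypothetical smoke-test helper (not part of the original project): sends a
// text message to the chatbot endpoint and resolves with the reply text.
async function askBot(message, { baseUrl = "http://localhost:3001", fetchImpl = fetch } = {}) {
  const formData = new FormData();
  formData.append("message", message);

  const response = await fetchImpl(`${baseUrl}/api/upload`, {
    method: "POST",
    body: formData,
  });
  if (!response.ok) {
    throw new Error(`Server responded with status ${response.status}`);
  }
  const data = await response.json();
  return data.reply;
}

// Possible usage against a running server:
// askBot("What is a chatbot?").then(console.log).catch(console.error);
```

Running it once with a normal message and once against a stopped server covers both the success path and the error path that the frontend reports as "Error sending data. Please try again."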

5.1 Text queries

5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties Faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially debugging responses from the API.

Another issue was that it was my first time working with the Gemini API. The unfamiliarity with the API led to a learning curve, causing delays in progress.

Another problem I faced was electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

Creating an Information Bot: A Beginner's Guide (HTML/CSS, JavaScript, Gemini API)

Published on 2024-11-09

Table of Contents

  • Introduction
    • What is a Chatbot?
    • Why build a chatbot?
  • Understanding the Problem
    • What Problems Does the Chatbot Solve?
    • What Should the Chatbot Do?
  • Setting Up the Development Environment
    • Tools and Technologies
    • Prerequisites
    • Setting Up the Environment
  • Implementing Chatbot Functionalities
    • Frontend
    • Backend (server)
  • Testing and Debugging
    • Text queries
    • Image queries
  • Problems faced and conclusions
    • Some difficulties Faced
    • Conclusion

1. Introduction

1.1 What is a Chatbot?

A chatbot is a type of software that mimics conversations with people. Most chatbots communicate through text, but some can also use voice. They use artificial intelligence (AI) to understand what users are asking and provide answers quickly. This makes chatbots useful for handling routine tasks and giving information efficiently.

The main job of a chatbot is to talk with users. It does this through a messaging platform, which can be as simple as answering straightforward questions or managing more complex conversations. By using natural language processing (NLP), chatbots can understand user questions and provide relevant responses, making interactions smoother and more effective.

1.2 Why build a chatbot?
Building an information chatbot helps people quickly find answers and details they need without waiting or searching for them manually. For example, if you’re looking for information on scholarships, a chatbot can instantly provide the details you need, saving you time and effort. It can handle many questions at once, is available 24/7, and can make finding the right information much easier for everyone.

2. Understanding the Problem

2.1 What Problems Does the Chatbot Solve?
A chatbot helps solve the problem of finding information by making it easier to get answers quickly. Instead of spending a lot of time searching online or waiting for help, users can ask the chatbot their questions and get instant responses. This means users don’t have to search through multiple websites or wait for office hours; the information is available anytime, making it more accessible and convenient for everyone.

2.2 What Should the Chatbot Do?

  • Display text queries on the UI
  • Display image queries on the UI
  • Provide text-based responses from text queries
  • Handle image-based queries
  • Preview image before and after sending to the UI

3. Setting Up the Development Environment

3.1 Tools and Technologies

  • HTML & CSS: Basic web design
  • JavaScript: Adding interactivity
  • API: Fetching information
  • Node.js & Express: Server-side handling

3.2 Prerequisites

  • Basic understanding of HTML, CSS, and JavaScript
  • A code editor (e.g., Visual Studio Code)
  • Web browser (for testing)

3.3 Setting up the environment

  1. Install Node.js and npm: Make sure Node.js is installed on your system. If it is not, download and install it from the official Node.js site; npm is included with it. Verify the installation:
node -v
npm -v
  2. Create a folder and open it in your code editor

Creating an Information Bot: A Beginner

  3. Initialize a Node.js project: This creates a package.json file for managing dependencies.
npm init -y

  4. Install Required Dependencies: You will need express, axios, dotenv, @google/generative-ai, multer, body-parser, and cors for this setup:
npm install express axios dotenv cors @google/generative-ai multer body-parser

  • express: This tool helps build a web server that listens for and responds to requests. For example, it manages everything from showing web pages to accepting images or text from users.

  • axios: This tool is used to make requests to other servers or APIs (like calling another website to get data). It sends and receives data over the internet, making it easy to connect your app to external services.

  • dotenv: This tool loads secrets (like API keys or passwords) from a file called .env into your app's environment. Keeping secrets in .env, and out of anything you share, helps you avoid accidentally exposing them.

  • @google/generative-ai: This package is Google's JavaScript SDK for the Gemini API. It sends user inputs (like text) to the model and returns AI-generated responses for your app to use.

  • @google/generative-ai/server: This is not a separate install but an import path inside the @google/generative-ai package. It provides the file manager used to upload files such as images to Gemini for processing.

  • multer: This tool is used to handle file uploads, like when users send images or other files to your server. It saves these files in a specific folder so your server can use them.

  • body-parser: This tool allows the server to easily understand data (like text or form data) sent from the user’s browser. It helps grab that data and make it usable in the code.

  • cors: This tool allows your server to accept requests from different websites or apps. Normally, browsers block certain requests for safety, but cors enables you to safely handle requests from other sites.

  5. Creating the API Key
  • What is an API key?
    An API key is like a special password that lets programs talk to each other. It keeps things secure by making sure only allowed users can access a service.
  • Why use an API key?
    Because the key identifies who is calling, it helps prevent misuse and keeps your information safe. It also lets the service provider see how much the service is being used, so they can manage it better.

  • Gemini API Key
    The Gemini API key is crucial for my chatbot project as it allows the bot to access advanced AI features. This key enables the chatbot to understand and generate responses based on user inputs and uploaded images. By using this API, I can enhance the chatbot's intelligence and provide a better experience for users seeking assistance.

  • How to create a Gemini API Key
    Go to Google AI Studio. If you don’t have an account, sign up for one.
    Scroll down until you see the section shown in the image below.

Click on Get your API key. This leads you to the page shown below.

Once on this page, click the blue button labeled Create API key. This leads to another page, where you create an API key either for a new project or for an existing one.
I chose to create the key for a new project, since I am working on a new project.

Once your API key is created, you can copy it and use it in your project.

Remember the tip here, to use your API key securely.
You should keep your API key secret because it acts like a password for your application. If someone finds it, they could misuse it to access your data or services, leading to security issues or extra costs. Keeping it private helps protect your project and ensures it runs smoothly.

  6. Creating the Server and Hiding the API Key
  • Create a server.js file: This will contain your backend code to handle incoming requests from the chatbot, make calls to the Gemini API, and respond with the generated messages.
  • Create the .env file: In the root of your project, create a .env file to store the Gemini API key (the server code reads it as GEMINI_API_KEY).
    This will be the final project structure

4. Building the Chatbot

4.1 Designing the Chatbot Interface

  • Creating the HTML structure


    
    
    
    
[HTML listing: a page titled "ImageBot" with a "ChatBot" header]

This HTML page is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a div container, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, there is a chatHistory section where all previous conversations (messages sent and received) are displayed.

For interacting with the bot, there's an inputSession section. It contains a field where users can type their message and an option to preview any image they select before sending it. The image upload button is represented by an icon, and users can select an image from their device. After the message or image is ready, they can send it by clicking the "Send" button.

Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.

  • Styling with CSS
body {
    background-color: #f5f5f5;
    font-family: 'Arial', sans-serif;
    margin: 0;
    padding: 0;
    display: flex;
    justify-content: center;
    align-items: center;
    height: 100vh;
}

/* ChatBot container */
.container.chatBot {
    background-color: #ffffff;
    width: 50%;
    max-width: 600px;
    border-radius: 8px;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    padding: 20px;
    position: relative;
}

/* Header styling */
.header {
    font-size: 24px;
    color: #333;
    text-align: center;
    margin-bottom: 15px;
}

/* Chat history styling */
.chatHistory {
    height: 300px;
    overflow-y: auto;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    margin-bottom: 20px;
}

.chatHistory::-webkit-scrollbar {
    width: 8px;
}

.chatHistory::-webkit-scrollbar-thumb {
    background-color: #ccc;
    border-radius: 4px;
}

/* Input session styling */
.inputSession {
    display: flex;
    align-items: center;
    padding: 10px;
    background-color: #f1f1f1;
    border-radius: 8px;
    border: 1px solid #ddd;
    justify-content: space-between;
}

/* Input field styling */
#textInput {
    width: 100%;
    flex-grow: 1;
    padding: 8px;
    border: 1px solid #ddd;
    border-radius: 4px;
    background-color: #fff;
    box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
    margin-right: 10px;
    font-size: 16px;
}

/* Button for sending messages */
#btnSend {
    color: #fff;
    border: none;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    font-size: 20px;
    transition: background-color 0.3s;
}

#btnSend:hover {
    background-color: #363e47;
}

/* Image preview styling */
.imagePreview {
    display: flex;
    align-items: center;
    flex-grow: 1;
    margin-bottom: 10px;
}

#previewImage {
    max-width: 80px;
    max-height: 80px;
    border-radius: 5px;
    margin-right: 10px;
    object-fit: cover;
}

/* Label for file input */
label[for="imageInput"] {
    color: #fff;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    display: flex;
    align-items: center;
    justify-content: center;
    font-size: 20px;
    cursor: pointer;
    margin-right: 10px;
}

label[for="imageInput"]:hover {
    background-color: #363e47;
}

/* Styling for user messages */
.userMessage {
    display: flex;
    align-items: flex-start;
    margin-bottom: 10px;
    padding: 10px;
    background-color: #e9ecef;
    border-radius: 8px;
    border: 1px solid #ddd;
    max-width: 100%;
}

/* Container for image and text */
.messageContent {
    display: flex;
    flex-direction: column;
    align-items: flex-start;
}

/* Styling for images within user messages */
.userMessage img {
    max-width: 100px; 
    max-height: 100px; 
    border-radius: 5px;
    margin-bottom: 5px; 
    object-fit: cover;
}

/* Styling for text within user messages */
.userMessage .text {
    text-align: left;
}

/* Modal styling */
.modal {
    display: none; 
    position: fixed; 
    z-index: 1000; 
    left: 0;
    top: 0;
    width: 100%;
    height: 100%;
    overflow: auto; 
    background-color: rgb(0,0,0); 
    background-color: rgba(0,0,0,0.8); 
}

.modal-content {
    margin: auto;
    display: block;
    width: 80%;
    max-width: 700px;
}

.close {
    position: absolute;
    top: 15px;
    right: 35px;
    color: #f1f1f1;
    font-size: 40px;
    font-weight: bold;
}

.close:hover,
.close:focus {
    color: #bbb;
    text-decoration: none;
    cursor: pointer;
}

This CSS provides a clean, responsive design for a chatbot interface. It uses flexbox for layout, giving the container, header, and input session flexibility. The styling ensures a user-friendly experience with smooth transitions, scrollable chat history, and a modern look with rounded corners and shadows, enhancing visual appeal and usability.

4.2 Implementing Chatbot Functionalities

4.2.1 Frontend

Here the .inputSession div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.

const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

// Function to preview the image when selected
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }


});

function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

  • Element Selection
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");

This section selects key HTML elements that will be used throughout the script. btnSend is the button used to send messages, imageInput is the input field for uploading images, textInput is where the user types their message, previewImage shows a preview of the selected image, and chatHistory is the area where the conversation is displayed.

  • Modal Elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");

These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage for the image inside the modal, and closeModal for the button to close the modal.

  • Image Preview
imageInput.addEventListener("change", function () {
  const file = imageInput.files[0];
  if (file) {
    const reader = new FileReader();
    reader.onload = function (e) {
      previewImage.src = e.target.result; // Preview the image
    };
    reader.readAsDataURL(file); // Read the file as a data URL
  }
});

When a user selects an image file using imageInput, this event listener triggers. It uses a FileReader to read the image file and set the previewImage source to the result. This allows the user to see a preview of the image before sending it.
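
Before reading a file for preview, it can help to reject files that are not images or that are very large. The sketch below is one way to do that. The name validateImageFile and the 5 MB limit are assumptions for illustration and are not part of the original script; the helper relies only on the type and size properties that browser File objects expose.

```javascript
// Hypothetical helper: checks a File-like object before previewing it.
// The 5 MB cap is an assumed limit chosen for illustration.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

function validateImageFile(file) {
  if (!file) {
    return { ok: false, reason: "No file selected." };
  }
  // Browsers report a MIME type such as "image/png" on File objects
  if (typeof file.type !== "string" || !file.type.startsWith("image/")) {
    return { ok: false, reason: "Only image files are supported." };
  }
  if (file.size > MAX_IMAGE_BYTES) {
    return { ok: false, reason: "Image is larger than 5 MB." };
  }
  return { ok: true, reason: null };
}
```

Inside the change listener, the result could gate the preview: create the FileReader only when validateImageFile(imageInput.files[0]).ok is true.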

  • Send Image and Text
btnSend.addEventListener("click", async function (e) {
  e.preventDefault();
  const text = textInput.value.trim();
  const file = imageInput.files[0];

  // Clear inputs
  textInput.value = "";
  previewImage.src = "";
  imageInput.value = null;

  // Append the image and message to the chat immediately on the UI
  if (file) {
    addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
  } else if (text) {
    addMessageToChatHistory(null, text, "userMessage");
  }

  // Send the image and text to the backend
  if (file || text) {
    const formData = new FormData();
    if (file) formData.append("image", file);
    if (text) formData.append("message", text);

    try {
      const response = await fetch('http://localhost:3001/api/upload', {
        method: 'POST',
        body: formData
      });

      if (!response.ok) {
        throw new Error('Failed to send image to the server.');
      }

      const data = await response.json();

      // Display the bot's message (response) based on the image
      addMessageToChatHistory(null, data.reply, "botMessage");
    } catch (error) {
      console.error('Error sending image or text:', error);
      addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
    }
  }
});

This code handles the click event of the btnSend button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
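
The handler passes data.reply straight to the chat history. If the server ever answers with JSON that has no reply field, or an empty one, the bot message would be blank. A minimal sketch of a guard follows; extractReply is a hypothetical helper and the fallback strings are assumptions.

```javascript
// Hypothetical helper: picks the text to display from the server's JSON body.
function extractReply(data) {
  // Normal case: the server returned a non-empty reply string
  if (data && typeof data.reply === "string" && data.reply.trim() !== "") {
    return data.reply;
  }
  // The server in this tutorial reports failures as { error: "..." }
  if (data && typeof data.error === "string") {
    return "Server error: " + data.error;
  }
  // Assumed fallback text when nothing usable came back
  return "The bot did not return a reply.";
}
```

With this in place, the call in the handler could become addMessageToChatHistory(null, extractReply(data), "botMessage").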

  • Add Message to Chat History
function addMessageToChatHistory(imageSrc, text, className) {
  const messageContainer = document.createElement("div");
  messageContainer.classList.add(className);

  if (imageSrc) {
    const imageElement = document.createElement("img");
    imageElement.src = imageSrc;
    imageElement.classList.add("previewed-image");
    messageContainer.appendChild(imageElement);
  }

  if (text) {
    const textContainer = document.createElement("div");
    textContainer.classList.add("text");
    textContainer.textContent = text;
    messageContainer.appendChild(textContainer);
  }

  chatHistory.appendChild(messageContainer);
  chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}

This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.

  • Modal Handling
closeModal.addEventListener("click", function (e) {
  e.preventDefault();
  modal.style.display = "none";
});

window.addEventListener("click", function (e) {
  if (e.target === modal) {
    modal.style.display = "none";
  }
});

These event listeners manage the modal for viewing larger images. Clicking the close button or outside the modal hides it by setting modal.style.display to "none".
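
These listeners only hide the modal; the script as shown does not include code that opens it, and modalImage is never assigned. One possible way to fill that gap is sketched below. The function openImageModal and the wiring in the comment are assumptions, not the author's code.

```javascript
// Hypothetical helper: shows the modal with the given image source.
// `modal` and `modalImage` are the elements selected earlier in the script.
function openImageModal(modal, modalImage, src) {
  modalImage.src = src;          // Load the clicked image into the modal
  modal.style.display = "block"; // The CSS hides .modal with display: none
}

// Assumed wiring inside addMessageToChatHistory, after imageElement is created:
// imageElement.addEventListener("click", () =>
//   openImageModal(modal, modalImage, imageElement.src));
```

Because the helper receives the elements as arguments, it can be tried out with plain objects as well as real DOM nodes.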

  • Send Message on Enter Key Press
textInput.addEventListener("keydown", (e) => {
  if (e.key === "Enter") {
    btnSend.click();
  }
});

This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.
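
Users who type with an input method editor (IME) also press Enter to confirm a composition, which would send a half-typed message. A small sketch of a stricter check follows; shouldSendOnKey is a hypothetical name, while key, shiftKey, and isComposing are standard KeyboardEvent properties.

```javascript
// Hypothetical helper: decides whether a keydown event should send the message.
function shouldSendOnKey(event) {
  return (
    event.key === "Enter" &&
    !event.shiftKey &&    // Leave Shift+Enter free for other uses
    !event.isComposing    // Ignore Enter while an IME composition is active
  );
}
```

The listener body would then read: if (shouldSendOnKey(e)) btnSend.click();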

4.2.2 Backend (server)

  • Importing dependencies
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

These dependencies were explained above, where we were setting up our environment. You can scroll up to get their uses to better understand why we are using them.

  • Configuring Environment Variables
dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

  • dotenv.config(): Loads environment variables from the .env file.
  • apiKey: Retrieves the API key from environment variables to authenticate requests to the Google Generative AI API.
  • genAI: Initializes the GoogleGenerativeAI instance with the API key.
  • fileManager: Initializes the GoogleAIFileManager instance with the same API key for handling file uploads.
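
If the .env file is missing or the variable name is misspelled, apiKey is undefined and the failure only shows up later, when the first request reaches Gemini. A minimal sketch of a fail-fast check is shown below; requireEnv is a hypothetical helper, not part of dotenv or the Gemini SDK.

```javascript
// Hypothetical helper: returns the named variable or throws a clear error.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}. Add it to your .env file.`);
  }
  return value;
}

// Assumed usage in server.js, after dotenv.config():
// const apiKey = requireEnv("GEMINI_API_KEY");
```

The second parameter exists only so the function can be exercised with a plain object instead of the real process environment.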

  • Setting Up AI Model and Configuration

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

  • model: Configures and initializes the generative model (Gemini 1.5 Pro) from Google AI, specifying which model to use for generating responses.
  • generationConfig: Defines parameters for generating responses, including temperature (controls creativity), topP and topK (control the diversity of responses), and maxOutputTokens (maximum length of the response).
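
To make "temperature controls creativity" more concrete, the sketch below applies a temperature to a made-up list of candidate scores and converts them to probabilities. It is a generic illustration of temperature scaling with invented numbers, not a description of how Gemini computes its output; softmaxWithTemperature and the scores are assumptions.

```javascript
// Illustrative only: turns raw scores into probabilities at a given temperature.
// Assumes temperature > 0.
function softmaxWithTemperature(scores, temperature) {
  const scaled = scores.map((s) => s / temperature);
  const max = Math.max(...scaled); // Subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const scores = [2.0, 1.0, 0.5]; // Made-up scores for three candidate words
const sharp = softmaxWithTemperature(scores, 0.5); // Low temperature
const flat = softmaxWithTemperature(scores, 2.0);  // High temperature
```

At the lower temperature the top candidate takes a larger share of the probability, so output is more predictable; at the higher temperature the shares are closer together, so less likely words are picked more often.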

  • Configuring Multer for File Uploads

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Prefix a timestamp to avoid duplicate names
  }
});
const upload = multer({ storage: storage });

  • multer.diskStorage(): Configures how files are stored.
  • destination: Specifies the directory (uploads/) where files should be saved.
  • filename: Renames the file by prefixing the current timestamp, which makes name collisions unlikely.
  • upload: Creates a Multer instance with the defined storage configuration.
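
The original file name comes from the client, so it may contain spaces, path separators, or other characters that are awkward on disk. The sketch below shows one way to build a safer name; makeUploadName is a hypothetical helper and the allowed character set is an assumption.

```javascript
// Hypothetical helper: builds a storage name from the client-supplied file name.
function makeUploadName(originalName, now = Date.now()) {
  // Keep only the last path segment, in case the name contains slashes
  const base = String(originalName).split(/[\\/]/).pop();
  // Replace anything outside a conservative character set with "_"
  const safe = base.replace(/[^a-zA-Z0-9._-]/g, "_");
  return `${now}-${safe}`;
}

// Assumed usage in the Multer config:
// filename: (req, file, cb) => cb(null, makeUploadName(file.originalname))
```

Separately, Multer's documentation notes that when destination is given as a function, the directory is not created for you, so the uploads/ folder has to exist before the first upload.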

  • Uploading Files to Gemini

async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

  • uploadToGemini(): A function to upload a file to the Google Gemini API.
  • fileManager.uploadFile(): Uploads the file to the API and logs the result.
  • file: Contains the details of the uploaded file returned from the API.

  • Configuring Express and Middleware

const app = express();
const port = 3001;

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

  • app: Initializes an Express application.
  • port: Sets the port on which the server will listen (3001).
  • app.use(cors()): Enables CORS for the server.
  • app.use(bodyParser.json()): Parses JSON bodies.
  • app.use(bodyParser.urlencoded({ extended: true })): Parses URL-encoded bodies.
  • app.use(express.static("public")): Serves static files like HTML, CSS, and JS from the public directory.
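
Calling cors() with no options accepts requests from any origin, which is convenient during development. If you later want to limit which sites may call the API, the cors package accepts an origin function. The sketch below shows the shape of such a function; the allow-list contents are an assumption and should be replaced with your own origins.

```javascript
// Hypothetical allow-list; replace with the origins your frontend is served from.
const allowedOrigins = ["http://localhost:3001"];

// Matches the (origin, callback) signature the cors package expects
function checkOrigin(origin, callback) {
  // Requests without an Origin header (same-origin pages, curl) arrive as undefined
  if (!origin || allowedOrigins.includes(origin)) {
    callback(null, true);
  } else {
    callback(new Error("Origin not allowed by CORS"));
  }
}

// Assumed usage: app.use(cors({ origin: checkOrigin }));
```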

  • API Endpoint for Handling Image and Text

app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // Accompanying text, if any
    const imagePath = req.file ? req.file.path : null; // Uploaded image, if any

    // The chat history stays empty unless an image was uploaded
    const history = [];

    if (imagePath) {
      // Use the MIME type reported by Multer instead of assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      history.push({
        role: "user",
        parts: [
          {
            fileData: {
              mimeType: file.mimeType,
              fileUri: file.uri,
            },
          },
        ],
      });
    }

    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }

    const chatSession = model.startChat({ generationConfig, history });
    // Assumption: fall back to a generic prompt when only an image was sent
    const result = await chatSession.sendMessage(message || "Describe this image.");

    // Send exactly one response per request
    res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});

  • app.post("/api/upload"): Defines a POST endpoint for handling file and text uploads.
  • upload.single("image"): Middleware to handle single file upload (named image).
  • req.body: Contains the text message.
  • req.file: Contains the uploaded image file.
  • uploadToGemini(): Uploads the image to the Gemini API.
  • model.startChat(): Starts a chat session with the model.
  • chatSession.sendMessage(message): Sends the message (and image if provided) to the model.
  • res.status(200).json({ reply: result.response.text() }): Sends the generated response back to the client.
  • res.status(400): Handles cases where neither image nor text is provided.
  • res.status(500): Handles server errors.
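
The part of the endpoint that turns an uploaded file into chat history can be pulled out into a small function, which makes the image and text-only cases easier to compare. The sketch below is one way to write it; buildChatHistory is a hypothetical name, and the object shape mirrors the history entry used in the endpoint above.

```javascript
// Hypothetical helper: builds the `history` array passed to model.startChat().
// `uploadedFile` is the object returned by uploadToGemini(), or null for text-only requests.
function buildChatHistory(uploadedFile) {
  if (!uploadedFile) {
    return []; // Text-only request: start with an empty history
  }
  return [
    {
      role: "user",
      parts: [
        {
          fileData: {
            mimeType: uploadedFile.mimeType,
            fileUri: uploadedFile.uri,
          },
        },
      ],
    },
  ];
}
```

A call such as model.startChat({ generationConfig, history: buildChatHistory(file) }) would then cover both cases with one code path.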

  • Starting the Server

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});

  • app.listen(port): Starts the server and listens on the specified port (3001).
  • console.log: Confirms that the server is running and accessible at http://localhost:3001.
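
Hosting platforms often supply the port through an environment variable rather than letting the app pick one. A minimal sketch of reading it with a fallback to 3001 follows; resolvePort is a hypothetical helper, and the PORT variable name is a common convention assumed here rather than something the tutorial defines.

```javascript
// Hypothetical helper: reads PORT from the environment, falling back to 3001.
function resolvePort(env = process.env, fallback = 3001) {
  const parsed = Number.parseInt(env.PORT, 10);
  const valid = Number.isInteger(parsed) && parsed > 0 && parsed < 65536;
  return valid ? parsed : fallback;
}

// Assumed usage in server.js:
// const port = resolvePort();
```

Note that the frontend in this tutorial calls http://localhost:3001 directly, so changing the port also means updating that URL.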

The rest of this section walks through the /api/upload endpoint from start to finish.

The endpoint for handling image and text uploads is registered with app.post() at the /api/upload route, so it responds to POST requests. The upload.single("image") middleware, provided by Multer, runs before the handler and processes a single file upload whose form field name is image.

When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.

The handler first checks if an image file was provided by examining req.file. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini() function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.

If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.

In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.

If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.

const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors");  // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");

dotenv.config();

const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});

const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};

const app = express();
const port = 3001;

// Setup multer for file uploads
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Prefix a timestamp so file names do not collide
  }
});
const upload = multer({ storage: storage });

/**
 * Uploads the given file to Gemini.
 *
 * See https://ai.google.dev/gemini-api/docs/prompting_with_media
 */
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}

app.use(cors());  // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));

// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));

// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // This is the accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // This is the image (if any)

    // If an image is provided, upload it to Gemini and pass it in the chat history
    if (imagePath) {
      // Use the MIME type reported by multer instead of assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);

      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // Assumption: fall back to a default prompt when the image arrives without text
      const result = await chatSession.sendMessage(message || "Describe this image.");
      // Return here so only one response is sent per request
      return res.status(200).json({ reply: result.response.text() });
    }

    // No image: handle the text message on its own
    if (message) {
      const chatSession = model.startChat({
        generationConfig,
        history: [],
      });
      const result = await chatSession.sendMessage(message);
      return res.status(200).json({ reply: result.response.text() });
    }

    // If neither image nor text is provided, return an error
    return res.status(400).json({ error: "No image or text provided" });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});



// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
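
On the client side, a request to this endpoint could be sent roughly as follows. This is a minimal sketch, not the article's actual frontend code: the helper names buildChatForm and sendToBot are hypothetical, and it assumes the server above is running at http://localhost:3001 in an environment with built-in fetch, FormData and Blob (a modern browser or Node 18+). The field names "message" and "image" match what the handler reads from req.body and upload.single("image").

```javascript
// Hypothetical helper: packs the text and optional image into multipart form data.
// The field names "message" and "image" mirror the server handler above.
function buildChatForm(message, imageBlob, fileName = "upload.jpg") {
  const form = new FormData();
  if (message) form.append("message", message);
  if (imageBlob) form.append("image", imageBlob, fileName);
  return form;
}

// Hypothetical helper: posts the form and resolves with the bot's reply text.
// Assumes the server listens on http://localhost:3001 as configured above.
async function sendToBot(message, imageBlob, baseUrl = "http://localhost:3001") {
  const response = await fetch(`${baseUrl}/api/upload`, {
    method: "POST",
    // No Content-Type header here: fetch adds the multipart boundary itself
    body: buildChatForm(message, imageBlob),
  });
  const data = await response.json();
  if (!response.ok) {
    // The server answers 400 or 500 with a body of the form { error: "..." }
    throw new Error(data.error || `Request failed with status ${response.status}`);
  }
  return data.reply;
}
```

Keeping the form construction separate from the network call makes it possible to check the payload without a running server.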

  • Removing markdown
    In the chat application, I used the Marked library to convert Markdown text into HTML for bot messages by including it on the page. When a message is displayed, the code checks its class name: for bot messages it applies textContainer.innerHTML = marked.parse(text); to render Markdown as HTML, and for user messages it uses textContainer.textContent = text; to display plain text, ensuring clarity in interactions.

  • Markdown Text


  • After markdown has been removed

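
The rendering rule described under "Removing markdown" can be sketched as below. This is a simplified illustration rather than the article's exact code: the function name renderMessage, the sender values "bot" and "user", and passing the parser in as a parameter are assumptions made so the snippet runs on its own. In the page itself, the parser would be marked.parse and textContainer a real DOM element.

```javascript
// Hypothetical helper illustrating the two rendering paths.
// `textContainer` is any object with innerHTML/textContent (a DOM element in the browser).
// `parseMarkdown` is the Markdown-to-HTML function, for example marked.parse.
function renderMessage(textContainer, text, sender, parseMarkdown) {
  if (sender === "bot") {
    // Bot replies may contain Markdown, so convert them to HTML first
    textContainer.innerHTML = parseMarkdown(text);
  } else {
    // User input is shown verbatim; textContent never interprets it as HTML
    textContainer.textContent = text;
  }
  return textContainer;
}
```

Marked does not sanitize the HTML it produces, so it may be worth running bot replies through an HTML sanitizer before assigning them to innerHTML.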

5. Testing and Debugging

Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application's robustness and user satisfaction.
While developing the chatbot, I faced many errors that had to be debugged before it functioned well.
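
One way to exercise such edge cases without calling the Gemini API is to pull the branching decision into a small pure function and check it directly. The sketch below is an illustration under assumptions: classifyRequest is a hypothetical helper that is not part of the server code, and it only mirrors the handler's three outcomes (image, text only, or neither, where the last case corresponds to the 400 response).

```javascript
// Hypothetical helper mirroring the handler's branching:
// an image takes priority over text, and an empty request is rejected.
function classifyRequest(file, message) {
  if (file) return "image";
  if (message) return "text";
  return "empty"; // the server answers this case with status 400
}

// Edge cases: image with text, text only, empty string, nothing at all
const checks = [
  [{ path: "uploads/cat.png" }, "What is this?", "image"],
  [undefined, "Hello", "text"],
  [undefined, "", "empty"],
  [undefined, undefined, "empty"],
];

for (const [file, message, expected] of checks) {
  const actual = classifyRequest(file, message);
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}
```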

5.1 Text queries


5.2 Image queries

6. Problems faced and conclusions

6.1 Some difficulties faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially debugging responses from the API.

Another issue was that it was my first time working with the Gemini API. The unfamiliarity with the API led to a learning curve, causing delays in progress.

Another problem I faced was electricity outages, which interrupted my workflow and extended the project timeline.

One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.

Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.

6.2 Conclusion
Working on this project taught me valuable lessons:

  • Gemini API Integration: I developed skills in API integration, particularly using the Gemini API for generating responses based on inputs.

  • Problem-solving: I learned how to systematically debug and troubleshoot issues, improving my resilience in overcoming project obstacles.

  • Time Management: The delays caused by power outages and bugs helped me practice time management and adaptability under pressure.

  • Collaborating for Solutions: Reaching out for help when needed and learning from others was an important takeaway in this project.

  • Practical Experience: The hands-on experience with API and front-end integration improved my proficiency in JavaScript.

Version statement: This article is reproduced from https://dev.to/kedjuprecious/creating-an-information-bot-a-beginners-guide-htmlcss-javascript-gemini-api-260f?1. If there is any infringement, please contact [email protected] to have it removed.