I Tried Out Granite 3.0

Published on November 8, 2024

Granite 3.0

Granite 3.0 is a lightweight, open-source family of generative language models designed for a range of enterprise-level tasks. It natively supports multilingual use, coding, reasoning, and tool usage, making it well suited to enterprise environments.

I tested the models to see which tasks they can handle.

Setting Up the Environment

I set up a Granite 3.0 environment in Google Colab and installed the required libraries with the following commands:

!pip install torch torchvision torchaudio
!pip install accelerate
!pip install -U transformers
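
After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and it assumes the distribution names match the names passed to pip above.

```python
from importlib import metadata

def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions

# Packages installed by the pip commands above.
report = installed_versions(["torch", "torchvision", "torchaudio", "accelerate", "transformers"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

A None entry means the corresponding pip command should be run again before loading the model.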

Execution

I tested the Granite 3.0 2B and 8B models.

2B Model

I started with the 2B model. Here is the sample code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "auto"
model_path = "ibm-granite/granite-3.0-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)  # follow wherever device_map placed the model
output = model.generate(**input_tokens, max_new_tokens=100)
output = tokenizer.batch_decode(output)
print(output[0])

Output

userPlease list one IBM Research laboratory located in the United States. You should only output its name and location.
assistant1. IBM Research - Austin, Texas

8B Model

To use the 8B model, replace 2b with 8b in the model path. The sample code below also decodes only the newly generated tokens, so the printed output omits the role markers and the user input:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "auto"
model_path = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

chat = [
    { "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

input_tokens = tokenizer(chat, add_special_tokens=False, return_tensors="pt").to(model.device)  # follow wherever device_map placed the model
output = model.generate(**input_tokens, max_new_tokens=100)
generated_text = tokenizer.decode(output[0][input_tokens["input_ids"].shape[1]:], skip_special_tokens=True)
print(generated_text)

Output

1. IBM Almaden Research Center - San Jose, California
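
The slice in the decode call works because, for a decoder-only model, generate returns the prompt tokens followed by the newly generated ones. Here is a minimal sketch of that idea with plain lists; the token IDs and the helper name strip_prompt are made up for illustration and do not come from the Granite tokenizer.

```python
def strip_prompt(full_output_ids, prompt_length):
    """Drop the first prompt_length token IDs, keeping only the generated continuation."""
    return full_output_ids[prompt_length:]

# Hypothetical token IDs: the prompt, then what the model appended to it.
prompt_ids = [11, 22, 33, 44]
generated_ids = [55, 66, 77]
full_output_ids = prompt_ids + generated_ids  # same layout as generate() output

new_ids = strip_prompt(full_output_ids, len(prompt_ids))
print(new_ids)
```

In the article's code, input_tokens["input_ids"].shape[1] plays the role of len(prompt_ids).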

Function Calling

I explored the function-calling feature by testing it with a dummy function. Here, get_current_weather is defined to return mock weather data.

Dummy Function

import json

def get_current_weather(location: str) -> dict:
    """
    Retrieves current weather information for the specified location.
    Args:
        location (str): Name of the city to retrieve weather data for.
    Returns:
        dict: Dictionary containing weather information (temperature, description, humidity).
    """
    print(f"Getting current weather for {location}")

    try:
        weather_description = "sample"
        temperature = "20.0"
        humidity = "80.0"

        return {
            "description": weather_description,
            "temperature": temperature,
            "humidity": humidity
        }
    except Exception as e:
        print(f"Error fetching weather data: {e}")
        return {"weather": "NA"}

Creating the Prompt

I created a prompt for function calling:

functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and country code, e.g. San Francisco, US",
                }
            },
            "required": ["location"],
        },
    },
]
query = "What's the weather like in Boston?"
payload = {
    "functions_str": [json.dumps(x) for x in functions]
}
chat = [
    {"role":"system","content": f"You are a helpful assistant with access to the following function calls. Your task is to produce a sequence of function calls necessary to generate response to the user utterance. Use the following function calls as required.{payload}"},
    {"role": "user", "content": query }
]
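
It can be useful to look at what the model actually receives when the payload is interpolated into the system message. The sketch below rebuilds the same payload and renders it the way the f-string does; it only inspects the resulting string and makes no claim about which format the model prefers. The variable name rendered is introduced here for illustration.

```python
import json

functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and country code, e.g. San Francisco, US",
                }
            },
            "required": ["location"],
        },
    },
]

# Each function schema is serialized to a JSON string, then wrapped in a dict.
payload = {"functions_str": [json.dumps(x) for x in functions]}

# An f-string inserts the Python repr of the dict: single quotes outside, JSON inside.
rendered = f"{payload}"
print(rendered)

# The JSON strings inside the payload still round-trip to the original schema.
assert json.loads(payload["functions_str"][0]) == functions[0]
```

So the system message ends with a Python-style dict whose list items are JSON strings.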

Generating the Response

I generated a response with the following code:

instruction_1 = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(instruction_1, return_tensors="pt").to(model.device)  # follow wherever device_map placed the model
output = model.generate(**input_tokens, max_new_tokens=1024)
generated_text = tokenizer.decode(output[0][input_tokens["input_ids"].shape[1]:], skip_special_tokens=True)
print(generated_text)

Output

{'name': 'get_current_weather', 'arguments': {'location': 'Boston'}}

This confirmed that the model can generate the correct function call for the city named in the query.
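
The generated text is only a description of the call, so something still has to execute it. Below is a minimal sketch of that step, assuming the output looks exactly like the one shown above. Because that output uses single quotes, it is a Python literal rather than valid JSON, so the sketch parses it with ast.literal_eval. The registry AVAILABLE_FUNCTIONS and the helper run_function_call are hypothetical names, and the weather function is a shortened stand-in for the dummy function defined earlier.

```python
import ast

def get_current_weather(location: str) -> dict:
    # Shortened stand-in for the dummy function defined earlier in the article.
    return {"description": "sample", "temperature": "20.0", "humidity": "80.0"}

# Hypothetical registry: only functions listed here may be called by the model.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def run_function_call(generated_text: str) -> dict:
    """Parse a call such as {'name': ..., 'arguments': {...}} and execute it."""
    call = ast.literal_eval(generated_text.strip())  # safe parsing of Python literals
    func = AVAILABLE_FUNCTIONS[call["name"]]         # KeyError for unknown functions
    return func(**call["arguments"])

result = run_function_call(
    "{'name': 'get_current_weather', 'arguments': {'location': 'Boston'}}"
)
print(result)
```

Looking the name up in an explicit registry, rather than using eval or globals(), keeps the model from triggering arbitrary code.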

Format Specification for an Extended Interaction Flow

Granite 3.0 can be prompted to follow a specified format, which makes it easier to get responses in a structured form. This section explains how to use [UTTERANCE] for spoken responses and [THINK] for inner thoughts.

Function calls, on the other hand, are output as plain text, so you may need to implement a separate mechanism to distinguish them from ordinary text responses.
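
One possible way to tell the two apart is to try parsing the output and check its shape. This is a sketch under stated assumptions, not a robust detector: it assumes a function call is the entire output and is a dict with a string "name" and a dict "arguments", as in the function-calling output shown earlier. The helper name extract_function_call is hypothetical.

```python
import ast
import json
from typing import Optional

def extract_function_call(text: str) -> Optional[dict]:
    """Return the parsed call if text looks like a function call, otherwise None."""
    candidate = text.strip()
    # Try strict JSON first, then Python-literal syntax (single-quoted dicts).
    for parser in (json.loads, ast.literal_eval):
        try:
            parsed = parser(candidate)
        except (ValueError, SyntaxError, TypeError):
            continue
        if (
            isinstance(parsed, dict)
            and isinstance(parsed.get("name"), str)
            and isinstance(parsed.get("arguments"), dict)
        ):
            return parsed
    return None

call = extract_function_call("{'name': 'get_current_weather', 'arguments': {'location': 'Boston'}}")
plain = extract_function_call("Hello! How can I assist you today?")
print(call, plain)
```

Anything that returns None can be treated as an ordinary text response and shown to the user as is.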

Specifying the Output Format

Here is an example prompt for controlling the AI's output:

prompt = """You are a conversational AI assistant that deepens interactions by alternating between responses and inner thoughts.

* Record spoken responses after the [UTTERANCE] tag and inner thoughts after the [THINK] tag.
* Use [UTTERANCE] as a start marker to begin outputting an utterance.
* After [THINK], describe your internal reasoning or strategy for the next response. This may include insights on the user's reaction, adjustments to improve interaction, or further goals to deepen the conversation.
* Important: **Use [UTTERANCE] and [THINK] as a start signal without needing a closing tag.**


Follow these instructions, alternating between [UTTERANCE] and [THINK] formats for responses.

example1:
  [UTTERANCE]Hello! How can I assist you today?[THINK]I’ll start with a neutral tone to understand their needs. Preparing to offer specific suggestions based on their response.[UTTERANCE]Thank you! In that case, I have a few methods I can suggest![THINK]Since I now know what they’re looking for, I'll move on to specific suggestions, maintaining a friendly and approachable tone.
...
example>

Please respond to the following user_input.

Hello! What can you do?

"""

Execution Code Example

The code for generating the response:

chat = [
    { "role": "user", "content": prompt },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)  # follow wherever device_map placed the model
output = model.generate(**input_tokens, max_new_tokens=1024)
generated_text = tokenizer.decode(output[0][input_tokens["input_ids"].shape[1]:], skip_special_tokens=True)
print(generated_text)

Example Output

The output is as follows:

[UTTERANCE]Hello! I'm here to provide information, answer questions, and assist with various tasks. I can help with a wide range of topics, from general knowledge to specific queries. How can I assist you today?
[THINK]I've introduced my capabilities and offered assistance, setting the stage for the user to share their needs or ask questions.

The model used the [UTTERANCE] and [THINK] tags correctly, so the response came back in the requested format.

Depending on the prompt, closing tags (such as [/UTTERANCE] or [/THINK]) may occasionally appear in the output, but in general the output format can be specified successfully.
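
To consume this format in an application, the tagged text has to be split into segments. The following is a minimal sketch of such a parser that also tolerates the occasional closing tag mentioned above. The function name parse_tagged_output and the (tag, text) tuple layout are choices made for this example.

```python
import re

# Matches [UTTERANCE], [THINK], and their optional closing forms.
TAG_PATTERN = re.compile(r"\[(/?)(UTTERANCE|THINK)\]")

def parse_tagged_output(text):
    """Split model output into a list of (tag, content) tuples."""
    segments = []
    current_tag = None  # tag whose content is currently being collected
    last_end = 0
    for match in TAG_PATTERN.finditer(text):
        chunk = text[last_end:match.start()].strip()
        if current_tag and chunk:
            segments.append((current_tag, chunk))
        # A closing tag ends the current segment; an opening tag starts a new one.
        current_tag = None if match.group(1) == "/" else match.group(2)
        last_end = match.end()
    tail = text[last_end:].strip()
    if current_tag and tail:
        segments.append((current_tag, tail))
    return segments

sample = "[UTTERANCE]Hello! How can I assist you today?\n[THINK]Offer help and wait for details."
for tag, content in parse_tagged_output(sample):
    print(tag, "->", content)
```

With this split, only the UTTERANCE segments need to be shown to the user, while THINK segments can be logged or discarded.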

Streaming Code Example

Let's also look at how to stream responses.

The following code uses the asyncio and threading libraries to stream responses from Granite 3.0 asynchronously.

import asyncio
from threading import Thread
from typing import AsyncIterator
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TextIteratorStreamer,
)

device = "auto"
model_path = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

async def generate(chat) -> AsyncIterator[str]:
    # Apply chat template and tokenize input
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    input_tokens = tokenizer(chat, add_special_tokens=False, return_tensors="pt").to(model.device)  # follow wherever device_map placed the model

    # Set up the streamer
    streamer = TextIteratorStreamer(
        tokenizer,
        skip_prompt=True,
        skip_special_tokens=True,
    )
    generation_kwargs = dict(
        **input_tokens,
        streamer=streamer,
        max_new_tokens=1024,
    )
    # Generate response in a separate thread
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    for output in streamer:
        if not output:
            continue
        await asyncio.sleep(0)
        yield output

# Execute asynchronous generation in the main function
async def main():
    chat = [
        { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
    ]
    generator = generate(chat)
    async for output in generator:  # Use async for to retrieve responses sequentially
        print(output, end="|")

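# Top-level await works in notebooks such as Colab; in a plain script, use asyncio.run(main()) instead
await main()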

Example Output

Running the code above produces a streamed response in the following form:

1. |IBM |Almaden |Research |Center |- |San |Jose, |California|

This example demonstrates successful streaming. Each chunk of text is yielded as soon as it is generated and displayed in order, so users can watch the response take shape in real time.
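
One caveat: iterating over the streamer with a plain for loop blocks the event loop while it waits for the next chunk. If other coroutines need to run in the meantime, one option is to pull each chunk in a worker thread. The sketch below shows that pattern with a hypothetical fake_streamer standing in for TextIteratorStreamer, so it runs without a model. The names stream_without_blocking and collect are also made up for this example.

```python
import asyncio
import time
from typing import AsyncIterator, Iterable, Iterator

_DONE = object()  # sentinel returned by next() once the iterator is exhausted

async def stream_without_blocking(blocking_iter: Iterable[str]) -> AsyncIterator[str]:
    """Yield items from a blocking iterator without stalling the event loop."""
    iterator = iter(blocking_iter)
    loop = asyncio.get_running_loop()
    while True:
        # next() may block while waiting for output, so run it in a worker thread.
        chunk = await loop.run_in_executor(None, next, iterator, _DONE)
        if chunk is _DONE:
            break
        if chunk:  # skip empty strings, as the loop in the article does
            yield chunk

def fake_streamer() -> Iterator[str]:
    """Hypothetical stand-in for TextIteratorStreamer that emits chunks slowly."""
    for chunk in ["1. ", "IBM ", "", "Almaden ", "Research ", "Center"]:
        time.sleep(0.01)  # simulate generation latency
        yield chunk

async def collect() -> list:
    """Gather all streamed chunks into a list."""
    return [chunk async for chunk in stream_without_blocking(fake_streamer())]
```

In a script, asyncio.run(collect()) gathers the chunks. In the article's generate function, the streamer object would take the place of fake_streamer().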

Summary

Granite 3.0 produces reasonably strong responses even with the 8B model. Function calling and format specification also work reasonably well, which suggests potential for a wide range of applications.

Release statement: This article is reproduced from https://dev.to/m_sea_bass/i-tried-out-granite-30-53lm?1.