」工欲善其事,必先利其器。「—孔子《論語.錄靈公》
首頁 > 程式設計 > 建立您自己的 GitHub Copilot:程式碼完成工具逐步指南

建立您自己的 GitHub Copilot:程式碼完成工具逐步指南

發佈於2024-11-08
瀏覽:173

Ever thought building a code completion tool like GitHub Copilot was complex? Surprisingly, it’s not as hard as it seems!

As an engineer, I’ve always been fascinated by how code completion tools work under the hood. So, I reverse-engineered the process to see if I could build one myself.

Here is one i build myself and published it - LLM-Autocompleter

With AI-assisted tools becoming the norm in software development, creating your own code completion tool is a great way to learn about Language Server Protocol (LSP), APIs, and integration with advanced models like OpenAI's GPT. Plus, it’s an incredibly rewarding project.

Code completion tools essentially combine a Language Server Protocol (LSP) server with inline code completion mechanisms from platforms like VS Code. In this tutorial, we'll leverage VS Code’s inline completion API and build our own LSP server.

Before we dive in, let's understand what an LSP Server is.

Language Server Protocol (LSP)

An LSP server is a backend service that provides language-specific features to text editors or Integrated Development Environments (IDEs). It acts as a bridge between the editor (the client) and the language-specific tools, delivering features like:

  • Code completion (suggesting snippets of code as you type),

  • Go-to definition (navigating to the part of the code where a symbol is defined),

  • Error checking (highlighting syntax errors in real-time).

The idea behind the Language Server Protocol (LSP) is to standardize the protocol for how such servers and development tools communicate. This way, a single Language Server can be re-used in multiple development tools and LSP is just a protocol.

By standardizing how these servers communicate with editors through LSP, developers can create language-specific features that work seamlessly across a variety of platforms, like VS Code, Sublime Text, and even Vim.

Building Your Own GitHub Copilot: A Step-by-Step Guide to Code Completion Tools

Now that you understand the basics of LSP, let’s dive into building our own code completion tool, step by step.

We’ll begin by using a sample inline completion extension provided by VS Code. You can clone it directly from GitHub:

vscode-sample-inlinecompletion

now lets go we setting up the lsp-server, you can follow the below structure

.
├── client // Language Client
│   ├── src
│   │   ├── test // End to End tests for Language Client / Server
│   │   └── extension.ts // Language Client entry point
├── package.json // The extension manifest.
└── server // Language Server
    └── src
        └── server.ts // Language Server entry point

for more information you take a look into as well lsp-sample

Code

I would be giving you bit's of code, You have to stitch things together i want you guys to learn. The below image shows what we are going to build.

Building Your Own GitHub Copilot: A Step-by-Step Guide to Code Completion Tools

Lets go to client/src/extension.ts and remove everything from activate function

export function activate(context: ExtensionContext) {
}

lets start the setup

  1. Creating an lsp client and start it.
  • serverModule: Points to the path of the language server’s main script.
  • debugOptions: Useful for running the server in debug mode.

extension.ts

export function activate(context: ExtensionContext) {
    const serverModule = context.asAbsolutePath(
        path.join("server", "out", "server.js")
    );

    const debugOptions = { execArgv: ['--nolazy', '--
 inspect=6009'] };

  // communication with the server using Stdio
    const serverOptions: ServerOptions = {
        run: {
            module: serverModule,
            transport: TransportKind.stdio,
        },
        debug: {
            module: serverModule,
            transport: TransportKind.stdio,
            options: debugOptions
        }
    };

        const clientOptions: LanguageClientOptions = {
        documentSelector: [{ scheme: 'file' }],
        initializationOptions: serverConfiguration
    };


     client = new LanguageClient(
        'LSP Server Name',
        serverOptions,
        clientOptions
    );

  client.start();
}
  1. Receive data on lsp server go to server/src/server.ts

Some bit for information

we have different types of protocol we can follow to communicate between server and client.
for more information you can go to microsoft-lsp-docs

Why stdio? Stdio is one of the most widely supported communication protocols between clients and servers. It allows the LSP server we’re building to work not only in VS Code but also in other editors like Vim and Sublime Text.

server.ts

const methodStore: Record = {
  exit, 
  initialize,
  shutdown,
};

process.stdin.on("data", async (bufferChuck) => {
    buffer  = bufferChuck;

    while (true) {
        try {
            // Check for the Content-Length line
            const lengthMatch = buffer.match(/Content-Length: (\d )\r\n/);
            if (!lengthMatch) break;

            const contentLength = parseInt(lengthMatch[1], 10);
            const messageStart = buffer.indexOf("\r\n\r\n")   4;

            // Continue unless the full message is in the buffer
            if (buffer.length 



initialize.ts

export const initialize = (message: RequestMessage): InitializeResult => {

    return {
        capabilities: {
            completionProvider: {
                resolveProvider: true
            },
            textDocumentSync: TextDocumentSyncKind.Incremental,
            codeActionProvider: {
                resolveProvider: true 
            }
        },
        serverInfo: {
            name: "LSP-Server",
            version: "1.0.0",
        },
    };
};

exit.ts

export const exit = () => {
    process.exit(0);
  };

shutdown.ts

export const shutdown = () => {
    return null;
  };

Once done with basic function, you can now run the vscode in debugging mode using F5 key on keyboard or follow debugging-guide


Now lets start with adding in-line provider and get the request and response according

Let's add a new method into the methodStore

server.ts

const methodStore: Record = {
  exit, 
  initialize,
  shutdown,
  "textDocument/generation": generation
};

generation.ts

export const generation = async (message: any) => {
    if(!message && message !== undefined) return {};

    const text = message.params.textDocument.text as string;

    if(!text) return {};

        const cursorText = getNewCursorText(text, message.params.position.line, message.params.position.character);

  const response = await getResponseFromOpenAI(cursorText, message.params.fsPath);

 return {
    generatedText: response,
  }

}

function getNewCursorText(text: string, line: number, character: number): string {
    const lines = text.split('\n');
    if (line = lines.length) return text;

    const targetLine = lines[line];
    if (character  targetLine.length) return text;

    lines[line] = targetLine.slice(0, character)   ''   targetLine.slice(character);
    return lines.join('\n');
}


const getResponseFromOpenAI = async (text: string, fsPath: stiring): Promise => {
     const message = {
          "role": "user",
          "content": text
    };

   const systemMetaData: Paramaters = {
    max_token: 128,
    max_context: 1024,
    messages: [],
    fsPath: fsPath
   } 

   const messages = [systemPrompt(systemMetaData), message]

   const chatCompletion: OpenAI.Chat.ChatCompletion | undefined = await this.open_ai_client?.chat.completions.create({
            messages: messages,
            model: "gpt-3.5-turbo",
            max_tokens: systemMetaData?.max_tokens ?? 128,
        });


        if (!chatCompletion) return "";

        const generatedResponse = chatCompletion.choices[0].message.content;

        if (!generatedResponse) return "";

        return generatedResponse;
}

template.ts

interface Parameters {
    max_tokens: number;
    max_context: number;
    messages: any[];
    fsPath: string;
}

 export const systemPrompt = (paramaters: Parameters | null) => {
    return {
        "role": "system",
        "content": `
        Instructions:
            - You are an AI programming assistant.
            - Given a piece of code with the cursor location marked by , replace  with the correct code.
            - First, think step-by-step.
            - Describe your plan for what to build in pseudocode, written out in great detail.
            - Then output the code replacing the .
            - Ensure that your completion fits within the language context of the provided code snippet.
            - Ensure, completion is what ever is needed, dont write beyond 1 or 2 line, unless the  is on start of a function, class or any control statment(if, switch, for, while).

            Rules:
            - Only respond with code.
            - Only replace ; do not include any previously written code.
            - Never include  in your response.
            - Handle ambiguous cases by providing the most contextually appropriate completion.
            - Be consistent with your responses.
            - You should only generate code in the language specified in the META_DATA.
            - Never mix text with code.
            - your code should have appropriate spacing.

            META_DATA: 
            ${paramaters?.fsPath}`
    };  
};

Let's now register the inline providers

extension.ts

import {languages} from "vscode";


function getConfiguration(configName: string) {
    if(Object.keys(workspace.getConfiguration(EXTENSION_ID).get(configName)).length > 0){
        return workspace.getConfiguration(EXTENSION_ID).get(configName);
    }
    return null;
}

const inLineCompletionConfig = getConfiguration("inlineCompletionConfiguration");

export function activate(context: ExtensionContext) {
 // OTHER CODE

  languages.registerInlineCompletionItemProvider(
        { pattern: "**" },
        {
            provideInlineCompletionItems: (document: TextDocument, position: Position) => {
                const mode = inLineCompletionConfig["mode"] || 'slow';
                return provideInlineCompletionItems(document, position, mode);
            },
        }

    );

} 



let lastInlineCompletion = Date.now();
let lastPosition: Position | null = null;
let inlineCompletionRequestCounter = 0;

const provideInlineCompletionItems = async (document: TextDocument, position: Position, mode: 'fast' | 'slow') => {
    const params = {
        textDocument: {
            uri: document.uri.toString(),
            text: document.getText(),
        },
        position: position,
        fsPath: document.uri.fsPath.toString()
    };

    inlineCompletionRequestCounter  = 1;
    const localInCompletionRequestCounter = inlineCompletionRequestCounter;
    const timeSinceLastCompletion = (Date.now() - lastInlineCompletion) / 1000;
    const minInterval = mode === 'fast' ? 0 : 1 / inLineCompletionConfig["maxCompletionsPerSecond"];

    if (timeSinceLastCompletion  setTimeout(r, (minInterval - timeSinceLastCompletion) * 1000));
    }

    if (inlineCompletionRequestCounter === localInCompletionRequestCounter) {
        lastInlineCompletion = Date.now();

        let cancelRequest = CancellationToken.None;
        if (lastPosition && position.isAfter(lastPosition)) {
            cancelRequest = CancellationToken.Cancelled;
        }
        lastPosition = position;

        try {
            const result = await client.sendRequest("textDocument/generation", params, cancelRequest);


            const snippetCode = new SnippetString(result["generatedText"]);
            return [new InlineCompletionItem(snippetCode)];
        } catch (error) {
            console.error("Error during inline completion request", error);
            client.sendNotification("window/showMessage", {
                type: 1, // Error type
                message: "An error occurred during inline completion: "   error.message
            });
            return [];
        }
    } else {
        return [];
    }
};

This blog provides the foundation you need to build your own code completion tool, but the journey doesn’t end here. I encourage you to experiment, research, and improve upon this code, exploring different features of LSP and AI to tailor the tool to your needs.

Whoever is trying to implement this i want them to learn, research and stitch things together.

What You've Learned

  1. Understanding LSP Servers: You’ve learned what an LSP server is, how it powers language-specific tools, and why it’s critical for cross-editor support.

  2. Building VS Code Extensions: You’ve explored how to integrate code completions into VS Code using APIs.

  3. AI-Driven Code Completion: By connecting to OpenAI’s GPT models, you’ve seen how machine learning can enhance developer productivity with intelligent suggestions.

If you reach here, i love to know what you have learned.

Please Hit a like if you learned something new today from my blog.

Connect with me- linked-In

版本聲明 本文轉載於:https://dev.to/nayanraj-adhikary/building-your-own-github-copilot-a-step-by-step-guide-to-code-completion-tools-56gi?1如有侵犯,請聯絡[email protected]刪除
最新教學 更多>
  • 哪種在JavaScript中聲明多個變量的方法更可維護?
    哪種在JavaScript中聲明多個變量的方法更可維護?
    在JavaScript中聲明多個變量:探索兩個方法在JavaScript中,開發人員經常遇到需要聲明多個變量的需要。對此的兩種常見方法是:在單獨的行上聲明每個變量: 當涉及性能時,這兩種方法本質上都是等效的。但是,可維護性可能會有所不同。 第一個方法被認為更易於維護。每個聲明都是其自己的語句,使...
    程式設計 發佈於2025-04-09
  • \“(1)vs.(;;):編譯器優化是否消除了性能差異?\”
    \“(1)vs.(;;):編譯器優化是否消除了性能差異?\”
    答案: 在大多數現代編譯器中,while(1)和(1)和(;;)之間沒有性能差異。編譯器: perl: 1 輸入 - > 2 2 NextState(Main 2 -E:1)V-> 3 9 Leaveloop VK/2-> A 3 toterloop(next-> 8 last-> 9 ...
    程式設計 發佈於2025-04-09
  • 如何使用組在MySQL中旋轉數據?
    如何使用組在MySQL中旋轉數據?
    在關係數據庫中使用mySQL組使用mySQL組進行查詢結果,在關係數據庫中使用MySQL組,轉移數據的數據是指重新排列的行和列的重排以增強數據可視化。在這裡,我們面對一個共同的挑戰:使用組的組將數據從基於行的基於列的轉換為基於列。 Let's consider the following ...
    程式設計 發佈於2025-04-09
  • 如何處理PHP文件系統功能中的UTF-8文件名?
    如何處理PHP文件系統功能中的UTF-8文件名?
    在PHP的Filesystem functions中處理UTF-8 FileNames 在使用PHP的MKDIR函數中含有UTF-8字符的文件很多flusf-8字符時,您可能會在Windows Explorer中遇到comploreer grounder grounder grounder gro...
    程式設計 發佈於2025-04-09
  • 為什麼我在Silverlight Linq查詢中獲得“無法找到查詢模式的實現”錯誤?
    為什麼我在Silverlight Linq查詢中獲得“無法找到查詢模式的實現”錯誤?
    查詢模式實現缺失:解決“無法找到”錯誤在銀光應用程序中,嘗試使用LINQ建立錯誤的數據庫連接的嘗試,無法找到以查詢模式的實現。 ”當省略LINQ名稱空間或查詢類型缺少IEnumerable 實現時,通常會發生此錯誤。 解決問題來驗證該類型的質量是至關重要的。在此特定實例中,tblpersoon可能...
    程式設計 發佈於2025-04-09
  • 如何簡化PHP中的JSON解析以獲取多維陣列?
    如何簡化PHP中的JSON解析以獲取多維陣列?
    php 試圖在PHP中解析JSON數據的JSON可能具有挑戰性,尤其是在處理多維數組時。要簡化過程,建議將JSON作為數組而不是對象解析。 執行此操作,將JSON_DECODE函數與第二個參數設置為true:[&&&&& && &&&&& json = JSON = JSON_DECODE($ ...
    程式設計 發佈於2025-04-09
  • PHP陣列鍵值異常:了解07和08的好奇情況
    PHP陣列鍵值異常:了解07和08的好奇情況
    PHP數組鍵值問題,使用07&08 在給定數月的數組中,鍵值07和08呈現令人困惑的行為時,就會出現一個不尋常的問題。運行print_r($月)返回意外結果:鍵“ 07”丟失,而鍵“ 08”分配給了9月的值。 此問題源於PHP對領先零的解釋。當一個數字帶有0(例如07或08)的前綴時,PHP將...
    程式設計 發佈於2025-04-09
  • 版本5.6.5之前,使用current_timestamp與時間戳列的current_timestamp與時間戳列有什麼限制?
    版本5.6.5之前,使用current_timestamp與時間戳列的current_timestamp與時間戳列有什麼限制?
    在時間戳列上使用current_timestamp或MySQL版本中的current_timestamp或在5.6.5 此限制源於遺留實現的關注,這些限制需要對當前的_timestamp功能進行特定的實現。 創建表`foo`( `Productid` int(10)unsigned not ...
    程式設計 發佈於2025-04-09
  • 如何配置Pytesseract以使用數字輸出的單位數字識別?
    如何配置Pytesseract以使用數字輸出的單位數字識別?
    Pytesseract OCR具有單位數字識別和僅數字約束 在pytesseract的上下文中,在配置tesseract以識別單位數字和限制單個數字和限制輸出對數字可能會提出質疑。 To address this issue, we delve into the specifics of Te...
    程式設計 發佈於2025-04-09
  • 如何使用Python有效地以相反順序讀取大型文件?
    如何使用Python有效地以相反順序讀取大型文件?
    在python 反向行讀取器生成器 == ord('\ n'): 緩衝區=緩衝區[:-1] 剩餘_size- = buf_size lines = buffer.split('\ n'....
    程式設計 發佈於2025-04-09
  • 如何使用PHP從XML文件中有效地檢索屬性值?
    如何使用PHP從XML文件中有效地檢索屬性值?
    從php $xml = simplexml_load_file($file); foreach ($xml->Var[0]->attributes() as $attributeName => $attributeValue) { echo $attributeName,...
    程式設計 發佈於2025-04-09
  • 如何將來自三個MySQL表的數據組合到新表中?
    如何將來自三個MySQL表的數據組合到新表中?
    mysql:從三個表和列的新表創建新表 答案:為了實現這一目標,您可以利用一個3-way Join。 選擇p。 *,d.content作為年齡 來自人為p的人 加入d.person_id = p.id上的d的詳細信息 加入T.Id = d.detail_id的分類法 其中t.taxonomy ...
    程式設計 發佈於2025-04-09
  • 如何使用Python理解有效地創建字典?
    如何使用Python理解有效地創建字典?
    在python中,詞典綜合提供了一種生成新詞典的簡潔方法。儘管它們與列表綜合相似,但存在一些顯著差異。 與問題所暗示的不同,您無法為鑰匙創建字典理解。您必須明確指定鍵和值。 For example:d = {n: n**2 for n in range(5)}This creates a dict...
    程式設計 發佈於2025-04-09
  • 如何使用不同數量列的聯合數據庫表?
    如何使用不同數量列的聯合數據庫表?
    合併列數不同的表 當嘗試合併列數不同的數據庫表時,可能會遇到挑戰。一種直接的方法是在列數較少的表中,為缺失的列追加空值。 例如,考慮兩個表,表 A 和表 B,其中表 A 的列數多於表 B。為了合併這些表,同時處理表 B 中缺失的列,請按照以下步驟操作: 確定表 B 中缺失的列,並將它們添加到表的...
    程式設計 發佈於2025-04-09
  • eval()vs. ast.literal_eval():對於用戶輸入,哪個Python函數更安全?
    eval()vs. ast.literal_eval():對於用戶輸入,哪個Python函數更安全?
    稱量()和ast.literal_eval()中的Python Security 在使用用戶輸入時,必須優先確保安全性。強大的Python功能Eval()通常是作為潛在解決方案而出現的,但擔心其潛在風險。 This article delves into the differences betwee...
    程式設計 發佈於2025-04-09

免責聲明: 提供的所有資源部分來自互聯網,如果有侵犯您的版權或其他權益,請說明詳細緣由並提供版權或權益證明然後發到郵箱:[email protected] 我們會在第一時間內為您處理。

Copyright© 2022 湘ICP备2022001581号-3