」工欲善其事,必先利其器。「—孔子《論語.錄靈公》
首頁 > 程式設計 > 是時候離開了嗎?重建的時間到了!製作推特

是時候離開了嗎?重建的時間到了!製作推特

發佈於2024-09-01
瀏覽:791

The most critical features of a new social network for users fed up with Musk and Twitter, are as follows;

  • Import Twitter's archive.zip file
  • Easy as possible to sign up
  • Similar if not identical user features

Less critical but definitely helpful features of the platform;

  • Ethically monetised and moderated
  • Make use of AI to help identify problematic content
  • Blue tick with the use of Onfido or SMART identity services

In this post, we'll focus on the first feature. Importing Twitter's archive.zip file.

The file

Twitter haven't made your data all that easy to obtain. It's great that they give you access to it (legally, they have to). The format is crap.

It actually comes as a mini web archive and all your data is stuck in JavaScript files. It is more of a web app than convenient storage of data.

When you open up the Your archive.html file you get something like this;

Time to Leave? Time to Rebuild! Making Twitter

Note: I made the descision pretty early on to build using Next.js for the site, Go and GraphQL for the backend.

So, what do you do when your data isn't structured data?

Well, you parse it.

Creating a basic Go script

Head on over to the official docs on how to get started with Go, and set up your project directory.

We're going to hack this process together. It seems one of the most important features to attract people who feel too attached to TwitterX.

First step is to create a main.go file. In this file we'll GO (hah) and do some STUFF;

  • os.Args: This is a slice that holds command-line arguments.
  • os.Args[0] is the program's name, and os.Args[1] is the first argument passed to the program.
  • Argument Check: The function checks if at least one argument is provided. If not, it prints a message asking for a path.
  • run function: This function simply prints the path passed to it, for now.
package main

import (
    "fmt"
    "os"
)

func run(path string) {
    fmt.Println("Path:", path)
}

func main() {
    if len(os.Args) 



At every step, we'll run the file like so;

go run main.go twitter.zip

If you don't have a Twitter archive export, create a simple manifest.js file and give it the following JavaScript.

window.__THAR_CONFIG = {
  "userInfo" : {
    "accountId" : "1234567890",
    "userName" : "lukeocodes",
    "displayName" : "Luke ✨"
  },
};

Compress that into your twitter.zip file that we'll use throughout.

Read a Zip file

The next step is to read the contents of the zip file. We want to do this as efficiently as possible, and reduce time data is extracted on the disk.

There are many files in the zip that don't need to be extracted, too.

We'll edit the main.go file;

  • Opening the ZIP file: The zip.OpenReader() function is used to open the ZIP file specified by path.
  • Iterating through the files: The function loops over each file in the ZIP archive using r.File, which is a slice of zip.File. The Name property of each file is printed.
package main

import (
    "archive/zip"
    "fmt"
    "log"
    "os"
)

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("Files in the zip archive:")
    for _, f := range r.File {
        fmt.Println(f.Name)
    }
}

func main() {
    // Example usage
    if len(os.Args) 



JS only! We're hunting structured data

This archive file is seriously unhelpful. We want to check for just .js files, and only in the /data directory.

  • Opening the ZIP file: The ZIP file is opened using zip.OpenReader().
  • Checking the /data directory: The program iterates through the files in the ZIP archive. It uses strings.HasPrefix(f.Name, "data/") to check if the file resides in the /data directory.
  • Finding .js files: The program also checks if the file has a .js extension using filepath.Ext(f.Name).
  • Reading and printing contents: If a .js file is found in the /data directory, the program reads and prints its contents.
package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated? :/ 
    if err != nil {
        log.Fatal(err)
    }

    // Print the contents
    fmt.Printf("Contents of %s:\n", file.Name)
    fmt.Println(string(contents))
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



Parse the JS! We want that data

We've found the structured data. Now we need to parse it. The good news is there are existing packages for using JavaScript inside Go. We'll be using goja.

If you're on this section, familiar with Goja, and you've seen the output of the file, you may see we're going to have errors in our future.

Install goja:

go get github.com/dop251/goja

Now we're going to edit the main.go file to do the following;

  • Parsing with goja: The goja.New() function creates a new JavaScript runtime, and vm.RunString(processedContents) runs the processed JavaScript code within that runtime.
  • Handle errors in parsing
package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated? :/ 
    if err != nil {
        log.Fatal(err)
    }

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(contents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    fmt.Printf("Parsed JavaScript file: %s\n", file.Name)
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



SUPRISE. window is not defined might be a familiar error. Basically goja runs an EMCA runtime. window is browser context and sadly unavailable.

ACTUALLY Parse the JS

I went through a few issues at this point. Including not being able to return data because it's a top level JS file.

Long story short, we need to modify the contents of the files before loading them into the runtime.

Let's modify the main.go file;

  • reConfig: A regex that matches any assignment of the form window.someVariable = { and replaces it with var data = {.
  • reArray: A regex that matches any assignment of the form window.someObject.someArray = [ and replaces it with var data = [
  • Extracting data: Running the script, we use vm.Get("data") to retrieve the value of the data variable from the JavaScript context.
package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc)
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w \s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w \.\w \.\w \s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Output the parsed data
    fmt.Printf("Processed JavaScript file: %s\n", file.Name)
    fmt.Printf("Data extracted: %v\n", value.Export())
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



Hurrah. Assuming I didn't muck up the copypaste into this post, you should now see a rather ugly print of the struct data from Go.

JSON would be nice

Edit the main.go file to marshall the JSON output.

  • Use value.Export() to get the data from the struct
  • Use json.MarshallIndent() for pretty printed JSON (use json.Marshall if you want to minify the output).
package main

import (
    "archive/zip"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated :/
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w \s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w \.\w \.\w \s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Convert the data to a Go-native type
    data := value.Export()

    // Marshal the Go-native type to JSON
    jsonData, err := json.MarshalIndent(data, "", "  ")
    if err != nil {
        log.Fatalf("Error marshalling data to JSON: %v", err)
    }

    // Output the JSON data
    fmt.Println(string(jsonData))
}

func run(zipFilePath string) {
    // Open the zip file
    r, err := zip.OpenReader(zipFilePath)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



That's it!

go run main.go twitter.zip
}
  "userInfo": {
    "accountId": "1234567890",
    "displayName": "Luke ✨",
    "userName": "lukeocodes"
  }
}

Open source

I'll be open sourcing a lot of this work so that others who want to parse the data from the archive, can store it how they like.

版本聲明 本文轉載於:https://dev.to/lukeocodes/time-to-leave-time-to-rebuild-making-twitter20-4jgc?1如有侵犯,請聯絡[email protected]刪除
最新教學 更多>
  • 如何正確使用與PDO參數的查詢一樣?
    如何正確使用與PDO參數的查詢一樣?
    在pdo 中使用類似QUERIES在PDO中的Queries時,您可能會遇到類似疑問中描述的問題:此查詢也可能不會返回結果,即使$ var1和$ var2包含有效的搜索詞。錯誤在於不正確包含%符號。 通過將變量包含在$ params數組中的%符號中,您確保將%字符正確替換到查詢中。沒有此修改,PD...
    程式設計 發佈於2025-04-17
  • 在UTF8 MySQL表中正確將Latin1字符轉換為UTF8的方法
    在UTF8 MySQL表中正確將Latin1字符轉換為UTF8的方法
    在UTF8表中將latin1字符轉換為utf8 ,您遇到了一個問題,其中含義的字符(例如,“jáuòiñe”)在utf8 table tabled tablesset中被extect(例如,“致電。為了解決此問題,您正在嘗試使用“ mb_convert_encoding”和“ iconv”轉換受...
    程式設計 發佈於2025-04-17
  • 查找當前執行JavaScript的腳本元素方法
    查找當前執行JavaScript的腳本元素方法
    如何引用當前執行腳本的腳本元素在某些方案中理解問題在某些方案中,開發人員可能需要將其他腳本動態加載其他腳本。但是,如果Head Element尚未完全渲染,則使用document.getElementsbytagname('head')[0] .appendChild(v)的常規方...
    程式設計 發佈於2025-04-17
  • 獲取HTML元素真實寬高方法揭秘
    獲取HTML元素真實寬高方法揭秘
    確定HTML元素的實際寬度和高度在瀏覽器的視口內的HTML元素,必須準確地計算出其尺寸。這是如何有效地檢索元素的實際寬度和高度。 使用元素屬性 兩種技術都在現代瀏覽器中支持, including:ChromeFirefoxEdgeSafariconst width = document.getEl...
    程式設計 發佈於2025-04-17
  • 解決MySQL插入Emoji時出現的\\"字符串值錯誤\\"異常
    解決MySQL插入Emoji時出現的\\"字符串值錯誤\\"異常
    Resolving Incorrect String Value Exception When Inserting EmojiWhen attempting to insert a string containing emoji characters into a MySQL database us...
    程式設計 發佈於2025-04-17
  • 使用Wildcard Imports對不對?
    使用Wildcard Imports對不對?
    避免通配符導入的情況合格的名稱優於Barenames。最好使用PYQT4.QTCORE IMPICT QT而不是從PYQT4 Import QT中明確指定您要導入的模塊。合格的名稱使跟踪代碼依賴性和調試錯誤變得更容易。 他們還降低了模塊之間碰撞的風險。如果兩個模塊定義了具有相同名稱的函數,則需要明確...
    程式設計 發佈於2025-04-17
  • Vitest、MSW與Playwright在React+Vite+TS項目中的配置 - 第三部分
    Vitest、MSW與Playwright在React+Vite+TS項目中的配置 - 第三部分
    1。安裝劇作家 要設置playwright,運行以下命令: NPM初始化劇作@最新 npm init playwright@latest 您將通過終端中的設置嚮導引導。當提示使用“將端到端測試放在哪裡?” ,您可以將其設置為src/tests(如較早的教程中的建議)。 [2 ...
    程式設計 發佈於2025-04-17
  • 如何用單一SQL查詢獲取varchar字段的大小?
    如何用單一SQL查詢獲取varchar字段的大小?
    在SQL中檢索varchar [n]字段的大小,並使用一個查詢確定VARCHAR字段的長度對於數據管理和存儲優化。在SQL中,您可以執行單個語句來檢索此信息。 syntax:說明: 字段的名稱。 data_type:這將提供該列的數據類型,該列應該為'varchar'。 。 ...
    程式設計 發佈於2025-04-17
  • Python不會對超範圍子串切片報錯的原因
    Python不會對超範圍子串切片報錯的原因
    在python中用索引切片範圍:二重性和空序列索引單個元素不同,該元素會引起錯誤,切片在序列的邊界之外沒有。 這種行為源於索引和切片之間的基本差異。索引一個序列,例如“示例” [3],返回一個項目。但是,切片序列(例如“示例” [3:4])返回項目的子序列。 索引不存在的元素時,例如“示例” [9...
    程式設計 發佈於2025-04-17
  • 如何實時捕獲和流媒體以進行聊天機器人命令執行?
    如何實時捕獲和流媒體以進行聊天機器人命令執行?
    在開發能夠執行命令的chatbots的領域中,實時從命令執行實時捕獲Stdout,一個常見的需求是能夠檢索和顯示標準輸出(stdout)在cath cath cant cant cant cant cant cant cant cant interfaces in Chate cant inter...
    程式設計 發佈於2025-04-17
  • 為什麼使用固定定位時,為什麼具有100%網格板柱的網格超越身體?
    為什麼使用固定定位時,為什麼具有100%網格板柱的網格超越身體?
    網格超過身體,用100%grid-template-columns 為什麼在grid-template-colms中具有100%的顯示器,當位置設置為設置的位置時,grid-template-colly修復了? 問題: 考慮以下CSS和html: class =“ snippet-code”> ...
    程式設計 發佈於2025-04-17
  • 如何簡化PHP中的JSON解析以獲取多維陣列?
    如何簡化PHP中的JSON解析以獲取多維陣列?
    php 試圖在PHP中解析JSON數據的JSON可能具有挑戰性,尤其是在處理多維數組時。 To simplify the process, it's recommended to parse the JSON as an array rather than an object.To do...
    程式設計 發佈於2025-04-17
  • 無管理C++客戶端如何與WCF服務通信?
    無管理C++客戶端如何與WCF服務通信?
    在不管理的C客戶端與WCF服務之間的差距可以通過在託管c中寫入橋樑DLL來與WCF服務無縫地通信。這是建立此連接的綜合指南: 1。創建端點接口和類:定義了wcf Service的C#接口(ihelloservice)及其相應的實現類(HelloService)。創建Windows NT服務:創建一...
    程式設計 發佈於2025-04-17
  • Java中Lambda表達式為何需要“final”或“有效final”變量?
    Java中Lambda表達式為何需要“final”或“有效final”變量?
    Lambda Expressions Require "Final" or "Effectively Final" VariablesThe error message "Variable used in lambda expression shou...
    程式設計 發佈於2025-04-17
  • 可以在純CS中將多個粘性元素彼此堆疊在一起嗎?
    可以在純CS中將多個粘性元素彼此堆疊在一起嗎?
    [2这里: https://webthemez.com/demo/sticky-multi-header-scroll/index.html </main> <section> { display:grid; grid-template-...
    程式設計 發佈於2025-04-17

免責聲明: 提供的所有資源部分來自互聯網,如果有侵犯您的版權或其他權益,請說明詳細緣由並提供版權或權益證明然後發到郵箱:[email protected] 我們會在第一時間內為您處理。

Copyright© 2022 湘ICP备2022001581号-3