Time to Leave? Time to Rebuild! Making Twitter


Published on 2024-09-01


The most critical features of a new social network for users fed up with Musk and Twitter are as follows:

Import Twitter's archive.zip file
Easy as possible to sign up
Similar if not identical user features

Less critical but definitely helpful features of the platform:

Ethically monetised and moderated
Make use of AI to help identify problematic content
Blue tick with the use of Onfido or SMART identity services

In this post, we'll focus on the first feature: importing Twitter's archive.zip file.

The file

Twitter haven't made your data all that easy to obtain. It's great that they give you access to it (legally, they have to). The format is crap.

It actually comes as a mini web archive and all your data is stuck in JavaScript files. It is more of a web app than convenient storage of data.

When you open up the Your archive.html file you get something like this:

[Image: screenshot of Your archive.html opened in a browser]

Note: I made the decision pretty early on to build using Next.js for the site, and Go and GraphQL for the backend.

So, what do you do when your data isn't structured data?

Well, you parse it.

Creating a basic Go script

Head on over to the official docs on how to get started with Go, and set up your project directory.

We're going to hack this process together. It seems to be one of the most important features for attracting people who feel too attached to TwitterX.

First step is to create a main.go file. In this file we'll GO (hah) and do some STUFF;

os.Args: This is a slice that holds command-line arguments.
os.Args[0] is the program's name, and os.Args[1] is the first argument passed to the program.
Argument Check: The function checks if at least one argument is provided. If not, it prints a message asking for a path.
run function: This function simply prints the path passed to it, for now.

package main

import (
    "fmt"
    "os"
)

func run(path string) {
    fmt.Println("Path:", path)
}

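func main() {
    // os.Args[0] is the program name, so we need at least two entries
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path as an argument.")
        return
    }

    run(os.Args[1])
}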



At every step, we'll run the file like so;


go run main.go twitter.zip




If you don't have a Twitter archive export, create a simple manifest.js file and give it the following JavaScript.


window.__THAR_CONFIG = {
  "userInfo" : {
    "accountId" : "1234567890",
    "userName" : "lukeocodes",
    "displayName" : "Luke ✨"
  },
};




Compress that into your twitter.zip file that we'll use throughout.


  
  
Read a Zip file


The next step is to read the contents of the zip file. We want to do this as efficiently as possible, and avoid leaving extracted data on disk any longer than necessary.

There are many files in the zip that don't need to be extracted, too.

We'll edit the main.go file;

Opening the ZIP file: The zip.OpenReader() function is used to open the ZIP file specified by path.
Iterating through the files: The function loops over each file in the ZIP archive using r.File, which is a slice of zip.File. The Name property of each file is printed.
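As a side note on the "nothing on disk" goal: listing names does not extract anything. A zip reader parses the archive's central directory, and an entry's contents are only decompressed when you call Open on it. The sketch below shows this against an archive held entirely in memory; entryNames and sampleArchive are hypothetical helpers, and the entry names are made up for the demo.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// entryNames lists the entry names of a zip archive held in memory.
// No entry is opened, so no file contents are decompressed.
func entryNames(archive []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

// sampleArchive builds a tiny archive with made-up entry names for the demo.
func sampleArchive() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range []string{"Your archive.html", "data/manifest.js", "assets/app.js"} {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte("placeholder")); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	archive, err := sampleArchive()
	if err != nil {
		log.Fatal(err)
	}
	names, err := entryNames(archive)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

zip.OpenReader, used in this post, does the same thing against a file path instead of a byte slice.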

package main

import (
    "archive/zip"
    "fmt"
    "log"
    "os"
)

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("Files in the zip archive:")
    for _, f := range r.File {
        fmt.Println(f.Name)
    }
}

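func main() {
    // os.Args[0] is the program name, so we need at least two entries
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path as an argument.")
        return
    }

    run(os.Args[1])
}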




  
  
JS only! We're hunting structured data


This archive file is seriously unhelpful. We want to check for just .js files, and only in the /data directory.

Opening the ZIP file: The ZIP file is opened using zip.OpenReader().
Checking the /data directory: The program iterates through the files in the ZIP archive. It uses strings.HasPrefix(f.Name, "data/") to check if the file resides in the /data directory.
Finding .js files: The program also checks if the file has a .js extension using filepath.Ext(f.Name).
Reading and printing contents: If a .js file is found in the /data directory, the program reads and prints its contents.
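The filter described above can also be pulled out into a small predicate, which makes it easy to test on its own. This is a sketch rather than the code used in this post: isDataJS is a hypothetical name, and it swaps filepath.Ext for path.Ext on the assumption that zip entry names use forward slashes on every operating system.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isDataJS reports whether a zip entry name is a .js file under data/.
// The extension check is case-insensitive, like the ToLower call in the post.
func isDataJS(name string) bool {
	return strings.HasPrefix(name, "data/") &&
		strings.EqualFold(path.Ext(name), ".js")
}

func main() {
	// Made-up entry names, purely for illustration.
	for _, name := range []string{
		"data/manifest.js",
		"data/tweets.JS",
		"assets/app.js",
		"data/readme.txt",
	} {
		fmt.Printf("%-20s %v\n", name, isDataJS(name))
	}
}
```

Only the first two names pass: the third is outside data/ and the fourth isn't JavaScript.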

package main

import (
    "archive/zip"
    "fmt"
    "io"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := io.ReadAll(rc) // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
    if err != nil {
        log.Fatal(err)
    }

    // Print the contents
    fmt.Printf("Contents of %s:\n", file.Name)
    fmt.Println(string(contents))
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

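func main() {
    // os.Args[0] is the program name, so we need at least two entries
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path as an argument.")
        return
    }

    run(os.Args[1])
}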




  
  
Parse the JS! We want that data


We've found the structured data. Now we need to parse it. The good news is there are existing packages for using JavaScript inside Go. We'll be using goja.

If you've reached this section, are familiar with goja, and have seen the output of the file, you may already see errors in our future.

Install goja:


go get github.com/dop251/goja




Now we're going to edit the main.go file to do the following;

Parsing with goja: The goja.New() function creates a new JavaScript runtime, and vm.RunString() runs the file's JavaScript within that runtime.
Handle errors in parsing

package main

import (
    "archive/zip"
    "fmt"
    "io"
    "log"
    "os"
    "path/filepath"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := io.ReadAll(rc) // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
    if err != nil {
        log.Fatal(err)
    }

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(string(contents)) // RunString takes a string, not a []byte
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    fmt.Printf("Parsed JavaScript file: %s\n", file.Name)
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

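func main() {
    // os.Args[0] is the program name, so we need at least two entries
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path as an argument.")
        return
    }

    run(os.Args[1])
}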



SURPRISE. window is not defined might be a familiar error. goja only provides an ECMAScript runtime; window is part of the browser context and sadly unavailable.


  
  
ACTUALLY Parse the JS


I went through a few issues at this point, including not being able to return data because it's a top-level JS file.

Long story short, we need to modify the contents of the files before loading them into the runtime.

Let's modify the main.go file;

reConfig: A regex that matches any assignment of the form window.someVariable = { and replaces it with var data = {.
reArray: A regex that matches any assignment of the form window.someObject.someName.somePart = [ and replaces it with var data = [.
Extracting data: Running the script, we use vm.Get("data") to retrieve the value of the data variable from the JavaScript context.
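The two rewrites can be tried on their own, without goja, before wiring them into readFile. The sketch below is a minimal version under a few assumptions: rewriteAssignments is a hypothetical helper name, the input strings are made up to match the two shapes described above, and ReplaceAllLiteralString stands in for ReplaceAllStringFunc since the replacement is a fixed string.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// window.<name> = {  (the config-style files)
	reConfig = regexp.MustCompile(`window\.\w+\s*=\s*\{`)
	// window.<a>.<b>.<c> = [  (the array-style files)
	reArray = regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)
)

// rewriteAssignments turns the browser-only assignments into a plain
// `var data = ...` declaration that a bare ECMAScript runtime can run.
func rewriteAssignments(src string) string {
	src = reConfig.ReplaceAllLiteralString(src, "var data = {")
	return reArray.ReplaceAllLiteralString(src, "var data = [")
}

func main() {
	// Made-up inputs in the two shapes the regexes target.
	fmt.Println(rewriteAssignments(`window.__THAR_CONFIG = { "a" : 1 };`))
	fmt.Println(rewriteAssignments(`window.YTD.tweets.part0 = [ { "id" : "1" } ]`))
}
```

One thing to keep in mind: neither pattern is anchored, so text inside the data that happens to look like window.something = { would be rewritten too. Anchoring the patterns to the start of the file may be worth considering.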

package main

import (
    "archive/zip"
    "fmt"
    "io"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := io.ReadAll(rc) // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w+\s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Output the parsed data
    fmt.Printf("Processed JavaScript file: %s\n", file.Name)
    fmt.Printf("Data extracted: %v\n", value.Export())
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

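func main() {
    // os.Args[0] is the program name, so we need at least two entries
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path as an argument.")
        return
    }

    run(os.Args[1])
}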



Hurrah. Assuming I didn't muck up the copypaste into this post, you should now see a rather ugly print of the struct data from Go.


  
  
JSON would be nice


Edit the main.go file to marshal the data to JSON.

Use value.Export() to convert the goja value into a native Go value.
Use json.MarshalIndent() for pretty-printed JSON (use json.Marshal if you want to minify the output).
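To see the difference between the two calls in isolation, here is a small sketch that needs no JavaScript runtime. It assumes the exported value is a set of nested maps, which is my understanding of what Export() returns for a plain JS object; toJSON is a hypothetical helper and the data is a trimmed stand-in for the manifest.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// toJSON (hypothetical helper) renders a Go value as JSON, either
// indented for reading or compact for storage.
func toJSON(data interface{}, pretty bool) (string, error) {
	var (
		out []byte
		err error
	)
	if pretty {
		out, err = json.MarshalIndent(data, "", "  ")
	} else {
		out, err = json.Marshal(data)
	}
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Stand-in for the exported manifest data (assumed shape: nested maps).
	data := map[string]interface{}{
		"userInfo": map[string]interface{}{
			"userName":  "lukeocodes",
			"accountId": "1234567890",
		},
	}

	for _, pretty := range []bool{true, false} {
		s, err := toJSON(data, pretty)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(s)
	}
}
```

encoding/json writes map keys in sorted order, so the key order in the output can differ from the order in the original JavaScript file.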

package main

import (
    "archive/zip"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := io.ReadAll(rc) // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w+\s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Convert the data to a Go-native type
    data := value.Export()

    // Marshal the Go-native type to JSON
    jsonData, err := json.MarshalIndent(data, "", "  ")
    if err != nil {
        log.Fatalf("Error marshalling data to JSON: %v", err)
    }

    // Output the JSON data
    fmt.Println(string(jsonData))
}

func run(zipFilePath string) {
    // Open the zip file
    r, err := zip.OpenReader(zipFilePath)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file
        }
    }
}

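func main() {
    // os.Args[0] is the program name, so we need at least two entries
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path as an argument.")
        return
    }

    run(os.Args[1])
}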



That's it!


go run main.go twitter.zip






{
  "userInfo": {
    "accountId": "1234567890",
    "displayName": "Luke ✨",
    "userName": "lukeocodes"
  }
}





  
  
Open source


I'll be open sourcing a lot of this work so that others who want to parse the data from the archive can store it how they like.

Release statement: This article is reproduced from https://dev.to/lukeocodes/time-to-leave-time-to-rebuild-making-twitter20-4jgc?1. If there is any infringement, please contact [email protected] to have it removed.
