Time to Leave? Time to Rebuild! Making Twitter


Published on 01/09/2024


The most critical features of a new social network for users fed up with Musk and Twitter are as follows:

Import Twitter's archive.zip file
Easy as possible to sign up
Similar if not identical user features

Less critical but definitely helpful features of the platform:

Ethically monetised and moderated
Make use of AI to help identify problematic content
Blue tick with the use of Onfido or SMART identity services

In this post, we'll focus on the first feature: importing Twitter's archive.zip file.

The file

Twitter haven't made your data all that easy to obtain. It's great that they give you access to it (legally, they have to). The format is crap.

It actually comes as a mini web archive and all your data is stuck in JavaScript files. It is more of a web app than convenient storage of data.

When you open up the Your archive.html file you get something like this:

[Screenshot: the Your archive.html viewer]

Note: I made the decision pretty early on to build using Next.js for the site, and Go and GraphQL for the backend.

So, what do you do when your data isn't structured data?

Well, you parse it.

Creating a basic Go script

Head on over to the official docs on how to get started with Go, and set up your project directory.

We're going to hack this process together. It seems one of the most important features to attract people who feel too attached to TwitterX.

The first step is to create a main.go file. In this file we'll GO (hah) and do some STUFF:

os.Args: This is a slice that holds command-line arguments. os.Args[0] is the program's name, and os.Args[1] is the first argument passed to the program.
Argument check: The main function checks that at least one argument is provided. If not, it prints a message asking for a path.
run function: This function simply prints the path passed to it, for now.

package main

import (
    "fmt"
    "os"
)

func run(path string) {
    fmt.Println("Path:", path)
}

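func main() {
    // os.Args[0] is the program name, so we need at least two entries
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path to the zip file.")
        return
    }

    run(os.Args[1])
}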



At every step, we'll run the file like so:


go run main.go twitter.zip




If you don't have a Twitter archive export, create a simple manifest.js file and give it the following JavaScript.


window.__THAR_CONFIG = {
  "userInfo" : {
    "accountId" : "1234567890",
    "userName" : "lukeocodes",
    "displayName" : "Luke ✨"
  },
};




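Compress that into your twitter.zip file that we'll use throughout. Keep it inside a data directory (data/manifest.js) within the archive, because the later steps only look at .js files under data/.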


  
  
Read a Zip file


The next step is to read the contents of the zip file. We want to do this as efficiently as possible and keep extracted data on disk for as little time as possible.

There are many files in the zip that don't need to be extracted, too.

We'll edit the main.go file:

Opening the ZIP file: The zip.OpenReader() function is used to open the ZIP file specified by path.
Iterating through the files: The function loops over each file in the ZIP archive using r.File, which is a slice of zip.File. The Name property of each file is printed.

package main

import (
    "archive/zip"
    "fmt"
    "log"
    "os"
)

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("Files in the zip archive:")
    for _, f := range r.File {
        fmt.Println(f.Name)
    }
}

func main() {
    // Example usage
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path to the zip file.")
        return
    }

    run(os.Args[1])
}
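On the disk concern: archive/zip doesn't unpack anything to disk. Opening the archive reads its central directory, and an entry is only decompressed when you call Open on it. If the archive arrives as bytes (say, from an upload), zip.NewReader does the same job without a file at all. Here is a minimal sketch of that. listEntries and makeArchive are hypothetical helpers, and the entry names are placeholders, not a real export's layout.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// listEntries returns the names of all entries in a zip archive held in
// memory. Only the central directory is read; no entry is decompressed.
func listEntries(archive []byte) ([]string, error) {
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(r.File))
	for _, f := range r.File {
		names = append(names, f.Name)
	}
	return names, nil
}

// makeArchive builds a tiny in-memory zip so the example runs on its own.
func makeArchive(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range names {
		f, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := f.Write([]byte("placeholder")); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	archive, err := makeArchive("Your archive.html", "data/manifest.js")
	if err != nil {
		log.Fatal(err)
	}
	names, err := listEntries(archive)
	if err != nil {
		log.Fatal(err)
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```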




  
  
JS only! We're hunting structured data


This archive file is seriously unhelpful. We want to check for just .js files, and only in the /data directory.

Opening the ZIP file: The ZIP file is opened using zip.OpenReader().
Checking the /data directory: The program iterates through the files in the ZIP archive. It uses strings.HasPrefix(f.Name, "data/") to check if the file resides in the /data directory.
Finding .js files: The program also checks if the file has a .js extension using filepath.Ext(f.Name).
Reading and printing contents: If a .js file is found in the /data directory, the program reads and prints its contents.

package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated? :/ 
    if err != nil {
        log.Fatal(err)
    }

    // Print the contents
    fmt.Printf("Contents of %s:\n", file.Name)
    fmt.Println(string(contents))
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path to the zip file.")
        return
    }

    run(os.Args[1])
}
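The filter condition is easier to test once it's pulled out into its own function. A minimal sketch: isDataJS is a hypothetical helper, and the entry names in main are made up for illustration. It uses the path package where the code above uses path/filepath, on the assumption that zip entry names use forward slashes on every operating system.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isDataJS reports whether a zip entry name refers to a .js file somewhere
// under the top-level data/ directory. The extension check ignores case.
func isDataJS(name string) bool {
	return strings.HasPrefix(name, "data/") &&
		strings.EqualFold(path.Ext(name), ".js")
}

func main() {
	// Hypothetical entry names, not taken from a real export.
	for _, name := range []string{
		"data/manifest.js",
		"data/TWEETS.JS",
		"data/media/photo.jpg",
		"assets/js/app.js",
		"Your archive.html",
	} {
		fmt.Printf("%-22s %v\n", name, isDataJS(name))
	}
}
```

Note that the prefix check also accepts nested paths such as data/sub/file.js. If only files directly inside data/ should match, compare path.Dir(name) against "data" instead.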




  
  
Parse the JS! We want that data


We've found the structured data. Now we need to parse it. The good news is there are existing packages for using JavaScript inside Go. We'll be using goja.

If you're on this section, familiar with Goja, and you've seen the output of the file, you may see we're going to have errors in our future.

Install goja:


go get github.com/dop251/goja




Now we're going to edit the main.go file to do the following:

Parsing with goja: The goja.New() function creates a new JavaScript runtime, and vm.RunString() runs the file's JavaScript within that runtime.
Handle errors in parsing

package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated? :/ 
    if err != nil {
        log.Fatal(err)
    }

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(string(contents)) // RunString takes a string, not []byte
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    fmt.Printf("Parsed JavaScript file: %s\n", file.Name)
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path to the zip file.")
        return
    }

    run(os.Args[1])
}



SURPRISE. window is not defined might be a familiar error. Basically, goja runs a plain ECMAScript runtime. window belongs to the browser context and is sadly unavailable.


  
  
ACTUALLY Parse the JS


I went through a few issues at this point, including not being able to return data because it's a top-level JS file.

Long story short, we need to modify the contents of the files before loading them into the runtime.

Let's modify the main.go file:

reConfig: A regex that matches any assignment of the form window.someVariable = { and replaces it with var data = {.
reArray: A regex that matches any assignment of the form window.someObject.someKey.someArray = [ and replaces it with var data = [.
Extracting data: After running the script, we use vm.Get("data") to retrieve the value of the data variable from the JavaScript context.

package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc)
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w+\s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Output the parsed data
    fmt.Printf("Processed JavaScript file: %s\n", file.Name)
    fmt.Printf("Data extracted: %v\n", value.Export())
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path to the zip file.")
        return
    }

    run(os.Args[1])
}
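The rewrite step can be checked on its own, without goja or a zip file. A minimal sketch using only the standard library: rewriteAssignment is a hypothetical helper, and window.YTD.example.part0 is a made-up name standing in for whatever three-segment assignment a real file uses.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Matches e.g. `window.__THAR_CONFIG = {`
	reConfig = regexp.MustCompile(`window\.\w+\s*=\s*{`)
	// Matches e.g. `window.a.b.c = [` (three dotted segments after window)
	reArray = regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)
)

// rewriteAssignment swaps a window.* assignment for a plain variable
// declaration so the script can run outside a browser.
func rewriteAssignment(src string) string {
	src = reConfig.ReplaceAllString(src, "var data = {")
	return reArray.ReplaceAllString(src, "var data = [")
}

func main() {
	fmt.Println(rewriteAssignment(`window.__THAR_CONFIG = { "userInfo": {} }`))
	// prints: var data = { "userInfo": {} }
	fmt.Println(rewriteAssignment(`window.YTD.example.part0 = [ {"id": "1"} ]`))
	// prints: var data = [ {"id": "1"} ]
}
```

Neither pattern is anchored, so the same text appearing later in a file (inside a string value, for instance) would be rewritten as well. Anchoring both patterns with ^ is one way to limit the replacement to the opening assignment.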



Hurrah. Assuming I didn't muck up the copypaste into this post, you should now see a rather ugly print of the struct data from Go.


  
  
JSON would be nice


Edit the main.go file to marshal the data to JSON.

Use value.Export() to get the data as a Go-native value
Use json.MarshalIndent() for pretty-printed JSON (use json.Marshal if you want to minify the output).

package main

import (
    "archive/zip"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated :/
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w+\s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Convert the data to a Go-native type
    data := value.Export()

    // Marshal the Go-native type to JSON
    jsonData, err := json.MarshalIndent(data, "", "  ")
    if err != nil {
        log.Fatalf("Error marshalling data to JSON: %v", err)
    }

    // Output the JSON data
    fmt.Println(string(jsonData))
}

func run(zipFilePath string) {
    // Open the zip file
    r, err := zip.OpenReader(zipFilePath)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) < 2 {
        fmt.Println("Please provide a path to the zip file.")
        return
    }

    run(os.Args[1])
}



That's it!


go run main.go twitter.zip






{
  "userInfo": {
    "accountId": "1234567890",
    "displayName": "Luke ✨",
    "userName": "lukeocodes"
  }
}
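One detail in that output: the keys come back in alphabetical order, not the order they had in manifest.js. That's because encoding/json sorts map keys when marshalling. A minimal sketch of the behaviour, assuming Export() hands back a map[string]interface{} for a plain object (the literal below stands in for it, and toJSON is a hypothetical helper):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// toJSON renders any Go value as JSON, indented when pretty is true and
// minified otherwise.
func toJSON(v interface{}, pretty bool) (string, error) {
	var (
		out []byte
		err error
	)
	if pretty {
		out, err = json.MarshalIndent(v, "", "  ")
	} else {
		out, err = json.Marshal(v)
	}
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Stand-in for the exported value; keys are deliberately not sorted here.
	data := map[string]interface{}{
		"userInfo": map[string]interface{}{
			"accountId":   "1234567890",
			"userName":    "lukeocodes",
			"displayName": "Luke ✨",
		},
	}

	minified, err := toJSON(data, false)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(minified)
	// prints: {"userInfo":{"accountId":"1234567890","displayName":"Luke ✨","userName":"lukeocodes"}}
}
```

If the original key order matters for your storage format, a map won't preserve it; that would need a different decoding approach.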





  
  
Open source


I'll be open sourcing a lot of this work so that others who want to parse the data from the archive can store it however they like.

Release statement: this article is reproduced from https://dev.to/lukeocodes/time-to-leave-time-to-rebuild-making-twitter20-4jgc?1
