1. Storing and Sharing Large Language Model Weights on GitHub LFS


    Git Large File Storage (LFS), which GitHub supports, is a good fit for managing large files such as the weights of a large language model. LFS keeps a small pointer file in the Git history and stores the actual content separately, so you can version large files without bloating the repository, and it works with the usual Git workflow. Pulumi's GitHub provider lets you manage the repository itself, including the .gitattributes file that configures LFS tracking, from your infrastructure code.

    Below is a Pulumi Python program that sets up a GitHub repository configured for LFS, ready for storing and sharing your large language model weights. The program will:

    1. Create a new GitHub repository.
    2. Commit a .gitattributes file that marks common weight file types for LFS tracking.
    3. Commit a placeholder file under models/ to show where the weights will live.

    Let's begin by creating a new GitHub repository and setting up LFS:

    import pulumi
    import pulumi_github as github

    # Create a new GitHub repository to store your language model weights.
    repo = github.Repository(
        "language-model-weights",
        description="Repository for storing and sharing large language model weights",
        visibility="public",  # set this to "private" if you need to
        auto_init=True,  # create an initial commit so the default branch exists
    )

    # Typical file extensions for model weights that should be tracked by LFS.
    large_file_types = [
        "*.pt",     # PyTorch model files
        "*.bin",    # Hugging Face model files
        "*.h5",     # Keras model files
        "*.model",  # other generic model file extension
    ]

    # A repository has a single .gitattributes file, so write all patterns into
    # one RepositoryFile (one rule per line) instead of one resource per pattern.
    gitattributes = github.RepositoryFile(
        "lfs-gitattributes",
        repository=repo.name,
        file=".gitattributes",
        content="".join(
            f"{file_type} filter=lfs diff=lfs merge=lfs -text\n"
            for file_type in large_file_types
        ),
        branch="main",  # assuming "main" is the default branch
    )

    # Placeholder only: RepositoryFile commits text through the GitHub API as a
    # regular Git blob and does not upload to LFS. Real weights are pushed with Git.
    placeholder = github.RepositoryFile(
        "models-placeholder",
        repository=repo.name,
        file="models/README.md",
        content="Model weights go in this directory and are stored with Git LFS.\n",
        branch="main",
        # Commit after .gitattributes so the two commits do not race on the branch.
        opts=pulumi.ResourceOptions(depends_on=[gitattributes]),
    )

    # Output the repository HTTPS URL.
    pulumi.export("repository_url", repo.html_url)

    How does the program work?

    • First, we import the necessary libraries, pulumi and pulumi_github.
    • We create a GitHub repository named language-model-weights with an initial commit, so that the default branch exists before any files are added. You can choose public or private visibility depending on your sharing needs.
    • We prepare a list of file extensions that are typically associated with large model files and that you might want to store in LFS.
    • We create a single .gitattributes file with one github.RepositoryFile resource. It contains one rule per extension, which tells Git to route matching files through LFS. A repository has only one .gitattributes file, so the rules must share it rather than each getting their own resource.
    • We add a small placeholder file (models/README.md) to mark where the weights will live. It is ordinary text, not an LFS object; the real weight files are added later with Git.
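
    The .gitattributes content can also be sketched on its own in plain Python. This is a minimal illustration rather than part of the Pulumi program, and the helper name build_gitattributes is hypothetical. It emits one tracking rule per pattern, in the same line format that git lfs track writes, and skips duplicate patterns.

```python
def build_gitattributes(patterns):
    """Return .gitattributes content with one Git LFS rule per pattern."""
    rule = "filter=lfs diff=lfs merge=lfs -text"
    unique = []
    for pattern in patterns:
        # Keep the first occurrence of each pattern, preserving order.
        if pattern not in unique:
            unique.append(pattern)
    return "".join(f"{pattern} {rule}\n" for pattern in unique)


if __name__ == "__main__":
    print(build_gitattributes(["*.pt", "*.bin", "*.h5", "*.model"]), end="")
```

    The returned string can be passed as the content of the .gitattributes file, so every pattern ends up in the same file.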

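    This Pulumi program creates the repository and its configuration on GitHub, but it does not upload anything to LFS: github.RepositoryFile commits content through the GitHub API as an ordinary Git blob. To put real weights into the repository, push them with Git and the git-lfs client from a local clone, or automate that step with a GitHub Actions workflow.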
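
    The local Git step might look like the following sketch. It assembles the commands and only runs them when dry_run is False. The function names, the branch name main, and the remote name origin are assumptions for illustration. It also assumes that the git-lfs client is installed, that it runs inside a clone of the repository, and that .gitattributes already tracks the extensions of the files being added.

```python
import subprocess


def lfs_push_commands(weight_files, branch="main", message="Add model weights"):
    """Build the Git commands that commit weight files and push them."""
    return [
        ["git", "lfs", "install"],          # set up the Git LFS hooks and filters
        ["git", "add", *weight_files],      # tracked patterns are staged as LFS pointers
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],  # pushes the commit and its LFS objects
    ]


def push_weights(weight_files, dry_run=True, **kwargs):
    """Print the commands (dry run) or execute them in the current directory."""
    commands = lfs_push_commands(weight_files, **kwargs)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # stop at the first failing command
    return commands
```

    Calling push_weights(["models/model.pt"]) prints the commands without touching the repository, which is a convenient way to review them before running with dry_run=False.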

    The repository URL is exported at the end of the program, which you can use to access your GitHub repository through a web browser.

    Before executing this program, install the Pulumi SDK and the GitHub provider in your environment by running pip install pulumi pulumi-github.

    When you apply this Pulumi code with pulumi up, Pulumi calls the GitHub API and creates the repository and files defined by the resources, which gives you a repeatable way to set up a shared space for your large language model weights. The GitHub provider needs a token to authenticate: supply it through the GITHUB_TOKEN environment variable or with pulumi config set github:token --secret.
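
    A small pre-flight check can catch a missing token before pulumi up runs. This sketch assumes the token is supplied through the GITHUB_TOKEN environment variable, which the GitHub provider reads; if you keep the token in Pulumi configuration instead, the check does not apply. The function name is hypothetical.

```python
import os


def github_token_present(env=None):
    """Return True if a non-empty GITHUB_TOKEN is available."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real environment.
    env = os.environ if env is None else env
    return bool(env.get("GITHUB_TOKEN", "").strip())
```

    For example, a deployment script could call github_token_present() and stop with a clear message when it returns False, rather than letting the provider fail partway through.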