arxiv:1910.09700

Quantifying the Carbon Emissions of Machine Learning

Published on Oct 21, 2019
Abstract

From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.


Models Download Stats

How are downloads counted for models?

Counting the number of downloads for models is not a trivial task, as a single model repository might contain multiple files, including multiple model weight files (e.g., with sharded models) and different formats depending on the library (GGUF, PyTorch, TensorFlow, etc.). To avoid double counting downloads (e.g., counting a single download of a model as multiple downloads), the Hub uses a set of query files that are employed for download counting. No information is sent from the user, and no additional calls are made for this. The count is done server-side as the Hub serves files for downloads.

Every HTTP request to these files, including GET and HEAD, will be counted as a download. By default, when no library is specified, the Hub uses config.json as the default query file. Otherwise, the query file depends on each library, and the Hub might examine files such as pytorch_model.bin or adapter_config.json.
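The rule above can be sketched as a small predicate. This is an illustrative simplification, not the Hub's actual server-side code; the function name and the single-query-file assumption are ours:

```python
# Hypothetical sketch: a request counts as a download only when it
# targets the repo's query file with GET or HEAD. In the default case
# (no library specified) that query file is config.json.

DEFAULT_QUERY_FILE = "config.json"

def counts_as_download(method: str, path: str,
                       query_file: str = DEFAULT_QUERY_FILE) -> bool:
    """Return True if this HTTP request should increment the counter."""
    return method in ("GET", "HEAD") and path == query_file

print(counts_as_download("GET", "config.json"))        # True
print(counts_as_download("HEAD", "config.json"))       # True
print(counts_as_download("GET", "pytorch_model.bin"))  # False (default case)
```

Note that requests for the weight files themselves do not bump the counter in the default case; this is what prevents a sharded model (many weight files, one config.json) from being counted multiple times per download.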

Which are the query files for different libraries?

By default, the Hub looks at config.json, config.yaml, hyperparams.yaml, params.json, and meta.yaml. Some libraries override these defaults with their own filter via a countDownloads rule. The code that defines these overrides is open-source. For example, for the nemo library, all files with the .nemo extension are used to count downloads.
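The default-plus-override behavior can be sketched as follows. The override table and function are illustrative assumptions; only the default file list and the nemo rule come from the docs above, and the real definitions live in the open-source codebase:

```python
import os
from typing import Optional

# Default query files listed in the docs above.
DEFAULT_QUERY_FILES = {
    "config.json", "config.yaml", "hyperparams.yaml",
    "params.json", "meta.yaml",
}

# Hypothetical override table; only the nemo entry is from the docs.
EXTENSION_OVERRIDES = {"nemo": ".nemo"}

def is_query_file(library: Optional[str], filename: str) -> bool:
    """Decide whether a request for this file counts as a download."""
    if library in EXTENSION_OVERRIDES:
        # An override replaces the defaults for that library.
        return filename.endswith(EXTENSION_OVERRIDES[library])
    return os.path.basename(filename) in DEFAULT_QUERY_FILES

print(is_query_file(None, "config.json"))    # True (default)
print(is_query_file("nemo", "stt_en.nemo"))  # True (extension override)
print(is_query_file("nemo", "config.json"))  # False (override replaces defaults)
```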

Can I add my query files for my library?

Yes, you can open a Pull Request here. Here is a minimal example adding download metrics for VFIMamba. Check out the integration guide for more details.

How are GGUF files handled?

GGUF files are self-contained and are not tied to a single library, so all of them are counted as downloads. This can double count downloads when a user clones a whole repository, but most users and interfaces download a single GGUF file for a given repo.
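Since every GGUF file counts individually, the rule reduces to an extension check; a minimal sketch (helper name is ours):

```python
# Sketch: GGUF files are self-contained, so a request for any .gguf
# file counts as its own download.
def counted_gguf_files(repo_files: list[str]) -> list[str]:
    """Return the files whose requests would each count as a download."""
    return [f for f in repo_files if f.endswith(".gguf")]

repo = ["README.md", "model-Q4_K_M.gguf", "model-Q8_0.gguf"]
print(counted_gguf_files(repo))  # ['model-Q4_K_M.gguf', 'model-Q8_0.gguf']
```

This is also why cloning a repo with several quantizations (Q4, Q8, …) registers several downloads, while fetching one quantization registers one.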

How is diffusers handled?

The diffusers library is an edge case and has its filter configured in the internal codebase. The filter ensures that repos tagged as diffusers count both files loaded via the library and files downloaded directly through UIs that require users to manually download the top-level safetensors file.

{
    filter: [
        {
            bool: {
                /// Include documents that match at least one of the following rules
                should: [
                    /// Downloaded from diffusers lib
                    {
                        term: { path: "model_index.json" },
                    },
                    /// Direct downloads (LoRa, Auto1111 and others)
                    /// Filter out nested safetensors and pickle weights to avoid double counting downloads from the diffusers lib
                    {
                        regexp: { path: "[^/]*\\.safetensors" },
                    },
                    {
                        regexp: { path: "[^/]*\\.ckpt" },
                    },
                    {
                        regexp: { path: "[^/]*\\.bin" },
                    },
                ],
                minimum_should_match: 1,
            },
        },
    ]
}
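The `[^/]*\.safetensors` pattern matches only top-level files, because `[^/]` excludes the path separator. A quick check in Python, assuming the query engine anchors the pattern to the whole path (as Elasticsearch-style regexp queries do, which `re.fullmatch` emulates):

```python
import re

# Elasticsearch-style regexp queries match the entire field value,
# so emulate that behavior with re.fullmatch rather than re.search.
pattern = re.compile(r"[^/]*\.safetensors")

print(bool(pattern.fullmatch("model.safetensors")))       # True: top-level file
print(bool(pattern.fullmatch("unet/model.safetensors")))  # False: nested, excluded
```

Excluding nested safetensors is what avoids double counting: a library-driven download already touches model_index.json, so the nested weight files it pulls must not count again.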

Models citing this paper 1,000+

Datasets citing this paper 24

Spaces citing this paper 16,437

Collections including this paper 5