
Using HuggingFace Transformers

HF Transformers #

HuggingFace (🤗) Transformers is a library that makes it easy to download state-of-the-art pretrained models. You can also create and train a model from scratch, for example after modifying the structure of an existing model. Although the library started with transformer-based language models, it has become a general community hub and now includes other architectures, such as the convolution-based ResNet.

It can easily be installed via pip 1:

pip install transformers

Most of the code is borrowed from the HuggingFace Transformers example scripts.

Using the Library #

In this post, I introduce how to create (1) a language model and (2) an image classification model (ResNet). HF Transformers is not limited to these models; it also supports audio classification and multimodal tasks (e.g., image captioning or document visual question answering).

Throughout this post, I use AutoClasses for a simpler implementation. All examples are implemented in Python with PyTorch.

All model construction starts from instantiating a model configuration, which specifies the model structure (e.g., the number of layers, the number of attention heads, etc.). AutoConfig is used for this purpose; it automatically downloads well-known model configurations from the HF Model Hub.

It also supports loading local files when given an os.PathLike argument, but this post only covers downloading from the Hub by specifying the model name. Because it automatically downloads configs and models and caches them in the local filesystem, we don't have to download and manage them manually. Below, the first call downloads the configuration for gpt2-xl, and the second downloads the configuration for microsoft/resnet-152.

from transformers import AutoConfig

config_gpt2   = AutoConfig.from_pretrained("gpt2-xl")
config_resnet = AutoConfig.from_pretrained("microsoft/resnet-152")

config_gpt2 and config_resnet have different class types:

transformers.models.gpt2.configuration_gpt2.GPT2Config       # config_gpt2
transformers.models.resnet.configuration_resnet.ResNetConfig # config_resnet

Both GPT2Config and ResNetConfig are subclasses of PretrainedConfig, the base class of all configuration classes.
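To make "the config specifies the model structure" concrete, here is a rough sketch that estimates the parameter count of a GPT-2-style model from a few config fields. It assumes the standard GPT-2 block layout (two LayerNorms, a fused QKV projection, an attention output projection, and a 4x-wide MLP per layer) and an LM head tied to the token embedding. The dict keys mirror GPT2Config attribute names, but gpt2_param_count is a hypothetical helper, not a Transformers API; you could pass it config_gpt2.to_dict() instead of the hand-written dicts.

```python
def gpt2_param_count(cfg: dict) -> int:
    """Estimate trainable parameters of a GPT-2-style model from its config fields.

    Assumes the standard GPT-2 layout with the LM head tied to the token
    embedding, so the head adds no extra parameters.
    """
    d = cfg["n_embd"]
    # Token embeddings + learned position embeddings.
    embeddings = cfg["vocab_size"] * d + cfg["n_positions"] * d
    # Per transformer block:
    #   weights: QKV (d x 3d) + attn out (d x d) + MLP in (d x 4d) + MLP out (4d x d) = 12 d^2
    #   biases:  3d + d + 4d + d = 9d, plus two LayerNorms (weight + bias) = 4d  -> 13 d
    per_layer = 12 * d * d + 13 * d
    # Final LayerNorm (weight + bias).
    final_ln = 2 * d
    return embeddings + cfg["n_layer"] * per_layer + final_ln


gpt2_small = {"vocab_size": 50257, "n_positions": 1024, "n_embd": 768, "n_layer": 12}
gpt2_xl = {"vocab_size": 50257, "n_positions": 1024, "n_embd": 1600, "n_layer": 48}

print(gpt2_param_count(gpt2_small))  # 124439808
print(gpt2_param_count(gpt2_xl))     # 1557611200
```

n_head does not appear in the formula: splitting the hidden size into heads changes how the weights are used, not their shapes.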

Natural Language Processing #

Instantiating a model #

GPT, BERT, and T5 are well-known examples of natural language processing models.

There are three types of language models in HF Transformers: CausalLM, MaskedLM, and Seq2SeqLM. HF has two pages explaining causal LM and masked LM, but none for Seq2SeqLM, which covers encoder-decoder models such as T5 and BART.
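The practical difference between CausalLM and MaskedLM shows up in how the training labels are built. This is a simplified pure-Python sketch with made-up token IDs and a made-up mask token ID (0). The -100 value follows the PyTorch/HF convention for label positions the loss ignores; real masked-LM collators pick positions randomly and sometimes keep or randomize the selected tokens, which is skipped here.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def causal_lm_example(input_ids):
    """Causal LM (e.g., GPT-2): labels are a copy of the inputs.

    The model shifts them internally, so position t is trained to predict token t + 1.
    """
    return {"input_ids": list(input_ids), "labels": list(input_ids)}


def masked_lm_example(input_ids, mask_positions, mask_token_id):
    """Masked LM (e.g., BERT), simplified: replace the chosen tokens with the
    mask token and compute the loss only at those positions."""
    inputs, labels = [], []
    for i, tok in enumerate(input_ids):
        if i in mask_positions:
            inputs.append(mask_token_id)
            labels.append(tok)  # the model must recover the original token here
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)  # no loss on unmasked positions
    return {"input_ids": inputs, "labels": labels}


ids = [10, 11, 12, 13]
print(causal_lm_example(ids))
# {'input_ids': [10, 11, 12, 13], 'labels': [10, 11, 12, 13]}
print(masked_lm_example(ids, mask_positions={1, 3}, mask_token_id=0))
# {'input_ids': [10, 0, 12, 0], 'labels': [-100, 11, -100, 13]}
```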

For GPT-2, there are two APIs that instantiate a model with a language modeling head: AutoModelForPreTraining and AutoModelForCausalLM. Both return a GPT2LMHeadModel.

Use AutoModel instead if you do not intend to pretrain the model: the result does not include any pretraining head, so for GPT-2 the model type is GPT2Model instead of GPT2LMHeadModel.

from_config() builds the model with randomly initialized weights; use the .from_pretrained() API instead to load the pretrained parameters as well.

from transformers import AutoModelForPreTraining
model = AutoModelForPreTraining.from_config(config_gpt2)
print(model)
# GPT2LMHeadModel(
#   (transformer): GPT2Model(

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_config(config_gpt2)
print(model)
# GPT2LMHeadModel(
#   (transformer): GPT2Model(

For other models, choose an appropriate API to instantiate them. If the API does not support the model, it raises an error like the following:

config = AutoConfig.from_pretrained("t5-base")
model = AutoModelForCausalLM.from_config(config)
# ValueError: Unrecognized configuration class <class 'transformers.models.t5.configuration_t5.T5Config'> for this kind of AutoModel: AutoModelForCausalLM.
# Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, ...

The HF Auto Classes documentation lists which models each API supports. For instance, t5-base works with AutoModelForSeq2SeqLM.
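Conceptually, an Auto class looks up the config's model type in a mapping to a concrete model class and raises a ValueError when the type is missing, which is why T5 fails above. The toy registry below imitates only that lookup; TOY_CAUSAL_LM_MAPPING and toy_auto_causal_lm are hypothetical names, and the two entries are a hand-picked illustration, not the real (much larger) mapping inside Transformers.

```python
# Hypothetical, hand-picked subset of a "model type -> causal LM class" mapping.
TOY_CAUSAL_LM_MAPPING = {
    "gpt2": "GPT2LMHeadModel",
    "bert": "BertLMHeadModel",
}


def toy_auto_causal_lm(model_type: str) -> str:
    """Return the class name registered for model_type, mimicking an Auto-class lookup."""
    try:
        return TOY_CAUSAL_LM_MAPPING[model_type]
    except KeyError:
        supported = ", ".join(sorted(TOY_CAUSAL_LM_MAPPING))
        raise ValueError(
            f"Unrecognized model type {model_type!r} for AutoModelForCausalLM. "
            f"Model type should be one of {supported}"
        ) from None


print(toy_auto_causal_lm("gpt2"))  # GPT2LMHeadModel
try:
    toy_auto_causal_lm("t5")  # not registered as a causal LM in this toy mapping
except ValueError as err:
    print(err)
```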

Instantiating a dataset #

HuggingFace also provides various datasets. For GPT-2, I use the wikitext dataset.

Use the datasets package to load a dataset:

from datasets import load_dataset
raw_dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

This loads the wikitext-2-raw-v1 subset of the dataset.

Loading a tokenizer and preprocessing #

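A tokenizer converts raw text into the inputs a model expects, such as token IDs and an attention mask.

I didn't dig into every detail and simply followed the example code for preprocessing :) In short, it tokenizes every line of text, then concatenates all the tokens and splits them into blocks of max_seq_length tokens.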

from itertools import chain

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")

column_names = list(raw_dataset["train"].features)
text_column_name = "text" if "text" in column_names else column_names[0]

# Length of each training block (1024 tokens for GPT-2).
max_seq_length = tokenizer.model_max_length

def tokenize_function(examples):
    return tokenizer(examples[text_column_name])

# Tokenize every line; drop the raw text column so only token columns remain.
tokenized_datasets = raw_dataset.map(
    tokenize_function, batched=True, remove_columns=column_names
)

def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {
        k: list(chain(*examples[k])) for k in examples.keys()
    }
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
    # customize this part to your needs.
    if total_length >= max_seq_length:
        total_length = (total_length // max_seq_length) * max_seq_length
    # Split by chunks of max_len.
    result = {
        k: [
            t[i : i + max_seq_length]
            for i in range(0, total_length, max_seq_length)
        ]
        for k, t in concatenated_examples.items()
    }
    # For a causal LM the labels are the inputs; the model shifts them internally.
    result["labels"] = result["input_ids"].copy()
    return result

# Concatenate the tokenized lines and split them into max_seq_length-token blocks.
tokenized_datasets = tokenized_datasets.map(
    group_texts, batched=True, load_from_cache_file=True
)
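A self-contained toy run of the same group_texts logic makes the chunking easier to see. Here block_size is passed explicitly (4 instead of max_seq_length), and the token IDs are made up rather than produced by the tokenizer:

```python
from itertools import chain


def group_texts(examples, block_size):
    """Concatenate tokenized examples and split them into blocks of block_size tokens."""
    # Flatten each column (input_ids, attention_mask, ...) into one long list.
    concatenated = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the remainder that does not fill a whole block.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Causal LM labels are a copy of the inputs; the model shifts them internally.
    result["labels"] = result["input_ids"].copy()
    return result


batch = {
    "input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]],
    "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1, 1]],
}
grouped = group_texts(batch, block_size=4)
print(grouped["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Tokens 9 and 10 do not fill a whole block, so they are dropped, as the comment in group_texts describes.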

Train! #

The code is adapted from the HF language modeling example (run_clm.py).

import evaluate
from transformers import Trainer, TrainingArguments, default_data_collator

train_dataset = tokenized_datasets["train"]
# "gpt2-wikitext" is a placeholder output directory for checkpoints.
training_args = TrainingArguments(output_dir="gpt2-wikitext")

metric = evaluate.load("accuracy")

def preprocess_logits_for_metrics(logits, labels):
    # Keep only the argmax token IDs to save memory during evaluation.
    if isinstance(logits, tuple):
        logits = logits[0]
    return logits.argmax(dim=-1)

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # preds have the same shape as the labels, after the argmax(-1) has been calculated
    # by preprocess_logits_for_metrics but we need to shift the labels
    labels = labels[:, 1:].reshape(-1)
    preds = preds[:, :-1].reshape(-1)
    return metric.compute(predictions=preds, references=labels)

trainer = Trainer(
    model = model,
    args = training_args,
    train_dataset = train_dataset,
    eval_dataset = None, # To evaluate, use tokenized_datasets["validation"]
    tokenizer = tokenizer,
    data_collator = default_data_collator,
    compute_metrics = None, # To evaluate, use compute_metrics and also pass preprocess_logits_for_metrics
)
trainer.train()
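The label shift in compute_metrics is easiest to see on a toy example. This pure-Python sketch reproduces the same comparison on made-up token IDs, assuming preds already holds argmax token IDs (as the comment in compute_metrics says); shifted_accuracy is a hypothetical helper, not part of evaluate:

```python
def shifted_accuracy(preds, labels):
    """Token accuracy for a causal LM, mirroring the shift in compute_metrics.

    preds[b][t] is the prediction made at position t, i.e. a guess for token t + 1,
    so it is compared with labels[b][t + 1].
    """
    correct = total = 0
    for pred_row, label_row in zip(preds, labels):
        # Drop the last prediction (nothing follows the final token)
        # and the first label (no prediction targets the first token).
        for pred, label in zip(pred_row[:-1], label_row[1:]):
            correct += int(pred == label)
            total += 1
    return correct / total


labels = [[5, 6, 7, 8]]
preds = [[6, 7, 9, 1]]  # guesses for tokens 2..4, plus one past the end
print(shifted_accuracy(preds, labels))  # 0.6666666666666666
```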

Image Classification #

Instantiating a model #

ViT and ResNet are examples of image classification models. Use transformers.AutoModelForImageClassification.

from transformers import AutoConfig, AutoModelForImageClassification
config = AutoConfig.from_pretrained("microsoft/resnet-152")
model = AutoModelForImageClassification.from_config(config)
print(model)
# ResNetForImageClassification(
#   (resnet): ResNetModel(

Instantiating a dataset #

Same as for NLP, but load an image dataset instead of a text dataset.

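from datasets import load_dataset
# task="image-classification" renames the columns to "image" and "labels".
# The task argument is deprecated and was removed in datasets 3.0.
dataset = load_dataset("Maysee/tiny-imagenet", task="image-classification")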

This dataset has no subsets, so the second argument is omitted.

Instantiating an Image Processor and preprocessing #

This is the major difference from training NLP models: instead of a tokenizer, we use an image processor.

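from transformers import AutoImageProcessor
from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

image_processor = AutoImageProcessor.from_pretrained("microsoft/resnet-152")
# Target size: an int (shortest edge) or a (height, width) tuple.
size = (
    image_processor.size["shortest_edge"]
    if "shortest_edge" in image_processor.size
    else (image_processor.size["height"], image_processor.size["width"])
)

# Normalize with the statistics the pretrained model expects.
normalize = Normalize(
    mean=image_processor.image_mean, std=image_processor.image_std
)
# Train: random crop + flip for augmentation. Validation: deterministic resize + center crop.
_train_transforms = Compose(
    [RandomResizedCrop(size), RandomHorizontalFlip(), ToTensor(), normalize]
)
_val_transforms = Compose([Resize(size), CenterCrop(size), ToTensor(), normalize])

def train_transforms(example_batch):
    """Apply _train_transforms across a batch."""
    example_batch["pixel_values"] = [
        _train_transforms(pil_img.convert("RGB"))
        for pil_img in example_batch["image"]
    ]
    return example_batch

def val_transforms(example_batch):
    """Apply _val_transforms across a batch."""
    example_batch["pixel_values"] = [
        _val_transforms(pil_img.convert("RGB"))
        for pil_img in example_batch["image"]
    ]
    return example_batch

dataset["train"].set_transform(train_transforms)  # applied on the fly when accessed
# Apply val_transforms to your evaluation split the same way.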
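What ToTensor() followed by Normalize() does to a single pixel can be checked by hand. The sketch below assumes the standard ImageNet mean and standard deviation, which I believe is what image_processor.image_mean and image_processor.image_std hold for microsoft/resnet-152 (print them to confirm); to_model_input is a hypothetical helper:

```python
# Assumed values: the standard ImageNet statistics. Check image_processor.image_mean
# and image_processor.image_std for the values your checkpoint actually uses.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_model_input(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Mimic ToTensor() + Normalize() for one RGB pixel given as 0-255 ints."""
    scaled = [c / 255.0 for c in rgb]  # ToTensor: uint8 [0, 255] -> float [0, 1]
    # Normalize: per-channel (x - mean) / std
    return [(x - m) / s for x, m, s in zip(scaled, mean, std)]


print([round(v, 4) for v in to_model_input((255, 128, 0))])  # [2.2489, 0.2052, -1.8044]
```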


Train! #

Pass a different tokenizer (the image processor), data collator, and compute_metrics to make it work:

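import evaluate
import numpy as np
import torch
from transformers import Trainer, TrainingArguments

metric = evaluate.load("accuracy")

def compute_metrics(p):
    # The predicted class is the index of the largest logit for each image.
    return metric.compute(
        predictions=np.argmax(p.predictions, axis=1), references=p.label_ids
    )

def collate_fn(examples):
    # Stack per-image tensors into one (batch, channels, height, width) tensor.
    pixel_values = torch.stack(
        [example["pixel_values"] for example in examples]
    )
    labels = torch.tensor([example["labels"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

# remove_unused_columns=False keeps the "image" column that the on-the-fly
# transforms read; "resnet-tiny-imagenet" is a placeholder output directory.
training_args = TrainingArguments(
    output_dir="resnet-tiny-imagenet", remove_unused_columns=False
)

trainer = Trainer(
    model = model,
    args = training_args,
    train_dataset = dataset["train"],
    eval_dataset = None, # use the validation split if you want to evaluate
    compute_metrics = compute_metrics,
    tokenizer = image_processor, # Now we use image processor for tokenizer input
    data_collator = collate_fn,  # Don't use default data collator here
)
trainer.train()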
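For images there is one label per example, so compute_metrics needs no shift: it only takes the argmax over the class dimension. A NumPy-free sketch of the same computation on made-up logits (argmax_accuracy is a hypothetical helper):

```python
def argmax_accuracy(logits, label_ids):
    """Accuracy from per-image class logits, like compute_metrics above."""
    # Equivalent of np.argmax(predictions, axis=1): index of the largest logit per row.
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, label_ids))
    return correct / len(label_ids)


logits = [
    [0.1, 2.0, 0.3],  # predicted class 1
    [1.5, 0.2, 0.1],  # predicted class 0
    [0.0, 0.1, 0.9],  # predicted class 2
]
print(argmax_accuracy(logits, [1, 0, 1]))  # 0.6666666666666666
```

Like np.argmax, max() returns the first index when several logits tie.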

  1. HuggingFace Transformers: Installation