MinerU on Low VRAM: When Hybrid OOMs but VLM Works Fine

By Nite, filed under Tutorials

2026-04-23


I recently needed to process some PDFs locally with MinerU and ran into a series of issues on my 8GB GPU. The most counter-intuitive finding: the supposedly “lighter” hybrid mode ran out of memory, while the seemingly “heavier” vlm mode worked fine. After digging through the source code, I figured out why. Here’s the rundown for anyone running MinerU on low-VRAM hardware.

First hurdle: the default backend OOMs right away

MinerU 3.x defaults to the hybrid-auto-engine backend. After installing, I ran:

mineru -p input.pdf -o output

It crashed almost immediately:

CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total
capacity of 8 GiB of which 65.56 MiB is free.

Out of 8 GiB of VRAM, only 65.56 MiB was free, and even a 20 MiB allocation failed: the card was effectively full.
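Checking free VRAM before launching tells you whether something else was already holding the card. If the card is mostly free at launch, the memory is going to MinerU's own models. A minimal pre-flight sketch, assuming nvidia-smi from the NVIDIA driver is on the PATH; gpu_free_mib and has_free_vram are hypothetical helpers, not part of MinerU:

```shell
# Hypothetical helpers, not part of MinerU.

# Print the free memory of GPU 0 in MiB as a bare integer.
gpu_free_mib() {
  nvidia-smi -i 0 --query-gpu=memory.free --format=csv,noheader,nounits
}

# Succeed only if at least $1 MiB is free.
# An optional $2 supplies the free-MiB value directly instead of querying nvidia-smi.
has_free_vram() {
  local need_mib=$1
  local free_mib=${2:-$(gpu_free_mib)}
  [ "$free_mib" -ge "$need_mib" ]
}
```

For example, has_free_vram 6000 || nvidia-smi prints the full status table, including the process list, whenever less than about 6GB is free.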

Tweaking parameters didn’t help

The documentation lists several environment variables for controlling memory usage:

MINERU_HYBRID_BATCH_RATIO=1 \
MINERU_API_MAX_CONCURRENT_REQUESTS=1 \
MINERU_PROCESSING_WINDOW_SIZE=1 \
mineru -p input.pdf -o output

In particular, MINERU_HYBRID_BATCH_RATIO comes with a reference table suggesting values of 8 or lower for GPUs under 6GB. On my 8GB card, the auto-detected value is 2 (the get_batch_ratio function in the source returns 1 below 8GB and 2 at 8GB or above). I tried every combination anyway.
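For reference, here is the auto-detection rule written out as a sketch. It restates this post's paraphrase of get_batch_ratio (1 below 8GB, 2 at 8GB or above) rather than MinerU's actual code, and batch_ratio_for_vram_gb is a hypothetical name:

```shell
# Hypothetical restatement of the rule as described in this post,
# not MinerU's implementation. $1 = total VRAM in whole GB.
batch_ratio_for_vram_gb() {
  if [ "$1" -ge 8 ]; then
    echo 2
  else
    echo 1
  fi
}

# Pinning the value explicitly instead of relying on auto-detection:
# MINERU_HYBRID_BATCH_RATIO=$(batch_ratio_for_vram_gb 8) mineru -p input.pdf -o output
```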

No luck: still OOM. The reason is that batch_ratio only scales the batch size for small-model inference; it does not change how many models are loaded into VRAM.

The counter-intuitive discovery: vlm mode actually works

On a whim, I switched to vlm-auto-engine:

mineru -p input.pdf -o output -b vlm-auto-engine

It ran to completion: slower than hybrid, but without crashing.

This seemed wrong: vlm mode runs every page through a large model. Shouldn’t that use more VRAM, not less?

Source code reveals the answer

Looking at MinerU’s model_init.py, the two modes load completely different sets of models:

hybrid-auto-engine (HybridModelSingleton):

Model	Purpose	Always in VRAM
PPDocLayoutV2	Layout detection	Yes
PytorchPaddleOCR	OCR text recognition	Yes
MFR (UniMerNet)	Math formula recognition	Yes (if enabled)
Table recognition models	Table structure parsing	Yes
VLM (~2B params)	Complex content	Yes

That is 4–5 models resident in VRAM at the same time, and 8GB simply can’t hold them all.

vlm-auto-engine:

Model	Purpose	Always in VRAM
VLM (~2B params)	Everything in one pass	Yes

Just one model is loaded, using roughly 4–5GB of VRAM, which fits within 8GB.
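A back-of-the-envelope check of that figure, assuming the ~2B parameter count from the table and 16-bit (bf16/fp16) weights at 2 bytes per parameter. The estimate covers weights only; activations, KV cache and the CUDA context account for the rest of the 4–5GB:

```shell
# Rough weight-memory estimate in MiB: params * bytes_per_param / 1024 / 1024.
# Covers weights only; runtime buffers come on top.
weights_mib() {
  echo $(( $1 * $2 / 1024 / 1024 ))
}

weights_mib 2000000000 2   # ~2B params at 2 bytes each: prints 3814, about 3.7 GiB
```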

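The conclusion is clear: hybrid mode does not swap the VLM for lighter models; it loads the same VLM plus several small models. Each small model is individually lightweight, but together they push the total past what 8GB can hold, while vlm mode carries only the VLM itself. On low-VRAM hardware, vlm mode is actually the better choice.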
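The same reasoning works as a budget check: what matters is the sum of everything resident, not the size of any single model. Here is a sketch with a hypothetical fits_in_vram helper. The MiB figures you pass in should be your own measurements, for example from nvidia-smi; the ones in the usage lines are placeholders, not MinerU’s real model sizes:

```shell
# Hypothetical helper: succeed if the resident models fit in the VRAM budget.
# $1 = budget in MiB; remaining arguments = per-model footprints in MiB.
fits_in_vram() {
  local budget=$1 total=0 mib
  shift
  for mib in "$@"; do
    total=$(( total + mib ))
  done
  echo "total ${total} MiB of ${budget} MiB"
  [ "$total" -le "$budget" ]
}

# Placeholder sizes, not measured MinerU figures:
# one large model alone vs. the same model plus four smaller ones.
fits_in_vram 8192 4500
fits_in_vram 8192 4500 1000 1000 1000 1000 || echo "does not fit"
```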

Another gotcha: http-client with third-party LLMs is incompatible

Since local VRAM was tight, I tried offloading VLM inference to a remote API:

mineru -p input.pdf -o output -b hybrid-http-client -u http://some-api:30000/v1

The logs spat out a WARNING:

WARNING | mineru_vl_utils.mineru_client:parse_layout_output:251 -
Layout output does not match expected format: ```json
[
    {"bbox_2d": [145, 83, 853, 442], "label": "Table I"},
    ...
]

The LLM wrapped its response in a ```json code block, which MinerU couldn’t parse.

When I inspected the request MinerU sends to the LLM, I found the prompt remarkably barebones.

MinerU expects the remote model to be its own fine-tuned VLM (based on Qwen2-VL), which was trained to answer a “Layout Detection:” prompt with special tokens like these:

<|box_start|>145 83 853 442<|box_end|><|ref_start|>Table I<|ref_end|>

A general-purpose LLM knows nothing about these special tokens, so it naturally answers with JSON wrapped in a markdown code block. The http-client backend is not a “plug in any OpenAI-compatible API” solution: it only works when the server is running MinerU’s fine-tuned VLM.
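Before sending a whole batch through http-client, it is worth checking what the server actually returns. Here is a sketch with two hypothetical helpers, neither of which is MinerU’s own parser. classify_layout_output sorts a raw response into the two shapes seen in this post. parse_layout_line turns one token line into a label and a box, assuming exactly the shape shown above (four space-separated integers between the box tokens, then the label between the ref tokens). How you capture the raw response, for example from the WARNING log, is up to you:

```shell
# Hypothetical helpers for inspecting raw layout responses (not MinerU code).

# Read a raw model response on stdin and report which format it uses.
classify_layout_output() {
  local out
  out=$(cat)
  case "$out" in
    *'<|box_start|>'*) echo "mineru-tokens" ;;
    *'```json'*)       echo "markdown-json" ;;
    *)                 echo "unknown" ;;
  esac
}

# Turn "<|box_start|>x1 y1 x2 y2<|box_end|><|ref_start|>label<|ref_end|>"
# into "label: x1 y1 x2 y2". [|] matches a literal pipe inside the regex.
parse_layout_line() {
  sed -E 's/<[|]box_start[|]>([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)<[|]box_end[|]><[|]ref_start[|]>([^<]*)<[|]ref_end[|]>/\5: \1 \2 \3 \4/'
}

line='<|box_start|>145 83 853 442<|box_end|><|ref_start|>Table I<|ref_end|>'
echo "$line" | classify_layout_output   # prints: mineru-tokens
echo "$line" | parse_layout_line        # prints: Table I: 145 83 853 442
```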

The working solution

For 8GB VRAM environments, just use vlm-auto-engine:

# Install
uv pip install -U "mineru[all]"

# Run
mineru -p input.pdf -o output -b vlm-auto-engine

If vlm also OOMs (an edge case), fall back to the pipeline backend, which skips the VLM entirely and can run on the CPU alone:

mineru -p input.pdf -o output -b pipeline

Slower, but rock-solid.
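The fallback can also be automated. This is a minimal sketch under two assumptions: mineru exits with a non-zero status when a run fails (check this on your version before relying on it), and MinerU drops to CPU when CUDA reports no devices. run_with_fallback is a hypothetical name. Setting CUDA_VISIBLE_DEVICES to an empty string hides every GPU from CUDA, so the pipeline retry stays off the card:

```shell
# Hypothetical wrapper: try the VLM backend first, then fall back to pipeline on CPU.
run_with_fallback() {
  local input=$1 output=$2
  if mineru -p "$input" -o "$output" -b vlm-auto-engine; then
    return 0
  fi
  echo "vlm-auto-engine failed; retrying with pipeline on CPU" >&2
  # An empty CUDA_VISIBLE_DEVICES hides all GPUs from CUDA-based libraries.
  CUDA_VISIBLE_DEVICES= mineru -p "$input" -o "$output" -b pipeline
}

# Usage:
# run_with_fallback input.pdf output
```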