Choosing the Best Ollama Model for Your GPU

Selecting the Best Ollama Model

If you want to work with AI or LLMs and have installed Ollama with OpenWebUI, the next step is to download a model. The Ollama website offers a huge selection (6512 models in total as of May 27, 2025).

Apart from the intended purpose, the main question is which models can be loaded completely into the VRAM of your GPU, because that gives the best performance. Usually the goal is to pick a model with as many parameters as possible to get the best possible quality.

Many models come in several sizes: small variants for GPUs with less memory and large variants for GPUs with more memory.
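The size of a model file depends mainly on its parameter count and on how many bits each weight uses after quantization. As a rough sketch, assuming every weight is stored with the same number of bits (real quantization schemes mix bit widths and add metadata, so actual file sizes differ), the size can be estimated like this. The function name `estimate_size_gb` is hypothetical and not part of Ollama.

```python
def estimate_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in GB (1 GB = 10**9 bytes).

    Assumption: all weights use the same bit width. Metadata and the
    memory needed for the context at runtime are ignored.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # A 7-billion-parameter model at 16 bits vs. 4 bits per weight.
    for bits in (16, 4):
        print(f"7B at {bits} bits: about {estimate_size_gb(7, bits):.1f} GB")
```

This only gives an order of magnitude; the file size listed for a model is the number to compare against your VRAM.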

Choosing is difficult with such a large selection, and the Ollama website unfortunately offers no suitable filter, so I created a list of all models available on the Ollama website. I wanted the list to have a filter for the amount of VRAM of the GPU in use, so that models that do not fit in VRAM are hidden. I also wanted it to show the largest model of each model family. The list has the columns Name, Parameter count, Quantization, and File size, and of course it had to be sortable.
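The filtering idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the list: the `Model` class, the `headroom_gb` parameter, and all sample entries (names and sizes) are invented for the example. The headroom reflects that a model needs some VRAM beyond its file size at runtime; the right value depends on the setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str              # hypothetical tag, e.g. "alpha:8b"
    family: str            # model family the variant belongs to
    params_billion: float  # parameter count in billions
    quantization: str      # quantization label
    size_gb: float         # file size in GB


def largest_fitting_per_family(models, vram_gb, headroom_gb=1.0):
    """Hide models that do not fit in VRAM, then keep the largest per family."""
    best = {}
    for m in models:
        # Assumption: file size plus a fixed headroom must fit into VRAM.
        if m.size_gb + headroom_gb > vram_gb:
            continue
        current = best.get(m.family)
        key = (m.params_billion, m.size_gb)
        if current is None or key > (current.params_billion, current.size_gb):
            best[m.family] = m
    # Sort by file size, largest first (any column could be used instead).
    return sorted(best.values(), key=lambda m: m.size_gb, reverse=True)


# Invented sample data, for illustration only.
SAMPLE = [
    Model("alpha:3b", "alpha", 3, "Q4", 2.0),
    Model("alpha:8b", "alpha", 8, "Q4", 4.9),
    Model("alpha:70b", "alpha", 70, "Q4", 40.0),
    Model("beta:7b", "beta", 7, "Q4", 4.4),
    Model("beta:14b", "beta", 14, "Q4", 9.0),
]

if __name__ == "__main__":
    for m in largest_fitting_per_family(SAMPLE, vram_gb=12):
        print(m.name, m.params_billion, m.quantization, m.size_gb)
```

With the invented data and 12 GB of VRAM, the 70B variant is hidden and each family is reduced to its largest remaining variant.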

The list also had to be dynamic and stay up to date when new models are added. I made the result available online right away: List of Ollama Models
