
How To Use Open Source LLMs (Large Language Models)?

Written by Ajay Patel
Oct 22, 2025
5 Min Read

For code versioning, we use GitHub, a platform that lets us manage and store different versions of our code repositories. For Docker images, Docker Hub is where we store, manage, and distribute them. Similarly, for AI models, we have Hugging Face.

Hugging Face provides a centralized platform for sharing and managing AI models, allowing us to access and use pre-trained models as well as distribute our own with ease. It hosts over 800k models and 186k datasets, all open source and publicly available, and makes it easy for people to collaborate and build together. The Hub works as a central place where anyone can explore, experiment, collaborate, and build technology with AI and machine learning.
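If you prefer to explore the Hub programmatically, the huggingface_hub library also exposes a search API. Here is a minimal sketch (assuming a recent version of huggingface_hub; the filter values are just examples):

from huggingface_hub import HfApi

api = HfApi()

# List five of the most-downloaded text-generation models on the Hub
for m in api.list_models(task="text-generation", sort="downloads", limit=5):
    print(m.id)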

Getting Started with Hugging Face

1. Create your account on Hugging Face.

2. Navigate to the models section and choose your model. 

3. From the left sidebar, select the type of task or problem you're trying to solve, such as text generation, translation, question answering, or summarization. Then select the model that best fits your needs.

For our tutorial, we are going to use the google/gemma-2-2b-it model.

Hardware Requirements To Use an LLM

For running a model, we can use either a CPU or a GPU to do the computation. A CPU handles most general computing tasks, while a GPU is specifically designed for large volumes of mathematical calculations. Therefore, when a model is computationally intensive, using a GPU can significantly reduce inference time and make the process more efficient compared to a CPU.

There will be no difference in the output whichever we use; the only difference is inference time. GPUs are like super-fast assembly lines for mathematical calculations. In real life, imagine you have a large batch of packets that need to be labeled: a CPU would label one packet at a time, while a GPU can label several packets at once, finishing the task much more quickly. GPUs are designed with a large number of cores that can handle parallel processing tasks efficiently.

AI model inference often involves performing the same operation on a large set of data (e.g., matrix multiplication), and the parallel architecture of GPUs lets them handle these operations simultaneously, leading to faster inference.
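To see this in practice, here is a small illustrative PyTorch benchmark that runs the same matrix multiplication on the CPU and, if available, the GPU; the matrix size is arbitrary and the timings will vary with your hardware:

import time
import torch

x = torch.randn(4096, 4096)

# Time the multiplication on the CPU
t0 = time.time()
_ = x @ x
print(f"CPU: {time.time() - t0:.3f}s")

# Time the same multiplication on the GPU, if one is available
if torch.cuda.is_available():
    xg = x.to("cuda")
    torch.cuda.synchronize()  # wait for the transfer to finish
    t0 = time.time()
    _ = xg @ xg
    torch.cuda.synchronize()  # wait for the kernel to finish before timing
    print(f"GPU: {time.time() - t0:.3f}s")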

Google Colab Notebook

What is Google Colab?

Google Colab, short for Google Colaboratory, is a free cloud-based platform that allows you to write and execute Python code through your browser. It's essentially a Jupyter notebook environment that requires no setup and runs entirely in the cloud. Colab provides free access to computing resources including GPUs, making it an invaluable tool for data scientists, machine learning practitioners, and researchers.

Key features of Google Colab include:


1. Free GPU and TPU access

2. Easy sharing and collaboration

3. Integration with Google Drive

4. Pre-installed popular libraries

5. Interactive code execution

Why are we using Google Colab?

We're utilizing Google Colab for several compelling reasons:

1. Accessibility: Colab eliminates the need for local setup, allowing us to start coding immediately without worrying about hardware constraints or software installations.

2. Free GPU access: For our LLM project, we require significant computational power. Colab provides free access to NVIDIA Tesla T4 GPUs, which are well-suited for machine learning tasks.

3. Cost-effectiveness: By leveraging Colab's free resources, we can experiment with and develop LLM models without incurring the high costs associated with purchasing or renting powerful hardware.

4. Collaboration: Colab notebooks are easy to share, making it simple to collaborate with team members or share our work with the community.

5. Flexibility: Colab supports a wide range of Python libraries and can be easily connected to other Google services like Drive, making data management and workflow integration seamless.

6. Learning and experimentation: The platform's user-friendly interface and pre-configured environment lower the barrier to entry for those new to machine learning or working with LLMs.

By using Google Colab, we can focus on the core aspects of our LLM project (coding, model development, and experimentation) without getting bogged down by infrastructure concerns or budget limitations. This allows for rapid prototyping and iteration, which is crucial in the fast-paced field of AI and machine learning.

Now let's create a new Google Colab notebook and change the runtime type to T4 GPU, which gives us access to a Tesla T4 machine. Once the runtime is connected, we are ready to run the LLM on a GPU.
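To confirm the runtime actually has a GPU attached, you can run a quick check with PyTorch (pre-installed on Colab):

import torch

# Should print True and the GPU name on a T4 runtime
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"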

Comprehensive Practical Guide: Setting Up and Using the Gemma-2-2b-it Model

1. Installing Required Packages

!pip install transformers torch bitsandbytes accelerate huggingface_hub

2. Logging into Hugging Face for Model Access

from huggingface_hub import notebook_login
notebook_login()
Run the cell, then paste your Hugging Face access token when prompted and log in.
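If you are running in a plain Python script rather than a notebook, huggingface_hub also supports a programmatic login; the token below is a placeholder for your own access token:

from huggingface_hub import login

# Authenticate with a personal access token (placeholder value)
login(token="hf_xxxxxxxxxxxxxxxx")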

3. Accepting Google's Usage License

Now let's go to the google/gemma-2-2b-it model page and accept Google's usage license to proceed with the following steps.


4. Loading and Configuring Gemma-2-2b-it

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the instruction-tuned Gemma 2 model
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",           # place the model on the available GPU automatically
    torch_dtype=torch.bfloat16,  # load weights in half precision to save memory
)
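We installed bitsandbytes earlier but have not used it yet. If GPU memory is tight, the same model can instead be loaded with 4-bit quantized weights; a minimal sketch (memory savings and quality trade-offs depend on your setup):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantize the weights to 4-bit at load time to reduce GPU memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    quantization_config=bnb_config,
)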

5. Inference

query = "what is AI?"

# Tokenize the prompt and move the tensors to the GPU
input_ids = tokenizer(query, return_tensors="pt").to("cuda")

# Generate up to 1024 new tokens and decode the result
outputs = model.generate(**input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Output:

what is AI?

**Artificial Intelligence (AI)** is a broad field of computer science that aims to create machines capable of performing tasks that typically require human intelligence. 

**Key Concepts:**

* **Learning:** AI systems can learn from data and improve their performance over time.
* **Reasoning:** AI systems can use logic and rules to solve problems and make decisions.
* **Problem-solving:** AI systems can identify and solve complex problems.
* **Perception:** AI systems can interpret sensory information, such as images and sounds.
* **Natural Language Processing (NLP):** AI systems can understand and generate human language.

**Types of AI:**

* **Narrow or Weak AI:** Designed to perform a specific task, like playing chess or recommending products.
* **General or Strong AI:** Hypothetical AI that possesses human-level intelligence and can perform any intellectual task.
* **Super AI:** Hypothetical AI that surpasses human intelligence in all aspects.

**Applications of AI:**

AI is used in a wide range of applications, including:

* **Healthcare:** Diagnosis, treatment planning, drug discovery.
* **Finance:** Fraud detection, risk assessment, algorithmic trading.
* **Transportation:** Self-driving cars, traffic optimization.
* **Customer service:** Chatbots, virtual assistants.
* **Entertainment:** Content creation, personalized recommendations.

**Benefits of AI:**

* **Increased efficiency and productivity.**
* **Improved decision-making.**
* **Automation of tasks.**
* **New discoveries and innovations.**

**Challenges of AI:**

* **Job displacement.**
* **Bias and fairness.**
* **Privacy and security.**
* **Ethical considerations.**


**In summary:** AI is a rapidly evolving field with the potential to revolutionize many aspects of our lives. It involves creating machines that can learn, reason, and solve problems, leading to advancements in various industries and applications. However, it also presents challenges that need to be addressed to ensure its responsible and beneficial development.
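One refinement worth noting: since gemma-2-2b-it is an instruction-tuned model, you can also format the prompt with the tokenizer's built-in chat template, which wraps the query in the turn markers the model was trained on and often yields better-structured answers. A short sketch:

# Format the query as a chat turn using the model's template
messages = [{"role": "user", "content": "what is AI?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))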

In conclusion, leveraging open-source Large Language Models (LLMs) has become increasingly accessible thanks to platforms like Hugging Face and Google Colab. These tools provide a user-friendly environment for exploring state-of-the-art models without the need for extensive hardware resources.

By following a comprehensive guide and understanding the hardware requirements, anyone can get started with LLMs, experiment with different models, and integrate advanced natural language understanding capabilities into their projects. Whether you're a researcher, developer, or enthusiast, the journey into the world of LLMs offers immense potential for innovation and discovery.

Ajay Patel

Hi, I am an AI engineer with 3.5 years of experience, passionate about building intelligent systems that solve real-world problems through cutting-edge technology and innovative solutions.

