
How To Use Open Source LLMs (Large Language Models)?

Written by Ajay Patel
Apr 24, 2026
4 Min Read

Open source LLMs have made it possible for developers, researchers, and builders to run powerful language models without paying for API access or building proprietary infrastructure. The challenge is knowing where to start. Between model selection, hardware requirements, and setup steps, it can feel overwhelming before you run a single line of code.

This guide walks through how to use open source LLMs using Hugging Face and Google Colab, from picking a model to running your first inference.

Where Open Source LLMs Live: Hugging Face

Just as GitHub is the standard platform for storing and sharing code, and Docker Hub is where container images are distributed, Hugging Face is the central hub for AI models. It hosts over 800,000 models and 186,000 datasets, most of them publicly available, though some (like Gemma) are gated behind a license agreement.

Hugging Face is a platform where developers and researchers can discover pre-trained models, share their own, collaborate on projects, and access datasets. It is the starting point for anyone looking to work with open source LLMs.

How to Find the Right Model on Hugging Face?

Getting started on Hugging Face takes three steps.

First, create a free account at huggingface.co. Second, navigate to the Models section. Third, use the left sidebar to filter by task type. Options include text generation, translation, question answering, summarization, and more.

For this guide, we are using the google/gemma-2-2b-it model, a capable instruction-tuned model from Google that runs well on a free GPU in Google Colab.

CPU vs GPU: What Hardware You Actually Need

When running an LLM, you can use either a CPU or a GPU. A CPU handles general computing tasks with a small number of powerful cores, processing relatively few operations at a time. A GPU is built for parallel mathematical computation and can run thousands of operations simultaneously.

For LLM inference, this difference matters. A GPU completes the same task significantly faster than a CPU because neural network inference is dominated by large matrix multiplications, which benefit directly from parallel processing. The output is essentially the same either way; the difference is how long you wait for it.

For most open source LLMs, a GPU is strongly recommended.
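
To see which device your environment actually exposes before loading anything, a quick PyTorch check works (a minimal sketch, assuming torch is installed):

python

import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")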

Why Google Colab Is the Easiest Way to Start

Google Colab is a free cloud-based notebook environment that gives you access to GPUs without any local setup. You write and run Python code directly in your browser, and the compute happens on Google's servers.

For learning how to use open source LLMs, Colab removes the biggest barrier: hardware. It provides free access to NVIDIA Tesla T4 GPUs, which are well suited for running models like Gemma-2-2b-it. You do not need to install drivers, configure environments, or pay for cloud compute to get started.

To set up your environment, create a new Colab notebook and change the runtime type to T4 GPU under Runtime > Change runtime type.
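
To confirm the GPU is actually attached, run nvidia-smi in a notebook cell; on the free tier it should list a Tesla T4:

python

# Shows the attached GPU, driver version, and current memory usage.
!nvidia-smi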

Step-by-Step: Running Gemma-2-2b-it on Google Colab

Step 1: Install Required Packages

python

# bitsandbytes and accelerate enable memory-efficient loading; huggingface_hub handles login.
!pip install transformers torch bitsandbytes accelerate huggingface_hub

Step 2: Log Into Hugging Face

python

from huggingface_hub import notebook_login
notebook_login()

Enter your Hugging Face access token when prompted. You can generate one from your account settings at huggingface.co/settings/tokens.
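
If you prefer to authenticate without the interactive widget (in a script, for example), huggingface_hub also provides a login() function that accepts the token directly. The token string below is a placeholder:

python

from huggingface_hub import login

# Non-interactive alternative to notebook_login(). Replace the placeholder
# with your own token from huggingface.co/settings/tokens.
login(token="hf_xxxxxxxxxxxxxxxx")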

Step 3: Accept Google's Usage License

Navigate to the google/gemma-2-2b-it model page on Hugging Face and accept the license agreement. This is required before you can download the model weights.

Step 4: Load the Model and Tokenizer

python

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

This loads the tokenizer and model weights. The device_map="auto" setting automatically places the model on the available GPU. The torch_dtype=torch.bfloat16 setting reduces memory usage without significantly affecting output quality.
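
If you run out of GPU memory, this is where the bitsandbytes package from Step 1 comes in: you can load the weights quantized to 4-bit instead. A minimal variant (quantization can slightly affect output quality):

python

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4-bit at load time, cutting memory use
# well below the bfloat16 footprint.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    quantization_config=quantization_config,
)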

Step 5: Run Inference

python

query = "what is AI?"
inputs = tokenizer(query, return_tensors="pt").to("cuda")  # tokenize and move to the GPU
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

This tokenizes the input query, passes it to the model, generates up to 1024 new tokens, and decodes the output back into readable text.

Model Output:

what is AI?

Artificial Intelligence (AI) is a broad field of computer science that aims 
to create machines capable of performing tasks that typically require human 
intelligence.

Key Concepts:
- Learning: AI systems can learn from data and improve their performance over time.
- Reasoning: AI systems can use logic and rules to solve problems.
- Problem-solving: AI systems can identify and solve complex problems.
- Perception: AI systems can interpret sensory information such as images and sounds.
- Natural Language Processing: AI systems can understand and generate human language.
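
Because gemma-2-2b-it is instruction-tuned, it generally responds better when the query is wrapped in its chat template rather than passed as raw text. A sketch reusing the same model and tokenizer:

python

# apply_chat_template wraps the query in Gemma's expected turn markers
# and appends the marker that tells the model to begin its reply.
messages = [{"role": "user", "content": "what is AI?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))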

Conclusion

Learning how to use open source LLMs is now within reach for anyone with a browser and a Hugging Face account. Platforms like Hugging Face and Google Colab remove the hardware and infrastructure barriers that previously made running large models impractical. Follow the steps in this guide, pick a model that fits your use case, and you can go from zero to running inference in under an hour.

Frequently Asked Questions

1. What is Hugging Face?

Hugging Face is an open-source AI platform that hosts over 800,000 models and 186,000 datasets. It is the primary hub for discovering, downloading, and sharing pre-trained language models.

2. Do I need a GPU to run an open source LLM?

A GPU is strongly recommended for LLM inference. It significantly reduces inference time compared to a CPU. Google Colab provides free GPU access, making it the easiest option for getting started without any hardware investment.

3. What is Google Colab and why use it for LLMs?

Google Colab is a free cloud-based Python notebook environment with free GPU access. It requires no local setup and is well suited for running and experimenting with open source LLMs like Gemma, LLaMA, and Mistral.

4. What does torch_dtype=torch.bfloat16 do?

It loads the model in 16-bit precision instead of 32-bit, which reduces GPU memory usage by roughly half. This makes it possible to run larger models on free-tier GPUs without running out of memory.
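
As a rough back-of-envelope estimate, assuming about 2.6 billion parameters for Gemma-2-2b-it: 2.6B × 4 bytes ≈ 10.4 GB in 32-bit precision versus 2.6B × 2 bytes ≈ 5.2 GB in bfloat16, before counting activations and the KV cache. On a 16 GB T4, that is the difference between barely fitting and fitting comfortably.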

5. Can I use other models besides Gemma-2-2b-it?

Yes. The same steps work for most models on Hugging Face that use the Transformers library, including Mistral, LLaMA, Falcon, and others. Some models may require additional license agreements or different loading configurations.

Ajay Patel

Hi, I am an AI engineer with 3.5 years of experience, passionate about building intelligent systems that solve real-world problems through cutting-edge technology and innovative solutions.
