Facebook iconHow To Use Open Source LLMs (Large Language Model)?
Blogs/AI

How To Use Open Source LLMs (Large Language Model)?

Aug 2, 20245 Min Read
by Ajay Patel
How To Use Open Source LLMs (Large Language Model)? Hero

For code versioning, we utilize GitHub, a platform that allows us to manage and store different versions of our code repositories. For Docker images, Docker Hub is the platform where we can store, manage, and distribute our Docker images. Similarly, for AI models, we have Hugging Face

Hugging Face provides a centralized platform for sharing and managing AI models, allowing us to access and use pre-trained models, as well as distribute our own models with ease. Hugging Face is a platform with over 800k models and 186k datasets all open source and publicly available, in an online platform where people can easily collaborate and build together. The Hub works as a central place where anyone can explore, experiment, collaborate, and build technology with AI/Machine Learning.

Getting Started with HuggingFace

1. Create your account on Hugging Face.

2. Navigate to the models section and choose your model. 

3. From the left sidebar, select the specific type of task or problem you're trying to solve. This could be anything from text generation, translation, question answering, or summarization. Select the model that best fits your needs. 

For our tutorial, we are going to use the google/gemma-2-2b-it model.

Hardware Requirements To Use a LLM

For running a model, we can use either a CPU or a GPU to do the computation. A CPU performs most of the general computing tasks. On the other hand, a GPU is specifically designed to handle complex mathematical calculations. Therefore, when the model is computationally intensive, meaning it requires a lot of mathematical calculations, using a GPU can significantly reduce the inference time and make the process more efficient as compared to using a CPU. 

There will not be any difference in output whatever we use. The only difference is inference time. They're like super fast assembly lines for mathematical calculations. In real life, imagine you have a large batch of packets that need to be labeled. A CPU (regular computer) would label one packet at a time, but a GPU can label several packets at once. This means that a GPU can finish the task much more quickly than a CPU. GPUs are designed with a large number of cores that can handle parallel processing tasks efficiently. 

AI model inference often involves performing the same operation on a large set of data (eg. matrix multiplication), and the parallel architecture of GPUs allows them to handle these tasks simultaneously, leading to faster inference. 

Partner with Us for Success

Experience seamless collaboration and exceptional results.

Google Colab Notebook

What is Google Colab?

Google Colab, short for Google Colaboratory, is a free cloud-based platform that allows you to write and execute Python code through your browser. It's essentially a Jupyter notebook environment that requires no setup and runs entirely in the cloud. Colab provides free access to computing resources including GPUs, making it an invaluable tool for data scientists, machine learning practitioners, and researchers.

Key features of Google Colab include:

1. Free GPU and TPU access

2. Easy sharing and collaboration

3. Integration with Google Drive

4. Pre-installed popular libraries

Partner with Us for Success

Experience seamless collaboration and exceptional results.

5. Interactive code execution

Why are we using Google Colab?

We're utilizing Google Colab for several compelling reasons:

1. Accessibility: Colab eliminates the need for local setup, allowing us to start coding immediately without worrying about hardware constraints or software installations.

2. Free GPU access: For our LLM project, we require significant computational power. Colab provides free access to NVIDIA Tesla T4 GPUs, which are well-suited for machine learning tasks.

3. Cost-effectiveness: By leveraging Colab's free resources, we can experiment with and develop LLM models without incurring the high costs associated with purchasing or renting powerful hardware.

4. Collaboration: Colab notebooks are easy to share, making it simple to collaborate with team members or share our work with the community.

5. Flexibility: Colab supports a wide range of Python libraries and can be easily connected to other Google services like Drive, making data management and workflow integration seamless.

6. Learning and experimentation: The platform's user-friendly interface and pre-configured environment lower the barrier to entry for those new to machine learning or working with LLMs.

By using Google Colab, we can focus on the core aspects of our LLM project - coding, model development, and experimentation - without getting bogged down by infrastructure concerns or budget limitations. This allows for rapid prototyping and iteration, crucial in the fast-paced field of AI and machine learning.

Now let’s create a new Google Colab notebook, which provides access to Tesla T4 GPU machines. Create a new notebook and change runtime to T4. Now we are ready to use the LLM model on a GPU machine. 

Comprehensive Practical Guide: Setting Up and Using the Gemma-2-2b-it Model

1. Installing Required Packages

!pip install transformers torch bitsandbytes accelerate huggingface_hub

2. Logging into Hugging Face for Model Access

from huggingface_hub import notebook_login
notebook_login()
Add your token and login

3. Accepting Google's Usage License

Now lets go to Gemma-2-2b-it and accept the license to proceed with the following steps.

4. Loading and Configuring Gemma-2-2b-it

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

5. Inference

query = "what is AI?"
input_ids = tokenizer(query, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0],skip_special_tokens=True))

Model Output:

what is AI?

**Artificial Intelligence (AI)** is a broad field of computer science that aims to create machines capable of performing tasks that typically require human intelligence. 

**Key Concepts:**

* **Learning:** AI systems can learn from data and improve their performance over time.
* **Reasoning:** AI systems can use logic and rules to solve problems and make decisions.
* **Problem-solving:** AI systems can identify and solve complex problems.
* **Perception:** AI systems can interpret sensory information, such as images and sounds.
* **Natural Language Processing (NLP):** AI systems can understand and generate human language.

**Types of AI:**

* **Narrow or Weak AI:** Designed to perform a specific task, like playing chess or recommending products.
* **General or Strong AI:** Hypothetical AI that possesses human-level intelligence and can perform any intellectual task.
* **Super AI:** Hypothetical AI that surpasses human intelligence in all aspects.

**Applications of AI:**

AI is used in a wide range of applications, including:

* **Healthcare:** Diagnosis, treatment planning, drug discovery.
* **Finance:** Fraud detection, risk assessment, algorithmic trading.
* **Transportation:** Self-driving cars, traffic optimization.
* **Customer service:** Chatbots, virtual assistants.
* **Entertainment:** Content creation, personalized recommendations.

**Benefits of AI:**

* **Increased efficiency and productivity.**
* **Improved decision-making.**
* **Automation of tasks.**
* **New discoveries and innovations.**

**Challenges of AI:**

* **Job displacement.**
* **Bias and fairness.**
* **Privacy and security.**
* **Ethical considerations.**


**In summary:** AI is a rapidly evolving field with the potential to revolutionize many aspects of our lives. It involves creating machines that can learn, reason, and solve problems, leading to advancements in various industries and applications. However, it also presents challenges that need to be addressed to ensure its responsible and beneficial development.

In conclusion, leveraging open-source Large Language Models (LLMs) has become increasingly accessible thanks to platforms like HuggingFace and Google Colab. These tools provide a user-friendly environment for exploring state-of-the-art models without the need for extensive hardware resources. 

By following a comprehensive guide and understanding the hardware requirements, anyone can get started with LLMs, experiment with different models, and integrate advanced natural language understanding capabilities into their projects. Whether you're a researcher, developer, or enthusiast, the journey into the world of LLMs offers immense potential for innovation and discovery.

Author-Ajay Patel
Ajay Patel

AI engineer passionate about building intelligent systems that solve real-world problems through cutting-edge technology and innovative solutions.

Phone

Next for you

Pinecone Vector DB Guide: Core Concepts Explained Cover

AI

Nov 20, 20244 min read

Pinecone Vector DB Guide: Core Concepts Explained

Think of AI as a super-smart library that needs to understand and remember massive amounts of information. But here's the challenge: how do we help AI organize and quickly find exactly what it needs? Enter Pinecone - imagine it as an AI's personal librarian that's incredibly fast at organizing and finding information. Pinecone provides a managed vector database that enables developers to store, search, and retrieve high-dimensional vector embeddings efficiently. This blog will explore key conce

A Complete Guide to AI Agents Cover

AI

Nov 20, 20249 min read

A Complete Guide to AI Agents

Remember the last time a chatbot actually solved your problem without frustration? Or when your smart home adjusted the temperature perfectly before you even thought about it? That's AI agents at work - sophisticated digital assistants that are transforming our interaction with technology in ways we barely notice but increasingly depend on. In this comprehensive guide, you'll learn how AI agents work, explore their different types—from simple reflex agents to complex learning systems—and discov

What is Data Engineering? Cover

AI

Nov 3, 202412 min read

What is Data Engineering?

Data engineering is the art and science of designing, constructing, and maintaining the systems that collect, store, and process vast amounts of data. It's the foundation upon which modern data-driven organizations build their insights and make critical decisions. As a data engineer, I've witnessed firsthand how this field has evolved from simple data management to complex, real-time data processing ecosystems. In essence, data engineering is about creating robust, scalable infrastructures that