[RAG] Study Notes: Implementing Flight-Search Function Calling (Ollama + RAG + ChromaDB + Function Calling)

대장고양이샤샤 2025. 6. 2. 11:25

Function Calling

This post is based on an article by Stephen Batifol (a Milvus developer) that implements function calling by connecting Llama 3.1 to the vector DB Milvus and a few simple APIs.
It adapts the code from that article to ChromaDB and uses it to introduce function calling.
Comments and suggestions are welcome!
▶ Reference GitHub: stephen37/ollama_local_rag
▶ Reference: [Expert Contribution] Function Calling with Ollama, Llama 3, and Milvus, by Stephen Batifol

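Understanding Function Calling

Current LLMs such as GPT-4 and Llama 3.1 can detect when a function needs to be called and produce JSON containing the parameters required to call it. The model does not run the function itself; the application executes it and passes the result back to the model.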
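To make the "JSON with parameters" idea concrete, here is a minimal sketch of what the application side does with such output. The raw string below is a hypothetical example written by hand, not actual model output, and the field names simply mirror the ones used later in this post.

```python
import json

# Hypothetical raw output of the kind a model emits when it decides a tool is needed.
raw_tool_call = (
    '{"name": "get_flight_times", '
    '"arguments": {"departure": "NYC", "arrival": "LAX"}}'
)

# The application parses the JSON and pulls out the function name and its arguments.
tool_call = json.loads(raw_tool_call)
function_name = tool_call["name"]
function_args = tool_call["arguments"]

print(function_name)
print(function_args)
```

The model only proposes the call; deciding whether and how to execute it remains the application's job.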

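With function calling, AI application developers can build:

LLM-based solutions that extract or tag data (for example, extracting people's names from Wikipedia) and put it to use
Applications that convert natural language into API calls or the equivalent DB queries
Conversational knowledge-search engines over a specific kind of knowledge base

In short, they can build powerful AI applications that go beyond the basic limitations of an LLM.

Let's look at what it takes to build such an application. The tools used today are as follows.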

Ollama (Open Large Language Model Application)
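- Ollama is open-source software that makes it easy to run large language models (LLMs) on a local PC, and in particular makes Meta's Llama models easy to use.
- Everything needed to use a model is defined in a single file called a Modelfile, which specifies the model data, its configuration, and how it runs.
- It is available on Windows, Linux, and macOS.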

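Milvus vector DB
- A vector DB is a special type of database designed to process, index, and search unstructured data using embeddings from machine learning models. Instead of organizing data in tables like a traditional RDB (relational DB), it represents and manages data as high-dimensional vectors called vector embeddings.
- There are several vector DBs available; Milvus is one of them, an open-source vector DB. This post uses ChromaDB, another open-source vector DB, in its place.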
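The idea of searching by embedding can be sketched without any database at all. The example below ranks three documents by cosine similarity to a query vector. The 3-dimensional vectors are made-up toy values (real embeddings such as all-MiniLM-L6-v2 have hundreds of dimensions), and the document names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" invented for illustration only.
documents = {
    "doc_about_ai": [0.9, 0.1, 0.0],
    "doc_about_turing": [0.7, 0.6, 0.1],
    "doc_about_cooking": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked)
```

A vector DB does essentially this ranking, but with indexes that keep it fast over millions of vectors.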

Llama 3.1-8B
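- The smallest model in the Llama 3.1 family that Meta released recently. Compared with Llama 3, the context window has grown substantially from 8K to 128K tokens, and multilingual support is better.
- Most important in the context of this post: unlike earlier versions, Llama 3.1 supports function calling natively. Meta recommends Llama 3.1-70B or Llama 3.1-405B for function calling that has to carry a long conversation, but for a simple test like this one the 8B model is sufficient.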

Using these tools, we will walk through building a very simple example AI application that runs with the following flow:

Question-and-answer flow through the system

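Using Llama 3.1 with Ollama

Llama 3.1 has been fine-tuned for function calling. It supports single, nested, and parallel function calls as well as multi-turn function calling. In other words, function calls can handle tasks that need multiple steps or complex parallel processing.

In this post's example, we implement two functions: one that simulates an API call to fetch flight times, and one that runs a search against the vector DB (Milvus in the original article, ChromaDB here). Llama 3.1 decides which function to call based on the user's query.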

Other models that provide function calling

Here is a brief look at other models in Ollama that support function calling.

For the officially supported models, see the link: Ollama - Tools category

Tool support · Ollama Blog

Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world.

ollama.com

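[ Representative Ollama models that support function calling ]
- Llama 3.1: single, parallel, and nested calls, plus multi-turn support
- Gemma 3 (1B~27B): Google's Gemma 3 series; many function calling examples such as real-time search
- Qwen3: built-in tool calling and agent capabilities; a good fit for Ollama
- Others: fine-tuned variants of xLAM, Mistral, Deepseek, and so on

!ollama pull llama3.1:8b # most recommended
!ollama pull llama3.2:3b  # usable on Colab
!ollama pull qwen2.5:7b  # good performance
!ollama pull mistral:7b  # stable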

Installing the model and libraries

First, set up the environment for running the example.

Download Llama 3.1 with Ollama.

ollama run llama3.1

This command downloads the model to your laptop or PC and makes it ready to use through Ollama.

* What if you are running on Google Colab?

You need to install Ollama inside Colab first.

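# 1. Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

# 2. Start the Ollama server in the background
import subprocess
import time

# Run the Ollama server as a background process.
# Its logs are discarded so an unread pipe buffer cannot fill up and stall the server.
process = subprocess.Popen(['ollama', 'serve'],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)

# Wait briefly for the server to start
time.sleep(5)

print("The Ollama server has been started.")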

# 3. Download the model (other models also work)
!ollama pull llama3.1:8b

# 4. Check the installed models
!ollama list

# 5. Install the ollama Python package in Colab
!pip install ollama

# 6. Connection test
import ollama

try:
    client = ollama.Client()
    response = client.chat(
        model='llama3.1:8b',  # must match a model pulled in step 3
        messages=[{'role': 'user', 'content': 'Hello'}]
    )
    print("Connection succeeded!")
    print(response['message']['content'])
except Exception as e:
    print(f"Connection failed: {e}")
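A fixed five-second sleep may be too short on a slow Colab instance. As an alternative, here is a minimal sketch that polls the server until it answers. It assumes Ollama's default address http://localhost:11434; the helper name wait_for_server and its parameters are hypothetical, not part of Ollama or the original article.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url: str = "http://localhost:11434",
                    timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Return True once the URL answers an HTTP request, False if the timeout expires first."""
    # Bypass any configured proxy: the server is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except OSError:
            # Connection refused or timed out: not ready yet, retry after a pause.
            time.sleep(interval)
    return False
```

In the notebook, calling wait_for_server() right after starting the server process could replace the time.sleep(5) line, with the model download continuing only when it returns True.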

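Next, install the libraries needed to run the example. The embedding function used below relies on the sentence-transformers package, so it is installed as well.

!pip install chromadb sentence-transformers

Creating data in ChromaDB

Now let's put some data into Chroma. This is the target data that the Llama 3.1 model will search and use.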

# Chroma code

import chromadb
from chromadb.utils import embedding_functions

# Create a persistent client that stores its data under ./chroma_db
# (named chroma_client because the search function below refers to it by that name)
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Define the embedding function
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

# Create a collection
collection = chroma_client.get_or_create_collection(name="demo_collection",
                                                    embedding_function=embedding_fn
                                                    )

# Insert the data
collection.add(
    documents=docs,
    metadatas=[{"subject": "history"} for _ in range(len(docs))],  # same metadata for every document
    ids=[str(i) for i in range(len(docs))]  # unique string ID per document ("0", "1", ...)
)


print(f"Inserted {len(docs)} documents.")
print(f"Collection '{collection.name}' now holds {collection.count()} documents.")

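Defining the functions to use

This example defines two functions: one simulates an API call that looks up flight times (get_flight_times), and the other runs a search query against the vector DB (search_data_in_vector_db). The original article uses Milvus for the second one; this post uses ChromaDB.

The function 'search_data_in_vector_db' uses the local ChromaDB collection created above, whereas 'get_flight_times' has its data hard-coded inside the function, which is why it is described as a simulation (a real application would call an external flight-time service).

1. The get_flight_times function

This function looks up flight times. It takes the departure and arrival airport codes as parameters.

2. The search_data_in_vector_db function

This function runs a vector search over the AI-related data for the given query parameter and returns the matching documents.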

import json


# Simulates an API call to get flight times
# In a real application, this would fetch data from a live database or API
# Function 1) Look up flight times (requires the departure and arrival parameters)
def get_flight_times(departure: str, arrival: str) -> str:
    flights = {
        'NYC-LAX': {'departure': '08:00 AM', 'arrival': '11:30 AM', 'duration': '5h 30m'},
        'LAX-NYC': {'departure': '02:00 PM', 'arrival': '10:30 PM', 'duration': '5h 30m'},
        'LHR-JFK': {'departure': '10:00 AM', 'arrival': '01:00 PM', 'duration': '8h 00m'},
        'JFK-LHR': {'departure': '09:00 PM', 'arrival': '09:00 AM', 'duration': '7h 00m'},
        'CDG-DXB': {'departure': '11:00 AM', 'arrival': '08:00 PM', 'duration': '6h 00m'},
        'DXB-CDG': {'departure': '03:00 AM', 'arrival': '07:30 AM', 'duration': '7h 30m'},
    }

    key = f'{departure}-{arrival}'.upper()
    return json.dumps(flights.get(key, {'error': 'Flight not found'}))

# Search data related to Artificial Intelligence in a vector database
# Function 2) Vector search over the AI-related data (requires the query parameter)
def search_data_in_vector_db(query: str) -> str:
    # Load the collection that was embedded earlier
    collection = chroma_client.get_collection("demo_collection")

    # Run the search in ChromaDB
    res = collection.query(
        query_texts=[query],
        n_results=2,
        include=["documents","metadatas"] # return document text and metadata
    )

    print(res)
    return json.dumps(res)
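The raw query result is nested (one inner list per query text), which is more than the model needs. As an optional refinement, here is a sketch that flattens a result of that shape into a short list before serializing it. The helper name summarize_query_result is hypothetical, and the sample dictionary is written by hand to mimic the structure of a single-query result rather than captured from a real run.

```python
import json

def summarize_query_result(res: dict) -> str:
    """Flatten a ChromaDB-style result for one query into a compact JSON string."""
    documents = res.get("documents") or [[]]
    metadatas = res.get("metadatas") or [[]]
    # Index 0 selects the hits for the first (and only) query text.
    hits = [
        {"text": doc, "metadata": meta}
        for doc, meta in zip(documents[0], metadatas[0])
    ]
    return json.dumps(hits, ensure_ascii=False)

# Hand-written sample shaped like the result of a query with one query text.
sample = {
    "ids": [["1", "2"]],
    "documents": [[
        "Alan Turing was the first person to conduct substantial research in AI.",
        "Born in Maida Vale, London, Turing was raised in southern England.",
    ]],
    "metadatas": [[{"subject": "history"}, {"subject": "history"}]],
}
print(summarize_query_result(sample))
```

Returning only text and metadata keeps the tool message short, which matters more for a small model such as the 8B variant.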

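Telling the LLM about the functions (with Markdown output)

Next, write the instructions that let the LLM use the functions defined above.

Llama 3.1 performs function calls through a special prompt syntax rather than a tool_choice parameter. With Ollama, you pass the function descriptions through the tools argument of chat, and Ollama renders them into that prompt format.

When you give the Ollama client the model name and the user message, the model analyzes the question and decides which function it needs. Our code then runs that function, and the model generates the final answer from the function's result.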

from IPython.display import Markdown, display
import ollama

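# [Markdown-style output]---------------------
def display_markdown_response(content: str):
  # Render the text as Markdown
  display(Markdown(content))

# display_markdown_response is used inside run_md
# AI agent: takes a user question, calls the appropriate function, and displays the answer as Markdown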
def run_md(model: str, question: str):
    # create ollama client   
    ollama_client = ollama.Client()

    # Initialize conversation with a user query
    messages = [{"role": "user", "content": question}]

    # First API call: Send the query and function description to the model
    response = ollama_client.chat(
        model=model,
        messages=messages,
        tools=[
            {
                "type": "function", # function calling
                "function": {
                    "name": "get_flight_times",
                    "description": "Get the flight times between two cities",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "departure": {
                                "type": "string",
                                "description": "The departure city (airport code)",
                            },
                            "arrival": {
                                "type": "string",
                                "description": "The arrival city (airport code)",
                            },
                        },
                        "required": ["departure", "arrival"],
                    },
                },
            },
            {
                "type": "function", # function calling
                "function": {
                    "name": "search_data_in_vector_db",
                    "description": "Search about Artificial Intelligence data in a vector database",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "The search query",
                            },
                        },
                        "required": ["query"],
                    },
                },
            },
        ],
    )

    # Add the model's response to the conversation history
    messages.append(response["message"])

    # Check if the model decided to use the provided function

    if not response["message"].get("tool_calls"):
        print("The model didn't use the function. Its response was:")
        print(response["message"]["content"])
        return

    # Process function calls made by the model
    if response["message"].get("tool_calls"):
        available_functions = {
            "get_flight_times": get_flight_times,
            "search_data_in_vector_db": search_data_in_vector_db,
        }

        for tool in response["message"]["tool_calls"]:
            function_to_call = available_functions[tool["function"]["name"]]
            function_args = tool["function"]["arguments"]
            function_response = function_to_call(**function_args)

            # Add function response to the conversation
            messages.append(
                {
                    "role": "tool",
                    "content": function_response,
                }
            )

    # Second API call: Get final response from the model
    final_response = ollama_client.chat(model=model, messages=messages)

    print("🤖 AI response:")
    display_markdown_response(final_response["message"]["content"])

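After the model responds, the core of function calling is the loop that starts at 'for tool in response["message"]["tool_calls"] ...'. It processes each function the model requested, in order.

First, the model's decision on whether to call a function arrives in response["message"], and that response is saved to the conversation history.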

# Question: "Tell me the flight time from New York to LA"
# Example response["message"]
response["message"] = {
    "role": "assistant", 
    "content": "",  # typically empty when the model requests a tool call
    "tool_calls": [
        {
            "function": {
                "name": "get_flight_times",
                "arguments": {"departure": "NYC", "arrival": "LAX"}
            }
        }
    ]
}

# Save to the history
messages.append(response["message"])

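The response is handled differently depending on whether it contains a function call.

If the dictionary has no tool_calls (the key is missing, or its value is empty or None), only the plain-text response is printed and the function returns.

# Example: the model does not call a function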
if not response["message"].get("tool_calls"):
    print("The model didn't use the function. Its response was:")
    print(response["message"]["content"])
    return

If the response dictionary does contain tool_calls, the code prepares to handle the function calls.

# Example: the model calls a function
if response["message"].get("tool_calls"):
    available_functions = {
        "get_flight_times": get_flight_times,
        "search_data_in_vector_db": search_data_in_vector_db,
    }

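The actual flow is as follows.

For example, suppose the model analyzes the user input and returns the tool call below, which requests get_flight_times.

The code uses the function name to find the actual function object and extracts the function parameters.

# tool_calls is a list
# Example of one entry in tool_calls
[ 
  { "function":  { "name": "get_flight_times", 
    "arguments": {"departure": "NYC", "arrival": "LAX"} } }
]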

1. Look up the function to call
function_to_call = available_functions[tool["function"]["name"]] 
# -> takes the function name string "get_flight_times" from the tool call and looks up the function object

2. Get the function parameters
function_args = tool["function"]["arguments"] 
# -> extracts the parameters {"departure": "NYC", "arrival": "LAX"}

3. Run the function
function_response = function_to_call(**function_args)
# -> runs as get_flight_times(departure="NYC", arrival="LAX")
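The three steps above assume the model always names a known function and supplies valid arguments. A small model does not always do so, so here is a more defensive sketch of the same dispatch. The helper execute_tool_call is hypothetical, and get_flight_times is replaced by a tiny stand-in so that the block runs on its own.

```python
import json

def get_flight_times(departure: str, arrival: str) -> str:
    # Stand-in for the real function defined earlier in the post.
    return json.dumps({"route": f"{departure}-{arrival}".upper()})

AVAILABLE_FUNCTIONS = {"get_flight_times": get_flight_times}

def execute_tool_call(tool: dict) -> str:
    """Run one tool call and always return a JSON string, even on failure."""
    name = tool["function"]["name"].strip()  # tolerate stray whitespace
    args = tool["function"]["arguments"]
    if isinstance(args, str):
        # Some APIs deliver arguments as a JSON string rather than a dict.
        args = json.loads(args)
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        return func(**args)
    except TypeError as exc:
        # Raised for missing or unexpected arguments (and for TypeErrors inside func).
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})

good = {"function": {"name": "get_flight_times",
                     "arguments": {"departure": "NYC", "arrival": "LAX"}}}
print(execute_tool_call(good))
```

Returning the error as the tool result, instead of raising, gives the model a chance to explain the problem or try again in its next turn.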

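The function result is then added to the conversation and saved in the history.

messages.append({
    "role": "tool", # marks this message as a function result
    "content": function_response, # the function result the model uses for its final answer
})

Example conversation history)

[
  {"role": "user", "content": "Tell me the flight time from New York to LA"},
  {"role": "assistant", "tool_calls": [{"function":...}] },
  {"role": "tool", "content": "{\"departure\": \"08:00 AM\", \"arrival\": \"11:30 AM\", \"duration\": \"5h 30m\"}"}, 
  {"role": "assistant", "content": "The flight departs at 08:00 AM, arrives at 11:30 AM, and takes 5h 30m."}
]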

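User question: "Tell me the flight time from New York to LA"
       ↓
First model response (assistant): produces tool_calls
       ↓
Function execution: get_flight_times(departure="NYC", arrival="LAX")
       ↓
Function result saved (tool): {"departure": "08:00 AM", "arrival": "11:30 AM", "duration": "5h 30m"}
       ↓
Second model response (assistant): "The flight from New York to LA departs at 08:00 AM and takes 5h 30m."
       ↓
Final output shown to the user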
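The flow above can be traced end to end without a running model. In the sketch below, fake_chat is a hypothetical stand-in for the chat call: it requests a tool on the first turn and writes an answer once a tool result is in the history. Only the message bookkeeping is meant to be realistic; the flight data is a one-route copy of the table used earlier.

```python
import json

def get_flight_times(departure: str, arrival: str) -> str:
    flights = {
        "NYC-LAX": {"departure": "08:00 AM", "arrival": "11:30 AM", "duration": "5h 30m"},
    }
    key = f"{departure}-{arrival}".upper()
    return json.dumps(flights.get(key, {"error": "Flight not found"}))

def fake_chat(messages: list) -> dict:
    """Stand-in for the model: ask for a tool first, answer once a tool result exists."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"message": {"role": "assistant", "content": "", "tool_calls": [
            {"function": {"name": "get_flight_times",
                          "arguments": {"departure": "NYC", "arrival": "LAX"}}}
        ]}}
    data = json.loads(tool_results[-1]["content"])
    answer = f"The flight departs at {data['departure']} and takes {data['duration']}."
    return {"message": {"role": "assistant", "content": answer}}

# 1) user question, 2) first model response, 3) tool result, 4) final answer
messages = [{"role": "user", "content": "Tell me the flight time from New York to LA"}]
first = fake_chat(messages)
messages.append(first["message"])
for tool in first["message"]["tool_calls"]:
    result = get_flight_times(**tool["function"]["arguments"])
    messages.append({"role": "tool", "content": result})
final = fake_chat(messages)
messages.append(final["message"])

for message in messages:
    print(message["role"], "->", message.get("content"))
```

Swapping fake_chat for a real chat call with a tools argument gives the structure of run_md.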

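Running the example

Test whether we can look up the times for the flight we want:

Questions

Q1. When multiple functions seem to be needed, does the LLM decide that on its own?

A1. Yes, the LLM decides on its own.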

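The LLM analyzes the question and decides for itself which functions it needs.
For example, suppose the question is "Tell me the flight time from Incheon to Tokyo and also point me to a booking site".

The LLM judges that two functions are needed and includes both in the tool_calls array.
Even if the user never says "use both functions", the model understands the context of the question and picks a suitable combination of functions automatically.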
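As a sketch of what that looks like on the application side, the example below handles a first response that contains two tool calls. The assistant message is hand-written for illustration, and both functions are stubs with placeholder return values so that the block runs on its own.

```python
import json

def get_flight_times(departure: str, arrival: str) -> str:
    # Stub: the real function looks the route up in a table.
    return json.dumps({"route": f"{departure}-{arrival}", "duration": "placeholder"})

def search_data_in_vector_db(query: str) -> str:
    # Stub: the real function queries the ChromaDB collection.
    return json.dumps({"query": query, "documents": ["placeholder document"]})

available_functions = {
    "get_flight_times": get_flight_times,
    "search_data_in_vector_db": search_data_in_vector_db,
}

# Hypothetical first response in which the model asks for two functions at once.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_flight_times",
                      "arguments": {"departure": "JFK", "arrival": "LHR"}}},
        {"function": {"name": "search_data_in_vector_db",
                      "arguments": {"query": "Where was Alan Turing born?"}}},
    ],
}

# One tool message per requested call, in the order the model listed them.
tool_messages = []
for tool in assistant_message["tool_calls"]:
    func = available_functions[tool["function"]["name"]]
    tool_messages.append({"role": "tool", "content": func(**tool["function"]["arguments"])})

print(len(tool_messages))
```

The loop is the same one used for a single call; the application does not need to know in advance how many functions the model will request.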

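Q2. Does all function calling use this self-deciding structure?

A2. Yes, function calling generally works this way.

This is the pattern OpenAI established, and most LLM services follow it.

It is not specific to Ollama, so the overall flow carries over when you move to another platform.

Standard function calling architecture

1. Common structure (OpenAI standard)

User question → LLM analysis → function selection/call → function execution → result interpretation → final response

2. Implementation differences across major platforms (simplified calls; client setup and the exact tool schema differ by SDK)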

OpenAI GPT

response = openai.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=tools_definition
)

Anthropic Claude

response = anthropic.messages.create(
    model="claude-3",
    messages=messages,
    tools=tools_definition
)

Google Gemini

response = genai.generate_content(
    model="gemini-pro",
    contents=messages,
    tools=tools_definition
)

Ollama (local)

response = ollama.chat(
    model="llama3.1",
    messages=messages,
    tools=tools_definition
)
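The calls above look alike, but the tool definition itself is not always interchangeable. As one illustration, here is a sketch that converts the OpenAI-style definition used in this post into the flatter shape with an input_schema field that, to my understanding, Anthropic's Messages API expects. The helper name to_anthropic_tool is hypothetical, and the target format should be checked against the current SDK documentation before relying on it.

```python
def to_anthropic_tool(openai_style_tool: dict) -> dict:
    """Convert {"type": "function", "function": {...}} into a flat tool definition."""
    fn = openai_style_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # The JSON Schema for the parameters moves under "input_schema".
        "input_schema": fn["parameters"],
    }

# Same definition as the second tool passed to Ollama earlier in this post.
search_tool = {
    "type": "function",
    "function": {
        "name": "search_data_in_vector_db",
        "description": "Search about Artificial Intelligence data in a vector database",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
            },
            "required": ["query"],
        },
    },
}

converted = to_anthropic_tool(search_tool)
print(converted["name"], list(converted))
```

Keeping one canonical definition and converting it per platform is one way to keep the functions themselves unchanged during a migration.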
