IA RAG con Ollama y Vector DB

3 min readJan 31, 2025

El problema

Un cliente de la industria automotriz recibe leads con datos inconsistentes, los cuales requieren trabajo manual de filtrado, reubicación y programación frecuente de queries.

Por ejemplo: en el campo Modelo a veces se indica la cuota de financiación, en el campo OBS se vuelca sin criterio uniforme mucha información vital, etc

Solución #1 Crear un RAG con Ollama, Pandas y LanceDB

Importar el CSV a un Vector Database
Montar un LLM local con RAG (Retrieval-Augmented Generation)
Escribir un script Python para chatear usando el contexto del Vector Database

Ahora, en lugar de analizar cada fila manualmente, reubicar datos y escribir consultas SQL, es posible interactuar en lenguaje natural solicitando, por ejemplo:

“Listar los mejores contactos de condición autónomo que buscan X modelo de auto”

¿Por qué un Vector Database?

Un Vector Database utiliza algoritmos de búsqueda de similitud, como nearest neighbor search, para recuperar los vectores más cercanos a una consulta dada y permite generar fácilmente el contexto — ordenado y estructurado -para el LLM.

Demo del RAG con Vector Database

IA RAG con Vector Database en Python

Tecnologías utilizadas

Ollama
LanceDB
Modelos qwen2.5 y bge-m3

Detalles a tener en cuenta

Es preciso usar un modelo capaz de usar embeddings. En la página de Ollama es posible filtrar los modelos embeddings con este link

Otras opciones de RAG

Otra opción analizada fue PandasAI “una plataforma que simplifica realizar preguntas a tus datos en lenguaje natural”

Dicho esto, la instalación de PandasAI da error a menos que se haga un downgrade a Python 3.11 Luego, es preciso un API Key https://app.pandabi.ai o bien configurar la API de OpenAI.

Es posible agregar semántica explicando qué es cada planilla y columna, por medio de

pai.create(path=”…” description=”Base de datos de call center con leads para planes de compra de autos 0KM”,
df = file_df,
columns=[
{
“name”: “ID”,
“type”: “integer”,
“description”: “ID del lead”
},
{
“name”: “FECHA_ALTA”,
“type”: “string”,
“description”: “Fecha del contacto con el lead”
},

Funcionó relativamente bien para datos estructurados, pero tuvo dificultades para operar con datos desordenados e inconsistentes en un campo de texto Obs.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Written by Roni Bandini

656 Followers

49 Following

Contracultura maker 🛠️ Artes electrónicas 💡Inteligencia Artificial: LLM y Machine Learning 🚀 Embedded System developer

No responses yet

Write a response

What are your thoughts?

Also publish to my profile

Recommended from Medium

Building a Multi-Agent RAG Pipeline with Crew AI

Data And Beyond

TONI RAMCHANDANI

Building a Multi-Agent RAG Pipeline with Crew AI

In today’s era of intelligent systems, the ability to combine diverse retrieval tools with robust language models is transforming the way…

Feb 14

What’s the Best PDF Extractor for RAG? I Tried LlamaParse, Unstructured and Vectorize

Level Up Coding

Pavan Belagatti

What’s the Best PDF Extractor for RAG? I Tried LlamaParse, Unstructured and Vectorize

If you’re building retrieval augmented generation (RAG) applications, you will eventually need to work with documents that are in PDF form.

Feb 19

Lists

Natural Language Processing

1964 stories1608 saves

ChatGPT prompts

51 stories2609 saves

Markdown contains more information than just text.

AI Advances

Dr. Leon Eversberg

Benchmarking PDF to Markdown Document Converters — Part 2

Testing 4 more Python Markdown converters on a benchmark PDF document for better RAG results with LLMs

Feb 22

🚀SmolAgents from HuggingFace:A Step-by-Step Guide To Create AI Agents With Examples🚀

Artificial Intelligence in Plain English

Sreedevi Gogusetty

🚀SmolAgents from HuggingFace:A Step-by-Step Guide To Create AI Agents With Examples🚀

smolagents

Feb 21

Agentic RAG: Mastering Document Retrieval with CrewAI, DeepSeek, and Streamlit

Vikram Bhat

Agentic RAG: Mastering Document Retrieval with CrewAI, DeepSeek, and Streamlit

Building a Privacy-First, Local multi-agent Chatbot for Dynamic Document Conversations.

6d ago

DeepSeek+ Local Knowledge Base: Impressively Powerful

Generative AI

Crank Lee

DeepSeek+ Local Knowledge Base: Impressively Powerful

Today, I will share the deployment of Deepseek + local knowledge base.

Feb 8

See more recommendations

Help
Status
About
Careers
Press
Blog
Privacy
Terms
Text to speech
Teams