[AI Q&A Pair Generation] Automating Q&A Pair Creation with a Templated Q&A Generator (LangChain + Ollama)

2024-11-07


¶Preface / Link archive

¶LangChain

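¶LangChain basic concepts

LangChain is a framework for integrating large language models (LLMs) with other tools and data sources so that they can handle more complex tasks.

Chain:
1. A chain in LangChain is an execution flow made up of several components (tools, models, data, and so on) connected in a specific order.
2. Custom chains can be built to move data between steps and carry out complex operations.
In practice this is much like a workflow.
Prompt (prompt template):
1. LangChain uses templated prompts to steer the language model toward the desired output.
2. A template can be designed once, filled dynamically with different data, and passed to the LLM to generate an answer.
Tools:
1. LangChain supports integrating many tools, such as web scraping, database queries, and API calls.
2. These tools can be combined with a language model to build complex automated Q&A systems.
Agents:
1. An agent in LangChain is an object that makes decisions based on the current context.
2. It can pick an appropriate tool to handle the input and produce a response.
3. In a Q&A system, an agent can choose which generation template or tool to use for a given question.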
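
The "chain as workflow" idea can be illustrated without LangChain at all. The following is a minimal plain-Python sketch, not LangChain's implementation: `run_chain`, `fill_prompt`, `fake_llm`, and `clean_output` are hypothetical names, and the stand-in model only echoes the last line of its prompt.

```python
from typing import Any, Callable, List


def run_chain(steps: List[Callable[[Any], Any]], value: Any) -> Any:
    """Pass a value through each step in order, like a small workflow."""
    for step in steps:
        value = step(value)
    return value


def fill_prompt(data: dict) -> str:
    # Step 1: turn structured data into prompt text.
    return "Info: {info}\nQ: {question}".format(**data)


def fake_llm(prompt: str) -> str:
    # Step 2: hypothetical stand-in for a model call; echoes the last prompt line.
    return "  echo: " + prompt.splitlines()[-1] + "  "


def clean_output(text: str) -> str:
    # Step 3: post-process the raw model output.
    return text.strip()


result = run_chain(
    [fill_prompt, fake_llm, clean_output],
    {"info": "Open daily 8:00-18:00", "question": "When does it open?"},
)
print(result)
```

Swapping `fake_llm` for a function that calls a real model leaves the other steps untouched, which is the main appeal of structuring the work as a chain.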

¶Q&A generation flow in LangChain

Create the Q&A template (Prompt Template)

A Q&A template suggested by GPT; it still needs revising.

from langchain.prompts import PromptTemplate

# Prompt text; each {name} placeholder is filled in at run time
template = """
    Answer the question using the information below:

    Attraction name: {attraction_name}
    Historical background: {history}
    Opening hours: {opening_hours}
    Ticket information: {ticket_info}
    Address: {address}
    Getting there: {transportation}

    Q: {question}
"""
prompt = PromptTemplate(
    input_variables=["attraction_name", "history", "opening_hours", "ticket_info", "address", "transportation", "question"],
    template=template
)
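
The list passed as `input_variables` has to match the placeholders in the template text. As a rough illustration of what that involves, here is a standard-library sketch (not LangChain code) that extracts the placeholder names from a template and checks them before filling it. `template_variables`, `fill_template`, and `demo_template` are hypothetical names introduced for this example.

```python
from string import Formatter


def template_variables(template: str) -> list:
    """Return placeholder names in order of first appearance."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


def fill_template(template: str, values: dict) -> str:
    """Fill the template, failing early if any placeholder has no value."""
    missing = [name for name in template_variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return template.format(**values)


demo_template = (
    "Attraction name: {attraction_name}\n"
    "Opening hours: {opening_hours}\n"
    "Q: {question}"
)
variables = template_variables(demo_template)
filled = fill_template(
    demo_template,
    {
        "attraction_name": "Xi'an City Wall",
        "opening_hours": "Daily 8:00 AM - 6:00 PM",
        "question": "When is it open?",
    },
)
print(variables)
print(filled)
```

Deriving the variable list from the template text avoids keeping two copies of the same names in sync by hand.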

Combine the template with a language model (LLM)

Use a LangChain language model to generate the answer

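from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

# Create the model.
# Assumption: "qwen2.5:7b" is served by a local Ollama instance and reached through
# Ollama's OpenAI-compatible endpoint; adjust the URL to your own setup.
llm = ChatOpenAI(
    model="qwen2.5:7b",
    temperature=0.7,
    openai_api_base="http://localhost:11434/v1",
    openai_api_key="ollama",  # placeholder value; Ollama does not use the key
)

# The template code from the first step (creating the Q&A template) goes here;
# it produces the template variable `prompt`.

# Combine the model with the template
chain = LLMChain(prompt=prompt, llm=llm)

# Input data for the template
attraction_data = {
    "attraction_name": "Xi'an City Wall",
    "history": "The Xi'an City Wall is one of the largest and best-preserved ancient city walls in China, built in the early Ming dynasty.",
    "opening_hours": "Daily 8:00 AM - 6:00 PM",
    "ticket_info": "Adult: 120 yuan, student: 60 yuan",
    "address": "Chengqiang Street, Lianhu District, Xi'an, Shaanxi Province",
    "transportation": "Take Metro Line 2, or bus routes 23, 27, or 35.",
    "question": "What is the historical background of the Xi'an City Wall?"
}

# Generate the answer
answer = chain.run(attraction_data)
print(answer)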

2.5. My own thoughts and modifications

Some thoughts after getting a single Q&A pair to generate successfully

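'''
The part above stays basically unchanged
'''

# 1. Q&A template creation
from langchain.prompts import PromptTemplate


# The variables below should be filled in automatically by reading the database
# table's column headers (recursively), which would also drive question generation
template = """
    Answer the question using the information below:

    Attraction name: {attraction_name}
    Historical background: {history}
    Opening hours: {opening_hours}
    Ticket information: {ticket_info}
    Address: {address}
    Getting there: {transportation}

    Q: {question}
"""
# The question itself should be generated by the model, e.g. "Based on the information
# above, generate some questions that it can answer" (prompt wording to be refined)

# Build the prompt
prompt = PromptTemplate(
    input_variables=["attraction_name", "history", "opening_hours", "ticket_info", "address", "transportation", "question"],
    template=template
)

# Rebuild the chain with this prompt (LLMChain and llm come from the unchanged part above)
chain = LLMChain(prompt=prompt, llm=llm)


# Input data for the template
attraction_data = {
    "attraction_name": "Xi'an City Wall",
    "history": "The Xi'an City Wall is one of the largest and best-preserved ancient city walls in China, built in the early Ming dynasty.",
    "opening_hours": "Daily 8:00 AM - 6:00 PM",
    "ticket_info": "Adult: 120 yuan, student: 60 yuan",
    "address": "Chengqiang Street, Lianhu District, Xi'an, Shaanxi Province",
    "transportation": "Take Metro Line 2, or bus routes 23, 27, or 35.",
    "question": "What is the historical background of the Xi'an City Wall?"
}


# Generate the answer
answer = chain.run(attraction_data)
print(answer)  # print the answer to check the quality of the generated result
# TODO: also save the result to a txt file
# Could it go straight into a database? A new database could hold the generated Q&A pairs
# If so, does the data need extracting, filtering, and structuring first so it fits the right fields?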
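
The two ideas in the comments above (building the template from the table's column headers, and storing generated pairs in a database) can be sketched with the standard-library `sqlite3` module. This is only an illustration under assumptions: the `attractions` and `qa_pairs` tables, their columns, and the helpers `build_template` and `save_pair` are hypothetical, the answer is a placeholder rather than model output, and a plain loop over the columns stands in for the recursion mentioned in the notes.

```python
import sqlite3

# Hypothetical in-memory database with one attraction row
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attractions (attraction_name TEXT, opening_hours TEXT, ticket_info TEXT)"
)
conn.execute(
    "INSERT INTO attractions VALUES (?, ?, ?)",
    ("Xi'an City Wall", "Daily 8:00 AM - 6:00 PM", "Adult: 120 yuan"),
)


def build_template(conn, table):
    """Build a prompt template with one placeholder per column of `table`."""
    # The table name cannot be bound as a parameter; it is assumed to be trusted here.
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT 0")
    columns = [description[0] for description in cursor.description]
    lines = [f"{column}: {{{column}}}" for column in columns]
    template = (
        "Answer the question using the information below:\n"
        + "\n".join(lines)
        + "\nQ: {question}"
    )
    return template, columns


def save_pair(conn, question, answer):
    """Store one Q&A pair, creating the table on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_pairs ("
        "id INTEGER PRIMARY KEY, question TEXT NOT NULL, answer TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO qa_pairs (question, answer) VALUES (?, ?)", (question, answer)
    )
    conn.commit()


template, columns = build_template(conn, "attractions")
row = conn.execute("SELECT * FROM attractions").fetchone()
values = dict(zip(columns, row), question="What are the opening hours?")
prompt_text = template.format(**values)

# In the real flow the answer would come from the chain; a placeholder is stored here.
save_pair(conn, values["question"], "placeholder answer")
stored = conn.execute("SELECT question, answer FROM qa_pairs").fetchall()
print(prompt_text)
print(stored)
```

Keeping the question and answer in separate columns is the simplest form of the "structuring" raised in the last comment; anything finer-grained would need a parsing step on the model output.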

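Generating multiple Q&A pairs

To generate several different Q&A pairs, you can build several different templates, or reuse one template with different question values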

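questions = [
    "What are the opening hours of the Xi'an City Wall?",
    "How do I get to the Xi'an City Wall?",
    "How much is a ticket for the Xi'an City Wall?"
]
# This question set could be drawn from the pairs generated and stored in the previous step
# Give the model several similar phrasings of one question, then have it generate further variants

for question in questions:
    attraction_data["question"] = question
    answer = chain.run(attraction_data)
    print(f"Q: {question}")
    print(f"A: {answer}")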
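
The loop above prints each pair and then discards it. One possible way to collect the pairs and write them to disk is sketched below. `generate_pairs`, `save_jsonl`, `stub_answer`, and the output file name are hypothetical names, and the model call is passed in as a plain function so the sketch runs without a model.

```python
import json
import os
import tempfile


def generate_pairs(questions, base_data, answer_fn):
    """Ask answer_fn once per question and collect the resulting Q&A pairs."""
    pairs = []
    for question in questions:
        data = {**base_data, "question": question}  # copy, so base_data is not mutated
        pairs.append({"question": question, "answer": answer_fn(data)})
    return pairs


def save_jsonl(pairs, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")


def stub_answer(data):
    # Hypothetical stand-in for the model call
    return f"(answer about {data['attraction_name']})"


pairs = generate_pairs(
    ["What are the opening hours?", "How much is a ticket?"],
    {"attraction_name": "Xi'an City Wall"},
    stub_answer,
)
out_path = os.path.join(tempfile.gettempdir(), "qa_pairs_demo.jsonl")
save_jsonl(pairs, out_path)
print(pairs)
```

With the chain from the earlier steps, `chain.run` could be passed as `answer_fn` in place of `stub_answer`, since it accepts the same kind of dictionary.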

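Using more custom features

Tool integration: if you want to bring external data into an answer (for example, real-time ticket prices from a travel site), you can integrate API queries or database access.
Route planning: the Baidu Maps API could be called for this
Enhanced Q&A: you can use the user's question history to adjust how questions are generated, so that they better match the user's needs
Idea: prompt the user to pick an earlier answer (by number or keyword) and trigger that answer again,
i.e. add that past answer to the prompt for the current question
Or simply add a button that does this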
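
The idea of recalling an earlier answer by number or keyword and adding it to the current prompt could look roughly like this. It is a plain-Python sketch under assumptions: `AnswerHistory`, `build_prompt`, and the prompt wording are hypothetical, and no model is called.

```python
from typing import Optional


class AnswerHistory:
    """Keeps past Q&A pairs so one can be recalled by number or keyword."""

    def __init__(self):
        self._items = []

    def add(self, question: str, answer: str) -> int:
        self._items.append({"question": question, "answer": answer})
        return len(self._items)  # 1-based number shown to the user

    def find(self, selector: str) -> Optional[dict]:
        if selector.isdigit():
            index = int(selector) - 1
            return self._items[index] if 0 <= index < len(self._items) else None
        # Otherwise treat the selector as a keyword; the most recent match wins
        for item in reversed(self._items):
            if selector in item["question"] or selector in item["answer"]:
                return item
        return None


def build_prompt(question: str, history: AnswerHistory, selector: Optional[str] = None) -> str:
    """Prepend the selected past answer, if any, to the new question."""
    parts = []
    past = history.find(selector) if selector else None
    if past:
        parts.append(
            "Earlier answer for reference:\n"
            f"Q: {past['question']}\nA: {past['answer']}"
        )
    parts.append(f"Q: {question}")
    return "\n\n".join(parts)


history = AnswerHistory()
history.add("What are the opening hours?", "Daily 8:00 AM - 6:00 PM")
history.add("How much is a ticket?", "Adult: 120 yuan")
prompt_text = build_prompt("Is it open on Monday?", history, selector="1")
print(prompt_text)
```

A button in a user interface would only need to supply the selector value, so the same helper would serve both the typed and the clicked variant.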

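¶Advantages of generating Q&A pairs with LangChain

Automation: cuts the manual work of writing Q&A pairs and can generate them in large numbers.
Flexibility: you can adjust the template to different needs and produce different kinds of questions and answers.
Diversity: combining templates with a language model produces varied Q&A pairs and avoids repetitive, uniform output.
External data integration: through APIs or other tools, you can include dynamic data (such as real-time ticket prices or the weather) in the answers.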

¶Ollama

pip install ollama

ollama · PyPI

The ollama library's page on PyPI

¶Ollama Python Library

The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.

Requirement: Python 3.8 or later

¶Basic usage (in a Python project)

import ollama
response = ollama.chat(model='llama3.1', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

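¶Streaming responses

Set stream=True to stream the response; the call then returns a generator that yields the reply chunk by chunk.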

import ollama

stream = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)
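
When generating Q&A pairs, the full answer is usually needed as one string once streaming ends. A small helper along the following lines could collect it. `collect_stream` and `fake_stream` are hypothetical names; the fake chunks only mimic the `chunk['message']['content']` shape used above, so the sketch runs without an Ollama server.

```python
def collect_stream(chunks, echo=False):
    """Join streamed chat chunks into a single string, optionally printing as they arrive."""
    parts = []
    for chunk in chunks:
        text = chunk["message"]["content"]
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


# Stand-in for the generator returned by ollama.chat(..., stream=True)
fake_stream = [
    {"message": {"content": "The sky "}},
    {"message": {"content": "is blue."}},
]
full_text = collect_stream(fake_stream)
print(full_text)
```

With a running server, the `stream` object from the example above could be passed in place of `fake_stream`, with `echo=True` to keep the live output.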

¶API

The Ollama Python library’s API is designed around the Ollama REST API

¶Chat

ollama.chat(model='llama3.1', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

¶Generate

ollama.generate(model='llama3.1', prompt='Why is the sky blue?')

¶List

ollama.list()

¶Show

ollama.show('llama3.1')

¶Create

modelfile='''
FROM llama3.1
SYSTEM You are mario from super mario bros.
'''

ollama.create(model='example', modelfile=modelfile)

¶Copy

ollama.copy('llama3.1', 'user/llama3.1')

¶Delete

ollama.delete('llama3.1')

¶Pull

ollama.pull('llama3.1')

¶Embeddings

ollama.embeddings(model='llama3.1', prompt='The sky is blue because of rayleigh scattering')

¶Ps

ollama.ps()

¶Custom client

A custom client can be created with the following fields:

host: The Ollama host to connect to
timeout: The timeout for requests

from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.1', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])

¶Async client

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='llama3.1', messages=[message])
  print(response['message']['content'])  # show the reply once it has arrived

asyncio.run(chat())

Setting stream=True modifies functions to return a Python asynchronous generator

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='llama3.1', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())

¶Errors

Errors are raised if requests return an error status or if an error is detected while streaming.

import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    ollama.pull(model)