AI Agent in Practice

Posted 2024-03-16 Updated 2024-03-28

By Eric

95~122 min read

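AIGC Trends

LLM foundation models will keep getting stronger, and many smaller models (such as those on Hugging Face) are developing rapidly as well. Combining them with existing enterprise applications or private data gives rise to new AI applications.

In an autonomous agent system driven by an LLM, the LLM acts as the agent's brain, supported by three key components that extend its capabilities: Planning, Memory, and Tool use.

For more details on agents, see: LLM Powered Autonomous Agents | Lil'Log (lilianweng.github.io)

Next, we will build our own external domain knowledge base and an AI Agent, step by step.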

Environment Setup

Python 3.11
pip 24.0
IntelliJ IDEA / PyCharm
OpenAI API Key
Kubernetes 1.26.5
Helm v3.12.1

Tech Stack

LangChain: the base framework
LangSmith: call-chain tracing
Jupyter Notebook: writing and running code
Milvus: vector database
Gradio: front-end UI framework

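Choosing a Vector Database

There is a dazzling number of vector databases available today. These are the ones I have used:

Faiss: a library developed by Facebook AI Research that focuses on efficient similarity search over dense vectors. It supports multiple index types and distance metrics, scales to large datasets, and runs on both CPU and GPU. Faiss documentation
Chroma: an open-source, lightweight embedding database that can run in memory or embedded in an application, which makes it convenient for prototyping and local development. 🧪 Usage Guide | Chroma (trychroma.com)
Pinecone: a managed cloud service that offers a vector database as a service. It is easy to use, supports real-time data updates and automatic scaling, and suits developers who do not want to manage hardware. At the time of writing, new accounts received $100 in initial credit to try it out. Quickstart (pinecone.io)
Milvus: an open-source vector database that supports multiple index structures and similarity search algorithms. It offers APIs and SDKs for several languages and suits a wide range of AI scenarios. Milvus documentation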
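All four options answer the same basic question: given a query vector, which stored vectors are most similar to it? The sketch below shows that operation as a brute-force cosine-similarity search in plain Python. The three-dimensional vectors and the names `cosine_similarity` and `top_k` are made up for illustration; real libraries use approximate indexes rather than scanning every vector.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query: List[float], vectors: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" with made-up values, for illustration only
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
```

The full scan costs time proportional to the number of stored vectors, which is the cost that the index structures in these databases are designed to avoid.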

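Why Milvus

The official documentation and several blog posts point to the following considerations:

Data persistence and low-cost storage: Not losing data is the bare minimum. Many single-node and lightweight vector databases pay little attention to data reliability. Milvus's storage design, built on object storage and message queues, improves elasticity and scalability by separating storage from compute, while also ensuring durability. More importantly, most ANN indexes are loaded entirely into memory and need a large amount of RAM to serve queries. According to the Milvus team, Milvus was the first vector database to support on-disk indexes, which can offer more than ten times the storage cost-efficiency of in-memory indexes.
Data distribution: Traditional databases shard by primary key or partition key. That works well for them because queries usually carry exact predicates that can be routed to the matching shard. In a vector database, a query typically looks for the vectors most similar to a target across the whole dataset, so it often has to run on every partition, as in an MPP database, and compute demand grows with data volume. A vector-native database treats vectors as first-class citizens: it can choose a partitioning strategy based on how the vector data is distributed, and use that distribution information when planning queries to improve both performance and accuracy.
Stability and reliability: Many vector database use cases require recovery from a single-node failure within minutes, and a growing number of critical scenarios call for primary/standby disaster recovery or even cross-datacenter disaster recovery. For these workloads, traditional Raft/Paxos-based replication wastes considerable resources and makes pre-sharding data difficult. Milvus relies on distributed storage and message queues for data availability and on K8s for stateless failure recovery, which uses fewer resources and shortens recovery time. Another major stability challenge is resource management: traditional databases focus on scheduling I/O resources such as disk and network, whereas the core bottlenecks of a vector database are compute and memory.
Operability and observability: Milvus supports several deployment modes, including K8s Operator, Helm chart, docker compose, and pip install, and provides a monitoring and alerting stack based on Grafana, Prometheus, and Loki. Zilliz has also open-sourced Attu, a visual management tool for the vector database, and Feder, a visualization tool, which greatly lower the management burden and make vector search more explainable. Thanks to the distributed cloud-native architecture of Milvus 2.0, Milvus is also described as the first vector database in the industry to support multi-tenant isolation, RBAC, quota-based rate limiting, and rolling upgrades.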
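The data-distribution point above can be sketched as a scatter-gather query: every partition computes its own local top-k, and a coordinator merges the partial results into a global top-k. This is a simplified illustration and not Milvus's actual implementation; the shard contents and the names `search_shard` and `scatter_gather` are hypothetical, and similarity scores are assumed to be precomputed.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (similarity score, document id)


def search_shard(shard: List[Hit], k: int) -> List[Hit]:
    """Local top-k on one shard (scores are precomputed here for brevity)."""
    return heapq.nlargest(k, shard)


def scatter_gather(shards: List[List[Hit]], k: int) -> List[Hit]:
    """Query every shard, then merge the partial results into a global top-k."""
    partial: List[Hit] = []
    for shard in shards:
        partial.extend(search_shard(shard, k))
    return heapq.nlargest(k, partial)


# Three hypothetical shards holding made-up hits
shards = [
    [(0.91, "a1"), (0.40, "a2"), (0.12, "a3")],
    [(0.88, "b1"), (0.87, "b2")],
    [(0.95, "c1"), (0.05, "c2")],
]
global_top = scatter_gather(shards, k=3)
```

Because no shard can be skipped without knowing where the nearest vectors live, the work grows with the number of shards, which is why a partitioning strategy that reflects the vector distribution matters.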

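Private enterprise data is an extremely important resource. Many RAG applications today can make the most of AI + data by using that data for retrieval augmentation, resulting in smarter applications that serve the business.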

Deploying the Milvus Vector Database on K8s

Installation

For a quick and simple install, see the official documentation: Install Milvus Standalone with Kubernetes Milvus documentation

 ### 1. Add the Helm repo
 helm repo add milvus https://zilliztech.github.io/milvus-helm/
 helm repo update
 
 ### 2. Inspect the values file (mainly adjust the storage parameters) and render the manifests locally for review
 helm show values milvus/milvus > milvus-values.yaml
 helm template milvus-release milvus/milvus -f milvus-values.yaml -n milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false > local-deployment.yaml
 
 ### 3. Install standalone Milvus
 kubectl create namespace milvus
 helm upgrade --install milvus-release milvus/milvus -f milvus-values.yaml -n milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false
 
 ### 4. Access it locally
 kubectl port-forward -n milvus service/milvus-release 19530:19530

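The above is a simple local install. You can also deploy a distributed cluster on a cloud provider with a highly available Storage Class for better performance and availability.

Installation Result

UI

Attu, the GUI for Milvus, can be installed from: zilliztech/attu: The GUI for Milvus (github.com)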

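RAG (Retrieval Augmented Generation)

Retrieval-Augmented Generation (RAG) optimizes the output of a large language model so that it consults an authoritative knowledge base outside its training data before generating a response. Large language models (LLMs) are trained on vast amounts of data and use billions of parameters to generate original output for tasks such as answering questions, translating languages, and completing sentences. RAG extends these already powerful capabilities to a specific domain or an organization's internal knowledge base, all without retraining the model. It is a cost-effective way to improve LLM output so that it stays relevant, accurate, and useful in a variety of contexts.

Known challenges of LLMs include:

Presenting false information when the model does not have the answer.
Presenting out-of-date or generic information when the user expects a specific, current response.
Creating a response from non-authoritative sources.
Creating inaccurate responses due to terminology confusion, where different training sources use the same term for different things.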

A typical RAG architecture:

[Figure: rag_overview]

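A typical RAG application has two main components:

Indexing

A pipeline for ingesting data from a source and indexing it. This usually happens offline. Indexing has the following parts:

1 Load

Load: First we need to load the data. This is done with DocumentLoaders.

2 Split

Split: Text splitters break large Documents into smaller chunks. This helps both for indexing the data and for passing it to a model, since large chunks are harder to search and do not fit into a model's limited context window.

3 Store

Store: We need somewhere to store and index the splits so they can be searched later. This is usually done with a VectorStore and an Embeddings model.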
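To make the Split step more concrete, here is a minimal sketch of fixed-size chunking with overlap. It only illustrates what the `chunk_size` and `chunk_overlap` parameters mean; the function name `split_with_overlap` is an assumption, and LangChain's RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraph and line breaks, which this sketch ignores.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

The overlap repeats a little text at each boundary, so a sentence that straddles two chunks is still likely to appear whole in at least one of them.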

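Retrieval and Generation

The actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, and passes it to the model.

1 Retrieval

Retrieval: Given a user input, the relevant splits are retrieved from storage using a Retriever.

2 Generation

Generation: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.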
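The two run-time steps can be sketched without any framework. In the toy version below, retrieval ranks chunks by word overlap with the question (a crude stand-in for embedding similarity), and generation is reduced to assembling the prompt that would be sent to the model. The names `retrieve` and `build_prompt`, the prompt wording, and the sample chunks are all assumptions made for illustration.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-cased whitespace tokens; good enough for a toy example."""
    return set(text.lower().split())


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(question_words & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the text that a chat model would receive."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


chunks = [
    "Milvus is an open-source vector database",
    "Gradio builds simple web UIs",
    "A vector database stores embeddings",
]
question = "what is a vector database"
prompt = build_prompt(question, retrieve(question, chunks, k=2))
```

In a real chain the prompt string would be passed to the LLM, and the retriever would query the vector store instead of counting shared words.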

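An Example 🌰

Once you have your OpenAI API key and a vector database ready, the following steps build an Agent-based RAG application. Note: if an import reports a missing Python package, install it with pip.

1 Load the configuration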

 import os
 from dotenv import load_dotenv, find_dotenv
 
 # Read local .env file
 _ = load_dotenv(find_dotenv())  
 
 api_key = os.getenv('OPENAI_API_KEY')
 api_base = os.getenv('OPENAI_API_BASE')
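Since `os.getenv` silently returns `None` when a variable is missing, a small fail-fast helper can surface configuration problems before any API call is made. This is an optional sketch; the helper name `require_env` is an assumption and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value
```

With this in place, `api_key = require_env("OPENAI_API_KEY")` would raise immediately if the key was not loaded from the `.env` file.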

2 Load the data

Load website, PDF, and video data with WebBaseLoader, PyPDFLoader, and YoutubeLoader.

For more document loaders, see: Document loaders | 🦜️🔗 Langchain

Script:

 from langchain_community.document_loaders import WebBaseLoader
 
 # from website
 website_urls = [
     "https://flyeric.top/archives/alibaba-cola-4.0-practices",
 ]
 html_docs = []
 for url in website_urls:
     html_docs.extend(WebBaseLoader(web_path=url, encoding="UTF-8").load())
 print(len(html_docs))
 print(html_docs[0].metadata)
 
 # from pdf
 #! pip install pypdf 
 from langchain_community.document_loaders import PyPDFLoader
 
 pdf_urls = [
     "data/MachineLearning-Lecture01.pdf",
 ]
 pdf_docs = []
 for url in pdf_urls:
     pdf_docs.extend(PyPDFLoader(url).load())
 print(len(pdf_docs))
 print(pdf_docs[0].metadata)
 
 # from video
 # fix ssl issue: pip install pyopenssl ndg-httpsclient pyasn1 urllib3
 from langchain_community.document_loaders import YoutubeLoader
 
 urls = [
     "https://www.youtube.com/watch?v=HAn9vnJy6S4",
     "https://www.youtube.com/watch?v=dA1cHGACXCo",
     "https://www.youtube.com/watch?v=ZcEMLz27sL4",
     "https://www.youtube.com/watch?v=hvAPnpSfSGo",
     "https://www.youtube.com/watch?v=EhlPDL4QrWY",
     "https://www.youtube.com/watch?v=mmBo8nlu2j0",
     "https://www.youtube.com/watch?v=rQdibOsL1ps",
     "https://www.youtube.com/watch?v=28lC4fqukoc",
     "https://www.youtube.com/watch?v=es-9MgxB-uc",
     "https://www.youtube.com/watch?v=wLRHwKuKvOE",
     "https://www.youtube.com/watch?v=ObIltMaRJvY",
     "https://www.youtube.com/watch?v=DjuXACWYkkU",
     "https://www.youtube.com/watch?v=o7C9ld6Ln-M",
 ]
 video_docs = []
 for url in urls:
     video_docs.extend(YoutubeLoader.from_youtube_url(url, add_video_info=True).load())
 print(len(video_docs))
 print(video_docs[0].metadata)

outputs:

 1
 {'source': 'http://193.112.246.235:8090/archives/alibaba-cola-4.0-practices', 'title': "Alibaba COLA 4.0 架构实践 - Eric's Blog", 'description': 'Alibaba COLA架构 4.0 COLA 是 Clean Object-Oriented and Layered Architecture的缩写，代表“整洁面向对象分层架构”。 目前COLA已经发展到COLA v4。 互联网业务项目一般会遇到如下一些普遍问题： 虽然整体架构规划', 'language': 'en'}
 22
 {'source': 'data/MachineLearning-Lecture01.pdf', 'page': 0}
 13
 {'source': 'HAn9vnJy6S4', 'title': 'OpenGPTs', 'description': 'Unknown', 'view_count': 7603, 'thumbnail_url': 'https://i.ytimg.com/vi/HAn9vnJy6S4/hq720.jpg', 'publish_date': '2024-01-31 00:00:00', 'length': 1530, 'author': 'LangChain'}

3 Index the documents and store them in the vector database

Script:

 from langchain_openai import OpenAIEmbeddings
 #from langchain_community.vectorstores import FAISS
 #from langchain_community.vectorstores import Chroma
 from langchain_community.vectorstores import Milvus
 from langchain_text_splitters import RecursiveCharacterTextSplitter
 import itertools
 
 total_docs = list(itertools.chain(html_docs, pdf_docs, video_docs))
 
 # Milvus needs a consistent schema: give every document the same metadata fields (or enable a dynamic schema)
 for doc in total_docs:
     source_metadata = {"source": doc.metadata["source"]}
     doc.metadata = source_metadata
 
 # Split the documents into chunks
 text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)
 chunked_docs = text_splitter.split_documents(total_docs)
 # text-embedding-3-small outperforms text-embedding-ada-002
 embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
 
 connection_args = {"host": "127.0.0.1", "port": "19530"}
 collection_name = "collection_demo_all"
 # from_documents is a classmethod, so call it on the class directly;
 # drop_old=True drops any existing collection with the same name first
 vectorstore = Milvus.from_documents(
     documents=chunked_docs,
     embedding=embeddings,
     collection_name=collection_name,
     connection_args=connection_args,
     drop_old=True,
 )
 
 # vectorstore = FAISS.from_documents(
 #     chunked_docs,
 #     embeddings,
 # )
 # vectorstore = Chroma.from_documents(
 #     chunked_docs,
 #     embeddings,
 # )

Inspect the data:

You can also run similarity searches against the vector store directly:

 # Run similarity searches
 ## Should hit the website data
 result1 = vectorstore.similarity_search("什么是COLA架构？", k=3)
 print("=======result1=======\n" + str(result1))
 ## Should hit the PDF data
 result2 = vectorstore.similarity_search("what did they say about matlab?", k=3)
 print("=======result2=======\n" + str(result2))
 ## Should hit the video data
 result3 = vectorstore.similarity_search("how do I build a RAG agent?", k=3)
 print("=======result3=======\n" + str(result3))

outputs:

 =======result1=======
 [Document(page_content='概述架构的意义 就是 要素结构：要素 是 组成架构的重要元素；结构 是 要素之间的关系。而 应用架构的意义 就在于定义一套良好的结构；治理应用复杂度，降低系统熵值；从随心所欲的混乱状态，走向井井有条的有序状态。COLA架构就是为此而生，其核心职责就是定义良好的应用结构，提供最佳应用架构的最佳实践。通过不断探索，我们发现良好的分层结构，良好的包结构定义，可以帮助我们治理混乱不堪的业务应用系统。经过多次迭代，我们定义出了相对稳定、可靠的应用架构：COLA v4COLA 架构COLA的官方博文中是这么介绍的：因为业务问题都有一定的共性。例如，典型的业务系统都需要：接收request，响应response；做业务逻辑处理，像校验参数，状态流转，业务计算等等；和外部系统有联动，像数据库，微服务，搜索引擎等；正是有这样的共性存在，才会有很多普适的架构思想出现，比如分层架构、六边形架构、洋葱圈架构、整洁架构（Clean Architecture）、DDD架构等等。这些应用架构思想虽然很好，但我们很多同学还是“不讲Co德，明白了很多道理，可还是过不好这一生”。问题就在于缺乏实践和指导。COLA的意义就在于，他不仅是思想，还提供了可落地的实践。应该是为数不多的应用架构层面的开源软件。COLA架构 区别于这些架构的地方，在于除了思想之外，我们还提供了可落地的工具和实践指导。COLA分层架构官方分层图：以及官方介绍的各层的命名和含义：层次包名功能必选Adapter层web处理页面请求的Controller否Adapter层wireless处理无线端的适配否Adapter层wap处理wap端的适配否App层executor处理request，包括command和query是App层consumer处理外部message否App层scheduler处理定时任务否Domain层model领域模型否Domain层service领域能力，包括DomainService否Domain层gateway领域网关，解耦利器是Domain层repository领域数据访问是Infra层gatewayimpl网关实现是Infra层repositoryimpl数据库访问实现是Infra层mapperibatis数据库映射否Infra层config配置信息否Client SDKapi服务对外透出的API是Client', metadata={'source': 'http://193.112.246.235:8090/archives/alibaba-cola-4.0-practices', 'pk': 448376587072256902}), Document(page_content='Alibaba COLA架构 4.0COLA 是 Clean Object-Oriented and Layered Architecture的缩写，代表“整洁面向对象分层架构”。 目前COLA已经发展到COLA v4。互联网业务项目一般会遇到如下一些普遍问题：虽然整体架构规划做的不错，但落地严重偏离，缺乏足够的抽象和弹性设计，面向流程编程。业务的工期紧、迭代快，导致代码结构混乱，几乎没有代码注释和文档，即使有项目代码规范。人员变动频繁，接手别人的老项目，新人根本没时间吃透代码结构，也很难快速了解上下文，紧迫的工期又只能让屎山越堆越大。多人协作开发，每个人的编码习惯不同，工具类代码各用个的，业务命名也经常冲突，团队成员庞大后更加影响效率。看似相同的功能，却很难加入改动，却经常听到：要写这张卡，先把之前的哪哪改了。Code Review效果不佳，很难快速了解别人的上下文，只能简单看到一些命名、设计原则或明显的实现问题。大部分团队几乎没有时间做代码重构，任由代码腐烂。或者没有动力或KPI进行代码重构。不写单元测试，或编写的大量单元测试用处不大，有新功能加入或重构时导致要修改大量的测试。每当新启动一个代码仓库，都是信心满满，结构整洁。但是时间越往后，代码就变得腐败不堪，技术债务越来越庞大…有无好的解决方案呢？也是有的：设计完善的应用架构以及代码落地规范，定期进行Review，让代码的腐烂来得慢一些。（当然很难做到完全不腐烂）定期做代码重构，解决技术债务设计尽量保持简单，让不同层级的开发都能快速看懂并上手开发，而不是在一堆复杂的没人看懂的代码上堆更多的屎山。设计尽量遵守SOLID原则，开发人员经常违反单一职责原则和开闭原则，导致代码和测试调整困难。坚持Code Review，先看模型设计，再了解实现细节。坚持编写测试，编写有效的测试，而不只是看测试覆盖率和测试数量，推荐使用TDD。Alibaba 
COLA架构，就是为了提供一个可落地的业务代码结构规范，让代码腐烂的尽可能慢一些，让团队的开发效率尽可能快一些。COLA 概述架构的意义 就是 要素结构：要素 是 组成架构的重要元素；结构 是 要素之间的关系。而 应用架构的意义', metadata={'source': 'http://193.112.246.235:8090/archives/alibaba-cola-4.0-practices', 'pk': 448376587072256901}), Document(page_content='Alibaba COLA架构 4.0COLA 是 Clean Object-Oriented and Layered Architecture的缩写，代表“整洁面向对象分层架构”。 目前COLA已经发展到COLA v4。互联网业务项目一般会遇到如下一些普遍问题：虽然整体架构规划做的不错，但落地严重偏离，缺乏足够的抽象和弹性设计，面向流程编程。业务的工期紧、迭代快，导致代码结构混乱，几乎没有代码注释和文档，即使有项目代码规范。人员变动频繁，接手别人的老项目，新人根本没时间吃透代码结构，也很难快速了解上下文，紧迫的工期又只能让屎山越堆越大。多人协作开发，每个人的编码习惯不同，工具类代码各用个的，业务命名也经常冲突，团队成员庞大后更加影响效率。看似相同的功能，却很难加入改动，却经常听到：要写这张卡，先把之前的哪哪改了。Code', metadata={'source': 'http://193.112.246.235:8090/archives/alibaba-cola-4.0-practices', 'pk': 448376587072255954})]
 =======result2=======
 [Document(page_content="those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people call it a free ve rsion of MATLAB, which it sort  of is, sort of isn't.  \nSo I guess for those of you that haven't s een MATLAB before, and I know most of you \nhave, MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to", metadata={'source': 'data/MachineLearning-Lecture01.pdf', 'pk': 448376587072256033}), Document(page_content='machine learning stuff was actually useful. So what was it that you learned? Was it \nlogistic regression? Was it the PCA? Was it the data ne tworks? What was it that you \nlearned that was so helpful?" And the student said, "Oh, it was the MATLAB."  \nSo for those of you that don\'t know MATLAB yet, I hope you do learn it. It\'s not hard, \nand we\'ll actually have a short MATLAB tutori al in one of the discussion sections for \nthose of you that don\'t know it.', metadata={'source': 'data/MachineLearning-Lecture01.pdf', 'pk': 448376587072256037}), Document(page_content="those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people call it a free ve rsion of MATLAB, which it sort  of is, sort of isn't.  \nSo I guess for those of you that haven't s een MATLAB before, and I know most of you \nhave, MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to \nplot data. And it's sort of an extremely easy to  learn tool to use for implementing a lot of \nlearning algorithms.  
\nAnd in case some of you want to work on your  own home computer or something if you \ndon't have a MATLAB license, for the purposes of  this class, there's also — [inaudible] \nwrite that down [inaudible] MATLAB — there' s also a software package called Octave \nthat you can download for free off the Internet. And it has somewhat fewer features than MATLAB, but it's free, and for the purposes of  this class, it will work for just about \neverything.", metadata={'source': 'data/MachineLearning-Lecture01.pdf', 'pk': 448376587072256942})]
 =======result3=======
 [Document(page_content="this is a blog post we wrote kind of a while back talking about Rag and the various approaches to improve rag um and I'll of course share this link but if we look at this diagram here there's a few different places that we can think about kind of improving our rag chain so initially we can think about like if we have a raw user question we can transform it in some way um we can query you can use routing to query different databases we can use Query construction if you want to go from like natur language to", metadata={'source': 'EhlPDL4QrWY', 'pk': 448376587072256321}), Document(page_content="hi this is Lance from the Lang chain team and today we're going to be building and deploying a rag app using pine con serval list from scratch so we're going to kind of walk through all the code required to do this and I'll use these slides as kind of a guide to kind of lay the the ground work um so first what is rag so under capoy has this pretty nice visualization that shows LMS as a kernel of a new kind of operating system and of course one of the core components of our operating system is the ability to", metadata={'source': 'EhlPDL4QrWY', 'pk': 448376587072256302}), Document(page_content="hi this is Lance from the Lang chain team and today we're going to be building and deploying a rag app using pine con serval list from scratch so we're going to kind of walk through all the code required to do this and I'll use these slides as kind of a guide to kind of lay the the ground work um so first what is rag so under capoy has this pretty nice visualization that shows LMS as a kernel of a new kind of operating system and of course one of the core components of our operating system is the ability to connect like your CPU or in this case an LM uh to dis or in this case say a vector store some kind of data storage that contains information that you want to pass into the llm and llms have this notion of a context window which typically pass in like say 
prompts but of course we can also pass in information retrieved from external sources now one of the most popular external sources are vector stores and these have really nice properties like the ability to perform semantic similarity search so you can", metadata={'source': 'EhlPDL4QrWY', 'pk': 448376587072257089})]

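Next, we combine this with an LLM for Q&A.

4 Build a suitable prompt and choose a base model

Prompt engineering is covered extensively elsewhere, so I will not repeat it here. Ready-made prompts are also available from sites such as LangChain Hub and Hugging Face Datasets.

Script: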

 #from langchain import hub
 from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
 from langchain_openai import ChatOpenAI
 
 # Build the prompt and the base LLM
 ## Alternatively, pull a ready-made prompt from LangChain Hub
 ## prompt = hub.pull("rlm/rag-prompt")
 ## prompt = hub.pull("hwchase17/openai-functions-agent")
 ## prompt = hub.pull("hwchase17/openai-tools-agent")
 ## print(prompt.messages)
 prompt = ChatPromptTemplate.from_messages(
     [
         ("system", "你是问答任务的助手。使用以下检索到的上下文来回答问题。如果你不知道答案，就说你不知道。"),
         MessagesPlaceholder(variable_name="chat_history", optional=True),
         ("human", "{question}"),
         MessagesPlaceholder(variable_name='agent_scratchpad'),
     ]
 )
 
 llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

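5 Build the Tool

When creating a tool with create_retriever_tool, write the description field carefully. It determines which tool the agent picks later on; with a vague description, the wrong tool may be selected for a natural-language question and the answer will be poor.

Script: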

 from langchain.tools.retriever import create_retriever_tool
 
 ## Build the retriever
 retriever = vectorstore.as_retriever()
 
 retriever_tool = create_retriever_tool(
     retriever,
     "db_content_retriever_tool",
     "a retriever tool for COLA架构、MachineLearning、RAG、Langchain",
 )
 tools = [retriever_tool]

6 Build the Agent

 from langchain.agents import AgentExecutor, create_openai_tools_agent
 
 agent = create_openai_tools_agent(llm, tools, prompt)
 agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

7 Validation

Validation 1:

 agent_executor.invoke({"question": "这篇COLA博客的主题是讲什么？"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `db_content_retriever_tool` with `{'query': 'COLA博客主题'}`
 ...
 
 {'question': '这篇COLA博客的主题是讲什么？',
  'output': '这篇COLA博客的主题是"Alibaba COLA 4.0 架构实践"。'}

Validation 2:

 agent_executor.invoke({"question": "COLA架构分为几层？"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `db_content_retriever_tool` with `{'query': 'COLA architecture layers'}`
 
 ...
 
 {'question': 'COLA架构分为几层？',
  'output': 'COLA架构分为以下几层：\n1. Adapter层：包括处理页面请求的Controller、处理无线端的适配、处理wap端的适配\n2. App层：包括处理request的executor、处理外部message的consumer、处理定时任务的scheduler\n3. Domain层：包括领域模型model、领域能力的service、领域网关gateway、领域数据访问的repository\n4. Infra层：包括网关实现gatewayimpl、数据库访问实现repositoryimpl、数据库映射的mapperibatis、配置信息的config\n5. Client SDK：包括对外透出的API服务和领域对象的方法对App层提供业务实体和业务逻辑计算\n\n这些层级分别承担着不同的职责和功能，以实现整体架构的良好结构和治理应用复杂度。'}

Validation 3:

 agent_executor.invoke({"question": "使用COLA架构的好处有哪些？COLA架构和DDD架构以及六边形架构的区别是什么？请用100字以内进行总结。"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `db_content_retriever_tool` with `{'query': 'Advantages of COLA architecture'}`
 
 ...
 
 {'question': '使用COLA架构的好处有哪些？COLA架构和DDD架构以及六边形架构的区别是什么？请用100字以内进行总结。',
  'output': 'COLA架构的好处包括提供业务实体和业务领域的方法，为应用层提供业务实体和业务逻辑计算。领域是应用的核心，不依赖任何其他层次；基础实施层负责技术细节处理，如数据库的CRUD、搜索引擎、文件系统、分布式服务的RPC等。COLA架构的优点在于领域和功能分包策略，能将混乱控制在该业务领域内。COLA与DDD和六边形架构的区别在于COLA更注重业务实体的分层和业务逻辑的分包，提供清晰的架构规范和代码质量，使团队开发效率更高。'}

Validation 4:

 agent_executor.invoke({"question": "domain层和app层的区别？"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `db_content_retriever_tool` with `{'query': 'domain层和app层的区别'}`
 
 ...
 
 {'question': 'domain层和app层的区别？',
  'output': '在应用程序的架构中，Domain层和App层有以下区别：\n\n1. App层（Application Layer）：App层是服务的实现层，存放了各个业务的实现类，并严格按照业务分包。在App层中，按照业务领域分包，然后按照功能实现分包。App层包括以下三种功能：\n   - Executor：处理请求，包括命令（command）和查询（query）。\n   - Consumer：处理外部消息。\n   - Scheduler：处理定时任务。\n   - Converter：处理数据对象之间的转换。\n\n2. Domain层（Domain Layer）：Domain层是应用的核心层，不依赖任何其他层次。在Domain层中，按照不同的领域（如customer和order）进行分包。Domain层包括以下主要文件类型：\n   - 领域实体（Domain Entity）：实体模型可以是充血模型，例如Customer.java中定义了Customer实体的属性和方法。\n   - 领域服务（Domain Service）：提供业务逻辑计算。\n   - 领域网关（Domain Gateway）：用于解耦和访问外部资源。\n\n总的来说，App层主要负责处理请求、消息和定时任务等与外部交互相关的功能，而Domain层则是应用的核心，包含业务实体、业务逻辑计算和领域服务等。两者在架构中扮演不同的角色，分工明确。'}

8 Custom Tools

Custom tools can extend our AI Agent.

 from langchain_core.tools import tool
 
 @tool
 def multiply(first_int: int, second_int: int) -> int:
     """Multiply two integers together."""
     return first_int * second_int
 
 
 @tool
 def add(first_int: int, second_int: int) -> int:
     "Add two integers."
     return first_int + second_int
 
 
 @tool
 def exponentiate(base: int, exponent: int) -> int:
     "Exponentiate the base to the exponent power."
     return base**exponent
 
 import wikipedia
 
 @tool
 def search_wikipedia(query: str) -> str:
     """Run Wikipedia search and get page summaries."""
     page_titles = wikipedia.search(query)
     summaries = []
     for page_title in page_titles[:3]:
         try:
             wiki_page = wikipedia.page(title=page_title, auto_suggest=False)
             summaries.append(f"Page: {page_title}\nSummary: {wiki_page.summary}")
         except (
             wikipedia.exceptions.PageError,
             wikipedia.exceptions.DisambiguationError,
         ):
             # Skip titles that are missing or ambiguous
             pass
     if not summaries:
         return "No good Wikipedia Search Result was found"
     return "\n\n".join(summaries)

Rebuild the Agent:

 # Use all the tools
 tools = [retriever_tool, multiply, add, exponentiate, search_wikipedia]
 agent = create_openai_tools_agent(llm, tools, prompt)
 agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

9 Validate again

The output includes the name of the selected tool and the result.

Validation 1:

 agent_executor.invoke({"question": "1+1=?"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `add` with `{'first_int': 1, 'second_int': 1}`
 
 ...
 
 {'question': '1+1=?', 'output': '1 + 1 equals 2.'}

Validation 2:

 agent_executor.invoke({"question": "6*8=?"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `multiply` with `{'first_int': 6, 'second_int': 8}`
 
 ...
 
 {'question': '6*8=?', 'output': '6 multiplied by 8 is equal to 48.'}

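Validation 3:

This is an example of a failure ❌. In principle, this question should be answered from the PDF material behind db_content_retriever_tool, but the agent chose the search_wikipedia tool instead. As noted earlier, the description given when building the tool was not precise enough, so the wrong tool was selected.

 agent_executor.invoke({"question": "what did they say about matlab?"})

outputs: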

 > Entering new AgentExecutor chain...
 
 Invoking: `search_wikipedia` with `{'query': 'Matlab'}`
 
 ...
 
 {'question': 'what did they say about matlab?',
  'output': 'MATLAB is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. It allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages. MATLAB has more than four million users worldwide, coming from various backgrounds of engineering, science, and economics. Additionally, MATLAB has an optional toolbox that uses the MuPAD symbolic engine for symbolic computing abilities, and Simulink, a graphical multi-domain simulation and model-based design tool for dynamic and embedded systems. Simulink is a MATLAB-based graphical programming environment for modeling, simulating, and analyzing multidomain dynamical systems.'}
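To illustrate why the wording of a description matters, the sketch below uses a toy router that picks the tool whose description shares the most words with the question. This is only a crude stand-in for the LLM's real decision process, and the "specific" description is a hypothetical rewrite, not the text used in this post.

```python
import re
from typing import Dict, Set


def words(text: str) -> Set[str]:
    """Lower-cased ASCII words found in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_tool(question: str, tools: Dict[str, str]) -> str:
    """Return the tool whose description overlaps most with the question.

    Toy heuristic only; ties go to the first tool listed.
    """
    question_words = words(question)
    return max(tools, key=lambda name: len(question_words & words(tools[name])))


vague = {
    "search_wikipedia": "Run Wikipedia search and get page summaries.",
    "db_content_retriever_tool": "a retriever tool for COLA, MachineLearning, RAG, Langchain",
}
# Hypothetical, more specific description for the retriever tool
specific = {
    "search_wikipedia": "Run Wikipedia search and get page summaries.",
    "db_content_retriever_tool": (
        "Search the private knowledge base: the COLA architecture blog post, "
        "machine learning lecture notes (what the lecturer said about MATLAB and Octave), "
        "and LangChain videos about RAG agents. Use this first for questions about them."
    ),
}
question = "what did they say about matlab?"
choice_with_vague = pick_tool(question, vague)
choice_with_specific = pick_tool(question, specific)
```

With the vague descriptions neither tool mentions anything from the question, so the choice is effectively arbitrary; the more specific text gives the retriever a clear advantage. In the real agent, a string like this would be passed as the description argument of create_retriever_tool.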

Validation 4:

After making an adjustment here, the agent no longer selects the wiki tool to answer the question.

 agent_executor.invoke({"question": "how do I build a RAG agent?"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `db_content_retriever_tool` with `{'query': 'RAG agent'}`
 
 ...
 
 {'question': 'how do I build a RAG agent?',
  'output': 'Building a RAG (Retrieval-Augmented Generation) agent involves several steps. Here are some key points from the retrieved content:\n\n1. **Understanding RAG**: RAG is like a kernel of a new operating system that connects language models (LMs) to external data sources for information retrieval.\n\n2. **Components of RAG**: RAG consists of the ability to connect LMs to external data sources, such as vector stores, which allow semantic similarity searches.\n\n3. **RAG Architecture**: The RAG bot typically starts with a retrieval step before responding. This architecture is simpler and more streamlined, focusing on a single retrieval step rather than iterative searches.\n\n4. **Implementation**: To build a RAG agent, you would need to set up the necessary code and infrastructure to enable retrieval and generation processes within the agent.\n\nIf you need more detailed information or specific steps on building a RAG agent, further resources or guidance may be required.'}

Validation 5:

Here we ask about a new term, and this time the agent chooses Wikipedia to answer.

 agent_executor.invoke({"question": "领域驱动设计是什么？"})

outputs:

 > Entering new AgentExecutor chain...
 
 Invoking: `search_wikipedia` with `{'query': '领域驱动设计'}`
 
 ...
 
 {'question': '领域驱动设计是什么？',
  'output': '领域驱动设计（Domain-Driven Design，DDD）是一种软件开发方法论，旨在通过将软件系统建模为领域模型来解决复杂业务领域的设计问题。领域驱动设计强调对业务领域的深入理解，并将这些领域知识直接映射到软件设计中。通过使用领域模型来描述业务领域的概念和关系，开发团队能够更好地沟通、理解和实现业务需求，从而提高软件系统的质量和可维护性。'}

10 Add message history (memory)

 from langchain.memory import ChatMessageHistory
 from langchain_core.runnables.history import RunnableWithMessageHistory
 
 # In-memory storage; an external store such as Redis would also work
 chat_history_for_chain = ChatMessageHistory()
 
 conversational_agent_executor = RunnableWithMessageHistory(
     agent_executor,
     lambda session_id: chat_history_for_chain,
     input_messages_key="question",
     output_messages_key="output",
     history_messages_key="chat_history",
 )

Validation:

 conversational_agent_executor.invoke(
     {
         "question": "I'm Eric!",
     },
     {"configurable": {"session_id": "session_id_001"}},
 )

 > Entering new AgentExecutor chain...
 Hello Eric! How can I assist you today?
 
 > Finished chain.
 
 {'question': "I'm Eric!",
  'chat_history': [],
  'output': 'Hello Eric! How can I assist you today?'}

 conversational_agent_executor.invoke(
     {
         "question": "What is my name?",
     },
     {"configurable": {"session_id": "session_id_001"}},
 )

 > Entering new AgentExecutor chain...
 Your name is Eric!
 
 > Finished chain.
 
 {'question': 'What is my name?',
  'chat_history': [HumanMessage(content="I'm Eric!"),
   AIMessage(content='Hello Eric! How can I assist you today?')],
  'output': 'Your name is Eric!'}

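There are many ways to add memory. This example keeps an unbounded history in memory, so the ever-growing context passed to the model makes token consumption rise sharply. One option is to add another chain in front of the current one that uses the LLM to summarize the history.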
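One way to bound the history is to keep the most recent messages verbatim and fold everything older into a single summary message. The sketch below shows only that bookkeeping in plain Python: `compact_history` is a hypothetical helper, messages are simplified to `(role, content)` tuples, and `naive_summarize` is a placeholder where a real implementation would call the LLM.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def compact_history(
    history: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Keep the last keep_last messages; replace older ones with one summary message."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if len(history) <= keep_last:
        return list(history)
    older, recent = history[:-keep_last], history[-keep_last:]
    return [("system", summarize(older))] + recent


def naive_summarize(messages: List[Message]) -> str:
    """Placeholder summarizer; a real one would ask the LLM for a summary."""
    return "Summary of earlier turns: " + " | ".join(content for _, content in messages)


history = [
    ("human", "I'm Eric!"),
    ("ai", "Hello Eric!"),
    ("human", "What is COLA?"),
    ("ai", "A layered architecture."),
]
compacted = compact_history(history, keep_last=2, summarize=naive_summarize)
```

The trade-off is an extra model call per compaction and some loss of detail from the older turns, in exchange for a prompt whose size stays roughly constant.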

11 Build a simple UI for your Agent with Gradio

 def generate(question):
     response = agent_executor.invoke({"question": question})
     return response.get("output")
 
 import gradio as gr
 with gr.Blocks() as demo:
     gr.Markdown("# Gradio Demo UI 🖍️")
     input_text = gr.Text(label="Your Input")
     btn = gr.Button("Submit")
     result = gr.Textbox(label="Generated Result")
 
     btn.click(fn=generate, inputs=[input_text], outputs=[result])
 
 gr.close_all()
 demo.launch()

Validation:

12 Expose a RESTful API for a custom front-end UI

 from typing import List
 
 from fastapi import FastAPI
 from langchain.pydantic_v1 import BaseModel, Field
 from langchain_core.messages import BaseMessage
 from langserve import add_routes
 
 # App definition
 app = FastAPI(
     title="LangChain Server",
     version="1.0",
     description="A simple API server using LangChain's Runnable interfaces",
 )
 
 # Adding chain route
 
 # We need to add these input/output schemas because the current AgentExecutor
 # is lacking in schemas.
 
 class Input(BaseModel):
     # The field name must match the "question" variable used by the prompt
     question: str
     chat_history: List[BaseMessage] = Field(
         ...,
         extra={"widget": {"type": "chat", "input": "location"}},
     )
 
 
 class Output(BaseModel):
     output: str
 
 add_routes(
     app,
     agent_executor.with_types(input_type=Input, output_type=Output),
     path="/agent",
 )
 
 if __name__ == "__main__":
     import uvicorn
 
     uvicorn.run(app, host="localhost", port=8000)

Combine the earlier notebook (ipynb) code with the code above into a Python file, serve-demo.py, then run:

 python serve-demo.py

Visit http://localhost:8000/docs:

13 Tracing

You need a LangChain (LangSmith) API key; register at https://smith.langchain.com/ to get one.

 export LANGCHAIN_TRACING_V2="true"
 export LANGCHAIN_API_KEY="<your-api-key>"
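When working inside a Jupyter notebook rather than a shell, the same two variables can be set from Python before the agent is invoked. The helper name `enable_langsmith_tracing` is an assumption for this sketch; the key should come from your own LangSmith account.

```python
import os


def enable_langsmith_tracing(api_key: str) -> None:
    """Set the environment variables that switch on LangSmith tracing for this process."""
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key


# Example call (replace the placeholder with your own key):
# enable_langsmith_tracing("<your-api-key>")
```

These must be set before the chain or agent runs, since they are read from the process environment at that point.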

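Some Personal Thoughts

Many companies are racing on AI right now, and AI platforms and tools keep emerging. I have tried a range of products from abroad and from China: text-to-text, text-to-image, image-to-text, and so on. They are genuinely powerful, yet as a programmer learning about AIGC, I keep coming back to these questions:

Can I keep up with the trend? Tools, frameworks, and products iterate quickly, I never studied many of the fundamentals systematically, the learning curve is steep, and I have not taken part in a mature production project.
How much can AI improve my own knowledge-intensive work? Information retrieval has certainly improved a lot; can I later find scenarios that better match my own skills?
How can the correctness of AI-generated results be evaluated at enterprise grade and with high quality? For example, generated SQL and PlantUML still need manual adjustment afterwards, and similarity-based natural-language answers are inherently uncertain.

I hope to resolve these questions gradually as I keep learning in this area.

References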

Introduction | 🦜️🔗 Langchain

Quickstart (gradio.app)

🤗 Transformers (huggingface.co)

Pinecone

Milvus documentation

🧪 Usage Guide | Chroma (trychroma.com)

LLM Powered Autonomous Agents | Lil'Log (lilianweng.github.io)


License: CC BY 4.0