Merge pull request #6 from upstash/file-loader-support
feat: add file loader support - WIP
ogzhanolguncu authored May 27, 2024
2 parents d917f94 + ce73462 commit d429d10
Showing 23 changed files with 499 additions and 290 deletions.
4 changes: 4 additions & 0 deletions .husky/commit-msg
@@ -0,0 +1,4 @@
#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

bun --no -- commitlint --edit ""
Binary file modified bun.lockb
Binary file not shown.
7 changes: 7 additions & 0 deletions data/list_of_user_info.csv
@@ -0,0 +1,7 @@
Username; Identifier;First name;Last name
booker12;9012;Rachel;Booker
grey07;2070;Laura;Grey
johnson81;4081;Craig;Johnson
jenkins46;9346;Mary;Jenkins
smith79;5079;Jamie;Smith

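The `csvOpts` field in `database.ts` takes a `column` and a `separator`, and this sample file uses `;` as its delimiter. Those option names appear to mirror LangChain's `CSVLoader`, which parses with `d3-dsv` (added to `package.json` in this commit). The helper below is a hypothetical, dependency-free sketch of selecting one column from a semicolon-separated file like this one. `extractColumn` is not part of `@upstash/rag-chat`, and unlike a real CSV parser it does not handle quoted fields.

```typescript
// Hypothetical helper (not part of @upstash/rag-chat): returns the values of one
// column from a delimiter-separated string. Quoted fields are NOT supported.
function extractColumn(csv: string, column: string, separator = ","): string[] {
  const lines = csv
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const headerLine = lines[0];
  if (headerLine === undefined) return [];

  // Trim header cells so "Username; Identifier" still matches "Identifier".
  const header = headerLine.split(separator).map((cell) => cell.trim());
  const columnIndex = header.indexOf(column);
  if (columnIndex === -1) {
    throw new Error(`Column "${column}" not found in header: ${header.join(", ")}`);
  }

  // Every remaining line is a data row; missing cells become empty strings.
  return lines.slice(1).map((line) => (line.split(separator)[columnIndex] ?? "").trim());
}

const sample = [
  "Username; Identifier;First name;Last name",
  "booker12;9012;Rachel;Booker",
  "grey07;2070;Laura;Grey",
].join("\n");

console.log(extractColumn(sample, "First name", ";").join(", ")); // Rachel, Laura
```

Trimming the header cells matters for this particular file: its header contains `Username; Identifier` with a space after the first separator, so an exact-match lookup for `Identifier` would otherwise fail.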
47 changes: 47 additions & 0 deletions data/the_wonderful_wizard_of_oz_summary.html
@@ -0,0 +1,47 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>The Wonderful Wizard of Oz - Summary</title>
</head>
<body>
<header>
<h1>The Wonderful Wizard of Oz</h1>
<p><em>By L. Frank Baum</em></p>
</header>
<main>
<section>
<h2>Summary</h2>
<p>
The Wonderful Wizard of Oz is a children's novel written by L. Frank Baum. The story
follows a young girl named Dorothy who lives on a Kansas farm with her Aunt Em, Uncle
Henry, and her little dog Toto. A cyclone hits, and Dorothy and Toto are swept away to the
magical land of Oz.
</p>
<p>
In Oz, Dorothy meets the Good Witch of the North and is given silver shoes and a
protective kiss. She is advised to follow the Yellow Brick Road to the Emerald City to
seek the help of the Wizard of Oz to return home. Along her journey, she befriends the
Scarecrow, who desires a brain, the Tin Woodman, who longs for a heart, and the Cowardly
Lion, who seeks courage.
</p>
<p>
The group faces various challenges but eventually reaches the Emerald City. The Wizard
appears in different forms to each of them and agrees to grant their wishes if they kill
the Wicked Witch of the West. They manage to defeat the Witch by melting her with water.
</p>
<p>
Upon their return to the Emerald City, they discover the Wizard is an ordinary man from
Omaha. He grants their wishes through symbolic means: the Scarecrow gets a brain made of
bran, the Tin Woodman gets a silk heart, and the Lion receives a potion for courage.
Dorothy learns that the silver shoes can take her home. She clicks her heels together and
returns to Kansas, where she is joyfully reunited with her family.
</p>
</section>
</main>
<footer>
<p>&copy; 2024 The Wonderful Wizard of Oz Summary</p>
</footer>
</body>
</html>
8 changes: 8 additions & 0 deletions data/the_wonderful_wizard_of_oz_summary.txt
@@ -0,0 +1,8 @@

The Wonderful Wizard of Oz is a children's novel written by L. Frank Baum. The story follows a young girl named Dorothy who lives on a Kansas farm with her Aunt Em, Uncle Henry, and her little dog Toto. A cyclone hits, and Dorothy and Toto are swept away to the magical land of Oz.

In Oz, Dorothy meets the Good Witch of the North and is given silver shoes and a protective kiss. She is advised to follow the Yellow Brick Road to the Emerald City to seek the help of the Wizard of Oz to return home. Along her journey, she befriends the Scarecrow, who desires a brain, the Tin Woodman, who longs for a heart, and the Cowardly Lion, who seeks courage.

The group faces various challenges but eventually reaches the Emerald City. The Wizard appears in different forms to each of them and agrees to grant their wishes if they kill the Wicked Witch of the West. They manage to defeat the Witch by melting her with water.

Upon their return to the Emerald City, they discover the Wizard is an ordinary man from Omaha. He grants their wishes through symbolic means: the Scarecrow gets a brain made of bran, the Tin Woodman gets a silk heart, and the Lion receives a potion for courage. Dorothy learns that the silver shoes can take her home. She clicks her heels together and returns to Kansas, where she is joyfully reunited with her family.
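The `text-file` variant of `DatasWithFileSource` accepts `opts?: Partial<RecursiveCharacterTextSplitterParams>`, and `save` runs the loaded file through a transform-or-split step before upserting, so a file like this summary is stored as chunks rather than as one vector. To illustrate what `chunkSize` and `chunkOverlap` control, here is a simplified fixed-window splitter. It is a stand-in, not LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers to break on paragraph, line, and word boundaries; `splitWithOverlap` is a hypothetical name.

```typescript
// Simplified fixed-window splitter. This is NOT LangChain's
// RecursiveCharacterTextSplitter; it only borrows the chunkSize / chunkOverlap names.
function splitWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in [0, chunkSize)");
  }

  const chunks: string[] = [];
  // Each window starts (chunkSize - chunkOverlap) characters after the previous one,
  // so consecutive chunks share chunkOverlap characters.
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end so the tail is not emitted twice.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const demo = splitWithOverlap("abcdefghij", 4, 1);
console.log(demo.join(" | ")); // abcd | defg | ghij
```

The overlap keeps a sentence that straddles a chunk boundary partially visible in both neighbouring chunks, which tends to help retrieval; the cost is some duplicated text in the index.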
5 changes: 4 additions & 1 deletion index.ts
@@ -1,3 +1,6 @@
export * from "./src/rag-chat";
export * from "./src/services";
export * from "./src/history";
export * from "./src/database";
export * from "./src/ratelimit";
export * from "./src/error";
export * from "./src/types";
11 changes: 7 additions & 4 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "@upstash/rag-chat",
"version": "0.0.23-alpha",
"version": "0.0.24-alpha",
"main": "./dist/index.js",
"module": "./dist/index.mjs",
"types": "./dist/index.d.ts",
@@ -31,16 +31,16 @@
"type": "git",
"url": "https://github.com/upstash/rag-chat"
},
"license": "ISC",
"license": "MIT",
"devDependencies": {
"@commitlint/cli": "^19.2.2",
"@commitlint/config-conventional": "^19.2.2",
"@typescript-eslint/eslint-plugin": "^7.0.1",
"@typescript-eslint/parser": "^7.0.1",
"eslint-plugin-unicorn": "^51.0.1",
"bun-types": "latest",
"husky": "^9.0.10",
"eslint": "^8",
"eslint-plugin-unicorn": "^51.0.1",
"husky": "^9.0.10",
"prettier": "^3.2.5",
"tsup": "latest",
"typescript": "^5.4.5",
@@ -54,6 +54,9 @@
"@upstash/redis": "^1.31.1",
"@upstash/vector": "^1.1.1",
"ai": "^3.1.1",
"cheerio": "^1.0.0-rc.12",
"d3-dsv": "^3.0.1",
"html-to-text": "^9.0.5",
"langchain": "^0.2.0",
"nanoid": "^5.0.7",
"pdf-parse": "^1.1.1"
3 changes: 0 additions & 3 deletions src/constants.ts
@@ -1,13 +1,10 @@
import type { PreferredRegions } from "./types";

export const DEFAULT_CHAT_SESSION_ID = "upstash-rag-chat-session";
export const DEFAULT_CHAT_RATELIMIT_SESSION_ID = "upstash-rag-chat-ratelimit-session";

export const RATELIMIT_ERROR_MESSAGE = "ERR:USER_RATELIMITED";

export const DEFAULT_VECTOR_DB_NAME = "upstash-rag-chat-vector";
export const DEFAULT_REDIS_DB_NAME = "upstash-rag-chat-redis";
export const PREFERRED_REGION: PreferredRegions = "us-east-1";

//Retrieval related default options
export const DEFAULT_SIMILARITY_THRESHOLD = 0.5;
106 changes: 0 additions & 106 deletions src/custom-llm.ts

This file was deleted.

136 changes: 136 additions & 0 deletions src/database.ts
@@ -0,0 +1,136 @@
import type { WebBaseLoaderParams } from "@langchain/community/document_loaders/web/cheerio";
import type { Index } from "@upstash/vector";
import type { RecursiveCharacterTextSplitterParams } from "langchain/text_splitter";
import { nanoid } from "nanoid";
import { DEFAULT_SIMILARITY_THRESHOLD, DEFAULT_METADATA_KEY, DEFAULT_TOP_K } from "./constants";
import { FileDataLoader } from "./file-loader";
import { formatFacts } from "./utils";

export type IndexUpsertPayload = { input: number[]; id?: string | number; metadata?: string };
export type FilePath = string;
export type URL = string;

export type DatasWithFileSource =
| {
dataType: "pdf";
fileSource: FilePath | Blob;
opts?: Partial<RecursiveCharacterTextSplitterParams>;
pdfOpts?: { parsedItemSeparator?: string; splitPages?: boolean };
}
| {
dataType: "csv";
fileSource: FilePath | Blob;
csvOpts?: { column?: string; separator?: string };
}
| {
dataType: "text-file";
fileSource: FilePath | Blob;
opts?: Partial<RecursiveCharacterTextSplitterParams>;
}
| (
| {
dataType: "html";
fileSource: URL;
htmlOpts?: WebBaseLoaderParams;
opts: Partial<RecursiveCharacterTextSplitterParams>;
}
| {
dataType: "html";
fileSource: FilePath | Blob;
opts?: Partial<RecursiveCharacterTextSplitterParams>;
}
);

export type AddContextPayload =
| { dataType: "text"; data: string; id?: string | number }
| { dataType: "embedding"; data: IndexUpsertPayload[] }
| DatasWithFileSource;

export type AddContextOptions = {
metadataKey?: string;
};

export type VectorPayload = {
question: string;
similarityThreshold: number;
metadataKey: string;
topK: number;
};

export class Database {
private index: Index;
constructor(index: Index) {
this.index = index;
}

/**
* A method that allows you to query the vector database with plain text.
* It takes care of the text-to-embedding conversion itself by sending the question as `data`.
* Options let callers tune the output: `similarityThreshold`, `metadataKey`, and `topK`.
*/
async retrieve({
question,
similarityThreshold = DEFAULT_SIMILARITY_THRESHOLD,
metadataKey = DEFAULT_METADATA_KEY,
topK = DEFAULT_TOP_K,
}: VectorPayload): Promise<string> {
const index = this.index;
const result = await index.query<Record<string, string>>({
data: question,
topK,
includeMetadata: true,
includeVectors: false,
});

const allValuesUndefined = result.every(
(embedding) => embedding.metadata?.[metadataKey] === undefined
);

if (allValuesUndefined) {
throw new TypeError(`
Query to the vector store returned ${result.length} vectors, but none had a "${metadataKey}" field in their metadata.
For RAG Chat, the text of each vector must be stored in the "${metadataKey}" field of its metadata.
`);
}

const facts = result
.filter((x) => x.score >= similarityThreshold)
.map(
(embedding, index) => `- Context Item ${index}: ${embedding.metadata?.[metadataKey] ?? ""}`
);
return formatFacts(facts);
}

/**
* A method that allows you to add various data types to the vector database.
* It supports plain text, embeddings, PDF, HTML, text files, and CSV, and it handles text splitting for CSV, PDF, and text files.
*/
async save(input: AddContextPayload, options?: AddContextOptions): Promise<string | undefined> {
const { metadataKey = "text" } = options ?? {};

if (input.dataType === "text") {
return this.index.upsert({
data: input.data,
id: input.id ?? nanoid(),
metadata: { [metadataKey]: input.data },
});
} else if (input.dataType === "embedding") {
const items = input.data.map((context) => {
return {
vector: context.input,
id: context.id ?? nanoid(),
metadata: { [metadataKey]: context.metadata },
};
});

return this.index.upsert(items);
} else {
const fileArgs = "pdfOpts" in input ? input.pdfOpts : "csvOpts" in input ? input.csvOpts : {};
const transformOrSplit = await new FileDataLoader(input, metadataKey).loadFile(fileArgs);

const transformArgs = "opts" in input ? input.opts : {};
const transformDocuments = await transformOrSplit(transformArgs);
await this.index.upsert(transformDocuments);
}
}
}
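The post-query part of `Database.retrieve` is pure logic: it checks that at least one hit carries the metadata key, drops hits below `similarityThreshold`, and numbers the rest as context items. The sketch below reproduces that logic outside the class so it can be exercised without an Upstash Vector index. `QueryHit` and `toContextItems` are hypothetical names, `QueryHit` keeps only the fields `retrieve` reads, and the final `formatFacts` step is omitted because its implementation is not shown in this diff.

```typescript
// Hypothetical reduced shape of one query hit: only the fields retrieve() reads.
type QueryHit = { id: string | number; score: number; metadata?: Record<string, string> };

// Hypothetical pure helper mirroring the post-query logic of Database.retrieve.
function toContextItems(hits: QueryHit[], metadataKey: string, similarityThreshold: number): string[] {
  // Same check as retrieve(): fail if no hit stores text under metadataKey.
  const allValuesUndefined = hits.every((hit) => hit.metadata?.[metadataKey] === undefined);
  if (allValuesUndefined) {
    throw new TypeError(
      `Query returned ${hits.length} vectors, but none had a "${metadataKey}" field in their metadata.`
    );
  }

  // Filter first, then number, so item indices stay contiguous after filtering.
  return hits
    .filter((hit) => hit.score >= similarityThreshold)
    .map((hit, i) => `- Context Item ${i}: ${hit.metadata?.[metadataKey] ?? ""}`);
}

const hits: QueryHit[] = [
  { id: "a", score: 0.9, metadata: { text: "Dorothy lives on a Kansas farm." } },
  { id: "b", score: 0.3, metadata: { text: "Toto is a little dog." } },
];

console.log(toContextItems(hits, "text", 0.5).join("\n")); // - Context Item 0: Dorothy lives on a Kansas farm.
```

One consequence of using `every` for the metadata check: `[].every(...)` is `true`, so an empty query result also raises the `TypeError` (reporting 0 vectors) instead of returning an empty context.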