Merge pull request #6 from upstash/file-loader-support
feat: add file loader support - WIP
Showing 23 changed files with 499 additions and 290 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
#!/usr/bin/env sh | ||
. "$(dirname -- "$0")/_/husky.sh" | ||
|
||
bun --no -- commitlint --edit "" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Username; Identifier;First name;Last name | ||
booker12;9012;Rachel;Booker | ||
grey07;2070;Laura;Grey | ||
johnson81;4081;Craig;Johnson | ||
jenkins46;9346;Mary;Jenkins | ||
smith79;5079;Jamie;Smith | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
<!doctype html> | ||
<html lang="en"> | ||
<head> | ||
<meta charset="UTF-8" /> | ||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> | ||
<title>The Wonderful Wizard of Oz - Summary</title> | ||
</head> | ||
<body> | ||
<header> | ||
<h1>The Wonderful Wizard of Oz</h1> | ||
<p><em>By L. Frank Baum</em></p> | ||
</header> | ||
<main> | ||
<section> | ||
<h2>Summary</h2> | ||
<p> | ||
The Wonderful Wizard of Oz is a children's novel written by L. Frank Baum. The story | ||
follows a young girl named Dorothy who lives on a Kansas farm with her Aunt Em, Uncle | ||
Henry, and her little dog Toto. A cyclone hits, and Dorothy and Toto are swept away to the | ||
magical land of Oz. | ||
</p> | ||
<p> | ||
In Oz, Dorothy meets the Good Witch of the North and is given silver shoes and a | ||
protective kiss. She is advised to follow the Yellow Brick Road to the Emerald City to | ||
seek the help of the Wizard of Oz to return home. Along her journey, she befriends the | ||
Scarecrow, who desires a brain, the Tin Woodman, who longs for a heart, and the Cowardly | ||
Lion, who seeks courage. | ||
</p> | ||
<p> | ||
The group faces various challenges but eventually reaches the Emerald City. The Wizard | ||
appears in different forms to each of them and agrees to grant their wishes if they kill | ||
the Wicked Witch of the West. They manage to defeat the Witch by melting her with water. | ||
</p> | ||
<p> | ||
Upon their return to the Emerald City, they discover the Wizard is an ordinary man from | ||
Omaha. He grants their wishes through symbolic means: the Scarecrow gets a brain made of | ||
bran, the Tin Woodman gets a silk heart, and the Lion receives a potion for courage. | ||
Dorothy learns that the silver shoes can take her home. She clicks her heels together and | ||
returns to Kansas, where she is joyfully reunited with her family. | ||
</p> | ||
</section> | ||
</main> | ||
<footer> | ||
<p>© 2024 The Wonderful Wizard of Oz Summary</p> | ||
</footer> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
|
||
The Wonderful Wizard of Oz is a children's novel written by L. Frank Baum. The story follows a young girl named Dorothy who lives on a Kansas farm with her Aunt Em, Uncle Henry, and her little dog Toto. A cyclone hits, and Dorothy and Toto are swept away to the magical land of Oz. | ||
|
||
In Oz, Dorothy meets the Good Witch of the North and is given silver shoes and a protective kiss. She is advised to follow the Yellow Brick Road to the Emerald City to seek the help of the Wizard of Oz to return home. Along her journey, she befriends the Scarecrow, who desires a brain, the Tin Woodman, who longs for a heart, and the Cowardly Lion, who seeks courage. | ||
|
||
The group faces various challenges but eventually reaches the Emerald City. The Wizard appears in different forms to each of them and agrees to grant their wishes if they kill the Wicked Witch of the West. They manage to defeat the Witch by melting her with water. | ||
|
||
Upon their return to the Emerald City, they discover the Wizard is an ordinary man from Omaha. He grants their wishes through symbolic means: the Scarecrow gets a brain made of bran, the Tin Woodman gets a silk heart, and the Lion receives a potion for courage. Dorothy learns that the silver shoes can take her home. She clicks her heels together and returns to Kansas, where she is joyfully reunited with her family. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,6 @@ | ||
export * from "./src/rag-chat"; | ||
export * from "./src/services"; | ||
export * from "./src/history"; | ||
export * from "./src/database"; | ||
export * from "./src/ratelimit"; | ||
export * from "./src/error"; | ||
export * from "./src/types"; |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
import type { WebBaseLoaderParams } from "@langchain/community/document_loaders/web/cheerio"; | ||
import type { Index } from "@upstash/vector"; | ||
import type { RecursiveCharacterTextSplitterParams } from "langchain/text_splitter"; | ||
import { nanoid } from "nanoid"; | ||
import { DEFAULT_SIMILARITY_THRESHOLD, DEFAULT_METADATA_KEY, DEFAULT_TOP_K } from "./constants"; | ||
import { FileDataLoader } from "./file-loader"; | ||
import { formatFacts } from "./utils"; | ||
|
||
export type IndexUpsertPayload = { input: number[]; id?: string | number; metadata?: string }; | ||
export type FilePath = string; | ||
export type URL = string; | ||
|
||
export type DatasWithFileSource = | ||
| { | ||
dataType: "pdf"; | ||
fileSource: FilePath | Blob; | ||
opts?: Partial<RecursiveCharacterTextSplitterParams>; | ||
pdfOpts?: { parsedItemSeparator?: string; splitPages?: boolean }; | ||
} | ||
| { | ||
dataType: "csv"; | ||
fileSource: FilePath | Blob; | ||
csvOpts?: { column?: string; separator?: string }; | ||
} | ||
| { | ||
dataType: "text-file"; | ||
fileSource: FilePath | Blob; | ||
opts?: Partial<RecursiveCharacterTextSplitterParams>; | ||
} | ||
| ( | ||
| { | ||
dataType: "html"; | ||
fileSource: URL; | ||
htmlOpts?: WebBaseLoaderParams; | ||
opts: Partial<RecursiveCharacterTextSplitterParams>; | ||
} | ||
| { | ||
dataType: "html"; | ||
fileSource: FilePath | Blob; | ||
opts?: Partial<RecursiveCharacterTextSplitterParams>; | ||
} | ||
); | ||
|
||
export type AddContextPayload = | ||
| { dataType: "text"; data: string; id?: string | number } | ||
| { dataType: "embedding"; data: IndexUpsertPayload[] } | ||
| DatasWithFileSource; | ||
|
||
export type AddContextOptions = { | ||
metadataKey?: string; | ||
}; | ||
|
||
export type VectorPayload = { | ||
question: string; | ||
similarityThreshold: number; | ||
metadataKey: string; | ||
topK: number; | ||
}; | ||
|
||
export class Database { | ||
private index: Index; | ||
constructor(index: Index) { | ||
this.index = index; | ||
} | ||
|
||
/** | ||
* A method that allows you to query the vector database with plain text. | ||
* It takes care of the text-to-embedding conversion by itself. | ||
* Additionally, it lets consumers pass various options to tweak the output. | ||
*/ | ||
async retrieve({ | ||
question, | ||
similarityThreshold = DEFAULT_SIMILARITY_THRESHOLD, | ||
metadataKey = DEFAULT_METADATA_KEY, | ||
topK = DEFAULT_TOP_K, | ||
}: VectorPayload): Promise<string> { | ||
const index = this.index; | ||
const result = await index.query<Record<string, string>>({ | ||
data: question, | ||
topK, | ||
includeMetadata: true, | ||
includeVectors: false, | ||
}); | ||
|
||
const allValuesUndefined = result.every( | ||
(embedding) => embedding.metadata?.[metadataKey] === undefined | ||
); | ||
|
||
if (allValuesUndefined) { | ||
throw new TypeError(` | ||
Query to the vector store returned ${result.length} vectors but none had "${metadataKey}" field in their metadata. | ||
Text of your vectors should be in the "${metadataKey}" field in the metadata for the RAG Chat. | ||
`); | ||
} | ||
|
||
const facts = result | ||
.filter((x) => x.score >= similarityThreshold) | ||
.map( | ||
(embedding, index) => `- Context Item ${index}: ${embedding.metadata?.[metadataKey] ?? ""}` | ||
); | ||
return formatFacts(facts); | ||
} | ||
|
||
/** | ||
* A method that adds data of various types to the vector database. | |
* It supports plain text, embeddings, PDF, HTML, text files, and CSV. Additionally, it handles text splitting for CSV, PDF, and text files. | |
*/ | ||
async save(input: AddContextPayload, options?: AddContextOptions): Promise<string | undefined> { | ||
const { metadataKey = "text" } = options ?? {}; | ||
|
||
if (input.dataType === "text") { | ||
return this.index.upsert({ | ||
data: input.data, | ||
id: input.id ?? nanoid(), | ||
metadata: { [metadataKey]: input.data }, | ||
}); | ||
} else if (input.dataType === "embedding") { | ||
const items = input.data.map((context) => { | ||
return { | ||
vector: context.input, | ||
id: context.id ?? nanoid(), | ||
metadata: { [metadataKey]: context.metadata }, | ||
}; | ||
}); | ||
|
||
return this.index.upsert(items); | ||
} else { | ||
const fileArgs = "pdfOpts" in input ? input.pdfOpts : "csvOpts" in input ? input.csvOpts : {}; | ||
const transformOrSplit = await new FileDataLoader(input, metadataKey).loadFile(fileArgs); | ||
|
||
const transformArgs = "opts" in input ? input.opts : {}; | ||
const transformDocuments = await transformOrSplit(transformArgs); | ||
await this.index.upsert(transformDocuments); | ||
} | ||
} | ||
} |
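The post-processing that `retrieve` applies to query results (check for the metadata key, filter by score, format as context items) can be sketched in isolation. This is a minimal sketch and not the repository's code: the `Hit` type, the `toFacts` helper, and the sample data are hypothetical stand-ins for the `@upstash/vector` query result, and `formatFacts` is omitted. It also makes one deliberate change, labeled in the comments: because `[].every(...)` is always `true` in JavaScript, the check as written in the diff would throw for an empty result set, so the sketch guards that case.

```typescript
// Hypothetical shape of a single query hit; the real type comes from @upstash/vector.
type Hit = { score: number; metadata?: Record<string, string> };

// Hypothetical helper mirroring the filter-and-format steps of Database.retrieve.
function toFacts(hits: Hit[], similarityThreshold: number, metadataKey: string): string[] {
  // Assumption (differs from the diff): skip the check when there are no hits,
  // since every() on an empty array returns true.
  const noneHaveKey =
    hits.length > 0 && hits.every((hit) => hit.metadata?.[metadataKey] === undefined);
  if (noneHaveKey) {
    throw new TypeError(
      `Query returned ${hits.length} vectors but none had a "${metadataKey}" metadata field.`
    );
  }

  return hits
    // Keep only hits at or above the similarity threshold.
    .filter((hit) => hit.score >= similarityThreshold)
    // The position is the zero-based index after filtering, not the original rank.
    .map((hit, position) => `- Context Item ${position}: ${hit.metadata?.[metadataKey] ?? ""}`);
}

// Illustrative data only.
const hits: Hit[] = [
  { score: 0.91, metadata: { text: "Dorothy lives on a Kansas farm." } },
  { score: 0.32, metadata: { text: "The Wizard is from Omaha." } },
  { score: 0.77, metadata: { text: "The silver shoes can take her home." } },
];

console.log(toFacts(hits, 0.5, "text"));
```

With a threshold of 0.5 the second hit is dropped, and the remaining items are numbered 0 and 1 because the numbering happens after the filter.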