Help Needed with Calculating Pricing for Processing Documents with Langchain #26640
DavidNavarroSaiz asked this question in Q&A (unanswered)
Hi LangChain team,
I’m working on a project where I load documents (PDF, DOCX, TXT), split them into smaller chunks using the RecursiveCharacterTextSplitter, and then convert them into graph nodes and relationships with LLMGraphTransformer to store in a graph database.
Here’s a simplified version of my process:
1. Load the document (PDF, DOCX, or TXT).
2. Split the document into chunks using RecursiveCharacterTextSplitter (chunk size: 1500, overlap: 30).
3. Extract nodes and relationships using LLMGraphTransformer.
4. Store the nodes and relationships in a graph database (e.g., Neo4j).
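For reference, the four steps above can be sketched end to end. Everything here is illustrative rather than my exact code: the loader choice, the `gpt-4o-mini` model name, and the function name are assumptions, and Neo4j credentials are expected in the usual `NEO4J_*` environment variables.

```python
def build_graph_from_pdf(file_path: str):
    """Sketch of the pipeline: load -> split -> transform -> store.

    Assumes langchain-openai, langchain-experimental, and the Neo4j
    extras are installed, plus OPENAI_API_KEY / NEO4J_* env vars.
    """
    # Imports kept local so the sketch can be defined without every
    # optional package installed.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.graphs import Neo4jGraph
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Load (swap in Docx2txtLoader / TextLoader for other formats)
    docs = PyPDFLoader(file_path).load()

    # 2. Split into chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=30)
    chunks = splitter.split_documents(docs)

    # 3. Extract nodes and relationships (model name is an assumption)
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    transformer = LLMGraphTransformer(llm=llm)
    graph_documents = transformer.convert_to_graph_documents(chunks)

    # 4. Store in Neo4j (credentials read from environment variables)
    graph = Neo4jGraph()
    graph.add_graph_documents(graph_documents)
    return graph_documents
```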
I would like to calculate the cost for processing each document, considering the following:
- Each chunk of text processed by the model contributes to the cost.
- I’m using OpenAI’s API for the LLM transformation.
- I need to understand how to calculate or estimate the pricing for each document based on its size, the number of tokens, and the number of API calls.
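Since each chunk becomes roughly one API call, a back-of-the-envelope estimate only needs a token count per chunk plus a fixed per-call prompt overhead. The helper below is my own rough sketch, not a LangChain utility: the ~4-characters-per-token heuristic, the per-1K price, and the overhead figure are all placeholder assumptions (use `tiktoken` for exact counts and OpenAI's pricing page for current rates).

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_document_cost(chunks, price_per_1k_input=0.0005, prompt_overhead=200):
    """Estimate input-token count and cost of sending each chunk to the LLM.

    prompt_overhead approximates the extra tokens the extraction prompt
    adds to every call (an assumed figure, not measured).
    Returns (total_tokens, estimated_cost_usd).
    """
    total_tokens = sum(estimate_tokens(c) + prompt_overhead for c in chunks)
    return total_tokens, total_tokens / 1000 * price_per_1k_input
```

This ignores output tokens, which LLMGraphTransformer also generates, so treat it as a lower bound.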
Questions:
1. Is there an existing LangChain function or utility that helps calculate costs based on the number of tokens or API calls made during document processing?
2. What’s the best way to estimate or calculate costs for each document processed, especially when the document is split into multiple chunks?
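On the first question: LangChain's `get_openai_callback` context manager records prompt tokens, completion tokens, and dollar cost for every OpenAI call made inside its `with` block, so wrapping the graph-transformation step gives a per-document total without manual counting. A minimal sketch (the wrapper function name is mine):

```python
def convert_with_cost(transformer, chunks):
    """Run an LLMGraphTransformer over pre-split chunks and report usage.

    `transformer` wraps an OpenAI chat model; `chunks` is the output of
    RecursiveCharacterTextSplitter.split_documents().
    """
    # Local import so the sketch can be defined without langchain installed.
    from langchain_community.callbacks import get_openai_callback

    with get_openai_callback() as cb:  # tracks all OpenAI calls in this block
        graph_documents = transformer.convert_to_graph_documents(chunks)
    usage = {
        "prompt_tokens": cb.prompt_tokens,
        "completion_tokens": cb.completion_tokens,
        "total_tokens": cb.total_tokens,
        "total_cost_usd": cb.total_cost,
    }
    return graph_documents, usage
```

Note that `total_cost` depends on LangChain's built-in price table recognizing the model name; for unrecognized models it reports 0, in which case multiply the token counts by current prices yourself.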
I appreciate any guidance or examples on how to approach pricing for document conversion with LangChain.
Thank you in advance!
Code:

```python
class DocumentProcessor:
    def __init__(self, llm, allowed_nodes, allowed_relationships):
        self.llm = llm
        self.allowed_nodes = allowed_nodes
        self.allowed_relationships = allowed_relationships
```