SuperSCC.rag.SimpleRAG

class SuperSCC.rag.SimpleRAG(file_path: str, file_type: str)

A class that encapsulates a complete Retrieval-Augmented Generation (RAG) pipeline, from data loading and processing to answer generation and citation.

Parameters:

file_path (str) – The path to the file or the root directory containing the documents to be processed.
file_type (str) – The file extension to look for (e.g., “pdf”, “csv”). This determines which files are loaded.
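
To illustrate how the two parameters interact, here is a minimal sketch of extension-based file discovery under a root directory. It is not SuperSCC's implementation: `find_files` is a hypothetical helper, and the case-insensitive matching and the handling of a single-file path are assumptions made for this example only.

```python
import tempfile
from pathlib import Path


def find_files(root: str, file_type: str) -> list[Path]:
    """Hypothetical helper: collect files matching `file_type` under `root`.

    `root` may be a single file or a directory that is searched recursively.
    """
    # Accept both "pdf" and ".pdf"; compare extensions case-insensitively.
    suffix = "." + file_type.lower().lstrip(".")
    root_path = Path(root)
    if root_path.is_file():
        return [root_path] if root_path.suffix.lower() == suffix else []
    return sorted(
        p for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.pdf").touch()
    (Path(tmp) / "sub" / "b.PDF").touch()
    (Path(tmp) / "notes.csv").touch()
    print([p.name for p in find_files(tmp, "pdf")])
```

In this sketch only the files whose extension matches `file_type` are returned, which mirrors the statement that `file_type` determines which files are loaded.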

__init__(file_path: str, file_type: str)

Methods

`__init__`(file_path, file_type)
`add_documents`(file_path, file_type[, ...])
`change_text_embedding`(model_name[, ...])
`create_rag_chain`(vector_store, model, ...[, ...])
`data_loader`(file_path[, mode, metadata_columns])
`format_docs`(docs)
`get_all_ids`()
`get_answer`(gene_list[, query, ...])	The main entry point for asking a question.
`get_relevant_segments`()
`highlight_docs`()
`hybrid_search`([hierarchy_search, key, value])
`recursive_search`(path[, type])
`refine_query`()
`rerank`([model, top_n])
`run_rag`(qdrant_location, ...[, qdrant_host, ...])	Executes the entire RAG pipeline from scratch: loading, splitting, encoding, and creating the chain.
`score_documents`([docs])
`summary_res`(res)
`text_encode`(text, model_name, location[, ...])
`text_split`(docs[, chunk_size, ...])
`translator`([query])
`update_rag_chain`([model, api_key, base_url, ...])	Updates components of the existing RAG chain, such as the LLM or prompt.
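
The stages named in the `run_rag` summary (loading, splitting, encoding, and creating the chain) can be pictured with a toy, dependency-free sketch. Everything below is an assumption for illustration: `split_text`, `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model and vector store, and no LLM is called. It does not reflect how `SimpleRAG` is implemented.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is never fully contained in the previous one.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (a stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble a prompt with numbered context so an answer could cite its sources."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(context_chunks, 1))
    return f"Answer using only the numbered context.\n{context}\nQuestion: {query}"


docs = ["CD3E marks T cells", "MS4A1 marks B cells", "Albumin is made in liver"]
question = "Which cells express MS4A1?"
print(build_prompt(question, retrieve(question, docs, top_k=1)))
```

Numbering the retrieved chunks in the prompt is one simple way to support citation in the generated answer; a production pipeline would replace each toy stage with a real loader, splitter, embedding model, vector store, and LLM.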