Knowledge Graph Generator

Ruhaan Choudhary
Ritesh Kumar

GitHub Repository
Hackathon Submission (Devpost)

Abstract

The Knowledge Graph Generator (KGG) is a tool designed to extract structured information from unstructured text and visualize it as a knowledge graph. Leveraging large language models and modern visualization libraries, KGG enables users to input any text and receive an interactive graph of entities and their relationships. The project is implemented in Python using HuggingFace Transformers, PyVis, and Google Colab for an accessible and interactive experience.

Introduction

The Knowledge Graph Generator (KGG) is an AI-powered application that transforms unstructured text into a structured knowledge graph. By identifying key entities and their relationships, KGG provides a visual and interactive representation of information, aiding in understanding, exploration, and further analysis. The project is built using Python, HuggingFace Transformers, and PyVis, and is designed to run seamlessly in Google Colab.

Approach & Methodology

The core approach of KGG is distinctive: instead of relying on external APIs to access a large language model (LLM), the model is loaded and run entirely locally. This keeps user data private, removes the dependency on internet connectivity at inference time (once the model weights have been downloaded), and avoids API usage costs. However, running a state-of-the-art LLM such as Open-Orca/Mistral-7B-OpenOrca locally presents significant hardware challenges, as these models typically require more than 15 GB of GPU memory at 16-bit precision.

To overcome this, quantization is employed. Specifically, the model is loaded using BitsAndBytesConfig with 4-bit (nf4) quantization, drastically reducing the memory footprint and enabling efficient inference even on consumer-grade GPUs. This allows the entire pipeline, from prompt construction through entity extraction to graph visualization, to be executed locally, making the solution both powerful and accessible.
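A back-of-the-envelope calculation illustrates the savings (assuming roughly 7 billion parameters and counting weights only, ignoring activations and the KV cache):

params = 7_000_000_000
print(f"fp16 weights (16 bits/param): {params * 2 / 1e9:.1f} GB")    # ~14.0 GB
print(f"nf4 weights   (4 bits/param): {params * 0.5 / 1e9:.1f} GB")  # ~3.5 GB

The pipeline itself consists of five steps: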

  1. Model Loading and Quantization: The Open-Orca/Mistral-7B-OpenOrca model is loaded locally with HuggingFace Transformers, using BitsAndBytesConfig to apply 4-bit (nf4) quantization, which makes inference feasible on hardware with limited GPU memory.
  2. Prompt Engineering: A system prompt instructs the model to extract entities and relationships from the context and to output them as a JSON array of objects with the fields node1, node2, and relationship (an illustrative example follows this list).
  3. Text Processing: The user-provided text is formatted into a prompt and passed to the locally running model. The model generates a response, which is parsed to extract the JSON array of relationships.
  4. Knowledge Graph Construction: The extracted entities and relationships are used to build a graph using PyVis, where nodes represent entities and edges represent relationships.
  5. Visualization: The resulting graph is rendered as interactive HTML, allowing users to explore the knowledge graph visually within the notebook or exported as a standalone HTML file.
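For illustration, a sentence such as "Marie Curie discovered radium and was awarded the Nobel Prize" could plausibly yield relations of the following shape (the entities and relation labels here are hypothetical, not actual model output):

relations = [
    {"node1": "Marie Curie", "node2": "radium", "relationship": "discovered"},
    {"node1": "Marie Curie", "node2": "Nobel Prize", "relationship": "was awarded"},
]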

Features

  1. Fully local LLM inference with 4-bit (nf4) quantization; no external API keys or per-request costs.
  2. Automatic extraction of entities and relationships from arbitrary text into structured JSON.
  3. Interactive, directed knowledge graph visualization built with PyVis.
  4. Export of the generated graph as a standalone HTML file.
  5. A simple search-bar interface that runs inside a Google Colab notebook.

Applications

By turning unstructured text into a structured, explorable graph, KGG supports knowledge discovery, understanding, and further analysis of documents in research, education, and industry settings.

Algorithms & Implementation

Model Loading and Quantization


import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization keeps the 7B model's
# weight footprint within consumer-GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model_name = "Open-Orca/Mistral-7B-OpenOrca"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized weights on the available GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.config.use_cache = False
model.config.pretraining_tp = 1

Prompt Engineering and Extraction


import json

def getprompt(text):
    SYS_PROMPT = (
        "You are an AI assistant tasked with extracting structured information from the context to create a knowledge graph. "
        "Your goal is to identify key entities and their relationships in the context and present this information in a JSON format "
        "with fields: 'node1', 'node2', and 'relationship'."
    )
    USER_PROMPT = f"context: ```{text}``` \n\n output: "
    PROMPT = f"{SYS_PROMPT}\n\n{USER_PROMPT}"
    return PROMPT

def function(text):
    prompt = getprompt(text)
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_length=1024, num_return_sequences=1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Take the first bracketed span in the response as the JSON array of
    # relations (this assumes the array contains no nested brackets).
    json_response = response.split("[")[1].split("]")[0]
    json_response = "[" + json_response + "]"
    json_response = json.loads(json_response)
    return json_response

Knowledge Graph Construction and Visualization


from pyvis.network import Network

def generate_knowledge_graph(text):
    data = function(text)
    net = Network(notebook=True, directed=True, cdn_resources='remote')
    for relation in data:
        # One node per entity (PyVis deduplicates repeated ids) and one
        # labeled, directed edge per extracted relationship.
        net.add_node(relation['node1'], label=relation['node1'], title=relation['node1'])
        net.add_node(relation['node2'], label=relation['node2'], title=relation['node2'])
        net.add_edge(relation['node1'], relation['node2'], title=relation['relationship'], label=relation['relationship'])
    # Repulsion physics spreads the nodes out for readability.
    net.repulsion(node_distance=180, spring_length=100)
    return net.generate_html()
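
A minimal usage sketch (the input sentence is illustrative): the returned HTML string can be rendered inline in the notebook or written out as a standalone file, as noted above.

from IPython.display import HTML

html = generate_knowledge_graph("Marie Curie discovered radium.")
HTML(html)  # render the interactive graph inline

with open("knowledge_graph.html", "w") as f:
    f.write(html)  # or save as a standalone HTML file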

User Interface

The user interface is implemented using HTML and JavaScript within the Colab notebook. Users can input text into a search bar, and upon clicking the search button, the knowledge graph is generated and displayed interactively. The UI is styled for clarity and ease of use, with responsive design and dynamic feedback.
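
The notebook's exact UI code is not reproduced here; the following is a minimal sketch of how such a search bar can be wired to the Python backend in Colab via google.colab.output.register_callback. The callback name 'kgg.generate' and the element ids are illustrative assumptions, not identifiers from the project.

from google.colab import output
from IPython.display import HTML, display

def on_search(text):
    # Render the generated graph into the output area of the invoking cell.
    display(HTML(generate_knowledge_graph(text)))

output.register_callback('kgg.generate', on_search)

display(HTML("""
<input id="kgg-input" type="text" placeholder="Enter text to analyze...">
<button onclick="kggSearch()">Search</button>
<script>
  function kggSearch() {
    const text = document.getElementById('kgg-input').value;
    google.colab.kernel.invokeFunction('kgg.generate', [text], {});
  }
</script>
"""))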

Figure: Knowledge Graph Generator user interface.
Figure: Example output, a generated knowledge graph.

Conclusion

The Knowledge Graph Generator project demonstrates the power of combining large language models with interactive visualization tools to extract and represent structured knowledge from unstructured text. By automating the process of entity and relationship extraction and providing an intuitive interface, KGG makes knowledge discovery accessible and efficient for users in research, education, and industry.
