Introduction
The Knowledge Graph Generator (KGG) is an AI-powered application that transforms unstructured text into a structured knowledge graph. By identifying key entities and their relationships, KGG provides a visual and interactive representation of information, aiding in understanding, exploration, and further analysis. The project is built using Python, HuggingFace Transformers, and PyVis, and is designed to run seamlessly in Google Colab.
Approach & Methodology
KGG's defining design choice is that, instead of calling large language models (LLMs) through external APIs, it loads and runs the LLM locally, inside the user's own runtime. This keeps the input text private, removes the dependency on a third-party inference service, and avoids API usage costs. However, running a model such as Open-Orca/Mistral-7B-OpenOrca locally presents significant hardware challenges, as a 7-billion-parameter model loaded at 16-bit precision typically requires around 15GB of GPU memory or more.
To overcome this, quantization techniques are employed. Specifically, the model is loaded using BitsAndBytesConfig with 4-bit quantization (nf4), drastically reducing the memory footprint and enabling inference even on consumer-grade GPUs. This allows the entire pipeline, from prompt engineering to entity extraction and graph visualization, to be executed locally, making the solution both powerful and accessible.
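As a rough back-of-the-envelope check on the memory figures, the sketch below estimates the storage needed for the weights alone at 16-bit and 4-bit precision. The parameter count of 7 billion is a round-number assumption, and the estimate ignores activations, the key/value cache, and quantization metadata, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Estimate memory needed to store model weights, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: a "7B" model has roughly 7 billion parameters.
NUM_PARAMS = 7e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```

Under these assumptions, 4-bit storage cuts the weight footprint to about a quarter of the 16-bit figure, which is what makes a single consumer-grade GPU plausible.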
- Model Loading and Quantization: The Open-Orca/Mistral-7B-OpenOrca model is loaded locally using HuggingFace Transformers with 4-bit quantization (nf4) for efficient inference. This is achieved using the BitsAndBytesConfig for memory and speed optimization, making it feasible to run the model on hardware with limited GPU resources.
- Prompt Engineering: A system prompt instructs the model to extract entities and relationships from the context and output them in a JSON format with fields: node1, node2, and relationship.
- Text Processing: The user-provided text is formatted into a prompt and passed to the locally running model. The model generates a response, which is parsed to extract the JSON array of relationships.
- Knowledge Graph Construction: The extracted entities and relationships are used to build a graph using PyVis, where nodes represent entities and edges represent relationships.
- Visualization: The resulting graph is rendered as interactive HTML, allowing users to explore the knowledge graph within the notebook or export it as a standalone HTML file.
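To make the data flow in the steps above concrete, here is a minimal sketch of the JSON shape the prompt asks for and how it maps onto nodes and edges. The sample records and the helper name `triples_to_graph` are illustrative assumptions, not output from the actual model or code from the project.

```python
import json

# Illustrative model output (made up for this example).
sample_output = """
[
  {"node1": "Marie Curie", "node2": "radium", "relationship": "discovered"},
  {"node1": "Marie Curie", "node2": "Nobel Prize", "relationship": "won"}
]
"""

def triples_to_graph(triples):
    """Collect unique nodes (in first-seen order) and labelled, directed edges."""
    nodes = []
    edges = []
    for t in triples:
        for name in (t["node1"], t["node2"]):
            if name not in nodes:
                nodes.append(name)
        edges.append((t["node1"], t["node2"], t["relationship"]))
    return nodes, edges

nodes, edges = triples_to_graph(json.loads(sample_output))
print(nodes)
print(edges)
```

Each record contributes one directed edge, while an entity that appears in several records is stored as a single node.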
Features
- Automated Entity and Relationship Extraction: Utilizes a large language model to identify and extract entities and their relationships from arbitrary text.
- Interactive Visualization: Generates an interactive knowledge graph using PyVis, allowing users to explore nodes and edges dynamically.
- Efficient Model Inference: Employs 4-bit quantization for the language model, reducing memory usage enough to make inference practical on GPUs with limited memory.
- Google Colab Integration: Designed for easy use in Google Colab, with a user-friendly interface for input and visualization.
- Customizable Prompts: The prompt engineering approach allows for flexible extraction of different types of relationships as needed.
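One way the prompt could be customized is by passing an optional list of relationship types that restricts what the model extracts. The sketch below is only an illustration: the function name `build_prompt` and the `relation_types` parameter are hypothetical, and the base wording paraphrases the system prompt used in this project.

```python
def build_prompt(text, relation_types=None):
    """Build an extraction prompt, optionally limited to given relationship types."""
    system = (
        "You are an AI assistant tasked with extracting structured information "
        "from the context to create a knowledge graph. Identify key entities and "
        "their relationships and present them in a JSON format with fields: "
        "'node1', 'node2', and 'relationship'."
    )
    if relation_types:
        # Hypothetical extension: constrain the relationship vocabulary.
        allowed = ", ".join(relation_types)
        system += f" Only use these relationship types: {allowed}."
    fence = "`" * 3  # delimiter placed around the context text
    return f"{system}\n\ncontext: {fence}{text}{fence}\n\noutput: "

prompt = build_prompt(
    "Ada Lovelace worked with Charles Babbage.",
    relation_types=["worked_with", "invented"],
)
print(prompt)
```

Calling `build_prompt(text)` without `relation_types` leaves the extraction unconstrained.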
Applications
- Information Retrieval and Organization
  - Data Management: Organize large volumes of textual data into structured, interconnected entities. Useful for creating databases or enhancing existing ones.
  - Content Summarization: Summarize key information from long documents or articles by extracting main entities and their relationships.
- Education
  - Teaching Aid: Assist educators in creating interactive teaching materials by visually representing complex subjects and their interrelations.
  - Student Projects: Provide a tool for students to visualize and present their research or project findings.
- Knowledge Discovery
  - Research: Aid researchers in identifying relationships between different concepts, facilitating new insights and hypothesis generation.
  - Literature Reviews: Summarize findings from numerous studies by mapping out key terms and their connections.
Algorithms & Implementation
Model Loading and Quantization
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with float16 compute and double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model_name = "Open-Orca/Mistral-7B-OpenOrca"
# Load the quantized model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.config.use_cache = False
model.config.pretraining_tp = 1
Prompt Engineering and Extraction
def getprompt(text):
    # System instructions describing the extraction task and the output fields.
    SYS_PROMPT = (
        "You are an AI assistant tasked with extracting structured information from the context to create a knowledge graph. "
        "Your goal is to identify key entities and their relationships in the context and present this information in a JSON format "
        "with fields: 'node1', 'node2', and 'relationship'."
    )
    # Wrap the user's text in a delimited context block.
    USER_PROMPT = f"context: ```{text}``` \n\n output: "
    PROMPT = f"{SYS_PROMPT}\n\n{USER_PROMPT}"
    return PROMPT
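import json

def function(text):
    prompt = getprompt(text)
    # Move the token IDs to the same device as the model.
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    # Note: max_length counts the prompt tokens as well as the generated ones.
    outputs = model.generate(inputs, max_length=1024, num_return_sequences=1)
    # Decode only the newly generated tokens, so brackets in the prompt are ignored.
    response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    # Keep the text between the first "[" and the next "]" and parse it as a JSON array.
    json_response = response.split("[")[1].split("]")[0]
    json_response = "[\n" + json_response + "]"
    json_response = json.loads(json_response)
    return json_response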
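Model output is not guaranteed to be clean JSON, so the split-based parsing above can fail when the response contains extra text or malformed records. The following is a sketch of a more defensive parser using only the standard library; the helper name `extract_triples` and its policy of silently skipping invalid records are assumptions, not part of the original project.

```python
import json
import re

REQUIRED_FIELDS = ("node1", "node2", "relationship")

def extract_triples(response: str) -> list:
    """Return well-formed relationship records found in a model response.

    Takes the span from the first '[' to the last ']', parses it as JSON,
    and keeps only dict items containing all required fields.
    Returns an empty list if nothing usable is found.
    """
    match = re.search(r"\[.*\]", response, flags=re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return [
        item for item in data
        if isinstance(item, dict) and all(field in item for field in REQUIRED_FIELDS)
    ]

# Made-up response with surrounding chatter and one incomplete record.
noisy = (
    "Sure! Here is the graph:\n"
    '[{"node1": "A", "node2": "B", "relationship": "uses"}, {"node1": "C"}]\n'
    "Done."
)
print(extract_triples(noisy))
```

Returning an empty list instead of raising lets the caller decide whether to retry generation or show an empty graph.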
Knowledge Graph Construction and Visualization
from pyvis.network import Network

def generate_knowledge_graph(text):
    # Extract the relationship records from the text with the local model.
    data = function(text)
    net = Network(notebook=True, directed=True, cdn_resources='remote')
    for relation in data:
        # Add both entities as nodes and connect them with a labelled, directed edge.
        net.add_node(relation['node1'], label=relation['node1'], title=relation['node1'])
        net.add_node(relation['node2'], label=relation['node2'], title=relation['node2'])
        net.add_edge(relation['node1'], relation['node2'], title=relation['relationship'], label=relation['relationship'])
    # Spread the nodes out with the repulsion layout, then return the page as an HTML string.
    net.repulsion(node_distance=180, spring_length=100)
    return net.generate_html()
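Since generate_knowledge_graph returns the graph as an HTML string, exporting it as a standalone file only needs a file write. A minimal sketch, where the helper name `save_graph_html` and the file names are assumptions:

```python
from pathlib import Path

def save_graph_html(html: str, path: str = "knowledge_graph.html") -> Path:
    """Write an HTML string to disk and return the resulting path."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out

# Stand-in HTML so the example runs without the model; in the notebook this
# would be the string returned by generate_knowledge_graph(text).
demo_html = "<html><body><p>graph placeholder</p></body></html>"
saved = save_graph_html(demo_html, "demo_graph.html")
print(saved.name, saved.stat().st_size, "bytes")
```

In a notebook, the same string should also be displayable inline by wrapping it in `IPython.display.HTML`.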
User Interface
The user interface is implemented using HTML and JavaScript within the Colab notebook. Users can input text into a search bar, and upon clicking the search button, the knowledge graph is generated and displayed interactively. The UI is styled for clarity and ease of use, with responsive design and dynamic feedback.


Conclusion
The Knowledge Graph Generator project demonstrates the power of combining large language models with interactive visualization tools to extract and represent structured knowledge from unstructured text. By automating the process of entity and relationship extraction and providing an intuitive interface, KGG makes knowledge discovery accessible and efficient for users in research, education, and industry.
Bibliography
- Open-Orca/Mistral-7B-OpenOrca: HuggingFace Model Card
- PyVis Documentation: https://pyvis.readthedocs.io/en/latest/
- HuggingFace Transformers: https://huggingface.co/docs/transformers/index
- Google Colab: https://colab.research.google.com/