How to Intelligently Create Nodes from PDFs Using Generative AI in Drupal

Entering large volumes of legal documents by hand can be tedious in any Drupal-based system.
In this tutorial, we’ll see how to combine Drupal, PDF parsing, and Generative AI to automatically extract key details from a court order PDF and create structured content nodes, with no manual entry required.
As a use case, we take a court order PDF from the following publicly accessible website: https://malappuram.dcourts.gov.in/court-orders-search-by-order-date/
Using Gen AI, we will analyse the judgment document and extract the required details so that ordinary readers can easily understand the judgment.
- Overview and Use Case
- Installing Required Libraries
- Creating the Custom Import Form
- Extracting Text from PDF
- Analyzing Content with Generative AI
- Creating the Drupal Node
- Populating Taxonomy Terms
- Batch Process and Final Output
1. Overview and Use Case
We are going to create a custom Drupal form that allows users to upload a judgment order PDF. When the form is submitted, Generative AI will analyse the PDF and extract the required information in JSON format, and the node will be created programmatically based on that data.
We ask the LLM to analyse this document and extract the details below.
- Case number
- Court Name
- Court Location
- Judge Name
- Order date
- Case Type
- Judgment favour – in whose favour the judgment was given.
- Relief granted – the kind of relief granted to the plaintiff or defendant.
- Case summary – a short, simple description of the case.
- Defendants’ details.
- Plaintiffs’ details.
- Why the plaintiff/defendant won – a simple explanation of why the plaintiff or defendant won the case.
- Plaintiffs’ submitted documents – a summary of the documents submitted by the plaintiffs.
- Defendants’ submitted documents – a summary of the documents submitted by the defendants.
This is the PDF file.
The workflow looks like this:
- The user uploads a court order PDF.
- The PDF is parsed using Smalot PDF Parser.
- The extracted text is sent to ChatGPT (via the Drupal AI module) with a well-structured prompt.
- The response JSON is decoded and used to create a Drupal content node programmatically.
2. Installing Required Libraries
Install the Drupal AI module (https://www.drupal.org/project/ai) and enable AI Core in the module list.
Install the OpenAI provider module (https://www.drupal.org/project/ai_provider_openai).
Go to Configuration -> System -> Keys and add your OpenAI API key.
Go to Configuration -> AI -> Provider Settings, select the OpenAI authentication, and select the key.
Install the Smalot PDF parser (https://github.com/smalot/pdfparser) using the Composer command below.
composer require smalot/pdfparser
Our content type, where the extracted data is stored, is named munsif_legal and has the fields below.


Here, the entity tags field is a reference to the Tags vocabulary.
3. Creating the Custom Import Form
Suppose our custom module is dn_import1.
Create the routing file at modules/custom/dn_import1/dn_import1.routing.yml. Drupal only discovers the file if it is named after the module.
dn_import.import_form1:
  path: '/admin/content/case-import1'
  defaults:
    _form: '\Drupal\dn_import1\Form\LegalDocumentImportForm'
    _title: 'Import Case'
  requirements:
    _permission: 'administer nodes'
Inside src/Form, create a custom form class named LegalDocumentImportForm.
This code resides in modules/custom/dn_import1/src/Form/LegalDocumentImportForm.php.
Include the classes below.
use Drupal\Core\Form\FormBase;
use Drupal\Core\Form\FormStateInterface;
use Smalot\PdfParser\Parser;
use Drupal\ai\OperationType\Chat\ChatInput;
use Drupal\ai\OperationType\Chat\ChatMessage;
Create form fields
Inside the buildForm function, create the required fields as below. The form has a PDF upload field with an upload location.
public function buildForm(array $form, FormStateInterface $form_state) {
  $form['info'] = [
    '#markup' => $this->t('<p>Upload a legal court order PDF. The AI will extract case details, parties, court information, and judgment.</p>'),
  ];
  $form['pdf_file'] = [
    '#type' => 'managed_file',
    '#title' => $this->t('Upload Legal Document PDF'),
    '#description' => $this->t('Select a PDF file containing a court order or legal judgment.'),
    '#upload_location' => 'public://legal_documents/',
    '#upload_validators' => [
      // On Drupal 10.2 and later this validator is deprecated in favour of
      // 'FileExtension' => ['extensions' => 'pdf'].
      'file_validate_extensions' => ['pdf'],
    ],
    '#required' => TRUE,
  ];
  $form['submit'] = [
    '#type' => 'submit',
    '#value' => $this->t('Process Legal Document'),
  ];
  return $form;
}
Submit handler – save uploaded file
Inside the submit handler, first save the uploaded file permanently.
$fids = $form_state->getValue('pdf_file');
if (!empty($fids[0])) {
  $file = \Drupal\file\Entity\File::load($fids[0]);
  if ($file) {
    // Managed files are temporary by default; mark this one as permanent.
    $file->setPermanent();
    $file->save();
    // Resolve the stream wrapper URI to a local path for the PDF parser.
    $filepath = $file->getFileUri();
    $realpath = \Drupal::service('file_system')->realpath($filepath);
  }
}
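The extension validator on the form only checks the file name. As an optional extra guard before parsing, the sketch below checks that the file content starts with the PDF signature. The function name looks_like_pdf is a hypothetical helper, not part of Drupal or the Smalot parser, and the check is deliberately strict: it assumes the signature sits at the very first byte.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: TRUE if the file begins with the "%PDF-" signature.
 */
function looks_like_pdf(string $path): bool {
  if (!is_file($path) || !is_readable($path)) {
    return FALSE;
  }
  $handle = fopen($path, 'rb');
  if ($handle === FALSE) {
    return FALSE;
  }
  // PDF files start with the ASCII bytes "%PDF-" followed by a version number.
  $header = fread($handle, 5);
  fclose($handle);
  return $header === '%PDF-';
}
```

In the submit handler this could be called with $realpath, showing an error message and returning early when it yields FALSE.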
4. Extracting Text from PDF
Use the code below to extract the text from the PDF file.
$parser = new Parser();
$pdf = $parser->parseFile($realpath);
$text = $pdf->getText();
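Text extracted from a PDF often carries irregular whitespace, and long orders can exceed what a model accepts in one request. The sketch below tidies the text and caps its length before it is sent to the LLM. The function name prepare_text_for_prompt and the 12000-character default are assumptions chosen for illustration; pick a limit that suits the model you configured.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: normalizes whitespace and caps the text length.
 */
function prepare_text_for_prompt(string $text, int $maxChars = 12000): string {
  // Normalize Windows and old Mac line endings to "\n".
  $text = str_replace(["\r\n", "\r"], "\n", $text);
  // Collapse runs of spaces and tabs into a single space.
  $text = preg_replace('/[ \t]+/', ' ', $text) ?? $text;
  // Drop the single space left before or after a line break.
  $text = preg_replace('/ ?\n ?/', "\n", $text) ?? $text;
  // Collapse three or more line breaks into one blank line.
  $text = preg_replace('/\n{3,}/', "\n\n", $text) ?? $text;
  $text = trim($text);
  // Cut by characters when mbstring is available, otherwise by bytes.
  if (function_exists('mb_substr')) {
    return mb_substr($text, 0, $maxChars, 'UTF-8');
  }
  return substr($text, 0, $maxChars);
}
```

Truncating from the end is the simplest choice; since the operative part of an order is often near the end, keeping both the first and last portions may work better for long documents.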
Batch operation: Extract legal case data using AI
We use Drupal’s Batch API because AI processing and PDF parsing can take several seconds.
The batch ensures the process runs safely without PHP timeout issues.
Set up the batch to process the document.
$batch = [
  'title' => $this->t('Processing Legal Document with AI...'),
  'operations' => [
    // Each operation is a [callback, arguments] pair.
    [[self::class, 'extractLegalDataWithAI'], [$text, $file]],
  ],
  'finished' => [self::class, 'batchFinished'],
];
batch_set($batch);
5. Analyzing Content with Generative AI
Here, extractLegalDataWithAI is the batch callback function in which we extract the data with the help of AI.
We pass the extracted text and the uploaded PDF file entity to this function.
/**
 * Extracts structured legal data from PDF using AI provider.
 */
public static function extractLegalDataWithAI($text, $file, array &$context) {
}
Let us go through the implementation of this function in detail. First, get the AI provider. Here we use ChatGPT (OpenAI) as the provider, which we configured earlier.
/**
 * Extracts structured legal data from PDF using AI provider.
 */
public static function extractLegalDataWithAI($text, $file, array &$context) {
  // Get the default provider and model configured for chat operations.
  $service = \Drupal::service('ai.provider');
  $sets = $service->getDefaultProviderForOperationType('chat');
  $provider = $service->createInstance($sets['provider_id']);
}
Create comprehensive AI prompt for legal document extraction
$systemPrompt = 'You are an expert legal document analyzer. Extract structured information from the provided court order/judgment and return it as valid JSON.
Extract the following information:
1. **case_number**: The official case number (e.g., "Original Suit No. 226/2023")
2. **case_title**: Create a descriptive title in format: "Case between [Plaintiffs] vs [Defendants] regarding [matter]"
3. **case_summary**: A simple 2-3 sentence summary that anyone can understand, including who is involved and what the case is about
4. **court_name**: Full name of the court
5. **court_location**: Location/jurisdiction of the court
6. **judge_name**: Name of the presiding judge/magistrate
7. **order_date**: Date of the order (format: YYYY-MM-DD)
8. **case_type**: Type of case (e.g., "Civil Suit", "Criminal Case", "Injunction")
9. **plaintiffs**: Array of plaintiff details, each with:
- name: Full name
- age: Age if mentioned
- address: Full address
- relation: Relationship description (e.g., "W/o", "S/o")
10. **defendants**: Array of defendant details with same structure as plaintiffs
11. **entity_tags**: Array of entity types involved (check if any party is: "Government", "Devaswom Board", "Waqf Board", "Corporation", "Municipality", "Panchayat", "Temple", "Mosque", "Church", "Private Individual")
12. **judgment_summary**: Simple explanation of the final judgment in 1-2 sentences
13. **judgment_favor**: Who won - either "Plaintiff" or "Defendant" or "Partial" or "Dismissed"
14. **relief_granted**: What relief/remedy was granted by the court
15. **why_win**: Why the plaintiffs/defendants won. A simple explanation of the documents submitted and the reason for the win, written so that any layperson can understand it.
16. **plaintiffs_docs**: Documents the plaintiffs submitted: first a list in simple sentences, then a detailed list
17. **defendants_docs**: Documents the defendants submitted: first a list in simple sentences, then a detailed list
Return ONLY valid JSON in this exact format:
{
"case_number": "Original Suit No. 226/2023",
"case_title": "Case between Raji and Shylaja vs Mulakkal Thanka and Ramesh regarding right of way",
"case_summary": "Two plaintiffs filed suit against defendants seeking permanent injunction to prevent obstruction of their right of way to access their property.",
"court_name": "Court of the Munsiff-Magistrate",
"court_location": "Ponnani",
"judge_name": "Smt. Sowmya.T.M.",
"order_date": "2024-07-22",
"case_type": "Civil Suit - Injunction",
"plaintiffs": [
{
"name": "Raji",
"age": "46",
"address": "W/o.Unniyath Valappil Mohandas, P.O.Mookkuthala, Vadakkumuri desom, Pallikkara amsom, Ponnani Taluk - 679 574",
"relation": "Wife of Unniyath Valappil Mohandas"
}
],
"defendants": [
{
"name": "Mulakkal Thanka",
"age": "77",
"address": "W/o.Mulakkal Kesavan, P.O.Mookkuthala, Vadakkumuri desom, Pallikkara amsom, Ponnani Taluk - 679 574",
"relation": "Wife of Mulakkal Kesavan"
}
],
"entity_tags": ["Private Individual"],
"judgment_summary": "The court granted permanent injunction restraining defendants from obstructing plaintiffs use of the right of way.",
"judgment_favor": "Plaintiff",
"relief_granted": "Permanent prohibitory injunction",
"why_win": "The plaintiffs proved that the disputed property (Plaint A schedule) legally belongs to them by producing title deeds and tax receipts.They also showed that they have a pathway (C schedule property) for access to their land.",
"plaintiffs_docs": "Kanam Assignment deed No.4643/2013 (title deed), Tax receipt (proof of ownership/use).",
"defendants_docs": "Kanam Assignment deed No.3160/2011 (previous deed)."
}
Do not include any explanations, markdown formatting, or text outside the JSON object.';
As you can see above, the prompt includes sample case details and a sample JSON response so that the LLM returns data in exactly this format.
We send the prompt and the extracted text to the provider and read the reply into the $response variable as below.
$messages = new ChatInput([
new ChatMessage('system', $systemPrompt),
new ChatMessage('user', "Extract legal information from this court order:\n\n" . $text),
]);
$message = $provider->chat($messages, $sets['model_id'])->getNormalized();
$response = $message->getText();
Extract the data from the JSON response.
// Clean response (remove markdown code blocks if present)
$response = preg_replace('/```json\s*|\s*```/', '', $response);
$response = trim($response);
// Parse JSON response
$case_data = json_decode($response, true);
if (json_last_error() !== JSON_ERROR_NONE) {
  // Record the error with a short excerpt of the reply, then stop this operation.
  $context['results']['error'] = 'AI returned invalid JSON: ' . json_last_error_msg() . "\nResponse: " . substr($response, 0, 500);
  return;
}
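Models sometimes wrap the JSON in a sentence or a code fence despite the instruction not to. As an alternative to stripping fences with a regular expression, the sketch below keeps only the text between the first opening brace and the last closing brace, then decodes it with exceptions enabled. The function name decode_llm_json is a hypothetical helper for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pulls the JSON object out of an LLM reply and decodes it.
 *
 * @throws \InvalidArgumentException If the reply contains no braces at all.
 * @throws \JsonException If the extracted text is not valid JSON.
 */
function decode_llm_json(string $response): array {
  $start = strpos($response, '{');
  $end = strrpos($response, '}');
  if ($start === FALSE || $end === FALSE || $end < $start) {
    throw new \InvalidArgumentException('No JSON object found in the response.');
  }
  // Keep only the outermost {...} span, dropping any surrounding prose or fences.
  $json = substr($response, $start, $end - $start + 1);
  // JSON_THROW_ON_ERROR makes json_decode() raise \JsonException on bad input.
  return json_decode($json, TRUE, 512, JSON_THROW_ON_ERROR);
}
```

Inside the batch callback, the call would sit in a try/catch block that stores the exception message in $context['results']['error'], matching the error handling used above.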
Store extracted data
// Store extracted data
$context['results']['case_data'] = $case_data;
$context['message'] = t('Extracted case data: @case', ['@case' => $case_data['case_number'] ?? 'Unknown']);
6. Creating the Drupal Node
At the end of the extractLegalDataWithAI function, we set up a second batch to create the node.
$batch = [
  'title' => t('Creating Legal Case Entity...'),
  'operations' => [
    [[self::class, 'createLegalCaseNode'], [$case_data, $file]],
  ],
  'finished' => [self::class, 'batchFinished'],
];
batch_set($batch);
Batch operation: Create legal case node from extracted data
In the function below, we create the node programmatically and assign values to the corresponding content type fields.
public static function createLegalCaseNode($case_data,$file, array &$context) {
try {
// Validate required fields
if (empty($case_data['case_number'])) {
$context['results']['error'] = 'Missing case number in extracted data';
return;
}
// Check for duplicate case
$existing = \Drupal::entityTypeManager()->getStorage('node')
->loadByProperties([
'type' => 'munsif_legal',
'field_case_number' => $case_data['case_number'],
]);
if (!empty($existing)) {
$context['results']['error'] = 'Case already exists: ' . $case_data['case_number'];
return;
}
// Process entity tags and create/get taxonomy terms
$tag_tids = [];
if (!empty($case_data['entity_tags'])) {
$tag_tids = self::processEntityTags($case_data['entity_tags']);
}
// Prepare plaintiffs text
$plaintiffs_text = '';
if (!empty($case_data['plaintiffs'])) {
foreach ($case_data['plaintiffs'] as $plaintiff) {
$plaintiffs_text .= $plaintiff['name'] . ' (' . ($plaintiff['age'] ?? 'Age not specified') . ' years)' . "\n";
$plaintiffs_text .= $plaintiff['address'] . "\n\n";
}
}
// Prepare defendants text
$defendants_text = '';
if (!empty($case_data['defendants'])) {
foreach ($case_data['defendants'] as $defendant) {
$defendants_text .= $defendant['name'] . ' (' . ($defendant['age'] ?? 'Age not specified') . ' years)' . "\n";
$defendants_text .= $defendant['address'] . "\n\n";
}
}
// Create node - adjust field names according to your content type
$node = \Drupal\node\Entity\Node::create([
'type' => 'munsif_legal',
'title' => $case_data['case_title'] ?? $case_data['case_number'],
'field_case_number' => $case_data['case_number'] ?? '',
'field_case_summary' => [
'value' => $case_data['case_summary'] ?? '',
'format' => 'basic_html',
],
'field_why_win' => [
'value' => $case_data['why_win'] ?? '',
'format' => 'basic_html',
],
'field_plaintiffs_docs' => [
'value' => $case_data['plaintiffs_docs'] ?? '',
'format' => 'basic_html',
],
'field_defendants_docs' => [
'value' => $case_data['defendants_docs'] ?? '',
'format' => 'basic_html',
],
'field_court_name' => $case_data['court_name'] ?? '',
'field_court_location' => $case_data['court_location'] ?? '',
'field_judge_name' => $case_data['judge_name'] ?? '',
'field_order_date' => $case_data['order_date'] ?? '',
'field_case_type' => $case_data['case_type'] ?? '',
'field_plaintiffs' => [
'value' => $plaintiffs_text,
'format' => 'basic_html',
],
'field_defendants' => [
'value' => $defendants_text,
'format' => 'basic_html',
],
'field_entity_tags' => $tag_tids,
'field_judgment_summary' => [
'value' => $case_data['judgment_summary'] ?? '',
'format' => 'basic_html',
],
'field_judgment_favor' => $case_data['judgment_favor'] ?? '',
'field_relief_granted' => [
'value' => $case_data['relief_granted'] ?? '',
'format' => 'basic_html',
],
]);
$node->save();
//$node = \Drupal\node\Entity\Node::load($nid);
if ($node) {
$node->set('field_pdf', [
'target_id' => $file->id(),
]);
$node->save();
}
$context['results']['success'] = TRUE;
$context['results']['node_id'] = $node->id();
$context['results']['case_number'] = $case_data['case_number'];
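// Fall back to the case number when the AI did not return a title.
$context['results']['case_title'] = $case_data['case_title'] ?? $case_data['case_number'];
$context['message'] = t('Created case: @title', ['@title' => $case_data['case_title'] ?? $case_data['case_number']]);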
} catch (\Exception $e) {
$context['results']['error'] = 'Error creating node: ' . $e->getMessage();
}
}
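The function above only checks that case_number is present, and passes values such as order_date straight into the node. A sketch of a stricter check that could run before Node::create() is shown below. The function name validate_case_data is hypothetical; the date format and the allowed judgment_favor values are taken from the prompt.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns a list of problems found in the extracted data.
 *
 * An empty array means the data passed every check.
 */
function validate_case_data(array $data): array {
  $problems = [];
  if (!isset($data['case_number']) || !is_string($data['case_number']) || trim($data['case_number']) === '') {
    $problems[] = 'Missing or empty field: case_number';
  }
  if (!empty($data['order_date']) && is_string($data['order_date'])) {
    $date = \DateTimeImmutable::createFromFormat('!Y-m-d', $data['order_date']);
    // Round-trip the value to reject overflow dates such as 2024-02-31.
    if ($date === FALSE || $date->format('Y-m-d') !== $data['order_date']) {
      $problems[] = 'order_date is not a valid YYYY-MM-DD date';
    }
  }
  // The prompt asks for exactly one of these four values.
  $allowed = ['Plaintiff', 'Defendant', 'Partial', 'Dismissed'];
  if (isset($data['judgment_favor']) && !in_array($data['judgment_favor'], $allowed, TRUE)) {
    $problems[] = 'judgment_favor has an unexpected value';
  }
  return $problems;
}
```

If the returned array is not empty, the batch operation could join the messages into $context['results']['error'] and return without creating the node.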
7. Populating Taxonomy Terms
To populate the tags field, we look up each tag in the existing taxonomy terms and create a new term only when it is not already present.
/**
* Process entity tags and return taxonomy term IDs
*/
private static function processEntityTags($tags) {
$tids = [];
$vocab = 'tags'; // Machine name of the Tags vocabulary.
foreach ($tags as $tag_name) {
// Search for existing term
$terms = \Drupal::entityTypeManager()
->getStorage('taxonomy_term')
->loadByProperties([
'vid' => $vocab,
'name' => $tag_name,
]);
if (!empty($terms)) {
// Use existing term
$term = reset($terms);
$tids[] = ['target_id' => $term->id()];
} else {
// Create new term
$term = \Drupal\taxonomy\Entity\Term::create([
'vid' => $vocab,
'name' => $tag_name,
]);
$term->save();
$tids[] = ['target_id' => $term->id()];
}
}
return $tids;
}
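Because the tag names come from the LLM, they may vary in case or spacing, which would create near-duplicate terms. The sketch below is one optional way to clean the list before it reaches processEntityTags. The function name normalize_entity_tags is hypothetical, and dropping tags outside the allowed list is a design choice that differs from the code above, which accepts any tag.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: maps AI-returned tags onto a fixed list of labels.
 *
 * Matching ignores case and extra whitespace. Unknown tags and duplicates
 * are dropped.
 */
function normalize_entity_tags(array $tags, array $allowed): array {
  // Index the allowed labels by their lowercase form.
  $canonical = [];
  foreach ($allowed as $label) {
    $canonical[strtolower($label)] = $label;
  }
  $result = [];
  foreach ($tags as $tag) {
    if (!is_string($tag)) {
      continue;
    }
    // Collapse inner whitespace, trim the ends, and compare in lowercase.
    $key = strtolower(trim(preg_replace('/\s+/', ' ', $tag) ?? $tag));
    if (isset($canonical[$key]) && !in_array($canonical[$key], $result, TRUE)) {
      $result[] = $canonical[$key];
    }
  }
  return $result;
}
```

The allowed list would be the same entity types named in the prompt, for example "Government", "Temple", and "Private Individual".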
Both batch operations report their final status through the batchFinished function below.
/**
* Batch finished callback
*/
public static function batchFinished($success, $results, $operations) {
if (isset($results['error'])) {
\Drupal::messenger()->addError(t('Error: @error', ['@error' => $results['error']]));
return;
}
if (isset($results['success']) && $results['success']) {
$message = t('Successfully created legal case: <strong>@title</strong> (Case No: @case_no)', [
'@title' => $results['case_title'],
'@case_no' => $results['case_number'],
]);
\Drupal::messenger()->addStatus($message);
// Provide link to view the created node
$node_url = \Drupal\Core\Url::fromRoute('entity.node.canonical', ['node' => $results['node_id']]);
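// Use the ":" placeholder for the href attribute and "@" for the visible text.
\Drupal::messenger()->addStatus(t('View case: <a href=":url">@url</a>', [
':url' => $node_url->toString(),
'@url' => $node_url->toString(),
]));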
} else {
\Drupal::messenger()->addWarning(t('No legal case was created.'));
}
}
Clear the cache.
8. Batch Process and Final Output
This is the admin form we just created.

After submission it will start the batch process.

After all the batch operations complete, the success message appears as below.

You can see that the node has been created with all the details.




In this tutorial, we built a fully automated legal document importer using Drupal’s modular architecture and Generative AI.
This same concept can be extended to process invoices, research papers, or HR documents by adjusting the AI prompt and content type fields.
Download Sample Module here