How to Intelligently Create Nodes from PDFs Using Generative AI in Drupal

by admin · Published November 15, 2025 · Updated November 15, 2025

Processing large volumes of legal documents by hand can be tedious in any Drupal-based system.
In this tutorial, we’ll see how to combine Drupal, PDF parsing, and Generative AI to automatically extract key details from a court order PDF and create structured content nodes, with no manual entry required.

As a use case, we take a court order PDF from the website below, which is publicly accessible: https://malappuram.dcourts.gov.in/court-orders-search-by-order-date/

Using Gen AI, we will analyse the judgment document and extract the required details so that ordinary readers can easily understand the judgment.

Overview and Use Case
Installing Required Libraries
Creating the Custom Import Form
Extracting Text from PDF
Analyzing Content with Generative AI
Creating the Drupal Node
Populating Taxonomy Terms
Batch Process and Final Output

1. Overview and Use Case

We will create a custom Drupal form that lets users upload a judgment order PDF. When the form is submitted, Generative AI analyses the text extracted from the PDF and returns the required information in JSON format, and a node is created programmatically from that data.

We ask the LLM to analyse the document and extract the details below.

Case number
Court Name
Court Location
Judge Name
Order date
Case Type
Judgment favour – in whose favour the judgment was given.
Relief granted – what kind of relief was granted to the plaintiff/defendant.
Case summary – a short, simple description of the case.
Defendants’ details.
Plaintiffs’ details.
Why the plaintiff/defendant won – a simple description of why the plaintiff/defendant won the case.
Plaintiffs’ submitted docs – summary of the documents submitted by the plaintiffs.
Defendants’ submitted docs – summary of the documents submitted by the defendants.

This is the PDF file.

KLML230000912024_1_2024-09-09 Download

The workflow looks like this:

User uploads a court order PDF.
PDF is parsed using Smalot PDF Parser.
The extracted text is sent to ChatGPT (via Drupal AI module) with a well-structured prompt.
The response JSON is decoded and used to create a Drupal content node programmatically.

2. Installing Required Libraries

Install the Drupal AI module (https://www.drupal.org/project/ai) and enable AI Core in the module list.

Install the OpenAI provider module: https://www.drupal.org/project/ai_provider_openai

Go to Configuration -> System -> Keys and add your OpenAI API key.

Go to Configuration -> AI -> Provider Settings, select OpenAI authentication, and select the key.

Install the Smalot PDF parser (https://github.com/smalot/pdfparser) using the Composer command below.

composer require smalot/pdfparser

The content type that receives the data is named munsif_legal and has the fields below.

Here, entity tags is a reference to the Tags vocabulary.

3. Creating the Custom Import Form

Suppose our custom module is dn_import1.

Create the routing file at modules/custom/dn_import1/dn_import1.routing.yml. Drupal only discovers a routing file that is named after the module’s machine name.

dn_import.import_form1:
 path: '/admin/content/case-import1'
 defaults:
   _form: '\Drupal\dn_import1\Form\LegalDocumentImportForm'
   _title: 'Import Case'
 requirements:
   _permission: 'administer nodes'

Inside src/Form, create a custom form class named LegalDocumentImportForm.

This code resides under modules/custom/dn_import1/src/Form/LegalDocumentImportForm.php.

Import the classes below.

use Drupal\Core\Form\FormBase;
use Drupal\Core\Form\FormStateInterface;
use Smalot\PdfParser\Parser;
use Drupal\ai\OperationType\Chat\ChatInput;
use Drupal\ai\OperationType\Chat\ChatMessage;

Create form fields

Inside the buildForm function, create the required fields as below. We create a PDF upload field with an upload location.

public function buildForm(array $form, FormStateInterface $form_state) {
  $form['info'] = [
    '#markup' => $this->t('<p>Upload a legal court order PDF. The AI will extract case details, parties, court information, and judgment.</p>'),
  ];

  $form['pdf_file'] = [
    '#type' => 'managed_file',
    '#title' => $this->t('Upload Legal Document PDF'),
    '#description' => $this->t('Select a PDF file containing a court order or legal judgment.'),
    '#upload_location' => 'public://legal_documents/',
    // Restrict uploads to PDF. On Drupal 10.2 and later this legacy key is
    // deprecated in favour of 'FileExtension' => ['extensions' => 'pdf'].
    '#upload_validators' => [
      'file_validate_extensions' => ['pdf'],
    ],
    '#required' => TRUE,
  ];

  $form['submit'] = [
    '#type' => 'submit',
    '#value' => $this->t('Process Legal Document'),
  ];

  return $form;
}

Submit handler – save uploaded file

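Inside the submit handler, first mark the uploaded file as permanent and resolve its path on disk.

$fids = $form_state->getValue('pdf_file');
if (!empty($fids[0])) {
  $file = \Drupal\file\Entity\File::load($fids[0]);
  if ($file) {
    // Mark the file permanent so it is not removed as a temporary upload.
    $file->setPermanent();
    $file->save();
    // Resolve the stream wrapper URI (public://...) to a filesystem path.
    $filepath = $file->getFileUri();
    $realpath = \Drupal::service('file_system')->realpath($filepath);
  }
}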

4. Extracting Text from PDF

Use the code below to extract the text from the PDF file.

$parser = new Parser();
$pdf = $parser->parseFile($realpath);
$text = $pdf->getText();
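Text extracted from PDFs often contains irregular spacing and runs of blank lines. As an optional step before sending the text to the LLM, a small cleanup pass could look like the sketch below. The function name normalize_pdf_text is hypothetical; it is plain PHP and not part of smalot/pdfparser or Drupal.

```php
<?php

/**
 * Hypothetical helper (not part of smalot/pdfparser or Drupal):
 * tidies raw text extracted from a PDF before it is sent to the LLM.
 */
function normalize_pdf_text(string $text): string {
  // Use one newline style throughout.
  $text = str_replace(["\r\n", "\r"], "\n", $text);
  // Collapse runs of spaces and tabs into a single space.
  $text = preg_replace('/[ \t]+/', ' ', $text) ?? $text;
  // Remove a space that sits directly before or after a newline.
  $text = preg_replace('/ ?\n ?/', "\n", $text) ?? $text;
  // Keep at most one blank line between paragraphs.
  $text = preg_replace('/\n{3,}/', "\n\n", $text) ?? $text;
  return trim($text);
}
```

In the submit handler this would be applied as $text = normalize_pdf_text($pdf->getText()); whether it improves the AI output for a given document is something to check case by case.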

Batch operation: Extract legal case data using AI

We use Drupal’s Batch API because AI processing and PDF parsing can take several seconds.
Running the work as a batch helps avoid PHP timeout issues.

Set up the batch that processes the document.

$batch = [
  'title' => $this->t('Processing Legal Document with AI...'),
  'operations' => [
    // Each operation is [callback, [arguments]].
    [[get_class($this), 'extractLegalDataWithAI'], [$text, $file]],
  ],
  'finished' => [self::class, 'batchFinished'],
];

batch_set($batch);
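As a rough mental model of how the callbacks in this batch array are invoked, the sketch below imitates the calling convention in plain PHP: each operation receives its own arguments plus a shared $context array by reference, and the finished callback receives whatever was stored in $context['results']. The function names run_batch_sketch, extract_sketch and finished_sketch are hypothetical, and this is a simplified stand-in rather than Drupal’s actual implementation (real batches run across several HTTP requests and track progress).

```php
<?php

/**
 * Hypothetical stand-in for the batch runner, for illustration only.
 * Calls each operation with its arguments plus a shared $context array,
 * then hands $context['results'] to the finished callback.
 */
function run_batch_sketch(array $batch): void {
  $context = ['results' => [], 'message' => ''];
  foreach ($batch['operations'] as [$callback, $args]) {
    // $context is appended by reference so callbacks can write into it.
    call_user_func_array($callback, array_merge($args, [&$context]));
  }
  call_user_func($batch['finished'], TRUE, $context['results'], []);
}

// Operation callback: same parameter style as extractLegalDataWithAI().
function extract_sketch(string $text, array &$context): void {
  $context['results']['length'] = strlen($text);
}

// Finished callback: receives what the operations stored in 'results'.
function finished_sketch(bool $success, array $results, array $operations): void {
  echo 'Characters processed: ' . $results['length'] . "\n";
}

run_batch_sketch([
  'operations' => [['extract_sketch', ['court order text']]],
  'finished' => 'finished_sketch',
]);
```

This is why the callbacks in this tutorial write errors and extracted data into $context['results']: it is the only channel between an operation and the finished callback.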

5. Analyzing Content with Generative AI

Here, extractLegalDataWithAI is the batch callback function in which we extract the data with the help of AI.

We pass the extracted text and the uploaded PDF file entity to this function.

/**
 * Extracts structured legal data from PDF using AI provider.
 */
public static function extractLegalDataWithAI($text,$file, array &$context) {
}

Let us go through the implementation of the above function in detail. First, get the AI provider. Here we use ChatGPT (OpenAI) as the provider, which we configured in the installation step.

/**
 * Extracts structured legal data from PDF using AI provider.
 */
public static function extractLegalDataWithAI($text, $file, array &$context) {
  // Get the default AI provider and model configured for chat operations.
  $service = \Drupal::service('ai.provider');
  $sets = $service->getDefaultProviderForOperationType('chat');
  $provider = $service->createInstance($sets['provider_id']);
}

Create comprehensive AI prompt for legal document extraction

$systemPrompt = 'You are an expert legal document analyzer. Extract structured information from the provided court order/judgment and return it as valid JSON.


Extract the following information:


1. **case_number**: The official case number (e.g., "Original Suit No. 226/2023")
2. **case_title**: Create a descriptive title in format: "Case between [Plaintiffs] vs [Defendants] regarding [matter]"
3. **case_summary**: A simple 2-3 sentence summary that anyone can understand, including who is involved and what the case is about
4. **court_name**: Full name of the court
5. **court_location**: Location/jurisdiction of the court
6. **judge_name**: Name of the presiding judge/magistrate
7. **order_date**: Date of the order (format: YYYY-MM-DD)
8. **case_type**: Type of case (e.g., "Civil Suit", "Criminal Case", "Injunction")
9. **plaintiffs**: Array of plaintiff details, each with:
  - name: Full name
  - age: Age if mentioned
  - address: Full address
  - relation: Relationship description (e.g., "W/o", "S/o")
10. **defendants**: Array of defendant details with same structure as plaintiffs
11. **entity_tags**: Array of entity types involved (check if any party is: "Government", "Devaswom Board", "Waqf Board", "Corporation", "Municipality", "Panchayat", "Temple", "Mosque", "Church", "Private Individual")
12. **judgment_summary**: Simple explanation of the final judgment in 1-2 sentences
13. **judgment_favor**: Who won - either "Plaintiff" or "Defendant" or "Partial" or "Dismissed"
14. **relief_granted**: What relief/remedy was granted by the court
15. **why_win**: Simple explanation of why the plaintiffs/defendants won, including what kind of documents they submitted. Any ordinary reader should understand what the documents are and the reason for the win.
16. **plaintiffs_docs**: Documents the plaintiffs submitted, first as a list in simple sentences, then as a detailed list
17. **defendants_docs**: Documents the defendants submitted, first as a list in simple sentences, then as a detailed list


Return ONLY valid JSON in this exact format:
{
 "case_number": "Original Suit No. 226/2023",
 "case_title": "Case between Raji and Shylaja vs Mulakkal Thanka and Ramesh regarding right of way",
 "case_summary": "Two plaintiffs filed suit against defendants seeking permanent injunction to prevent obstruction of their right of way to access their property.",
 "court_name": "Court of the Munsiff-Magistrate",
 "court_location": "Ponnani",
 "judge_name": "Smt. Sowmya.T.M.",
 "order_date": "2024-07-22",
 "case_type": "Civil Suit - Injunction",
 "plaintiffs": [
   {
     "name": "Raji",
     "age": "46",
     "address": "W/o.Unniyath Valappil Mohandas, P.O.Mookkuthala, Vadakkumuri desom, Pallikkara amsom, Ponnani Taluk - 679 574",
     "relation": "Wife of Unniyath Valappil Mohandas"
   }
 ],
 "defendants": [
   {
     "name": "Mulakkal Thanka",
     "age": "77",
     "address": "W/o.Mulakkal Kesavan, P.O.Mookkuthala, Vadakkumuri desom, Pallikkara amsom, Ponnani Taluk - 679 574",
     "relation": "Wife of Mulakkal Kesavan"
   }
 ],
 "entity_tags": ["Private Individual"],
 "judgment_summary": "The court granted permanent injunction restraining defendants from obstructing plaintiffs use of the right of way.",
 "judgment_favor": "Plaintiff",
 "relief_granted": "Permanent prohibitory injunction",
 "why_win": "The plaintiffs proved that the disputed property (Plaint A schedule) legally belongs to them by producing title deeds and tax receipts.They also showed that they have a pathway (C schedule property) for access to their land.",
 "plaintiffs_docs": "Kanam Assignment deed No.4643/2013 (title deed), Tax receipt (proof of ownership/use).",
 "defendants_docs": "Kanam Assignment deed No.3160/2011 (previous deed)."


}


Do not include any explanations, markdown formatting, or text outside the JSON object.';

As you can see above, the prompt includes sample case details and a sample response JSON format so that the LLM returns data in exactly that format.

We get the reply in the $response variable as below.

$messages = new ChatInput([
  new ChatMessage('system', $systemPrompt),
  new ChatMessage('user', "Extract legal information from this court order:\n\n" . $text),
]);

$message = $provider->chat($messages, $sets['model_id'])->getNormalized();
$response = $message->getText();

Extract the data from the response JSON.

// Clean response (remove markdown code blocks if present)
     $response = preg_replace('/```json\s*|\s*```/', '', $response);
     $response = trim($response);
    
     // Parse JSON response
     $case_data = json_decode($response, true);
    
     if (json_last_error() !== JSON_ERROR_NONE) {
       $context['results']['error'] = 'AI returned invalid JSON: ' . json_last_error_msg() . "\nResponse: " . substr($response, 0, 500);
       return;
     }
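The cleanup and decoding steps above can also be wrapped in one small function, which makes them easy to test outside Drupal. The sketch below is one possible variant, not the module’s own code: the name decode_ai_json is hypothetical, and it uses PHP’s JSON_THROW_ON_ERROR flag instead of the json_last_error() check, so the caller handles a \JsonException.

```php
<?php

/**
 * Hypothetical helper: strips optional Markdown fences from an LLM reply
 * and decodes the JSON object inside it.
 *
 * @throws \JsonException When the reply is not valid JSON.
 * @throws \UnexpectedValueException When the JSON is not an object or array.
 */
function decode_ai_json(string $raw): array {
  // Remove a leading fence (three backticks, optionally followed by "json")
  // and a trailing fence, if present.
  $clean = preg_replace('/^\s*`{3}(?:json)?\s*|\s*`{3}\s*$/i', '', $raw) ?? $raw;
  // With JSON_THROW_ON_ERROR, invalid JSON raises \JsonException.
  $data = json_decode(trim($clean), TRUE, 512, JSON_THROW_ON_ERROR);
  if (!is_array($data)) {
    throw new \UnexpectedValueException('Expected a JSON object.');
  }
  return $data;
}
```

Inside the batch callback, the call would sit in a try/catch block that writes the exception message to $context['results']['error'], mirroring the error handling shown above.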

Store extracted data

// Store extracted data
     $context['results']['case_data'] = $case_data;
     $context['message'] = t('Extracted case data: @case', ['@case' => $case_data['case_number'] ?? 'Unknown']);
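Valid JSON is not necessarily complete JSON, so it can help to check the decoded array against what the prompt asked for before creating anything. The sketch below is an optional extra under stated assumptions: validate_case_data is a hypothetical function, and the list of required keys is only an example that should match the fields your content type really needs.

```php
<?php

/**
 * Hypothetical validator: returns a list of problems found in the decoded
 * case data, or an empty array when the basic shape looks right.
 */
function validate_case_data(array $case_data): array {
  $problems = [];
  // Example set of required keys; adjust to your content type.
  $required = ['case_number', 'case_title', 'court_name', 'order_date', 'judgment_favor'];
  foreach ($required as $key) {
    $value = $case_data[$key] ?? NULL;
    if (!is_string($value) || trim($value) === '') {
      $problems[] = "Missing value for '$key'";
    }
  }
  // The prompt requests YYYY-MM-DD; confirm it is a real calendar date.
  $date = $case_data['order_date'] ?? '';
  if (is_string($date) && $date !== '') {
    $parsed = \DateTimeImmutable::createFromFormat('!Y-m-d', $date);
    if ($parsed === FALSE || $parsed->format('Y-m-d') !== $date) {
      $problems[] = "order_date '$date' is not in YYYY-MM-DD format";
    }
  }
  // The prompt allows only these four outcomes.
  $allowed = ['Plaintiff', 'Defendant', 'Partial', 'Dismissed'];
  $favor = $case_data['judgment_favor'] ?? '';
  if (is_string($favor) && $favor !== '' && !in_array($favor, $allowed, TRUE)) {
    $problems[] = "Unexpected judgment_favor '$favor'";
  }
  return $problems;
}
```

If the returned list is not empty, the batch callback could store the joined messages in $context['results']['error'] and return early.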

6. Creating the Drupal Node

At the end of the extractLegalDataWithAI() callback in LegalDocumentImportForm, we set up a second batch process to create the node.

$batch = [
       'title' => t('Creating Legal Case Entity...'),
       'operations' => [
         [[self::class, 'createLegalCaseNode'], [$case_data,$file]],
       ],
       'finished' => [self::class, 'batchFinished'],
     ];
    
     batch_set($batch);

Batch operation: Create legal case node from extracted data

In the function below, we create the node programmatically and assign values to the corresponding content type fields.

public static function createLegalCaseNode($case_data,$file, array &$context) {
   try {
     // Validate required fields
     if (empty($case_data['case_number'])) {
       $context['results']['error'] = 'Missing case number in extracted data';
       return;
     }
    
     // Check for duplicate case (same bundle as the node created below)
     $existing = \Drupal::entityTypeManager()->getStorage('node')
       ->loadByProperties([
         'type' => 'munsif_legal',
         'field_case_number' => $case_data['case_number'],
       ]);
    
     if (!empty($existing)) {
       $context['results']['error'] = 'Case already exists: ' . $case_data['case_number'];
       return;
     }
    
     // Process entity tags and create/get taxonomy terms
     $tag_tids = [];
     if (!empty($case_data['entity_tags'])) {
       $tag_tids = self::processEntityTags($case_data['entity_tags']);
     }
    
     // Prepare plaintiffs text
     $plaintiffs_text = '';
     if (!empty($case_data['plaintiffs'])) {
       foreach ($case_data['plaintiffs'] as $plaintiff) {
         $plaintiffs_text .= $plaintiff['name'] . ' (' . ($plaintiff['age'] ?? 'Age not specified') . ' years)' . "\n";
         $plaintiffs_text .= $plaintiff['address'] . "\n\n";
       }
     }
    
     // Prepare defendants text
     $defendants_text = '';
     if (!empty($case_data['defendants'])) {
       foreach ($case_data['defendants'] as $defendant) {
         $defendants_text .= $defendant['name'] . ' (' . ($defendant['age'] ?? 'Age not specified') . ' years)' . "\n";
         $defendants_text .= $defendant['address'] . "\n\n";
       }
     }
    
     // Create node - adjust field names according to your content type
     $node = \Drupal\node\Entity\Node::create([
       'type' => 'munsif_legal',
       'title' => $case_data['case_title'] ?? $case_data['case_number'],
       'field_case_number' => $case_data['case_number'] ?? '',
       'field_case_summary' => [
         'value' => $case_data['case_summary'] ?? '',
         'format' => 'basic_html',
       ],
       'field_why_win' => [
         'value' => $case_data['why_win'] ?? '',
         'format' => 'basic_html',
       ],
       'field_plaintiffs_docs' => [
         'value' => $case_data['plaintiffs_docs'] ?? '',
         'format' => 'basic_html',
       ],
       'field_defendants_docs' => [
         'value' => $case_data['defendants_docs'] ?? '',
         'format' => 'basic_html',
       ],
       'field_court_name' => $case_data['court_name'] ?? '',
       'field_court_location' => $case_data['court_location'] ?? '',
       'field_judge_name' => $case_data['judge_name'] ?? '',
       'field_order_date' => $case_data['order_date'] ?? '',
       'field_case_type' => $case_data['case_type'] ?? '',
       'field_plaintiffs' => [
         'value' => $plaintiffs_text,
         'format' => 'basic_html',
       ],
       'field_defendants' => [
         'value' => $defendants_text,
         'format' => 'basic_html',
       ],
       'field_entity_tags' => $tag_tids,
       'field_judgment_summary' => [
         'value' => $case_data['judgment_summary'] ?? '',
         'format' => 'basic_html',
       ],
       'field_judgment_favor' => $case_data['judgment_favor'] ?? '',
       'field_relief_granted' => [
         'value' => $case_data['relief_granted'] ?? '',
         'format' => 'basic_html',
       ],
     ]);
    
     // Attach the uploaded PDF, then save the node once.
     $node->set('field_pdf', [
       'target_id' => $file->id(),
     ]);
     $node->save();
    
     $context['results']['success'] = TRUE;
     $context['results']['node_id'] = $node->id();
     $context['results']['case_number'] = $case_data['case_number'];
     // Use the saved node label, which falls back to the case number
     // when the AI did not return a case_title.
     $context['results']['case_title'] = $node->label();
    
     $context['message'] = t('Created case: @title', ['@title' => $node->label()]);
    
   } catch (\Exception $e) {
     $context['results']['error'] = 'Error creating node: ' . $e->getMessage();
   }
 }
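The plaintiffs and defendants loops in the function above are almost identical, so they could be folded into one helper. The sketch below is a possible refactoring, not the module’s own code: format_parties is a hypothetical name, and unlike the original loops it leaves out the age and address when the AI did not return them instead of printing a placeholder.

```php
<?php

/**
 * Hypothetical helper: turns the "plaintiffs" or "defendants" array from the
 * AI response into the plain-text block stored on the node.
 */
function format_parties(array $parties): string {
  $blocks = [];
  foreach ($parties as $party) {
    // Skip anything that is not an array with a name.
    if (!is_array($party) || empty($party['name'])) {
      continue;
    }
    $line = $party['name'];
    // Only mention the age when the document actually gave one.
    if (!empty($party['age'])) {
      $line .= ' (' . $party['age'] . ' years)';
    }
    if (!empty($party['address'])) {
      $line .= "\n" . $party['address'];
    }
    $blocks[] = $line;
  }
  // One blank line between parties.
  return implode("\n\n", $blocks);
}
```

With this in place, the two loops would reduce to format_parties($case_data['plaintiffs'] ?? []) and format_parties($case_data['defendants'] ?? []).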

7. Populating Taxonomy Terms

To populate the tags field, we create taxonomy terms on the fly, as below, whenever a term does not already exist in the vocabulary.

/**
  * Process entity tags and return taxonomy term IDs
  */
 private static function processEntityTags($tags) {
   $tids = [];
   $vocab = 'tags'; // Machine name of the Tags vocabulary.
  
   foreach ($tags as $tag_name) {
     // Search for existing term
     $terms = \Drupal::entityTypeManager()
       ->getStorage('taxonomy_term')
       ->loadByProperties([
         'vid' => $vocab,
         'name' => $tag_name,
       ]);
    
     if (!empty($terms)) {
       // Use existing term
       $term = reset($terms);
       $tids[] = ['target_id' => $term->id()];
     } else {
       // Create new term
       $term = \Drupal\taxonomy\Entity\Term::create([
         'vid' => $vocab,
         'name' => $tag_name,
       ]);
       $term->save();
       $tids[] = ['target_id' => $term->id()];
     }
   }
  
   return $tids;
 }
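Because the function above creates a term for any name the AI returns, variations in spelling or unexpected values would end up as new terms. One optional safeguard is to clean the list first against the entity types named in the prompt. The sketch below is an assumption-labelled example: normalize_entity_tags is a hypothetical helper in plain PHP, and the allowed list is passed in by the caller.

```php
<?php

/**
 * Hypothetical helper: cleans the entity_tags list returned by the AI before
 * any taxonomy lookup. Names are trimmed, matched case-insensitively against
 * an allowed list, and de-duplicated.
 */
function normalize_entity_tags(array $tags, array $allowed): array {
  // Map lowercase name => canonical spelling.
  $canonical = [];
  foreach ($allowed as $name) {
    $canonical[strtolower($name)] = $name;
  }
  $clean = [];
  foreach ($tags as $tag) {
    if (!is_string($tag)) {
      continue;
    }
    $key = strtolower(trim($tag));
    // Keep known tags only; keying by $key removes duplicates.
    if (isset($canonical[$key])) {
      $clean[$key] = $canonical[$key];
    }
  }
  return array_values($clean);
}
```

The cleaned list would then be handed to processEntityTags() in place of the raw $case_data['entity_tags'] array.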

For both batch operations, the final message is shown by the batchFinished function below.

/**
  * Batch finished callback
  */
 public static function batchFinished($success, $results, $operations) {
   if (isset($results['error'])) {
     \Drupal::messenger()->addError(t('Error: @error', ['@error' => $results['error']]));
     return;
   }
  
   if (isset($results['success']) && $results['success']) {
     $message = t('Successfully created legal case: <strong>@title</strong> (Case No: @case_no)', [
       '@title' => $results['case_title'],
       '@case_no' => $results['case_number'],
     ]);
     \Drupal::messenger()->addStatus($message);
    
     // Provide link to view the created node. The ":url" placeholder is the
     // one intended for URLs inside attributes such as href.
     $node_url = \Drupal\Core\Url::fromRoute('entity.node.canonical', ['node' => $results['node_id']]);
     \Drupal::messenger()->addStatus(t('View case: <a href=":url">@url</a>', [
       ':url' => $node_url->toString(),
       '@url' => $node_url->toString(),
     ]));
   } else {
     \Drupal::messenger()->addWarning(t('No legal case was created.'));
   }
 }

Clear the cache.

8. Batch Process and Final Output

This is the admin form we just created.

After submission, the batch process starts.

After all batch processes complete, we see the success message as below.

You can see that the node has been created with all the details.

In this tutorial, we built a fully automated legal document importer using Drupal’s modular architecture and Generative AI.
This same concept can be extended to process invoices, research papers, or HR documents by adjusting the AI prompt and content type fields.

Download Sample Module here
