Khemeia™ automates the entire transformation process and delivers much more besides. Using Artificial Intelligence techniques, Khemeia™ systematically extracts and semantically tags metadata, structures and hierarchically organizes information, generates tables of contents and converts the content to XML-based outputs, all in real time.

Khemeia™ creates structured content from paper and from PDF, Word, ASCII, OCR (Optical Character Recognition) output, RTF, Excel, CSV, SGML, QuarkXPress, Adobe InDesign and HTML.

Khemeia’s 4-step transformation process:

Detection of content elements in a class of documents, as defined in the customer DTD (Document Type Definition) or XML Schema, for example: section titles, numbers, headers, paragraphs, hyperlinks, tables and graphics.
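
To make the detection step concrete, here is a minimal rule-based sketch in Python. It is not Khemeia's method: the product is described as using Artificial Intelligence techniques, and their details are not given here. The regular expressions, the `detect_elements` function and the sample text are all hypothetical, chosen only to show what detecting a few of these element types might look like.

```python
import re

# Hypothetical heuristics, for illustration only; a real system would take
# its element classes from the customer DTD or XML Schema.
PATTERNS = [
    ("hyperlink", re.compile(r"^https?://\S+$")),
    ("number", re.compile(r"^\d+$")),
    ("section_title", re.compile(r"^[A-Z][A-Z .,'-]+$")),
]


def detect_elements(text):
    """Classify each non-empty line as one of the element types above,
    falling back to 'paragraph' when no pattern matches."""
    elements = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kind = "paragraph"
        for name, pattern in PATTERNS:
            if pattern.match(line):
                kind = name
                break
        elements.append((kind, line))
    return elements


SAMPLE = """SUPREME COURT
Smith v. Jones
The appellant filed the claim in 2010.
http://example.com/ruling
12
"""

detected = detect_elements(SAMPLE)
for kind, line in detected:
    print(f"{kind:14} {line}")
```

Fixed patterns like these break down quickly on real documents, which is presumably why a learned approach is used instead.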

Extracted content elements are semantically tagged, for example: section titles (court name), header (case name), numbers (page numbers), paragraphs (alinea), tables (evidence list).
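
The tagging step can be pictured as a lookup from generic element type to domain-specific tag. The sketch below uses the legal-document examples given above; the tag spellings (`court_name`, `case_name` and so on) and the `tag_elements` function are assumptions made for illustration, not Khemeia's actual vocabulary or API.

```python
# Mapping taken from the examples in the text; the tag spellings are
# hypothetical.
SEMANTIC_TAGS = {
    "section_title": "court_name",
    "header": "case_name",
    "number": "page_number",
    "paragraph": "alinea",
    "table": "evidence_list",
}


def tag_elements(elements):
    """Attach a semantic tag to each (element_type, text) pair.
    Element types without a mapping keep their generic type as the tag."""
    return [
        {"type": kind, "tag": SEMANTIC_TAGS.get(kind, kind), "text": text}
        for kind, text in elements
    ]


tagged = tag_elements([
    ("section_title", "Supreme Court"),
    ("header", "Smith v. Jones"),
    ("number", "12"),
    ("hyperlink", "http://example.com/ruling"),
])
for item in tagged:
    print(item["tag"], "->", item["text"])
```

In practice the mapping would differ for each class of documents, since the same element type (a section title, say) carries a different meaning in a court ruling than in a maintenance manual.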

Structuring the content involves: splitting the document into relevant modules, creating the hierarchy, formatting bullet lists, generating the Table of Contents and creating linkages.
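
Two of these operations, creating the hierarchy and generating the Table of Contents, can be sketched with a simple stack-based tree builder. The `Module` class, the function names and the input format (tuples with heading levels starting at 1) are hypothetical; this illustrates the general technique rather than how Khemeia implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    title: str
    level: int
    body: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_hierarchy(items):
    """Turn a flat list of ('heading', level, text) and ('para', text)
    tuples into a tree of modules. Heading levels are assumed to be >= 1."""
    root = Module(title="Document", level=0)
    stack = [root]
    for item in items:
        if item[0] == "heading":
            _, level, text = item
            # Climb back up until the parent is shallower than the new heading.
            while stack[-1].level >= level:
                stack.pop()
            module = Module(title=text, level=level)
            stack[-1].children.append(module)
            stack.append(module)
        else:
            # Paragraphs belong to the most recently opened module.
            stack[-1].body.append(item[1])
    return root


def table_of_contents(module, prefix=""):
    """Return numbered TOC lines such as '1.1 Title' for all descendants."""
    lines = []
    for index, child in enumerate(module.children, start=1):
        number = f"{prefix}{index}"
        lines.append(f"{number} {child.title}")
        lines.extend(table_of_contents(child, prefix=f"{number}."))
    return lines


items = [
    ("heading", 1, "Facts"),
    ("para", "The claim was filed in 2010."),
    ("heading", 2, "Evidence"),
    ("heading", 1, "Ruling"),
]
tree = build_hierarchy(items)
toc = table_of_contents(tree)
print("\n".join(toc))
```

Once the tree exists, each `Module` can be treated as a separately reusable unit, and the same section numbers could serve as targets for linkages.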

Output types: XML, PDF, HTML, DITA, JPEG, XMP, NITF, NewsML, S1000D and customer-specific formats.
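
For the XML case, the final step amounts to serialising the tagged content. Below is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the element names and the `to_xml` helper are assumptions for illustration, and real output would have to follow the customer's DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def to_xml(tagged_elements, root_name="document"):
    """Serialise (tag, text) pairs as child elements of a single root."""
    root = ET.Element(root_name)
    for tag, text in tagged_elements:
        child = ET.SubElement(root, tag)
        child.text = text  # special characters are escaped on output
    return ET.tostring(root, encoding="unicode")


xml_output = to_xml([
    ("court_name", "Supreme Court"),
    ("case_name", "Smith & Jones"),
    ("alinea", "The claim was filed in 2010."),
])
print(xml_output)
```

Building the tree through the library rather than concatenating strings means characters such as `&` are escaped automatically, so the result stays well-formed.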

Advantages of Khemeia™


Enables the deployment of effective search algorithms based on semantic tags extracted automatically from within documents


Enables users to reuse the content to attain maximum benefit


Ensures interoperability of heterogeneous documents


Processes 20 pages of documents in 5 minutes, compared with 36 hours when done manually
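
Taking the figures in the last point at face value, the implied speed-up can be worked out as follows. The variable names are illustrative; the only inputs are the numbers quoted above.

```python
pages = 20
automated_minutes = 5
manual_minutes = 36 * 60  # 36 hours expressed in minutes

speedup = manual_minutes / automated_minutes
automated_seconds_per_page = automated_minutes * 60 / pages
manual_minutes_per_page = manual_minutes / pages

print(f"Speed-up: {speedup:.0f}x")
print(f"Automated: {automated_seconds_per_page:.0f} seconds per page")
print(f"Manual: {manual_minutes_per_page:.0f} minutes per page")
```

On these figures the automated process is 432 times faster, at 15 seconds per page against 108 minutes per page by hand.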