Sr. Design Strategist / Consultant, AWS WWCO ProServe, Gen AI Specialty Practice
AWS Internal Project — Intended for multiple customers
To develop a custom document ingestion solution for customers with minimal knowledge of model training, allowing them to easily and efficiently make document data searchable within a NoSQL database.
Until now, customers have had to rely on manual data entry or invest in costly, custom-built solutions and data extraction pipelines to handle document-based data as part of their essential daily workflows. The introduction of the Clerical Foundation Model (FM), a specialized FM designed for document tasks, revolutionizes this process. Clerical FM enables automation of document data extraction through simple prompts, allowing users to define document-specific tasks in natural language. This model supports direct classification of document types, extraction of entities (like dates, names, product IDs, tables, images, and charts), and facilitates inquiries akin to many existing Document QnA GenAI applications. Questions such as "Is this document signed?" or "How much did overhead costs increase last year?" can now be answered through a straightforward API call, bypassing the need for OCR and complex cloud solutions.
The images to the right showcase three screens from an early iteration tailored specifically for the iPad, developed just before the introduction of the Claude 3 model. At that stage, our approach relied on techniques akin to Amazon SageMaker Ground Truth for capturing OCR data. Users would train the model by marking key text areas with bounding boxes, which allowed the model to ingest and process the visual information into a database with high accuracy. Once trained, the model could continue data ingestion autonomously, significantly reducing the time and manpower typically required for such tasks.
Clerical FM is designed for document-centric tasks, pre-trained on hundreds of millions of documents to perform data entry with high accuracy and minimal additional training. For more specialized tasks, it can be fine-tuned via the Bedrock API, allowing customization to specific needs. This model simplifies document automation by removing the need for traditional methods such as OCR and complex pipelines, and its APIs enable rapid testing and scalable task automation.
An exciting opportunity emerged when a colleague sought my expertise in document processing to harness the advanced capabilities of Anthropic's Claude 3 model. This model has transformed document handling with improved accuracy, speed, and cost efficiency, unlocking new solution possibilities. Traditionally, using machine learning required a specialized skill set, but my goal was to simplify the process. I designed an intuitive interface to streamline prompt generation, making it accessible to non-technical users. These wireframes not only enhanced the user experience but also generated significant interest by seamlessly integrating complex machine learning into everyday tasks.
Using a query builder-style interface, I designed a system that lets users focus on their core ideas while the system generates prompts in the background, refining them with proper grammar and contextual nuance. The interface supports intuitive sentence construction through drag-and-drop, making it easy to arrange elements. Users can also save prompts as templates for future use. Together, these features simplify complex interactions and make prompt generation more efficient and accessible.
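As a rough illustration of the query-builder idea described above, the following minimal Python sketch shows how an ordered list of blocks could be turned into a prompt in the background. It is not the project's implementation: the block types (`classify`, `extract`, `ask`), the sentence patterns, and the names `Block`, `PATTERNS`, `build_prompt`, and `SIGNED_INVOICE` are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical sentence patterns, one per block type a user could drag in.
PATTERNS = {
    "classify": "Classify this document as one of: {value}.",
    "extract": "Extract these fields: {value}.",
    "ask": "Answer this question: {value}",
}


@dataclass(frozen=True)
class Block:
    """One element arranged by the user in the builder (assumed shape)."""
    kind: str   # a key of PATTERNS
    value: str  # the text the user typed into the block


def build_prompt(blocks):
    """Render blocks, in the order given, as a numbered instruction prompt."""
    if not blocks:
        raise ValueError("at least one block is required")
    lines = ["You are processing a single document."]
    for step, block in enumerate(blocks, start=1):
        if block.kind not in PATTERNS:
            raise ValueError(f"unknown block kind: {block.kind!r}")
        sentence = PATTERNS[block.kind].format(value=block.value.strip())
        lines.append(f"{step}. {sentence}")
    return "\n".join(lines)


# A saved "template" is simply a reusable list of blocks in this sketch.
SIGNED_INVOICE = [
    Block("classify", "invoice, receipt, contract"),
    Block("extract", "date, vendor name, total"),
    Block("ask", "Is this document signed?"),
]
```

Calling `build_prompt(SIGNED_INVOICE)` returns a four-line prompt: a fixed opening line followed by one numbered instruction per block. Reordering the blocks reorders the instructions, which mirrors the drag-and-drop arrangement in the interface.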
The integration of Clerical AI with Anthropic's Claude 3 model marks a significant advancement in intelligent document processing. This powerful combination simplifies complex tasks by automating data extraction and improving accuracy with minimal user input. Clerical AI’s query builder interface, featuring drag-and-drop functionality and reusable templates, exemplifies a user-centric approach that enhances productivity. By drastically reducing processing time and overhead, this innovation transforms document management, positioning Clerical AI powered by Claude 3 as a game-changing solution that sets new industry standards.