Forum Discussion
Swami_Nawale (Copper Contributor)
Feb 19, 2025
Issues with Microsoft Syntex Document Processing Model: Incomplete Extraction for Multi-Page PDFs
I'm facing several challenges with the Microsoft Syntex document processing model, particularly when dealing with multi-page PDFs and large tables. I'd appreciate any insights or suggestions from fellow users or Microsoft experts who may have encountered similar issues. Below are the specific problems I'm experiencing:
Unsupported PDF Formats & Multiple Tables on a Single Page: Some PDFs that I try to process seem to be in unsupported formats. In addition, pages containing multiple tables often result in extraction errors or incomplete data. Has anyone else encountered this with complex table layouts in PDFs, and what approaches have you used to resolve it?
Data Extraction from Multi-Page PDFs: When processing PDFs longer than two pages (e.g., six-page PDFs), the model often extracts data correctly from only the first two or three pages. The remaining pages, particularly those with tables spanning multiple pages, are either incomplete or entirely missing. Additionally, large tables (100+ rows) in multi-page PDFs tend to result in inaccurate extraction. Are there any best practices for handling these multi-page table scenarios?
Automatic Processing Issues: Sometimes, the Syntex model doesn't process files automatically. I have to manually select the file and click "Classify" to trigger processing. Is this a known issue, or is there something I might be missing in my setup?
Model Publishing Delays: After publishing changes to the model, it often takes 30 minutes or more for the new model to start processing files. In some cases, the files aren't processed at all. Has anyone experienced similar delays after publishing a model, and what could be causing this?
Low Confidence Scores for Multi-Page PDFs: When processing multi-page PDFs with tables, the model returns low confidence scores (below 60%). What steps can I take to improve these confidence scores, particularly for documents with complex table structures?
- DivyaAkula (Brass Contributor)
In general, we have noticed delays in model extraction and have raised this with Microsoft. Which method are you using for extraction?
- Swami_Nawale (Copper Contributor)
Thanks for your reply. I am using the Structured document processing method.