Executive Summary: SharePoint Classifiers & Extractors
SharePoint document classifiers automatically identify and label incoming files based on their content. Once trained with sample documents, they recognize specific document types—such as contracts, invoices, or case files—without requiring users to manually tag anything.
Extractors then pull structured data from those classified documents. By highlighting examples during setup, you teach the model to capture key information like dates, names, IDs, or terms. The extracted values populate SharePoint columns automatically, making metadata consistent, searchable, and ready for workflows.
Together, classifiers and extractors transform unstructured documents into organized, actionable data across your libraries, powering better search, compliance, and automation through SharePoint Premium.
1. Classifiers identify document types automatically
Classifiers are trained models that recognize specific kinds of documents as they enter a SharePoint library. Examples:
- Contract renewals
- Legal opinions
- Case briefs
- Invoices
Once trained, the classifier tags each incoming file with the correct document type, removing the need for users to manually choose categories.
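The following minimal Python sketch makes the classify-then-tag flow concrete. It is not how SharePoint Premium builds or runs its models, which are trained in the product rather than written as code. Here a toy phrase-count classifier stands in for the trained model, and the phrase lists, the `LibraryFile` type, and the "Document type" column name are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-in for a trained classifier: each document type is
# described by a few indicative phrases (all names here are illustrative).
DOC_TYPE_PHRASES: dict[str, list[str]] = {
    "Contract renewal": ["renewal term", "hereby renew", "renewal date"],
    "Invoice": ["invoice number", "amount due", "bill to"],
    "Case brief": ["case number", "procedural history", "statement of facts"],
}


@dataclass
class LibraryFile:
    name: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


def classify(text: str) -> str | None:
    """Return the document type whose phrases occur most often, or None if none match."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(phrase) for phrase in phrases)
        for doc_type, phrases in DOC_TYPE_PHRASES.items()
    }
    best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_type if best_score > 0 else None


def on_upload(file: LibraryFile) -> LibraryFile:
    """Tag a file as it arrives, so the uploader never picks a category by hand."""
    doc_type = classify(file.text)
    if doc_type is not None:
        file.metadata["Document type"] = doc_type  # hypothetical column name
    return file


uploaded = on_upload(LibraryFile("inv-001.pdf", "Invoice number 4411. Amount due: $250. Bill to: Contoso"))
print(uploaded.metadata)  # {'Document type': 'Invoice'}
```

Files that match no known type are left untagged in this sketch instead of being forced into a category.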
2. Extractors pull structured information out of documents
After a classifier identifies a document type, extractors pull specific pieces of information out of it, turning unstructured text into usable metadata. Examples of extractable fields:
- Service start date
- Case number
- Jurisdiction
- Parties involved
- Keywords
Each extractor corresponds to one “entity” you want to capture. You highlight examples in training files, and the model learns to find that information in future documents.
🧩 How Extractors Are Created (Simplified)
- Name the extractor: Choose a meaningful name (e.g., “Service Start Date”).
- Choose or create a column: Extracted data is stored in a SharePoint column. You can reuse an existing column or create a new one.
- Label examples: Highlight the relevant text in sample documents so the model learns what to extract.
- Train and apply: Once trained, the extractor automatically fills in metadata for every new document of that type.
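The four steps above map onto a small data structure. The sketch below is an illustration only, under stated assumptions: `Extractor`, `label_example`, `extract`, `apply_extractors`, and the column names are hypothetical. The "learning" is a simple cue-matching rule that remembers the words preceding each highlighted value. This rule is swapped in for SharePoint Premium's actual trained model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Extractor:
    """One extractor captures one entity and writes it to one column (names are hypothetical)."""

    name: str    # step 1: a meaningful name, e.g. "Service Start Date"
    column: str  # step 2: the SharePoint column that stores the value
    cues: list[str] = field(default_factory=list)  # learned from labeled examples

    def label_example(self, text: str, highlighted: str) -> None:
        """Step 3: remember the text on the same line just before a highlighted value."""
        preceding = text[: text.index(highlighted)].rstrip()
        if preceding:
            cue = preceding.splitlines()[-1].strip().lower()  # e.g. "service start date:"
            if cue and cue not in self.cues:
                self.cues.append(cue)

    def extract(self, text: str) -> str | None:
        """Step 4: return the text that follows a learned cue on the same line."""
        for line in text.splitlines():
            lowered = line.lower()
            for cue in self.cues:
                pos = lowered.find(cue)
                if pos != -1 and line[pos + len(cue):].strip():
                    return line[pos + len(cue):].strip()
        return None


def apply_extractors(text: str, extractors: list[Extractor]) -> dict[str, str]:
    """Fill one metadata column per extractor; columns without a match are left unset."""
    metadata: dict[str, str] = {}
    for extractor in extractors:
        value = extractor.extract(text)
        if value is not None:
            metadata[extractor.column] = value
    return metadata


start_date = Extractor("Service Start Date", "ServiceStartDate")
start_date.label_example("Contract 17\nService start date: 2024-03-01", "2024-03-01")
case_number = Extractor("Case Number", "CaseNumber")
case_number.label_example("Case number: 22-CV-0193\nJurisdiction: Ohio", "22-CV-0193")

new_doc = "Renewal notice\nService Start Date: 2025-01-15\nCase Number: 24-CV-0007"
print(apply_extractors(new_doc, [start_date, case_number]))
# {'ServiceStartDate': '2025-01-15', 'CaseNumber': '24-CV-0007'}
```

Each extractor writes only its own column, which mirrors the one-entity-per-extractor design. The exact-match rule would miss any rewording of the label; a trained model is meant to handle that kind of variation.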
⚙️ How This Fits Into SharePoint Premium (formerly Syntex)
SharePoint Premium’s content processing features allow you to:
- Build classifiers to detect document types
- Build extractors to pull structured data
- Apply models to libraries so metadata is generated automatically
- Use extracted metadata in Power Apps, Power Automate, Dataverse, and search
This works across more than 20 file formats, including Word, PDF, and PowerPoint.
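Extracted values land in ordinary library columns, so other tools can read them like any other metadata. The sketch below shows one such path: the Microsoft Graph list-items endpoint (`/sites/{site-id}/lists/{list-id}/items?expand=fields`), called with Python's standard library. It omits token acquisition, paging via `@odata.nextLink`, and error handling. The `Jurisdiction` column, the helper names, and the sample response are hypothetical, and `FileLeafRef` is assumed to carry the file name.

```python
from __future__ import annotations

import json
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"


def items_url(site_id: str, list_id: str) -> str:
    """Microsoft Graph request for a library's items with column values expanded."""
    return f"{GRAPH}/sites/{site_id}/lists/{list_id}/items?expand=fields"


def fetch_items(url: str, access_token: str) -> dict:
    """GET the items; obtaining the OAuth access token is out of scope here."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def files_where(response: dict, column: str, value: str) -> list[str]:
    """Return file names of items whose extracted column equals the given value."""
    return [
        item["fields"].get("FileLeafRef", item["id"])
        for item in response.get("value", [])
        if item.get("fields", {}).get(column) == value
    ]


# Illustrative response shape; "Jurisdiction" is a hypothetical extractor column.
sample = {
    "value": [
        {"id": "1", "fields": {"FileLeafRef": "renewal-17.pdf", "Jurisdiction": "Ohio"}},
        {"id": "2", "fields": {"FileLeafRef": "brief-04.docx", "Jurisdiction": "Texas"}},
    ]
}
print(files_where(sample, "Jurisdiction", "Ohio"))  # ['renewal-17.pdf']
```

In practice, a Power Automate flow or a search query would typically consume the same columns without custom code. The sketch only shows that the extracted values are plain metadata.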
🚀 Why This Matters
- No more manual tagging — users just upload files.
- Metadata becomes consistent and complete, improving search, compliance, and automation.
- The models work on full, complex documents such as contracts and case briefs, not just short, simple text fields.
- Integrates with the Power Platform, enabling advanced workflows.

