Extract: PPTX Content Extraction
Powerful extraction tools to analyze and extract charts, tables, text, images, and media from PPTX presentations. Perfect for data analysis, content migration, and automation workflows.
Overview
Understanding PPTX extraction capabilities
What is PPTX Extraction?
Extraction allows you to analyze and extract structured data from PPTX presentations. PPTX Studio provides two powerful interfaces:
REST API
For developers to programmatically extract data
- Automated extraction workflows
- Multiple language support
- System integration ready
- Batch processing capability
PPTX Studio
Visual interface for interactive extraction
- Drag-and-drop file upload
- Interactive data preview
- One-click downloads
- No coding required
Key Capabilities
- 8 Extraction Modes: Flexible options from simple to detailed with media/embedding combinations
- Content Filtering: 8 filter types to extract specific content (charts, tables, images, text, shapes, etc.)
- Multiple Output Formats: JSON, YAML, TOML, CSV for different use cases
- Media Extraction: Optional extraction of images, videos, and embedded files
- Chart Consolidation: Combine multiple chart Excel files into a single workbook
- Batch Processing: Extract multiple content types in one operation
- API Parity: All studio features available via REST API with 6 language examples
Extraction Features
Chart Extraction
Advanced chart data extraction with color intelligence
- Chart types: Column, Pie, Line, Scatter, Stock
- Series and category data
- Color mapping and themes
- Excel format with original data
- JSON format for programmatic use
Table Extraction
Preserve formatting and structure of tables
- Cell data and formatting
- Merged cells handling
- Color and font preservation
- Excel export capability
- CSV format support
Image Extraction
Extract all visual elements from presentations
- Multiple formats: PNG, JPG, SVG
- Embedded and inserted images
- Chart images with high quality
- Shape renderings
- Batch download capability
Media Extraction
Download embedded videos, audio, and objects
- Video formats: MP4, AVI, MOV, WMV
- Audio formats: MP3, WAV, AAC
- Embedded documents: Excel, Word
- Object preservation
- Organized folder structure
Text Extraction
Get all text content organized by slide
- Slide titles and body text
- Shape and textbox content
- Formatting preservation
- Plain text or formatted output
- Slide-by-slide organization
Metadata
Extract presentation properties and statistics
- Author and creation date
- Slide count and sizes
- Master slide information
- Theme and color details
- Statistics and analytics
Extraction Modes
Choose how much data to extract based on your needs
Mode 1: Detailed
Full extraction with complete object properties, including all shapes, text, metadata, and styling information.
Best for: Complete content analysis, data migration, full reconstruction
Mode 2: Simple
Basic extraction with minimal information - just the essential data without extended properties.
Best for: Quick analysis, reducing file size, core data only
Mode 3: Images Only
Extract only images and media files from the presentation, skipping all text and shape data.
Best for: Media library creation, image extraction, thumbnail collection
Mode 4: Embeddings Only
Extract only embedded objects (OLE, linked files, external resources).
Best for: Recovering embedded files, object extraction
Mode 5: Simple + Images
Combine basic extraction with image files in a single operation.
Best for: Quick extraction with visual assets
Mode 6: Simple + Embeddings
Combine basic extraction with embedded objects.
Best for: Core data plus external resources
Mode 7: Detailed + Images
Full extraction plus all image and media files.
Best for: Full archival with visual content
Mode 8: Detailed + Embeddings
Complete extraction with all embedded objects and external resources.
Best for: Full reconstruction, complete archival
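The eight modes above combine a detail level (none, simple, or detailed) with an optional asset type (images or embeddings). The sketch below models that grid so a script can pick a mode from two decisions. The `ExtractionMode` enum and `choose_mode` helper are hypothetical names for illustration only; they are not part of the PPTX Studio API, and the strings the API accepts for its `mode` parameter may differ.

```python
from enum import Enum


class ExtractionMode(Enum):
    """The eight documented modes as (detail level, extra assets) pairs."""
    DETAILED = ("detailed", None)
    SIMPLE = ("simple", None)
    IMAGES_ONLY = (None, "images")
    EMBEDDINGS_ONLY = (None, "embeddings")
    SIMPLE_IMAGES = ("simple", "images")
    SIMPLE_EMBEDDINGS = ("simple", "embeddings")
    DETAILED_IMAGES = ("detailed", "images")
    DETAILED_EMBEDDINGS = ("detailed", "embeddings")


def choose_mode(detail, assets=None):
    """Return the mode matching a detail level and an optional asset type.

    detail: "detailed", "simple", or None (no structured data wanted)
    assets: "images", "embeddings", or None (no extra files wanted)
    """
    for mode in ExtractionMode:
        if mode.value == (detail, assets):
            return mode
    raise ValueError(
        f"No extraction mode combines detail={detail!r} with assets={assets!r}"
    )
```

For example, `choose_mode("simple", "images")` returns `ExtractionMode.SIMPLE_IMAGES`, which corresponds to Mode 5.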
API Usage Examples
Learn how to extract content from PPTX presentations using the API in your preferred programming language
Extract Presentation Examples
cURL
# Extract with detailed mode, organized by content type
curl -X POST https://powerfile.io/pptx/api/extract \
  -H "Authorization: YOUR_API_TOKEN" \
  -F "file=@presentation.pptx" \
  -F "mode=detailed" \
  -F "file_format=by_type" \
  -F "extract_media=true"

# Output structure: extracted_data/
# ├── charts/
# │   ├── chart_metadata.json
# │   ├── chart_1.xlsx
# │   └── chart_2.xlsx
# ├── tables/
# │   ├── tables_metadata.json
# │   └── table_1.xlsx
# ├── text/
# │   ├── text_content.json
# │   └── slides.txt
# └── media/
#     ├── images/
#     │   ├── image_1.png
#     │   └── image_2.jpg
#     └── videos/
#         └── video_1.mp4
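Once the output has been downloaded and unpacked, a quick check of what landed in each folder can be useful. The sketch below assumes the by-type layout shown above (an `extracted_data/` folder with `charts/`, `tables/`, `text/`, and `media/` subfolders) exists on local disk; `summarize_output` is a hypothetical helper, not part of the API.

```python
from collections import Counter
from pathlib import Path


def summarize_output(root):
    """Count extracted files per top-level folder (e.g. charts, tables, text, media)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Group by the first path component below root, e.g. "charts" or "media".
            # A file placed directly in root is counted under its own name.
            counts[path.relative_to(root).parts[0]] += 1
    return dict(counts)
```

Calling `summarize_output("extracted_data")` on the example tree would group the two images and the video together under `media`.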
Python
import requests

# Configure API endpoint and authentication
url = "https://powerfile.io/pptx/api/extract"
headers = {"Authorization": "YOUR_API_TOKEN"}

# Open and upload the PPTX file (the request must be sent while the file is open)
with open("presentation.pptx", "rb") as f:
    files = {"file": ("presentation.pptx", f, "application/vnd.openxmlformats-officedocument.presentationml.presentation")}
    data = {
        "mode": "detailed",            # detailed, simple, images, etc.
        "file_format": "json",         # json, yaml, toml, csv
        "extract_media": "true",       # Extract images/videos
        "consolidate_charts": "false"  # Consolidate Excel files
    }
    response = requests.post(url, headers=headers, files=files, data=data)

if response.status_code == 200:
    result = response.json()
    print("Extraction successful!")
    print(f"Output directory: {result['output_dir']}")
    print(f"Charts extracted: {result['charts_count']}")
    print(f"Tables extracted: {result['tables_count']}")
    print(f"Media files: {result['media_count']}")
else:
    print(f"Error: {response.status_code}")
    # Print the raw body, since an error response may not be valid JSON
    print(response.text)
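The examples here read `output_dir`, `charts_count`, `tables_count`, and `media_count` from the response. Whether every field is present for every mode is not stated, so the sketch below reads them defensively instead of indexing directly. `summarize_result` is a hypothetical helper, and the fallback values are assumptions chosen for illustration.

```python
def summarize_result(result):
    """Build a one-line summary from the response fields used in the examples.

    Missing count fields are reported as 0 instead of raising KeyError.
    """
    charts = result.get("charts_count", 0)
    tables = result.get("tables_count", 0)
    media = result.get("media_count", 0)
    output_dir = result.get("output_dir", "(unknown)")
    return f"{output_dir}: {charts} charts, {tables} tables, {media} media files"
```

In the Python example, `print(summarize_result(response.json()))` could stand in for the individual `print` calls.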
JavaScript (Node.js)
const fs = require('fs');
const FormData = require('form-data');
const axios = require('axios');

// Configure API endpoint and authentication
const url = 'https://powerfile.io/pptx/api/extract';
const apiKey = 'YOUR_API_TOKEN';

// Create form data with file and parameters
const form = new FormData();
form.append('file', fs.createReadStream('presentation.pptx'));
form.append('mode', 'detailed');             // detailed, simple, images, etc.
form.append('file_format', 'json');          // json, yaml, toml, csv
form.append('extract_media', 'true');        // Extract images/videos
form.append('consolidate_charts', 'false');  // Consolidate Excel files

// Make the API request
axios.post(url, form, {
  headers: {
    ...form.getHeaders(),
    'Authorization': apiKey
  }
}).then(response => {
  console.log('Extraction successful!');
  console.log('Output directory:', response.data.output_dir);
  console.log('Charts extracted:', response.data.charts_count);
  console.log('Tables extracted:', response.data.tables_count);
  console.log('Media files:', response.data.media_count);
}).catch(error => {
  console.error('Error:', error.response?.data || error.message);
});
Java
import java.io.File;
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class ExtractExample {
    public static void main(String[] args) {
        // Configure API endpoint and authentication
        HttpPost uploadFile = new HttpPost("https://powerfile.io/pptx/api/extract");
        uploadFile.setHeader("Authorization", "YOUR_API_TOKEN");

        // Build multipart form data
        MultipartEntityBuilder builder = MultipartEntityBuilder.create();
        builder.addBinaryBody(
            "file",
            new File("presentation.pptx"),
            ContentType.create("application/vnd.openxmlformats-officedocument.presentationml.presentation"),
            "presentation.pptx"
        );
        builder.addTextBody("mode", "detailed");             // detailed, simple, images
        builder.addTextBody("file_format", "json");          // json, yaml, toml, csv
        builder.addTextBody("extract_media", "true");        // Extract images/videos
        builder.addTextBody("consolidate_charts", "false");  // Consolidate Excel
        HttpEntity multipart = builder.build();
        uploadFile.setEntity(multipart);

        // Execute request; the client and the response are closed automatically
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(uploadFile)) {
            String responseBody = EntityUtils.toString(response.getEntity());
            if (response.getStatusLine().getStatusCode() == 200) {
                // Parse only on success, since an error body may not be JSON
                JsonObject result = JsonParser.parseString(responseBody).getAsJsonObject();
                System.out.println("Extraction successful!");
                System.out.println("Output: " + result.get("output_dir").getAsString());
                System.out.println("Charts: " + result.get("charts_count").getAsInt());
                System.out.println("Tables: " + result.get("tables_count").getAsInt());
            } else {
                System.err.println("Error: " + response.getStatusLine());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
C#
using System;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Configure API endpoint and authentication
var client = new HttpClient();
client.DefaultRequestHeaders.Add("Authorization", "YOUR_API_TOKEN");

// Create multipart form content
var content = new MultipartFormDataContent();
using (var fileStream = File.OpenRead("presentation.pptx"))
{
    var streamContent = new StreamContent(fileStream);
    streamContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue(
        "application/vnd.openxmlformats-officedocument.presentationml.presentation"
    );
    content.Add(streamContent, "file", "presentation.pptx");
    content.Add(new StringContent("detailed"), "mode");           // detailed, simple, images
    content.Add(new StringContent("json"), "file_format");        // json, yaml, toml, csv
    content.Add(new StringContent("true"), "extract_media");      // Extract images/videos
    content.Add(new StringContent("false"), "consolidate_charts"); // Consolidate Excel

    // Make the API request
    var response = await client.PostAsync("https://powerfile.io/pptx/api/extract", content);
    var responseBody = await response.Content.ReadAsStringAsync();
    if (response.IsSuccessStatusCode)
    {
        // Deserialize into a JsonElement so properties can be read by name
        var result = JsonSerializer.Deserialize<JsonElement>(responseBody);
        Console.WriteLine("Extraction successful!");
        Console.WriteLine($"Output directory: {result.GetProperty("output_dir").GetString()}");
        Console.WriteLine($"Charts: {result.GetProperty("charts_count").GetInt32()}");
        Console.WriteLine($"Tables: {result.GetProperty("tables_count").GetInt32()}");
        Console.WriteLine($"Media: {result.GetProperty("media_count").GetInt32()}");
    }
    else
    {
        Console.WriteLine($"Error: {response.StatusCode}");
        Console.WriteLine(responseBody);
    }
}
Go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	// Open the PPTX file
	file, err := os.Open("presentation.pptx")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Create multipart form data
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile("file", "presentation.pptx")
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}

	// Add form fields
	writer.WriteField("mode", "detailed")            // detailed, simple, images
	writer.WriteField("file_format", "json")         // json, yaml, toml, csv
	writer.WriteField("extract_media", "true")       // Extract images/videos
	writer.WriteField("consolidate_charts", "false") // Consolidate Excel
	if err := writer.Close(); err != nil {
		panic(err)
	}

	// Create and configure request
	req, err := http.NewRequest("POST", "https://powerfile.io/pptx/api/extract", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "YOUR_API_TOKEN")
	req.Header.Set("Content-Type", writer.FormDataContentType())

	// Execute request
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Check the status first, since an error body may not be JSON
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("Error: %d\n", resp.StatusCode)
		return
	}

	// Parse response
	var result map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		panic(err)
	}
	fmt.Println("Extraction successful!")
	fmt.Println("Output:", result["output_dir"])
	fmt.Println("Charts:", result["charts_count"])
	fmt.Println("Tables:", result["tables_count"])
	fmt.Println("Media:", result["media_count"])
}
PPTX Studio Examples
Visual interface for interactive extraction
Getting Started in PPTX Studio Extraction
- Navigate to the Extraction tab in PPTX Studio
- Upload PPTX file:
- Click or drag and drop a .pptx or .ppt file (Max 50MB)
- Supports single or multiple file uploads
- Select an Extraction Mode:
- Detailed - Full extraction with all properties
- Simple - Basic extraction with minimal information
- Images Only - Extract only images and media
- Embeddings - Extract only embedded objects
- Simple + Images - Basic data plus images
- Simple + Embeddings - Basic data plus embeddings
- Detailed + Images - Full data plus images
- Detailed + Embeddings - Full data plus embeddings
- Apply Content Filters (optional):
- None (All) - Extract all content types (default)
- Charts - Extract only charts and graphs
- Tables - Extract only tabular data
- Pictures - Extract only images
- Text Boxes - Extract only text boxes
- Placeholders - Extract only placeholder shapes
- Freeform - Extract only freeform shapes
- Embedded Objects - Extract only OLE objects
- Choose Output Format: JSON, YAML, TOML, or CSV (the format affects file structure and readability)
- Additional Options (optional):
- Extract Media - Include images, videos, and embedded files
- Consolidate Charts - Combine all chart Excel files into a single workbook
- Click Extract: Process your file with selected options
- Download Results: Individual files, summary, or complete ZIP archive
- View API Examples: Switch to the API tab to see ready-to-use code in your preferred language
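When scripting against the API instead of using the Studio, the same choices can be checked locally before uploading. The sketch below is written under assumptions: it applies the Studio's stated limits (.pptx or .ppt, 50 MB maximum) to API uploads, which this page does not confirm, and it treats 1 MB as 1024 * 1024 bytes. `build_form_fields` is a hypothetical helper; the field names match the API examples above, and content filters are left out because their parameter name is not shown here.

```python
import os

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # "Max 50MB" from the upload step (assumed binary MB)
ALLOWED_EXTENSIONS = {".pptx", ".ppt"}
OUTPUT_FORMATS = {"json", "yaml", "toml", "csv"}


def build_form_fields(path, mode="detailed", file_format="json",
                      extract_media=False, consolidate_charts=False):
    """Validate a local file and return the form fields used in the API examples."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the 50 MB upload limit")
    if file_format not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {file_format}")
    return {
        "mode": mode,
        "file_format": file_format,
        # The examples send booleans as the strings "true" and "false"
        "extract_media": "true" if extract_media else "false",
        "consolidate_charts": "true" if consolidate_charts else "false",
    }
```

The returned dictionary can be passed as the `data` argument of `requests.post` in the Python example.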
Studio Interface Components
- File Upload Section: Drag-and-drop or click to upload presentation files (.pptx, .ppt)
- Extraction Mode Selector: 8 mode buttons for different extraction scenarios
- Content Filter Section: Toggle buttons to filter by shape/object types
- Output Format Buttons: Choose between JSON, YAML, TOML, or CSV formats
- Additional Options: Checkboxes for media extraction and chart consolidation
- API Code Examples Tab: Live code examples in cURL, Python, JavaScript, Java, C#, and Go
- Real-time API Updates: Code examples automatically update as you modify settings
Tips & Tricks
For Chart Extraction
- Extraction includes series names and colors
- Chart data exports to Excel for easy analysis
- Use JSON format for programmatic processing
- Color mapping preserves theme information
For Image Extraction
- Images keep their original quality
- SVG format preserves scalability
- Batch download saves time
- File names preserve original metadata
For Media Extraction
- Videos maintain original formats
- Separate folders for different media types
- Check file size before batch download
- Embedded objects are extracted separately
⥠Performance Tips
- Large files may take longer to process
- Use content filters to extract only what you need and speed up processing
- Skip media extraction when you don't need media files
- Batch operations are more efficient
Next Steps
What you can do with extracted data
After Extraction, You Can:
- Analyze Data: Use extracted charts and tables for data analysis
- Generate New Presentations: Create presentations from extracted data using our Generate feature
- Archive Content: Store extracted images and media in your organization's archive
- Share Data: Export extracted data in various formats for team sharing