📊 Extract: PPTX Content Extraction

Extraction tools for analyzing PPTX presentations and pulling out their charts, tables, text, images, and media. Suited to data analysis, content migration, and automation workflows.

â„šī¸ Overview

Understanding PPTX extraction capabilities

What is PPTX Extraction?

Extraction lets you analyze PPTX presentations and pull their content out as structured data. PPTX Studio provides two interfaces:

REST API

For developers to programmatically extract data

  • Automated extraction workflows
  • Callable from any language with an HTTP client
  • System integration ready
  • Batch processing capability
PPTX Studio

Visual interface for interactive extraction

  • Drag-and-drop file upload
  • Interactive data preview
  • One-click downloads
  • No coding required
Key Capabilities
  • 8 Extraction Modes: Flexible options from simple to detailed with media/embedding combinations
  • Content Filtering: 8 filter options to extract specific content (charts, tables, pictures, text boxes, shapes, etc.)
  • Multiple Output Formats: JSON, YAML, TOML, CSV for different use cases
  • Media Extraction: Optional extraction of images, videos, and embedded files
  • Chart Consolidation: Combine multiple chart Excel files into a single workbook
  • Batch Processing: Extract multiple content types in one operation
  • API Parity: All Studio features are available through the REST API, with code examples in 6 languages

⭐ Extraction Features

📈 Chart Extraction

Chart data extraction, including series colors and theme color mapping

  • Chart types: Column, Pie, Line, Scatter, Stock
  • Series and category data
  • Color mapping and themes
  • Excel format with original data
  • JSON format for programmatic use
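The JSON chart output pairs category labels with per-series values. Its exact schema isn't shown in this section, so the following minimal sketch assumes a hypothetical record with a `categories` list and `series` entries carrying `name` and `values`, and pivots one chart into CSV rows, one row per category. Rename the keys to match the real `chart_metadata.json`.

```python
import csv
import io

# Hypothetical chart record; the real chart_metadata.json schema may differ.
chart = {
    "title": "Quarterly Revenue",
    "categories": ["Q1", "Q2", "Q3"],
    "series": [
        {"name": "2023", "values": [10, 12, 15]},
        {"name": "2024", "values": [11, 14, 18]},
    ],
}


def chart_to_rows(chart: dict) -> list[list]:
    """Pivot a chart record into a header row plus one row per category."""
    header = ["category"] + [s["name"] for s in chart["series"]]
    rows = [header]
    for i, category in enumerate(chart["categories"]):
        # A series with fewer points than categories yields None instead of IndexError
        rows.append([category] + [
            s["values"][i] if i < len(s["values"]) else None
            for s in chart["series"]
        ])
    return rows


def rows_to_csv(rows: list[list]) -> str:
    """Serialize rows as CSV text; None is written as an empty cell."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


print(rows_to_csv(chart_to_rows(chart)))
```

Padding short series with empty cells keeps every row the same width, which spreadsheet tools expect.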
📋 Table Extraction

Preserve formatting and structure of tables

  • Cell data and formatting
  • Merged cells handling
  • Color and font preservation
  • Excel export capability
  • CSV format support
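Merged cells are the usual obstacle when flattening a table into CSV. One common approach is to copy a merged cell's value into every grid position it covers. The sketch below assumes a hypothetical cell list with `row`, `col`, `row_span`, `col_span`, and `text` fields; the real table JSON may name or structure these differently.

```python
# Hypothetical table record; field names are assumptions, not the documented schema.
cells = [
    {"row": 0, "col": 0, "row_span": 1, "col_span": 2, "text": "Region"},
    {"row": 0, "col": 2, "row_span": 1, "col_span": 1, "text": "Total"},
    {"row": 1, "col": 0, "row_span": 2, "col_span": 1, "text": "EMEA"},
    {"row": 1, "col": 1, "row_span": 1, "col_span": 1, "text": "UK"},
    {"row": 1, "col": 2, "row_span": 1, "col_span": 1, "text": "40"},
    {"row": 2, "col": 1, "row_span": 1, "col_span": 1, "text": "DE"},
    {"row": 2, "col": 2, "row_span": 1, "col_span": 1, "text": "55"},
]


def expand_merged(cells: list[dict]) -> list[list[str]]:
    """Build a rectangular grid, repeating each merged cell's text across its span."""
    n_rows = max(c["row"] + c.get("row_span", 1) for c in cells)
    n_cols = max(c["col"] + c.get("col_span", 1) for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        # Fill every position covered by this cell's row and column span
        for r in range(c["row"], c["row"] + c.get("row_span", 1)):
            for k in range(c["col"], c["col"] + c.get("col_span", 1)):
                grid[r][k] = c["text"]
    return grid


for row in expand_merged(cells):
    print(row)
```

Repeating the value keeps each row self-describing for filtering and pivoting. To round-trip back to a merged layout, fill only the top-left cell of each span and leave the rest blank instead.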
🖼️ Image Extraction

Extract all visual elements from presentations

  • Multiple formats: PNG, JPG, SVG
  • Embedded and inserted images
  • Chart images with high quality
  • Shape renderings
  • Batch download capability
📹 Media Extraction

Download embedded videos, audio, and objects

  • Video formats: MP4, AVI, MOV, WMV
  • Audio formats: MP3, WAV, AAC
  • Embedded documents: Excel, Word
  • Object preservation
  • Organized folder structure
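A .pptx file is a ZIP package in which inserted images, audio, and video are normally stored under `ppt/media/` and embedded objects (such as Excel or Word files) under `ppt/embeddings/`. As a local cross-check that is independent of the PowerFile API, the standard library can list those parts and their sizes before you run Media Extraction. This is a sketch assuming a standard package layout; linked media stored outside the file won't appear.

```python
import io
import zipfile
from collections import defaultdict


def inventory_parts(pptx_bytes: bytes) -> dict[str, list[tuple[str, int]]]:
    """Group media and embedded-object parts of a PPTX package by folder.

    Returns e.g. {"media": [(name, size), ...], "embeddings": [...]}; sizes are
    uncompressed byte counts. Folders with no parts are omitted.
    """
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as zf:
        for info in zf.infolist():
            for folder in ("media", "embeddings"):
                if info.filename.startswith(f"ppt/{folder}/") and not info.is_dir():
                    name = info.filename.rsplit("/", 1)[-1]
                    groups[folder].append((name, info.file_size))
    return dict(groups)


# Usage with a real file:
# with open("presentation.pptx", "rb") as f:
#     print(inventory_parts(f.read()))
```

Because the sizes are available without extracting anything, this is also a quick way to act on the "check file size before batch download" tip.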
📄 Text Extraction

Get all text content organized by slide

  • Slide titles and body text
  • Shape and textbox content
  • Formatting preservation
  • Plain text or formatted output
  • Slide-by-slide organization
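Text Extraction returns text slide by slide. To show where that text lives in the file, here is a simplified local sketch (not the API's implementation): DrawingML text runs are `<a:t>` elements grouped into `<a:p>` paragraphs, and slide order comes from `p:sldIdLst` in `ppt/presentation.xml`, not from the `slideN.xml` file names, which can drift out of order after slides are moved. It assumes standard Office Open XML part names and skips chart text, SmartArt, and speaker notes.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "rel": "http://schemas.openxmlformats.org/package/2006/relationships",
}
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"  # DrawingML text namespace
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"


def slide_texts(pptx_bytes: bytes) -> list[list[str]]:
    """Return the non-empty paragraphs of each slide, in presentation order."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as zf:
        pres = ET.fromstring(zf.read("ppt/presentation.xml"))
        rels = ET.fromstring(zf.read("ppt/_rels/presentation.xml.rels"))
        # Map relationship IDs to part paths; relative targets resolve against ppt/
        targets = {}
        for rel in rels.findall("rel:Relationship", NS):
            target = rel.get("Target")
            if target.startswith("/"):
                targets[rel.get("Id")] = target.lstrip("/")
            else:
                targets[rel.get("Id")] = posixpath.normpath(posixpath.join("ppt", target))
        slides = []
        # p:sldIdLst defines display order, which may differ from slideN.xml numbering
        for sld_id in pres.findall("p:sldIdLst/p:sldId", NS):
            root = ET.fromstring(zf.read(targets[sld_id.get(R_ID)]))
            paragraphs = []
            for para in root.iter(A + "p"):
                text = "".join(t.text or "" for t in para.iter(A + "t"))
                if text:
                    paragraphs.append(text)
            slides.append(paragraphs)
        return slides


# Usage with a real file:
# with open("presentation.pptx", "rb") as f:
#     for number, paragraphs in enumerate(slide_texts(f.read()), start=1):
#         print(number, paragraphs)
```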
📊 Metadata

Extract presentation properties and statistics

  • Author and creation date
  • Slide count and sizes
  • Master slide information
  • Theme and color details
  • Statistics and analytics
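Several of these properties live in standard Office Open XML parts: title, author, and dates in `docProps/core.xml`, and the slide list and slide size in `ppt/presentation.xml`, where lengths are stored in EMUs (914,400 per inch). A minimal local sketch for cross-checking the Metadata output, assuming a standard package layout; it is not how the API computes these values.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}
EMU_PER_INCH = 914400  # OOXML stores lengths in English Metric Units


def read_metadata(pptx_bytes: bytes) -> dict:
    """Read title, author, dates, slide count, and slide size from a PPTX package."""
    meta = {}
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as zf:
        if "docProps/core.xml" in zf.namelist():
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for key, path in [("title", "dc:title"), ("author", "dc:creator"),
                              ("created", "dcterms:created"),
                              ("modified", "dcterms:modified")]:
                # findtext returns None when the element is absent
                meta[key] = core.findtext(path, default=None, namespaces=NS)
        pres = ET.fromstring(zf.read("ppt/presentation.xml"))
        meta["slide_count"] = len(pres.findall("p:sldIdLst/p:sldId", NS))
        size = pres.find("p:sldSz", NS)
        if size is not None:
            meta["slide_size_in"] = (int(size.get("cx")) / EMU_PER_INCH,
                                     int(size.get("cy")) / EMU_PER_INCH)
    return meta


# Usage with a real file:
# with open("presentation.pptx", "rb") as f:
#     print(read_metadata(f.read()))
```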

🔧 Extraction Modes

Choose how much data to extract based on your needs

Mode 1: Detailed

Full extraction with complete object properties, including all shapes, text, metadata, and styling information.

Output: Complete structured data with all properties
Best for: Complete content analysis, data migration, full reconstruction
Mode 2: Simple

Basic extraction with minimal information: only the essential data, without extended properties.

Output: Essential data only
Best for: Quick analysis, reducing file size, core data only
Mode 3: Images Only

Extract only images and media files from the presentation, skipping all text and shape data.

Output: Images and media metadata
Best for: Media library creation, image extraction, thumbnail collection
Mode 4: Embeddings Only

Extract only embedded objects (OLE, linked files, external resources).

Output: Embedded objects only
Best for: Recovering embedded files, object extraction
Mode 5: Simple + Images

Combine basic extraction with image files in a single operation.

Output: Essential data + media files
Best for: Quick extraction with visual assets
Mode 6: Simple + Embeddings

Combine basic extraction with embedded objects.

Output: Essential data + embedded objects
Best for: Core data plus external resources
Mode 7: Detailed + Images

Full extraction plus all image and media files.

Output: Complete data + all media files
Best for: Full archival with visual content
Mode 8: Detailed + Embeddings

Complete extraction with all embedded objects and external resources.

Output: Complete data + all embedded objects
Best for: Full reconstruction, complete archival
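The eight modes combine two independent choices: how much structured data to return (detailed, simple, or none) and which extra assets to include (images, embeddings, or none). No listed mode returns both images and embeddings. The hypothetical helper below simply encodes that table; it returns the mode names as labeled in this document and in the Studio, which may differ from the strings the API's `mode` field expects.

```python
from typing import Optional

# Mode names as listed above; these are display labels, not confirmed API values.
MODES = {
    # (structured data level, extra assets) -> mode name
    ("detailed", None): "Detailed",
    ("simple", None): "Simple",
    (None, "images"): "Images Only",
    (None, "embeddings"): "Embeddings Only",
    ("simple", "images"): "Simple + Images",
    ("simple", "embeddings"): "Simple + Embeddings",
    ("detailed", "images"): "Detailed + Images",
    ("detailed", "embeddings"): "Detailed + Embeddings",
}


def choose_mode(data: Optional[str], assets: Optional[str]) -> str:
    """Pick a mode from a data level ('detailed', 'simple', or None) and an asset
    type ('images', 'embeddings', or None). Raises ValueError for unsupported pairs."""
    try:
        return MODES[(data, assets)]
    except KeyError:
        raise ValueError(
            f"No extraction mode covers data={data!r}, assets={assets!r}"
        ) from None


print(choose_mode("simple", "images"))  # Simple + Images
```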

💻 API Usage Examples

Learn how to extract content from PPTX presentations using the API in your preferred programming language

Extract Presentation Examples

cURL

# Extract with detailed mode and media; output is organized into folders by content type
curl -X POST https://powerfile.io/pptx/api/extract \
  -H "Authorization: YOUR_API_TOKEN" \
  -F "file=@presentation.pptx" \
  -F "mode=detailed" \
  -F "file_format=json" \
  -F "extract_media=true"

# Output structure: extracted_data/
#   ├── charts/
#   │   ├── chart_metadata.json
#   │   ├── chart_1.xlsx
#   │   └── chart_2.xlsx
#   ├── tables/
#   │   ├── tables_metadata.json
#   │   └── table_1.xlsx
#   ├── text/
#   │   ├── text_content.json
#   │   └── slides.txt
#   └── media/
#       ├── images/
#       │   ├── image_1.png
#       │   └── image_2.jpg
#       └── videos/
#           └── video_1.mp4

Python

import requests

# Configure API endpoint and authentication
url = "https://powerfile.io/pptx/api/extract"
headers = {"Authorization": "YOUR_API_TOKEN"}
pptx_mime = "application/vnd.openxmlformats-officedocument.presentationml.presentation"

# Open and upload the PPTX file
with open("presentation.pptx", "rb") as f:
    files = {"file": ("presentation.pptx", f, pptx_mime)}
    data = {
        "mode": "detailed",             # detailed, simple, images, etc.
        "file_format": "json",          # json, yaml, toml, csv
        "extract_media": "true",        # Extract images/videos
        "consolidate_charts": "false",  # Combine chart Excel files into one workbook
    }

    # Large presentations can take a while; the timeout stops the call hanging forever
    response = requests.post(url, headers=headers, files=files, data=data, timeout=300)

if response.status_code == 200:
    result = response.json()
    print("✅ Extraction successful!")
    print(f"Output directory: {result['output_dir']}")
    print(f"Charts extracted: {result['charts_count']}")
    print(f"Tables extracted: {result['tables_count']}")
    print(f"Media files: {result['media_count']}")
else:
    print(f"❌ Error: {response.status_code}")
    # Error bodies are not guaranteed to be JSON, so print the raw text
    print(response.text)

JavaScript (Node.js)

const fs = require('fs');
const FormData = require('form-data');
const axios = require('axios');

// Configure API endpoint and authentication
const url = 'https://powerfile.io/pptx/api/extract';
const apiToken = 'YOUR_API_TOKEN';

// Create form data with file and parameters
const form = new FormData();
form.append('file', fs.createReadStream('presentation.pptx'));
form.append('mode', 'detailed');             // detailed, simple, images, etc.
form.append('file_format', 'json');          // json, yaml, toml, csv
form.append('extract_media', 'true');        // Extract images/videos
form.append('consolidate_charts', 'false');  // Combine chart Excel files into one workbook

// Make the API request
axios.post(url, form, {
  headers: {
    ...form.getHeaders(),
    'Authorization': apiToken
  },
  // Some axios versions cap request bodies in Node; lift the cap for large presentations
  maxBodyLength: Infinity
}).then(response => {
  console.log('✅ Extraction successful!');
  console.log('Output directory:', response.data.output_dir);
  console.log('Charts extracted:', response.data.charts_count);
  console.log('Tables extracted:', response.data.tables_count);
  console.log('Media files:', response.data.media_count);
}).catch(error => {
  console.error('❌ Error:', error.response?.data || error.message);
});

Java

import java.io.File;
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class ExtractPresentation {
    public static void main(String[] args) {
        // Configure API endpoint and authentication
        HttpPost uploadFile = new HttpPost("https://powerfile.io/pptx/api/extract");
        uploadFile.setHeader("Authorization", "YOUR_API_TOKEN");

        // Build multipart form data
        MultipartEntityBuilder builder = MultipartEntityBuilder.create();
        builder.addBinaryBody(
            "file",
            new File("presentation.pptx"),
            ContentType.create("application/vnd.openxmlformats-officedocument.presentationml.presentation"),
            "presentation.pptx"
        );
        builder.addTextBody("mode", "detailed");             // detailed, simple, images, etc.
        builder.addTextBody("file_format", "json");          // json, yaml, toml, csv
        builder.addTextBody("extract_media", "true");        // Extract images/videos
        builder.addTextBody("consolidate_charts", "false");  // Combine chart Excel files

        HttpEntity multipart = builder.build();
        uploadFile.setEntity(multipart);

        // Execute request; try-with-resources closes both the client and the response
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(uploadFile)) {
            String responseBody = EntityUtils.toString(response.getEntity());

            if (response.getStatusLine().getStatusCode() == 200) {
                // Parse JSON only on success; error bodies may not be JSON
                JsonObject result = JsonParser.parseString(responseBody).getAsJsonObject();
                System.out.println("✅ Extraction successful!");
                System.out.println("Output: " + result.get("output_dir").getAsString());
                System.out.println("Charts: " + result.get("charts_count").getAsInt());
                System.out.println("Tables: " + result.get("tables_count").getAsInt());
                System.out.println("Media: " + result.get("media_count").getAsInt());
            } else {
                System.err.println("❌ Error: " + response.getStatusLine());
                System.err.println(responseBody);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

C#

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;

// Top-level program (C# 9+). Configure API endpoint and authentication
using var client = new HttpClient();
// TryAddWithoutValidation sends the token exactly as given
client.DefaultRequestHeaders.TryAddWithoutValidation("Authorization", "YOUR_API_TOKEN");

// Create multipart form content; disposing it also disposes the file stream
using var content = new MultipartFormDataContent();
var streamContent = new StreamContent(File.OpenRead("presentation.pptx"));
streamContent.Headers.ContentType = new MediaTypeHeaderValue(
    "application/vnd.openxmlformats-officedocument.presentationml.presentation"
);
content.Add(streamContent, "file", "presentation.pptx");
content.Add(new StringContent("detailed"), "mode");             // detailed, simple, images, etc.
content.Add(new StringContent("json"), "file_format");          // json, yaml, toml, csv
content.Add(new StringContent("true"), "extract_media");        // Extract images/videos
content.Add(new StringContent("false"), "consolidate_charts");  // Combine chart Excel files

// Make the API request
var response = await client.PostAsync("https://powerfile.io/pptx/api/extract", content);
var responseBody = await response.Content.ReadAsStringAsync();

if (response.IsSuccessStatusCode)
{
    // Deserialize into a JsonElement to read individual properties
    var result = JsonSerializer.Deserialize<JsonElement>(responseBody);
    var outputDir = result.GetProperty("output_dir").GetString();
    var charts = result.GetProperty("charts_count").GetInt32();
    var tables = result.GetProperty("tables_count").GetInt32();
    var media = result.GetProperty("media_count").GetInt32();
    Console.WriteLine("✅ Extraction successful!");
    Console.WriteLine($"Output directory: {outputDir}");
    Console.WriteLine($"Charts: {charts}");
    Console.WriteLine($"Tables: {tables}");
    Console.WriteLine($"Media: {media}");
}
else
{
    Console.WriteLine($"❌ Error: {response.StatusCode}");
    Console.WriteLine(responseBody);
}

Go

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	// Open the PPTX file
	file, err := os.Open("presentation.pptx")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Create multipart form data
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile("file", "presentation.pptx")
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}

	// Add form fields
	writer.WriteField("mode", "detailed")            // detailed, simple, images, etc.
	writer.WriteField("file_format", "json")         // json, yaml, toml, csv
	writer.WriteField("extract_media", "true")       // Extract images/videos
	writer.WriteField("consolidate_charts", "false") // Combine chart Excel files
	// Close writes the final boundary; the request body is incomplete without it
	if err := writer.Close(); err != nil {
		panic(err)
	}

	// Create and configure request
	req, err := http.NewRequest(http.MethodPost, "https://powerfile.io/pptx/api/extract", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "YOUR_API_TOKEN")
	req.Header.Set("Content-Type", writer.FormDataContentType())

	// Execute request
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Read the body first so error responses can be printed even if they are not JSON
	respBody, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("❌ Error: %d\n%s\n", resp.StatusCode, respBody)
		return
	}

	// Parse response
	var result map[string]interface{}
	if err := json.Unmarshal(respBody, &result); err != nil {
		panic(err)
	}
	fmt.Println("✅ Extraction successful!")
	fmt.Println("Output:", result["output_dir"])
	fmt.Println("Charts:", result["charts_count"])
	fmt.Println("Tables:", result["tables_count"])
	fmt.Println("Media:", result["media_count"])
}
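All six examples send the same four text fields. When calling the endpoint from your own code, a small helper keeps those values consistent. The sketch below is hypothetical (`build_extract_form` and `summarize` are not part of any SDK): it uses only the field names and response keys that appear in the examples, checks `file_format` against the four documented formats, and passes `mode` through unchecked because the full list of API mode strings isn't given here.

```python
# Documented values for file_format (JSON, YAML, TOML, CSV).
FILE_FORMATS = {"json", "yaml", "toml", "csv"}


def build_extract_form(mode: str = "detailed", file_format: str = "json",
                       extract_media: bool = False,
                       consolidate_charts: bool = False) -> dict[str, str]:
    """Build the multipart text fields used by the examples above."""
    fmt = file_format.lower()
    if fmt not in FILE_FORMATS:
        raise ValueError(f"file_format must be one of {sorted(FILE_FORMATS)}, got {file_format!r}")
    # Reject non-bools: the string "false" is truthy and would silently become "true"
    for name, value in (("extract_media", extract_media),
                        ("consolidate_charts", consolidate_charts)):
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool, got {value!r}")
    return {
        "mode": mode,
        "file_format": fmt,
        # Booleans are sent as lowercase strings, as in the examples
        "extract_media": "true" if extract_media else "false",
        "consolidate_charts": "true" if consolidate_charts else "false",
    }


def summarize(result: dict) -> str:
    """Format the counts from a successful response; missing keys show as 0."""
    return (f"{result.get('output_dir', '?')}: "
            f"{result.get('charts_count', 0)} charts, "
            f"{result.get('tables_count', 0)} tables, "
            f"{result.get('media_count', 0)} media files")


# Usage with requests, following the Python example above:
# resp = requests.post(url, headers=headers, files=files,
#                      data=build_extract_form(extract_media=True), timeout=300)
# print(summarize(resp.json()))
```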

🎨 PPTX Studio Examples

Visual interface for interactive extraction

Getting Started in PPTX Studio Extraction
  1. Open the Extraction tab in PPTX Studio
  2. Upload a presentation:
    • Click to browse, or drag and drop a .pptx or .ppt file (max 50MB)
    • Supports single or multiple file uploads
  3. Select an Extraction Mode:
    • Detailed - Full extraction with all properties
    • Simple - Basic extraction with minimal information
    • Images Only - Extract only images and media
    • Embeddings Only - Extract only embedded objects
    • Simple + Images - Basic data plus images
    • Simple + Embeddings - Basic data plus embeddings
    • Detailed + Images - Full data plus images
    • Detailed + Embeddings - Full data plus embeddings
  4. Apply Content Filters (optional):
    • None (All) - Extract all content types (default)
    • Charts - Extract only charts and graphs
    • Tables - Extract only tabular data
    • Pictures - Extract only images
    • Text Boxes - Extract only text boxes
    • Placeholders - Extract only placeholder shapes
    • Freeform - Extract only freeform shapes
    • Embedded Objects - Extract only OLE objects
  5. Choose Output Format: JSON, YAML, TOML, or CSV (affects file structure and readability)
  6. Additional Options (optional):
    • Extract Media - Include images, videos, and embedded files
    • Consolidate Charts - Combine all chart Excel files into a single workbook
  7. Click Extract: Process your file with selected options
  8. Download Results: Individual files, summary, or complete ZIP archive
  9. View API Examples: Switch to the API tab to see ready-to-use code in your preferred language
Studio Interface Components
  • File Upload Section: Drag and drop, or click to upload, presentation files (.pptx, .ppt)
  • Extraction Mode Selector: 8 mode buttons for different extraction scenarios
  • Content Filter Section: Toggle buttons to filter by shape/object types
  • Output Format Buttons: Choose between JSON, YAML, TOML, or CSV formats
  • Additional Options: Checkboxes for media extraction and chart consolidation
  • API Code Examples Tab: Live code examples in cURL, Python, JavaScript, Java, C#, and Go
  • Real-time API Updates: Code examples automatically update as you modify settings
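The API tab regenerates its code whenever a setting changes. A minimal sketch of that mapping (not the Studio's actual implementation) renders the same settings as the cURL command from the API examples, quoting each argument so paths with spaces survive a POSIX shell. `curl_command` is a hypothetical name, and content filters are left out because their API parameter name isn't given in this section.

```python
import shlex

API_URL = "https://powerfile.io/pptx/api/extract"


def curl_command(path: str, mode: str = "detailed", file_format: str = "json",
                 extract_media: bool = False, consolidate_charts: bool = False,
                 token: str = "YOUR_API_TOKEN") -> str:
    """Render extraction settings as a multi-line cURL command for POSIX shells."""
    fields = [
        f"file=@{path}",
        f"mode={mode}",
        f"file_format={file_format}",
        f"extract_media={str(extract_media).lower()}",
        f"consolidate_charts={str(consolidate_charts).lower()}",
    ]
    parts = ["curl", "-X", "POST", API_URL, "-H", f"Authorization: {token}"]
    for field in fields:
        parts += ["-F", field]
    # First line holds the command and URL; each option pair goes on its own line
    lines = [" ".join(shlex.quote(p) for p in parts[:4])]
    lines += [f"  {shlex.quote(flag)} {shlex.quote(value)}"
              for flag, value in zip(parts[4::2], parts[5::2])]
    return " \\\n".join(lines)


print(curl_command("my deck.pptx", extract_media=True))
```

Building the argument list first and quoting at the end keeps the shell escaping in one place, so settings never need to be escaped by hand.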

💡 Tips & Tricks

📈 For Chart Extraction
  • Extraction includes series names and colors
  • Chart data exports to Excel for easy analysis
  • Use JSON format for programmatic processing
  • Color mapping preserves theme information
đŸ–ŧī¸ For Image Extraction
  • Maintain original image quality
  • SVG format preserves scalability
  • Batch download saves time
  • File names preserve original metadata
📹 For Media Extraction
  • Videos maintain original formats
  • Separate folders for different media types
  • Check file size before batch download
  • Embedded objects are extracted separately
⚡ Performance Tips
  • Large files may take longer to process
  • Use content filters to extract only the content types you need
  • Leave media extraction off when you don't need the media files
  • Batch operations are more efficient
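Before a batch run, it helps to skip files that would be rejected and to send small files first so quick results arrive early. The sketch below is a hypothetical helper; it assumes the 50MB limit quoted for Studio uploads also applies to the API and treats it as 50 × 1024 × 1024 bytes, so confirm the actual limit before relying on it.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # Assumed upload limit (50MB, as stated for the Studio)
ALLOWED = {".pptx", ".ppt"}


def plan_batch(folder: str) -> tuple[list[Path], list[tuple[Path, str]]]:
    """Split a folder's files into uploadable presentations and skipped files with reasons."""
    ready, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() not in ALLOWED:
            skipped.append((path, "not a .pptx/.ppt file"))
        elif path.stat().st_size > MAX_BYTES:
            skipped.append((path, "larger than 50MB"))
        else:
            ready.append(path)
    # Smallest first, so fast results come back before large, slow files
    ready.sort(key=lambda p: p.stat().st_size)
    return ready, skipped


# Usage:
# ready, skipped = plan_batch("decks/")
# for path, reason in skipped:
#     print(f"Skipping {path.name}: {reason}")
```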

🚀 Next Steps

What you can do with extracted data

After Extraction, You Can:
  • Analyze Data: Use extracted charts and tables for data analysis
  • Generate New Presentations: Create presentations from extracted data using our Generate feature
  • Archive Content: Store extracted images and media in your organization's archive
  • Share Data: Export extracted data in various formats for team sharing
Learn More