PowerFile Web Portal

ℹ️ Overview

Understanding PPTX extraction capabilities

What is PPTX Extraction?

Extraction lets you analyze PPTX presentations and pull structured data out of them. Two interfaces are available:

REST API

For developers who extract data programmatically

Automated extraction workflows
Multiple language support
System integration ready
Batch processing capability

PPTX Studio

Visual interface for interactive extraction

Drag-and-drop file upload
Interactive data preview
One-click downloads
No coding required

Key Capabilities

8 Extraction Modes: Simple or detailed extraction, images only, embeddings only, and combinations of these
Content Filtering: 8 filter types to extract specific content (charts, tables, images, text, shapes, etc.)
Multiple Output Formats: JSON, YAML, TOML, CSV for different use cases
Media Extraction: Optional extraction of images, videos, and embedded files
Chart Consolidation: Combine multiple chart Excel files into a single workbook
Batch Processing: Extract multiple content types in one operation
API Parity: All studio features available via REST API with 6 language examples

⭐ Extraction Features

📈

Chart Extraction

Chart data extraction, including series colors and theme mapping

Chart types: Column, Pie, Line, Scatter, Stock
Series and category data
Color mapping and themes
Excel format with original data
JSON format for programmatic use

📋

Table Extraction

Preserve formatting and structure of tables

Cell data and formatting
Merged cells handling
Color and font preservation
Excel export capability
CSV format support

🖼️

Image Extraction

Extract all visual elements from presentations

Multiple formats: PNG, JPG, SVG
Embedded and inserted images
High-quality chart images
Shape renderings
Batch download capability

📹

Media Extraction

Download embedded videos, audio, and objects

Video formats: MP4, AVI, MOV, WMV
Audio formats: MP3, WAV, AAC
Embedded documents: Excel, Word
Object preservation
Organized folder structure

📄

Text Extraction

Get all text content organized by slide

Slide titles and body text
Shape and textbox content
Formatting preservation
Plain text or formatted output
Slide-by-slide organization

📊

Metadata

Extract presentation properties and statistics

Author and creation date
Slide count and sizes
Master slide information
Theme and color details
Statistics and analytics

🔧 Extraction Modes

Choose how much data to extract based on your needs

Mode 1: Detailed

Full extraction with complete object properties, including all shapes, text, metadata, and styling information.

Output: Complete structured data with all properties
Best for: Complete content analysis, data migration, full reconstruction

Mode 2: Simple

Basic extraction with minimal information - just the essential data without extended properties.

Output: Essential data only
Best for: Quick analysis, reducing file size, core data only

Mode 3: Images Only

Extract only images and media files from the presentation, skipping all text and shape data.

Output: Images and media metadata
Best for: Media library creation, image extraction, thumbnail collection

Mode 4: Embeddings Only

Extract only embedded objects (OLE, linked files, external resources).

Output: Embedded objects only
Best for: Recovering embedded files, object extraction

Mode 5: Simple + Images

Combine basic extraction with image files in a single operation.

Output: Essential data + media files
Best for: Quick extraction with visual assets

Mode 6: Simple + Embeddings

Combine basic extraction with embedded objects.

Output: Essential data + embedded objects
Best for: Core data plus external resources

Mode 7: Detailed + Images

Full extraction plus all image and media files.

Output: Complete data + all media files
Best for: Full archival with visual content

Mode 8: Detailed + Embeddings

Complete extraction with all embedded objects and external resources.

Output: Complete data + all embedded objects
Best for: Full reconstruction, complete archival
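
The eight modes above are combinations of a data level (detailed, simple, or none) with optional images or embedded objects. A minimal sketch of that mapping follows, written as a lookup helper. The ModeSpec type and the pick_mode function are hypothetical names used for illustration only; they are not part of the PowerFile API, and the strings the API accepts for its mode parameter may differ from the labels used here.

```python
from typing import NamedTuple


class ModeSpec(NamedTuple):
    number: int
    name: str
    data: str          # "detailed", "simple", or "none"
    images: bool
    embeddings: bool


# The eight modes as described in this section (illustrative layout only).
MODES = [
    ModeSpec(1, "Detailed", "detailed", False, False),
    ModeSpec(2, "Simple", "simple", False, False),
    ModeSpec(3, "Images Only", "none", True, False),
    ModeSpec(4, "Embeddings Only", "none", False, True),
    ModeSpec(5, "Simple + Images", "simple", True, False),
    ModeSpec(6, "Simple + Embeddings", "simple", False, True),
    ModeSpec(7, "Detailed + Images", "detailed", True, False),
    ModeSpec(8, "Detailed + Embeddings", "detailed", False, True),
]


def pick_mode(data: str, images: bool = False, embeddings: bool = False) -> ModeSpec:
    """Return the listed mode that matches the requested combination."""
    for spec in MODES:
        if (spec.data, spec.images, spec.embeddings) == (data, images, embeddings):
            return spec
    raise ValueError("no listed mode combines these options")
```

For example, pick_mode("simple", images=True) returns the entry for Mode 5. No listed mode combines images and embeddings in one pass, so that request raises ValueError.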

💻 API Usage Examples

Learn how to extract content from PPTX presentations using the API in your preferred programming language

Extract Presentation Examples

# Extract with detailed mode, organized by content type
curl -X POST https://powerfile.io/pptx/api/extract \
  -H "Authorization: YOUR_API_TOKEN" \
  -F "file=@presentation.pptx" \
  -F "mode=detailed" \
  -F "file_format=by_type" \
  -F "extract_media=true"

# Output structure: extracted_data/
#   ├── charts/
#   │   ├── chart_metadata.json
#   │   ├── chart_1.xlsx
#   │   └── chart_2.xlsx
#   ├── tables/
#   │   ├── tables_metadata.json
#   │   └── table_1.xlsx
#   ├── text/
#   │   ├── text_content.json
#   │   └── slides.txt
#   └── media/
#       ├── images/
#       │   ├── image_1.png
#       │   └── image_2.jpg
#       └── videos/
#           └── video_1.mp4
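
Once an output tree like the one above is available on local disk, it can be checked with a short script. The sketch below counts the files under each top-level folder. It assumes the output has already been downloaded or unpacked to a local directory with that layout; summarize_output is a hypothetical helper, not part of the API.

```python
from collections import Counter
from pathlib import Path


def summarize_output(root: str) -> dict:
    """Count files under each top-level folder of an extraction output tree."""
    base = Path(root)
    counts = Counter()
    for path in base.rglob("*"):
        if path.is_file():
            # First path component below the root, e.g. "charts" or "media".
            top = path.relative_to(base).parts[0]
            counts[top] += 1
    return dict(counts)
```

Files nested deeper, such as media/images/image_1.png, are counted under their top-level folder (media).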

import requests

# Configure API endpoint and authentication
url = "https://powerfile.io/pptx/api/extract"
headers = {"Authorization": "YOUR_API_TOKEN"}

# Open and upload the PPTX file
with open("presentation.pptx", "rb") as f:
    files = {"file": ("presentation.pptx", f, "application/vnd.openxmlformats-officedocument.presentationml.presentation")}
    data = {
        "mode": "detailed",              # detailed, simple, images, etc.
        "file_format": "json",           # json, yaml, toml, csv
        "extract_media": "true",         # Extract images/videos
        "consolidate_charts": "false"   # Consolidate Excel files
    }
    
    response = requests.post(url, headers=headers, files=files, data=data)
    
    if response.status_code == 200:
        result = response.json()
        print("✅ Extraction successful!")
        print(f"Output directory: {result['output_dir']}")
        print(f"Charts extracted: {result['charts_count']}")
        print(f"Tables extracted: {result['tables_count']}")
        print(f"Media files: {result['media_count']}")
    else:
        print(f"❌ Error: {response.status_code}")
        print(response.text)  # the error body may not be valid JSON
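
The form fields in the Python example are plain strings, so booleans have to be sent as "true" or "false". A small helper can build the dictionary and catch typos before the upload starts. This is a sketch under assumptions: build_extract_form is a hypothetical name, and the set of accepted file_format values is taken only from the examples on this page (json, yaml, toml, csv, plus by_type from the cURL example), so the real API may accept more or fewer.

```python
# Values seen in the examples on this page; the API's actual list may differ.
ALLOWED_FORMATS = {"json", "yaml", "toml", "csv", "by_type"}


def build_extract_form(mode="detailed", file_format="json",
                       extract_media=False, consolidate_charts=False):
    """Build the multipart form fields for an extract request."""
    if file_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported file_format: {file_format!r}")
    return {
        "mode": mode,
        "file_format": file_format,
        # The examples send booleans as lowercase strings.
        "extract_media": "true" if extract_media else "false",
        "consolidate_charts": "true" if consolidate_charts else "false",
    }
```

The returned dictionary can be passed as the data argument of requests.post, in place of the literal dictionary used in the example.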

const fs = require('fs');
const FormData = require('form-data');
const axios = require('axios');

// Configure API endpoint and authentication
const url = 'https://powerfile.io/pptx/api/extract';
const apiKey = 'YOUR_API_TOKEN';

// Create form data with file and parameters
const form = new FormData();
form.append('file', fs.createReadStream('presentation.pptx'));
form.append('mode', 'detailed');              // detailed, simple, images, etc.
form.append('file_format', 'json');           // json, yaml, toml, csv
form.append('extract_media', 'true');         // Extract images/videos
form.append('consolidate_charts', 'false');   // Consolidate Excel files

// Make the API request
axios.post(url, form, {
  headers: {
    ...form.getHeaders(),
    'Authorization': apiKey
  }
}).then(response => {
  console.log('✅ Extraction successful!');
  console.log('Output directory:', response.data.output_dir);
  console.log('Charts extracted:', response.data.charts_count);
  console.log('Tables extracted:', response.data.tables_count);
  console.log('Media files:', response.data.media_count);
}).catch(error => {
  console.error('❌ Error:', error.response?.data || error.message);
});

import java.io.File;
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

// Configure API endpoint and authentication
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpPost uploadFile = new HttpPost("https://powerfile.io/pptx/api/extract");
uploadFile.setHeader("Authorization", "YOUR_API_TOKEN");

// Build multipart form data
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addBinaryBody(
    "file", 
    new File("presentation.pptx"),
    ContentType.create("application/vnd.openxmlformats-officedocument.presentationml.presentation"),
    "presentation.pptx"
);
builder.addTextBody("mode", "detailed");              // detailed, simple, images
builder.addTextBody("file_format", "json");          // json, yaml, toml, csv
builder.addTextBody("extract_media", "true");        // Extract images/videos
builder.addTextBody("consolidate_charts", "false");  // Consolidate Excel

HttpEntity multipart = builder.build();
uploadFile.setEntity(multipart);

// Execute request
try (CloseableHttpResponse response = httpClient.execute(uploadFile)) {
    String responseBody = EntityUtils.toString(response.getEntity());
    JsonObject result = JsonParser.parseString(responseBody).getAsJsonObject();
    
    if (response.getStatusLine().getStatusCode() == 200) {
        System.out.println("✅ Extraction successful!");
        System.out.println("Output: " + result.get("output_dir").getAsString());
        System.out.println("Charts: " + result.get("charts_count").getAsInt());
        System.out.println("Tables: " + result.get("tables_count").getAsInt());
    } else {
        System.err.println("❌ Error: " + response.getStatusLine());
    }
} catch (IOException e) {
    e.printStackTrace();
}

using System;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Configure API endpoint and authentication
var client = new HttpClient();
client.DefaultRequestHeaders.Add("Authorization", "YOUR_API_TOKEN");

// Create multipart form content
var content = new MultipartFormDataContent();
using (var fileStream = File.OpenRead("presentation.pptx"))
{
    var streamContent = new StreamContent(fileStream);
    streamContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue(
        "application/vnd.openxmlformats-officedocument.presentationml.presentation"
    );
    content.Add(streamContent, "file", "presentation.pptx");
    content.Add(new StringContent("detailed"), "mode");               // detailed, simple, images
    content.Add(new StringContent("json"), "file_format");            // json, yaml, toml, csv
    content.Add(new StringContent("true"), "extract_media");          // Extract images/videos
    content.Add(new StringContent("false"), "consolidate_charts");   // Consolidate Excel

    // Make the API request
    var response = await client.PostAsync("https://powerfile.io/pptx/api/extract", content);
    var responseBody = await response.Content.ReadAsStringAsync();
    
    if (response.IsSuccessStatusCode)
    {
        // Parse into a JsonElement so individual properties can be read
        var result = JsonSerializer.Deserialize<JsonElement>(responseBody);
        Console.WriteLine("✅ Extraction successful!");
        Console.WriteLine("Output directory: " + result.GetProperty("output_dir").GetString());
        Console.WriteLine("Charts: " + result.GetProperty("charts_count").GetInt32());
        Console.WriteLine("Tables: " + result.GetProperty("tables_count").GetInt32());
        Console.WriteLine("Media: " + result.GetProperty("media_count").GetInt32());
    }
    else
    {
        Console.WriteLine($"❌ Error: {response.StatusCode}");
        Console.WriteLine(responseBody);
    }
}

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	// Open the PPTX file
	file, err := os.Open("presentation.pptx")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Create multipart form data
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile("file", "presentation.pptx")
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}
	
	// Add form fields
	writer.WriteField("mode", "detailed")              // detailed, simple, images
	writer.WriteField("file_format", "json")          // json, yaml, toml, csv
	writer.WriteField("extract_media", "true")        // Extract images/videos
	writer.WriteField("consolidate_charts", "false")  // Consolidate Excel
	writer.Close()

	// Create and configure request
	req, err := http.NewRequest("POST", "https://powerfile.io/pptx/api/extract", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "YOUR_API_TOKEN")
	req.Header.Set("Content-Type", writer.FormDataContentType())

	// Execute request
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Parse response
	var result map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&result)
	
	if resp.StatusCode == 200 {
		fmt.Println("✅ Extraction successful!")
		fmt.Println("Output:", result["output_dir"])
		fmt.Println("Charts:", result["charts_count"])
		fmt.Println("Tables:", result["tables_count"])
		fmt.Println("Media:", result["media_count"])
	} else {
		fmt.Printf("❌ Error: %d\n", resp.StatusCode)
	}
}
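
All of the examples above read the same response fields: output_dir, charts_count, tables_count, and media_count. The sketch below formats those fields without failing when one is missing. It assumes only the field names used in the examples; summarize_response is a hypothetical helper, and the full response schema is not documented on this page.

```python
def summarize_response(payload: dict) -> str:
    """Format the fields used in the examples, tolerating missing keys."""
    fields = [
        ("Output directory", "output_dir"),
        ("Charts", "charts_count"),
        ("Tables", "tables_count"),
        ("Media files", "media_count"),
    ]
    lines = []
    for label, key in fields:
        # Fall back to "n/a" instead of raising KeyError on a missing field.
        lines.append(f"{label}: {payload.get(key, 'n/a')}")
    return "\n".join(lines)
```

Using dict.get instead of direct indexing keeps a client working if a request returns only some of these fields.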

🎨 PPTX Studio Examples