From 3964e72269e781405634fb25c0c8a386b1ba4e97 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81sgeir=20Thor=20Johnson?= Date: Thu, 2 Oct 2025 08:12:47 +0000 Subject: [PATCH] Create pptx.md --- Anthropic/pptx.md | 410 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) create mode 100644 Anthropic/pptx.md diff --git a/Anthropic/pptx.md b/Anthropic/pptx.md new file mode 100644 index 0000000..f51b7af --- /dev/null +++ b/Anthropic/pptx.md @@ -0,0 +1,410 @@ +# PowerPoint Suite (/mnt/skills/public/pptx/SKILL.md) + +--- +name: PowerPoint Suite +description: Presentation creation, editing, and analysis. +when_to_use: "When Claude needs to work with presentations (.pptx files) for: (1) Creating new presentations, (2) Modifying or editing content, (3) Working with layouts, (4) Adding comments or speaker notes, or any other presentation tasks" +version: 0.0.3 +--- + +# PPTX creation, editing, and analysis + +## Overview + +A user may ask you to create, edit, or analyze the contents of a .pptx file. A .pptx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. You have different tools and workflows available for different tasks. + +## Reading and analyzing content + +### Text extraction +If you just need to read the text contents of a presentation, you should convert the document to markdown: +```bash +# Convert document to markdown +python -m markitdown path-to-file.pptx +``` + +### Raw XML access +You need raw XML access for: comments, speaker notes, slide layouts, animations, design elements, and complex formatting. For any of these features, you'll need to unpack a presentation and read its raw XML contents. + +#### Unpacking a file +`python ooxml/scripts/unpack.py ` + +**Note**: The unpack.py script is located at `skills/pptx/ooxml/scripts/unpack.py` relative to the project root. If the script doesn't exist at this path, use `find . -name "unpack.py"` to locate it. + +#### Key file structures +* `ppt/presentation.xml` - Main presentation metadata and slide references +* `ppt/slides/slide{N}.xml` - Individual slide contents (slide1.xml, slide2.xml, etc.) +* `ppt/notesSlides/notesSlide{N}.xml` - Speaker notes for each slide +* `ppt/comments/modernComment_*.xml` - Comments for specific slides +* `ppt/slideLayouts/` - Layout templates for slides +* `ppt/slideMasters/` - Master slide templates +* `ppt/theme/` - Theme and styling information +* `ppt/media/` - Images and other media files + +#### Typography and color extraction +**When given an example design to emulate**: Always analyze the presentation's typography and colors first using the methods below: +1. **Read theme file**: Check `ppt/theme/theme1.xml` for colors (``) and fonts (``) +2. **Sample slide content**: Examine `ppt/slides/slide1.xml` for actual font usage (``) and colors +3. **Search for patterns**: Use grep to find color (``, ``) and font references across all XML files + +## Creating a new PowerPoint presentation **without a template** + +When creating a new PowerPoint presentation from scratch, use the **html2pptx** workflow to convert HTML slides to PowerPoint with accurate positioning. + +### Design Principles + +**CRITICAL**: Before creating any presentation, analyze the content and choose appropriate design elements: +1. **Consider the subject matter**: What is this presentation about? What tone, industry, or mood does it suggest? +2. **Check for branding**: If the user mentions a company/organization, consider their brand colors and identity +3. **Match palette to content**: Select colors that reflect the subject +4. **State your approach**: Explain your design choices before writing code + +**Requirements**: +- ✅ State your content-informed design approach BEFORE writing code +- ✅ Use web-safe fonts only: Arial, Helvetica, Times New Roman, Georgia, Courier New, Verdana, Tahoma, Trebuchet MS, Impact +- ✅ Create clear visual hierarchy through size, weight, and color +- ✅ Ensure readability: strong contrast, appropriately sized text, clean alignment +- ✅ Be consistent: repeat patterns, spacing, and visual language across slides + +#### Color Palette Selection + +**Choosing colors creatively**: +- **Think beyond defaults**: What colors genuinely match this specific topic? Avoid autopilot choices. +- **Consider multiple angles**: Topic, industry, mood, energy level, target audience, brand identity (if mentioned) +- **Be adventurous**: Try unexpected combinations - a healthcare presentation doesn't have to be green, finance doesn't have to be navy +- **Build your palette**: Pick 3-5 colors that work together (dominant colors + supporting tones + accent) +- **Ensure contrast**: Text must be clearly readable on backgrounds + +**Example color palettes** (use these to spark creativity - choose one, adapt it, or create your own): + +1. **Classic Blue**: Deep navy (#1C2833), slate gray (#2E4053), silver (#AAB7B8), off-white (#F4F6F6) +2. **Teal & Coral**: Teal (#5EA8A7), deep teal (#277884), coral (#FE4447), white (#FFFFFF) +3. **Bold Red**: Red (#C0392B), bright red (#E74C3C), orange (#F39C12), yellow (#F1C40F), green (#2ECC71) +4. **Warm Blush**: Mauve (#A49393), blush (#EED6D3), rose (#E8B4B8), cream (#FAF7F2) +5. **Burgundy Luxury**: Burgundy (#5D1D2E), crimson (#951233), rust (#C15937), gold (#997929) +6. **Deep Purple & Emerald**: Purple (#B165FB), dark blue (#181B24), emerald (#40695B), white (#FFFFFF) +7. **Cream & Forest Green**: Cream (#FFE1C7), forest green (#40695B), white (#FCFCFC) +8. **Pink & Purple**: Pink (#F8275B), coral (#FF574A), rose (#FF737D), purple (#3D2F68) +9. **Lime & Plum**: Lime (#C5DE82), plum (#7C3A5F), coral (#FD8C6E), blue-gray (#98ACB5) +10. **Black & Gold**: Gold (#BF9A4A), black (#000000), cream (#F4F6F6) +11. **Sage & Terracotta**: Sage (#87A96B), terracotta (#E07A5F), cream (#F4F1DE), charcoal (#2C2C2C) +12. **Charcoal & Red**: Charcoal (#292929), red (#E33737), light gray (#CCCBCB) +13. **Vibrant Orange**: Orange (#F96D00), light gray (#F2F2F2), charcoal (#222831) +14. **Forest Green**: Black (#191A19), green (#4E9F3D), dark green (#1E5128), white (#FFFFFF) +15. **Retro Rainbow**: Purple (#722880), pink (#D72D51), orange (#EB5C18), amber (#F08800), gold (#DEB600) +16. **Vintage Earthy**: Mustard (#E3B448), sage (#CBD18F), forest green (#3A6B35), cream (#F4F1DE) +17. **Coastal Rose**: Old rose (#AD7670), beaver (#B49886), eggshell (#F3ECDC), ash gray (#BFD5BE) +18. **Orange & Turquoise**: Light orange (#FC993E), grayish turquoise (#667C6F), white (#FCFCFC) + +#### Visual Details Options + +**Geometric Patterns**: +- Diagonal section dividers instead of horizontal +- Asymmetric column widths (30/70, 40/60, 25/75) +- Rotated text headers at 90° or 270° +- Circular/hexagonal frames for images +- Triangular accent shapes in corners +- Overlapping shapes for depth + +**Border & Frame Treatments**: +- Thick single-color borders (10-20pt) on one side only +- Double-line borders with contrasting colors +- Corner brackets instead of full frames +- L-shaped borders (top+left or bottom+right) +- Underline accents beneath headers (3-5pt thick) + +**Typography Treatments**: +- Extreme size contrast (72pt headlines vs 11pt body) +- All-caps headers with wide letter spacing +- Numbered sections in oversized display type +- Monospace (Courier New) for data/stats/technical content +- Condensed fonts (Arial Narrow) for dense information +- Outlined text for emphasis + +**Chart & Data Styling**: +- Monochrome charts with single accent color for key data +- Horizontal bar charts instead of vertical +- Dot plots instead of bar charts +- Minimal gridlines or none at all +- Data labels directly on elements (no legends) +- Oversized numbers for key metrics + +**Layout Innovations**: +- Full-bleed images with text overlays +- Sidebar column (20-30% width) for navigation/context +- Modular grid systems (3×3, 4×4 blocks) +- Z-pattern or F-pattern content flow +- Floating text boxes over colored shapes +- Magazine-style multi-column layouts + +**Background Treatments**: +- Solid color blocks occupying 40-60% of slide +- Gradient fills (vertical or diagonal only) +- Split backgrounds (two colors, diagonal or vertical) +- Edge-to-edge color bands +- Negative space as a design element + +### Layout Tips +**When creating slides with charts or tables:** +- **Two-column layout (PREFERRED)**: Use a header spanning the full width, then two columns below - text/bullets in one column and the featured content in the other. This provides better balance and makes charts/tables more readable. Use flexbox with unequal column widths (e.g., 40%/60% split) to optimize space for each content type. +- **Full-slide layout**: Let the featured content (chart/table) take up the entire slide for maximum impact and readability +- **NEVER vertically stack**: Do not place charts/tables below text in a single column - this causes poor readability and layout issues + +### Workflow +1. **MANDATORY - READ ENTIRE FILE**: Read [`html2pptx.md`](html2pptx.md) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with presentation creation. +2. Create an HTML file for each slide with proper dimensions (e.g., 720pt × 405pt for 16:9) + - Use `

`, `

`-`

`, `
    `, `
      ` for all text content + - Use `class="placeholder"` for areas where charts/tables will be added (render with gray background for visibility) + - **CRITICAL**: Rasterize gradients and icons as PNG images FIRST using Sharp, then reference in HTML + - **LAYOUT**: For slides with charts/tables/images, use either full-slide layout or two-column layout for better readability +3. Create and run a JavaScript file using the [`html2pptx.js`](scripts/html2pptx.js) library to convert HTML slides to PowerPoint and save the presentation + - Use the `html2pptx()` function to process each HTML file + - Add charts and tables to placeholder areas using PptxGenJS API + - Save the presentation using `pptx.writeFile()` +4. **Visual validation**: Generate thumbnails and inspect for layout issues + - Create thumbnail grid: `python scripts/thumbnail.py output.pptx workspace/thumbnails --cols 4` + - Read and carefully examine the thumbnail image for: + * Text overflow or truncation + * Misaligned elements + * Incorrect colors or fonts + * Missing content + * Layout problems + - If issues found, diagnose and fix before proceeding + +## Creating a new PowerPoint presentation **from a template** + +When given a PowerPoint template, you can create a new presentation by replacing the text content in the template slides. + +### Workflow + +1. **Unpack the template**: Extract the template's XML structure +```bash + python ooxml/scripts/unpack.py template.pptx unpacked_template +``` + +2. **Read the presentation structure**: Read `unpacked_template/ppt/presentation.xml` to understand the overall structure and slide references + +3. **Examine template slides**: Check the first few slide XML files to understand the structure +```bash + # View slide structure + python -c "from lxml import etree; tree = etree.parse('unpacked_template/ppt/slides/slide1.xml'); print(etree.tostring(tree, pretty_print=True, encoding='unicode'))" +``` + +4. **Copy template to working file**: Make a copy of the template for editing +```bash + cp template.pptx working.pptx +``` + +5. **Generate text shape inventory**: +```bash + python scripts/inventory.py working.pptx > template-inventory.json +``` + + The inventory provides a structured view of ALL text shapes in the presentation: +```json + { + "slide-0": { + "shape-0": { + "shape_id": "2", + "shape_name": "Title 1", + "placeholder_type": "TITLE", + "text_content": "Original title text here...", + "default_font_size": 44.0, + "default_font_name": "Calibri Light" + }, + "shape-1": { + "shape_id": "3", + "shape_name": "Content Placeholder 2", + "placeholder_type": "BODY", + "text_content": "Original content text...", + "default_font_size": 18.0 + } + }, + "slide-1": { + ... + } + } +``` + + **Understanding the inventory**: + - Each slide is identified as "slide-N" (zero-indexed) + - Each text shape within a slide is identified as "shape-N" (zero-indexed by occurrence) + - `placeholder_type` indicates the shape's role: TITLE, BODY, SUBTITLE, etc. + - `text_content` shows the current text (useful for identifying which shape to replace) + - `default_font_size` and `default_font_name` show the shape's default formatting + +6. **Create replacement text JSON**: Based on the inventory, create a JSON file specifying which shapes to update with new text + - **IMPORTANT**: Reference shapes using the slide and shape identifiers from the inventory (e.g., "slide-0", "shape-1") + - **CRITICAL**: Each shape's "paragraphs" field must contain **properly formatted paragraph objects**, not plain text strings + - Each paragraph object can include: + - `text`: The actual text content (required) + - `alignment`: Text alignment (e.g., "CENTER", "LEFT", "RIGHT") + - `bold`: Boolean for bold text + - `italic`: Boolean for italic text + - `bullet`: Boolean to enable bullet points (when true, `level` is also required) + - `level`: Integer for bullet indent level (0 = no indent, 1 = first level, etc.) + - `font_size`: Float for custom font size + - `font_name`: String for custom font name + - `color`: String for RGB color (e.g., "FF0000" for red) + - `theme_color`: String for theme-based color (e.g., "DARK_1", "ACCENT_1") + - **IMPORTANT**: When bullet: true, do NOT include bullet symbols (•, -, *) in text - they're added automatically + - **ESSENTIAL FORMATTING RULES**: + - Headers/titles should typically have `"bold": true` + - List items should have `"bullet": true, "level": 0` (level is required when bullet is true) + - Preserve any alignment properties (e.g., `"alignment": "CENTER"` for centered text) + - Include font properties when different from default (e.g., `"font_size": 14.0`, `"font_name": "Lora"`) + - Colors: Use `"color": "FF0000"` for RGB or `"theme_color": "DARK_1"` for theme colors + - The replacement script expects **properly formatted paragraphs**, not just text strings + - **Overlapping shapes**: Prefer shapes with larger default_font_size or more appropriate placeholder_type + - Save the updated inventory with replacements to `replacement-text.json` + - **WARNING**: Different template layouts have different shape counts - always check the actual inventory before creating replacements + + Example paragraphs field showing proper formatting: +```json + "paragraphs": [ + { + "text": "New presentation title text", + "alignment": "CENTER", + "bold": true + }, + { + "text": "Section Header", + "bold": true + }, + { + "text": "First bullet point without bullet symbol", + "bullet": true, + "level": 0 + }, + { + "text": "Red colored text", + "color": "FF0000" + }, + { + "text": "Theme colored text", + "theme_color": "DARK_1" + }, + { + "text": "Regular paragraph text without special formatting" + } + ] +``` + + **Shapes not listed in the replacement JSON are automatically cleared**: +```json + { + "slide-0": { + "shape-0": { + "paragraphs": [...] // This shape gets new text + } + // shape-1 and shape-2 from inventory will be cleared automatically + } + } +``` + + **Common formatting patterns for presentations**: + - Title slides: Bold text, sometimes centered + - Section headers within slides: Bold text + - Bullet lists: Each item needs `"bullet": true, "level": 0` + - Body text: Usually no special properties needed + - Quotes: May have special alignment or font properties + +7. **Apply replacements using the `replace.py` script** +```bash + python scripts/replace.py working.pptx replacement-text.json output.pptx +``` + + The script will: + - First extract the inventory of ALL text shapes using functions from inventory.py + - Validate that all shapes in the replacement JSON exist in the inventory + - Clear text from ALL shapes identified in the inventory + - Apply new text only to shapes with "paragraphs" defined in the replacement JSON + - Preserve formatting by applying paragraph properties from the JSON + - Handle bullets, alignment, font properties, and colors automatically + - Save the updated presentation + + Example validation errors: +``` + ERROR: Invalid shapes in replacement JSON: + - Shape 'shape-99' not found on 'slide-0'. Available shapes: shape-0, shape-1, shape-4 + - Slide 'slide-999' not found in inventory +``` +``` + ERROR: Replacement text made overflow worse in these shapes: + - slide-0/shape-2: overflow worsened by 1.25" (was 0.00", now 1.25") +``` + +## Creating Thumbnail Grids + +To create visual thumbnail grids of PowerPoint slides for quick analysis and reference: +```bash +python scripts/thumbnail.py template.pptx [output_prefix] +``` + +**Features**: +- Creates: `thumbnails.jpg` (or `thumbnails-1.jpg`, `thumbnails-2.jpg`, etc. for large decks) +- Default: 5 columns, max 30 slides per grid (5×6) +- Custom prefix: `python scripts/thumbnail.py template.pptx my-grid` + - Note: The output prefix should include the path if you want output in a specific directory (e.g., `workspace/my-grid`) +- Adjust columns: `--cols 4` (range: 3-6, affects slides per grid) +- Grid limits: 3 cols = 12 slides/grid, 4 cols = 20, 5 cols = 30, 6 cols = 42 +- Slides are zero-indexed (Slide 0, Slide 1, etc.) + +**Use cases**: +- Template analysis: Quickly understand slide layouts and design patterns +- Content review: Visual overview of entire presentation +- Navigation reference: Find specific slides by their visual appearance +- Quality check: Verify all slides are properly formatted + +**Examples**: +```bash +# Basic usage +python scripts/thumbnail.py presentation.pptx + +# Combine options: custom name, columns +python scripts/thumbnail.py template.pptx analysis --cols 4 +``` + +## Converting Slides to Images + +To visually analyze PowerPoint slides, convert them to images using a two-step process: + +1. **Convert PPTX to PDF**: +```bash + soffice --headless --convert-to pdf template.pptx +``` + +2. **Convert PDF pages to JPEG images**: +```bash + pdftoppm -jpeg -r 150 template.pdf slide +``` + This creates files like `slide-1.jpg`, `slide-2.jpg`, etc. + +Options: +- `-r 150`: Sets resolution to 150 DPI (adjust for quality/size balance) +- `-jpeg`: Output JPEG format (use `-png` for PNG if preferred) +- `-f N`: First page to convert (e.g., `-f 2` starts from page 2) +- `-l N`: Last page to convert (e.g., `-l 5` stops at page 5) +- `slide`: Prefix for output files + +Example for specific range: +```bash +pdftoppm -jpeg -r 150 -f 2 -l 5 template.pdf slide # Converts only pages 2-5 +``` + +## Code Style Guidelines +**IMPORTANT**: When generating code for PPTX operations: +- Write concise code +- Avoid verbose variable names and redundant operations +- Avoid unnecessary print statements + +## Dependencies + +Required dependencies (should already be installed): + +- **markitdown**: `pip install "markitdown[pptx]"` (for text extraction from presentations) +- **pptxgenjs**: `npm install -g pptxgenjs` (for creating presentations via html2pptx) +- **playwright**: `npm install -g playwright` (for HTML rendering in html2pptx) +- **react-icons**: `npm install -g react-icons react react-dom` (for icons) +- **sharp**: `npm install -g sharp` (for SVG rasterization and image processing) +- **LibreOffice**: `sudo apt-get install libreoffice` (for PDF conversion) +- **Poppler**: `sudo apt-get install poppler-utils` (for pdftoppm to convert PDF to images)