

Document Conversion Guide: Everything You Need to Know
Converting documents between formats (DOCX, PDF, TXT, ODT, RTF, HTML) is a daily task for millions. This comprehensive guide shows you the best methods, tools, and techniques to convert documents while preserving formatting, images, and metadata.
Common Document Formats Explained
| Format | Full Name | Best For | Editable | Universal |
|---|---|---|---|---|
| PDF | Portable Document Format | Sharing, archiving | No* | Yes |
| DOCX | Microsoft Word Document | Editing, collaboration | Yes | Wide |
| TXT | Plain Text | Simple text, code | Yes | Yes |
| ODT | OpenDocument Text | Open-source editing | Yes | Medium |
| RTF | Rich Text Format | Cross-platform editing | Yes | Wide |
| HTML | HyperText Markup Language | Web content | Yes | Yes |
| EPUB | Electronic Publication | E-books | Limited | Medium |
| MD | Markdown | Documentation, blogs | Yes | Medium |
*PDF can be edited with specialized tools
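File extensions are not always trustworthy, so it can help to check what a file really is before choosing a conversion route. The following is a minimal sketch, not a complete detector: the function name `sniff_format` is made up for this example, and it only recognizes the signatures listed in the code (PDF and RTF headers, plus the ZIP-based DOCX, ODT, and EPUB containers).

```python
import zipfile
from pathlib import Path


def sniff_format(path):
    """Guess a document's format from its content rather than its extension."""
    path = Path(path)
    with path.open("rb") as f:
        head = f.read(8)
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"{\\rtf"):
        return "rtf"
    # DOCX, ODT and EPUB are all ZIP containers; look inside to tell them apart
    if head.startswith(b"PK\x03\x04") and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            names = set(z.namelist())
            if "word/document.xml" in names:
                return "docx"
            if "mimetype" in names:
                mime = z.read("mimetype").decode("ascii", "replace").strip()
                if mime == "application/vnd.oasis.opendocument.text":
                    return "odt"
                if mime == "application/epub+zip":
                    return "epub"
        return "zip"
    return "unknown"
```

Plain text, HTML, and Markdown have no reliable signature, so this sketch reports them as "unknown".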
Most Common Conversions
1. DOCX to PDF (Most Popular)
Why: Share documents without editing, preserve formatting perfectly.
Method 1: Microsoft Word (Best Quality)
1. Open DOCX in Word
2. File → Save As
3. Format: PDF
4. Options:
✓ Optimize for: Standard (best for printing)
✓ Document structure tags for accessibility
5. Save
Quality: Excellent (preserves all formatting)
Method 2: Google Docs (Free, Online)
1. Upload DOCX to Google Drive
2. Right-click → Open with → Google Docs
3. File → Download → PDF Document
Quality: Very good (may change some formatting slightly)
Method 3: LibreOffice (Free, Offline)
1. Open DOCX in LibreOffice Writer
2. File → Export as PDF
3. Settings:
- Range: All pages
- Images: Lossless compression
- Quality: Best
4. Export
Quality: Good (some advanced features may not convert perfectly)
Method 4: Command Line (Pandoc)
# Install pandoc
brew install pandoc # macOS
sudo apt install pandoc # Debian/Ubuntu
# Convert DOCX to PDF (requires a LaTeX engine; pandoc uses pdflatex by default)
pandoc input.docx -o output.pdf
# With a Unicode-aware PDF engine
pandoc input.docx --pdf-engine=xelatex -o output.pdf
Quality: Good (best for simple documents)
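Pandoc's PDF output depends on an external PDF engine being installed, and a missing engine is a common reason the command above fails. The sketch below picks the first engine it finds on the PATH and builds the matching command. The function name and the preference order are assumptions made for this example; the engine names are values pandoc accepts for `--pdf-engine`.

```python
import shutil

# Preference order chosen for this sketch, not a pandoc recommendation.
PREFERRED_ENGINES = ["xelatex", "pdflatex", "lualatex", "weasyprint", "wkhtmltopdf"]


def pandoc_pdf_command(src, dst, which=shutil.which):
    """Return a pandoc argument list using the first PDF engine found on PATH."""
    if which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    for engine in PREFERRED_ENGINES:
        if which(engine):
            return ["pandoc", src, f"--pdf-engine={engine}", "-o", dst]
    raise RuntimeError("no PDF engine found; install a LaTeX distribution, for example")
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The `which` parameter exists only so the lookup can be replaced in tests.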
2. PDF to DOCX (Challenging)
Challenge: PDF stores positioned text and graphics rather than paragraphs, styles, or table structure, so a converter has to reconstruct the layout and the result is rarely perfect.
Method 1: Adobe Acrobat Pro ($239/year)
1. Open PDF in Acrobat Pro
2. File → Export To → Microsoft Word → Word Document
3. Settings:
✓ Retain flowing text
✓ Include comments
4. Save
Quality: Best available (80-95% accuracy depending on PDF)
Method 2: Microsoft Word (included with Microsoft 365)
1. Word → File → Open
2. Select PDF file
3. Word converts PDF to editable document
4. Edit as needed
5. Save as DOCX
Quality: Good (70-85% accuracy)
Limitations:
- Works best with text-heavy PDFs
- Struggles with complex layouts
- May lose some formatting
Method 3: Google Docs (Free)
1. Upload PDF to Google Drive
2. Right-click → Open with → Google Docs
3. Edit document
4. File → Download → Microsoft Word (.docx)
Quality: Fair (60-75% accuracy)
Method 4: Online Converters
Smallpdf, ILovePDF, Zamzar:
- Free (with limits)
- Decent quality
- Privacy concern (documents are uploaded to a third-party server)
Quality: Fair to Good (depends on PDF complexity)
3. DOCX to TXT (Simple Text Extraction)
Why: Remove all formatting, get plain text only.
Method 1: Microsoft Word
1. Open DOCX in Word
2. File → Save As
3. Format: Plain Text (.txt)
4. Encoding: UTF-8
5. Save
Result: All formatting removed, plain text only.
Method 2: Command Line
# Using pandoc
pandoc input.docx -o output.txt
# Using textutil (macOS)
textutil -convert txt input.docx
# Using antiword (Linux; legacy .doc files only, it cannot read .docx)
antiword input.doc > output.txt
Method 3: Python (Automation)
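# Requires: pip install python-docx
from docx import Document
def docx_to_txt(docx_path, txt_path):
    """Write each body paragraph of a DOCX file to a UTF-8 text file."""
    doc = Document(docx_path)
    with open(txt_path, 'w', encoding='utf-8') as f:
        # doc.paragraphs covers body paragraphs only; text inside tables is skipped
        for para in doc.paragraphs:
            f.write(para.text + '\n')
docx_to_txt('input.docx', 'output.txt')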
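One limitation of the python-docx approach is that `doc.paragraphs` returns body paragraphs only, so text inside tables is left out. Because a DOCX file is a ZIP archive of XML parts, the text can also be read with the standard library alone. This is a minimal sketch under that assumption; `docx_text` is a name invented for the example, and it reads only the main document part (no headers, footers, or footnotes).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(docx_path):
    """Return the text of every paragraph, including those inside tables."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    lines = []
    # iter() walks the whole tree, so paragraphs nested in table cells are included
    for para in root.iter(W + "p"):
        parts = []
        for node in para.iter():
            if node.tag == W + "t":
                parts.append(node.text or "")
            elif node.tag == W + "tab":
                parts.append("\t")
            elif node.tag in (W + "br", W + "cr"):
                parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)
```

Paragraphs nested inside other paragraphs, as happens with text boxes, would appear twice with this approach, which is one more reason to avoid text boxes in documents meant for conversion.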
4. PDF to TXT (Text Extraction)
Method 1: Copy-Paste
1. Open PDF in Preview/Acrobat
2. Select all text (Cmd+A / Ctrl+A)
3. Copy (Cmd+C / Ctrl+C)
4. Paste into text editor
5. Save as TXT
Limitations: Doesn't work on scanned PDFs.
Method 2: Command Line (pdftotext)
# Install poppler-utils
brew install poppler # macOS
sudo apt install poppler-utils # Linux
# Extract text
pdftotext input.pdf output.txt
# Maintain layout
pdftotext -layout input.pdf output.txt
# Extract specific pages
pdftotext -f 1 -l 10 input.pdf output.txt
Method 3: OCR for Scanned PDFs
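# Install tesseract
brew install tesseract # macOS
# Tesseract reads images, not PDFs, so render the pages to PNG first (pdftoppm ships with poppler)
pdftoppm -r 300 -png input.pdf page
# OCR each page image; writes one .txt file per page
for img in page-*.png; do tesseract "$img" "${img%.png}" -l eng; done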
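OCR is slow, so it is worth checking first whether a PDF really lacks a text layer. One rough way is to run `pdftotext` and look at how much text each page produced. The sketch below is a heuristic, not a reliable classifier: the function names and the 20-character threshold are assumptions chosen for illustration. It relies on `pdftotext` ending each page with a form feed character.

```python
def split_pages(pdftotext_output):
    """Split pdftotext output into pages (pages end with a form feed)."""
    pages = pdftotext_output.split("\f")
    # The form feed after the last page leaves one empty trailing item
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def probably_scanned(page_texts, min_chars_per_page=20):
    """Heuristic: True when most pages yielded almost no extractable text."""
    if not page_texts:
        return True
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return sparse > len(page_texts) / 2
```

If `probably_scanned` returns True, the OCR route is likely needed; otherwise plain `pdftotext` output is usually good enough.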
5. TXT to DOCX (Add Formatting)
Method 1: Word
1. Open TXT in Word
2. Apply formatting (fonts, styles, headers)
3. Save as DOCX
Method 2: Pandoc (Markdown to DOCX)
# If your TXT uses Markdown syntax
pandoc input.md -o output.docx
# With reference document for styling
pandoc input.md --reference-doc=template.docx -o output.docx
6. DOCX to HTML (Web Publishing)
Method 1: Word
1. File → Save As
2. Format: Web Page (.html)
3. Save
Warning: Creates bloated HTML with Microsoft-specific styles.
Method 2: Pandoc (Clean HTML)
# Convert to clean HTML
pandoc input.docx -o output.html
# With CSS styling
pandoc input.docx -c style.css -o output.html --standalone
Result: Much cleaner HTML, suitable for websites.
7. HTML to DOCX
# Using pandoc
pandoc input.html -o output.docx
# Also save a copy of the images referenced by the HTML to ./media
pandoc input.html --extract-media=./media -o output.docx
8. ODT ↔ DOCX (LibreOffice ↔ Word)
ODT to DOCX
LibreOffice:
1. File → Save As
2. Format: Microsoft Word 2007-365 (.docx)
3. Save
DOCX to ODT
Word:
1. File → Save As
2. Format: OpenDocument Text (.odt)
3. Save
Compatibility: Generally good, may lose some advanced features.
Batch Conversion Scripts
Convert Multiple DOCX to PDF
macOS/Linux (using LibreOffice):
#!/bin/bash
# Convert all DOCX files to PDF
for docx in *.docx; do
    # Skip the literal pattern when no DOCX files are present
    [ -e "$docx" ] || continue
    echo "Converting: $docx"
    libreoffice --headless --convert-to pdf "$docx"
done
echo "Conversion complete!"
Windows (PowerShell):
# Convert all DOCX to PDF using Word
$word = New-Object -ComObject Word.Application
$word.Visible = $false
Get-ChildItem *.docx | ForEach-Object {
    $doc = $word.Documents.Open($_.FullName)
    $pdfPath = $_.FullName -replace '\.docx$', '.pdf'
    $doc.SaveAs($pdfPath, 17) # 17 = wdFormatPDF
    $doc.Close()
}
$word.Quit()
Python Script (Windows/macOS, requires Microsoft Word)
from docx2pdf import convert
import os
# Convert single file
convert("input.docx", "output.pdf")
# Batch convert directory
for filename in os.listdir('.'):
    if filename.lower().endswith('.docx'):
        # splitext replaces only the extension, even if ".docx" appears elsewhere in the name
        pdf_name = os.path.splitext(filename)[0] + '.pdf'
        convert(filename, pdf_name)
        print(f"Converted: {filename} → {pdf_name}")
Install requirements:
pip install docx2pdf
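The batch scripts above reconvert every file on every run. For folders that are converted repeatedly, a small planning step can limit the work to files whose PDF is missing or out of date. This is a sketch with hypothetical helper names (`stale_sources`, `libreoffice_command`); it assumes outputs keep the source file's base name, which is how LibreOffice's `--convert-to` names them.

```python
from pathlib import Path


def stale_sources(src_dir, out_dir, src_ext=".docx", out_ext=".pdf"):
    """Return source files whose output is missing or older than the source."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    todo = []
    for src in sorted(src_dir.glob(f"*{src_ext}")):
        if src.name.startswith("~$"):  # Word lock files, not real documents
            continue
        out = out_dir / (src.stem + out_ext)
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            todo.append(src)
    return todo


def libreoffice_command(sources, out_dir):
    """Build one headless LibreOffice call covering all pending files."""
    return ["libreoffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), *map(str, sources)]
```

When `stale_sources` returns a non-empty list, the command from `libreoffice_command` can be run with `subprocess.run(cmd, check=True)`. On some systems the executable is named `soffice` rather than `libreoffice`.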
Preserving Formatting
What Usually Converts Well
✅ Text content - Almost always perfect
✅ Basic formatting - Bold, italic, underline
✅ Font sizes - Preserved accurately
✅ Colors - RGB colors transfer well
✅ Bullet lists - Usually correct
✅ Tables - Simple tables convert well
✅ Images - Embedded images transfer (quality may vary)
What Often Has Issues
⚠️ Complex layouts - Multi-column, text boxes
⚠️ Advanced tables - Merged cells, nested tables
⚠️ Fonts - Custom fonts may be substituted
⚠️ Comments - May be lost or moved
⚠️ Track changes - Usually lost in conversion
⚠️ Headers/footers - Can break in PDF to DOCX
⚠️ Page breaks - May shift in conversion
Tips for Better Conversion
- Use standard fonts (Arial, Times New Roman, Calibri)
- Simplify layout before conversion
- Avoid text boxes (use tables instead)
- Embed fonts in Word (File → Options → Save → Embed fonts in the file)
- Test conversion on a sample first
- Keep a backup of the original file
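The advice about standard fonts can be checked before converting. A DOCX file declares the fonts it refers to in `word/fontTable.xml`, so a short script can flag anything outside an approved list. This is a sketch: the function name is invented, the "standard" set is simply the three fonts named above, and the font table can list fonts that are declared but not visibly used.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# This guide's list of safe fonts; extend it to suit your readers.
STANDARD_FONTS = {"Arial", "Times New Roman", "Calibri"}


def nonstandard_fonts(docx_path, standard=STANDARD_FONTS):
    """List fonts declared in the DOCX font table that are not in `standard`."""
    with zipfile.ZipFile(docx_path) as z:
        if "word/fontTable.xml" not in z.namelist():
            return []
        root = ET.fromstring(z.read("word/fontTable.xml"))
    declared = {font.get(W + "name") for font in root.iter(W + "font")}
    return sorted(name for name in declared if name and name not in standard)
```

An empty result suggests font substitution is unlikely to be the cause of layout changes; a non-empty one points at fonts to replace or embed.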
Format-Specific Best Practices
Creating Universal PDFs
Word → PDF Settings:
✓ ISO 19005-1 compliant (PDF/A) [archival]
✓ Optimize for: Standard (printing and viewing)
✓ Document structure tags [accessibility]
✓ Bitmap text when fonts cannot be embedded
Result: A self-contained PDF that renders consistently across devices and suits long-term archiving.
Creating Editable DOCXs from PDFs
Best results when:
- Source PDF was originally a Word document
- Text is selectable (not scanned image)
- Simple, single-column layout
- Standard fonts used
Poor results when:
- Scanned PDF (image-based)
- Complex multi-column layout
- Heavy graphics/design elements
- Forms with fillable fields
Solution for scanned PDFs:
- OCR the PDF first (Adobe Acrobat, Tesseract)
- Then convert OCR'd PDF to DOCX
Creating Clean HTML from Word
Avoid Word's HTML export. Instead:
# Use pandoc for clean HTML
# (--embed-resources needs pandoc 2.19+; older versions use --self-contained instead)
pandoc input.docx -o output.html \
    --standalone \
    --embed-resources \
    --css=style.css
Result: Semantic HTML without Microsoft bloat.
Common Problems & Solutions
Problem 1: "Converted PDF looks different"
Causes:
- Missing fonts
- Different PDF renderer
- Embedded vs outlined fonts
Solutions:
Word → PDF Options:
✓ Embed fonts
✓ Use PDF/A standard
✓ Check "high quality" option
Problem 2: "Can't edit converted DOCX"
Cause: PDF had complex layout or was scanned
Solutions:
- Try different converter (Adobe > Word > Google Docs)
- Use OCR if scanned
- Manually retype if necessary
- Accept imperfect conversion and fix manually
Problem 3: "Images missing after conversion"
Causes:
- Images were linked, not embedded
- Conversion tool doesn't support images
- File size limit hit
Solutions:
Word: Right-click image → "Save as Picture" → re-insert the saved file so it is embedded
Or: File → Info → Edit Links to Files → select the link → check "Save picture in document"
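Linked images can be spotted without opening Word. In a DOCX file, each image is recorded as a relationship in `word/_rels/document.xml.rels`, and linked (not embedded) images carry `TargetMode="External"`. The following sketch lists those targets; `linked_images` is a name made up for this example, and it inspects the main document part only.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OPC relationships part
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def linked_images(docx_path):
    """Return targets of images that are linked to, not stored inside, the DOCX."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/_rels/document.xml.rels"))
    return [
        rel.get("Target")
        for rel in root.iter(REL + "Relationship")
        if rel.get("Type", "").endswith("/image")
        and rel.get("TargetMode") == "External"
    ]
```

Any path this returns must exist on the machine doing the conversion, or the image will be missing from the output.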
Problem 4: "Formatting completely broken"
Cause: Complex document with incompatible features
Solution:
- Simplify document before conversion
- Remove text boxes, complex tables
- Use simpler layout
- Accept manual formatting fixes needed
Problem 5: "Converted file is huge"
Causes:
- Uncompressed images
- Embedded fonts
- Hidden metadata
Solutions:
Word:
1. Compress all images (Picture Format → Compress Pictures)
2. Remove personal information (File → Info → Check for Issues)
3. Don't embed fonts unless necessary
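To see which of these causes applies to a particular file, list the largest parts inside the DOCX archive. Images usually sit under `word/media/` and embedded fonts under `word/fonts/`. This is a small sketch using only the standard library; the function name is chosen for illustration.

```python
import zipfile


def largest_parts(docx_path, top=5):
    """Return (name, uncompressed_bytes) for the biggest parts of a DOCX."""
    with zipfile.ZipFile(docx_path) as z:
        parts = [(info.filename, info.file_size) for info in z.infolist()]
    return sorted(parts, key=lambda part: part[1], reverse=True)[:top]
```

If the top entries are all under `word/media/`, compressing pictures will help most; if fonts dominate, turning off font embedding is the bigger win.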
Security Considerations
Removing Metadata
Word documents contain hidden metadata:
- Author name
- Company name
- Edit history
- Comments and tracked changes (including deleted text if Track Changes was on)
- Document properties
Remove before sharing:
Word:
File → Info → Check for Issues → Inspect Document
✓ Comments, Revisions, Versions
✓ Document Properties and Personal Information
✓ Custom XML Data
✓ Headers, Footers, Watermarks
Remove All
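After running the inspector, it can be reassuring to confirm what is still stored in the file. Author-related properties live in `docProps/core.xml` inside the DOCX archive. The sketch below reads a few of them; the function name and the choice of fields are assumptions for this example, and it does not cover properties kept elsewhere, such as the company name.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Output key -> element in docProps/core.xml
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "title": "dc:title",
}


def core_metadata(docx_path):
    """Return the non-empty core properties still present in a DOCX file."""
    with zipfile.ZipFile(docx_path) as z:
        if "docProps/core.xml" not in z.namelist():
            return {}
        root = ET.fromstring(z.read("docProps/core.xml"))
    found = {}
    for key, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text
    return found
```

An empty dictionary means none of the checked fields remain; any names that do come back are still visible to whoever receives the file.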
Password-Protected Documents
Word to PDF with password:
Word → Save As → PDF → Options
✓ Encrypt the document with a password
PDF to Word:
- Must unlock PDF first
- Adobe Acrobat: Remove Security
- Or use password when converting
Automation & Integration
Google Drive Automation
Use Apps Script to auto-convert uploads:
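// Requires the Advanced Drive Service (Drive API v3) enabled under Services
function convertDocxToPdf() {
  var folder = DriveApp.getFolderById('FOLDER_ID');
  var files = folder.getFilesByType(MimeType.MICROSOFT_WORD);
  while (files.hasNext()) {
    var file = files.next();
    // DocumentApp cannot open a DOCX directly, so import it as a temporary Google Doc
    var temp = Drive.Files.create(
      { name: file.getName(), mimeType: MimeType.GOOGLE_DOCS },
      file.getBlob()
    );
    // Export the Google Doc as PDF and save it next to the original
    var pdf = DriveApp.getFileById(temp.id).getAs('application/pdf');
    folder.createFile(pdf).setName(file.getName().replace(/\.docx$/i, '.pdf'));
    // Remove the temporary Google Doc
    DriveApp.getFileById(temp.id).setTrashed(true);
    Logger.log('Converted: ' + file.getName());
  }
}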
Zapier Integration
Create workflow:
1. Trigger: New file in Dropbox (DOCX)
2. Action: Convert with CloudConvert
3. Action: Save PDF to Google Drive
4. Action: Send email notification
Recommended Tools Summary
| Task | Free Tool | Paid Tool |
|---|---|---|
| DOCX → PDF | LibreOffice | Microsoft Word |
| PDF → DOCX | Google Docs | Adobe Acrobat Pro |
| Any → Any | Pandoc | CloudConvert |
| Batch convert | LibreOffice CLI | Adobe Acrobat Pro |
| OCR (scanned) | Tesseract | Adobe Acrobat Pro |
| API automation | Free tier (CloudConvert) | CloudConvert Pro |
Command-Line Reference
Pandoc (Universal Converter)
# DOCX to PDF
pandoc input.docx -o output.pdf
# DOCX to HTML
pandoc input.docx -o output.html --standalone
# Markdown to DOCX
pandoc input.md -o output.docx
# HTML to PDF
pandoc input.html -o output.pdf
# With table of contents
pandoc input.docx --toc -o output.pdf
# Multiple inputs
pandoc chapter1.md chapter2.md chapter3.md -o book.pdf
LibreOffice (Headless Conversion)
# DOCX to PDF
libreoffice --headless --convert-to pdf input.docx
# ODT to DOCX
libreoffice --headless --convert-to docx input.odt
# Batch convert
libreoffice --headless --convert-to pdf *.docx
# Specify output directory
libreoffice --headless --convert-to pdf --outdir ./pdfs *.docx
Conclusion & Best Practices
For everyday use:
- DOCX → PDF: Use Microsoft Word or Google Docs
- PDF → DOCX: Use Adobe Acrobat or Word if you have it, Google Docs for free
- Any format conversion: Use an online converter for convenience, but not for sensitive documents
For automation:
- Small projects: Use Pandoc (free, powerful)
- Large enterprises: Use Adobe Acrobat Pro API or CloudConvert API
To preserve quality:
- Always keep original files
- Test conversion on sample first
- Embed fonts when sharing
- Use standard fonts (Arial, Times) for compatibility
- Simplify complex layouts before converting
- Remove sensitive metadata before sharing
File size optimization:
- Compress images before embedding
- Don't embed unnecessary fonts
- Use PDF/A only when you need archival quality (it embeds all fonts, so files are usually larger)
- Remove hidden metadata
Need to convert documents? Use our free document converter supporting DOCX, PDF, TXT, ODT, RTF, HTML and more. Fast, secure, and preserves formatting!
About the Author

1CONVERTER Technical Team
Official Team | File Format Specialists
Our technical team specializes in file format technologies and conversion algorithms. With combined expertise spanning document processing, media encoding, and archive formats, we ensure accurate and efficient conversions across 243+ supported formats.
