Regular Expressions Cookbook. 2nd Edition - Helion
ISBN: 978-14-493-2748-4
Pages: 612, Format: ebook
Publication date: 2012-08-13
Bookstore: Helion
Price: 186.15 zł (previously: 216.45 zł)
You save: 14% (-30.30 zł)
Take the guesswork out of using regular expressions. With more than 140 practical recipes, this cookbook provides everything you need to solve a wide range of real-world problems. Novices will learn basic skills and tools, and programmers and experienced users will find a wealth of detail. Each recipe provides samples you can use right away.
This revised edition covers the regular expression flavors used by C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. You’ll learn powerful new tricks, avoid flavor-specific gotchas, and save valuable time with this huge library of practical solutions.
- Learn regular expressions basics through a detailed tutorial
- Use code listings to implement regular expressions with your language of choice
- Understand how regular expressions differ from language to language
- Handle common user input with recipes for validation and formatting
- Find and manipulate words, special characters, and lines of text
- Detect integers, floating-point numbers, and other numerical formats
- Parse source code and process log files
- Use regular expressions in URLs, paths, and IP addresses
- Manipulate HTML, XML, and data exchange formats
- Discover little-known regular expression tricks and techniques
Table of Contents
Regular Expressions Cookbook. Detailed Solutions in Eight Programming Languages. 2nd Edition eBook: table of contents
- Regular Expressions Cookbook
- SPECIAL OFFER: Upgrade this ebook with O'Reilly
- Preface
- Caught in the Snarls of Different Versions
- Intended Audience
- Technology Covered
- Organization of This Book
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
- 1. Introduction to Regular Expressions
- Regular Expressions Defined
- Many Flavors of Regular Expressions
- Regex Flavors Covered by This Book
- Search and Replace with Regular Expressions
- Many Flavors of Replacement Text
- Tools for Working with Regular Expressions
- RegexBuddy
- RegexPal
- RegexMagic
- More Online Regex Testers
- RegexPlanet
- regex.larsolavtorvik.com
- Nregex
- Rubular
- myregexp.com
- More Desktop Regular Expression Testers
- Expresso
- The Regulator
- SDL Regex Fuzzer
- grep
- PowerGREP
- Windows Grep
- RegexRenamer
- Popular Text Editors
- 2. Basic Regular Expression Skills
- 2.1. Match Literal Text
- Problem
- Solution
- Discussion
- Variations
- Block escape
- Case-insensitive matching
- See Also
- 2.2. Match Nonprintable Characters
- Problem
- Solution
- Discussion
- Variations on Representations of Nonprinting Characters
- The 26 control characters
- The 7-bit character set
- See Also
- 2.3. Match One of Many Characters
- Problem
- Solution
- Calendar with misspellings
- Hexadecimal character
- Nonhexadecimal character
- Discussion
- Variations
- Shorthands
- Case insensitivity
- Flavor-Specific Features
- .NET character class subtraction
- Java character class union, intersection, and subtraction
- See Also
- 2.4. Match Any Character
- Problem
- Solution
- Any character except line breaks
- Any character including line breaks
- Discussion
- Any character except line breaks
- Any character including line breaks
- Dot abuse
- Variations
- See Also
- 2.5. Match Something at the Start and/or the End of a Line
- Problem
- Solution
- Start of the subject
- End of the subject
- Start of a line
- End of a line
- Discussion
- Anchors and lines
- Start of the subject
- End of the subject
- Start of a line
- End of a line
- Zero-length matches
- Variations
- See Also
- 2.6. Match Whole Words
- Problem
- Solution
- Word boundaries
- Nonboundaries
- Discussion
- Word boundaries
- Nonboundaries
- Word Characters
- See Also
- 2.7. Unicode Code Points, Categories, Blocks, and Scripts
- Problem
- Solution
- Unicode code point
- Unicode category
- Unicode block
- Unicode script
- Unicode grapheme
- Discussion
- Unicode code point
- Unicode category
- Unicode block
- Unicode script
- Unicode grapheme
- Variations
- Negated variant
- Character classes
- Listing all characters
- See Also
- 2.8. Match One of Several Alternatives
- Problem
- Solution
- Discussion
- See Also
- 2.9. Group and Capture Parts of the Match
- Problem
- Solution
- Discussion
- Variations
- Noncapturing groups
- Group with mode modifiers
- See Also
- 2.10. Match Previously Matched Text Again
- Problem
- Solution
- Discussion
- See Also
- 2.11. Capture and Name Parts of the Match
- Problem
- Solution
- Named capture
- Named backreferences
- Discussion
- Named capture
- Named backreferences
- Groups with the same name
- See Also
- 2.12. Repeat Part of the Regex a Certain Number of Times
- Problem
- Solution
- Googol
- Hexadecimal number
- Hexadecimal number with optional suffix
- Floating-point number
- Discussion
- Fixed repetition
- Variable repetition
- Infinite repetition
- Making something optional
- Repeating groups
- See Also
- 2.13. Choose Minimal or Maximal Repetition
- Problem
- Solution
- Discussion
- See Also
- 2.14. Eliminate Needless Backtracking
- Problem
- Solution
- Discussion
- See Also
- 2.15. Prevent Runaway Repetition
- Problem
- Solution
- Discussion
- Variations
- See Also
- 2.16. Test for a Match Without Adding It to the Overall Match
- Problem
- Solution
- Discussion
- Lookaround
- Negative lookaround
- Different levels of lookbehind
- Matching the same text twice
- Lookaround is atomic
- Alternative to Lookbehind
- Solution Without Lookbehind
- See Also
- 2.17. Match One of Two Alternatives Based on a Condition
- Problem
- Solution
- Discussion
- See Also
- 2.18. Add Comments to a Regular Expression
- Problem
- Solution
- Discussion
- Free-spacing mode
- Java has free-spacing character classes
- Variations
- 2.19. Insert Literal Text into the Replacement Text
- Problem
- Solution
- Discussion
- When and how to escape characters in replacement text
- .NET and JavaScript
- Java
- PHP
- Perl
- Python and Ruby
- More escape rules for string literals
- See Also
- 2.20. Insert the Regex Match into the Replacement Text
- Problem
- Solution
- Regular expression
- Replacement
- Discussion
- See Also
- 2.21. Insert Part of the Regex Match into the Replacement Text
- Problem
- Solution
- Regular expression
- Replacement
- Discussion
- Replacements using capturing groups
- $10 and higher
- References to nonexistent groups
- Solution Using Named Capture
- Regular expression
- Replacement
- Flavors that support named capture
- See Also
- 2.22. Insert Match Context into the Replacement Text
- Problem
- Solution
- Discussion
- See Also
- 3. Programming with Regular Expressions
- Programming Languages and Regex Flavors
- Languages Covered in This Chapter
- More Programming Languages
- 3.1. Literal Regular Expressions in Source Code
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Discussion
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.2. Import the Regular Expression Library
- Problem
- Solution
- C#
- VB.NET
- XRegExp
- Java
- Python
- Discussion
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- 3.3. Create Regular Expression Objects
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Compiling a Regular Expression Down to CIL
- C#
- VB.NET
- Discussion
- See Also
- 3.4. Set Regular Expression Options
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Additional Language-Specific Options
- .NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.5. Test If a Match Can Be Found Within a Subject String
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- C# and VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.6. Test Whether a Regex Matches the Subject String Entirely
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- C# and VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.7. Retrieve the Matched Text
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.8. Determine the Position and Length of the Match
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.9. Retrieve Part of the Matched Text
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Named Capture
- C#
- VB.NET
- Java
- XRegExp
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.10. Retrieve a List of All Matches
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.11. Iterate over All Matches
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.12. Validate Matches in Procedural Code
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Discussion
- See Also
- 3.13. Find a Match Within Another Match
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Discussion
- See Also
- 3.14. Replace All Matches
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.15. Replace Matches Reusing Parts of the Match
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Named Capture
- C#
- VB.NET
- Java 7
- XRegExp
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.16. Replace Matches with Replacements Generated in Code
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.17. Replace All Matches Within the Matches of Another Regex
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- See Also
- 3.18. Replace All Matches Between the Matches of Another Regex
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- Perl and Ruby
- Python
- See Also
- 3.19. Split a String
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Discussion
- C# and VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.20. Split a String, Keeping the Regex Matches
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- Discussion
- .NET
- Java
- JavaScript
- XRegExp
- PHP
- Perl
- Python
- Ruby
- See Also
- 3.21. Search Line by Line
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- PHP
- Perl
- Python
- Ruby
- Discussion
- See Also
- 3.22. Construct a Parser
- Problem
- Solution
- C#
- VB.NET
- Java
- JavaScript
- XRegExp
- Perl
- Python
- PHP
- Ruby
- Discussion
- See Also
- 4. Validation and Formatting
- 4.1. Validate Email Addresses
- Problem
- Solution
- Simple
- Simple, with restrictions on characters
- Simple, with all valid local part characters
- No leading, trailing, or consecutive dots
- Top-level domain has two to six letters
- Discussion
- About email addresses
- Regular expression syntax
- Building a regex step-by-step
- Variations
- See Also
- 4.2. Validate and Format North American Phone Numbers
- Problem
- Solution
- Regular expression
- Replacement
- C# example
- JavaScript example
- Other programming languages
- Discussion
- Variations
- Eliminate invalid phone numbers
- Find phone numbers in documents
- Allow a leading 1
- Allow seven-digit phone numbers
- See Also
- 4.3. Validate International Phone Numbers
- Problem
- Solution
- Regular expression
- JavaScript example
- Discussion
- Variations
- Validate international phone numbers in EPP format
- See Also
- 4.4. Validate Traditional Date Formats
- Problem
- Solution
- Discussion
- Variations
- See Also
- 4.5. Validate Traditional Date Formats, Excluding Invalid Dates
- Problem
- Solution
- C#
- Perl
- Pure regular expression
- Discussion
- Regex with procedural code
- Pure regular expression
- Variations
- See Also
- 4.6. Validate Traditional Time Formats
- Problem
- Solution
- Discussion
- Variations
- See Also
- 4.7. Validate ISO 8601 Dates and Times
- Problem
- Solution
- Dates
- Weeks
- Times
- Date and time
- XML Schema dates and times
- Discussion
- See Also
- 4.8. Limit Input to Alphanumeric Characters
- Problem
- Solution
- Regular expression
- Ruby example
- Discussion
- Variations
- Limit input to ASCII characters
- Limit input to ASCII noncontrol characters and line breaks
- Limit input to shared ISO-8859-1 and Windows-1252 characters
- Limit input to alphanumeric characters in any language
- See Also
- 4.9. Limit the Length of Text
- Problem
- Solution
- Regular expression
- Perl example
- Discussion
- Variations
- Limit the length of an arbitrary pattern
- Limit the number of nonwhitespace characters
- Limit the number of words
- See Also
- 4.10. Limit the Number of Lines in Text
- Problem
- Solution
- Regular expression
- PHP (PCRE) example
- Discussion
- Variations
- Working with esoteric line separators
- See Also
- 4.11. Validate Affirmative Responses
- Problem
- Solution
- Regular expression
- JavaScript example
- Discussion
- See Also
- 4.12. Validate Social Security Numbers
- Problem
- Solution
- Regular expression
- Python example
- Discussion
- Variations
- Find Social Security numbers in documents
- See Also
- 4.13. Validate ISBNs
- Problem
- Solution
- Regular expressions
- JavaScript example, with checksum validation
- Python example, with checksum validation
- Discussion
- ISBN-10 checksum
- ISBN-13 checksum
- Variations
- Find ISBNs in documents
- Eliminate incorrect ISBN identifiers
- See Also
- 4.14. Validate ZIP Codes
- Problem
- Solution
- Regular expression
- VB.NET example
- Discussion
- See Also
- 4.15. Validate Canadian Postal Codes
- Problem
- Solution
- Discussion
- See Also
- 4.16. Validate U.K. Postcodes
- Problem
- Solution
- Discussion
- See Also
- 4.17. Find Addresses with Post Office Boxes
- Problem
- Solution
- Regular expression
- C# example
- Discussion
- See Also
- 4.18. Reformat Names From FirstName LastName to LastName, FirstName
- Problem
- Solution
- Regular expression
- Replacement
- JavaScript example
- Discussion
- Variations
- List surname particles at the beginning of the name
- See Also
- 4.19. Validate Password Complexity
- Problem
- Solution
- Length between 8 and 32 characters
- ASCII visible and space characters only
- One or more uppercase letters
- One or more lowercase letters
- One or more numbers
- One or more special characters
- Disallow three or more sequential identical characters
- Example JavaScript solution, basic
- Example JavaScript solution, with x out of y validation
- Example JavaScript solution, with password security ranking
- Discussion
- Example JavaScript solutions
- Variations
- Validate multiple password rules with a single regex
- See Also
- 4.20. Validate Credit Card Numbers
- Problem
- Solution
- Strip spaces and hyphens
- Validate the number
- Example web page with JavaScript
- Discussion
- Strip spaces and hyphens
- Validate the number
- Incorporating the solution into a web page
- Extra Validation with the Luhn Algorithm
- See Also
- 4.21. European VAT Numbers
- Problem
- Solution
- Strip whitespace and punctuation
- Validate the number
- Discussion
- Strip whitespace and punctuation
- Validate the number
- Variations
- See Also
- 5. Words, Lines, and Special Characters
- 5.1. Find a Specific Word
- Problem
- Solution
- Discussion
- See Also
- 5.2. Find Any of Multiple Words
- Problem
- Solution
- Using alternation
- Example JavaScript solution
- Discussion
- Using alternation
- Example JavaScript solution
- See Also
- 5.3. Find Similar Words
- Problem
- Solution
- Color or colour
- Bat, cat, or rat
- Words ending with phobia
- Steve, Steven, or Stephen
- Variations of regular expression
- Discussion
- Use word boundaries to match complete words
- Color or colour
- Bat, cat, or rat
- Words ending with phobia
- Steve, Steven, or Stephen
- Variations of regular expression
- See Also
- 5.4. Find All Except a Specific Word
- Problem
- Solution
- Discussion
- Variations
- Find words that don't contain another word
- See Also
- 5.5. Find Any Word Not Followed by a Specific Word
- Problem
- Solution
- Discussion
- Variations
- See Also
- 5.6. Find Any Word Not Preceded by a Specific Word
- Problem
- Solution
- Lookbehind you
- Words not preceded by cat
- Simulate lookbehind
- Discussion
- Fixed, finite, and infinite length lookbehind
- Simulate lookbehind
- Variations
- See Also
- 5.7. Find Words Near Each Other
- Problem
- Solution
- Discussion
- Variations
- Using a conditional
- Match three or more words near each other
- Exponentially increasing permutations
- The ugly solution
- Exploiting empty backreferences
- JavaScript backreferences by its own rules
- Multiple words, any distance from each other
- See Also
- 5.8. Find Repeated Words
- Problem
- Solution
- Discussion
- Variations
- See Also
- 5.9. Remove Duplicate Lines
- Problem
- Solution
- Option 1: Sort lines and remove adjacent duplicates
- Option 2: Keep the last occurrence of each duplicate line in an unsorted file
- Option 3: Keep the first occurrence of each duplicate line in an unsorted file
- Discussion
- Option 1: Sort lines and remove adjacent duplicates
- Option 2: Keep the last occurrence of each duplicate line in an unsorted file
- Option 3: Keep the first occurrence of each duplicate line in an unsorted file
- See Also
- 5.10. Match Complete Lines That Contain a Word
- Problem
- Solution
- Discussion
- Variations
- See Also
- 5.11. Match Complete Lines That Do Not Contain a Word
- Problem
- Solution
- Discussion
- See Also
- 5.12. Trim Leading and Trailing Whitespace
- Problem
- Solution
- Discussion
- Variations
- See Also
- 5.13. Replace Repeated Whitespace with a Single Space
- Problem
- Solution
- Clean any whitespace characters
- Clean horizontal whitespace characters
- Discussion
- Clean any whitespace characters
- Clean horizontal whitespace characters
- See Also
- 5.14. Escape Regular Expression Metacharacters
- Problem
- Solution
- Built-in solutions
- Regular expression
- Replacement
- Example JavaScript function
- Discussion
- Variations
- See Also
- 6. Numbers
- 6.1. Integer Numbers
- Problem
- Solution
- Discussion
- See Also
- 6.2. Hexadecimal Numbers
- Problem
- Solution
- Discussion
- See Also
- 6.3. Binary Numbers
- Problem
- Solution
- Discussion
- See Also
- 6.4. Octal Numbers
- Problem
- Solution
- Discussion
- See Also
- 6.5. Decimal Numbers
- Problem
- Solution
- Discussion
- See Also
- 6.6. Strip Leading Zeros
- Problem
- Solution
- Regular expression
- Replacement
- Getting the numbers in Perl
- Stripping leading zeros in PHP
- Discussion
- See Also
- 6.7. Numbers Within a Certain Range
- Problem
- Solution
- Discussion
- See Also
- 6.8. Hexadecimal Numbers Within a Certain Range
- Problem
- Solution
- Discussion
- See Also
- 6.9. Integer Numbers with Separators
- Problem
- Solution
- Discussion
- See Also
- 6.10. Floating-Point Numbers
- Problem
- Solution
- Discussion
- See Also
- 6.11. Numbers with Thousand Separators
- Problem
- Solution
- Discussion
- See Also
- 6.12. Add Thousand Separators to Numbers
- Problem
- Solution
- Basic solution
- Match separator positions only, using lookbehind
- Discussion
- Introduction
- Basic solution
- Match separator positions only, using lookbehind
- Variations
- Don't add commas after a decimal point
- Use infinite lookbehind
- Search-and-replace within matched numbers
- Don't add commas after a decimal point
- See Also
- 6.13. Roman Numerals
- Problem
- Solution
- Discussion
- Convert Roman Numerals to Decimal
- See Also
- 7. Source Code and Log Files
- 7.1. Keywords
- Problem
- Solution
- Discussion
- Variations
- See Also
- 7.2. Identifiers
- Problem
- Solution
- Discussion
- See Also
- 7.3. Numeric Constants
- Problem
- Solution
- Discussion
- See Also
- 7.4. Operators
- Problem
- Solution
- Discussion
- 7.5. Single-Line Comments
- Problem
- Solution
- Discussion
- See Also
- 7.6. Multiline Comments
- Problem
- Solution
- Discussion
- Variations
- See Also
- 7.7. All Comments
- Problem
- Solution
- Discussion
- See Also
- 7.8. Strings
- Problem
- Solution
- Discussion
- Variations
- See Also
- 7.9. Strings with Escapes
- Problem
- Solution
- Discussion
- Variations
- See Also
- 7.10. Regex Literals
- Problem
- Solution
- Discussion
- See Also
- 7.11. Here Documents
- Problem
- Solution
- Discussion
- See Also
- 7.12. Common Log Format
- Problem
- Solution
- Discussion
- Variations
- See Also
- 7.13. Combined Log Format
- Problem
- Solution
- Discussion
- See Also
- 7.14. Broken Links Reported in Web Logs
- Problem
- Solution
- Discussion
- See Also
- 8. URLs, Paths, and Internet Addresses
- 8.1. Validating URLs
- Problem
- Solution
- Discussion
- See Also
- 8.2. Finding URLs Within Full Text
- Problem
- Solution
- Discussion
- See Also
- 8.3. Finding Quoted URLs in Full Text
- Problem
- Solution
- Discussion
- See Also
- 8.4. Finding URLs with Parentheses in Full Text
- Problem
- Solution
- Discussion
- See Also
- 8.5. Turn URLs into Links
- Problem
- Solution
- Discussion
- See Also
- 8.6. Validating URNs
- Problem
- Solution
- Discussion
- See Also
- 8.7. Validating Generic URLs
- Problem
- Solution
- Discussion
- See Also
- 8.8. Extracting the Scheme from a URL
- Problem
- Solution
- Extract the scheme from a URL known to be valid
- Extract the scheme while validating the URL
- Discussion
- See Also
- 8.9. Extracting the User from a URL
- Problem
- Solution
- Extract the user from a URL known to be valid
- Extract the user while validating the URL
- Discussion
- See Also
- 8.10. Extracting the Host from a URL
- Problem
- Solution
- Extract the host from a URL known to be valid
- Extract the host while validating the URL
- Discussion
- See Also
- 8.11. Extracting the Port from a URL
- Problem
- Solution
- Extract the port from a URL known to be valid
- Extract the port while validating the URL
- Discussion
- See Also
- 8.12. Extracting the Path from a URL
- Problem
- Solution
- Discussion
- See Also
- 8.13. Extracting the Query from a URL
- Problem
- Solution
- Discussion
- See Also
- 8.14. Extracting the Fragment from a URL
- Problem
- Solution
- Discussion
- See Also
- 8.15. Validating Domain Names
- Problem
- Solution
- Discussion
- See Also
- 8.16. Matching IPv4 Addresses
- Problem
- Solution
- Regular expression
- Perl
- Discussion
- See Also
- 8.17. Matching IPv6 Addresses
- Problem
- Solution
- Standard notation
- Mixed notation
- Standard or mixed notation
- Compressed notation
- Compressed mixed notation
- Standard, mixed, or compressed notation
- Discussion
- Standard notation
- Mixed notation
- Standard or mixed notation
- Compressed notation
- Compressed mixed notation
- Standard, mixed, or compressed notation
- See Also
- 8.18. Validate Windows Paths
- Problem
- Solution
- Drive letter paths
- Drive letter and UNC paths
- Drive letter, UNC, and relative paths
- Discussion
- Drive letter paths
- Drive letter and UNC paths
- Drive letter, UNC, and relative paths
- See Also
- 8.19. Split Windows Paths into Their Parts
- Problem
- Solution
- Drive letter paths
- Drive letter and UNC paths
- Drive letter, UNC, and relative paths
- Discussion
- Drive letter paths
- Drive letter and UNC paths
- Drive letter, UNC, and relative paths
- See Also
- 8.20. Extract the Drive Letter from a Windows Path
- Problem
- Solution
- Discussion
- See Also
- 8.21. Extract the Server and Share from a UNC Path
- Problem
- Solution
- Discussion
- See Also
- 8.22. Extract the Folder from a Windows Path
- Problem
- Solution
- Discussion
- See Also
- 8.23. Extract the Filename from a Windows Path
- Problem
- Solution
- Discussion
- See Also
- 8.24. Extract the File Extension from a Windows Path
- Problem
- Solution
- Discussion
- See Also
- 8.25. Strip Invalid Characters from Filenames
- Problem
- Solution
- Regular expression
- Replacement
- Discussion
- See Also
- 9. Markup and Data Formats
- Processing Markup and Data Formats with Regular Expressions
- Basic Rules for Formats Covered in This Chapter
- 9.1. Find XML-Style Tags
- Problem
- Solution
- Quick and dirty
- Allow > in attribute values
- (X)HTML tags (loose)
- (X)HTML tags (strict)
- XML tags (strict)
- Discussion
- A few words of caution
- Quick and dirty
- Allow > in attribute values
- (X)HTML tags (loose)
- (X)HTML tags (strict)
- XML tags (strict)
- Skip Tricky (X)HTML and XML Sections
- Outer regex for (X)HTML
- Outer regex for XML
- See Also
- 9.2. Replace <b> Tags with <strong>
- Problem
- Solution
- Discussion
- Variations
- Replace a list of tags
- See Also
- 9.3. Remove All XML-Style Tags Except <em> and <strong>
- Problem
- Solution
- Solution 1: Match tags except <em> and <strong>
- Solution 2: Match tags except <em> and <strong>, and any tags that contain attributes
- Discussion
- Variations
- Whitelist specific attributes
- See Also
- 9.4. Match XML Names
- Problem
- Solution
- XML 1.0 names (approximate)
- XML 1.1 names (exact)
- Discussion
- XML 1.0 names
- XML 1.1 names
- Variations
- See Also
- 9.5. Convert Plain Text to HTML by Adding <p> and <br> Tags
- Problem
- Solution
- Step 1: Replace HTML special characters with named character references
- Step 2: Replace all line breaks with <br>
- Step 3: Replace double <br> tags with </p><p>
- Step 4: Wrap the entire string with <p></p>
- Example JavaScript solution
- Discussion
- Step 1: Replace HTML special characters with named character references
- Step 2: Replace all line breaks with <br>
- Step 3: Replace double <br> tags with </p><p>
- Step 4: Wrap the entire string with <p></p>
- See Also
- 9.6. Decode XML Entities
- Problem
- Solution
- Regular expression
- Replace matches with their corresponding literal characters
- Example JavaScript solution
- Discussion
- See Also
- 9.7. Find a Specific Attribute in XML-Style Tags
- Problem
- Solution
- Tags that contain an id attribute (quick and dirty)
- Tags that contain an id attribute (more reliable)
- <div> tags that contain an id attribute
- Tags that contain an id attribute with the value my-id
- Tags that contain my-class within their class attribute value
- Discussion
- See Also
- 9.8. Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
- Problem
- Solution
- Solution 1, simplistic
- Solution 2, more reliable
- Insert the new attribute
- Discussion
- See Also
- 9.9. Remove XML-Style Comments
- Problem
- Solution
- Discussion
- How it works
- When comments can't be removed
- Variations
- Find valid XML comments
- Find valid HTML comments
- See Also
- 9.10. Find Words Within XML-Style Comments
- Problem
- Solution
- Two-step approach
- Single-step approach
- Discussion
- Two-step approach
- Single-step approach
- Variations
- See Also
- 9.11. Change the Delimiter Used in CSV Files
- Problem
- Solution
- Example web page with JavaScript
- Discussion
- See Also
- 9.12. Extract CSV Fields from a Specific Column
- Problem
- Solution
- Example web page with JavaScript
- Discussion
- Variations
- Match a CSV record and capture the field in column 1 to backreference 1
- Match a CSV record and capture the field in column 2 to backreference 1
- Match a CSV record and capture the field in column 3 or higher to backreference 1
- Replacement string
- See Also
- 9.13. Match INI Section Headers
- Problem
- Solution
- Discussion
- Variations
- See Also
- 9.14. Match INI Section Blocks
- Problem
- Solution
- Discussion
- See Also
- 9.15. Match INI Name-Value Pairs
- Problem
- Solution
- Discussion
- See Also
- Index
- About the Authors
- Colophon
- SPECIAL OFFER: Upgrade this ebook with O'Reilly
- Copyright