Perl & LWP - Helion
ISBN: 978-05-965-5209-1
Pages: 262, Format: ebook
Publication date: 2002-06-20
Bookstore: Helion
Price: 118.15 zł (previously: 137.38 zł)
You save: 14% (-19.23 zł)
Perl soared to popularity as a language for creating and managing web content, but with LWP (Library for WWW in Perl), Perl is equally adept at consuming information on the Web. LWP is a suite of modules for fetching and processing web pages.
The Web is a vast data source that contains everything from stock prices to movie credits, and with LWP all that data is just a few lines of code away. Anything you do on the Web, whether it's buying or selling, reading or writing, uploading or downloading, from news to e-commerce, can be controlled with Perl and LWP. You can automate Web-based purchase orders as easily as you can set up a program to download MP3 files from a web site.
Perl & LWP covers:
- Understanding LWP and its design
- Fetching and analyzing URLs
- Extracting information from HTML using regular expressions and tokens
- Working with the structure of HTML documents using trees
- Setting and inspecting HTTP headers and response codes
- Managing cookies
- Accessing information that requires authentication
- Extracting links
- Cooperating with proxy caches
- Writing web spiders (also known as robots) in a safe fashion
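As a rough illustration of the "few lines of code" claim above, here is a minimal sketch of fetching a page with LWP::UserAgent and pulling out its title. It is not taken from the book: the URL, the agent string, and the helper name fetch_page are assumptions made for this example, and the regular expression is only a quick approximation of what the chapters on tokens and trees handle more robustly.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch a URL and return the decoded body,
# or undef if the request did not succeed.
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->agent('example-fetcher/0.1');    # assumed agent name, for illustration
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Placeholder URL, used only for illustration.
my $url  = 'http://www.example.com/';
my $html = fetch_page($url);

if (defined $html) {
    # Naive title extraction; fine for a sketch, brittle on real-world HTML.
    my ($title) = $html =~ m{<title>\s*(.*?)\s*</title>}is;
    print "Title: ", (defined $title ? $title : '(none)'), "\n";
}
else {
    print "Could not fetch $url\n";
}
```

Returning undef on failure keeps the caller's check simple; a fuller program would probably inspect the response's status line to report why the request failed.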
Table of Contents
- Perl & LWP
- SPECIAL OFFER: Upgrade this ebook with O'Reilly
- A Note Regarding Supplemental Files
- Foreword
- Preface
- Audience for This Book
- Structure of This Book
- Order of Chapters
- Important Standards Documents
- Conventions Used in This Book
- Comments & Questions
- Acknowledgments
- 1. Introduction to Web Automation
- 1.1. The Web as Data Source
- 1.1.1. Screen Scraping
- 1.1.2. Brittleness
- 1.1.3. Web Services
- 1.2. History of LWP
- 1.3. Installing LWP
- 1.3.1. Installing LWP from the CPAN Shell
- 1.3.1.1. Configuring
- 1.3.1.2. Obtaining help
- 1.3.1.3. Installing LWP
- 1.3.2. Installing LWP Manually
- 1.3.2.1. Download distributions
- 1.3.2.2. Unpack and configure
- 1.3.2.3. Make, test, and install
- 1.4. Words of Caution
- 1.4.1. Network and Server Load
- 1.4.2. Copyright
- 1.4.3. Acceptable Use
- 1.5. LWP in Action
- 1.5.1. The Object-Oriented Interface
- 1.5.2. Forms
- 1.5.3. Parsing HTML
- 1.5.4. Authentication
- 2. Web Basics
- 2.1. URLs
- 2.2. An HTTP Transaction
- 2.2.1. Request
- 2.2.2. Response
- 2.3. LWP::Simple
- 2.3.1. Basic Document Fetch
- 2.3.2. Fetch and Store
- 2.3.3. Fetch and Print
- 2.3.4. Previewing with HEAD
- 2.4. Fetching Documents Without LWP::Simple
- 2.5. Example: AltaVista
- 2.6. HTTP POST
- 2.7. Example: Babelfish
- 3. The LWP Class Model
- 3.1. The Basic Classes
- 3.2. Programming with LWP Classes
- 3.3. Inside the do_GET and do_POST Functions
- 3.4. User Agents
- 3.4.1. Connection Parameters
- 3.4.2. Request Parameters
- 3.4.3. Protocols
- 3.4.4. Redirection
- 3.4.5. Authentication
- 3.4.6. Proxies
- 3.4.7. Request Methods
- 3.4.7.1. Saving response content to a file
- 3.4.7.2. Sending response content to a callback
- 3.4.7.3. Mirroring a URL to a file
- 3.4.8. Advanced Methods
- 3.5. HTTP::Response Objects
- 3.5.1. Status Line
- 3.5.2. Content
- 3.5.3. Headers
- 3.5.4. Expiration Times
- 3.5.5. Base for Relative URLs
- 3.5.6. Debugging
- 3.6. LWP Classes: Behind the Scenes
- 4. URLs
- 4.1. Parsing URLs
- 4.1.1. Constructors
- 4.1.2. Output
- 4.1.3. Comparison
- 4.1.4. Components of a URL
- 4.1.5. Queries
- 4.2. Relative URLs
- 4.3. Converting Absolute URLs to Relative
- 4.4. Converting Relative URLs to Absolute
- 5. Forms
- 5.1. Elements of an HTML Form
- 5.2. LWP and GET Requests
- 5.2.1. GETting Fixed URLs
- 5.2.2. GETting a query_form( ) URL
- 5.3. Automating Form Analysis
- 5.4. Idiosyncrasies of HTML Forms
- 5.4.1. Hidden Elements
- 5.4.2. Text Elements
- 5.4.3. Password Elements
- 5.4.4. Checkboxes
- 5.4.5. Radio Buttons
- 5.4.6. Submit Buttons
- 5.4.7. Image Buttons
- 5.4.8. Reset Buttons
- 5.4.9. File Selection Elements
- 5.4.10. Textarea Elements
- 5.4.11. Select Elements and Option Elements
- 5.5. POST Example: License Plates
- 5.5.1. The Form
- 5.5.2. Use formpairs.pl
- 5.5.3. Translating This into LWP
- 5.6. POST Example: ABEBooks.com
- 5.6.1. The Form
- 5.6.2. Translating This into LWP
- 5.6.3. Adding Features
- 5.6.4. Generalizing the Program
- 5.7. File Uploads
- 5.8. Limits on Forms
- 6. Simple HTML Processing with Regular Expressions
- 6.1. Automating Data Extraction
- 6.2. Regular Expression Techniques
- 6.2.1. Anchor Your Match
- 6.2.2. Whitespace
- 6.2.3. Embedded Newlines
- 6.2.4. Minimal and Greedy Matches
- 6.2.5. Capture
- 6.2.6. Repeated Matches
- 6.2.7. Develop from Components
- 6.2.8. Use Multiple Steps
- 6.3. Troubleshooting
- 6.4. When Regular Expressions Aren't Enough
- 6.5. Example: Extracting Links from a Bookmark File
- 6.6. Example: Extracting Links from Arbitrary HTML
- 6.7. Example: Extracting Temperatures from Weather Underground
- 7. HTML Processing with Tokens
- 7.1. HTML as Tokens
- 7.2. Basic HTML::TokeParser Use
- 7.2.1. Start-Tag Tokens
- 7.2.2. End-Tag Tokens
- 7.2.3. Text Tokens
- 7.2.4. Comment Tokens
- 7.2.5. Markup Declaration Tokens
- 7.2.6. Processing Instruction Tokens
- 7.3. Individual Tokens
- 7.3.1. Checking Image Tags
- 7.3.2. HTML Filters
- 7.4. Token Sequences
- 7.4.1. Example: BBC Headlines
- 7.4.2. Translating the Problem into Code
- 7.4.3. Bundling into a Program
- 7.5. More HTML::TokeParser Methods
- 7.5.1. The get_text( ) Method
- 7.5.2. The get_text( ) Method with Parameters
- 7.5.3. The get_trimmed_text( ) Method
- 7.5.4. The get_tag( ) Method
- 7.5.4.1. Start-tags
- 7.5.4.2. End-tags
- 7.5.5. The get_tag( ) Method with Parameters
- 7.6. Using Extracted Text
- 8. Tokenizing Walkthrough
- 8.1. The Problem
- 8.2. Getting the Data
- 8.3. Inspecting the HTML
- 8.4. First Code
- 8.5. Narrowing In
- 8.6. Rewrite for Features
- 8.6.1. Debuggability
- 8.6.2. Images and Applets
- 8.6.3. Link Text
- 8.6.4. Live Data
- 8.7. Alternatives
- 9. HTML Processing with Trees
- 9.1. Introduction to Trees
- 9.2. HTML::TreeBuilder
- 9.2.1. Constructors
- 9.2.2. Parse Options
- 9.2.3. Parsing
- 9.2.4. Cleanup
- 9.3. Processing
- 9.3.1. Methods for Searching the Tree
- 9.3.2. Attributes of a Node
- 9.3.3. Traversing
- 9.4. Example: BBC News
- 9.5. Example: Fresh Air
- 10. Modifying HTML with Trees
- 10.1. Changing Attributes
- 10.1.1. Whitespace
- 10.1.2. Other HTML Options
- 10.2. Deleting Images
- 10.3. Detaching and Reattaching
- 10.3.1. The detach_content( ) Method
- 10.3.2. Constraints
- 10.4. Attaching in Another Tree
- 10.4.1. Retaining Comments
- 10.4.2. Accessing Comments
- 10.4.3. Attaching Content
- 10.5. Creating New Elements
- 10.5.1. Literals
- 10.5.2. New Nodes from Lists
- 11. Cookies, Authentication, and Advanced Requests
- 11.1. Cookies
- 11.1.1. Enabling Cookies
- 11.1.2. Loading Cookies from a File
- 11.1.3. Saving Cookies to a File
- 11.1.4. Cookies and the New York Times Site
- 11.2. Adding Extra Request Header Lines
- 11.2.1. Pretending to Be Netscape
- 11.2.2. Referer
- 11.3. Authentication
- 11.3.1. Comparing Cookies with Basic Authentication
- 11.3.2. Authenticating via LWP
- 11.3.3. Security
- 11.4. An HTTP Authentication Example: The Unicode Mailing Archive
- 12. Spiders
- 12.1. Types of Web-Querying Programs
- 12.2. A User Agent for Robots
- 12.3. Example: A Link-Checking Spider
- 12.3.1. The Basic Spider Logic
- 12.3.2. Overall Design in the Spider
- 12.3.3. HEAD Response Processing
- 12.3.4. Redirects
- 12.3.5. Link Extraction
- 12.3.6. Fleshing Out the URL Scheduling
- 12.3.7. The Rest of the Code
- 12.4. Ideas for Further Expansion
- A. LWP Modules
- B. HTTP Status Codes
- B.1. 100s: Informational
- B.2. 200s: Successful
- B.3. 300s: Redirection
- B.4. 400s: Client Errors
- B.5. 500s: Server Errors
- C. Common MIME Types
- D. Language Tags
- E. Common Content Encodings
- F. ASCII Table
- G. User's View of Object-Oriented Modules
- G.1. A User's View of Object-Oriented Modules
- G.2. Modules and Their Functional Interfaces
- G.3. Modules with Object-Oriented Interfaces
- G.4. What Can You Do with Objects?
- G.5. What's in an Object?
- G.6. What Is an Object Value?
- G.7. So Why Do Some Modules Use Objects?
- G.8. The Gory Details
- Index
- Colophon