English | 2019 | ISBN-13 : 978-1789533392 | 350 Pages | True (PDF, MOBI) + Code | 23.7 MB

Web scraping is an essential technique used in many organizations to gather valuable data from web pages.

Collect and scrape data of varying complexity from the modern web using the latest tools, best practices, and techniques

Key Features

Learn different scraping techniques using a range of Python libraries such as Scrapy and Beautiful Soup

Build scrapers and crawlers to extract relevant information from the web

Automate web scraping operations to bridge any gaps in accuracy and manage complex business needs

Book Description

This book will help you get hands-on with different web scraping techniques, tools, and methodologies.

You’ll start by learning the fundamental concepts of web scraping techniques and how they can be applied to multiple sets of web pages. You’ll use powerful libraries from the Python ecosystem such as Scrapy, lxml, pyquery, and bs4 to carry out web scraping operations. Next, you’ll get up to speed with simple to intermediate scraping operations such as identifying information from web pages and using patterns or attributes to retrieve information. The book will further guide you through a series of use cases and demonstrate how to use the best tools and techniques to efficiently scrape web pages. Later, you’ll even explore the uses of other popular web scraping tools, such as Selenium and Regex, and web-based APIs.

By the end of this book, you will have learned how to efficiently scrape the web using different techniques with Python and other popular tools.

What you will learn

Analyze data and information from web pages

Understand how to use browser-based developer tools for scraping

Use XPath and CSS selectors to identify and explore markup elements

Discover how to handle and manage cookies

Explore advanced concepts in handling HTML forms and processing logins

Optimize web securities, data storage, and API use to scrape data

Use Regex with Python to extract data

Deal with complex web entities by using Selenium to find and extract data
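To make two of the topics above concrete (identifying elements by their attributes and using Regex with Python), here is a minimal sketch. It is not taken from the book: it uses only the standard library (html.parser and re) in place of lxml, pyquery, or bs4, and the SAMPLE_HTML markup, the BookLinkParser class, and both helper functions are hypothetical names made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical inline page, so the sketch runs without network access.
SAMPLE_HTML = """
<ul>
  <li class="book"><a href="/books/1">Learning Scrapy</a> <span class="price">$29.99</span></li>
  <li class="book"><a href="/books/2">Regex Basics</a> <span class="price">$15.50</span></li>
  <li class="ad"><a href="/promo">Sale!</a></li>
</ul>
"""


class BookLinkParser(HTMLParser):
    """Collect href values of <a> tags found inside <li class="book">."""

    def __init__(self):
        super().__init__()
        self.in_book = False  # True while we are inside a matching <li>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            # Attribute-based selection: keep only list items marked as books.
            self.in_book = attrs.get("class") == "book"
        elif tag == "a" and self.in_book and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_book = False


def extract_links(html):
    """Return the hrefs of links inside book list items."""
    parser = BookLinkParser()
    parser.feed(html)
    return parser.links


def extract_prices(html):
    """Pattern-based extraction: dollar amounts such as $29.99, as floats."""
    return [float(amount) for amount in re.findall(r"\$(\d+\.\d{2})", html)]


print(extract_links(SAMPLE_HTML))
print(extract_prices(SAMPLE_HTML))
```

On a real site, an XPath or CSS selector library would usually replace the hand-written parser, and the page would be fetched over HTTP rather than defined inline.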

Who this book is for

This book is for Python programmers, data analysts, web scraping beginners, or anyone who wants to learn how to perform web scraping from scratch. Working knowledge of the Python programming language is expected.

Table of Contents

Web Scraping Fundamentals

Python and the Web – Using urllib and Requests

Using LXML, XPath, and CSS Selectors

Scraping Using pyquery – a Python Library

Web Scraping Using Scrapy and Beautiful Soup

Working with Secure Web

Data Extraction Using Web-Based APIs

Using Selenium to Scrape the Web

Using Regex to Extract Data

Next Steps







