English | 2019 | ISBN-13: 978-1789533392 | 350 Pages | True (PDF, MOBI) + Code | 23.7 MB
Web scraping is an essential technique used in many organizations to gather valuable data from web pages.
Collect and scrape data of varying complexity from the modern web using the latest tools, best practices, and techniques
Learn different scraping techniques using a range of Python libraries such as Scrapy and Beautiful Soup
Build scrapers and crawlers to extract relevant information from the web
Automate web scraping operations to bridge any gaps in accuracy and manage complex business needs
This book will help you get hands-on with different web scraping techniques, tools, and methodologies.
You’ll start by learning the fundamental concepts of web scraping and how they can be applied to multiple sets of web pages. You’ll use powerful libraries from the Python ecosystem such as Scrapy, lxml, pyquery, and bs4 to carry out web scraping operations. Next, you’ll get up to speed with simple-to-intermediate scraping operations, such as identifying information on web pages and using patterns or attributes to retrieve it. The book will then guide you through a series of use cases and demonstrate how to use the best tools and techniques to scrape web pages efficiently. Later, you’ll explore other popular approaches to web scraping, such as Selenium, regular expressions (regex), and web-based APIs.
By the end of this book, you will have learned how to efficiently scrape the web using different techniques with Python and other popular tools.
What you will learn
Analyze data and information from web pages
Understand how to use browser-based developer tools for scraping
Use XPath and CSS selectors to identify and explore markup elements
Discover how to handle and manage cookies
Explore advanced concepts in handling HTML forms and processing logins
Work with web security, data storage, and APIs when scraping data
Use Regex with Python to extract data
Deal with complex web entities by using Selenium to find and extract data
Who this book is for
This book is for Python programmers, data analysts, web scraping beginners, or anyone who wants to learn how to perform web scraping from scratch. Working knowledge of the Python programming language is expected.
Table of Contents
Web Scraping Fundamentals
Python and the Web – Using urllib and Requests
Using LXML, XPath, and CSS Selectors
Scraping Using pyquery – a Python Library
Web Scraping Using Scrapy and Beautiful Soup
Working with Secure Web
Data Extraction Using Web-Based APIs
Using Selenium to Scrape the Web
Using Regex to Extract Data