Computer Networks | Institute of Theoretical and Applied Informatics, Polish Academy of Sciences

Title	Computer Networks
Publication Type	Book Chapter
Year of Publication	2012
Authors	Pawlas P, Domański A, Domańska J
Editor	Kwiecień A, Gaj P, Stera P
Book Title	Communications in Computer and Information Science
Volume	291
Chapter	Universal Web pages content parser
Publisher	Springer
City	Berlin Heidelberg
ISBN Number	978-3-642-31216-8
Abstract	This article describes the universal web pages content parser, a cross-platform application that enhances the process of extracting data from web pages. The key elements of this implementation were a user-friendly interface, the possibility of significant automation, and the reusability of already created patterns. Moreover, the original approach to parsing not well-formed HTML, which constitutes the application's core, is presented in detail. The universal web pages content parser shows that a simplified web scraping utility can be made available to the masses, and that not well-formed HTML sources can feed useful tree-like data structures just as well-formed ones do.
