Automatic parser

Sorry, this project is discontinued because of a lack of time. If you want to start maintaining it, contact me.

This widget is slightly more complicated to use than my other widgets, but it can be used in many situations. It extracts the information you choose from a webpage, reformats it to fit the widget (resizing, paging), and displays it.

Download

This widget is available in English and French.

To use this widget on one of the supported platforms (Netvibes, iGoogle, Vista sidebar, Apple dashboard, Opera), go to its Netvibes ecosystem page.

Installation

The installation is demonstrated on video in the widget above. Click on the link to see the demonstration in the widget.

Here are the steps to install the widget:

  1. Type the URL of the webpage you are interested in.
  2. Move your mouse over elements in the webpage: their color changes. Click to select the information you are interested in: either a single block that you want to display in the widget, or several blocks of the same type. In the latter case, the widget displays all blocks of that type. You don’t have to select every element you want; a representative sample is sufficient.
  3. Click on “Parse the information”. The widget selects all the blocks you seem to be interested in. If it doesn’t select enough elements, select some of the missing elements and repeat step 3. If it selects too many elements, the widget can’t find features common to the elements you selected. Try modifying your selection by selecting fewer elements, or by selecting them another way.
  4. Click on “Validate”: the selected information is displayed. You may adjust the options of the widget to improve the display.
  5. Once you start using the widget, an update of the source website may cause information to become incomplete, or to stop being displayed at all. In that case, click on “Configure” and resume the process at step 3 without clicking on “Reset the selection”. The widget learns from its mistakes; this should be necessary at most three or four times if you are unlucky.

Working principles

To determine which elements of the webpage you are interested in, this widget generates a path to each element you have selected, using the structure of the HTML document: the type, classes and identifiers of the nodes that structure the page.
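To make the idea of a path more concrete, here is a minimal sketch in Python. It is not the widget's actual code: the `Node` and `Step` classes and the `path_to` function are hypothetical names invented for this illustration, and the tree is built by hand instead of being parsed from a real page. It only shows how a path can be read off the chain of ancestors of a selected element.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One level of a path: the node type, its identifier and its classes."""
    tag: str
    id: Optional[str]
    classes: frozenset


@dataclass
class Node:
    """A hand-built stand-in for an HTML element (assumption for this sketch)."""
    tag: str
    id: Optional[str] = None
    classes: frozenset = frozenset()
    parent: Optional[Node] = None


def path_to(node: Node) -> list[Step]:
    """Walk up from the selected node to the root, then reverse the steps."""
    steps = []
    current: Optional[Node] = node
    while current is not None:
        steps.append(Step(current.tag, current.id, current.classes))
        current = current.parent
    steps.reverse()
    return steps


# Example: <body><ul id="news"><li class="item first">...</li></ul></body>
body = Node("body")
news = Node("ul", id="news", parent=body)
item = Node("li", classes=frozenset({"item", "first"}), parent=news)

for step in path_to(item):
    print(step.tag, step.id, sorted(step.classes))
```

Each step keeps exactly the three pieces of information mentioned above (type, identifier, classes), so two elements of the same kind tend to produce paths that differ only in a few details.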

If you select only one element, the path to this element is saved. If you select several elements, the generated paths are merged so that as much information as possible is kept, and the resulting path is saved.
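One possible way to merge paths is sketched below in Python. This is an assumption about how such a merge could work, not a description of the widget's real algorithm: `Step`, `merge_steps` and `merge_paths` are hypothetical names, a path is modelled as a list of steps, and the sketch only handles paths of equal depth. At each level it keeps whatever the paths have in common and drops the rest.

```python
from functools import reduce
from typing import NamedTuple, Optional


class Step(NamedTuple):
    tag: Optional[str]   # None means "any type of node"
    id: Optional[str]    # None means "any identifier"
    classes: frozenset   # classes every matching node must carry


def merge_steps(a: Step, b: Step) -> Step:
    """Keep only the information shared by both steps."""
    return Step(
        tag=a.tag if a.tag == b.tag else None,
        id=a.id if a.id == b.id else None,
        classes=a.classes & b.classes,
    )


def merge_paths(a: list, b: list) -> list:
    """Merge two paths level by level (simplification: equal depth only)."""
    if len(a) != len(b):
        raise ValueError("this sketch only merges paths of equal depth")
    return [merge_steps(x, y) for x, y in zip(a, b)]


def merge_all(paths: list) -> list:
    """Merge any number of selected paths into a single saved path."""
    return reduce(merge_paths, paths)


first = [Step("ul", "news", frozenset()),
         Step("li", None, frozenset({"item", "first"}))]
second = [Step("ul", "news", frozenset()),
          Step("li", None, frozenset({"item", "odd"}))]

saved = merge_all([first, second])
print(saved)
```

In this example the two selections agree on everything except the classes `first` and `odd`, so the merged path keeps the `ul` with identifier `news` and requires only the class `item` on the `li`.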

All elements matching the saved path are then displayed in the widget (and adapted to fit its limited size).
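The matching step could look like the following Python sketch. Again, this is an illustration under stated assumptions and not the widget's implementation: `Node`, `step_matches` and `find_matches` are hypothetical names, a path step is a plain `(tag, id, classes)` tuple in which `None` means "anything", and the page is a hand-built tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A hand-built stand-in for an HTML element (assumption for this sketch)."""
    tag: str
    id: Optional[str] = None
    classes: frozenset = frozenset()
    children: list = field(default_factory=list)


def step_matches(step: tuple, node: Node) -> bool:
    """A node fits a step if it satisfies every constraint the step still has."""
    tag, node_id, classes = step
    return ((tag is None or tag == node.tag)
            and (node_id is None or node_id == node.id)
            and classes <= node.classes)


def find_matches(root: Node, path: list) -> list:
    """Return every node reached by following the path down from the root."""
    if not path or not step_matches(path[0], root):
        return []
    if len(path) == 1:
        return [root]
    found = []
    for child in root.children:
        found.extend(find_matches(child, path[1:]))
    return found


page = Node("ul", id="news", children=[
    Node("li", classes=frozenset({"item", "first"})),
    Node("li", classes=frozenset({"item", "odd"})),
    Node("li", classes=frozenset({"advert"})),
])
saved_path = [("ul", "news", frozenset()), ("li", None, frozenset({"item"}))]

for node in find_matches(page, saved_path):
    print(node.tag, sorted(node.classes))
```

Here the two `li` elements carrying the class `item` fit the saved path, while the one with only the class `advert` is left out.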

If the path is too precise, elements that you want may no longer be displayed after the source page is updated. You can then reconfigure the widget and select the missing elements. Their paths are merged with the already saved path to produce a less restrictive path.

They talk about it