Spanish Web Archive

Web archiving is the main way to fulfil the legal deposit of online publications. It is carried out by crawlers that visit previously selected URLs and save everything those pages link to, up to a set frequency, depth and size. The result of this harvesting process is the web archive.

It is currently impossible to archive the web exhaustively, so the Spanish National Library has opted for a mixed model that combines bulk and selective crawls:

  1. Bulk crawls collect as many domains as possible under a national top-level domain, such as .es, without going deep into the navigation levels. They are carried out once a year.
  2. Selective crawls complement the bulk crawls: they collect, more frequently, a smaller sample of websites selected for their relevance to history, society and culture. They are carried out several times a year in collaboration with the conservation centers of the Autonomous Communities and other specialized institutions. Selective crawls can be of three types:
    1. Themed crawls: each department of the Biblioteca Nacional de España and each Autonomous Community maintains its own thematic collections of online resources that it considers necessary to keep as part of the legal deposit. For example: Music and Audiovisuals, Andalusian electronic journals, Institutions of the Valencian Community, etc.
    2. Crawls for events: cover events of special relevance.
    3. Crawls for risk: cover websites in danger of extinction.
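
The difference between bulk and selective crawls comes down to how many seed URLs are visited and how deep the crawler follows links from each one. The following is a minimal sketch of that idea, not the BNE's actual crawler or configuration: the `crawl` function, the in-memory `LINKS` graph (which stands in for real HTTP fetching) and the depth and size values are all illustrative assumptions.

```python
from collections import deque

# Hypothetical link graph standing in for the live web (page -> linked pages).
LINKS = {
    "a.es/": ["a.es/news", "a.es/about"],
    "a.es/news": ["a.es/news/2020"],
    "a.es/news/2020": ["a.es/news/2020/item"],
    "a.es/about": [],
    "b.es/": ["b.es/blog"],
    "b.es/blog": ["b.es/blog/post"],
}


def crawl(seeds, max_depth, max_pages):
    """Breadth-first crawl from the seed URLs, bounded by depth and page count."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    saved = []
    while queue and len(saved) < max_pages:
        url, depth = queue.popleft()
        saved.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return saved


# Bulk-style crawl: many seeds, shallow depth.
bulk = crawl(["a.es/", "b.es/"], max_depth=1, max_pages=100)
# Selective-style crawl: one selected site, followed more deeply.
selective = crawl(["a.es/"], max_depth=3, max_pages=100)
```

In this sketch the bulk-style call touches both domains but stops one level below each home page, while the selective-style call stays on a single site and reaches its deepest page. Frequency is not modeled here; it would simply be how often each call is scheduled.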

Further information: BNE Web archive

DATA AND FORMATS AVAILABLE:

 

Bulk crawls
OpenWayback HTML (only available at the BNE building)
Themed crawls
Fine arts CSV, JSON, ODS, TXT, XLS, XML
Feminism CSV, JSON, ODS, TXT, XLS, XML
Cervantes CSV, JSON, ODS, TXT, XLS, XML
Old collection CSV, JSON, ODS, TXT, XLS, XML
Gastronomy CSV, JSON, ODS, TXT, XLS, XML
La BNE CSV, JSON, ODS, TXT, XLS, XML
Environment and climate change CSV, JSON, ODS, TXT, XLS, XML
Music and audio-visual CSV, JSON, ODS, TXT, XLS, XML
Public Organisations CSV, JSON, ODS, TXT, XLS, XML
Regional Television and Press CSV, JSON, ODS, TXT, XLS, XML
National Television and Press CSV, JSON, ODS, TXT, XLS, XML
Catalan politics CSV, JSON, ODS, TXT, XLS, XML
National politics CSV, JSON, ODS, TXT, XLS, XML
Traditions CSV, JSON, ODS, TXT, XLS, XML
Spanish Universities CSV, JSON, ODS, TXT, XLS, XML
Autonomous Communities crawls
Andalucía CSV, JSON, ODS, TXT, XLS, XML
Aragón CSV, JSON, ODS, TXT, XLS, XML
Asturias CSV, JSON, ODS, TXT, XLS, XML
Canarias CSV, JSON, ODS, TXT, XLS, XML
Cantabria CSV, JSON, ODS, TXT, XLS, XML
Castilla-La Mancha CSV, JSON, ODS, TXT, XLS, XML
Castilla y León CSV, JSON, ODS, TXT, XLS, XML
Comunidad de Madrid CSV, JSON, ODS, TXT, XLS, XML
Comunidad Foral de Navarra CSV, JSON, ODS, TXT, XLS, XML
Comunidad Valenciana CSV, JSON, ODS, TXT, XLS, XML
Extremadura CSV, JSON, ODS, TXT, XLS, XML
Galicia CSV, JSON, ODS, TXT, XLS, XML
La Rioja CSV, JSON, ODS, TXT, XLS, XML
Murcia CSV, JSON, ODS, TXT, XLS, XML
País Vasco CSV, JSON, ODS, TXT, XLS, XML
Crawls for events
Abdication of Juan Carlos I and proclamation of Felipe VI CSV, JSON, ODS, TXT, XLS, XML
Death of Adolfo Suárez CSV, JSON, ODS, TXT, XLS, XML
Catalan self-determination referendum of 9 November 2014 CSV, JSON, ODS, TXT, XLS, XML
General elections 2015-2016 CSV, JSON, ODS, TXT, XLS, XML
Galician elections 2016 CSV, JSON, ODS, TXT, XLS, XML
Basque elections 2016 CSV, JSON, ODS, TXT, XLS, XML
European Parliament elections 2014 CSV, JSON, ODS, TXT, XLS, XML
ETA disarmament CSV, JSON, ODS, TXT, XLS, XML
Terrorist attacks in Catalonia CSV, JSON, ODS, TXT, XLS, XML
Catalan elections 2017 CSV, JSON, ODS, TXT, XLS, XML
Andalusian elections 2018 CSV, JSON, ODS, TXT, XLS, XML
General elections 2019 CSV, JSON, ODS, TXT, XLS, XML
European Parliament Elections 2019 CSV, JSON, ODS, TXT, XLS, XML
Municipal and regional elections 2019 CSV, JSON, ODS, TXT, XLS, XML
LGTBI pride CSV, JSON, ODS, TXT, XLS, XML
Coronavirus (COVID-19) CSV, JSON, ODS, TXT, XLS, XML
Basque elections 2020 CSV, JSON, ODS, TXT, XLS, XML
Galician elections 2020 CSV, JSON, ODS, TXT, XLS, XML
Crawls for risk
Wikispaces CSV, JSON, ODS, TXT, XLS, XML
Websites in danger of extinction CSV, JSON, ODS, TXT, XLS, XML
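
Once a collection has been downloaded in CSV format, it can be read with standard tools. The following is a minimal sketch using Python's built-in `csv` module. It assumes that the file has a header row and is comma-delimited; the `load_collection` helper, the sample content and the `title` and `url` column names are hypothetical, and the real files may use different columns or a different delimiter.

```python
import csv
import io


def load_collection(fileobj, delimiter=","):
    """Read one downloaded collection file into a list of row dictionaries."""
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    return list(reader)


# Hypothetical sample standing in for a downloaded CSV; real column names may differ.
sample = io.StringIO("title,url\nExample site,http://www.example.es/\n")
rows = load_collection(sample)
for row in rows:
    print(row["title"], row["url"])
```

For a file on disk, the same helper can be given the result of `open(path, newline="", encoding="utf-8")`, where the path and the encoding are again assumptions to check against the downloaded file.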