.. "CDCF ecig Documentation Page"

Web Scraping
============

Sites
-----

See the ``scraping`` directory. The scrapers are currently Jupyter notebooks and may need cleanup later. Notebooks are available for the following sites:

- CS Vape
- Get Pop
- My Vapor Store
- Perfect Vape
- Vape.com
- Vape Sourcing
- Vape WH
- Vaping.com

The directory also contains some of the original regular expression functions developed as a demo. We expect all of them to eventually be replaced by, or migrated to, the NLP code section.

Total Items Scraped
-------------------

Text fields are available in the ``scraped_data`` directory. Images are stored on Box and are available upon request.

.. list-table:: Gathered Data
   :header-rows: 1

   * - Site
     - Items gathered
     - Images gathered
   * - mipod
     - 1,053
     - 1,036
   * - csvape
     - 621
     - 439
   * - getpop
     - 972
     - 972
   * - myvaporstore
     - 2,056
     - 578
   * - vape.com
     - 5,454
     - 34,589
   * - perfectvape
     - 923
     - 2,835
   * - vapewh
     - 362
     - 12,957
   * - vapesourcing
     - 2,587
     - 34,243
   * - vaping.com
     - 1,202
     - 4,020
   * - ElementVape
     -
     -

Mipod data was provided by CDCF.
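The Gathered Data table has no totals row. As a quick cross-check, the per-site counts can be tallied in Python, the language of the scraping notebooks. This is a minimal sketch with the numbers copied by hand from the table. It is not code from the ``scraping`` directory, the ``GATHERED`` name is hypothetical, and ElementVape is left out because its counts are blank.

.. code-block:: python

   # Hypothetical hand-copied counts from the "Gathered Data" table:
   # site -> (items gathered, images gathered).
   # ElementVape is omitted because its counts are not yet recorded.
   GATHERED = {
       "mipod": (1053, 1036),
       "csvape": (621, 439),
       "getpop": (972, 972),
       "myvaporstore": (2056, 578),
       "vape.com": (5454, 34589),
       "perfectvape": (923, 2835),
       "vapewh": (362, 12957),
       "vapesourcing": (2587, 34243),
       "vaping.com": (1202, 4020),
   }

   # Sum each column across all sites.
   total_items = sum(items for items, _ in GATHERED.values())
   total_images = sum(images for _, images in GATHERED.values())

   # Format with thousands separators to match the table.
   print(f"{total_items:,} items, {total_images:,} images")

Running it prints ``15,230 items, 91,669 images``. These totals include the CDCF-provided mipod data.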