Web Scraping
Sites
See the scraping directory. The scrapers are currently mostly Jupyter notebooks and may need some cleanup later. Notebooks are available for the following sites:
CS Vape
Get Pop
My Vapor Store
Perfect Vape
Vape.com
Vape Sourcing
Vape WH
Vaping.com
Some of the original regular expression functions developed as a demo are available here, but we expect all of them to eventually be replaced by, or migrated to, the NLP code section.
Total Items Scraped
Text fields are available in the scraped_data directory. Images are available via Box upon request.
| Site | Items gathered | Images gathered |
|---|---|---|
| mipod | 1,053 | 1,036 |
| csvape | 621 | 439 |
| getoop | 972 | 972 |
| myvaporstore | 2,056 | 578 |
| vape.com | 5,454 | 34,589 |
| perfectvape | 923 | 2,835 |
| vapewh | 362 | 12,957 |
| vapesourcing | 2,587 | 34,243 |
| vaping.com | 1,202 | 4,020 |
| ElementVape | | |
Mipod provided by CDCF.
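The table above gives per-site counts but no overall total. As a rough sketch, the totals can be computed from the listed figures as shown below. The `counts` dictionary is a hypothetical structure introduced only for this illustration; it is not part of the scraping code. ElementVape is left out because the table lists no figures for it.

```python
# Per-site counts copied from the table above as (items, images).
# `counts` is a hypothetical name used only for this illustration.
# ElementVape is omitted because the table gives no figures for it.
counts = {
    "mipod": (1053, 1036),
    "csvape": (621, 439),
    "getoop": (972, 972),
    "myvaporstore": (2056, 578),
    "vape.com": (5454, 34589),
    "perfectvape": (923, 2835),
    "vapewh": (362, 12957),
    "vapesourcing": (2587, 34243),
    "vaping.com": (1202, 4020),
}

# Sum each column across all sites.
total_items = sum(items for items, _ in counts.values())
total_images = sum(images for _, images in counts.values())

print(f"Total items gathered:  {total_items:,}")
print(f"Total images gathered: {total_images:,}")
```

With the figures listed, this gives 15,230 items and 91,669 images across the nine sites that have counts.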