Asked: November 27, 2024

Html Parser – How to scan HTML files for missing assets and broken links


Hello Coders,

This article presents a simple, open-source tool that I use to statically analyze HTML files for missing assets and broken links before using the files in real projects. This Html Parser is basically a Python 3 wrapper over Beautiful Soup, the popular open-source parsing library for HTML and XML files. The source code can be found on GitHub, released under the EULA License.


Thank you! Content provided by AppSeed – App Generator.


Features

  • Open-Source – can also be used for eLearning
  • Works with directories – all HTML files are scanned
  • Detects missing assets (JS, CSS, images) for each page
  • Detects broken links and suggests the right path
  • Acceptable execution time – 100 pages processed in under one minute

  • Html Parser – source code
  • Sample Output – captured from a real project
  • EULA License – free for solo developers, small companies, startups, and NGOs

Html Parser - Developer Tool crafted by AppSeed, animated presentation.


To use the tool we need to specify two things:

  • The folder where HTML files are saved
  • The assets folder – the parent directory for all JS, CSS, and image files

Once this simple setup is in place, we can call the script in the terminal:

$ python ./check-assets.py 


HTML Parser – The Relevant Parts

To scan and correlate the information, the tool uses a few structures to save and reuse the relevant information and also perform simple operations over detected HTML files.

How it works

  • Define a map where the key is the file name
  • Associate a data structure with each file, where the relevant information is stored and updated
  • Scan each HTML file for assets and links
  • Validate the information for each file by checking the disk, and save the missing assets for each file
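The steps above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `PageInfo` record, the `find_missing` and `validate` helpers, and the rule that asset URLs are resolved relative to the assets folder are all assumptions made for the example.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class PageInfo:
    """Hypothetical per-file record: assets found in a page and those missing on disk."""
    file: str
    assets: list = field(default_factory=list)
    missing: list = field(default_factory=list)


def find_missing(assets, assets_root):
    """Return the asset paths that do not exist under assets_root."""
    missing = []
    for asset in assets:
        # Assumption of this sketch: a URL such as '/css/style.css'
        # maps to <assets_root>/css/style.css on disk.
        local_path = os.path.join(assets_root, asset.lstrip('/'))
        if not os.path.isfile(local_path):
            missing.append(asset)
    return missing


def validate(pages, assets_root):
    """Fill in the 'missing' list of every record in the map."""
    for info in pages.values():
        info.missing = find_missing(info.assets, assets_root)
    return pages


def demo():
    # Build a throwaway assets folder that contains a single CSS file.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'css'))
        open(os.path.join(root, 'css', 'style.css'), 'w').close()

        # The map: key = HTML file name, value = its record.
        pages = {
            'index.html': PageInfo(
                'index.html',
                ['/css/style.css', '/css/style-ERROR.css'],
            ),
        }
        validate(pages, root)
        return pages['index.html'].missing
```

Calling `demo()` returns only the asset that is absent from the temporary folder. Keeping the records in a dictionary keyed by file name makes it cheap to look up a page again when the final report is printed.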

HTML Parser – Source Code

The relevant functions and code chunks are below. If something relevant is missing, feel free to ask for it in the comments section:


Read files from a directory

from os import walk

def get_files( aPath ):

    FILES_LIST = []

    # walk() yields the top-level directory first; the break stops
    # after it, so sub-directories are not scanned
    for (root, dirs, files) in walk( aPath ):
        FILES_LIST.extend( files )
        break

    return FILES_LIST
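Since `get_files` returns every file in the folder, a caller may want to keep only the HTML pages. A possible variant is sketched below; the name `get_html_files` and the extension filter are assumptions, not part of the original tool.

```python
from os import walk


def get_html_files(a_path):
    """Hypothetical variant of get_files that keeps only HTML pages."""
    for _root, _dirs, files in walk(a_path):
        # Only the top-level directory is inspected, as in get_files.
        return sorted(f for f in files if f.lower().endswith(('.html', '.htm')))
    # walk() yields nothing when the path does not exist.
    return []
```

Sorting the names keeps the processing order stable between runs, which makes two reports easier to compare.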


The structure/class to save the information for each file

class TMPL:

    # constructor
    def __init__(self, aFile=''):

        self.file      = aFile
        self.title     = ''
        self.css       = [] # All CSS Files
        self.js        = [] # All JS Files
        self.img       = [] # All Images
        self.links     = [] # All Links

        self.err       = [] # used to report missing assets
        self.err_links = [] # used to report broken links

    # Used to have a string representation
    def __repr__(self):
        return "" + self.file + ' some other info'


Initiate Beautiful Soup object for each file

import htmlmin
from bs4 import BeautifulSoup as bs

def get_bs( aFile ):

    # file_load() is a project helper (not shown here) that returns the file content
    minified = htmlmin.minify( file_load( aFile ), remove_empty_space=True)
    return bs(minified,'html.parser')
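The `file_load` helper used by `get_bs` is not shown in the article. A minimal sketch of what it might look like, assuming it simply returns the file content as UTF-8 text:

```python
def file_load(a_file):
    """Hypothetical helper: return the content of a_file as text."""
    # errors='replace' is a choice made for this sketch, so that a stray
    # non-UTF-8 byte does not abort the scan of a whole directory.
    with open(a_file, 'r', encoding='utf-8', errors='replace') as handle:
        return handle.read()
```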


Scan each file for Links and assets

The results are injected into associated structures for each file.

# BS object is constructed and available for queries
soup = get_bs( FULL_PATH )

# Scan for CSS files
tmpl.css = get_css( soup )

# Scan for JS files
tmpl.js = get_js( soup )

...

Links and images are scanned in the same way using simple helpers.
Once the information is saved, we can traverse the DOM using BS objects and perform mutations over elements.
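The helpers themselves are not listed in the article. The sketch below shows how such helpers could be written with Beautiful Soup's `find_all`; the names `get_img` and `get_links`, and the exact tags and attributes each helper selects, are assumptions for illustration and may differ from the tool's real implementation.

```python
from bs4 import BeautifulSoup


def get_css(soup):
    """Return the href of every <link rel="stylesheet"> tag."""
    return [tag['href'] for tag in soup.find_all('link', rel='stylesheet', href=True)]


def get_js(soup):
    """Return the src of every <script> tag that loads an external file."""
    return [tag['src'] for tag in soup.find_all('script', src=True)]


def get_img(soup):
    """Return the src of every <img> tag (hypothetical helper)."""
    return [tag['src'] for tag in soup.find_all('img', src=True)]


def get_links(soup):
    """Return the href of every <a> tag (hypothetical helper)."""
    return [tag['href'] for tag in soup.find_all('a', href=True)]
```

Passing `href=True` or `src=True` skips tags that lack the attribute, such as inline scripts or named anchors, so the list comprehensions never raise a `KeyError`.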


HTML Parser – Sample output

To visualize a real production output, please access a sample file saved into the public repository: check assets – output


(env) PS > python.exe .\check-assets.py

Files (2) ['apps-calendar.html', 'index.html']

***** ***** *****

PROCESSING --> apps-calendar.html | files (1) remaining
PROCESSING --> index.html | files (0) remaining
PROCESSING --> apps-calendar.html
 ERR - Missing Asset -> /static/assets/css/classic-horizontal/style-ERROR.css
 ERR - Missing Asset -> /static/assets/images/logo-mini-ERROR.svg
PROCESSING --> index.html
 ERR - Missing Asset -> /static/assets/images/favicon-ERROR.png

    |
    |- apps-calendar.html
    |    |
    |    |--- CSS: 6 file(s)
    |          | /static/assets/vendors/mdi/css/materialdesignicons.min.css
    |          | /static/assets/vendors/css/vendor.bundle.base.css
    |          | /static/assets/vendors/fullcalendar/fullcalendar.min.css
    |          | /static/assets/css/classic-horizontal/style.css
    |          | /static/assets/css/classic-horizontal/style-ERROR.css
    |          | /static/assets/images/favicon.png
    |

...

Pages with errors: 2
    |
    |- apps-calendar.html
    |    |
    | /static/assets/css/classic-horizontal/style-ERROR.css
    |    |
    | /static/assets/images/logo-mini-ERROR.svg
    |
    |- index.html
    |    |
    | /static/assets/images/favicon-ERROR.png

The tool can be easily extended to live websites using the existing core. If you find it useful, feel free to suggest features in the comments section or push a PR on GitHub.


Thank you! – For more resources, please access:


  • Beautiful Soup – the official docs
  • AppSeed – for more tools and starters

Btw, my (nick) name is Sm0ke and I'm also pretty active on Twitter.
