urlchecker.core.fileproc
Copyright (c) 2020-2022 Ayoub Malek and Vanessa Sochat
This source code is licensed under the terms of the MIT license. For a copy, see <https://opensource.org/licenses/MIT>.
-
urlchecker.core.fileproc.check_file_type(file_path: str, file_types: List[str]) → bool
Check the file type to ensure that only files with certain predefined extensions are checked. Each entry in file_types is either an extension given verbatim or a regular expression matched against the filename. For example, .* matches all hidden files, and *.html matches an HTML file.
- Args:
- file_path (str) : path to file.
- file_types (list) : list of file extensions to accept.
- Returns:
- (bool) True if the file type is supported, otherwise False.
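The matching rule described above can be illustrated with a small stand-alone sketch. It is not the library's implementation: the helper name `matches_file_type` is hypothetical, and it assumes shell-style wildcard semantics (via the standard `fnmatch` module) because the examples `.*` and `*.html` read that way.

```python
import os
from fnmatch import fnmatchcase
from typing import List


def matches_file_type(file_path: str, file_types: List[str]) -> bool:
    """Hypothetical stand-in, not urlchecker's own code.

    Accept the file if its path ends with a listed extension verbatim,
    or if its filename matches a listed wildcard pattern (an assumption).
    """
    filename = os.path.basename(file_path)
    for file_type in file_types:
        # Verbatim extension, e.g. ".md"
        if file_path.endswith(file_type):
            return True
        # Wildcard pattern, e.g. "*.html" or ".*" for hidden files
        if fnmatchcase(filename, file_type):
            return True
    return False
```

Under these assumptions, `matches_file_type("docs/index.html", ["*.html"])` and `matches_file_type(".gitignore", [".*"])` both accept the file, while `matches_file_type("main.py", [".md"])` rejects it.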
-
urlchecker.core.fileproc.collect_links_from_file(file_path: str, unique: bool = True) → List[str]
Collect all links in a file.
- Args:
- file_path (str) : path to file.
- unique (bool) : specify whether to filter out duplicate links.
- Returns:
- (list) list of links (URLs) found in the file.
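To make the effect of the `unique` flag concrete, here is a minimal sketch of link collection. The names `URL_PATTERN`, `find_links`, and `find_links_in_file` are hypothetical, and the regular expression is a deliberately simple assumption; the pattern the library actually uses may differ.

```python
import re
from typing import List

# Simplified URL pattern (an assumption, not the library's own regex).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_links(text: str, unique: bool = True) -> List[str]:
    """Hypothetical helper: return http(s) links found in text."""
    # Strip trailing sentence punctuation that the pattern may swallow.
    links = [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]
    if unique:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        links = list(dict.fromkeys(links))
    return links


def find_links_in_file(file_path: str, unique: bool = True) -> List[str]:
    """Hypothetical helper: read a file and collect its links."""
    with open(file_path, encoding="utf-8", errors="ignore") as handle:
        return find_links(handle.read(), unique=unique)
```

With `unique=True` a link repeated in the text is reported once; with `unique=False` every occurrence is kept, which may be useful when counting references.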
-
urlchecker.core.fileproc.get_file_paths(base_path: str, file_types: List[str], exclude_files: List[str] = None, include_patterns: List[str] = None) → List[str]
Get the paths to all files under a given directory and its subfolders.
- Args:
- base_path (str) : base path.
- file_types (list) : list of file extensions to accept.
- include_patterns (list) : list of files and patterns to include.
- exclude_files (list) : list of files or patterns to exclude.
- Returns:
- (list) list of file paths.
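The recursive traversal described here can be sketched with `os.walk`. This is an illustration only: `walk_files` is a hypothetical name, it matches extensions verbatim, and it leaves out the include and exclude filtering that the documented function also applies.

```python
import os
from typing import List


def walk_files(base_path: str, extensions: List[str]) -> List[str]:
    """Hypothetical helper: list files under base_path with a given extension."""
    found = []
    # os.walk visits base_path and every subfolder beneath it.
    for root, _dirs, files in os.walk(base_path):
        for name in files:
            if name.endswith(tuple(extensions)):
                found.append(os.path.join(root, name))
    # Sort for a deterministic order across platforms.
    return sorted(found)
```

For example, `walk_files("docs", [".md", ".rst"])` would return every Markdown and reStructuredText file below a `docs` folder, assuming such a folder exists.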
-
urlchecker.core.fileproc.include_file(file_path: str, exclude_patterns: List[str] = None, include_patterns: List[str] = None) → bool
Check a file path for inclusion based on an OR regular expression. The user is currently not notified if a file is marked for removal.
- Args:
- file_path (str) : the file path to check for inclusion.
- exclude_patterns (list) : list of patterns to exclude.
- include_patterns (list) : list of patterns to include.
- Returns:
- (bool) boolean indicating whether the file should be included (True) or excluded, i.e. not tested (False).
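An "OR regular expression" here means the patterns in a list are joined with `|`, so a match on any one of them counts. The sketch below shows one way this could work. The name `should_include` is hypothetical, and the precedence (a file must match an include pattern if any are given, and exclusions always win) is an assumption rather than confirmed library behaviour.

```python
import re
from typing import List, Optional


def _or_pattern(patterns: List[str]) -> str:
    # Join the patterns into a single alternation: (?:a)|(?:b)|...
    return "|".join(f"(?:{pattern})" for pattern in patterns)


def should_include(
    file_path: str,
    exclude_patterns: Optional[List[str]] = None,
    include_patterns: Optional[List[str]] = None,
) -> bool:
    """Hypothetical stand-in: True if file_path should be checked."""
    # No patterns at all: every file passes.
    if not exclude_patterns and not include_patterns:
        return True
    # Assumed rule: when include patterns exist, at least one must match.
    if include_patterns and not re.search(_or_pattern(include_patterns), file_path):
        return False
    # Assumed rule: any matching exclude pattern removes the file.
    if exclude_patterns and re.search(_or_pattern(exclude_patterns), file_path):
        return False
    return True
```

Under these assumptions, `should_include("docs/index.md", exclude_patterns=["^docs/"])` is False, while `should_include("docs/index.md", include_patterns=["index", "README"])` is True.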