Language
KeywordExtractor
Extracts keywords from subtitles using spaCy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keywords | list[str] | List of keywords to extract. | required |
Attributes:
Name | Type | Description |
---|---|---|
keywords | list[str] | List of keywords to extract. |
nlp | | spaCy language model for text processing. |
lemmatized_keywords | set[str] | Set of lemmatized keywords. |
Methods:
Name | Description |
---|---|
generate_segments | Captures keyword segments from a list of subtitles. |
Source code in video_sampler/language/keyword_capture.py
generate_segments(subtitle_list)
Captures keyword segments from a list of subtitles.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subtitle_list | list[tuple[tuple[int, int], str]] | List of subtitles, each in the format ((start_time, end_time), content). | required |
Yields:
Name | Type | Description |
---|---|---|
subtitle_line | Iterable[subtitle_line] | A named tuple representing a keyword segment in the format (start_time, end_time, lemma, content). |
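To illustrate the shape of the data flowing through generate_segments, here is a minimal sketch, not the library's implementation. It assumes each subtitle is ((start_time, end_time), content), as the parameter type suggests, and replaces spaCy with a hypothetical `toy_lemmatize` helper so the example runs without a language model. The names `SubtitleLine`, `toy_lemmatize`, and `generate_segments_sketch` are illustrative only, and emitting at most one segment per subtitle is an assumption.

```python
from collections import namedtuple

# Mirrors the documented fields (start_time, end_time, lemma, content).
# The actual named tuple is defined inside video_sampler; this is a stand-in.
SubtitleLine = namedtuple(
    "SubtitleLine", ["start_time", "end_time", "lemma", "content"]
)


def toy_lemmatize(word):
    """Hypothetical stand-in for spaCy lemmatization.

    Lowercases, strips surrounding punctuation, and drops a trailing
    plural "s" from words longer than three characters.
    """
    word = word.lower().strip(".,!?;:\"'")
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def generate_segments_sketch(subtitle_list, keywords, lemmatize=toy_lemmatize):
    """Yield a SubtitleLine for each subtitle containing a keyword lemma."""
    lemmatized_keywords = {lemmatize(keyword) for keyword in keywords}
    for (start_time, end_time), content in subtitle_list:
        for token in content.split():
            lemma = lemmatize(token)
            if lemma in lemmatized_keywords:
                yield SubtitleLine(start_time, end_time, lemma, content)
                # Assumption: one segment per subtitle; the real method
                # may emit one per matching token instead.
                break


subtitles = [
    ((0, 1000), "Two cats sat."),
    ((1000, 2000), "Nothing here"),
    ((2000, 3000), "A dog barks"),
]
segments = list(generate_segments_sketch(subtitles, ["cat", "dog"]))
```

Because the function is a generator, segments are produced lazily; wrapping the call in `list(...)` as above materializes them all at once.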
Source code in video_sampler/language/keyword_capture.py
download_sub(sub_url, max_retries=2)
Download a VTT subtitle file to a string, with a retry mechanism.
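The retry behaviour can be sketched as follows. This is an assumption-laden illustration using only the standard library, not the function's actual source: it treats `max_retries` as the total number of attempts, returns `None` when every attempt fails, and adds hypothetical `fetch` and `delay` parameters so the logic can be exercised without network access.

```python
import time
import urllib.request


def fetch_text(url):
    """Fetch a URL and decode the body as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def download_sub_sketch(sub_url, max_retries=2, fetch=fetch_text, delay=0.0):
    """Try to download a subtitle file, retrying on I/O errors.

    Assumptions (not confirmed by the documentation): max_retries counts
    total attempts, and None is returned if all attempts fail.
    """
    for attempt in range(max_retries):
        try:
            return fetch(sub_url)
        except OSError:
            # urllib.error.URLError and socket timeouts subclass OSError.
            if attempt + 1 < max_retries:
                time.sleep(delay)
    return None
```

Passing a custom `fetch` callable keeps the retry loop separate from the transport, which makes it easy to test with a function that fails a fixed number of times.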
Source code in video_sampler/language/keyword_capture.py
parse_srt_subtitle(srt_content)
Parse an SRT subtitle file into a list of subtitle segments.
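As a rough sketch of what such parsing involves, the example below splits SRT text into cues with a regular expression. It is not the library's implementation, which may rely on a dedicated SRT parser. The output shape ((start_ms, end_ms), text) and the use of milliseconds are assumptions chosen to match the input type documented for generate_segments; `parse_srt_sketch` and `_to_ms` are hypothetical names.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
_TIMESTAMP = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp components to integer milliseconds."""
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_srt_sketch(srt_content):
    """Return a list of ((start_ms, end_ms), text) tuples."""
    segments = []
    if not srt_content:
        return segments
    normalized = srt_content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = _TIMESTAMP.search(line)
            if match:
                groups = match.groups()
                start, end = _to_ms(*groups[:4]), _to_ms(*groups[4:])
                # Everything after the timing line is the cue text.
                text = " ".join(
                    part.strip() for part in lines[index + 1:] if part.strip()
                )
                segments.append(((start, end), text))
                break
    return segments


sample = (
    "1\n00:00:01,000 --> 00:00:02,500\nHello world\n\n"
    "2\n00:01:00,000 --> 00:01:01,000\nSecond\nline\n"
)
parsed = parse_srt_sketch(sample)
# parsed == [((1000, 2500), "Hello world"), ((60000, 61000), "Second line")]
```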
Source code in video_sampler/language/keyword_capture.py