Lincoln Smith committed
Library for scraping [Program Orders](https://policies.anu.edu.au/ppl/document/ANUP_006803) from the ANU [Programs and Courses](https://programsandcourses.anu.edu.au/) website.

Provides functions for processing P&C HTML into an intermediate `dr_scraper.ReqNode` representation that attempts to preserve layout information, and a `dr_scraper.DegreeRuleScraper` class that generates a `dr_scraper.ProgramOrder` tree representation of a set of requirements, which should be amenable to further processing.
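Because the result is a tree, further processing usually starts with a recursive walk. The sketch below is illustrative only: `Node`, its `label` and `children` fields, and the `walk`/`outline` helpers are hypothetical stand-ins, not the attributes or API of `dr_scraper.ProgramOrder`, so the field access would need adapting to the real class.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a requirement node; not a dr_scraper class."""
    label: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, label) pairs in pre-order: each parent before its children."""
    yield depth, node.label
    for child in node.children:
        yield from walk(child, depth + 1)


def outline(root: Node) -> str:
    """Render the tree as an indented plain-text outline, two spaces per level."""
    return "\n".join("  " * depth + label for depth, label in walk(root))
```

For example, `outline(Node("Requirements", [Node("Group A", [Node("Course 1")]), Node("Group B")]))` returns the four labels indented by depth. Using a generator keeps the traversal separate from whatever is done with each node.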

## Usage
Typical usage is to create a `DegreeRuleScraper` from a page URL and then build the program order structure:
```python
from dr_scraper import DegreeRuleScraper

scraper = DegreeRuleScraper('https://programsandcourses.anu.edu.au/program/VCOMP')
orders = scraper.build_program_order_struct()
```

If processing the path as a URL fails, the scraper class will attempt to interpret it as a file path. You can force interpretation as a file path by setting `path_is_file=True`:
```python
scraper = DegreeRuleScraper('/some/file/here.html', path_is_file=True)
```
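The URL-then-file fallback can be pictured with the standard library alone. This is a minimal sketch of the pattern, not `dr_scraper`'s implementation; `load_html` is a hypothetical helper, and the UTF-8 default for files and for responses without a declared charset is an assumption.

```python
import urllib.request
from pathlib import Path


def load_html(path: str, path_is_file: bool = False, timeout: float = 10.0) -> str:
    """Hypothetical helper: fetch HTML from a URL, falling back to a local file."""
    if not path_is_file:
        try:
            with urllib.request.urlopen(path, timeout=timeout) as resp:
                # Use the charset from the response headers if the server sent one.
                charset = resp.headers.get_content_charset() or "utf-8"
                return resp.read().decode(charset)
        except (ValueError, OSError):
            # ValueError: the string is not a URL at all (e.g. '/some/file/here.html').
            # OSError (which includes URLError): the URL could not be fetched.
            pass
    # Either the caller forced file mode or the URL attempt failed.
    return Path(path).read_text(encoding="utf-8")
```

Forcing file mode skips the network attempt entirely, which avoids a pointless request (and its timeout) when the input is known to be a saved page.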

The other parameter you will commonly change is `header_id`. Degree- and program-level plans work with the default `header_id`, but subplans such as majors, minors, and specialisations have a different page structure, so you will need to set `header_id` as shown below:
```python
scraper = DegreeRuleScraper('https://programsandcourses.anu.edu.au/specialisation/HCSD-SPEC', header_id='requirements')
```
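When scraping a mix of programs and subplans, the `header_id` choice can be derived from the URL path. The helper below, `scraper_kwargs`, is hypothetical. It assumes program pages live under `/program/` and subplan pages under `/specialisation/`, `/major/`, or `/minor/`; only the `/program/` and `/specialisation/` paths appear in the examples above, so the other two prefixes are assumptions. It returns keyword arguments rather than a value so that programs keep the library's own default `header_id`.

```python
from urllib.parse import urlparse

# Path prefixes assumed to identify subplan pages on Programs and Courses.
# Only "specialisation" is confirmed by the example above; "major" and
# "minor" are assumptions.
SUBPLAN_PREFIXES = ("specialisation", "major", "minor")


def scraper_kwargs(url: str) -> dict:
    """Hypothetical helper: extra DegreeRuleScraper keyword arguments for a URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[0].lower() in SUBPLAN_PREFIXES:
        return {"header_id": "requirements"}
    # Program pages use the default header_id, so pass nothing extra.
    return {}
```

It would be used as `DegreeRuleScraper(url, **scraper_kwargs(url))`.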