content update oct 2021
This commit is contained in: parent cc12fcc3d8, commit b5d1242dc5
@@ -13,6 +13,10 @@ This site shares a bit of informal documentation and more blog-based record

keeping. Providing commentary on design decisions should be just as useful as
some of the technical documentation included in my repositories.

### Contact

You can reach me at `lieuwe at leene dot dev`.

## My Setup

I mainly use RHEL flavours of linux, having both CentOS and Fedora machines. Most
@@ -1,6 +1,54 @@

---
title: "Domain Setup ☄💻"
date: 2021-09-19T17:14:03+02:00
draft: false
---

## DNS Records

The main part of setting up a domain is configuring your
[DNS Records](https://en.wikipedia.org/wiki/List_of_DNS_record_types). This
basically dictates how your physical machine address is mapped to your human
readable service names. I mainly use this domain for web services together with
self-hosted email, so I have outlined below the relevant records that these
services require.

| Name | Description |
| ---- | ----------- |
| **A** Address record | Physical IPv4 address associated with this domain |
| **CNAME** Canonical name record | Alias for an A record name. This is generally for subdomains (i.e. other.domain.xyz as an alias for domain.xyz, both served by the same machine) |
| **CAA** Certification Authority Authorization | DNS Certification Authority Authorization, constraining acceptable CAs for a host/domain |
| **DS** Delegation signer | The record used to identify the DNSSEC signing key of a delegated zone |
| **MX** Mail exchange record | Maps a domain name to a list of message transfer agents for that domain |
| **TXT** Text record | Carries machine-readable data, such as specified by RFC 1464, opportunistic encryption, Sender Policy Framework, DKIM, DMARC, DNS-SD, etc. |

The essential records for web services are the A and CNAME records, which
enable correct name look-up from outside your private network. Nowadays SSL
should be part of any setup, so the certification authority you use should be
specified in the CAA record. Most likely this will be `letsencrypt.org`, which
pretty much provides SSL certificate signing free of charge, securing your
traffic to some extent. In combination there should be a DS record that
presents a digest of your zone's DNSSEC signing key and allows you to set up
DNSSEC on your domain.
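
As a quick sanity check you can query what is actually being served. Below is
a minimal sketch using the third-party `dnspython` library (my choice here, a
tool like `dig` works just as well), with `domain.xyz` as a placeholder for
your own domain:

```python
# Minimal sketch: look up a few of the records discussed above.
# Requires the third-party dnspython package (pip install dnspython).
import dns.resolver

for rdtype in ("A", "CAA", "MX", "TXT"):
    try:
        answers = dns.resolver.resolve("domain.xyz", rdtype)
    except dns.resolver.NoAnswer:
        print(f"{rdtype}: no record published")
        continue
    for rdata in answers:
        print(f"{rdtype}: {rdata.to_text()}")
```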

The other records are required for secure email transfer. First you need the
equivalent of a name record, the MX record, which should point to another A
record and may or may not be the same machine / physical address as the domain
hosting your web-services. Signing your email, much like SSL encryption,
should be an essential part of your setup. An SMTP set-up with postfix
can do so by using [openDKIM](http://www.opendkim.org/). This will require
you to similarly provide your public signing key as a TXT record.

```bash
"v=DKIM1;k=rsa;p=${key_part1}"
"${key_part2}"
```

The TXT record will look something like the above statement. There are
unfortunately some inconveniences when using RSA with high entropy, which
yields a long public key. You need to break this key up into multiple
strings, which the `opendkim` tool may or may not do by default, as there is a
maximum character length for each TXT entry element. As long as no semi-colons
are inserted this should just work as expected.
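
A minimal sketch of that splitting step, assuming the public key is already
available as a plain base64 string (`public_key` below is a truncated
placeholder):

```python
# Minimal sketch: split a long base64 DKIM key into strings that fit the
# 255-character limit of a single TXT character-string. The width leaves
# headroom for the "v=DKIM1;k=rsa;p=" preamble in the first part.
import textwrap

public_key = "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A..."  # truncated placeholder
key_parts = textwrap.wrap(public_key, 240)

record = [f'"v=DKIM1;k=rsa;p={key_parts[0]}"']
record += [f'"{part}"' for part in key_parts[1:]]
print("\n".join(record))
```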

content/posts/python-urllib.md (new file, 100 lines)

@@ -0,0 +1,100 @@
---
title: "Python Urllib ⬇📜"
date: 2021-10-26T20:02:07+02:00
draft: false
toc: true
tags:
- python
- scraping
- code
---

I had to pull some metadata from a media database, and since this tends to
be my go-to setup whenever I use urllib with python, I thought I would make a
quick note regarding cookies and making POST/GET requests accordingly.

## Setting up a HTTP session

The urllib python library allows you to set global session parameters directly
by calling the `build_opener` and `install_opener` methods accordingly. Usually,
if you make HTTP requests with empty headers and little to no session data,
any script will tend to be blocked where robots are not welcome. Setting
these parameters mitigates that issue, but it is still advised to be a
responsible end-user.

```python
import http.cookiejar
import urllib.request

# Session cookies previously exported from the browser for this domain.
mycookies = http.cookiejar.MozillaCookieJar()
mycookies.load("cookies.txt")
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(mycookies)
)
# Mimic a regular browser session with a user-agent and accept header.
opener.addheaders = [
    (
        "User-agent",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36"
        + " (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    ),
    (
        "Accept",
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        + "image/avif,image/webp,image/apng,*/*;q=0.8,"
        + "application/signed-exchange;v=b3;q=0.9",
    ),
]
# Make this opener the default for all subsequent urllib.request calls.
urllib.request.install_opener(opener)
```

The above code snippet sets a user agent and what kind of data the session
is willing to accept. This is generic and simply taken from one of my own
browser sessions. Additionally I load in `cookies.txt`, which holds the session
cookies that I exported to a file for a given domain from my browser.
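
Once the opener is installed, every subsequent `urlopen` call reuses these
headers and cookies, so a follow-up request is a one-liner. A minimal sketch,
where `host_name` and the index page are placeholders:

```python
# Any later request now carries the browser headers and cookies.
resp = urllib.request.urlopen(f"{host_name}/index.php")
print(resp.status, len(resp.read()))
```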

## HTTP POST request

Web based APIs will have various methods for interacting, but most expect POST
requests with JSON type input/output and occasionally XML. Given python's
native support for JSON, this is generally the way to do things.

```python
import json
import urllib.request

# Hypothetical helper name; wrapped in a function so the return is runnable.
def post_json(host_name: str, post_data: dict) -> dict:
    url = f"{host_name}/api.php"
    data = json.dumps(post_data).encode()
    # Supplying a data payload makes urlopen issue a POST request.
    req = urllib.request.Request(url, data=data)
    meta = urllib.request.urlopen(req)
    return json.loads(meta.read())
```

The above code snippet prepares a `req` object for a particular `host_name` and
`post_data`, which is a dictionary that is encoded to a JSON string. Calling
urlopen on this request will perform a POST request accordingly, which, if
all works as expected, should return a JSON string that is mapped to a python
collection.
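
When things do not work as expected, `urlopen` raises an exception rather than
returning. A minimal sketch of guarding a call to the `post_json` helper from
the snippet above:

```python
import urllib.error

try:
    result = post_json(host_name, post_data)
except urllib.error.HTTPError as err:
    # The server answered but with an error status code.
    print(f"request rejected: {err.code} {err.reason}")
except urllib.error.URLError as err:
    # The server could not be reached at all.
    print(f"connection failed: {err.reason}")
```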

In the scenario where the data is returned as an XML string / document, there
is a `xmltodict` python library that will return a python collection. The
downside here is that the xml has quite a deep hierarchy that is difficult to
appreciate unless we get into large xml data structures that can be queried.
For reference the xml parsing will look something like this:

```python
import xmltodict

doc = xmltodict.parse(meta.read())
```

## HTTP GET request with BeautifulSoup

Performing GET requests is usually much simpler since you just need
to determine the appropriate url. Here I included an example where the
`BeautifulSoup` python library is used to parse the HTTP response and
search through any links within the response that match a regular expression.

```python
import re
import urllib.request

from bs4 import BeautifulSoup

# Hypothetical helper name; wrapped in a function so the return is runnable.
def search_links(host_name: str, tag_name: str) -> list:
    query_url = f"{host_name}/?f_search={tag_name}"
    resp_data = urllib.request.urlopen(query_url)
    # Parse the response with the built-in html.parser backend.
    resp_soup = BeautifulSoup(resp_data, "html.parser")
    return [
        link["href"]
        for link in resp_soup.find_all("a", href=True)
        if re.match(f"{host_name}/g/([0-9a-z]+)/([0-9a-z]+)", link["href"])
    ]
```

This is probably the most common use case for the `BeautifulSoup` library and
it is very effective compared to sifting through the raw html data yourself.