content update oct 2021

Lieuwe Leene 2021-10-27 00:27:09 +02:00
parent cc12fcc3d8
commit b5d1242dc5
GPG Key ID: FD6DB59EC3B879CD
3 changed files with 154 additions and 2 deletions

@ -13,6 +13,10 @@ This site shares a bit of informal documentation and more blog-based record
keeping. Commentary on design decisions should be just as useful as some of
the technical documentation included in my repositories.
### Contact
You can reach me at `lieuwe at leene dot dev`.
## My Setup
I mainly use RHEL flavours of Linux, with both CentOS and Fedora machines. Most

@ -1,6 +1,54 @@
---
title: "Domain Setup"
title: "Domain Setup ☄💻"
date: 2021-09-19T17:14:03+02:00
draft: true
draft: false
---
## DNS Records
The main part of setting up a domain is configuring your
[DNS Records](https://en.wikipedia.org/wiki/List_of_DNS_record_types). These
dictate how your human-readable service names are mapped to your machine's
address. I mainly use this domain for web services together with self-hosted
email, so I have outlined below the records that these services require.
| Name | Description
| ----------------------------------------------- | -----------------------
| **A** Address record | IPv4 address associated with this domain
| **CNAME** Canonical name record | Alias of one name to another. This is generally used for subdomains (e.g. other.domain.xyz as an alias for domain.xyz, both served by the same machine)
| **CAA** Certification Authority Authorization | DNS Certification Authority Authorization, constraining acceptable CAs for a host/domain.
| **DS** Delegation signer | The record used to identify the DNSSEC signing key of a delegated zone
| **MX** Mail exchange record | Maps a domain name to a list of message transfer agents for that domain
| **TXT** Text record | Carries machine-readable data, such as specified by RFC 1464, opportunistic encryption, Sender Policy Framework, DKIM, DMARC, DNS-SD, etc.
The essential records for web services are the A and CNAME records, which enable
correct name lookup from outside your private network. Nowadays SSL should be
part of any setup, so the certification authority you use should be specified in
the CAA record. Most likely this will be `letsencrypt.org`, which provides SSL
certificate signing free of charge and secures your traffic to some extent.
Alongside this there should be a DS record, which identifies the key that signs
your DNS zone (a separate key from the one in your SSL certificate) and allows
you to set up DNSSEC on your domain.
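As a rough illustration of the web-service records described above, the sketch
below prints them in the common zone-file layout. The `zone_line` helper, the
`domain.xyz` names and the `203.0.113.10` address (a documentation-only range)
are placeholders of mine, and your DNS provider's interface may present these
fields differently.

```python
def zone_line(name, rtype, value):
    """Format one record as: fully qualified name, class, type, value."""
    return f"{name}. IN {rtype} {value}"


# Hypothetical records: the domain names and the address are placeholders.
records = [
    zone_line("domain.xyz", "A", "203.0.113.10"),
    zone_line("other.domain.xyz", "CNAME", "domain.xyz."),
    zone_line("domain.xyz", "CAA", '0 issue "letsencrypt.org"'),
]

print("\n".join(records))
```

The CAA value carries a flag, a tag and the authority name, which is why it is
passed as a single pre-quoted string here.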
The other records are required for secure email transfer. First you need the
equivalent of a name record: the MX record, which should point to a name that has
its own A record and may or may not be the same machine / physical address as
the one hosting your web services. Signing your email is similar to SSL
encryption and should be an essential part of your setup. An SMTP setup with
postfix can do so by using [openDKIM](http://www.opendkim.org/). This will
require you to similarly provide your public signing key as a TXT record.
```bash
"v=DKIM1;k=rsa;p=${key_part1}"
"${key_part2}"
```
The TXT record will look something like the above statement. Unfortunately there
is an inconvenience when using RSA with a large key size, which yields a long
public key. Each string within a TXT entry is limited to 255 characters, so you
need to break the key up into multiple strings, which the `opendkim` tool may or
may not do by default. As long as no semicolons are inserted this should just
work as expected.
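If the tooling does not split the key for you, the splitting step can be
sketched in a few lines of Python. The helper names `split_txt_value` and
`format_txt_record` are my own, and the key below is dummy filler rather than a
real public key.

```python
def split_txt_value(value, limit=255):
    """Cut a long TXT value into pieces of at most `limit` characters."""
    return [value[i:i + limit] for i in range(0, len(value), limit)]


def format_txt_record(value):
    """Quote each piece separately; resolvers join them back without separators."""
    return " ".join(f'"{chunk}"' for chunk in split_txt_value(value))


# Dummy key material standing in for a real base64-encoded RSA public key.
dkim_value = "v=DKIM1;k=rsa;p=" + "A" * 400
chunks = split_txt_value(dkim_value)
print(format_txt_record(dkim_value))
```

Because the pieces are simply concatenated on lookup, the split points can fall
anywhere, including in the middle of the key.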

@ -0,0 +1,100 @@
---
title: "Python Urllib ⬇📜"
date: 2021-10-26T20:02:07+02:00
draft: false
toc: true
tags:
- python
- scraping
- code
---
I had to pull some metadata from a media database, and since this tends to be
my go-to setup when I use urllib with Python, I thought I would make a quick
note regarding cookies and making POST/GET requests accordingly.
## Setting up a HTTP session
The urllib Python library allows you to set global session parameters directly
by calling the `build_opener` and `install_opener` functions. Scripts that make
HTTP requests with empty headers or little to no session data tend to be
blocked where robots are not welcome. Setting these parameters mitigates that
issue, but it is still advised to be a responsible end-user.
```python
import http.cookiejar
import urllib.request

# Load the cookies exported from the browser (Netscape cookies.txt format).
mycookies = http.cookiejar.MozillaCookieJar()
mycookies.load("cookies.txt")
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(mycookies)
)
opener.addheaders = [
    (
        "User-agent",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        + "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    ),
    (
        "Accept",
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        + "image/avif,image/webp,image/apng,*/*;q=0.8,"
        + "application/signed-exchange;v=b3;q=0.9",
    ),
]
# Every later urllib.request.urlopen call goes through this opener.
urllib.request.install_opener(opener)
```
The above code snippet sets a user agent and the kinds of data the session
is willing to accept. These values are generic and simply taken from one of my
own browser sessions. Additionally I load `cookies.txt`, which holds the session
cookies that I exported from my browser for a given domain.
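To make the expected file layout concrete, here is a small offline sketch of
what `MozillaCookieJar` reads. The cookie itself (domain, name, value and
expiry) is entirely made up, and `load_cookie_names` is a helper of mine that
writes the text to a temporary file before loading it.

```python
import http.cookiejar
import os
import tempfile

# Hypothetical cookie. Fields are tab separated: domain, subdomain flag,
# path, secure flag, expiry (Unix time), name, value.
COOKIE_FILE = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t4102444800\tsession_id\tabc123\n"
)


def load_cookie_names(text):
    """Write the cookie text to a temporary file and load it into a jar."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cookies.txt")
        with open(path, "w") as handle:
            handle.write(text)
        jar = http.cookiejar.MozillaCookieJar()
        jar.load(path)
        return {cookie.name: cookie.value for cookie in jar}


print(load_cookie_names(COOKIE_FILE))
```

The first line of the file matters: the loader rejects files that do not start
with the Netscape cookie file header.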
## HTTP POST request
Web-based APIs offer various methods for interaction, most commonly POST
requests with JSON input/output and occasionally XML. Given Python's native
support for JSON, this is generally the way to do things.
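```python
import json
import urllib.request

def post_json(host_name, post_data):
    url = f"{host_name}/api.php"
    data = json.dumps(post_data).encode()
    # Passing data makes urlopen send a POST instead of a GET.
    req = urllib.request.Request(url, data=data)
    meta = urllib.request.urlopen(req)
    return json.loads(meta.read())
```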
The above code snippet prepares a `req` object for a particular `host_name` and
`post_data`, a dictionary that is encoded to a JSON string. Calling `urlopen`
on this request performs a POST request, which, if all works as expected,
returns a JSON string that is mapped to a Python collection.
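Some APIs may also expect the request to declare its body type. The sketch
below builds such a request without sending it, so the method and headers can
be inspected first. The `build_json_request` helper, the `example.com` URL and
the payload are assumptions for illustration, and whether a given server needs
the `Content-Type` header is something to check against its documentation.

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Prepare (but do not send) a request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical endpoint and payload, used only to inspect the request.
req = build_json_request("https://example.com/api.php", {"method": "ping"})
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Inspecting the request this way is a cheap check that urllib chose POST and
that the body decodes back to the original dictionary.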
In the scenario where the data is returned as an XML string / document, there
is the `xmltodict` Python library, which will return a Python collection. The
downside here is that the XML tends to have quite a deep hierarchy, which is
difficult to appreciate unless we get into large XML data structures that can
be queried. For reference the XML parsing will look something like this:
```python
xmltodict.parse(meta.read())
```
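When the hierarchy is deep and only a few fields are needed, querying the
document directly can be an alternative. The sketch below swaps in the standard
library's `xml.etree.ElementTree` rather than `xmltodict`, and the sample
document and the `item_titles` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up response body standing in for what an API might return.
SAMPLE = (
    "<response><items>"
    '<item id="1"><title>First</title></item>'
    '<item id="2"><title>Second</title></item>'
    "</items></response>"
)


def item_titles(xml_text):
    """Map each item's id attribute to the text of its title element."""
    root = ET.fromstring(xml_text)
    return {
        item.get("id"): item.findtext("title")
        for item in root.findall("./items/item")
    }


print(item_titles(SAMPLE))
```

The path given to `findall` spells out the nesting explicitly, which keeps the
deep hierarchy visible in one place.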
## HTTP GET request with BeautifulSoup
Performing GET requests is usually much simpler since you just need
to determine the appropriate URL. Here I included an example where the
`BeautifulSoup` Python library is used to parse the HTTP response and
search through any links within the response that match a regular expression.
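```python
import re
import urllib.request
from bs4 import BeautifulSoup

def find_links(host_name, tag_name):
    query_url = f"{host_name}/?f_search={tag_name}"
    resp_data = urllib.request.urlopen(query_url)
    # Naming the parser avoids bs4's "no parser specified" warning.
    resp_soup = BeautifulSoup(resp_data, "html.parser")
    return [
        link["href"]
        for link in resp_soup.find_all("a", href=True)
        if re.match(f"{host_name}/g/([0-9a-z]+)/([0-9a-z]+)", link["href"])
    ]
```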
This is probably the most common use case for the `BeautifulSoup` library and
it is far more effective than sifting through the raw HTML by hand.
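One detail worth checking in the link filter is that the host name is
interpolated into the regular expression as-is, so its dots match any
character. A minimal sketch of a stricter filter using `re.escape` follows. The
`filter_links` helper and the `example.com` links are hypothetical and it
operates on a plain list of hrefs, so no network access or parsing is involved.

```python
import re


def filter_links(host_name, hrefs):
    """Keep hrefs of the form <host_name>/g/<id>/<token>, matching the host literally."""
    pattern = re.compile(re.escape(host_name) + r"/g/([0-9a-z]+)/([0-9a-z]+)")
    return [href for href in hrefs if pattern.match(href)]


# Hypothetical hrefs as they might come out of the parsed response.
links = [
    "https://example.com/g/12ab/cd34/",
    "https://example.com/about",
    "https://exampleXcom/g/12ab/cd34/",
]

print(filter_links("https://example.com", links))
```

Without the escaping, the third link would also pass because the unescaped dot
in the host name matches the `X`.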