content update oct 2021

2025-01-23 03:52:21 +01:00 · 2021-10-27 00:27:09 +02:00 · 2021-10-27 00:27:09 +02:00 · b5d1242dc5
commit b5d1242dc5
parent cc12fcc3d8
3 changed files with 154 additions and 2 deletions
--- a/content/about.md
+++ b/content/about.md
@@ -13,6 +13,10 @@ This site shares a bit of informal documentation and more blog-based record
 keeping. Providing commentary on design decisions should be just as useful as
 some of the technical documentation however included in my repositories.

+### Contact
+
+You can reach me at `lieuwe at leene dot dev`. 
+
 ## My Setup

 I mainly use RHEL flavours of linux having both CentOS and Fedora machines. Most
--- a/content/posts/domain-setup.md
+++ b/content/posts/domain-setup.md
@@ -1,6 +1,54 @@
 ---
-title: "Domain Setup"
+title: "Domain Setup ☄💻"
 date: 2021-09-19T17:14:03+02:00
-draft: true
+draft: false
 ---

+
+
+## DNS Records
+
+The main part of setting up a domain is configuring your
+[DNS Records](https://en.wikipedia.org/wiki/List_of_DNS_record_types). These
+dictate how your human-readable service names map to the address of your
+physical machine. I mainly use this domain for web services together with
+self-hosted email, so I have outlined below the records that these services
+require.
+
+| Name                                            | Description
+| ----------------------------------------------- | -----------------------  
+| **A**     Address record                        | physical IPv4 address associated with this domain
+| **CNAME** Canonical name record                 | Alias name for A record name. This is generally for subdomains (e.g. other.domain.xyz as an alias for domain.xyz, both served by the same machine)
+| **CAA**   Certification Authority Authorization | DNS Certification Authority Authorization, constraining acceptable CAs for a host/domain.
+| **DS**    Delegation signer                     | The record used to identify the DNSSEC signing key of a delegated zone
+| **MX**    Mail exchange record                  | Maps a domain name to a list of message transfer agents for that domain
+| **TXT**   Text record                           | Carries machine-readable data, such as specified by RFC 1464, opportunistic encryption, Sender Policy Framework, DKIM, DMARC, DNS-SD, etc.
+
+The essential records for web services are the A and CNAME records, which enable
+correct name lookup from outside your private network. Nowadays SSL/TLS should
+be part of any setup, so the certification authority you use should be named in
+the CAA record. Most likely this will be `letsencrypt.org`, which signs
+certificates free of charge and so secures your traffic to some extent.
+Alongside these there should be a DS record, which publishes a digest of your
+zone's DNSSEC signing key (separate from the key used by your machine's SSL
+setup) and allows you to set up DNSSEC on your domain.
+
+The other records are required for secure email transfer. First you need the
+equivalent of a name record, the MX record, which should point to another A
+record and may or may not be the same machine / physical address as the domain
+hosting your web services. Signing your email, much like SSL encryption,
+should be an essential part of your setup. An SMTP setup with postfix
+can do so by using [openDKIM](http://www.opendkim.org/). This will require
+you to similarly provide your public signing key as a TXT record.
+
+```bash
+"v=DKIM1;k=rsa;p=${key_part1}"
+"${key_part2}"
+```
+
+The TXT record will look something like the above statement. Unfortunately
+there are some inconveniences when using RSA with a large key size, which
+yields a long public key. Each string within a TXT record is limited to 255
+characters, so you need to break the key up into multiple strings, which the
+`opendkim` tool may or may not do by default. As long as no semi-colons
+are inserted this should just work as expected.
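+
+The splitting described above can be sketched in python. This is only a
+minimal illustration and not what the `opendkim` tooling does internally: the
+helper name `split_txt_value` is hypothetical, the 255 character chunk size is
+an assumption passed as a default, and the key is placeholder text.
+
+```python
+def split_txt_value(value, chunk_size=255):
+    """Split a long TXT value into quoted strings of at most chunk_size characters."""
+    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
+    return [f'"{chunk}"' for chunk in chunks]
+
+
+# Placeholder key material, not a real public key.
+record = "v=DKIM1;k=rsa;p=" + "A" * 400
+for line in split_txt_value(record):
+    print(line)
+```
+
+Joining the printed strings back together, without the quotes, should give the
+original value, which is how a resolver treats a multi-string TXT record.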
--- a/content/posts/python-urllib.md
+++ b/content/posts/python-urllib.md
@@ -0,0 +1,100 @@
+---
+title: "Python Urllib ⬇📜"
+date: 2021-10-26T20:02:07+02:00
+draft: false
+toc: true
+tags:
+  - python
+  - scraping
+  - code
+---
+
+I had to pull some metadata from a media database, and since this tends to
+be my go-to setup when I use urllib with python, I thought I would make a quick
+note regarding cookies and making POST/GET requests accordingly.
+
+## Setting up a HTTP session
+
+The urllib python library allows you to set global session parameters directly
+by calling the `build_opener` and `install_opener` functions. Scripts that
+make HTTP requests with empty headers or little to no session data tend to be
+blocked where robots are not welcome. Setting these parameters mitigates
+that issue, but you are still advised to be a responsible
+end-user.
+
+```python
+import http.cookiejar
+import urllib.request
+
+# Load the session cookies exported from the browser.
+mycookies = http.cookiejar.MozillaCookieJar()
+mycookies.load("cookies.txt")
+opener = urllib.request.build_opener(
+    urllib.request.HTTPCookieProcessor(mycookies)
+)
+# Headers sent with every request made through this opener.
+opener.addheaders = [
+    (
+        "User-agent",
+        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
+        + "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+    ),
+    (
+        "Accept",
+        "text/html,application/xhtml+xml,application/xml;q=0.9,"
+        + "image/avif,image/webp,image/apng,*/*;q=0.8,"
+        + "application/signed-exchange;v=b3;q=0.9",
+    ),
+]
+# Make this opener the default for all subsequent urlopen calls.
+urllib.request.install_opener(opener)
+```
+
+The above code snippet sets a user agent and what kind of data the session
+is willing to accept. This is generic and simply taken from one of my own
+browser sessions. Additionally I load in `cookies.txt`, which holds the session
+cookies that I exported to a file for a given domain from my browser.
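+
+Being a responsible end-user can include checking a site's `robots.txt`
+before scripting requests against it. As a rough sketch, the standard
+library's `urllib.robotparser` can evaluate such rules. The rules and the
+`example.com` urls below are made up for illustration and are parsed from a
+list so that no request is made.
+
+```python
+import urllib.robotparser
+
+# Hypothetical robots.txt content, parsed locally so no request is made.
+rules = [
+    "User-agent: *",
+    "Disallow: /private/",
+]
+parser = urllib.robotparser.RobotFileParser()
+parser.parse(rules)
+
+# Ask whether any user agent ("*") may fetch each url under these rules.
+allowed = parser.can_fetch("*", "https://example.com/index.html")
+blocked = parser.can_fetch("*", "https://example.com/private/data")
+print(allowed, blocked)
+```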
+
+## HTTP POST request
+
+Web-based APIs offer various methods for interacting, but the most common are
+POST requests with JSON input/output, and occasionally XML. Given python's
+native support for JSON, this is generally the way to do things.
+
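+```python
+import json
+import urllib.request
+
+# Hypothetical wrapper name, added so that the `return` is valid.
+def post_json(host_name, post_data):
+    url = f"{host_name}/api.php"
+    data = json.dumps(post_data).encode()  # dictionary -> JSON bytes
+    req = urllib.request.Request(url, data=data)  # passing data makes this a POST
+    meta = urllib.request.urlopen(req)
+    return json.loads(meta.read())
+```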
+
+The above code snippet prepares a `req` object for a particular `host_name` and
+`post_data`, which is a dictionary that is encoded to a JSON string. Calling
+urlopen on this request performs a POST request which, if
+all works as expected, returns a JSON string that is mapped to a python
+collection.
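+
+To check what such a request will look like before sending it, the `Request`
+object can be inspected without touching the network. This is a small sketch
+where the url, the payload, and the explicit `Content-Type` header are all
+assumptions for the example. Many JSON APIs expect that header, but any given
+API may not need it.
+
+```python
+import json
+import urllib.request
+
+# Hypothetical endpoint and payload, for illustration only.
+url = "https://example.com/api.php"
+post_data = {"query": "example", "page": 1}
+
+req = urllib.request.Request(
+    url,
+    data=json.dumps(post_data).encode(),
+    headers={"Content-Type": "application/json"},
+)
+
+# Supplying `data` switches the default method from GET to POST.
+print(req.get_method())
+# urllib stores header names capitalized, hence "Content-type" here.
+print(req.get_header("Content-type"))
+print(json.loads(req.data))
+```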
+
+In the scenario where the data is returned as an XML string / document, there
+is an `xmltodict` python library that will return a python collection. The
+downside here is that xml tends to have quite a deep hierarchy, which is
+difficult to appreciate unless we get into large xml data structures that can
+be queried. For reference the xml parsing will look something like this:
+
+```python
+import xmltodict
+
+# `meta` is the response object returned by urlopen.
+xmltodict.parse(meta.read())
+```
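+
+If adding a dependency is not desirable, the standard library's
+`xml.etree.ElementTree` can query an XML response directly. This is an
+alternative to `xmltodict` rather than the approach used above, and the
+document below is invented for the example.
+
+```python
+import xml.etree.ElementTree as ET
+
+# Made-up XML response standing in for meta.read().
+xml_data = b"""
+<gallery>
+  <item id="1"><title>First</title></item>
+  <item id="2"><title>Second</title></item>
+</gallery>
+"""
+
+root = ET.fromstring(xml_data)
+# Map each item's id attribute to the text of its title element.
+titles = {item.get("id"): item.findtext("title") for item in root.iter("item")}
+print(titles)
+```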
+
+## HTTP GET request with BeautifulSoup
+
+Performing GET requests is usually much simpler since you just need
+to determine the appropriate url. Here I included an example where the
+`BeautifulSoup` python library is used to parse the HTTP response and
+search through any links within the response that match a regular expression.
+
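+```python
+import re
+import urllib.request
+from bs4 import BeautifulSoup
+
+def find_links(host_name, tag_name):  # hypothetical wrapper name
+    query_url = f"{host_name}/?f_search={tag_name}"
+    resp_data = urllib.request.urlopen(query_url)
+    resp_soup = BeautifulSoup(resp_data, "html.parser")  # explicit parser
+    # Escape host_name so characters such as "." are matched literally.
+    pattern = re.compile(re.escape(host_name) + "/g/([0-9a-z]+)/([0-9a-z]+)")
+    return [
+        link["href"]
+        for link in resp_soup.find_all("a", href=True)
+        if pattern.match(link["href"])
+    ]
+```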
+
+This is probably the most common use case for the `BeautifulSoup` library and
+it is far more effective than sifting through the raw html data by hand.
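+
+For comparison, the same link filtering can be sketched with only the standard
+library's `html.parser`, which shows roughly how much work `BeautifulSoup`
+saves. The `LinkCollector` class, the host name, and the html below are all
+hypothetical.
+
+```python
+import re
+from html.parser import HTMLParser
+
+
+class LinkCollector(HTMLParser):
+    """Collect href values of <a> tags that match a compiled pattern."""
+
+    def __init__(self, pattern):
+        super().__init__()
+        self.pattern = pattern
+        self.links = []
+
+    def handle_starttag(self, tag, attrs):
+        if tag != "a":
+            return
+        href = dict(attrs).get("href")
+        if href and self.pattern.match(href):
+            self.links.append(href)
+
+
+# Made-up host and markup standing in for a real response body.
+host_name = "https://example.com"
+html = (
+    '<a href="https://example.com/g/12ab/cd34">match</a>'
+    '<a href="https://example.com/about">no match</a>'
+    "<a>no href</a>"
+)
+collector = LinkCollector(
+    re.compile(re.escape(host_name) + r"/g/([0-9a-z]+)/([0-9a-z]+)")
+)
+collector.feed(html)
+print(collector.links)
+```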