content update oct 2021

2025-12-16 08:29:19 +01:00 · 2021-10-27 00:27:09 +02:00
parent cc12fcc3d8
commit b5d1242dc5
3 changed files with 154 additions and 2 deletions
--- a/content/about.md
+++ b/content/about.md
@ -13,6 +13,10 @@ This site shares a bit of informal documentation and more blog-based record
 keeping. Providing commentary on design decisions should be just as useful as
 some of the technical documentation included in my repositories.
 ### Contact
 You can reach me at `lieuwe at leene dot dev`. 
 ## My Setup
 I mainly use RHEL flavours of linux having both CentOS and Fedora machines. Most
--- a/content/posts/domain-setup.md
+++ b/content/posts/domain-setup.md
@ -1,6 +1,54 @@
 ---
-title: "Domain Setup"
+title: "Domain Setup ☄💻"
 date: 2021-09-19T17:14:03+02:00
-draft: true
+draft: false
 ---
 ## DNS Records
 The main part of setting up a domain is configuring your
 [DNS Records](https://en.wikipedia.org/wiki/List_of_DNS_record_types). These
 dictate how your human-readable service names map to the address of your
 physical machine. I mainly use this domain for web services together with
 self-hosted email, so I have outlined the records these services require
 below.
 | Name                                            | Description
 | ----------------------------------------------- | -----------------------  
 | **A**     Address record                        | physical IPv4 address associated with this domain
 | **CNAME** Canonical name record                 | Alias for an A record name. This is generally used for subdomains (e.g. other.domain.xyz as an alias for domain.xyz, both served by the same machine)
 | **CAA**   Certification Authority Authorization | DNS Certification Authority Authorization, constraining acceptable CAs for a host/domain.
 | **DS**    Delegation signer                     | The record used to identify the DNSSEC signing key of a delegated zone
 | **MX**    Mail exchange record                  | Maps a domain name to a list of message transfer agents for that domain
 | **TXT**   Text record                           | Carries machine-readable data, such as specified by RFC 1464, opportunistic encryption, Sender Policy Framework, DKIM, DMARC, DNS-SD, etc.
 The essential records for web services are the A and CNAME records, which enable
 correct name lookup from outside your private network. Nowadays SSL should be
 part of any setup, so the certification authority you use should be named in
 the CAA record. Most likely this will be `letsencrypt.org`, which provides SSL
 certificate signing free of charge and secures your traffic to some extent.
 Alongside this there should be a DS record, which publishes a digest of the
 public key that signs your zone and allows you to set up DNSSEC on your
 domain.
 The other records are required for secure email transfer. First you need the
 equivalent of a name record: the MX record, which should point to another A
 record and may or may not be the same machine / physical address as the one
 hosting your web services. Signing your email, much like SSL encryption,
 should be an essential part of your setup. An SMTP setup with postfix
 can do so by using [openDKIM](http://www.opendkim.org/). This similarly
 requires you to publish your public signing key as a TXT record.
 ```bash
 "v=DKIM1;k=rsa;p=${key_part1}"
 "${key_part2}"
 ```
 The TXT record will look something like the above statement. Unfortunately there
 are some inconveniences when using RSA with a large key size, which yields a
 long public key. Each string within a TXT record is limited to 255 characters,
 so you need to break the key up into multiple strings, which the `opendkim`
 tooling may or may not do by default. The strings are joined back together
 when the record is read, so as long as no semicolons are inserted this should
 just work as expected.
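The splitting described above can be sketched in a few lines of Python. This is only an illustration and not what the `opendkim` tooling itself does: the helper name `split_txt_value` and the placeholder key are assumptions made for the example.

```python
def split_txt_value(value: str, max_len: int = 255) -> list[str]:
    """Split a TXT record value into quoted strings of at most max_len characters."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    chunks = [value[i : i + max_len] for i in range(0, len(value), max_len)]
    return [f'"{chunk}"' for chunk in chunks]


# Placeholder standing in for a real base64 public key (hypothetical data).
record = "v=DKIM1;k=rsa;p=" + "A" * 400
for part in split_txt_value(record):
    print(part)
```

The split is purely positional, so no semicolons or whitespace are added and joining the pieces gives back the original value.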
--- a/content/posts/python-urllib.md
+++ b/content/posts/python-urllib.md
@ -0,0 +1,100 @@
 ---
 title: "Python Urllib ⬇📜"
 date: 2021-10-26T20:02:07+02:00
 draft: false
 toc: true
 tags:
  - python
  - scraping
  - code
 ---
 I had to pull some metadata from a media database, and since this tends to
 be my go-to setup when I use urllib with Python, I thought I would make a quick
 note regarding cookies and making POST/GET requests accordingly.
 ## Setting up an HTTP session
 The urllib Python library allows you to set global session parameters directly
 by calling the `build_opener` and `install_opener` functions. Scripts that make
 HTTP requests with empty headers or little to no session data tend to be
 blocked when robots are not welcome. Setting these parameters mitigates that
 issue, but it is still advised to be a responsible end-user.
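 ```python
 import http.cookiejar
 import urllib.request
 # Load cookies exported from the browser (Netscape/Mozilla cookies.txt format).
 mycookies = http.cookiejar.MozillaCookieJar()
 mycookies.load("cookies.txt")
 # Build an opener that sends and stores cookies on every request.
 opener = urllib.request.build_opener(
     urllib.request.HTTPCookieProcessor(mycookies)
 )
 # Default headers sent with every request made through this opener.
 opener.addheaders = [
     (
         "User-agent",
         "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
         + "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
     ),
     (
         "Accept",
         "text/html,application/xhtml+xml,application/xml;q=0.9,"
         + "image/avif,image/webp,image/apng,*/*;q=0.8,"
         + "application/signed-exchange;v=b3;q=0.9",
     ),
 ]
 # Make this opener the default used by urllib.request.urlopen.
 urllib.request.install_opener(opener)
 ```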
 The above code snippet sets a user agent and what kind of data the session
 is willing to accept. This is generic and simply taken from one of my own
 browser sessions. Additionally I load in `cookies.txt` which are the session
 cookies that I exported to a file for a given domain from my browser.
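For reference, `MozillaCookieJar` expects the Netscape `cookies.txt` layout: a magic header line followed by tab-separated fields. The sketch below writes a made-up cookie to a temporary file and loads it back; the cookie, the domain, and the helper name `load_cookie_values` are all hypothetical.

```python
import http.cookiejar
import os
import tempfile

# Hypothetical cookie for illustration; fields are tab-separated:
# domain, include-subdomains flag, path, secure flag, expiry (epoch), name, value.
SAMPLE_COOKIES = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t4102444800\tsession_id\tabc123\n"
)


def load_cookie_values(path: str) -> dict[str, str]:
    """Load a cookies.txt file and return its cookies as a name -> value mapping."""
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(path)
    return {cookie.name: cookie.value for cookie in jar}


# Write the sample to a temporary file so the sketch runs without a browser export.
with tempfile.TemporaryDirectory() as tmp_dir:
    cookie_path = os.path.join(tmp_dir, "cookies.txt")
    with open(cookie_path, "w") as handle:
        handle.write(SAMPLE_COOKIES)
    cookies = load_cookie_values(cookie_path)

print(cookies)
```

Cookies whose expiry lies in the past are skipped by `load` unless `ignore_expires=True` is passed, which is worth checking when an old export appears to load nothing.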
 ## HTTP POST request
 Web-based APIs offer various methods for interacting with them, most commonly
 POST requests with JSON input/output and occasionally XML. Given Python's native
 support for JSON, this is generally the way to do things.
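 ```python
 import json
 import urllib.request
 def post_json(host_name, post_data):  # wrapper name is illustrative
     url = f"{host_name}/api.php"
     # Passing data (as bytes) is what turns the request into a POST.
     data = json.dumps(post_data).encode()
     req = urllib.request.Request(url, data=data)
     meta = urllib.request.urlopen(req)
     return json.loads(meta.read())
 ```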
 The above code snippet prepares a `req` object for a particular `host_name` and
 `post_data`, a dictionary that is encoded to a JSON string. Because `data` is
 set, calling `urlopen` on this request performs a POST request which, if all
 works as expected, returns a JSON string that is mapped to a Python
 collection.
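When `data` is given without a `Content-Type` header, urllib labels the body as `application/x-www-form-urlencoded`, which some JSON endpoints may reject. A minimal sketch of setting the header explicitly follows; the helper name `build_json_request`, the endpoint, and the payload are assumptions for illustration.

```python
import json
import urllib.request


def build_json_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST request whose body is the JSON encoding of payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload; nothing is sent until urlopen is called.
req = build_json_request("https://example.com/api.php", {"query": "example"})
print(req.get_method(), req.full_url)
```

Building the request separately from sending it also makes it easy to inspect the method, headers, and body before anything goes over the wire.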
 In the scenario where the data is returned as an XML string / document, there
 is an `xmltodict` Python library that will return a Python collection. The
 downside here is that XML tends to have quite a deep hierarchy, whose benefit is
 difficult to appreciate unless we get into large XML data structures that can be
 queried. For reference the XML parsing will look something like this:
 ```python
 xmltodict.parse(meta.read())
 ```
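If adding `xmltodict` as a dependency is not an option, the standard library's `xml.etree.ElementTree` can query the hierarchy directly instead of converting it to a dictionary. This is a different technique from the one above, shown as a rough sketch: the sample document, its element names, and the helper `item_titles` are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; a real API defines its own element names.
SAMPLE_XML = """<response>
  <item id="1"><title>First</title></item>
  <item id="2"><title>Second</title></item>
</response>"""


def item_titles(xml_text: str) -> dict[str, str]:
    """Map each item's id attribute to the text of its title element."""
    root = ET.fromstring(xml_text)
    return {
        item.get("id"): item.findtext("title", default="")
        for item in root.iter("item")
    }


titles = item_titles(SAMPLE_XML)
print(titles)
```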
 ## HTTP GET request with BeautifulSoup
 Performing GET requests is usually much simpler since you just need
 to determine the appropriate URL. Here I included an example where the
 `BeautifulSoup` Python library is used to parse the HTTP response and
 search through any links within the response that match a regular expression.
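 ```python
 import re
 import urllib.request
 from bs4 import BeautifulSoup
 def find_links(host_name, tag_name):  # wrapper name is illustrative
     query_url = f"{host_name}/?f_search={tag_name}"
     resp_data = urllib.request.urlopen(query_url)
     # Name the parser explicitly so bs4 does not have to guess one.
     resp_soup = BeautifulSoup(resp_data, "html.parser")
     # Escape the host so characters such as "." are matched literally.
     pattern = f"{re.escape(host_name)}/g/([0-9a-z]+)/([0-9a-z]+)"
     return [
         link["href"]
         for link in resp_soup.find_all("a", href=True)
         if re.match(pattern, link["href"])
     ]
 ```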
 This is probably the most common use case for the `BeautifulSoup` library and
 it is far more effective than sifting through the raw HTML data by hand.
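For comparison, the same link filtering can be approximated with only the standard library's `html.parser`, which is a swapped-in technique rather than what the post uses. The class name `LinkCollector`, the host, and the markup below are hypothetical.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that match a compiled pattern."""

    def __init__(self, pattern: re.Pattern):
        super().__init__()
        self.pattern = pattern
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.pattern.match(href):
            self.links.append(href)


# Hypothetical host and markup for illustration.
host_name = "https://example.com"
pattern = re.compile(re.escape(host_name) + r"/g/([0-9a-z]+)/([0-9a-z]+)")
html = (
    '<a href="https://example.com/g/12ab/cd34">match</a>'
    '<a href="https://example.com/about">no match</a>'
)
collector = LinkCollector(pattern)
collector.feed(html)
print(collector.links)
```

This works for simple link extraction, though `BeautifulSoup` remains the more comfortable choice once the queries involve nesting or malformed markup.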