Mirror of https://github.com/lleene/hugo-site.git (synced 2025-01-23 03:52:21 +01:00)

Commit b5d1242dc5 (parent cc12fcc3d8): content update oct 2021

@ -13,6 +13,10 @@ This site shares a bit of informal documentation and more blog-based record
keeping. Providing commentary on design decisions should be just as useful as
the technical documentation that is included in my repositories.

### Contact

You can reach me at `lieuwe at leene dot dev`.

## My Setup

I mainly use RHEL flavours of Linux, with both CentOS and Fedora machines. Most

@ -1,6 +1,54 @@
---
title: "Domain Setup ☄💻"
date: 2021-09-19T17:14:03+02:00
draft: false
---

## DNS Records

The main part of setting up a domain is configuring your
[DNS Records](https://en.wikipedia.org/wiki/List_of_DNS_record_types). These
dictate how your human-readable service names map to the physical address of
your machine. I mainly use this domain for web services together with
self-hosted email, so I outlined the records these services require below.

| Name                                          | Description
| --------------------------------------------- | -----------------------
| **A** Address record                          | Physical IPv4 address associated with this domain
| **CNAME** Canonical name record               | Alias for another (canonical) domain name. Generally used for subdomains (e.g. other.domain.xyz as an alias for domain.xyz, both served by the same machine)
| **CAA** Certification Authority Authorization | Restricts which certificate authorities (CAs) may issue certificates for a host/domain
| **DS** Delegation signer                      | Identifies the DNSSEC signing key of a delegated zone
| **MX** Mail exchange record                   | Maps a domain name to a list of message transfer agents for that domain
| **TXT** Text record                           | Carries machine-readable data, such as specified by RFC 1464, opportunistic encryption, Sender Policy Framework, DKIM, DMARC, DNS-SD, etc.

The essential records for web services are the A and CNAME records, which enable
correct name lookup from outside your private network. Nowadays SSL should be
part of any setup, so the certification authority you use should be specified
in the CAA record. Most likely this will be `letsencrypt.org`, which provides
SSL certificate signing free of charge and secures your traffic to some extent.
Separately, a DS record, registered through your domain registrar, holds a
digest of your zone's DNSSEC signing key and is what allows you to set up
DNSSEC on your domain. This key is independent of the certificate used by your
machine's SSL setup.

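To make the table a little more concrete, here is a minimal Python sketch that renders the web-facing records as BIND-style zone-file lines. The domain `domain.xyz`, the alias `other`, the documentation address `203.0.113.10` and the one-hour TTL are placeholders rather than my actual setup; most registrars let you enter the same fields through a web form instead.

```python
def web_records(domain, ipv4, aliases=(), ca="letsencrypt.org", ttl=3600):
    """Render A, CNAME and CAA records as BIND-style zone-file lines.

    Every value passed in is a placeholder; nothing is read from a real zone.
    """
    origin = domain.rstrip(".") + "."  # fully qualified names end with a dot
    lines = [f"{origin} {ttl} IN A {ipv4}"]
    for alias in aliases:
        # e.g. other.domain.xyz. is an alias resolved via the apex A record
        lines.append(f"{alias}.{origin} {ttl} IN CNAME {origin}")
    # CAA: flags 0, tag "issue", value = the CA allowed to issue certificates
    lines.append(f'{origin} {ttl} IN CAA 0 issue "{ca}"')
    return lines


for record in web_records("domain.xyz", "203.0.113.10", aliases=["other"]):
    print(record)
```

A DS record is deliberately left out: its digest is generated from your zone's DNSSEC key (usually by your DNS host) and handed to the registrar, rather than written by hand.
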
The other records are required for secure email transfer. First you need the
equivalent of a name record: the MX record, which should point to a host name
with its own A record. That host may or may not be the same machine / physical
address as the one hosting your web services. Like SSL encryption, signing your
email should be an essential part of your setup. An SMTP setup with Postfix can
do so using [openDKIM](http://www.opendkim.org/). This requires you to
similarly publish your public signing key, in this case as a TXT record.

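For the mail side, a small Python sketch can render the MX record, the A record of the mail host it points to, and the DKIM TXT record as zone-file lines. The names are hypothetical: `mail.domain.xyz` stands in for the Postfix host, `203.0.113.25` is a documentation address, `mail` is an assumed openDKIM selector, and the key value is a truncated placeholder. DKIM public keys are looked up under `<selector>._domainkey.<domain>`.

```python
def mail_records(domain, mail_host, mail_ipv4, selector, dkim_value, ttl=3600):
    """Render MX, mail-host A and DKIM TXT records as zone-file lines.

    Every argument is a placeholder; nothing here is read from a real zone.
    """
    origin = domain.rstrip(".") + "."
    host = f"{mail_host}.{origin}"
    return [
        # The MX target must be a host name (with its own A record), not an IP
        f"{origin} {ttl} IN MX 10 {host}",
        f"{host} {ttl} IN A {mail_ipv4}",
        # openDKIM public keys are published under <selector>._domainkey.<domain>
        f'{selector}._domainkey.{origin} {ttl} IN TXT "{dkim_value}"',
    ]


for record in mail_records("domain.xyz", "mail", "203.0.113.25", "mail",
                           "v=DKIM1;k=rsa;p=MIIB"):
    print(record)
```

The preference value (10 here) only matters once you publish several MX records: senders try the lowest value first.
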
```text
"v=DKIM1;k=rsa;p=${key_part1}"
"${key_part2}"
```

The TXT record will look something like the above statement. Unfortunately there
are some inconveniences when using a large RSA key (e.g. 2048 bits), which
yields a long public key. Each string in a TXT record is limited to 255
characters, so you need to break the key up into multiple strings, which the
`opendkim` tools may or may not do by default. The strings are concatenated
without any separator when the record is read, so as long as no extra
characters (such as semicolons) are inserted at the split points this should
just work as expected.

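A minimal Python sketch of the splitting step, assuming the full record value (`v=DKIM1;k=rsa;p=...`) is already available as one string. Each quoted string is capped at 255 characters (bytes, since base64 is ASCII), and because verifiers join the strings with no separator, the cut can fall anywhere. The key below is a dummy string, not a real key.

```python
TXT_STRING_MAX = 255  # maximum length of one character-string in a TXT record


def split_txt_value(value, limit=TXT_STRING_MAX):
    """Split a long TXT value into quoted chunks of at most `limit` characters.

    Verifiers concatenate the chunks with no separator, so nothing (no
    semicolon, no space) is added inside the value at the cut points.
    """
    chunks = [value[i:i + limit] for i in range(0, len(value), limit)]
    return " ".join(f'"{chunk}"' for chunk in chunks)


# Dummy 400-character "key" standing in for a real base64 RSA public key
record = "v=DKIM1;k=rsa;p=" + "A" * 400
print(split_txt_value(record))
```

Some DNS hosting panels may already split long values when you paste them in, so it is worth checking how the record is stored before splitting it by hand.
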
`content/posts/python-urllib.md` (Normal file, 100 lines)

@ -0,0 +1,100 @@
---
title: "Python Urllib ⬇📜"
date: 2021-10-26T20:02:07+02:00
draft: false
toc: true
tags:
  - python
  - scraping
  - code
---

I had to pull some metadata from a media database, and since this tends to be my
go-to setup when I use urllib with Python, I thought I would make a quick note
regarding cookies and making POST/GET requests accordingly.

## Setting up an HTTP session

The urllib Python library allows you to set global session parameters directly
by calling the `build_opener` and `install_opener` functions. If you make HTTP
requests with empty headers or little to no session data, your script will
usually be blocked on sites where robots are not welcome. Setting these
parameters mitigates the issue, but you are still advised to be a responsible
end user.

```python
import http.cookiejar
import urllib.request

# Load session cookies exported from the browser (Netscape/Mozilla format)
mycookies = http.cookiejar.MozillaCookieJar()
mycookies.load("cookies.txt")
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(mycookies)
)
# Browser-like headers, copied from one of my own browser sessions
opener.addheaders = [
    (
        "User-agent",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36"
        + " (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    ),
    (
        "Accept",
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        + "image/avif,image/webp,image/apng,*/*;q=0.8,"
        + "application/signed-exchange;v=b3;q=0.9",
    ),
]
# Make this opener the global default used by urllib.request.urlopen()
urllib.request.install_opener(opener)
```

The above code snippet sets a user agent and the kinds of content the session is
willing to accept. These values are generic and simply taken from one of my own
browser sessions. Additionally I load `cookies.txt`, which contains the session
cookies for a given domain that I exported from my browser to a file.

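One detail worth knowing about `MozillaCookieJar.load`: by default it skips cookies without an expiry time, which is exactly how browser session cookies appear in an exported `cookies.txt`, and it expects the file to start with a `# Netscape HTTP Cookie File` header line. A small sketch, using a made-up `example.com` cookie file written to a temporary directory:

```python
import http.cookiejar
import os
import tempfile

# Made-up Netscape-format cookie file. Fields are tab-separated:
# domain, include subdomains, path, secure, expiry (Unix time), name, value.
# An empty expiry field marks a session cookie.
SAMPLE_COOKIES = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t\tsession_id\tabc123\n"
    ".example.com\tTRUE\t/\tFALSE\t4102444800\tpref\tdark\n"
)


def load_browser_cookies(path):
    """Load an exported cookies.txt, keeping session cookies as well."""
    jar = http.cookiejar.MozillaCookieJar()
    # ignore_discard=True keeps cookies without an expiry (session cookies);
    # ignore_expires=True also keeps cookies that have already expired.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return jar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cookies.txt")
        with open(path, "w") as f:
            f.write(SAMPLE_COOKIES)
        default_jar = http.cookiejar.MozillaCookieJar()
        default_jar.load(path)  # silently drops the session cookie
        print(len(default_jar), len(load_browser_cookies(path)))  # 1 2
```

If an exported session stops being recognised by the site, passing `ignore_discard=True` to the `load` call is the first thing I would try.
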
## HTTP POST request

Web-based APIs offer various ways of interacting, but typically they take POST
requests with JSON input/output (occasionally XML). Given Python's native
support for JSON, this is generally the way to do things.

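```python
import json
import urllib.request


def post_json(host_name, post_data):
    url = f"{host_name}/api.php"
    data = json.dumps(post_data).encode()
    # Passing data= makes urlopen send a POST; declare the body as JSON
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as meta:
        return json.loads(meta.read())
```
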
The above code snippet prepares a `req` object for a particular `host_name` and
`post_data`, a dictionary that is encoded to a JSON string. Because `data` is
set, calling `urlopen` on this request performs a POST request which, if all
works as expected, returns a JSON string that `json.loads` maps to a Python
collection.

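The POST versus GET choice is made by `urllib` itself: a `Request` carrying a `data` payload reports `POST` from `get_method()`, one without reports `GET`. The sketch below builds both offline, with no network access, against a placeholder `https://example.com/api.php` endpoint and a made-up payload. This is a handy way to check what will be sent before hitting a real API.

```python
import json
import urllib.request

API_URL = "https://example.com/api.php"  # placeholder endpoint


def build_json_post(url, payload):
    """Build (but do not send) a JSON POST request."""
    body = json.dumps(payload).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


post_req = build_json_post(API_URL, {"action": "query", "title": "example"})
get_req = urllib.request.Request(API_URL)  # no data, so urllib uses GET

print(post_req.get_method(), get_req.get_method())  # POST GET
# urllib stores header names via str.capitalize(), hence "Content-type"
print(post_req.get_header("Content-type"))  # application/json
print(json.loads(post_req.data))  # {'action': 'query', 'title': 'example'}
```
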
In the scenario where the data is returned as an XML string / document, the
`xmltodict` Python library will convert it into a Python collection. The
downside is that XML tends to have quite a deep hierarchy, which is difficult to
work with unless you are dealing with large XML data structures that need to be
queried. For reference, the XML parsing will look something like this:

```python
import xmltodict

# meta: the response object returned by urlopen(req) above
xmltodict.parse(meta.read())
```

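If installing `xmltodict` is not an option, the standard library's `xml.etree.ElementTree` is an alternative (a different tool from the one above) that avoids walking the nested hierarchy by hand: a path such as `.//item` finds elements at any depth. The XML document here is made up for illustration; a real response would come from `meta.read()`.

```python
import xml.etree.ElementTree as ET

# Made-up API response with a few levels of nesting
SAMPLE_XML = b"""<?xml version="1.0"?>
<response status="ok">
  <results>
    <item id="1"><title>First</title><year>1999</year></item>
    <item id="2"><title>Second</title><year>2004</year></item>
  </results>
</response>"""


def titles_by_id(xml_bytes):
    """Map each <item> id attribute to the text of its <title> child."""
    root = ET.fromstring(xml_bytes)
    # ".//item" matches <item> elements at any depth below the root
    return {
        item.get("id"): item.findtext("title")
        for item in root.findall(".//item")
    }


print(titles_by_id(SAMPLE_XML))  # {'1': 'First', '2': 'Second'}
```
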
## HTTP GET request with BeautifulSoup

Performing GET requests is usually much simpler since you just need to
determine the appropriate URL. Here I included an example where the
`BeautifulSoup` Python library is used to parse the HTTP response and search
for any links within it that match a regular expression.

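```python
import re
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

def find_matching_links(host_name, tag_name):
    # URL-encode the search term so spaces and special characters are safe
    query_url = f"{host_name}/?" + urllib.parse.urlencode({"f_search": tag_name})
    resp_data = urllib.request.urlopen(query_url)
    resp_soup = BeautifulSoup(resp_data, "html.parser")
    # re.escape stops dots in host_name from acting as regex wildcards
    pattern = re.escape(host_name) + r"/g/([0-9a-z]+)/([0-9a-z]+)"
    return [link["href"] for link in resp_soup.find_all("a", href=True)
            if re.match(pattern, link["href"])]
```
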
This is probably the most common use case for the `BeautifulSoup` library, and
it is far more effective than sifting through the raw HTML data yourself.

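When `bs4` is not installed, the same link filtering can be approximated with the standard library's `html.parser.HTMLParser`. This is a swapped-in and more manual technique, not what the example above uses; the host name and the `/g/<id>/<token>` links in the sample HTML are placeholders.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that has a non-empty one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def matching_links(html, host_name):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # re.escape keeps the dots in host_name from matching arbitrary characters
    pattern = re.compile(re.escape(host_name) + r"/g/([0-9a-z]+)/([0-9a-z]+)")
    return [href for href in parser.hrefs if pattern.match(href)]


SAMPLE_HTML = """
<a href="https://example.com/g/123/abc0f1/">match</a>
<a href="https://example.com/about">no match</a>
<a name="anchor-without-href">ignored</a>
"""
print(matching_links(SAMPLE_HTML, "https://example.com"))
# ['https://example.com/g/123/abc0f1/']
```

For anything beyond simple link extraction, BeautifulSoup's search API is still the more comfortable choice.
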