forked from mirror/fake-firehose
Updated documentation
parent 046839eba5
commit e4f1ac5688
@@ -1,12 +1,12 @@
fakeRelayKey="YOUR--FAKE---RELAY---KEY"
fakeRelayHost="https://your-fake-relay-url.YourPetMastodon.com"

## Set to false if you don't want to send URIs to your fakerelay. Generally this is only used for debugging
runFirehose=true

## Maximum number of curl instances to be allowed to run. This is only used
## if you send data to the relay
maxCurls=500

## Minimum number of posts to queue up before sending them to the relay.
## This is more useful when you are streaming federated timelines from larger instances
@@ -20,6 +20,8 @@ minURIs=100
## Archive mode will save the json stream but not parse it, not even into URIs
## This will greatly save resources, but obviously will not send it to
## the relay.
##
## Generally only used for debugging or archiving instance streams
archive=false

## Restart timeout
readme.md (272 lines changed)

@@ -1,10 +1,9 @@
# Fake Firehose
This project is basically a shell/bash/text frontend for [fakerelay](https://github.com/g3rv4/FakeRelay).

It allows instances to fill their federated timelines from other instances that have public timelines.

You can find the fakefirehose author at [@raynor@raynor.haus](https://raynor.haus/@raynor).

## How to run it
@@ -14,9 +13,9 @@ In the config folder there are three files
- domains-local
- hashtags

If you want the full public feed from an instance, put it in the domains-federated file, one domain per line.

If you only want the local feed from an instance, put it in the domains-local file, one domain per line.

If you want to follow a hashtag, add it after an instance in `domains-federated` or `domains-local`.
@@ -26,14 +25,101 @@ stream from mastodon.social
Another example: if in `domains-local` you put `infosec.exchange #hacker`, a stream will open to watch for the hashtag #hacker on the _local_ stream from infosec.exchange.

## Docker
To run it in docker -- recommended:

1. Make sure you have [docker installed](https://docs.docker.com/engine/install/).
2. From your shell, create a directory; it is recommended that you give it a relevant name.
3. Go into that directory and use `git clone https://github.com/raynormast/fake-firehose.git`
4. Go into the created directory: `cd fake-firehose`
5. `docker build -t fakefirehose .`
6. Edit your `docker-compose.yml` file as needed. **The biggest thing** is to watch the volumes. It is _highly_ recommended that you keep your data directory in the parent directory, and NOT the directory the git repo is in.
7. Edit your `.env.production` file. The file is fairly well commented.
8. Run `docker compose -f docker-compose.yml up -d`

The entire thing should look something like:
```
cd ~
mkdir MastodonFirehose
cd MastodonFirehose
git clone https://github.com/raynormast/fake-firehose.git
cd fake-firehose
docker build -t fakefirehose .
# Edit your docker-compose and .env.production here
sudo docker compose -f docker-compose.yml up -d
```
# Configuration

## tl;dr
Your `./config` folder has three sample files; after editing, you should have the following three files:
```
domains-federated
domains-local
hashtags
```

**In each file, comments begin with `##`, not the traditional single `#`.**

The syntax is the same for both domains files:
```
## Follow full timeline
mastodon.instance

## Follow these hashtags from the timeline
mastodon.instance #mastodon #relay
```
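Fake-firehose is shell-based, so a line in that format could plausibly be split into its host and hashtag fields like this (an illustrative sketch, not the project's actual parser):

```shell
line="mastodon.instance #mastodon #relay"

# Comment lines begin with ## and would be skipped
case "$line" in
  "##"*) line="" ;;
esac

# First field is the instance domain, the rest are hashtags
host=$(printf '%s\n' "$line" | cut -d' ' -f1)
tags=$(printf '%s\n' "$line" | cut -s -d' ' -f2-)

echo "$host"   # mastodon.instance
echo "$tags"   # #mastodon #relay
```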
The files are well commented.

## domains-federated file
This file holds the instances whose full federated feeds you want fed to fakerelay.

Each line of the file should have the domain name of an instance whose federated timeline you want to follow.
E.g.,
```
raynor.haus
infosec.exchange
```
This can generate a LOT of posts if you choose a large instance.

For example, if you use `mastodon.social` or `mas.to` you can expect your server to fall behind. `mastodon.social` generates 50,000 - 200,000 posts on the federated timeline per day.

It is recommended that you only use this file to:
- follow hashtags
- follow instances with small federated timelines, with content you want in yours
#### domains-federated hashtags
The one time to use the federated timeline is to catch most posts with a specific hashtag.

Every word after an instance domain is a hashtag to relay.

Example:

`mastodon.social fediblock fediverse mastodev mastoadmin`

will only return posts from the mastodon.social federated feed with hashtags of `#fediblock`, `#fediverse`, `#mastodev`, and `#mastoadmin`.

The `#` is optional -- it is accepted simply to make the file more intuitive.
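Since the `#` is optional, a loader would presumably normalize tags by stripping at most one leading `#`; in shell that is a one-line parameter expansion (a sketch for illustration):

```shell
# Strip a single leading "#" if present; plain words pass through unchanged
raw="#fediblock"
tag=${raw#\#}
echo "$tag"   # fediblock

raw2="fediverse"
echo "${raw2#\#}"   # fediverse
```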
## domains-local file
This file is identical to the `domains-federated` file except that it only receives posts created on
_that_ instance (the local timeline).

It is possible to keep up with the larger instances, such as `mastodon.social`, if you only look at the
local timeline.
## hashtags file
If you put ANY hashtags in here, a stream will be opened for _every_ host in the `domains-federated` and `domains-local` files.

**Its purpose is for people or instances that want to find nearly every post with a particular hashtag.**

_It can very quickly open up a lot of `curl` streams._

### Example
`domains-federated` content:

```
@@ -56,6 +142,10 @@ Mastodon

will result in the following streams all opening:
```shell
https://mastodon.social/api/v1/streaming/public
https://mas.to/api/v1/streaming/public
https://aus.social/api/v1/streaming/public/local
https://mastodon.nz/api/v1/streaming/public/local
https://mastodon.social/api/v1/streaming/hashtag?tag=JohnMastodon
https://mas.to/api/v1/streaming/hashtag?tag=JohnMastodon
https://aus.social/api/v1/streaming/hashtag?tag=JohnMastodon

@@ -67,6 +157,164 @@ https://mastodon.nz/api/v1/streaming/hashtag?tag=Mastodon
```
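Each of those streams is, under the hood, presumably a long-lived `curl` against the Mastodon streaming API. A sketch of how the URL shapes in that list could be assembled (host and tag are placeholders):

```shell
host="mastodon.social"
tag="JohnMastodon"

# The three URL shapes used in the list above
federated_url="https://$host/api/v1/streaming/public"
local_url="https://$host/api/v1/streaming/public/local"
hashtag_url="https://$host/api/v1/streaming/hashtag?tag=$tag"

echo "$hashtag_url"   # https://mastodon.social/api/v1/streaming/hashtag?tag=JohnMastodon

# Opening the stream would then look roughly like:
# curl -sN "$hashtag_url"
```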
If you had a total of 5 lines in `domains-federated` and `domains-local` plus 3 entries in `hashtags`,
there would be 5 x 5 x 3 = 75 new streams.

I mean, you can do it, but you won't need your central heating system any more.
Usually a more targeted approach is better.
It is recommended that you put hashtags in your `domains-federated` or `domains-local` files.

Your humble author's federated file currently looks like this:
```
mastodon.social infosec hacker hackers osint hive lockbit hackgroup apt vicesociety

mastodon.social blackmastodon blackfediverse poc actuallyautistic neurodivergent blacklivesmatter freechina antiracist neurodiversity blackhistory bipoc aapi asian asianamerican pacificislander indigenous native

mastodon.social fediblock fediverse mastodev mastoadmin
mastodon.social apple politics vegan trailrunning church churchillfellowship christianity christiannationalism
```
My `domains-local` file is:
```
## Fake Firehose will only take local posts from these domains

mastodon.social
universeodon.com

### International English (if you aren't from the US) ###
## mastodon.scot
aus.social
mastodon.nz
respublicae.eu
mastodon.au

### Tech ###
partyon.xyz
infosec.exchange
ioc.exchange
tech.lgbt
techhub.social
fosstodon.org
appdot.net
social.linux.pizza

journa.host
climatejustice.social
```
This generates an acceptable stream of posts for my federated timeline. The tags I follow on mastodon.social
are those that are either few in number overall, or are harder to find on local timelines.
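To see how many stream-opening lines a config like the ones above actually contains (blank lines and `##` comments excluded), a quick `grep` does it; here against a small inline sample:

```shell
# Write a small sample domains file, then count its active lines.
f=$(mktemp)
cat > "$f" <<'EOF'
## Fake Firehose will only take local posts from these domains
mastodon.social
universeodon.com
## mastodon.scot
aus.social
EOF

# -v inverts the match, -c counts; ## comments and blank lines are skipped
count=$(grep -vc -e '^##' -e '^$' "$f")
echo "$count"   # 3
rm -f "$f"
```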
## .env.production
tl;dr: This file is fairly well commented internally, just go at it.

**The sample file probably does not need any changes beyond your fakerelay information.**

### options
#### fakeRelayKey
This needs to have the key you generated with fakerelay.

_Example_:
`fakeRelayKey="MrNtYH+GjwDtJtR6YCx2O4dfasdf2349QtZaVni0rsbDryETCx9lHSZmzcOAv3Y8+4LiD8bFUZbnyl4w=="`
#### fakeRelayHost
The full URL to your fakerelay.

_Example_:
`fakeRelayHost="https://fr-relay-post.myinstance.social/index"`
#### runFirehose
This controls whether the posts will actually be sent to your relay, or only collected in your `/data` folder.
You almost certainly want this set at:

`runFirehose=true`

The _only_ reason to set it to `false` is for debugging, or logging posts from the fediverse.
#### maxCurls and minURIs
These two options are closely related. `maxCurls` is the maximum number of `curl` processes you want to have
running on your system at once. If you follow timelines with a lot of posts, you may need to limit this.

**Note**: This always needs to be higher than the total number of instances + hashtags you have configured, because each one of those is a separate `curl` process.

fake-firehose batches posts to de-duplicate them; `minURIs` is the size of that batch. If you have a lot of
_federated_ posts coming in, you will want to set this to a high number because a lot of them will be duplicates.

If you only use local timelines it doesn't matter; you will not have any duplicates.

It is a tradeoff between resources (and `curl` processes running) and how quickly you want to fill your
instance's federated timeline.
_Example for a moderate number of incoming posts_:
```
## Max curl processes have not gotten out of control, so this is absurdly high.
maxCurls=2000

## Nearly all of the timelines I follow are local, so there are very few duplicates.
minURIs=10
```
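If you want to check the running `curl` count against your `maxCurls` setting on the host, `pgrep` gives it directly (assuming a standard Linux userland; each open stream is one `curl` process):

```shell
# pgrep -c prints the number of matching processes;
# it exits non-zero when there are none, hence the fallback.
count=$(pgrep -c curl || true)
count=${count:-0}
echo "curl processes: $count"
```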
#### archive
Archive mode will save the json stream but not parse it, not even into URIs.
This will greatly save resources, but obviously will not send anything to the relay.

**The only reasons to use this are debugging or logging posts from servers.**

You almost certainly want this set at:

`archive=false`
#### restartTimeout
This is how long the docker image will run before exiting. As long as your `docker-compose` has `restart: always` set, this simply restarts the image to kill any hung `curl` processes.

The only reason to set it high is if you follow a lot of timelines. Each one takes time to open up,
so if you restart often you will miss more posts.

_Example:_

`restartTimeout=4h`
#### streamDelay
This is only for debugging.

Keep it at:

`streamDelay="0.1s"`
# Data Directory
Data is saved in the format of:
```
"%Y%m%d".uris.txt
```

In archive mode the format is:
```
"/data/"%Y%m%d"/"%Y%m%d".$host.json"
```

For example, if you set `archive=true` and had `mastodon.social` in your `domains-federated` or `domains-local` config, on January 1st, 2023 the json stream would be saved at
```
/data/20230101/20230101.mastodon.social.json
```
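That path can be reproduced with plain `date` substitution (a sketch; the host value is a placeholder):

```shell
host="mastodon.social"
day=$(date +%Y%m%d)

# Mirrors the archive naming scheme: /data/<date>/<date>.<host>.json
archive_path="/data/$day/$day.$host.json"
echo "$archive_path"
```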
# Misc
## Backoff
An exponential backoff starts if `curl` fails. It is rudimentary and maxes out at 15 minutes.
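The backoff is described as rudimentary; conceptually it is just a doubling wait with a ceiling, something like this sketch (an illustration, not the project's literal code):

```shell
# Double the delay on every failure, capping at 15 minutes (900 seconds).
delay=1
cap=900
for failure in 1 2 3 4 5 6 7 8 9 10 11 12; do
  delay=$((delay * 2))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  # sleep "$delay"   # then retry the curl here
done
echo "$delay"   # 900
```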
## DNS lookup
Before a URL starts streaming, fakefirehose looks up the DNS entry of the host. If the lookup fails,
the stream will not begin, _and will not attempt to begin again_ until the container is restarted.
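You can reproduce that check by hand with `dig`, the same test the project's shell script uses (`mastodon.social` here is just an example host):

```shell
host="mastodon.social"

# dig +short prints the resolved records, or nothing if resolution fails;
# the fallback keeps this sketch from aborting if dig is unavailable.
records=$(dig "$host" +short 2>/dev/null || true)

if [ -n "$records" ]; then
  status="resolves"
else
  status="does not resolve -- the stream would not be started"
fi
echo "$status"
```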
## Permissions
The permissions of the output data files will be set to `root` by default. This will be fixed
in a future release.
# Why fake firehose?
When I wrote this there were no other options I was aware of to fill the federated timeline of a small instance.
The work of [Gervasio Marchand](https://mastodonte.tech/@g3rv4) is fantastic but still requires programming knowledge to make use of.

I wanted the simplest setup and config I could create, without setting up an entirely new web UI.

There are a lot of things to do better; I'll work on the ones I have time and capability for. Otherwise, this project
is practically begging to be re-written in python or something else.
@@ -8,11 +8,6 @@ then
  exit 2
fi

# Check to see if domain name resolves. If not, exit
if [[ ! $(dig "$host" +short) ]]
then