Hacking Tools

Natudump – Automating The Extraction Of Naturalisation Decrees From LegiFrance

This is example of scraping public LegiFrance registry’s naturalisation decrees for research purposes only (naturalisation par mariage is not included in these decrees). Code license is MIT.

pip install selenium charset_normalizer

mkdir -p jo
# python3 natudump.py -o jo --years $(seq 2000 2021) --output-directory-prefix "$(wslpath -a -w "$PWD")\\" # for WSL systems, must be on a NTFS drive
python3 natudump.py   -o jo --years $(seq 2000 2021) --output-directory-prefix "$PWD/"
ls jo | wc -l

mkdir -p txtjo
# https://github.com/pdfminer/pdfminer.six/issues/809
git clone --branch 20220524 --depth 1 https://github.com/pdfminer/pdfminer.six
PYTHONPATH="pdfminer.six:pdfminer.six/tools:$PYTHONPATH" find jo -name '*.pdf' -exec python3 -m pdf2txt {} -o txt{}.txt \;
ls txtjo | wc -l

python3 tabulate.py -i txtjo -o natufrance_2000_2021.tsv
grep 'Russie\|URSS\|U.R.S.S' natufrance_2000_2021.tsv | wc -l

mkdir -p catjo
git clone --branch v0.4 --depth 1 https://github.com/pmaupin/pdfrw
rm $(PYTHONPATH="$PWD/pdfrw:$PYTHONPATH" find jo/ -type f -not -exec python3 -c 'import sys, pdfrw; pdfrw.PdfReader(sys.argv[1])' {} \; -print)
for years in $(seq 2000 2021); do PYTHONPATH="$PWD/pdfrw:$PYTHONPATH" python3 pdfrw/examples/cat.py jo/JORF_${years}*; mv cat.JORF_${years}*.pdf catjo; done
ls catjo | wc -l

mkdir -p tarjo
for years in $(seq 2000 2021); do tar -cf tarjo/jo${years}.tar jo/*_${years}*; done
ls tarjo | wc -l
Varshini

Varshini is a Cyber Security expert in Threat Analysis, Vulnerability Assessment, and Research. Passionate about staying ahead of emerging Threats and Technologies.

Recent Posts

How to Prevent Software Supply Chain Attacks

What is a Software Supply Chain Attack? A software supply chain attack occurs when a…

1 week ago

How UDP Works and Why It Is So Fast

When people ask how UDP works, the simplest answer is this: UDP sends data quickly…

3 weeks ago

How EDR Killers Bypass Security Tools

Endpoint Detection and Response (EDR) solutions have become a cornerstone of modern cybersecurity, designed to…

3 weeks ago

AI-Generated Malware Campaign Scales Threats Through Vibe Coding Techniques

A large-scale malware campaign leveraging AI-assisted development techniques has been uncovered, revealing how attackers are…

3 weeks ago

How Does a Firewall Work Step by Step

How Does a Firewall Work Step by Step? What Is a Firewall and How Does…

3 weeks ago

Fake VPN Download Trap Can Steal Your Work Login in Minutes

People trying to securely connect to work are being tricked into doing the exact opposite.…

3 weeks ago