Goals:
The structure:
Scripts/
StaticData/
  Taxonomies/ - ParCzech and ParlaMint taxonomies (currently in src/metadater/taxonomies/)
  Metadata/ - manually added metadata and annotations
    org-coalition-opposition.tsv - currently src/metadater/parczech_coal-opp.tsv
    org-ana.tsv - annotation of organizations (currently src/psp-db/org-ana.unl)
  Patches/
    translations.tsv - maybe a single file is enough, with an extra column for context (for context-specific translations)
Build/ - do not add to git; use the same structure as Samples/
Samples/
  .gitignore
  Sources/
    html/
      {date}/ - used in the Build folder: the date when the source was downloaded
      sample/ - static sample data
      html.tsv
        example URL: https://www.psp.cz/eknih/2013ps/stenprot/001schuz/s001001.htm
        URL pattern: https://www.psp.cz/eknih/2013ps/stenprot/{MEETING}schuz/s{MEETING}{PAGE}.htm
    Audio/
      audio.tsv
        columns:
        - downDate
        - hash
        - filepath
        - url
        - status - handle the case when the URL does not lead to a file
        - urlSeen - leave empty when the URL is guessed
    metadata/ - will contain various sources:
      - government web pages
      - SQL tables from PSP
      metadata.tsv
  Working/ - intermediate folders and files
    text-tei-like/ - preserve the download date
    text-notes/
    text-udpipe/
    text-audio-in/
    text-nametag/
    meta-...... - ??? TODO: specify
    audio-...... - ??? TODO: specify
    speaker-person.tsv - linking file: speaker, person, role
  Logs/
  Results/ - (or Dist/? though that name would be a bit strange in the Samples folder) contains the compiled corpora
    ParCzech.TEI/
    ParCzech.TEI.ana/
    ParCzech.TEITOK/ - to be published on TEITOK-dev
    AudioPSP/
    ParlaMint.Source-TEI/ - this format goes into the ParlaMint pipeline
    ParlaMint-CZ.TEI/
    ParlaMint-CZ.TEI.ana/
    ParlaMint.Sample/ - contents can be pushed to the ParlaMint repository via a pull request
    ParlaSpeech/
    Source - ???
Docs/
Schema/
Makefile
licence
contributing
citing
README.md
.gitignore
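The html.tsv URLs follow a fixed pattern, so page URLs could be generated instead of collected by hand. A minimal Python sketch, assuming MEETING and PAGE are zero-padded to three digits (as in s001001.htm = meeting 1, page 1) and that the term folder (2013ps) is passed in; the function name steno_page_url and the three-digit limit are assumptions for illustration:

```python
# Hypothetical helper for the stenographic-protocol URL pattern
#   https://www.psp.cz/eknih/{TERM}/stenprot/{MEETING}schuz/s{MEETING}{PAGE}.htm
# Assumption: MEETING and PAGE are zero-padded to three digits (s001001.htm).

BASE = "https://www.psp.cz/eknih"


def steno_page_url(term: str, meeting: int, page: int) -> str:
    """Return the URL of one page of a meeting's stenographic protocol."""
    # Assumed limit: values above 999 would not fit the three-digit padding.
    if not (1 <= meeting <= 999 and 1 <= page <= 999):
        raise ValueError("meeting and page must be between 1 and 999")
    m = f"{meeting:03d}"  # e.g. 1 -> "001"
    p = f"{page:03d}"
    return f"{BASE}/{term}/stenprot/{m}schuz/s{m}{p}.htm"


if __name__ == "__main__":
    # prints https://www.psp.cz/eknih/2013ps/stenprot/001schuz/s001001.htm
    print(steno_page_url("2013ps", 1, 1))
```

URLs produced this way are guessed, not seen on a page, so a downloader would still need to check whether each one actually leads to a file.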
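The audio.tsv columns could be written with Python's standard csv module. This is a hypothetical sketch, not part of the proposal: the helper names, the SHA-256 hash, the ISO date format for downDate and the status values "ok" / "not-a-file" are all assumptions.

```python
# Hypothetical sketch: one audio.tsv record per downloaded (or missing) audio file.
# Assumptions: downDate is an ISO date, hash is a SHA-256 hex digest,
# and status uses the made-up values "ok" / "not-a-file".
import csv
import hashlib
import io
from datetime import date
from typing import Dict, Iterable, Optional, TextIO

COLUMNS = ["downDate", "hash", "filepath", "url", "status", "urlSeen"]


def audio_row(url: str, filepath: str, content: Optional[bytes],
              url_seen: Optional[str], down_date: date) -> Dict[str, str]:
    """Build one audio.tsv record; content is None when the URL gave no file."""
    found = content is not None
    return {
        "downDate": down_date.isoformat(),
        "hash": hashlib.sha256(content).hexdigest() if found else "",
        "filepath": filepath if found else "",
        "url": url,
        "status": "ok" if found else "not-a-file",
        # Left empty when the URL was guessed rather than seen.
        "urlSeen": url_seen or "",
    }


def write_audio_tsv(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write a header line and the rows as tab-separated values."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    # Placeholder URLs and paths, for illustration only.
    write_audio_tsv([
        audio_row("https://example.org/a.mp3", "Audio/a.mp3", b"...", None, date.today()),
        audio_row("https://example.org/b.mp3", "Audio/b.mp3", None, None, date.today()),
    ], buf)
    print(buf.getvalue())
```

Keeping a row even when the URL does not lead to a file (with an empty hash and path) is one way to record failed guesses in the status column instead of silently dropping them.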