Conference paper

Evaluating Hierarchical Clustering Methods for Corpora with Chronological Order

Abstract: The result of a hierarchical clustering is traditionally represented as a dendrogram: a rooted tree whose leaves are documents, where the length of the path between two leaves represents the stylistic/linguistic distance between the corresponding documents. Clusters correspond to branching nodes: the shorter the distance between two nodes, the more they are expected to share stylistic and linguistic features. We investigate to what extent the resulting dendrogram is consistent with the chronological order in which the documents were written, since such consistency would provide a way to evaluate the result of the clustering. More precisely, the question we want to answer is: can the branching nodes of the dendrogram be reordered so that its leaves follow chronological order as closely as possible, while of course preserving the structure of the dendrogram?
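To make the reordering question concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the authors' method. It assumes a binary dendrogram, a single year of writing per document, and measures disagreement with chronology as the number of inverted pairs (two documents where the later one is placed first). Under these assumptions the problem can be solved exactly node by node: the relative order of two leaves depends only on the orientation of their lowest common ancestor, so each internal node can independently keep or swap its two children, whichever yields fewer inversions between them. The class `Node` and the functions `cross_inversions` and `reorder` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary dendrogram node; leaves carry a document name and year."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    name: str = ""
    year: int = 0

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def cross_inversions(first: List[Node], second: List[Node]) -> int:
    """Count pairs (a, b), a in first and b in second, where a was written after b.

    These are the pairs left out of chronological order when the block
    `first` is placed before the block `second`.
    """
    return sum(1 for a in first for b in second if a.year > b.year)


def reorder(node: Node) -> Tuple[List[Node], int]:
    """Return a leaf order and its inversion count, minimised by child swaps.

    Each internal node keeps or swaps its two children, whichever produces
    fewer inversions between them; this is exact for binary trees because
    every pair of leaves is ordered by its lowest common ancestor alone.
    """
    if node.is_leaf():
        return [node], 0
    left_order, left_inv = reorder(node.left)
    right_order, right_inv = reorder(node.right)
    keep = cross_inversions(left_order, right_order)
    swap = cross_inversions(right_order, left_order)
    if swap < keep:
        return right_order + left_order, left_inv + right_inv + swap
    return left_order + right_order, left_inv + right_inv + keep


# Toy example with made-up years (not data from the paper).
tree = Node(Node(Node(name="C", year=1850), Node(name="A", year=1830)),
            Node(name="B", year=1840))
order, inversions = reorder(tree)
print([leaf.name for leaf in order], inversions)  # ['A', 'C', 'B'] 1
```

In this toy example no orientation of the tree is fully chronological: the best achievable leaf orders (A, C, B or B, A, C) each contain one inverted pair, which is the kind of residual disagreement that could serve as an evaluation signal. For non-binary dendrograms, where a node may have more than two children, choosing the order of children is no longer a simple two-way choice, so this node-by-node argument does not carry over directly.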

https://hal.archives-ouvertes.fr/hal-03341803
Contributor: Philippe Gambette
Submitted on: Sunday, 12 September 2021, 21:04:00
Last modified on: Wednesday, 15 September 2021, 03:31:40

File

Evaluating Hierarchical Cluste...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03341803, version 1

Citation

Philippe Gambette, Olga Seminck, Dominique Legallois, Thierry Poibeau. Evaluating Hierarchical Clustering Methods for Corpora with Chronological Order. EADH2021: Interdisciplinary Perspectives on Data. Second International Conference of the European Association for Digital Humanities, EADH, Sep 2021, Krasnoyarsk, Russia. ⟨hal-03341803⟩
