      Is Open Access

      RECONHECIMENTO DO VOCABULÁRIO DE JORNAIS POPULARES BRASILEIROS POR UM DICIONÁRIO COMPUTACIONAL DE ACESSO LIVRE Translated title: RECOGNIZING THE VOCABULARY OF BRAZILIAN POPULAR NEWSPAPERS WITH A FREE-ACCESS COMPUTATIONAL DICTIONARY

      research-article


          Abstract

          RESUMO Relata-se um experimento de verificação da identificação de um universo de palavras do português popular escrito por duas versões de um dicionário computacional do português brasileiro (PB), DELAF PB 2004 e DELAF PB 2015. Esse dicionário computacional é gratuitamente acessível para ser utilizado em análises linguísticas do Português do Brasil e em outras pesquisas, o que justifica um estudo crítico. O universo vocabular provém do corpus PorPopular, composto por jornais populares, o Diário Gaúcho (DG) e o jornal baiano Massa! (MA). Do DG, partiu-se de um conjunto de textos com 984.465 palavras (tokens), publicados em 2008, com ortografia desatualizada frente ao Acordo Ortográfico da Língua Portuguesa adotado em 2009. Do MA, examinou-se um universo com 215.776 palavras (tokens), em publicações de 2012, 2014 e 2015, com todo o material na nova ortografia. A verificação envolveu: a) gerar listas de palavras diferentes empregadas em DG e MA; b) comparar essas listas com as listas de entradas das duas versões do DELAF PB; c) avaliar a cobertura desse vocabulário; d) propor modos de inclusão de itens não cobertos. Os resultados do trabalho mostraram, no DG, uma média de 19% de palavras diferentes (types) desconhecidas pelos DELAF PB 2004 e 2015. No MA, essa média ficou em 13%. A versão do dicionário repercutiu ligeiramente sobre o desempenho do reconhecimento de itens.

          Translated abstract

          ABSTRACT We report an experiment that tests how well two versions of a computational dictionary of Brazilian Portuguese, DELAF PB 2004 and DELAF PB 2015, recognize a set of words from popular written Portuguese. The dictionary is freely available for linguistic analyses of Brazilian Portuguese and for other research, which justifies a critical study. The vocabulary comes from the PorPopular corpus, composed of popular newspapers: Diário Gaúcho (DG) and the Bahian newspaper Massa! (MA). From DG, we started from a set of texts totaling 984,465 words (tokens), published in 2008, in the spelling that predates the Portuguese Language Orthographic Agreement adopted in 2009. From MA, we examined 215,776 words (tokens) from issues published in 2012, 2014 and 2015, all in the new spelling. The procedure involved: a) generating lists of the distinct words (types) occurring in DG and MA; b) comparing these lists with the entry lists of both versions of DELAF PB; c) assessing the coverage of this vocabulary; d) proposing ways of incorporating the items not covered. The results show that, in DG, an average of 19% of the types were unknown to DELAF PB 2004 and 2015. In MA, this average was 13%. The dictionary version had only a slight effect on word recognition performance.
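
Steps a) to c) of the procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenization rule, the function names, and the toy dictionary lines are assumptions made for illustration, and the entry layout `form,lemma.POS:features` is assumed to follow the usual DELAF convention (escaped commas inside entries are ignored here).

```python
import re

# Assumption: a word is a run of letters, optionally joined by hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def extract_types(text):
    """Step a): return the set of distinct lowercase word forms (types)."""
    return {m.group(0).lower() for m in WORD_RE.finditer(text)}


def load_delaf_forms(lines):
    """Collect the inflected form (text before the first comma) of each entry."""
    forms = set()
    for line in lines:
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        forms.add(line.split(",", 1)[0].lower())
    return forms


def unknown_rate(types, dictionary_forms):
    """Steps b) and c): types missing from the dictionary, and their share in %."""
    unknown = types - dictionary_forms
    rate = 100.0 * len(unknown) / len(types) if types else 0.0
    return unknown, rate


# Toy data, invented for this example (not taken from PorPopular or DELAF PB).
sample_text = "O guri comprou duas casas e um celular."
sample_delaf = [
    "o,o.DET:ms",
    "comprou,comprar.V:J3s",
    "duas,dois.DET:fp",
    "casas,casa.N:fp",
    "e,e.CONJ",
    "um,um.DET:ms",
]

types = extract_types(sample_text)
unknown, rate = unknown_rate(types, load_delaf_forms(sample_delaf))
print(sorted(unknown), rate)  # ['celular', 'guri'] 25.0
```

Running the same comparison once per dictionary version and once per newspaper would give the kind of per-corpus percentages of unknown types that the abstract reports.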


                Author and article information

                Journal
                Alfa: Revista de Linguística (São José do Rio Preto)
                Alfa, rev. linguíst. (São José Rio Preto)
                Universidade Estadual Paulista Júlio de Mesquita Filho (São Paulo, SP, Brazil)
                ISSN: 0002-5216
                ISSN: 1981-5794
                May 2019
                Volume: 63
                Issue: 1
                Pages: 63-80
                Affiliations
                [1] Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Letras, Porto Alegre, Rio Grande do Sul, Brazil. maria.finatto@ufrgs.br
                [2] Universidade Federal de São Carlos, Centro de Educação e Ciências Humanas, São Carlos, Brazil. otovale@ufscar.br
                [3] Université Paris-Est, Institut d’électronique et d’informatique Gaspard-Monge, Champs-sur-Marne, France. eric.laporte@univ-paris-est.fr
                Article
                SciELO ID: S1981-57942019000100063
                DOI: 10.1590/1981-5794-1904-3
                08e55c97-6047-428b-9ff0-72d4be97a2e8

                This work is licensed under a Creative Commons Attribution 4.0 International License.

                History
                23 March 2018
                17 June 2018
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 31, Pages: 18
                Product

                SciELO Brazil

                Categories
                Original Articles

                Jornais populares, Léxico, Vocabulário, Dicionário computacional, Cobertura lexical, Reconhecimento de palavras, Português brasileiro, Popular newspapers, Lexis, Vocabulary, NLP dictionary, Lexical coverage, Word recognition, Brazilian Portuguese
