      Is Open Access

      Multi-Client Document Classification Service Based on Machine Learning Techniques and Elasticsearch
      Original title (Spanish): Servicio de clasificación documental multi cliente basado en técnicas de aprendizaje de máquina y Elasticsearch
      Translated title (Portuguese): Serviço de classificação documentária multi-cliente baseado em técnicas de aprendizagem de máquina e Elasticsearch

      research-article


          Abstract

          This paper presents a document classification service that lets the document management systems of multiple clients (multi-tenant) provide greater confidence and credibility regarding the document types assigned to documents uploaded by users. The research followed the phases of CRISP-DM and evaluated two document representation models (bags of words with cumulative n-grams, and BERT, recently proposed by Google) and five machine learning techniques (multilayer perceptron, random forests, k-nearest neighbors, decision trees, and naïve Bayes). The experiments were carried out with data from two organizations. The best results were obtained by the multilayer perceptron, random forests, and k-nearest neighbors, which achieved very similar overall accuracy and per-class recall. The results are not conclusive as to whether the service can be offered to multiple clients with a single model, since this also depends on each client's documents and document types. The service is therefore built on a microservices architecture that lets each organization create its own model, monitor its performance in production, and update it when performance is no longer adequate.
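
The abstract does not define "cumulative n-grams". A minimal sketch, assuming the term means a bag of words that pools every n-gram of order 1 through n (for trigrams: unigrams, bigrams, and trigrams together), might look like the following. The function names and the whitespace tokenizer are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def cumulative_ngrams(tokens, max_n=3):
    """Count every n-gram of order 1..max_n in a list of tokens.

    Assumption: "cumulative" means lower-order n-grams are kept
    alongside the higher-order ones rather than replaced by them.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def bag_of_words(text, max_n=3):
    """Lowercase the text, split on whitespace, and build the n-gram bag."""
    return cumulative_ngrams(text.lower().split(), max_n)


# Hypothetical document text, used only to illustrate the output shape.
bag = bag_of_words("invoice number invoice date")
for ngram, count in sorted(bag.items()):
    print(f"{count}  {ngram}")
```

Each distinct n-gram becomes one feature, so the feature space grows quickly with `max_n`. A classifier such as those listed in the abstract would then be trained on these counts, typically after mapping them to a fixed vocabulary.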

                Author and article information

                Journal
                Title: Revista Científica (Rev. Cient.)
                Publisher: Universidad Distrital Francisco José de Caldas (Bogotá, Distrito Capital, Colombia)
                ISSN: 0124-2253, 2344-8350
                Published: April 2022, 43: 64-79
                Affiliations
                [1] Nexura S.A.S., Cali, Colombia; dsgarcia@nexura.com
                [3] Universidad del Cauca, Popayán, Cauca, Colombia; mmendoza@unicauca.edu.co
                [2] Universidad del Cauca, Popayán, Cauca, Colombia; ccobos@unicauca.edu.co
                [4] Universidad del Cauca, Popayán, Cauca, Colombia; manzamb@unicauca.edu.co
                [5] Nexura S.A.S., Cali, Colombia; jmartinez@nexura.com
                Article
                Publisher IDs: S0124-22532022000100064, S0124-2253(22)00004300064
                DOI: 10.14483/23448350.18352
                Record ID: c0a4f847-e150-4a25-903b-813f29c462a3

                This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

                History
                October 2021
                August 2021
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 35, Pages: 16
                Product

                SciELO Colombia

                Categories
                Research article

                Keywords
                document management system, data analytics, CRISP-DM, multilayer perceptron, random forests, k-nearest neighbors, trigrams
