
Semantic Annotation, Indexing, and Retrieval

Atanas Kiryakov, Borislav Popov, Damyan Ognyanoff, Dimitar Manov, Angel Kirilov, Miroslav Goranov

Ontotext Lab, Sirma AI EOOD, 138 Tsarigradsko Shose, Sofia 1784, Bulgaria

{naso, borislav, damyan, mitac, angel, miro}@sirma.bg

Abstract. The Semantic Web realization depends on the availability of a critical mass of metadata for the web content, linked to formal knowledge about the world. This paper presents our vision of a holistic system allowing annotation, indexing, and retrieval of documents with respect to real-world entities. A system (called KIM), partially implementing this concept, is briefly presented and used for evaluation and demonstration.

Our understanding is that a system for semantic annotation should be based upon specific knowledge about the world, rather than being indifferent to any ontological commitments and general knowledge. To assure efficiency and reusability of the metadata we introduce a simplistic upper-level ontology which starts with some basic philosophic distinctions and goes down to the most popular entity types (people, companies, cities, etc.), thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions. Based on the ontology, an extensive knowledge base of entity descriptions is maintained.

A semantically enhanced information extraction system providing automatic annotation with references to classes in the ontology and to instances in the knowledge base is presented. Based on these annotations, we perform IR-like indexing and retrieval, further extended using the ontology and the knowledge about the specific entities.

1 Introduction

The Semantic Web is about adding formal semantics (metadata, knowledge) to the web content for the purpose of more efficient access and management. Since its vitality depends on the presence of a critical mass of metadata, the acquisition of this metadata is a major challenge for the Semantic Web community. Though in some cases unavoidable, the manual accumulation of this explicit semantics is not considered a feasible approach. Our vision is that fully automatic methods for semantic annotation should be researched and developed. For this to happen, the necessary design and modeling questions should be faced and resolved, and the enabling complementary resources and infrastructure should be provided. To assure wide acceptance and usage of semantic annotation systems, their tasks should be clearly defined and their performance properly evaluated and communicated.

The semantic annotation offered here is a specific metadata generation and usage schema targeted to enable new information access methods and extend the existing ones. The annotation scheme offered is based on the understanding that the named entities (NE, see 1.1) mentioned in the documents constitute an important part of their semantics. Further, using different sorts of redundancy and external or background knowledge, those entities can be coupled with formal descriptions and thus provide more semantics and connectivity to the web. We hope that the expectations towards the Semantic Web will be easier to realize if the following basic tasks can be defined and solved:

1. Annotate and hyperlink (references to) named entities in text documents;

2. Index and retrieve documents with respect to the referred entities.

The first task can be seen as an advanced combination of a basic press-clipping exercise, a typical IE1 task, and automatic hyper-linking. The resulting annotations represent basically a document enrichment and presentation method, which can further be used to enable other access methods.

The second task is just a modification of the classical IR task – documents are retrieved based on relevance to NEs instead of words. However, the basic assumption is quite similar – the documents are characterized by the bag of tokens2 constituting their content, disregarding its structure. While the basic IR approach considers the word stems as tokens, over the last decade there has been considerable effort towards using word-senses or lexical concepts (see [20] and [36]) for indexing and retrieval. The named entities can be seen as a special sort of token to be taken care of. What we present here is one more (pretty much independent) development direction, rather than an alternative to the contemporary IR trends.

Fig. 1. Semantic Annotation

In a nutshell, Semantic Annotation is about assigning to the entities in the text links to their semantic descriptions (as presented on Fig. 1). This sort of metadata provides both class and instance information about the entities. It is a matter of terminology whether these annotations should be called "semantic", "entity", or something else. To the best of our knowledge there is no well-established term for this task; neither is there a well-established meaning for "semantic annotation". More importantly, automatic semantic annotations enable many new applications: highlighting, indexing and retrieval, categorization, generation of more advanced metadata, and smooth traversal between unstructured text and available relevant knowledge. Semantic annotation is applicable to any sort of text – web pages, regular (non-web) documents, text fields in databases, etc. Further, knowledge acquisition can be performed based on the extraction of more complex dependencies – analysis of relationships between entities, event and situation descriptions, etc.

1 Information extraction, a relatively young discipline in Natural Language Processing (NLP), which conducts partial analysis of text in order to extract specific information, [6].

2 Or "atomic text entities", as those are referred to in [17].

This paper presents a schema for automatic semantic annotation, indexing, and retrieval, together with a discussion of a number of design and modeling questions (section 2), followed by a discussion of the process (section 3). In section 4, we present a software platform, KIM, which demonstrates this model based on the latest Semantic Web and Information Extraction technology. The fifth section provides a survey of related work. Conclusion and future work are discussed in section 6.

1.1 Named Entities

In the NLP and particularly the IE tradition, named entities are considered to be: people, organizations, locations, and others referred to by name. In a wider interpretation, those also include scalar values (numbers, dates, amounts of money), addresses, etc.

The NEs require different handling because of their different nature and semantics3 as opposed to the words (terms, phrases, etc.). While the former denote particulars (individuals or instances), the latter denote universals (concepts, classes, relations, attributes). While the words can be described with the means of lexical semantics and common sense, the understanding and managing of named entities requires more specific world knowledge.

2 Semantic Annotation Model and Representation

Here we discuss the structure and the representation of the semantic annotations, including the necessary knowledge and metadata. There are a number of basic prerequisites for the representation of semantic annotations:

• Ontology (or at least a taxonomy) defining the entity classes; it should be possible to refer to those classes;

• Entity identifiers, which allow the entities to be distinguished and linked to their semantic descriptions;

• Knowledge base with entity descriptions.
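To make these prerequisites concrete, here is a minimal sketch in Python using the rdflib library (our illustration, not part of the original system; all namespace URIs are invented):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

ONT = Namespace("http://example.org/ontology#")  # hypothetical ontology namespace
KB = Namespace("http://example.org/kb#")         # hypothetical KB namespace

g = Graph()

# Prerequisite 1: an ontology (here just a tiny taxonomy) defining the entity classes
g.add((ONT.Location, RDF.type, RDFS.Class))
g.add((ONT.City, RDFS.subClassOf, ONT.Location))

# Prerequisite 2: an entity identifier -- a URI that annotations can refer to
new_york = KB.City_NewYork

# Prerequisite 3: a knowledge base description attached to that identifier
g.add((new_york, RDF.type, ONT.City))
g.add((new_york, RDFS.label, Literal("New York")))

print(g.serialize(format="turtle"))
```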

The next question considers an important choice for the representation of the annotations – "to embed or not to embed?" Although the embedded annotations seem easier to maintain, there are a number of arguments providing evidence that the semantic annotations have to be decoupled from the content they refer to. One key reason is to allow dynamic, user-specific semantic annotations – embedded annotations become part of the content and cannot change according to the interest of the user or the context of usage. Further, embedded complex annotations would have a negative impact on the volume of the content and can complicate its maintenance – imagine that a page with three layers of overlapping semantic annotations needs to be updated while keeping them consistent. These and a number of other issues defending the externally encoded annotations can be found in [34], which also provides an interesting parallel to the open hypermedia systems.

3 Without trying to discuss what semantic means in general, we simplify it down to "a model or description of an object which allows further interpretation."

Fig. 2. Distributed Heterogeneous Knowledge

Once decided that the semantic annotations have to be kept separate from the content, the next question is whether or not (or how much) to couple the annotations with the ontology and the knowledge base. Such integration seems profitable – it would be easier to keep the annotations in sync with the class and entity descriptions. However, there are at least two important problems:

• Both the cardinality and the complexity of the annotations differ from those of the entity descriptions – the annotations are simpler, but their count is usually much bigger than that of the entity descriptions. Even for middle-sized document corpora the annotations can reach tens of millions. Suppose 10M annotations are stored in an RDF(S) store together with 1M entity descriptions, and that each annotation and each entity description is represented with 10 statements. There is a considerable difference between the inference approaches and hardware capable of efficient reasoning over and access to a 10M-statement repository (the entity descriptions alone) and a 110M-statement repository (the annotations and the descriptions together).

• It would be nice if the world knowledge (ontology and instance data) and the document-related metadata are kept independent. This would mean that for one and the same document, different extraction, processing, or authoring methods would be able to deliver alternative metadata referring to one and the same knowledge store.

• Most important, it should be possible for the ownership of and responsibility for the metadata and the knowledge to be distributed. This way, different parties can develop and maintain separately the content, the metadata, and the knowledge.

Based on the above arguments we propose decoupled representation and management of the documents, the metadata (annotations) and the formal knowledge (ontologies and instance data) as depicted on Fig. 2.
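The decoupling can be illustrated with a minimal stand-off annotation structure (a sketch under our own assumptions, not the paper's actual format): the annotation lives outside the document and carries only character offsets plus references into the ontology and the KB.

```python
from dataclasses import dataclass

@dataclass
class SemanticAnnotation:
    """A stand-off annotation: kept apart from content, ontology, and KB."""
    document_uri: str   # where the mention occurs
    start: int          # character offsets of the mention in the document text
    end: int
    class_uri: str      # reference to the entity class in the ontology
    instance_uri: str   # reference to the entity description in the KB

# The document text itself stays untouched in the content store.
text = "The mayor of New York announced ..."
ann = SemanticAnnotation(
    document_uri="http://example.org/docs/42",  # hypothetical document ID
    start=13, end=21,
    class_uri="http://example.org/ontology#City",
    instance_uri="http://example.org/kb#City_NewYork",
)
assert text[ann.start:ann.end] == "New York"
```

Because the record only points at the document, the metadata, and the knowledge, each of the three can be owned, versioned, and replaced independently.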

2.1 Light-weight Upper Level Ontology

We will shortly advocate the appropriateness of using an ontology for defining the entity types – ontologies are the only widely accepted paradigm for the management of open, sharable, and reusable knowledge. In our view, a light-weight ontology (poor on axioms) is sufficient for a simple definition of the entity classes, their appropriate attributes, and relations. At the same time it allows more efficient and scalable management of the knowledge (compared to heavy-weight semantic approaches). The ontology to support semantic annotation in a web context should address a number of general classes which tend to appear in texts in various domains. Describing these classes together with the most basic relations and attributes means that an upper-level ontology should be involved. The experience within a number of projects4 demonstrates that "logically extensive" upper-level ontologies are extremely hard to agree on, build, maintain, understand, and use. This seems to provide enough evidence that a light-weight upper-level ontology is necessary for semantic annotations.

2.2 Knowledge Representation Language

According to the analysis of ontology and knowledge representation languages and formats in [11] and by other authors, it becomes evident that there is not much consensus beyond RDF(S), see [4]. The latter is well established in the Semantic Web community as a knowledge representation and interchange language. The rich diversity of RDF(S) repositories, APIs, and tools forms a mature environment for the development of systems grounded in an RDF(S) representation of their ontological and knowledge resources. Because of the common acceptance of RDF(S) in the Semantic Web community, it would be easy to reuse the ontology and KB, as well as enrich them with domain-specific extensions. The new OWL standard (see [9]) offers a clear, relatively consensual, and backward-compatible path beyond RDF(S), but still lacks sufficient tool support. Our experience shows (see the section on KIM) that for the basic purposes of light-weight ontology definition and entity description, RDF(S) provides sufficient basic expressiveness. The most critical nice-to-have primitives (equality, transitive and symmetric relations, etc.) are well covered in OWL Lite – the simplest first level of OWL. So, we suggest that RDF(S) be used in a way which allows easy extension towards OWL – this means avoiding primitives and patterns not included in OWL, http://www.w3.org/2002/07/owl.

2.3 Metadata Encoding and Management

The metadata has to be stored in a format allowing its efficient management; we are not going to prescribe a specific format here, but rather to outline a number of principles and requirements for document and annotation management:

4 For instance, Cyc (http://www.cyc.com) and the Standard Upper Ontology initiative (http://suo.ieee.org/).

• Documents (and other content) in different formats to be identifiable and their text content to be accessible;

• To allow non-embedded annotations over documents to be stored, managed, and retrieved according to their positions, features, and references to a KB;

• To allow embedding of the annotations at least for some of the formats;

• To allow export and exchange of the annotations in different formats.

There are a number of standards and initiatives related to the encoding and representation of metadata related to text. Two of the most popular are TEI5 and Tipster6.

2.4 Knowledge Base

Once the entity types, relations, and attributes are encoded in an ontology, the next aspect of the semantic annotation representation is the entity descriptions. It should be possible to identify, describe, and interconnect the entities in a general, flexible, and standard fashion. We call a body of formal knowledge about entities a knowledge base (KB) – although a bit old-fashioned, this term best reflects the representation of non-ontological formal knowledge. A KB is expected to contain mostly instance knowledge/data, so other names can also be a good fit for such a dataset.

We consider the ontology (defining all classes, relations, and attributes, together with further constraints and dependencies) a sort of schema for the KB, and both should be kept in a semantic store – any sort of formal knowledge reasoning and management system which provides the basic operations: storage and retrieval according to the syntax and semantics of the selected formalism. The store may or may not provide inference7, it can implement different reasoning strategies, etc. There are also more advanced management features which are not considered a must: versioning, access control, transaction support, locking, client-caching. For an overview of those see [16], [15], [19], and [25]. Whether the ontology and the knowledge base should be kept together is a matter of distributed knowledge representation and management, which is outside the scope of this paper.

The KB can host two sorts of entity knowledge (descriptions and relationships):

• Pre-populated – knowledge imported or otherwise acquired from trusted sources;

• Automatically extracted – knowledge discovered in the process of semantic annotation (say, via IE) or using other knowledge discovery and acquisition methods such as data mining.

It is up to the specific implementation whether and how much the KB is to be pre-populated. For instance, information about entities of general importance (including their aliases) can significantly help the IE used for automatic semantic annotation – an extensive proposal about this can be found in the description of the KIM platform later in this paper.

5 The Text Encoding Initiative, http://www.tei-c.org/

6 Tipster Architecture, http://cs.nyu.edu/cs/faculty/grishman/tipster.html

7 For instance, there are experts who do not consider the interpretation of RDF(S) according to its model-theoretic semantics to be inference, just because it is simple compared to the semantics and inference methods of other languages.

Further, domain- and task-specific knowledge could help the customization of a semantic annotation application – after extending the ontology to match the application domain, the KB could be pre-populated with specific entities. For instance, information about specific markets, customers, products, technologies, and competitors could be of great help for business intelligence and press-clipping; for company intelligence within the UK it would be important to have more exhaustive coverage of UK-based companies and UK locations. It might also prove beneficial to reduce the general information that is not applicable in the concrete context and thus construct a more focused KB.
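As a hedged illustration of such pre-population (invented URIs and property names, not KIM's actual loading code), a UK-focused deployment might import trusted company and location entities, including their aliases, before any extraction runs:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

ONT = Namespace("http://example.org/ontology#")  # hypothetical ontology namespace
KB = Namespace("http://example.org/kb#")         # hypothetical KB namespace

kb = Graph()

# Trusted entities for a UK company-intelligence deployment, loaded before extraction
for uri, cls, name, aliases in [
    (KB.Company_ExampleLtd, ONT.Company, "Example Ltd.", ["Example", "Example Limited"]),
    (KB.City_London, ONT.City, "London", []),
]:
    kb.add((uri, RDF.type, cls))
    kb.add((uri, RDFS.label, Literal(name)))         # main alias (official name)
    for alias in aliases:
        kb.add((uri, ONT.hasAlias, Literal(alias)))  # hasAlias is a hypothetical property

# Domain-specific relation between the pre-populated entities
kb.add((KB.Company_ExampleLtd, ONT.locatedIn, KB.City_London))
```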

Since state-of-the-art IE (and in particular, named entity recognition, NER) allows the recognition of new (previously unknown) entities and relations between them, it is reasonable to use this advantage for the enrichment of the KB. Because of the innate imprecision of these methods, the knowledge accumulated through them should be distinguishable from the knowledge that was pre-populated. Thus the extraction of new metadata can still be grounded in the trusted knowledge about the world, while the accumulated entities would be available for indexing, browsing, and navigation. Recognized entities could be transformed into trusted ones at some point, through a semi-automatic validation process. An important part of this enrichment would be the template extraction of entity relations, which could be seen as a kind of content-based learning by the system. Depending on the texts being processed, the respective changes would occur in the recognized parts of the KB, and thus its projection of the world would change accordingly (e.g. processing only sport news articles, the metadata would be rich for this domain and poor for the others).

2.5 Unified Representation of Lexical Knowledge

The symbolic IE processing usually requires some lexica to be used for pattern recognition and other purposes. These include both general entries (such as various sorts of stop words) and entries specific to the entity classes being handled. It is common for IE systems to keep these in application-specific formats or directly hard-coded in the source code.

It is worth representing and managing those in the same format used for the ontology and the entity knowledge base – this way the same tools (parsers, editors, etc.) can be used to manage both sorts of knowledge. For this purpose, a part of the ontology (or just a separate one) could be dedicated to defining the types of lexical resources used by the natural language technologies involved.

The corresponding lexical-resources part of the KB should be pre-populated to aid the IE process by providing clues for entity and relation recognition which go beyond the already known instances. For instance, for efficient recognition of persons in the text one would need lists of first names (male and female), person titles, positions, and professions. Some of these could be ontologically distinguished by gender as well. For the Organization lexica one should pre-populate possible suffixes (such as Ltd., GmbH, etc.) and terms appearing in organization names (e.g. company, theatre, etc.). Additionally, time and date lexica ("a.m.", "Tue", etc.), currency units, address lexica, and others should be included. The mature symbolic NER and IE systems already have coverage of such resources; the next step towards integrating them in a system for automatic semantic annotation would be just to encode them in a formal ontology and represent them in the KB.
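A sketch of the idea in Python with rdflib (the class names PersonFirstName and OrgSuffix are invented for illustration; the actual KIM lexical ontology differs): lexical clues are stored in the same RDF(S) format as the rest of the KB, so the same tools can manage them.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

LEX = Namespace("http://example.org/lexicon#")  # hypothetical lexical-resource ontology

g = Graph()
# Classes of lexical clues, defined like any other ontology class
g.add((LEX.PersonFirstName, RDF.type, RDFS.Class))
g.add((LEX.OrgSuffix, RDF.type, RDFS.Class))

# Clues for person recognition: lists of first names
for name in ["John", "Maria"]:
    uri = LEX["firstName_" + name]
    g.add((uri, RDF.type, LEX.PersonFirstName))
    g.add((uri, RDFS.label, Literal(name)))

# Clues for organization recognition: possible company suffixes
for suffix in ["Ltd.", "GmbH"]:
    uri = LEX["orgSuffix_" + suffix.rstrip(".")]
    g.add((uri, RDF.type, LEX.OrgSuffix))
    g.add((uri, RDFS.label, Literal(suffix)))

# An IE component can now load its gazetteer list from the KB instead of a flat file
first_names = [str(o) for s, o in g.subject_objects(RDFS.label)
               if (s, RDF.type, LEX.PersonFirstName) in g]
print(first_names)
```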

3 Semantic Annotation Process

As already mentioned, we focus mainly on automatic semantic annotation, leaving manual annotation to approaches more related to authoring web content. Even though less accurate, the automatic approaches for metadata acquisition promise scalability, and without them the Semantic Web will remain mostly a vision for a long time. Our experience shows that the existing state-of-the-art IE systems have the potential to automate the annotation with reasonable accuracy and performance.

Although a lot of research and development has been contributed to the area of automatic IE so far, the lack of standards and of integration with formal knowledge management systems has obscured its usage. We claim that it is crucial to encode the extracted knowledge formally and according to well-known and widely accepted knowledge representation and metadata encoding standards. Such a system should be easily extensible for domain-specific applications, providing basic means for addressing the most common entity types, their attributes, and relations.

3.1 Extraction

It is a major problem with the traditional NER approaches that the annotations produced are not encoded in an open formal system and that unbound entity types are used. The resources used are also traditionally represented in a proprietary form with no clear semantics. This hinders the reuse of both the lexical resources and the resulting annotations by other systems, thus limiting the progress of the language technologies, since sharing of resources and results is too expensive.

These problems can be partly resolved by an ontology-based infrastructure for IE. As proposed above, the entity types should be defined within an ontology, and the entities being recognized should be described (or at least kept) in an accompanying KB. Thus the NLP systems with ontology support would more easily share both pre-populated knowledge and the results of their processing, as well as all the different sorts of lexicons and other resources commonly used.

An important case demonstrating how ontologies can be used in IE are the so-called gazetteers, used to look up in the text predefined strings from predefined lists. At present, the lists are kept in proprietary formats, and the typical result of the work of the gazetteers is annotations with unbound strings used as types. A better approach presumes all the various annotation types and list values to be kept in a semantic store. Thus, the resulting annotation can be typed by reference to ontology classes and, even further, point to a specific lexeme or entity, if appropriate.
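The contrast can be sketched in plain Python (a toy gazetteer with invented URIs, not GATE's actual gazetteer): instead of tagging matches with an unbound string such as "city", the lookup table carries references into the ontology and the KB, so every match is immediately typed and grounded.

```python
# Gazetteer lookup table loaded from the semantic store rather than a flat file:
# surface string -> (ontology class URI, KB instance URI)
GAZETTEER = {
    "New York": ("http://example.org/ontology#City", "http://example.org/kb#City_NewYork"),
    "N.Y.":     ("http://example.org/ontology#City", "http://example.org/kb#City_NewYork"),
    "ORACLE":   ("http://example.org/ontology#Company", "http://example.org/kb#Company_Oracle"),
}

def annotate(text):
    """Return (start, end, class_uri, instance_uri) for every known surface form."""
    annotations = []
    for surface, (class_uri, instance_uri) in GAZETTEER.items():
        pos = text.find(surface)
        while pos != -1:
            annotations.append((pos, pos + len(surface), class_uri, instance_uri))
            pos = text.find(surface, pos + 1)
    return annotations

print(annotate("ORACLE opened a new office in N.Y. last year."))
```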

Since a huge amount of NLP research has been contributed in recent years (and even decades), we suggest the reuse of existing systems with proven maturity and effectiveness. Such a system should be modified so as to use resources kept in a KB and produce annotations referring to the latter. Our experience shows that such a change is not a trivial one. All the processing layers have to be re-engineered in order to be opened towards the semantic repository and depend on it for their inputs. However, there are a number of benefits of such an approach:

• All the various sorts of resources can be managed in a much more standard and uniform way;

• It becomes easier to manage the different sorts of linguistic knowledge at the proper level of generality. For instance, a properly structured entity type hierarchy allows the entities and their references in the text to be classified in the most precise way, but still easily matched in more general patterns. Thus, one can have a specific mountain annotated and still match it within a grammar rule which expects any sort of location (see the sketch after this list);

• Wherever possible, any available further knowledge will be accessible directly via a reference from the annotation to the semantic store. Thus, available knowledge about an entity can be used, for instance, for disambiguation or co-reference resolution tasks.
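A sketch of the second benefit above, using rdflib's transitive closure over rdfs:subClassOf (invented URIs): an entity annotated with the most specific class, Mountain, still satisfies a rule that expects any Location.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

ONT = Namespace("http://example.org/ontology#")

g = Graph()
g.add((ONT.Mountain, RDFS.subClassOf, ONT.Location))
g.add((ONT.Location, RDFS.subClassOf, ONT.Entity))

def is_a(graph, cls, ancestor):
    """True if cls equals ancestor or is (transitively) a subclass of it."""
    return ancestor in graph.transitive_objects(cls, RDFS.subClassOf)

# An annotation typed as Mountain matches a rule expecting any sort of Location
assert is_a(g, ONT.Mountain, ONT.Location)
```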

A processing layer that is not inherent to the traditional IE systems can generate and store in the KB the descriptions of the newly discovered entities. When the same entity is encountered in the text next time, it can be directly linked to the already generated description. Further, extending the IE task to cover template relation extraction, another layer could enrich the KB with these relations.
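That extra layer can be sketched as follows (plain Python with an invented URI scheme, not KIM's actual implementation): the first encounter of an unknown entity mints a URI and stores a minimal description; later encounters of the same name link to the already generated one.

```python
import uuid

kb_index = {}  # (normalized name, class URI) -> instance URI

def resolve_entity(name, class_uri):
    """Return the KB URI for the mention, minting one on first encounter."""
    key = (name.lower(), class_uri)
    if key not in kb_index:
        # First encounter: generate a URI and store a minimal description
        kb_index[key] = "http://example.org/kb#gen_" + uuid.uuid4().hex
    return kb_index[key]

u1 = resolve_entity("Everest", "http://example.org/ontology#Mountain")
u2 = resolve_entity("Everest", "http://example.org/ontology#Mountain")
assert u1 == u2  # the second mention links to the already generated description
```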

3.2 Indexing and Retrieval

Historically, the issue of specific handling of the named entities was neglected by the information retrieval (IR) community, apart from some shallow handling for the purpose of Question Answering tasks. However, a recent large-scale human interaction study on a personal content IR system of Microsoft (reported in [10]) demonstrates that, at least in some cases, ignoring the named entities does not match the user needs: "The most common query types in our logs were People/places/things, Computers/internet and Health/science. In the People/places/things category, names were especially prevalent. Their importance is highlighted by the fact that 25% of the queries involved people's names suggesting that people are a powerful memory cue for personal content. In contrast, general informational queries are less prevalent."

As the web content is rapidly growing, the demand for more advanced retrieval methods increases accordingly. Based on semantic annotations, efficient indexing and retrieval techniques could be developed involving explicit handling of the named entity references.

In a nutshell, the semantic annotations could be used to index both "NY" and "N.Y." as occurrences of the specific entity "New York", as if there were just its unique ID. Because no entity recognition is involved, the present systems will index on "NY", "N", and "Y", which demonstrates well some of the problems of the keyword-based search engines.
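The difference can be sketched in a few lines of Python (a toy index, not the Lucene adaptation described in section 4): an entity-aware index records one posting under the entity's unique ID for both "NY" and "N.Y.", where a plain keyword index would fragment them.

```python
from collections import defaultdict

# Both aliases resolve to the same entity identifier
ALIASES = {"NY": "kb:City_NewYork", "N.Y.": "kb:City_NewYork"}

index = defaultdict(set)  # token or entity ID -> set of document IDs

def index_document(doc_id, text):
    for token in text.split():
        entity = ALIASES.get(token.rstrip(","))
        # Index the entity ID when the token is a known alias, the raw token otherwise
        index[entity if entity else token].add(doc_id)

index_document("doc1", "Flights to NY are cheap")
index_document("doc2", "He moved to N.Y. in May")
print(index["kb:City_NewYork"])  # {'doc1', 'doc2'} -- one entry for both spellings
```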

Given metadata indexing of the content, advanced semantic querying becomes feasible. In a query towards a repository of semantically annotated documents, it should be possible to specify entity type restrictions, name and other attribute restrictions, as well as relations between the entities of interest. For instance, it should be possible to make a query that targets all documents that refer to Persons that hold some Positions within an Organization, and also restricts the names of the entities or some of their attributes (e.g. a person's gender).

Further, semantic annotations could be used to match specific references in the text to more general queries. For instance, a query such as “company ‘Redwood Shores’” could match documents mentioning the town and specific companies such as ORACLE and Symbian, but not the word “company”.
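A sketch of how such a query could be evaluated (toy data structures and invented URIs, continuing the index idea above): the KB is consulted for all entities of the requested type, and the entity-aware index is then intersected with the postings of the co-occurring entity.

```python
# Toy KB: entity instances and their types (invented URIs)
KB_TYPE = {
    "kb:Company_Oracle": "ont:Company",
    "kb:Company_Symbian": "ont:Company",
    "kb:City_RedwoodShores": "ont:City",
}

# Entity-aware index: entity ID -> documents mentioning it
INDEX = {
    "kb:Company_Oracle": {"doc1"},
    "kb:Company_Symbian": {"doc2"},
    "kb:City_RedwoodShores": {"doc1", "doc3"},
}

def query(entity_type, co_occurring_entity):
    """Documents mentioning the given entity together with any entity of the type."""
    docs = set()
    for entity, etype in KB_TYPE.items():
        if etype == entity_type:
            docs |= INDEX.get(entity, set()) & INDEX.get(co_occurring_entity, set())
    return docs

# "company 'Redwood Shores'": matches doc1 (ORACLE + the town), not the word "company"
print(query("ont:Company", "kb:City_RedwoodShores"))  # {'doc1'}
```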

Finally, although the above-sketched enhancements look promising, it still requires a lot of research and experiments to determine to what extent and how they could improve the existing IR systems. It is hard in a general context to predict how semantic indexing will combine with the symbolic and statistical methods currently in use, such as the lexical approach presented in [20] and the latent semantic analysis presented in [18]. For this purpose, large-scale experimental data and evaluation are required.

4 KIM Platform: Implementing the Vision

The Knowledge and Information Management (KIM) platform embodies our vision of semantic annotation, indexing, and retrieval services and infrastructure. An essential idea in KIM is the semantic (or entity) annotation (as depicted on Fig. 1). It can be seen as a classical named-entity recognition and annotation process. However, in contrast to most of the existing IE systems, KIM provides for each entity reference in the text (i) a link (URI) to the most specific class in the ontology and (ii) a link to the specific instance in the knowledge base. The latter is (to the best of our knowledge) a unique KIM feature which allows further indexing and retrieval of documents with respect to entities.

For the end-user, the usage of a KIM-based application is straightforward and simple – annotation is requested from a browser plug-in, which highlights the entities in the current content and generates hyperlinks used for further exploring the available knowledge about each entity (as shown in Fig. 4). A semantic query web UI allows the specification of a search query that consists of entity type, name, attribute, and relation restrictions (allowing queries such as Organization-locatedIn-Country, Person-hasPosition-Position-within-Organization, etc.).

This section provides a short overview of the main components of KIM, which is presented in greater detail in [27] and on its web site, http://www.ontotext.com/kim.

4.1 KIM Architecture

The KIM platform consists of the KIM Ontology (KIMO)8, a knowledge base, the KIM Server (with an API for remote access, embedding, and integration), and front-ends (a browser plug-in for Internet Explorer, a Semantic Query web user interface, and a Knowledge Explorer for KB navigation). The KIM ontologies and knowledge bases are kept in the Sesame9 RDF(S) repository and the Ontology Middleware Module10 [16].

KIM provides a mature infrastructure for IE, annotation, and document management, based on GATE11 [7]. The Lucene12 information retrieval engine has been adapted to index documents by entity types and to measure relevance with respect to entities, along with tokens and stems. It is important to mention that KIM, as a software platform, is domain and task independent, as are GATE, Sesame, and Lucene.

4.2 KIM Ontology (KIMO)

KIM uses a simplistic upper-level ontology starting with some basic philosophic distinctions between entity types (such as Object-s – existing entities such as locations and agents, Happening-s – defining events and situations, and Abstract-ions that are neither objects nor happenings). Further on, the ontology goes into more detail, to such an extent that real-world entity types of general importance are included (meetings, military conflicts, employment positions, commercial, government and other organizations, people, various locations, etc.). The characteristic attributes and relations for the featured entity types are defined (e.g. the subRegionOf property for Location-s, hasPosition for Person-s, locatedIn for organizations, etc.). With this simplistic upper-level ontology as a basis, one could easily add domain-specific extensions to it, profiling the semantic annotation for concrete applications.
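The top of such a hierarchy can be sketched with rdflib (the class and property names follow the description above, but the namespace URI is invented, not the published KIMO one):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

K = Namespace("http://example.org/kimo-sketch#")  # hypothetical namespace

g = Graph()
# Basic philosophic distinctions at the top of the ontology
for cls in (K.Object, K.Happening, K.Abstract):
    g.add((cls, RDFS.subClassOf, K.Entity))
# Real-world entity types of general importance
g.add((K.Location, RDFS.subClassOf, K.Object))
g.add((K.Agent, RDFS.subClassOf, K.Object))
g.add((K.Person, RDFS.subClassOf, K.Agent))
g.add((K.Organization, RDFS.subClassOf, K.Agent))
# Characteristic relations for the featured types
g.add((K.subRegionOf, RDF.type, RDF.Property))
g.add((K.subRegionOf, RDFS.domain, K.Location))
g.add((K.subRegionOf, RDFS.range, K.Location))
g.add((K.hasPosition, RDF.type, RDF.Property))
g.add((K.hasPosition, RDFS.domain, K.Person))
```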

The distribution of the most commonly referred entity types varies greatly from domain to domain. As researched in [23], despite the differences in type distributions, there are several general entity types that appear in all corpora – Person, Location, Organization, Money (amount), Dates, etc. The proper representation and positioning of those basic types was one of the objectives behind the design of KIMO. Further, the ontology defines more specific entity types (e.g. Mountain, as a more specific type than Location). The extent of specialization of the ontology was determined on the basis of research of the entity types in a corpus of general news (incl. political, sport, financial, etc.).

The KIM ontology (KIMO)13 consists of about 250 classes and 100 properties. The top Entities can be seen in the type hierarchy of the KIM plug-in on Fig. 4. The ontology is encoded in RDF(S). In addition, a number of "generative" (in the style of the RDFS MT semantics) axioms are defined, of the form <X,p,Y> and <Y,p,Z> => <X,p,Z> (e.g. making subRegionOf transitive). This sort of axioms is supported by Sesame and provides an easy-to-understand-and-manage, consistent mechanism for "custom" extensions of the RDF(S) semantics with respect to a specific ontology. Those axioms can be seen as an ad-hoc but quite practical way to avoid the RDF(S) constraints without a need to implement some specific flavor of OWL or another language.

8 http://www.ontotext.com/kim/2003/03/kimo.rdfs

9 http://sesame.aidministrator.nl/, RDF(S) repository by Aidministrator b.v.

10 OMM (http://www.ontotext.com/omm) is an enterprise back-end for knowledge management.

11 General Architecture for Text Engineering (GATE), http://gate.ac.uk, a leading NLP and IE platform developed at the University of Sheffield.

12 Lucene, http://jakarta.apache.org/lucene/, a high-performance full-text search engine.

13 http://www.ontotext.com/kim/2003/03/kimo.rdfs
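Outside of Sesame, the effect of such a generative axiom can be sketched as a tiny forward-chaining loop in Python (a toy illustration of the rule semantics, not Sesame's actual rule engine):

```python
def apply_transitivity(triples, prop):
    """Add <x, prop, z> whenever <x, prop, y> and <y, prop, z> hold (to fixpoint)."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = {(x, prop, z)
               for (x, p1, y1) in triples if p1 == prop
               for (y2, p2, z) in triples if p2 == prop and y1 == y2}
        if not new <= triples:
            triples |= new
            changed = True
    return triples

facts = {("Sofia", "subRegionOf", "Bulgaria"), ("Bulgaria", "subRegionOf", "Europe")}
closed = apply_transitivity(facts, "subRegionOf")
assert ("Sofia", "subRegionOf", "Europe") in closed
```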

4.3 KIM Knowledge Base

The entity descriptions are stored in the same RDF(S) repository as the KIM ontology. Each entity has information about its specific type, aliases (incl. a main alias expressing the most probable official name), attributes (e.g. the latitude of a Location), and relations (e.g. a Location subRegionOf another Location). A simplified schema of the entity representation is depicted on Fig. 3.

Fig. 3. Simplified entity description

The KIM KB has been pre-populated with entities of general importance that allow enough clues for the IE process to perform well on inter-domain web content. It consists of about 80,000 entities. Various relations between entities are also predefined (like the position of a person in an organization or a company's location).

4.4 KIM Information Extraction

KIM IE is based on the GATE framework, which has proved its maturity, extensibility, and task independence for IE and other NL applications. The essence of the KIM IE is the recognition of named entities (NE) with respect to the KIMO ontology. The entity instances all bear unique identifiers (URIs) that allow annotations to be linked both to the entity type and to the exact individual in the KB. For new (previously unknown) entities, URIs are generated and assigned, and then minimal descriptions are stored in the semantic store. The annotations are kept separate from the annotated content, and an API for their management is provided.

The actual processing of the content goes through several steps, starting with tokenization, splitting into sentences, and part-of-speech tagging. These processing layers are provided by the GATE framework, along with grammars and other standard building bricks for the construction of sophisticated IE applications. However, a number of components and resources have been considerably re-engineered and new ones were developed.

4.5 Indexing and Retrieval

Once the NER process has finished, the content is indexed with respect to the specific NEs. This enables queries with restrictions over entities, entity types, names, attributes, and relations. Technically, Lucene is adapted to perform full-text indexing which uniquely addresses each entity regardless of the alias used in the text. The retrieval accuracy of KIM has not been evaluated against a traditional IR engine, and this is a topic that will be researched in the future.

4.6 KIM Front-Ends

Different KIM front-end user interfaces are possible given the KIM API, which provides the functionality and infrastructure for semantic annotation, indexing, and retrieval, as well as document management and KB navigation. We have created a plug-in (Fig. 4) for the Internet Explorer browser. The KIM plug-in provides lightweight delivery of semantic annotations to the end user. On its first tab, the plug-in displays the entity type hierarchy (a branch of the KIM ontology). For each entity type there is an associated color used for highlighting the annotations of this type. Check boxes for each entity type allow the user to select the entity types of interest.

Fig. 4. The KIM plug-in with the top of the KIM Ontology, and KIM Explorer on top

Upon invoking annotation of the current browser content, the plug-in extracts the text of the currently displayed document and sends it to an Annotation Server, which in turn uses the KIM Server NER API. The servers return the annotations with their offsets, type, and instance information. The annotations are highlighted in the content (in the color of the respective entity type) and are hyperlinked to the KIM KB Explorer (Fig. 4). On the second tab of the plug-in there is a list of all the recognized entities for the current document, sorted by frequency of appearance. Upon choosing from the list of entities, or following a hyperlink over an annotated entity in the text, the user invokes the KIM KB Explorer, which provides a view of the part of the KB and the ontology related to the chosen entity (incl. type, aliases, relations, and attributes). This way the user can directly navigate from the annotations to the instances they are linked to in the KB. Via this explorer, the KB can be further explored by choosing one of the related entities or the entity class.

5 Related Work

Semantic annotation of documents with respect to an ontology and entity knowledge base is discussed in [5] and [14] – although presenting interesting and ambitious approaches, these do not discuss the usage of information extraction for automatic annotation. The focus of [14] is manual semantic annotation for authoring web content, while [5] targets the creation of a web-based open hypermedia linking service, backed by a conceptual model of document terminology. Semantic annotation is also used in the S-CREAM project presented in [13] – the approach there is interesting with its heavy involvement of machine learning techniques for the extraction of relations between the entities being annotated. A similar approach is taken within the MnM project [35], where the semantic annotations can be placed inline in the document content and refer to an ontology and KB server (WebOnto), accessible via a standard API. Another related approach is taken in Open Ontology Forge [8], which puts an emphasis on collaborative ontology development and annotation.

An interesting NE indexing and question answering system is presented in [24]. A flat set of entity types is assigned to tokens and the annotations are incorporated in the content, in order to index by NE type later. Once indexed, the content is queried via NL questions, with NE tagging over the question used to determine the expected answer type (e.g. "When have the UN been established?"; UN here would be tagged with _ORG, specifying that the expected answer type is organization). All the semantic annotation techniques above lack the usage of upper-level ontologies and a critical mass of world knowledge to serve as a trusted and reusable basis for automatic recognition and annotation, as in the approach presented in [1] and discussed later on here.

A significant amount of research on information extraction (IE) has been performed in various projects within the GATE framework (see [6-7], [23]), with many existing tools and resources available. We build on those to provide language technology open to the Semantic Web standards and tools.

6 Conclusion and Future Work

This paper presented the notion of semantic annotation – an original metadata model allowing ontology-based named-entity annotation, indexing, and retrieval. A number of issues related to the representation and usage of the semantic annotations were addressed. The KIM platform (addressed in more detail in [27]) was briefly introduced to demonstrate an implementation of this vision.

The evaluation work done until now does not provide enough evidence regarding the approach, technology, and resources being used. The major obstacle is that there are neither test data nor well developed metrics for semantic annotation and retrieval.

Although naïve in some aspects, the KIM platform provides a test bed and proves a number of hypotheses and design decisions:

• It is worth using massive entity knowledge for automatic semantic annotation. Even without comprehensive disambiguation, the precision drawbacks seem acceptable;

• It is possible to store and query tens of thousands of entities together with their descriptions in an RDF(S) repository (namely, Sesame);

• A simple but efficient technique for entity-aware IR is demonstrated;

• A few light-weight front-end tools can deliver the results of semantic annotation, indexing, and retrieval in an intuitive fashion.

The challenges towards the general approach can be summarized as follows:

• Develop (or adapt) an evaluation metric which properly measures the performance of a semantic annotation system;

• Experiment with different approaches towards the disambiguation of named-entity references: adaptation of a Hidden Markov Model learner successfully used for non-semantic disambiguation is one of the first ideas; techniques similar to those used for word-sense disambiguation (namely, lexical chaining); techniques for "symbolic" context management.

References

1. Bontcheva K., Kiryakov A., Cunningham H., Popov B., Dimitrov M. Semantic Web Enabled, Open Source Language Technology. In proc. of EACL Workshop "Language Technology and the Semantic Web", NLPXML-2003, 13 April, 2003.

4. Brickley D., Guha R.V., eds. Resource Description Framework (RDF) Schemas, W3C. http://www.w3.org/TR/2000/CR-rdf-schema-20000327/

5. Carr L., Bechhofer S., Goble C., Hall W. Conceptual Linking: Ontology-based Open Hypermedia. In The WWW10 Conference, Hong Kong, May, pp. 334-342.

6. Cunningham H. Information Extraction: a User Guide (revised version). Department of Computer Science, University of Sheffield, May, 1999.

7. Cunningham H., Maynard D., Bontcheva K., Tablan V. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.

8. Collier N., Takeuchi K., Kawazoe A. Open Ontology Forge: An Environment for Text Mining in a Semantic Web World. In proc. of the International Workshop on Semantic Web Foundations and Application Technologies, Nara, Japan, 11th March.

9. Dean M., Connolly D., van Harmelen F., Hendler J., Horrocks I., McGuinness D., Patel-Schneider P., Stein L.A. Web Ontology Language (OWL) Reference Version 1.0. W3C Working Draft 12 Nov. 2002. http://www.w3.org/TR/2002/WD-owl-ref-20021112/

10. Dumais S., Cutrell E., Cadiz J., Jancke G., Sarin R., Robbins D. Stuff I've Seen: A system for personal information retrieval and re-use. In proc. of SIGIR'03, July 28 – August 1, 2003, Toronto, Canada, ACM Press, pp. 72-79.

11. Fensel D. Ontology Language, v.2 (Welcome to OIL). Deliverable 2, On-To-Knowledge project, Dec 2001. http://www.ontoknowledge.org/downl/del2.pdf

13. Handschuh S., Staab St., Ciravegna F. S-CREAM – Semi-automatic CREAtion of Metadata. The 13th International Conference on Knowledge Engineering and Management (EKAW 2002), ed. Gomez-Perez A., Springer Verlag, 2002.

14. Kahan J., Koivunen M., Prud'Hommeaux E., Swick R. Annotea: An Open RDF Infrastructure for Shared Web Annotations. In The WWW10 Conference, Hong Kong, May, pp. 623-632.

15. Kampman A., van Harmelen F., Broekstra J. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In proc. of ISWC2002, June 9-12th, 2002, Italia.

16. Kiryakov A., Simov K.Iv., Ognyanov D. Ontology Middleware: Analysis and Design. Del. 38, On-To-Knowledge, March 2002. http://www.ontoknowledge.org/downl/del38.pdf

17. Kiryakov A., Simov K.Iv. Ontologically Supported Semantic Matching. In proc. of "NODALIDA'99: Nordic Conference on Comp. Linguistics", Trondheim, Dec. 9-10, 1999.

18. Landauer T., Dumais S. A solution to Plato's problem: the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 1997, 211-240.

19. Maedche A., Motik B., Stojanovic L., Studer R., Volz R. Ontologies for Enterprise Knowledge Management. In IEEE Intelligent Systems, Vol. 18, Num. 2, pp. 26-33, 2003.

20. Mahesh K., Kud J., Dixon P. Oracle at TREC8: A Lexical Approach. In proc. of the Eighth Text Retrieval Conference (TREC-8), 1999.

21. Manov D., Kiryakov A., Popov B., Bontcheva K., Maynard D., Cunningham H. Experiments with geographic knowledge for information extraction. NAACL-HLT 2003, Canada. Workshop on the Analysis of Geographic References, May 31 2003, Edmonton, Alberta.

23. Maynard D., Tablan V., Bontcheva K., Cunningham H., Wilks Y. MUlti-Source Entity recognition – an Information Extraction System for Diverse Text Types. Technical report CS--02--03, Univ. of Sheffield, Dep. of CS, 2003. http://gate.ac.uk/gate/doc/papers.html

24. Moldovan D., Mihalcea R. Document Indexing Using Named Entities. In "Studies in Informatics and Control", Vol. 10, No. 1, March 2001.

25. Noy N., Musen M. Ontology Versioning as an Element of an Ontology-Management Framework. IEEE Intelligent Systems, to appear, 2003.

27. Popov B., Kiryakov A., Kirilov A., Manov D., Ognyanoff D., Goranov M. KIM – Semantic Annotation Platform. In proc. of 2nd International Semantic Web Conference (ISWC2003), 20-23 October 2003, Florida, USA. To appear.

29. Pustejovsky J., Boguraev B., Verhagen M., Buitelaar P., Johnston M. Semantic Indexing and Typed Hyperlinking. In proc. of the AAAI Conference, Spring Symposium, NLP for WWW, Stanford University, CA, 1997, pp. 120-128.

34. van Ossenbruggen J., Hardman L., Rutledge L. Hypermedia and the Semantic Web: A Research Agenda. Journal of Digital Information, volume 3 issue 1, May 2002.

35. Vargas-Vera M., Motta E., Domingue J., Lanzoni M., Stutt A., Ciravegna F. MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup. In proc. of EKAW 2002, ed. Gomez-Perez A., Springer Verlag, 2002.

36. Voorhees E. Using WordNet for Text Retrieval. In "WordNet: an electronic lexical database." Fellbaum, C. (editor), MIT Press, 1998.
